
AN INTRODUCTION TO

FUZZY LOGIC APPLICATIONS


IN INTELLIGENT SYSTEMS
THE KLUWER INTERNATIONAL SERIES
IN ENGINEERING AND COMPUTER SCIENCE

KNOWLEDGE REPRESENTATION, LEARNING AND


EXPERT SYSTEMS

Consulting Editor

Tom Mitchell
Carnegie Mellon University
UNIVERSAL SUBGOALING AND CHUNKING OF GOAL HIERARCHIES, J.
Laird, P. Rosenbloom, A. Newell, ISBN: 0-89838-213-0
MACHINE LEARNING: A Guide to Current Research, T. Mitchell, J. Carbonell,
R. Michalski, ISBN: 0-89838-214-9
MACHINE LEARNING OF INDUCTIVE BIAS, P. Utgoff, ISBN: 0-89838-223-8
A CONNECTIONIST MACHINE FOR GENETIC HILLCLIMBING,
D. H. Ackley, ISBN: 0-89838-236-X
LEARNING FROM GOOD AND BAD DATA, P. D. Laird, ISBN: 0-89838-263-7
MACHINE LEARNING OF ROBOT ASSEMBLY PLANS, A. M. Segre,
ISBN: 0-89838-269-6
AUTOMATING KNOWLEDGE ACQUISITION FOR EXPERT SYSTEMS,
S. Marcus, Editor, ISBN: 0-89838-294-7
MACHINE LEARNING, META-REASONING AND LOGICS, P. B. Brazdil,
K. Konolige, ISBN: 0-7923-9047-4
CHANGE OF REPRESENTATION AND INDUCTIVE BIAS, D. P. Benjamin,
ISBN: 0-7923-9055-5
KNOWLEDGE ACQUISITION: SELECTED RESEARCH AND
COMMENTARY, S. Marcus, Editor, ISBN: 0-7923-9062-8
LEARNING WITH NESTED GENERALIZED EXEMPLARS, S.L. Salzberg,
ISBN: 0-7923-9110-1
INCREMENTAL VERSION-SPACE MERGING: A General Framework
for Concept Learning, H. Hirsh, ISBN: 0-7923-9119-5
COMPETITIVELY INHIBITED NEURAL NETWORKS FOR ADAPTIVE
PARAMETER ESTIMATION, M. Lemmon, ISBN: 0-7923-9086-5
STRUCTURE LEVEL ADAPTATION FOR ARTIFICIAL NEURAL
NETWORKS, T.C. Lee, ISBN: 0-7923-9151-9
CONNECTIONIST APPROACHES TO LANGUAGE LEARNING, D. Touretzky,
ISBN: 0-7923-9216-7
AN INTRODUCTION TO
FUZZY LOGIC APPLICATIONS
IN INTELLIGENT SYSTEMS

Edited
by

Ronald R. Yager
Iona College
Lotfi A. Zadeh
University of California, Berkeley

"
~.

SPRINGER SCIENCE+BUSINESS MEDIA, LLC


Library of Congress Cataloging-in-Publication Data

An Introduction to fuzzy logic applications in intelligent systems /


edited by Ronald R. Yager, Lotfi A. Zadeh.
p. cm. -- (The Kluwer International Series in engineering and
computer science ; SECS 165)
Includes bibliographical references and index.
ISBN 978-1-4613-6619-5 ISBN 978-1-4615-3640-6 (eBook)
DOI 10.1007/978-1-4615-3640-6
1. Fuzzy sets. 2. Fuzzy systems. 3. Intelligent control systems.
I. Yager, Ronald R., 1941-   II. Zadeh, Lotfi Asker.
III. Series.
QA248.I568 1992
511.3'22--dc20 91-39272
CIP

Copyright © 1992 Springer Science+Business Media New York


Originally published by Kluwer Academic Publishers in 1992
Softcover reprint of the hardcover 1st edition 1992
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system or transmitted in any form or by any means, mechanical, photo-copying, recording,
or otherwise, without the prior written permission of the publisher, Springer Science+
Business Media, LLC.

Printed on acid-free paper.


CONTENTS

1. Knowledge Representation in Fuzzy Logic


L. A. Zadeh....................................................................... 1

2. Expert Systems Using Fuzzy Logic


R. R. Yager..................................................................... 27

3. Fuzzy Rules in Knowledge-Based Systems


D. Dubois, H. Prade....................................................... 45

4. Fuzzy Logic Controllers


H. R. Berenji................................................................... 69

5. Methods and Applications of Fuzzy Mathematical


Programming
H.-J. Zimmerman ............................................................ 97

6. Fuzzy Set Methods in Computer Vision


J. M. Keller, R. Krishnapuram ....................................... 121

7. Fuzziness, Image Information and Scene Analysis


S. K. Pal........................................................................ 147

8. Fuzzy Sets in Natural Language Processing


V. Novak ...................................................................... 185

9. Fuzzy-Set-Theoretic Applications in Modeling of


Man-Machine Interactions
W. Karwowski, G. Salvendy ..........................................201

10. Questionnaires and Fuzziness


B. Bouchon-Meunier..................................................... 221

11. Fuzzy Logic Knowledge Systems and Artificial Neural


Networks in Medicine and Biology
E. Sanchez.................................................................... 235

12. The Representation and Use of Uncertainty and


Metaknowledge in Milord
R. Lopez de Mantaras, C. Sierra, J. Agusti .................. 253

13. Fuzzy Logic with Linguistic Quantifiers in Group


Decision Making
J. Kacprzyk, M. Fedrizzi, H. Nurmi. ............................. 263

14. Learning in Uncertain Environments


M. Botta, A. Giordana, L. Saitta................................... 281

15. Evidential Reasoning Under Probabilistic and


Fuzzy Uncertainties
J. F. Baldwin................................................................ 297

16. Probabilistic Sets-Probabilistic Extension of


Fuzzy Sets
K. Hirota ......................................................335

INDEX....................................................................................355
1
KNOWLEDGE REPRESENTATION
IN FUZZY LOGIC

Lotfi A. Zadeh
Computer Science Division, Department of EECS
University of California, Berkeley, California 94720

ABSTRACT
The conventional approaches to knowledge representation, e.g., semantic
networks, frames, predicate calculus and Prolog, are based on bivalent logic. A seri-
ous shortcoming of such approaches is their inability to come to grips with the issue
of uncertainty and imprecision. As a consequence, the conventional approaches do
not provide an adequate model for modes of reasoning which are approximate rather
than exact. Most modes of human reasoning and all of commonsense reasoning fall
into this category.
Fuzzy logic, which may be viewed as an extension of classical logical sys-
tems, provides an effective conceptual framework for dealing with the problem of
knowledge representation in an environment of uncertainty and imprecision. Mean-
ing representation in fuzzy logic is based on test-score semantics. In this semantics,
a proposition is interpreted as a system of elastic constraints, and reasoning is
viewed as elastic constraint propagation. Our paper presents a summary of the basic
concepts and techniques underlying the application of fuzzy logic to knowledge
representation and describes a number of examples relating to its use as a computa-
tional system for dealing with uncertainty and imprecision in the context of
knowledge, meaning and inference.

INTRODUCTION
Knowledge representation is one of the most basic and actively researched
areas of AI (Brachman, 1985,1988; Levesque, 1986, 1987; Moore, 1982, 1984;
Negoita, 1985; Shapiro, 1987; Small, 1988). And yet, there are many important is-
sues underlying knowledge representation which have not been adequately ad-
dressed. One such issue is that of the representation of knowledge which is lexically
imprecise and/or uncertain.
As a case in point, the conventional knowledge representation techniques
do not provide effective tools for representing the meaning of or inferring from the
kind of everyday type facts exemplified by
(a) Usually it takes about an hour to drive from Berkeley to Stanford in light
traffic.
(b) Unemployment is not likely to undergo a sharp decline during the next few
months.
(c) Most experts believe that the likelihood of a severe earthquake in the near fu-
ture is very low.

The italicized words in these assertions are the labels of fuzzy predicates,
fuzzy quantifiers and fuzzy probabilities. The conventional approaches to
knowledge representation lack the means for representing the meaning of fuzzy con-
cepts. As a consequence, the approaches based on first order logic and classical pro-
bability theory do not provide an appropriate conceptual framework for dealing with
the representation of commonsense knowledge, since such knowledge is by its na-
ture both lexically imprecise and noncategorical (Moore, 1982, 1984; Zadeh, 1984).
The development of fuzzy logic was motivated in large measure by the
need for a conceptual framework which can address the issues of uncertainty and
lexical imprecision. The principal objective of this paper is to present a summary of
some of the basic ideas underlying fuzzy logic and to describe their application to
the problem of knowledge representation in an environment of uncertainty and im-
precision. A more detailed discussion of these ideas may be found in Zadeh (1978a,
1978b, 1986, 1988a) and other entries in the bibliography.

ESSENTIALS OF FUZZY LOGIC


Fuzzy logic, as its name suggests, is the logic underlying modes of reason-
ing which are approximate rather than exact. The importance of fuzzy logic derives
from the fact that most modes of human reasoning-and especially commonsense
reasoning-are approximate in nature. It is of interest to note that, despite its per-
vasiveness, approximate reasoning falls outside the purview of classical logic largely
because it is a deeply entrenched tradition in logic to be concerned with those and
only those modes of reasoning which lend themselves to precise formulation and
analysis.
Some of the essential characteristics of fuzzy logic relate to the following.
In fuzzy logic, exact reasoning is viewed as a limiting case of approximate
reasoning.
In fuzzy logic, everything is a matter of degree.
Any logical system can be fuzzified.
In fuzzy logic, knowledge is interpreted as a collection of elastic or,
equivalently, fuzzy constraints on a collection of variables.
Inference is viewed as a process of propagation of elastic constraints.
Fuzzy logic differs from the traditional logical systems both in spirit and in
detail. Some of the principal differences are summarized in the following (Zadeh,
1983b).
Truth: In bivalent logical systems, truth can have only two values: true or
false. In multivalued systems, the truth value of a proposition may be an element of
(a) a finite set; (b) an interval such as [0,1]; or (c) a Boolean algebra. In fuzzy logic,
the truth value of a proposition may be a fuzzy subset of any partially ordered set but
usually it is assumed to be a fuzzy subset of the interval [0,1] or, more simply, a
point in this interval. The so-called linguistic truth values expressed as true, very
true, not quite true, etc. are interpreted as labels of fuzzy subsets of the unit interval.
Predicates: In bivalent systems, the predicates are crisp, e.g., mortal, even,
larger than. In fuzzy logic, the predicates are fuzzy, e.g., tall, ill, soon, swift, much
larger than. It should be noted that most of the predicates in a natural language are
fuzzy rather than crisp.
Predicate Modifiers: In classical systems, the only widely used predicate
modifier is the negation, not. In fuzzy logic, there is a variety of predicate modifiers

which act as hedges, e.g., very, more or less, quite, rather, extremely. Such predi-
cate modifiers play an essential role in the generation of the values of a linguistic
variable, e.g., very young, not very young, more or less young, etc., (Zadeh, 1973).
Quantifiers: In classical logical systems there are just two quantifiers:
universal and existential. Fuzzy logic admits, in addition, a wide variety of fuzzy
quantifiers exemplified by few, several, usually, most, almost always, frequently,
about five, etc. In fuzzy logic, a fuzzy quantifier is interpreted as a fuzzy number or
a fuzzy proportion (Zadeh, 1983a).
Probabilities: In classical logical systems, probability is numerical or
interval-valued. In fuzzy logic, one has the additional option of employing linguistic
or, more generally, fuzzy probabilities exemplified by likely, unlikely, very likely,
around 0.8, high, etc. (Zadeh 1986). Such probabilities may be interpreted as fuzzy
numbers which may be manipulated through the use of fuzzy arithmetic (Kaufmann
and Gupta, 1985).
In addition to fuzzy probabilities, fuzzy logic makes it possible to deal with
fuzzy events. An example of a fuzzy event is: tomorrow will be a warm day, where
warm is a fuzzy predicate. The probability of a fuzzy event may be a crisp or fuzzy
number (Zadeh, 1968).
It is important to note that from the frequentist point of view there is an in-
terchangeability between fuzzy probabilities and fuzzy quantifiers or, more general-
ly, fuzzy measures. In this perspective, any proposition which contains labels of fuz-
zy probabilities may be expressed in an equivalent form which contains fuzzy
quantifiers rather than fuzzy probabilities.
Possibilities: In contrast to classical modal logic, the concept of possibility
in fuzzy logic is graded rather than bivalent. Furthermore, as in the case of probabil-
ities, possibilities may be treated as linguistic variables with values such as possible,
quite possible, almost impossible, etc. Such values may be interpreted as labels of
fuzzy subsets of the real line.
A concept which plays a central role in fuzzy logic is that of a possibility
distribution (Zadeh, 1978a; Dubois and Prade, 1988; Klir, 1988). Briefly, if X is a
variable taking values in a universe of discourse U, then the possibility distribution
of X, Πx, is the fuzzy set of all possible values of X. More specifically, let πx(u)
denote the possibility that X can take the value u, u ∈ U. Then the membership
function of Πx is numerically equal to the possibility distribution function
πx: U → [0, 1], which associates with each element u ∈ U the possibility that X may take u
as its value. More about possibilities and possibility distributions will be said at a
later point in this paper.
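To make the notion concrete, the following minimal Python sketch represents the possibility distribution induced by the proposition "Mary is young"; the trapezoidal calibration of young (and the choice of Python throughout these sketches) is an illustrative assumption, not something prescribed by the text.

```python
# Illustrative sketch: a possibility distribution for X = Age(Mary) induced by
# "Mary is young".  The fuzzy set YOUNG is calibrated here, purely for
# illustration, as a trapezoid on the age universe [0, 100].

def mu_young(age):
    """Membership grade of an age in the fuzzy set YOUNG (assumed calibration)."""
    if age <= 25:
        return 1.0
    if age >= 40:
        return 0.0
    return (40.0 - age) / 15.0   # linear descent between 25 and 40

# The possibility that Age(Mary) = u is numerically equal to mu_young(u).
pi_age_mary = mu_young

for u in (20, 28, 35, 45):
    print(u, round(pi_age_mary(u), 2))   # 1.0, 0.8, 0.33, 0.0
```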
It is important to observe that in every instance fuzzy logic adds to the op-
tions which are available in classical logical systems. In this sense, fuzzy logic may
be viewed as an extension of such systems rather than as a system of reasoning which
is in conflict with the classical systems.
Before taking up the issue of knowledge representation in fuzzy logic, it
will be helpful to take a brief look at some of the principal modes of reasoning in
fuzzy logic. These are the following, with the understanding that the modes in ques-
tion are not necessarily disjoint.
1. Categorical Reasoning
In this mode of reasoning, the premises contain no fuzzy quantifiers and no fuzzy
probabilities. A simple example of categorical reasoning is:

Carol is slim
Carol is very intelligent
Carol is slim and very intelligent
In the premises, slim and very intelligent are assumed to be fuzzy predicates. The
fuzzy predicate in the conclusion, slim and very intelligent, is the conjunction of slim
and very intelligent.
Another example of categorical reasoning is:
Mary is young
John is much older than Mary
John is (much_older ∘ young).
where (much_older ∘ young) represents the composition of the binary fuzzy predicate
much_older with the unary fuzzy predicate young. More specifically, let πmuch_older
and πyoung denote the possibility distribution functions associated with the fuzzy
predicates much_older and young, respectively. Then, the possibility distribution
function of John's age may be expressed as (Zadeh, 1978a)

πAge(John)(u) = ∨v [πmuch_older(u, v) ∧ πyoung(v)] ,

where ∨ and ∧ stand for max and min, respectively.
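A small computational sketch of this sup-min composition follows; the numeric calibrations of much_older and young are assumptions introduced only to show how the formula is evaluated over a finite age universe.

```python
# Sketch of the sup-min composition above, under assumed calibrations of
# much_older (a binary fuzzy relation on ages) and young (a unary fuzzy set).

AGES = range(0, 101)

def mu_young(v):
    return max(0.0, min(1.0, (40.0 - v) / 15.0))      # 1 below 25, 0 above 40

def mu_much_older(u, v):
    # degree to which age u is much older than age v (assumed calibration)
    return max(0.0, min(1.0, (u - v - 10.0) / 10.0))

def pi_age_john(u):
    # pi(u) = max over v of min( mu_much_older(u, v), mu_young(v) )
    return max(min(mu_much_older(u, v), mu_young(v)) for v in AGES)

print([round(pi_age_john(u), 2) for u in (5, 15, 30)])   # [0.0, 0.5, 1.0]
```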
2. Syllogistic Reasoning
In contrast to categorical reasoning, syllogistic reasoning relates to inference from
premises containing fuzzy quantifiers (Zadeh, 1985; Dubois and Prade, 1988a). A
simple example of syllogistic reasoning is the following
most Swedes are blond
most blond Swedes are tall
most² Swedes are blond and tall

where the fuzzy quantifier most is interpreted as a fuzzy proportion and most² is the
square of most in fuzzy arithmetic (Kaufmann and Gupta, 1985).
3. Dispositional Reasoning
In dispositional reasoning the premises are dispositions, that is, propositions which
are preponderantly but not necessarily always true (Zadeh, 1987). An example of dispo-
sitional reasoning is:
heavy smoking is a leading cause of cancer
to avoid lung cancer avoid heavy smoking

Note that in this example the conclusion is a maxim which may be interpreted as a
dispositional command. Another example of dispositional reasoning is:
usually the probability of failure is not very low
usually the probability of failure is not very high
(2 usually ⊖ 1) the probability of failure is not very low and not very high
In this example, usually is a fuzzy quantifier which is interpreted as a fuzzy propor-
tion and 2 usually ⊖ 1 is a fuzzy arithmetic expression whose value may be comput-
ed through the use of fuzzy arithmetic. (⊖ denotes the operation of subtraction in
fuzzy arithmetic.) It should be noted that the concept of usuality plays a key role in
dispositional reasoning (Zadeh, 1985, 1987), and is the concept that links together
the dispositional and syllogistic modes of reasoning. Furthermore, it underlies the


theories of nonmonotonic and default reasoning (McCarthy, 1980; McDermott,
1980, 1982; Reiter, 1983).
4. Qualitative Reasoning
In fuzzy logic, the term qualitative reasoning refers to a mode of reasoning in which
the input-output relation of a system is expressed as a collection of fuzzy if-then
rules in which the antecedents and consequents involve linguistic variables (Zadeh,
1975, 1989). In this sense, qualitative reasoning in fuzzy logic bears some similarity
to - but is not coextensive with - qualitative reasoning in AI (de Kleer, 1984;
Forbus, 1989; Kuipers, 1986).
A very simple example of qualitative reasoning is:
volume is small if pressure is high
volume is large if pressure is low
volume is (w1 ∧ small + w2 ∧ large) if pressure is medium
where + should be interpreted as infix max; and
w1 = sup (high ∧ medium)
and
w2 = sup (low ∧ medium)
are weighting coefficients which represent, respectively, the degrees to which the an-
tecedents high and low match the input medium. In w1, the conjunction high ∧
medium represents the intersection of the possibility distributions of high and
medium, and the supremum is taken over the domain of high and medium. The same
applies to w2.
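The sketch below evaluates the weighting coefficients w1 and w2 and the combined consequent for this example; the particular membership functions chosen for high, low, medium, small and large are assumptions made only for illustration.

```python
# A minimal sketch of the qualitative reasoning example: two fuzzy rules
# relating pressure and volume, with weights w1 = sup(high ∧ medium) and
# w2 = sup(low ∧ medium).  All calibrations are illustrative assumptions.

PRESSURE = [i / 10.0 for i in range(11)]          # normalized pressure domain

def mu_high(p):   return max(0.0, min(1.0, (p - 0.5) / 0.4))
def mu_low(p):    return max(0.0, min(1.0, (0.5 - p) / 0.4))
def mu_medium(p): return max(0.0, 1.0 - abs(p - 0.5) / 0.3)

# Degree to which each rule antecedent matches the input "pressure is medium".
w1 = max(min(mu_high(p), mu_medium(p)) for p in PRESSURE)
w2 = max(min(mu_low(p),  mu_medium(p)) for p in PRESSURE)

def mu_small(v): return max(0.0, 1.0 - v)          # assumed consequent fuzzy sets
def mu_large(v): return max(0.0, v)

def mu_volume(v):
    # "+" read as max, "∧" as min, as in the text
    return max(min(w1, mu_small(v)), min(w2, mu_large(v)))

print(round(w1, 2), round(w2, 2), [round(mu_volume(v), 2) for v in (0.0, 0.5, 1.0)])
```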
Qualitative reasoning underlies many of the applications of fuzzy logic in
the realms of control and systems analysis (Sugeno, 1985; Pospelov, 1987; Togai,
1986). In this connection, it should be noted that fuzzy Prolog provides an effective
knowledge representation language for qualitative reasoning (Baldwin, 1984, 1987;
Mukaidono, 1987; Zadeh, 1989).

MEANING AND KNOWLEDGE REPRESENTATION


In a general setting, knowledge may be viewed as a collection of proposi-
tions, e.g.,
Mary is young
Pat is much taller than Mary
overeating causes obesity
most Swedes are blond
tomatoes are red unless they are unripe
usually high quality goes with high price
if pressure is high then volume is low
To constitute knowledge a proposition must be understood. In this sense,
meaning and knowledge are closely interrelated. In fuzzy logic, meaning
representation - and thus knowledge representation - is based on test-score seman-
tics (Zadeh, 1978a, 1986).
A basic idea underlying test-score semantics is that a proposition in a natur-
al language may be viewed as a collection of elastic, or, equivalently, fuzzy con-
straints. For example, the proposition Mary is tall represents an elastic constraint on
the height of Mary. Similarly, the proposition Jean is blonde represents an elastic
constraint on the color of Jean's hair. And, the proposition most tall men are not
very agile represents an elastic constraint on the proportion of men who are not very
agile among tall men.


In more concrete terms, representing the meaning of a proposition, p,
through the use of test-score semantics involves the following steps.
1. Identification of the variables X1, ..., Xn whose values are constrained by the
proposition. Usually, these variables are implicit rather than explicit in p.
2. Identification of the constraints C1, ..., Cm which are induced by p.
3. Characterization of each constraint, Cj, by describing a testing procedure
which associates with Cj a test score τj representing the degree to which Cj is
satisfied. Usually τj is expressed as a number in the interval [0,1]. More gen-
erally, however, a test score may be a probability/possibility distribution over
the unit interval.
4. Aggregation of the partial test scores τ1, ..., τm into a smaller number of test
scores τ*1, ..., τ*k, which are represented as an overall vector test score
τ = (τ*1, ..., τ*k). In most cases k = 1, so that the overall test score is a scalar.
We shall assume that this is the case unless an explicit statement to the con-
trary is made.
It is important to note that, in test-score semantics, the meaning of p is
represented not by the overall test score τ but by the procedure which leads to it.
Viewed in this perspective, test-score semantics may be regarded as a generalization
of truth-conditional, possible-world and model-theoretic semantics. However, by
providing a computational framework for dealing with uncertainty and
dispositionality-which the conventional semantical systems disregard-test-score
semantics achieves a much higher level of expressive power and thus provides a
basis for representing the meaning of a much wider variety of propositions in a na-
tural language.
In test-score semantics, the testing of the constraints induced by p is per-
formed on a collection of fuzzy relations which constitute an explanatory database,
or ED for short. A basic assumption which is made about the explanatory database is
that it is comprised of relations whose meaning is known to the addressee of the
meaning-representation process. In an indirect way, then, the testing and aggrega-
tion procedures in test-score semantics may be viewed as a description of a process
by which the meaning of p is composed from the meanings of the constituent rela-
tions in the explanatory database. It is this explanatory role of the relations in ED that
motivates its description as an explanatory database.
As will be seen in the sequel, in describing the testing procedures we need
not concern ourselves with the actual entries in the constituent relations. Thus, in
general, the description of a test involves only the frames of the constituent relations,
that is, their names, their variables (or attributes) and the domain of each variable.
As a simple illustration of the concept of a test procedure, consider the pro-
position p ~ Maria is young and attractive. The ED in this case will be assumed to
consist of the following relations:
ED ~ POPULATION [Name; Age; μAttractive] + YOUNG [Age; μ] ,   (3.1)
in which + should be read as "and," and ~ stands for "denotes."
The relation labeled POPULATION consists of a collection of triples whose
first element is the name of an individual; whose second element is the age of that in-
dividual; and whose third element is the degree to which the individual in question is
attractive. The relation YOUNG is a collection of pairs whose first element is a value
of the variable Age and whose second element is the degree to which that value of
Age satisfies the elastic constraint characterized by the fuzzy predicate young. In
effect, this relation serves to calibrate the meaning of the fuzzy predicate young in a
particular context by representing its denotation as a fuzzy subset, YOUNG, of the in-
terval [0,100].
With this ED, the test procedure which computes the overall test score may
be described as follows:
1. Determine the age of Maria by reading the value of Age in POPULATION, with
the variable Name bound to Maria. In symbols, this may be expressed as
Age (Maria) = Age POPULATION [Name = Maria] .
In this expression, we use the notation Y R [X = a] to signify that X is bound to
a in R and the resulting relation is projected on Y, yielding the values of Y in
the tuples in which X = a.
2. Test the elastic constraint induced by the fuzzy predicate young:
τ1 = μ YOUNG [Age = Age (Maria)]

3. Determine the degree to which Maria is attractive:


τ2 = μAttractive POPULATION [Name = Maria]
4. Compute the overall test score by aggregating the partial test scores τ1 and τ2.
For this purpose, we shall use the min operator ∧ as the aggregation operator,
yielding
τ = τ1 ∧ τ2   (3.2)

which signifies that the overall test score is taken to be the smaller of the
operands of ∧. The overall test score, as expressed by (3.2), represents the
compatibility of p ~ Maria is young and attractive with the data resident in
the explanatory database.
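This test procedure can be paraphrased computationally as in the sketch below, in which the POPULATION entry for Maria and the calibration of YOUNG are invented solely for illustration.

```python
# Sketch (with invented data) of the test procedure for
# p ~ "Maria is young and attractive" over an explanatory database consisting
# of POPULATION[Name; Age; mu_attractive] and YOUNG[Age; mu].

POPULATION = {"Maria": {"Age": 28, "mu_attractive": 0.7}}   # assumed entries

def mu_young(age):                          # assumed calibration of YOUNG
    return max(0.0, min(1.0, (40.0 - age) / 15.0))

# Step 1: bind Name to Maria and project on Age
age_maria = POPULATION["Maria"]["Age"]

# Step 2: test the constraint induced by young
tau1 = mu_young(age_maria)

# Step 3: degree to which Maria is attractive
tau2 = POPULATION["Maria"]["mu_attractive"]

# Step 4: aggregate with min, as in (3.2)
tau = min(tau1, tau2)
print(round(tau1, 2), round(tau2, 2), round(tau, 2))   # 0.8 0.7 0.7
```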
In testing the constituent relations in ED, it is helpful to have a collection of
standardized translation rules for computing the test score of a combination of elastic
constraints C1, ..., Ck from the knowledge of the test scores of each constraint con-
sidered in isolation. For the most part, such rules are default rules in the sense that
they are intended to be used in the absence of alternative rules supplied by the user.
For purposes of knowledge representation, the principal rules of this type
are the following.

1. Rules pertaining to modification


If the test score for an elastic constraint C in a specified context is τ, then in
the same context the test score for
(a) not C is 1 - τ (negation)
(b) very C is τ² (concentration)
(c) more or less C is τ^(1/2) (diffusion) .

2. Rules pertaining to composition


If the test scores for elastic constraints C1 and C2 in a specified context are
τ1 and τ2, respectively, then in the same context the test score for
(a) C1 and C2 is τ1 ∧ τ2 (conjunction), where ∧ ~ min.
(b) C1 or C2 is τ1 ∨ τ2 (disjunction), where ∨ ~ max.
(c) If C1 then C2 is 1 ∧ (1 - τ1 + τ2) (implication).
3. Rules pertaining to quantification
The rules in question apply to propositions of the general form
Q A's are B's, where Q is a fuzzy quantifier, e.g., most, many, several, few, etc.,
and A and B are fuzzy sets, e.g., tall men, intelligent men, etc. As was stated earlier,
when the fuzzy quantifiers in a proposition are implied rather than explicit, their
suppression may be placed in evidence by referring to the proposition as a disposi-
tion. In this sense, the proposition overeating causes obesity is a disposition which
results from the suppression of the fuzzy quantifier most in the proposition most of
those who overeat are obese.
To make the concept of a fuzzy quantifier meaningful, it is necessary to
define a way of counting the number of elements in a fuzzy set or, equivalently, to
detennine its cardinality.
There are several ways in which this can be done (Zadeh, 1978a; Dubois
and Prade, 1985; Yager, 1980). For our purposes, it will suffice to employ the con-
cept of a sigma-count, which is defined as follows:

Let F be a fuzzy subset of U = {u1, ..., un}

expressed symbolically as
F = μ1/u1 + ... + μn/un = Σi μi/ui
or, more simply, as

F = μ1u1 + ... + μnun ,
in which the term μi/ui, i = 1, ..., n, signifies that μi is the grade of membership of
ui in F, and the plus sign represents the union.
The sigma-count of F is defined as the arithmetic sum of the μi, i.e.,
ΣCount(F) ~ Σi μi , i = 1, ..., n ,
with the understanding that the sum may be rounded, if need be, to the nearest in-
teger. Furthermore, one may stipulate that the terms whose grade of membership
falls below a specified threshold be excluded from the summation. The purpose of
such an exclusion is to avoid a situation in which a large number of terms with low
grades of membership become count-equivalent to a small number of terms with
high membership.
The relative sigma-count, denoted by ΣCount(F/G), may be interpreted as
the proportion of elements of F which are in G. More explicitly,

ΣCount(F/G) = ΣCount(F ∩ G) / ΣCount(G) ,

where F ∩ G, the intersection of F and G, is defined by

μF∩G(u) = μF(u) ∧ μG(u) , u ∈ U.

Thus, in terms of the membership functions of F and G, the relative sigma-count
of F in G is given by

ΣCount(F/G) = Σi (μF(ui) ∧ μG(ui)) / Σi μG(ui) .

The concept of a relative sigma-count provides a basis for interpreting the


meaning of propositions of the form Q A's are B's, e.g., most young men are
healthy. More specifically, if the focal variable (i.e., the constrained variable) in the
proposition in question is taken to be the proportion of B's in A's, then the
corresponding translation rule may be expressed as
Q A's are B's → ΣCount(B/A) is Q .
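The sketch below illustrates the modification rules, the min/max composition rules and this quantification rule on a small invented example ("most young men are healthy"); all membership grades and the calibration of MOST are assumptions.

```python
# Sketch (invented data, assumed calibrations) of the translation rules and of
# the quantification rule  Q A's are B's  ->  SigmaCount(B/A) is Q.

def very(tau):          return tau ** 2            # concentration
def more_or_less(tau):  return tau ** 0.5          # diffusion
def not_(tau):          return 1.0 - tau           # negation

def sigma_count(mu):                 # cardinality of a fuzzy set
    return sum(mu.values())

def relative_sigma_count(muB, muA):  # proportion of B's among A's
    inter = sum(min(muB[u], muA[u]) for u in muA)
    return inter / sigma_count(muA)

def mu_most(p):                      # assumed calibration of the quantifier MOST
    return max(0.0, min(1.0, (p - 0.5) / 0.3))

# assumed grades of membership for five men
mu_young   = {"a": 1.0, "b": 0.8, "c": 0.6, "d": 0.2, "e": 0.0}
mu_healthy = {"a": 0.9, "b": 0.9, "c": 0.3, "d": 1.0, "e": 1.0}

p   = relative_sigma_count(mu_healthy, mu_young)   # proportion of healthy among young
tau = mu_most(p)                                   # overall test score
print(round(p, 2), round(tau, 2))                  # 0.85 1.0
```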

As an illustration, consider the proposition p ~ over the past few years


Naomi earned far more than most of her close friends. In this case, we shall assume
that the constituent relations in the explanatory database are:
ED ~ INCOME [Name; Amount; Year] +
FRIEND [Name; μ] +
FEW [Number; μ] +
FAR.MORE [Income1; Income2; μ] +
MOST [Proportion; μ] .
Note that some of these relations are explicit in p; some are not; and that
most of the constituent words in p do not appear in ED.
In what follows, we shall describe the process by which the meaning of p
may be composed from the meaning of the constituent relations in ED. Basically,
this process is a test procedure which tests, scores and aggregates the elastic con-
straints which are induced by p.
1. Find Naomi's income, INi, in Yeari, i = 1, 2, 3, ..., counting backward from
the present. In symbols,
INi ~ Amount INCOME [Name = Naomi; Year = Yeari] ,

which signifies that Name is bound to Naomi, Year to Yeari, and the resulting
relation is projected on the domain of the attribute Amount, yielding the value
of Amount corresponding to the values assigned to the attributes Name and
Year.
2. Test the constraint induced by FEW:

μi ~ μ FEW [Year = Yeari] ,
which signifies that the variable Year is bound to Yeari and the corresponding
value of μ is read by projecting on the domain of μ.
3. Compute Naomi's total income during the past few years:
TIN ~ Σi μi INi ,
in which the μi play the role of weighting coefficients. Thus, we are tacitly as-
suming that the total income earned by Naomi during a fuzzily specified inter-
val of time is obtained by weighting Naomi's income in year Yeari by the de-
gree to which Yeari satisfies the constraint induced by FEW and summing the
weighted incomes.
4. Compute the total income of each Namej (other than Naomi) during the past
few years:
TINamej = Σi μi INamej,i ,
where INamej,i is the income of Namej in Yeari.

5. Find the fuzzy set of individuals in relation to whom Naomi earned far more.
The grade of membership of Namej in this set is given by
μFM(Namej) = μ FAR.MORE [Income1 = TIN; Income2 = TINamej] .

6. Find the fuzzy set of close friends of Naomi by intensifying (Zadeh, 1978a)
the relation FRIEND:
CF ~ CLOSE.FRIEND ~ ²FRIEND,

which implies that

μCF(Namej) = (μ FRIEND [Name = Namej])² ,

where the expression

μ FRIEND [Name = Namej]

represents μFRIEND(Namej), that is, the grade of membership of Namej in the set of
Naomi's friends.
7. Count the number of close friends of Naomi. On denoting the count in ques-
tion by ΣCount(CF), we have:
ΣCount(CF) = Σj (μFRIEND(Namej))² .

8. Find the intersection of FM with CF. The grade of membership of Namej in the
intersection is given by
μFM∩CF(Namej) = μFM(Namej) ∧ μCF(Namej),

where the min operator ∧ signifies that the intersection is defined as the con-
junction of its operands.
9. Compute the sigma-count of FM ∩ CF:
ΣCount(FM ∩ CF) = Σj μFM(Namej) ∧ μCF(Namej) .

10. Compute the relative sigma-count of FM in CF, i.e., the proportion of individu-
als in FM who are in CF:

ρ ~ ΣCount(FM ∩ CF) / ΣCount(CF) .

11. Test the constraint induced by MOST:

τ ~ μ MOST [Proportion = ρ] ,

which expresses the overall test score and thus represents the compatibility of
p with the explanatory database.
In application to the representation of dispositional knowledge, the first step
in the representation of the meaning of a disposition involves the process of explici-
tation, that is, making explicit the implicit quantifiers. As a simple example, consid-
er the disposition
d ~ young men like young women

which may be interpreted as the proposition


p ~ most young men like mostly young women.

The candidate ED for p is assumed to consist of the following relations:


ED ~ POPULATION [Name; Sex; Age] +
LIKE [Name1; Name2; μ] +
MOST [Proportion; μ],

in which μ in LIKE is the degree to which Name1 likes Name2.


To represent the meaning of p, it is expedient to replace p with the semanti-
cally equivalent proposition
q ~ most young men are P ,

where P is the fuzzy dispositional predicate


P ~ likes mostly young women .

In this way, the representation of the meaning of p is decomposed into two simpler
problems, namely, the representation of the meaning of P, and the representation of
the meaning of q knowing the meaning of P .
The meaning of P is represented by the following test procedure.
1. Divide POPULATION into the population of males, M.POPULATION, and popula-
tion of females, F.POPULATION:
M.POPULATION ~ Name, Age POPULATION [Sex = Male]
F.POPULATION ~ Name, Age POPULATION [Sex = Female] ,

where Name, Age POPULATION denotes the projection of POPULATION on the attri-
butes Name and Age.
2. For each Namej, j = 1, ..., l, in F.POPULATION, find the age of Namej:
Aj ~ Age F.POPULATION [Name = Namej] .

3. For each Namej, find the degree to which Namej is young:

αj ~ μ YOUNG [Age = Aj] ,

where αj may be interpreted as the grade of membership of Namej in the fuz-
zy set, YW, of young women.
4. For each Namei, i = 1, ..., k, in M.POPULATION, find the age of Namei:
Bi ~ Age M.POPULATION [Name = Namei] .

5. For each Namei, find the degree to which Namei is young:

αi ~ μ YOUNG [Age = Bi] ,
where αi may be interpreted as the grade of membership of Namei in the fuz-
zy set, YM, of young men.
6. For each Namej, find the degree to which Namei likes Namej:
βij ~ μ LIKE [Name1 = Namei; Name2 = Namej] ,
with the understanding that βij may be interpreted as the grade of membership
of Namej in the fuzzy set, WLi, of women whom Namei likes.
7. For each Namej, find the degree to which Namei likes Namej and Namej is
young:

γij ~ αj ∧ βij .

Note: As in previous examples, we employ the aggregation operator min (∧)

to represent the effect of conjunction. In effect, γij is the grade of membership
of Namej in the intersection of the fuzzy sets WLi and YW.
8. Compute the relative sigma-count of young women among the women whom
Namei likes:
ρi ~ ΣCount(YW / WLi)
   = ΣCount(YW ∩ WLi) / ΣCount(WLi)
   = Σj γij / Σj βij
   = Σj (αj ∧ βij) / Σj βij .

9. Test the constraint induced by MOST:

τi ~ μ MOST [Proportion = ρi] .
This test score, then, represents the degree to which Namei has the property
expressed by the predicate
P ~ likes mostly young women .
Continuing the test procedure, we have:
10. Compute the relative sigma-count of men who have property P among young
men:
ρ ~ ΣCount(P / YM)
  = ΣCount(P ∩ YM) / ΣCount(YM)
  = Σi (τi ∧ αi) / Σi αi .

11. Test the constraint induced by MOST:

τ = μ MOST [Proportion = ρ] .
This test score represents the overall test score for the disposition young men
like young women.
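A compact sketch of this two-level test procedure is given below; the ages, the LIKE grades and the calibrations of YOUNG and MOST are invented for illustration only.

```python
# Sketch (invented data) of the two-level test procedure for the disposition
# "young men like young women", read as "most young men like mostly young women".

def mu_young(age):  return max(0.0, min(1.0, (40.0 - age) / 15.0))
def mu_most(p):     return max(0.0, min(1.0, (p - 0.5) / 0.3))

men   = {"m1": 22, "m2": 30}                 # assumed ages
women = {"w1": 24, "w2": 38, "w3": 45}
LIKE  = {("m1", "w1"): 0.9, ("m1", "w2"): 0.2, ("m1", "w3"): 0.8,
         ("m2", "w1"): 0.4, ("m2", "w2"): 0.9, ("m2", "w3"): 0.1}

alpha_w = {j: mu_young(a) for j, a in women.items()}   # young-women grades
alpha_m = {i: mu_young(a) for i, a in men.items()}     # young-men grades

# tau_i: degree to which man i "likes mostly young women"
tau = {}
for i in men:
    num = sum(min(alpha_w[j], LIKE[i, j]) for j in women)
    den = sum(LIKE[i, j] for j in women)
    tau[i] = mu_most(num / den)

# overall score: relative sigma-count of P among young men, tested against MOST
rho = sum(min(tau[i], alpha_m[i]) for i in men) / sum(alpha_m[i] for i in men)
overall = mu_most(rho)
print({i: round(tau[i], 2) for i in tau}, round(overall, 2))
```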

THE CONCEPT OF A CANONICAL FORM AND ITS


APPLICATION TO THE REPRESENTATION OF MEANING
When the meaning of a proposition, p, is represented as a test procedure, it
may be hard to discern in the description of the procedure the underlying structure of
the process through which the meaning of p is constructed from the meanings of the
constituent relations in the explanatory database.
A concept which makes it easier to perceive the logical structure of p and
thus to develop a better understanding of the meaning representation process, is that
of a canonical form of p , abbreviated as cf (p ) (Zadeh, 1978b, 1986).

The concept of a canonical form relates to the basic idea which underlies
test-score semantics, namely, that a proposition may be viewed as a system of elastic
constraints whose domain is a collection of relations in the explanatory database.
Equivalently, let X1, ..., Xn be a collection of variables which are constrained by p.
Then, the canonical form of p may be expressed as
cf(p) ~ X is F ,   (4.1)
where X = (X1, ..., Xn) is the constrained variable which is usually implicit in p,
and F is a fuzzy relation, likewise implicit in p, which plays the role of an elastic (or
fuzzy) constraint on X. The relation between p and its canonical form will be ex-
pressed as
p → X is F ,   (4.2)
signifying that the canonical form may be viewed as a representation of the meaning
of p.
In general, the constrained variable X in cf(p) is not uniquely determined
by p, and is dependent on the focus of attention in the meaning-representation pro-
cess. To place this in evidence, we shall refer to X as the focal variable.
As a simple illustration, consider the proposition
p ~ Anne has blue eyes (4.3)
In this case, the focal variable may be expressed as
X ~ Color (Eyes (Anne)) ,
and the elastic constraint is represented by the fuzzy relation BLUE. Thus, we can
write
p → Color (Eyes (Anne)) is BLUE .   (4.4)

As an additional illustration, consider the proposition


p ~ Brian is much taller than Mildred. (4.5)

Here, the focal variable has two components, X = (X1, X2), where
X1 = Height (Brian)
X2 = Height (Mildred);
and the elastic constraint is characterized by the fuzzy relation MUCH.TALLER
[Height1; Height2; μ], in which μ is the degree to which Height1 is much taller
than Height2. In this case, we have
p → (Height (Brian), Height (Mildred)) is MUCH.TALLER.   (4.6)

In terms of the possibility distribution of X, the canonical form of p may be


interpreted as the assignment of F to Πx. Thus, we may write
p → X is F → Πx = F ,   (4.7)
in which the equation
Πx = F   (4.8)
is termed the possibility assignment equation (Zadeh, 1978b). In effect, this equation
signifies that the canonical form cf(p) ~ X is F implies that

Poss {X = u} = μF(u) , u ∈ U ,   (4.9)


where μF is the membership function of F. It is in this sense that F, acting as an
elastic constraint on X, restricts the possible values which X can take in U. An im-
portant implication of this observation is that a proposition, p, may be interpreted as
an implicit assignment statement which characterizes the possibility distribution of
the focal variable in p.
As an illustration, consider the disposition
d ~ overeating causes obesity ,   (4.10)
which upon explicitation becomes
p ~ most of those who overeat are obese .   (4.11)

If the focal variable in this case is chosen to be the relative sigma-count of


those who are obese among those who overeat, the canonical form of p becomes
ΣCount (OBESE/OVEREAT) is MOST ,   (4.12)
which in virtue of (4.9) implies that
Poss {ΣCount (OBESE/OVEREAT) = u} = μMOST(u) ,   (4.13)
where μMOST is the membership function of MOST. What is important to note is that
(4.13) is equivalent to the assertion that the overall test score for p is expressed by
τ = μMOST(ΣCount (OBESE/OVEREAT)) ,   (4.14)
in which OBESE, OVEREAT and MOST play the roles of the constituent relations in ED.
It is of interest to observe that the notion of a semantic network may be
viewed as a special case of the concept of a canonical form. As a simple illustration,
consider the proposition
p ~ Richard gave Cindy a red pin . (4.15)
As a semantic network, this proposition may be represented in the standard form:
Agent (GIVE) = Richard   (4.16)
Recipient (GIVE) = Cindy
Time (GIVE) = Past
Object (GIVE) = Pin
Color (Pin) = Red .

Now, if we identify X1 with Agent (GIVE), X2 with Recipient (GIVE), etc., the se-
mantic network representation (4.16) may be regarded as a canonical form in which
X = (X1, ..., X5), and
X1 is Richard   (4.17)
X2 is Cindy
X3 is Past
X4 is Pin
X5 is Red

More generally, since any semantic network may be expressed as a collection of tri-
ples of the form (Object, Attribute, Attribute Value), it can be transformed at once
into a canonical form. However, since a canonical form has a much greater expres-
sive power than a semantic network, it may be difficult to transform a canonical
form into a semantic network.

INFERENCE
The concept of a canonical form provides a convenient framework for
representing the rules of inference in fuzzy logic. Since the main concern of the pa-
per is with knowledge representation rather than with inference, our discussion of
the rules of inference in fuzzy logic in this section has the format of a summary.
In the so-called categorical rules of inference, the premises are assumed to
be in the canonical form X is A or the conditional canonical form X is A if Y is B,
where A and B are fuzzy predicates (or relations). In the syllogistic rules, the prem-
ises are expressed as Q A's are B's, where Q is a fuzzy quantifier and A and B are
fuzzy predicates (or relations).
The rules in question are the following.

CATEGORICAL RULES

X, Y, Z, ... ~ variables taking values in U, V, W, ...

Examples
X = Age(Mary), Y = Distance(P1, P2)
A, B, C, ... = fuzzy predicates (relations)
Examples
A = small, B = much larger

ENTAILMENT RULE

X is A
A ⊂ B → μA(u) ≤ μB(u), u ∈ U
X is B

Example
Mary is very young
very young ⊂ young

Mary is young

CONJUNCTION RULE

X is A
X is B
X is A ∩ B → μA∩B(u) = μA(u) ∧ μB(u)
∩ = intersection (conjunction)
Example
pressure is not very high
pressure is not very low
pressure is not very high and not very low

DISJUNCTION RULE

X is A
or X is B

X is A ∪ B → μA∪B(u) = μA(u) ∨ μB(u)


∪ = union (disjunction)

PROJECTION RULE

(X, Y) is R
X is XR → μXR(u) = supv μR(u, v)
XR ~ projection of R on U

Example
(X, Y) is close to (3, 2)
X is close to 3

COMPOSITIONAL RULE

(X, Y) is R → R a binary predicate (relation)
Y is B
X is R∘B → μR∘B(u) = supv (μR(u, v) ∧ μB(v))

Example
X is much larger than Y
Y is large
X is much larger ∘ large

NEGATION RULE

not (X is A)
X is ¬A → μ¬A(u) = 1 - μA(u)
¬ ~ negation

Example
not (Mary is young)
Mary is not young

EXTENSION PRINCIPLE

X is A
f(X) is f(A)
A = μ1/u1 + μ2/u2 + ... + μn/un
f(A) = μ1/f(u1) + μ2/f(u2) + ... + μn/f(un)
Example
X is small
X² is ²small
μvery small = (μsmall)²
²small = μ1/u1² + ... + μn/un²

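The compositional rule and the extension principle can be evaluated directly on finite universes, as in the following sketch; the fuzzy sets large, much_larger and small are assumed calibrations chosen only to exercise the two rules.

```python
# Sketch of the compositional rule of inference and the extension principle on
# a finite universe (the numbers are illustrative assumptions).

U = [1, 2, 3, 4, 5]

mu_large       = {1: 0.0, 2: 0.2, 3: 0.5, 4: 0.8, 5: 1.0}
mu_much_larger = {(u, v): max(0.0, min(1.0, (u - v - 1) / 2.0)) for u in U for v in U}

# Compositional rule: X is much_larger ∘ large
mu_X = {u: max(min(mu_much_larger[u, v], mu_large[v]) for v in U) for u in U}

# Extension principle: if X is small, then X**2 is ²small with mu(u*u) = mu_small(u)
mu_small = {1: 1.0, 2: 0.6, 3: 0.2, 4: 0.0, 5: 0.0}
mu_sq = {}
for u, m in mu_small.items():
    mu_sq[u * u] = max(m, mu_sq.get(u * u, 0.0))

print({u: round(m, 2) for u, m in mu_X.items()})   # {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.2, 5: 0.5}
print(mu_sq)                                       # {1: 1.0, 4: 0.6, 9: 0.2, 16: 0.0, 25: 0.0}
```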
It should be noted that the use of the canonical form in these rules stands
in sharp contrast to the way in which the rules of inference are expressed in classical
logic. The advantage of the canonical form is that it places in evidence that infer-
ence in fuzzy logic may be interpreted as a propagation of elastic constraints. This
point of view is particularly useful in the applications of fuzzy logic to control and
decision analysis (Proc. of the 2nd IFSA Congress, 1987; Proc. of the International
Workshop, Iizuka, 1988).
As was pointed out already, it is the qualitative mode of reasoning that
plays a key role in the applications of fuzzy logic to control. In such applications,
the input-output relations are expressed as collections of fuzzy if-then rules (Mam-
dani and Gaines, 1981).
For example, if X and Y are input variables and Z is the output variable, the
relation between X ,Y , and Z may be expressed as

Z is C1 if X is A1 and Y is B1
Z is C2 if X is A2 and Y is B2
...
Z is Cn if X is An and Y is Bn
where Ci, Ai, and Bi, i = 1, ..., n, are fuzzy subsets of their respective universes of
discourse. For example,

Z is small if X is large and Y is medium


Z is not large if X is very small and Y is not large

Given a characterization of the dependence of Z on X and Y in this form,


one can employ the compositional rule of inference to compute the value of Z given
the values of X and Y. This is what underlies the Togai-Watanabe fuzzy logic chip
(Togai, 1986) and the operation of fuzzy logic controllers in industrial process con-
trol (Sugeno, 1985).
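As an illustration of this mode of inference, the sketch below implements a minimal Mamdani-style evaluation of two such rules with crisp inputs and min-max aggregation; the rule set and the triangular membership functions are assumptions, not taken from the systems cited above.

```python
# Minimal sketch of fuzzy if-then rule inference with crisp inputs,
# assumed triangular membership functions, and min-max aggregation.

def tri(a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# linguistic values for the inputs X, Y and the output Z (assumed universes)
x_large, y_medium = tri(4, 8, 12), tri(2, 5, 8)
x_small, y_large  = tri(0, 2, 6),  tri(6, 9, 12)
z_small, z_large  = tri(0, 2, 5),  tri(4, 8, 10)

rules = [((x_large, y_medium), z_small),   # Z is small if X is large and Y is medium
         ((x_small, y_large),  z_large)]   # Z is large if X is small and Y is large

def infer(x, y, z):
    """Membership of z in the inferred output, given crisp inputs x and y."""
    out = 0.0
    for (mu_x, mu_y), mu_z in rules:
        firing = min(mu_x(x), mu_y(y))        # degree to which the antecedent matches
        out = max(out, min(firing, mu_z(z)))  # clip consequent, aggregate by max
    return out

print([round(infer(7, 5, z), 2) for z in (1, 2, 3, 8)])   # [0.5, 0.75, 0.67, 0.0]
```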
In general, the applications of fuzzy logic in systems and process control
fall into two categories. First, there are those applications in which, in comparison
with traditional methods, fuzzy logic control offers the advantage of greater simplici-
ty, greater robustness, and lower cost. The cement kiln control pioneered by the F.L.
Smidth Company falls into this category.
Second are the applications in which the traditional methods provide no
solution. The self-parking fuzzy car conceived by Sugeno (Sugeno, 1985) is a prime
example of what humans can do so easily and what is so difficult to emulate by the
traditional approaches to systems control.

SYLLOGISTIC RULES

In its generic form, a fuzzy syllogism may be expressed as the inference schema
Q1 A's are B's
Q2 C's are D's
Q3 E's are F's
in which A, B, C, D, E and F are interrelated fuzzy predicates and Q1, Q2 and Q3 are
fuzzy quantifiers.
The interrelations between A, B, C, D, E and F provide a basis for a
classification of fuzzy syllogisms. The more important of these syllogisms are the
following
(a) Intersection/product syllogism:
C = A ∧ B, E = A, F = C ∧ D
(b) Chaining syllogism:
C = B, E = A, F = D
(c) Consequent conjunction syllogism:
A = C = E, F = B ∧ D
(d) Consequent disjunction syllogism:
A = C = E, F = B ∨ D
(e) Antecedent conjunction syllogism:
B = D = F, E = A ∧ C
(f) Antecedent disjunction syllogism:
B = D = F, E = A ∨ C

In the context of expert systems, these and related syllogisms provide a set of infer-
ence rules for combining evidence through conjunction, disjunction and chaining
(Zadeh, 1983b).
One of the basic problems in fuzzy syllogistic reasoning is the following:
Given A, B, C, D, E and F, find the maximally specific (i.e., most restrictive) fuzzy
quantifier Q3 such that the proposition Q3 E's are F's is entailed by the premises. In
the case of (a), (b) and (c), this leads to the following syllogisms:

INTERSECTION/PRODUCT SYLLOGISM

Q1 A's are B's   (5.1)


Q2 (A and B)'s are C's
(Q1 ⊗ Q2) A's are (B and C)'s
where ⊗ denotes the product in fuzzy arithmetic (Kaufmann and Gupta, 1985). It
should be noted that (5.1) may be viewed as an analog of the basic probabilistic
identity
P(B, C | A) = P(B | A) P(C | A, B) .
A concrete example of the intersection/product syllogism is the following:
most students are young   (5.2)
most young students are single
most² students are young and single
where most² denotes the product of the fuzzy quantifier most with itself.

CHAINING SYLLOGISM

Q1 A's are B's


Q2 B's are C's
(Q1 ⊗ Q2) A's are C's
This syllogism may be viewed as a special case of the intersection/product syllo-
gism. It results when B ⊂ A and Q1 and Q2 are monotone increasing, that is, when
≥Q1 = Q1 and ≥Q2 = Q2, where ≥Q1 should be read as at least Q1, and likewise for
≥Q2. A simple example of the chaining syllogism is the following:
most students are undergraduates
most undergraduates are single
most² students are single
Note that undergraduates ⊂ students and that in the conclusion F = single, rather
than young and single, as in (5.2).

CONSEQUENT CONJUNCTION SYLLOGISM

The consequent conjunction syllogism is an example of a basic syllogism


which is not a derivative of the intersection/product syllogism. Its statement may be
expressed as follows:

Q1 A's are B's   (5.3)


Q2 A's are C's
Q A's are (B and C)'s,
where Q is a fuzzy quantifier which is defined by the inequalities
0 ∨ (Q1 ⊕ Q2 ⊖ 1) ≤ Q ≤ Q1 ∧ Q2   (5.4)
in which ∨, ∧, ⊕ and ⊖ are the operations of max, min, + and - extended to fuzzy
numbers, i.e., the operations of fuzzy arithmetic.

An illustration of (5.3) is provided by the example


most students are young
most students are single
Q students are single and young
where
2most ⊖ 1 ≤ Q ≤ most.
This expression for Q follows from (5.4) by noting that
most ∧ most = most
and

0 ∨ (2most ⊖ 1) = 2most ⊖ 1.
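The fuzzy-arithmetic expressions appearing in these syllogisms can be computed level by level on α-cuts, as sketched below; the representation of most as a fuzzy number whose α-cut is the interval [0.5 + 0.3α, 1] is an illustrative assumption.

```python
# Sketch of fuzzy-quantifier arithmetic via alpha-cuts, under the assumption
# that "most" is a fuzzy number on [0, 1] with alpha-cut [0.5 + 0.3*alpha, 1].

def most_cut(alpha):
    return (0.5 + 0.3 * alpha, 1.0)

def product_cut(cut_a, cut_b, alpha):
    # alpha-cut of A ⊗ B for nonnegative fuzzy numbers: endpoint-wise product
    (a1, a2), (b1, b2) = cut_a(alpha), cut_b(alpha)
    return (a1 * b1, a2 * b2)

def two_q_minus_one_cut(cut_q, alpha):
    # alpha-cut of 2Q ⊖ 1, clipped to the unit interval of proportions
    q1, q2 = cut_q(alpha)
    return (max(0.0, 2 * q1 - 1), max(0.0, 2 * q2 - 1))

for alpha in (0.0, 0.5, 1.0):
    sq = product_cut(most_cut, most_cut, alpha)   # most ⊗ most  (= most²)
    lo = two_q_minus_one_cut(most_cut, alpha)     # 2 most ⊖ 1
    print(alpha, [round(x, 2) for x in sq], [round(x, 2) for x in lo])
```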
The three basic syllogisms stated above are merely examples of a collection
of fuzzy syllogisms which may be developed and employed for purposes of infer-
ence from commonsense knowledge. In addition to its application to commonsense
reasoning, fuzzy syllogistic reasoning may serve to provide a basis for combining
uncertain evidence in expert systems (Zadeh, 1983b).

CONCLUDING REMARKS
One of the basic aims of fuzzy logic is to provide a computational frame-
work for knowledge representation and inference in an environment of uncertainty
and imprecision. In such environments, fuzzy logic is effective when the solutions
need not be precise and/or it is acceptable for a conclusion to have a dispositional
rather than categorical validity. The importance of fuzzy logic derives from the fact
that there are many real world applications which fit these conditions, especially in
the realm of knowledge-based systems for decision-making and control.

REFERENCES AND RELATED PUBLICATIONS


Baldwin, J.F. "FRIL - a fuzzy relational inference language," Fuzzy Sets and Sys-
tems 14, 155-174, 1984.

Baldwin, J.F., Martin, T.P., and Pilsworth, B.W., "Implementation of FPROG - a


fuzzy Prolog interpreter," Fuzzy Sets and Systems 23, 119-129, 1987.

Bezdek, J.C. (ed.), Analysis of Fuzzy Information - Vols. 1, 2, and 3: Applications in


Engineering and Science, CRC Press, Boca Raton, FL, 1987.

Brachman, R.J. and Levesque, H.J. Readings in Knowledge Representation, Morgan


Kaufmann Publishers, Inc., Los Altos, Calif., 1985.

Brachman, R.J. "The basics of knowledge representation and reasoning," AT&T


Technical Journal 67, 25-40, 1988.

de Kleer, J., and J. Brown, "A qualitative physics based on confluences," Artificial
Intelligence, 24, 7-84, 1984.

Doyle, J. "A truth-maintenance system," Artificial Intelligence 12, 231-272, 1979.

Dubois, D. and Prade, H., Fuzzy Sets and Systems: Theory and Applications.
Academic Press, New York, 1980.

Dubois, D. and Prade, H. "Fuzzy cardinality and the modeling of imprecise


quantification," Fuzzy Sets and Systems 16, 199-230, 1985.

Dubois, D. and Prade, H., Possibility Theory--An Approach to Computerized Pro-


ceSSing of Uncertainty. Plenum Press, New York, 1988.

Dubois, D. and Prade, H. "On fuzzy syllogisms," Computational Intelligence 14,


171-179, 1988a.

Dubois, D. and Prade, H. "The treatment of uncertainty in knowledge-based sys-


tems using fuzzy sets and possibility theory," Int. J. Intelligent Systems 3,
141-165, 1988b.

Farreny, H. and Prade, H. "Dealing with the vagueness of natural languages in


man-machine communication," Applications of Fuzzy Set Theory in Human
Factors, W. Karwowski and A. Mital (eds.), Elsevier Science Publ., Amster-
dam, 71-85, 1986.

Forbus, K., "Qualitative physics: past, present, and future," Exploring Artificial Intel-
ligence, H. Shrobe, ed., Morgan Kaufman, Los Altos, CA, 1989.

Fujitec, "Anificial intelligence type elevator group control system," JErRO , 26,
1988.

Goguen, J.A., "The logic of inexact concepts," Synthese 19, 325-373,1969.

Goodman, I.R. and Nguyen, H.T., Uncertainty Models for Knowledge-Based Sys-
tems. North-Holland, Amsterdam, 1985.

Gupta, M.M and Yamakawa, T. (eds.). Fuzzy Logic in Knowledge-Based Systems.


North-Holland, Amsterdam, 1988.

Isik, C., "Inference engines for fuzzy rule-based control," Internationallour. ofAp-
proximate Reasoning 2,122-187,1988.

Johnson-Laird, P.N. "Procedural semantics," Cognition 5, 189-214, 1987.

Kacprzyk, J. and Yager, R.R (eds.), Management Decision Support Systems Using
Fuzzy Sets and Possibility Theory. Interdisciplinary Systems Research Series,
vol. 83, Verlag TÜV Rheinland, Köln, 1985.

Kacprzyk, J. and Orlovski, S.A. (eds.), Optimization Models Using Fuzzy Sets and
Possibility Theory. D. Reidel, Dordrecht, 1987.

Kasai Y. and Y. Morimoto, "Electronically controlled continuously variable


transmission," Proc. Int. Congress on Transportation Electronics, Dearborn,
Michigan, 1988.

Kaufmann, A. and Gupta, M.M., Introduction to Fuzzy Arithmetic. Van Nostrand,


New York, 1985.

Kaufmann, A. and Gupta, M.M., Fuzzy Mathematical Models with Applications to


Engineering and Management Science. North-Holland, Amsterdam, 1988.

Kinoshita, M., and T. Fukuzaki, T. Satoh, and M. Miyake, "An automatic operation
method for control rods in BWR plants," Proc. Specialists' Meeting on In-core
Instrumentation and Reactor Core Assessment, Cadarache, France, 1988.

Kiszka, J.B., M.M. Gupta, and P.N. Nikiforuk, "Energetistic stability of fuzzy
dynamic systems," IEEE Transactions on Systems, Man and Cybernetics
SMC-15, 1985.

Klir, G.J. and Folger, T.A., Fuzzy Sets, Uncertainty and Information. Prentice Hall,
Englewood Cliffs, N.J., 1988.

Kuipers, P., "Qualitative simulation," Artificial Intelligence 29, 289- 338, 1986.

Levesque, H.J. "Knowledge representation and reasoning," Annual Reviews uf


Computer Science 1, Annual Review, Inc., Palo Alto, Calif., 255-287,1986.

Levesque, H.J. and Brachman, R. "Expressiveness and tractability in knowledge


representation and reasoning," Computational Intelligence 3, 78-93, 1987.

Mamdani, E.H. and Gaines, B.R. (eds.), Fuzzy Reasoning and its Applications.
Academic Press, London, 1981.

McCarthy, J., "Circumscription: non-monotonic inference rule," Artificial Intelli-


gence 13,27-40,1980.

McDermott, D.V., "Non-monotonic logic, I," Artificial Intelligence 13, 41-72,


1980.

McDermott, D.V., "Non-monotonic logic, II: non-monotonic modal theories,"


Journal of the Association for Computing Machinery 29, 33-57, 1982.

Moore, R.C., "The role of logic in knowledge representation and commonsense rea-
soning," Proceedings of the National Conference on Artificial Intelligence,
428-433,1982.

Moore, R.C. and Hobbs, J.R. (eds.), Formal Theories of the Commonsense World.
Ablex Publishing, Norwood, N.J., 1984.

Mukaidono, M., Z. Shen, and L. Ding, "Fuzzy Prolog," Proc. 2nd IFSA Congress,
Tokyo, Japan, 452-455,1987.

Negoita, C.V., Expert Systems and Fuzzy Systems. Benjamin/Cummings, Menlo


Park, CA, 1985.

Nilsson, N., "Probabilisticlogic," Artificiallntelligence 20, 71-87, 1986.

Peterson, P., "On the logic offew, many, and most," Notre Dame Journal of Formal
Logic 20, 155-179, 1979.

Pospelov, G.S., "Fuzzy set theory in the USSR," Fuzzy Sets and Systems 22,1-24,
1987.

Proceedings of the Second Congress of the International Fuzzy Systems Association,


Tokyo, Japan, 1987.

Proceedings of the International Workshop on Fuzzy Systems Applications, Kyushu


Institute of Technology, Iizuka, Japan, 1988.

Reiter, R. and Criscuolo, G., "Some representational issues in default reasoning,"


Computers and Mathematics 9, 15-28, 1983.

Shapiro, S.C. (ed.), Encyclopedia of Artificial Intelligence. John Wiley & Sons, New
York, 1987.

Small, S.L., Cottrell, G.W., and Tanenhaus, M.K. (eds.), Lexical Ambiguity Resolu-
tion. Morgan Kaufman Publishers, Los Altos, CA. 1988.

Sugeno, M., ed., Industrial Applications of Fuzzy Control, North Holland, Amster-
dam, 1985.

Talbot, CJ., "Scheduling TV advertising: an expert systems approach to utilising


fuzzy knowledge, " Proc. of the Fourth Australian Conference on Applications
of Expert Systems, Sydney, Australia, 1988.

Togai, M., and H. Watanabe, "Expert systems on a chip: an engine for real-time ap-
proximate reasoning," IEEE Expert 1,55-62, 1986.

Wilensky, R. "Some problems and proposals for knowledge representation,"


Technical Report 87/351, Computer Science Division, University of Califor-
nia, Berkeley, 1987.

Yager, R.R. "Quantified propositions in a linguistic logic," Proceedings of the 2nd


International Seminar on Fuzzy Set Theory, Klement, E.P. (ed.), Johannes
Kepler University, Linz, Austria, 1980.

Yager, R.R. "Reasoning with fuzzy quantified statements-I," Kybernetes 14,


233-240, 1985.

Yasunobu, S. and G. Hasegawa, "Evaluation of an automatic container crane opera-


tion system based on predictive fuzzy control," Control Theory and Advanced
Technology, vol. 2, no. 3, 1986.

Yasunobu, S. and S. Miyamoto, "Automatic train operation by predictive fuzzy con-


trol," Industrial Applications of Fuzzy Control, M Sugeno, ed., North Holland,
Amsterdam, 1985.

Zadeh, L.A. "Probability measures of fuzzy events," Jour. Math. Anal. and Appli-
cations 23, 421-427, 1968.

Zadeh, L.A. "Outline of a new approach to the analysis of complex systems and de-
cision processes," IEEE Trans. on Systems, Man and Cybernetics SMC-3,
28-44, 1973.

Zadeh, L.A. "The Concept of a Linguistic Variable and its Application to Approxi-
mate Reasoning," Part I; Inf. Science 8, 199-249; Part IT In/. Science 8, 301-
357; Part ill In! Science 9,43-80, 1975.

Zadeb, L.A. "Fuzzy sets as a basis for a theory of possibility," Fuzzy Sets and Sys-
tems 1,3-28, 1978a.

Zadeh, L.A. "PRUF-A meaning representation language for natural languages,"


Int. J. Man-Machine Studies 10, 395-460, 1978b.

Zadeh, L.A. " A fuzzy-set-theoretic approach to fuzzy quantifiers in natural


languages," Computers and Mathematics 9,149-184, 1983a.

Zadeh, L.A. "The role of fuzzy logic in the management of uncertainty in expert
systems," Fuzzy Sets and Systems 11,199-227, 1983b.

Zadeh, L.A. "A theory of commonsense knowledge," in Aspects of Vagueness,


H.J. Skala, S. Termini and E. Trillas, (eds.), Reidel, Dordrecht, 1984.

Zadeh, L.A. "Syllogistic reasoning in fuzzy logic and its application to reasoning
with dispositions, " IEEE Trans. on Systems, Man and Cybernetics SMC-15,
754-763, 1985.

Zadeb, L.A. "Test-Score Semantics as a Basis for a Computational Approach to the


Representation of Meaning," Literary and Linguistic Computing 1, 24-35,
1986.
Zadeh, L.A., "A computational theory of dispositions," Int. J. at Intelligent Systems
2,39-63, 1987.
25

Zadeh, L.A., "Fuzzy logic," Computer 1,83-93, 1988a.

Zadeh, L.A., "Dispositional logic," Appl. Math. Lett., 95-99, 1988b.

Zadeh, L.A. " QSA/FL-Qualitative systems analysis based on fuzzy logic, " Proc.
AAAI Symposium, Stanford University, 1989.

Zemankova-Leech, M. and Kandel, A., Fuzzy Relational Data Bases - A Key to Ex-
pert Systems. Verlag TIN Rheinland, Cologne, 1984.

Zimmerman, H.I., Fuzzy Set Theory and its Applications. Kluwer, Nijhoff, Dor-
drecht, 1987.
2
EXPERT SYSTEMS USING FUZZY
LOGIC

Ronald R. Yager
Machine Intelligence Institute
Iona College
New Rochelle, NY 10801

ABSTRACT
We show how the theory of approximate reasoning developed by L.A. Zadeh
provides a natural format for representing the knowledge and performing the
inferences in rule based expert systems. We extend the representational ability of
these systems by providing a new structure for including rules which require the satisfaction of only a subset of the antecedent conditions. This is accomplished by the use of fuzzy quantifiers. We also provide a methodology for the inclusion of a form of uncertainty into these expert systems, associated with the belief attributed to the data and the production rules.

INTRODUCTION
In [1] Buchanan and Duda provide an excellent introduction to the principles
of rule-based expert systems. In [2] Buchanan provides a bibliography on expert
systems. A particularly well cited example of a rule based expert system is MYCIN
[3,4]. In [5] Van Melle has abstracted the basic structure of the MYCIN system and
provided a language for the development of prototypical rule based expert systems
called EMYCIN.
A rule based expert system is essentially an example of a production system
consisting of the following components [1]:

1. A rule or knowledge base - This consists of the expert's knowledge in the form of conditional type statements. Each conditional statement consists of an antecedent portion and a consequent portion. Typical rules are of the form
if antecedent then consequent.
2. A problem or global database - This consists of a set of facts or
assertions about the current problem.
3. A rule interpreter- This consists of the portion of the system that
carries out the problem solving. The rule interpreter can be considered to have two
components. The first component consists of the inference mechanism. This helps to determine when a particular rule is valid and what is the effect of applying this rule. The second component consists of some meta-rules which help determine in which order the rule base is to be searched for applicable rules.
The information in the problem database, as well as the antecedents and consequents of the rules, is of the form the (attribute) of (object) is (value); an optional certainty measure can be assigned to these propositions.
The expert system is generally activated by the introduction of a problem to
be solved in the form of a goal to be satisfied as well as the insertion in the problem
data base of data about the current problem. In many cases the expert system is a
pattern directed or forward chaining production system. In this type of situation the problem is initiated with the insertion of the goal state, and the rule base is then searched to find which rules can be applied based upon the information in the problem database. A rule is applicable if the information in the global database satisfies the antecedent portion of the rule. If a rule is fired the appropriate information is added to the global database, forming a new augmented database. One sees this in essence as being an application of modus ponens. The rule base is then again searched for fireable rules using this new augmented global database, again adding new information to the database. The meta-knowledge in the rule interpreter is used to help direct the search for fireable rules. The determination of good heuristics for searching the rule base plays a significant role in the intelligent aspects of the system. The process of firing rules continues until no new information can be added or a goal state is reached.
In this paper we are concerned with the question of representation of the
propositions forming the information in the global database and the antecedents and
consequents in rules, as well as the inference mechanism used to infer the
consequence of fired rules. We shall also provide a format for the representation of
complex rules in which only some of the antecedents need to be satisfied.
Furthermore, we shall provide a mechanism for the inclusion of some forms of
uncertainty. In particular we shall suggest that the theory of approximate reasoning
based upon fuzzy subsets developed by L.A. Zadeh provides a very robust
methodology for representing the propositions and implementing the appropriate
inference [6-10]. We note that Yager [11,12] has provided an approach for querying
large knowledge bases of the type found in expert systems where the information is
described in terms of fuzzy propositions.

REPRESENTATION OF DATA AND RULES

As noted by Buchanan and Duda [1] the fundamental building blocks for the information in both the database and the rule base of an expert system are propositional statements of the form:
the (attribute) of (object) is (value)
For example
The height of John is 6 feet.
The temperature of the patient is 102.
One can combine the ideas of attribute and object into a concept called a variable.
Thus in the above examples John's height and the patient's temperature can be
considered variables. In this notation the fundamental building blocks of the rule-
based expert systems would be
V is A,
where V is a variable, attribute (object) and A is its current value.
It is at this point we diverge from the current representational approach to
expert systems knowledge. In the current systems, such as MYCIN, the values of
the variables are left as symbols, words or values with no meaning. That is, the datum "Temperature is high" is left in this form; no attempt is made to give any meaning to the value high. That is, the values are considered as atomic items with no further attempt at understanding their meaning. The matching used to determine the fireability of rules is carried out at this level of semantics. Using the values at this
level of detail provokes some important questions. When two people use the same
word, such as the designer of a system and the user, do they mean the same thing?
Secondly, if a rule has a certain value for a variable in its antecedent can we still learn
something about the consequent variable if we only know that the value of antecedent
is close to the value in the rule? The ability to handle these types of problems
requires us to provide a deeper semantics for the values associated with variables.
Just as the predicate logic refines and improves upon the propositional logic by
further decomposing the atomic statements the theory of approximate reasoning [6-
10] further refines the meaning of the values associated with variables.
The approach we suggest is based upon the idea of fuzzy subsets introduced by Zadeh [13]. Assume X is a set of objects. A fuzzy subset A of X is a subset in which the membership grade for each x ∈ X is an element in the unit interval [0,1]. We denote this membership function A(x). In our approach a proposition such as
Age is old
has the effect of associating with the variable age a possibility distribution [9].
Assume we have the proposition
V is A
where A is some value. We can express A as a fuzzy subset of a base set, the set of values the variable can assume. For example, if A is old we can express A as a fuzzy subset of the interval of ages [0,150]. In particular, X is the set of all values that V can assume. The statement in turn induces a possibility distribution Π_V over the set X such that
Π_V(x) = A(x),
where A(x) is the membership grade of x in A. In particular Π_V(x) is seen to be the possibility that V = x given the datum V is A.
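To make the translation concrete, here is a minimal Python sketch (the base set and membership grades are invented for illustration and are not taken from the text) showing how a datum V is A induces a possibility distribution with Π_V(x) = A(x).

# Sketch: the proposition "V is A" induces a possibility distribution Pi_V(x) = A(x).
# The membership grades below are illustrative assumptions, not values from the chapter.
old = {20: 0.0, 40: 0.1, 60: 0.5, 80: 0.9, 100: 1.0}   # fuzzy subset "old" of the base set of ages

def possibility_from_datum(fuzzy_value):
    """Pi_V(x) = A(x) for the datum 'V is A'."""
    return dict(fuzzy_value)

pi_age = possibility_from_datum(old)
print(pi_age[80])   # 0.9 : possibility that Age = 80 given "Age is old"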
In a rule based expert system the fundamental components of the rules are conditional statements of the form
if V1 is A then V2 is B.
As suggested by Zadeh [8] propositions of this type can also be seen to generate possibility distributions. In particular if V1 and V2 have as their base sets the sets X and Y respectively then
if V1 is A then V2 is B
induces a conditional possibility distribution Π_{V2|V1} over X × Y such that
Π_{V2|V1}(x, y) = Min(1, 1 - A(x) + B(y));
an alternative definition is
Π_{V2|V1}(x, y) = Max(1 - A(x), B(y)).
Thus in this approach the effect of both data statements and rules is to introduce possibility distributions.
More complex forms of rules can easily be represented in this approach. If V1, V2, ..., Vn are variables taking values in the base sets X1, X2, ..., Xn respectively then the statement
V1 is A1 and V2 is A2 and ... and Vn is An
is seen to induce the joint possibility distribution Π_{V1,V2,...,Vn} over X1 × X2 × ... × Xn such that
Π_{V1,V2,...,Vn}(x1, x2, ..., xn) = Min_i [Ai(xi)].
The statement V1 is A1 or V2 is A2 or ... or Vn is An induces the joint possibility distribution Π_{V1,V2,...,Vn} over X1 × X2 × ... × Xn such that
Π_{V1,V2,...,Vn}(x1, x2, ..., xn) = Max_i [Ai(xi)].
With these ideas we can easily represent more complicated rules. Let V1, V2, ..., Vn be variables with base sets X1, ..., Xn respectively and let U1, U2, ..., Up be variables with base sets Y1, Y2, ..., Yp respectively. Consider the rule
if V1 is A1 and V2 is A2 ... and Vn is An then U1 is B1.
This induces a conditional possibility distribution Π_{U1|V1,V2,...,Vn} over X1 × X2 × ... × Xn × Y1 such that
Π_{U1|V1,V2,...,Vn}(x1, x2, ..., xn, y1) = Min(1, 1 - H(x1, x2, ..., xn) + B1(y1)),
where H(x1, x2, ..., xn) = Min_i [Ai(xi)].
Consider the rule
if V1 is A1 and V2 is A2 ... and Vn is An
then
U1 is B1 or U2 is B2 or ... or Up is Bp.
This generates the conditional possibility distribution Π_{U1,U2,...,Up|V1,V2,...,Vn} over the set X1 × X2 × ... × Xn × Y1 × Y2 × ... × Yp such that
Π_{U1,U2,...,Up|V1,V2,...,Vn}(x1, x2, ..., xn, y1, y2, ..., yp) = Min(1, 1 - H(x1, x2, ..., xn) + G(y1, y2, ..., yp))
where
G(y1, y2, ..., yp) = Max_i [Bi(yi)].
Other complex rules can be expressed in a similar manner.
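The following Python sketch illustrates these translations on small illustrative sets (the fuzzy values are assumptions introduced only for this example): the rule translation Min(1, 1 - A(x) + B(y)) and the min/max combinations of conjunctive and disjunctive data.

# Sketch: distributions induced by a rule and by conjunctive data (illustrative sets).
from itertools import product

def rule_distribution(A, B):
    # Pi_{V2|V1}(x, y) = min(1, 1 - A(x) + B(y))
    return {(x, y): min(1.0, 1.0 - ax + by)
            for x, ax in A.items() for y, by in B.items()}

def joint_min(values):
    # "V1 is A1 and ... and Vn is An": Pi(x1,...,xn) = min_i Ai(xi)
    return {pt: min(Ai[xi] for Ai, xi in zip(values, pt))
            for pt in product(*[list(Ai) for Ai in values])}

def joint_max(values):
    # "V1 is A1 or ... or Vn is An": Pi(x1,...,xn) = max_i Ai(xi)
    return {pt: max(Ai[xi] for Ai, xi in zip(values, pt))
            for pt in product(*[list(Ai) for Ai in values])}

A = {"low": 1.0, "high": 0.2}      # illustrative fuzzy value of V1
B = {"open": 0.7, "closed": 1.0}   # illustrative fuzzy value of V2
print(rule_distribution(A, B)[("high", "open")])   # min(1, 1 - 0.2 + 0.7) = 1.0
print(joint_min([A, B])[("high", "open")])         # min(0.2, 0.7) = 0.2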

INFERENCES FROM THE SYSTEM


The ability to use the database to search the rule base to infer further data in
this approach is based upon the inference laws of the theory of approximate
reasoning. The essential laws for this purpose are the conjunction principle, the projection principle, the law of fuzzy compositional inference and the entailment principle. These laws are related respectively to the law of adjunction, the law of simplification, the law of modus ponens and the law of addition in the classic binary propositional logic.
The conjunction principle states that if we have two pieces of data about
some variable, for example
V is A ⇒ Π_V(x) = A(x)
V is B ⇒ Π_V(x) = B(x)
then we can conjunct these distributions getting the proposition V is C where
C(x) = A(x) ∧ B(x)
C(x) = Min(A(x), B(x)).
The projection principle allows us to project out marginal possibility distributions from joint distributions. Assume Π_{V1,V2} is a joint possibility distribution over the base set X1 × X2; then the projection principle allows us to infer that
Π_{V1}(x) = Max_{y ∈ X2} [Π_{V1,V2}(x, y)].
The law of fuzzy compositional inference, which combines conjunction and projection, plays a role similar to modus ponens in binary logic. Consider the data
V is A1
and the rule
if V is A2 then U is B.
The proposition V is A1 induces the possibility distribution
Π_V(x) = A1(x)
over X. The rule if V is A2 then U is B induces the conditional possibility distribution
Π_{U|V}(x, y) = Min[1, 1 - A2(x) + B(y)]
over X × Y.
The law of fuzzy compositional inference says that from these two pieces of information we can infer that
Π_U(y) = Max_x [Π_V(x) ∧ Π_{U|V}(x, y)].
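A small Python sketch of the compositional rule of inference, using illustrative fuzzy sets (the values are assumptions introduced only for this example):

# Sketch: Pi_U(y) = max_x [ Pi_V(x) ∧ Pi_{U|V}(x,y) ] with the rule translated
# by min(1, 1 - A2(x) + B(y)).  All fuzzy sets below are illustrative.
def infer(datum_A1, rule_A2, rule_B):
    cond = {(x, y): min(1.0, 1.0 - rule_A2[x] + rule_B[y])
            for x in rule_A2 for y in rule_B}
    return {y: max(min(datum_A1[x], cond[(x, y)]) for x in rule_A2)
            for y in rule_B}

A2 = {"a": 1.0, "b": 0.0}   # antecedent value in the rule
B  = {"g": 1.0, "h": 0.0}   # consequent value in the rule
A1 = {"a": 1.0, "b": 0.3}   # observed datum "V is A1"
print(infer(A1, A2, B))     # {'g': 1.0, 'h': 0.3}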
Consider the situation where there is more than one element in the antecedent:
if V1 is A1 and V2 is A2 then U is B.
Let our data be
V1 is C1
V2 is C2.
First the rule "if V1 is A1 and V2 is A2 then U is B" induces the conditional possibility distribution
Π_{U|V1,V2}(y, x1, x2) = Min[1, 1 - (A1(x1) ∧ A2(x2)) + B(y)].
In order to obtain U from this via fuzzy compositional inference we need Π_{V1,V2}. This can be obtained from our data as
Π_{V1,V2}(x1, x2) = Min[C1(x1), C2(x2)]
then
Π_U(y) = Max_{(x1,x2)} [Π_{V1,V2}(x1, x2) ∧ Π_{U|V1,V2}(y, x1, x2)].
Consider next the situation in which we have two elements in the consequent of our rule:
if V is A then U1 is B1 or U2 is B2.
This induces the conditional possibility distribution
Π_{U1,U2|V}(y1, y2, x) = Min[1, 1 - A(x) + (B1(y1) ∨ B2(y2))].
Using the data
V is C (Π_V(x) = C(x))
we can apply fuzzy compositional inference to obtain
Π_{U1,U2}(y1, y2) = Max_x [Π_V(x) ∧ Π_{U1,U2|V}(x, y1, y2)].
The projection principle can now be applied to get either Π_{U1} or Π_{U2}. For example
Π_{U1}(y1) = Max_{y2} [Π_{U1,U2}(y1, y2)].
The entailment principle implies that from the datum V is A we can infer V is B, where B is any fuzzy subset such that A ⊆ B.
We can now see the applicability of this theory to rule-based expert systems. Our global database consists of information of the form Vi is Ai, and our rule base consists of rules of the type "if Vi is Bi then Uj is Cj"; by application of the laws of inference, especially compositional fuzzy inference, we can obtain new information to add to our global database.

QUANTIFIERS IN THE ANTECEDENT


In this section we shall provide an extension of the ideas presented in the previous part to allow for the representation of more sophisticated rules in our rule base, such as:
"if most of the conditions V1 is A1, V2 is A2, ..., Vn is An are satisfied then U is B"
"if at least half of the conditions V1 is A1, V2 is A2, ..., Vn is An are satisfied then U is B".
The ability to represent such rules will greatly enhance the ability of any
expert system to capture the types of rules used by experts.
We shall provide a methodology for representing such rules in a manner
consistent with the rest of our formulation and one which allows inferences to be
made about the value of the consequence using the rule and observed values about the
variables in the antecedent. This methodology is based upon Zadeh's [14]
representation of quantifiers and Yager's procedure for evaluating quantified statements
[15].
The class of rules we are concerned with can be described as consisting of the following components, an antecedent and a consequent. The antecedent component consists of a collection of requirements specified in the form of propositions of the type Vi is Ai, where Vi is a variable and Ai is a fuzzy subset of the base set Xi. In addition the antecedent consists of a quantifier, Q, such as most, all, almost all, at least one, at least half, etc. The consequent consists of a proposition of the type U is B.
The rule then reflects the fact that if Q of the antecedent conditions, the Vi is Ai's, are satisfied then U is B can be added to our knowledge base. The fundamental difference between this type of rule and the types studied in the previous section is that rather than requiring all the antecedent conditions to be satisfied, only Q of them need be satisfied.
Like the other types of conditional rules, these rules also induce a conditional possibility distribution Π_{U|V1,V2,...,Vn} over the set X1 × X2 × ... × Xn × Y. In particular for any point (x1, x2, ..., xn, y) where xi ∈ Xi and y ∈ Y
Π_{U|V1,V2,...,Vn}(x1, x2, ..., xn, y) = Min[1, 1 - H(x1, x2, ..., xn) + B(y)].
The essential difference lies in the determination of the joint possibility H(x1, ..., xn), the component due to the antecedent. The method for determining this H is based upon ideas developed by Yager [15].
As suggested by Zadeh [14], a linguistic quantifier can be expressed as a fuzzy subset. In particular there exist three kinds of quantifiers, the first two of which are of interest to us. A kind one or absolute quantifier is exemplified by values such as "about 5" and "at least seven", while a kind two or relative quantifier is exemplified by values such as "almost all" and "at least half." As suggested by Zadeh, a kind one quantifier can be expressed as a fuzzy subset of the non-negative reals whereas a kind two quantifier can be expressed as a fuzzy subset of the unit interval. For example, if Q1 is the kind one quantifier "at least 5", then for each x ∈ R+, Q1(x) indicates the degree to which x satisfies the concept "at least 5". Similarly, if Q2 is a kind two quantifier, "most", then for any x ∈ [0, 1], Q2(x) indicates the degree to which the proportion x satisfies the concept "most".
Let Q be a quantifier of either kind I or kind II, with base set W, where for kind I, W = R+ and for kind II, W = [0, 1]. Then Q is said to be monotonically non-decreasing if for any w1, w2 ∈ W such that w2 > w1 we have Q(w2) ≥ Q(w1). We shall restrict ourselves to these monotonically non-decreasing quantifiers as they appear to be the types that naturally appear in the rules used in expert systems.
We can now describe the procedure for obtaining H from the antecedent quantifier Q and the conditions Vi is Ai. We shall initially assume Q is a kind I quantifier.
For any point (x1, x2, ..., xn) ∈ X1 × X2 × ... × Xn, where Xi is the base set of Ai, we obtain H(x1, x2, ..., xn) in the following manner.
Let D(x1, x2, ..., xn) = {A1(x1), A2(x2), ..., An(xn)} and let Di(x1, x2, ..., xn) be the ith largest element in the set D(x1, x2, ..., xn).
For any absolute quantifier Q
H(x1, ..., xn) = Max_i [Q(i) ∧ Di(x1, ..., xn)].
If Q is a relative quantifier then we replace Q(i) by Q(i/n).
Having obtained H(x1, ..., xn) for every (x1, ..., xn) ∈ X1 × X2 × ... × Xn we obtain
Π_{U|V1,V2,...,Vn}(x1, ..., xn, y) = Min[1, 1 - H(x1, ..., xn) + B(y)].
This then becomes the induced possibility distribution from the rule
if Q of (V1 is A1, V2 is A2, ..., Vn is An) are satisfied then U is B.
If in addition we have in our database the values V1 is C1, V2 is C2, ..., Vn is Cn, where Ci is a fuzzy subset of Xi, then we can obtain a value for U as
U is M
where
M(y) = Max_{(x1,x2,...,xn)} [Π_{U|V1,V2,...,Vn}(x1, x2, ..., xn, y) ∧ Π_{V1,V2,...,Vn}(x1, x2, ..., xn)]
where Π_{V1,V2,...,Vn}(x1, x2, ..., xn) = Min_i Ci(xi).
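The procedure can be exercised directly. The Python sketch below reproduces the quantifier and fuzzy sets of the worked example that follows (everything else is an assumption introduced for illustration): H is obtained by sorting the antecedent satisfaction degrees, and the same compositional inference as before then yields Π_U.

# Sketch: quantified-antecedent rule, H(x1,...,xn) = max_i [ Q(i/n) ∧ D_i ].
from itertools import product

def H_value(Q, degrees):
    # D_i = i-th largest antecedent satisfaction degree
    d = sorted(degrees, reverse=True)
    n = len(d)
    return max(min(Q((i + 1) / n), d[i]) for i in range(n))

def Q_most(p):
    # the relative quantifier "most" of the example: Q(1/3)=0, Q(2/3)=1/2, Q(1)=1
    if p >= 1.0:
        return 1.0
    if p >= 2/3:
        return 0.5
    return 0.0

A = [{"a": 1.0, "b": 0.0}, {"c": 1.0, "d": 0.0}, {"e": 0.0, "f": 1.0}]  # A1, A2, A3
B = {"g": 1.0, "h": 0.0}
C = [{"a": 1.0, "b": 0.0}, {"c": 1.0, "d": 0.0}, {"e": 0.0, "f": 1.0}]  # observed data C1, C2, C3

points = list(product(*[list(Ai) for Ai in A]))
Pi_U = {}
for y in B:
    vals = []
    for pt in points:
        h = H_value(Q_most, [Ai[xi] for Ai, xi in zip(A, pt)])
        cond = min(1.0, 1.0 - h + B[y])               # rule translation
        data = min(Ci[xi] for Ci, xi in zip(C, pt))   # joint datum
        vals.append(min(cond, data))
    Pi_U[y] = max(vals)
print(Pi_U)   # {'g': 1.0, 'h': 0.0}, as obtained in the example below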
A simple example will illustrate this procedure. Assume V1, V2, V3, U are variables with base sets
X1 = {a, b}
X2 = {c, d}
X3 = {e, f}
Y = {g, h}.
Let Q be the kind II quantifier most, defined by
Q(0) = 0, Q(1/3) = 0, Q(2/3) = 1/2, Q(1) = 1.
Assume our rule is
if Q of [V1 is A1, V2 is A2, V3 is A3] are satisfied then U is B
where
A1 = a = {1/a, 0/b}
A2 = c = {1/c, 0/d}
A3 = f = {0/e, 1/f}
B = g = {1/g, 0/h}.
We shall first obtain H. Consider the point (a, c, e):
D(a, c, e) = (1, 1, 0)
hence
D1(a, c, e) = 1
D2(a, c, e) = 1
D3(a, c, e) = 0.
Using this we get
H(a, c, e) = Max_i [Q(i/3) ∧ Di(a, c, e)] = Max[0 ∧ 1, 1/2 ∧ 1, 1 ∧ 0] = 1/2.
The following table provides the formulation of H, with H = Max[0 ∧ d1, 1/2 ∧ d2, 1 ∧ d3]:
x1 x2 x3 | A1(x1) A2(x2) A3(x3) | d1 d2 d3 | H
a  c  e  |   1      1      0    |  1  1  0 | .5
a  c  f  |   1      1      1    |  1  1  1 | 1
a  d  e  |   1      0      0    |  1  0  0 | 0
a  d  f  |   1      0      1    |  1  1  0 | .5
b  c  e  |   0      1      0    |  1  0  0 | 0
b  c  f  |   0      1      1    |  1  1  0 | .5
b  d  e  |   0      0      0    |  0  0  0 | 0
b  d  f  |   0      0      1    |  1  0  0 | 0
Since
Π_{U|V1,V2,V3}(x1, x2, x3, y) = Min(1, 1 - H(x1, x2, x3) + B(y))
the following table expresses Π_{U|V1,V2,V3}:
x1 x2 x3 y | Π_{U|V1,V2,V3}(x1, x2, x3, y)
a  c  e  g | 1 ∧ (1 - .5 + 1) = 1
a  c  e  h | 1 ∧ (1 - .5 + 0) = .5
a  c  f  g | 1 ∧ (1 - 1 + 1) = 1
a  c  f  h | 1 ∧ (1 - 1 + 0) = 0
a  d  e  g | 1 ∧ (1 - 0 + 1) = 1
a  d  e  h | 1 ∧ (1 - 0 + 0) = 1
a  d  f  g | 1 ∧ (1 - .5 + 1) = 1
a  d  f  h | 1 ∧ (1 - .5 + 0) = .5
b  c  e  g | 1 ∧ (1 - 0 + 1) = 1
b  c  e  h | 1 ∧ (1 - 0 + 0) = 1
b  c  f  g | 1 ∧ (1 - .5 + 1) = 1
b  c  f  h | 1 ∧ (1 - .5 + 0) = .5
b  d  e  g | 1 ∧ (1 - 0 + 1) = 1
b  d  e  h | 1 ∧ (1 - 0 + 0) = 1
b  d  f  g | 1 ∧ (1 - 0 + 1) = 1
b  d  f  h | 1 ∧ (1 - 0 + 0) = 1
Assume we have the data
V1 = {1/a, 0/b} = a = C1
V2 = {1/c, 0/d} = c = C2
V3 = {0/e, 1/f} = f = C3;
then to obtain Π_U(y) we see that
Π_U(y) = Max_{(x1,x2,x3)} [Π_{U|V1,V2,V3}(x1, x2, x3, y) ∧ C1(x1) ∧ C2(x2) ∧ C3(x3)]
hence
Π_U(g) = Max[1∧1∧1∧0, 1∧1∧1∧1, 1∧1∧0∧0, 1∧1∧0∧1, 1∧0∧1∧0, 1∧0∧1∧1, 1∧0∧0∧0, 1∧0∧0∧1] = Max[0,1,0,0,0,0,0,0] = 1
Π_U(h) = Max[.5∧1∧1∧0, 0∧1∧1∧1, 1∧1∧0∧0, .5∧1∧0∧1, 1∧0∧1∧0, .5∧0∧1∧1, 1∧0∧0∧0, 1∧0∧0∧1] = Max[0,0,0, ..., 0] = 0
hence, as we would have anticipated,
U = {1/g, 0/h} = g.
Consider the next situation where
V1 = {0/a, 1/b} = b
V2 = {1/c, 0/d} = c
V3 = {0/e, 1/f} = f:
Π_U(g) = Max[1∧0∧1∧0, 1∧0∧1∧1, 1∧0∧0∧0, 1∧0∧0∧1, 1∧1∧1∧0, 1∧1∧1∧1, 1∧1∧0∧0, 1∧1∧0∧1] = Max[0,0,0,0,0,1,0,0] = 1
Π_U(h) = Max[.5∧0∧1∧0, 0∧0∧1∧1, 1∧0∧0∧0, .5∧0∧0∧1, 1∧1∧1∧0, .5∧1∧1∧1, 1∧1∧0∧0, 1∧1∧0∧1] = Max[0,0,0,0,0,1/2,0,0] = 1/2
hence
U = {1/g, .5/h}.
In the situation where
V1 = {1/a, 1/b} = "don't know"
V2 = {1/c, 0/d} = c
V3 = {0/e, 1/f} = f
we can show that again
U = {1/g, .5/h}.
In the case where
V1 = {0/a, 1/b} = b
V2 = {0/c, 1/d} = d
V3 = {0/e, 1/f} = f
we can show that
U = {1/g, 1/h} (unknown).
The following theorems show how the usual conjunction and disjunction of antecedent conditions are recovered as special cases of the quantified rule.
Theorem: When Q is the quantifier all then the rule
if Q of [Vi is Ai] are satisfied then U is B   (I)
is equivalent to the proposition
if V1 is A1 and V2 is A2 and ... and Vn is An then U is B.   (II)
Proof: For rule II we have
if V is H then U is B
where V = (V1, V2, V3, ..., Vn) and
H(x1, ..., xn) = A1(x1) ∧ A2(x2) ∧ ... ∧ An(xn).
For rule I we have
if V is G then U is B
where
G(x1, ..., xn) = Max_i [Q(i) ∧ D(i)]
where D(i) = ith largest element in the set D = {A1(x1), A2(x2), ..., An(xn)}.
When Q is the quantifier all, then
Q(i) = 1 if i = n
Q(i) = 0 if i ≠ n.
In this case
G(x1, ..., xn) = 1 ∧ D(n) = nth largest element in D,
hence G(x1, ..., xn) = A1(x1) ∧ A2(x2) ∧ ... ∧ An(xn) = H(x1, ..., xn).
Theorem: When Q is the quantifier at least one then the rule
if Q of [Vi is Ai] are satisfied then U is B   (I)
is equivalent to the proposition
if V1 is A1 or V2 is A2 or ... or Vn is An then U is B.   (III)
Proof: For rule III we have
if V is H then U is B
where V = (V1, V2, V3, ..., Vn) and
H(x1, ..., xn) = A1(x1) ∨ A2(x2) ∨ ... ∨ An(xn).
For rule I we have
if V is G then U is B
where
G(x1, ..., xn) = Max_i [Q(i) ∧ D(i)]
where D(i) = ith largest element in the set D = {A1(x1), A2(x2), ..., An(xn)}.
When Q is the quantifier at least one, then
Q(i) = 1 for all i ≥ 1.
Thus G(x1, ..., xn) = Max_i D(i) = D(1), the largest element of D, and hence G(x1, ..., xn) = A1(x1) ∨ A2(x2) ∨ ... ∨ An(xn) = H(x1, ..., xn).

CERTAINTY QUALIFICATION
In providing information to the database and rule base of an expert system,
as discussed by Buchanan and Duda [1], a person may not be completely confident as
to the value he is providing for a variable. Thus a user of a system may provide the
information that
V is A with confidence (or certainty) a.
In the above the quantity a, which is a number in the unit interval,
expresses the degree to which the informant believes that this information is valid.
We would like to provide a mechanism to include these types of qualified
statements into our system. In the spirit of keeping the very powerful structure
which we have developed the approach will be to assume that a statement
V is A with a confidence

is equivalent to an unqualified statement of the form
V is B.
This new statement implicitly implies a confidence of one. Thus we see that the
statement
V is B with 1 confidence
is equivalent to V is B. Thus all our previous work can be seen to have been done
with the implicit certainty one.
We impose the condition that the statement
V is A with zero confidence
should be equivalent to the proposition V is X, where X is the base set of V. Thus zero confidence is equivalent to saying I don't know anything about the value of V.
We should note that this is different from probability, for in probability, if V is A has zero probability then V is not A has probability one.
In general we see that an informant usually makes a tradeoff in providing information between the specificity of the information and the confidence. That is, the more specifically he is required to provide the information, the less confident he can be about it.
In [17] Yager has suggested a mechanism for transforming statements of the form
V is A with confidence a
into statements of the form
V is B
with implied confidence one. In particular if A and B are fuzzy subsets of X then
V is A with confidence a
can be transformed into the equivalent proposition V is B where for any x ∈ X
B(x) = (a ∧ A(x)) + (1 - a).
NOTE: For the statement V is A with confidence 1, we get V is A.
Proof: B(x) = (a ∧ A(x)) + (1 - a) = 1 ∧ A(x) + 0 = A(x).
NOTE: For the statement V is A with confidence 0, we get the unqualified proposition V is X.
Proof: B(x) = (a ∧ A(x)) + (1 - a) = 0 ∧ A(x) + (1 - 0) = 1.
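As a quick illustration, here is a minimal Python sketch of this transformation (the fuzzy set is invented for illustration):

# Sketch: "V is A with confidence a" becomes "V is B" with B(x) = (a ∧ A(x)) + (1 - a).
def certainty_qualify(A, a):
    return {x: min(a, mx) + (1.0 - a) for x, mx in A.items()}

A = {"x1": 1.0, "x2": 0.4, "x3": 0.0}            # illustrative fuzzy set
print(certainty_qualify(A, 1.0))   # {'x1': 1.0, 'x2': 0.4, 'x3': 0.0}  (A itself)
print(certainty_qualify(A, 0.0))   # every grade becomes 1.0, i.e. "V is X" (unknown)
print(certainty_qualify(A, 0.7))   # {'x1': 1.0, 'x2': 0.7, 'x3': 0.3}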
In [18] Yager has introduced a measure of specificity associated with a fuzzy subset. Assume F is a fuzzy subset of the finite set X; then the specificity of F, S(F), is defined as
S(F) = ∫_0^{a_max} (1 / Card F_a) da
where F_a = {x | F(x) ≥ a}, Card F_a is the number of elements in F_a and a_max is the largest membership grade in F. For the case where F is normal, then
S(F) = ∫_0^1 (1 / Card F_a) da.
Yager [18] has shown for the case of normal fuzzy subsets that if F ⊆ G, that is F(x) ≤ G(x) for all x ∈ X, then
S(F) ≥ S(G).
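For a discrete fuzzy set the integral can be evaluated exactly, since Card F_a is piecewise constant in a. The Python sketch below (the fuzzy sets are illustrative assumptions) sorts the membership grades in decreasing order and sums the resulting contributions.

# Sketch: Yager's specificity S(F) = ∫ da / Card F_a for a discrete fuzzy set.
def specificity(F):
    m = sorted(F.values(), reverse=True) + [0.0]
    return sum((m[k] - m[k + 1]) / (k + 1) for k in range(len(F)))

crisp_singleton = {"a": 1.0, "b": 0.0, "c": 0.0}
spread_out      = {"a": 1.0, "b": 1.0, "c": 1.0}
print(specificity(crisp_singleton))  # 1.0   (maximally specific)
print(specificity(spread_out))       # ~0.33 (unspecific: the whole set remains possible)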
The following theorems reinforce our observations about the tradeoff
between specificity and certainty.
Lemma: If A is normal, then the transformation of the proposition V is A with a certainty into the proposition V is B will yield B as a normal set.
Proof: Let x be such that A(x) = 1, then
B(x) = a ∧ A(x) + (1 - a) = a ∧ 1 + (1 - a) = a + (1 - a) = 1.
In the following theorems A is assumed normal.
Theorem: Assume the proposition V is A with a certainty transforms into the proposition V is B; then
S(A) ≥ S(B).
Proof: We shall first show that for each x ∈ X, B(x) ≥ A(x), from the definition
B(x) = (a ∧ A(x)) + (1 - a).
Assume a ≥ A(x); then
B(x) = A(x) + (1 - a) ≥ A(x).
Assume a < A(x); then
B(x) = a + (1 - a) = 1 ≥ A(x).
Since B(x) ≥ A(x) for each x, it follows that S(A) ≥ S(B).
Thus we see that the act of qualifying a proposition by a certainty has the effect of reducing the specificity of its unqualified equivalent.
Theorem: Assume V is A with a1 certainty transforms into V is B1 and V is A with a2 certainty transforms into V is B2; if a1 > a2, then
S(B1) ≥ S(B2).
Proof: B1(x) = (a1 ∧ A(x)) + (1 - a1) and B2(x) = (a2 ∧ A(x)) + (1 - a2). There are three possible situations:
1. A(x) ≤ a2 ≤ a1. In this case
B1(x) = A(x) + (1 - a1)
B2(x) = A(x) + (1 - a2);
since a1 > a2, then (1 - a1) ≤ (1 - a2) and hence B2(x) ≥ B1(x).
2. a2 ≤ A(x) ≤ a1. In this case
B1(x) = A(x) + (1 - a1)
B2(x) = a2 + (1 - a2) = 1 ≥ B1(x).
3. a2 ≤ a1 ≤ A(x). In this case
B1(x) = a1 + (1 - a1) = 1
B2(x) = a2 + (1 - a2) = 1 ≥ B1(x).
Since B1(x) ≤ B2(x) for all x, S(B1) ≥ S(B2).
It should be noted that this approach to certainty qualification can easily be
applied to rules in the expert system. Consider the rule
if V is A then U is B with a certainty
where A and B are fuzzy subsets of X and Y respectively. This transforms into the
possibility distribution
Π_{U|V}(x, y) = (H(x, y) ∧ a) + (1 - a)
where
H(x, y) = Min(1, 1 - A(x) + B(y)).

REPRESENTATION OF DEFAULT KNOWLEDGE

The construction of useful knowledge based systems requires the


representation and manipulation of so called commonsense knowledge. Commonsense knowledge is very often characterized by pieces of knowledge that are usually true but not necessarily always true. The essential feature of this is the assumption of a piece of knowledge without conclusive evidence of its truth. Within this approach one assumes some piece of commonsense knowledge as valid if it is consistent or possible within the framework of what we already know.
It is the use of the absence of contradictory evidence which strongly characterizes the process of commonsense reasoning. That is, classic reasoning systems require the certainty of a proposition before asserting its truth. In commonsense reasoning systems some facts are asserted as true if there exists a possibility of their being true, that is, if nothing contradicts them. Knowledge of this type is often called defeasible because we want the option of withdrawing it if contradictory evidence subsequently appears.
Systems which allow for the inference of information based upon the lack of
some contradictory fact are faced with the problem that their reasoning process is
nonmonotonic. In particular some proposition that was inferred may cease to be
inferable with the acquisition of further knowledge.
In [16] Yager introduced a reasoning system which we shall call fuzzy default reasoning. This system is rooted in the theory of approximate reasoning [17]. In order to discuss this system we need to introduce some additional concepts.
Intuitively speaking the statement
V is A
says that the value of V lies in the subset A. Knowledge that V is A can be used to help determine the viability of other statements. If
V is B
is a second statement we define
Poss[V is B / V is A] = Max_x [A(x) ∧ B(x)].
Formally this definition captures a measure of the degree of intersection between the two sets A and B. Pragmatically, this measure provides an upper bound on the truth of the statement V is B given V is A. That is, if A and B intersect then it is possible that V lies in B, that V is B is true, given that V is A. We shall see this is a measure of consistency of the two statements. A second closely related definition is
Cert[V is B / V is A] = 1 - Poss[V is not B / V is A].
We note that an equivalent formulation is
Cert[V is B / V is A] = Min_x [(1 - A(x)) ∨ B(x)].
Formally this definition captures the degree to which A is contained in B. Pragmatically this measure provides a lower bound on the truth of V is B given V is A. In general
Cert[V is B / V is A] ≤ Poss[V is B / V is A].
In binary reasoning systems we require that
Cert[V is B / KB] = 1,
where KB is our knowledge base, to infer that V is B is true. We shall see that in the commonsense environment we essentially allow an inference of a commonsense piece of knowledge to occur if
Poss[V is B / KB] = 1.
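A minimal Python sketch of these two measures on illustrative sets (the membership grades are assumptions):

# Sketch: Poss[B|A] = max_x min(A(x), B(x)) and Cert[B|A] = min_x max(1 - A(x), B(x)).
def poss(B, A):
    return max(min(A[x], B[x]) for x in A)

def cert(B, A):
    return min(max(1.0 - A[x], B[x]) for x in A)

A = {"x1": 1.0, "x2": 0.6, "x3": 0.0}   # illustrative sets
B = {"x1": 0.8, "x2": 0.3, "x3": 0.2}
print(poss(B, A), cert(B, A))   # 0.8 0.4  (Cert <= Poss, as noted above)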
Recall that a rule is of the form
if V is A then U is B,
where A and B are fuzzy subsets of the base sets X and Y. The above statement gets translated into a joint canonical statement
(V, U) is D
where D is a fuzzy subset of X × Y such that
D(x, y) = (1 - A(x)) ∨ B(y).
If we have two pieces of knowledge
if V is A then U is B
V is E
then we can conjunct these to get
(V, U) is H.
Here H is a subset of the cartesian space X × Y such that
H = E ∩ D = (Ā ∪ B) ∩ E = (Ā ∩ E) ∪ (B ∩ E).
The inferred value of U, denoted G, can be represented as
U is G
where
G(y) = Max_x [Ā(x) ∧ E(x)] ∨ B(y) = Poss(Ā/E) ∨ B(y) = (1 - Cert(A/E)) ∨ B(y).
We see that if we are certain that A occurs given E, effectively E ⊆ A, then we get G = B. If Poss(Ā/E) = 1 then we get G = Y = "unknown", hence no inference is made.
In [16,18,19] Yager has suggested that we can use possibility qualification as a basis for the implementation of many different kinds of commonsense knowledge. A possibility qualified statement is of the form
V is A is possible.
This statement characterizes a piece of information that says our knowledge of the value of V is such that it is possible (or consistent) with it to assume that V lies in the set A. Note that it doesn't specifically say V lies in A. Formally this statement gets translated into
V is A+
where A+ is a fuzzy subset of the power set of the base set X. In particular for any subset G of X
A+(G) = Poss[A/G] = Max_x [A(x) ∧ G(x)].
Essentially A+ is made up of the subsets of X which intersect, i.e. are consistent with, A.
Closely related to possibility qualification is certainty qualification. A statement
V is A is certain
translates into
V is A∨
where A∨ is a fuzzy subset of the power set of the base set of A, X, such that for any subset F of X
A∨(F) = Cert(A/F) = 1 - Poss(Ā/F).
We shall now describe the representation of some primary types of commonsense knowledge by the possibilistic reasoning approach.
We shall initially consider the statement
typically V is A.
The interpretation of "typically V is A" afforded by Reiter's default reasoning system [20] is to say "if we have not established V is ¬A then assume V is A". Thus we can translate the above into
if V is A is possible then V is A.
Using our translation rules we get
if V is A+ then V is A.
This translates into
V is ¬(A+) ∪ A.
We shall denote ¬(A+) as A*, hence we get
V is (A* ∪ A).
Furthermore assume that our knowledge base consists simply of the fact that
V is B.
Combining this with our typical knowledge we get V is D where
D = (A* ∩ B) ∪ (A ∩ B).
Furthermore as discussed in [16,18,19] this becomes
D(x) = (B(x) ∧ (1 - Poss[A/B])) ∨ (A(x) ∧ B(x)).
Two extremal cases should be noted. If our typical value A is completely inconsistent with our known value, A ∩ B = ∅, then Poss[A/B] = 0 and thus D(x) = B(x) and hence
V is B.
We have discounted our typical information when it conflicted with our knowledge base.
On the other hand if A has some consistency with B, A ∩ B ≠ ∅, thus Poss[A/B] = 1, then we get
D(x) = B(x) ∧ A(x)
and hence
V is A ∩ B.
Thus when our typical knowledge doesn't contradict our firm knowledge we conjunct these sources of knowledge. In the special case when B is unknown, B = X, then we get
V is A.
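The combination rule for typical knowledge can be illustrated with a short Python sketch (the fuzzy sets are invented; this is only a restatement of the formula for D(x) above):

# Sketch: "typically V is A" combined with the firm datum "V is B":
# D(x) = (B(x) ∧ (1 - Poss[A/B])) ∨ (A(x) ∧ B(x)).
def poss(A, B):
    return max(min(A[x], B[x]) for x in A)

def apply_default(A, B):
    p = poss(A, B)
    return {x: max(min(B[x], 1.0 - p), min(A[x], B[x])) for x in A}

A = {"x1": 1.0, "x2": 0.0, "x3": 0.0}              # typical value
B_consistent  = {"x1": 1.0, "x2": 1.0, "x3": 0.0}
B_contradicts = {"x1": 0.0, "x2": 1.0, "x3": 1.0}
print(apply_default(A, B_consistent))    # {'x1': 1.0, 'x2': 0.0, 'x3': 0.0}  ->  A ∩ B
print(apply_default(A, B_contradicts))   # {'x1': 0.0, 'x2': 1.0, 'x3': 1.0}  ->  B is kept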

CONCLUSION

We have discussed the applicability of the theory of approximate reasoning


to rule based expert systems. The novel aspects of this work concern our introduction of an approach in this framework for the inclusion of complex rules and the ability to introduce certainty qualification into our system.

REFERENCES
(1) Buchanan, B.G. and Duda, R.O., "Principles of rule-based expert systems,"
Fairchild Technical Report No. 626, Lab. for Artificial Intelligence Research,
Fairchild Camera, Palo Alto, Ca., 1982.

(2) Buchanan, B.G., "Partial bibliography of work on expert systems," Sigart


Newsletter, No. 84,45-50, 1983.

(3) Shortliffe, E.H., Computer Based Medical Consultations: MYCIN, American


Elsevier, New York 1976.

(4) Davis, R., Buchanan, B.G. & Shortliffe, E.H., "Production rules as a
representation of a knowledge-based consultation program," Artificial Intelligence 8,
15-45, 1977.

(5) Van Melle, W., "A domain independent system that aids in constructing
knowledge-based consultation program," Ph.D. dissertation, Stanford University
Computer Science Dept., Stanford CS-80-820, 1980.

(6) Zadeh, L.A., "Fuzzy logic and approximate reasoning," Synthese 30,407-428,
1975.

(7) Zadeh, L.A., "The concept of a linguistic variable and its application to
approximate reasoning," Information Science 8 and 9, 199-249, 301-357, 43-80,
1975.

(8) Zadeh, L.A., "A theory of approximate reasoning," in Hayes, J.E., Michie, D.
and Mikulich, L.I. (eds.), Machine Intelligence 9, 149-194, John Wiley & Sons, New York, 1979.

(9) Zadeh, L.A., "Fuzzy sets as a basis for a theory of possibility," Fuzzy Sets and
Systems I, 3-28, 1978.

(10) Zadeh, L.A., "PRUF-a meaning representation language for natural languages,"
Int. J. of Man-Machine Studies 10, 395-460, 1978.

(11) Yager, R.R., "Querying knowledge base systems with linguistic information via
knowledge trees," Int. J. Man- Machine Studies 19, 1983.

(12) Yager, R. R., "Knowledge trees in complex knowledge bases," Fuzzy Sets and
Systems 15,45-64, 1985.

(13) Zadeh, L.A., "Fuzzy sets," Information and Control 8, 338-353,1965.

(14) Zadeh, L.A., "A computational approach to fuzzy quantifiers in natural


languages," Com . .& Maths. with Appl. 9, 149-184, 1983.

(15) Yager, R. R., "Quantifiers in the formulation of multiple objective decision functions," Information Sciences 31, 107-139, 1983.

(16) Yager, R. R., "Default and approximate reasoning," Proc. 2nd IFSA Congress, Tokyo, 690-692, 1987.

(17) Yager, R. R., Ovchinnikov, S., Tong, R. and Nguyen, H., Fuzzy Sets and Applications: Selected Papers by L. A. Zadeh, John Wiley & Sons, New York, 1987.

(18) Yager, R. R., "Nonmonotonic inheritance systems," IEEE Transactions on Systems, Man and Cybernetics 18, 1028-1034, 1988.

(19) Yager, R. R., "On the representation of commonsense knowledge by possibilistic reasoning," Int. J. of Man-Machine Studies 31, 587-610, 1989.

(20) Reiter, R., "A logic for default reasoning," Artificial Intelligence 13, 81-132, 1980.
3
Fuzzy rules in knowledge-based systems
- Modelling gradedness,
uncertainty and preference -

Didier DUBOIS Henri PRADE


Institut de Recherche en Informatique de Toulouse
Universite Paul Sabatier, 118 route de Narbonne
31062 Toulouse Cedex (France)

The paper starts with ideas of possibility qualification and certainty qualification
for specifying the possible range of a variable whose value is ill-known. The notion
of possibility which is used for that purpose is not the standard one in possibility
theory, although the two notions of possibility can be related. Based on these
considerations four distinct types of rules with different semantics involving
gradedness and uncertainty are then introduced. The combination operations which
appear for taking advantage of the available knowledge are all derived from the
intended semantics of the rules. The processing of these four types of rules is studied
in detail. Fuzzy rules modelling preference in decision processes are also discussed.

1. INTRODUCTION
The applications of fuzzy set and possibility theories to rule-based expert
systems have been mainly developed along two lines in the eighties : i) the
generalization of the certainty factor approach introduced in MYCIN (Buchanan and
Shortliffe, 1984) by enlarging the possible operations to be used for combining the
uncertainty coefficients; ii) the handling of vague predicates in the expression of the
expert rules or of the available information. The first line of research is exemplified
by the inference system RUM (Bonissone et al., 1987) where a control layer chooses the triangular norm operation governing the propagation of uncertainty, or by the inference system MILORD (Godo et al., 1988) where the combination and propagation operations associated with each rule reflect the expert knowledge. The second trend has motivated a huge amount of literature especially for discussing the multiple-valued logic implication connective → to be used in the modelling of a rule of the form "if X is A then Y is B" by means of a fuzzy relation R (defined by μR(x,y) = μA(x) → μB(y)). The choice of the implication function has been
investigated from an algebraic point of view by classifying the implications
according to axiomatic properties, and from a deduction-oriented perspective by

requiring some prescribed kind of results for the generalized modus ponens applied to
fuzzy "if... then ..... rules (e.g. Mizumoto and Zimmermann (1982), Dubois and
Prade (1984), Trillas and Valverde (1985), Bouchon (1987), Smets and Magrez
(1987». Although the available results indeed enable us to jointly choose an
implication function and the conjunction to be used for combining the two premisses
"X is A' .. and "if X is A then Y is B" in order to obtain an expected behavior for the
generalized modus ponens, these approaches do not really consider the intended
semantics of the rules. See (Dubois and Prade, 199Od) and (Dubois, Lang and Prade,
1990) for an extensive overview and a discussion of the generalized modus ponens
and of the certainty factor approaches respectively.
In this paper, extending recently obtained results (Dubois and Prade, 1989a,
1990b, d), we show how the choice of the implication operation is induced by the
type of rule we have to model in the framework of possibility theory. The approach
which is proposed formalizes ideas which have been more empirically studied by
Bouchon (1988), Despres (1989) about the role of different kinds of modifiers in the
expression and the intended meaning of fuzzy rules and can be also somewhat related
to recent works about possibility and necessity qualifications (Magrez and Smets,
1989; Dubois and Prade, 1990a; Fonck, 1990; Yager, 1990).
We first discuss two distinct ways of specifying a possibility distribution, either
by possibility or by certainty qualification. This can be regarded as a new approach in
possibility theory. The consequences of the mode of qualification on the
manipulation of the pieces of knowledge which are thus specified, are emphasized.
The notion of possibility which is used in possibility qualification does not correspond to the standard notion of possibility measure in possibility theory; the links between
the two concepts are clarified in Section 3. Using the ideas of Section 2, Section 4
introduces four different types of rules which are closely related to particular types of
fuzzy truth-values (or, if we prefer, of modifiers). Section 5 discusses the behavior of
these rules in the generalized modus ponens and when used in parallel. Section 6 is
devoted to another kind of fuzzy rules expressing preference.

2. TWO WAYS OF SPECIFYING A POSSIBILITY DISTRIBUTION


2.1. The Concept of a Possibility Distribution
A possibility distribution is a function πx, attached to a variable x, from a so-called universe of discourse U to the real interval [0,1] which aims at representing our current view of the feasible, or epistemically possible, or admissible values of a single-valued variable x whose domain is U. Depending on the interpretations, πx(u) estimates the degree of ease, the degree of unsurprizingness or of expectedness, the degree of acceptability or of preference attached to the proposition "the value of x is u", i.e. x = u. The possibility distribution πx is just a way of specifying an ordering among the elements of U, which expresses that the closer to 1 (resp. to 0), the more (resp. the less) feasible, epistemically possible, or admissible, according to the interpretation, the value u is for x. In the following we shall use the neutral term "possible", saying that πx(u) estimates the extent to which u is possible for x, when it is not interesting to put forward any specific interpretation. Thus the interval [0,1] is just considered here as an ordinal scale where 1 stands for complete possibility and 0 for complete impossibility. As soon as U entirely covers the domain of the variable x, it is natural to require that there exists at least one element u of U which can be considered as completely possible for x, i.e. such that πx(u) = 1; then πx is said to be normalized.
2.2. Specifications by Means of Ordinary Subsets
A possibility distribution is not usually specified as such, but by the qualification of subsets of U. Let A be an ordinary subset of U. It can be qualified either in terms of possibility or in terms of certainty in order to specify a possibility distribution πx. Namely
i) if A is a (completely) possible range for x, it means that ∀ u ∈ A, πx(u) = 1 and πx remains unspecified outside of A, or equivalently, that
"A is possible" is translated by ∀ u ∈ U, μA(u) ≤ πx(u)   (1)
where μA is the {0,1}-valued characteristic function of A.
ii) if it is (completely) certain that the value of x lies in A, it means that any value outside A is (completely) impossible, i.e. ∀ u ∉ A, πx(u) = 0 and πx is unspecified over A, or if we prefer
"A is certain" is translated by ∀ u ∈ U, πx(u) ≤ μA(u).   (2)
Thus, let Ac and As be two ordinary subsets of U satisfying (1) and (2) respectively for a possibility distribution πx; then we have
∀ u ∈ U, μAc(u) ≤ πx(u) ≤ μAs(u)   (3)
which expresses that Ac is included in the core of πx, i.e. {u ∈ U, πx(u) = 1}, while As contains the support of πx, i.e. {u ∈ U, πx(u) > 0}.
Let A1 and A2 be two subsets of U which both satisfy (1) for the same possibility distribution πx; then we see that
"A1 is possible" and "A2 is possible" ⇒ ∀ u ∈ U, max(μA1(u), μA2(u)) ≤ πx(u)   (4)
while if A1 and A2 both satisfy (2), we have
"A1 is certain" and "A2 is certain" ⇒ ∀ u ∈ U, πx(u) ≤ min(μA1(u), μA2(u)).   (5)

We observe that pieces of knowledge which are simultaneously qualified in terms of possibility are combined by means of the max operation in a union-like manner, while pieces of knowledge which are simultaneously qualified in terms of certainty are combined by means of the min operation in an intersection-like manner.
Let us consider cases of qualification where the possibility or the certainty is not complete but corresponds to an intermediary level α in the scale [0,1]. It leads to the two following generalizations of (1) and (2):
i) the statement "A is a possible range for x at least at the degree α" will be understood as ∀ u ∈ A, πx(u) ≥ α, which leads to
"A is α-possible" is translated by ∀ u ∈ U, min(μA(u), α) ≤ πx(u).   (6)
Note that for α = 1, (1) is recovered.
ii) the statement "it is certain at least at the degree α that the value of x is in A" will be interpreted as: any value outside A is at most possible at the complementary degree, namely 1 − α, i.e. ∀ u ∉ A, πx(u) ≤ 1 − α, which leads to
"A is α-certain" is translated by ∀ u ∈ U, πx(u) ≤ max(μA(u), 1 − α).   (7)
Note that for α = 1, (2) is recovered. Certainty qualification was first discussed by Yager (1984) and Prade (1985); in this latter reference, (7) already appears with equality (i.e. the least restrictive possibility distribution compatible with (7) is chosen). Possibility-qualification goes back to Zadeh (1978b) and Sanchez (1978).
Clearly (4) and (5) can be straightforwardly extended to "Ai is αi-possible" and to "Ai is αi-certain", for i = 1, 2, using (6) and (7) respectively.

2.3. Specifications by Means of Fuzzy Subsets


We now consider the more general case where the subset A which is qualified is fuzzy. It is well-known that a fuzzy (sub)set A can be represented in terms of a collection of ordinary subsets, namely the α-cuts Aα = {u ∈ U, μA(u) ≥ α} of A. We have (Zadeh, 1971)
∀ u, μA(u) = sup_{α ∈ (0,1]} min(μAα(u), α).   (8)
Then we immediately notice that, if we interpret "A is possible" (where A is fuzzy) as the conjunctive collection of possibility-qualified non-fuzzy statements of the form "Aα is α-possible", ∀ α ∈ (0,1], we obtain, using (6) for the possibility-qualification and (4) for the max-combination (here extended to a sup-combination since the collection may be not finite)
∀ u ∈ U, sup_{α ∈ (0,1]} min(μAα(u), α) ≤ πx(u)
i.e. ∀ u ∈ U, μA(u) ≤ πx(u)
which clearly generalizes (1) to a fuzzy set A.
From (8), by taking the complement of A, i.e. the complement to 1 of its membership function, we can obtain another representation formula, namely
∀ u ∈ U, μĀ(u) = 1 − μA(u) = inf_{α ∈ (0,1]} max(μ_{(Ā)_{>1−α}}(u), 1 − α)   (9)
where the overbar on a subset denotes the complementation and we use the identity: the complement of Aα is (Ā)_{>1−α}, with B_{>β} denoting the strong β-cut of a fuzzy set B, namely {u ∈ U, μB(u) > β} (i.e. '≥' is changed into '>' in the definition of the level cut).
Clearly (9) applies to any A and thus (9) still holds changing A into Ā, which gives
∀ u ∈ U, μA(u) = inf_{α ∈ (0,1]} max(μ_{A_{>1−α}}(u), 1 − α).   (10)
Then, if we interpret "A is certain" (where A is fuzzy) as the conjunctive combination of certainty-qualified non-fuzzy statements of the form "A_{>1−α} is α-certain", ∀ α ∈ (0,1], we obtain using (7) and the min-combination (5) (extended to an inf-combination)
∀ u ∈ U, πx(u) ≤ inf_{α ∈ (0,1]} max(μ_{A_{>1−α}}(u), 1 − α)
i.e. ∀ u ∈ U, πx(u) ≤ μA(u)   (11)


which clearly generalizes (2) to a fuzzy set A. The interpretation of "A is certain" (i.e. we are completely certain that the possible values of x are restricted by μA) as the conjunction of statements of the form "A_{>1−α} is (at least) α-certain" is quite natural. Indeed we are completely certain (α = 1) that the support A_{>0} = {u ∈ U, μA(u) > 0} of A contains the value of x, while the certainty that the strong β-cut of A includes the value of x decreases when β increases, because the β-cut becomes smaller due to the nestedness property β ≤ β' ⇒ A_{β'} ⊆ A_β (here β = 1 − α). Note also that "A_{>1−α} is at least α-certain", according to (7), means that any value in the complement of A_{>1−α}, i.e. in (Ā)_α, is at most possible at the degree 1 − α (or if we prefer is impossible at least at the degree α).
An immediate consequence of (9) and (11) is that if the fuzzy set A is both a completely possible and a completely certain fuzzy range for the value of x in the above sense, then we should have
∀ u ∈ U, πx(u) = μA(u)   (12)
i.e. the equality with which Zadeh (1978a) starts the introduction of possibility theory.
We now generalize (6) and (7) to a fuzzy set A, thus introducing gradedness in possibility and certainty qualification of fuzzy subsets. "A is possible" has been interpreted as "Aβ is at least β-possible", ∀ β ∈ (0,1]. Then it is natural to interpret "A is α-possible" as ∀ β ≥ α, "Aβ is at least α-possible" and ∀ β < α, "Aβ is at least β-possible". In other words, the smallest β-cuts with β close to 1 are only assigned a minimal degree of possibility equal to α (instead of β). Then using the max-combination (4), we get
∀ u ∈ U, max(sup_{β≥α} min(μAβ(u), α), sup_{β<α} min(μAβ(u), β)) ≤ πx(u)
i.e.
max(min(sup_{β≥α} min(μAβ(u), β), α), sup_{β<α} min(μAβ(u), β)) ≤ πx(u)
i.e.
∀ u ∈ U, min(sup_{β ∈ (0,1]} min(μAβ(u), β), α) ≤ πx(u)   (since sup_{β<α} min(μAβ(u), β) ≤ α)
i.e.
∀ u ∈ U, min(μA(u), α) ≤ πx(u)   (13)
which extends (6) to the case where A is fuzzy. Interestingly enough, (13) was already discussed by Zadeh (1978b) and Sanchez (1978) for possibility-qualification purposes.
Similarly, "A is certain" has been interpreted as "A_{>1−β} is at least β-certain", ∀ β ∈ (0,1]. Then "A is α-certain" will be interpreted as "A_{>1−β} is at least min(α,β)-certain". Then using the min-combination (5), we get (Dubois and Prade, 1990a, d):
∀ u ∈ U, πx(u) ≤ inf_{β ∈ (0,1]} max(μ_{A_{>1−β}}(u), 1 − min(α,β))
i.e.
∀ u ∈ U, πx(u) ≤ max(inf_{β ∈ (0,1]} max(μ_{A_{>1−β}}(u), 1 − β), 1 − α)
i.e. ∀ u ∈ U, πx(u) ≤ max(μA(u), 1 − α)   (14)
which extends (7) to the case where A is fuzzy.
Before applying the above model, especially (13) and (14), to the representation
of different types of fuzzy rules in Section 4, it is important to clarify the
relationship between the notion of possibility qualification introduced in this section,
and possibility theory as developed until now (Zadeh, 1978a ; Dubois and Prade,
1988).

3. TWO COMPLEMENTARY NOTIONS OF POSSIBILITY


The idea of possibility which has been used for qualification purposes in the
preceding section is not the same as the one underlying the definition of a possibility
measure. Indeed when "A is possible" is modelled by the inequality
∀ u ∈ U, μA(u) ≤ πx(u)
it means, when A is an ordinary subset, that
∀ u ∈ A, πx(u) = 1   (15)
while the measure of possibility Π induced by πx, of a non-fuzzy event A, is defined by (Zadeh, 1978a)
Π(A) = sup_{u ∈ A} πx(u)   (16)
and clearly, when A has a bounded support,
Π(A) = 1 ⇔ ∃ u ∈ A, πx(u) = 1.   (17)
Π(A) = 1 only says that the statement "x is in A" is consistent with the available information described by πx. The discrepancy between (15) and (17) is obvious and is expressed by the difference between the logical quantifiers '∀' and '∃'. This discrepancy between possibility measures and the notion of "possible" appearing in possibility-qualification was noticed by Zadeh (1978b), but no attempt had been made to define the set-function underlying possibility-qualification.
The notion of possibility introduced in Section 2 rather relates to the following quantity, called "everywhere-possibility" or for short E-possibility of A:
Δ(A) = inf_{u ∈ A} πx(u)   (18)
which is such that
"A is possible" in the sense of (1) ⇔ Δ(A) = 1.   (19)
Clearly the set functions Π and Δ correspond respectively to a weak and a strong requirement of possibility, and indeed ∀ A, Π(A) ≥ Δ(A), or if we prefer ∀ α, Δ(A) ≥ α ⇒ Π(A) ≥ α. In practice it corresponds to the distinction between saying that "it is possible that A is true", which means that there exists at least one value u in A which is completely possible for x (i.e. Π(A) = 1), and saying that "A is possible" as short for "the range A is (completely) possible for x", whose intended meaning is really that all the values in A are possible for the variable x. This latter notion of possibility is particularly important, as advocated in this paper, for the specification of possibility distributions in general and more particularly of fuzzy rules.
The notion of E-possibility seems to have been largely ignored in the fuzzy set literature. However its counterpart in Shafer (1976)'s evidence theory is well-known; it is the commonality function Q, which, by the way, is mainly used for technical reasons and does not seem to have received any practical interpretation until now.
Indeed starting with a basic probability assignment m such that Σ_A m(A) = 1, the commonality of A is defined by Q(A) = Σ_{B ⊇ A} m(B); it can be easily checked that the following analogue of (19) holds
Q(A) = 1 ⇔ ∀ u ∈ A, Pl({u}) = 1
where the plausibility function Pl is defined by Pl(C) = Σ_{C ∩ B ≠ ∅} m(B). This results from Q(A) = 1 ⇔ ∀ B such that m(B) > 0, B ⊇ A ⇔ ∀ u ∈ A, ∀ B such that m(B) > 0, u ∈ B ⇔ ∀ u ∈ A, Pl({u}) = 1. Moreover Δ(A) can be put under a form which looks analogous to Q(A). Indeed introducing the fuzzy set F such that μF = πx, we have Δ(A) = inf_{u ∈ A} μF(u); hence ∀ u ∈ A, μF(u) ≥ Δ(A) or equivalently A ⊆ F_{Δ(A)}, and more generally ∀ α ≤ Δ(A), A ⊆ Fα. Besides ∀ ε > 0, A ⊄ F_{Δ(A)+ε} (from the definition of Δ(A)). Hence
Δ(A) = sup{α ∈ (0,1], Fα ⊇ A}   (with the convention sup ∅ = 0).
A possibility distribution πx such that {πx(u), u ∈ U} ∩ (0,1] is an ordered finite set M = {α1, ..., αn} with α1 = 1 > ... > αn > αn+1 = 0, is equivalent to the basic probability assignment (Dubois and Prade, 1982) defined by
∀ i, m(Fαi) = αi − αi+1
∀ B ≠ Fαi, m(B) = 0.
Then, using the nestedness property α < β ⇒ Fα ⊇ Fβ, it can be easily seen that Δ(A) = Q(A), where Q is defined from the function m above, i.e. the two definitions coincide. More generally, for A remaining non-fuzzy, it is easy to see that
A is (at least) α-possible ⇔ ∀ u ∈ A, πx(u) ≥ α ⇔ Δ(A) ≥ α   (20)
which generalizes (19). The definition of the E-possibility can be extended to fuzzy sets still preserving the equivalence
A is (at least) α-possible ⇔ ∀ u ∈ U, πx(u) ≥ min(μA(u), α) ⇔ Δ(A) ≥ α.   (21)
This is satisfied by taking
Δ(A) = inf_{u ∈ U} μA(u) → πx(u)   (22)
where a → b is a multiple-valued logic implication connective, with a → b = 1 if a ≤ b and a → b = b if a > b, known as Gödel's implication. It is easy to see that (22) reduces to (18) when A is an ordinary subset. Moreover (21) is ensured by the equivalence a → b ≥ c ⇔ b ≥ min(a, c). By contrast, a lower bound α on the extension of the possibility measure Π (defined by (16)) to a fuzzy event A, i.e.
Π(A) = sup_{u ∈ U} min(μA(u), πx(u)) ≥ α   (23)
is equivalent to ∀ β < α, Π(Aβ) ≥ α, i.e. ∀ β < α, ∃ u ∈ Aβ, πx(u) ≥ α (see, e.g., Dubois and Prade, 1990a), which clearly departs from Δ(A) ≥ α; the latter means that ∀ β ≥ α, ∀ u ∈ Aβ, πx(u) ≥ α and ∀ β < α, ∀ u ∈ Aβ, πx(u) ≥ β.
We now identify to what evaluation of A is associated the certainty qualification
presented above. "A is a-certain" is represented, in the general case where A is fuzzy,
by (14), i.e.
\:;/ u e U, 1tx(u) ~ max{J.LA(u), 1- a)
Since c ~ max(a, 1 - b) ¢::> (1 - a) --+ (1 - c) ~ b, where --+ denotes Glklel's
implication, (14) is equivalent to (Dubois and Prade, 1989a)
eN"(A) =infueU (1 - ~A(u» --+ (1-1tx(u» ~ a (24)
i.e.
"A is a-certain" ¢::> eN"(A) ~ a.
When A is an ordinary subset, (24) reduces to
eN"(A) =1- TI(A) =infue:A (1-1tx(u» (25)
where the overbar denotes the complementation. It means that certainty qualification,
when A is non-fuzzy is in complete agreement with the necessity measure based on
1t x and defined by duality with respect to the possibility measure. However when A
is fuzzy, the duality relation eN"(A) = 1 - TIO\), where II is extended to fuzzy events
by (23), is no longer satisfied. When A is fuzzy, since (14) is equivalent to "AI=JJ is
at least min(a,~)-certain", we get
eN"(A) ~ a¢::>\:;/ ~ < 1, eN"(A]V ~ min(a, 1 - ~) (26)
which expresses the relation between certainty-qualification and the measure of
necessity of the f3-cuts of A. See (Dubois and Prade, 1990a) for a discussion about
certainty-qualification in possibilistic logic with fuzzy predicates (which is in full
agreement with eN" defined by (24» and (Dubois and Prade, 1989a, 1990d) for the
distinct uses of eN"(A) and 1 - TI( A), the former in certainty-qualification, the latter
in fuzzy pattern matching. More particularly, as already said in (Dubois and Prade,
1989a), the statement "A is certain" may either mean that we are certain that the
possible values of x are inside A, i.e. 1t x ~ ~A (which is captured by eN"(A) = 1), or
that we are certain that the value of x is among the elements of U which completely
belong to A, i.e. support1t x = (u e U, 1tx(u) > O} .s:. core(A) = (u e U, ~A(u) = I}
which is captured by 1 - TI( A) = 1). This latter interpretation which is more
demanding is clearly related to the fuzzy filtering of fuzzily-known objects ; see
(Dubois, Prade and Testemale, 1988).
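
The two evaluations can be made concrete with the following minimal Python sketch (not part of the original text; the finite universe, the possibility distribution πx and the fuzzy set A are invented for illustration): Δ(A) is computed with Gödel's implication as in (22), and the certainty evaluation CN(A) as in (24).

```python
# Illustrative sketch: guaranteed possibility Delta(A) (eq. 22) and the
# certainty evaluation CN(A) (eq. 24) on a finite universe.
# The numbers below are made up for illustration only.

def godel(a, b):
    """Goedel implication: 1 if a <= b, else b."""
    return 1.0 if a <= b else b

def delta(mu_A, pi_x):
    # Delta(A) = inf_u  mu_A(u) -> pi_x(u)          (eq. 22)
    return min(godel(mu_A[u], pi_x[u]) for u in mu_A)

def cn(mu_A, pi_x):
    # CN(A) = inf_u  (1 - mu_A(u)) -> (1 - pi_x(u)) (eq. 24)
    return min(godel(1 - mu_A[u], 1 - pi_x[u]) for u in mu_A)

pi_x = {"u1": 1.0, "u2": 0.7, "u3": 0.2}   # possibility distribution on x
mu_A = {"u1": 1.0, "u2": 0.8, "u3": 0.0}   # fuzzy set A

print(delta(mu_A, pi_x))   # degree to which A is guaranteed possible
print(cn(mu_A, pi_x))      # degree to which "x is A" is certain
```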
53
4. REPRESENTATION OF DIFFERENT KINDS OF FUZZY RULES
We now apply the results of Section 2 on possibility and certainty qualifications
to the specification of fuzzy rules relating a variable x ranging on U to a variable y
ranging on V.
Possibility rules : A first kind of fuzzy rule corresponds to statements of the form
"the more x is A, the more possible B is a range for y". If we interpret this rule as
"∀ u, if x = u, B is a range for y is at least μA(u)-possible", a straightforward
application of (13) yields the following constraint on the conditional possibility
distribution πy|x(·,u) representing the rule when x = u
∀ u ∈ U, ∀ v ∈ V, min(μA(u), μB(v)) ≤ πy|x(v,u).   (27)
Certainty rules : A second kind of fuzzy rule corresponds to statements of the form
"the more x is A, the more certain y lies in B". Interpreting the rule as "∀ u, if x = u,
y lies in B is at least μA(u)-certain", by application of (14) we get the following
constraint for the conditional possibility distribution modelling the rule
∀ u ∈ U, ∀ v ∈ V, πy|x(v,u) ≤ max(μB(v), 1 − μA(u)).   (28)
In the particular case where A is an ordinary subset and where we know that, if x is
in A, B is both a possible and a certain range for y, (27) and (28) yield
∀ u ∈ A, πy|x(v,u) = μB(v)   (29)
∀ u ∉ A, πy|x(v,u) is completely unspecified.
This corresponds to the usual modelling of a fuzzy rule with a non-fuzzy condition
part. Note that B may be any kind of fuzzy set in (27), (28) and then in (29). Thus B
may itself include some uncertainty ; for instance the membership function of B
may be of the form μB = max(μB*, 1 − β) in order to express that when x is A, B*
is the (fuzzy) range of y with a certainty β (any value outside the support of B*
remains a possible value for y with a degree equal to 1 − β) ; we may even have an
unnormalized possibility distribution, which can be put under the form μB =
min(μB*, α), if the possibility that y takes its value in V is bounded from above by α
(i.e. there is a possibility 1 − α that y has no value in V when x takes its value in
A).
Gradual rules : This third kind of fuzzy rule has been discussed in (Dubois and Prade,
1989a, 1990b, d). Gradual rules correspond to statements of the form "the more x is
A, the more y is B". Statements involving "the less" in place of "the more" are
easily obtained by changing A or B into their complements Ā and B̄, due to the
equivalence between "the more x is Ā" and "the less x is A" (with μĀ = 1 − μA).
More precisely, the intended meaning of a gradual rule can be understood in the
following way : "the greater the degree of membership of the value of x to the fuzzy
set A and the more the value of y is considered to be in relation (in the sense of the
rule) with the value of x, the greater the degree of membership to B should be for this
value of y", i.e.
∀ u ∈ U, ∀ v ∈ V, min(μA(u), πy|x(v,u)) ≤ μB(v)   (30)
or, using the equivalence min(a,t) ≤ b ⇔ t ≤ a → b, where → denotes Gödel's
implication,
∀ u ∈ U, πy|x(v,u) ≤ μA(u) → μB(v) = 1 if μA(u) ≤ μB(v), = μB(v) if μA(u) > μB(v)   (31)
(31) can be equivalently written
∀ u ∈ U, πy|x(v,u) ≤ max(μ[μA(u),1](μB(v)), μB(v)) = μ[μA(u),1]∪T(μB(v))   (32)
where μ[μA(u),1] is the characteristic function of the interval [μA(u),1] and where T
is the fuzzy set of [0,1] defined by ∀ t ∈ [0,1], μT(t) = t, which models the fuzzy
truth-value 'true' in fuzzy logic (Zadeh, 1978b). If we remember that "x is A
is τ-true", where τ is a fuzzy truth-value modelled by the fuzzy set τ of [0,1], is
represented by the possibility distribution (Zadeh, 1978b)
∀ u ∈ U, πx(u) = μτ(μA(u))   (33)
(note that τ = T yields the basic assignment (12)), we can interpret the meaning of
gradual rules in the following way using (32) : ∀ u ∈ U, if x = u then y is B is at
least μA(u)-true. The membership function of the fuzzy truth-value 'at least α-true' is
pictured in Figure 1.a. As can be seen, it is not a crisp "at least α-true" (which
would correspond to the ordinary subset [α,1]), but a fuzzy one in agreement with
truth-qualification in the sense of (33) ; indeed "at least 1-true" corresponds to the
fuzzy truth-value T.
If we are only looking for the crisp possibility distributions πy|x (i.e. the {0,1}-
valued ones) which satisfy (30), because we assume that it is a crisp relation between
y and x which underlies the rule "the more x is A, the more y is B", then we obtain
the constraint
∀ u ∈ U, πy|x(v,u) ≤ 1 if μA(u) ≤ μB(v), ≤ 0 if μA(u) > μB(v), i.e. πy|x(v,u) ≤ μ[μA(u),1](μB(v))   (34)
which expresses that the rule is now viewed as meaning : ∀ u ∈ U, if x = u then y is B
is at least μA(u)-true, where the truth-qualification is understood in a crisp sense. The
reader is referred to (Dubois and Prade, 1990b) for a discussion of this kind of rule.
A fourth type of fuzzy rules : The inequality (30) looks like (27) when exchanging
μB(v) and πy|x(v,u), while (31), which is equivalent to (30), is analogous to (28) in
the sense that in both cases πy|x is bounded from above by a multiple-valued logic
implication function (in (28) it is Dienes' implication a → b = max(1 − a, b) which
appears). This leads to consider the inequality constraint obtained from (28) by
exchanging μB(v) and πy|x(v,u), i.e.
∀ u ∈ U, ∀ v ∈ V, max(πy|x(v,u), 1 − μA(u)) ≥ μB(v)   (35)
This corresponds to a fourth kind of fuzzy rules, of which we now investigate the
intended meaning. (35) is perhaps more easily understood by taking the complement
to 1 of each side of the inequality, i.e.
∀ u ∈ U, ∀ v ∈ V, 1 − μB(v) ≥ min(μA(u), 1 − πy|x(v,u))
which can be interpreted as "the more x is A and the less y is related to x, the less y
is B", which corresponds to a new type of gradual rule. Using the
equivalence min(a, 1 − t) ≤ 1 − b ⇔ 1 − t ≤ a → (1 − b) ⇔ t ≥ 1 − (a → (1 − b)),
where → is Gödel's implication, we can still write (35) under the form
∀ u ∈ U, ∀ v ∈ V, πy|x(v,u) ≥ 0 if μA(u) + μB(v) ≤ 1, ≥ μB(v) if μA(u) + μB(v) > 1,
i.e. πy|x(v,u) ≥ min(μ(1−μA(u),1](μB(v)), μB(v)) = μ(1−μA(u),1]∩T(μB(v))   (36)
Unsurprisingly, the lower bound of πy|x(v,u) which is obtained is a multiple-valued
logic conjunction function of μA(u) and μB(v) (indeed f(a,b) = 0 if a + b ≤ 1 and
f(a,b) = b otherwise, is such that f(0,0) = f(0,1) = f(1,0) = 0 and f(1,1) = 1). From
(36) we see that this type of gradual rule can be interpreted in the following way,
using (33) : ∀ u ∈ U, if x = u then y is B is at least (1 − μA(u))-true. The
membership function of the corresponding fuzzy truth-value "at least (1 − α)-true" is
[Figure 1 : Four basic types of fuzzy truth-values : a. "at least α-true" (core point of view) ; b. "at least α-certainly true" ; c. "at least α-possibly true" ; d. "at least (1 − α)-true" (support point of view)]
pictured in Figure 1.d. As can be observed by comparing Figures 1.a and 1.d, they
correspond to two points of view in (fuzzy) truth-qualification of level α, one
insisting on the complete possibility of degrees of truth greater than α (core point of
view), the other insisting on the complete impossibility of degrees of truth less than or
equal to 1 − α (support point of view).
Figures 1.b and 1.c picture the fuzzy truth-values "at least α-certainly true" and
"at least α-possibly true", whose respective membership functions are
max(μT, 1 − α) and min(μT, α). It can be seen on Figure 1, and formally checked,
that the four fuzzy truth-values we have introduced satisfy the two duality relations
'at least α-certainly true' = comp∘ant('at least α-possibly true')   (37)
'at least α-true (core p. of v.)' = comp∘ant('at least (1 − α)-true (support p. of v.)')   (38)
where comp and ant are two transformations reflecting the ideas of
complementation and antonymy respectively, defined by comp(f)(t) = 1 − f(t) and
ant(f)(t) = f(1 − t), ∀ t ∈ [0,1], for f a function from [0,1] to [0,1]. Note that comp∘ant =
ant∘comp. Note that when there are only two degrees of truth, 0 (false) and 1 (true),
"at least α-certainly true" corresponds to the possibility distribution π(true) = 1,
π(false) = 1 − α, and "at least α-possibly true" to π(false) = 0 and π(true) = α, while
the two other (fuzzy) truth-values make no sense. Dually, when there are only two
degrees of possibility, 0 (complete impossibility) and 1 (complete possibility), the
representations of "at least α-true" and "at least (1 − α)-true" respectively coincide
with the ordinary subsets [α,1] and (1 − α, 1].
The four fuzzy truth-values pictured in Figure 1 (with α = μA(u)) can be viewed
as representing modifiers φ (in the sense of Zadeh (1972)) which modify the fuzzy set
B into B* such that μB* = φ(μB), in order to specify the subset of interest for y in
the various rules when x = u. For summarizing, in the case of
- possibility rules, the possibility distribution πy|x(·,u) is bounded from below by
φ(μB) with φ(t) = min(μA(u), t), i.e. B is truncated up to the height α = μA(u) ;
- certainty rules, the possibility distribution πy|x(·,u) is bounded from above by
φ(μB) with φ(t) = max(t, 1 − μA(u)), i.e. B is drowned in a level of indetermination
1 − α ;
- gradual rules (core point of view), the possibility distribution πy|x(·,u) is bounded
from above by φ(μB) with φ(t) = μA(u) → t (where → denotes Gödel's
implication), i.e. the core of B is enlarged ;
- gradual rules (support point of view), the possibility distribution πy|x(·,u) is
bounded from below by φ(μB) with φ(t) = 0 if μA(u) + t ≤ 1 and φ(t) = t
otherwise, i.e. the support of B is diminished, truncated.
Remark : The similarity between (27) and (30) suggests that "the more x is A, the more
y is B", where y is in relation R with x, can be understood as meaning that ∀ u ∈ U,
B represents the statement "R(u) is a range for y which is at least possible at the
degree μA(u)", where R(u) is the fuzzy set of elements in V in relation with u.
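
The following minimal Python sketch (illustrative only; the matching degree a and the membership values of B are invented) implements the four modifiers φ just summarized, one per type of rule.

```python
# Illustrative sketch of the four modifiers phi summarized above, applied to
# mu_B(v) for a given matching degree a = mu_A(u).  All values are examples.

def phi_possibility(a, t):      # lower bound: B truncated at height a
    return min(a, t)

def phi_certainty(a, t):        # upper bound: B drowned in indetermination 1 - a
    return max(t, 1 - a)

def phi_gradual_core(a, t):     # upper bound: Goedel implication a -> t (core enlarged)
    return 1.0 if a <= t else t

def phi_gradual_support(a, t):  # lower bound: support of B truncated
    return 0.0 if a + t <= 1 else t

a = 0.6                          # mu_A(u): degree to which the condition is satisfied
mu_B = [0.0, 0.3, 0.7, 1.0]      # sample membership degrees of B
for phi in (phi_possibility, phi_certainty, phi_gradual_core, phi_gradual_support):
    print(phi.__name__, [round(phi(a, t), 2) for t in mu_B])
```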

5. INFERENCE WITH FUZZY RULES


5.1. Parallel Rules with a Precise Input
Depending on the kind of constraint induced by their representation (the
possibility distribution is bounded from below or from above) the combination of
several rules in parallel of the same type will be performed differently. Namely for
certainty rules and for gradual rules focusing on cores, described by means of pairs
(~,Bi)' i = l,n we have
'rj i = l,n, 'rj u e U, 'rj ve V,1tylx(v,u) ~ IlAi(u) ~ IlBi(v)
where E* denotes Glidel's or Dienes' implication; then by combination we get
'rj u e U, 'rj ve V, 1tylx(v,u) ~ mini=I,n (J.lAi(u) ~ IlBi(v» (39)
while with possibility rules and gradual rules focusing on supports, we have
'rj i = l,n, 'rj u e U, 'rj ve V, 1tylx(V,u) ~ IlAi(u) & IlBi(v)
where & denotes the min conjunction or the non-symmetrical one introduced above;
which yields
'rj u e U, 'rj ve V, 1tylx(v,u) ~ maxi=l,n (J.lAi(u) & IlBi(v» (40)
The existence of two models of combination of systems of fuzzy rules has been
pointed out by several authors, including Baldwin and Pilsworth (1979), Tanaka et al.
(1982) and Di Nola et al. (1985), when considering special cases of implication
functions → in contrast with the min-conjunction for combining the fuzzy sets Ai
and Bi.
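
A minimal Python sketch of the two combination schemes (39) and (40) for finite universes follows; the two example rules and their membership values are invented for illustration.

```python
# Sketch of the two combination schemes (39) and (40) for n parallel rules,
# on finite universes U and V; all membership values below are illustrative.

def dienes(a, b):
    return max(1 - a, b)

def godel(a, b):
    return 1.0 if a <= b else b

def combine_case_I(rules, u, v, impl):
    # upper bound (39): min over rules of  mu_Ai(u) -> mu_Bi(v)
    return min(impl(mu_A[u], mu_B[v]) for mu_A, mu_B in rules)

def combine_case_II(rules, u, v, conj):
    # lower bound (40): max over rules of  mu_Ai(u) & mu_Bi(v)
    return max(conj(mu_A[u], mu_B[v]) for mu_A, mu_B in rules)

rules = [({"u1": 1.0, "u2": 0.4}, {"v1": 0.9, "v2": 0.2}),
         ({"u1": 0.3, "u2": 1.0}, {"v1": 0.1, "v2": 0.8})]

print(combine_case_I(rules, "u1", "v2", dienes))   # certainty rules
print(combine_case_I(rules, "u1", "v2", godel))    # gradual rules (cores)
print(combine_case_II(rules, "u1", "v2", min))     # possibility rules
```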
Figures 2.a and 2.b illustrate the behaviour of the four types of rules in the case
of two parallel rules relating A1 and B1 on the one hand and A2 and B2 on the other
hand, when the value of x is precisely known, i.e. πx = μA', with A' = {u0}. In Figure
2.a, x = u0 perfectly satisfies the requirements modelled by A1 and A2, i.e.,
μA1(u0) = μA2(u0) = 1, and we obtain as a conclusion for y, with πy = μB',
∀ v ∈ V, μB'(v) ≤ min(μB1(v), μB2(v))   (conjunction of the conclusions)
for certainty rules and for gradual rules (focusing on cores), due to (39), and
∀ v ∈ V, μB'(v) ≥ max(μB1(v), μB2(v))   (disjunction of the conclusions)
for possibility rules and for gradual rules (focusing on supports), due to (40). The fact
that we obtain B' ⊇ B1 ∪ B2 when {u0} ⊆ A1 ∩ A2 for possibility rules, for
instance, should not be surprising ; indeed we have to remember that each rule
expresses that "the more x is Ai, the greater the level of possibility of Bi as a
possible range for y" for i = 1,2 ; then, since y may lie in both B1 and B2, B1 ∪ B2
should be a possible range for y. In Figure 2.b, we have μA2(u0) = 1 again, but now
μA1(u0) = α < 1. The difference between certainty rules and gradual rules focusing
on cores appears clearly : for certainty rules, the intersection of B2 with B1 is
pervaded with a level of uncertainty 1 − α (i.e. min(μB2, max(μB1, 1 − α))), while
the upper bound of B' for the gradual rules stays between B1 and B2, overlapping a
little more B2. Similarly, the difference between possibility rules and gradual rules
focusing on supports also appears ; for the former we obtain μB' ≥
max(μB2, min(μB1, α)), which expresses that the values in B1 are regarded as a priori
less possible than the ones in B2 ; for the latter, some values in the support of B1 are
considered as potentially impossible.

[Figure 2.a : Two rules in parallel and a precise input perfectly matching the conditions (A' = {u0}) ; panels: gradual rules (focusing on cores), certainty rules, possibility rules, gradual rules (focusing on supports)]
5.2. Generalized Modus Ponens with One Rule
[Figure 2.b : Two rules and a precise input imperfectly matching one of the conditions ; panels: gradual rules (focusing on cores), certainty rules, possibility rules, gradual rules (focusing on supports)]

In Section 4, when studying the representation of the four types of rules
considered in the paper, we have described the response B' of a rule to a precise input
A' = {u0} under the form μB' = φ(μB), where φ represents a particular fuzzy truth-value
which acts as a modifier. We now consider the more general situation of the
generalized modus ponens (Zadeh, 1979), namely
x is A'
<rule relating x is A with y is B>
y is B'
As usual, and in agreement with (12), "x is A'" will be understood as
∀ u ∈ U, πx(u) = μA'(u)
while the rule, depending on the case, is represented
by ∀ u ∈ U, ∀ v ∈ V, πy|x(v,u) ≤ μA(u) → μB(v)   (case I)
or by ∀ u ∈ U, ∀ v ∈ V, πy|x(v,u) ≥ μA(u) & μB(v)   (case II)
Applying the combination/projection principle (Zadeh, 1979 ; see Dubois and Prade,
1990d for a discussion), i.e. here
∀ v ∈ V, πy(v) = sup u∈U min(πx(u), πy|x(v,u))   (41)
we thus get, with ∀ v, μB'(v) = πy(v),
∀ v ∈ V, μB'(v) ≤ sup u∈U min(μA'(u), μA(u) → μB(v))   (case I)
∀ v ∈ V, μB'(v) ≥ sup u∈U min(μA'(u), μA(u) & μB(v))   (case II)
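
The sup-min composition (41) can be sketched as follows in Python for finite universes; the input A' and the conditional distribution πy|x below are invented for illustration.

```python
# Sketch of the combination/projection principle (41): the output possibility
# distribution on V is the sup-min composition of pi_x (= mu_A') with the
# conditional distribution pi_{y|x}.  All data below are illustrative.

def sup_min_composition(mu_A_prime, pi_y_given_x, V):
    # pi_y(v) = sup_u  min( pi_x(u), pi_{y|x}(v, u) )
    return {v: max(min(mu_A_prime[u], pi_y_given_x[(v, u)])
                   for u in mu_A_prime)
            for v in V}

V = ["v1", "v2"]
mu_A_prime = {"u1": 1.0, "u2": 0.5}                      # input "x is A'"
pi_y_given_x = {("v1", "u1"): 1.0, ("v1", "u2"): 0.3,    # rule representation
                ("v2", "u1"): 0.2, ("v2", "u2"): 1.0}

print(sup_min_composition(mu_A_prime, pi_y_given_x, V))
# {'v1': 1.0, 'v2': 0.5}
```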
Let us first consider the two kinds of rules belonging to case I.
For certainty rules, we obtain
∀ v ∈ V, μB'(v) ≤ sup u∈U min(μA'(u), max(1 − μA(u), μB(v)))
= max(μB(v), 1 − N(A ; A'))   (42)
provided that A' is normalized, and where
N(A ; A') = inf u∈U max(μA(u), 1 − μA'(u))
is the dual of the possibility measure Π(A ; A') of the fuzzy event A defined by (23)
(with πx = μA') ; N(A ; A') is thus equal to 1 − Π(Ā ; A') and plays a basic role
in fuzzy pattern matching, as briefly recalled at the end of section 3. The inequality
(42) expresses the following. Our lack of certainty that all the values restricted by
μA' are (highly) compatible with the requirement modelled by μA induces a
possibility at most equal to 1 − N(A ; A') that the value of y is outside the support
of B. In other words, (42) means that it is N(A ; A')-certain that y is restricted by B.
For gradual rules focusing on cores, we have
∀ v ∈ V, μB'(v) ≤ sup u∈U min(μA'(u), μ[μA(u),1]∪T(μB(v)))
from which it can be concluded that the least upper bounds derivable from the above
inequality are given by (Dubois and Prade, 1988) :
• μB'(v) ≤ 1, ∀ v ∈ {v ∈ V, μB(v) ≥ inf{μA(u) | μA'(u) = 1}}
• μB'(v) ≤ sup{μA'(u) | μA(u) = 0} = Π(U − support(A) ; A'),
∀ v ∈ {v ∈ V, μB(v) = 0}
This shows that when A' is no longer a singleton, the enlarging effect of the core of
B may be increased, and a non-zero possibility Π(U − support(A) ; A') may be obtained
for values outside the support of B, for the possibility distribution restricting y. The
level of possibility Π(U − support(A) ; A') acknowledges the fact that some possible
values of x (in A') are not consistent at all with A.


We now consider the rules belonging to case II.
For gradual rules focusing on supports, we have
∀ v ∈ V, μB'(v) ≥ sup u∈U min(μA'(u), μ(1−μA(u),1]∩T(μB(v)))
Noticeable greatest lower bounds derivable from the above inequality are
• μB'(v) ≥ 0, ∀ v ∈ {v ∈ V, μB(v) ≤ 1 − sup{μA(u) | μA'(u) > 0}}
• μB'(v) ≥ sup{μA'(u) | μA(u) > 0} = Π(support(A) ; A'),
∀ v ∈ {v ∈ V, μB(v) = 1}
As can be seen, the truncation effect of the support of B may decrease, while the
height of the possibility distribution attached to y, equal to Π(support(A) ; A'), may
be less than 1 (without being 0), when A' is no longer a singleton. When some
values compatible with A' do not belong at all to A, the lower bound on the level of
possibility for values in B to be the value of y decreases.
For possibility rules, we have
∀ v ∈ V, μB'(v) ≥ sup u∈U min(μA'(u), μA(u), μB(v)) = min(μB(v), Π(A ; A'))   (43)
It expresses that as soon as there is no value in A' fully consistent with A, B is only
considered as an α-possible range for y (in the sense of (13)), with α = Π(A ; A').
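
The following sketch (with invented membership values) computes the pattern-matching degrees Π(A ; A') and N(A ; A') on a finite universe and the corresponding closed-form bounds (42) and (43) on μB'(v).

```python
# Sketch of the compatibility degrees used in (42) and (43): Pi(A ; A') and
# N(A ; A') on a finite universe, and the resulting bounds on mu_B'(v).
# The membership values are illustrative.

def possibility(mu_A, mu_A_prime):
    # Pi(A ; A') = sup_u min( mu_A(u), mu_A'(u) )
    return max(min(mu_A[u], mu_A_prime[u]) for u in mu_A)

def necessity(mu_A, mu_A_prime):
    # N(A ; A') = inf_u max( mu_A(u), 1 - mu_A'(u) )
    return min(max(mu_A[u], 1 - mu_A_prime[u]) for u in mu_A)

mu_A       = {"u1": 1.0, "u2": 0.6, "u3": 0.0}
mu_A_prime = {"u1": 0.8, "u2": 1.0, "u3": 0.1}

P, N = possibility(mu_A, mu_A_prime), necessity(mu_A, mu_A_prime)
mu_B_v = 0.4                         # some value mu_B(v)
print(max(mu_B_v, 1 - N))            # certainty-rule upper bound (42)
print(min(mu_B_v, P))                # possibility-rule lower bound (43)
```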
5.3 - Parallel Rules with a Fuzzy Input
Lastly, we consider the general problem of the generalized modus ponens in the
face of a collection of n rules of the same type, i.e. the pattern
x is A'
<rule relating x is Ai with y is Bi>   i = 1,n
y is B'
At the theoretical level, the solution is straightforward, namely
∀ v ∈ V, μB'(v) ≤ sup u∈U min(μA'(u), min i=1,n μAi(u) → μBi(v))   (case I)   (44)
∀ v ∈ V, μB'(v) ≥ sup u∈U min(μA'(u), max i=1,n μAi(u) & μBi(v))   (case II)   (45)
However, at the practical level, the computation of these expressions raises problems
for some types of rules. There is no difficult problem for (45). Indeed it can be
checked that (45) is equivalent to
∀ v ∈ V, μB'(v) ≥ max i=1,n sup u∈U min(μA'(u), μAi(u) & μBi(v))
i.e., for possibility rules we get
∀ v ∈ V, μB'(v) ≥ max i=1,n min(μBi(v), Π(Ai ; A'))   (46)
For certainty rules, the following upper bound, which can be derived from (44)
(assuming that A' is normalized), is no longer the best one :
∀ v ∈ V, μB'(v) ≤ min i=1,n max(μBi(v), 1 − N(Ai ; A'))   (47)
Indeed, assume n = 2, and that A1, A2, A' are distinct non-fuzzy subsets, with
A' = A1 ∪ A2, A' ≠ A1 and A' ≠ A2 ; then N(Ai ; A') = 0, i = 1,2, and we get the
trivial result ∀ v, μB'(v) ≤ 1 by (47), although using (44) it would be possible to
conclude that ∀ v, μB'(v) ≤ max(μB1(v), μB2(v)), which is a satisfying result. This
emphasizes the fact that the rules should be jointly processed, as in (44), in order to
get the more accurate result : it is not the case in (47), where the conclusions obtained
from "x is A'" and from each rule i are combined. Note that (46) and (47) are
respectively a weighted max and a weighted min combination ; when ∀ i,
Π(Ai ; A') = 1 and N(Ai ; A') = 1, (46) and (47) yield the union and the intersection of
the Bi's respectively ; see Dubois, Prade and Testemale (1988), for instance, for more
details. The case of implication-based gradual rules raises similar problems. The
processing of a collection of parallel gradual rules (focusing on cores) has been
investigated in (Dubois, Martin-Clouaire and Prade, 1988) and in Martin-Clouaire
(1988), to which the reader is referred. It is possible, from the collection of rules, to
build a new rule which, when applied to A', yields the optimal result, i.e. the value
of the upper bound expressed by (44) ; this new rule summarizes the knowledge
useful in the collection of rules for dealing with the fact "x is A'".
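
A minimal sketch of the weighted max (46) and weighted min (47) combinations follows; the per-rule degrees Π(Ai ; A'), N(Ai ; A') and the values μBi(v) are invented for illustration.

```python
# Sketch of the weighted combinations (46) and (47) for n parallel rules and a
# fuzzy input A'; the degrees below are illustrative.

def weighted_max(rule_degrees, mu_B_values):
    # possibility rules (46): max_i min( mu_Bi(v), Pi(Ai ; A') )
    return max(min(b, p) for b, (p, _) in zip(mu_B_values, rule_degrees))

def weighted_min(rule_degrees, mu_B_values):
    # certainty rules (47): min_i max( mu_Bi(v), 1 - N(Ai ; A') )
    return min(max(b, 1 - n) for b, (_, n) in zip(mu_B_values, rule_degrees))

rule_degrees = [(1.0, 0.7), (0.5, 0.0)]   # (Pi(Ai ; A'), N(Ai ; A')) per rule
mu_B_values  = [0.9, 0.2]                 # mu_B1(v), mu_B2(v) at some v

print(weighted_max(rule_degrees, mu_B_values))   # lower bound on mu_B'(v)
print(weighted_min(rule_degrees, mu_B_values))   # upper bound on mu_B'(v)
```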
Generally speaking, we have to define the consistency and the non-redundancy of
the set of fuzzy rules, and this leads to putting some constraints on the coverage of U by
the Ai's (see the first of the two above-mentioned references for definitions of these
notions). Clearly, further research is needed for a complete investigation of the
practical processing of a collection of rules of a given type, also including the
problem of compound condition parts in the rules, which has not been considered
here. Figure 2.c exhibits different behaviors of the four types of rules in the
case n = 2 where A' = A1 ∩ A2. We notice that we obtain B' ⊆ B1 ∩ B2 with
gradual rules focusing on cores, which confirms the "interpolation" flavor of this
behavior : if A' is between A1 and A2 (in the sense of the intersection), then the
possible values of y are restricted by a fuzzy set in between B1 and B2 ; a level of
uncertainty equal to 0.5 appears for certainty rules ; this is due to the fact that, with
continuous membership functions, N(A ; A) = 0.5 as soon as A is a fuzzy set
(when A' = A1 ∩ A2, we are not completely sure that x belongs to the core
of A1 and to the core of A2). For the two types of rules corresponding to
case II, we obtain B1 ∪ B2 ⊆ B' as expected (since here Π(Ai ; A') = 1 and
sup{μAi(u) | μA'(u) > 0} = 1, for i = 1,2).
As a final remark in this section, note that we may think of using two types of
rules simultaneously, especially certainty and possibility rules, since it can be
checked that the two corresponding inequalities constraining πy are consistent,
namely, using both (42) and (43), we get
∀ v ∈ V, min(μB(v), Π(A ; A')) ≤ πy(v) ≤ max(μB(v), 1 − N(A ; A'))   (48)
It corresponds to the case of a piece of knowledge saying both that "the more x is A,
the more possible B as a range for y and the more certain y is in B".

[Figure 2.c : Two rules in parallel and a fuzzy input (A' = A1 ∩ A2) ; panels: gradual rules (focusing on cores), certainty rules, possibility rules, gradual rules (focusing on supports)]

6 - RULES EXPRESSING PREFERENCE


So far, we have focused on "reasoning rules" whose aim is to describe the
relationship between relevant parameters in some problem, e.g. a diagnostic problem.
There is another class of "if... then..." rules which express preference and whose aim
is to help in making choices rather than to make implicit knowledge explicit. Their
format, in the crisp case, is :
if <situation> then <decision>.
<situation> is the description of the states of the world where some decision can be
recommended. <decision> can be some action to perform, some procedure to trigger
or even an assignment statement (like "choose for y the value v0"). When there are
many possible states of the world, it is difficult to partition them into rigid classes
where specific decisions can be totally recommended. As a result, the description of
the states of the world where a decision is relevant is often fuzzy, because decisions
can be more or less recommended. Hence the "if" part contains fuzzy predicates, and
the preference rule means
"the more the state of the world corresponds to <situation>,
the more recommended is <decision>"
Let x be a vector that contains the precise description of the world, S be a fuzzy set
of values of x corresponding to the description of a range of situations, and u(d) the
preference degree for decision d. By definition, u(d) = 0 means that d should be
rejected, and u(d) = 1 means that d can be applied without any doubt. The fuzzy preference
rule just indicates that u(d) can be quantified by μS(x).
In fuzzy control (e.g. Mamdani (1977), Sugeno (1985)), fuzzy rules can be
viewed as preference rules of the form
if x is Ai then (πy = μBi)
where Bi is viewed as a fuzzy set of recommended (possible) actions, and an action is
the selection of a value for y, the control parameter. Hence, it is a more general kind
of preference rule than the one where only one decision is involved in the conclusion
part. Instead of proposing a single decision in situation Ai, a weighted ordered set is
proposed, as described by πy. The preference ui(d) of the assignment y = d, in the
presence of x = x0, can be evaluated as a function of μBi(d) and μAi(x0), for a single
rule i. Among the natural conditions to be fulfilled is that ui(d) ≤ min(μBi(d), μAi(x0)),
which states that μAi(x0) stands as an upper bound on the degree of preference for y =
d induced by rule i. When the equality is taken for granted, we get the fuzzy control
approach.
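
A minimal Python sketch of this reading of preference rules under a precisely known state x0 follows, with the disjunctive aggregation over rules corresponding to the equality case of (49) discussed below; the situations, decisions and membership functions are invented for illustration.

```python
# Illustrative sketch of the fuzzy-control reading of preference rules: for a
# precisely known state x0, each rule i contributes u_i(d) = min(mu_Ai(x0),
# mu_Bi(d)), and the contributions are aggregated disjunctively.

def preference(rules, x0, d):
    # rules: list of (mu_Ai, mu_Bi) with mu_Ai a function of the state and
    # mu_Bi a dictionary over candidate decisions
    return max(min(mu_Ai(x0), mu_Bi.get(d, 0.0)) for mu_Ai, mu_Bi in rules)

rules = [
    (lambda x: max(0.0, 1 - abs(x - 2) / 2), {"slow_down": 1.0, "keep": 0.3}),
    (lambda x: max(0.0, 1 - abs(x - 5) / 2), {"keep": 1.0, "speed_up": 0.6}),
]

x0 = 3.0
for d in ("slow_down", "keep", "speed_up"):
    print(d, preference(rules, x0, d))
```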
Usually a preference rule does not stand alone. The set of states of the world is
partitioned into a family of situations, where in each situation a decision is
recommended. This corresponds to a decision table of the form
if <situation 1> then <decision 1>
else if <situation 2> then <decision 2>
else ...
else if <situation n> then <decision n>
else <decision n + 1>
where <decision n + 1> may suggest to refrain from deciding in the case of an (n+1)th
situation that is defined by complementarity. If <decision i> corresponds to a single
decision, then, when <situation i> is fuzzy, the output is a fuzzy set of recommended
decisions {(di, μSi(x)), i = 1, n+1}. This is what happens in the OPAL system for
instance (Bensana et al., 1988), where a decision table corresponds to a priority rule
in a job shop, a decision is of the form "operation o1 precedes operation o2", and
decision dn+1 recommends not to sequence the operations for the moment.
In fuzzy control, any elementary decision d receives an evaluation from several
preference rules, since the Bi's may overlap. Hence the preference weight for d, u(d),
is a function of the ui(d), i = 1,n (since there is no refraining from decision in fuzzy
control, i.e. dn+1 does not exist). A natural condition in a decision table is that a
decision is selected as soon as one rule recommending the decision is triggered.
Hence we should have
u(d) ≥ max i=1,n ui(d)   (49)
The case of equality, again, corresponds to the choice of fuzzy control people. Strictly
speaking, the above scheme works provided that the actual situation is precisely
described, i.e. x = x0. But, since the behavior of fuzzy decision rules is based on
fuzzy pattern matching between the description of the actual situation and the
prototypical situations in the fuzzy rules, the extension to the case of ill-described inputs
is easy to envisage. If what is known about x is that "x is A'", where A' is a fuzzy set,
u(d) should become a fuzzy number induced by the degrees of compatibility between
A' and Ai (Zadeh, 1978b), i.e. μAi(A'), defined by extending the function μAi to the
fuzzy value A'. Hence we should replace max in (49) by the extended max of fuzzy
arithmetic (e.g. Dubois and Prade, 1988).
In fuzzy pattern matching (Dubois, Prade and Testemale, 1988), we often use the
degrees of possibility Π(Ai ; A') and necessity N(Ai ; A') instead of the fuzzy truth-value
μAi(A'). Letting an equality stand in (49), and approximating μAi(A') by the
interval [N(Ai ; A'), Π(Ai ; A')], we get the following results :
min(N(Ai ; A'), μBi(d)) ≤ ui(d) ≤ min(Π(Ai ; A'), μBi(d))
and
max i=1,n min(N(Ai ; A'), μBi(d)) ≤ u(d) ≤ max i=1,n min(Π(Ai ; A'), μBi(d))
Note that in the fuzzy control literature only the upper bound of u(d) is adopted,
while the degree of necessity N(Ai ; A') should be used to describe the imprecision due to
a fuzzy input. It is also clear from the above that preference rules do not behave like
reasoning rules. In particular, the inequality ui(d) ≤ min(μAi(x0), μBi(d)) is not in
accordance with the possibility rules of the previous sections, since the semantics of
the latter leads to the opposite inequality.
A last issue with preference rules is that there may be more than one decision
table involved in the selection of a decision d. Indeed several points of view,
objectives, etc. may be simultaneously present, and there is some strategy to adopt
in the presence of conflicting decision tables. This problem is absent from the fuzzy
control literature, where a fuzzy controller generally involves one decision table only.
However, in planning, decisions may be motivated by several conflicting criteria.
Again this is what happens in the OPAL system (Bensana et al., 1988), for instance.
The problem of cooperation between decision tables has been modelled in terms of
social choice (see Bel et al., 1989) and a software architecture for implementing fuzzy
decision tables and cooperation strategies has been devised (Dubois, Koning and Bel,
1989). It is based on a social choice interpretation of fuzzy set aggregation
connectives that is described elsewhere (Dubois and Koning, 1989).
The problem of preference rules and their implementation in rule-based systems
is certainly one of the important topics in Artificial Intelligence for the forthcoming
years, as witnessed by some current activity in this area, from the standpoint of
utility theory (Keeney, 1988 ; Klein and Shortliffe, 1990) or cognitive psychology
(Pinson, 1987).

7 - CONCLUDING REMARKS
The semantic content of rules in fuzzy expert systems has received little
attention until now, in spite of the enormous quantity of existing literature about
approximate reasoning and fuzzy controllers. The paper has tried to formally derive
different kinds of fuzzy rules based on very simple semantic considerations. Four
types of rules have emerged, corresponding to very standard alterations of a possibility
distribution : enlarging its core, shrinking its support, truncating its height, or
drowning it in a uniform level of uncertainty. The paper has also pointed out that
fuzzy decision rules do not behave like fuzzy rules describing relationships.
Besides, rule-based expert systems have always been associated with an efficient
local computation strategy where a partial conclusion obtained from a (compound)
fact and a rule has to be combined with other conclusions pertaining to the same
matter and derived from other facts and rules. This kind of strategy can be especially
dangerous in the presence of vague and uncertain pieces of knowledge, since it may yield
conclusions which are not as accurate as could be expected from the available
knowledge. Such conclusions may even be incorrect (see Pearl (1988), Heckerman
and Horvitz (1988), Dubois and Prade (1989b) for instance). This is due to the fact
that each rule and each variable to evaluate cannot always be considered independently in
the evaluation process. A possibilistic hypergraph technique coping with this
problem has been recently developed by Kruse and Schwecke (1990), and by Dubois
and Prade (1990c).

REFERENCES
Baldwin J.F., Pilsworth B.W. (1979) A model of fuzzy reasoning through multi-valued
logic and set theory. Int. J. of Man-Machine Studies, 11, 351-380.
Bel G., Bensana E., Dubois D., Koning J.L. (1989) Handling fuzzy priority rules in a
jobshop-scheduling system. Proc. of the 3rd. Inter. Fuzzy Systems Assoc. (IFSA)
Congress, Seattle, Wa., Aug. 6-11, 200-203.
Bensana E., Bel G., Dubois D. (1988) OPAL: a multi-knowledge-based system for
industrial job-shop scheduling. Int. J. Prod. Res., 26(5), 795-819.
Bonissone P.P., Gans S.S., Decker K.S. (1987) RUM : a layered architecture for
reasoning with uncertainty. Proc. of the 10th Inter. Joint Conf. on Artificial
Intelligence (IJCAI-87), Milano, Italy, 891-898.
Bouchon B. (1987) Fuzzy inferences and conditional possibility distributions. Fuzzy
Sets and Systems, 23, 33-41.
Bouchon B. (1988) Stability of linguistic modifiers compatible with a fuzzy logic. In :
Uncertainty and Intelligent Systems (2nd Inter. Conf. on Information Processing and
Management of Uncertainty in Knowledge-Based Systems (IPMU'88), Urbino, Italy,
July 1988) (B. Bouchon, L. Saitta, R.R. Yager, eds.), Springer-Verlag, Berlin, 63-70.
Buchanan B.G., Shortliffe E.H. (1984) Rule-Based Expert Systems - The MYCIN
Experiment of the Stanford Heuristic Programming Project. Addison-Wesley, Reading,
Mass ..
Despres S. (1989) GRIF : a guide for representing fuzzy inferences. Proc. of the 3rd
Inter. Fuzzy Systems Assoc. (IFSA) Congress, Seattle, Wa., Aug. 6-11, 353-356.
Di Nola A., Pedrycz W., Sessa S. (1985) Fuzzy relation equations and algorithms of
inference mechanism in expert systems. In : Approximate Reasoning in Expert
Systems (M.M. Gupta, A. Kandel, W. Bandler, J.B. Kiszka, eds.), North-Holland,
Amsterdam, 355-367.
Dubois D., Koning J.L. (1989) Social choice axioms for fuzzy set aggregation.
Workshop on Aggregation and Best Choices of Imprecise Opinions, Bruxelles, Jan.
1989. Available in Tech. Report IRIT/90-5/R, Univ. P. Sabatier, Toulouse, France. To
appear in Fuzzy Sets and Systems.
Dubois D., Koning J.L., Bel G. (1989) Antagonistic decision rules in knowledge-based
systems (in French). Proc. 7th AFCET Congress on Artificial Intelligence and
Pattern Recognition, Paris.
Dubois D., Lang J., Prade H. (1990) Fuzzy sets in approximate reasoning - Part 2 :
Logical approaches. Fuzzy Sets and Systems, 25th Anniversary Memorial Volume, to
appear.
Dubois D., Martin-Clouaire R., Prade H. (1988) Practical computing in fuzzy logic. In:
Fuzzy Computing - Theory, Hardware and Applications (M.M. Gupta, T. Yamakawa,
eds.), North-Holland, Amsterdam, 11-34.
Dubois D., Prade H. (1982) On several representations of an uncertain body of
evidence. In : Fuzzy Information and Decision Processes (M.M. Gupta, E. Sanchez,
eds.), North-Holland, Amsterdam, 167-181.
Dubois D., Prade H. (1984) Fuzzy logics and the generalized modus ponens revisited.
Cybernetics and Systems, 15, 293-331.
Dubois D., Prade H. (with the collaboration of Farreny H., Martin-Clouaire R.,
Testemale C.) (1988) Possibility Theory - An Approach to Computerized Processing
of Uncertainty. Plenum Press, New York.
Dubois D., Prade H. (1989a) A typology of fuzzy "If... then ... " rules. Proc. of the 3rd
Inter. Fuzzy Systems Assoc. (IFSA) Congress, Seattle, Wa., Aug. 6-11, 782-785.
Dubois D., Prade H. (1989b) Handling uncertainty in expert systems : pitfalls,
difficulties, remedies. In : Reliability of Expert Systems (E. Hollnagel, ed.), Ellis-
Horwood, Chichester, U.K., 64-118.
Dubois D., Prade H. (1990a) Resolution principles in possibilistic logic. Int. J. of
Approximate Reasoning, 3, 1-21.
Dubois D., Prade H. (1990b) Gradual inference rules in approximate reasoning. In :
Tech. Report IRIT/90-6/R, IRIT, Univ. P. Sabatier, Toulouse, France. Information
Sciences, to appear.
Dubois D., Prade H. (1990c) Inference in possibilistic hypergraphs. In : Tech. Report
IRIT/90-6/R, IRIT, Univ. P. Sabatier, Toulouse, France. Extended abstracts of the 3rd
Conf. on Information Processing and Management of Uncertainty in Knowledge-
Based Systems (IPMU'90), Paris, July 2-6, 228-230.
Dubois D., Prade H. (1990d) Fuzzy sets in approximate reasoning - Part 1 : Inference
with possibility distributions. Fuzzy Sets and Systems, 25th Anniversary Memorial
Volume, to appear.
Dubois D., Prade H., Testemale C. (1988) Weighted fuzzy pattern matching. Fuzzy Sets
and Systems, 28, 313-331.
Fonck P. (1990) Representation of vague and uncertain facts. Proc. of the 3rd Inter.
Conf. on Information Processing and Management of Uncertainty in Knowledge-
Based Systems (lPMU'90), Paris, July 2-6, 284-288.
Godo L., López de Mántaras R., Sierra C., Verdaguer A. (1988) Managing linguistically
expressed uncertainty in MILORD : application to medical diagnosis. Artificial

Intelligence Communications, 1(1), 14-31.


Heckerman D.E., Horvitz EJ. (1988) The myth of modularity in rule-based systems for
reasoning with uncertainty. In : Uncertainty in Artificial Intelligence 2 (J.F. Lemmer,
L.N. Kanal, eds.), North-Holland, Amsterdam, 23-34.
Keeney R. (1988) Value-driven expert systems for decision support. Decision Support
Systems (4), 405-412.
Klein D.A., Shortliffe E.H. (1990) Integrating artificial intelligence & decision theory
in heuristic process control systems. Proc. of the 10th Inter. Workshop on Expert
Systems & their Applications, Avignon, France, May 28-June 1st, 2nd Generation
Expert Systems Volume, 165-177.
Kruse R., Schwecke E. (1990) Fuzzy reasoning in a multidimensional space of
hypotheses. Int. J. of Approximate Reasoning, 4, 47-68.
Magrez P., Smets P. (1989) Epistemic necessity, possibility, and truth - Tools for
dealing with imprecision and uncertainty in fuzzy knowledge-based systems. Int. J.
of Approximate Reasoning, 3,35-57.
Mamdani E.H. (1977) Application of fuzzy logic to approximate reasoning using
linguistic systems. IEEE Trans. Comput., 26, 1182-1191.
Martin-Clouaire R. (1989) Semantics and computation of the generalized modus
ponens : the long paper. Int. J. of Approximate Reasoning, 3, 195-217.
Mizumoto M., Zimmermann H.J. (1982) Comparison of fuzzy reasoning methods.
Fuzzy Sets and Systems, 8, 253-283.
Pearl J. (1988) Probabilistic Reasoning in Intelligent Systems - Networks of Plausible
Inference. Morgan Kaufmann Pub., San Mateo, Ca..
Pinson S. (1987) A multi-attribute approach to knowledge representation for loan
granting. Proc. of the 10th Inter. Joint Conf. on Artificial Intelligence (IJCAI-87),
Milano, Italy, 588-591.
Prade H. (1985) A computational approach to approximate and plausible reasoning
with applications to expert systems. IEEE Trans. on Pattern Analysis and Machine
Intelligence, 7(3), 260-283. Corrections, 7(6), 747-748.
Sanchez E. (1978) On possibility-qualification in natural languages. Information
Sciences, 15, 45-76.
Shafer G. (1976) A Mathematical Theory of Evidence. Princeton University Press,
Princeton.
Smets P., Magrez P. (1987) Implication in fuzzy logic. Int. J. of Approximate
Reasoning, 1, 327-347.
Sugeno M. (ed.) (1985) Industrial Applications of Fuzzy Control. North-Holland,
Amsterdam.
Tanaka H., Tsukiyama T., Asai K. (1982) A fuzzy system model based on the logical
structure. In : Fuzzy Sets and Possibility Theory (R.R. Yager, ed.), Pergamon Press,
New York, 257-325.
Trillas E., Valverde L. (1985) On mode and implication in approximate reasoning. In :
Approximate Reasoning in Expert Systems (M.M. Gupta, A. Kandel, W. Bandler,
J.B. Kiszka, eds.), North-Holland, Amsterdam, 157-166.
Yager R.R. (1984) Approximate reasoning as a basis for rule-based expert systems.
IEEE Trans. on Systems, Man and Cybernetics, 14(4), 636-643.
Yager R.R. (1990) On considerations of credibility of evidence. Tech. Report HMIT-
1006, Machine Intelligence Institute, Iona College, New Rochelle, N.Y..
Zadeh L.A. (1971) Similarity relations and fuzzy orderings. Information Sciences, 3,
177-200.
Zadeh L.A. (1972) A fuzzy set-theoretic interpretation of linguistic hedges. J. of
Cybernetics, 2(3), 4-34.
Zadeh L.A. (1978a) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and
Systems, 1, 3-28.
Zadeh L.A. (1978b) PRUF - a meaning representation language for natural languages.
Int. J. of Man-Machine Studies, 10. 395-460.
Zadeh L.A. (1979) A theory of approximate reasoning. In : Machine Intelligence (J.E.
Hayes, D. Michie. L.I. Mikulich. eds.), Vol. 9, Elsevier. New York, 149-194.
4
Fuzzy Logic Controllers
Hamid R. Berenji
Sterling Software
Artificial Intelligence Research Branch
NASA Ames Research Center
Mountain View, CA 94035

1 Introduction
Fuzzy Set Theory, introduced by Zadeh in 1965 [77], has been the subject
of much controversy and debate. In recent years, it has found many
applications in a variety of fields. Among the most successful applications
of this theory has been the area of Fuzzy Logic Control (FLC) initiated
by the work of Mamdani and Assilian [36]. FLC has had considerable
success in Japan, where many commercial products using this technology
have been built.
In this paper, we will review the basic architecture of fuzzy logic con-
trollers and discuss why this technology often provides controllers with
performance similar to the performance of an expert human operator for
ill-defined and complex systems. In section 2, an introductory survey of
the basics of fuzzy set theory is presented. Next, the basic architecture
of a FLC is described, followed by a brief review of the application of
this theory. Finally, we discuss how a fuzzy logic based control system
can learn from experience to fine-tune its performance.

2 Fuzzy sets and Fuzzy logic: The basis for Fuzzy Control
A fuzzy set is an extension of a crisp set. Crisp sets only allow full
membership or no membership at all, whereas fuzzy sets allow partial
membership. In other words, an element may partially belong to a set.
In a crisp set, the membership or non-membership of an element z in a set
A is described by a characteristic function μA(z), where:

μA(z) = 1 if z ∈ A,  and  μA(z) = 0 if z ∉ A.

[Figure 1: Examples of fuzzy membership functions (linguistic labels Negative and Positive over the variable x)]

Fuzzy set theory extends this concept by defining partial memberships
which can take values ranging from 0 to 1:

μA : X → [0, 1]

where X refers to the universal set defined in a specific problem. If this
universal set is countable and finite, then a fuzzy set A in this universe
can be defined by listing each member and its degree of membership in
the set A:

A = Σ i=1,n  μA(zi)/zi .

Similarly, if X is continuous, then a fuzzy set A can be defined by

A = ∫X μA(z)/z .
Note that in the above definitions, "/" does not refer to a division and
is used as a notation to separate the membership of an element from the
element itself. For example, in A = .2/element1 + .6/element2, element1
has a membership value of .2 and element2 has a membership value of .6
in the fuzzy set A. As another example, the linguistic term Positive,
as shown in Figure 1, may be defined to take the following membership
function:

μPositive(z) = 1 if z > 4,  (z − 1)/3 if 1 ≤ z ≤ 4,  0 otherwise.

The support of a fuzzy set A in the universal set X is a crisp set that
contains all the elements of X which have a degree of membership greater
than zero¹. In the above example, the support set includes all the real
numbers for which μ(z) > 0.
The α-cut of a fuzzy set A is defined as the crisp set of all the elements
of the universe X which have membership in A greater than or equal
to α, where

Aα = {z ∈ X | μA(z) ≥ α}.

For example, if the fuzzy set A is described by its membership function:

A = {.2/2 + .4/3 + .6/4 + .8/5 + 1/6}

and α = .3 then the α-cut of A is

A.3 = {3, 4, 5, 6}.

The height of a fuzzy set is defined as the highest membership value of
the set. If height(A) = 1, then the set A is called a normalized fuzzy set.
2.1 Fuzzy Set Operations


Assuming that A and B are two fuzzy sets with membership functions
μA and μB, the following operations can be defined on these sets.
The complement of a fuzzy set A is a fuzzy set Ā with membership
function

μĀ(z) = 1 − μA(z).

The Union of A and B is a fuzzy set with the following membership
function

μA∪B = max{μA, μB}.

The Intersection of A and B is a fuzzy set with membership function

μA∩B = min{μA, μB}.

By definition, Concentration is a unary operation which, when applied
to a fuzzy set A, results in a fuzzy subset of A in such a way that the
reduction in higher grades of membership is much less than the reduction
in lower grades of membership. In other words, by concentrating a fuzzy
set, members with low grades of membership will have even lower grades
of membership and hence the fuzzy set becomes more concentrated. A
common concentration operator is to square the membership function:

μCON(A)(z) = μA²(z)

1 Mabuchi [35] has used this concept in comparing fuzzy subsets.


[Figure 2: (a) Monotonic, (b) Triangular, (c) Trapezoidal, (d) Bell-shaped membership functions]

and a typical concentration operator is the term Very which is also a


Linguistic Hedge [78]. For example, the result of applying the operator
Very on a fuzzy label Small is a new fuzzy label Very Small.
The Dilution operator is the converse of the concentration operator
described above:
μDIL(A)(z) = √(μA(z)).
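
The operations of this subsection can be sketched as follows for fuzzy sets over the same finite universe (the fuzzy label Small below is invented for illustration):

```python
# Sketch of the fuzzy set operations above, for sets stored as dictionaries
# element -> membership degree over the same finite universe.

import math

def complement(A):
    return {z: 1 - m for z, m in A.items()}

def union(A, B):
    return {z: max(A[z], B[z]) for z in A}

def intersection(A, B):
    return {z: min(A[z], B[z]) for z in A}

def concentration(A):          # e.g. the hedge "Very"
    return {z: m ** 2 for z, m in A.items()}

def dilution(A):
    return {z: math.sqrt(m) for z, m in A.items()}

Small = {0: 1.0, 1: 0.8, 2: 0.4, 3: 0.1}
print(concentration(Small))   # "Very Small": low grades drop the most
```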

3 The Basic Architecture Of FLC


Different methods for developing fuzzy logic controllers have been sug-
gested over the past 15 years. In the design of a fuzzy controller, one
must identify the main control parameters and determine a term set
which is at the right level of granularity for describing the values of each
linguistic variable². For example, a term set including linguistic values
such as { Small, Medium, Large} may not be satisfactory in some do-
mains, and may instead require the use of a five term set such as {Very
Small, Small, Medium, Large, and Very Large} .
Different types of fuzzy membership functions have been used in fuzzy
logic control. However, four types are most common. The first type
assumes a monotonic membership function such as those shown in Figure
2(a). This type is simple and has been used in studies such as [8, 64].
Other types, using triangular, trapezoidal, and bell-shaped functions, have
also been used, as shown in Figure 2(b), (c), and (d) respectively.
2A linguistic variable is a variable which can only take linguistic values.
[Figure 3: A simple architecture of a fuzzy logic controller: a coder (fuzzifier) and a decoder (defuzzifier) connect the plant to the decision-making logic and the knowledge base]

The selection of the types of fuzzy variables directly affects the type
of reasoning to be performed by the rules using these variables. This is
described later in Section 3.3. After the values of the main control parameters
are determined, a knowledge base is developed using the above control
variables and the values that they may take. If the knowledge base is a
rule base, more than one rule may fire, requiring the selection of a conflict
resolution method for decision making, as will be described later.
Figure 3 illustrates a simple architecture for a fuzzy logic controller.
The system dynamics of the plant in this architecture is measured by a
set of sensors. This architecture consists of four modules whose functions
are described next.

3.1 Coding the Inputs: Fuzzification


In coding the values from the sensors, one expresses the sensor
measurements in terms of the linguistic labels used in the precon-
ditions of the rules.
If the sensor reading has a crisp value, then the fuzzification stage
requires matching the sensor measurement against the membership func-
tion of the linguistic label as shown in Figure 4( a). If the sensor reading
contains noise, it may be modeled by using a triangular membership
function where the vertex of the triangle refers to the mean value of the
data set of sensor measurements and the base refers to a function of the
standard deviation (e.g., twice the standard deviation as used in [69]).
Then in this case, fuzzification refers to finding out the intersection of
the label's membership function and the distribution for the sensed data
74

sensor
measurement
~(x)

1.0

x
I Xc>
(b)

Figure 4: (a)- Matching a sensor reading Zo with the membership func-


tion ~(z) to get ~(ZO)i (a)- crisp sensor reading (b)-fuzzy sensor reading.

as shown in Figure 4(b). However, the most widely used fuzzmcation


method is the former case when the sensor reading is crisp.
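
A minimal Python sketch of fuzzification with a crisp sensor reading follows; the triangular labels and their parameters are invented for illustration and are not those of Figure 1.

```python
# Sketch of fuzzification of a crisp sensor reading: the reading x0 is matched
# against the membership function of each linguistic label.

def triangular(a, b, c):
    """Triangular membership function with support [a, c] and vertex b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

labels = {"Negative": triangular(-6, -3, 0),
          "Zero":     triangular(-3,  0, 3),
          "Positive": triangular( 0,  3, 6)}

x0 = 1.2   # crisp sensor reading
print({name: round(mu(x0), 2) for name, mu in labels.items()})
# {'Negative': 0.0, 'Zero': 0.6, 'Positive': 0.4}
```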

3.2 Setting up the Control Knowledge Base


There are two main tasks in designing the control knowledge base. First,
a set of linguistic variables must be selected which describe the values of
the main control parameters of the process. Both the main input param-
eters and the main output parameters must be linguistically defined in
this stage using proper term sets. The selection of the level of granularity
of a term set for an input variable or an output variable plays an im-
portant role in the smoothness of control. Secondly, a control knowledge
base must be developed which uses the above linguistic description of
the main parameters. Sugeno [49] has suggested four methods for doing
this:

1. Expert's Experience and Knowledge

2. Modelling the Operator's Control Actions

3. Modeling a process

4. Self Organization

Among the above methods, the first method is the most widely used
[36]. In modeling the human expert operator's knowledge, fuzzy control
rules of the form:

IF Error is small and Change-in-error is small then force is small



have been used in studies such as [51, 53]. This method is effective
when expert human operators can express the heuristics or the knowl-
edge that they use in controlling a process in terms of rules of the above
form. Applications have been developed in process control (e.g., cement
kiln operations [23]). Besides the ordinary fuzzy control rules which have
been used by Mamdani and others, where the conclusion of a rule is an-
other fuzzy variable, a rule can be developed whereby its conclusion is a
function of the input parameters. For example, the following implication
can be written:

IF X is A1 and Y is B1 THEN Z = f1(X, Y)


where the output Z is a function of the values that X and Y may take.
The second method above directly models the control actions of
the operator. Instead of interviewing the operator, the types of control
actions taken by the operators are modelled. Takagi and Sugeno [55]
and Sugeno and Murakami [51] have used this method for modeling the
control actions of a driver in parking a car.
The third method deals with fuzzy modeling of a process where an
approximate model of the plant is configured by using implications which
describe the possible states of the system. In this method a model is de-
veloped and a fuzzy controller is constructed to control the fuzzy model,
making this approach similar to the traditional approach taken in con-
trol theory. Hence, structure identification and parameter identification
processes are needed. For example, a rule discussed by Sugeno [49] is of
the form:

IF x1 is A1 and x2 is A2 and ... and xm is Am THEN y = c0 + c1·x1 + ... + cm·xm

for i = 1, ..., n, where n is the number of such implications and the
consequence is a linear function of the m input variables.
Finally, the fourth method refers to the research of Mamdani and his
students in developing self-organizing controllers [44]. The main idea in
this method is the development of rules which can be adjusted over time
to improve the controllers' performance. This method is very similar to
recent work in the use of neural networks in designing the knowledge base
of a fuzzy logic controller which will be discussed later in this chapter.

3.3 Conflict Resolution and Decision Making


As mentioned earlier, because of the partial matching attribute of fuzzy
control rules and the fact that the preconditions of the rules do overlap,
usually more than one fuzzy control rule can fire at one time. The
methodology which is used in deciding what control action should be
taken as the result of the firing of several rules can be referred to as the
process of conflict resolution. The following example, using two rules,
illustrates this process. Assume that we have the following:
Rule 1: IF X is A1 and Y is B1 THEN Z is C1
Rule 2: IF X is A2 and Y is B2 THEN Z is C2

[Figure 5: Examples of the inference process with different fuzzy variable membership functions: (a) monotonic, (b) triangular, (c) trapezoidal, (d) bell-shaped]
Now, if we have z0 and y0 as the sensor readings for fuzzy variables X
and Y, then their truth values are represented by μA1(z0) and μB1(y0)
respectively for Rule 1, where μA1 represents the membership function
for A1. Similarly for Rule 2, we have μA2(z0) and μB2(y0) as the truth
values of the preconditions. The strength of Rule 1 can be calculated by:

α1 = μA1(z0) ∧ μB1(y0)

where ∧ refers to the conjunction operator, which was defined earlier to
be equal to the Minimum operator. Similarly for Rule 2:

α2 = μA2(z0) ∧ μB2(y0)
The control output of Rule 1 is calculated by applying the matching
strength of its preconditions to its conclusion:

μC′1(w) = α1 ∧ μC1(w)

and for Rule 2:

μC′2(w) = α2 ∧ μC2(w)

where w ranges over the values that the rule conclusions can take. This
means that, as a result of reading sensor values z0 and y0, Rule 1 is rec-
ommending a control action with μC′1(w) as its membership function and
Rule 2 is recommending a control action with μC′2(w) as its membership
function. The conflict-resolution process then produces

μC(w) = μC′1(w) ∨ μC′2(w)

where μC(w) is a pointwise membership function for the combined con-
clusion of Rule 1 and Rule 2. The ∧ and ∨ operators above are defined
to be the Min and Max functions respectively [36]. The result of this last
operation (i.e., μC(w)) is a membership function and has to be translated
(defuzzified) to a single value, as discussed next.
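
The min-max inference just described can be sketched as follows in Python (the membership functions are left abstract; any callables returning degrees in [0,1] would do):

```python
# Sketch of min-max inference: firing strengths are the minimum of the matched
# preconditions, each conclusion is clipped by its rule's strength, and the
# clipped conclusions are combined with the maximum.

def fire(rules, x0, y0):
    clipped = []
    for mu_A, mu_B, mu_C in rules:                 # one (A_i, B_i, C_i) per rule
        alpha = min(mu_A(x0), mu_B(y0))            # firing strength alpha_i
        clipped.append(lambda w, a=alpha, mu=mu_C: min(a, mu(w)))
    return lambda w: max(mu(w) for mu in clipped)  # combined conclusion mu_C(w)

# usage: mu_combined = fire(rules, z0, y0); evaluate mu_combined(w) for any w
```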

3.4 Decoding the Outputs: Defuzzification


This necessary operation produces a nonfuzzy control action that best
represents the membership function of an inferred fuzzy control action.
Several defuzzification strategies have been suggested in the literature. Among
them, the four methods which have been applied most often are described
here.

3.4.1 Tsukamoto's defuzzification method


As shown in Figure 6(a), if monotonic membership functions are used,
then a crisp control action can be calculated by:

z* = ( Σ i=1,n wi·zi ) / ( Σ i=1,n wi )

where n is the number of rules with firing strength wi greater than 0
and zi is the amount of control action recommended by rule i.

3.4.2 The Center Of Area (COA) method


Assuming that a control action with a pointwise membership function
μC has been produced, the Center of Area method calculates the center
of gravity of the distribution for the control action. Assuming a discrete
universe of discourse, we have

z* = ( Σ j=1,q zj·μC(zj) ) / ( Σ j=1,q μC(zj) )

where q is the number of quantization levels of the output, zj is the
amount of control output at the quantization level j, and μC(zj) repre-
sents its membership value in C.

3.4.3 The Mean of Maximum (MOM) Method


The Mean of Maximum (MOM) method generates a crisp control action
by averaging the support values whose membership values reach
the maximum. For a discrete universe, this is calculated by

z* = ( Σ j=1,l zj ) / l

where l is the number of quantized z values which reach their maximum
membership.
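
A minimal Python sketch of the COA and MOM formulas over a discretized output universe follows; the sample membership values are invented for illustration.

```python
# Sketch of the two defuzzification formulas for a discretized output universe.

def coa(zs, mu_C):
    # z* = sum_j z_j * mu_C(z_j) / sum_j mu_C(z_j)
    return sum(z * mu_C(z) for z in zs) / sum(mu_C(z) for z in zs)

def mom(zs, mu_C):
    # average of the quantized values whose membership reaches the maximum
    top = max(mu_C(z) for z in zs)
    winners = [z for z in zs if mu_C(z) == top]
    return sum(winners) / len(winners)

mu_C = {0: 0.0, 1: 0.5, 2: 1.0, 3: 1.0, 4: 0.25}.get
zs = [0, 1, 2, 3, 4]
print(coa(zs, mu_C))   # (0.5 + 2 + 3 + 1) / 2.75 = 2.36...
print(mom(zs, mu_C))   # (2 + 3) / 2 = 2.5
```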

3.4.4 Defuzzification when the outputs of the rules are functions of their inputs
As mentioned earlier, fuzzy control rules may be written as a function
of their inputs. For example,

Rule i: IF X is Ai and Y is Bi THEN Z is fi(X, Y)

Assuming that αi is the firing strength of rule i, then

z* = ( Σ i=1,n αi·fi(zi, yi) ) / ( Σ i=1,n αi )

where n is the number of firing rules.

[Figure 6: Defuzzification of the combined conclusion of the rules described in the example (inputs z0 = 4, y0 = 8)]

3.4.5 An example
Assume that we have the following two rules:
Rule 1: IF X is A1 and Y is B1 THEN Z is C1
Rule 2: IF X is A2 and Y is B2 THEN Z is C2
Suppose z0 and y0 are the sensor readings for fuzzy variables X and Y,
and the following are the membership functions (each is 0 outside the
indicated intervals):

μA1(z) = (z − 2)/3 for 2 ≤ z ≤ 5,   (8 − z)/3 for 5 < z ≤ 8
μA2(z) = (z − 3)/3 for 3 ≤ z ≤ 6,   (9 − z)/3 for 6 < z ≤ 9
μB1(y) = (y − 5)/3 for 5 ≤ y ≤ 8,   (11 − y)/3 for 8 < y ≤ 11
μB2(y) = (y − 4)/3 for 4 ≤ y ≤ 7,   (10 − y)/3 for 7 < y ≤ 10
μC1(z) = (z − 1)/3 for 1 ≤ z ≤ 4,   (7 − z)/3 for 4 < z ≤ 7
μC2(z) = (z − 3)/3 for 3 ≤ z ≤ 6,   (9 − z)/3 for 6 < z ≤ 9

Further assume that we are reading the sensor values z0 = 4 and
y0 = 8. We illustrate how to calculate
1. the membership function for the control action recommended by
the combination of these two rules
2. the crisp value of the control action using the COA and MOM
methods.
First, the sensor readings z0 and y0 have to be matched against the
preconditions A1 and B1 respectively. This will produce μA1(z0) = 2/3 and
μB1(y0) = 1. Similarly, for Rule 2, we have μA2(z0) = 1/3 and μB2(y0) = 2/3.
The strength of Rule 1 is calculated by:

α1 = Min(μA1(z0), μB1(y0)) = Min(2/3, 1) = 2/3

and similarly for Rule 2:

α2 = Min(μA2(z0), μB2(y0)) = Min(1/3, 2/3) = 1/3.

Applying α1 to the conclusion of Rule 1 results in the shaded trapezoid
shown in Figure 6 for C1. Similarly, applying α2 to the conclusion
of Rule 2 results in the dashed trapezoid shown in Figure 6 for C2. By
superimposing the resulting membership functions over each other and using the
Max operator, the membership function for the combined conclusion of
these rules is found (as shown on the right-hand side of Figure 6).
Furthermore, using the COA method (explained earlier), the defuzzified
value for the conclusion is found:
2 · !+3· 1 +4 · 1 +5· 1 +6·!+7·!+8 · !
*
Z aOA -
-
3 3
1 2
3
2 2
3
1
3
1 1
3 3 -
-
4 •7•
3+3+3+3+3+3+3
Using the MOM defuzzification strategy, three quantized values reach
their maximum memberships in the combined membership function (i.e.,
3, 4, and 5, with membership values of 2/3). Therefore,

z*_MOM = (3 + 4 + 5)/3 = 4.0.
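For readers who want to verify the arithmetic, the following short sketch (an illustration, not code from the chapter) clips C1 and C2 at the firing strengths 2/3 and 1/3, combines them with Max, and applies the COA and MOM formulas over the quantization levels z = 1,...,9:

```python
# Defuzzification of the combined conclusion in the example above
# (COA and MOM over a discrete universe of discourse).

def mu_c1(z):                      # triangular membership of C1, peak at 4
    if 1 <= z <= 4:   return (z - 1) / 3
    if 4 <  z <= 7:   return (7 - z) / 3
    return 0.0

def mu_c2(z):                      # triangular membership of C2, peak at 6
    if 3 <= z <= 6:   return (z - 3) / 3
    if 6 <  z <= 9:   return (9 - z) / 3
    return 0.0

a1, a2 = 2/3, 1/3                  # firing strengths of rule 1 and rule 2
zs = range(1, 10)                  # quantization levels of the output

# clip each conclusion by its rule's firing strength, then combine with Max
mu = [max(min(mu_c1(z), a1), min(mu_c2(z), a2)) for z in zs]

coa = sum(z * m for z, m in zip(zs, mu)) / sum(mu)            # -> 4.7
peak = max(mu)
mom = sum(z for z, m in zip(zs, mu) if m == peak) / \
      sum(1 for m in mu if m == peak)                          # -> 4.0
print(coa, mom)
```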
4 A Hierarchical Approach in Design of Fuzzy Controllers

Berenji et al. [8] have proposed the following algorithm for the design of
fuzzy controllers with multiple goals. The algorithm has been applied in the
control of a cart-pole balancing system.
1. Let G = {g1, g2, ..., gn} be the set of goals that the system should achieve
and maintain. Notice that for n = 1 (i.e., no interacting goals),
the problem becomes simpler and may be handled using the earlier
methods in fuzzy control (e.g., see [36]).

2. Let G = p(G) where p is a function which assigns priorities among
the goals. We assume that such a function can be obtained in
a particular domain. In many control problems, it is possible to
specifically assign priorities to the goals. For example, in the simple
problem of balancing a pole on the palm of a hand and also moving
the pole to a pre-determined location, it is possible to do this by
first keeping the pole as vertical as possible and then gradually
moving to the desired location. Although these goals are highly
interactive (i.e., as soon as we notice that the pole is falling, we
may temporarily set aside the other goal of moving to the desired
location), we still can assign priorities fairly well.

3. Let U = {u1, u2, ..., un} where ui is the set of input control parameters
related to achieving gi.

4. Let A = {a1, a2, ..., an} where ai is the set of linguistic values used
to describe the values of the input control parameters in ui.

5. Let C = {c1, c2, ..., cn} where ci is the set of linguistic values used
to describe the values of the output Z.

6. Acquire the rule set R1 of approximate control rules directly related
to the highest priority goal. These rules are in the general form of

IF u1 is a1 THEN Z is c1.

7. For i = 2 to n, subsequently form the rule sets Ri. The format of
the rules in these rule sets is similar to the ones in the previous
step except that they include aspects of approximately achieving
the previous goal:

IF gi-1 is approximately achieved and ui is ai THEN Z is ci.

The approximate achievement of a goal in step 7 of the above algorithm
refers to holding the goal parameters within smaller boundaries.
The interactions between goal gi and goal gi-1 are handled by forming
rules which include more preconditions in the left-hand side. For
example, let us assume that we have acquired a set of rules R1 for keeping
a pole vertical. In writing the second rule set R2 for moving to a
pre-specified location, aspects of approximately achieving g1 should be
combined with control parameters for achieving g2. For example, a precondition
such as the pole is almost balanced can be added while writing
the rules for moving to a specific location. A fuzzy set operation known
as concentration [78], as described earlier, can be used here to systematically
obtain more focused membership functions for the parameters
which represent the achievement of previous goals. The above algorithm
has been applied to cart-pole balancing, and more details can be found
in [8].
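A minimal sketch of this use of concentration (the triangular membership function for the pole angle is an assumed illustration, not taken from [8]):

```python
# Concentration (CON) of a fuzzy set: mu_CON(A)(u) = mu_A(u)**2.
# Squaring reduces intermediate grades, giving a "more focused" set
# that can encode a precondition such as "g_{i-1} is approximately achieved".

def mu_balanced(theta):
    """Illustrative triangular membership for 'pole angle is small' (degrees)."""
    return max(0.0, 1.0 - abs(theta) / 10.0)

def concentrate(mu):
    return lambda u: mu(u) ** 2

mu_almost_balanced = concentrate(mu_balanced)

for theta in (0.0, 2.5, 5.0, 7.5):
    print(theta, mu_balanced(theta), mu_almost_balanced(theta))
```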

5 Applications of Fuzzy Logic Controllers


In recent years, there has been a very significant increase in the number
of applications of fuzzy logic control. Although we will not provide a
complete list of the applications here, a selective number of both the
laboratory prototype systems and real commercial applications will be
discussed.
As mentioned earlier, Mamdani and Assilian [36] were the first to
apply the fuzzy set theory to control problems (e.g., the control of a
laboratory steam engine). This experiment triggered a number of other
applications such as the warm water process control [22], activated sludge
wastewater treatment [63], and traffic junction control [42]. Fuzzy logic
control has also been applied in a diverse set of domains such as arc weld-
ing [38], refuse incineration [40], automobile speed control [37], model
cars [55, 53, 51, 52], cement kiln control [65], aircraft flight control [30],
robot control [34, 59, 19, 66, 41], water purification process control [70],
nuclear reactor control [9], elevator control [33], process control [13, 44],
adaptive control [15], automatic tuning [39], control of a liquid level rig
[14], automobile transmission control [20], gasoline refinery catalytic re-
former control [3], two-dimensional ping-pong game playing [18], control
of biological processes [12], activated sludge plant[76], knowledge struc-
ture [46], and comparison with classical control theory [4, 58]. Among
these, the cement kiln controller [65] was the first industrial application.
Hitachi's celebrated automatic train controller is among the more
recent fielded applications of fuzzy logic control. In the following, we
discuss a few of these systems in more detail.

5.1 Automatic train control


Yasunobu and Miyamoto at Hitachi, Ltd. [73] have designed a fuzzy
controller for the Automatic Train Operation (ATO) systems. This sys-
tem has been in use in the city of Sendai, Japan since July 1987. The
two main operations of the system are Constant Speed Control (CSC)
and Train Automatic Stop Control (TASC). The CSC operation results
in maintaining a constant target speed (specified by the operator at the
start of the train operation) during the train travel. The TASC oper-
ation controls the speed of the train in order to stop the train at the
prespecified location. The system uses only a few rules (i.e., 12 rules for
each of the CSC and the TASC operations) and the control is evaluated
every 100 milliseconds. These operations use the evaluation of safety, riding
comfort, traceability of target velocity, accuracy of stop gap, running
time, and energy consumption criteria in deciding a control strategy. The
control rules are of the predictive fuzzy control rule type of the form:

IF (u is Ci → x is Ai and y is Bi) THEN u is Ci.

For example, when the train is in the TASC zone, the following rule
is used:

IF the control notch is not changed and
the train will stop at the predetermined location, THEN
the control notch is not changed.

The system performs as skillfully as human experts do and is superior
to an ordinary PID3 automatic train operation controller in terms of
stopping precision, energy consumption, riding comfort, and running
time.

5.2 Fuzzy Logic Hardware and Fuzzy Logic Computer Chips

Yamakawa [71, 72] has pioneered the use of fuzzy logic at the hardware level
by developing systems which achieve information processing leading toward
what is referred to as the fuzzy computer. The systems developed
by Yamakawa can accept linguistic information and perform approximate
reasoning-based inference at very high speeds (e.g., more than 10
million Fuzzy Logical Inferences Per Second, or FLIPS). Among the many
applications of Yamakawa's fuzzy electronic circuits is the control of an
inverted pendulum of a short length (e.g., 15 cm, weighing 3.5 grams).
Also, Yamakawa's computer chips have been used in biomedical experiments
[57] and orthodontic results evaluation [75].
Fuzzy logic chips were first developed by Togai and Watanabe at
AT&T Bell Labs [61]. Since the original design, several extensions have
been provided in [60] and [67].
3Proportional, Integral, and Derivative.

5.3 Sugeno's model car

Sugeno has designed a model car which can automatically park itself in a
garage. The fuzzy control rules are derived by modeling a human's parking
actions and are developed based on the third method described earlier
in 3.2. The car uses the front wall distance (x), side wall distance (y),
and the heading angle of the car (θ) as its input variables. Three output
control variables are used: the angle of the front wheels in moving forward
(backward) and a control variable for speed control. For example,
a control rule for steering control in moving forward is:

If x is A, y is B, and θ is C then f = p0 + p1·x + p2·y + p3·θ.

Eighteen control rules are used for the steering control in moving forward
and sixteen control rules are used for the steering control in moving
backward. The input-output data is collected while a human parks the
car and is used in parameter identification (e.g., the identification of
the coefficients p0, p1, p2, and p3) of the above rules. Many successful
experiments have been done using the model car, which is equipped with
a microprocessor and sensing devices.
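The parameter identification step can be viewed, per rule, as a (weighted) least-squares fit of the recorded control actions; the sketch below is only an illustration of that idea, with synthetic data and a made-up Gaussian firing strength standing in for the actual parking records and antecedent membership functions:

```python
import numpy as np

# Synthetic stand-in for recorded (x, y, theta, steering) samples from a
# human driver; in Sugeno's experiments these come from actual parking runs.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 3))               # columns: x, y, theta
true_p = np.array([0.5, -0.3, 0.2, 1.1])                 # hypothetical p0..p3
f = true_p[0] + X @ true_p[1:] + rng.normal(0, 0.05, 200)

# Firing strength of the rule for each sample (illustrative Gaussian match
# of the antecedent "x is A, y is B, theta is C").
w = np.exp(-((X - X.mean(axis=0)) ** 2 / (2 * X.var(axis=0))).sum(axis=1))

# Weighted least squares: minimize sum_k w_k (f_k - [1, x, y, theta] p)^2
Phi = np.hstack([np.ones((len(X), 1)), X])
W = np.sqrt(w)[:, None]
p_hat, *_ = np.linalg.lstsq(W * Phi, np.sqrt(w) * f, rcond=None)
print(p_hat)        # should come out close to true_p
```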

5.4 Sugeno's model helicopter


Sugeno has initiated several projects on applying fuzzy logic control to
the control of a model helicopter. Among these are radio control by
oral instructions, automatic autorotation entry in engine failure cases,
and unmanned helicopter control for sea rescue [48]. Although these
projects have just started, several interesting results have already been
achieved. The input variables from the helicopter include pitch, roll, and
yaw, and their first and second derivatives. The control rules written for
the helicopter regulate the up/down, forward/backward, left/right, and
nose direction. For example, the longitudinal stick controls pitch, and
therefore forward/backward movement of the rotorcraft.
An example of a fuzzy control rule for hovering is the following:

If the body rolls, then


control the lateral in reverse.

Or as another example for hovering control:

If the body pitches, then


control the longitude in reverse.

The helicopter control problems under study in these projects are


challenging control problems and are already producing results which
illustrate the strength of the fuzzy logic control technology.

5.5 Other recent applications


Among the other applications, fuzzy logic control has found use in
household appliances. Examples, which will not be discussed further
here, are: the air conditioner by Mitsubishi; washing machines
by Matsushita and Hitachi; VCRs by Sanyo and Matsushita; the vacuum
cleaner by Matsushita; the palmtop computer by Sony; microwave ovens by
Toshiba, Sharp, Sanyo, and Hitachi; photographic cameras by Canon; and
many others.

6 Learning Fuzzy Logic Controllers


In many controller design tasks, it is important to develop a controller
which can learn from experience to improve its performance. Here we
discuss the research on developing Self Organizing Controllers (SOCs)
and the recent work in using neural networks to provide a learning at-
tribute for the fuzzy logic controllers.

6.1 Self Organizing Controllers


The Self-Organizing Controllers (SOCs) are among the earliest fuzzy
controllers which provide an ability to change control policies with re-
spect to the process and the operating environment. The function of the
SOC is a combined system identification and control task. It infers from
the error and change in error, the change in control action to apply. In
addition to the error and change of error scales supplied by a control
designer, SOC uses a third scale to calculate the actual change in the
process input. In this sense, SOC is similar to the conventional PID
controllers' three gain parameters 4 .
SOCs evaluate their performance using a local measure which assesses
the performance over a small set of plant states and a global criterion
which measures the overall performance. The performance measure re-
sembles a human decision maker who determines the output correction
required from a knowledge of the error and the change of error. How-
ever, the decision tables required by the SOCs have to be generated and
4Proportional, Integral, and Derivative.

stored beforehand. For more complex processes than the ones discussed
by Procyk and Mamdani [44] (i.e., other than single-input single-output
processes), the generation of these decision tables may be difficult.

6.2 Neural Networks in Fuzzy Logic Control


Similarities exist between the neural networks and the fuzzy logic con-
trollers. They both can handle extreme nonlinearities in the system.
Both techniques allow interpolative reasoning which frees us from the
true/false restriction of logical systems such as the ones used in sym-
bolic AI. For example, once a neural network has been trained for a set
of data, it can interpolate and produce answers for the cases not present
in the training data set. Similar properties hold for a fuzzy logic con-
troller. The weighted average scheme of fuzzy control and the sum of
the products of the neural nets are similar in principle.
The main idea in integrating fuzzy logic control with neural networks
is to use the strength of each one collectively in the resulting neuro-fuzzy
control system. This fusion allows:
1. A human understandable expression of the knowledge used in con-
trol in terms of the fuzzy control rules. This reduces the difficulties
in describing the trained neural network which is usually treated
as a black box.
2. The fuzzy controller learns to adjust its performance automatically
using a neural network structure and hence learns by accumulating
experience.
The main emphasis in the research so far has been on automatic de-
sign and fine-tuning of the membership functions used in fuzzy control
through learning by neural networks. Here, we focus on only three hybrid
models but many more references are available (such as the proceedings
of Iizuka-88 and Iizuka-90 conferences [1, 2]).

6.3 Fuzzy Control and AHC


Lee and Berenji [32] and Lee [31] have combined the Adaptive Heuristic
Critic (AHC) model of Barto, Sutton, and Anderson with fuzzy control
to learn the membership functions of the conclusions of the control rules.
This work builds on Barto et al.'s pioneering work on applying neural
networks in control. Two neural-like elements are used in this model.
The Adaptive Heuristic Critic learns by updating the predictions of the
system's failure over time.

[Figure 7: The ARIC architecture. A state evaluation (critic) network, a
two-layer neural net, learns by updating the prediction of failures and produces
an internal reinforcement signal; an action selection network (ASN), also a
two-layer neural net, performs the action determination.]

The integrated fuzzy-AHC model has been
tested in the domain of cart-pole balancing and the results have been
consistently better when compared with the performance of the AHC
model alone (e.g., in terms of speed of learning and smoothness of con-
trol). However, this model is difficult to apply for other control systems
due mainly to the fact that developing the mathematical functions for
the trace function and credit assignment are not trivial. The structure
proposed here suffers from the lack of generality and may be difficult to
apply to larger scale systems.

6.4 Fuzzy Control and two layer networks


Berenji [7, 6] has proposed a Neuro-Fuzzy Controller (ARIC) architec-
ture which extends the Fuzzy-AHC model mentioned earlier. Figure 7
presents the architecture of the proposed ARIC model. In ARIC, two
networks replace the two neural-like elements. These networks are re-
ferred to as the Action-state Evaluation Network (AEN) and Action Se-
lection Network (ASN). The AEN and ASN networks are multi-layered
neural networks which are based on the back propagation algorithm and
reinforcement learning.

6.5 Use of Clustering


Takagi and Hayashi [54] have presented an algorithm for combining neu-
ral networks with fuzzy logic which consists of three parts. The first
part finds a suitable partition of the training data by using a clustering
algorithm. Once the best partition of the data is found, then the sec-
ond part of the algorithm identifies the membership functions of the IF
parts of the rules. The last step of the algorithm is to determine the
amount of control output for each rule. A neural network is used for
the identification of the membership functions in the second step above.
After this network is trained, it assigns the correct rule number to the
combination of different sensor readings. To identify the THEN part of
a rule, a backward elimination method is used. This method arbitrarily
eliminates an input variable, and the neural network of each THEN part
is retrained. This process determines the input variable with minimal
effects.
Takagi and Hayashi's model is very similar to Berenji's work in inductive
learning and fuzzy control [5] where an AI clustering method (C4,
a descendant of Quinlan's ID3 algorithm) is used. In Berenji's model,
a decision theoretic measure is used to decide which input variable the
decision tree should use to branch on next. The number of leaves of the
resulting decision tree will indicate the number of control rules needed.

6.6 Other Research


Among other work in this area is Kosko's work on Fuzzy Cognitive Maps
(FCM) [27]. The FCMs are graphical representations of the causal rela-
tionships between different factors, where the weights on the links rep-
resent the positive or negative causal relationships. Also, Kosko's Fuzzy
Associative Memories (FAM) [28] can map fuzzy input patterns into
stored fuzzy output patterns and hence are useful tools in representing
the knowledge base of a fuzzy controller.
Fuzzy logic controllers and neural network controllers are comple-
mentary, and it is expected that the amount of research into hybrid ap-
proaches will grow significantly in the next few years, especially in Japan,
where many applications have already been reported using a combina-
tion of these methods. In the U.S., NASA has taken an active role in
integrating these two powerful techniques (e.g., sponsorship of the First
and Second International Conferences on Neural Networks and Fuzzy
Logic at Johnson Space Center in 1988 and 1990).

7 Discussion
Among the problems which still deserve serious attention is the problem
of providing proof of stability for FLCs. In contrast to the analytical
control theory, FLC lacks this necessary attribute although some theo-
retical work has begun producing interesting results (e.g., [29, 26, 10, 11,
45, 17, 16, 68, 24, 62, 21, 43, 74, 47, 25]).
Another area that requires attention is what we referred to as fuzzy
modeling of systems earlier in this paper. Here the attention should be
focused on structure identification and parameter identification of the
dynamics of a system in order to develop a model which could later be
used to develop the fuzzy logic controller [56, 50].
Finally, as we briefly discussed in the previous section, artificial neu-
ral networks and fusion techniques are being developed in order to de-
velop fuzzy logic controllers which can learn from experience. Despite
these open issues, fuzzy logic control has achieved a huge commercial
success in recent years. Because these controllers are easy to manufac-
ture and greatly resemble human reasoning, it is expected that there will
be many more applications in the near future.

References
[1] International Conference on Fuzzy Logic & Neural Networks, volumes one and two, Iizuka, Japan, 1988.

[2] International Conference on Fuzzy Logic & Neural Networks, volumes one and two, Iizuka, Japan, 1990.

[3] W.H. Bare, R.J. Mulholland, and S.S. Sofer. Design of a self-tuning
rule based controller for a gasoline refinery catalytic reformer. IEEE
Transactions on Automatic Control, 35(2):156-164, 1990.

[4] H. R. Berenji, Y. Y. Chen, C. C. Lee, S. Murugesan, and J. S. Jang.
An experiment-based comparative study of fuzzy logic control. In
American Control Conference, Pittsburgh, 1989.

[5] H.R. Berenji. Machine learning in fuzzy control. In Int. Conf. on
Fuzzy Logic & Neural Networks, pages 231-234, Iizuka, Fukuoka,
Japan, 1990.

[6] H.R. Berenji. A reinforcement learning based model for fuzzy logic
control. International Journal of Approximate Reasoning, 1991 (to
appear).
[7] H.R. Berenji. An architecture for designing fuzzy controllers using
neural networks. In Second Joint Technology Workshop on Neural
Networks and Fuzzy Logic, Houston, Texas, April 1990.
[8] H.R. Berenji, Y .Y. Chen, C.C. Lee, J.S. Jang, and S. Murugesan.
A hierarchical approach to designing approximate reasoning-based
controllers for dynamic physical systems. In Sixth Conference on
Uncertainty in Artificial Intelligence, pages 362-369, 1990.
[9] J. A. Bernard. Use of rule-based system for process control. IEEE
Control Systems Magazine, 8(5):3-13, 1988.
[10] M. Braae and D.A. Rutherford. Theoretical and linguistic aspects
of the fuzzy logic controller. Automatica, 15, no. 5:553-577, 1979.
[11] Y.Y. Chen. Stability analysis of fuzzy control - a Lyapunov approach.
In IEEE Systems, Man, Cybernetics Annual Conference,
volume 3, pages 1027-1031, 1987.
[12] E. Czogala and T. Rawlik. Modelling of a fuzzy controller with
application to the control of biological processes. Fuzzy Sets and
Systems, 31:13-22, 1989.
[13] J. Efstathiou. Rule-based process control using fuzzy logic. In
E. Sanchez and L.A. Zadeh, editors, Approximate Reasoning in In-
telligence Systems, Decision and Control, pages 145-148. Pergamon,
New York, 1987.
[14] B. P. Graham and R. B. Newell. Fuzzy identification and control of
a liquid level rig. Fuzzy Sets and Systems, 26:255-273, 1988.
[15] B. P. Graham and R. B. Newell. Fuzzy adaptive control of a first
order process. Fuzzy Sets and Systems, 31:47-65, 1989.
[16] M. M. Gupta, G. M. Trojan, and J. B. Kiszka. Controllability of
fuzzy control systems. IEEE Transactions on Systems, Man, and
Cybernetics, SMC-16(4):576-582, 1986.

[17] M. M. Gupta and W. Pedrycz. Cognitive and fuzzy logic controllers:
A retrospective and perspective. In American Control Conference,
pages 2245-2251, 1989.

[18] K. Hirota, Y. Arai, and S. Hachisu. Fuzzy controlled robot arm


playing two-dimensional ping-pong game. Fuzzy Sets and Systems,
32:149-159, 1989.

[19] C. Isik. Identification and fuzzy rule-based control of a mobile robot


motion. In IEEE Int. Symposium on Intelligent Control, 1987.

[20] Y. Kasai and Y. Morimoto. Electronically controlled continuously


variable transmission. In Proc. IntI. Congress on Transportation
Electronics, Dearborn, Michigan, October 1988.

[21] W. J. Kickert and E. H. Mamdani. Analysis of a fuzzy logic con-


troller. Fuzzy Sets and Systems, 1(1):29-44, 1978.

[22] W. J. M. Kickert and H. R. Van Nauta Lemke. Application of a


fuzzy controller in a warm water plant. Automatica,12(4):301-308,
1976.

[23] P. J. King and E. H. Mamdani. The application of fuzzy control


systems to industrial processes. Automatica, 13(3):235-242, 1977.

[24] J.B. Kiszka, M.M. Gupta, and P.N. Nikiforuk. Energistic stability
of fuzzy dynamic systems. IEEE Trans. Systems, Man and Cyber-
netics, SMC-15(6), 1985.

[25] J.B. Kiszka, M.M. Gupta, and G.M. Trojan. Multivariable fuzzy
controller under godel's implication. Fuzzy Sets and Systems,
34:301-321, 1990.
[26] S. V. Komolov, S. P. Makeev, and F. Shaknov. Optimal control
of a finite automaton with fuzzy constraints and a fuzzy target.
Cybernetics, 16(6):805-810, 1979.
[27] B. Kosko. Fuzzy cognitive maps. International Journal of Man-
Machine Studies, 24:65-75, 1986.
[28] B. Kosko. Fuzzy associative memories. In Kandel A., editor, Fuzzy
Expert Systems. Addison-Wesley, 1987.
[29] G.R. Langari and M. Tomizuka. Stability of fuzzy linguistic control
systems. In IEEE Conference on Decision and Control, Hawaii,
December 1990.
[30] L. I. Larkin. A fuzzy logic controller for aircraft flight control. In
M. Sugeno, editor, Industrial Applications of Fuzzy Control, pages
87-104. North-Holland, Amsterdam, 1985.
[31] C.C. Lee. Self-learning rule-based controller employing approximate
reasoning and neural-net concepts. Int. Journal of Intelligent Sys-
tems, 1990.
[32] C.C. Lee and H.R. Berenji. An intelligent controller based on ap-
proximate reasoning and reinforcement learning. In Proc. of IEEE
Int. Symposium on Intelligent Control, Albany, NY, 1989.
[33] Fujitec Co. Ltd. Flex-8800 series elevator group control system.
Technical report, Fujitec Co. Ltd., Osaka, Japan, 1988.
[34] E. M. Scharf and N. J. Mandic. The application of a fuzzy controller
to the control of a multi-degree-of-freedom robot arm. In M. Sugeno,
editor, Industrial Applications of Fuzzy Control, pages 41-62. North-Holland,
Amsterdam, 1985.
[35] S. Mabuchi. An approach to the comparison of fuzzy subsets with
an a-cut dependent index. IEEE Transactions on Systems, Man,
and Cybernetics, 18(2), 1988.
[36] E. H. Mamdani and S. Assilian. An experiment in linguistic syn-
thesis with a fuzzy logic controller. International Journal of Man-
Machine Studies, 7(1):1-13, 1975.
[37] S. Murakami and M. Maeda. Application of fuzzy controller to au-
tomobille speed control system. In M. Sugeno, editor, Industrial
Applications of Fuzzy Control, pages 105-124. North-Holland, Am-
sterdam, 1985.
[38] S. Murakami, F. Takemoto, H. Fujimura, and E. Ide. Weld-line
tracking control of arc welding robot using fuzzy logic controller.
Fuzzy Sets and Systems, 32:221-237, 1989.
[39] A. Ollero and A.J. Garcia-Cerezo. Direct digital control, auto-
tuning and supervision using fuzzy logic. Fuzzy Sets and Systems,
30:135-153, 1989.
[40] H. Ono, T . Ohnishi, and Y. Terada. Combustion control of refuse
incineration plant by fuzzy logic. Fuzzy Sets and Systems, 32:193-
206, 1989.

[41] Rainer Palm. Fuzzy controller for a sensor guided manipulator.


Fuzzy Sets and Systems, 31:133-149, 1989.
[42] C.P. Pappis and E. H. Mamdani. A fuzzy logic controller for a traffic
junction. IEEE Transactions on Systems, Man, and Cybernetics,
SMC-7(10):707-717,1977.
[43] W. Pedrycz. An approach to the analysis of fuzzy systems. Int.
Journal of Control, 34(3):403-421, 1981.
[44] T. J. Procyk and E. H. Mamdani. A linguistic self-organizing process
controller. Automatica, 15(1):15-30, 1979.

[45] K. S. Ray and D. D. Majumder. Application of circle criteria for


stability analysis of linear siso and mimo systems associated with
fuzzy logic controller. IEEE Transactions on Systems, Man, and
Cybernetics, SMC-14(2):345-349, 1984.
[46] Floor Van Der Rhee, Hans R. Van Nauta Lemke, and Jaap G. Dijk-
man. Knowledge based fuzzy control of systems. IEEE Transactions
on Automatic Control,35(2):148-155, 1990.
[47] William Siler and Hao Ying. Fuzzy control theory: The linear case.
Fuzzy Sets and Systems, 33:275-290, 1989.

[48] M. Sugeno. Current projects in fuzzy control. In Workshop on


Fuzzy Control Systems and Space Station Applications, pages 65-
77, Huntington Beach, CA, 14-15 November 1990.
[49] M. Sugeno. An introductory survey of fuzzy control. Information
Science, 36:59-83, 1985.
[50] M. Sugeno and G. T. Kang. Structure identification offuzzy model.
Fuzzy Sets and Systems, 28(1):15-33, 1988.
[51] M. Sugeno and K. Murakami. An experimental study on fuzzy
parking control using a model car. In M. Sugeno, editor, Industrial
Applications of Fuzzy Control, pages 125-138. North-Holland,
Amsterdam, 1985.

[52] M. Sugeno, T. Murofushi, T. Mori, and Tatematsu. Fuzzy algo-


rithmic control of a model car by oral instructions. Fuzzy Sets and
Systems, 32:207-219, 1989.

[53] M. Sugeno and M. Nishida. Fuzzy control of model car. Fuzzy Sets
and Systems, 16:110-113, 1985.
[54] H. Takagi and I. Hayashi. Artificial-neural-network-driven fuzzy
reasoning. Int. J. of Approximate Reasoning, (to appear).

[55] T. Takagi and M. Sugeno. Derivation of fuzzy control rules from


human operator's control actions. In IFAC Symposium on Fuzzy In-
formation, Knowledge Representation and Decision Analysis, pages
55-60, Marseille, France, 1983.

[56] T. Takagi and M. Sugeno. Fuzzy identification of systems and its


applications to modelling and control. IEEE Transactions on Sys-
tems, Man, and Cybernetics, SMC-15(1):116-132, 1985.
[57] M. Takahashi, E. Sanchez, R. Bartolin, J.P. Aurrand-Lions,
E. Akaiwa, T. Yamakawa, and J.R. Monties. Biomedical applications
of fuzzy logic controllers. In Int. Conf. on Fuzzy Logic &
Neural Networks, pages 553-556, Iizuka, Fukuoka, Japan, 1990.

[58] K. L. Tang and R. J. Mulholland. Comparing fuzzy logic with


classical controller designs. IEEE Transactions on Systems, Man,
and Cybernetics, SMC-17(6):1085-1087, 1987.
[59] R. Tanscheit and E. M. Scharf. Experiments with the use of a rule-
based self-organising controller for robotics applications. Fuzzy Sets
and Systems, 26:195-214, 1988.
[60] M. Togai and S. Chiu. A fuzzy accelerator and a programming
environment for real-time fuzzy control. In Second IFSA Congress,
pages 147-151, Tokyo, Japan, 1987.

[61] M. Togai and H. Watanabe. Expert systems on a chip: an engine for


real-time approximate reasoning. IEEE Expert Systems Magazine,
1:55-62, 1986.

[62] R. Tong. Analysis of fuzzy control algorithms using the relation


matrix. International Journal of Man-Machine Studies, 8(6):679-
686, 1976.

[63] R. Tong, M. B. Beck, and A. Latten. Fuzzy control of the activated
sludge wastewater treatment process. Automatica, 16(6):695-701,
1980.

[64] T. Tsukamoto. An approach to fuzzy reasoning method. In M. M.


Gupta, R . K. Ragade, and R. R. Yager, editors, Advances in Fuzzy
Set Theory and Applications. North-Holland, Amsterdam, 1979.

[65] I. G. Umbers and P. J. King. An analysis of human decision-making
in cement kiln control and the implications for automation. International
Journal of Man-Machine Studies, 12(1):11-23, 1980.

[66] M. Uragami, M. Mizumoto, and K. Tanaka. Fuzzy robot controls.


Cybernetics, 6:39-64, 1976.

[67] H. Watanabe and W. Dettloff. Reconfigurable fuzzy logic proces-


sor: a full custom digital vlsi. In Int. Workshop on Fuzzy System
Applications, pages 49-50, Iizuka, Japan, 1988.
[68] Chen-Wei Xu. Analysis and feedback/feedforward control of fuzzy
relational systems. Fuzzy Sets and Systems, 35:105-113, 1990.

[69] Y. Murayama and T. Terano. Optimising control of diesel engine. In
M. Sugeno, editor, Industrial Applications of Fuzzy Control, pages
63-72. North-Holland, Amsterdam, 1985.

[70] O. Yagishita, O. Itoh, and M. Sugeno. Application of fuzzy rea-


soning to the water purification process. In M. Sugeno, editor, In-
dustrial Applications of Fuzzy Control, pages 19-40. North-Holland,
Amsterdam, 1985.

[71] T. Yamakawa. Fuzzy microprocessors - rule chip and defuzzifier


chip. In Int. Workshop on Fuzzy System Applications, pages 51-52,
Iizuka, Japan, 1988.

[72] T. Yamakawa. Intrinsic fuzzy electronic circuits for sixth generation
computer. In M.M. Gupta and T. Yamakawa, editors, Fuzzy
Computing, pages 157-171. Elsevier Science Publishers B.V. (North-Holland),
Amsterdam, 1988.

[73] S. Yasunobu and S. Miyamoto. Automatic train operation by pre-


dictive fuzzy control. In M. Sugeno, editor, Industrial Applications
of Fuzzy Control, pages 1-18. North-Holland, Amsterdam, 1985.

[74] Hao Ying, William Siler, and James J. Buckley. Fuzzy control the-
ory: A nonlinear case. Automatica, 26(3):513-520, 1990.

[75] Y. Yoshikawa, T. Deguchi, and T. Yamakawa. Exclusive fuzzy hard-


ware system for the appraisal of orthodentic results. In Int. Conf.
on Fuzzy Logic & Neural Networks, pages 939-942, Iizuka, Fukuoka,
Japan, 1990.
[76] C. Yu, Z. Cao, and A. Kandel. Application of fuzzy reasoning to
the control of an activated sludge plant. Fuzzy Sets and Systems,
38:1-14, 1990.

[77] L. A. Zadeh. Fuzzy sets. Information and Control, 8:338-353, 1965.

[78] L.A. Zadeh. A fuzzy-set-theoretic interpretation of linguistic hedges.


Journal of Cybernetics, 2:4-34, 1972.
5
METHODS AND APPLICATIONS OF FUZZY MATHEMATICAL PROGRAMMING

H.-J. Zimmermann
RWTH Aachen
Templergraben 55
W-5100 Aachen (Germany)

1. INTRODUCTION

In spite of other uses of the term "mathematical programming", it shall
be interpreted here as it is normally done in Operations Research, i.e. as an
algorithmic approach to solving models of the type

maximize f(x)
such that gi(x) = 0, i = 1,...,m     (1)

Depending on the mathematical character of the objective function,
f(x), and the constraints, gi(x), many types of mathematical programming
algorithms exist, such as linear programming, quadratic programming,
fractional programming, convex programming, etc. As an example, we shall use
the simplest and most commonly used type, i.e. linear programming, which
focusses on the model

maximize f(x) = z = c^T x
such that Ax ≤ b
x ≥ 0     (2)

with c, x ∈ R^n, b ∈ R^m, A ∈ R^{m×n}.

In this model it is normally assumed that all coefficients of A, b, and c
are real (crisp) numbers, that "≤" is meant in a crisp sense, and that "maximize"

is a strict imperative. This also implies that the violation of any single
constraint renders the solution infeasible and that all constraints are of equal
importance (weight). Strictly speaking, these are rather unrealistic
assumptions, which are partly relaxed in "fuzzy linear programming".

If we assume that the LP-decision has to be made in a fuzzy
environment, quite a number of possible modifications of (2) exist. First of all,
the decision maker might not really want to actually maximize or minimize the
objective function. Rather, he might want to reach some aspiration levels which
might not even be definable crisply. Thus he might want to "improve the
present cost situation considerably," and so on.
Secondly, the constraints might be vague in one of the following ways:
the ≤ sign might not be meant in the strictly mathematical sense, but smaller
violations might well be acceptable. This can happen if the constraints
represent aspiration levels as mentioned above or if, for instance, the
constraints represent sensory requirements (taste, color, smell, etc.) which
cannot adequately be approximated by a crisp constraint. Of course, the
coefficients of the vectors b or c or of the matrix A itself can have a fuzzy
character, either because they are fuzzy in nature or because perception of them
is fuzzy.
Finally, the role of the constraints can be different from that in classical
linear programming, where the violation of any single constraint by any amount
renders the solution infeasible. The decision maker might accept small
violations of different constraints. Fuzzy linear programming offers a number
of ways to allow for all those types of vagueness, and we shall discuss some of
them below.

Before we develop a specific model of linear programming in a fuzzy


environment it should have become clear, that by contrast to classical linear
programming "fuzzy linear programming" is not a uniquely defined type of
model but that many variations are possible, depending on the assumptions or
features of the real situation to be modelled.
Essentially two "families" of models can be distinguished: One
interprets "fuzzy mathematical programming" as a specific decision making
environment to which Bellman and Zadeh's definition of a "decision in fuzzy
environments" [1970] can be applied. The other considers components of
model (2) as fuzzy, makes certain assumptions, for instance, about the type of
fuzzy sets which, as fuzzy numbers, replace the crisp coefficients in A, b, or c, and
then solves the resulting mathematical problem. The former approach seems to
us the more application oriented one. From experience in applications a
decision maker seems to find it much easier to describe fuzzy constraints or to
establish aspiration levels for the objective(s) than to specify a large number of
fuzzy numbers for A, b, or c. We shall, therefore, first describe the first
approach and then elaborate on the other approaches.

2. Fuzzy Mathematical Programming

2a Symmetric Fuzzy Linear Programming

As mentioned above Fuzzy LP is considered as a special case of a


decision in a fuzzy environment. The basis in this case is the definition
suggested by Bellman and Zadeh [1970]:

Definition 1:

Assume that we are given a fuzzy goal G and a fuzzy constraint C in a
space of alternatives X. Then G and C combine to form a decision D, which is
a fuzzy set resulting from the intersection of G and C. In symbols, D = G ∩ C, and
correspondingly

μ_D = min{μ_G, μ_C}.

More generally, suppose that we have n goals G1,...,Gn and m
constraints C1,...,Cm. Then the resultant decision is the intersection of the
given goals G1,...,Gn and the given constraints C1,...,Cm. That is,

D = G1 ∩ G2 ∩ ... ∩ Gn ∩ C1 ∩ C2 ∩ ... ∩ Cm

and correspondingly

μ_D = min_{i,j} {μ_Gi, μ_Cj} = min_j {μ_j}.

This definition implies:

1. The "and" connecting goals and constraints in the model corresponds to
the "logical and".
2. The logical "and" corresponds to the set-theoretic intersection.
3. The intersection of fuzzy sets is defined in the possibilistic sense by the
min-operator.

For the time being we shall accept these assumptions. An important
feature of this model is also its symmetry, i.e. the fact that, eventually, it does
not distinguish between constraints and objectives. This feature is not
considered adequate by all authors (see, for instance, Asai et al. [1975]). We
feel, however, that this models quite well the real behaviour of decision makers.

If we assume that the decision maker can establish in model (2) an
aspiration level, z, of the objective function, which he wants to achieve as far as
possible, and if the constraints of this model can be slightly violated - without
causing infeasibility of the solution - then model (2) can be written as

Find x
such that c^T x ≳ z
Ax ≲ b
x ≥ 0     (3)

Here ≲ denotes the fuzzified version of ≤ and has the linguistic
interpretation "essentially smaller than or equal"; ≳ denotes the fuzzified
version of ≥ and has the linguistic interpretation "essentially greater than or
equal." The objective function in (2) might have to be written as a minimizing
goal in order to consider z as an upper bound.

We see that (3) is fully symmetric with respect to objective function
and constraints, and we want to make that even more obvious by substituting
(−c^T, A)^T = B and (−z, b)^T = d. Then (3) becomes:

Find x
such that Bx ≲ d
x ≥ 0     (4)

Each of the (m + 1) rows of (4) shall now be represented by a fuzzy set,
the membership functions of which are μ_i(x). The membership function of
the fuzzy set "decision" of model (4) is

μ_D(x) = min_i {μ_i(x)}     (5)

μ_i(x) can be interpreted as the degree to which x fulfills (satisfies) the fuzzy
inequality B_i x ≲ d_i (where B_i denotes the ith row of B).

Assuming that the decision maker is interested not in a fuzzy set but in
a crisp "optimal" solution, we could suggest the "maximizing solution" to (5),
which is the solution to the possibly nonlinear programming problem

max_{x≥0} min_i {μ_i(x)} = max_{x≥0} μ_D(x)     (6)

Now we have to specify the membership functions μ_i(x). μ_i(x) should
be 0 if the constraints (including the objective function) are strongly violated, and 1
if they are very well satisfied (i.e., satisfied in the crisp sense); and μ_i(x) should
increase monotonously from 0 to 1, that is:

μ_i(x) = 1 if B_i x ≤ d_i
μ_i(x) ∈ (0, 1) if d_i < B_i x ≤ d_i + p_i,  i = 1,...,m+1
μ_i(x) = 0 if B_i x > d_i + p_i     (7)
Using the simplest type of membership function, we assume it to be linear
over the "tolerance interval" p_i:

μ_i(x) = 1 if B_i x ≤ d_i
μ_i(x) = 1 − (B_i x − d_i)/p_i if d_i < B_i x ≤ d_i + p_i,  i = 1,...,m+1
μ_i(x) = 0 if B_i x > d_i + p_i     (8)

The p_i are subjectively chosen constants of admissible violations of the
constraints and the objective function. Substituting (8) into (6) yields, after
some rearrangements [Zimmermann 1976] and with some additional
assumptions,

max_{x≥0} min_i ( 1 − (B_i x − d_i)/p_i )     (9)

Introducing one new variable, λ, which corresponds essentially to (5),
we arrive at

maximize λ
such that λp_i + B_i x ≤ d_i + p_i,  i = 1,...,m+1
x ≥ 0     (10)

If the optimal solution to (10) is the vector (λ°, x°), then x° is the
maximizing solution (6) of model (2), assuming membership functions as
specified in (8).
The reader should realize that this maximizing solution can be found
by solving one standard (crisp) LP with only one more variable and one more
constraint than model (4). This makes this approach computationally very
efficient.
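As an illustration of how small the crisp equivalent is, the following sketch builds model (10) for a hypothetical fuzzy LP (the data are invented, not an example from this chapter) and solves it with scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical fuzzy LP (not from the text): maximize 2*x1 + x2 with an
# aspiration level z = 10, fuzzy constraints x1 + 3*x2 <~ 12 and
# 4*x1 + x2 <~ 8, and tolerances p = (3, 4, 2) for the three rows of (4).
B = np.array([[-2.0, -1.0],     # row 0: -c^T (objective treated as a >= z goal)
              [ 1.0,  3.0],
              [ 4.0,  1.0]])
d = np.array([-10.0, 12.0, 8.0])
p = np.array([3.0, 4.0, 2.0])

# Model (10): maximize lambda s.t. lambda*p_i + B_i x <= d_i + p_i, x >= 0.
# The decision vector is (x1, x2, lam); linprog minimizes, so use -lam.
A_ub = np.hstack([B, p[:, None]])
b_ub = d + p
A_ub = np.vstack([A_ub, [0.0, 0.0, 1.0]])   # lambda <= 1
b_ub = np.append(b_ub, 1.0)

res = linprog(c=[0.0, 0.0, -1.0], A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * 3)
x1, x2, lam = res.x
print(f"maximizing solution x = ({x1:.3f}, {x2:.3f}), lambda = {lam:.3f}")
```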
A slightly modified version of models (9) and (10), respectively, results
if the membership functions are defined as follows: a variable t_i, i = 1,...,m+1,
0 ≤ t_i ≤ p_i, is defined which measures the degree of violation of the ith
constraint. The membership function of the ith row is then

μ_i(x) = 1 − t_i / p_i     (11)

The crisp equivalent model is then

maximize λ
such that λp_i + t_i ≤ p_i,  i = 1,...,m+1
B_i x − t_i ≤ d_i
t_i ≤ p_i
x, t ≥ 0     (12)

This model is larger than model (10), even though the set of constraints
t_i ≤ p_i is actually redundant. Model (12) has some advantages, however, in
particular when performing sensitivity analysis.
The main advantage, compared to the unfuzzy problem formulation, is
the fact that the decision maker is not forced into a precise formulation
for mathematical reasons, even though he might only be able or willing
to describe his problem in fuzzy terms. Linear membership functions are
obviously only a very rough approximation. Membership functions which
monotonically increase or decrease, respectively, in the interval [d_i, d_i + p_i]
can also be handled quite easily, as will be shown later.
It should also be observed that the classical assumption of equal
importance of constraints has been relaxed: the slope of the membership
functions determines the "weight" or importance of the constraint. The slopes,
however, are determined by the p_i's: the smaller the p_i, the higher the
importance of the constraint. For p_i = 0 the constraint becomes crisp, i.e. no
violation is allowed.

So far, the objective function as well as all constraints were considered
fuzzy. If some of the constraints are crisp, Dx ≤ b', then these constraints can
easily be added to formulations (10) or (12), respectively. Thus (10) would, for
instance, become:

maximize λ
such that λp_i + B_i x ≤ d_i + p_i,  i = 1,...,m+1
Dx ≤ b'
x, λ ≥ 0     (13)

2B LINEAR PROGRAMS WITH FUZZY CONSTRAINTS AND CRISP OBJECTIVE FUNCTIONS

So far, it has been assumed that the objective function could be


calibrated by a given z and then formulated as a fuzzy set, resulting in the
symmetrical model formulation. It might, however, not be possible to find in a
natural way the required z. In this case the symmetry of the model can be
gained by applying a specialisation of Zadeh's "maximizing set" to the objective
function:

Definition 2: [Werners 1984]

Let f: X → R^1 be the objective function, R the fuzzy feasible region, S(R) the
support of R, and R_1 the α-level cut of R for α = 1. The membership function of
the goal (objective function), given the solution space R, is then defined as

μ_G(x) = 0 if f(x) ≤ sup_{R_1} f
μ_G(x) = (f(x) − sup_{R_1} f) / (sup_{S(R)} f − sup_{R_1} f) if sup_{R_1} f < f(x) < sup_{S(R)} f
μ_G(x) = 1 if sup_{S(R)} f ≤ f(x)

The corresponding membership function in the space of objective function values is then

μ_G(r) := sup_{x ∈ f^{-1}(r)} μ_G(x) if f^{-1}(r) ≠ ∅
μ_G(r) := 0 else     (13A)

Adding this fuzzy set to the fuzzy sets defining the solution space gives
again a symmetrical model, to which (10) or (12) can be applied. Definition 2
becomes easier to understand if we apply it to a specific given LP structure:

Let us modify (3) by adding a set of crisp constraints, Dx ≤ b', and
changing the objective function to maximize f(x). This yields the model

maximize f(x) = c^T x
such that Ax ≲ b
Dx ≤ b'
x ≥ 0     (14)

where the fuzzy constraints, the crisp constraints, and the nonnegativity
conditions together define the feasible region R.

Let the membership functions of the fuzzy sets representing the fuzzy
constraints be defined in analogy to (8) as

μ_i(x) = 1 if A_i x ≤ b_i
μ_i(x) = 1 − (A_i x − b_i)/p_i if b_i < A_i x ≤ b_i + p_i
μ_i(x) = 0 if A_i x > b_i + p_i     (15)

On the basis of the two following LPs, the membership function of the
fuzzy set defined in Definition 2 can then easily be determined:

maximize f(x) = c^T x
such that Ax ≤ b
Dx ≤ b'
x ≥ 0     (16)

The optimal solution of this model is f_1 = sup_{R_1} f = (c^T x)_opt.

maximize f(x) = c^T x
such that Ax ≤ b + p
Dx ≤ b'
x ≥ 0     (17)

The optimal solution of this model is f_0 = sup_{S(R)} f = (c^T x)_opt.

The membership function is therefore

μ_G(x) = 1 if f_0 ≤ c^T x
μ_G(x) = (c^T x − f_1) / (f_0 − f_1) if f_1 < c^T x < f_0
μ_G(x) = 0 if c^T x ≤ f_1     (18)

The "equivalent" model to (14) is therefore:

maximize .oX.

such that A(fo - f1) - cTx s -fl

AP+ Axsb+p
Oxsb'
sl
A,X~O (19)
105

Example:

Consider the LP-model

maximize 2x1 + x2
such that x1 ≲ 3
x1 + x2 ≲ 4
0.5x1 + x2 ≲ 3
x1, x2 ≥ 0

The "tolerance intervals" of the constraints are p1 = 6, p2 = 4, p3 = 2.
f1 and f0 can be determined to be 7 and 16, respectively. Hence, model (19) is

maximize λ
such that 9λ − 2x1 − x2 ≤ −7
6λ + x1 ≤ 9
4λ + x1 + x2 ≤ 8
2λ + 0.5x1 + x2 ≤ 5
λ ≤ 1
λ, x1, x2 ≥ 0

The solution to this problem is x1° = 5.84, x2° = 0, λ° = 0.52.


Some authors suggest not using a "symmetrical" approach but rather
computing a fuzzy set "decision". They compute the optimal values of the
objective function for all α-level sets of the solution space. The membership
function of the "decision" is then defined to be the α's corresponding to the
respective optimal values of the objective function [Orlovski 1977].

In a certain sense this philosophy is similar to that of those authors
who suggest determining for model (4) not a crisp solution (6) but the fuzzy set
decision. To do that, a parametric linear programming problem has to be solved
[Chanas 1983]. Even though this approach leads to quite impressive results in
the two-dimensional case, it is rather questionable whether the decision maker
can make use of it in a realistically sized problem.

2C EXTENSIONS

So far, two major assumptions have been made in order to arrive at


"equivalent models" which can be solved efficiently by standard LP-methods:

1. Linear membership functions were assumed for all fuzzy sets involved.
2. The use of the minimum-operator for the aggregation of fuzzy sets was
considered to be adequate.

The relaxation of these two assumptions leads to complications whose severity
differs depending on the type of relaxation:

1. Nonlinear membership functions

The linear membership functions used so far could all be defined by
fixing two points, the upper and lower aspiration levels or the two bounds of
the tolerance interval. The most obvious way to handle nonlinear membership
functions is probably to approximate them piecewise by linear functions. Some
authors [Hannan 1981; Nakamura 1985] have used this approach and shown
that the resulting equivalent crisp problem is still a standard linear
programming problem.
This problem, however, can be considerably larger than model (10),
because in general one constraint will have to be added for each "linear piece"
of the approximation. Quite often S-shaped membership functions have been
suggested, particularly if the membership function is interpreted as a kind of
utility function (representing the degree of satisfaction, acceptance, etc.).
Leberling [1981], for instance, suggests such a function, which is also uniquely
determined by two parameters. He suggests

μ_H(x) = (1/2)·tanh((x − (a + b)/2)·α) + 1/2

with a, b, α ≥ 0. This hyperbolic function has the following formal
properties:
properties:

I'H(X) is strictly monotonously increasing.

1 a+b
I'H(x) =- where x =
2 2

I'H(X) is strictly convex on [- co, (a + b)/2] and strictly concave on [(a + b)/2, +
co ].

For all x e IR: = 0 < I'H(x) < 1 and I'H(x) approaches asymptotically f(x) =0
and f(x) = 1, respectively.

Leberling shows that, choosing as lower and upper aspiration levels for
the fuzzy objective function z = cx of an LP the values a (lower bound of z) and b
(upper limit of the objective function), and representing this (fuzzy) goal by a
hyperbolic function, one arrives at the following crisp equivalent problem for
one fuzzy goal and all crisp constraints:

maximize λ

such that λ − (1/2)·(e^{Z'(x)} − e^{−Z'(x)}) / (e^{Z'(x)} + e^{−Z'(x)}) ≤ 1/2
Dx ≤ b'
x, λ ≥ 0     (20)

with Z'(x) = (Σ_j c_j x_j − (1/2)(a + b))·α. For each additional fuzzy goal or
constraint one of these exponential rows has, of course, to be added to (20).
constraint one of these exponential rows has, of course, to be added to (20).
For x_{n+1} = tanh^{-1}(2λ − 1), model (20) is equivalent to the following
linear model:

maximize x_{n+1}
such that α·Σ_j c_j x_j − x_{n+1} ≥ (1/2)·α·(a + b)
Dx ≤ b'
x_{n+1}, x ≥ 0     (21)

This is again a standard linear programming model which can be solved, for
instance, by any available simplex code.
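A small numerical check of this substitution, using the hyperbolic membership function as reconstructed above and made-up values for a, b, and the slope parameter α, shows that the nonlinear row of (20) and its linearized counterpart in (21) are satisfied or violated together:

```python
import numpy as np

# Illustration of the S-shaped (hyperbolic) membership function and of the
# atanh substitution that linearizes it; a, b and alpha are invented values.
a, b, alpha = 5.0, 15.0, 0.6          # lower/upper aspiration level, slope

def mu_h(z):
    """mu_H(z) = 1/2 * tanh(alpha * (z - (a + b)/2)) + 1/2."""
    return 0.5 * np.tanh(alpha * (z - (a + b) / 2)) + 0.5

z = 11.0                              # hypothetical objective value c^T x
lam = 0.7
# The constraint "lam <= mu_H(z)" and its linear form after substituting
# x_{n+1} = atanh(2*lam - 1) hold (or fail) together, since tanh is
# strictly increasing:
nonlinear_ok = lam <= mu_h(z)
linear_ok = np.arctanh(2 * lam - 1) <= alpha * (z - (a + b) / 2)
print(mu_h(z), nonlinear_ok, linear_ok)   # the two booleans agree
```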
The above equivalence between models with nonlinear membership
functions is not accidental. It has been proven that the following relationship
holds [Werners 1984, p. 143].

Theorem 1

Let {f_k}, k = 1,...,K, be a finite family of functions f_k: R^n → R^1, x ∈ X ⊂ R^n,
let g: R^1 → R^1 be strictly monotonously increasing, and let λ, λ' ∈ R. Consider
the two mathematical programming problems

maximize λ
such that λ ≤ f_k(x), k = 1,...,K
x ∈ X     (22)

maximize λ'
such that λ' ≤ g(f_k(x)), k = 1,...,K
x ∈ X     (23)

If there exists a λ° ∈ R such that (λ°, x°) is the optimal solution of (22), then
there exists a λ'° ∈ R such that (λ'°, x°) is the optimal solution of (23).

Theorem 1 suggests that quite a number of nonlinear membership
functions can be accommodated easily. Unfortunately, the same optimism is not
justified concerning other aggregation operators.
The computational efficiency of the approaches mentioned so far has
rested to a large extent on the use of the min-operator as a model for the
logical "and" or the intersection of fuzzy sets, respectively. Axiomatic
[Hamacher 1978] as well as empirical [Thole, Zimmermann, Zysno 1979;
Zimmermann, Zysno 1980, 1983] investigations have shed some doubt on the
general use of the min-operator in decision models. Quite a number of context-free
or context-dependent operators have been suggested in the meantime [see,
e.g., Zimmermann 1990b, ch. 3]. The disadvantage of these operators is,
however, that the resulting crisp equivalent models are no longer linear [see,
e.g., Zimmermann 1978, p. 45], which reduces the computational efficiency of
these approaches considerably or even renders the equivalent models
unsolvable within acceptable time limits. There are, however, some exceptions
to this rule, and we will present two of them in more detail.
One of the objections against the min-operator (see, for instance,
Zimmermann and Zysno [1980]) is the fact that neither the logical "and" nor
the min-operator is compensatory, in the sense that increases in the degree of
membership in the fuzzy sets being "intersected" might not influence at all
the membership in the resulting fuzzy set (aggregated fuzzy set or intersection).
There are two quite natural ways to cure this weakness:

1. Combine the (limitational) min-operator as a model for the logical "and"
with the fully compensatory max-operator as a model for the inclusive "or". For
the former, the product operator might be used alternatively, and for the latter
the algebraic sum might be used. This approach departs from the strict
distinction between "and" and "or"; the resulting aggregation lies somewhere
between the "and" and the "or" (therefore it is often called a compensatory and).
2. Stick with the distinction between "and" and "or" aggregators and
introduce a certain degree of compensation into these connectives.

Compensatory "and", For some applications it seems to be important that the


aggregator used maps above the max-operator and below the min-operator.
The A-operator [Zimmermann, Zysno 1980] would be such a connective. For
109

purposes of mathematical programming it has, however, the above-mentioned


disadvantage of low computational efficiency. An acceptable compromise
between empirical fit and computational efficiency seems to be the convex
combination of the min-operator and the max-operator:
m m
J'c(x) = 'Y ~in J'i(x) + (1 - 'Y) ~ax J'i(x) 'Y e [0, 1]
.-1 .-1
(24)

For determining the maximizing decision, the following problem has to be
solved:

maximize γλ1 + (1 − γ)λ2
such that λ1 ≤ μ_i(x), i = 1,...,m
λ2 ≤ μ_i(x) for at least one i ∈ {1,...,m}
x ∈ X

or

maximize γλ1 + (1 − γ)λ2
such that λ1 ≤ μ_i(x), i = 1,...,m
λ2 ≤ μ_i(x) + M·y_i, i = 1,...,m
Σ_{i=1}^{m} y_i ≤ m − 1
y_i ∈ {0, 1}, M a very large real number
x ∈ X     (25)

For linear membership functions of the goals and the constraints, (25) is a
mixed integer linear program that can be solved by the appropriate available
codes.
If one wants to distinguish between an "and"-aggregation and an "or"-aggregation
(for instance, for the sake of easier modelling), one may want to use
the following operators:

Definition 2 [Werners 1984]

Let μ_i(x) be the membership functions of fuzzy sets which are to be aggregated
in the sense of a fuzzy and. The membership function of the resulting
fuzzy set is defined to be

μ_and(x) = γ·min_{i=1..m} μ_i(x) + (1 − γ)·(1/m)·Σ_{i=1}^{m} μ_i(x)

with γ ∈ [0, 1].

Definition 3 [Werners 1984]

Let μ_i(x) be membership functions of fuzzy sets to be aggregated in the sense of
a fuzzy or. The membership function of the resulting fuzzy set is then
defined as

μ_or(x) = γ·max_{i=1..m} μ_i(x) + (1 − γ)·(1/m)·Σ_{i=1}^{m} μ_i(x)

with γ ∈ [0, 1].

These two connectives are neither inductive nor associative, but they are
commutative, idempotent, strictly monotonically increasing in each component,
continuous, and compensatory [Werners 1984, p. 168]. These are certainly very
useful and acceptable properties.
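A small numerical sketch of the two connectives (the membership grades are chosen arbitrarily); γ = 1 recovers the pure min and max operators:

```python
# "Fuzzy and" / "fuzzy or" of Werners: convex combinations of min (resp. max)
# with the arithmetic mean of the membership grades.

def fuzzy_and(mus, gamma):
    return gamma * min(mus) + (1 - gamma) * sum(mus) / len(mus)

def fuzzy_or(mus, gamma):
    return gamma * max(mus) + (1 - gamma) * sum(mus) / len(mus)

mus = [0.8, 0.6, 0.3]                 # illustrative membership grades
for gamma in (1.0, 0.5, 0.0):
    print(gamma, fuzzy_and(mus, gamma), fuzzy_or(mus, gamma))
```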
If we use the aggregation operator from Definition 2 in model (4), then
the "equivalent model" is:

maximize γλ + (1 − γ)·(1/m)·Σ_{i=1}^{m} λ_i
such that λ + λ_i ≤ μ_i(x), i = 1,...,m
Dx ≤ d
x, λ, λ_i ≥ 0     (26)

If (λ°, λ_i°, x°) is an optimal solution of (26), then x° is a maximizing solution to (3).
It is obvious that if the μ_i(x) are linear, (26) is again a standard linear programming
problem.

So far, the reference model from which we have departed has always
been the "standard LP". Depending on the type of membership function and the
aggregation operator used, the "equivalent model" turns out to be either a linear or a
nonlinear programming model. Obviously, other reference models can be
chosen. This has already been done, for instance, for
integer programming [Zimmermann, Pollatschek 1984; Ignizio et al. 1983],
fractional programming [Luhandjula 1984], nonlinear programming [Sakawa
et al. 1989], etc. The interrelationships between stochastic and fuzzy
programming and their possible integration have also been
investigated [Buckley 1990; Dubois 1986].

3. FUZZY MATHEMATICAL PROGRAMMING WITH FUZZY PARAMETERS

Already for the basic approach described in section 2, a unique
formulation for the "equivalent model", which eventually has to be solved, did
not exist - the diversity of algorithmic approaches is even larger if other types of
fuzzification of elements of mathematical programming models are considered.
To demonstrate the basic idea behind most of the approaches we shall describe
an easy-to-understand suggestion. For more general models the reader has to be
referred to the literature.
Ramik and Rimanek [1985] consider the problem

maximize f(x)
such that ã_i1 x_1 ⊕ ã_i2 x_2 ⊕ ... ⊕ ã_in x_n ≲ b̃_i,  i = 1,...,m
x_j ≥ 0,  j = 1,...,n     (27)

The ã_ij and the b̃_i are supposed to be fuzzy numbers in L-R
representation; ⊕ denotes the extended addition. They show that for two fuzzy
L-R numbers ã = (m, n, α, β)_{L-R} and b̃ = (p, q, γ, δ)_{L-R}, ã ≲ b̃ holds iff the
following 4 inequalities hold:

−ε_L(α − γ) ≤ p − m
−δ_L(α − γ) ≤ p − m
ε_R(β − δ) ≤ q − n
δ_R(β − δ) ≤ q − n     (28)

where ε_R = sup{u; R(u) = R(0) = 1}, δ_R = inf{u; R(u) = lim_{s→∞} R(s)},
and ε_L, δ_L are defined correspondingly for L.

For "symmetric" fuzzy numbers a = (m, m, 0:, 0: k-L as shown in fig. 1 system
(28) reduces to

(29)

o m -(}' m m+{) t

FIG. 1: Fuzzy triangular number a = (m, m, 0:, ,8k-L

On the basis of a lemma that they prove in their paper:

(30)

Hence, the constraints of (27) can be written as

−ε_L(Σ_{j=1}^{n} α_ij x_j − γ_i) ≤ p_i − Σ_{j=1}^{n} m_ij x_j,
−δ_L(Σ_{j=1}^{n} α_ij x_j − γ_i) ≤ p_i − Σ_{j=1}^{n} m_ij x_j,
ε_R(Σ_{j=1}^{n} β_ij x_j − δ_i) ≤ q_i − Σ_{j=1}^{n} n_ij x_j,
δ_R(Σ_{j=1}^{n} β_ij x_j − δ_i) ≤ q_i − Σ_{j=1}^{n} n_ij x_j     (31)

(31) is a system of crisp linear inequalities which - together with the crisp
objective function - can now be solved with any classical LP-method. Not
counting the nonnegativity constraints, the number of rows in (31) is, however,
four times as large as that of (27). It should also be noted that (28) is a specific
interpretation of the fuzzy inequality relation. The authors offer two other
interpretations, which lead to slightly different results.

Example 2 [Ramik, Rimanek 1985]

Consider the following linear programming problem with fuzzy constraints.

Maximize z = 5x1 + 4x2

subject to (4, 4, 2, 1)_{L-L} x1 ⊕ (5, 5, 3, 1)_{L-L} x2 ≲ (24, 24, 5, 8)_{L-L},

(4, 4, 1, 2)_{L-L} x1 ⊕ (1, 1, 0.5, 1)_{L-L} x2 ≲ (12, 12, 6, 3)_{L-L}     (32)

with the function L: [0, +∞) → [0, 1] defined by the formula L(u) = max{0, 1 − u}
for u ≥ 0.

As ε_L = 0 and δ_L = 1, applying formulae (31), the system (32) is
equivalent to the system of ordinary inequalities

4x1 + 5x2 ≤ 24,
4x1 + x2 ≤ 12,
2x1 + 2x2 ≤ 19,
3x1 + 0.5x2 ≤ 6,
5x1 + 6x2 ≤ 32,
6x1 + 2x2 ≤ 15,
x1, x2 ≥ 0     (33)

In this way, the problem has been transformed into a classical linear
programming problem, with the optimal solution and the corresponding value
of the objective function being

x1 = 1.5, x2 = 3, z = 19.5     (34)
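The transformation (31) is mechanical enough to automate. The sketch below (a minimal illustration, not from the chapter) rebuilds the crisp system (33) from the fuzzy coefficients in (32), using ε_L = ε_R = 0 and δ_L = δ_R = 1 for L(u) = R(u) = max{0, 1 − u}, and drops the rows that ε = 0 turns into duplicates of the mean constraints:

```python
# Rebuild the crisp inequalities (31)/(33) from the L-R fuzzy data of (32).
# Each fuzzy number is (m, n, alpha, beta); here m = n (triangular case).
eps_L = eps_R = 0.0        # for L(u) = R(u) = max(0, 1 - u)
del_L = del_R = 1.0

# fuzzy constraint rows: list of coefficient fuzzy numbers and fuzzy RHS
rows = [
    ([(4, 4, 2, 1), (5, 5, 3, 1)], (24, 24, 5, 8)),
    ([(4, 4, 1, 2), (1, 1, 0.5, 1)], (12, 12, 6, 3)),
]

crisp = []   # list of (coefficients, rhs) meaning sum_j coeff_j * x_j <= rhs
for coeffs, (p, q, gam, dlt) in rows:
    m = [c[0] for c in coeffs]; n = [c[1] for c in coeffs]
    al = [c[2] for c in coeffs]; be = [c[3] for c in coeffs]
    for e in (eps_L, del_L):     # left-spread conditions of (31)
        crisp.append(([mi - e * ai for mi, ai in zip(m, al)], p - e * gam))
    for e in (eps_R, del_R):     # right-spread conditions of (31)
        crisp.append(([ni + e * bi for ni, bi in zip(n, be)], q + e * dlt))

# drop duplicates (eps = 0 repeats the "mean" rows) and print the system
seen = []
for coeff, rhs in crisp:
    if (coeff, rhs) not in seen:
        seen.append((coeff, rhs))
        print(" + ".join(f"{c}*x{j+1}" for j, c in enumerate(coeff)), "<=", rhs)
```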

Complementary approaches to the one described above, with respect to
section 2, are those of Tanaka et al. [1984, 1985]. There, similar assumptions
concerning the fuzzy sets are made, but the objective function(s) and the
nonnegativity constraints are also fuzzified. Rommelfanger [1989] also goes
in this direction.
More general treatments of this problem can be found in Delgado et
al. [1989], Dubois [1987], Orlovski [1989] and others.

4. APPLICATIONS
Fuzzy Mathematical Programming has been applied both to other areas of
theoretical investigation and to practical problems.

4A METHODOLOGICAL APPLICATIONS
Due to the "symmetry" of the majority of the models in FMP, the number of
objective functions does not matter. In classical mathematical programming,
however, normally only one objective function, which generates the order over
the solution space, can be accepted. If there is more than one objective
function, multiobjective decision making models or "vectorial optimization"
models have to be applied, which normally require a much higher
computational effort. It is, therefore, quite natural that FMP has been applied
extensively to the area of multicriteria analysis.

If the objective function can be calibrated by given aspiration levels,
either model (4) can be used directly or the approach of Tanaka et al. [1985]
can be used for fuzzy parameters. If the objective functions cannot be calibrated
naturally, model (19) can be used. This is even possible if the constraints are
crisp rather than fuzzy [Zimmermann 1978].
Modern systems for multiobjective decision making are generally
interactive. FMP has also been applied to these decision making tools [Sakawa
et al. 1990; Yano et al. 1989].
An interesting application of FMP is Campos' contribution [1989], in
which zero-sum matrix games with imprecise payoffs are considered.
Somewhere between methodological applications and real applications
(which have been installed and used in practice) are those applications of
FMP in which solutions to functional problems have been suggested but not
(yet) really been implemented. Examples of this type of application are
described in Wiedey, Zimmermann [1978] (Media Selection), Ernst [1982]
(Logistics), Holtz, Desonski [1981] (Maintenance), Hintz, Zimmermann [1989]
(Production Planning), Nickels [1990] (Cutting Stock Problem), etc.

4B PRACTICAL APPLICATIONS

Real applications of fuzzy mathematical programming are still pretty rare. This
is certainly not due to weaknesses of FLP. Experience shows that people who
have been using linear programming for quite a while have become so used to
"cutting" problems to fit LP-models that they do not see the need to allow for
uncertainty. The acceptance of FLP seems to be higher amongst people who
have never used LP before and who are looking for tools to solve their
problems properly. Another reason for not finding interesting applications in
the literature is, of course, that good applications are not published for
competitive reasons and failures are not published for other obvious reasons.
FLP has been applied to blending problems with sensory constraints (such as
the blending of chocolate stretch, champagne cuvée, paints, etc.). The paint
application was published [Zimmermann et al. 1986]. Another application was
in logistics by Ernst [1982], which we will sketch in the following. He suggests a
fuzzy model for the determination of time schedules for containerships, which
can be solved by branch and bound, and a model for the scheduling of
containers on containerships, which results eventually in an LP. We shall only
consider the latter model (a real project).
In a realistic setting the model contained approximately 2,000
constraints and originally 21,000 variables, which could then be reduced to
approximately 500 variables. Thus it could be handled adequately on a modern
computer. It is obvious, however, that a description of this model in a textbook
would not be possible. We shall, therefore, sketch the contents of the model
verbally and then concentrate on the aspects that involved fuzziness.
The system is the core of a decision support system for the purpose of
properly scheduling the inventory, movement, and availability of containers,
especially empty containers, in and between 15 harbors. The containers were
shipped according to known time schedules on approximately 10 big
containerships worldwide on 40 routes. The demand for container space in the
harbors was to a high extent stochastic. Thus the demand for empty containers
in different harbors could either be satisfied by large inventories of empty
containers in all harbors, causing high inventory costs, or the containers could
be shipped from their locations to the locations where they were needed,
causing high shipping costs and time delays.
Thus the system tries primarily to control optimally the movements
and inventories of empty containers, taking into account the capacities of the
ships and the predetermined time schedules of the ships.
This problem was formulated as a large LP model. The objective
function maximized profit (from shipping full containers) minus the cost of
moving empty containers minus the inventory cost of empty containers. When
comparing data of past periods with the model it turned out that very often
ships transported more containers than their specified maximum capacity. This,
after further investigation, led to a fuzzification of the ships' capacity
constraints, which will be described in the following model.

[Ernst 1982, p. 90]

Let

z = c^T x     the net profit to be maximized
Bx ≤ b        the set of crisp constraints
Ax ≲ d        the set of capacity constraints for which a crisp formulation
              turned out to be inappropriate

Then the problem to be solved is:

maximize   z = c^T x

such that  Ax ≲ d
           Bx ≤ b
           x ≥ 0                                                                (35)

This corresponds to (14). Rather than using (18) to arrive at a crisp
equivalent LP model, the following approach was used: based on (11) and (12),
membership functions were defined for those constraints that were fuzzy, with

I = index set of the fuzzy constraints.


As the equivalent crisp model to (35) the following LP was used:

maximize   z' = c^T x - Σ_{i∈I} s_i (p_i - b_i)(1 - μ_i)

such that  Ax ≤ d + t
           Bx ≤ b
           t ≤ p - b
           x, t ≥ 0                                                             (36)

where the s_i are problem-dependent scaling factors with penalty character.


Formulation (36) only makes sense if problem-dependent penalty
terms s_i, which also have the required scaling property, can be found and
justified.
In this case the following definitions performed successfully: First the
crisp constraints Bx ≤ b were replaced by Bx ≤ 0.9b, providing a 10% leeway of
capacity, which was desirable for reasons of safety. Then "tolerance" variables t
were introduced:

Bx - t ≤ 0.9b
t ≤ 0.1b

The objective function was then modified as in (36), with the scalar penalty
factor s defined as

s = (average profit of shipping a full container) / (average number of time
periods which elapsed between departure and arrival of a container)

By the use of this definition more than 90% of the capacity of the ships
was used only if and when very profitable full containers were available for
shipping at the ports, a policy that seemed to be very desirable to the decision
makers.

5. CONCLUSIONS

Mathematical programming is one of the areas to which fuzzy set theory has
been applied extensively. Even if one considers the area of linear programming
only, numerous new models - linear and nonlinear - have emerged through the
application of fuzzy set theory. A good part of the models are of primarily
theoretical interest. Still even from an application point of view, fuzzy
mathematical programming is a valuable extension of traditional crisp
optimization models. It is surprising that some areas, such as duality theory,
have not yet drawn more interest. Further developments can still be expected
there.

REFERENCES

Behringer, F.A. [1977]. Lexicographic Quasiconcave Multiobjective
Programming. In: Z. Operations Research 21, pp. 103-116.
Behringer, F.A. [1981]. A Simplex Based Algorithm for the Lexicographically
Extended Linear Maxmin Problem. In: European Journal of
Operational Research 7, pp. 274-283.
Bellman, R.E.; Zadeh, L.A. [1970]. Decision-making in a Fuzzy Environment.
In: Management Science 17, pp. B141-B164.

Buckley, J.J. [1989]. Solving Possibilistic Linear Programming Problems. In:


Fuzzy Sets and Systems 31, pp. 329-341.
Buckley, J.J. [1990]. Stochastic Versus Possibilistic Programming. In: Fuzzy
Sets and Systems 34, pp. 173-177.
Buckley, J.J. [1990]. Multiobjective Possibilistic Linear Programming. In: Fuzzy
Sets and Systems 35, pp. 23-28.
Campos, L. [1989]. Fuzzy Linear Programming Models to Solve Fuzzy Matrix
Games. In: Fuzzy Sets and Systems 32, pp. 275-289.
Chanas, S. [1983]. The Use of Parametric Programming in Fuzzy Linear
Programming. In: Fuzzy Sets and Systems, pp. 243-251.
Chanas, S. (1989). Fuzzy Programming in Multiobjective Linear Programming -
A Parametric Approach. In: Fuzzy Sets and Systems 29, pp. 303-313.
Chang, L.L. [1975]. Interpretation and Execution of Fuzzy Programs. In: Zadeh
et al. (eds.) 1975, pp. 191-218.
Delgado, M.; Verdegay, J.L; Vila, M.A [1989]. A General Model for Fuzzy
Linear Programming. In: Fuzzy Sets and Systems 29, pp. 21-29.
Dubois, D. [1984]. Linear Programming with Fuzzy Data. In: Bezdek, J.C. (ed.),
Analysis of Fuzzy Information, Vol. III - Applications in Engineering and
Science, Boca Raton 1987.
Ernst, E. [1982]. Fahrplanerstellung und Umlaufdisposition im
Containerschiffsverkehr (Diss. RWTH Aachen), Frankfurt/M., Bern.
Fabian, L; Stoica, M. [1984]. Fuzzy Integer Programming. In: Zimmermann et
al. (Eds.) 1984, pp. 123-132.
Fuller, R. [1989]. On Stability in Fuzzy Linear Programming Problems. In:
Fuzzy Sets and Systems 30, pp. 339-344.
Hamacher, H.; Leberling, H.; Zimmermann, H.-J. [1978] Sensitivity Analysis in
Fuzzy Linear Programming. In: Fuzzy Sets and Systems 1, pp. 269-281.
Hannan, E.L. [1981]. Linear Programming with Multiple Fuzzy Goals. In:
Fuzzy Sets and Systems 6, pp. 235-248.
Hintz, G.-W.; Zimmermann, H.-J. [1989]. A Method to Control Flexible
Manufacturing Systems. In: European Journal of Operational Research
41, pp. 321-334.
Holtz, M.; Desonski, D. [1981]. Fuzzy-Modell für Instandhaltung. In:
Unscharfe Modellbildung und Steuerung IV, Karl-Marx-Stadt, pp. 54-62.
Inuiguchi, M.; Ichihashi, H.; Kume, Y. (1990). A Solution Algorithm for Fuzzy
Linear Programming with Piecewise Linear Membership Functions. In:
Fuzzy Sets and Systems 34, pp. 15-31.
Leberling, H. [1981]. On Finding Compromise Solutions in Multicriteria
Problems Using the Fuzzy Min-Operator. In: Fuzzy Sets and Systems 6,
pp. 105-118.
Luhandjula, M.K. (1984). Fuzzy Approaches for Multiple Objective Linear
Fractional Optimization. In: Fuzzy Sets and Systems 13, pp.11-24.

Nickels, W. [1990]. Ein wissensbasiertes System zur Produktionsplanung und
-steuerung in der Papierindustrie (Diss. RWTH Aachen), VDI
Fortschritt-Berichte, Düsseldorf.
Orlovski, S.A. [1985]. Mathematical Programming Problems with Fuzzy
Parameters. In: Kacprzyk, J. and Yager, R.R. (eds.), Management
Decision Support Systems using Fuzzy Sets and Possibility Theory, Köln
1985, pp. 136-145.
Orlovski, S.A. [1977]. On Programming with Fuzzy Constraint Sets. In:
Kybernetes 6, pp. 197-201.
Ostasiewicz, W. [1982]. A New Approach to Fuzzy Programming. In: Fuzzy
Sets and Systems 7, pp. 139-152.
Ramik, J.; Rimanek, J. [1985]. Inequality Relation between Fuzzy Numbers and
its Use in Fuzzy Optimization. In: Fuzzy Sets and Systems 16, pp. 123-138.
Rödder, W.; Zimmermann, H.-J. [1980]. Duality in Fuzzy Linear Programming.
In: Fiacco, A.V.; Kortanek, K.O. (Eds.), Extremal Methods and Systems
Analyses, New York, pp. 415-429.
Rommelfanger, H.; Hanuscheck, R; Wolf, J. [1989]. Linear Programming with
Fuzzy Objectives. In: Fuzzy Sets and Systems 29, pp. 31-48.
Rubin, P.A.; Narasimhan, R. [1984]. In: Fuzzy Sets and Systems 14, pp. 115-130.
Sakawa, M.; Yano, H. [1989]. Interactive Decision Making for Multiobjective
Nonlinear Programming Problems with Fuzzy Parameters. In: Fuzzy
Sets and Systems 29, pp. 315-326.
Sakawa, M. Yano, H. [1989]. An Interactive Fuzzy Satisficing Method for
Multiobjective Nonlinear Programming Problems with Fuzzy
Parameters. In: Fuzzy Sets and Systems 30, pp. 221-238.
Sakawa, M.; Yano, H. [1990]. An Interactive Fuzzy Satisficing Method for
Generalized Multiobjective Linear Programming Problems with Fuzzy
Parameters. In: Fuzzy Sets and Systems 35, pp. 125-142.
Tanaka, H.; Asai, K. [1984]. Fuzzy Linear Programming Problems with Fuzzy
Numbers. In: Fuzzy Sets and Systems 13, pp. 1-10.
Tanaka, H.; Ichihashi, H.; Asai, K. [1985]. Fuzzy Decision in Linear
Programming Problems with Trapezoid Fuzzy Parameters. In: Kacprzyk,
J. and Yager, R.R. (eds.), Management Decision Support Systems using
Fuzzy Sets and Possibility Theory, Köln 1985, pp. 146-154.
Tanaka, H.; Mizumoto, M. [1975]. Fuzzy Programs and their Execution. In:
Zadeh et al. (Eds.), pp. 41-76.
Tanaka, H.; Okuda, T.; Asai, K. [1974]. On Fuzzy Mathematical Programming.
In: Journal of Cybernetics 3, pp. 37-46.
Thole, U.; Zimmermann, H.-J.; Zysno, P. [1979]. On the Suitability of
Minimum and Product Operators for the Intersection of Fuzzy Sets. In:
Fuzzy Sets and Systems 2, pp. 167-180.
Verdegay, J.L. [1984]. A Dual Approach to Solve the Fuzzy Linear
Programming Problem. In: Fuzzy Sets and Systems 14, pp. 131-141.

Werners, B. [1984]. Interaktive Entscheidungsunterstützung durch ein flexibles
mathematisches Programmierungssystem (Diss. RWTH Aachen),
München.
Werners, B. [1987]. Interactive Multiple Objective Programming Subject to
flexible Constraints. In: European Journal of Operational Research 31,
pp. 324-349.
Werners, B. [1988]. Aggregation Models in Mathematical Programming. In:
Mitra, G. (ed.), Mathematical Models for Decision Support, Berlin,
Heidelberg, New York, pp. 295-305.
Wiedey, G.; Zimmermann, H.-J. [1978]. Media Selection and Fuzzy Linear
Programming. In: Journal of the Operational Research Society 29, pp. 1071-1084.
Yano, H.; Sakawa, M. [1989]. Interactive Fuzzy Decision Making for
Generalized Multiobjective Linear Fractional Programming Problems
with Fuzzy Parameters. In: Fuzzy Sets and Systems 32, pp. 245-261.
Zadeh, L.A.; Fu, K.S.; Tanaka, K.; Shimura, M. (Eds.) [1975]. Fuzzy Sets and
their Applications to Cognitive and Decision Processes, New York.
Zimmermann, H.-J. [1976]. Description and Optimization of Fuzzy Systems.
In: International Journal of General Systems 2, pp. 209-215.
Zimmermann, H.-J. [1978], Fuzzy Programming and Linear Programming with
Several Objective Functions. In: Fuzzy Sets and Systems 1, pp. 45-55.
Zimmermann, H.-J. [1990]. Fuzzy Set Theory - and its Applications (Rev. Ed.),
Boston, Dordrecht, Lancaster.
Zimmermann, H.-J. [1986]. Fuzzy Set Theory and Mathematical Programming.
In: Jones, A et al. (eds.), Fuzzy Sets Theory and Applications, pp.99-
114.
Zimmermann, H.-J. [1987]. Fuzzy Sets, Decision Making and Expert Systems,
Boston, Dordrecht, Lancaster.
Zimmermann, H.-J. (with Hermanns, Kaffenberger, Rödder, Seiter,
Stilianakis) [1986]. Lack- und Farbmischungen zu minimalen Kosten. In:
Farbe & Lack 92, pp. 379-382.
Zimmermann, H.-J.; Pollatschek, M.A. [1984]. Fuzzy 0-1 Programs. In:
Zimmermann, Zadeh, Gaines (eds.), Fuzzy Sets and Decision Analysis,
Amsterdam, New York 1984, pp. 133-146.
Zimmermann, H.-J.; Zadeh, L.A; Gaines, B.R. (Eds.) [1984]. Fuzzy Sets and
Decision Analysis, New York.
Zimmermann, H.-J.; Zysno, P. [1980]. Latent Connectives in Human Decision
Making. In: Fuzzy Sets and Systems, pp. 37-51.
Zimmermann, H.-J.; Zysno, P. [1983]. Decisions and Evaluations by
Hierarchical Aggregation of Information. In: Fuzzy Sets and Systems 10,
pp.243-266.
6
FUZZY SET METHODS IN COMPUTER VISION

James M. Keller and Raghu Krishnapuram


Electrical and Computer Engineering
University of Missouri-Columbia
Columbia, Missouri 65211

INTRODUCTION
Computer vision is the study of theories and algorithms involving the
sensing and transmission of images; preprocessing of digital images for noise
removal, smoothing, or sharpening of contrast; segmentation of images to isolate
objects and regions; description and recognition of the segmented regions; and
finally interpretation of the scene. We normally think of images in the visible
spectrum, either monochrome or color, but in fact, images can be produced by
a wide range of sensing modalities including X-rays, neutrons, ultrasound,
pressure sensing, laser range finding, infrared, and ultraviolet, to name a few.

Uncertainty abounds in every phase of computer vision. Some of the


sources of this uncertainty include: additive and non-additive noise of various
sorts and distributions in the sensing and transmission processes, questions which
are often ill-posed, vagueness in class definitions, imprecisions in computations,
ambiguity of representations, and general problems in the interpretation of
complex scenes. The use of multiple modalities is receiving increased attention
as a means of overcoming some of the limitations imposed by a single image, but
the use of more than one source of information has caused new uncertainties to
surface: how should the complementary and supplementary information be
combined?, how should redundant information be treated?, how should conflicts
be resolved?, etc.

Traditionally, probability theory was the primary mathematical model


used to deal with uncertainty problems in computer vision. More recently, both
Dempster-Shafer belief theory and fuzzy set theory have gained popularity in
modeling and propagating uncertainty in imaging applications. While both
probability theory and belief theory are important frameworks for this field, the
purpose of this paper is to explore the use of fuzzy set theory in computer vision.
We will consider contributions of fuzzy sets to the image model, preprocessing,
segmentation, object/region recognition, and reasoning aspects of the computer
vision problem. Most of the examples given in this paper are those of the
authors, and we apologize a priori to those researchers whose work we will
undoubtedly (though inadvertently) omit from the references.

LOW LEVEL IMAGE PROCESSING


An image is a function f: R^n → R^m, where normally n is 2 or 3 and m is
1 (intensity) or 3 (color). However, images can be constructed over numerous
modalities, as well as over time, and so the dimension of the range space can
be quite large. A digital image is an image which has been discretized in both
the domain and range spaces. This is commonly referred to as sampling and
quantization, respectively. In this paper we will restrict ourselves to two spatial
coordinates, where each element P = (x, y) in the domain of the image is called
a pixel. If m = 1, then the value f(x,y) is called the gray level of pixel (x,y); if
m > 1, then f(x,y) is referred to as a feature vector.

The first connection of fuzzy set theory to computer vision was made by
Prewitt [1], who suggested that the results of image segmentation should be fuzzy
subsets rather than crisp subsets of the image plane. In order to apply the rich
assortment of fuzzy set theoretic operators to an image, the gray levels (or
feature values) must be converted to membership values. Let X denote the
domain of the digital image. Then a fuzzy subset of X is a mapping μ_f: X → [0,1],
where the value of μ_f(x,y) is dependent upon the original feature vector f(x,y).
The calculation of membership functions is central to the application of fuzzy set
theory, just as the calculation of conditional probability density functions or basic
probability assignments is crucial in the use of probabilistic or Dempster-Shafer
belief models.

There are many methods of transforming pixel feature vectors into
membership functions. In the case of gray scale images several authors have
used S-functions when there are only two regions (object and background) and
combinations of S-functions and π-functions for multiple regions (or suitable
generalizations) [2-6]. These functions are defined by [7]

S(z; a, b, c) = 0                            for z ≤ a
              = 2((z - a)/(c - a))²          for a < z ≤ b
              = 1 - 2((z - c)/(c - a))²      for b < z ≤ c
              = 1                            for z > c

with b = (a + c)/2,

and

Π(z; b, c) = S(z; c - b, c - b/2, c)         for z ≤ c
           = 1 - S(z; c, c + b/2, c + b)     for z > c,
where z is the gray level at pixel P. These functions are symmetric, but can be
easily made nonsymmetric by relaxing the requirement that b be the midpoint
of a and c. Intuitively, these functions correspond to the statements "z is bright"
and "z is approximately c", respectively. Pal and King [2-4] used S and π functions,
along with approximations of them, as the basic building blocks for both contrast
enhancement and smoothing. Following Nakagawa and Rosenfeld [8], they
applied min and max operations on membership values in the neighborhood of
each pixel to produce smoothing or edge detection. Other approaches to edge
detection using fuzzy set methods can be found in [9,10].
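As an illustration, a minimal NumPy sketch of the S- and Π-functions defined above follows; the function names and the example parameters are assumptions for demonstration only.

import numpy as np

def s_function(z, a, b, c):
    """Zadeh's S-function; b is normally the midpoint (a + c) / 2."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    left = (z > a) & (z <= b)
    right = (z > b) & (z <= c)
    out[left] = 2.0 * ((z[left] - a) / (c - a)) ** 2
    out[right] = 1.0 - 2.0 * ((z[right] - c) / (c - a)) ** 2
    out[z > c] = 1.0
    return out

def pi_function(z, b, c):
    """Pi-function built from two S-functions, peaking at z = c with bandwidth b."""
    z = np.asarray(z, dtype=float)
    rising = s_function(z, c - b, c - b / 2.0, c)
    falling = 1.0 - s_function(z, c, c + b / 2.0, c + b)
    return np.where(z <= c, rising, falling)

# e.g. a membership function for "z is bright" on 8-bit gray levels:
bright = s_function(np.arange(256), a=100, b=160, c=220)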

One problem with this approach is that the parameters which define the
membership functions must be supplied, primarily in an interactive fashion by
the user. Pal and Rosenfeld [11], in a two class segmentation problem,
automated this process by using several choices and picking the one which
optimized a certain geometric criterion which we will describe later. Recently,
we have used normalized histograms of the feature values generated from
training data to estimate the particular membership functions [12-14]. This has
the advantages that it does not force any particular shape on the resultant
distributions, can be extended to deal with multiple features instead of gray level
alone, and can easily accommodate the addition of new classes.

Probably the most popular method of assigning multi-class membership
values to pixels, for either segmentation or other processing, is to use the fuzzy
c-means (FCM) algorithm [15,16]. Let R be the set of real numbers and R^d be
the d-dimensional vector space over the reals. Let X be a finite subset of R^d,
X = {x_1, x_2, ..., x_n}. In our case, each x_i is a feature vector for a pixel in the
image. For an integer c, 2 ≤ c ≤ n, a c x n matrix U = [u_{ik}] is called a fuzzy
c-partition of X whenever the entries of U satisfy three constraints:

Σ_{i=1}^{c} u_{ik} = 1     for all k,

Σ_{k=1}^{n} u_{ik} > 0     for all i,

u_{ik} ∈ [0, 1]            for all i, k.

Column k of the c x n matrix U represents the membership values of x_k in the c fuzzy
subsets of X. Row i of U exhibits the values of a membership function u_i on X,
whereby u_{ik} = u_i(x_k) denotes the grade of membership of x_k in the ith fuzzy
subset of X.

The FCM algorithm attempts to cluster feature vectors by searching for
local minima of the following objective function:

J_m(U, V) = Σ_{k=1}^{n} Σ_{i=1}^{c} u_{ik}^m ‖x_k - v_i‖_A² ,     1 ≤ m < ∞

where U is a fuzzy c-partition of X, ‖·‖_A is any inner product norm, V = (v_1,
v_2, ..., v_c) is a set of cluster centers, v_i ∈ R^d, and m ∈ (1, ∞) is the membership
weighting exponent.

Cluster center v_i is regarded as a prototypical member of class i, and the
norm measures the similarity (or dissimilarity) between the feature vectors and
cluster centers. When m = 1, J_m is the classical total within-group sum-of-
squared-error function; the u_i define hard clusters in X and the v_i are the
centroids of the hard u_i. It is shown in [16] that for m > 1, under the assumption
that x_k ≠ v_i for all i, k, (U, V) may be a local minimum of J_m only if

u_{ik} = [ Σ_{j=1}^{c} ( ‖x_k - v_i‖_A / ‖x_k - v_j‖_A )^{2/(m-1)} ]^{-1}          (1)

for all i, k, and

v_i = ( Σ_{k=1}^{n} u_{ik}^m x_k ) / ( Σ_{k=1}^{n} u_{ik}^m )                       (2)

for all i.

The algorithm defined by looping iteratively through the above
conditions is known to generate sequences (or subsequences) that terminate at
fixed points of J_m. The FCM algorithm is comprised of the following steps:

BEGIN
  Set c, 2 ≤ c < n
  Set ε, ε > 0
  Set m, 1 < m < ∞
  Initialize U^(0)
  Initialize j = 0
  DO UNTIL ( ‖U^(j) - U^(j-1)‖ < ε )
    Increment j
    Calculate {v_i^(j)} using (2) and U^(j-1)
    Compute U^(j) using (1) and {v_i^(j)}
  END DO UNTIL
END
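The following Python sketch implements the iteration defined by (1) and (2) with the Euclidean norm; it is a compact illustration rather than the authors' implementation, and the function name, tolerance, and initialization scheme are assumptions.

import numpy as np

def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=100, rng=None):
    """X: (n, d) array of feature vectors.  Returns (U, V) with U of shape (c, n)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)              # columns sum to one
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)          # update (2)
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1)   # squared distances
        d2 = np.fmax(d2, 1e-12)                                # guard against x_k == v_i
        U_new = 1.0 / (d2 ** (1.0 / (m - 1.0)))
        U_new /= U_new.sum(axis=0, keepdims=True)              # update (1)
        converged = np.abs(U_new - U).max() < eps
        U = U_new
        if converged:
            break
    return U, V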

The inner product norm ‖·‖_A, or its replacement by more general distance
metrics d_i(x_k, v_i) (as will be used later), controls the final shape of the clusters
generated by the FCM: hyperspherical, hyperellipsoidal, linear subspace, etc.

In terms of generating membership functions for later processing, the


fuzzy c-means has several advantages. It is unsupervised, that is, it requires no
initial set of training data; it can be used with any number of features and any
number of classes; and it distributes the membership values in a normalized
fashion across the various classes based on "natural" groupings in feature space.
However, being unsupervised, it is not possible to predict ahead of time what
type of clusters will emerge from the fuzzy c-means from a perceptual
standpoint. Also, the number of classes must be specified for the algorithm to
run, although as will be seen in the next section, there are modifications which
avoid this problem. Finally, iteratively clustering features for a 512 x 512
resolution image can be quite time consuming. In [17, 18], approximations and
simplifications were introduced to ease this computational burden.

SEGMENTATION
Image segmentation is one of the most critical components of the
computer vision process. Errors made in this stage will impact all higher level
activities. Therefore, methods which incorporate the uncertainty of object and
region definition and the faithfulness of the features to represent various objects
and regions are desirable.

The process of segmentation has been defined by Horowitz and Pavlidis
[19] as follows: Given a definition of uniformity, a segmentation is a partition
of the picture into connected subsets, each of which is uniform, but such that no
union of adjacent subsets is uniform.

This definition is based on crisp set theory. The fuzzy c-partition
introduced in the previous section can be defined as a fuzzy segmentation.
Furthermore, if we define a uniformity predicate P(μ_{ij}) which assigns the
value true or false to the sample point x_j based on its membership value (for
example, P(μ_{ij}) = 1 if μ_{ij} ≥ μ_{kj} for all k), we will have paralleled crisp
segmentation. The fuzzy c-means has been successfully used as a segmentation
approach by several researchers [17, 18, 20-22]. (We will see an example of this
segmentation shortly.)

All of the methods for converting image feature values into class mem-
bership numbers contain adjustable parameters: the cross-over point b for S and
π functions, the fuzzifier m in the c-means, etc. Varying these parameters affects
the final fuzzy partition, and hence the ultimate crisp segmentation of the scene.
Also, the number of classes desired impacts the resultant distributions, since the
memberships are required to sum to one for a fuzzy c-partition. In some cases,
these problems are not serious. For example, many segmentation problems
involve separating an object from its background. Here the number of classes
is obviously two. However, in general situations, the choice of these parameters
must be carefully considered.

The basic approach which is taken to pick the number of classes and/or
the function shaping parameters iteratively varies these parameters and picks the
set of values which optimizes some measure of the final fuzzy partition. The
optimization criteria can be based on the geometry of the fuzzy subsets of the
image or on properties of the clusters in feature space.

In a series of papers [23-25], Rosenfeld studied the geometry and
topology of fuzzy subsets of the digital (i.e., image) plane. These properties were
later generalized by Dubois and Jaulent [26, 27]. Many of the basic geometric
properties of, and relationships among, regions can be generalized to fuzzy
subsets. Rosenfeld has extended the theory of these fuzzy subsets to include the
topological concepts of connectedness, adjacency and surroundedness, extent and
diameter, and convexity. Rosenfeld et al. have also developed geometrical
operations on fuzzy image subsets, including shrinking and expanding, and
thinning [28, 29].

Of the above-mentioned geometrical properties, we discuss here only the
connectedness, area, perimeter, and compactness of a fuzzy image subset,
characterized by a membership function array μ_A(x_{ij}). In defining the above-
mentioned parameters we replace μ_A(x_{ij}) by μ for simplicity.

A neighbor can be defined in several ways. In two-dimensional images,
a point P = (x, y) of a digital image has two horizontal and two vertical neighbors,
namely the points (x-1, y), (x, y-1), (x+1, y), and (x, y+1). The four points are
called the 4-connected neighbors of P, and we say that they are 4-adjacent to P.
Similarly, P has four additional neighbors: (x-1, y-1), (x-1, y+1), (x+1, y-1), and
(x+1, y+1). We call these eight points the 8-connected neighbors of P (8-
adjacent to P) [30].

The definition of connectedness for the crisp case as defined by
Rosenfeld [30] is as follows: Let P, Q be two points of an image. A path ρ of
length n from P to Q in an image is a sequence of points P = P_1, P_2, ..., P_n =
Q such that P_i is a neighbor of P_{i-1}, 1 < i ≤ n. There are two versions of the path ρ
(a 4-path or an 8-path), depending on whether "neighbor" means "4-neighbor" or "8-
neighbor". If P and Q are points of an image subset S, we say that P is 4- (8-)
connected to Q in S if there exists a 4- (or 8-) path from P to Q. For any P in
S, the set of points which are connected to P in S is called a connected
component of S.

For the fuzzy case, let μ be a mapping from X into [0,1], that is, let μ
be a fuzzy subset of X. Let P, Q ∈ X. Then the degree of connectedness of P
and Q with respect to μ is

c_μ(P, Q) = max_{ρ_PQ} ( min_{R ∈ ρ_PQ} μ(R) )

where the operator max is taken over all paths ρ_PQ (either 4- or 8-connected)
from P to Q, and the operator min is taken over all points R on the
path. P and Q are said to be connected in μ if

c_μ(P, Q) ≥ min ( μ(P), μ(Q) ).

The fuzzy set μ is said to be connected if every pair of points P, Q is connected
in μ.
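Because the maximum in the definition of c_μ(P, Q) ranges over all paths, it can be computed as a "widest path" (maximin) search; the sketch below uses a Dijkstra-like procedure over 4-neighbors, with all names chosen for illustration.

import heapq
import numpy as np

def degree_of_connectedness(mu, P, Q):
    """mu: 2-D membership array; P, Q: (row, col) tuples."""
    rows, cols = mu.shape
    best = np.full((rows, cols), -1.0)
    best[P] = mu[P]
    heap = [(-mu[P], P)]                       # max-heap via negated bottleneck values
    while heap:
        neg_bottleneck, (r, c) = heapq.heappop(heap)
        bottleneck = -neg_bottleneck
        if (r, c) == Q:
            return bottleneck
        if bottleneck < best[r, c]:
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):     # 4-neighbors
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                cand = min(bottleneck, mu[rr, cc])
                if cand > best[rr, cc]:
                    best[rr, cc] = cand
                    heapq.heappush(heap, (-cand, (rr, cc)))
    return best[Q]

P and Q can then be declared connected in μ whenever the returned value is at least min(mu[P], mu[Q]).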

The area of μ is defined as

a(μ) = ∫ μ

where the integral is taken over the whole image set, or, for the digital case,

a(μ) = Σ_{m=1}^{M} Σ_{n=1}^{N} μ_{mn}.

Let us call a fuzzy subset μ of S "piecewise constant" if there exists a
segmentation Σ = {S_1, ..., S_n} of S such that μ has a constant value μ_i on each
S_i and μ = 0 on S_n (i.e., μ_n = 0). Here, S_n is considered the boundary of the
image. If μ is piecewise constant (for example, in a digital image) a(μ) is the
weighted sum of the areas of the regions on which μ has constant values, where
the areas of the regions are weighted by these values.

For the piecewise constant case, the perimeter of μ is defined as

p(μ) = Σ_{i,j,k} |μ_i - μ_j| |A_{ijk}|.

This is just the weighted sum of the lengths of the arcs A_{ijk} along which
the i-th and j-th regions, having constant μ values μ_i and μ_j respectively, meet,
weighted by the absolute difference of these values.

Considering the 4-adjacent definition for connectedness, the above
equation for p(μ) reduces to

p(μ) = Σ_{m=1}^{M} Σ_{n=1}^{N-1} |μ_{m,n} - μ_{m,n+1}| + Σ_{n=1}^{N} Σ_{m=1}^{M-1} |μ_{m,n} - μ_{m+1,n}|.

The compactness of μ is then defined as

comp(μ) = a(μ) / p²(μ).

For crisp sets, the compactness is largest for a disk, where it is equal to
1/(4π). For a fuzzy disk, where μ depends only on the distance from the origin
(center), it can be shown that

a(μ) / p²(μ) ≥ 1/(4π).

In other words, of all possible fuzzy disks, the compactness is smallest for the
crisp version.
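For a digital membership array, the area, 4-adjacent perimeter, and compactness defined above can be computed directly; the sketch below assumes the array is zero on the image border (the boundary region S_n), and the helper names are illustrative.

import numpy as np

def fuzzy_area(mu):
    return mu.sum()

def fuzzy_perimeter(mu):
    # sum of absolute membership differences between horizontally and
    # vertically adjacent pixels (4-adjacency)
    horiz = np.abs(np.diff(mu, axis=1)).sum()
    vert = np.abs(np.diff(mu, axis=0)).sum()
    return horiz + vert

def fuzzy_compactness(mu):
    return fuzzy_area(mu) / fuzzy_perimeter(mu) ** 2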

Therefore, one approach to finding an optimal partition is to generate
many candidate partitions by varying the membership generation parameters and
then choosing the partition which minimizes the fuzzy compactness of the result.
Pal and Rosenfeld used this technique in a two class problem (object and back-
ground) to find the best choice of S-function shaping parameters [11]. In [31]
Liao extended this approach to the case of several features and several classes
by using the fuzzy c-means. Here the fuzzifier m was the variable. Figure 1a
shows a 256 x 256 forward looking infrared image of a natural scene containing
trees, grass areas and two vehicles. Because of the noisy nature of infrared
images, this picture was smoothed using local averaging (Figure 1b). The
number of classes was fixed at 4 and the fuzzifier m varied from 1.2 to 5.0. For
each choice of m, the sum of the compactness values of the resultant four fuzzy
subsets was computed and the value of m (m = 3.0) giving minimal overall
compactness was chosen. The result of the closest crisp partition segmentation
is shown in Figure 1c. Note that there are still many small noise components in
the segmentation. By smoothing the membership matrix, giving higher weight
to the vehicle class, and performing a noise cleaning operation (shrink-and-
expand) [30], the excellent segmentation shown in Figure 1d was obtained. The
important point is that the initial segmentation formed a fuzzy c-partition of the
image, and so post-processing on the fuzzy subsets of the image was possible.

The a priori setting of the number of classes is not always possible, espe-
cially in segmentation of natural scenes. In such cases an algorithm called the
Unsupervised Fuzzy Partition-Optimum Number of Clusters (UFP-ONC) algo-
rithm [32] may be used. The UFP-ONC algorithm is derived from a combination
of the fuzzy c-means algorithm and the fuzzy maximum likelihood estimation
(FMLE). It attempts to obtain a satisfactory solution to the problem of large
variability in cluster shapes and densities, and to the problem of unsupervised
tracking of classification prototypes. There are no initial conditions on the
location of cluster centroids, and classification prototypes are identified during
a process of unsupervised learning [32]. The algorithm is essentially the same
as the FCM algorithm described in the previous section, except that the distance
measure defined by

d²(x_k, v_i) = ( |F_i|^{1/2} / P_i ) exp( (1/2)(x_k - v_i)^T F_i^{-1} (x_k - v_i) )          (3)

is used instead of the inner product norm. In (3), F_i is the fuzzy covariance
matrix of cluster i given by

F_i = Σ_k u_{ik} (x_k - v_i)(x_k - v_i)^T / Σ_k u_{ik}                                        (4)

and P_i is the a priori probability of the i-th cluster defined by

P_i = Σ_k u_{ik} / N,                                                                          (5)

Figure 1. Segmentation using the fuzzy c-means. (a) Infrared image of
scene; (b) Figure 1a smoothed by local averaging; (c) Closest
crisp segmentation from fuzzy 4-means with optimal choice of
m (3.0); (d) Final segmentation using the fuzzy partition.

where N is the total number of feature vectors. In addition to updating the
cluster prototypes using (2) and the memberships using (3), the covariance
matrices F_i are also updated in every iteration. After the algorithm converges,
certain performance measures are computed for the resulting fuzzy partition.
This process is repeated for an increasing number of clusters in the data set,
computing performance measures in each run, until a partition into an optimal
number of subgroups is obtained.
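A small sketch of the per-cluster quantities (3)-(5) follows; whether the memberships enter the covariance raised to the fuzzifier m is not legible in the printed formula, so plain memberships are used here, and all names are illustrative.

import numpy as np

def fmle_cluster_stats(X, u_i, v_i):
    """X: (n, d) data, u_i: (n,) memberships of cluster i, v_i: (d,) prototype."""
    diff = X - v_i
    F_i = (u_i[:, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(0) / u_i.sum()
    P_i = u_i.sum() / len(X)
    return F_i, P_i

def fmle_distance(x, v_i, F_i, P_i):
    """Exponential (FMLE-style) distance of a single feature vector x from cluster i."""
    diff = x - v_i
    maha = diff @ np.linalg.inv(F_i) @ diff
    return np.sqrt(np.linalg.det(F_i)) / P_i * np.exp(0.5 * maha)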

The performance criterion in this algorithm is the minimization of the
overall hypervolume of the clusters as calculated from the determinants of the
fuzzy covariance matrices. Figure 2a shows an intensity image containing trees,
roads, and sky regions. A set of new local features, based on fractal geometry,
was generated from this image [33]. Figure 2b shows the resulting segmentation
when these fractal features were used with the UFP-ONC algorithm. As
another example, Figure 2c shows the original range image of a block. Figure
2d shows the segmentation obtained when the mean curvature and another new
differential geometric feature [34] were used as the input to the UFP-ONC
algorithm. The clusters corresponding to the minimum total hypervolume in the
feature space are shown in Figure 2e. As can be seen, the UFP-ONC algorithm
is effective in locating ellipsoidal clusters of various sizes and orientations.

A different approach to both segmentation and object recognition is


taken by Krishnapuram and Lee [35, 36] and Keller, et al. [5, 37, 38]. The two
different techniques share the common idea that class labeling for segmentation
or object labeling for recognition should be viewed as an aggregation of evidence
problem. The evidence can be derived from several sensors (for example, color),
several distinct pattern recognition algorithms, different features, or the
combination of image data with non-image information (intelligence). The
advantages of multi-sensor fusion lie in redundancy, complementarity, timeliness
and low cost of the information. The support for a decision may depend on
supports for (or degrees of satisfaction of) several different criteria, and the
degree of satisfaction of each criterion may in turn depend on degrees of
satisfaction of other sub-criteria, and so on. Thus, the decision process can be
viewed as a hierarchical network, where each node in the network "aggregates" the
degree of satisfaction of a particular criterion from the observed evidence. The
inputs to each node are the degrees of satisfaction of each of the sub-criteria,
and the output is the aggregated degree of satisfaction of the criterion.

For image segmentation as discussed in [35, 36], the decision making


problem reduces to i) determining the structure of the aggregation network to
be used, ii) determining the nature of the connectives at each node of the
network, and iii) computing the input supports (degrees of satisfaction of
criteria) based on observed features.

The structure of the aggregation network depends on the problem at


hand [39]. The connectives used at each node of the network are based on fuzzy
union, fuzzy intersection, or compensative operators (such as generalized mean

Figure 2. Fuzzy segmentation finding the optimum number of classes.
(a) intensity image of natural scene and (b) range image of
block; (c) & (d) optimal partition of top row using UFP-ONC
algorithm; (e) feature space clusters for the block image.

or the γ-model) [39]. The innovative aspect of this work is that a backpropaga-
tion algorithm (and convergence theory) was developed so that both the type of
connective at each node and the parameters associated with the connective
can be learned from training data [35, 36].
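A generalized-mean node of the kind used in these aggregation networks can be sketched as follows; the weights and exponent are the trainable parameters mentioned above, the backpropagation update itself is omitted, and the names are illustrative.

import numpy as np

def generalized_mean(x, w, p):
    """x: supports of the sub-criteria in [0, 1]; w: nonnegative weights summing to 1.
    Small p behaves like an intersection (min-like), large p like a union (max-like),
    and p = 1 gives the weighted arithmetic mean."""
    x = np.clip(np.asarray(x, dtype=float), 1e-6, 1.0)
    return (np.sum(w * x ** p)) ** (1.0 / p)

node_output = generalized_mean([0.8, 0.6, 0.9], w=np.array([0.5, 0.3, 0.2]), p=2.0)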

As an example, consider the fusion of information from different modal-
ities for segmentation of outdoor scenes. In particular, the modalities considered
are color images of size 256 x 256 (obtained from the University of Massachu-
setts) with intensity components r, g and b (red, green and blue). In an initial
experiment, to keep the problem tractable, the following features were used as
criteria: intensity value (r+g+b)/3, blue-red difference b-r, excess green 2g-r-b,
and position (row number). The first three features correspond to the Ohta color
space and were chosen because they have been found to correspond to
meaningful colors and they have also been found to be effective for color image
segmentation [40]. The position of a pixel is important for labels such as sky and
road. The image was first median filtered, and the feature images were
normalized so that all the values fall between 0 and 255. The following six labels
were considered: sky, tree, roof, walls, grass, and road. In the example, we used
one-layer aggregation networks based on the generalized mean to determine the
parameters of the network. About 60 training samples were taken from different
parts of the image for each class. Membership values were calculated using the
feature histograms of the training data. Since the histogram is very jagged, it
had to be smoothed by a window of length 11 before normalizing it. After
training, the network was used for segmentation of the image by assigning each
pixel to the class which had the highest degree of satisfaction generated from the
pixel's features.
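The feature computation and histogram-based membership estimation just described can be sketched as follows; the array names, bin count, and smoothing-window handling are assumptions, not the exact processing used in the experiment.

import numpy as np

def color_features(r, g, b):
    """r, g, b: float arrays of the same shape; returns the three feature images."""
    intensity = (r + g + b) / 3.0
    blue_red = b - r
    excess_green = 2.0 * g - r - b
    return intensity, blue_red, excess_green

def histogram_membership(train_values, bins=256, value_range=(0, 255), smooth=11):
    """Normalized, smoothed histogram of training feature values -> membership lookup."""
    hist, edges = np.histogram(train_values, bins=bins, range=value_range)
    kernel = np.ones(smooth) / smooth
    hist = np.convolve(hist.astype(float), kernel, mode="same")   # smoothing window
    hist /= hist.max() if hist.max() > 0 else 1.0                 # normalize to [0, 1]
    return lambda v: hist[np.clip(np.digitize(v, edges) - 1, 0, bins - 1)]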

Figure 3a shows the original intensity image and Figure 3b shows the
segmented and labeled image when the γ-model was used as the aggregation
operator. The labels in increasing order of grey level are: road, tree, wall, roof,
grass, and sky. The results are excellent, considering the small number of
features used and the simplicity of the network employed. Note that most of the
misclassifications occur at areas where the true label is not any of the six labels
considered. This segmented image was improved by a shrink-and-expand
operator, and this image is shown in Figure 3c. An important point here is that
this method not only partitions the image into connected components of similar
properties, but also labels these components. In other words, it produces both
a segmentation and a region recognition simultaneously, while capturing an
abstract model of the decision making process.

The fuzzy integral has also been used to fuse both objective information
from features and (possibly subjective) information on the importance of subsets
of features for segmentation in [5, 37]. This approach will be described in the
section on object and region recognition.

Figure 3. Multispectral segmentation by hierarchical fuzzy aggregations.
(a) Intensity image of natural scene; (b) Six class segmentation
and labeling; (c) Figure 3b cleaned up by shrink-and-expand
operator.

BOUNDARY DETECTION
Boundary detection is another approach to segmentation. In this
approach, an edge operator is first used on the image to detect edge elements.
The edge elements so detected are considered to be part of the boundaries
between various objects or regions in the image. The boundaries are sometimes
described in terms of analytical curves such as straight lines, circles, and other
higher degree curves.

The FCM algorithm can be used to detect (or fit) straight lines to edge
elements. This is achieved by initializing the FCM with c linear prototypes
rather than c centers. Each linear prototype consists of a point (which acts as
cluster center) and a parameter defining the orientation of the cluster. The
fuzzy covariance matrix F_i of each cluster (as defined in (4)) may be used to
define its orientation, since its principal eigenvector gives the direction of
maximum variance of the cluster. The c prototypes are updated in each iteration
as described in the previous section, except that in each iteration the covariance
matrix of each cluster is also updated. Several distance measures may be used
for the detection of lines. One of them is defined by

d²(x_k, v_i) = α_i D_{ik}² + (1 - α_i) d_{ik}²

where D_{ik} is the distance of the point from the line and d_{ik} is the Euclidean
distance between x_k and v_i. α_i is chosen as 1 - (λ_{1i}/λ_{2i}), where λ_{1i} and λ_{2i} are
the smaller and larger eigenvalues of cluster i [41]. We have shown that the
scaled Mahalanobis distance given by

d²(x_k, v_i) = |F_i|^{1/2} (x_k - v_i)^T F_i^{-1} (x_k - v_i)                               (6)
is also very effective for the detection of lines or linear clusters [42]. In (6), F_i
is the fuzzy covariance matrix of cluster i as defined in (4). As mentioned
earlier, one problem with the FCM is that the number of clusters needs to be
specified. In the line detection case, one way to overcome this is to specify a
relatively high value of c and then merge compatible clusters after the algorithm
converges [42]. Figure 4 shows an example of this method. Figure 4a shows the
original image. This image is equivalent to the thresholded output of an edge
operator (such as the Sobel operator) on an intensity image of the characters
UMC. Figure 4b shows the clustering when c was specified to be 14. Note that
the leading stroke of both the U and the M are split into two subclusters (in
some examples the initial cluster organization is much worse). Figure 4c shows
the clustering after compatible clusters are merged. The final optimal number
of clusters was determined to be 10, which is correct in this case. In this
implementation, two (or more) clusters were considered compatible if i) their
orientation was the same, ii) the line joining their centers had the same orien-
tation as the clusters and iii) the cluster centers were not more than 4 principal
eigenvalues apart. The lines so found by the algorithm can then be used to
describe large sections of the boundary or the linear substructures in the image.
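The combined point/line distance given above can be evaluated from the fuzzy covariance matrix of a cluster, as in the following sketch (variable names are illustrative); the principal eigenvector of F_i supplies the line direction and the eigenvalue ratio supplies α_i.

import numpy as np

def line_distance_sq(x, v_i, F_i):
    eigvals, eigvecs = np.linalg.eigh(F_i)          # eigenvalues in ascending order
    lam_small, lam_large = eigvals[0], eigvals[-1]
    e_i = eigvecs[:, -1]                            # principal direction of the cluster
    diff = x - v_i
    d_ik_sq = diff @ diff                           # squared Euclidean distance to the center
    along = diff @ e_i
    D_ik_sq = d_ik_sq - along ** 2                  # squared distance to the line
    alpha_i = 1.0 - lam_small / lam_large
    return alpha_i * D_ik_sq + (1.0 - alpha_i) * d_ik_sq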


Figure 4. Segmentation by boundary detection. (a) Simulated thresholded


edge output; (b) Output of modified FCM to detect linear
clusters (c = 14); (c) Optimal linear partitions by compatible
cluster merge algorithm (c = 10).

The FCM algorithm with linear prototypes may be generalized to detect


combinations of subspaces [43-44] and also non-linear clusters such as circles
[45].

There are numerous techniques for incorporating fuzzy set theoretic


operators into the segmentation process of which we have only highlighted a few.
It is our belief that the benefits of producing fuzzy subsets of the image will
encourage more research into the utilization of fuzzy approaches to this crucial
aspect of computer vision.

OBJECT/REGION RECOGNITION AND HIGH LEVEL VISION


The area of computer vision concerned with assigning meaningful labels
to regions in an image can be thought of as a subset of pattern recognition.
There is a large amount of research in the use of fuzzy set theory in pattern
recognition, but here we will only discuss a few approaches for object recognition
in image analysis.

As was seen in the previous section, the fuzzy-connective-based hierarch-


ical aggregation networks not only segmented an image, but also provided class
labels for each pixel based on local feature evidence and training information.
Normally, once segmentation has been completed, features are computed for the
entire region and this data is used to classify the areas found. The aggregation
networks can function well in this setting also. The reader is referred to [13] for
several examples of object recognition using fuzzy aggregation networks.

The fuzzy integral is another numeric-based approach which we have


used for both segmentation and object recognition [5,14,37,38]. It also uses a
hierarchical network of evidence sources to arrive at a confidence value for a
particular hypothesis or decision. The difference from the preceding method
is that besides this directly supplied objective evidence, the fuzzy integral utilizes
information concerning the worth or importance of the sources in the decision
making process.

The fuzzy integral relies on the concept of a fuzzy measure, which
generalizes a probability measure in that it does not require additivity, replacing
it with a weaker continuity condition. A particularly useful set of fuzzy measures
is due to Sugeno [46]. A fuzzy measure g_λ is called a Sugeno measure if it
satisfies the following additional property:

If A ∩ B = ∅, then g_λ(A ∪ B) = g_λ(A) + g_λ(B) + λ g_λ(A) g_λ(B),

for some λ > -1.

Suppose X is a finite set, X = {x_1, ..., x_n}, and let g^i = g_λ({x_i}). Then the set
{g^1, ..., g^n} is called the fuzzy density function for g_λ.

Using the above definitions one can easily show that g_λ can be
constructed from a fuzzy density function by

g_λ(A) = ( Π_{x_i ∈ A} (1 + λ g^i) - 1 ) / λ

for any subset A of X. Using the fact that X = ∪_{i=1}^{n} {x_i} and that g_λ(X) = 1, λ can
be determined from the above equation.

Let h: X → [0,1]. The fuzzy integral of h over X with respect to g_λ is
defined in [46] by

∫_X h(x) ∘ g_λ = sup_{u ∈ [0,1]} [ u ∧ g_λ(F_u) ]

where F_u = {x ∈ X | h(x) ≥ u}.

In our applications, the set X is the set of information sources (sensors,
algorithms, features, etc.) and the function h supplies a confidence value for a
particular hypothesis or class from the standpoint of each individual source of
information. The fuzzy measure supplies the expected worth of each subset of
sources for this hypothesis.

If X = {x_1, ..., x_n} is a finite set, arranged so that h(x_1) ≥ h(x_2) ≥ ... ≥
h(x_n), then

∫_X h(x) ∘ g_λ = max_{i=1}^{n} [ h(x_i) ∧ g_λ(X_i) ]

where X_i = {x_1, ..., x_i}. Also, given λ as calculated above, the values g_λ(X_i) can
be determined recursively from the definitions [46]. The fuzzy integral is
interpreted as an evaluation of object classes where the subjectivity is embedded
in the fuzzy measure. In comparison with probability theory, the fuzzy integral
corresponds to the concept of expectation. In general, fuzzy integrals are
nonlinear functionals (although monotone), whereas ordinary (e.g., Lebesgue)
integrals are linear functionals.
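A short sketch of the computation follows: λ is obtained from the densities by solving Π(1 + λ g^i) = 1 + λ, and the discrete fuzzy integral is evaluated with the recursive construction of g_λ(X_i); the function names are illustrative and SciPy's root finder is used for convenience.

import numpy as np
from scipy.optimize import brentq

def sugeno_lambda(densities):
    """Root of  prod(1 + lam * g_i) = 1 + lam  with lam > -1 (lam = 0 if densities sum to 1)."""
    f = lambda lam: np.prod(1.0 + lam * np.asarray(densities)) - (1.0 + lam)
    s = sum(densities)
    if abs(s - 1.0) < 1e-9:
        return 0.0
    return brentq(f, 1e-9, 1e6) if s < 1.0 else brentq(f, -1.0 + 1e-9, -1e-9)

def fuzzy_integral(h, densities):
    lam = sugeno_lambda(densities)
    order = np.argsort(h)[::-1]                 # sort sources by decreasing support h(x_i)
    g, best = 0.0, 0.0
    for idx in order:
        g = g + densities[idx] + lam * g * densities[idx]   # g_lambda(X_i), built recursively
        best = max(best, min(h[idx], g))
    return best

# e.g. the Tank densities of Table 1 below give a lambda value close to 0.76:
print(sugeno_lambda([0.16, 0.23, 0.19, 0.22]))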

As an example, the fuzzy integral algorithm was tested using forward
looking infrared (FLIR) images containing two tanks and an armored personnel
carrier (APC) [38]. There were three sequences of 100 frames each used for
training purposes. In each sequence, the vehicles appeared at a different aspect
angle to the sensor (0°, 45°, 90°). In the fourth sequence the APC "circled" one
of the tanks, moving in and out of a ravine and finally coming toward the sensor.
This sequence was used to perform the comparison tests. The images were
preprocessed to extract object-of-interest windows. The classification level
integration was performed using four statistical features calculated from the
windows. To get the partial evaluation, h(x), for each feature, the fuzzy two-
means algorithm [16] was used. The fuzzy densities, the degree of importance of
each feature, were assigned based on how well these features separated the two
classes Tank and APC on training data [38]. The result of the fuzzy integral

classifier is presented in the form of a confusion matrix in Table 1, where the
samples counted in each row are those which belong to the corresponding
class, and the samples counted in each column are those assigned to that class
after classification, which was made by choosing the class with the largest
integral value.

The fuzzy integral outperformed a simple Bayes classifier on this data,


but more importantly, the final integral values provide a different measure of
certainty in the classification than posterior probabilities. The integral evaluation
need not sum to one, so that lack of evidence and negative evidence can be
distinguished.

This approach was also compared to a Dempster-Shafer rule-based


classifier [47]. A conceptual difference between the fuzzy integral and a
Dempster-Shafer classifier is in the frame of discernment [48]. For the fuzzy
integral the frame of discernment contains the knowledge sources related to the
hypothesis under consideration, whereas with belief theory, the frame of discern-
ment contains all of the possible hypotheses. Thus the fuzzy integral algorithm
has a means to assess the importance of all groups of knowledge sources towards
answering the questions as well as the degree to which each knowledge source
supports the hypothesis. With belief theory, each knowledge source would have
to generate a belief function over the power set of the set of hypotheses, which
are then combined using Dempster's rule. This calculation can have exponential
complexity with the number of hypotheses. With the fuzzy integral, the Sugeno
measure need only be calculated for n subsets (where n is the number of know-
ledge sources for each hypothesis). These measures are then combined with the
objective evidence to produce the integral values.

TABLE 1.

FUZZY INTEGRAL CLASSIFIER FOR A TWO CLASS ATR PROBLEM

Computed densities and λ values:

          g¹      g²      g³      g⁴      λ
Tank     0.16    0.23    0.19    0.22    0.760
APC      0.15    0.24    0.18    0.23    0.764

Confusion Matrix:

          Tank    APC
Tank       175      1
APC         17     49

Total correct: 92.6%

Recently, Tahani has extended this information fusion approach to a


large family of S-decomposable measures and generalized the definition of the
fuzzy integral, thereby significantly increasing the flexibility of this powerful tool
[14].

The above techniques, as well as many other fuzzy pattern recognition
algorithms, are numeric feature-based procedures. On the other hand, fuzzy
logic, and in general possibility theory, is inherently set-based, and so offers the
potential to manipulate higher order concepts. For example, in [49] (and refined
in [50]) Keller et al. used linguistic weighted averaging of possibility distributions
[51] to generate object confidence from a combination of feature level results
and harder-to-quantify values relating to range and motion. Rough estimates of
object range and motion were used to construct trapezoidal possibility
distributions which were averaged, using alpha-level set methods [51], with
similar trapezoidal numbers formed from the output of fuzzy pattern recognition
algorithms such as the fuzzy k-nearest-neighbors [52]. In [50] we developed a
scaling technique to actually turn the averaging procedure into a confidence
fusion methodology, overcoming the spreading inherent in fuzzy arithmetic.

In [53], normalized histograms of color components of images of beef


steaks were used directly in a linguistic approximation scheme to assess the
degree-of-doneness of the steak. It was felt that because of the large amount of
uncertainty inherent in food processing, the entire distribution of color (primarily
in the red and brown regions) was important for class recognition. Note that
this is conceptually distinct from those techniques described earlier which used
normalized histograms of training data to calculate membership numbers for
particular instances of the domain variable. Here, the object (a steak image) is
represented by a group of fuzzy sets (various color histograms) and a set-based
nearest prototype algorithm was used to assign class labels and confidences.

Rule-based systems have gained popularity in computer vision


applications, particularly in high level vision activities. In guiding the choice of
parameters for low level algorithms, a vision knowledge base may have a rule
such as

IF the range is LONG, THEN


the prescreener window size is SMALL.

If LONG and SMALL are modeled by possibility distributions over appropriate


domains of discourse, then fuzzy logic offers numerous approaches to translate
such rules and to make inferences from the rules and facts modeled similarly.
Nafarieh and Keller [54] designed a fuzzy logic rule based system for automatic
target recognition which contained the above rule and approximately 40 other
such rules.

Most fuzzy logic inference is based on Zadeh's composition rule. This
generalizes traditional modus ponens, which states that from the propositions

P1:  If X is A Then Y is B
P2:  X is A,

we can deduce Y is B. If proposition P2 did not exactly match the antecedent
of P1, for example, X is A', then the modus ponens rule would not apply.
However, in [55], Zadeh extended this rule to the case where A, B, and A' are modeled by fuzzy
sets. In this case, X and Y are fuzzy variables [55] defined over universes of
discourse U and V respectively. As described above, the propositions X is A and
Y is B, where A and B are fuzzy subsets of U and V respectively, generate
possibility distributions for the variables X and Y. The proposition P1 concerns
the joint fuzzy variable (X, Y) and is characterized by a fuzzy set over the cross
product space U x V. Specifically, P1 is characterized by a possibility
distribution Π_{(X,Y)} = R, where

μ_R(u, v) = max { (1 - μ_A(u)), μ_B(v) }.

It should be noted that this formula corresponds to the statement "not
A or B", the logical translation of P1. Zadeh now makes the inference Y is B'
from R and the proposition X is A' by

μ_B'(v) = max_u { min { μ_R(u, v), μ_A'(u) } }.

While this formulation of fuzzy inference, called the composition rule,
directly extends modus ponens, it suffers from some problems. In fact, if the
given proposition is X is A itself, the resultant fuzzy set B' is not exactly the fuzzy set B.
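A small numeric sketch of the compositional rule with the "not A or B" translation follows; the possibility distributions are made-up illustrative values, and running it shows that composing A itself with R does not return B exactly, as noted above.

import numpy as np

mu_A = np.array([0.0, 0.3, 0.7, 1.0, 0.6])        # "X is A" over U
mu_B = np.array([0.2, 0.8, 1.0, 0.5])             # "Y is B" over V

R = np.maximum(1.0 - mu_A[:, None], mu_B[None, :])          # mu_R(u, v), i.e. "not A or B"

def compose(mu_A_prime, R):
    """mu_B'(v) = max_u min(mu_R(u, v), mu_A'(u))."""
    return np.max(np.minimum(R, mu_A_prime[:, None]), axis=0)

print(compose(mu_A, R))   # not identical to mu_B in general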

Besides changing the way in which P1 is translated into a possibility
distribution, methods involving truth modification have been proposed. In this
approach, the proposition X is A' is compared with X is A, and the degree of
compatibility is used to modify the membership function of B to get that for B'.

A fuzzy truth value restriction T is a fuzzy subset of X = [0,1], and can
be defined by its membership function μ_T, which is a mapping

μ_T : X → [0,1].

For example, we can define fuzzy truth value restrictions true, very true, false,
unknown, absolutely true, absolutely false, etc.

In the truth value restriction methodology, the degree to which the
actual given value A' of a variable X agrees with the antecedent value A in a
proposition If X is A then Y is B is represented as a fuzzy subset of a truth
space. This fuzzy subset of truth space is what is referred to by the phrase truth
value restriction; it is used in a fuzzy deduction process to determine the
corresponding restriction on the truth value of the proposition Y is B. This latter
truth value restriction is then "inverted", which means that a fuzzy proposition Y
is B' in the Y universe of discourse is found such that its agreement with Y is
B is equal to the truth value restriction derived by the aforementioned fuzzy
inference process. That is, μ_B'(v) = μ_T(μ_B(v)). The rule-based system

described in [54] utilized a new inference technique based on truth value


restriction which outperformed most methods of fuzzy logic inference when the
inputs were exponentially defined functions of the antecedent clause (VERY
LONG, MORE-OR-LESS LONG, etc.).
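The sketch below is one loose interpretation of the truth value restriction scheme on discrete universes: the compatibility of A' with A is collected as a fuzzy truth restriction τ on a coarse grid of truth values, and B' is obtained as μ_B'(v) = τ(μ_B(v)) as in the text. The sets, the grid, and the nearest-grid-point evaluation are all assumptions of this sketch, not the specific technique of [54].

```python
# Minimal sketch of truth-value-restriction inference on discrete universes.
# tau(t) collects the possibility that the antecedent truth value equals t,
# given the observed A'; B' then follows from mu_B'(v) = tau(mu_B(v)).

U = [0, 1, 2, 3]
V = [0, 1, 2]
mu_A  = [0.0, 0.3, 0.7, 1.0]
mu_B  = [1.0, 0.6, 0.1]
mu_Ap = [0.0, 0.2, 0.9, 1.0]     # observed value of X, a modified version of A

GRID = [i / 10.0 for i in range(11)]     # discretized truth space [0, 1]

def truth_restriction(mu_a, mu_ap, grid):
    """tau(t) = sup of mu_A'(u) over u whose antecedent membership is (nearly) t."""
    tau = []
    for t in grid:
        close = [mu_ap[i] for i in range(len(mu_a)) if abs(mu_a[i] - t) <= 0.05]
        tau.append(max(close) if close else 0.0)
    return tau

def apply_tau(tau, grid, t):
    """Evaluate tau at an arbitrary truth value t by nearest grid point."""
    idx = min(range(len(grid)), key=lambda i: abs(grid[i] - t))
    return tau[idx]

tau = truth_restriction(mu_A, mu_Ap, GRID)
mu_Bp = [apply_tau(tau, GRID, b) for b in mu_B]
print([round(m, 2) for m in mu_Bp])
# With such tiny universes tau is defined at only a few truth values; practical
# systems work with denser universes or interpolate the restriction.
```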

To ease the computational burden of performing modus ponens


inferences with fuzzy sets, and to preserve the generalization capability, we
introduced neural network architectures to accomplish the fuzzy logic inferences.
These architectures could be trained on multiple conjunctive or disjunctive
antecedent clause rules and could actually store several compatible rules in one
structure, providing a natural method of conflict resolution [56-58].

CONCLUSIONS
The use of fuzzy set theory is growing in computer vision as it is in all
intelligent processing. The representation capability is flexible and intuitively
pleasing, the combination schemes are mathematically justifiable and can be
tailored to the particular problem at hand from low level aggregation to high
level inferencing, and the results of the algorithms are excellent, producing not
only crisp decisions when necessary, but also corresponding degrees of support.

There is much work left to be done at all levels of computer vision.


One area of particular need is the calculation and subsequent use of (fuzzy)
features from the output of fuzzy segmentation algorithms. More research is also
necessary in high level vision processes. Fuzzy set theory offers excellent potential for describing and manipulating object and region relationships, thereby assisting with scene interpretation. Finally, we believe that possibility distributions should be the model for the interface between (1) the human and the vision system and (2) high level vision subsystems and mid or low level vision processes.

This paper represents a short survey of fuzzy set methods in computer


vision. Once again we apologize to all whose work we have inadvertently
omitted from review. We strongly believe in the potential of fuzzy set theory to
solve increasingly difficult computer vision problems, and hope that this survey
will increase research in this area.

REFERENCES

1. J.M. Prewitt, "Object enhancement and extraction", in Picture Processing and Psychopictorics, B.S. Lipkin and A. Rosenfeld (Eds.), Academic Press, New York, 1970, pp. 75-149.

2. S.K. Pal and R.A. King, "Image enhancement using smoothing with fuzzy sets", IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-11, 1981, pp. 494-501.

3. S.K. Pal and R.A. King, "Histogram equalization with S and π functions in detecting x-ray edges", Electronics Letters, Vol. 17, 1981, pp. 302-304.

4. S.K. Pal and R.A. King, "On edge detection of x-ray images using fuzzy sets", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-5, 1983, pp. 69-77.

5. J. Keller, H. Qiu, and H. Tahani, "The fuzzy integral in image segmentation", Proceedings NAFIPS-86, New Orleans, June 1986, pp. 324-338.

6. R. Sankar, "Improvements in image enhancement using fuzzy sets", Proceedings NAFIPS-86, New Orleans, June 2-4, 1986, pp. 502-515.

7. L.A. Zadeh, "Calculus of fuzzy restrictions", in Fuzzy Sets and Their Applications to Cognitive and Decision Processes, L.A. Zadeh, K.S. Fu, K. Tanaka, and M. Shimura (Eds.), Academic Press, London, 1975, pp. 1-26.

8. Y. Nakagawa and A. Rosenfeld, "A note on the use of local min and max operators in digital picture processing", IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-8, 1978, pp. 632-635.

9. M.M. Gupta, G.K. Knopf, and P.N. Nikiforuk, "Edge perception using fuzzy logic", in Fuzzy Computing: Theory, Hardware and Applications, North Holland, 1988.

10. T. Huntsberger and M. Descalzi, "Color edge detection", Pattern Recognition Letters, Vol. 3, 1985, p. 205.

11. S.K. Pal and A. Rosenfeld, "Image enhancement and thresholding by optimization of fuzzy compactness", Pattern Recognition Letters, Vol. 7, 1988, pp. 77-86.

12. R. Krishnapuram and J. Lee, "Fuzzy-compensative-connective-based hierarchical networks and their application to computer vision", under review.

13. J. Lee, "Fuzzy-set-theory-based aggregation networks for information fusion and decision making", Ph.D. Thesis, University of Missouri-Columbia.

14. H. Tahani, "The generalized fuzzy integral in computer vision", Ph.D. dissertation, University of Missouri-Columbia, 1990.

15. J.C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters", Journal of Cybernetics, Vol. 3, No. 3, 1974, pp. 32-57.

16. J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.

17. T. Huntsberger, C. Jacobs, and R. Cannon, "Iterative fuzzy image segmentation", Pattern Recognition, Vol. 18, 1985, pp. 131-138.

18. R. Cannon, J. Dave, and J. Bezdek, "Efficient implementation of the fuzzy c-means clustering algorithm", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, No. 2, 1986, pp. 248-255.

19. S. Horowitz and T. Pavlidis, "Picture segmentation by a directed split and merge procedure", Proceedings of the Second International Joint Conference on Pattern Recognition, 1974, pp. 424-433.

20. T. Huntsberger, "Representation of uncertainty in low level vision", IEEE Transactions on Computers, Vol. C-35, No. 2, 1986, p. 145.

21. R. Cannon, J. Dave, J.C. Bezdek, and M. Trivedi, "Segmentation of a thematic mapper image using the fuzzy c-means clustering algorithm", IEEE Transactions on Geoscience and Remote Sensing, Vol. 24, No. 3, 1986, pp. 400-408.

22. J. Keller and C. Carpenter, "Image segmentation in the presence of uncertainty", International Journal of Intelligent Systems, Vol. 5, 1990, pp. 193-208.

23. A. Rosenfeld, "Fuzzy digital topology", Information and Control, Vol. 40, 1979, pp. 76-87.

24. A. Rosenfeld, "On connectivity properties of gray scale pictures", Pattern Recognition, Vol. 16, 1983, pp. 47-50.

25. A. Rosenfeld, "The fuzzy geometry of image subsets", Pattern Recognition Letters, Vol. 2, 1984, pp. 311-317.

26. D. Dubois and M.C. Jaulent, "Shape understanding via fuzzy models", Proceedings of the 2nd IFAC/IFIP/IFORS/IEA Conference on Analysis, Design and Evaluation of Man-Machine Systems, 1985, pp. 302-307.

27. D. Dubois and M.C. Jaulent, "A general approach to parameter evaluation in fuzzy digital pictures", Pattern Recognition Letters, to appear.

28. S. Peleg and A. Rosenfeld, "A min-max medial axis transformation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-3, 1981, pp. 208-210.

29. C.R. Dyer and A. Rosenfeld, "Thinning operations on grayscale pictures", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-1, 1979, pp. 88-89.

30. A. Rosenfeld and A.C. Kak, Digital Picture Processing, Vol. 2, Academic Press, New York, 1982.

31. L. Liao, "Image segmentation and enhancement by optimizing geometric parameters", M.S. Thesis, University of Missouri-Columbia, 1990.

32. I. Gath and A.B. Geva, "Unsupervised optimal fuzzy clustering", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-11, No. 7, July 1989, pp. 773-781.

33. J. Keller and Y. Seo, "Local fractal geometric features for image segmentation", International Journal of Imaging Systems and Technology, to appear, 1990.

34. R. Krishnapuram and A. Munshi, "Cluster-based segmentation of range images using differential-geometric features", under review.

35. R. Krishnapuram and J. Lee, "Fuzzy-connective-based hierarchical aggregation networks for decision making", Fuzzy Sets and Systems, to appear.

36. R. Krishnapuram and J. Lee, "Determining the structure of uncertainty management networks", Proceedings of the SPIE Conference on Robotics and Computer Vision, Philadelphia, November 1989.

37. H. Qiu and J. Keller, "Multispectral segmentation using fuzzy techniques", Proceedings NAFIPS-87, Purdue University, May 1987, pp. 374-387.

38. H. Tahani and J. Keller, "Information fusion in computer vision using the fuzzy integral", IEEE Transactions on Systems, Man and Cybernetics, Vol. 20, No. 3, 1990, pp. 733-741.

39. H.J. Zimmermann and P. Zysno, "Decisions and evaluations by hierarchical aggregation of information", Fuzzy Sets and Systems, Vol. 10, No. 3, 1983, pp. 243-260.

40. Y. Ohta, Knowledge-Based Interpretation of Outdoor Natural Scenes, Pitman Advanced Publishing, Boston, 1985.

41. R. Dave, "Use of the adaptive fuzzy clustering algorithm to detect lines in digital images", Proceedings of Intelligent Robots and Computer Vision VIII, Vol. 1192, No. 2, 1989, pp. 600-611.

42. C.-P. Freg, "Algorithms to detect linear and planar clusters and their applications", M.S. Project Report, University of Missouri-Columbia, May 1990.

43. J. Bezdek, C. Coray, R. Gunderson, and J. Watson, "Detection and characterization of cluster substructure", SIAM Journal on Applied Mathematics, Vol. 40, 1981, pp. 339-372.

44. M. Windham, "Geometrical fuzzy clustering algorithms", Fuzzy Sets and Systems, Vol. 10, 1983, pp. 271-279.

45. R. Dave, "Fuzzy shell-clustering and applications to circle detection in digital images", International Journal of General Systems, 1990.

46. M. Sugeno, "Fuzzy measures and fuzzy integrals: a survey", in Fuzzy Automata and Decision Processes, North Holland, Amsterdam, 1977, pp. 89-102.

47. J. Wootton, J. Keller, C. Carpenter, and G. Hobson, "A multiple hypothesis rule-based automatic target recognizer", in Pattern Recognition, Lecture Notes in Computer Science, Vol. 301, J. Kittler (Ed.), Springer-Verlag, 1988, pp. 315-324.

48. G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, 1976.

49. J. Keller, G. Hobson, J. Wootton, A. Nafarieh, and K. Luetkemeyer, "Fuzzy confidence measures in midlevel vision", IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-17, No. 4, 1987, pp. 676-683.

50. J. Keller and D. Jeffreys, "Linguistic computations in computer vision", Proceedings NAFIPS-90, Vol. 2, Toronto, 1990, pp. 432-435.

51. J. Keller, H. Shah, and F. Wong, "Fuzzy computations in risk and decision analysis", Civil Engineering Systems, Vol. 2, 1985, pp. 201-208.

52. J. Keller, M. Gray, and J. Givens, "A fuzzy k-nearest neighbor algorithm", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 15, 1985, pp. 580-585.

53. J. Keller, D. Subhanghasen, K. Unklesbay, and N. Unklesbay, "An approximate reasoning technique for recognition in color images of beef steaks", International Journal of General Systems, to appear, 1990.

54. A. Nafarieh and J. Keller, "A fuzzy logic rule-based automatic target recognizer", International Journal of Intelligent Systems, to appear, 1990.

55. L. Zadeh, "The concept of a linguistic variable and its application to approximate reasoning", Information Sciences, Part 1, Vol. 8, pp. 199-249; Part 2, Vol. 8, pp. 301-357; Part 3, Vol. 9, pp. 43-80, 1975.

56. J. Keller and H. Tahani, "Backpropagation neural networks for fuzzy logic", Information Sciences, to appear, 1990.

57. J. Keller and R. Yager, "Fuzzy logic inference neural networks", Proceedings of the SPIE Symposium on Intelligent Robots and Computer Vision VIII, 1989, pp. 582-591.

58. J. Keller and H. Tahani, "Implementation of conjunctive and disjunctive fuzzy logic rules with neural networks", International Journal of Approximate Reasoning, to appear.
7
FUZZINESS, IMAGE INFORMATION
AND SCENE ANALYSIS

Sankar K. Pal *
Software Technology Branch/PT4
National Aeronautics and Space Administration
Lyndon B. Johnson Space Center
Houston, Texas 77058, U.S.A.

INTRODUCTION
An application of the theory of fuzzy subsets to image processing and scene analysis problems is described here. The problems considered are the (pre)processing of two-dimensional image patterns, the extraction of primitives, and the recognition and interpretation of images.

A gray tone picture possesses some ambiguity within its pixels due to the possible multivalued levels of brightness. The uncertainty in an image may arise from grayness ambiguity or spatial (geometrical) ambiguity or both. Grayness ambiguity means "indefiniteness" in deciding whether a pixel is white or black. Spatial ambiguity refers to "indefiniteness" in the shape and geometry of a region, e.g., where is the boundary or edge of a region? or is this contour "sharp"?

When the regions in an image are ill-defined (fuzzy), it is natural and also appropriate to avoid committing ourselves to a specific (hard) decision, e.g., in segmentation/thresholding and skeletonization, by allowing the segments or skeletons or contours to be fuzzy subsets of the image. Similarly, for describing and interpreting ill-defined structural information in a pattern (when the pattern indeterminacy is due to inherent vagueness rather than randomness), it is natural to define the primitives and the relations among them using labels of fuzzy sets. For example, primitives may be defined in terms of arcs with varying grades of membership from 0 to 1, and the production rules of a grammar may be fuzzified to account for the fuzziness in the physical relations among the primitives, thereby increasing the generative power of the grammar.
The first part of the article consists of a definition of an image in the light of fuzzy set theory, and of various information measures (arising from fuzziness) and tools relevant for processing, e.g., fuzzy geometrical properties, correlation, bound functions and entropy measures. The second part provides the formulation of various algorithms, along with the management of uncertainties (ambiguities), for image enhancement, edge detection, skeletonization, filtering, segmentation and object extraction. Ambiguity in the evaluation and assessment of membership functions is also described. The third part describes the way of extracting various fuzzy primitives in order to describe the contours of the different object regions of an image. Finally, fuzzy grammars are used to demonstrate how syntactic algorithms can be formulated for identifying different region structures/classes of patterns. The above features are illustrated through examples and various image data.

* Dr. Pal is on leave from the post of Professor in the Electronics and Communication Sciences Unit, Indian Statistical Institute, Calcutta 700035, India.

IMAGE DEFINITION
An image X of size M×N with L levels can be considered as an array of fuzzy singletons, each having a value of membership denoting its degree of brightness relative to some brightness level l, l = 0, 1, 2, ..., L − 1. In the notation of fuzzy sets, we may therefore write

X = {μ_X(x_mn) = μ_mn/x_mn; m = 1, 2, ..., M; n = 1, 2, ..., N}          (1)

or X = ∪_m ∪_n μ_mn/x_mn, m = 1, 2, ..., M; n = 1, 2, ..., N,

where μ_mn denotes the grade of possessing some property μ (e.g., brightness, edginess, smoothness) by the (m,n)th pixel intensity x_mn. In other words, a fuzzy subset of an image X is a mapping μ from X into [0, 1]. For any point p ∈ X, μ(p) is called the degree of membership of p in μ.
One may use either global or local information of an image in defining a membership function characterizing some property. For example, the brightness or darkness property can be defined only in terms of the gray value of a pixel x_mn, whereas edginess, darkness or textural properties need the neighborhood information of a pixel to define their membership functions. Similarly, positional or coordinate information is necessary, in addition to gray level and neighborhood information, to characterize a dynamic property of an image.
Again, the aforesaid information can be used in a number of ways (in various functional forms), depending on one's opinion and/or the problem at hand, to define a requisite membership function for an image property.

MEASURES OF FUZZINESS AND IMAGE INFORMATION
The definitions of the various measures which represent grayness ambiguity in an image (based on an individual pixel as well as on a collection of pixels) are listed below.

Linear Index of Fuzziness

γ_l(X) = (2/MN) Σ_m Σ_n min{μ_mn, 1 − μ_mn}          (2)
m = 1, 2, ..., M; n = 1, 2, ..., N

Quadratic Index of Fuzziness

γ_q(X) = (2/√(MN)) [Σ_m Σ_n (μ_mn − μ̄_mn)²]^0.5          (3)
m = 1, 2, ..., M; n = 1, 2, ..., N

Entropy

H(X) = (1/(MN ln 2)) Σ_m Σ_n S_n(μ_mn)          (4)
m = 1, 2, ..., M; n = 1, 2, ..., N

where S_n(μ_mn) = −μ_mn ln μ_mn − (1 − μ_mn) ln(1 − μ_mn) is Shannon's function. μ_mn denotes the degree of possessing some property μ by the (m, n)th pixel x_mn, and μ̄_mn denotes the nearest two-tone version of μ_mn.

rth Order Entropy

H^r(X) = (1/k) Σ_{i=1..k} [μ(s_i^r) exp(1 − μ(s_i^r)) + (1 − μ(s_i^r)) exp(μ(s_i^r)) − 1]/(√e − 1)          (5)
i = 1, 2, ..., k

s_i^r denotes the ith combination (sequence) of r pixels in X, and k is the number of such sequences. μ(s_i^r) denotes the degree to which the combination s_i^r, as a whole, possesses the property μ.
Hybrid Entropy

H_hy(X) = −P_w log E_w − P_b log E_b          (6)

with E_w = (1/MN) Σ_m Σ_n μ_mn exp(1 − μ_mn)
     E_b = (1/MN) Σ_m Σ_n (1 − μ_mn) exp(μ_mn)
m = 1, 2, ..., M; n = 1, 2, ..., N

μ_mn denotes the degree of "whiteness" of the (m, n)th pixel. P_w and P_b denote the probabilities of occurrence of white (μ_mn = 1) and black (μ_mn = 0) pixels respectively. E_w and E_b denote the average likeliness (possibility) of interpreting a pixel as white and black respectively.
Correlation

C(μ1, μ2) = 1 − 4[Σ_m Σ_n (μ1_mn − μ2_mn)²]/(X1 + X2)          (7)

C(μ1, μ2) = 1 if X1 + X2 = 0

with X1 = Σ_m Σ_n (2μ1_mn − 1)², X2 = Σ_m Σ_n (2μ2_mn − 1)²
m = 1, 2, ..., M; n = 1, 2, ..., N

C(μ1, μ2) denotes the correlation between two properties μ1 and μ2 (defined over the same domain). μ1_mn and μ2_mn denote the degrees of possessing the properties μ1 and μ2 respectively by the (m, n)th pixel.
These expressions (equations 2-7) are the versions extended to the two-dimensional image plane from those defined for a fuzzy set. For example, the index of fuzziness was defined by Kaufmann [1], entropy by De Luca and Termini [2], rth order entropy and hybrid entropy by Pal and Pal [3], and correlation by Murthy, Pal and Dutta Majumdar [4].
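A minimal sketch of equations (2), (3), (4) and (6) for a small membership plane is given below, assuming NumPy is available. The 3×3 plane and the 70%/30% white/black proportions reuse values that appear elsewhere in this chapter; natural logarithms are used throughout.

```python
# Minimal sketch of the grayness-ambiguity measures of equations (2)-(4) and (6).
import numpy as np

mu = np.array([[0.2, 0.4, 0.3],
               [0.2, 0.7, 0.6],
               [0.6, 0.5, 0.6]])          # example membership plane (M x N)
M, N = mu.shape
mu_bar = (mu >= 0.5).astype(float)        # nearest two-tone version of mu

gamma_l = (2.0 / (M * N)) * np.minimum(mu, 1.0 - mu).sum()                 # eq (2)
gamma_q = (2.0 / np.sqrt(M * N)) * np.sqrt(((mu - mu_bar) ** 2).sum())     # eq (3)

eps = 1e-12                               # guard against log(0)
shannon = -(mu * np.log(mu + eps) + (1.0 - mu) * np.log(1.0 - mu + eps))
H = shannon.sum() / (M * N * np.log(2.0))                                  # eq (4)

P_w, P_b = 0.7, 0.3                       # assumed proportions of white/black pixels
E_w = (mu * np.exp(1.0 - mu)).mean()
E_b = ((1.0 - mu) * np.exp(mu)).mean()
H_hy = -P_w * np.log(E_w) - P_b * np.log(E_b)                              # eq (6)

print(round(float(gamma_l), 3), round(float(gamma_q), 3),
      round(float(H), 3), round(float(H_hy), 3))
```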
Index of fuzziness reflects the ambiguity present in an image by measuring the
distance between its fuzzy property plane and the nearest ordinary plane. The term
"entropy", on the other hand, uses Shannon's function in the property plane but its
meaning is quite different from the one of classical entropy because no probabilistic
concept is needed to define it HT(X) gives a measure of the average amount of
difficulty in taking a decision on any subset of size r with respect to an image
property. Ifr = 1, Hr(X) reduces to (unnormalized) H(X) of equation (4). Hhy(X)
represents an amount of difficulty in deciding whether a pixel possesses certain
properties or not by making a prevision on its probability of occurrence. In absence
of fuzziness (i.e.,with proper defuzzification), Hhy reduces to two state classical
entropy of Shannon, the states being black and white. Since a fuzzy set is a
generalized version of an ordinary set, the entropy of a fuzzy set deserves to be a
generalized version of classical entropy by taking into account not only the fuzziness
of the set but also the underlying probability structure. In that respect, Hhy can be
regarded as a generalized entropy such that classical entropy becomes its special case
when fuzziness is properly removed.
All these terms, which give an idea of 'indefiniteness' or fuzziness of an image
may be regarded as the measures of average intrinsic information which is received
when one has to make a decision (as in pattern analysis) in order to classify the
ensembles of patterns described by a fuzzy set.
γ(X) and H(X) are normalized in the interval [0, 1] such that

Pr 1: γ_min = H_min = 0 for μ_mn = 0 or 1 for all (m,n) ∈ X          (8a)
Pr 2: γ_max = H_max = 1 for μ_mn = 0.5 for all (m,n)                  (8b)
Pr 3: γ(X) ≥ γ(X*)  (or, H(X) ≥ H(X*))                                (8c)
Pr 4: γ(X) = γ(X̄)   (or, H(X) = H(X̄))                                 (8d)

where X̄ is the complement of X and X* is the 'sharpened' or 'intensified' version of X such that

μ_X*(x_mn) ≥ μ_X(x_mn) if μ_X(x_mn) ≥ 0.5
and μ_X*(x_mn) ≤ μ_X(x_mn) if μ_X(x_mn) ≤ 0.5          (9)

In other words, γ(X) or H(X) increases monotonically with μ, reaches a maximum at μ = 0.5 and then decreases monotonically. This is explained in Fig. 1.

Figure 1 Variation of fuzziness with μ.
According to property 8(c), these parameters decrease with contrast enhancement of an image. Now through processing, if we can partially remove the uncertainty on the grey levels of X, we say that we have obtained an average amount of information given by δγ = γ(X) − γ(X*) or δH = H(X) − H(X*) by taking a decision bright or dark on the pixels of X. The criteria γ(X*) ≤ γ(X) and H(X*) ≤ H(X), needed in order to have positive δγ and δH values, follow from Eq. (8c). If the uncertainty is completely removed, then γ(X*) = H(X*) = 0. In other words, γ(X) and H(X) can be regarded as measures of the average amount of information (about the grey levels of pixels) which has been lost by transforming the classical (two-tone) pattern into a fuzzy pattern X.
It is to be noted that γ(X) or H(X) reduces to zero as long as μ_mn is made 0 or 1 for all (m, n), no matter whether the resulting defuzzification (or transforming process) is correct or not. In the following discussion it will be clear how H_hy takes care of this situation.
H^r(X) has the following properties:
Pr 1: H^r attains a maximum if μ_i = 0.5 for all i.
Pr 2: H^r attains a minimum if μ_i = 0 or 1 for all i.
Pr 3: H^r ≥ H*^r, where H*^r is the rth order entropy of a sharpened version of the fuzzy set.
Pr 4: H^r is, in general, not equal to H̄^r, where H̄^r is the rth order entropy of the complement set.
Pr 5: H^r ≤ H^(r+1) when all μ_i ∈ [0.5, 1];
      H^r ≥ H^(r+1) when all μ_i ∈ [0, 0.5].

Note that the property Pr 4 of equation 8(d) is not, in general, valid here. The additional property Pr 5 implies that H^r is a monotonically nonincreasing function of r for μ_i ∈ [0, 0.5] and a monotonically nondecreasing function of r for μ_i ∈ [0.5, 1] (when the 'min' operator has been used to get the group membership value).
When all the μ_i values are the same, H^1(X) = H^2(X) = ... = H^r(X). This is because the difficulty in taking a decision regarding possession of a property on an individual is the same as that for a group selected therefrom. The value of H^r would, of course, depend on the μ_i values.
Again, the higher the similarity among singletons, the quicker is the convergence to the limiting value of H^r. Based on this observation, let us define an index of similarity of the supports of a fuzzy set as S = H^1/H^2 (when H^2 = 0, H^1 is also zero and S is taken as 1). Obviously, when μ_i ∈ [0.5, 1] and the min operator is used to assign the degree of possession of the property by a collection of supports, S will lie in [0, 1], as H^r ≤ H^(r+1). Similarly, when μ_i ∈ [0, 0.5], S may be defined as H^2/H^1 so that S lies in [0, 1]. The higher the value of S, the more alike (similar) are the supports of the fuzzy set with respect to the property. This index of similarity can therefore be regarded as a measure of the degree to which the members of a fuzzy set are alike.
Therefore, the value of the conventional fuzzy entropy (H^1, or Eq. 4) can only indicate whether the fuzziness in a set is low or high. In addition to this, the value of H^r also enables one to infer whether the fuzzy set contains similar supports (or elements) or not. The similarity index thus defined can be successfully used for measuring interclass and intraclass ambiguity (i.e., class homogeneity and contrast) in pattern recognition and image processing problems.
The aforesaid features are explained in Table 1 for μ_i ∈ [0.5, 1], where the min operator is used to compute the group membership and k in Eq. 5 is taken to be the number of combinations of r supports out of 10 (i.e., k = C(10, r)), r = 1, 2, ..., 6.

Table 1  Higher order entropy

Case  μ_X                                     H1     H2     H3     H4     H5     H6     S
1     (1,1,1,1,1,1,1,1,1,1)                   0      0      0      0      0      0      1
2     (.5,.5,.5,.5,.5,.5,.5,.5,.5,.5)         1      1      1      1      1      1      1
3     (1,1,1,1,1,.5,.5,.5,.5,.5)              .5     .777   .916   .976   .996   1      .642
4     (.5,.5,.5,.5,.5,.6,.6,.6,.6,.6)         .980   .991   .996   .999   .999   1      .989
5     (.6,.6,.65,.9,.9,.9,.9,.9,.9,.915)      .538   .678   .781   .855   .905   .937   .793
6     (.8,.8,.8,.8,.8,.8,.9,.9,.9,.9)         .538   .613   .641   .649   .650   .650   .878
7     (.5,.5,.5,.5,.5,.5,.9,.9,.9,.9)         .748   .916   .979   .997   1      1      .816
8     (.7,.7,.7,.7,.7,.8,.8,.8,.8,.8)         .748   .802   .830   .841   .845   .846   .932

H_hy(X) has the following properties. In the absence of fuzziness, when MN·P_b pixels become completely black (μ_mn = 0) and MN·P_w pixels become completely white (μ_mn = 1), then E_w = P_w, E_b = P_b and H_hy boils down to the two-state classical entropy

H_c = −P_w log P_w − P_b log P_b,          (10)

the states being black and white. Thus. Hhy reduces to He only when a proper

defuzzification process is applied to detect (restore) the pixels.IHhy - He I can


therefore be acted as an objective function for enhancement and noise reduction. The
lower the difference. the lesser is the fuzziness associated with the individual symbol
and higher will be the accuracy in classifying them as their original value (white or
black). (This property was lacking with y(X) and H(X) measures (equations 2-4)
which always reduce to zero irrespective of the defuzzification process). In other

words.IHhY - Hel represents an amount of information which was lost by


transforming a two tone image to a gray tone.
For given P_w and P_b (P_w + P_b = 1, 0 ≤ P_w, P_b ≤ 1), of all possible defuzzified versions of the image, H_hy is minimum for the properly defuzzified one.
If μ_mn = 0.5 for all (m, n), then E_w = E_b and

H_hy = −log(0.5 exp(0.5)),          (11)

i.e., H_hy takes a constant value and becomes independent of P_w and P_b. This is logical in the sense that the machine is unable to take a decision on the pixels since all μ_mn values are 0.5.
Let us consider an example of a digital image in which, say, 70% of the pixels look white while the remaining 30% look dark. Thus the probability of a white pixel, P_w, is 0.7 and that of a dark pixel, P_b, is 0.3. Suppose the whiteness of the pixels is not constant, i.e., there is a variation (grayness), and similar is the case with the black pixels.
Let us now consider the effect of improper defuzzification on the pattern shown in case 1 of Table 2. Two types of defuzzification are considered here. In cases 2-4 all the symbols with μ = 0.5 are transformed to zero, although some of them were actually generated from the symbol '1'. In cases 5-6 of Table 2 some of the μ values greater than 0.5 which were generated from symbol 1 (i.e., belong to the white portion of the image) are wrongly defuzzified and brought down towards zero (instead of 1).
In both situations, it is to be noted that |H_c − H_hy| does not reduce to zero. Case 7, on the other hand, has all its elements properly defuzzified. As a result, E_b and E_w become 0.3 and 0.7 respectively and |H_hy − H_c| reduces to zero.

Table 2  Effect of wrong defuzzification (with P_b = 0.3 and P_w = 0.7)

Case  μ_X                                     E_b    E_w    H_hy   |H_c − H_hy|
1     (.9,.9,.8,.8,.7,.6,.5,.5,.4,.3)         .620   .876   .235   .375
2     (.999,.999,.9,.8,.7,.7,.3,.3,.2,.1)     .576   .776   .342   .268
3     (1,1,1,.99,.9,.9,.1,.1,0,0)             .450   .648   .542   .068
4     (1,1,1,1,1,1,0,0,0,0)                   .400   .600   .632   .021
5     (.99,.99,.1,.1,.9,.8,.7,.2,.1,.1)       .630   .634   .456   .154
6     (1,1,0,0,1,1,1,0,0,0)                   .500   .500   .693   .082
7     (1,1,1,1,1,1,1,0,0,0)                   .300   .700   .611   0

C(μ1, μ2) of equation (7) has the following properties:

a) If for higher values of μ1(x), μ2(x) takes higher values, and the converse is also true, then C(μ1, μ2) must be very high.
b) If with increase of x both μ1 and μ2 increase, then C(μ1, μ2) > 0.
c) If with increase of x, μ1 increases and μ2 decreases, or vice versa, then C(μ1, μ2) < 0.
d) C(μ1, μ1) = 1
e) C(μ1, μ1) ≥ C(μ1, μ2)
f) C(μ1, 1 − μ1) = −1
g) C(μ1, μ2) = C(μ2, μ1)
h) −1 ≤ C(μ1, μ2) ≤ 1
i) C(μ1, μ2) = −C(1 − μ1, μ2)
j) C(μ1, μ2) = C(1 − μ1, 1 − μ2)

IMAGE GEOMETRY
The various geometrical properties of a fuzzy image subset (characterized by μ_X(x_mn), or simply by μ) as defined by Rosenfeld [5,6] and Pal and Ghosh [7] are given below with illustrations. These provide measures of ambiguity in the geometry (spatial domain) of an image.

A. Area  The area of a fuzzy subset μ is defined as [5]

a(μ) = ∫ μ          (12)

where the integration is taken over a region outside which μ = 0. For μ piecewise constant (as in the case of a digital image) the area is

a(μ) = Σ μ          (13)

where the summation is over a region outside which μ = 0. Note from equation (13) that the area is the weighted sum of the regions on which μ has constant value, weighted by those values.

Example 1  Let μ be of the form

0.2  0.4  0.3
0.2  0.7  0.6
0.6  0.5  0.6

Area: a(μ) = 0.2+0.4+0.3+0.2+0.7+0.6+0.6+0.5+0.6 = 4.1

B. Perimeter  If μ is piecewise constant, the perimeter of μ is defined as [5]

p(μ) = Σ_{i,j,k} |μ(i) − μ(j)| · |A(i,j,k)|          (14)

This is just the weighted sum of the lengths of the arcs A(i,j,k) along which the regions having constant μ values μ(i) and μ(j) meet, weighted by the absolute difference of these values. In the case of an image, if we consider the pixels as the piecewise constant regions and the common arc length for adjacent pixels as unity, then the perimeter of an image is defined by

p(μ) = Σ_{i,j} |μ(i) − μ(j)|          (15)

where μ(i) and μ(j) are the membership values of two adjacent pixels.
For the fuzzy subset μ of Example 1, the perimeter is

p(μ) = |0.2−0.4| + |0.2−0.2| + |0.4−0.3| + |0.4−0.7| + |0.3−0.6| + |0.2−0.6| + |0.2−0.7| + |0.7−0.6| + |0.7−0.5| + |0.6−0.6| + |0.6−0.5| + |0.5−0.6| = 2.3

C. Compactness  The compactness of a fuzzy set μ having an area a(μ) and a perimeter p(μ) is defined as [5]

comp(μ) = a(μ)/p²(μ)          (16)

Physically, compactness means the fraction of the maximum area (that can be encircled by the perimeter) actually occupied by the object. In the nonfuzzy case the value of compactness is maximum for a circle and is equal to 1/(4π). In the case of a fuzzy disc, where the membership value depends only on the distance from the center, this compactness value is ≥ 1/(4π) [6]. Of all possible fuzzy discs, compactness is therefore minimum for the crisp version.
For the fuzzy subset μ of Example 1, comp(μ) = 4.1/(2.3 × 2.3) = 0.775.

D. Height and Width  The height of a fuzzy set μ is defined as [5]

h(μ) = ∫ max_n μ_mn dm          (17)

where the integration is taken over a region outside which μ_mn = 0. Similarly, the width of the fuzzy set is defined by

w(μ) = ∫ max_m μ_mn dn          (18)

with the same condition on the integration as above. For digital pictures m and n can take only discrete values, and since μ = 0 outside the bounded region, the max operators are taken over a finite set. In this case the definitions take the form

h(μ) = Σ_m max_n μ_mn          (19)
w(μ) = Σ_n max_m μ_mn          (20)
m = 1, 2, ..., M; n = 1, 2, ..., N

So physically, in the case of a digital picture, the height is the sum of the maximum membership values of each row. Similarly, by width we mean the sum of the maximum membership values of each column.
For the fuzzy subset μ of Example 1, the height is h(μ) = 0.4+0.7+0.6 = 1.7 and the width is w(μ) = 0.6+0.7+0.6 = 1.9.

E. Length and Breadth  The length of a fuzzy set μ is defined as [7]

l(μ) = max_n (∫ μ_mn dm)          (21)

where the integration is taken over the region outside which μ_mn = 0. In the case of a digital picture, where m and n can take only discrete values, the expression takes the form

l(μ) = max_n (Σ_m μ_mn)          (22)

Physically speaking, the length of an image fuzzy subset gives its longest expansion in the column direction. If μ is crisp, μ_mn = 0 or 1; in this case the length is the maximum number of pixels in a column. Comparing equation (22) with (19), we notice that length differs from height in the sense that the former takes the summation of the entries in a column first and then maximizes over the different columns, whereas the latter takes the maximum of the entries in each row and then sums over the different rows.
The breadth of a fuzzy set μ is defined as

b(μ) = max_m (∫ μ_mn dn)          (23)

where the integration is taken over the region outside which μ_mn = 0. In the case of a digital picture the expression takes the form

b(μ) = max_m (Σ_n μ_mn)          (24)

Physically speaking, the breadth of an image fuzzy subset gives its longest expansion in the row direction. If μ is crisp, μ_mn = 0 or 1; in this case the breadth is the maximum number of pixels in a row. The difference between width and breadth is the same as that between height and length.
For the fuzzy subset μ in Example 1, the length is l(μ) = 0.4 + 0.7 + 0.5 = 1.6 and the breadth is b(μ) = 0.6 + 0.5 + 0.6 = 1.7.
F. Index of Area Coverage (IOAC)  The index of area coverage of a fuzzy set may be defined as [7]

IOAC(μ) = area(μ) / (l(μ) · b(μ))          (25)

In the nonfuzzy case, the IOAC has a value of 1 for a rectangle (placed along the axes of measurement). For a circle this value is πr²/(2r · 2r) = π/4. Physically, by the IOAC of a fuzzy image we mean the fraction (which may be improper also) of the maximum area (that can be covered by the length and breadth of the image) actually covered by the image.
For the fuzzy subset μ of Example 1, the maximum area that can be covered by its length and breadth is 1.6 × 1.7 = 2.72, whereas the actual area is 4.1, so the IOAC = 4.1/2.72 = 1.51.
It is to be noted that

l(X)/h(X) ≤ 1          (26)
b(X)/w(X) ≤ 1          (27)

When equality holds in (26) or (27) the object is either vertically or horizontally oriented.
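The sketch below recomputes the geometrical quantities of Example 1 with NumPy, using 4-adjacency for the perimeter of equation (15); the printed values reproduce those worked out above.

```python
# Minimal sketch of the fuzzy geometry measures (equations 13, 15, 16, 19, 20,
# 22, 24, 25) applied to the 3x3 membership array of Example 1.
import numpy as np

mu = np.array([[0.2, 0.4, 0.3],
               [0.2, 0.7, 0.6],
               [0.6, 0.5, 0.6]])

area = mu.sum()                                                      # eq (13)

# Perimeter: |mu(i) - mu(j)| summed over horizontally and vertically adjacent pixels.
perimeter = (np.abs(np.diff(mu, axis=0)).sum() +
             np.abs(np.diff(mu, axis=1)).sum())                      # eq (15)

compactness = area / perimeter ** 2                                  # eq (16)

height  = mu.max(axis=1).sum()     # sum of row maxima               # eq (19)
width   = mu.max(axis=0).sum()     # sum of column maxima            # eq (20)
length  = mu.sum(axis=0).max()     # max column sum                  # eq (22)
breadth = mu.sum(axis=1).max()     # max row sum                     # eq (24)

ioac = area / (length * breadth)                                     # eq (25)

print(round(float(area), 2), round(float(perimeter), 2), round(float(compactness), 3))
print(float(height), float(width), float(length), float(breadth), round(float(ioac), 2))
# Expected: 4.1, 2.3, 0.775 and 1.7, 1.9, 1.6, 1.7, 1.51
```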
G. Degree of Adjacency  The degree to which two regions S and T of an image are adjacent is defined as

a(S,T) = Σ_{p ∈ BP(S)} [1/(1 + |μ(p) − μ(q)|)] · [1/(1 + d(p))]          (28)

Here d(p) is the shortest distance between p and q, q is a border pixel (BP) of T, and p is a border pixel of S. The other symbols have the same meanings as in the previous discussion.
The degree of adjacency of two regions is maximum (= 1) only when they are physically adjacent, i.e., d(p) = 0, and their membership values are also equal, i.e., μ(p) = μ(q). If two regions are physically adjacent then their degree of adjacency is determined only by the difference of their membership values. Similarly, if the membership values of two regions are equal, their degree of adjacency is determined by their physical distance only.
IMAGE PROCESSING OPERATIONS
In this section we will be explaining how the various grayness and geometrical
ambiguity measures can be used for image enhancement, segmentation, edge
detection and skeleton extraction problems. The algorithms which will be described
here provide both fuzzy and non fuzzy (as a special case) outputs.

Segmentation and Object Extraction


The problem of grey level thresholding plays an important role in image processing. For example, in enhancing contrast in an image we need to select proper threshold levels from its histogram so that some suitable non-linear transformation can highlight a desirable set of pixel intensities compared to others. Similarly, in image segmentation one needs proper histogram thresholding, whose objective is to establish boundaries in order to partition the image space into meaningful regions. This section illustrates an application of the theory of fuzzy sets to make this task automatic, so that an optimum threshold (or set of thresholds) may be estimated without the need to refer directly to the histogram.
Criteria for Threshold Selection
Let us consider, first of all, the parameters γ(X) or H(X) to explain the criterion of thresholding.
Consider the standard S-function [8]

p_mn = μ_X(x_mn) = S(x_mn; a, b, c)
     = 0,                             x_mn ≤ a          (29a)
     = 2[(x_mn − a)/(c − a)]²,        a ≤ x_mn ≤ b       (29b)
     = 1 − 2[(x_mn − c)/(c − a)]²,    b ≤ x_mn ≤ c       (29c)
     = 1,                             x_mn ≥ c           (29d)

with b = (a + c)/2 and b − a = c − b = Δb,

for obtaining the μ_mn plane from the spatial x_mn plane of the image X and for computing the γ(X) and H(X) values from Eqs. (2), (3) and (4). The parameter b is the cross-over point, i.e., S(b; a, b, c) = 0.5, and Δb is the bandwidth. This is explained in Fig. 2 for an L-level image. Such a μ plane may be viewed as representing a fuzzy set "bright image", so that the degree of brightness of a pixel increases with its gray value.

Figure 2 Standard S-function for an L-level image.
For a particular cross-over point, say, b = l_c, we have μ_X(l_c) = 0.5 and the μ_mn plane would contain values > 0.5 or < 0.5 corresponding to x_mn > l_c or < l_c. The terms γ(X) and H(X) then measure the average ambiguity in X by computing μ_{X∩X̄}(x_mn) or S_n(μ_X(x_mn)), which is 0 if μ_X(x_mn) = 0 or 1 and is maximum for μ_X(x_mn) = 0.5.
The selection of a cross-over point at b = l_c implies the allocation of the grey levels < l_c and > l_c to the two clusters, namely background and object, of a bimodal image. The contribution of the levels towards γ(X) and H(X) is mostly from those around l_c and decreases as we move away from l_c. Again, since the nearest ordinary plane (which gives the two-tone version of X) is dependent on the position of the cross-over point, a proper selection of b may therefore be obtained which will result in an appropriate segmentation of object and background. In other words, if the grey levels of image X have a bimodal distribution, then the above criteria, evaluated for different values of b, will attain a minimum γ or H value only when b corresponds to the appropriate boundary between the two clusters.
For such a position of the threshold (cross-over point), there will be a minimum number of pixel intensities in X having μ_mn ≈ 0.5 (contributing γ or H ≈ 1) and a maximum number of pixel intensities having μ_mn ≈ 0 or 1 (contributing γ or H ≈ 0), thus contributing least towards γ(X) or H(X). This optimum (minimum) value would be greater for any other selection of the cross-over point.
This suggests that modification of the cross-over point will result in variation of the parameters γ(X) and H(X), and so an optimum threshold may be estimated for automatic histogram-thresholding problems without the need to refer directly to the histogram of X. The above concept can also be extended to an image having a multimodal distribution of grey levels, in which case one would have several minima in the γ and H values corresponding to the different threshold points of the histogram.
Let us now consider the geometrical parameters comp(X) and IOAC(X) (equations 16 and 25). It has been noticed that for crisp sets the value of the index of area coverage (IOAC) is maximum for a rectangle. Again, of all possible fuzzy rectangles the IOAC is minimum for the crisp version. Similarly, in the nonfuzzy case the compactness is maximum for a circle, and of all possible fuzzy discs compactness is minimum for the crisp version [6]. For this reason, we use minimization (rather than maximization) of fuzzy compactness/IOAC as a criterion for image segmentation [9].
Suppose we use equation (29) for obtaining the 'bright image' μ(X) of an image X. Then for a particular cross-over point of the S-function, compactness(μ) and IOAC(μ) reflect the average amount of ambiguity in the geometry (i.e., in the spatial domain) of X. Therefore, modification of the cross-over point will result in different μ(X) planes (and hence different segmented versions), with varying amounts of compactness or IOAC denoting fuzziness in the spatial domain. The μ(X) plane having minimum IOAC or compactness value can be regarded as an optimum fuzzy segmented version of X.
For obtaining the nonfuzzy threshold, one may take the cross-over point (which is considered to be the maximally ambiguous level) as the threshold between object and background. For images having multiple regions, one would have a set of such optimum μ(X) planes. The algorithm developed using these criteria is given below.

Algorithm 1

Given an L-level image X of dimension M×N with minimum and maximum gray values l_min and l_max respectively:

Step 1: Construct the membership plane using equation (29) as
μ_mn = μ(l) = S(l; a, b, c)
(called the bright image plane if the object regions possess higher gray values), or
μ_mn = μ(l) = 1 − S(l; a, b, c)
(called the dark image plane if the object regions possess lower gray values),
with cross-over point b and bandwidth Δb.

Step 2: Compute γ(X), H(X), comp(X) and IOAC(X).

Step 3: Vary b between l_min and l_max and select those b for which I(X) (where I(X) denotes one of the aforesaid measures or a combination of them) has local minima. Among the local minima, let the global one have cross-over point s.

The level s therefore denotes the cross-over point of the fuzzy image plane μ_mn which has minimum grayness and/or geometrical ambiguity. The μ_mn plane can then be viewed as a fuzzy segmented version of the image X. For the purpose of nonfuzzy segmentation, we can take s as the threshold or boundary for classifying or segmenting the image into object and background.
The measure I(X) in Step 3 can represent either grayness ambiguity (i.e., γ(X) or H(X)), or geometrical ambiguity (i.e., comp(X), IOAC(X) or a(S,T)), or both (i.e., the product of grayness and geometrical ambiguities).
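The sketch below runs Algorithm 1 with I(X) taken as the linear index of fuzziness γ(X) only. The synthetic bimodal image, the bandwidth and the choice of measure are assumptions of the sketch; the other measures could be plugged in at the marked line in the same way.

```python
# Minimal sketch of Algorithm 1 with I(X) = gamma(X) (linear index of fuzziness).
import numpy as np

def s_function(x, a, b, c):
    """Zadeh's standard S-function of equation (29)."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    left  = (x > a) & (x <= b)
    right = (x > b) & (x < c)
    y[left]  = 2.0 * ((x[left] - a) / (c - a)) ** 2
    y[right] = 1.0 - 2.0 * ((x[right] - c) / (c - a)) ** 2
    y[x >= c] = 1.0
    return y

def gamma(mu):
    return 2.0 * np.minimum(mu, 1.0 - mu).mean()

rng = np.random.default_rng(0)
# Synthetic 32-level image: dark background around level 8, bright object around 22.
img = np.clip(np.concatenate([rng.normal(8, 2, 600), rng.normal(22, 2, 400)]),
              0, 31).astype(int).reshape(40, 25)

delta_b = 4                                        # half window width (assumed)
levels = list(range(delta_b, 32 - delta_b))
scores = []
for b in levels:                                   # Step 3: vary the cross-over point
    mu = s_function(img, b - delta_b, b, b + delta_b)   # Step 1: bright-image plane
    scores.append(gamma(mu))                            # Step 2: ambiguity measure I(X)

s = levels[int(np.argmin(scores))]                 # global minimum -> threshold
print("selected threshold:", s)
print("object fraction:", round(float((img > s).mean()), 2))
```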

Faster Method of Computation

From Algorithm 1 it appears that one needs to scan an L-level image L times (corresponding to the L cross-over points of the membership function) in order to compute the parameters for detecting its threshold. The time of computation can be reduced significantly by scanning the image only once to compute its co-occurrence matrix, row histogram and column histogram, and by computing μ(l), l = 1, 2, ..., L each time from the membership function of a particular cross-over point.
The computations of γ(X) (or H(X)), a(X), p(X), l(X) and b(X) can be made faster in the following way. Let h(i), i = 1, 2, ..., L be the number of occurrences of the level i, c[i,j], i = 1, 2, ..., L, j = 1, 2, ..., L the co-occurrence matrix, and μ(i), i = 1, 2, ..., L the membership vector for a fixed cross-over point of an L-level image X. Determine γ(X), area and perimeter as

γ(X) = (2/MN) Σ_{i=1..L} T(i) h(i)          (30a)
T(i) = min{μ(i), 1 − μ(i)}                   (30b)

a(X) = Σ_{i=1..L} h(i) μ(i)                  (31)

p(X) = Σ_{i=1..L} Σ_{j=1..L} c[i,j] |μ(i) − μ(j)|          (32)

For calculating length and breadth the following steps can be used. Compute the row histogram R[m, l], m = 1, ..., M, l = 1, ..., L, where R[m, l] represents the number of occurrences of the gray level l in the mth row of the image. Find the column histogram C[n, l], n = 1, ..., N, l = 1, ..., L, where C[n, l] represents the number of occurrences of the gray level l in the nth column of the image. Calculate length and breadth as

l(X) = max_n Σ_{l=1..L} C[n, l] μ(l)          (33)
b(X) = max_m Σ_{l=1..L} R[m, l] μ(l)          (34)
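A sketch of this single-scan scheme is given below: the gray level histogram, a 4-adjacency co-occurrence matrix and the row/column histograms are built once, after which equations (30)-(34) are evaluated for any membership vector μ(l). The example image and membership vector are assumptions.

```python
# Minimal sketch of the single-scan computation of equations (30)-(34).
import numpy as np

def scan_image(img, L):
    M, N = img.shape
    h = np.bincount(img.ravel(), minlength=L)                 # gray level histogram
    c = np.zeros((L, L))                                      # co-occurrence (4-adjacency)
    right = np.stack([img[:, :-1].ravel(), img[:, 1:].ravel()])
    down  = np.stack([img[:-1, :].ravel(), img[1:, :].ravel()])
    for a, b in np.concatenate([right, down], axis=1).T:
        c[a, b] += 1
    R = np.stack([np.bincount(row, minlength=L) for row in img])      # row histogram R[m, l]
    C = np.stack([np.bincount(col, minlength=L) for col in img.T])    # column histogram C[n, l]
    return h, c, R, C

def measures(mu, h, c, R, C, M, N):
    T = np.minimum(mu, 1.0 - mu)
    gamma = 2.0 / (M * N) * (T * h).sum()                     # eq (30)
    area = (h * mu).sum()                                     # eq (31)
    perim = (c * np.abs(mu[:, None] - mu[None, :])).sum()     # eq (32)
    length = (C @ mu).max()                                   # eq (33)
    breadth = (R @ mu).max()                                  # eq (34)
    return gamma, area, perim, length, breadth

rng = np.random.default_rng(1)
img = rng.integers(0, 32, size=(16, 16))
h, c, R, C = scan_image(img, 32)
mu = np.clip((np.arange(32) - 10) / 12.0, 0, 1)               # example membership vector
print([round(float(v), 2) for v in measures(mu, h, c, R, C, 16, 16)])
```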

Some Remarks

The grayness ambiguity measures, e.g., γ(X) or H(X), basically sharpen the histogram of X using its global information only, and they detect a single threshold in its valley region. Therefore, if the histogram does not have a valley, the above measures will not be able to select a threshold for partitioning the histogram. This can readily be seen from Equation (30), which shows that the minima of the γ(X) measure will only correspond to those regions of gray level which have minimum occurrences (i.e., valley regions). Comp(X) or IOAC(X), on the other hand, uses local information to determine the fuzziness in the spatial domain of an image. As a result, these are expected to result in better segmentation by detecting thresholds even in the absence of a valley in the histogram.
Again, the comp(X) measure attempts to make a circular approximation of the object region for its extraction, whereas IOAC(X) goes by a rectangular approximation. Their suitability to an image should therefore be guided by this criterion.
Choice of Membership Function
In the aforesaid algorithm, w = 2Δb is the length of the interval which is shifted over the entire dynamic range of the gray scale. As w decreases, the μ(x_mn) plane has more intensified contrast around the cross-over point, resulting in a decrease of ambiguity in X. As a result, the possibility of detecting some undesirable thresholds (spurious minima) increases because of the smaller value of Δb. On the other hand, an increase of w results in a higher value of fuzziness and thus leads towards the possibility of losing some of the weak minima.
The criteria regarding the selection of the membership function and the length of the window (i.e., w) have been reported recently by Murthy and Pal [10], assuming continuous functions for both the histogram and the membership function. For a fuzzy set "bright image plane", the membership function μ: [0, w] → [0, 1] should be such that
i) μ is continuous, μ(0) = 0, μ(w) = 1,
ii) μ is monotonically non-decreasing, and
iii) μ(x) = 1 − μ(w − x) for all x ∈ [0, w], where w > 0 is the length of the window.
Furthermore, μ should satisfy the bound criteria derived from the correlation measure (equation 7). The main properties on which correlation was formulated are

P1: If for higher values of μ1, μ2 takes higher values, and for lower values of μ1, μ2 also takes lower values, then C(μ1, μ2) > 0.
P2: If μ1 ↑ and μ2 ↑ then C(μ1, μ2) > 0.
P3: If μ1 ↑ and μ2 ↓ then C(μ1, μ2) < 0.
[↑ denotes increases and ↓ denotes decreases.]

It is to be mentioned that P2 and P3 should not be considered in isolation of P1. Had this been the case, one could cite several examples where μ1 ↑ and μ2 ↑ but C(μ1, μ2) < 0, and μ1 ↑ and μ2 ↓ but C(μ1, μ2) > 0. Subsequently, the types of membership functions which should not be considered in fuzzy set theory are categorized with the help of correlation. Bound functions h1 and h2 are accordingly derived [11]. They are

h1(x) = 0,          0 ≤ x ≤ ε
      = x − ε,      ε ≤ x ≤ 1          (35)

h2(x) = x + ε,      0 ≤ x ≤ 1 − ε
      = 1,          1 − ε ≤ x ≤ 1       (36)

where ε = 0.25. The bounds for the membership function μ are such that h1(x) ≤ μ(x) ≤ h2(x) for x ∈ [0, 1]. For x belonging to any arbitrary interval, the bound functions are changed proportionately. For h1 ≤ μ ≤ h2, C(h1, h2) ≥ 0, C(h1, μ) ≥ 0 and C(h2, μ) ≥ 0.
A function μ lying in between h1 and h2 does not have most of its variation concentrated (i) in a very small interval, (ii) towards one of the end points of the interval under consideration, or (iii) towards both the end points of the interval under consideration.
Figure 3 shows such bound functions. It is to be noted that Zadeh's standard S-function (equation 29) satisfies these bounds.
It has been shown [10] that for detecting a minimum in the valley region of a histogram, the window length w of the μ function should be less than the distance between the two peaks around that valley region.

Figure 3 Bound functions for μ(x).

H^r as an Objective Criterion

Let us now explain another way of extracting the object, by minimizing the higher order fuzzy entropy (equation 5) of both the object and background regions. Before explaining the algorithm, let us describe the membership function and its selection procedure.
Let s be an assumed threshold which partitions the image X into two parts, namely object and background. Suppose the gray level ranges [1, s] and [s+1, L] denote, respectively, the object and background of the image X. An inverse π-type function, as shown by the solid line in Figure 4, is used here to obtain the μ_mn values of X. The inverse π-type function is seen (from Fig. 4) to be generated by taking the union of S(x; s − (L − s), s, L) and 1 − S(x; 1, s, s + s − 1), where S denotes the standard S-function defined by Zadeh (equation 29).
The resulting function, shown by the solid line, makes μ lie in [0.5, 1]. Since the ambiguity (difficulty) in deciding a level to be a member of the object or of the background is maximum for the boundary level s, it has been assigned a membership value of 0.5 (i.e., the cross-over point). Ambiguity decreases (i.e., the degree of belongingness to either object or background increases) as the gray value moves away from s on either side. The μ_mn thus obtained denotes the degree of belongingness of a pixel x_mn to either object or background.
Since s is not necessarily the mid point of the entire gray scale, the membership function (solid line in Fig. 4) may not be a symmetric one. It is further to be noted that one may use any linear or nonlinear equation (instead of Zadeh's standard S-function) to represent the membership function in Fig. 4. Unlike Algorithm 1, the membership function does not need any parameter selection to control the output.

Algorithm 2
Assume a threshold s, 1 ≤ s ≤ L, and execute the following steps.

Step 1: Apply an inverse π-type function [Fig. 4] to get the fuzzy μ_mn plane, with μ_mn ∈ [0.5, 1]. (The membership function is in general asymmetric.)

Step 2: Compute the rth order fuzzy entropy of the object, H_O^r, and of the background, H_B^r, considering only the spatially adjacent sequences of pixels present within the object and background respectively. Use the 'min' operator to get the membership value of a sequence of pixels.

Step 3: Compute the total rth order fuzzy entropy of the partitioned image as H_T^r = H_O^r + H_B^r.

Step 4: Minimize H_T^r with respect to s to get the threshold for object-background classification.

Figure 4 Inverse π-type function (solid line) for computing object and background entropy.

Referring back to Table 1, we have seen that H^2 reflects the homogeneity among the supports in a set better than H^1 does. The higher the value of r, the stronger is the validity of this fact. Thus, considering the problem of object-background classification, H^r seems to be more sensitive (as r increases) to the selection of the appropriate threshold; i.e., an improper selection of the threshold is more strongly reflected by H^r than by H^(r−1). For example, the thresholds obtained by the H^2 measure have more validity than those obtained by H^1 (which only takes into account the histogram information). Similar arguments hold good for even higher order (r > 2) entropy.
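The sketch below is one possible reading of Algorithm 2 for r = 2, using horizontally adjacent pixel pairs as the "spatially adjacent sequences" and the normalized exponential entropy term from equation (5). The inverse π-type membership, the synthetic image and the restriction to horizontal pairs are assumptions of this sketch.

```python
# Minimal sketch of Algorithm 2 with r = 2 (horizontally adjacent pixel pairs).
import numpy as np

def s_function(x, a, b, c):
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    left, right = (x > a) & (x <= b), (x > b) & (x < c)
    y[left] = 2 * ((x[left] - a) / (c - a)) ** 2
    y[right] = 1 - 2 * ((x[right] - c) / (c - a)) ** 2
    y[x >= c] = 1.0
    return y

def exp_gain(mu):
    """Normalized exponential entropy term of equation (5), in [0, 1]."""
    return (mu * np.exp(1 - mu) + (1 - mu) * np.exp(mu) - 1) / (np.sqrt(np.e) - 1)

def h2_of_region(mu, mask):
    """Second-order entropy over horizontally adjacent pairs lying inside mask."""
    pair_mask = mask[:, :-1] & mask[:, 1:]
    if not pair_mask.any():
        return 0.0
    pair_mu = np.minimum(mu[:, :-1], mu[:, 1:])[pair_mask]   # 'min' gives pair membership
    return float(exp_gain(pair_mu).mean())

L = 32
rng = np.random.default_rng(2)
img = np.clip(np.concatenate([rng.normal(8, 2, 600), rng.normal(22, 2, 400)]),
              0, L - 1).astype(int).reshape(40, 25)

best = None
for s in range(2, L - 1):
    # Inverse pi-type membership: 0.5 at s, rising towards 1 away from s.
    mu = np.where(img <= s,
                  1 - s_function(img, 1, s, 2 * s - 1),
                  s_function(img, 2 * s - L, s, L))
    obj, bkg = img <= s, img > s
    total = h2_of_region(mu, obj) + h2_of_region(mu, bkg)     # Step 3
    if best is None or total < best[0]:
        best = (total, s)
print("threshold minimizing H2_object + H2_background:", best[1])
```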

Example 2

Figures 5 and 6 show the images of Lincoln and of a blurred chromosome along with their histograms. Table 3 shows the thresholds obtained by the comp(X) and IOAC(X) measures for various window sizes w when Zadeh's S-function is used as the membership function. The Lincoln image is 64×64 with 32 gray levels, whereas the chromosome image is 64×64 with 64 gray levels.

Figure 5(a) Input. Figure 5(b) Histogram.
Figure 6(a) Input. Figure 6(b) Histogram.

Figure 7(a) Threshold = 10.

Figure 7(b) Threshold = 32. Figure 7(c) Threshold = 56.

Table 3  Various thresholds (* denotes global minimum)

      Lincoln                       Chromosome
w     Comp    IOAC            w     Comp      IOAC
8     10      11*, 23         12    33, 56*   30*, 51
10    10      11*, 23         16    55        31*, 49
12    10      11*, 23         20    54        32*, 46
16    9       11              24    52        34

The threshold produced by the H^2 measure (Algorithm 2) is 8 for the Lincoln image. Some typical nonfuzzy thresholded outputs of these images are shown in Figure 7. Recently, transitional correlation and within-class correlation have been defined [12], based on equation (7), for image segmentation; these take both local and global information into account.

Image Enhancement
The objective of an enhancement technique is to process a given image so that the result is more suitable than the original for a specific application. The term 'specific' is, of course, problem oriented. The techniques used here are based on the modification of pixels in the fuzzy property domain of an image. Three kinds of enhancement operations, namely contrast enhancement, smoothing and edge detection, will be discussed here.

Enhancement in the Property Domain [13-16]

The contrast intensification operator on a fuzzy set A generates another fuzzy set A' = INT(A) in which the fuzziness is reduced by increasing the values of μ_A(x) which are above 0.5 and decreasing those which are below it. Define this INT operator by a transformation T1 of the membership function μ_mn (or p_mn) as

T1(p_mn) = T1'(p_mn)  = 2 p_mn²,             0 ≤ p_mn ≤ 0.5          (37a)
         = T1''(p_mn) = 1 − 2(1 − p_mn)²,    0.5 ≤ p_mn ≤ 1          (37b)
m = 1, 2, ..., M; n = 1, 2, ..., N

In general, each p_mn (or μ_mn) in X (Eq. 1) may be modified to p'_mn to enhance the image X in the property domain by a transformation function T_r, where

p'_mn = T_r(p_mn) = T_r'(p_mn),   0 ≤ p_mn ≤ 0.5          (38a)
                  = T_r''(p_mn),  0.5 ≤ p_mn ≤ 1          (38b)
r = 1, 2, ...

The transformation function T_r is defined as successive applications of T1 by the recursive relationship

T_s(p_mn) = T1{T_(s−1)(p_mn)},  s = 1, 2, ...          (39)

where T_1(p_mn) represents the operator INT defined in (37).
This is shown graphically in Figure 8. As r increases, the curve becomes steeper because of the successive applications of INT. In the limiting case, as r → ∞, T_r produces a two-level (binary) image. It is to be noted here that, corresponding to a particular operation of T', one can use any number of operations of T'', and vice versa, to attain a desired amount of enhancement. It is up to the user to interpret and exploit this flexibility depending on the problem at hand. It is further to be noted from equation 8(c) that H(X) or γ(X) of an image decreases with its contrast enhancement [16].
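A minimal sketch of the INT operator and its repeated application is given below; the 3×3 membership plane is the same illustrative array used elsewhere in this chapter.

```python
# Minimal sketch of the INT contrast-intensification operator of equations
# (37)-(39), applied r times to a membership plane.
import numpy as np

def INT(p):
    """One application of equation (37)."""
    return np.where(p <= 0.5, 2.0 * p ** 2, 1.0 - 2.0 * (1.0 - p) ** 2)

def enhance(p, r):
    """r successive applications of INT (equation 39)."""
    for _ in range(r):
        p = INT(p)
    return p

mu = np.array([[0.2, 0.4, 0.3],
               [0.2, 0.7, 0.6],
               [0.6, 0.5, 0.6]])
for r in (1, 2, 5):
    print(r, np.round(enhance(mu, r), 3))
# As r grows the plane tends towards a two-level (0/1) pattern, and its index of
# fuzziness/entropy decreases, consistent with property 8(c).
```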
The membership plane μ_mn for enhancing contrast around a cross-over point may be obtained from [13]

μ_mn = G(x_mn) = [1 + |x̄ − x_mn| / F_d]^(−F_e)          (40)

where the position of the cross-over point, the bandwidth, and hence the symmetry of the curve are determined by the fuzzifiers F_e and F_d. When x̄ = x_max (the maximum level in X), μ_mn represents an S-type function. When x̄ is any arbitrary level l, μ_mn represents a π-type function. Zadeh's standard functions do not have a provision for controlling the cross-over point. The parameters F_e and F_d of equation (40) are determined from the cross-over point across which contrast enhancement is desired.
After enhancement in the fuzzy property domain, the enhanced spatial domain x'_mn may be obtained from

x'_mn = G^(−1)(μ'_mn),   α ≤ μ'_mn ≤ 1          (41)

where α is the value of μ_mn when x_mn = 0.

Figure 8 INT transformation function for contrast enhancement in the property plane.

Smoothing Algorithm

The idea of smoothing is based on the property that image points which are spatially close to each other tend to possess nearly equal grey levels. Smoothing of an image X may be obtained by q successive applications of the 'min' and then the 'max' operator within a neighborhood, such that the smoothed grey level value of the (m, n)th pixel is [13,14]

x'_mn = max^q_{Q1} min^q_{Q1} {x_ij},   (i,j) ≠ (m,n), (i,j) ∈ Q1, q = 1, 2, ...          (42)

Smoothing blurs the image by attenuating the high spatial frequency components associated with edges and other abrupt changes in grey level. The larger the neighborhood Q1 and the value of q, the greater is the degree of blurring.
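A minimal sketch of this min-then-max smoothing over a 3×3 neighborhood is given below, assuming SciPy is available; for simplicity the center pixel is included in the neighborhood, which the text excludes.

```python
# Minimal sketch of the min-max smoothing of equation (42): q erosions (local min)
# followed by q dilations (local max) over a 3x3 neighborhood Q1.
import numpy as np
from scipy.ndimage import minimum_filter, maximum_filter

def smooth(img, q=1, size=3):
    out = img.astype(float)
    for _ in range(q):                       # q successive 'min' passes
        out = minimum_filter(out, size=size)
    for _ in range(q):                       # followed by q 'max' passes
        out = maximum_filter(out, size=size)
    return out

rng = np.random.default_rng(3)
img = rng.integers(0, 32, size=(8, 8))
print(smooth(img, q=1).astype(int))
```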

Edge Detection

If x'_mn denotes the edge intensity corresponding to a pixel x_mn, then the edges of the image are defined as [13,15]

Edges = ∪_m ∪_n x'_mn          (43a)

where

x'_mn = |x_mn − min_Q {x_ij}|          (43b)
or
x'_mn = |x_mn − max_Q {x_ij}|          (43c)
or
x'_mn = max_Q {x_ij} − min_Q {x_ij},   (i,j) ∈ Q          (43d)

Q is a set of N coordinates (i, j) which are on/within a circle of radius R centered at the point (m,n). Equation (43c), as compared with (43b), causes the boundary to be expanded by one pixel. Equation (43d), on the other hand, results in a boundary of two-pixel width. It therefore appears from Eq. (43) that the better the contrast enhancement between the regions, the easier is the detection and the higher is the intensity of the contours x'_mn between them.
Other operations based on max and min operators are available in [17,18]. Automatic selection of an appropriate enhancement operator based on fuzzy geometry is available in [19].
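The three operators of equation (43) can be evaluated with local minimum/maximum filters, as in the sketch below; a square 3×3 neighborhood is used instead of a circular Q, and the test image is an assumption.

```python
# Minimal sketch of the edge operators of equation (43) over a 3x3 neighborhood.
import numpy as np
from scipy.ndimage import minimum_filter, maximum_filter

def edges(img, size=3):
    img = img.astype(float)
    local_min = minimum_filter(img, size=size)
    local_max = maximum_filter(img, size=size)
    e_b = np.abs(img - local_min)        # equation (43b)
    e_c = np.abs(img - local_max)        # equation (43c): boundary expanded by one pixel
    e_d = local_max - local_min          # equation (43d): two-pixel-wide boundary
    return e_b, e_c, e_d

img = np.zeros((7, 7), dtype=int)
img[2:5, 2:5] = 20                        # a bright 3x3 square on a dark background
e_b, e_c, e_d = edges(img)
print(e_d)
```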

Edginess Measure

Let us now describe an edginess measure [20,21] based on H^1 (Equation 5) which denotes the amount of difficulty in deciding whether a pixel can be called an edge or not. Let N³_{x,y} be a 3 × 3 neighborhood of a pixel at (x, y) such that

N³_{x,y} = {(x, y), (x−1, y), (x+1, y), (x, y−1), (x, y+1), (x−1, y−1), (x−1, y+1), (x+1, y−1), (x+1, y+1)}          (44)

The edge-entropy H^E_{x,y} of the pixel (x, y), giving a measure of edginess at (x, y), may be computed as follows. For every pixel (x, y), compute the average, maximum and minimum values of the gray levels over N³_{x,y}. Let us denote the average, maximum and minimum values by Avg, Max and Min respectively. Now define the following parameters:

D = max{Max − Avg, Avg − Min}          (45)
B = Avg                                 (46)
A = B − D                               (47)
C = B + D                               (48)

A π-type membership function is then used to compute μ_xy for all (x, y) ∈ N³_{x,y}, such that μ(A) = μ(C) = 0.5 and μ(B) = 1. It is to be noted that μ_xy ≥ 0.5. Such a μ_xy, therefore, gives the degree to which a gray level is close to the average value computed over N³_{x,y}. In other words, it represents a fuzzy set "pixel intensity close to its average value", evaluated over N³_{x,y}. When all pixel values over N³_{x,y} are either equal or close to each other (i.e., they are within the same region), such a transformation will make all μ_xy equal to 1 or close to 1, thus resulting in a low value of H^1. On the other hand, if there is an edge (dissimilarity in gray values over N³_{x,y}), then the μ values will be further away from unity, thus resulting in a high value of H^1. Therefore, the entropy H^1 over N³_{x,y} can be viewed as a measure of edginess, H^E_{x,y}, at the point (x, y). The higher the value of H^E_{x,y}, the stronger is the edge intensity and the easier is its detection. As mentioned before, there are several ways in which one can define a π-type function, as shown in Fig. 9.

Figure 9 π-type function for computing edge entropy.

The proposed entropic measure is less sensitive to noise because of the use of a dynamic membership function based on a local neighborhood. The method is also not sensitive to the direction of edges. Figure 10 shows the edge output for the biplane and Lincoln images. Other edginess measures are available in [13,22].

Figure 10: Edge output of Lincoln and Biplane (input and output images).



Fuzzy Skeletonization
The problem of skeletonization or thinning plays a key role in image analysis
and recognition because of the simplicity of object representation it allows. Let us
now explain a skeletonization technique [23] based on minimization of compactness
property over the fuzzy core line plane. The output is fuzzy and one may obtain its
nonfuzzy (crisp) single pixel width version by retaining only those pixels which have
strong skeleton-membership value compared to their neighbors.

Coreline Membership Plane

After obtaining a fuzzy segmented version (as described before) of the input image X, the membership function of a pixel denoting the degree of its belonging to the subset 'Core line' (skeleton) is determined by three factors. These include the properties of possessing maximum intensity, and of occupying vertically and horizontally middle positions from the edges (pixels beyond which the membership value in the fuzzy segmented image is zero) of the object.

Let x_max be the maximum pixel intensity in the image and P_o(x_mn) be the function which assigns the degree of possessing maximum brightness to the (m, n)th pixel. Then the simplest way to define P_o(x_mn) is

P_o(x_mn) = x_mn / x_max.    (49)

It is to be mentioned here that one may use other monotonically nondecreasing functions to define P_o(x), with the flexibility of varying the cross-over point. Equation (49) is the simplest one, with a fixed cross-over point at x_max/2.
Let x_1 and x_2 be the distances of x_mn from the left and right edges respectively (the distance being measured by the number of units separating the pixel under consideration from the first background pixel along that direction). Then P_h(x_mn), denoting the degree of occupying the horizontally central position in the object, is defined as

P_h(x_mn) = 2x_2 / (x_1(x_1 + x_2))  if d(x_1, x_2) > 1 and x_1 > x_2    (50)

where d(x_1, x_2) = |x_1 - x_2|.



Similarly, the vertical function is defined as

P_v(x_mn) = y_1/y_2
          = y_2/y_1
          = 2y_2 / (y_1(y_1 + y_2))  if d(y_1, y_2) > 1 and y_1 > y_2.    (51)
Equations (50, 51) assign high values (close to 1.0) to pixels near the core and low values to pixels away from the core. The factor (x_1 + x_2) or (y_1 + y_2) in the denominator takes into consideration the extent of the object segment so that there is an appreciable amount of change in the property value for the pixels not belonging to the core.
These primary membership functions P_o, P_h and P_v may be combined as either

μ_c(x_mn) = max{min(P_o, P_h), min(P_o, P_v), min(P_h, P_v)}    (52)

or

μ_c(x_mn) = w_1 P_o + w_2 P_h + w_3 P_v    (53a)
with w_1 + w_2 + w_3 = 1    (53b)

to define the grade of belonging of x_mn to the subset 'Core Line' of the image.

Equation (52) involves connective properties using max and min operators, such that μ_c = 1 when at least two of the three primary properties take values of unity. All three primary membership values are given equal weight in computing the μ_c value. Equation (53), on the other hand, involves a weighted sum (the weights being denoted by w_1, w_2 and w_3). Usually, one can consider the weight w_1 attributed to P_o (the property corresponding to pixel intensity) to be higher than the other two, with w_2 = w_3.

Equation (52) or (53) therefore extracts (using both gray level and spatial information) the subset 'Core line' such that the membership value decreases as one moves away towards the edges (boundary) of object regions.
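A rough sketch of Eqs. (49)-(53); the run-length scanning, the simplified treatment of the piecewise cases of Eqs. (50)-(51), and the default weights are assumptions made for illustration.

```python
import numpy as np

def coreline_membership(seg, w=(0.5, 0.25, 0.25)):
    """Core-line membership plane from a fuzzy-segmented image `seg`.

    Background pixels are assumed to carry membership 0; the weighted-sum
    combination (53) is used with illustrative weights w (w1 > w2 = w3).
    """
    rows, cols = seg.shape
    xmax = seg.max() or 1.0
    mu_c = np.zeros((rows, cols))

    def centrality(d1, d2):
        # ~1 near the middle of the run, low near its ends (cf. Eqs. (50)-(51)).
        if abs(d1 - d2) <= 1:
            return 1.0
        a, b = (d1, d2) if d1 > d2 else (d2, d1)
        return 2.0 * b / (a * (a + b))

    for m in range(rows):
        for n in range(cols):
            if seg[m, n] <= 0:
                continue
            po = seg[m, n] / xmax                              # Eq. (49)
            # distance (in pixels) to the first background pixel in each direction
            x1 = next((k for k in range(1, n + 2) if n - k < 0 or seg[m, n - k] <= 0), n + 1)
            x2 = next((k for k in range(1, cols - n + 1) if n + k >= cols or seg[m, n + k] <= 0), cols - n)
            y1 = next((k for k in range(1, m + 2) if m - k < 0 or seg[m - k, n] <= 0), m + 1)
            y2 = next((k for k in range(1, rows - m + 1) if m + k >= rows or seg[m + k, n] <= 0), rows - m)
            ph, pv = centrality(x1, x2), centrality(y1, y2)
            mu_c[m, n] = w[0] * po + w[1] * ph + w[2] * pv     # Eq. (53)
    return mu_c
```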

Optimum α-Cut

Given the μ_c(x_mn) plane developed in the previous stage, with the pixels having been assigned values indicating their degree of membership to 'Core line', the optimum (in the sense of minimizing ambiguity in geometry or in the spatial domain) skeleton can be extracted from one of its α-cuts having minimum comp(μ) value (Eq. (16)). The α-cut of μ_c(x_mn) is defined as

μ_cα = {x_mn : μ_c(x_mn) ≥ α}.    (54)
Modification of α will therefore result in different fuzzy skeleton planes with varying comp(μ) value. As α increases, the comp(μ) value initially decreases to a certain minimum and then, for a further increase in α, the comp(μ) measure increases.

The initial decrease in comp(μ) value can be explained by observing that for every value of α, the border pixels having μ-values less than α are not taken into consideration. So, both area (Eq. (13)) and perimeter (Eq. (14)) are less than those for the previous value of α. But the decrease in area is more than the decrease in perimeter and hence the compactness (Eq. (16)) decreases (initially) to a certain minimum corresponding to a value α = α', say.

Further increase in α (i.e., for α > α') results in a μ_cα plane consisting of a number of disconnected regions (because the majority of core line pixels are dropped out). As a result, the decrease in perimeter here is more than the decrease in area and comp(μ) increases. The μ_cα' plane having minimum compactness value can be taken as an optimum fuzzy skeleton version of the image X. This is optimum in the sense that for any other selection of α (i.e., α ≠ α') the comp(μ) value would be greater.

If a nonfuzzy (crisp) single-pixel width skeleton is desired, it can be obtained by a contour tracking algorithm [24] which takes into account the direction of contour, multiple crossing pixels, lost path due to spurious wiggles etc., based on octal chain code.

Fig. 11 shows the optimum fuzzy skeleton of the biplane image (Fig. 10). This corresponds to α = 0.55. The connectivity of the skeleton in the optimum version can be preserved, if necessary, by inserting pixels having intensity equal to the minimum of those of pairs of neighbors in the object.

Figure 11: Optimum fuzzy skeleton of biplane.
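The α selection can be sketched as follows; the fuzzy area, perimeter and compactness used here follow the usual Rosenfeld-style definitions (Eqs. (13), (14) and (16) appear earlier in the chapter and are not reproduced in this section), so treat them as stand-ins.

```python
import numpy as np

def fuzzy_area(mu):
    return mu.sum()                                           # stand-in for Eq. (13)

def fuzzy_perimeter(mu):
    # sum of absolute membership differences of 4-adjacent pixels (Eq. (14) stand-in)
    return np.abs(np.diff(mu, axis=0)).sum() + np.abs(np.diff(mu, axis=1)).sum()

def fuzzy_compactness(mu):
    p = fuzzy_perimeter(mu)
    return fuzzy_area(mu) / (p * p) if p > 0 else np.inf      # stand-in for Eq. (16)

def optimum_alpha_cut(mu_c, alphas=np.arange(0.05, 1.0, 0.05)):
    """Pick the alpha whose cut plane (Eq. (54)) minimises comp(mu)."""
    best_alpha, best_comp = None, np.inf
    for a in alphas:
        cut = np.where(mu_c >= a, mu_c, 0.0)   # keep memberships at or above alpha
        c = fuzzy_compactness(cut)
        if c < best_comp:
            best_alpha, best_comp = a, c
    return best_alpha, best_comp
```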

PRIMITIVE EXTRACTION
In picture recognition and scene analysis problems, the structural information is
very abundant and important, and the recognition process includes not only the
capability to assign the input pattern to a pattern class, but also the capacity to
describe the characteristics of the pattern that make it ineligible for assignment to
another class. In these cases the recognition requirement can only be satisfied by a
description of the pattern rather than by classification.
In such cases complex patterns are described as hierarchical or tree-like structures
of simpler subpatterns and each simpler subpattern is again described in terms of even
simpler subpatterns and so on. Evidently, for this approach to be advantageous, the
simplest subpatterns, called pattern primitives are to be selected.
Another activity which needs attention in this connection is the subject of shape-
analysis that has become an important subject in its own right. Shape analysis is of
primal importance in feature/primitive selection and extraction problems. Shape
analysis also has two approaches, namely, description of shape in terms of scalar
measurements and through structural descriptions. In this connection, it needs to be
mentioned that shape description algorithms should be information-preserving in the
sense that it is possible to reconstruct the shapes with some reasonable
approximation from the descriptors.
This section presents a method [24] to demonstrate an application of the theory
of fuzzy sets in automatic description and primitive extraction of gray-tone edge-
detected images. The ultimate aim is to recognize the pattern using syntactic
approach as described in the next section.
The method described here provides a natural way of viewing the primitives in
terms of arcs with varying grades of membership from 0 to 1.

Encoding

The gray tone contour of an image can be encoded into one-dimensional symbol


strings using the rectangular (octal) array method. The directions of the octal codes
are shown in Figure 12. An octal code is used to describe a w-pixel (w > 1) length contour by taking the maximum of its grades of membership corresponding to 'vertical', 'horizontal' and 'oblique' lines. This approximation of using w-pixel
(instead of one-pixel) length line saves computational time and storage requirement
without affecting the system performance.
μ_V(x), μ_H(x) and μ_ob(x), representing the membership functions for vertical, horizontal and oblique lines respectively of a line segment x making an angle θ with the horizontal line H (Figure 13), may be defined as [13,24]

μ_V(x) = 1 - |1/m_x|^{F_e},  |m_x| > 1;  = 0 otherwise    (55)

μ_H(x) = 1 - |m_x|^{F_e},  |m_x| < 1;  = 0 otherwise    (56)

Figure 12: The directions of octal codes.  Figure 13: Membership function for vertical and horizontal lines.

μ_ob(x) = 1 - |(θ - 45)/45|^{F_e},  0 < |m_x| < ∞;  = 0 otherwise    (57)

F_e is a positive constant which controls the fuzziness in a set and m_x = tan θ. The equations (55-57) are such that

μ_V(x) → 1 as |θ| → 90°,
μ_H(x) → 1 as |θ| → 0°,
μ_ob(x) → 1 as |θ| → 45°,
and μ_V(x), μ_H(x) → 0 as |θ| → 45°.


The details of encoding technique are available in [13].
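A small sketch of Eqs. (55)-(57) as reconstructed above; the function name and the default value of F_e are illustrative.

```python
import math

def line_memberships(theta_deg, Fe=0.25):
    """Membership of a segment making angle theta (degrees) with the
    horizontal in 'vertical', 'horizontal' and 'oblique' (Eqs. (55)-(57))."""
    m = math.tan(math.radians(theta_deg))                      # m_x = tan(theta)
    mu_v = 1.0 - abs(1.0 / m) ** Fe if abs(m) > 1 else 0.0     # Eq. (55)
    mu_h = 1.0 - abs(m) ** Fe if abs(m) < 1 else 0.0           # Eq. (56)
    mu_ob = (max(0.0, 1.0 - abs((theta_deg - 45.0) / 45.0) ** Fe)
             if m != 0 else 0.0)                               # Eq. (57)
    return mu_v, mu_h, mu_ob

# line_memberships(85) -> mostly 'vertical';  line_memberships(45) -> oblique = 1
```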

Segmentation and Contour Description

The next task before extraction of primitives and description of contours is the
process of segmentation of the octal coded strings. Splitting up of a chain is
dependent on the constant increase/decrease in code values. For extracting an arc, the
string is segmented at a position whenever a decrease/increase after constant
increase/decrease in values of codes is found [13]. Again, if the number of codes
between two successive changes exceeds a prespecified limit, a straight line is said to
exist between two curves. In the case of a closed curve, a provision may be kept for
increasing the length of the chain by adding first two starting codes to the tail of the
string. This enables one to take the continuity of the chain into account in order to
reflect its proper segmentation [13].
After segmentation one needs to provide a measure of curvature along with
direction of the different arcs and also to measure the length of lines in order to
extract the primitives. The degree of 'arcness' of a line segment x is obtained using
the function

Figure 14: Membership function for arc.  Figure 15: Nuclear pattern of brain cell.

μ_arc(x) = (1 - a/l)^{F_e}    (58)

where a is the length of the line joining the two extreme points of an arc x (Figure 14) and l is the arc-length, such that the lower the ratio a/l, the higher is the degree of 'arcness'.
For example, consider a sequence of codes 5 6 6 7 denoting an arc x. For computing its l, note that if a code represents an oblique line, the corresponding increase in arc-length would be √2; otherwise the increase is by unity. The arc diameter a is computed by measuring the resulting shifts Δm and Δn of spatial coordinates (along the mth and nth axes) due to the codes in question. For the aforesaid example we have

Δm = 1 + 0 + 0 + (-1) = 0,
Δn = -1 - 1 - 1 - 1 = -4,
a = √(Δm² + Δn²) = 4,
l = 4.828,
μ_arc(x) = 0.643 (for F_e = 0.25).
Since the initial code (5) is lower than the final code (7), the sense of the curve is
positive (clockwise).
Similarly, for the sequences 5 6 and 5 6 7 the μ_arc values are respectively 0.52 and 0.682. The figures thus obtained for the different sequences agree well with our intuition as far as their degree of arcness (curvature) is concerned. Also note that sequences like 5 5 6 6 7 7 and 5 5 5 6 6 6 7 7 7 have the same μ_arc values as obtained with the sequence 5 6 7. Similarly, the sequences 5 5 6 6 and 5 6 have the same μ_arc value.
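The arcness computation can be sketched as follows; the step table for octal codes other than 5, 6 and 7 is an assumption chosen to be consistent with Fig. 12 and the worked example, and Eq. (58) is used in the form reconstructed above.

```python
import math

# (dm, dn) steps for the octal codes of Fig. 12; the assignment for codes
# other than 5, 6, 7 is an assumption consistent with the worked example
# (5 -> (1, -1), 6 -> (0, -1), 7 -> (-1, -1)).
STEPS = {1: (-1, 1), 2: (0, 1), 3: (1, 1), 4: (1, 0),
         5: (1, -1), 6: (0, -1), 7: (-1, -1), 8: (-1, 0)}

def arcness(codes, Fe=0.25):
    """mu_arc of a segmented chain of octal codes, Eq. (58)."""
    length = sum(math.sqrt(2) if c % 2 else 1.0 for c in codes)   # arc length l
    dm = sum(STEPS[c][0] for c in codes)
    dn = sum(STEPS[c][1] for c in codes)
    a = math.hypot(dm, dn)                                        # arc diameter a
    sense = 'clockwise' if codes[-1] > codes[0] else 'anticlockwise'
    return (1.0 - a / length) ** Fe, sense

# arcness([5, 6, 6, 7]) -> (~0.643, 'clockwise'), matching the worked example.
```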

Example 3

To explain the aforesaid features, let us consider the Fig. 15 showing a two-tone
contour of nuclear pattern of brain neurosecretory cells [25]. The string descriptions
of Fig. 15 in terms of arcs (of different arcness) and lines are shown below.

11, 187, 7881, 112233,321, 11, 112,223,334,44,445,543,3345,55,56,667,


7665,55,5667,78,776,66,678,11
S 64 LV.64 V. 52Y49 LV.68 L
LY68 Y.64 Y.68Y64 LV.49 V.Sl V. 49 LY.51 Y68 V.49 LV.52 V. S
Here, L, V, and V̄ denote the straight line, 'clockwise arc' and 'anticlockwise arc'
respectively. The suffix of V represents the degree of arcness of the arc V. The
positions of segmentation are shown by a comma (,).
It is to be mentioned here that the approach adopted here to define and to extract
arcs with varying grades of membership is not the only way of doing this. One may
change the procedure so as to result in segments with membership values different
from those mentioned here.

FUZZY SYNTACTIC ANALYSIS


The syntactic approach to pattern recognition involves the representation of a
pattern by a string of concatenated subpatterns called primitives. These primitives
are considered to be the terminal alphabets of a formal grammar whose language is
the set of patterns belonging to the same class. Recognition therefore involves a
parsing of the string.
The syntactic approach has incorporated the concept of fuzzy sets at two levels.
First, the pattern primitives are themselves considered to be labels of fuzzy sets, i.e.,
such subpatterns as 'almost circular arcs', 'gentle', 'fair' and 'sharp' curves are
considered. Secondly, the structural relations among the subpatterns may be fuzzy,
so that the formal grammar is fuzzified by the weighted production rules and the grade
of membership of a string is obtained by min-max composition of the grades of the
production used in the derivations. Inference of a fuzzy grammar is another
interesting problem which infers from the specified fuzzy language, the productions
as well as the weights of these rules. In this section we will be explaining the
elementary notions of fuzzy grammar with examples.
The formal definition of a fuzzy grammar is as follows:

Definition: A fuzzy grammar FG is a 6-tuple

FG = (V_N, V_T, P, S, J, μ)

where

V_N : a set of non-terminals (which are essentially labels for certain fuzzy subsets, called fuzzy syntactic categories, of V_T*)

V_T : a set of terminals, such that V_N ∩ V_T = ∅ (null set)

P : a set of production (or rewriting) rules of the type α → β (α is replaced by β), where α, β ∈ (V_N ∪ V_T)*

S : a starting symbol, such that S ∈ V_N

J : {r_i | i = 1, 2, ..., n; n = cardinality of P}, the set of labels for the production rules

μ : a mapping μ: J → [0, 1] such that μ(r_i) denotes the membership in P of the rule labelled r_i

V_T* : the set of finite strings obtained by the concatenation of elements of V_T

(V_N ∪ V_T)* : the set of finite strings obtained by the concatenation of elements of V_N ∪ V_T
A fuzzy grammar FG generates a fuzzy language L(FG) as follows:
A string X ∈ V_T* is said to be in L(FG) iff it is derivable from S and its grade of membership μ_{L(FG)}(X) in L(FG) is > 0, where

μ_{L(FG)}(X) = max_{1≤k≤m} [ min_{1≤i≤l_k} μ(r_{ik}) ]    (59)

where m is the number of derivations that X has in FG; l_k is the length of the kth derivation chain, k = 1(1)m; and r_{ik} is the label of the ith production used in the kth derivation chain, i = 1, 2, ..., l_k.
Clearly, if a production α → β is visualized as a chain link of strength μ(r), r being the label of α → β, then the strength of a derivation chain is the strength of its weakest link, and hence

μ_{L(FG)}(X) = strength of the strongest derivation chain from S to X

for all X ∈ V_T*.

Example 4: Suppose FG_1 = ({A, B, S}, {a, b}, P, S, {1, 2, 3, 4}, μ) where J, P and μ are as follows

1: S → AB  with μ(1) = 0.8
2: S → aSb  μ(2) = 0.2
3: A → a  μ(3) = 1
4: B → b  μ(4) = 1

Clearly, the fuzzy language generated is FL_1 = {X | X = a^n b^n, n = 1, 2, ...}

with μ_{FL_1}(a^n b^n) = 0.8 if n = 1
                       = 0.2 if n > 1
Example 5: Consider the fuzzy grammar FG_2 = ({S, A, B}, {a, b, c}, P, S, J, μ) where J, P and μ are as follows

r_1: S → aA  μ(r_1) = μ_H(a)
r_2: A → bB  μ(r_2) = μ_V(b)
r_3: B → c  μ(r_3) = μ_ob(c)

the primitives a, b, c being 'horizontal', 'vertical' and 'oblique' directed line segments respectively; the terms 'horizontal', 'vertical' and 'oblique' are taken to be fuzzy with membership values μ_H, μ_V and μ_ob respectively, as defined in the previous section.

Further, the concatenation considered is of the 'head-tail' type. Hence the only string generated is X = abc, which is in reality a triangle having membership

μ_{L(FG_2)}(abc) = min(μ_H(a), μ_V(b), μ_ob(c))

which attains its maximum value 1 when abc is an isosceles right triangle. Thus L(FG_2) is the fuzzy set of isosceles right triangles.

The membership of the pattern triangle given in Fig. 16b is
min(1.0, 1.0, 0.66) = 0.66

Figure 16: (a) Primitives (b) Production of Triangle and Letter B.
Example 6: Consider the following fuzzy grammar for generating the fuzzy set
representing the English upper case letter B
V_N = {S, A, B, C, D}
V_T = {a, b}
where the primitive a denotes a directed 'vertical' (fuzzy) line segment and b denotes a directed arc (clockwise). The concatenation considered here is again of the 'head-tail' type.

Also J, P and μ are as follows

r_1: S → aB  μ(r_1) = μ_V(a)
r_2: B → aC  μ(r_2) = μ_V(a)
r_3: C → bD  μ(r_3) = μ_cir(b)
r_4: D → b  μ(r_4) = μ_cir(b)

The string generated is X = aabb, having the following membership in the set B:

μ_B(X) = min(μ_V(a)_l, μ_V(a)_u, μ_cir(b)_l, μ_cir(b)_u)

where the suffixes l and u denote the locations ('lower' and 'upper') of the primitives a and b.

For the pattern given in Fig. 16b
μ_V(a)_u = μ_V(a)_l = 0.83, μ_cir(b)_u = 0.36, μ_cir(b)_l = 0.5
so that μ_B(X) = 0.36
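A minimal sketch of Eq. (59) applied to Example 6; the derivation chain is written out by hand rather than produced by a parser.

```python
def string_membership(derivations):
    """Grade of membership of a string in L(FG), Eq. (59): the max over its
    derivation chains of the min of the production grades used in each chain."""
    return max(min(chain) for chain in derivations)

# Example 6 (the letter B): one derivation chain S->aB, B->aC, C->bD, D->b
# with grades mu_V(a)_l, mu_V(a)_u, mu_cir(b)_l, mu_cir(b)_u from Fig. 16b.
print(string_membership([[0.83, 0.83, 0.5, 0.36]]))   # 0.36, as in the text
```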


DePalma and Yau [26] introduced the concept of fractionally fuzzy grammars
with a view to tackling some of the drawbacks of fuzzy grammars which make them
unsuitable for use in pattern recognition problems. Some of these drawbacks are as
follows:
(i) Memory requirements are greatly increased when fuzzy grammars are
implemented with the help of parsing algorithms that require backtracking. This is
because when fuzzy grammars are being used, it is not sufficient to keep track of the
current derivation tree alone. The fuzzy value at each preceding step must also be
simultaneously remembered at each node, in case back-tracking is needed at some
step.
(ii) All strings in the language L(FG) generated by a fuzzy grammar FG can be
classified into a finite number of subsets by their membership in the language. The
number of such subsets is strictly limited by the number of productions in FG.

Definition: A fractionally fuzzy grammar (FFG) is a 7-tuple


FFG = (V_N, V_T, P, S, J, g, h)

where V_N, V_T, P, S, J are as before, and g and h are mappings from J into the set of non-negative integers such that g(r) ≤ h(r) for all r ∈ J. Various applications of fuzzy and fractionally fuzzy grammars are available in [13,25,27].
The incorporation of the element of fuzziness in defining 'sharp', 'fair' and
'gentle' curves in the grammars enables one to work with a much smaller number of
primitives. By introducing fuzziness in the physical relations among the primitives,
it was also possible to use the same set of production rules and non-terminals at each
stage. This is expected to reduce, to some extent, the time required for parsing in the
sense that parsing needs to be done only once at each stage, unlike the case of the
non-fuzzy approach [28], where each string has to be parsed more than once, in
general, at each stage. However, this merit has to be balanced against the fact that
the fuzzy grammars are not as simple as the corresponding nonfuzzy grammars.

Acknowledgements

This work was done while the author held an NRC-NASA research
Associateship at the Johnson Space Center, Houston, Texas. The author gratefully
acknowledges Dr. Robert N. Lea for his interest in this work, Ms. Dianne Rader,
Ms. Kim Herhold, and Mr. Todd Carlson for typing the manuscript and Mr. Albert
Leigh for his assistance in getting some results.

References

1. A. Kaufmann, Introduction to the Theory of Fuzzy Subsets - Fundamental


Theoretical Elements, vol 1, Academic Press, NY, 1975.
2. A. De Luca and S. Termini, A definition of a nonprobabilistic entropy in the
setting of fuzzy set theory, Inform. and Control, vol. 20, pp. 301-312, 1972.
3. N.R. Pal and S.K. Pal, Higher order fuzzy entropy and hybrid entropy of a
set, Information Sciences (to appear).
4. C.A. Murthy, S.K.Pal and D. Dutta Majumder, Correlation between two
fuzzy membership functions, Fuzzy Sets and Systems, vol. 7, no. 1, pp 23-
38, 1985.
5. A. Rosenfeld, The fuzzy geometry of image subsets, Patt. Recog. Lett., vol.
2, pp. 311-317, 1984.
6. A. Rosenfeld and S. Haber, The perimeter of a fuzzy set, Technical Report,
University of Maryland, Center for Automation Research, TR-8, 1983.
7. S.K. Pal and A. Ghosh, Index of area coverage of fuzzy image subsets and
object extraction, Patt. Recog. Lett., vol. 11, pp. 831-841, 1990.
8. L.A. Zadeh, K.S. Fu, K. Tanaka and M. Shimura, Fuzzy Sets and Their
Applications to Cognitive and Decision Processes, Academic Press,
London, 1975.
9. S.K. Pal and A. Rosenfeld, Image enhancement and thresholding by
optimization of fuzzy compactness, Patt. Recog. Lett., vol. 7, pp 77-86,
1988.
10. C.A. Murthy and S.K. Pal, Histogram thresholding by minimizing
graylevel fuzziness, Information Sciences (to appear).
11. C.A. Murthy and S.K. Pal, Bounds for membership functions: correlation
based approach, Information Sciences (to appear).
12. S.K. Pal and A. Ghosh, Image segmentation using fuzzy correlation,
Information Sciences (to appear).
13. S.K. Pal and D. Dutta Majumdar, Fuzzy Mathematical Approach to Pattern
Recognition, John Wiley and Sons, (Halsted Press), NY, 1986.
14. S.K. Pal and R.A. King, Image enhancement using smoothing with fuzzy
sets, IEEE Trans. Syst., Man and Cyberns., vol. SMC-11, pp. 494-501,
1981.
15. S.K. Pal and R.A. King, On edge detection of x-ray images using fuzzy set,
IEEE Trans. Patt. Anal. and Machine Intell., vol. PAMI-5, pp. 69-77,
1983.
16. S.K. Pal, A note on the quantitative measure of image enhancement through
fuzziness, IEEE Trans. Patt. Anal. and Machine Intell. vol. PAMI-4, pp.
204-208, 1982.
17. Y. Nakagawa and A. Rosenfeld, A note on the use of local min and max
operations in digital picture processing, IEEE Trans. Syst., Man and
Cyberns., vol. SMC-8, pp. 632-635, 1978.
18. S. Peleg and A. Rosenfeld, A min-max medial axis transformation, IEEE
Trans. Patt. Anal. and Mach. Intell., vol. PAMI-3, no. 2, 1981.
19. M.K. Kundu and S.K. Pal, Automatic selection of object enhancement
operator with quantitative justification based on fuzzy set theoretic measure,
Patt. Recog. Lett., vol. 11, pp. 811-829, 1990.

20. N.R. Pal, On Image Information Measure and Object Extraction, Ph.D.
Thesis, Indian Statistical Institute, Calcutta, India, March 1990.
21. S.K. Pal and N.R. Pal, Higher order entropy, hybrid entropy and their
applications, Proc. INDO-US Workshop on Spectrum Analysis in One and
Two Dimensions, Nov. 27-29, 1990, New Delhi, NBH Oxford Publishing
Co., New Delhi (to appear).
22. S.K. Pal, A measure of edge ambiguity using fuzzy sets, Patt. Recog. Lett.,
vol. 4, pp. 51-56, 1986.
23. S.K. Pal, Fuzzy skeletonization of an image, Patt. Recog. Lett., vol. 10,
pp. 17-23, 1989.
24. S.K. Pal, R.A. King and A.A. Hashim, Image description and primitive
extraction using fuzzy set, IEEE Trans. Syst., Man and Cyberns., vol.
SMC-13, pp. 94-100, 1983.
25. S.K. Pal and A. Bhattacharyya, Pattern recognition technique in analyzing
the effect of thiourea on brain neurosecretory cells, Patt. Recog. Lett., vol.
11, pp. 443-452, 1990.
26. G.F. DePalma and S.S. Yau, Fractionally fuzzy grammars with applications
to pattern recognition, in [8], pp. 329-351.
27. A. Pathak and S.K. Pal, Fuzzy grammars in syntactic recognition of
skeletal maturity from X-rays, IEEE Trans. Syst., Man and Cyberns., vol.
SMC-16, pp. 657-667, 1986.
28. K.S. Fu, Syntactic Pattern Recognition and Applications, Prentice-Hall,
N.J., 1982.
8
FUZZY SETS IN NATURAL
LANGUAGE PROCESSING

Vilem Novak
Czechoslovak Academy of Sciences,
Mining Institute, Studentská 1768,
708 00 Ostrava-Poruba, Czechoslovakia

1 INTRODUCTION

Natural language is one of the most complicated structures a man has met with. It plays a fundamental role not only in human communication but even in the human way of thinking and regarding the world. Therefore, it is extremely important to study it in all its respects. Much has been done in understanding its structure, especially the phonetic and syntactic aspects. Less, however, is understood of its semantics. There are many linguistic systems, often based on set theory and logic, attempting to grasp (at least some phenomena of) natural language. However, none of them is fully accepted and satisfactory in all respects.

A serious obstacle on the way to this goal is, besides the complexity mentioned, also the vagueness of the meaning of separate lexical units as well as of the sentences and longer text. On the other side, the capability of the human mind to take vagueness into account and to handle it, which is reflected in the semantics of natural language, is the main cause of the extreme power of natural language to convey relevant and succinct information. There is no way out than to cope with the vagueness in the models of natural language semantics.

Fuzzy set theory is a mathematical theory whose program is to provide us with methods and tools which may make it possible for us to grasp vague phenomena instrumentally. Therefore, it seems to be appropriate for use in modelling of natural language semantics.

In this paper, we provide the reader with an overview of the main results obtained so far in processing of some phenomena of the semantics of natural language using fuzzy sets.

2 FUZZY SETS

In this section, we briefly touch on the notion of a fuzzy set, especially those aspects which are important for our further explanation. Words and more complex syntagms¹ of natural language can in general be considered as names of properties encountered by a man in the world.

An object is a phenomenon to which we concede its individuality, keeping it together and separating it from the other phenomena. Objects are usually accompanied by properties. In general, however, the same property accompanies more objects. If all such objects are grouped together then they can be seen as one, new object of a special kind. A grouping of objects being seen as an object is called a class. Hence, if φ is a property then there is a class X of objects x having φ. In symbols we can write

X = {x; φ(x)}.    (1)

If the property φ is simple and sharp then the class X forms a set. However, most properties a man meets in the world are not of this kind. Then the class X is not separated sharply, i.e. there is no way how to name or imagine all the objects x from X without any doubt whether a given object x has the property φ, or not. Thus, we encountered the phenomenon of vagueness. The above mentioned doubt, which probably stems from the inner, still not understood, complexity of φ is a core of the phenomenon of vagueness being encountered. Classical mathematics has no other possibility than to model the grouping X using (sharp) sets. Therefore, the result cannot be satisfactory from the very beginning. Unlike classical set theory, fuzzy set theory attempts at finding a more suitable model of the class (1).

Let us take the objects x from some sufficiently big set U called the universe. Note that this assumption is not restrictive since such a set always exists. For example, consider the property φ := 'to be a small number'². Then there surely exists a number x_0 ∈ N which is not small (e.g. x_0 = 2^10) and we may put U = {x ∈ N; x ≤ x_0}.

Our doubt whether an object x ∈ U has the given property φ can be expressed by means of a certain scale L having the smallest element 0 and the greatest element 1, respectively. Thus, 1 expresses that φ(x) (x has the property φ) holds with no doubt while 0 means that φ(x) does not hold at all. We obtain a function

A: U → L    (2)

¹ A syntagm is a part of a sentence (even a word or a whole sentence) that is constructed according to the grammatical rules.
² A natural number, for simplicity.

assigning an element b ∈ L from the scale L to each element x ∈ U. This function serves us as a certain characterization of the class X in (1) and it is called the fuzzy set. We can view the fuzzy set A as a set

{Ax/x; x ∈ U}.    (3)

The element b ∈ L is called the membership degree of x and thus (2) is often called the membership function of the fuzzy set A. One can see that a fuzzy set is identified with its membership function. If A is a fuzzy set (3) in the universe U then we write A ⊂~ U.

The scale L is usually put to be L = <0, 1> and it is assumed to form the structure

L = <<0, 1>, ∨, ∧, ⊗, →, 0, 1>    (4)

where ∨ and ∧ are the operations of supremum (maximum) and infimum (minimum) respectively, ⊗ is the operation of bold product defined by

a ⊗ b = 0 ∨ (a + b - 1)

and → is the operation of residuum defined by

a → b = 1 ∧ (1 - a + b)

for all a, b ∈ <0, 1>.

There are deep reasons for the choice of this structure. The reader may find them in [13,16]. The operations with fuzzy sets stem from the structure (4). The basic ones are

union:  C = A ∪ B iff Cx = Ax ∨ Bx
intersection:  C = A ∩ B iff Cx = Ax ∧ Bx
bold intersection:  C = A ⊓ B iff Cx = Ax ⊗ Bx
residuum:  C = A → B iff Cx = Ax → Bx

On the basis of the residuum, one can define the complement Ā := A → ∅, where ∅ is the empty fuzzy set

∅ = {0/x; x ∈ U}.³

³ This definition gives Āx = 1 - Ax for all x ∈ U, which is the usual definition of the complement.

In modelling of natural language semantics it is necessary to introduce also new, additional operations. Put

a ↔ b = (a → b) ∧ (b → a)

(biresiduation) and

a^p = a ⊗ ... ⊗ a   (p times)

(power) for all a, b ∈ <0, 1>.

When introducing a new n-ary operation o on L, the following fitting condition must be fulfilled: there are p_1, ..., p_n such that

o(a_1, ..., a_n) ↔ o(b_1, ..., b_n) ≥ (a_1 ↔ b_1)^{p_1} ⊗ ... ⊗ (a_n ↔ b_n)^{p_n}

holds for every a_i, b_i ∈ L, i = 1, ..., n. The justification of the fitting condition can be found in [16,15]. Note that all the basic operations fulfil the fitting condition. Moreover, the following holds true:

Theorem 1  All the operations derived from the operations fulfilling the fitting condition fulfil it as well.

Proof - see [16].

The following operations are fitting:

product:  a · b

bounded sum:  a ⊕ b = 1 ∧ (a + b)

concentration:  CON(a) = a²

dilation⁴

intensification:
INT(a) = 2a²  if a ∈ <0, 0.5>
       = 1 - 2(1 - a)²  if a ∈ (0.5, 1>

for all a, b ∈ <0, 1>.

⁴ The widely used operation of dilation DIL(a) = a^{1/2} is not fitting and thus it cannot be used.
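The operations of the structure (4) and the fitting unary operations can be sketched as follows; CON(a) = a² is assumed here as the usual definition of concentration.

```python
def bold_product(a, b):      # a (x) b = 0 v (a + b - 1)
    return max(0.0, a + b - 1.0)

def residuum(a, b):          # a -> b = 1 ^ (1 - a + b)
    return min(1.0, 1.0 - a + b)

def biresiduation(a, b):     # a <-> b = (a -> b) ^ (b -> a)
    return min(residuum(a, b), residuum(b, a))

def bounded_sum(a, b):       # a (+) b = 1 ^ (a + b)
    return min(1.0, a + b)

def power(a, p):             # a^p = a (x) ... (x) a, p times (p >= 1)
    out = a
    for _ in range(p - 1):
        out = bold_product(out, a)
    return out

def CON(a):                  # concentration (assumed: a squared)
    return a * a

def INT(a):                  # intensification
    return 2 * a * a if a <= 0.5 else 1 - 2 * (1 - a) ** 2
```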

The operations in L lead to the operations with fuzzy sets as follows. Let

o: L^n → L

and A_1, ..., A_n ⊂~ U be fuzzy sets. Then o is a basis of the operation O assigning a fuzzy set C ⊂~ U to A_1, ..., A_n when we put

Cx = o(A_1x, ..., A_nx)    (6)

for every x ∈ U. For example, we can define the operation of bounded sum of fuzzy sets by putting

C = A ⊕ B iff Cx = Ax ⊕ Bx

for every x ∈ U.

A very important notion is that of a fuzzy cardinality of a fuzzy set. There are several kinds of them [13,22]. We will use the following ones, defined for fuzzy sets with finite support. Absolute fuzzy cardinality of A ⊂~ U:

FCard(A) = {a_n/n; n ∈ N}    (7)

where

a_n = ∨{β; Card(A_β) = n}

and A_β is a β-cut of A. Relative fuzzy cardinality of A with respect to B, where A, B ⊂~ U:

FCard_A(B) = {a_δ/δ; δ ∈ R}    (8)

where the coefficients a_δ are defined analogously by means of the β-cuts of A and B.

The notions introduced above will be used in the sequel. For other notions and operations with fuzzy sets see e.g. [13,4].
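A sketch of the absolute fuzzy cardinality (7) for a fuzzy set with finite support, represented as a Python dictionary; the relative cardinality (8) would be obtained analogously from the β-cuts of A and B.

```python
def fcard(A):
    """Absolute fuzzy cardinality, Eq. (7), of a fuzzy set with finite
    support given as a dict {element: membership degree}."""
    degrees = sorted(set(A.values()) | {1.0}, reverse=True)
    card = {}
    for beta in degrees:
        n = sum(1 for v in A.values() if v >= beta)   # Card of the beta-cut
        card[n] = max(card.get(n, 0.0), beta)         # a_n = sup of such betas
    return card

A = {'x1': 1.0, 'x2': 0.7, 'x3': 0.3}
print(fcard(A))   # {1: 1.0, 2: 0.7, 3: 0.3}
```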

3 THE USE OF FUZZY SETS IN MODELLING OF


NATURAL LANGUAGE SEMANTICS
3.1 The general representation of the meaning

Let us turn our attention to the problem of grasping natural language semantics. A sentence of natural language can be viewed from several points. In classical linguistics, it is usual to talk about representation of a sentence on various levels. In the system called the functional generative description of natural language (FGD) (see [18]), five levels are differentiated, namely phonetic (PH) (how a sentence is composed as a system of sounds), phonemic (PM) (how words of a sentence are composed), morphemic (MR) (how a sentence is composed of its words), surface syntax (SS) (the system of grammatical rules) and tectogrammatical (TR), which is the highest level corresponding to the semantics. The latter is also called the deep structure of the sentence and this structure is the objective of possible application of fuzzy set theory. As has already been stated, words and more complex syntagms of natural language can be understood to be names of properties encountered by a man in the world. In the light of the previous section, fuzzy sets can be used as follows: let A be a syntagm of natural language and φ the corresponding property. If the class (1) determined by φ is approximated by a fuzzy set A ⊂~ U then the meaning M(A) of A is

M(A) = A.    (9)

Thus, our job consists in determining the membership function A.

However, the situation is by no means simple as not every word of natural language corresponds to such a property and, above all, there are various relations between words. Thus, determination of the membership function which corresponds to a complex syntagm may be a very complicated task.

On the tectogrammatical level, the meaning of a sentence is represented as a complex dependency structure which can be depicted in the form of a labelled graph. For example, the sentence

Peter writes a short letter to his friend.

can be depicted in the form of a graph on Fig. 1.

For the detailed explanation of this graph see [18,17]. The letters t and f mean topic and focus, respectively. A topic is a part of a sentence containing the theme which is spoken about and the focus contains the new information conveyed by the sentence. Of course, one surface structure of a sentence may lead to several deep structures. Up to now, we are far from the detailed understanding of all the nuances of sentence semantics. The present state of the art makes it possible for us to model the meaning of only some simple syntagms, i.e. certain branches of the tectogrammatical tree such as that on Fig. 1. This will be discussed in the subsequent sections.

3.2 Fuzzy semantics of selected syntagms

First, let us stop at the modelling of the semantics of nouns. In general, if S is a noun, then its meaning is a fuzzy set

M(S) = S, S ⊂~ U.

Figure 1: The tectogrammatical tree of the sentence "Peter writes a short letter to his friend", with nodes (Write, f), (Peter, t), (Friend, t), (Letter, f), (He, t) and (Short, f).

What is the universe U? It is a set of objects chosen in such a way that whenever an object x has the property φ_S named by the noun S then x ∈ U. This can be constructed e.g. as follows. Let K be a set of generic elements called the kernel space. For example, K can be a union of all the objects described in our dictionary, of those we have regarded during the last week, of those we see in our flat etc. In short, K should contain all the specific objects we have met or imagine. Let F(K) be a set of all the fuzzy sets on K and put

F^n(K) = F(...(F(K))...)   (n times)

Let EK be the smallest set closed with respect to all the Cartesian powers of K, of F^n(K), n = 1, ..., and all the Cartesian products of these elements. This set is called the semantic space. Then the universe of S is a sufficiently big subset U ⊆ EK.

A certain problem is the determination of the membership function. There are several methods proposed in the literature (cf. [13]). The membership function corresponding to object nouns (e.g. table, car, donkey etc.) could be constructed on the basis of the outer characteristics of elements. For example, we may use proportions of some geometric patterns contained in objects etc. A very often used method is statistical analysis of expert (subjective) estimations. Several experiments have been described in the literature. Let us mention that fuzzy methods are rather robust and thus exact determination of the membership function is not as important as it might seem at first glance.

Experience suggests that even individual estimation works well when it is done carefully and seriously.
In practical applications, e.g. in artificial intelligence, it is not very useful to model the meaning of nouns because we would have to find a proper representation of its elements in the computer which, in fact, we do not need. The most successful applications are based on modelling of the meaning of adjectives and the syntagms of the form

(quantifier -) adverb - adjective (- noun)

where the syntagm

adverb - adjective    (10)

plays the crucial role. The most important (and very frequent) adjectives are those inducing an ordering ≤ in the universe U. We will assume that ≤ is linear. According to linguistic considerations as well as experiments (cf. [10]), there are certain points m, s, v ∈ U where m < s < v. The point s is called the semantic center. The adjectives inducing an ordering in U usually form antonyms, which can be characterized as follows. Let A-, A+ be antonyms (e.g. small - big, cold - hot etc.). Then their meanings are fuzzy sets

M(A-) = A-
M(A+) = A+

such that Supp A- ⊆ <m, s) and Supp A+ ⊆ (s, v>, where Supp A := {x ∈ U; Ax > 0}.⁵ We will often call A- a negative and A+ a positive adjective, respectively. There are also couples of antonyms such that a third member A° exists. Its meaning is

M(A°) = A°

where the membership function A° has the property

s ∈ Ker A° = {x ∈ U; A°x = 1}.⁶

A typical example of the adjective A° is A° := average. In the sequel we will call A° a zero adjective.

The curves corresponding to the fuzzy sets A-, A+, A° have characteristic shapes depicted on Fig. 2. Note that they are sometimes called the S-, S+ and π fuzzy sets, respectively.

⁵ This set is called the support of the fuzzy set A.
⁶ This set is called the kernel of the fuzzy set A°.

Figure 2: The membership functions corresponding to the meaning of the negative, positive and zero syntagms.

A general formula for all the three fuzzy sets is the following:

F(x; a_1, b_1, c_1, c_2, b_2, a_2) =
  0                                    if x < a_1 or x > a_2
  1                                    if c_1 ≤ x ≤ c_2
  (1/2)((x - a_1)/(b_1 - a_1))²        if a_1 ≤ x < b_1
  1 - (1/2)((x - c_1)/(c_1 - b_1))²    if b_1 ≤ x < c_1
  1 - (1/2)((x - c_2)/(b_2 - c_2))²    if c_2 < x ≤ b_2
  (1/2)((x - a_2)/(a_2 - b_2))²        if b_2 < x ≤ a_2

The meaning of the points a_1, b_1, c_1, c_2, b_2, a_2 ∈ U is clear from Fig. 2.
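A direct transcription of the formula above (as reconstructed); the quadratic pieces and break points are those of Fig. 2.

```python
def F(x, a1, b1, c1, c2, b2, a2):
    """S/pi-shaped membership function of Fig. 2."""
    if x < a1 or x > a2:
        return 0.0
    if c1 <= x <= c2:
        return 1.0
    if a1 <= x < b1:
        return 0.5 * ((x - a1) / (b1 - a1)) ** 2
    if b1 <= x < c1:
        return 1.0 - 0.5 * ((x - c1) / (c1 - b1)) ** 2
    if c2 < x <= b2:
        return 1.0 - 0.5 * ((x - c2) / (b2 - c2)) ** 2
    return 0.5 * ((x - a2) / (a2 - b2)) ** 2            # b2 < x <= a2

# e.g. a pi-shaped 'average' over <0, 100>:
# F(50, 20, 30, 40, 60, 70, 80) == 1.0 ; F(25, 20, 30, 40, 60, 70, 80) == 0.125
```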
The adverb in (10) is an intensifying one (e.g. very, highly, absolutely, slightly, etc.) and it is usually called the linguistic modifier in fuzzy set theory. In general, the meaning of the intensifying adverb m is a pair of functions

M(m) = <ρ_m, ν_m>

where ρ_m: U → U is a displacement function⁷ and ν_m: L → L is a unary operation fitting L. Hence, the meaning of the syntagm (10) is obtained using the composition of functions

M(mA) = ν_m ∘ A ∘ ρ_m.    (11)

A typical example, widely used in fuzzy set theory, is the modifier very defined as follows:

ν_very(a) = CON(a),  a ∈ <0, 1>

and

ρ_very(x) = x + (-1)^k · d · ||Ker(A)||

where k = 1 for A+ or for A° if x ≥ s, and k = 2 for A- or for A° if x ≤ s. ||Ker(A)|| is the length of the interval <inf(Ker(A)), sup(Ker(A))>. The parameter d was experimentally estimated to be a number d ∈ <0.25, 0.40>. Examples of some other linguistic modifiers can be found in the literature.

The meaning of verbs is a very complicated problem and so far, only the copula "to be" in syntagms such as

p := P is A    (12)

is modelled, where A is usually the syntagm (10).

⁷ Some authors simplify this model by putting ρ_m = id_U (the identical function on U).

However, (12) is interpreted rather as a simple assignment than a verb. The P in (12) is a noun but it is usually not treated as such. Thus, we obtain two ways how the meaning of (12) can be modelled.

a) We put

M(p) = M(A) = A,

i.e. the meaning of p is set equal to the meaning of the syntagm A. This is quite reasonable since, as was stated above, we usually need not know the meaning of the noun P in the applications.

b) Let M(P) = P ⊂~ V. Then we put

M(p) ⊆ P × A    (13)

where P × A is the Cartesian product of fuzzy sets defined by

(P × A)<x, y> = Px ∧ Ay.

The inclusion in (13) may be proper or improper depending on the kind of the noun P. The relation (13) means that each element x from the universe V (a representative of the noun P) is assigned an attribute y from the universe U, where A ⊂~ U.⁸ As we are in the fuzzy environment, it seems reasonable to take the resulting membership degree of the couple <x, y> as the minimum of Px and Ay.

L. A. Zadeh [22,21] and some other authors following him suggest interpreting the membership degree (P × A)<x, y> as a possibility degree of the fact that x is A. However, the possibility degree concerns uncertainty which, in our opinion, does not reflect the vagueness phenomenon contained in the semantics of natural language.
Very important are the conditional sentences of the form

C := IF p THEN q    (14)

where p and q are syntagms of the form (12). In fuzzy set theory we usually put

M(C) = M(p) → M(q),    (15)

i.e. we interpret the implication (14) using the residuum operation between the fuzzy sets M(p) and M(q).

The interpretation of the conditional sentences (14) plays the crucial role in the so called approximate reasoning, which is one of the most successfully applied areas of fuzzy set theory (see e.g. [8,9]).

⁸ U is usually the real line. For example, in syntagms such as Peter is tall, the element Peter is assigned its height, being a real number.

Let us remark that many authors interpret (14) as the Cartesian product

M(C) = M(p) × M(q).    (16)

In the applications of approximate reasoning, this may work since all the fuzzy methods are very robust. However, putting the meaning of the implication (14) equal to the Cartesian product is linguistically as well as logically incorrect since (16) is symmetric and the implication is not. Another reason why (16) often works in practical applications may be the fact that the implication (14) often describes only some kind of a relation between the input and output and it is not, in fact, understood to be the implication. This discrepancy needs still more analysis.
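The difference between the residuum reading (15) and the Cartesian-product reading (16) can be seen in a small sketch; the fuzzy sets used are illustrative.

```python
def residuum(a, b):                    # Lukasiewicz residuum, used in Eq. (15)
    return min(1.0, 1.0 - a + b)

def conditional_relation(P, Q, use_residuum=True):
    """Fuzzy relation for IF p THEN q over dicts {element: degree}.
    use_residuum=True follows Eq. (15); False gives the Cartesian-product
    reading of Eq. (16), shown only to illustrate its symmetry."""
    op = residuum if use_residuum else min
    return {(x, y): op(P[x], Q[y]) for x in P for y in Q}

P = {'cold': 0.9, 'mild': 0.4}        # illustrative meanings M(p), M(q)
Q = {'high': 0.8, 'low': 0.2}
R_impl = conditional_relation(P, Q, True)
R_cart = conditional_relation(P, Q, False)
# min is commutative (the Cartesian product treats p and q symmetrically),
# while residuum(0.9, 0.2) = 0.3 differs from residuum(0.2, 0.9) = 1.0.
```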
Let us also mention the problem of linguistic quantifiers (we will denote them by the letter Q). They do not form a uniform group from the linguistic point of view. We place among them numerals including the indefinite ones (e.g. several), some adverbs (e.g. many, few, most), some pronouns (e.g. every), some nouns (e.g. majority, minority) and others. From the point of view of fuzzy set theory, their meaning generally is a fuzzy number, i.e. a fuzzy set M(Q) = Q ⊂~ R in the real line. It is proposed in [22] how to interpret the syntagms of the form

Q A's    (17)

or

Q A's are B's.    (18)

In (17), the quantifier Q is interpreted as a fuzzy characterization of the absolute fuzzy cardinality (7) of the fuzzy set

A = M(A)

while in (18), it characterizes the relative fuzzy cardinality (8) of M(A) with respect to B = M(B). More exactly, we put

M(Q A) = Q ∩ FCard(A)    (19)

and

M(Q A are B) = Q ∩ FCard_A(B).    (20)

However, the problem is not finished yet since (19) and (20) have sense only for fuzzy sets with finite support.

In the literature on fuzzy set theory (see e.g. [19,20,22,13] and others), one may find the semantics of the compound syntagms of the form

A and B    (21)

and

A or B    (22)

defined using the operations of intersection and union of fuzzy sets, respectively. However, this is only a tentative solution since the syntagms (21) and (22) are special cases of the very complicated phenomenon known in linguistics as the coordination. The use of the operations of intersection and union of fuzzy sets may work in some special cases of the close coordination in syntagms, e.g. Peter and Paul ..., oil and dirt ..., etc.

An even worse situation is encountered with negation. The simple use of the operation of complement of fuzzy sets works only with some kinds of adjectives and nouns. However, negation contained in more complex syntagms is incidental to the phenomenon of the topic-focus articulation, when only the focus is being negated. A fully comprehensive description of this phenomenon in linguistics has not, however, been done yet.

4 A FEW COMMENTS ON THE APPLICATIONS AND LINGUISTIC APPROXIMATION

The theory presented so far has found many interesting applications, especially in the models connected with the so called approximate reasoning. However, we are quite far from grasping the semantics of natural language more comprehensively and much work still has to be done.

A very important concept which deserves to be mentioned here is that of a linguistic variable [20]. This concept made it possible for us to see a certain part of linguistics from a more technical point of view.

A linguistic variable is, in general, a quintuple

<X, T(X), U, G, M>

where X is a name of the variable, T(X) is its term-set, U is the universe, G the syntactic and M the semantic rules, respectively. For example,

X := size
U := <0, 1000>.

G is a certain, usually context-free, grammar generating the set T(X) of terms such as small, very big, rather average etc., and M is the semantic rule assigning to each term A ∈ T(X) its meaning, being a fuzzy set

M(A) ⊂~ U.

Linguistic variables play an important role in applications. For example, the parameters of a technical system such as temperature, speed, weight etc. can be understood to be linguistic variables.

As their values can also be crisp, e.g. exactly 181.5 etc., the concept of a linguistic variable is general enough to capture also the classical concept of the variable.
In applications, one may also meet the problem of a linguistic approximation. We may lay it down as follows. Let T be a set of syntagms of natural language, e.g. the term-set of a linguistic variable, and let us be given a fuzzy set A_d ⊂~ U. Our task is to find a syntagm A_0 ∈ T such that its meaning M(A_0) = A_0 is as close to A_d as possible.

There are many ways how to solve this task. However, no sufficiently efficient and general method is known till now. One of the possible procedures is the following.

Let f: <0, 1> → <0, 1> be a smooth, increasing, and measurable function. Put

R<A, B> = 1 - (∫_{Supp(A)∪Supp(B)} f(|Ax - Bx|) dx) / (∫_{Supp(A)} f(Ax) dx + ∫_{Supp(B)} f(Bx) dx).    (23)

Then we may find A_0 ∈ T such that

R<A_0, A_d>

is maximal.

If T has a small number of elements then we may also find A_0 such that

(∫_U |A_0x - A_dx|^p dx)^{1/p}    (24)

is minimal for some suitable, previously set number p. This method is often used in technical applications.

A quite effective procedure was proposed by F. Esragh and E. H. Mamdani in [5]. This procedure is suitable for the syntagms of the form (21) and (22) where A and B may consist of an adjective, a noun and a linguistic modifier, and the universe U is ordered. According to this method, the membership function A_d is divided into parts by the effective turning points (i.e. special points where the membership function changes its course), the parts are approximated by the above partial syntagms A, B using (24), and the resulting syntagm is obtained by joining A and B using the corresponding connective. In particular, if the two neighbouring parts of the membership function form a "hill" then the corresponding syntagms are joined by the connective and, and if they form a "valley" then they are joined by the connective or. Note that this works only in the case when the connective and is interpreted as the intersection and or as the union of fuzzy sets.
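A minimal sketch of linguistic approximation over a finite universe, using a discrete p-distance in place of (23)/(24); the term set and membership values are illustrative.

```python
def distance_p(A, B, p=2):
    """Discrete stand-in for the distance (24) between two fuzzy sets
    given as dicts over the same (finite) universe."""
    xs = set(A) | set(B)
    return sum(abs(A.get(x, 0.0) - B.get(x, 0.0)) ** p for x in xs) ** (1.0 / p)

def approximate(A_given, term_set, p=2):
    """Pick the syntagm whose meaning is closest to the given fuzzy set."""
    return min(term_set, key=lambda name: distance_p(term_set[name], A_given, p))

# term set of a linguistic variable (illustrative membership values)
terms = {
    'small':  {1: 1.0, 2: 0.8, 3: 0.3, 4: 0.0},
    'medium': {3: 0.5, 4: 1.0, 5: 1.0, 6: 0.5},
    'big':    {6: 0.3, 7: 0.8, 8: 1.0, 9: 1.0},
}
A_d = {1: 0.9, 2: 0.9, 3: 0.4}
print(approximate(A_d, terms))   # -> 'small'
```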

5 CONCLUSION

We have briefly presented the main ideas of the modelling of natural language semantics using fuzzy set theory. We attempted to demonstrate how the semantics of some of the basic units can be interpreted, namely the semantics of nouns, adjectives, selected adverbs and the copula "to be". Moreover, the semantics of some cases of the close coordination (the use of connectives) was also touched upon along with the semantics of conditional sentences. Let us stress that we are still far from grasping the meaning of more complex syntagms, and even of simple clauses when they contain a verb. The reason consists in the extreme complexity of verb semantics, since verbs represent the most important units of our language, stepping towards the human's recording of the surrounding world on the highest level of his intellectual capability.

Some work in this respect is done in [14] where, however, the new world of mathematics called the alternative set theory (AST) is used. Fuzzy set theory serves there as a special technical tool which is used at a second stage, after the semantics of a sentence (syntagm) in the frame of AST is formed.

Despite the above facts, the use of fuzzy sets in modelling of natural language semantics has already found many successful applications. This is a convincing argument in favour of the usefulness of fuzzy set theory.

References

[1] Bezdek, J. (ed.), Analysis of Fuzzy Information - Vol. 1: Mathematics and Logic, CRC Press, Boca Raton, FL, 1987.
[2] Bezdek, J. (ed.), Analysis of Fuzzy Information - Vol. 2: Artificial Intelligence and Decision Systems, CRC Press, Boca Raton, FL, 1987.
[3] Bezdek, J. (ed.), Analysis of Fuzzy Information - Vol. 3: Applications in Engineering and Science, CRC Press, Boca Raton, FL, 1987.
[4] Dubois, D., Prade, H., Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1980.
[5] Esragh, F., Mamdani, E.H., A general approach to linguistic approximation, Int. J. Man-Mach. Stud., 11(1979), 501-519.
[6] Gaines, B.R., Boose, J.H. (eds.), Machine Learning and Uncertain Reasoning, Academic Press, London, 1990.
[7] Gärdenfors, P. (ed.), Generalized Quantifiers, D. Reidel, Dordrecht, 1987.
[8] Gupta, M.M., Yamakawa, T. (eds.), Fuzzy Computing: Theory, Hardware and Applications, North-Holland, Amsterdam, 1988.
[9] Gupta, M.M., Yamakawa, T. (eds.), Fuzzy Logic in Knowledge-Based Systems, Decision and Control, North-Holland, Amsterdam, 1988.
[10] Kuz'min, V.B., About semantical structure of linguistic hedges: an experimental hypothesis, BUSEFAL 24, 1985, 118-125, Université Paul Sabatier, Toulouse.
[11] Lakoff, G., Hedges: A study in meaning criteria and the logic of fuzzy concepts, J. Philos. Logic 2(1973), 458-508.
[12] Mamdani, E.H., Gaines, B.R. (eds.), Fuzzy Reasoning and its Applications, Academic Press, London, 1981.
[13] Novák, V., Fuzzy Sets and Their Applications, Adam Hilger, Bristol, 1989.
[14] Novák, V., The Alternative Mathematical Model of Natural Language Semantics, Manuscript, Mining Institute, Ostrava, 1989. (To be published by Cambridge University Press)
[15] Novák, V., Pedrycz, W., Fuzzy sets and t-norms in the light of fuzzy logic, Int. J. Man-Mach. Stud., 29(1988), 113-127.
[16] Pavelka, J., On fuzzy logic I, II, III, Zeit. Math. Logik Grundl. Math. 25(1979), 45-52, 119-134, 447-464.
[17] Sgall, P. (ed.), Contributions to Functional Syntax, Semantics, and Language Comprehension, Academia, Prague, 1984.
[18] Sgall, P., Hajičová, E., Panevová, J., The Meaning of the Sentence in its Semantic and Pragmatic Aspects, D. Reidel, Dordrecht, 1986.
[19] Zadeh, L.A., Quantitative fuzzy semantics, Inf. Sci., 3(1971), 159-176.
[20] Zadeh, L.A., The concept of a linguistic variable and its application to approximate reasoning I, II, III, Inf. Sci., 8(1975), 199-249, 301-357; 9(1975), 43-80.
[21] Zadeh, L.A., PRUF - a meaning representation language for natural languages, Int. J. Man-Mach. Stud., 10(1978), 395-460.
[22] Zadeh, L.A., A computational approach to fuzzy quantifiers in natural languages, Comp. Math. with Applic., 9(1983), 149-184.
9
FUZZY-SET-THEORETIC
APPLICATIONS IN MODELING OF
MAN-MACHINE INTERACTIONS

By Waldemar Karwowski
Center for Industrial Ergonomics
University of Louisville
Louisville, KY 40292, USA
and
Gavriel Salvendy
School of Industrial Engineering
Purdue University
West Lafayette, IN 47907, USA

INTRODUCTION
According to Harre (1972) there are two major purposes of models in
science: 1) logical, which enables one to make certain inferences which would not
otherwise be possible to be made; and 2) epistemiological, to express and extend
our knowledge of the world. Models are helpful for explanation and theory
formation, as well as simplification and concretization. Zimmermann (1980)
classifies models into three groups: 1) formal models (purely axiomatic systems
with purely fictitious hypotheses), 2) factual models (conclusions from the models
have a bearing on reality and they have to be verified by empirical evidence), and 3)
prescriptive models (which postulate rules according to which people should
behave). The quality of a model depends on the properties of the model and the
functions for which the model is designed (Zimmermann, 1980). In general, good
models must have three major properties: 1) formal consistency (all conclusions
follow from the hypothesis), 2) usefulness, and 3) efficiency (the model should
fulfill the desired function at a minimum of effort, time and cost).
Although the usefulness of the mathematical language for modeling
purposes is undisputed, there are limits of the possibility of using the classical
mathematical language which is based on the dichotomous character of set theory
(Zimmermann, 1980). Such restriction applies especially to the man-machine
systems. This is due to vagueness of the natural language, and the fact that in
empirical research natural language cannot be substituted by formal languages.
Formal languages are rather simple and poor, and are useful only for specific
purposes. Mathematics and logic as research languages widely applied today in

natural sciences and engineering are not very useful for modeling purposes in
behavioral sciences and especially in human factors studies. Rather, a new
methodology, based on the theory of fuzzy sets and systems is needed to account
for the ever present fuzziness of man-machine systems.
As suggested by Smithson (1982), the potential advantages for
applications of a fuzzy approach in human sciences are: 1) fuzziness, itself, may be
a useful metaphor or model for human language and categorizing processes, and 2)
fuzzy mathematics may be able to augment conventional statistical techniques in
the analysis of fuzzy data. Fuzzy methods are useful supplements for statistical
techniques such as reliability analysis and regressions, and structurally oriented
methods such as hierarchical clustering and multidimensional scaling.

HUMAN FACTORS
Human factors discipline is concerned with "the consideration of human
characteristics, expectations, and behaviors in the design of the things people use
in their work and everyday lives and of the environments in which they work and
live" (McCormick, 1970). The "things" that are designed are complex man-
machine systems. According to Pew and Baron (1983) the ultimate reasons for
building models in general, and man-machine models in particular, are to provide
for:

1. A systematic framework that reduces the memory load of the


investigator, and prompts him not to overlook the important
features of the problem,
2. A basis for extrapolating from the information given to draw new
insights and new testable or observable inferences about system or
component behavior,
3. A system design tool that permits the generation of design
solutions directly,
4. An embodiment of concepts or derived parameters that are useful as
measures of performance in the simulated or real environment,
5. A system component to be used in the operational setting to
generate behavior, for comparison with the actual operator behavior
to anticipate a display of needed data, to introduce alternative
strategies or to monitor operator performance, and
6. Consideration of otherwise neglected or obscure aspects of the
problem.

According to Topmiller (1981), research in man-machine systems poses


an important methodological challenge. This is due to the complexity of such
systems, and a need for simultaneous consideration of a variety of interacting
factors that affect several dimensions of both individual and group performance.
Chapanis (1959) argues that "we do not have adequate methods for finding out all
the things we need to know about people. Above all, we need novel and
imaginative techniques for the study of man. This is an area in which behavioral

scientists can learn much from the engineering and physical sciences."
Research techniques applied in man-machine research typically include the
following methods: 1) direct observation (operator opinions, activity sampling
techniques, process analysis, etc.), 2) accident study method (risk analysis, critical-
incident technique) 3) statistical methods, 4) experimental methods (design of
experiments), 5) psychophysical methods (psychophysical scaling and
measurement), and 6) articulation testing methods (Chapanis, 1959). Today we are
still at the beginning stage of building robust mathematical models for the analysis
of complex human-machine systems. This is partially due to lack of appropriate
design theory, as well as complexity of human behavior (Topmiller, 1981). The
human being is too complex a "system" to be fully understood or describable in all
his/her properties, limits, tolerances, and performance capabilities, and no
comprehensive mathematical tool has been available up to now to describe and
integrate all the above mentioned measures and findings about human behavior
(Bemotat, 1984).

FUZZY MODELS
Human work taxonomy can be used to describe five different levels
ranging from primarily physical tasks to primarily information processing tasks
(Rohmert, 1979). These are:

1) producing force (primarily muscular work),


2) continuously coordinating sensory-motor functions (like
assembling or tracking tasks),
3) converting information into motor actions (e.g. inspection tasks),
4) converting information into output information (e.g. required control
tasks), and
5) producing information (primarily creative work).

Regardless of the level of human work, three types of fuzziness are


present and should be accounted for in the modeling of man-machine systems: 1)
fuzziness stemming from our inability to acquire and process adequate amounts of
information about the behavior of a particular subsystem (or the whole system), 2)
fuzziness due to vagueness of the relationships between people and their working
environments, and complexity of the rules and underlying principles related to such
systems, and finally, 3) fuzziness inherent in human thought processes and
subjective perceptions of the outside world (Karwowski and Mital, 1986). Figure
1 illustrates the above thesis. Traditional man-machine interfaces, which include:
1) information sensing and receiving, 2) information processing, 3) decision-
making, 4) control actions, and 5) environmental and situational variables, are
represented in two blocks, i.e., human interpretation block and a complex work
system block.
[Figure 1. Fuzziness in man-machine interfacing (after Karwowski and Mital, 1986). The diagram links the perceived task environment, the human operator interpretation block (fuzziness), the operator response block (human functioning), and the complex work systems block (fuzziness), with perceived task demands, perceived task workload, and mood and intuition forming the feedback loop.]

Uncertainty (viewed in the context of mental workload), which causes unpredictability in one's stimulus and/or response, enters a work situation from several sources (Audley et al., 1979). These are: 1) an external disturbance model, 2) varying parameters of the system structure external to the human operator, 3) human-produced noise in observing the task stimuli, 4) lack of a good internal model of the external system, 5) human-produced distortions in interpreting the externally stipulated criterion of performance, and 6) human-produced motor noise.
In view of the above, the theory of fuzzy sets offers a useful approach
when the task demands are vague, with the main advantage being its ability to
model imprecise task situations and, therefore, a potential to develop a framework
for implementation of workload measures.

FUZZINESS AND HUMAN-MACHINE SYSTEMS


Man-machine studies aim to optimize work systems with respect to
physical and psychological characteristics of the users, and investigate complex and
ill-defined relationships between people, machines, and physical environments.
The main goal of such investigation is to remove the incompatibilities between
humans and tasks, and to make the workplace healthy, productive, comfortable and
satisfying.
Human-centered systems, which are the objects of man-machine studies,
are very complex and difficult to analyze. There are at least three different types of
uncertainty inherent to such systems; i.e., inaccuracy, randomness, and vagueness
(Bezdek, 1981). Uncertainties due to inaccuracy are related to observations and
measurements (representations), while those due to randomness (of events) are
independent from observations and constitute an objective property of some real
process. Uncertainty due to vagueness (or fuzziness) has to do with the complexity
of the system under investigation and the human thought and perception processes
(Zadeh, 1973).
A new methodology in the area of man-machine systems is needed to account for the imprecision and vagueness of such relationships. Zadeh (1974) points out that "although the conventional mathematical techniques have been and will continue to be applied to the analysis of humanistic systems, it is clear that the great complexity of such systems calls for approaches that are significantly different in spirit as well as in substance from the traditional methods -- methods which are highly effective when applied to mechanistic systems, but are far too precise in relation to systems in which human behavior plays an important role."
In the past, most of the traditional methodologies disregarded the system
complexities, and assumed that the formal properties of mathematics correspond to
existing relationships characteristic to the system under investigation (Zadeh,
1974). For example, an uncertainty due to vagueness was often modeled as being
of stochastic nature. Such treatment appears to defeat the purpose of any formal
man-machine systems' analysis and modeling efforts.

The concept of fuzziness


Fuzziness relates to the specific kind of vagueness having to do with
gradations in categories, i.e., degree of vagueness (Smithson, 1982). Uncertainty
measured by fuzziness refers to the gradation of membership of an element in some
class (category). Although such uncertainty arises at all levels of cognitive
processes, people have the abilities to understand and utilize vague and imprecise
concepts which are difficult to analyze within the framework of traditional
scientific thinking (Hersh et al., 1976; Kramer, 1983; Karwowski and Mital,
1986). Therefore, awareness of vagueness and inexactness, implicit in human
behavior, should be the basis of any man-machine studies.
According to Zadeh (1965), the theory of fuzzy sets represents an attempt to construct a conceptual framework for a systematic treatment of vagueness and uncertainty due to fuzziness in both quantitative and qualitative ways. Such a framework is much needed in the human-machine interaction area. As pointed out by Singleton (1982), "most human characteristics have very complex contextual dependencies which are not readily expressible in tabulations of numbers even in multivariate equations." Yet, there is growing evidence that people comprehend vague concepts, such as concepts of a natural language, as if those concepts were represented by fuzzy sets, and can manipulate them according to the rules of fuzzy logic (Oden, 1977; Brownell et al., 1978). Recent research in semantic memory and concept formation indicates that natural categories are fuzzy sets with no clear boundaries separating category members from nonmembers (McCloskey et al., 1978). One can certainly understand the meaning of such concepts as "excessive workload," "low illumination," "heavy weight," "high level of stress," and "tall man," to name a few commonly used descriptors of the human-environment relationship.
As noted by Singleton (1982), "no one has yet developed a comprehensive set of crude and approximate but simple and inexpensive techniques for finding solutions to ergonomics problems." Fuzzy set theory, which allows interpretation and manipulation of imprecise (vague) information and recognition and evaluation of uncertainty due to fuzziness (in addition to randomness), may be the closest solution to the above stated need available today.

Conventional versus fuzzy set theory and logic


In conventional (classical) set theory, an element x either belongs or does not belong to a set X, and the characteristic (membership) function fX can be represented as follows:

fX(x) = 1 if x ∈ X (truth value = 1: true)
fX(x) = 0 if x ∉ X (truth value = 0: false)

The concept of a fuzzy set extends the range of membership values for fX and allows graded membership, usually defined on the interval [0, 1]. Consequently, an element may belong to a set with a certain degree of membership, not necessarily 0 or 1. The "excluded middle" concept is then abandoned, and more flexibility is given in specifying the characteristic function.
In view of the above, the mathematical logic can also be modified. Interestingly, classical logic was actually extended as early as 1930 by Lukasiewicz, who proposed an infinite-valued logic. As stated by Giles (1981), "Lukasiewicz logic is exactly appropriate for the formulation of the 'fuzzy set theory' first described by Zadeh; indeed, it is not too much to claim that it is related to fuzzy set theory exactly as classical logic is related to ordinary set theory."
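As a purely illustrative sketch (the descriptor "tall", the ramp between 165 and 185 cm, and the crisp cut-off of 180 cm are our own assumptions, not taken from the text), the contrast between a crisp characteristic function and a graded membership function can be written as follows:

    def crisp_tall(height_cm: float) -> int:
        # Classical set: an element either belongs (1) or does not belong (0).
        return 1 if height_cm >= 180 else 0

    def fuzzy_tall(height_cm: float) -> float:
        # Fuzzy set: graded membership on [0, 1], here an assumed linear ramp from 165 to 185 cm.
        if height_cm <= 165:
            return 0.0
        if height_cm >= 185:
            return 1.0
        return (height_cm - 165) / (185 - 165)

    for h in (160, 170, 178, 186):
        print(h, crisp_tall(h), round(fuzzy_tall(h), 2))

For a height of 178 cm the crisp function returns 0 while the fuzzy function returns a membership of 0.65, which is exactly the kind of graded answer that the excluded-middle assumption rules out.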
The theory of fuzzy sets has been successfully applied in the modeling of ill-defined systems in a variety of disciplines (cognitive psychology, information processing and control, decision-making sciences, biological and medical sciences, sociology and linguistics, image processing and pattern recognition, and artificial intelligence).
Willaeys and Malvache (1979) investigated the perception of visual and vestibular information in "watch and decision" or industrial inspection (control) tasks. The imprecise nature of the human problem solving procedures was related to the "shaded" strategy of the operator's perception and to the "hard-to-predict" nature of the man-machine environment. The labels of fuzzy sets used by the operator to describe different physical variables of the task were identified, and the fuzzy model of the process-control task was formulated.
Benson (1982) developed an interactive computer graphics program for analytical tasks which are not well defined or which utilize imprecise data. Color scales were used to model subjectively defined categories under investigation. Such fuzzy categories were then presented to the analyst. The use of a linguistic approach allowed the identification of membership for different categories of description of visual inspection. The perceptual properties of color proved to be useful in selective focus of attention and in distinguishing or disregarding variations between imprecisely defined categories.
Karwowski and others (1988, 1984a and 1984b) developed a fuzzy set based model to assess the acceptability of stresses in manual lifting tasks. Measures of acceptability were expressed in terms of membership functions which described the degrees to which the combined effect of biomechanical and physiological stresses was acceptable to the human operator. The combined acceptabilities of a lifting task were similar to the subjective estimations of the overall task acceptability established by the subjects in psychophysical experiments.
Terano et al. (1983) introduced a fuzzy set approach into fault-tree analysis, and studied the fuzziness of the human-reliability concept from the man-machine systems safety point of view. Kramer and Rohr (1982) developed a fuzzy model of driver behavior based on simulated visual pattern processing in lane control. Saaty (1977) distinguished two types of fuzziness, fuzziness in layman perception (for example, perception of illumination intensity) and fuzziness in meaning, advocating that fuzziness is a basic quality of understanding. Hirsh et al. (1981) used a fuzzy dissimilitude relation to describe human vocal patterns.

FUZZY-SET THEORETIC MODELING
OF HUMAN-COMPUTER INTERACTION
The interaction between people and computers reflects the cognitive imprecision of the data and the uncertainty exhibited in the user's perception of the computing environment, including the limitations of the computer software used. Since human reasoning is not precise, the human-computer interaction (HCI) should be imprecision-tolerant, and should allow for an inexact mode of communication (Karwowski et al., 1990). Recent developments in fuzzy methodologies, fuzzy computing, and fuzzy hardware (computers based on fuzzy logic processing units) have created a set of new possibilities for the development of vagueness-tolerant human-computer interfaces.

Human-computer interaction system

The human-computer interaction system (HCIS) can be formally defined (Karwowski et al., 1990) as a quintuple:

HCIS = (T, U, C, E, I)     (1)

where: T - task requirements (physical and cognitive)
U - user characteristics (physical and cognitive)
C - computer characteristics (hardware and software, including computer interfaces)
E - an environment
I - a set of interactions.

The set of interactions I embodies all possible interactions between T, U, C in E, regardless of their nature or strength of association. For example, one of the possible interactions can relate to the data stored in the computer memory and the corresponding knowledge, if any, of the user. The interactions I can be elemental, i.e. one-to-one associations, or complex, such as an interaction between the user, the particular software used to achieve the desired task, and the available physical interface with the computer. Also, the elemental interactions do not have to directly involve the user. For example, an interaction may involve only the T and C components. It should be pointed out that the elemental interaction between U and C reflects the narrow concept of the traditional human-computer interface.
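Purely as a reading aid, the quintuple can be pictured as a small data structure in which the set of interactions I collects tuples over T, U and C in E; the field names below simply echo the symbols of equation (1), and the example interaction is hypothetical:

    from dataclasses import dataclass, field

    @dataclass
    class HCIS:
        # Human-computer interaction system as the quintuple (T, U, C, E, I).
        task_requirements: set          # T: physical and cognitive task requirements
        user_characteristics: set      # U: physical and cognitive user characteristics
        computer_characteristics: set  # C: hardware, software, interfaces
        environment: set               # E: the environment
        interactions: set = field(default_factory=set)  # I: tuples over T, U, C in E

    # An elemental interaction need not involve the user, e.g. one linking T and C only:
    system = HCIS({"edit text"}, {"novice user"}, {"screen editor"}, {"office"})
    system.interactions.add(("edit text", "screen editor"))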
In human-computer interaction, the uncertainty and imprecision due to vagueness (or fuzziness) stem from the high complexity of human-computer systems as well as from the nature of the computer user's perception and thought processes. As pointed out by Zadeh (1973), the key elements in human thinking are linguistic descriptors, or labels, of classes of objects with gradation of membership of their elements, i.e., fuzzy sets. Furthermore, human reasoning is approximate rather than exact, and is based upon a logical system with fuzzy truths, connectives, and fuzzy rules of inference (Lakoff, 1973; Kochen, 1975; Hersh and Caramazza, 1976; Mamdani and Gaines, 1981; Schmucker, 1984; Karwowski and Mital, 1986; Smithson, 1987).

Fuzziness in HCI research


Recently, there have been some initial attempts to incorporate fuzziness in HCI research. Simcox (1984) presented a method to determine compatibility functions that describe the degree of an implied attribute of the visual display and the linguistic category that summarizes values of this attribute. Such compatibility functions were postulated to be useful in the construction of computer graphs as a communication mode. Boy and Kuss (1986) proposed a fuzzy method for modeling human-computer interactions in information retrieval tasks, and implemented their method in a computer-based library retrieval system (BIBLIO). Recently, Hesketh et al. (1988) developed a computerized method for a fuzzy graphic rating scale using the FUZRATE program, which feeds back to the user his/her fuzzy ratings and then presents the results of combining these ratings.

THE GOMS MODEL


One of the recently proposed models of the computer user's information processing is the GOMS concept (Card et al., 1983). According to the GOMS model, the user's cognitive structure consists of four components: 1) a set of Goals, 2) a set of Operators, 3) a set of Methods for achieving the goals, and 4) a set of Selection Rules for choosing among competing methods for goals. These components can be further defined as follows:

1) Goals: A goal is a symbolic structure that defines a state of affairs to be achieved and determines a set of possible methods by which it may be accomplished.
2) Operators: Operators are elementary perceptual, motor, or cognitive acts, whose execution is necessary to change any aspect of the user's mental state or to affect the task environment.
3) Methods: Methods describe procedures used by the user to accomplish a goal. Methods have a chance of success distinctly less than certain, because of the user's lack of knowledge or appreciation of the task environment. This uncertainty is a prime contributor to the problem-solving character of a task; its absence is a characteristic of a cognitive skill.
4) Selection Rules: Rules for predicting, from knowledge of the task environment, which of several possible methods will be selected by the user in order to accomplish a specific goal.
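As a minimal structural sketch of these four components (the container, its field names, and the sample entries are our own illustration, not part of Card et al.'s formulation):

    from dataclasses import dataclass, field

    @dataclass
    class GOMSModel:
        # Container for the four GOMS components of a user's cognitive structure.
        goals: list = field(default_factory=list)            # symbolic states of affairs to achieve
        operators: list = field(default_factory=list)        # elementary perceptual/motor/cognitive acts
        methods: dict = field(default_factory=dict)          # goal -> procedure (sequence of operators)
        selection_rules: list = field(default_factory=list)  # rules choosing among competing methods

    editing = GOMSModel(
        goals=["locate target line"],
        operators=["press return", "type /pattern"],
        methods={"locate target line": ["press return", "press return"]},
        selection_rules=["if lines to target < 3 use LF-METHOD else QS-METHOD"],
    )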

In 1983, Card et al. devised a text-editing experiment to show the validity of the GOMS model. In one experiment, subjects were told to perform simple line location tasks, and the methods that each subject used to locate a line were recorded. From sample editing sessions, the methods and the associated selection rules for locating a line were inferred. The study concluded that the GOMS knowledge representations were valid for such tasks.

FUZZY GOMS MODELING


Card et al. (1983) noted possible extensions of the GOMS model. Among these were "the assurance that a GOMS description can be given for a display oriented editor" and methods for improving the accuracy of the predictions of the user's actions. It was also suggested that "the probabilistic selection rules and conditionalities for predicting which method the user will employ and for expressing probabilistic conditionality within those methods" be explored.
Another enhancement to the GOMS model would be the ability to account for uncertainty within selection rules (Karwowski et al., 1989). The original GOMS study inferred, from user behavior, rules such as "If the number of lines to the next modification is less than 3 then use the LF-METHOD; else use the QS-METHOD." This type of rule assumes perfect knowledge and absolute certainty of the user's cognitive ability to observe, at a glance, the number of lines to the next change.
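The crisp rule quoted above translates directly into a one-line predicate; the method names LF-METHOD and QS-METHOD come from the original study, while the function itself is only an illustration of the assumed perfect knowledge:

    def select_method_crisp(lines_to_next_modification: int) -> str:
        # Crisp GOMS selection rule: assumes the user can count the exact
        # number of lines to the next modification at a glance.
        return "LF-METHOD" if lines_to_next_modification < 3 else "QS-METHOD"

    print(select_method_crisp(2))   # LF-METHOD
    print(select_method_crisp(7))   # QS-METHOD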


In order to account for the natural fuzziness of the above human-computer interactions, the GOMS model was recently extended by allowing its components to assume precise, probabilistic or fuzzy values. Such a preliminary generalization of the GOMS model, proposed by Karwowski et al. (1990), is given in Table 1. The Goals, Operators and Methods components can either be precise or fuzzy, while the Selection Rules are expressed in either a probabilistic or a fuzzy (possibilistic or linguistic inexactness) manner.

Table 1. Generalized computer user's cognitive structure based on the GOMS model and the nature of model components (after Karwowski et al., 1990).

Structure   Goals           Operators          Methods         Selection rules
category    (description)   (nature of acts)   (description)   (reasoning processes)

1 Precise Precise Precise Probabilistic


2 Precise Precise Precise Fuzzy
3 Precise Precise Fuzzy Probabilistic
4 Precise Precise Fuzzy Fuzzy
5 Precise Fuzzy Precise Probabilistic
6 Precise Fuzzy Precise Fuzzy
7 Precise Fuzzy Fuzzy Probabilistic
8 Precise Fuzzy Fuzzy Fuzzy
9 Fuzzy Precise Precise Probabilistic
10 Fuzzy Precise Precise Fuzzy
11 Fuzzy Precise Fuzzy Probabilistic
12 Fuzzy Precise Fuzzy Fuzzy
13 Fuzzy Fuzzy Precise Probabilistic
14 Fuzzy Fuzzy Precise Fuzzy
15 Fuzzy Fuzzy Fuzzy Probabilistic
16 Fuzzy Fuzzy Fuzzy Fuzzy

FUZZY GOMS MODEL: PILOT STUDY


The example presented below refers to the generalized GOMS structure category #4, where the set of Goals and the set of Operators are precisely defined, while the (predicted) Methods used by the subjects, as well as the specific Selection Rules applied to accomplish the editing task, were based on fuzzy modeling concepts, including application of linguistic values, fuzzy connectives and fuzzy logic, and possibilistic measures of uncertainty. Such a model is referred to as the Fuzzy GOMS model.

Karwowski et al. (1989) reported an experiment performed to validate the fuzzy-based GOMS model for a text editing task. The experiment was a variation of the manuscript editing experiment by Card et al. (1983). The experiment consisted of the following steps:

1. The subject performed a familiar text editing task using a screen editor (VI).
2. The methods by which the subject achieved his goals (word location), as well as the selection rules, were elicited.
3. It was established that many of the rules had fuzzy components.
4. Several compatibility functions for fuzzy terms used by the subject were derived.
5. The possibility measure was used to predict the methods that the subject would use.
6. The selected methods were compared to non-fuzzy predictions and actual experimental data.

The subject did not know the file to be edited. The task was performed
from the subject's own office and desk. The subject was familiar with and
regularly used the VI screen editor.

Knowledge elicitation
The knowledge engineers can use sample runs to infer the rules by which the subjects select their preferred methods of editing text. An additional benefit from a GOMS perspective would be in structuring knowledge elicitation. For example, the expert could be prompted to present the methods and the selection rules and respond in the following manner: "IF the condition X exists and the condition Y exists, THEN use method Z." For example, while performing a task, the subject could be asked to describe why he chose a particular method:

[Subject: "The word (to be changed) is more than half of a screen down, so I will use the control-D method and then the return key to the word."]
[Knowledge Engineer: "How strongly do you feel that it is more than half?"]
[Subject: "Very strong, say 0.8."]

The actual distance to the word was measured directly and found to be 39 lines. So the degree of membership in the "more than half" class was 0.8 for 39 lines. By having the subject perform many tasks while verbalizing the rules, the methods used and the memberships of the fuzzy quantifiers can be found.
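One plausible way to turn such verbalized ratings into a usable compatibility function is piecewise-linear interpolation between the elicited points; in the sketch below only the pair (39 lines, 0.8) comes from the dialogue above, and the remaining points are invented for illustration:

    # Hypothetical elicited points for "more than half of a screen down":
    # only (39, 0.8) is quoted in the text; the others are made up for illustration.
    elicited = [(10, 0.0), (25, 0.3), (39, 0.8), (50, 1.0)]

    def membership(lines_down: float, points=elicited) -> float:
        # Piecewise-linear membership interpolated between the elicited points.
        if lines_down <= points[0][0]:
            return points[0][1]
        if lines_down >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= lines_down <= x1:
                return y0 + (y1 - y0) * (lines_down - x0) / (x1 - x0)

    print(round(membership(39), 2))  # 0.8, reproducing the subject's rating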

Experimental methods for pilot study


The results of a pilot study reported by Karwowski et al. (1989) are discussed here in detail. The subject utilized the following five methods to place the cursor on the word(s) to be changed: 1) Control-D: scrolls down one half of a screen; 2) Control-F: jumps to the next page; 3) Return Key: moves the cursor to the left side of the page and down one line; 4) Arrow Up or Down: moves the cursor directly up or down; and 5) Pattern Search: places the cursor on the first occurrence of the pattern.
The subject verbalized five cursor placement rules and seven fuzzy
descriptors. The following rules were used: 1) If the word is more than half of a
screen from the cursor and on the same screen or if the word is more than half of a
screen from the cursor and across the printed page then use method #1; 2) If the
word is more than 70 lines and the pattern is not distinct then use method #2; 3)
If the word is less than half of a screen and on the left half of the page use method
#3; 4) If the word is less than half of a screen and on the right half of the page
use method #4; and 5) If the word is distinct and more than 70 lines away use
method #5.
An example of the compatibility function for the "right hand side of the screen" descriptor elicited in the experiment is given in Figure 2. The knowledge engineer assumed that the subject did not have the perfect cognitive ability to divide a screen exactly in half, and therefore elicited the knowledge as fuzzy knowledge. For all descriptors, the membership functions were defined over perceived numbers of lines or characters, except for the distinct and non-distinct descriptors. The distinct and non-distinct descriptors were given as counts of failed pattern recognitions and served, basically, to predict the patience of the user.

[Figure 2. Fuzzy descriptor for the "right hand side of the screen" (after Karwowski et al., 1990): degree of membership (0 to 1) plotted against the number of characters from the left hand side of the screen (approximately 6 to 25).]

Example of rule selection procedure


Once all the rules, methods, and corresponding membership functions have been elicited, the theory of possibility (Zadeh, 1978) was used to model the expert's rule selection process. For this purpose, each of the potential rules was assigned a possibility measure equal to the membership value(s) associated with it during the elicitation phase of the experiment. The possibility measure π(A) was defined after Zadeh (1978) as follows:

π(A) = Poss {X is A} = sup_u min (fA(u), πX(u)),     (2)

where πX(u) is the possibility distribution of X, and A is a fuzzy set in the universe U.
The following sub-task is used to illustrate the process of predicting the rule selection based on the linguistic inexactness of the expert's actions. Sub-task: Move down 27 lines to a position in column 20. The following rules (R) apply:

Rule #1: Membership value of more than half of the screen = 0.4
[The possibility that the rule applies is 0.4.]

Rule #2: Membership value of more than 70 lines = 0
[The possibility that the rule applies is 0.]

Rule #3: Membership value of less than half of the screen = 0.3, and
membership value of left hand side of the line = 0.4
[The possibility that the rule applies is 0.3 and 0.4.]

Rule #4: Membership value of right half of line = 0.9, and
membership value of less than half of the screen = 0.3
[The possibility that the rule applies is 0.3 and 0.9.]

Rule #5: Membership value of more than 70 lines = 0
[The possibility that the rule applies is 0.]

The possibility distribution of X, expressing that the subject would select a given rule from the universe of available rules R, was defined after Zadeh (1978). In the case of the example cited above, the most applicable rule was derived based on the possibility measure of {X is Rule #} as follows:

Poss {X is Rule #} = MAX [{(Rule#1, 0.4)}, {(Rule#2, 0)}, MIN {(Rule#3, 0.3), (Rule#3, 0.4)}, MIN {(Rule#4, 0.3), (Rule#4, 0.9)}, {(Rule#5, 0)}]
= MAX [{(Rule#1, 0.4)}, {(Rule#2, 0)}, {(Rule#3, 0.3)}, {(Rule#4, 0.3)}, {(Rule#5, 0)}]
= {(Rule#1, 0.4)}.

Given the set of five applicable rules (R), the possibility of selecting Rule #1 as the most applicable one is 0.4. Based on this possibilistic measure of uncertainty, it was predicted that the subject would use Rule #1, i.e. the CONTROL-D method. All fuzzy model predictions in the experiment were checked against the selection rule decisions made by the subjects.
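The selection procedure just illustrated amounts to taking, for every candidate rule, the minimum of the membership values of its antecedent terms (the fuzzy AND) and then choosing the rule with the largest resulting possibility (the MAX step). The following sketch reproduces the membership values of the sub-task above; the dictionary layout and names are ours:

    # Membership values of the fuzzy antecedent terms for the sub-task
    # "move down 27 lines to a position in column 20" (values from the example).
    rule_antecedents = {
        "Rule #1": [0.4],        # more than half of the screen
        "Rule #2": [0.0],        # more than 70 lines
        "Rule #3": [0.3, 0.4],   # less than half of screen AND left hand side of line
        "Rule #4": [0.3, 0.9],   # less than half of screen AND right half of line
        "Rule #5": [0.0],        # more than 70 lines (distinct pattern)
    }

    # Possibility of each rule = MIN over its antecedent memberships (fuzzy AND).
    possibility = {rule: min(values) for rule, values in rule_antecedents.items()}

    # Predicted rule = the one with maximal possibility (the MAX step).
    best_rule = max(possibility, key=possibility.get)
    print(possibility)   # Rule #1 keeps 0.4, Rules #3 and #4 drop to 0.3
    print(best_rule)     # Rule #1, i.e. the CONTROL-D method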

Results of the pilot study


In the pilot study reported by Karwowski et al. (1989), one model was run using the fuzzy GOMS approach to the cursor placement task. Out of seventeen decisions, the fuzzy GOMS model predicted 13, or 76%, correctly. Another run was made by replacing the fuzzy quantifiers with non-fuzzy rules. The non-fuzzy GOMS model predicted only 8, or 47%, of the cursor placement decisions correctly.

Table 2. Sample #1: cursor placement rules for the pilot study (after Karwowski et al., 1989).

WORD LOCATION METHODS

Number of    Word's column    Method    Fuzzy         Non-fuzzy
lines down   number           used      prediction    prediction

8 15 3 3,4 3
27 20 3 1 3
12 14 4 4 3
21 20 4 4 3
44 21 5 1 1
11 24 1 3 3
10 29 3 3 3
31 29 1 1 3
26 18 1 1 3
7 24 4 4 3
29 22 1 1 3
101 25 2 2 2
100 22 5 5 5
7 5 4 3 3
4 42 4 4 4
70 21 1 1 1
12 20 4 4 3

It was also observed that the use of fuzzy concepts seemed very natural within the knowledge elicitation process. It seemed much easier to ask for fuzzy memberships in linguistic terms than it would be to try to ascertain exact cut-offs for selection rules. This observation supports the results of the study by Kochen (1975), who concluded that a higher degree of consistency in subjects' responses was found if they were allowed to give imprecise (verbal) descriptors of fuzzy concepts.

FUZZY INTERACTIONS: MORE RESULTS


Five subjects, graduate engineering students, participated in the main laboratory experiment reported by Karwowski et al. (1990). Subjects were asked to perform word placement while explaining what methods and associated selection rules they were choosing and why. If these selection rules appeared to have fuzzy components, these components were quantified by asking the subject to verbalize a membership value for the applicability of the rule. It was noted that fuzziness was based upon the participants' cognitive ability to measure terms such as "about one half of a page".
For example, if the rule displayed a fuzzy component, either the paper or the screen (depending upon how the subjects referenced the fuzzy term) was pointed to, and the subjects were asked questions such as: "From 0 to 100, how much does this case belong to the class of FAR?" The resulting value was used to define the corresponding membership functions.
One antecedent, universally identified, was: "If the word to be located is distinct, then ...". The subjects were asked to determine whether the word to be located was distinct or not (a binary decision). Later, each participant was asked to rate the distinctness as a fuzzy number from 0 to 1. A somewhat surprising result was that, on the whole, participants were more correct in choosing the fuzzy "distinctness" than the binary, non-fuzzy "distinct" category. Once the methods and the selection rules were elicited, the subjects were asked to perform similar word placement tasks on a different file. The methods which they used were recorded, and this case served to test the validity of both the fuzzy and non-fuzzy models.
For each of the fuzzy components of the text editing task, the uncertainty was quantified by presenting the subject with different scenarios. The cursor would be placed on the screen, a word would be pointed to, and the subject was asked to what degree such a scenario belonged to one of the fuzzy sets. An exhaustive collection of points was not conducted; rather, only a few points were taken and interpolated (graphically) between.
The results of the main study are first illustrated using one subject only. Subject #2 utilized two methods to place the cursor at the given word. This reflected the subject's perception of the task as not an editing task, but rather a word location task. The methods used were: 1) Search for the pattern (/xxx would search for the next occurrence of pattern xxx), and 2) Search for a near pattern. The two rules utilized by the subject were simple: 1) If the word is "distinct" then use method #1; otherwise, 2) use method #2.
The subject was asked to rate whether each word was distinct (for the non-fuzzy analysis), and then later asked for the "distinctness", a fuzzy number for each word (in essence giving a fuzzy rating). For simplicity, the subject was only asked to rate the 20 words to be searched for and not the words located nearby.

This was not ideal because another rule was noted (but not verbalized): "If there is
a 'very distinct' word 'near' the word to be located, search for that pattern instead. "
Table 3 shows the word number, the distinctness ratings, the methods actually used, and the model predictions (differentiated by whether the binary or the graded concept of distinctness was used to predict the subject's keystrokes). It is obvious that in the case of subject #2 the results were not conclusive, and did not imply that fuzziness helps in the GOMS modeling. The non-fuzzy model correctly predicted 55% of the keystrokes, while the fuzzy model predicted 60% of the keystrokes. This low rating may be due to the fact that the rules elicited were not those actually used, and that the relationship between the concept of distinctness and the methods used could depend on the distance to the searched word.

Table 3. Example of results for subject #2 (after Karwowski et al., 1990).

Word     Number     Distinct    Grade of        Method    Non-fuzzy     Fuzzy
number   of lines   (yes/no)    distinctness    used      prediction    prediction

1 34 Y 0.75 S S* S*
2 12 Y 0.8 S S* S*
3 52 Y 0.75 SN S S
4 116 N 0.6 SN SN* S
5 8 N 0.3 S SN SN
6 30 N 0.35 S SN SN
7 44 N 0.65 S SN S*
8 118 Y 0.4 S S* SN
9 12 N 0.55 S SN S*
10 54 N 0 SN SN* SN*
11 13 N 0 SN SN* SN*
12 16 N 0.35 S SN SN
13 171 N 0 SN SN* SN*
14 25 N 0.35 S SN SN
15 4 N 0 SN SN* SN*
16 4 Y 0.4 S S* SN
17 38 N 0.6 S SN S*
18 198 Y 0.45 SN S SN*
19 16 N 0 SN SN* SN*
20 14 Y 0.8 S S* S*

Rate of correct model predictions 11/20 12/20


(55%) (60%)

S = Direct pattern search (method #1)
SN = Search for pattern near word (indirect pattern search: method #2)
* = Correct prediction

Model prediction comparison

Table 4 shows a summary of prediction performance for both models and all subjects. Overall, across all subjects and trials, the non-fuzzy GOMS model successfully predicted 58.7% of the responses, while the fuzzy GOMS model predicted 82.3% of the subjects' decisions. The Wilcoxon test showed that this difference was highly significant (chi-square statistic = 9.95, p < 0.01).

Table 4. Summary of experimental results for the main study (after Karwowski et al., 1990).

Success rate (correct prediction)

Subject    Number       Non-fuzzy GOMS       Fuzzy GOMS
number     of trials    prediction rate      prediction rate

1 20 11 (55.0%) 12 (60.0%)
2 74 35 (47.0%) 63 (85.1%)
3 26 19 (73.0%) 22 (84.6%)
4 27 19 (70.4%) 21 (76.9%)
5 153 92 (60.1%) 129 (84.3%)

Total 300 176 (58.7%) 247 (82.3%)

Several interesting observations were made through this expansion of the experimental data. The most important one was that in many cases adding the fuzzy functions helped tremendously in clarifying the meaning of rules. Specifically, a fuzzy definition of "distinctness" proved to be superior (in many cases) to its binary definition. Although the addition of fuzziness to the model structure could be seen as a "fine tuning" taking place in the elicitation process, this was not always the case (for example, see the results for subject #2).

CONCLUSIONS
Fuzzy methodologies can be very useful in the analysis and design of man-machine systems in general, and human-computer interaction systems in particular, by allowing one to model the vague and imprecise relationships between the user and the computer. In order for this premise to succeed, one must identify the sources of fuzziness in the data and communication schemes relevant to the human-computer interaction. By incorporating the concept of fuzziness and linguistic inexactness based on possibility theory into the model of system performance, better performance prediction for human-computer systems may be achieved.
The imprecision-tolerant communication scheme for human-computer interaction tasks should be based on a fuzzy-theoretic extension of the GOMS model. In order to realize the potential benefits of a fuzzy communication scheme, the natural fuzziness of the Operators, Methods and Selection Rules of the GOMS model should be modeled in order to allow the user to communicate with the computer system in a vague but intuitively comfortable way.
Since fuzziness plays an essential role in human cognition and performance, more research is needed to fully explore the potential of this concept in the area of human factors. It is believed that the theory of fuzzy sets and systems will allow one to account for the natural vagueness, nondistributional subjectivity, and imprecision of man-machine systems which are too complex or too ill-defined to admit the use of conventional methods of analysis.
A formal treatment of vagueness is an important and necessary step toward more realistic handling of the imprecision and uncertainty due to human behavior and thought processes at work. It is our view that the theory of fuzzy sets will prove successful in narrowing the gap between the world of the precise or "hard" sciences and the world of the cognitive or "soft" sciences. This can be achieved by providing a mathematical framework in which vague conceptual phenomena, where fuzzy descriptors, relations, and criteria are dominant (Zimmermann, 1985), can be adequately studied and modeled.

ACKNOWLEDGEMENTS
We are indebted to Mrs. Laura Abell, Secretary at the Center for Industrial Ergonomics, University of Louisville, for her work on the preparation of the manuscript.

REFERENCES
AUDLEY, R. J., ROUSE, W., SENDERS, T., and SHERIDAN, T. 1979, Final report of mathematical modelling group, in N. Moray (ed.), Mental Workload: Its Theory and Measurement, (Plenum Press, New York), 269-285.
BENSON, W. H. 1982, in Fuzzy Sets and Possibility Theory, R. R. Yager (ed.), (Pergamon Press, New York).
BERNOTAT, R. 1984, Generation of ergonomic data and their application to equipment design, in H. Schmidtke (ed.), Ergonomic Data for Equipment Design, (Plenum Press, New York), 57-75.
BEZDEK, J. 1981, Pattern Recognition with Fuzzy Objective Function Algorithms, (Plenum Press, New York).
BOY, G. A., and KUSS, P. M. 1986, A fuzzy method for modeling of human-computer interactions in information retrieval tasks, in W. Karwowski and A. Mital (eds.), Applications of Fuzzy Set Theory in Human Factors, (Elsevier, Amsterdam), 117-133.
BROWNELL, H. H. and CARAMAZZA, A. 1978, Categorizing with overlapping categories, Memory and Cognition, 6, 481-490.
CARD, S. K., MORAN, T. P., and NEWELL, A. 1983, The Psychology of Human-Computer Interaction, (Lawrence Erlbaum Associates, London).
CHAPANIS, A. 1959, Research Techniques in Human Engineering, (The Johns Hopkins Press, Baltimore).
GILES, R. 1981, in Fuzzy Reasoning and Its Applications, E. H. Mamdani and B. R. Gaines (eds.), (Academic Press, London).
HARRE, R. 1972, The Philosophies of Science, (Oxford University Press, London).
HERSH, H. M., and CARAMAZZA, A. 1976, A fuzzy set approach to modifiers and vagueness in natural language, Journal of Experimental Psychology: General, 3, 254-276.
HESKETH, B., PRYOR, R., GLEITZMAN, M., and HESKETH, T. 1988, Practical applications of psychometric evaluation of a computerized fuzzy graphic rating scale, in T. Zetenyi (ed.), Fuzzy Sets in Psychology, (North-Holland, Amsterdam), 425-454.
HIRSH, G., LAMOTTE, M., MASS, M. T., and VIGNERON, M. T. 1981, Phonemic classification using a fuzzy dissimilitude relation, Fuzzy Sets and Systems, 5, 267-276.
KARWOWSKI, W. and AYOUB, M. M. 1984a, Fuzzy modelling of stresses in manual lifting tasks, Ergonomics, 27, 641-649.
KARWOWSKI, W., AYOUB, M. M., ALLEY, L. R., and SMITH, T. L. 1984b, Fuzzy approach in psychophysical modeling of human operator-manual lifting system, Fuzzy Sets and Systems, 14, 65-76.
KARWOWSKI, W., and MITAL, A. (Editors), 1986, Applications of Fuzzy Set Theory in Human Factors, (Elsevier, Amsterdam).
KARWOWSKI, W., MAREK, T. and NOWOROL, C. 1988, Theoretical basis of the science of ergonomics, in Proceedings of the 10th Congress of the International Ergonomics Association, Sydney, Australia, (Taylor & Francis, London), 756-758.
KARWOWSKI, W., KOSIBA, E., BENABDALLAH, S., and SALVENDY, G. 1989, Fuzzy data and communication in human-computer interaction: for bad or for good, in G. Salvendy and M. J. Smith (eds.), Designing and Using Human-Computer Interfaces and Knowledge Based Systems, (Elsevier, Amsterdam), 402-409.
KARWOWSKI, W., KOSIBA, E., BENABDALLAH, S. and SALVENDY, G. 1990, A framework for development of fuzzy GOMS model for human-computer interaction, International Journal of Human-Computer Interaction, 2, 287-305.
KOCHEN, M. 1975, Applications of fuzzy sets in psychology, in L. A. Zadeh, K. S. Fu, K. Tanaka and M. Shimura (eds.), Fuzzy Sets and Their Applications to Cognitive and Decision Processes, (Academic Press, New York), 395-408.
KRAMER, U. 1983, in Proceedings of the Third European Annual Conference on Human Decision Making and Manual Control, (Roskilde, Denmark), 313.
KRAMER, U. and ROHR, R. 1982, in Analysis, Design and Evaluation of Man-Machine Systems, G. Johannsen and J. E. Rijnsdorp (eds.), (Pergamon Press, Oxford), 31-35.
LAKOFF, G. 1973, A study in meaning criteria and the logic of fuzzy concepts, Journal of Philosophical Logic, 2, 458-508.
MAMDANI, E. H., and GAINES, B. R. (Editors), 1981, Fuzzy Reasoning and Its Applications, (Academic Press, London).
MCCLOSKEY, M. E. and GLUCKSBERG, S. 1978, Memory and Cognition, 6, 462-472.
MCCORMICK, E. J. 1970, Human Factors Engineering, (McGraw-Hill, New York).
ODEN, G. C. 1977, Human perception and performance, Journal of Experimental Psychology, 3, 565-575.
PEW, R. W. and BARON, S. 1983, Automatica, 19, 663-676.
ROHMERT, W. 1979, in N. Moray (ed.), Mental Workload: Its Theory and Measurement, (Plenum Press, New York), 481.
SAATY, T. L. 1977, Exploring the interface between hierarchies, multiple objectives and fuzzy sets, Fuzzy Sets and Systems, 1, 57-68.
SCHMUCKER, K. J. 1984, Fuzzy Sets, Natural Language Computations, and Risk Analysis, (Computer Science Press, Maryland).
SIMCOX, W. A. 1984, A method for pragmatic communication in graphic displays, Human Factors, 26, 483-487.
SINGLETON, W. T. 1982, The Body at Work: Biological Ergonomics, (Cambridge University Press, Cambridge).
SMITHSON, M. 1982, Applications of fuzzy set concepts to behavioral sciences, Mathematical Social Sciences, 2, 257-274.
SMITHSON, M. 1987, Fuzzy Set Analysis for Behavioral and Social Sciences, (Springer-Verlag, New York).
TERANO, T., MURAYAMA, Y., and AKIYAMA, N. 1983, Human reliability and safety evaluation of man-machine systems, Automatica, 19, 719-722.
TOPMILLER, D. A. 1981, in Manned Systems Design: Methods, Equipment and Applications, J. Moraal and K. F. Kraiss (eds.), (Plenum Press, New York), 3-21.
WILLAEYS, D. and MALVACHE, N. 1979, in Advances in Fuzzy Set Theory and Applications, M. M. Gupta, R. K. Ragade and R. R. Yager (eds.), (North-Holland, Amsterdam).
ZADEH, L. A. 1965, Fuzzy sets, Information and Control, 8, 338-353.
ZADEH, L. A. 1973, Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. Systems, Man, and Cybernetics, SMC-3, 28-44.
ZADEH, L. A. 1974, Numerical versus linguistic variables, Newspaper of the Circuits and Systems Society, 7, 3-4.
ZADEH, L. A. 1978, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems, 1, 3-28.
ZIMMERMANN, H. J. 1980, Testability and meaning of mathematical models in social sciences, Mathematical Modeling, 1, 123-139.
ZIMMERMANN, H. J. 1985, Fuzzy Set Theory and Its Applications, (Kluwer-Nijhoff Publishing, Boston).
10
QUESTIONNAIRES AND FUZZINESS

Bernadette Bouchon-Meunier
CNRS, LAFORIA, Université Paris VI, Tour 46
4 place Jussieu, 75252 Paris Cedex 05, France

INTRODUCTION
Questionnaires represent hierarchical processes disjoining the elements of a given set by using successive tests or operators [12]. They involve the probabilities of the results of the tests, or the probabilities of the modalities of the operators. In the case where the tests or operators depend on imprecise factors, such as the accuracy of physical measurements or the linguistic description of variables, the questionnaires take into account coefficients evaluating the fuzziness of the data. The construction of such questionnaires is subject to several kinds of constraints and requires appropriate algorithms.

When the questionnaire is characterized only by probabilistic elements, it is generally interesting to minimize its average length in order to improve the efficiency of the process it represents, with respect to some basic constraints. The tests or operators can be chosen with regard to the quantity of information they process. The construction of the most efficient questionnaire is either holistic [11, 12], taking into account all the tests or operators which must be used, or selective, based on the choice of the most significant tests or operators with regard to the purpose of the process [2, 13, 14].

If fuzzy criteria are involved, the efficiency of the questionnaire relates to the specificity of the results it provides. A trade-off must be obtained between the preservation of some fuzziness in the tests or operators, allowing flexibility in the management of the available data, and the reliability of the results obtained through the questionnaire [3, 4].

The support of a questionnaire is a finite, directed and valuated graph without circuits, where every vertex is connected with a distinguished vertex, called the root, by at least one path or series of edges (exactly one in the case of arborescent questionnaires). No edge ends in the root. There exist terminal vertices from which no edge descends. Several systems of valuations can be defined for the edges and the vertices, for instance probabilistic valuations, utility values, or coefficients of fuzziness. The tests or operators are attached to the non-terminal nodes, and their possible results or modalities are associated with the edges descending from these nodes.
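In data-structure terms, such a support can be sketched as a rooted graph whose non-terminal vertices carry a test or operator and whose outgoing edges carry a modality together with a valuation. In the small arborescent sketch below, the question, its modalities, the outcomes and the probabilistic valuations are all invented for illustration:

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Node:
        # Vertex of an arborescent questionnaire.
        question: Optional[str] = None   # test/operator attached to a non-terminal vertex
        outcome: Optional[str] = None    # element of D attached to a terminal vertex
        children: dict = field(default_factory=dict)  # modality -> (valuation, child Node)

    # The root asks a question; each descending edge carries a modality and a valuation.
    root = Node(question="color of the skin?")
    root.children["pallor"] = (0.2, Node(outcome="d1"))
    root.children["normality"] = (0.8, Node(question="temperature?", children={
        "high":   (0.3, Node(outcome="d2")),
        "normal": (0.7, Node(outcome="d3")),
    }))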

The simplest case of a questionnaire is arborescent. Such a model is


extensively used in various fields and it corresponds to weighted trees. In the
classical probabilistic framework, where the data are associated with a given
uncertainty, applications of arborescent questionnaires exist in the study of search
trees, decision trees, fault trees, species identification, hierarchical classification,
diagnosis assistance, decision-making, preference elicitation, knowledge acquisition
for instance.

When the involved tests or operators are not precisely described, arborescent
questionnaires must take into account both uncertainty and imprecision and they
must lead to conclusions which are acceptable in spite of the imprecision. We present
here several utilizations of questionnaires in a fuzzy framework.

QUESTIONNAIRES WITH LINGUISTIC VARIABLES

Let us consider a given set D = {d1, ..., dn} of elements to identify, for instance a set of decisions to make, of diagnoses to identify, or of classes to recognize, which are supposed to be defined without any ambiguity. We suppose that the probability distribution P = {p1, ..., pn} is available, with pj the probability of dj being present in the considered world, for 1 ≤ j ≤ n.

We also consider a set Q = {q1, ..., qm} of so-called "questions", which represent tests or operators. A question qi is a link between a linguistic variable Xi defined on a universe Ui and a family of a(i) labels, denoted by qi1, ..., qia(i), and associated with possibility distributions fi1, ..., fia(i), defined on Ui and lying in [0, 1] (see Figure 1).
[Figure 1. Example of a question qi = "The color of the skin is:", with labels such as pallor and normality described by continuous or discrete possibility distributions fi1, ..., fia(i) defined on the universe Ui (redness of the skin).]

Two different types of problems can be considered, depending on whether or not the questions of Q are deterministic with regard to the elements of D:
- either there is a possibilistic relationship between lists of answers to questions of Q and the elements of D, yielding the possibility of d ∈ D being concerned in a studied situation, according to the obtained answers, and the certainty we can have in this assertion [7]. We construct a questionnaire by successively choosing questions of Q bringing as much information as possible on the elements of D, and we stop asking new questions when an element of D is sufficiently well identified (selective construction);
- or there is a precise relationship between lists of answers to questions of Q and elements of D, and we construct a questionnaire by ordering the questions of Q in such a way that every element of D can be associated with a terminal vertex of the questionnaire (holistic construction) [1, 3].

SELECTIVE CONSTRUCTION OF QUESTIONNAIRES

(See annex 1 for technical details about this section).


In a probabilistic study, the probabilities prob(qik / dj), 1 ≤ i ≤ m, 1 ≤ k ≤ a(i), 1 ≤ j ≤ n, of obtaining every label associated with a question would be given for every element of D. In many cases, there is no means of knowing these probabilities, and the only knowledge we have regarding the simultaneous presence of a given label and an element of D is possibilistic.

Let us suppose given the possibility π(dj / qik) that we are in front of the case dj of D, for 1 ≤ j ≤ n, when we obtain the label qik for question qi, for 1 ≤ i ≤ m, 1 ≤ k ≤ a(i). As there is no absolute certainty that this answer implies that dj must be identified, we also suppose given the necessity N(dj / qik) quantifying this certainty.
We can also suppose given some knowledge about the fact that the element dj can be thought of when an answer different from qik is obtained to question qi: let π(dj / ¬qik) and N(dj / ¬qik) denote the possibility and the certainty that dj is acceptable when qik is not obtained. If these values are not precisely known, they will be replaced [10] by the interval [0, 1] to which they belong.
We fix thresholds s and t in [0, 1], defining the acceptable values [s, 1] and [t, 1] for the lowest acceptable possibility and the lowest acceptable certainty of an element of D being satisfying when given labels are obtained for a question.

The problem we consider is the following:
- first of all, how to determine the sequence of questions necessary and sufficient to identify every element of D as reliably and efficiently as possible;
- secondly, how to use this sequence of questions every time we have to recognize a particular case under study.

Applications of this model can be found in knowledge acquisition, in diagnosis assistance, and in species identification, for instance. The first step corresponds to the construction of the sequence of questions providing the best recognition of classes on a training set of examples; the second step is associated with the identification of the convenient class for an example not belonging to the training set.

It is obvious that the element dj of D will be immediately recognized if there is a question qi yielding an answer qik such that N(dj / qik) = π(dj / qik) = 1. No further question will be necessary in this case, but at least one other question must be asked in the general case.

The first question to be asked will be the qi, 1 ≤ i ≤ m, processing the most efficient information about the elements of D, and we propose to evaluate this efficiency by means of the average certainty Cer(qi) provided by qi on the recognition of any element of D. Thus, the first question to be asked will be the qi such that Cer(qi) is maximum.

Now, let us suppose that a sequence Sr = (x1, ..., xr) of questions is not sufficient to determine an element dj0 of D such that its possibility to be present, given the answers it provides to questions x1, ..., xr, is sufficiently high and the certainty available on its identification is acceptable (see Figure 2).

If labels x1k(1), ..., xrk(r) are respectively obtained for this sequence Sr of questions, we evaluate the possibility Pos(dj0 / x1k(1), ..., xrk(r)) that dj0 could be identified, and the certainty Nec(dj0 / x1k(1), ..., xrk(r)) that this identification is satisfying.

As the sequence Sr of questions is not sufficient to identify an element of D with the list of obtained labels x1k(1), ..., xrk(r), a new question qi must be asked. For an obtained label qik, we evaluate the average certainty Cer(x1k(1), ..., xrk(r), qik) provided by these (r+1) questions on any element of D. We choose the question qi which processes the most efficient information about D, with regard to all its possible labels, or, equivalently, which gives the highest absolute certainty C(qi).

[Figure 2. Fragment of a questionnaire: the question qi = "The color of the skin is:" leads toward the identification of an element of D such as d1 (poisoning).]

No further question will be asked when a sequence of labels x1k(1), ..., xrk(r) is obtained for questions x1, ..., xr, and there exists an element dj0 of D such that Pos(dj0 / x1k(1), ..., xrk(r)) ≥ s and Nec(dj0 / x1k(1), ..., xrk(r)) ≥ t. Then dj0 will be associated with Sr, which is called terminal.

For a new given particular situation c0, an element of D must be identified from the answers to the various questions of the questionnaire we have constructed.

Let Sr be a terminal sequence of questions of Q in this questionnaire, to which c0 provides answers x1k(1), ..., xrk(r) chosen, for every question, in the list of available labels. Then we clearly identify the element of D associated with Sr.

As the labels associated with every question are not precise, we must accept that an answer may be provided in a way somewhat different from the expressions we expect in the list of authorized labels. Let us denote by q'i the label obtained as an answer to question qi, 1 ≤ i ≤ m, more or less different from all the qik, 1 ≤ k ≤ a(i), and by gi the possibility distribution describing q'i, defined on Ui and lying in [0, 1] (see Figure 3).
[Figure 3. Example of an answer q'i to the question "The color of the skin is:", described by its own possibility distribution gi.]

The compatibility of this answer q'i with one of the labels qik proposed for qi, with 1 ≤ k ≤ a(i), is measured by the classical possibility and necessity measures of adequation [9], respectively denoted by π(qik ; q'i) and N(qik ; q'i).

We deduce the possibility πk(dj) that dj is concerned by the particular situation c0, according to the proximity of its answer to qik, and the certainty Nk(dj) of this assertion. This evaluation will be performed for the labels qik such that π(qik ; q'i) ≥ s and N(qik ; q'i) ≥ t. It is then possible to have several sequences of questions to use, i.e. several paths of the questionnaire to follow, before the recognition of a particular element of D.

More generally, let us consider again a terminal sequence Sr of questions leading to the identification of dj0 in the questionnaire. Because of the differences which may exist between the expected answers to these questions and the labels obtained from the particular case c0, the possibility and certainty of dj0 will be the following:

Pos(dj0 / x1k(1), ..., xrk(r)) = min_{1 ≤ i ≤ r} πk(i)(dj0),
Nec(dj0 / x1k(1), ..., xrk(r)) = max_{1 ≤ i ≤ r} Nk(i)(dj0).

The element dj0 will be definitely identified for the situation c0, by means of the sequence of questions Sr, if there exist labels x1k(1), ..., xrk(r) yielding Pos(dj0 / x1k(1), ..., xrk(r)) ≥ s and Nec(dj0 / x1k(1), ..., xrk(r)) ≥ t.
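With the same notation, the identification test can be coded directly: the possibility of dj0 is the minimum of the per-question values πk(i)(dj0), the certainty is the maximum of the Nk(i)(dj0), and dj0 is accepted when both clear the thresholds s and t. A small sketch with invented numerical values:

    def identify(pos_per_question, nec_per_question, s, t):
        # Pos = min of the per-question possibilities, Nec = max of the per-question
        # certainties; the element is accepted when Pos >= s and Nec >= t.
        pos = min(pos_per_question)
        nec = max(nec_per_question)
        return pos, nec, (pos >= s and nec >= t)

    # Example: three questions of a terminal sequence evaluated for a candidate dj0.
    print(identify([0.9, 0.7, 0.8], [0.4, 0.6, 0.5], s=0.6, t=0.5))
    # -> (0.7, 0.6, True)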

HOLISTIC CONSTRUCTION OF QUESTIONNAIRES

(See annex 2 for technical details about this section.)

Let us suppose that we want to use all the tests or operators of Q, and we have to order them in such a way that the questionnaire we construct associates an element of D with each terminal node. The questionnaire could be arborescent or not. We suppose that Q and D are compatible, which means that such a construction is possible.
The problem we consider is the choice of the questions providing the most efficient questionnaire with regard to the recognition to make. Its quality can be evaluated [3] with respect to the fuzziness which is involved in the characterizations deduced from the fuzzy tests or operators, and improved, when several constructions of questionnaires are possible, by an appropriate choice of the order of some questions. Applications can be found in search trees and in species identification, for instance.
Several aspects of such a choice can be proposed [3, 4, 6], and we propose one method hereunder.

Let us suppose that the labels qik associated with the questions qi of Q are
conveniently defined in such a way that they determine a fuZU partition IIi of the
universe Ui on which the concerned linguistic variable ~ is defined. The classes of
this fuzzy partition are fuzzy subsets of Ui defined by membership functions equal
to fik, lilla(i), in every point of Ui . We suppose given the probability
distribution Pi of the variable Xi' for the studied population.
The problem we consider is the identification of a crisp (non-fuzzy) partition
of Ui, able to represent the information contained in IIi. We may think of several
applications of this problem: in knowledge acquisition, if the training set deals
with crisp data and then non-fuzzy tests or operators, and the new examples are
described by means of fuzzy questions; in decision-making, when a crisp decision
must be taken from fuzzy test s or operators or from the answers provided by the
inquired personto a crisp question qi by indicating preference grades for the
elements qik which are proposed to her; in preference elicitation, when the inquirer
makes a choice between two fuzzy questions about the same variable.

For a given threshold r in [0, 1], we associate with Πi a crisp partition Πi^r of level r, by defining crisp classes as qik^r = { u | fik(u) >= r }, 1 <= k <= a(i). Obviously, such a crisp partition does not exist for every value of r, and some thresholds correspond to several possible crisp partitions. We suppose that the tests or operators are defined in such a way that there always exists a value r providing a crisp partition. We can consider the average weight of each fuzzy label by introducing its r-probability p_i^r(qik) as the average value of its associated possibility distribution, for the values at least equal to r.
This generalization of the concept of probability to a fuzzy subset of the universe allows us to measure the fuzzy information I_i^r(Πi) processed by Πi for the threshold r with respect to the crisp partition Πi^r. We use this tool as a measure of the proximity between Πi and Πi^r.
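As a small illustration of this construction (a Python sketch assuming a finite universe; the labels and membership values below are hypothetical), the level-r classes and the crisp-partition condition can be checked as follows:

def level_r_classes(memberships, r):
    """memberships: dict label -> {u: f_ik(u)} over a finite universe.
    Returns the level-r classes q_ik^r = {u | f_ik(u) >= r}."""
    return {k: {u for u, f in fk.items() if f >= r} for k, fk in memberships.items()}

def is_crisp_partition(classes, universe):
    """True when every point of the universe belongs to exactly one class."""
    return all(sum(u in c for c in classes.values()) == 1 for u in universe)

# Hypothetical fuzzy partition of a 4-point universe (memberships sum to 1 at each point):
Pi = {"small": {1: 1.0, 2: 0.6, 3: 0.1, 4: 0.0},
      "large": {1: 0.0, 2: 0.4, 3: 0.9, 4: 1.0}}
classes = level_r_classes(Pi, r=0.6)
print(classes, is_crisp_partition(classes, universe={1, 2, 3, 4}))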
Let us consider the case where we are given a set of fuzzy operators Q, and we look for the crisp partition associated with each of them, losing as little information as possible when passing from fuzzy descriptions to crisp descriptions.

For every qik associated with the fuzzy partition Πi of Ui, we choose the crisp partition Πi^r such that the fuzzy information I_i^r(Πi), processed by Πi for the threshold r with respect to Πi^r, is maximum.
If several tests or operators are available for the same linguistic variable Xi on Ui, the most interesting is the one processing the greatest absolute fuzzy information with regard to all the possible crisp partitions which could be associated with it.

REFERENCES

[1] AKDAG, H., BOUCHON, B. (1988) - Using fuzzy set theory in the analysis of structures of information, Fuzzy Sets and Systems, 3, 28.
[2] AURAY, J.P., DURU, G., TERRENOIRE, M., TOUNISSOUX, D., ZIGHED, A. (1985) - Un logiciel pour une méthode de segmentation non arborescente, Informatique et Sciences Humaines, vol. 64.
[3] BOUCHON, B. (1981) - Fuzzy questionnaires, Fuzzy Sets and Systems 6, pp. 1-9.
[4] BOUCHON, B. (1985) - Questionnaires in a fuzzy setting, in Management decision support systems using fuzzy sets and possibility theory, eds. J. Kacprzyk and R.R. Yager, Verlag TÜV Rheinland, 189-197.
[5] BOUCHON, B. (1987) - Preferences deduced from fuzzy questions, in Optimization models using fuzzy sets and possibility theory (J. Kacprzyk and S.A. Orlovski, eds.), D. Reidel Publishing Company, pp. 110-120.
[6] BOUCHON, B. (1988) - Questionnaires with fuzzy and probabilistic elements, in Combining fuzzy imprecision with probabilistic uncertainty in decision making (J. Kacprzyk, M. Fedrizzi, eds.), Springer Verlag, pp. 115-125.
[7] BOUCHON, B. (1990) - Sequences of questions involving linguistic variables, in Approximate reasoning tools for artificial intelligence (M. Delgado, J.L. Verdegay, eds.), Verlag TÜV Rheinland.
[8] BOUCHON, B., COHEN, G. (1986) - Partitions and fuzziness, J. of Mathematical Analysis and Applications, vol. 113, 1986.
[9] DUBOIS, D., PRADE, H. (1987) - Théorie des possibilités, applications à la représentation des connaissances en informatique, Masson.
[10] FARRENY, H., PRADE, H., WYSS, E. (1986) - Approximate reasoning in a rule-based expert system using possibility theory: a case study, in Information Processing (H.J. Kugler, ed.), Elsevier Science Publishers B.V.
[11] PAYNE, R. (1985) - Genkey: a general program for constructing aids to identification, Informatique et Sciences Humaines, vol. 64.
[12] PICARD, C.F. (1980) - Graphs and questionnaires, North Holland, Amsterdam.
[13] TERRENOIRE, M. (1970) - Pseudoquestionnaires et information, C.R. Acad. Sc. 271 A, pp. 884-887.
[14] TERRENOIRE, M. (1970) - Pseudoquestionnaires, Thèse de Doctorat d'Etat, Lyon.
[15] WYSS, E. (1988) - TAIGER, un générateur de systèmes experts adapté au traitement de données incertaines et imprécises, Thèse, Institut National Polytechnique de Toulouse.

Annex 1:
Possibility and necessity coefficients associated with every element dj of D, when the label qik is obtained as an answer to the test or the operator qi of Q, are respectively denoted by π(dj / qik) and N(dj / qik). They belong to [0, 1] and they are such that N(dj / qik) <= π(dj / qik), with N(dj / qik) = 0 if π(dj / qik) < 1, and π(dj / qik) = 1 if N(dj / qik) ≠ 0.

The average certainty Cer(qi) provided by a single test or operator qi on the recognition of any element of D is defined as follows:
Cer(qi) = Σ_{1<=k<=a(i)} Σ_{1<=j<=n} N(dj / qik) pj.   (1)

The possibility Pos(djo / x1^k(1), ..., xr^k(r)) that the element djo of D must be identified, and the certainty Nec(djo / x1^k(1), ..., xr^k(r)) that this identification is satisfying, when labels x1^k(1), ..., xr^k(r) are obtained as answers to tests or operators x1, ..., xr, are evaluated by means of the following coefficients:
Pos(djo / x1^k(1), ..., xr^k(r)) = min_{1<=i<=r} π(djo / xi^k(i)),   (2)
Nec(djo / x1^k(1), ..., xr^k(r)) = max_{1<=i<=r} N(djo / xi^k(i)).   (3)

We define as follows the average certainty Cer(x1^k(1), ..., xs^k(s)), provided about the recognition of any element of D, by a sequence of labels x1^k(1), ..., xs^k(s) obtained as answers to tests or operators x1, ..., xs of Q:
Cer(x1^k(1), ..., xs^k(s)) = Σ_{1<=j<=n} λj N(dj / x1^k(1), ..., xs^k(s)) pj,   (4)
with λj = 1 if Pos(dj / x1^k(1), ..., xs^k(s)) >= s, and 0 otherwise.

The absolute certainty of a test or an operator qi of Q, after the sequence of labels x1^k(1), ..., xr^k(r) is obtained as answers to tests or operators x1, ..., xr, is defined as follows:
C(qi) = (1/a(i)) Σ_{1<=k<=a(i)} Cer(x1^k(1), ..., xr^k(r), qik).   (5)

Possibility measure of the adequation of any answer q'i with a given label qik, for a question (test or operator) qi of Q:
π(qik ; q'i) = sup_{u in Ui} min(fik(u), gi(u)), 1 <= k <= a(i),   (6)
Necessity measure of this adequation:
N(qik ; q'i) = inf_{u in Ui} max(1 - fik(u), gi(u)), 1 <= k <= a(i).   (7)

For the particular situation co, the possibility π^k(dj) that dj is concerned, according to the proximity of the obtained answer q'i with qik, and the certainty of this assertion N^k(dj), will be evaluated by the following coefficients [10, 15]:
π^k(dj) = max[ min{π(dj / qik), π(qik ; q'i)}, min{π(dj / ¬qik), 1 - N(qik ; q'i)} ],   (8)
N^k(dj) = min[ max{N(dj / qik), 1 - π(qik ; q'i)}, max{N(dj / ¬qik), N(qik ; q'i)} ].   (9)
As indicated in [10], the values can be replaced by the interval to which they belong in the case where they are not precisely known.
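As a rough illustration of formulas (6)-(9) as reconstructed above (a Python sketch over a coarse discretization of Ui; all membership functions, coefficients and names below are hypothetical):

def adequation(f_ik, g_i, universe):
    """Formulas (6)-(7): possibility and necessity that the answer q'_i (membership g_i)
    fits the label q_ik (membership f_ik), over a finite discretization of U_i."""
    poss = max(min(f_ik(u), g_i(u)) for u in universe)
    nec = min(max(1.0 - f_ik(u), g_i(u)) for u in universe)
    return poss, nec

def combined(pi_q, pi_not_q, n_q, n_not_q, poss, nec):
    """Formulas (8)-(9): possibility pi^k(d_j) and certainty N^k(d_j) for an element d_j,
    from its coefficients for q_ik and for not-q_ik, plus the adequation values above."""
    pi_k = max(min(pi_q, poss), min(pi_not_q, 1.0 - nec))
    n_k = min(max(n_q, 1.0 - poss), max(n_not_q, nec))
    return pi_k, n_k

# Hypothetical membership functions sampled on a coarse grid of U_i:
U = [0.0, 0.5, 1.0, 1.5, 2.0]
f = lambda u: max(0.0, min(1.0, u - 0.5))         # label q_ik
g = lambda u: max(0.0, 1.0 - abs(u - 1.0))        # observed answer q'_i
poss, nec = adequation(f, g, U)
print(poss, nec, combined(0.9, 0.4, 0.6, 0.0, poss, nec))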

Annex 2
A fuzzy partition of the universe Ui on which the concerned linguistic variable Xi is defined satisfies:
Σ_{1<=k<=a(i)} fik(u) = 1, for every point u in Ui,
and Σ_{u in Ui} fik(u) > 0, for every k, 1 <= k <= a(i).

The r-probability p_i^r(qik) of a fuzzy label qik with regard to the crisp class qik^r is defined by:
p_i^r(qik) = Σ_{u in qik^r} fik(u) pi(u).

The fuzzy information I_i^r(Πi) processed by a fuzzy partition Πi of Ui for the threshold r with respect to the crisp partition Πi^r is defined as follows:
I_i^r(Πi) = Σ_{1<=k<=a(i)} L(p_i^r(qik)) / [ Σ_{1<=k<=a(i)} p_i^r(qik) ],
with the function L(x) = -x log(x).
Properties of this fuzzy information lead to its maximization in order to have the best compatibility between a fuzzy partition and any possible associated crisp partition for a given threshold r.
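A minimal Python sketch of these two definitions as reconstructed above, assuming a finite universe with a discrete probability distribution pi (the labels, memberships and crisp classes below are hypothetical):

import math

def r_probability(f_ik, p_i, crisp_class):
    """p_i^r(q_ik) = sum over u in q_ik^r of f_ik(u) p_i(u)."""
    return sum(f_ik[u] * p_i[u] for u in crisp_class)

def fuzzy_information(memberships, p_i, crisp_partition):
    """I_i^r(Pi_i) with L(x) = -x log x, following the definition above."""
    L = lambda x: 0.0 if x == 0 else -x * math.log(x)
    probs = [r_probability(memberships[k], p_i, crisp_partition[k]) for k in memberships]
    return sum(L(p) for p in probs) / sum(probs)

# Hypothetical two-label fuzzy partition on a 4-point universe, uniform p_i:
Pi = {"small": {1: 1.0, 2: 0.6, 3: 0.1, 4: 0.0},
      "large": {1: 0.0, 2: 0.4, 3: 0.9, 4: 1.0}}
p = {u: 0.25 for u in (1, 2, 3, 4)}
crisp = {"small": {1, 2}, "large": {3, 4}}   # level-r classes for r = 0.6
print(fuzzy_information(Pi, p, crisp))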

The absolute fuzzy information processed by a fuzzy partition Πi with regard to all the possible crisp partitions which could be associated with it equals:
max { I_i^r(Πi) : Πi^r associated with Πi }.
11
FUZZY LOGIC
KNOWLEDGE SYSTEMS AND
ARTIFICIAL NEURAL NETWORKS
IN MEDICINE AND BIOLOGY

Elie Sanchez
Faculty of Medicine, University of Marseille, and
*Neurinfo Research Department
Institut Mediterraneen de Technologie
13451 Marseille Cedex 13, France
ABSTRACT
This tutorial paper has been written for biologists, physicians or beginners
in fuzzy sets theory and applications. This field is introduced in the framework of
medical diagnosis problems. The paper describes and illustrates, with practical
examples, a general methodology of special interest in the processing of borderline
cases, that allows a graded assignment of diagnoses to patients. A pattern of medical
knowledge consists of a tableau with linguistic entries or of fuzzy propositions.
Relationships between symptoms and diagnoses are interpreted as labels of fuzzy
sets. It is shown how possibility measures (soft matching) can be used and combined
to derive diagnoses after measurements on collected data.
The concepts and methods are illustrated in a biomedical application on
inflammatory protein variations. In the case of poor diagnostic classifications, appropriate weightings, acting on the characterizations of proteins, are introduced in order to decrease their relative influence. As a consequence, when pattern matching is achieved, the final ranking of inflammatory syndromes assigned to a given patient might change to better fit the actual classification. Defuzzification of results (i.e.
diagnostic groups assigned to patients) is performed as a non fuzzy sets partition
issued from a "separating power", and not as the center of gravity method commonly
employed in fuzzy control.
A model of fuzzy connectionist expert system is then introduced, in which an artificial neural network is designed to build the knowledge base of an expert system, from training examples (this model can also be used for specifications of
rules in fuzzy logic control). Two types of weights are associated with the
connections: primary linguistic weights, interpreted as labels of fuzzy sets, and
secondary numerical weights. Cell activation is computed through MIN-MAX fuzzy
equations of the weights. Learning consists in finding the (numerical) weights and
the network topology. This feedforward network is described and illustrated in the
same biomedical domain as in the first part.

* Address for correspondence



Keywords : Fuzzy Logic, Linguistic Model, Fuzzy Propositions,


Medical Knowledge Representation, Medical Diagnosis, Soft Matching,
Relative Importance, Defuzzification, Separating Power, Artificial Neural
Networks, Fuzzy Connectionist Expert Systems, Linguistic Weights.

INTRODUCTION
In many situations, physicians use subjective or intuitive judgments. They
cannot always logically, or in simple terms, explain how they derive conclusions,
because of the complex mental processes inherent to the nature of the cases to be
diagnosed, or to the difficulty of recalling their years of training and experience.
Interpretation of biological analyses suffers from some arbitrariness,
particularly at the boundaries of the quantities that are measured, or evaluated. It is
customary to use symbols like +++, ++, +, N, -, - -, - - -, or ↑↑↑, ↑↑, ↑, N, ↓, ↓↓, ↓↓↓, to denote variations ('N' stands for 'Normal'). In general, limits of values
that characterize abnormalities, or normality, define numerical intervals that are used
to describe standards in variations. First of all, normal or non pathological states,
have to be determined. They constitute the reference to which abnormalities are
specified.
Biologists are familiar with normal variation ranges that are a prerequisite to
a proper interpretation of all laboratory tests. Notions of statistical normality are
usually derived from frequency distributions, not always confined to Gaussian
distributions. But, depending on the measurement procedures of a given laboratory,
on the epidemiologist, the biologist or the clinician who manipulates and interprets
measurements, but also on the nature of the populations under study, and on
conditions of physiological (biological) normality, one has to commonly rely on
fiducial limits (see for example [1,2] for discussions on normality).
The main drawback in working with intervals to represent normality, or
ranges of variations for abnormalities, is the weak reliability on thresholds.
Moreover, such boundaries are more or less physician dependent in practice. For
example [3], "the normal base-line value for a given individual's lactic acid
dehydrogenase may be at the extreme low point of the normal range for the general
populations. Thus he (the physician) could develop an elevation due to a disease
process that is significant and still within the normal range of the population." Still
in [3], under a table defining the range of normal values for blood chemistry, one
may read: "these ranges are a guide to the normal concentrations of blood
constituents. For accurate interpretations, always refer to normal values established
by individual laboratories, since individual differences in procedures may affect the
actual ranges."
A problem that is often posed lies in the ill-definition and in the treatment
of the boundaries of the intervals. To cope with borderline cases, fuzzy set theory
provides very natural and appropriate tools. So it is here assumed that imprecision in
the description of variations is of a fuzzy type and terms like "Normal, Slightly
Decreased, Very Increased, etc.," will be treated as labels of fuzzy sets in (possibly
different) universes of discourse. These fuzzy sets represent linguistic intervals, and
around cutoff boundaries, very close points will not be totally accepted or rejected
like in yes-or-no procedures, according to their position with respect to the frontier.
A coding with ↑'s or ↓'s is sometimes too restrictive: it is not always possible to choose between ↑ and ↑↑ for example, and in some patterns one may find "from ↑ to ↑↑." A scale with degrees ranging from 0 to 1 is very convenient. Note
that it is not needed to set up precise values in [0,1] : in interpreting patterns, it is
sufficient to have a rough idea of the curve expressing the compatibility between
measurements and concepts.
A general methodology will now be described, illustrated with an application, of special interest in the processing of borderline cases, which offers the physician practical assistance in obtaining the same results for the same abnormal profiles.

PATTERN (MEDICAL KNOWLEDGE)


In this paper, a pattern of Medical Knowledge consists of a tableau with
linguistic entries. These linguistic associations are supposed to be given by experts,
having in mind that different experts may provide somewhat different characterisations for the same pattern.
This Medical Knowledge can be interpreted in terms of fuzzy propositions,
like "Temperature is Slightly_Increased," i.e. of the form "S is F," where S is a
variable (referred to as the name of a Sign, of a Symptom, or generally of an
Attribute) taking values in a universe of discourse U, and F is a fuzzy subset of U.
The tableau expresses relationships between attributes (S) such as temperature, plasma lipids, arterial pressure, serum proteins, etc., and diagnoses (Δ) or groups, types, syndromes, diseases, etc. The linguistic entries are assumed to be labels of fuzzy sets (F), or more specifically, fuzzy intervals. Note that the term "diagnosis" is more or less arbitrary; it is a convenient way to summarize or synthesize information. In decision processes, symptoms can be viewed as diagnoses and vice versa. Characterizations of diagnoses appear in the rows of a tableau as shown in fig. 1.
Fig. 1 - Tableau with linguistic entries represented by fuzzy sets: the columns are the attributes S1, ..., Si, ..., Sn, and each row is a diagnosis Δ characterised by the fuzzy sets F1, ..., Fi, ..., Fn.

In this tableau, Si (i=1,n) is the name of a variable (Sign, Symptom, or Attribute) taking values in a universe of discourse Ui, and Fi is a fuzzy subset of Ui. For example, in a typical serum protein pattern [3], one may find the tableau of fig. 2.

Fig. 2 - Part of a serum protein pattern: a row of the tableau with columns Total protein, Albumin and the Globulin fractions, and linguistic entries such as Decrease, Frequent Decrease and Slight Increase.

The Medical Knowledge represented by a generic Diagnosis Δ in the tableau of fig. 1 is interpreted as conjunctions (ANDs) of elementary propositions:
Δ IF P1 AND ... AND Pi ... AND Pn,
where for i=1,n, Pi takes the form "Si is Fi." For example (see fig. 2):
Cirrhosis IF Total proteins (S1) are Decreased (F1)
AND ... AND α-Globulins (Si) are Frequently Decreased (Fi)
AND ... AND γ-Globulins (Sn) are Increased (Fn).
Here is another example [4], in the framework of inflammatory protein variations:
Vasculitis IF C3-Complement Fraction is Decreased or Normal
AND Alpha-1-Antitrypsine is Decreased or Normal
AND Orosomucoid is Increased
AND Haptoglobin is Very Increased
AND C-Reactive Protein is Very Increased.
In the characterisation of Vasculitis, one has for example "Haptoglobin is Very Increased," where "Very Increased" is the label of a fuzzy set "VERY INCREASED", depicted in fig. 3.
Fig. 3 - Illustration of "Haptoglobin is Very Increased": membership functions μF of the fuzzy sets NORMAL and VERY INCREASED plotted against numerical values of haptoglobin (breakpoints at 0.2, 0.5, 1.25, 1.6, 1.9 and 2.2, with the 0.5 grade marked).

The information contained in "Haptoglobin is Very Increased" does not provide a


precise characterisation of the numerical values to be assigned to a variable named
"Haptoglobin," but it indicates a soft constraint on its possible values. In the pattern

of Medical Knowledge, the fuzzy sets are fuzzy intervals that extend the definition of usual (crisp) intervals. Fuzzy intervals are here of three types: they fuzzify crisp intervals and they mean "fuzzily greater (or smaller) than" a given value a, or "fuzzily between" two values b and c (for example, fuzzy intervals representing NORMAL ranges are usually of this last type). Values like a, b, c have a grade of membership equal to 0.5. In particular, a fuzzy [b,c]-type interval can reduce to a fuzzy number D, meaning "around a value d" (see fig. 4). In this case, the bandwidth is the separation between the two values having a 0.5 grade of membership; it is a convenient fuzziness indicator of the fuzzy number D.
Fig. 4 - Fuzzy number D, meaning "around d": membership function μD equal to 1 at d and decreasing on both sides, the bandwidth being measured between the two numerical values with grade 0.5.

It is very important for these membership functions to be easily modifiable during the training phase, for their evaluation. For simplicity, their shapes have been chosen here as trapezoidal or triangular ones. Practically, it is not very important to set, for example, 0.7 or 0.75 as grades of membership when the curves are empirically designed. What mostly matters is the monotonicity of the function and the position of strategic values, i.e. values with grades of membership equal to 0, 0.5 or 1. Usually, for each type of laboratory analysis, the biologist determines for a specific purpose or, more generally, refers to a variation range in which the normal quantitative measurements should fall. He/she has a rough idea of the limits for abnormalities, having in mind more or less well-defined intervals. To determine a patient's condition, it is then sufficient to check in which interval the measured value falls. If we consider a non-fuzzy proposition of the form "Si is a number in the interval [2,5]," we mean that any number in the interval [2,5] is a possible value to be assigned to the variable Si and that it is not possible for a number outside this interval to be assigned to Si. In other words, for ui in the universe of discourse Ui:
Possibility {Si = ui} = 1 for 2 <= ui <= 5,
= 0 for ui < 2 or ui > 5.
Returning now to the fuzzy case, the proposition "Haptoglobin is Very Increased" (i.e. of the form "Si is Fi") means that:
Possibility {Haptoglobin = ui} = μ_VERY_INCREASED(ui),
or Possibility {Si = ui} = μ_Fi(ui).
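A minimal Python sketch of the trapezoidal shapes mentioned above and of this possibility assignment (the breakpoints chosen below are illustrative assumptions only, loosely consistent with the 0.5 crossings sketched in fig. 3):

def trapezoid(a, b, c, d):
    """Membership function that is 0 before a, rises to 1 between a and b,
    stays 1 between b and c, and falls back to 0 between c and d
    (use c = d = float('inf') for a 'fuzzily greater than' interval)."""
    def mu(u):
        if u <= a or u >= d:
            return 0.0
        if b <= u <= c:
            return 1.0
        return (u - a) / (b - a) if u < b else (d - u) / (d - c)
    return mu

# Hypothetical breakpoints for "VERY INCREASED" haptoglobin:
very_increased = trapezoid(1.6, 2.2, float('inf'), float('inf'))
# Possibility {Haptoglobin = u} = mu_VERY_INCREASED(u):
print(very_increased(1.5), very_increased(1.9), very_increased(2.5))  # approx. 0.0, 0.5, 1.0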

ASSIGNMENT OF DIAGNOSES TO PATIENTS


A given patient will be assigned, for each diagnosis, a grade between 0 and 1. In typical cases, one diagnosis will have a grade equal (or close) to 1 and all the other diagnoses will have a grade equal (or close) to 0. The interesting cases will be the intrinsically fuzzy ones, i.e. several diagnoses assigned to a patient, with grades between 0 and 1.
Let us consider a diagnosis Δ, characterised by "(S1 is F1) AND ... AND (Si is Fi) AND ... AND (Sn is Fn)". The attributes S1, ..., Si, ..., Sn have to be measured on the patient, yielding the values:
S1(patient) = d1 in U1, ..., Si(patient) = di in Ui, ..., Sn(patient) = dn in Un.
Then, Possibility{S1(patient) = d1, ..., Sn(patient) = dn, GIVEN "(S1 is F1) AND ... AND (Sn is Fn)"} = MIN(μF1(d1), ..., μFn(dn)), where the MIN operator usually translates the conjunction AND. Finally, this minimum of the above numbers provides a grade of compatibility of the patient's condition with diagnosis Δ. The same operations are performed for all diagnoses, yielding a ranking of diagnoses for the patient.
In fact, the measured data are often fuzzy in at least two respects:
i) imprecision in measurements,
ii) interpretation of the values,
so that it is natural to transform each measured (numerical) value into a fuzzy number (like in fig. 4), e.g. "Si(patient) = di" is transformed into "Si(patient) is Di." The patient's condition is now expressed as a conjunction of fuzzy propositions involving fuzzy numbers, so that one now has the following:
Possibility{S1(patient) is D1 AND ... AND Sn(patient) is Dn, GIVEN "(S1 is F1) AND ... AND (Sn is Fn)"} = MIN(π(F1,D1), ..., π(Fn,Dn)), where for i = 1,n,
π(Fi,Di) = SUP(Fi ∩ Di),
i.e. π(Fi,Di) = SUP over ui in Ui of MIN[μFi(ui), μDi(ui)].

π(Fi,Di) = Possibility{Di GIVEN Fi} is called a possibility measure [5]. It is illustrated in fig. 5, where its numerical value indicates a weak compatibility of "around di" with the fuzzy interval representing "Very Increased."

Fig. 5 - Possibility measure of Di with respect to Fi: π(Fi,Di) is the height of the intersection of the fuzzy number Di (around di) with the fuzzy interval Fi (VERY INCREASED), plotted along the Si axis.
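A rough Python sketch of this computation on a discretized universe (the membership functions, grid and attribute count below are hypothetical; the actual method only requires the sup-min possibility measure followed by the MIN aggregation across attributes):

def possibility(mu_F, mu_D, universe):
    """pi(F, D) = sup over u of min(mu_F(u), mu_D(u)), on a discretized universe."""
    return max(min(mu_F(u), mu_D(u)) for u in universe)

def diagnosis_grade(attribute_fuzzy_sets, patient_fuzzy_numbers, universe):
    """Grade of a diagnosis: MIN over attributes of pi(F_i, D_i)."""
    return min(possibility(F, D, universe)
               for F, D in zip(attribute_fuzzy_sets, patient_fuzzy_numbers))

# Hypothetical membership functions on a grid: a 'fuzzily greater than' F with a
# fuzzy number 'around 1.8' D, plus a second attribute that matches fully.
grid = [i / 100 for i in range(0, 401)]
F1 = lambda u: max(0.0, min(1.0, (u - 1.5) / 1.0))    # fuzzily greater than ~2
D1 = lambda u: max(0.0, 1.0 - abs(u - 1.8) / 0.2)     # around 1.8
F2 = lambda u: 1.0 if 0.5 <= u <= 1.5 else 0.0        # crisp normal range
D2 = lambda u: max(0.0, 1.0 - abs(u - 1.0) / 0.2)     # around 1.0
print(diagnosis_grade([F1, F2], [D1, D2], grid))      # limited by the weak F1/D1 match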



Finally, the patients are assigned a ranking over all diagnostic profiles, by means of grades lying between 0 and 1. For each patient, the set of all diagnoses, associated with their grades of assignment derived from the possibility measures (they are numbers in the interval [0,1]), can be considered as a discrete fuzzy set 𝒟. For each diagnosis Δ, one has for example μ𝒟(Δ) = MIN[π(F1,D1), ..., π(Fn,Dn)]. 𝒟 will be defuzzified, as shown in the sequel.

RELATIVE WEIGHTING
Practically, some attributes might be less important than others in the characterization of a diagnosis. For a given diagnosis, relative importance among attributes can be translated by means of weights (α, β, γ, ...) ranging in [0,1]. A weight of value "0" assigned to an attribute means that this attribute is not important at all in the evaluation of the diagnosis and hence can be deleted, whereas a weight of value "1" does not modify the importance of the protein. Intermediate grades of importance can be tuned by adjusting the values of the weights within the unit interval.
In the pattern, fuzzy propositions ("S is F," in the generic form) characterizing a given group appear as conjunctions (ANDs). Assignment of a weight α to take into account the relative importance of protein variations can assume the following form [6,7], for F a fuzzy set in a universe of discourse U:
F^α = MAX(1-α, F),
i.e. for every x in U, μFα(x) = MAX[(1-α), μF(x)].
Generally, a t-conorm could replace the MAX operator in the above formula [8].
Limit cases have the following meanings:
α = 0: for every x in U, μF0(x) = 1, i.e. F^0 is neutral for conjunctions and therefore it can be deleted;
α = 1: for every x in U, μF1(x) = μF(x), i.e. the weight has no effect.
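This importance-weighting is a one-line operation; the following Python sketch illustrates it (the membership function and α values are illustrative assumptions). Note that, for instance, max(1-0.1, 0.04) = 0.9, which is how a low matching value is lifted for a nearly negligible protein such as C3 in the weighted example later in this chapter.

def weighted(mu_F, alpha):
    """Importance-weighted fuzzy set: mu_{F^alpha}(x) = max(1 - alpha, mu_F(x))."""
    return lambda x: max(1.0 - alpha, mu_F(x))

# With alpha = 0 the weighted set is identically 1 (neutral for MIN-conjunctions);
# with alpha = 1 it is unchanged; intermediate alphas lift the membership floor.
mu_F = lambda x: 0.04 if x < 1.0 else 1.0       # hypothetical membership / matching value
for alpha in (0.0, 0.1, 1.0):
    print(alpha, weighted(mu_F, alpha)(0.5))     # 1.0, 0.9, 0.04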
In the case of Vasculitis, the following weights have been assigned, yielding the modified rule:
Vasculitis IF C3-Complement Fraction is (Decreased or Normal)^0.1
AND Alpha-1-Antitrypsine is (Decreased or Normal)^1.0
AND Orosomucoid is (Increased)^0.8
AND Haptoglobin is (Very Increased)^0.3
AND C-Reactive Protein is (Very Increased)^0.8.
Note that the C3-Complement Fraction could have been neglected (weight close to 0) and that no weight might have been assigned to "Decreased or Normal" in Alpha-1-Antitrypsine (weight equal to 1, i.e. no effect of the weight). For example, the modified fuzzy variations of Haptoglobin (with weight 0.3 on "Very Increased") and the corresponding modified possibility measure are presented in fig. 6.

Fig. 6 - Possibility measure of Di with respect to a weighted Fi (α = 0.3): the membership function of Fi^0.3 is bounded below by 1-α, so that π(Fi^0.3, Di) is raised accordingly.

DEFUZZIFICATION
If the patients are to be assigned non-fuzzy diagnoses, we must defuzzify the fuzzy set 𝒟 of diagnoses that has been evaluated following a MIN aggregation. For this purpose, one may use the concept of separating power [9], which is different from the center of gravity method commonly employed in fuzzy control. The separating power s(𝒟) allows us to evaluate to which extent a fuzzy set such as 𝒟, of a universe of discourse U (U is here the set of the given diagnoses under study), separates U optimally into a non-fuzzy partition (A, A'), where A' is the complement set of A. The set A is defined as follows:
s(𝒟) = 𝒟 • A = sup { 𝒟 • B such that B ⊆ U, B ≠ ∅ }, in which
𝒟 • B = | card(𝒟_B) / card(B) - card(𝒟_B') / card(B') |,
where 𝒟_B denotes the restriction of 𝒟 to B, card(B) is the cardinality of B, and card(𝒟_B) is the fuzzy cardinality of 𝒟_B; for example, card(𝒟_B) = Σ_{Δ in B} μ𝒟(Δ).
Applying the separating power to the fuzzy set 𝒟 yields the optimal partition (A, A') above. A is finally the (non-fuzzy) set of diagnoses assigned to the patient.
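A brute-force Python sketch of this defuzzification, following the definition as reconstructed above (only nonempty proper subsets B are enumerated here, an assumption made so that both averages remain defined; the diagnosis names and grades are hypothetical):

from itertools import combinations

def separating_power(grades):
    """grades: dict diagnosis -> membership in the fuzzy set of diagnoses.
    Returns (s, A) where A maximises |card(D_B)/card(B) - card(D_B')/card(B')|
    over nonempty proper subsets B of the universe of diagnoses."""
    U = list(grades)
    best, best_A = -1.0, None
    for k in range(1, len(U)):
        for B in combinations(U, k):
            Bc = [d for d in U if d not in B]
            score = abs(sum(grades[d] for d in B) / len(B)
                        - sum(grades[d] for d in Bc) / len(Bc))
            if score > best:
                best, best_A = score, set(B)
    return best, best_A

# Hypothetical fuzzy set of diagnostic groups for one patient:
D = {"Vasculitis": 0.85, "Collagen Diseases": 0.43, "Normal": 0.05, "Lupus": 0.10}
print(separating_power(D))   # isolates the clearly dominant group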

APPLICATION TO INFLAMMATORY PROTEIN


VARIATIONS
This application is reported from [4]. The following five proteins, involved
in biological inflammatory reactions, have been chosen.
- C3 (C3-Complement Fraction)
- AIAT (Alpha-1-Antitrypsine)
- Om (Orosomucoid)
- Hpt (Haptoglobin)
- CRP (C-Reactive Protein).
The Protein-Biological_Inflammatory_Syndrome (P.B.I.S.) pattern contains eleven
groups:
- Normal condition

- Eight Biological Inflammatory Syndromes :


· Bacterial Infections
· Viral Infections
· Vasculitis
· Nephrotic syndromes
· Acute Glomerular Nephritis
· Intravascular Hemolysis with inflammation
· Collagen Diseases non Lupus and without infection
· Lupus
- Intravascular Hemolysis without inflammation
- Glomerular Renal Insufficiency without inflammation
The protein variations can be easily interpreted in linguistic terms by physicians, so
that the P.B.I.S. pattern is well adapted to a fuzzy sets representation. The fuzzy
propositions in this pattern have been interpreted in a linguistic tableau form (one of
its rows is reproduced in Table 7).
PROTEINS:        C3                    AIAT                  Om          Hpt              CRP
Vasculitis:      Decreased or Normal   Decreased or Normal   Increased   Very Increased   Very Increased

Table 7 - Linguistic characterisation of Vasculitis in the P.B.I.S. pattern.

The fuzzy sets corresponding to this linguistic pattern have been established for each
entry of the tableau (one of its rows is in Table 8).
Table 8 - Fuzzy sets characterisation of Vasculitis in the P.B.I.S. pattern: each linguistic entry of Table 7 is replaced by a fuzzy interval over the protein's serum level (with breakpoints such as 1.7, 1.4, 1.2, 1.8 and 2.2 for the five proteins C3, AIAT, Om, Hpt, CRP).

In this study, fuzzy numbers issued from measurements over patients have been compared with the corresponding fuzzy sets in the P.B.I.S. pattern, by means of three measures or indexes hereafter defined: possibility measure (π), necessity measure (ν), truth-possibility index (ρ).
For each protein (S), let F be a fuzzy set characterizing S in a diagnostic group, and let D be the fuzzy number issued from the serum level of S, measured over a patient.

i) Possibility measure [5]. By definition, π(F,D) = Sup(F ∩ D).

ii) Necessity measure [10]. By definition, ν(F,D) = 1 - π(F',D), where F' denotes the fuzzy complement of F, i.e. F' = 1 - F. Note that ν(F,D) = 1 - Sup(F' ∩ D) = Inf(F ∪ D').

iii) Truth-possibility index [11,12]. By definition, ρ(F,D) = π(τ0, τ1), where τ0 and τ1 are related to truth-qualification [5], according to the semantic entailment:

[(S is F) is τ1] ⇒ S is D ⇒ [(S is F) is τ0].

With the special case of fuzzy sets in this study, one simply shows that ρ(F,D) = μF(d), where D means "around d". Moreover, one can show that the following ranking holds [13] (see fig. 9 for an illustration):
ν <= ρ <= π,
so that these indexes can be chosen according to optimistic or pessimistic considerations.

Fig. 9 - Compatibility measures or indexes: for a fuzzy number D around d compared with a fuzzy interval F, the three levels satisfy ν(F,D) <= ρ(F,D) <= π(F,D) on the membership scale.
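The three indexes can be computed side by side on a discretized universe, as in the following Python sketch (the fuzzy interval, the fuzzy number and the grid are illustrative assumptions; ρ uses the special-case identity ρ(F,D) = μF(d) stated above):

def indexes(mu_F, d, spread, universe):
    """Possibility pi, truth-possibility rho, and necessity nu of the fuzzy number
    D = 'around d' (triangular, half-width spread) with respect to F,
    on a discretized universe; one should observe nu <= rho <= pi."""
    mu_D = lambda u: max(0.0, 1.0 - abs(u - d) / spread)
    pi = max(min(mu_F(u), mu_D(u)) for u in universe)
    nu = 1.0 - max(min(1.0 - mu_F(u), mu_D(u)) for u in universe)
    rho = mu_F(d)
    return nu, rho, pi

grid = [i / 100 for i in range(0, 301)]
F = lambda u: max(0.0, min(1.0, (u - 1.0) / 0.5))     # hypothetical 'increased' interval
print(indexes(F, d=1.2, spread=0.2, universe=grid))    # ordered nu <= rho <= pi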

For each of the eleven groups, comparison of a patient's condition with the pattern yields five (one for each protein) triples of numbers (νi, ρi, πi), i = 1,5, which are aggregated by means of the MIN operator, expressing conjunctions:
(ν, ρ, π) = (MINi νi, MINi ρi, MINi πi). Finally, for each patient, one has three different rankings of diagnoses derived from ν, ρ, π.

ILLUSTRATIVE EXAMPLE
This patient case, reported from [4], has been medically diagnosed as Vasculitis. The protein profile of this patient is given as follows.
                    C3      AIAT    Om      Hpt     CRP
raw data (g/l)      1.50    2.26    1.75    10.0    0.060
normalized data     1.85    1.00    1.99    5.59    10.0
For simplicity, we only present the matching results from the possibility measure (the πi's) and the corresponding crisp partition (A, A') that has been found to be associated with the fuzzy set of diagnostic groups 𝒟:
A = {Collagen Diseases} (mini πi = 0.43), s(𝒟) = 0.40.
A' consists of all the remaining diagnostic groups. Vasculitis does not appear here, for one of the possibility measures, π1 = π(Vasculitis, C3), is nearly equal to zero. Hence, the MIN operator acting on the πi's produces a value practically equal to zero, whatever values are computed from the other possibility measures associated with Vasculitis. In fact, for Vasculitis, the possibility measure results are as follows.
            C3      AIAT    Om      Hpt     CRP     mini πi
πi's        0.04    1       0.82    1       1       0.04
The four proteins (AIAT, Om, Hpt and CRP) have a high grade of matching, and Vasculitis is rejected because of the single mismatch due to C3. But, as already pointed out, in the case of Vasculitis, C3 can be nearly neglected (weight equal to 0.1). Hence, in a weighted process, one will derive:
A = {Vasculitis} (mini πi = 0.85), s(𝒟) = 0.78.
The right diagnostic group of Vasculitis now appears, and it is computed with a better separating power (0.78) than in the case of a non-weighted process (0.40).
We recall now the weights of importance associated with the five proteins in the characterisation of Vasculitis. For this patient's case, we also give the matching results in the non-weighted process, followed by those of the weighted process, using only the πi's.
                        C3      AIAT    Om      Hpt     CRP     mini πi
Weights (Vasculitis)    0.1     1.0     0.8     0.3     0.8
πi's (non-weighted)     0.04    1       0.82    1       1       0.04
πi's (weighted)         0.9     1       0.85    1       1       0.85

In an automatic classification process, the aggregation we have presented for Vasculitis (Δ) has to be performed for all of the eleven diagnostic groups, yielding for each patient a fuzzy set 𝒟 of diagnostic groups that can be defuzzified by means of the separating power.

FUZZY LOGIC AND ARTIFICIAL NEURAL NETWORKS


A model of fuzzy connectionist expert system is now introduced, in which an artificial neural network is designed to build the knowledge base of an expert system, from training examples (this model can also be used for specifications of rules in fuzzy logic control).
Expert systems have shown some weaknesses, for example in the process of
eliciting knowledge from experts, in learning capabilities or in producing poor results
at the limits of the system's domain of expertise. Neural networks are offering
noticeable contributions to expert systems such as: training by example, dynamic adjustment to changes in the environment, ability to generalize, tolerance to noise,
graceful degradation at the border of the domain of expertise, ability to discover new
relations between variables. Fuzzy logic, supporting interpolative reasoning [14], is
playing a key role in human cognitive systems; it lies at the base of pattern
classification, qualitative reasoning, analogical reasoning, case-based reasoning,
neural modeling, system identification and related fields. The standards of accuracy
and precision prevailing in traditional computers are presently questioned or discarded,
especially while narrowing the gap between human reasoning and machine reasoning.
In the context of approximate reasoning, expert systems and fuzzy logic control on
one side, and artificial neural networks, on the other side, share common features and
techniques [15]. Connectionist network (or artificial neural network) tools are now
used in learning control problems like the cart-pole balancing system [16-18].
Combination of fuzzy logic with neural networks theory is enhancing the capability
of intelligent systems to learn from experience and adapt to changes in an
environment with qualitative, imprecise, uncertain or incomplete information.

FUZZY CONNECTIONIST EXPERT SYSTEMS


Fuzzy logic has been used in conjunction with artificial neural networks in a
variety of recent papers [17-31]. In the spirit of S.I. Gallant's model [19] of
connectionist expert system (CES), we proposed in [31] an expert classification
system in which a connectionist model is used to extract or to tune the knowledge
from a training set of examples. An important feature of this model is its fuzzy
nature with an intrinsic treatment of fuzziness. Nevertheless, unlike in the CES
model, fuzzy sets are not considered from their crisp representations.
Inputs to the neural system are weighted, but we assume that weights are of
two types: primary weights, in general followed by secondary weights. Primary
weights express the main information on knowledge. They have a linguistic form and
they are interpreted as labels of fuzzy sets, meaning for example: Increased, Decreased, Very-Increased, Normal, etc., like in the application we just described. Depending on applications, these fuzzy sets are defined over universes of discourse related to the nature of the input cells or, like in fuzzy control, they can be members of a given partition of the interval [-1,+1], with triangular shaped membership functions (fig. 10) typically meaning "Negative Large (NL), Negative Medium (NM), Negative Small (NS), Approximately Zero (ZR), Positive Small (PS), Positive Medium (PM), Positive Large (PL)", or more simply having only the three linguistic values "Decreased, Normal, Increased". Secondary weights are numbers in [0,1]; they reflect the grade of weakness of the corresponding connection (the weaker the connection, the closer to 1 the weight) and they do not necessarily act on connections, but when they do, they follow a primary weight with which they are combined.

Fig. 10 - A fuzzy partition of [-1,+1]

The neuro fuzzy system is a feedforward network with no thresholds: fuzzy sets avoid the use of thresholds, by considering graded transitions from one state to the other. There are no directed cycles, no feedback, and one iteration is sufficient for inferencing. The training phase is not performed with methods involving weighted sums of inputs, but with intrinsically fuzzy equations, using MIN and MAX operators. This phase consists in finding the numerical weights from examples. It is not required to find the membership functions of the primary weights in the general case, for any universe of discourse. It is assumed that a human expert has a rough idea of the shapes; the task is to tune the curves according to the information provided by input-output examples: this is a general remark to keep in mind when designing models of fuzzy connectionist expert systems (FCES).
Learning now mainly consists in finding the numerical secondary weights and the network topology: numerical weights close to "1" will indicate an absence of the corresponding primary weight, whereas numerical weights close to "0" will not influence at all the corresponding primary weight. Then the primary linguistic weights might be adjusted, when appropriate, by moving the slopes of the curves in the intrinsically fuzzy zone (grades of membership different from 0 and from 1).
The neuro fuzzy network consists of connections between input cells (Sj), output cells (Δi), and possible hidden cells (Hij). Primary weights (wij) are linguistic labels of fuzzy sets, characterizing the variations of the input cells ("Sj is wij") in relation with the output cells (see fig. 11).
Fig. 11 - Connection with only a primary (linguistic) weight: an input cell linked to an output cell by the linguistic weight wij.

We assume, depending on the context, that wij denotes indifferently (as no confusion arises) a linguistic weight or the associated fuzzy set. Secondary weights (bij) are numbers in the unit interval. In the network, input cells have connections pointing either to hidden cells, followed by connections towards output cells (fig. 12), or directly to output cells (this case corresponds to a numerical weight equal to 0), but not necessarily to all output cells (no connection at all corresponds to a numerical weight equal to 1). As soon as a connection is issued from an input cell, a linguistic weight exists, but a numerical weight does not always exist (the case of no hidden cell). Hidden cells have only numerical weights associated with connections towards output cells.
Fig. 12 - General connection with a primary (linguistic) weight and a secondary (numerical) weight: an input cell linked to a hidden cell by wij, and the hidden cell linked to an output cell by bij.
Input cells can take on numerical values or fuzzy numbers, in their underlying universe of discourse. When the input cells Sj are given, the output cells Δi are computed according to the following formula (combination of weights for inferencing):
Δi = MINj MAX[bij, μwij(dj)] for numerical dj's,
or else, Δi = MINj MAX[bij, π(wij, Dj)] for fuzzy numbers Dj's,
where: - the dj's are numerical values assigned to the Sj's,
- the Dj's are fuzzy numbers meaning "around dj,"
- μwij(dj) is the grade of membership of dj in wij,
- π(wij, Dj) is the possibility measure of Dj GIVEN wij.
Of course, a mixed formula for a Δi can involve both numerical dj's and fuzzy numbers Dj's, and in the above formula, t-norms and t-conorms could replace the MIN and MAX operators, respectively.
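A minimal Python sketch of this MIN-MAX activation for numerical inputs (the input names, membership functions and weight values below are hypothetical, not those of the P.B.I.S. network):

def activation(inputs, linguistic_weights, numerical_weights):
    """Output cell activation Delta_i = MIN over j of MAX[b_ij, mu_{w_ij}(d_j)],
    for numerical input values d_j.
    linguistic_weights: dict input -> membership function of w_ij (absent if unconnected);
    numerical_weights: dict input -> b_ij (defaults to 1.0)."""
    grades = []
    for j, d in inputs.items():
        mu = linguistic_weights.get(j)
        b = numerical_weights.get(j, 1.0)
        grades.append(max(b, mu(d)) if mu else 1.0)   # no connection: neutral for MIN
    return min(grades)

# Hypothetical 'very increased' membership function and weights:
very_increased = lambda u: max(0.0, min(1.0, (u - 1.6) / 0.6))
print(activation({"Hpt": 2.1, "CRP": 2.4, "C3": 1.1},
                 {"Hpt": very_increased, "CRP": very_increased},   # C3 not connected
                 {"Hpt": 0.7, "CRP": 0.2}))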
Let us now consider training examples, i.e. for a Δi, the corresponding Sj's connected to it are given.

1st case.- The wij's are assumed to be known, at least as a rough approximation, so that the unknowns are the bij's. How to solve this type of equation was presented early in [32] (see also [33] for extensions and more developments) in the general case of complete dually Brouwerian lattices, in which the set of x's such that MAX(a,x) >= b contains a least element, denoted a ε b (note that a ε b is also defined in [0,1] as being equal to b if a < b and to 0 if a >= b). In case of poor solutions, the membership functions of the wij's are adjusted by shifting or changing the slopes (tuning).
2nd case.- Neither the wij's nor the bij's are known, but the wij's are supposed to be members of a known finite fuzzy partition of [-1,+1], like in fuzzy logic control (see fig. 10). Again, for each wij of the fuzzy partition, the above equation has to be solved.
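The elementary operation quoted from [32] can be illustrated as follows (a Python sketch of the least-solution operation ε only, not of the full resolution procedure of [32, 33]; the numerical pairs are hypothetical):

def eps(a, b):
    """Least x in [0, 1] with MAX(a, x) >= b: equal to b if a < b, and to 0 if a >= b."""
    return b if a < b else 0.0

# For one connection, the smallest secondary weight b_ij that lets the term
# MAX[b_ij, mu_{w_ij}(d_j)] reach a required level t is eps(mu, t):
for mu, t in [(0.3, 0.8), (0.9, 0.8), (0.5, 0.5)]:
    b = eps(mu, t)
    print(mu, t, b, max(mu, b) >= t)    # the bound is met in each case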

BIOMEDICAL APPLICATION
We now illustrate the fuzzy connectionist network, using the same previous
biomedical domain: inflammatory protein variations. We consider the same five
proteins: C3-Complement Fraction (C3), Alpha-1-Antitrypsine (AIAT),
Orosomucoid (Om), Haptoglobin (Hpt), C-Reactive Protein (CRP) and, for
simplicity, only four diagnostic groups composed of the Normal condition and of
three biological inflammatory syndromes: Bacterial Infection, Vasculitis, Nephrotic
Syndromes.
The P.B.I.S. network we present here is depicted in fig. 13, in which the five proteins correspond to the input cells S1, ..., S5 and the four groups to the output cells Δ1, ..., Δ4. There are seven hidden cells associated with numerical weights. The linguistic weights have the following meaning:
w11: normal,
w12: normal, w22: increased, w32: decreased or normal, w42: decreased or normal,
w13: normal, w23: increased, w33: increased, w43: decreased or normal,
w14: normal, w24: increased, w34: very increased, w44: slightly increased or increased,
w15: normal, w25: very increased, w35: very increased.

Fig. 13 - P.B.I.S. neuro fuzzy network: five input cells (C3, AIAT, Om, Hpt, CRP), hidden cells, and four output cells (Normal, Bacterial Infection, Vasculitis, Nephrotic Syndromes).

For example, in this network, Vasculitis (Δ3) is connected with:
- AIAT (S2): decreased-or-normal (w32),
- Om (S3): increased (w33), with weight 0.2 (b33),
- Hpt (S4): very-increased (w34), with weight 0.7 (b34),
- CRP (S5): very-increased (w35), with weight 0.2 (b35).
There is no connection with C3 (S1), corresponding to a numerical weight 1 (b31).

We are now practically exploring this biomedical application (computed results will be presented in a forthcoming extended version of the last section of this paper), and we are studying an application of this method to handwritten character recognition.

ACKNOWLEDGEMENTS. The author wishes to thank Dr. R. Bartolin, for his


participation and earlier collaboration on medical aspects of fuzzy sets applications.

REFERENCES
[1] J.L. Beaumont, L.A. Carlson, G.R. Cooper, Z. Fejfar, D.S. Fredrikson and T.
Strasser, "Classification of Hyperlipidemias and Hyperlipoproteinemias," Bull.
W.H.O., 43 (1970), pp. 891-915.
[2] D.S. Fredrickson, R.I. Levy and R.S. Lee, "Fat transport in lipoproteins," N.
Engl. J. Med., 276 (1967), pp. 32-44, 94-103,148-156,215-226,273-281.
[3] R.M. French, "Guide to Diagnostic Procedures," Mc Graw-Hill, New-York
(1975).
[4] E. Sanchez and R. Bartolin, "Fuzzy Inference and Medical Diagnosis, a Case
Study," Proc. of the First Annual Meeting of the Biomedical Fuzzy Systems
Association, Kurashiki, Japan (1989), in J. of the Biom. Fuzzy Syst. Ass.,
Vol. 1, N.1 (1990), pp. 4-21.
[5] L.A. Zadeh, "Fuzzy Sets as a Basis for a Theory of Possibility," Fuzzy Sets and
Systems, 1 (1978), pp. 3-28.
[6] E. Sanchez, "Soft Queries in Knowledge Systems," Proc. of the Second IFSA
World Congress, Tokyo (1987), pp. 597-599.
[7] E. Sanchez, "Importance in Knowledge Systems," Information Systems, Vol. 14,
N°6 (1989), pp. 455-464.
[8] E. Sanchez, "Handling Requests in Intelligent Retrieval," in Contributions on
Approximate Reasoning and Artificial Intelligence, M. Delgado and J.L.
Verdegay, eds. (to appear).
[9] C. Dujet, "Valuation et Separation dans les Ensembles Flous," Structures de
l'Information, 18 - Publications du CNRS (1980), pp. 95-105.
[10] M. Cayrol, H. Farreny and H. Prade, "Fuzzy Pattern Matching," Kybernetes, 11
(1982), pp. 103-116.
[11] E. Sanchez, "On Truth-qualification in Natural Languages," Proc. Int. Con/. on
Cybernetics and Society, Tokyo (1978), pp. 1233-1236.
[12] E. Sanchez, "Mesures de Possibilite, qualifications de Verite et Classification de
Formes Linguistiques en Medecine," in Actes Table Ronde C.N.R.S., Lyon
(1980).
[13] R. Bartolin, "Aide au Diagnostic Medical par Mesures de Comparaisons Floues
et Pouvoir Separateur. Approche Linguistique des Profils Proteiques
Inflammatoires Biologiques," These d'Etat en Biologie Humaine, Marseille (1987).
[14] L.A. Zadeh, "Interpolative Reasoning Based on Fuzzy Logic and its Application
to Control and Systems Analysis," invited lecture, abstract in the Proc. of the
Int. Conf. on Fuzzy Logic & Neural Networks, Iizuka, Japan (1990).
[15] E. Sanchez, "Connectionism, Artificial Intelligence and Fuzzy Control," invited
lecture, abstract in the Proc. of the Second Annual Meeting of the Biomedical
Fuzzy Systems Association, Kawasaki Medical School, Kurashiki, Japan (1990).

[16] A.G. Barto, R.S. Sutton and C.W. Anderson, "Neuronlike Adaptive Elements that Can Solve Difficult Learning Control Problems," IEEE Trans. S.M.C., vol. 13, N°5 (1983) pp. 834-846.
[17] C.C. Lee, "A Self-learning Rule-based Controller Employing Approximate Reasoning and Neural Net Concepts," Memo U.C. Berkeley, N°UCB/ERL M89/84 (1989), to appear in the Int. J. of Intelligent Systems.
[18] C.C. Lee, "Intelligent Control Based on Fuzzy Logic and Neural Network Theory," Proc. of the Int. Conf. on Fuzzy Logic & Neural Networks, Iizuka, Japan (1990) pp. 759-764.
[19] S.I. Gallant, "Connectionist Expert Systems," Com. of the ACM, vol. 31, N°2 (1988) pp. 152-169.
[20] M. Frydenberg and S.I. Gallant, "Fuzziness and Expert System Generation," Lect. Notes in Computer Science (B. Bouchon and R.R. Yager, Eds.), Springer-Verlag, vol. 286 (1987) pp. 137-143.
[21] B. Kosko, "Fuzzy Associative Memories," in Fuzzy Expert Systems (A.
Kandel, Ed.), Addison-Wesley, Reading, Mass. (1986).
[23] M. Togai, "Fuzzy Neural Net Processor and its Programming Environment,"
Preprints of the 1988 first joint technology workshop on neural networks and
fuzzy logic, NASA, Johnson Space Center, Houston, TX (1988).
[24] H. Takagi and I. Hayashi, "Artificial Neural Network Driven Fuzzy Reasoning," Proc. of the Int. workshop on fuzzy systems applications, Kyushu Institute of Technology, Iizuka, Japan (1988) pp. 217-218.
[25] R.R. Yager, "On the Interface of Fuzzy Sets and Neural Networks," Proc. of the
Int. workshop on fuzzy systems applications, Kyushu Institute of Technology,
Iizuka, Japan (1988) pp. 215-216.
[26] D.L. Hudson, M.E. Cohen and M.F. Anderson, "Determination of Testing
Efficacy in Carcinoma of the Lung Using a Neural Network Model," in
Computer applications in medical care (R.A. Greenes, Ed.), vol. 12 (1988)
pp.251-255.
[27] S.S. Chen, "Knowledge Acquisition on Neural Networks," Lect. Notes in Computer Science (B. Bouchon, L. Saitta and R.R. Yager, Eds.), Springer-Verlag, vol. 313 (1988) pp. 281-289.
[28] T. Yamakawa and S. Tomoda, "A Fuzzy Neuron and its Application to Pattern
Recognition," Proc. of the Third IFSA Congress, Seattle, WA (1989) pp.30-38.
[29] K. Yoshida, Y. Hayashi and A. Imura, "A Connectionist Expert System for Diagnosing Hepatobiliary Disorders," Proc. of MEDINFO 89, Beijing and Singapore (1989) pp. 116-120.
[30] J. Yen, "Using Fuzzy Logic to Integrate Neural Networks and Knowledge-based
Systems," Proc. of the Neural networks and fuzzy logic workshop, NASA,
Johnson Space Center, Houston, TX (1990).
[31] E. Sanchez, "Fuzzy Connectionist Expert Systems," Proc. of the Int. Conf. on Fuzzy Logic & Neural Networks, Iizuka, Japan (1990) pp. 31-35.
[32] E. Sanchez, "Resolution of Composite Fuzzy Relation Equations," Information
and Control, vol. 30, N°l (1976) pp.38-48.
[33] A. Di Nola, W. Pedrycz, E. Sanchez and S. Sessa, "Fuzzy Relation Equations and their Applications to Knowledge Engineering," Kluwer Acad. Pub., Dordrecht (1989).
12
THE REPRESENTATION AND USE
OF UNCERTAINTY AND
METAKNOWLEDGE IN MILORD
R. Lopez de Mantaras, C. Sierra, J. Agusti

Centre d'Estudis Avançats de Blanes


CSIC
17300 Blanes, Spain
e-mail: [email protected]

INTRODUCTION
One of the most interesting aspects of Expert Systems research is to
gain some insights about human problem solving strategies by trying to
emulate them in programs. Experts in a domain are better than novices in
performing problem solving tasks. This is due to their greater experience in
solving problems that provides them with better strategies. Such strategies are
knowledge about how to use the knowledge they have in their domain of
expertise. This kind of knowledge is called metaknowledge and is represented
by means of meta-rules in the MILORD system for diagnostic reasoning.
Diagnostic reasoning heavily involves metaknowledge to focus attention on
the most plausible hypotheses or goals in a given situation and to control the
inference process. Furthermore, uncertainty also plays an important role at the
control level, for example, decisions are taken depending on the uncertainty of
the facts supporting them.
On the other hand, psychological experiments (Kuipers et al., 1989) show that human problem solvers do not use numbers to deal with uncertainty but symbolic descriptions expressing categorical and ordinal relations, and that in complex situations, the propagation and combination of uncertainty is a local, context-dependent process. MILORD has a modular structure that allows uncertainty to be represented and managed by means of local operators defined over a set of ordered linguistic terms defined by the expert.
In this paper we describe the MILORD system, focusing on the metaknowledge and on the role that uncertainty plays in such a modular system, that is, its role in the local deductive mechanisms within each module and as a control feature in the task of selecting and combining modules to achieve a solution.
Before describing MILORD, the paper starts by presenting
fundamental concepts on control structures for rule-based systems.

INFERENCE CONTROL FOR RULE-BASED SYSTEMS


Inference control in problem solving is the aspect where the use of metaknowledge has been most prominent (Aiello & Levi, 1988). Problem solving consists in the activation of rules starting from a set of known facts. The application of one rule may cause the activation of another one, resulting in what is known as Rule Chaining. This can happen when the conclusions of one rule match the conditions of another. In general there is more than one rule that may be applicable at the same time, but only one must be selected. This situation is known as the conflict resolution problem. Most expert systems make arbitrary choices, such as selecting the first rule in the list of applicable rules, or the one containing more conditions, etc. On the other hand, having control knowledge represented by meta-rules makes it possible to reason about which rule should be applied, that is, the system can dynamically decide which is the best object-level inference to perform.
Another important aspect in problem solving is the control flow, that is, the order in which the modules and submodules will be executed. In traditional software the control structure is fixed: one module calls other modules to execute its subtasks and the calling sequence imposes the order of execution of the tasks. Control structures in expert systems cannot be as rigid, because often the expert has to adapt the order of execution of modules based on opportunities or obstacles that may arise. Such opportunistic problem solving behaviour is driven by what we call strategic knowledge, and it is also part of the control meta-level. This control knowledge is extremely important because it allows the system to closely emulate the human's problem solving behaviour and therefore increases the credibility of the expert system.
An example of a meta-rule representing strategic problem solving
knowledge is (Godo et al., 1989):

IF pneumonia is suspected and patient has AIDS


THEN consider first the modules: P-CARINII, TBC,
CITOMEGALOVIRUS,
CRIPTOCOCCUS

It is important to make clear that we have two levels of reasoning:


object-level and control-level.
The object-level is where the inferences about the problem domain are
performed. At this level we have the rules that represent knowledge about the
domain as well as descriptions of objects, properties and relations in the
domain.
The control-level is concerned with the problem solving strategies, that
is, it controls in which order the tasks and subtasks will be executed. More
sophisticated expert systems may have several control levels, like in MILORD, where there is a level whose goal is to combine the different sequences of goals resulting from the application of more than one control meta-rule, as we will see later.
The overall problem solving control flow jumps back and forth between
those levels. Part of the reasoning takes place at the control-level to deduce the
next task (module) to be executed. Then, reasoning will proceed at the object-
level (inside the module) to deduce new domain facts. As a result of that, new

control meta-rules might be applied that could suggest a new sequence of goals
to be considered and combined with the previous one. This combined strategy
will then be executed and so on. Later in the paper we will describe in more
detail this process.

UNCERTAINTY MANAGEMENT
Most AI research on reasoning under uncertainty is concerned with
normative methods to propagate and combine certainty values and there is
some disagreement between the proponents of the different methods
(Bayesians, Dempster-Shaferians, Fuzzy logicians, etc.). However, these
methods do not really claim to closely mimic human problem solving under
uncertainty. Although human problem solvers are almost always uncertain
about the possible solution in complex domains, they often achieve their goals
despite uncertainty by using methods that are particularized to the type of
problem solving that they are performing at a given time. In fact, as (Cohen et al., 1987) put it, managing uncertainty consists in selecting actions that
simultaneously achieve solutions and reduce their uncertainty. This view leads
to consider uncertainty as playing an important role at the control level
because it is useful to constrain the focus of attention (which part of the
problem to work next) and action selection (how to work on it) as will be shown
in the framework of MILORD.
Furthermore, we believe that large complex expert systems draw their
problem solving capabilities more from the power of the structure and control
of their knowledge bases than from the particular uncertainty management
formalism they use. On the other hand, the structure in the knowledge bases
makes the propagation and combination of uncertainty a local, context
dependent process.

MODULARITY AND LOCALITY


A knowledge base (KB) is a large set of knowledge units that covers a
domain of expertise and provides solutions to problems in that domain of
expertise.
When faced with a particular case, human experts use only a subset of
their knowledge for two reasons: adequacy of the general knowledge - the
theory - to the particular problem and availability and cost of data. For
example, the suspicion of a bacterial disease will rule out all knowledge referring to viral diseases; and a patient in coma will make useless all the knowledge units that need the patient's answers.
The adequation of general knowledge to a particular problem is done at a certain level of granularity; for instance, the expert uses all the knowledge related to the diagnosis of a colon neoplasia or the knowledge related to the radiological analysis of a chest x-ray.
In particular, the structuring of KB's in MILORD is made taking into account this granularity in the use of knowledge.
Each structural unit or theory (module from now on) will define an
indivisible set of knowledge units (for example rules and predicates). The
control will be responsible for the combination of the modules. The combination
will represent the particularization of general knowledge to the problem that is being solved. The control will determine which combinations are acceptable.

For example, a module that determines the dose of penicillin that has to be given to a patient must not be present in any acceptable combination for a patient allergic to penicillin.
The modularization of KB's leads to the concept of locality in the
modules of a KB. It is possible to define the contents of a module independently
of the definition of the rest of the modules. This possibility, methodologically
desirable, allows the use of different local logics and reasoning mechanisms
adapted to the subtasks that the system is performing.

MODULARITY OVER MILORD: THE COLAPSES


LANGUAGE
The basic units of KB's written in our language, COLAPSES, are the
modules. These may be hierarchically organized, and consist of an encapsulated set of import, export, rule, meta-rule and submodule
declarations. The declaration of submodules in a module is what structures the hierarchy. The declarations of submodules do not differ from the
declaration of modules. We shall briefly outline the meaning of the primitive components of a module. A complete definition of the language and
its semantics can be found in (Sierra, Agusti, 1990).
Import: determines the non-deducible facts needed in the module to
apply the rules. These facts are to be obtained from the user at run time.
Export: defines which facts deduced or imported inside a module are
visible from the rest of the modules that include the module as a submodule.
Rule: defines the deductive units that relate the import and the export components within a module.
Metarule: defines the meta-logical component of the module. Thus, the meta-rules of a module will control the execution of the rules in the module and the execution of the submodules in the hierarchy underneath the module.

The syntax of a module definition is as follows:

Module modid = modexpr

where modid stands for an identifier of the module and modexpr for the body of
the definition made out of the components specified above. Let us look at an
example of module definition.

Module gram_esputum =
begin
import Class, Morphology
export morpho, esputum_ok
deductive knowledge:
Rules:
R001 If class > 4 then esputum_ok is sure

end deductive
end
There is also the possibility of defining generic modules that represent
functional abstractions of several non-generic modules.

LOCAL LOGICS
It is clear that experts use different approaches to the management of
uncertainty depending on the task they are performing. Usually expert
systems building tools provide a fixed way of dealing with uncertainty
proposing a unique and global method for representing and combining
evidence. In the COLAPSES language it is possible to define different deduction procedures for each one of the modules. If, from a methodological point of view, a task is associated with a module, then a different logic can be used depending on the task.
The definition of local logics is made by means of the following primitive in the COLAPSES language:

Inference system:
Truth values = list of linguistic terms
Renaming = morphisms between linguistic terms
Connectives:
Conjunction = function definition
Disjunction = function definition
Inference patterns:
Modus ponens = function definition

This primitive is included as a component of the deductive knowledge of a module.
Next, we shall explain each one of the components of the local logic definition.
Truth values. This component defines the set of linguistic terms that will be used in the logical valuation of facts, rules and meta-rules of the module where this logic is to be used. Different modules can have different sets of linguistic terms.
Renaming. Modules in a KB define a hierarchy of tasks. Each of the modules can have a different logic, so it is necessary to define a way of interconnecting these different logics. In MILORD this is done in a declarative way. Each module that contains several submodules has a set of morphism definitions that translate the valuations of predicates in the submodules into valuations in the logic of the module.

Module B =
begin
Module A =
begin
Import C
Export P
Deductive knowledge:
Rules:
R1 if C then conclude P is possible
Inference system:
Truth values = (false, possible, true)
End deductive
end
Import D
Export Q
Deductive knowledge:
Rules:
R1 if NP and D then conclude Q is quite-possible
Inference system:
Truth values = (impossible, moderately-possible, quite-possible, sure)
Renaming = Nfalse ==> impossible
Npossible ==> quite-possible
Ntrue ==> sure
end deductive
end

Notice in the above example that the predicate P, exported by the submodule A of B and used in the rule defined in B, will be evaluated with one of the three values: false, possible or true. To use this fact in the module B we need to change that value for a different one which can be used by the logic defined in B. This is done via the renaming definition.
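To make the renaming mechanism concrete, the following is a minimal Python sketch (ours, not MILORD/COLAPSES code) of how a valuation exported by submodule A could be translated into the term set of the enclosing module B; the term sets are those of the example above, and all function names are illustrative.

# Sketch (ours, not MILORD/COLAPSES code) of a renaming morphism that translates
# the valuation of a predicate exported by submodule A into the local term set
# of the enclosing module B, as in the example above.

truth_values_A = ["false", "possible", "true"]
truth_values_B = ["impossible", "moderately-possible", "quite-possible", "sure"]

# the Renaming component declared in B
renaming_A_to_B = {"false": "impossible",
                   "possible": "quite-possible",
                   "true": "sure"}

def import_valuation(value, renaming, target_terms):
    translated = renaming[value]
    assert translated in target_terms      # the morphism must land in B's term set
    return translated

# A evaluated its exported predicate P as "possible"; B sees it as "quite-possible"
print(import_valuation("possible", renaming_A_to_B, truth_values_B))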
Connectives. This component defines the functions that will be used in the deduction process associated with the module. Different multiple-valued functions can be defined or elicited depending on the task defined by the module. Next we explain the connective elicitation process.

OPERATOR ELICITATION WITH LINGUISTIC TERMS


The elicitation of connective operators has been widely studied when truth values are expressed in the unit interval [0, 1]. On the contrary, little effort has been devoted to studying what such operators would be like in the case of a finite number of truth-values. This problem has been encountered in the field of Expert Systems when trying to model expert reasoning by means of linguistically expressed uncertainty about the truth of rules and facts (Godo et al., 1989). Most previous works (Lopez de Mantaras, 1990) on generating operators for linguistic terms used some kind of discretization on the continuous truth-space [0, 1]. In this approach the expert was required to give a numerical representation for the linguistic terms (intervals, fuzzy intervals, fuzzy labels); then, a combination function in [0, 1] was selected to model a logical connective. The selection was made according to some properties the function should fulfill. Next, the selected function was applied to the representations of terms, and, whenever the result of a combination lay outside the term set, it was approximated by the "closest" term, in order to keep the term set closed under combinations. This approach has some drawbacks, however:

- often the experts supplying the knowledge are not able to define the meaning of the linguistic values using a numerical scale, although they have no difficulty in ordering them.
- different experts might not agree on the representation of some or all of the linguistic values.
- the necessary approximation process does not always ensure that the resulting operators satisfy the properties which were originally required of the functions used to generate them.
These disadvantages lead us to propose an alternative approach (Lopez de Mantaras et al., 1990). The central idea consists in treating linguistic terms as mere labels, without assuming any underlying numerical representation, and then eliciting the connective operators directly on the set of labels. The only a priori requirement is that these labels should represent a totally ordered set of linguistic expressions about uncertainty. For each logical connective, a set of desirable properties of the corresponding operator is listed. These properties act as constraints on the set of possible solutions. In this way, all operators fulfilling the set of properties are generated. Afterwards, the domain expert may select the one he thinks best fits his own way of uncertainty management. This approach can be easily implemented by formulating it as a constraint satisfaction problem, and most of the disadvantages of the former approach are avoided.
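The constraint satisfaction formulation can be illustrated with a small hypothetical Python sketch: all binary operators on a three-label ordered term set are enumerated, and only those satisfying a typical list of properties (commutativity, monotonicity, neutrality of the top label, absorption by the bottom label) are kept for the expert to choose from. The label names and the particular property list are our assumptions, not the ones actually elicited in MILORD.

from itertools import product

# Totally ordered linguistic term set (worst to best); purely illustrative labels.
labels = ["false", "possible", "true"]
rank = {v: i for i, v in enumerate(labels)}

def candidate_tables():
    # every mapping from pairs of labels to a label is a candidate conjunction
    pairs = [(a, b) for a in labels for b in labels]
    for values in product(labels, repeat=len(pairs)):
        yield dict(zip(pairs, values))

def satisfies_properties(T):
    for a in labels:
        if T[(a, "true")] != a or T[("true", a)] != a:                 # top label is neutral
            return False
        if T[(a, "false")] != "false" or T[("false", a)] != "false":   # bottom label absorbs
            return False
        for b in labels:
            if T[(a, b)] != T[(b, a)]:                                 # commutativity
                return False
            if rank[T[(a, b)]] > min(rank[a], rank[b]):                # result never exceeds its arguments
                return False
            for c in labels:                                           # monotonicity in the 2nd argument
                if rank[b] <= rank[c] and rank[T[(a, b)]] > rank[T[(a, c)]]:
                    return False
    return True

solutions = [T for T in candidate_tables() if satisfies_properties(T)]
print(len(solutions), "admissible conjunction operators; the expert picks one")

For a three-label set the search space (3^9 candidate tables) is tiny and brute force suffices; for larger term sets a constraint solver with propagation would be the natural choice.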

META-REASONING BY INTROSPECTION USING UNCERTAINTY
Having considered uncertainty as a logical component of the COLAPSES language, i.e. part of the semantics of formulae, the control of reasoning under uncertainty must be considered as a component of the meta-logic. Thus the meta-inference over the uncertainty will determine what the inference control at the logic level will be. This meta-inference acts upon the logic component using mechanisms of introspection, that is, the same language represents the uncertainty of the propositions and provides mechanisms both to look at this uncertainty and to determine the control to be followed.
This meta-control is defined as a component of the modules, allowing a local meta-logic definition. This control component acts over the deductive knowledge and over the submodule hierarchy. It determines which rules and submodules are useful for the current case. The mechanism of interaction between both components is a reflection mechanism: the deductive component reflects on the control component to know which will be the next strategic step, which submodule to execute next, or which rule to use next.
It is not a full reflection mechanism because we allow the meta-logic to see only the valuation of atomic formulae (facts) and the valuation of strategies (sets of modules that combined can lead the system to the solution of the problem); rules and meta-rules cannot be consulted by the meta-logic.
This general mechanism is used to guide the inference process in
different directions; we are going to discuss some of them.

EVIDENCE INCREASING
The current uncertainty of facts can be used to control the deduction steps in order to increase the evidence for a given hypothesis. So, for example, if we have an alcoholic patient with a cavitation in the chest x-ray and there is low evidence for tuberculosis, then the Ziehl-Neelsen test to determine more clearly whether he has tuberculosis should not be done. But if he presents a risk factor for AIDS then we shall increase our evidence for tuberculosis and the test will be suggested. This is expressed as follows:

If tuberculosis > moderately-possible
then conclude Test Ziehl-Neelsen

If risk_factor_for_AIDS then conclude tuberculosis is possible

If Alcoholic and Cavitation
then tuberculosis is almost_impossible

Remark: The first rule is a rule of the meta-logic component of the language whilst the others are rules at the logic level.
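The interplay between the two levels can be pictured with a small sketch, assuming an ordered linguistic certainty scale and a hypothetical store of current fact valuations; it is an illustration in Python, not MILORD code.

# Illustrative sketch of a meta-rule that inspects the current linguistic
# certainty of a fact to decide whether a test should be suggested.
# The scale, the facts and the threshold are hypothetical.

scale = ["impossible", "almost_impossible", "moderately_possible", "quite_possible", "sure"]
rank = {v: i for i, v in enumerate(scale)}

# hypothetical current valuations at the logic level
facts = {"tuberculosis": "quite_possible", "risk_factor_for_AIDS": "sure"}

def certainty(fact):
    return rank.get(facts.get(fact, "impossible"), 0)

# meta-rule: "If tuberculosis > moderately-possible then conclude Test Ziehl-Neelsen"
if certainty("tuberculosis") > rank["moderately_possible"]:
    print("suggest the Ziehl-Neelsen test")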

STRATEGY FOCUSING
The uncertainty of facts can determine the set of hypothesis to be
followed in the sequel.

Example:

If the pneumonia is bacterial with certainty < quite-possible and
the pneumonia is atypical with certainty > possible
Then consider
Mycoplasma, Virus, Chlamydia, Tuberculosis, Nocardia,
Cryptococcus, Pneumocystis-carinii
with certainty quite-possible

This example means that the modules to be used in order to find a solution to the current case are those indicated in the conclusion of the meta-rule and should be considered in the order specified there.
Strategies have a certainty degree attached to them. This is useful to differentiate the strategies generated by very specific data from those generated by general data. As an example consider the case of a patient with AIDS (which is a kind of immunodepression). If we know that the patient suffers from AIDS, a more specific strategy (and also a more certain one) can be generated. But if we just know that the patient has an immunodepression, a less certain general strategy would be generated. Since we may have several candidate strategies simultaneously, combining different strategies is a matter of great importance in the control of the system. This is also achieved by looking at the uncertainty of the strategies, as the next example shows:

If Strategy (X) and Strategy (Y) and Certainty (X) > Certainty (Y)
and Goals (X) ∩ Goals (Y) ≠ ∅
Then Ockham (X, Y)

where Ockham (X, Y) is a combination of the strategies that favours those modules found in the intersection of both strategies.

KNOWLEDGE ADEQUATION
As indicated at the beginning of the paper, a KB is a set of knowledge units that have to be adapted to the current case. For example, alcoholism is a useful concept when determining a bacterial pneumonia, but it is useless for non-bacterial diseases. Then, for example, a possible use of the uncertainty of the fact bacterianicity is to decide about the use of a given concept in the whole KB, i.e. to adequate the general knowledge to the particular problem.
Example:

If no bacterial disease
then do not consider alcoholism in the search for the solution

SOLUTION ACCEPTANCE
The degree of uncertainty of a fact can also be used to stop the execution of the system. For example:

If Pneumocystis-carinii and tuberculosis < possible
and Cryptococcus < possible
Then stop

The control tasks we have discussed use uncertainty as a control parameter and are tasks of the meta-logic level. They are represented as a local meta-logic component of each module, in what is called the control knowledge component of a module. In the next paragraph we shall describe this locality in some detail.

METACONTROL AND LOCALITY
The structured definition of KBs helps not only in the definition of safe and maintainable KBs but also provides some new features that were impossible to achieve in the previous generation of systems. Among them the most important is the possibility of defining a local meta-logical component for each one of the modules.
The definition of strategies (ordered sets of elementary steps to solve a problem) in a previous version of the MILORD system (Godo et al., 1989) was made globally. Only one strategy could be active at any moment. Presently, as many strategies as nodes in the module graph structure can be active. This flexibility is linked with the fact that each module can have a different treatment of uncertainty. So, uncertainty plays a different role as a control feature depending on the association between module and logic.
Furthermore, given the fact that the system consists of a hierarchy of submodules, the meta-logical components act one upon the other in a pyramidal fashion. This allows us to have as many meta-logic levels as necessary in an application. Further research will be pursued along this line. A richer representation of the logic components in the meta-logic will also be investigated, and sound semantics from the logical point of view will be defined.

CONCLUSION
One interesting aspect of building expert systems is to learn something about human problem solving strategies by trying to reproduce them in programs. Human problem solvers are uncertain in many situations and do not use a simple normative method to handle uncertainty. Instead they take advantage of a good organization of the problem solving task to obtain good solutions using qualitative approximations. This suggests considering uncertainty as playing an important role at the control level by guiding the problem solving strategies. In order to illustrate these points, we have described a modular architecture and language that extensively exploits uncertainty as a control feature and uses local, context-dependent uncertainty combination and propagation operators.

BIBLIOGRAPHY
1. Agusti J., Sierra C., Sannella D. (1989): "Adding generic modules to flat rule-based languages: a low cost approach", in Methodologies for Intelligent Systems 4 (Z. Ras, ed.), Elsevier Science Pub., pp. 43-51.
2. Aiello L., Levi G. (1988): "The uses of Metaknowledge in AI Systems", in Meta-Level Architectures and Reflection (P. Maes, D. Nardi, eds.), North-Holland, 243-254.
3. Cohen P.R., Day D., DeLisio J., Greenberg M., Kjeldsen R., Suthers D., Berman P. (1987): "Management of Uncertainty in Medicine", International Journal of Approximate Reasoning 1: 103-116.
4. Godo L., Lopez de Mantaras R., Sierra C., Verdaguer A. (1989): "MILORD, the architecture and management of linguistically expressed uncertainty", Int. Journal of Intelligent Systems, vol. 4, n. 4, pp. 471-501.
5. Kuipers B., Moskowitz A.J., Kassirer J.P. (1988): "Critical Decisions under Uncertainty: Representation and Structure", Cognitive Science 12, 177-210.
6. Lopez de Mantaras R., Godo L., Sangüesa R. (1990): "Connective Operators Elicitation for Linguistic Term Sets", Proc. Intl. Conference on Fuzzy Logic and Neural Networks, Iizuka, Japan, 729-733.
7. Lopez de Mantaras R. (1990): "Approximate Reasoning Models", Ellis Horwood Series in Artificial Intelligence, London.
8. Sierra C., Agusti J. (1990): "COLAPSES: Syntax and Semantics", CEAB Research Report 90/8.
13
FUZZY LOGIC
WITH LINGUISTIC QUANTIFIERS
IN GROUP DECISION MAKING

Janusz Kacprzyk*, Mario Fedrizzi**


and Hannu Nurmi***

* Systems Research Institute,


Polish Academy of Sciences, ul. Newelska 6,
01-447 Warsaw, Poland
** Institute of Informatics, University of Trento,
Via Rosmini 42, 38100 Trento, Italy
*** Department of Political Science,
University of Turku, SF - 20500 Turku, Finland

Abstract
We present how fuzzy logic with linguistic quantifiers, mainly its calculi of linguistically quantified propositions, can be used in group decision making. Basically, the fuzzy linguistic quantifiers (exemplified by most, almost all, ...) are employed to represent a fuzzy majority, which is in many cases closer to a real human perception of the very essence of majority. Fuzzy logic provides here means for a formal handling of such a fuzzy majority, which was not possible using traditional formal apparatus. Using a fuzzy majority, and assuming fuzzy individual and social preference relations, we redefine solution concepts in group decision making, and present new «soft» degrees of consensus.

Keywords
Fuzzy logic, linguistic quantifier, fuzzy preference relation, fuzzy majority,
group decision making, social choice.

1. INTRODUCTION

Decision making, whose essence is basically to find a best option from among
some feasible (relevant, available, ... ) ones, is what human beings constantly face
in all their activities. In virtually all nontrivial situations decision making does
require intelligence.

Due to an increasing complexity of the environments in which decisions are to be made today, the human decision maker is often under pressure and stress, and overloaded. Some (computerized) decision support may therefore be of much help. Since, as we mentioned before, intelligence is required, a decision support system should be what might be termed intelligent. However, in spite of a considerable progress in broadly perceived artificial intelligence, we are still far from knowing definitely how to devise intelligent systems, i.e. in our context how to introduce intelligence into decision support systems.
One of the crucial difficulties in this respect is that decision support should rely on some formal decision making models. Unfortunately, though there is an abundance of them, for virtually all imaginable situations, they have been developed within a traditionally perceived mathematical direction where, roughly speaking, «nice» formal properties have had priority over «human consistency». This has led to some crucial problems, among which what may be termed an implementation barrier is certainly of primary concern. Basically, its essence is that the human decision makers are often not willing to accept results obtained by formally (mathematically) valid models.
Attempts to incorporate some sort of human consistency (which may be viewed as a first step to the incorporation of intelligence) in decision making models have been undertaken for a long time (see e.g. Braybrook and Lindblom, 1963). For instance, in this perspective we can view various aspiration-level-based approaches in which, say, a strict optimization (which is often contradictory to a real human perception of the problem's specifics) is replaced by a much milder requirement to attain some levels of satisfaction (see Simon, 1972).
There have also been attempts to attain the above mentioned human consistency by means of fuzzy-logic-based tools. This has mainly involved the use of calculi of linguistically quantified statements. These attempts have concerned multicriteria decision making (cf. Kacprzyk and Yager, 1984a, b, 1990; Yager, 1983a, b, 1984, 1985a, b), multistage decision making (Kacprzyk, 1983; Kacprzyk and Iwanski, 1987), and group decision making and consensus formation, which will be discussed in more detail in this paper. For more general papers on issues related to that fuzzy-logic-based perspective on human consistency, see also Kacprzyk (1987b).
In this paper we will consider the problem of how fuzzy logic may be used to attain a higher human consistency of group decision making and consensus models. Such models, adopting our perspective, may help provide a basis for intelligent decision support systems for group decision making and consensus formation.
The essence of group decision making may be summarized as follows. There is a set of options and a set of individuals who provide their preferences over the set of options. The problem is basically to find a solution, meant to be an option (or a set of options) which is best acceptable by the group of individuals as a whole.
Though the above basic problem formulation seems to be extremely simple, maybe even trivial, it is certainly not. Since its very beginning group decision making has been plagued by negative results exemplified by Arrow's general impossibility theorem, Gibbard's and Satterthwaite's results on the manipulability of social choice functions, McKelvey's and Schofield's findings on the instability of solutions in spatial contexts, etc. (Arrow, 1963; Gibbard, 1973; Satterthwaite, 1975; McKelvey, 1979; Schofield, 1984; see also Nurmi, 1987; Nurmi, Fedrizzi and Kacprzyk, 1990). Basically, all these findings can be summarized as follows: no

matter which group choice procedure we employ, it will satisfy some set of plausible conditions but not another set of equally plausible ones. This general property pertains to all possible choice procedures, so that attempts to develop new, more sophisticated choice procedures do not seem very promising in this respect. Much more promising seems to be to modify some basic assumptions underlying the group decision making process. This line of reasoning is pursued here.
Since the process of decision making, notably of group type, is centered on the human beings, with their inherent subjectivity, imprecision and vagueness in the articulation of opinions, etc., fuzzy sets have been used in this field for a long time. A predominant research direction here is based on the introduction of an individual or social fuzzy preference relation which is then used to find some choice sets. There is a rich literature on this topic (cf. Tanino, 1984, 1988, or many articles in Kacprzyk and Fedrizzi, 1990), and since this is not explicitly related to the use of fuzzy logic, we will not discuss these issues in more detail here (though we will assume that the preference relations are fuzzy). We will concentrate on other elements of group decision making models where a contribution of fuzzy logic can be explicitly demonstrated.
One of the basic elements underlying group decision making is the concept of a majority (notice that the solution is to be some option(s) best acceptable by the group as a whole, that is by most of its members, since in no real situation would it be accepted by all). Some of the above mentioned problems with group decision making are closely related to a (too) strict perception of majority (e.g., at least a half). A natural line of reasoning is to try to somehow make that strict concept of majority closer to its human perception. And here we find many examples in all kinds of human judgments that what the human beings consider as a required majority to, say, justify the choice of a course of action is often much more vague. A good example in a biological context may be found in Loewer and Laddaga (1985): «... It can correctly be said that there is a consensus among biologists that Darwinian natural selection is an important cause of evolution though there is currently no consensus concerning Gould's hypothesis of speciation. This means that there is a widespread agreement among biologists concerning the first matter but disagreement concerning the second ...». A rigid majority as, e.g., more than 75% would evidently not reflect the essence of the above statement. It should be noted that there are naturally situations when a strict majority is necessary, for obvious reasons, as in all political elections.
To briefly summarize the above considerations, we can say that a possibility to accommodate a less rigid («soft») majority (as, say, an equivalent of a widespread agreement in the above citation) would certainly help make group decision models more human consistent.
It is easy to see that the most natural manifestations of such a «soft» majority are the so-called linguistic quantifiers as, e.g., most, almost all, much more than a half, etc. One can readily notice that no conventional formal (e.g., logical) apparatus provides means for handling such quantifiers since, e.g., in virtually all conventional logics only two quantifiers, at least one and all, are accounted for.
Fortunately enough, there have been proposed in recent years some fuzzy-logic-based calculi of linguistically quantified propositions (Yager, 1983a, b; Zadeh, 1983) which make it possible to handle fuzzy linguistic quantifiers. These calculi have been applied by the authors to introduce a fuzzy majority (represented

by a fuzzy linguistic quantifier) into group decision making and consensus formation models (Fedrizzi and Kacprzyk, 1988; Kacprzyk, 1984, 1985b, 1986, 1987a; Kacprzyk and Fedrizzi, 1986, 1988, 1989; Kacprzyk, Fedrizzi and Nurmi, 1990; Kacprzyk and Nurmi, 1988; Nurmi and Kacprzyk, 1990; Nurmi, Fedrizzi and Kacprzyk, 1990), and also in an implemented decision support system for consensus reaching (Fedrizzi, Kacprzyk and Zadrozny, 1988; Kacprzyk, Fedrizzi and Zadrozny, 1988).
All that is clearly an example of a contribution fuzzy logic (with linguistic quantifiers) can make to qualitatively improve group decision making models.
We will briefly present below the essence of this approach, trying to maintain readability, and referring the reader who might be interested in more detail to the proper literature.
Our notation related to fuzzy sets is standard. A fuzzy set A in X, A ⊆ X, is characterized by, and often equated with, its membership function μ_A: X → [0, 1]; μ_A(x) ∈ [0, 1] is the grade of membership of x in A, from full membership to full nonmembership through all intermediate values. For a finite X = {x_1, ..., x_n} we write A = μ_A(x_1)/x_1 + ... + μ_A(x_n)/x_n, where 'μ_A(x_i)/x_i' is the pair 'grade of membership - element' and «+» is meant in the set-theoretic sense. Moreover, we denote a ∧ b = min(a, b), a ∨ b = max(a, b), and «→» stands for an implication operator in multivalued logic. Other, more specific notation will be introduced when needed.

2. FUZZY-LOGIC-BASED CALCULI OF LINGUISTICALLY QUANTIFIED PROPOSITIONS

Linguistically quantified propositions (statements) are commonly used in everyday life and may be exemplified by, say, «most experts are convinced» or «almost all good cars are expensive».
In general, we can write a linguistically quantified proposition as

Qy's are F    (1)

where Q is a linguistic quantifier (e.g., most), Y = {y} is a set of objects (e.g., experts), and F is a property (e.g., convinced).
It is quite natural that we may wish to assign to the particular y's (objects) a different importance (or relevance from the point of view of the fact mentioned in the statement). Importance, B, may therefore be added to (1), yielding

QBy's are F    (2)

that is, say, «most (Q) of the important (B) experts (y's) are convinced (F)».
For our purposes, the main problem is now to find the truth of such linguistically quantified statements, i.e. either truth(Qy's are F) or truth(QBy's are F), knowing truth(y_i is F), ∀ y_i ∈ Y. This may be done using two basic calculi, one due to Zadeh (1983) and one due to Yager (1983a, b). In the following we will present the essence of Zadeh's calculus since it is simpler and more transparent,

hence better suited for the purposes of this volume, though we should bear in mind that in many instances Yager's calculus may be more «adequate» (cf. Kacprzyk, 1986, 1987b; Kacprzyk and Fedrizzi, 1989).
In Zadeh's (1983) method, a fuzzy linguistic quantifier Q is assumed to be a fuzzy set defined in [0, 1]. For instance, Q = «most» may be given as

μ_«most»(x) = 1 for x ≥ 0.8
            = 2x - 0.6 for 0.3 < x < 0.8
            = 0 for x ≤ 0.3    (3)

which may be read as follows: if at least 80% of some elements satisfy a property, then most of them certainly (to degree 1) satisfy it; when less than 30% of them satisfy the property, then most of them certainly do not satisfy it (satisfy it to degree 0); and between 30% and 80%, the more of them satisfy that property, the higher the degree of satisfaction by most of the elements.
Notice that we will consider here the proportional quantifiers exemplified by «most», «almost all», etc., as they are more important for modelling a fuzzy majority than the absolute quantifiers exemplified by «about 5», «much more than 10», etc. The reasoning for the absolute quantifiers is, however, analogous.
Property F is defined as a fuzzy set in Y. For instance, if Y = {X, Y, Z} is the set of experts and F is the property «convinced», then F may be exemplified by F = «convinced» = 0.1/X + 0.6/Y + 0.8/Z, which means that expert X is convinced to degree 0.1, Y to degree 0.6 and Z to degree 0.8. If now Y = {y_1, ..., y_p}, then it is assumed that truth(y_i is F) = μ_F(y_i), i = 1, ..., p.
The value of truth(Qy's are F) is determined in the following two steps (Zadeh, 1983):

r = ΣCount(F) / ΣCount(Y) = (1/p) Σ_{i=1}^{p} μ_F(y_i)    (4)

truth(Qy's are F) = μ_Q(r)    (5)

Basically, (4) determines some mean proportion of elements satisfying the property under consideration, and (5) determines the degree to which this percentage satisfies the meaning of Q.
In the case of importance added, B is defined as a fuzzy set in Y, and μ_B(y_i) ∈ [0, 1] is a degree of importance of y_i: from 1 for definitely important to 0 for definitely unimportant, through all intermediate values. For instance, B = «important» = 0.2/X + 0.5/Y + 0.6/Z means that expert X is important (competent) to degree 0.2, Y to degree 0.5, and Z to degree 0.6.
We rewrite first «QBy's are F» as «Q(B and F)y's are B», which leads to the following counterparts of (4) and (5):

r' = ΣCount(B and F) / ΣCount(B) = Σ_{i=1}^{p} (μ_B(y_i) ∧ μ_F(y_i)) / Σ_{i=1}^{p} μ_B(y_i)    (6)

truth(QBy's are F) = μ_Q(r')    (7)

The essence of these two steps is similar to that of (4) and (5).

Example 1. Let Y = «experts» = {X, Y, Z}, F = «convinced» = 0.1/X + 0.6/Y + 0.8/Z, Q = «most» be given by (3), and B = «important» = 0.2/X + 0.5/Y + 0.6/Z. Then: r = 0.5, r' ≈ 0.92, and truth(«most experts are convinced») = 0.4 and truth(«most of the important experts are convinced») = 1.
The method presented is simple and efficient and has proven to be useful in a multitude of cases. Sometimes, however, it may lead to somewhat counterintuitive results (cf. Yager, 1983b). An alternative calculus by Yager (1983a, b; 1985a, b) may often be more useful, though it is far more complicated, conceptually and numerically, and will not be dealt with here.
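A minimal Python rendering of (3)-(7) may help make the calculus concrete; the function names are ours, and the data are those of Example 1, for which the computed truth values are 0.4 and 1.

# Minimal sketch of Zadeh's calculus (3)-(7); function names are ours.

def mu_most(x):
    # the fuzzy linguistic quantifier "most" of formula (3)
    if x >= 0.8:
        return 1.0
    if x > 0.3:
        return 2 * x - 0.6
    return 0.0

def truth_q(F, quantifier=mu_most):
    # truth(Q y's are F), formulas (4)-(5); F maps objects to degrees of F
    r = sum(F.values()) / len(F)
    return quantifier(r)

def truth_qb(F, B, quantifier=mu_most):
    # truth(Q B y's are F), formulas (6)-(7), with importance B
    r_prime = sum(min(B[y], F[y]) for y in F) / sum(B.values())
    return quantifier(r_prime)

F = {"X": 0.1, "Y": 0.6, "Z": 0.8}     # "convinced"
B = {"X": 0.2, "Y": 0.5, "Z": 0.6}     # "important"
print(round(truth_q(F), 2))            # 0.4, as in Example 1
print(round(truth_qb(F, B), 2))        # 1.0, as in Example 1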

3. GROUP DECISION MAKING UNDER FUZZY PREFERENCES WITH A FUZZY MAJORITY REPRESENTED BY A LINGUISTIC QUANTIFIER

The purpose of this section is to redefine some solution concepts of group decision making under fuzzy preference relations by employing Zadeh's fuzzy-logic-based calculus of linguistically quantified propositions to deal with a fuzzy majority.
To set the stage for our next discussion, we will sketch the essence of group decision making. We have therefore a set of n options, S = {s_1, ..., s_n}, and a set of m individuals, I = {1, ..., m}. Each individual k ∈ I provides his or her preferences over S. Since these preferences may not be clear-cut, their representation by individual fuzzy preference relations is strongly advocated (see, e.g., the articles in Kacprzyk and Fedrizzi, 1990).
A fuzzy preference relation of individual k, R_k, is given by its membership function μ_{R_k}: S × S → [0, 1] such that

μ_{R_k}(s_i, s_j) = 1 if s_i is definitely preferred over s_j
                = c ∈ (0.5, 1) if s_i is slightly preferred over s_j
                = 0.5 if there is no preference (i.e. indifference)
                = d ∈ (0, 0.5) if s_j is slightly preferred over s_i
                = 0 if s_j is definitely preferred over s_i    (8)

If card S is small enough, as we assume here, R_k may be represented by a matrix R_k = [r^k_ij], with r^k_ij = μ_{R_k}(s_i, s_j); i, j = 1, ..., n; k = 1, ..., m. R_k is commonly assumed (also here) to be reciprocal, i.e. r^k_ij + r^k_ji = 1; moreover, r^k_ii = 0, for all i, j, k.
The fuzzy preference relations, similarly to their nonfuzzy counterparts, are evidently a point of departure for devising a multitude of solution concepts. Basically, two lines of reasoning may be followed here (cf. Kacprzyk, 1986):
- a direct approach
{R_1, ..., R_m} → solution
- an indirect approach
{R_1, ..., R_m} → R → solution
that is, in the first case we determine a solution just on the basis of the individual fuzzy preference relations, and in the second case we first form a social fuzzy preference relation (defined similarly to its individual counterpart but concerning the whole group of individuals) which is then used to find a solution. What a solution is, is not uniquely understood here - see, e.g., Nurmi (1983, 1988) for diverse solution concepts.
More details related to the use of fuzzy preference relations as a point of departure in group decision making can be found in, e.g., Nurmi (1988) and in other articles in Kacprzyk and Roubens (1988) or Kacprzyk and Fedrizzi (1990).
Here we will show how to redefine some better known solution concepts, the core for the direct approach and the consensus winner for the indirect approach, using a fuzzy majority represented by a linguistic quantifier.

3.1. Direct derivation of a solution - the core


Among the many solution concepts proposed in the literature for the direct approach (i.e. for {R_1, ..., R_m} → solution), the core is intuitively appealing and often used. Conventionally, the core is defined as a set of undominated options, i.e. those not defeated in pairwise comparisons by a required majority (strict) r ≤ m, i.e.

C = {s_j ∈ S: ¬∃ s_i ∈ S such that r^k_ij > 0.5 for at least r individuals}    (9)

Nurmi (1981) extends the core to the fuzzy α-core defined as

C_α = {s_j ∈ S: ¬∃ s_i ∈ S such that r^k_ij > α ≥ 0.5 for at least r individuals}    (10)

i.e. as a set of options not sufficiently (at least to degree α) defeated by the required majority.
Suppose now that the required majority is imprecisely specified as, e.g., given by a fuzzy linguistic quantifier, say most defined by (3).
While trying to redefine the above concepts of cores under a fuzzy majority, we start by denoting
h^k_ij = 1 if r^k_ij < 0.5
      = 0 otherwise    (11)

where here and later on in this section, if not otherwise specified, i, j = 1, ..., n and k = 1, ..., m. Thus, h^k_ij reflects whether option s_i defeats option s_j or not.
Then

h^k_j = (1/(n-1)) Σ_{i=1, i≠j}^{n} h^k_ij    (12)

is the extent to which individual k is not against option s_j.


Next

h_j = (1/m) Σ_{k=1}^{m} h^k_j    (13)

is to what extent all the individuals are not against s_j.


And

v^Q_j = μ_Q(h_j)    (14)

is to what extent Q (say, most) individuals are not against s_j.


The fuzzy Q-core is now defined as a fuzzy set

C_Q = v^Q_1/s_1 + ... + v^Q_n/s_n    (15)

i.e. a fuzzy set of options that are not defeated by Q (say, most) individuals.
Analogously, by introducing a threshold on the degree of defeat into (11), we can define the fuzzy α/Q-core. First, we denote

h^k_ij(α) = 1 if r^k_ij < α ≤ 0.5
         = 0 otherwise    (16)

and then, following the line of reasoning (12)-(15), and using h^k_j(α), h_j(α) and v^{α/Q}_j, respectively, we define the fuzzy α/Q-core as

C_{α/Q} = v^{α/Q}_1/s_1 + ... + v^{α/Q}_n/s_n    (17)

i.e. a fuzzy set of options that are not sufficiently (at least to degree 1 - α) defeated by Q individuals.

We can also explicitly introduce the strength of defeat into (11) and define the
fuzzy s/Q - core. Namely, we can introduce a function like

k
hij = 2 (0.5 -
k
rij) if rt < 0.5

= 0 otherwise (18)

and then, following the line of reasoning (12) - (15), but using h~ ,hj and Y~
instead of h ~ , h j and Y~, respectively, we define the fuzzy slQ - core, as

(19)

i.e. as a fuzzy set of options that are not strongly defeated by Q individuals.
Example 2. Suppose that we have four individuals, k = 1, 2, 3, 4, whose fuzzy preference relations are

R1 =
[ 0    0.3  0.7  0.1
  0.7  0    0.6  0.6
  0.3  0.4  0    0.2
  0.9  0.4  0.8  0   ]

R2 =
[ 0    0.4  0.6  0.2
  0.6  0    0.7  0.4
  0.4  0.3  0    0.1
  0.8  0.6  0.9  0   ]

R3 =
[ 0    0.5  0.7  0.4
  0.5  0    0.8  0.2
  0.3  0.2  0    0.6
  0.6  0.8  0.4  0   ]

R4 =
[ 0    0.3  0.7  0.8
  0.7  0    0.4  0.3
  0.3  0.6  0    0.1
  0.2  0.7  0.9  0   ]

Suppose now that the fuzzy linguistic quantifier is Q = «most» defined by (3). Then, say,

C_«most» = 17/30 / s_2 + 1/s_4
C_{0.3/«most»} = 0.9/s_4
C_{s/«most»} = 0.4/s_4

that is, for instance, in the case of C_«most», option s_2 belongs to the fuzzy Q-core to the extent 17/30 and option s_4 to the extent 1, and analogously for C_{0.3/«most»} and C_{s/«most»}. Notice that though the results are different, for obvious reasons, s_4 is clearly the best choice, which is evident if we examine the given individual fuzzy preference relations.
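The construction (11)-(15) can be sketched in Python as follows, under the assumption that (12) averages h^k_ij over the n-1 options i ≠ j and that (13) averages over the m individuals; the relations used below are only the first two of Example 2, and the sketch is meant to illustrate the computation rather than to reproduce the figures quoted above.

# Sketch of the fuzzy Q-core (11)-(15), assuming that (12) averages over the
# n-1 options i != j and that (13) averages over the m individuals.
# Illustrative data; not meant to reproduce the figures of Example 2.

def mu_most(x):
    return 1.0 if x >= 0.8 else (2 * x - 0.6 if x > 0.3 else 0.0)

def fuzzy_q_core(relations, quantifier=mu_most):
    m = len(relations)                 # individuals
    n = len(relations[0])              # options
    core = []
    for j in range(n):                 # option s_j
        h_kj = []
        for R in relations:            # individual k
            not_defeated = sum(1 for i in range(n) if i != j and R[i][j] < 0.5)  # (11)
            h_kj.append(not_defeated / (n - 1))                                  # (12)
        h_j = sum(h_kj) / m                                                      # (13)
        core.append(quantifier(h_j))                                             # (14)-(15)
    return core

R1 = [[0, 0.3, 0.7, 0.1], [0.7, 0, 0.6, 0.6], [0.3, 0.4, 0, 0.2], [0.9, 0.4, 0.8, 0]]
R2 = [[0, 0.4, 0.6, 0.2], [0.6, 0, 0.7, 0.4], [0.4, 0.3, 0, 0.1], [0.8, 0.6, 0.9, 0]]
print(fuzzy_q_core([R1, R2]))   # membership of each option s_j in C_"most"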

3.2. Indirect derivation of a solution - the consensus winner


We follow now the scheme {RI ..... Rml -+ R -+ solution. i.e. from the
individual fuzzy preference relations we determine fIrst a social fuzzy preference
relation. which is similar to its individual counterpart but concerns the whole group
of individuals. and then fmd a solution from the social fuzzy preference relation.
We will not deal here with the first step. i.e. {R1o .... Rml -+ R. and assume
that R =[rij] is given by

r_ij = (1/m) Σ_{k=1}^{m} a^k_ij if i ≠ j
     = 0 otherwise    (20)

where

a^k_ij = 1 if r^k_ij > 0.5
      = 0 otherwise    (21)

Notice that R need not be reciprocal (even for reciprocal R_1, ..., R_m). For other approaches to the determination of R see, e.g., Blin and Whinston (1973).
We will now discuss the second step, i.e. R → solution, that is how to determine a solution from a social fuzzy preference relation. A solution concept of much intuitive appeal is here the consensus winner (Nurmi, 1981), which will be extended here under a fuzzy majority expressed by a fuzzy linguistic quantifier.
We start with

g_ij = 1 if r_ij > 0.5
     = 0 otherwise    (22)

which expresses whether s_i defeats s_j or not, and then

g_i = (1/(n-1)) Σ_{j=1, j≠i}^{n} g_ij    (23)

which is a mean degree to which option s_i is preferred over all the other options. Next

z^Q_i = μ_Q(g_i)    (24)

is the extent to which s_i is preferred over Q other options.
Finally, we define the fuzzy Q-consensus winner as

W_Q = z^Q_1/s_1 + ... + z^Q_n/s_n    (25)

i.e. as a fuzzy set of options that are preferred over Q other options.
And analogously as in the case of the core, we can introduce a threshold into (22), i.e.

g_ij(α) = 1 if r_ij > α ≥ 0.5
       = 0 otherwise    (26)

and then, following the reasoning (23) and (24), and replacing g_i and z^Q_i by g_i(α) and z^Q_i(α), respectively, we can define the fuzzy α/Q-consensus winner as

W_{α/Q} = z^Q_1(α)/s_1 + ... + z^Q_n(α)/s_n    (27)

i.e. as a fuzzy set of options that are preferred over Q (say, most) other options.
Furthermore, we can also explicitly introduce the strength of preference into (22) by, e.g., defining

ĝ_ij = 2(r_ij - 0.5) if r_ij > 0.5
     = 0 otherwise    (28)

and then, following the reasoning (23) and (24), and replacing g_i and z^Q_i by ĝ_i and ẑ^Q_i, respectively, we can define the fuzzy s/Q-consensus winner as

W_{s/Q} = ẑ^Q_1/s_1 + ... + ẑ^Q_n/s_n    (29)

i.e. as a fuzzy set of options that are strongly preferred over Q other options.
For more details on the above solution concepts, as well as on some other ones,
see, e.g., Kacprzyk (1985b, c; 1986a) and Kacprzyk and Nurmi (1988).
Example 3. For the same individual fuzzy preference relations as in Example 2,
and using (20) and (21), we obtain the following social fuzzy preference relation

R =

If now Q = «most» is given by (3), then we obtain

W_«most» = 1/15 / s_1 + 11/15 / s_2 + 1/s_4
W_{0.8/«most»} = 1/15 / s_1 + 11/15 / s_4
W_{s/«most»} = 2/15 / s_1 + 11/15 / s_2 + 1/s_4

which is now to be read similarly as for the fuzzy cores in Example 2. Notice that here once again option s_4 is clearly the best choice, which is obvious by examining the social fuzzy preference relation.
This concludes our brief exposition of how to employ fuzzy linguistic
quantifiers to model the fuzzy majority in group decision making. For readability
and simplicity we have only shown the application of Zadeh's calculus of
linguistically quantified propositions. The use of Yager's calculus is presented in the
source papers by Kacprzyk (1984; 1985b, c; 1986a; 1987a) or in the surveys by
Kacprzyk and Nurmi (1989) or Fedrizzi, Kacprzyk and Nurmi (1989). On the other
hand, information on some newer solution concepts based on individual and social
fuzzy preference relations which are the so-called fuzzy tournaments may be found in
Nurmi and Kacprzyk (1990).
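The indirect scheme can be sketched in Python analogously, under the assumption that (23) averages g_ij over the n-1 other options; again the data are only the first two relations of Example 2, and the sketch is purely illustrative.

# Sketch of the indirect approach: social relation (20)-(21), then the fuzzy
# Q-consensus winner (22)-(25), assuming that (23) averages g_ij over the n-1
# other options.  Illustrative only; variable names are ours.

def mu_most(x):
    return 1.0 if x >= 0.8 else (2 * x - 0.6 if x > 0.3 else 0.0)

def social_relation(relations):
    m, n = len(relations), len(relations[0])
    return [[(sum(1 for R in relations if R[i][j] > 0.5) / m) if i != j else 0.0
             for j in range(n)] for i in range(n)]                              # (20)-(21)

def consensus_winner(social, quantifier=mu_most):
    n = len(social)
    winner = []
    for i in range(n):
        g_i = sum(1 for j in range(n) if j != i and social[i][j] > 0.5) / (n - 1)  # (22)-(23)
        winner.append(quantifier(g_i))                                             # (24)-(25)
    return winner

R1 = [[0, 0.3, 0.7, 0.1], [0.7, 0, 0.6, 0.6], [0.3, 0.4, 0, 0.2], [0.9, 0.4, 0.8, 0]]
R2 = [[0, 0.4, 0.6, 0.2], [0.6, 0, 0.7, 0.4], [0.4, 0.3, 0, 0.1], [0.8, 0.6, 0.9, 0]]
R = social_relation([R1, R2])
print(consensus_winner(R))      # membership of each option s_i in W_"most"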

4. «SOFT» DEGREES OF CONSENSUS UNDER FUZZY PREFERENCES AND A FUZZY MAJORITY REPRESENTED AS A FUZZY LINGUISTIC QUANTIFIER

In this section we will show how to use fuzzy linguistic quantifiers as representations of a fuzzy majority to define a new «soft» degree of consensus as proposed in Kacprzyk (1987), and then advanced in Kacprzyk and Fedrizzi (1986, 1988, 1990), and Fedrizzi and Kacprzyk (1988). This degree is meant to overcome some «rigidness» of conventional degrees of consensus, in which full consensus (= 1) occurs only when «all the individuals agree as to all the issues». This may often be counterintuitive, and not consistent with a real human perception of the very essence of consensus (see, e.g., the citation from a biological context given at the beginning of the paper). Our new degree of consensus can therefore be equal to 1, which stands for full consensus, when, say, «most of the individuals agree as to almost all (of the relevant) issues (options)».

Our point of departure is again a set of individual fuzzy preference relations which are meant analogously as in Section 3 (see, e.g., (8)).
The degree of consensus is now derived in three steps. First, for each pair of individuals we derive a degree of agreement as to their preferences between all the pairs of options; next, we pool (aggregate) these degrees to obtain a degree of agreement of each pair of individuals as to their preferences between Q1 (a linguistic quantifier as, e.g., «most», «almost all», «much more than 50%», ...) pairs of relevant options; and, finally, we pool these degrees to obtain a degree of agreement of Q2 (a linguistic quantifier similar to Q1) pairs of important individuals as to their preferences between Q1 pairs of relevant options. This is meant to be the degree of consensus sought. The above derivation process may be formalized by using Zadeh's calculus of linguistically quantified propositions outlined in Section 2.
We start with the degree of strict agreement between individuals k1 and k2 as to their preferences between options s_i and s_j:

v_ij(k1, k2) = 1 if r^{k1}_ij = r^{k2}_ij
            = 0 otherwise    (30)

where here and later on in this section, if not otherwise specified, k1 = 1, ..., m - 1; k2 = k1 + 1, ..., m; i = 1, ..., n - 1; j = i + 1, ..., n.
Relevance of options is assumed to be a fuzzy set B defined in the set of options such that μ_B(s_i) ∈ [0, 1] is a degree of relevance of option s_i: from 0 standing for «definitely irrelevant» to 1 for «definitely relevant», through all intermediate values.
Relevance of a pair of options, (s_i, s_j) ∈ S × S, may be defined in various ways, among which

b^B_ij = (1/2)[μ_B(s_i) + μ_B(s_j)]    (31)

is certainly the most straightforward; obviously, b^B_ij = b^B_ji, and the b^B_ii's are irrelevant since they concern the same option.
And analogously for the importance of individuals, I, which is defined as a fuzzy set in the set of individuals, with μ_I(k) ∈ [0, 1], k = 1, ..., m, representing the importance of individual k, from definitely important (= 1) to definitely unimportant (= 0), through all intermediate values. Then, the importance of a pair of individuals, b^I_{k1,k2} ∈ [0, 1], may also be defined in various ways, among which the mean value of type (31) is the most straightforward, and will be used here too.
The degree of agreement between individuals k1 and k2 as to their preferences between all the relevant pairs of options is

v_B(k1, k2) = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} [v_ij(k1, k2) ∧ b^B_ij] / Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} b^B_ij    (32)

The degree of agreement between individuals k1 and k2 as to their preferences between Q1 relevant pairs of options is

v_{Q1}(k1, k2) = μ_{Q1}(v_B(k1, k2))    (33)

In turn, the degree of agreement of all the pairs of important individuals as to their preferences between Q1 relevant pairs of options is

v_{Q1,I,B} = Σ_{k1=1}^{m-1} Σ_{k2=k1+1}^{m} [v_{Q1}(k1, k2) ∧ b^I_{k1,k2}] / Σ_{k1=1}^{m-1} Σ_{k2=k1+1}^{m} b^I_{k1,k2}    (34)

and, finally, the degree of agreement of Q2 pairs of important individuals as to their preferences between Q1 relevant pairs of options, called the degree of Q1/Q2/I/B-consensus, is

con(Q1, Q2, I, B) = μ_{Q2}(v_{Q1,I,B})    (35)

Since the strict agreement (30) may be viewed as too rigid, we can use the degree of sufficient agreement (at least to degree α ∈ [0, 1]) of individuals k1 and k2 as to their preferences between options s_i and s_j, defined by

v^α_ij(k1, k2) = 1 if |r^{k1}_ij - r^{k2}_ij| ≤ 1 - α ≤ 1
             = 0 otherwise    (36)

Then, following the reasoning (31)-(35), we obtain the degree of sufficient agreement (at least to degree α) of Q2 pairs of individuals as to their preferences between Q1 pairs of relevant options (with replacements similar to those in Section 3), called the degree of α/Q1/Q2/I/B-consensus, given by

con^α(Q1, Q2, I, B) = μ_{Q2}(v^α_{Q1,I,B})    (37)

We can also explicitly introduce the strength of agreement into (30), and analogously define the degree of strong agreement of individuals k1 and k2 as to their preferences between options s_i and s_j, e.g., as

v^s_ij(k1, k2) = s(|r^{k1}_ij - r^{k2}_ij|)    (38)

where s: [0, 1] → [0, 1] is some function representing the degree of strong agreement as, e.g.,

s(x) = 1 for x ≤ 0.05
     = -10x + 1.5 for 0.05 < x < 0.15
     = 0 for x ≥ 0.15    (39)

such that x' < x'' → s(x') ≥ s(x''), for all x', x'' ∈ [0, 1], and s(x) = 1 for some x ∈ [0, 1].
Then, following the reasoning (31)-(35) (with replacements similar to those in Section 3), we obtain the degree of strong agreement of Q2 pairs of important individuals as to their preferences between Q1 pairs of relevant options, called the degree of s/Q1/Q2/I/B-consensus, as

con^s(Q1, Q2, I, B) = μ_{Q2}(v^s_{Q1,I,B})    (40)

Example 4. Suppose that n = m = 3, Q1 = Q2 = «most» are given by (3), α = 0.9, s(x) is defined by (39), and the individual fuzzy preference relations are

R1 = [r^1_ij] =
[ 0.0  0.1  0.6
  0.9  0.0  0.7
  0.4  0.3  0.0 ]

R2 = [r^2_ij] =
[ 0.0  0.1  0.7
  0.9  0.0  0.7
  0.3  0.3  0.0 ]

R3 = [r^3_ij] =
[ 0.0  0.2  0.6
  0.8  0.0  0.7
  0.4  0.3  0.0 ]

Now, we assume that B = 1/s_1 + 0.6/s_2 + 0.2/s_3, i.e. b^B_12 = 0.8, b^B_13 = 0.6 and b^B_23 = 0.2, and I = 0.8/1 + 1/2 + 0.4/3, i.e. b^I_12 = 0.9, b^I_13 = 0.6 and b^I_23 = 0.7.

Therefore:
con(«most», «most», I, B) ≅ 0.35
con^{0.9}(«most», «most», I, B) ≅ 1.0
con^s(«most», «most», I, B) ≅ 0.75

For more information on these degrees of consensus, see Fedrizzi and Kacprzyk
(1988), Kacprzyk (1987a) and Kacprzyk and Fedrizzi (1986, 1988, 1989). Moreover,
the use of Yager's fuzzy - logic - based calculus of linguistically quantified
propositions is given in Kacprzyk and Fedrizzi (1989).
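A compact Python sketch of (30)-(40) is given below; it assumes that (32) and (34) are the importance-weighted normalized sums written above, uses the data of Example 4, and, under these assumptions, yields the strict and sufficient (α = 0.9) degrees of about 0.35 and 1.0 quoted there. Helper names are ours.

from itertools import combinations

# Sketch of the "soft" degree of consensus (30)-(37), assuming that (32) and
# (34) are importance-weighted normalized sums with the min operator.
# Data are those of Example 4; helper names are ours.

def mu_most(x):
    return 1.0 if x >= 0.8 else (2 * x - 0.6 if x > 0.3 else 0.0)

def soft_consensus(relations, b_opt, b_ind, q1=mu_most, q2=mu_most,
                   agree=lambda a, b: 1.0 if a == b else 0.0):
    n = len(relations[0])
    opt_pairs = list(combinations(range(n), 2))
    ind_pairs = list(combinations(range(len(relations)), 2))
    num, den = 0.0, 0.0
    for k1, k2 in ind_pairs:
        # (30)-(32): agreement of k1, k2 over the relevant pairs of options
        v_b = sum(min(agree(relations[k1][i][j], relations[k2][i][j]), b_opt[(i, j)])
                  for i, j in opt_pairs) / sum(b_opt[p] for p in opt_pairs)
        v_q1 = q1(v_b)                                   # (33)
        num += min(v_q1, b_ind[(k1, k2)])                # (34)
        den += b_ind[(k1, k2)]
    return q2(num / den)                                 # (35)

R1 = [[0, 0.1, 0.6], [0.9, 0, 0.7], [0.4, 0.3, 0]]
R2 = [[0, 0.1, 0.7], [0.9, 0, 0.7], [0.3, 0.3, 0]]
R3 = [[0, 0.2, 0.6], [0.8, 0, 0.7], [0.4, 0.3, 0]]
b_opt = {(0, 1): 0.8, (0, 2): 0.6, (1, 2): 0.2}          # relevance of option pairs
b_ind = {(0, 1): 0.9, (0, 2): 0.6, (1, 2): 0.7}          # importance of individual pairs
print(round(soft_consensus([R1, R2, R3], b_opt, b_ind), 2))          # strict: about 0.35
# sufficient agreement to degree alpha = 0.9, formulas (36)-(37)
suff = lambda a, b: 1.0 if abs(a - b) <= 0.1 else 0.0
print(soft_consensus([R1, R2, R3], b_opt, b_ind, agree=suff))        # about 1.0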

5. CONCLUDING REMARKS

In this paper we have tried to show how fuzzy logic with linguistic quantifiers can be used to model a fuzzy majority, and then to define new solution concepts and degrees of consensus based on that fuzzy majority. Fuzzy quantifiers are certainly a natural way of representing a fuzzy majority, which cannot practically be adequately represented by conventional formal means. On the other hand, fuzzy-logic-based calculi of linguistically quantified propositions, in particular the one employed in this paper, offer much simplicity and intuitive appeal, and can help attain more human-consistent, hence more adequate and more easily implementable, group decision making and consensus formation models.

BIBLIOGRAPHY

ARROW KJ. (1963), Social Choice and Individual Values, 2nd ed. Yale University
Press, New Haven.
BLIN J.M. and A.P. WHINSTON (1973), Fuzzy sets and social choice. Journal of Cybernetics, 4, 17-22.
BRAYBROOK D. and C. LINDBLOM (1963), A Strategy of Decision. Free Press,
New York.
CALVERT R. (1986), Models of Imperfect Information in Politics. Harwood
Academic Publishers, Chur.
FEDRIZZI M. (1986), Group decisions and consensus: a model using fuzzy sets theory (in Italian). Rivista per le scienze econ. e soc. A. 9, F. 1, 12-20.
FEDRIZZI M. and J. KACPRZYK (1988), On measuring consensus in the setting of
fuzzy preference relations. In J. Kacprzyk and M. Roubens (Eds.), Non -
Conventional Preference Relations in Decision Making. Springer - Verlag,
Berlin - New York - Tokyo, 129 - 141.
FEDRIZZI M., J. KACPRZYK and S. ZADROZNY (1988), An interactive multi-user decision support system for consensus reaching processes using fuzzy logic with linguistic quantifiers. Decision Support Systems 4, 313-327.
KACPRZYK J. (1984), Collective decision making with a fuzzy majority rule. Proc. WOGSC Congress, AFCET, Paris, 153-159.
KACPRZYK J. (1985a), Zadeh's commonsense knowledge and its use in multicriteria, multistage and multiperson decision making. In M.M. Gupta et al. (Eds.), Approximate Reasoning in Expert Systems, North-Holland, Amsterdam, 105-121.
KACPRZYK J. (1985b), Some «commonsense» solution concepts in group decision making via fuzzy linguistic quantifiers. In J. Kacprzyk and R.R. Yager (Eds.), Management Decision Support Systems Using Fuzzy Sets and Possibility Theory. Verlag TÜV Rheinland, Cologne, 125-135.

KACPRZYK J. (1985c), Group decision-making with a fuzzy majority via linguistic quantifiers. Part I: A consensory-like pooling; Part II: A competitive-like pooling. Cybernetics and Systems: an Int. Journal 16, 119-129 (Part I), 131-144 (Part II).
KACPRZYK J. (1986a), Group decision making with a fuzzy linguistic majority. Fuzzy Sets and Systems 18, 105-118.
KACPRZYK J. (1986b), Towards an algorithmic/procedural «human consistency» of decision support systems: a fuzzy logic approach. In W. Karwowski and A. Mital (Eds.), Applications of Fuzzy Sets in Human Factors. Elsevier, Amsterdam, pp. 101-116.
KACPRZYK J. (1987a), On some fuzzy cores and «soft» consensus measures in group decision making. In J.C. Bezdek (Ed.), The Analysis of Fuzzy Information, Vol. 2. CRC Press, Boca Raton, pp. 119-130.
KACPRZYK J. (1987b), Towards «human consistent» decision support systems through commonsense-knowledge-based decision making and control models: a fuzzy logic approach. Computers and Artificial Intelligence 6, 97-122.
KACPRZYK J. and FEDRIZZI M. (1986), «Soft» consensus measures for monitoring
real consensus reaching processes under fuzzy preferences. Control and
Cybernetics 15, 309-323.
KACPRZYK J. and FEDRIZZI M. (1988), A «soft» measure of consensus in the
setting of partial (fuzzy) preferences. European Journal of Operational
Research 34, 315-325.
KACPRZYK J. and FEDRIZZI M. (1989), A «human-consistent» degree of consensus based on fuzzy logic with linguistic quantifiers. Mathematical Social Sciences 18, 275-290.
KACPRZYK J. and FEDRIZZI M., Eds. (1990), Multiperson Decision Making
Models Using Fuzzy Sets and Possibility Theory. Kluwer, Dordrecht -
Boston - Lancaster - Tokyo.
KACPRZYK J., FEDRIZZI M. and NURMI H. (1990), Group decision making with fuzzy majorities represented by linguistic quantifiers. In J.L. Verdegay and M. Delgado (Eds.): Approximate Reasoning Tools for Artificial Intelligence. Verlag TÜV Rheinland, Cologne, 126-145.
KACPRZYK J. and NURMI H. (1989), Linguistic quantifiers and fuzzy majorities for more realistic and human-consistent group decision making, in G. Evans, W. Karwowski and M. Wilhelm (Eds.): Fuzzy Methodologies for Industrial and Systems Engineering, Elsevier, Amsterdam, 267-281.
KACPRZYK J. and NURMI H. (1990), On fuzzy tournaments and their solution
concepts in group decision making. European Journal of Operational
Research (forthcoming).
KACPRZYK J. and ROUBENS M., Eds. (1988), Non - Conventional Preference
Relations in Decision Making. Springer - Verlag, Berlin - New York -
Tokyo.
KACPRZYK J. and YAGER R.R. (1984a), Linguistic quantifiers and belief qualification in fuzzy multicriteria and multistage decision making. Control and Cybernetics 13, 155-173.
KACPRZYK J. and YAGER R.R. (1984b), «Softer» optimization and control models via fuzzy linguistic quantifiers. Information Sciences 34, 157-178.

KACPRZYK J., S. ZADROZNY and M. FEDRIZZI (1988), An interactive user-friendly decision support system for consensus reaching based on fuzzy logic with linguistic quantifiers. In M.M. Gupta and T. Yamakawa (Eds.): Fuzzy Computing. Elsevier, Amsterdam, 307-322.
LOEWER B., Guest Ed. (1985), Special Issue on Consensus. Synthese 62, No.1.
LOEWER B. and LADDAGA R. (1985), Destroying the consensus. In Loewer (1985),
79-96.
MCKELVEY R.D. (1979), General Conditions for Global Intransitivities in Formal Voting Models. Econometrica 47, 1085-1111.
NURMI H. (1981), Approaches to collective decision making with fuzzy preference
relations. Fuzzy Sets and Systems 6, 249-259.
NURMI H. (1983), Voting procedures: a summary analysis. British Journal of Political Science 13, 181-208.
NURMI H. (1987), Comparing Voting Systems. Reidel, Dordrecht - Boston -
Lancaster - Tokyo.
NURMI H. (1988), Assumptions on individual preferences in the theory of voting procedures. In Kacprzyk and Roubens (1988), pp. 142-155.
NURMI H., M. FEDRIZZI and J. KACPRZYK (1990), Vague notions in the theory of
voting. In J. Kacprzyk and M. Fedrizzi (Eds.): Multiperson Decision Making
Models Using Fuzzy Sets and Possibility Theory. Kluwer, Dordrecht -
Boston - Lancaster - Tokyo, 43-52.
SCHOFIELD N. (1984), Existence of Equilibrium on a Manifold. Mathematics of
Operations Research 9, 545-557.
SIMON H.A. (1972), Theories of Bounded Rationality. In C.B. McGuire and R.
Radner (Eds.): Decision and Organization. North-Holland, Amsterdam.
SATTERTHWAITE M. (1975), Strategy-proofness and Arrow's Conditions: Existence and Correspondence Theorems for Voting Procedures and Social Welfare Functions. Journal of Economic Theory 10, 187-217.
TANINO T. (1988). Fuzzy preference relations in group decision making. In
Kacprzyk and Roubens (1988), 54-71.
YAGER R.R. (1983a), Quantifiers in the formulation of multiple objective decision functions. Information Sciences 31, 107-139.
YAGER R.R. (1983b), Quantified propositions in a linguistic logic. International
Journal of Man - Machine Studies 19, 195-227.
YAGER R.R. (1984), General multiple - objective decision functions and
linguistically quantified statements. International Journal of Man - Machine
Studies 21, 389 - 400.
YAGER R.R. (1985a), Reasoning with fuzzy quantified statements: Part I. Kybernetes 14, 233-240.
YAGER R.R. (1985b), Aggregating evidence using quantified statements. Information Sciences 36, 179-206.
YAGER R.R. (1986), Reasoning with fuzzy quantified statements: Part II. Kybernetes 15, 111-120.
ZADEH L.A. (1983), A computational approach to fuzzy quantifiers in natural languages. Computers and Mathematics with Applications 9, 149-184.
ZADEH L.A. (1985), Syllogistic reasoning in fuzzy logic and its application to usuality and reasoning with dispositions. IEEE Transactions on Systems, Man and Cybernetics SMC-15, 754-763.
14
LEARNING IN UNCERTAIN
ENVIRONMENTS

Marco Botta, Attilio Giordana and Lorenza Saitta

Universita di Torino
Dipartimento di Informatica
Corso Svizzera 185
10149 TORINO (Italy)
E-mail: [email protected]

ABSTRACT
In this paper we briefly survey the problems arising in learning concept descriptions from examples in domains affected by uncertainty and vagueness. A programming environment, called SMART-SHELL, is also presented: it addresses these problems by exploiting fuzzy logic. This is achieved by supplying the learning system with the capability of handling a fuzzy relational database, containing the extensional representation of the acquired logic formulas.

INTRODUCTION
Knowledge acquisition has been recognized as a major problem for the quick and low cost development of expert systems. In fact, knowledge elicitation is a hard and time consuming task, especially in domains where there is a lack or shortage of human experts and/or the knowledge is difficult to formalize. As a consequence, automated learning methods became appealing and machine learning is now receiving increasing attention.
Even though the complete automation of the knowledge acquisition process is beyond the possibilities of current AI technology, developing tools allowing a substantial part of the necessary knowledge to be first acquired and, next, maintained and updated is both a medium-term reachable goal and a very useful one. These tools are likely to become, in the future, a fundamental part of expert system builders, provided that adequate interfaces towards knowledge engineers and domain experts are supplied.
Traditionally, machine learning tasks have ranged from acquiring concept descriptions from examples [1-4] to improving planning heuristics [5-7], and knowledge representation schemes have included logical formulas, decision trees (or networks), production rules and semantic networks [8-10,11,31]. Research on scientific discovery [12-14] and concept formation [7,15-17] has also received attention. In all these problems, the notion of learning as a search process, in a space of descriptions or hypotheses, plays a central role [18], especially in inductive approaches.
Recently, new trends have emerged, such as the proposal of chunking as a general cognitive architecture [19] and the use of deductive methods to perform "justified" learning [5,20-24]. As new, more complex tasks are faced, methods become more refined, and integrated models of learning are proposed with the hope of coping with the complexity of real world tasks [25-28].
A great deal of activity is also going on in the field of connectionist models of learning, as appears, for instance, from [29]. Another interesting approach is constituted by the genetic algorithm [30], presented as a general-purpose learning method for parallel rule systems.
Unfortunately, many learning systems only work in ideal domains, in which noise in the data and uncertainty in the task are absent. However, the effective use of learning systems in real-world applications substantially depends upon the ability these systems show in handling noise. Some kinds of problems arising in real applications are summarized in [31].
Several systems are provided with mechanisms for facing statistical noise, such as the pruning techniques proposed to limit the size of decision trees [32-34]. Similar methods have also been proposed for knowledge represented in the form of production rules, as in the AQ15 system, where the initially acquired rules are truncated to limit complexity and avoid overfitting [35]. This kind of noise mainly concerns random errors in assigning a value to an attribute or a label to a training event. Several experiments have been performed to investigate the effects of this noise on the effectiveness of the acquired knowledge [36].
However, statistical noise is not the only source of problems; in fact,
relevant concepts and relations can be ill-defined and vague. For this purpose,
fuzzy set theory seems the most appropriate tool for handling this type of
uncertainty. Note that a continuous-valued semantics, associated with the
description language, is a major source of complexity in learning methodologies.
Hence, very few systems are able to handle it explicitly, and most of these limit
themselves to attaching weights to the pieces of acquired knowledge [37,38].
Fuzzy sets occur in symbolic learning methodologies in different roles.
In [39], they are used to describe concepts and the varying degrees of typicality of
their instances. In [40] the intensional descriptions of a set of classes, to be
discriminated from each other, are expressed as fuzzy languages, learned from a set
of examples. This approach has been applied to problems in medical diagnosis [41].
Finally, in ML-SMART, a system which learns concept descriptions from
examples [42,43] and a domain theory [26,27], the use of fuzzy set theory has proved
very well suited to transforming continuous-valued features into a set of categorical
attributes and, in general, to defining the vague semantics associated with real-world
terms and predicates, both in the description of the examples and in the domain
theory. ML-SMART is a learning system which uses a full memory approach,
supported by the special-purpose shell SMART-SHELL [44,45], especially designed
to ease the development of different learning systems. SMART-SHELL mainly
consists of a logic programming environment, interfaced towards a relational data-base
through a set of operators implementing the basic primitives necessary for a learner.
The logic environment has been realized in Common Lisp, whereas the data-base
manager has been tailored for the specific class of applications. This data-base differs
from commercial relational data-bases in the sense that many standard features
have not been implemented, as they are not relevant to the particular use it is oriented to,
whereas other important aspects have been enhanced: a query language based on full
first order logic, including a set of non-standard quantifiers, and the capability of
handling continuous-valued semantics.
This paper is organized as follows. Section 2 briefly describes the learning
framework used in systems like ML-SMART. Section 3 describes the logic language
used for representing both the background knowledge and the acquired knowledge,
whereas Section 4 illustrates the behaviour of the basic operators, interfacing the
data-base and the learning system. Finally, Section 5 presents some conclusions.

THE LEARNING FRAMEWORK


The learning task addressed by the ML-SMART system is that of "learning
concept descriptions from examples" [45]. More precisely, the task can be formally
defined as follows:
Given: A set H0 of known concepts (classes),
A set F0 of classified learning events, represented in an event description
language LE,
A concept description language L.
Find: For every concept h ∈ H0, a formula φ ∈ L such that φ → h, i.e., φ
consists of a set of conditions sufficient for an example to be an instance
of h.

Fig. 1 - Example of instances from a block world domain (three examples EX1, EX2 and EX3, each composed of blocks of various shapes).

One of the peculiarities distinguishing ML-SMART from other systems devoted to
the same task consists in the use of a relational data-base as a working memory. In
particular, both the learning set F0 and every inductive hypothesis φ ∈ L, generated
during the search, are described extensionally using relations in the data-base. In the
following, φ* will denote the extensional representation of the logical formula φ.
For the sake of exemplification, we will use a simple example from a block
world domain. Consider the set F0 of instances reported in Fig. 1; they can be
described by means of a set of relations, two of which are reported in Fig. 2(a). All
the relations have the homogeneous format <F,H,X1,X2,...,Xn>, where F contains
the identifier of the event f, H the correct classification h, and X1,X2,...,Xn are the
identifiers of the parts of f (objects) which satisfy the relation; later on, we will
slightly extend this basic scheme. By using the standard operators of relational
algebra [46], such as natural join, selection and projection, the extension φ* of a
generic formula φ can be computed from the extensions of the predicates occurring in
φ; Fig. 2(b) shows the extension of the formula φ = Triangle(x) ∧ Large(x).
Fig. 2 - Example of relations associated to simple formulas, evaluated on the
instances of Fig. 1: (a) the relations Circle(x1) and Large(x1), given by the teacher;
(b) the relation for Triangle(x1) ∧ Large(x1), computed from the preceding ones.
By definition, examples 1 and 3 are instances of a concept h1, whereas example 2
is an instance of another concept h2.
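To make the relational computation of φ* concrete, the following small Python fragment (purely illustrative: the relation contents are invented and the real system stores its relations in the special-purpose data-base, not in Python sets) sketches how the extension of Triangle(x1) ∧ Large(x1) can be obtained by joining the two predicate relations on their common fields.

# Sketch with invented data: each relation is a set of tuples <F, H, x1>,
# F being the event identifier, H its class and x1 an object of the event.
triangle = {("EX1", "h1", "a"), ("EX2", "h2", "f"), ("EX3", "h1", "n")}
large    = {("EX1", "h1", "a"), ("EX1", "h1", "c"), ("EX2", "h2", "g")}

# The natural join on the shared fields <F, H, x1> reduces here to a set
# intersection and yields the extension of Triangle(x1) AND Large(x1).
extension = triangle & large
print(extension)    # {('EX1', 'h1', 'a')}: only this tuple satisfies the formula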

The learning process can be modeled as a search through the space of formulas which
can be generated in this way [18,43]. However, the set of formulas having a non-empty
extension may be too large and cannot be searched exhaustively. For this
reason, ML-SMART uses several strategies for limiting the number of formulas
which are actually created and tested. In particular, it develops a tree of formulas
using a set of specialization operators; the root of the tree is the maximally general
formula "true", which obviously holds for all the events in F0, and the leaves are
either formulas corresponding to acceptable concept descriptions or formulas which
are no longer interesting. Three kinds of criteria are used to bias the inductive process
in order to limit the size of the tree:
Simplicity and readability of the formulas.
Statistical criteria: formulas verified by many examples are preferred.
If background knowledge is available, formulas which can be deduced from it are
preferred. Moreover, formulas contradicting the background knowledge cannot be
generated. An extensive description of the methodology can be found in [42,43].
The SMART-SHELL environment provides the basic operators of
specialization (and generalization) necessary to implement a problem solver of the
type of ML-SMART, as well as a forward/backward inference engine, and the
primitives necessary for implementing the search strategies. The relational data-base
is a special-purpose one, implemented in such a way as to achieve high speed on the
most critical operations. On top of this data-base, a logic environment has been
implemented, as well as a user interface, designed to ease the process of supplying
the system with the background knowledge and application description.
The tool SMART-SHELL basically consists of three main modules:
SMART-CONF, SMART-RUN and SMART-DATABASE, which provide the user
with a knowledge editor, a set of high level primitives and a data-base manager,
respectively. The scheme of the system is reported in Fig. 3.
The module SMART-CONF consists of a user-friendly interface, used to
describe both the background knowledge in input to the learner and other kinds of
knowledge (control knowledge) which have to be used by the learning strategies.
Moreover, it contains a set of compilation procedures which translate this kind of
knowledge into a more efficient form, internally used by the other two modules.

SMART-RUN basically implements the logic environment and the learning
operators which will be described in the next sections, whereas SMART-DATABASE
implements the low level procedures necessary for performing set
intersection, natural join, selection and so on. SMART-RUN is embedded in a
standard Common Lisp environment, whereas SMART-DATABASE is implemented
in the C language, for the sake of efficiency.
The learning system is implemented on top of SMART-RUN and basically
consists of a set of high level learning strategies which guide the application of the
basic deductive and inductive operators. When such strategies have to be very
sophisticated, it is good practice to implement them as knowledge intensive
procedures; to this aim the module SMART-CONF turns out to be useful again as a
true expert system shell.

Fig. 3 - Scheme of the SMART-SHELL environment: the modules SMART-CONF,
SMART-RUN and SMART-DATABASE, the applications built on top of them
(ML-Smart, ID-Smart, INC-Smart, Smart Classifier), and the inputs (background
and control knowledge, evaluable predicates).

KNOWLEDGE REPRESENTATION

In the SMART-SHELL environment, a first order logic language L is used
to describe in a unified form all the knowledge involved in the learning process: the
background knowledge (given in input to the learner) and the concept descriptions
(the learned knowledge). In addition, in order to facilitate the use of the system by
human experts, who may not be familiar with the abstract logic notation,
SMART-SHELL offers a frame system, in the style of many standard expert system
shells, which allows the knowledge to be also represented in an equivalent object-like
paradigm. A compiler automatically performs the conversion from the frame-format
representation to the logical one. The frame system can also be used as a tool for
implementing the learner itself, as has been done for ML-SMART [43].
The logic language L is a Horn clause language, extended with functors,
negation and quantifiers. In particular, a well formed formula (wff) of L takes the
form:
φ(s1, s2, ..., sn) → p(t1, t2, ..., tk)    (1)
where p is a predicate belonging to a predicate set P, the terms t1, t2, ..., tk and
s1, s2, ..., sn can be variables, constants or functions, and φ is a logical expression
built up using predicates in the set P, the connectives ∧ and ¬ and the quantifiers
ATM, ATL and EX. These quantifiers stand for ATMost, ATLeast and EXactly,
respectively, and can be considered as an extension of the standard existential
quantifier (similar to the numeric quantifiers used in the system INDUCE [44]).
Fuzzy quantifiers are a very important extension to logical languages and have been
proposed and deeply analyzed by Zadeh [49]. More precisely, let ψ(x1, x2, ..., xm) be a
logical expression built up using only the connectives ∧ and ¬; then, the expression:
ATL n <y1, y2, ..., yk> [ψ(x1, x2, ..., xm)]    (2)
is true of a given example f iff there exist at least n different bindings, between the
variables y1, y2, ..., yk and the objects occurring in f, satisfying ψ. In an
analogous way, ATM n <y1, y2, ..., yk> [ψ(x1, x2, ..., xm)] and EX n <y1, y2, ..., yk>
[ψ(x1, x2, ..., xm)] require at most n and exactly n different bindings in order to be
satisfied. Notice that, for n = 1, the quantifier ATL corresponds to the existential
quantifier ∃, whereas, for n = 0, the quantifier ATM corresponds to ¬∃. Quantifiers
can be nested according to the usual rules of the predicate calculus. For instance, the
expression:
ATL 1 <x> [EX 2 <y> [Triangle(x) ∧ Circle(y)]]    (3)
is an example of a wff of the language L (provided that the predicates Triangle and
Circle belong to P).
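As an illustration of the intended meaning of these quantifiers, the following Python sketch counts the distinct bindings of the quantified variables for a crisp example; the data and function names are only illustrative and do not reproduce the actual SMART-SHELL implementation.

from itertools import permutations

def count_bindings(objects, psi, arity):
    # number of distinct bindings of the quantified variables for which psi holds
    return sum(1 for binding in permutations(objects, arity) if psi(*binding))

def ATL(n, objects, psi, arity):   # at least n bindings (ATL 1 is the usual "exists")
    return count_bindings(objects, psi, arity) >= n

def ATM(n, objects, psi, arity):   # at most n bindings (ATM 0 is "not exists")
    return count_bindings(objects, psi, arity) <= n

def EX(n, objects, psi, arity):    # exactly n bindings
    return count_bindings(objects, psi, arity) == n

# Invented example: an event whose objects have the shapes below.
shape = {"a": "triangle", "b": "circle", "c": "circle"}
objs = list(shape)
print(EX(2, objs, lambda y: shape[y] == "circle", 1))    # True: exactly two circles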
However, some structural restrictions are imposed on the formulas of L. In
particular, the set of basic predicates P is divided into two disjoint subsets, P(o) and
P(n). The set P(o) contains predicates whose extension is evaluable, on the learning
set, by means of queries to the data-base manager; examples of predicates belonging
to P(o) are the ones reported in Fig. 2(a). By contrast, predicates in P(n) are defined
by means of implication rules such as (1). We recall that a predicate is evaluable
when the data-base manager has a procedure for computing, through a selection
operation, its extension from a given relation [47]; examples of standard evaluable
predicates are the arithmetic predicates >, < and =. According to the definition given in
[26], predicates in P(o) will be said to be operational and predicates in P(n) non-operational.
Given the set of concepts H0 = {h1, h2, ..., hn}, each concept hi corresponds
to a non-operational predicate, which is true of f iff f is an instance of hi. Concept
descriptions are wffs of the type:
φ(o) → hi    (hi ∈ H0)    (4)
where φ(o) is a conjunctive wff containing only operational predicates. As, in
general, more than one formula (4) is needed to completely define hi, all these
formulas are considered implicitly OR-ed. In the current implementation, the
following restrictions on wffs are set:
Predicates occurring within the scope of a quantifier must belong to the set P(o).
Negation (¬) can be used only in formulas of the following format:
φ(x1, x2, ..., y1, y2, ..., ym) ∧ ¬ψ(y1, y2, ..., ym)    (5)
Expression (5) states that variables occurring in a negated predicate must also occur
in a non-negated one in the same formula. In this way, a simple extension of the
SLD-resolution can be used as inference engine.
In order to cope with the vagueness invariably associated with real-world
applications, a continuous-valued semantics has been associated with the language L.
Each formula φ(x1, ..., xn) ∈ L has a corresponding truth degree μ ∈ [0, 1], computed
by combining the truth degrees of the predicates occurring in φ. For this reason, the
relation φ*, associated with the formula φ, has been extended (with respect to the
format described in Fig. 2) by adding a new field M, containing the truth value μ of
φ(x1, ..., xn) when x1, ..., xn are bound to the objects specified in the corresponding
tuple.
The semantics of an operational predicate can be defined in two ways:
extensionally, by giving the corresponding relation on the data-base, or intensionally,
by defining a function on attribute values. These two specification forms can both be
used in the system. In particular, the implicit form is more compact and efficient but
needs an analytic definition, whereas the explicit form can always be given by simply
filling up a table when an analytic expression is not available. An example of
extensional semantic definition is given in Fig. 4.

Fig. 4 - Extensional definition of the predicate IN(x1,x2) (object x1 is inside object x2):
a relation with fields F, M, x1, x2, listing for each event the pairs of objects that
satisfy the predicate together with the corresponding truth value.

Furthermore, to ease the writing of semantic functions, the learning events F0 are
usually described by means of a set of numerical and categorical attributes a1, ..., an;
to this aim, a new type of relation has been introduced in SMART-SHELL: the
attribute values can all be collected into a unique (n+3)-ary relation, called OBJ. The
fields F, H and X contain the identifier f of an event, the classification h of f and the
identifier x of a part of f, respectively, whereas the other n columns store the values
of the defined attributes for the object x. As an example, the relation OBJ for the set
of instances in Fig. 1 is reported in Fig. 5.
Fig. 5 - The relation OBJ describing the set of instances of Fig. 1, with fields F, H, X
and the attributes Shape (triangle, circle, square), Area and Texture (striped, clear).

The semantic evaluation of a predicate P, depending on the values of the attributes
a1, a2, ..., ak, can be obtained by computing the value of a function:
μ = f(a1, a2, ..., ak),   f : A1 × A2 × ... × Ak → [0, 1]    (6)
where Ai is the domain of the attribute ai. A library of primitive functions has been
defined to this aim. For instance, the semantics of a Boolean predicate, such as
Triangle(x), can be specified as follows:
if shape(x) = triangle then μ = 1 else μ = 0.
Analogously, the continuous-valued semantics of the predicate small(x) can be
assigned as a membership function of the object x in the fuzzy set "small". This can
be done according to the following syntax:

fuzzy(0, 10, 30, 40, area(x))    (7)

Expression (7) states that the fuzzy set "small" has been defined over the base
variable area(x) and has a trapezoidal shape, specified by the four values 0, 10, 30
and 40, expressed in some suitable measurement units. The corresponding fuzzy set is
reported in Fig. 6. What is interesting, in SMART-SHELL, is that the user can give
a default semantics for a fuzzy set definition; then the system itself, by analyzing the
available examples, can adjust this definition or even learn it from scratch. This
facility eases the burden of the domain expert in precisely defining the meaning of the
terms he/she uses.

Fig. 6 - Fuzzy set defining the semantics of the predicate "small(x)": trapezoidal
membership function over the base variable Area.
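A minimal Python sketch of the trapezoidal primitive of expression (7) follows; the name fuzzy and its argument order mirror the syntax above, but the code is only an assumed reading of that syntax, not the actual SMART-SHELL library function.

def fuzzy(a, b, c, d, value):
    # trapezoidal membership function with break points a <= b <= c <= d
    if value <= a or value >= d:
        return 0.0
    if b <= value <= c:
        return 1.0
    if value < b:
        return (value - a) / (b - a)    # rising edge between a and b
    return (d - value) / (d - c)        # falling edge between c and d

# the fuzzy set "small" of expression (7), evaluated on a few areas
print(fuzzy(0, 10, 30, 40, 5))     # 0.5
print(fuzzy(0, 10, 30, 40, 20))    # 1.0
print(fuzzy(0, 10, 30, 40, 35))    # 0.5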

Given the truth values u and v of two wffs φ and ψ of L, the semantics of φ∧ψ and
of φ∨ψ is computed according to a pair of corresponding t-norm α(u,v) and t-conorm
β(u,v); this evaluation reduces to the classical two-valued one in the case of Boolean
predicates.
The evaluation of a formula containing a negated predicate (as in formula
(5)) is performed by evaluating the function α(u, 1-v), where u is the evidence of
φ(x1, x2, ..., y1, y2, ..., ym) and v the evidence of ψ(y1, y2, ..., ym). As far as the
fuzzy quantifiers are concerned, a semantics of the type proposed by Yager in [48] has been
adopted. Let us consider the formula:
φ(X − Y) = ATL m <Y> [ψ'(Y) ∧ ψ''(X − Y)]    (8)
For the sake of simplicity, the vectors X and Y denote, in (8), sets of variables.
Given an example f, let b1, ..., br be the different bindings between the variables in Y
and the objects in f such that ψ'(Y) is true on f. Let, moreover, μj be the
evaluation of ψ' for the binding bj (1 ≤ j ≤ r). Let us now sort the μj's in a non-increasing
order:
μ1 ≥ μ2 ≥ ... ≥ μr
Then, the evidence of the quantified formula (8) is computed as follows:
μ(φ(X)) = β(α(μ1, μ2, ..., μm), μm+1, ..., μr)                if m ≤ r and Y = X
μ(φ(X)) = α(β(α(μ1, μ2, ..., μm), μm+1, ..., μr), μ(ψ''))     if m ≤ r and Y ⊂ X    (9)
μ(φ(X)) = 0                                                    otherwise
The evaluation of the other two fuzzy quantifiers can be derived from the following
relationships:
ATM m <Y> [ψ(X)] = ¬ ATL (m+1) <Y> [ψ(X)]    (10)
EX m <Y> [ψ(X)] = ATL m <Y> [ψ(X)] ∧ ATM m <Y> [ψ(X)]    (11)

For efficiency reasons, the procedures for evaluating the predicate semantics and for
updating the evidence of the formulas are handled by the data-base manager
program. The truth evaluation of a non-operational predicate activates a deductive
procedure which builds a corresponding operational formula, evaluable as described
above.
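The evaluation rule (9) can be illustrated with a short Python sketch; min and max are used here as the t-norm α and t-conorm β purely for the sake of the example, since the actual pair adopted by the system is not specified in this section.

from functools import reduce

t_norm, t_conorm = min, max     # illustrative choice of alpha and beta

def atl_evidence(mus, m, mu_psi2=None):
    # mus: evaluations of psi' on the r bindings found in the example
    mus = sorted(mus, reverse=True)           # mu_1 >= mu_2 >= ... >= mu_r
    if m > len(mus):
        return 0.0                            # fewer than m bindings: case "otherwise"
    head = reduce(t_norm, mus[:m])            # alpha(mu_1, ..., mu_m)
    ev = reduce(t_conorm, mus[m:], head)      # beta(head, mu_{m+1}, ..., mu_r)
    if mu_psi2 is None:                       # Y = X: no residual conjunct psi''
        return ev
    return t_norm(ev, mu_psi2)                # Y strictly contained in X

# three bindings satisfy psi' to degrees 0.9, 0.6 and 0.2; evidence of ATL 2
print(atl_evidence([0.2, 0.9, 0.6], 2))       # 0.6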

THE LEARNING OPERATORS


The description of the learning methodology is outside the scope of this
paper; we will focus, instead, on the mechanisms used to handle the fuzzy relations
associated with the formulas generated during the search. To understand how these
mechanisms work, it is sufficient to know what basic specialization and
generalization operators can be applied to candidate formulas. The system SMART-SHELL
provides the basic primitives necessary to search for concept descriptions in
the space of formulas belonging to the language L. In particular, it provides inductive
operators, namely specialization and generalization operators, and a deductive
operator.

The inductive operators

The basic operations available in the inductive part of the system are specialization
and generalization. Each of them can be performed by applying different
operators, as described in the following.

Specialization operators

Specialization by detailing. Given a formula φ(x1, x2, ..., y1, y2, ..., yn), one way of
obtaining from it a more specific formula ψ is by adding to it a predicate containing
a subset of the variables occurring in φ:
ψ(x1, x2, ..., y1, y2, ..., yn) = φ(x1, x2, ..., y1, y2, ..., yn) ∧ p(y1, ..., yn)
In this way, the original description is enriched with some new details on the same
objects considered before. Given the relation φ*, the extension ψ* of ψ is built up
by selecting from φ* those tuples satisfying the predicate p(y1, ..., yn). The relation
ψ* will have the same number of columns as φ*. An example is given in Fig. 7.
Fig. 7 - Example of specialization by detailing: from φ(x1,x2) = Triangle(x1) ∧
Large(x1) ∧ Circle(x2) to ψ(x1,x2) = Triangle(x1) ∧ Large(x1) ∧ Circle(x2) ∧
On(x1,x2); ψ* is obtained by selecting from φ* the tuples satisfying On(x1,x2).
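A small Python sketch of this selection step follows (tuples are represented as dictionaries and the data are invented); the truth degree of the added predicate is combined with the field M through a t-norm, here min.

def detail(phi_star, p, p_vars, t_norm=min):
    # keep the tuples of phi* whose objects bound to p_vars satisfy p,
    # updating the truth value stored in the field M
    psi_star = []
    for t in phi_star:
        mu = p(*(t[v] for v in p_vars))
        if mu > 0:
            new_t = dict(t)
            new_t["M"] = t_norm(t["M"], mu)
            psi_star.append(new_t)
    return psi_star

# phi(x1,x2) = Triangle(x1) AND Large(x1) AND Circle(x2), invented extension
phi_star = [{"F": "EX1", "M": 0.6, "x1": "a", "x2": "b"},
            {"F": "EX2", "M": 0.8, "x1": "f", "x2": "g"}]
on = lambda x1, x2: 1.0 if (x1, x2) == ("a", "b") else 0.0    # On(x1,x2)
print(detail(phi_star, on, ["x1", "x2"]))     # only the EX1 tuple survives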

Specialization by negation. Let φ(x1, ..., xk, y1, ..., yn) and P(x1, ..., xk) be two
formulas. Then, the new formula
ψ(x1, ..., xk, y1, ..., yn) = φ(x1, ..., xk, y1, ..., yn) ∧ ¬P(x1, ..., xk)
is obtained by negating the assertion P for the objects bound to <x1, ..., xk> in φ.
This operator is based on the negation as failure paradigm: given the extensions φ*
and P*, the resulting relation is obtained from φ* by removing those tuples which
verify P. An example is given in Fig. 8.

Fig. 8 - Example of specialization by negation: from φ(x1) = Circle(x1) to
ψ(x1) = Circle(x1) ∧ ¬Clear(x1); ψ* keeps only the circles that are not clear.



Specialization by conjunction. Let φ(x1, ..., xk) and ψ(y1, ..., yn) be two formulas;
the formula ρ(x1, ..., xk, y1, ..., yn) = φ(x1, ..., xk) ∧ ψ(y1, ..., yn) is more specific than
both φ and ψ. A natural join is performed between the two relations φ* and ψ*. The
resulting relation, an example of which is reported in Fig. 9, will have k+n+2
columns.

Fig. 9 - Example of specialization by conjunction: from φ(x1) = Triangle(x1) ∧
Large(x1) and ψ(x1) = Square(x1) ∧ Large(x1) to ρ(x1,x2) = Triangle(x1) ∧
Large(x1) ∧ Square(x2) ∧ Large(x2), obtained as a natural join of φ* and ψ*.

Specialization by quantification. Let φ(x1, ..., xi, ..., xk) be a formula; we can build
up the quantified expression ψ = q n <x1, x2, ..., xi> [φ(x1, ..., xi, ..., xk)], where
q ∈ {EX, ATM, ATL}. The formula ψ is closed with respect to the variables
<x1, x2, ..., xi> and, then, the tuples in the relation ψ* contain only the variables
<xi+1, ..., xk>. While the quantifiers ATL and EX are implemented as data-base
operators, ATM is handled by a higher level procedure, which uses both the
quantification operators implemented and a set-difference operator described in the
following.

Fig. 10 - Example of specialization by quantifying: from φ(x1,x2) = Triangle(x1) ∧
Large(x1) ∧ Circle(x2) ∧ On(x1,x2) to ψ(x1) = ATL 2 <x2> [Obj(x2) ∧ Circle(x2) ∧
On(x1,x2)] ∧ Triangle(x1) ∧ Large(x1); ψ* contains only the variable x1.

Quantification is similar to a counting operator: for each example in a relation, it
counts the number of tuples having different bindings for the variables <x1, ..., xi>
which exist in that relation; then, it projects the relation over <xi+1, ..., xk>,
discarding those tuples whose count does not satisfy q n. An example of this operation is
reported in Fig. 10. According to this procedure, the result of the ATM quantifier is
neither a more specific nor a more general formula; in fact, the resulting extension is
not comparable with the original one (it is not always possible to say which is a
subset of the other one).
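The counting behaviour of the ATL operator can be sketched in Python as below (crisp case only, with invented tuples); the relation is grouped by event and free variables, the distinct bindings of the quantified variables are counted, and the groups below the threshold are discarded.

from collections import defaultdict

def quantify_atl(phi_star, quantified_vars, free_vars, n):
    groups = defaultdict(set)
    for t in phi_star:
        key = tuple((v, t[v]) for v in ["F"] + free_vars)       # event + free variables
        groups[key].add(tuple(t[v] for v in quantified_vars))   # distinct bindings
    # project over the free variables, keeping only groups with at least n bindings
    return [dict(key) for key, bindings in groups.items() if len(bindings) >= n]

# tuples of phi(x1,x2): ATL 2 <x2> keeps the x1 values related to at least two objects
phi_star = [{"F": "EX1", "x1": "a", "x2": "e"},
            {"F": "EX1", "x1": "a", "x2": "b"},
            {"F": "EX2", "x1": "f", "x2": "g"}]
print(quantify_atl(phi_star, ["x2"], ["x1"], 2))    # [{'F': 'EX1', 'x1': 'a'}]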

Generalization operators
Only one basic generalization operator has been considered, i.e., the one that
performs a disjunction of two formulas having the same number of variables. Let
φ(x1, ..., xk) and ψ(x1, ..., xk) be two formulas with the same number of variables; the
formula ρ(x1, ..., xk) = ψ(x1, ..., xk) ∨ φ(x1, ..., xk) is a generalization of both. A
merging operator, similar to the union operator of relational algebra, is used for
implementing this operation. In Fig. 11 an example is reported.

Fig. 11 - Example of generalization by disjunction: from φ(x1) = Triangle(x1) ∧
Small(x1) to ψ(x1) = [Triangle(x1) ∧ Small(x1)] ∨ [Circle(x1) ∧ ¬Clear(x1)];
ψ* is the union of the extensions of the two disjuncts.

Using the described basic operators, any kind of formula in the relational calculus,
extended with the above defined non-standard quantifiers, can be built up.

Basic mechanisms for deduction

As mentioned in Section 3, SMART-SHELL allows the background knowledge to
be described as a Horn clause theory, in which the non-operational predicates P(n) can
be defined by means of implication rules (1). We will now briefly describe how the
standard SLD-resolution mechanism can be extended in order to perform deduction
using all the learning examples belonging to F0 at the same time. Let us consider a
goal expressed in conjunctive form:
g = p1(o) ∧ p2(o) ∧ ... ∧ P1(n) ∧ P2(n) ∧ ...
and the implication rule:
q1(o) ∧ q2(o) ∧ ... → P1(n)    (12)
Variables are omitted for the sake of simplicity. By applying the basic step of Horn
clause resolution, the goal g can be reduced to the goal g':
g' = p1(o) ∧ p2(o) ∧ ... ∧ q1(o) ∧ q2(o) ∧ ... ∧ P2(n) ∧ ...
where the non-operational predicate P1(n) has been replaced by the body of the rule
(12), after applying the unification with the terms occurring in g. However, suppose
we know the extension g* on F0 of the operational subformula p1(o) ∧ p2(o) ∧ ... in
the goal g; then, the extension g'* can be easily computed by specializing g* with
the formula q1(o) ∧ q2(o), i.e., by using the specialization operators defined in the
previous section.
This basic deductive step corresponds to the one used in deductive data-bases,
which utilize the method of "queries and subqueries" [50]; in particular, SMART-
SHELL incorporates a deductive data-base of this form, which has been obtained by
extending Robinson's LOGLISP [51].
Inductive specialization and deductive steps can also be easily interleaved,
realizing an effective integration of analytical and empirical learning [27]: in this
framework, specialization steps allow one to modify the partial operational
descriptions obtained from the theory, thus improving their classification
performance. On the other hand, the deductive use of background knowledge supplies
a skeleton for the inductive process, limits the search space and gives structural
meaning to the obtained concept descriptions.
Finally, as the semantics of the predicates can be freely defined by the user,
he is also allowed to change it dynamically, in the sense that the shell provides a
mechanism to first define non-operational predicates through a set of Horn clauses
and then move them to an operational state, by deducing their operational form in a
context free environment. In this case the predicates' semantics is given
extensionally, by means of the relations built up during the former process.

CONCLUSIONS
In this paper we have described the tool SMART-SHELL, designed to ease
the development of learning systems oriented to classification and diagnostic expert
systems. The learning framework is based on an integrated paradigm allowing
empirical learning (i.e. induction) and analytic learning (i.e. explanation-based
learning) to be interleaved. This paradigm, which proved very effective in practice,
can be easily implemented using a deductive data-base. Then, the environment
SMART-SHELL can be considered as a special-purpose deductive data-base, extended
in order to support the development of knowledge-intensive learners. An important
feature of the system is the capability of handling fuzzy relations.
So far, SMART-SHELL has been used to develop four families of learners,
the best known being ML-SMART; they have been applied in several real-world
domains, such as pattern recognition [43] and fault diagnosis of electromechanical
equipment [45], among others. In these applications SMART-SHELL proved to be
reliable enough and usable even by people who did not participate in the
implementation of the tool itself (one version of ML-SMART has been developed by
SOGESTA S.p.A.). The help obtained in speeding up the prototyping time has
been evaluated as excellent when the tool is used by a trained programmer; several
prototypes have been developed in a few weeks.
The facility of handling fuzzy logic was also a key to this success,
especially in diagnostic problems, where coping with the vagueness of the terms used
by a human expert and with the approximation of the measured cues was a must.
Moreover, the possibility of automatically acquiring the required fuzzy set definitions
greatly enhances the system's usefulness.

REFERENCES
1. F. Hayes-Roth and J. McDermott, "An Interference Matching Technique for
Inducing Abstractions," Communications of the ACM, vol. 21, no. 5, pp. 401-
410, 1978.
2. R. S. Michalski, "Pattern Recognition as Rule-guided Inductive Inference,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 2, pp.
349-361, 1980.
3. S. A. Vere, "Induction of Concepts in the Predicate Calculus," in Proc. of
the Fourth IJCAI, pp. 281-287, Tbilisi, USSR, 1975.
4. P. H. Winston, "Learning Structural Descriptions from Examples," in The
Psychology of Computer Vision, ed. P.H. Winston, McGraw Hill, New York,
1975.
5. S. Minton, J.G. Carbonell, "Strategies for Learning Search Control
Rules: An Explanation-Based Approach," in Proc. IJCAI-87, pp. 228-235,
Milano, Italy, 1987.
6. L. Rendell, "A General Framework for Induction and a Study of Selective
Induction," Machine Learning, vol. I, pp. 177-226, 1986.
7. D.B. Lenat, "The Role of Heuristics in Learning by Discovery: Three Case
Studies," in Machine Learning, An Artificial Intelligence Approach, ed. R. S.
Michalski, J. G. Carbonell, T. M. Mitchell, pp. 243-306, Tioga Publishing
Company, 1983.
8. R. S. Michalski, J. Carbonell, and T. Mitchell, Machine Learning. An
Artificial Intelligence Approach. Vol. I, Tioga Publishing Company, Palo
Alto, CA, 1983.
9. R. S. Michalski, J. Carbonell, and T. Mitchell, Machine Learning. An
Artificial Intelligence Approach, Vol. 2, Morgan Kaufmann, Los Altos, CA,
1985.
10. R. S. Michalski and Y. Kodratoff, eds. : "Machine Learning: An Artificial
Intelligence Approach", vol. 3, Morgan Kaufmann, Palo Alto, CA, 1988.
11. Artificial Intelligence, Special Issue on Machine Learning, J. Carbonell
(Ed.), no. 1-3 (1989).
12. P. Langley, G.L. Bradshaw, and H.A. Simon, "Rediscovering Chemistry with
the Bacon System," in Machine Learning. An Artificial Intelligence Approach,
ed. R. S. Michalski, J. G. Carbonell, T. M. Mitchell, pp. 307- 330, Tioga
Publishing Company, 1983.
13. P. Langley, J.M. Zytkow, H.A. Simon, and G.L. Bradshaw, "The Search for
Regularities: Four Aspects of Scientific Discovery," in Machine Learning, An
Artificial Intelligence Approach, Vol. 2, ed. R. S. Michalski, J. G. Carbonell,
T. M. Mitchell, pp. 425-470, Morgan Kaufmann, Los Altos, CA, 1985.
14. B.C. Falkenhainer, R.S . Michalski, "Integrating Quantitative and Qualitative
Discovery: The ABACUS System," Machine Learning, no. 1-4, pp. 367-402,
1986.
15. R.S. Michalski and R.E. Stepp, "Learning from Observation: Conceptual
Clustering," in Machine Learning. An Artificial Intelligence Approach, ed. R.
S. Michalski, J. G. Carbonell, T. M. Mitchell, pp. 331-364, Tioga
Publishing Company, 1983.
16. M. Lebowitz, "Experiments with Incremental Concept Formation: UNIMEM,"
Machine Learning, no. 2-2, pp. 103-138, 1987.
17. D.H. Fisher, "Knowledge Acquisition Via Incremental Conceptual
Clustering," Machine Learning. no. 2-2, pp. 139-162, 1987.

18. T. M. Mitchell, "Generalization as Search," Artificial Intelligence, vol. 18, pp.


203-226, 1982.
19. J.E. Laird, P.S. Rosenbloom, and A. Newell, "Chunking in Soar: The
Anatomy of a General Learning Mechanism," Machine Learning, no. 1-1, pp.
11-46, 1986.
20. T. M. Mitchell, R. M. Keller, and S. J. Kedar-Cabelli, "Explanation-based
Generalization: a Unifying View," Machine Learning, vol. 1, pp. 47-80, 1986.
21. G. DeJong and R. Mooney, "Explanation-Based Learning: An Alternative
View," Machine Learning, vol. 1, pp. 145-176, 1986.
22. M.J. Pazzani, "Explanation-based learning for knowledge-based systems," Int. J.
of Man-Machine Studies, pp. 413-424, 1987.
23. R. Keller, "Defining Operationality for Explanation Based Learning," in
Proc. AAAI-87, pp. 482-487, Seattle, WA, 1987.
24. S. Rajamoney, G. DeJong, "The Classification, Detection and Handling of
Imperfect Theory Problems," in Proc. IJCAI-87, pp. 205-207, Milano, Italy,
1987.
25. M. Lebowitz, "Not The Path to Perdition: The Utility of Similarity Based
Learning," in Proc. IMAL-86, Les Arcs, 1986.
26. F. Bergadano and A. Giordana: "A Knowledge Intensive Approach to Concept
Induction", Proc. of the Fifth International Conference on Machine Learning,
Ann Arbor (1988).
27. F. Bergadano, A. Giordana and L. Saitta: "Concept Acquisition in an Integrated
EBL and SBL Environment", Proc. European Conf. on Artificial Intelligence
(Munich, Germany, 1988), pp. 363-368.
28. A. Danyluk: "The Use of Explanations for Similarity Based Learning", Proc.
IJCAI-87 (Milano, Italy, 1987), pp. 274-279.
29. IEEE, Proc. of the International Conference on Neural Networks, San Diego,
CA, 1987.
30. J.H. Holland, "Escaping Brittleness: The Possibility of General-Purpose
Learning Algorithms Applied to Parallel Rule Based Systems," in Machine
Learning, An Artificial Intelligence Approach, Vol. 2, ed. R. S. Michalski, J.
G. Carbonell, T. M. Mitchell, pp. 593-624, Tioga Publishing Company,
1983.
31. Y. Kodratoff, M. Manago, "Generalization and Noise", Int. J. of Man-Machine
Studies, 181-204 (1987).
32. J.R. Quinlan, "Induction of Decision Trees," Machine Learning, no. 1-1, pp.
81-106, 1986.
33. P. Clark, T. Niblett: "Induction in Noisy Domains", in Progress in Machine
Learning, I. Bratko and N. Lavrac (Eds.), Sigma Press (Wilmslow, UK, 1987),
pp.11-30.
34. J. Mingers: "An Empirical Comparison of Pruning Methods for Decision Tree
Induction", Machine Learning, 4, 227-243 (1989).
35. R.S. Michalski, Igor Mozetic, Jiarong Hong, and Nada Lavrac, "The AQ15
Inductive Learning System: An Overview and Experiments," in Proc. of the
International Meeting on Advances in Learning -IMAL, 1986.
36. J. C. Schlimmer and R. H. Granger, Jr., "Incremental Learning from Noisy
Data," Machine Learning, vol. I, pp. 317-354,1986.
37. P. Politakis, S. Weiss, "Using Empirical Analysis to Refine Expert System
Knowledge Bases," Artificial Intelligence, no. 22, pp. 23-48,1984.
38. R. Rada, "Gradualness Facilitates Knowledge Refinement," IEEE Trans. on
Pattern Analysis and Machine Intelligence, vol. PAMI-7, pp. 523-530, 1985.
39. A. Ralescu, J. Baldwin: "Concept Learning from Examples and Counter-
Examples", Int. J. of Man-Machine Studies, 329-354 (1989).
40. R. De Mori, L. Saitta: "Automatic Learning of Fuzzy Naming Relations over
Finite Languages", Information Sciences, 21, 93-139 (1980).
41. L. Lesmo, L. Saitta, P. Torasso: "Learning of Fuzzy Production Rules in
Medical Diagnosis", Invited paper in M. Gupta and E. Sanchez (Eds.),
'Approximate Reasoning in Decision Analysis', North-Holland Publ. Co.
(1982), pp. 249-260.
42. F. Bergadano, A. Giordana, and L. Saitta, "Learning from Examples in
Presence of Uncertainty," in Approximate Reasoning in Intelligent Systems
Decision and Control, ed. E. Sanchez and L. Zadeh, pp. 105-124, Pergamon
Press , 1986.
43. F. Bergadano, A. Giordana, and L. Saitta, "Concept Acquisition in Noisy
Environment," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.
PAMI-I0, 1988.
44. F. Bergadano, R. Gemello, A. Giordana and L. Saitta : "Smart: A Problem
Solver for Learning from Examples", Fundamenta Informaticae, 11. pp. 29-50
(1989).
45. R. S. Michalski, "A Theory and Methodology of Inductive Learning,"
Artificial Intelligence, 20, pp. 111-161, 1983.
46. J. D. Ullman: Principles of Database Systems, Computer Science Press (1983).
47. J. D. Ullman: "Implementation of Logical Query Languages for Databases",
ACM Trans. on Database Systems, 10, pp. 289-321 (1985).
48. R. Yager: "Quantified Propositions in a Linguistic Logic", International J. of
Man-Machine Studies, 195-227 (1983).
49. L. Zadeh: "A Computational Approach to Fuzzy Quantifiers in Natural
Languages", Computers and Mathematics with Applications, 9, 149-184 (1983).
50. L. Vieille: "Recursive Axioms in Deductive Databases: the Query Sub-query
Approach", Proc. of the 1st Int. Conf. on Expert Database Systems (Charleston,
SC, 1986).
51. J. A. Robinson and E. E. Sibert: "LOGLISP: An Alternative to Prolog",
Machine Intelligence 10, J. E. Hayes and D. Michie (eds), 399-419 (1982).
52. F. Bergadano, F. Brancadori, A. Giordana, L. Saitta: "A System that Learns
Diagnostic Knowledge in a Data-Base Framework", Proc. IJCAI Workshop on
Knowledge Discovery in Databases (Detroit, MI, 1989), pp. 4-15.
15
EVIDENTIAL REASONING
UNDER PROBABILISTIC AND FUZZY
UNCERTAINTIES

J. F. BALDWIN
Engineering Mathematics Dept
University of Bristol
Bristol BS8 1TR
England

1. INTRODUCTION

1.1 GENERAL KNOWLEDGE AND EVIDENCE

An expert's knowledge of an application is concerned with general tendencies, what


is likely to be the case, frequent conjunctions, rules of thumb and other forms of
statistical statements. An investigator may know that a certain type of crime is
common among criminals of a certain type, an insurance company may know that a
person with certain characteristics is a good risk, a doctor knows that certain
symptoms almost always mean the person is suffering from a given disease. The
conclusion in each of these cases comes from studying tendencies in a population of
relevant cases and using these to infer something about an individual case.

Rules of thumb such as
"most tall persons wear large shoes"
can be expressed as the rule
person X wears large shoes IF person X is tall : very likely
X is a variable which can be instantiated to any member assumed to be drawn at
random from the population of persons. This says that the head of the rule is very
likely given that the body of the rule is true. It makes a vague statement about the
conditional probability Pr(large shoes | tall). This probability represents the
proportion of persons who wear large shoes in some population of tall persons. It is a
statement about the population as a whole rather than a statement about any
particular individual person. If an individual person is known to be tall then one can
infer that the probability of this person wearing large shoes is very likely. In this
sense the variable in the rule is universally quantified,
i.e. ∀X Pr(X wears large shoes | X is tall) = very likely.

The use of "very likely" rather than a point probability value further complicates
matters. As a first approximation we might equate "very likely" with the interval
[0.9, 1]. This means that Pr(x wears large shoes | x is tall) lies in the interval [0.9, 1].
We could further express this as: the necessary support in favour of (x wears large
shoes | x is tall) is 0.9 and the necessary support in favour of (x does not wear large
shoes | x is tall) is 0. The term necessary support can be replaced with the term
"belief". We can also express this in the form of a mass assignment over the power
set of
{(x wears large shoes | x is tall), (x does not wear large shoes | x is tall)}
namely,
(x wears large shoes | x is tall) : 0.9
(x does not wear large shoes | x is tall) : 0
{(x wears large shoes | x is tall), (x does not wear large shoes | x is tall)} : 0.1
where an assignment of mass m to set Y means m is the probability associated with
exactly Y but not with any subset of Y. The meaning of these various terms will be
expanded upon later in the paper. In order to more adequately capture the true
semantics of the vague statement "very likely" we need to model this linguistic
term using a fuzzy set, [ZADEH 1965].
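As a small illustration (not taken from the paper's FRIL implementation), the mass assignment induced by an interval of support such as [0.9, 1] can be built mechanically; the Python fragment below assumes a frame containing just a proposition and its negation.

def mass_from_interval(low, high, prop="A"):
    # low goes to {prop}, 1 - high to {not prop}, the remainder to the whole frame
    return {frozenset({prop}): low,
            frozenset({"not " + prop}): 1.0 - high,
            frozenset({prop, "not " + prop}): high - low}

# "very likely" approximated by [0.9, 1]
print(mass_from_interval(0.9, 1.0, "wears large shoes"))
# {wears large shoes}: 0.9, {not wears large shoes}: 0.0, whole frame: 0.1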

1.2 AN EXAMPLE

Consider the following simple example. A bag contains 70% red balls and 30% blue
balls. Each ball is either large or small. 60% of the red balls are large and 40% of the
blue balls are large.

Problem 1a
What is the probability that a ball drawn randomly from the bag is large?
Of course this is a very elementary problem and can be solved by fusing the pieces
of information concerning the balls in the bag to calculate this probability. If {y1, y2,
y3, y4} stand for the probabilities {Pr(rl), Pr(rs), Pr(bl), Pr(bs)} respectively and r, l
signify "red", "large" respectively, then
y1 + y2 = 0.7 ; y3 + y4 = 0.3
y1 / (y1 + y2) = 0.6 ; y3 / (y3 + y4) = 0.4
so that y1 = 0.42, y2 = 0.28, y3 = 0.12 and y4 = 0.18,
from which Pr(l) = y1 + y3 = 0.54.
This is simply a probability logic problem. In the sequel this fusion of probabilistic
information will be done by means of a general assignment method.
Problem 1b
A ball drawn at random from the population is known to be large. What is the
probability that it is red?
The solution is given by y1 / (y1 + y3) = 0.7778 and comes from fusing the given
information using probability logic and calculating the required conditional
probability.
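The arithmetic of Problems 1a and 1b can be checked with a few lines of Python (purely illustrative):

# fuse the given proportions: 70% red, 60% of red are large, 40% of blue are large
y1 = 0.7 * 0.6    # Pr(rl) = 0.42
y2 = 0.7 * 0.4    # Pr(rs) = 0.28
y3 = 0.3 * 0.4    # Pr(bl) = 0.12
y4 = 0.3 * 0.6    # Pr(bs) = 0.18

pr_large = y1 + y3                      # Problem 1a: 0.54
pr_red_given_large = y1 / (y1 + y3)     # Problem 1b: 0.7778
print(round(pr_large, 2), round(pr_red_given_large, 4))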

Of course y1 / (y1 + y3) = Pr(l | r)Pr(r) / Pr(l), which is Bayes' theorem applied to this
problem. This can therefore be viewed as an updating problem in which the a priori
distribution {y1, y2, y3, y4} is updated using the certain information that the ball in
question is large.

Problem 2
The balls in the bag are shown, as a black and white image on a screen, one by one to
an observer. The observer is then asked if the third ball shown was red. The observer
believes that the third ball was large but is not certain of this fact. He expresses this
belief as Pr(third ball shown is large) = 0.8. He does not have information about the
colours of the balls shown. What should be his belief that it is red?
One possible answer to this problem is obtained by using Jeffrey's rule, [JEFFREY
1967], namely
Pr(third ball is r) = Pr(r | l)Pr(third ball is l) + Pr(r | s)Pr(third ball is s)
= 0.7778 * 0.8 + {y2 / (y2 + y4)} * 0.2
= 0.7778 * 0.8 + 0.6087 * 0.2 = 0.744
It looks as if we have used the theorem of total probabilities, namely,
Pr(third ball is r) = Pr(third ball is r | third ball is l)Pr(third ball is l)
+ Pr(third ball is r | third ball is s)Pr(third ball is s)
with the assumption that
Pr(third ball is r | third ball is l) = Pr(r | l) and
Pr(third ball is r | third ball is s) = Pr(r | s).
We do not have to make this assumption if the following philosophy is accepted. The
a priori distribution over the labels {rl, rs, bl, bs} is {y1, y2, y3, y4}. This is to be
updated using the specific information Pr'(l) = 0.8, where the ' is used to signify that
this is not the proportion of large balls in the population but a belief in one particular
ball being large. We could update to {y'1, y'2, y'3, y'4} by choosing the {y'i} such
that the relative information
I = Σ y'i ln (y'i / yi)
is minimised. This will be discussed further later. This forms the basis of the iterative
assignment method to be discussed in detail in a later section.
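The numbers of Problem 2 can be reproduced with the sketch below, which applies Jeffrey's rule directly; for a partition such as {large, small} this coincides with the minimum relative information update, but the code is not the iterative assignment algorithm described later.

y = {"rl": 0.42, "rs": 0.28, "bl": 0.12, "bs": 0.18}    # a priori distribution

pr_large_evidence = 0.8                  # specific evidence Pr'(l) for the third ball
pr_large_prior = y["rl"] + y["bl"]
pr_small_prior = y["rs"] + y["bs"]

# Jeffrey's rule: rescale each label within its block of the partition {large, small}
y_new = {}
for label, p in y.items():
    if label.endswith("l"):
        y_new[label] = p * pr_large_evidence / pr_large_prior
    else:
        y_new[label] = p * (1 - pr_large_evidence) / pr_small_prior

print(round(y_new["rl"] + y_new["rs"], 3))    # Pr(third ball is red) = 0.744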

1.3 RULE FORM OF KNOWLEDGE REPRESENTATION

Various forms of knowledge representation can be used to express the general
tendencies and specific information discussed above. The assignment methods
mentioned above can be used with various forms of knowledge representation, but in
this paper we will concentrate on a rule form like that used by Prolog, extended to
allow for uncertainties of both a probabilistic and a fuzzy kind to be expressed. The
methods developed in this paper are extensions of those used in the AI language
FRIL, [BALDWIN 1986, 1987] and [BALDWIN et al 1987].

Consider the following Prolog program.
married(X) :- middle_aged(X), has_children(X).
middle_aged(mary).
has_children(mary).
This says that any middle aged person who has children is married, that Mary is
middle aged and that Mary has children. We can conclude from this that Mary is
married. In Prolog we ask the query
?- married(mary)
to which we get the reply
yes.

Suppose it is known that at least 70% and at most 90% of middle aged persons who
have children are married. We will define the fuzzy term "middle aged" using the
fuzzy set middle_aged with membership function
χ_middle_aged(x) = (1/5)x - 7      for 35 ≤ x ≤ 40
                 = 1               for 40 ≤ x ≤ 50
                 = -(1/5)x + 11    for 50 ≤ x ≤ 55
                 = 0               elsewhere
We also define the fuzzy term "about_35" using the fuzzy set about_35 with
membership function
χ_about_35(x) = (1/5)x - 6      for 30 ≤ x ≤ 35
              = -(1/5)x + 8     for 35 ≤ x ≤ 40
              = 0               elsewhere
Suppose it is known that Mary is about 35 and it is believed with a probability of at
least 0.8 that Mary has children.

A more realistic program is
married(X) :- age(X, middle_aged), has_children(X) : [0.7, 0.9].
age(mary, about_35).
has_children(mary) : [0.8, 1].
which is interpreted as saying that the conditional probability that someone is
married if the person is middle aged and has children lies between 0.7 and 0.9, that
the person Mary is about 35 years old, and that the probability that Mary has children
lies between 0.8 and 1.

We can now ask how we answer the query
?- married(mary)
We would expect the answer to take the form of an interval containing the
probability that Mary is married.

The methods of inference developed below will allow us to answer this query for this
program. In deriving this interval both the probabilistic and fuzzy types of
uncertainty must be taken into account. For example, the rule talks about middle
aged persons while the age of Mary is given as "about 35". From a syntactic point of
view it would appear that the rule has no relevance to Mary, but from a semantic
point of view it does, since someone who is "about 35" is to some degree middle
aged. This degree depends on the definitions of the fuzzy sets "middle aged" and
"about 35". In order to answer the query given it is necessary to determine an
interval containing the conditional probability Pr{age(mary, middle_aged) |
age(mary, about_35)}. We term this process "semantic unification", [BALDWIN
1990a].

When the second argument of the age predicate is always a crisp set then this interval
is [0, 0], [1, 1] or [0, 1].
For example,
Pr{age(mary, [40,50]) | age(mary, [35,39])} = 0,
Pr{age(mary, [35,45]) | age(mary, [37,42])} = 1, and
Pr{age(mary, [40,50]) | age(mary, [45,55])} is contained in the interval [0, 1].
Semantic unification extends this to the case of fuzzy sets.
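For crisp intervals these three cases are simply disjointness, containment and partial overlap, as the following illustrative Python fragment shows:

def crisp_semantic_unification(target, given):
    # support interval for Pr{age(x, target) | age(x, given)}, both crisp intervals
    t_lo, t_hi = target
    g_lo, g_hi = given
    if g_hi < t_lo or g_lo > t_hi:       # disjoint intervals
        return (0, 0)
    if t_lo <= g_lo and g_hi <= t_hi:    # given contained in target
        return (1, 1)
    return (0, 1)                        # partial overlap

print(crisp_semantic_unification((40, 50), (35, 39)))    # (0, 0)
print(crisp_semantic_unification((35, 45), (37, 42)))    # (1, 1)
print(crisp_semantic_unification((40, 50), (45, 55)))    # (0, 1)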

A program can also contain universally quantified facts. For example


middle_aged(X) : [0.4, 0.5]
would be interpreted as saying that between 40% and 50% of the relevant population
of objects being considered are middle aged. While X can be instantiated to the
object mary, the statement
middle_aged(X) : [0.4, 0.5] with X = mary
is not interpreted in the same way as
middle_aged(mary) : [0.4, 0.5].
The former is a statement about the population as a whole and if Mary is an object
drawn at random from this population then Pr{ middle_aged(mary)} lies in [0.4, 0.5].
The latter statement makes no reference to the population but is a statement made
about the object Mary by inspecting this object independently of any population
statistics. The first statement is concerned with a general tendency while the second
statement is specific to the object concerned.

1.4 AIMS OF PAPER

In this paper we will discuss methods for answering queries of the types given above
from a knowledge base expressed in rule form containing statements representing
general tendencies and also specific facts. The specific facts expressed in
probabilistic terms will be used as evidences to update the family of possible a priori
distributions obtained from the relevant general statements, and this update is used to
provide the answer to the given query. The rules and facts can contain both
probabilistic and fuzzy uncertainties.

The inference method for processing the knowledge base to answer queries will use
the following three methods
(1) general assignment method
(2) iterative assignment method
(3) semantic unification.

In special cases the inference process simplifies to using the theorem of total
probabilities if only general statements from the same population are used or
Jeffrey's rule when both general statements and specific evidences are used. This is
the inference mechanism of the AI language FRIL.

Each of these methods requires the information in the form of a mass assignment
over a frame of discernment whose elements are labels formed from the information.
We discuss this more fully in the appropriate sections that follow. A general
treatment will not be given here and the reader is expected to generalise for
him/herself from those cases discussed. Other aspects can be found in [BALDWIN
1990b, 1990c].

2. MASS ASSIGNMENTS, SUPPORT MEASURES AND SUPPORT PAIRS

2.1 LABELS AND FRAME OF DISCERNMENT

A knowledge base statement, either in the form of a rule or a fact, is converted into
the form of a mass assignment over a set of labels. Each label is a concatenation of
instantiations of the proposition variables and the proposition variables come from
all the information of the knowledge base relevant to answering the given query. The
following example will illustrate this. A general theory in terms of inference
diagrams and logic proof paths can be given but space does not allow this to be
included here.
Consider the knowledge base
fly(X) :- bird(X) : [0.9,0.95].
bird(X) :- penguin(X).
fly(X) :- penguin(X) : [0,0].
penguin(obj) : [0.4,0.4].
bird(obj) : [0.9, 1].
etc

and the query


?- fly(obj)

To answer this query we form the following inference diagram



(Inference diagram: the rules fly(X) :- bird(X) with support [0.9, 0.95] and
fly(X) :- penguin(X) with support [0, 0], and the facts bird(obj) : [0.9, 1] and
penguin(obj) : [0.4, 0.4], linked to the query fly(obj).)

from which we extract the propositional variables
B = bird(obj) ; F = fly(obj) ; P = penguin(obj)
Each of these variables can be instantiated to "true" or "false". We represent the set
of instantiations for a propositional variable X by {x, ¬x}. A label is a possible
instantiation of the concatenation of the three variables B, P and F, written as BPF.
Thus the possible set of labels corresponding to the frame of discernment is
L = {¬b¬p¬f, ¬b¬pf, ¬bp¬f, ¬bpf, b¬p¬f, b¬pf, bp¬f, bpf}.

For any knowledge base consisting of facts and rules and for any query, a frame of
discernment can be established using this method of constructing an inference
diagram and extracting the propositional variables. The inference diagram is
obtained by using the unification and backtracking mechanisms of Prolog with
extensions to include semantic unification as discussed later.

2.2 MASS ASSIGNMENTS

A mass assignment over a finite frame of discernment X is a function
m : P(X) → [0, 1], where P(X) is the power set of X,
such that
m(∅) = 0 and
Σ m(A) = 1, the sum being taken over all A ∈ P(X),
and corresponds to the basic probability assignment function of the Shafer/Dempster
theory of evidence, [SHAFER 1976]. m(A) represents a probability mass
assigned exactly to A. It does not include any masses assigned to subsets of A.

As an example consider the specific evidences
penguin(obj) : [0.4, 0.4].
bird(obj) : [0.9, 1].
given above.
The first is equivalent to the following mass assignment over the set of labels L
m({¬bp¬f, ¬bpf, bp¬f, bpf}) = 0.4
m({¬b¬p¬f, ¬b¬pf, b¬p¬f, b¬pf}) = 0.6
which can be written as
{_ p _} : 0.4
{_ ¬p _} : 0.6
where _ can be instantiated to the appropriate proposition or its negation.
Similarly the second evidence is equivalent to the mass assignment
{b _ _} : 0.9
{_ _ _} : 0.1
Consider the general statements
fly(X) :- bird(X) : [0.9,0.95].
bird(X) :- penguin(X).
fly(X) :- penguin(X) : [0,0].
The second of these clauses says that the labels {¬bpf, ¬bp¬f} are not possible. The
third says that {bpf} is not possible. The two statements combined say that the
labels {¬bpf, ¬bp¬f, bpf} are not possible. We can therefore express the first clause
as a mass assignment over the reduced set of labels
L' = {¬b¬p¬f, ¬b¬pf, b¬p¬f, b¬pf, bp¬f}, by combining the following two
evidences, each expressed as a mass assignment over L':
(1) {b¬p _, bp¬f} : k , {¬b¬p _} : 1-k
(2) {b¬pf} : 0.9k , {¬b¬p _, b _ ¬f} : 1-0.9k
corresponding to
Pr{bird(obj)} = k, Pr{¬bird(obj)} = 1-k
and
Pr{bird(obj) ∧ fly(obj)} = 0.9k, Pr{¬(bird(obj) ∧ fly(obj))} = 1-0.9k.
The combination of these two evidences, using the general assignment method
defined below, gives the mass assignment over L' as
{b¬pf} : 0.9k
{¬b¬p _} : 1-k
{b _ ¬f} : 0.1k
for the combined relevant general statements in the knowledge base. The conditional
statements of rules can always be treated in this way. Pure logic rules simply reduce
the set of possible labels.

2.3 SUPPORT PAIRS

We use the concept of belief and plausibility measures of [SHAFER 1976] to define
necessary support and possible support measures. Names are changed to be
consistent with the notation used in support logic programming, [BALDWIN 1986]
and the FRIL language [Baldwin et al 1987], and to avoid confusion with

conclusions and derived results based on the use of the Dempster rule of combining
evidences. The methods given here do not use the Dempster rule and the necessary
and possible supports are more in keeping with upper and lower probabilities,
[DUBOIS, PRADE 1986].

A necessary support measure is a function
Sn : P(X) → [0, 1]
where X is a set of labels and P(X) is the power set of X,
that satisfies the following axioms:
Axiom 1 (boundary condition): Sn(∅) = 0 and Sn(X) = 1, where ∅ is the empty set.
Axiom 2: Sn(A1 ∪ A2 ∪ ... ∪ An) ≥ Σi Sn(Ai) - Σ(i<j) Sn(Ai ∩ Aj)
+ ... + (-1)^(n+1) Sn(A1 ∩ A2 ∩ ... ∩ An)
for every collection of subsets of X.
For each A ∈ P(X), Sn(A) is interpreted as the necessary support, based on available
evidence, that a given label of X belongs to the set A of labels.
When the sets A1, A2, ..., An in axiom 2 are pairwise disjoint, i.e.
Ai ∩ Aj = ∅ for all i, j ∈ {1, 2, ..., n} such that i ≠ j,
the axiom requires that the necessary support associated with the union of the sets is
not smaller than the sum of the necessary supports pertaining to the individual sets.
The basic axiom of necessary support measures is thus a weaker version of the
additivity axiom of probability theory.
It is easy to show that axiom 2 above implies that for every A, B ∈ P(X), if A ⊆ B,
then Sn(A) ≤ Sn(B),
and also that
Sn(A) + Sn(Ā) ≤ 1, where Ā denotes the complement of A.

Possible Support Measure

Associated with each necessary support measure is a possible support measure Sp,
defined by the equation
Sp(A) = 1 - Sn(Ā)
for all A ∈ P(X).
Similarly
Sn(A) = 1 - Sp(Ā)
Necessary support measures and possible support measures are therefore mutually
dual and it is easy to show that
Sp(A) + Sp(Ā) ≥ 1

Given a basic probability assignment m, a necessary support measure and a possible
support measure are uniquely determined by the formulae
Sn(A) = Σ m(B), the sum being over all B ⊆ A,
and
Sp(A) = Σ m(B), the sum being over all B with A ∩ B ≠ ∅,
which are applicable for all A ∈ P(X).
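These two formulae translate directly into code; the short Python sketch below (with an invented mass assignment over four labels) computes the necessary and possible supports of a set of labels.

def Sn(mass, A):
    # necessary support: total mass of the focal elements contained in A
    return sum(m for B, m in mass.items() if B <= A)

def Sp(mass, A):
    # possible support: total mass of the focal elements intersecting A
    return sum(m for B, m in mass.items() if B & A)

mass = {frozenset({"bp"}): 0.35,
        frozenset({"bnp"}): 0.55,
        frozenset({"bp", "nbp"}): 0.05,
        frozenset({"bnp", "nbnp"}): 0.05}
A = frozenset({"bp"})
print(round(Sn(mass, A), 2), round(Sp(mass, A), 2))    # 0.35 0.4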

Focal Elements
Every set A ∈ P(X) for which m(A) > 0 is called a focal element of m. We can
represent the mass assignment as (m, F) where F is the set of focal elements.

Total ignorance is expressed in terms of the mass assignment by
m(X) = 1 and m(A) = 0 for all A ≠ X.
Using the formula above for Sn in terms of m, we can therefore also express total
ignorance as
Sn(X) = 1 and Sn(A) = 0 for all A ≠ X.
Total ignorance in terms of the possible support measure is
Sp(∅) = 0 and Sp(A) = 1 for all A ≠ ∅.

A support pair for A ∈ P(X) is given by [MIN Sn(A), MAX Sp(A)] and this
defines an interval containing Pr(A), where the MIN and MAX are taken over the set of
values of any possible parameters that Sn(A) and Sp(A) may depend on. This will be
illustrated later.
3. GENERAL ASSIGNMENT METHOD

3.1 COMBINING MASS ASSIGNMENTS

Let m1 and m2 be two mass assignments over the power set P(X), where X is a set
of labels. Evidence 1 and evidence 2 are denoted by (m1, F1) and (m2, F2)
respectively, where F1 and F2 are the sets of focal elements of P(X) for m1 and m2
respectively.

Suppose F1 = {L1k} for k = 1, ..., n1 and F2 = {L2k} for k = 1, ..., n2;

then Lij ∈ P(X) is a set of labels for which mi(Lij) ≠ 0.

Let (m, F) be the evidence resulting from combining evidence 1 with evidence 2
using the general assignment method. This is denoted as
(m, F) = (m1, F1) ⊕ (m2, F2)
where
F = {L1i ∩ L2j | m(L1i ∩ L2j) ≠ 0}
m(Y) = Σ m'(L1i ∩ L2j), the sum being over all i, j such that L1i ∩ L2j = Y, for any Y ∈ F,

where m'(L1i ∩ L2j), for i = 1, ..., n1 and j = 1, ..., n2, satisfies

Σj m'(L1i ∩ L2j) = m1(L1i)    for i = 1, ..., n1

Σi m'(L1i ∩ L2j) = m2(L2j)    for j = 1, ..., n2

m'(L1i ∩ L2j) = 0 if L1i ∩ L2j = ∅ (the empty set), for i = 1, ..., n1 and j = 1, ..., n2.

The problem of determining the mass assignment m is an assignment problem, as
depicted in the following diagram.

If there are more than two evidences to combine then they are combined two at a
time. For example, to combine (m1, F1), (m2, F2), (m3, F3) and (m4, F4), use
(m, F) = (((m1, F1) ⊕ (m2, F2)) ⊕ (m3, F3)) ⊕ (m4, F4)

(Assignment tableau: each row corresponds to a focal element L1i of evidence 1 with
row total m1(L1i), each column to a focal element L2j of evidence 2 with column
total m2(L2j), and each cell contains the set L1i ∩ L2j with mass m'(L1i ∩ L2j).)

The labels in a cell are the intersection of the subset of labels of evidence 1 associated
with the row of the cell and the subset of labels of evidence 2 associated with the
column of the cell.

The mass assignment entry in a cell is 0 if the intersection of the subset of labels of
evidence 2 associated with the column of the cell and the subset of labels of evidence
1 associated with the row of the cell is empty.

The mass assignment in a cell is associated with the subset of labels in the cell.

The sum of the cell mass assignment entries in a row must equal the mass
assignment associated with m1 in that row.

The sum of the cell mass assignment entries in a column must equal the mass
assignment associated with m2 in that column.

If there are no loops, where a loop is formed by a movement from a non-zero
assignment cell to other non-zero assignment cells by alternating vertical and
horizontal moves returning to the starting point, then the general assignment problem
gives a unique solution for the mass assignment cell entries. If a loop exists then it is
possible to add and subtract a quantity from the assignment values around the loop
without violating the row and column constraints, and the solution will not then be
unique. If a non-unique solution exists then the family of solutions can be
parametrised with known constraints on the parameter values. These possible
parameter values must be taken into account when determining support pairs from
the necessary and possible support measures.
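As a concrete check of these constraints, the sketch below (my own illustration in Python, not code from the paper) represents the focal elements as frozensets, stores a candidate cell allocation m' keyed by (row, column), verifies the row, column and empty-intersection conditions above, and pools the cells with equal intersections into the combined assignment m. It is applied to the specific evidences of the example in section 3.2 below, with the loop parameter x fixed at 0.05.

def combine(F1, m1, F2, m2, m_prime, tol=1e-9):
    # row constraints: the cells in row i must sum to m1(L1i)
    for i, L1 in enumerate(F1):
        assert abs(sum(m_prime.get((i, j), 0.0) for j in range(len(F2))) - m1[L1]) < tol
    # column constraints: the cells in column j must sum to m2(L2j)
    for j, L2 in enumerate(F2):
        assert abs(sum(m_prime.get((i, j), 0.0) for i in range(len(F1))) - m2[L2]) < tol
    # empty intersections carry no mass; equal intersections are pooled into m
    m = {}
    for i, L1 in enumerate(F1):
        for j, L2 in enumerate(F2):
            Y, cell = L1 & L2, m_prime.get((i, j), 0.0)
            if not Y:
                assert cell == 0.0
            elif cell > 0.0:
                m[Y] = m.get(Y, 0.0) + cell
    return m

# evidences of section 3.2: penguin(obj) : [0.4, 0.4] and bird(obj) : [0.9, 1]
F1 = [frozenset({'bp', '¬bp'}), frozenset({'b¬p', '¬b¬p'})]    # {_p}, {_¬p}
m1 = {F1[0]: 0.4, F1[1]: 0.6}
F2 = [frozenset({'bp', 'b¬p'}), frozenset({'¬bp', '¬b¬p'})]    # {b_}, {¬b_}
m2 = {F2[0]: 0.9, F2[1]: 0.1}
x = 0.05
m_prime = {(0, 0): 0.4 - x, (0, 1): x, (1, 0): 0.5 + x, (1, 1): 0.1 - x}
print(combine(F1, m1, F2, m2, m_prime))
# {bp}: 0.35, {¬bp}: 0.05, {b¬p}: 0.55, {¬b¬p}: 0.05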

3.2 THE BIRD PENGUIN EXAMPLE REVISITED

Consider the example discussed above of combining the two evidences

(1) {b¬p_, bp¬f} : k ; {¬b¬p_} : 1−k
(2) {b¬pf} : 0.9k ; {¬b¬p_, b_¬f} : 1−0.9k

Using the general assignment method we obtain

                           0.9k                1−0.9k
                           {b¬pf}              {¬b¬p_, b_¬f}
k : {b¬p_, bp¬f}           {b¬pf} : 0.9k       {b_¬f} : 0.1k
1−k : {¬b¬p_}              ∅ : 0               {¬b¬p_} : 1−k

giving the result quoted previously.

Consider combining the specific evidences expressed as mass assignments over the
set
X = {bp, b¬p, ¬bp, ¬b¬p}
in the example above, namely
penguin(obj) : [0.4, 0.4].
bird(obj) : [0.9, 1].
We have
(1) {_p} : 0.4 , {_¬p} : 0.6
(2) {b_} : 0.9 , {¬b_} : 0.1
Using the general assignment method we obtain

                  0.9 : {b_}            0.1 : {¬b_}
0.4 : {_p}        {bp} : 0.4 − x        {¬bp} : x
0.6 : {_¬p}       {b¬p} : 0.5 + x       {¬b¬p} : 0.1 − x

where 0 ≤ x ≤ 0.1.
An abbreviated form of labelling is used for convenience.

The necessary and possible supports for the various elements of X are given by
Sn(bp) = 0.4 − x ; Sp(bp) = 0.4
Sn(b¬p) = 0.5 + x ; Sp(b¬p) = 0.6
Sn(¬bp) = 0 ; Sp(¬bp) = x
Sn(¬b¬p) = 0 ; Sp(¬b¬p) = 0.1 − x
from which we can calculate the support pairs
bp : [0.3, 0.4] ; b¬p : [0.5, 0.6] ; ¬bp : [0, 0.1] ; ¬b¬p : [0, 0.1]

3.3 ANOTHER EXAMPLE

Consider the example:

90% of birds can fly ------------------------------- (1)
No penguins can fly ------------------------------- (2)
All penguins are birds ---------------------------- (3)
70% of objects exhibited are birds -------------- (4)
5% of objects exhibited are penguins ----------- (5)

which in program form would be

fly(X) :- bird(X) : [0.9, 0.9].
fly(X) :- penguin(X) : [0, 0].
bird(X) :- penguin(X) : [1, 1].
bird(X) : [0.7, 0.7].
penguin(X) : [0.05, 0.05].

We can ask the following question: what is the probability that an object drawn at
random can fly?

To answer the query we combine the following mass assignments over the set of labels
using the general assignment method.

(1) {b¬p¬f, b¬pf, bp¬f} : 0.7 ; {¬b¬p¬f, ¬b¬pf} : 0.3          using (4)
(2) {bp¬f} : 0.05 ; {¬b¬p¬f, ¬b¬pf, b¬p¬f, b¬pf} : 0.95        using (5)
(3) {b¬pf} : 0.9k ; {¬b¬p¬f, ¬b¬pf, b¬p¬f, bp¬f} : 1 − 0.9k    using (1)

where k is the probability assigned to {b¬p¬f, b¬pf, bp¬f}.

For this particular example the solution is easily found by elementary analysis to be
{b¬pf} : 0.63
{bp¬f} : 0.05
{b¬p¬f} : 0.02
{¬b¬pf, ¬b¬p¬f} : 0.3
so that the answer to the query is that Pr{x can fly} lies in the interval [0.63, 0.93],
since the 0.3 associated with {¬b¬pf, ¬b¬p¬f} could all be associated with
¬b¬pf, although this is not necessarily the case. This conclusion is expressed in the
form of a support pair. We can obtain this result using the general assignment method as
follows:

Combining (1) and (2) gives

                                      Evidence 2
                             0.05                   0.95
                             {bp¬f}                 {¬b¬p_, b¬p_}
Evidence 1
0.7 : {b¬p_, bp¬f}           {bp¬f} : 0.05          {b¬p_} : 0.65
0.3 : {¬b¬p_}                ∅ : 0                  {¬b¬p_} : 0.3

The mass assignment resulting from combining (1) and (2) is thus

{bp¬f} : 0.05
{b¬pf, b¬p¬f} : 0.65
{¬b¬pf, ¬b¬p¬f} : 0.3
This is now combined with (3) as follows

                                      Evidence 3
                             0.9k                   1 − 0.9k
                             {b¬pf}                 {¬b¬p_, b_¬f}
Evidence 1, 2
0.05 : {bp¬f}                ∅ : 0                  {bp¬f} : 0.05
0.65 : {b¬p_}                {b¬pf} : 0.9k          {b¬p¬f} : 0.65 − 0.9k
0.3 : {¬b¬p_}                ∅ : 0                  {¬b¬p_} : 0.3

where 0.9k + 0.05 + (0.65 − 0.9k) = k

so that k = 0.7, giving the combined mass assignment

{bp¬f} : 0.05
{b¬pf} : 0.63
{b¬p¬f} : 0.02
{¬b¬pf, ¬b¬p¬f} : 0.3

4. ITERATIVE ASSIGNMENT METHOD

4.1 UPDATING PROBLEM

Suppose an apriori mass assignment ma is given over the focal set A whose
elements are subsets of X, i.e. elements of the power set P(X), where X is a set of labels. This
assignment represents general tendencies and is derived from statistical
considerations of some sample space or from general rules applicable to such a space.

Suppose we also have a set of specific evidences {E1, E2, ..., En} where, for each i,
Ei is (mi, Fi), Fi being the set of focal elements of P(X) for Ei and mi the mass
assignment for these focal elements. These evidences are assumed to be relevant to
some object and derived by consideration of this object alone, not influenced by
the sample space of objects from which the object came.

We wish to update the apriori assignment ma with {E1, ..., En} to give the updated
mass assignment m such that the minimum information principle concerned with the
relative information of m given ma is satisfied.

4.2 ALGORITHM FOR SIMPLE CASE

The minimum information principle

Let p be an apriori distribution defined over the set of labels X.

Let specific evidences E1, ..., En be given where Ei expresses a probability
distribution over a partition of X.

Let p' be a distribution such that

Σ p'(x) ln(p'(x) / p(x)), the sum over x ∈ X,

is minimised subject to the constraints E1, ..., En.

p' is said to satisfy the minimum information principle for updating the distribution p
over X with specific evidences E1, ..., En, where each Ei is expressed as a
distribution over a partition of X.

The sequential iterative assignment algorithm updates p using E1 to obtain the
update p1 satisfying the minimum information principle; p1 is similarly updated
using E2 to obtain p2, which in turn is updated, and so on, using En to obtain pn. At each stage
of this process only the evidence used for updating is necessarily satisfied, and
evidences used previously will no longer necessarily be satisfied. We therefore replace
the apriori with pn and repeat the process. The iteration is continued until a pn is
found which satisfies all the evidences E1, ..., En.

This iterative process in fact converges to the solution which satisfies the minimum
information principle of minimising the relative information with respect to the
apriori p subject to the constraints E1, ..., En. The multi-constraint optimisation
problem is therefore solved by a succession of single-constraint optimisation
problems and iterating.
The single constraint optimisation problem has a particularly simple algorithm for its
solution which we will now consider.

Let p(r−1) = p, say, be updated to p(r) = p', say, using Er with the following
algorithm, which gives a p' such that

Σ p'(x) ln(p'(x) / p(x)), the sum over x ∈ X,

is minimised subject to the constraint Er being satisfied.

Let the partition for evidence Er be {X1, ..., Xk} with probability distribution
{Pr(Xi)} given.

Let Ki = Σ p(x), the sum over x ∈ Xi, for i = 1, ..., k

then

p'(x) = p(x)·Pr(Xi) / Ki   for every label x of X, where Xi is the partition block containing x.

The algorithm is particularly simple when the apriori is expressed as a probability


distribution over the set of labels and the evidences as a distribution over a partition
of the set of labels.

We can give a pictorial representation for this algorithm. The rows of the tableau
are labelled by the labels li of X with their apriori probabilities pi, and the columns
by the partition blocks Xj with their probabilities Pr(Xj). The cell in row li and
column Xj contains

li : Kj·pi·Pr(Xj)   if li ∈ Xj,   and 0 : 0 otherwise,

where

Kj = 1 / Σ pk, the sum over all k such that lk ∈ Xj.

4.3 GENERALISATION FOR EVIDENCES EXPRESSED AS MASS ASSIGNMENTS

If the evidences are expressed as mass assignments over X, with the apriori
assignment still being a probability distribution over the set of labels X, then a more
complicated case must be considered.

Let p(r−1) = p be denoted as in 4.2 and let Er be the mass assignment
{Xrk : mrk, for k = 1, ..., nr}
where Xrk is a subset of X for all k and mrk is the mass assigned to Xrk.

p'(x) = Σ p(x)·mrk·Kk, the sum over all k with x ∈ Xrk, for every label x

where

Kk = 1 / Σ ps, the sum over all s such that ls ∈ Xrk.
We can express this in pictorial form. The rows of the tableau are labelled by the
labels li of X with their apriori probabilities pi, and the columns by the focal elements
Xrk of Er with their masses mrk. The cell in row li and column Xrk contains

li : Kk·pi·mrk   if li ∈ Xrk,   and 0 otherwise,

where

Kk = 1 / Σ ps, the sum over all s such that ls ∈ Xrk.

The update for li is p'i, the sum of the cell entries in its row (for example
p'i = tk + tq when the only non-zero entries in the row are tk = Kk·pi·mrk and tq = Kq·pi·mrq).

Each column constraint is satisfied. The labels in the tableau cells are all labels of X,
since a focal element of Er intersected with a label in the apriori gives that
apriori label. The update of a label is the sum of all the cell assignments associated
with that label. A cell in a row whose apriori label is not a member of the cell's
column focal element of Er has a zero mass assignment, associated with the empty
set.
In this case the update solution p' satisfies the following relative information
optimisation problem:

Σ p'(x) ln(p'(x) / p(x)), the sum over x ∈ X,

is minimised subject to the constraints

Sn(Y) ≤ p'(Y) ≤ Sp(Y)   for all subsets Y ∈ P(X), where p'(Y) = Σ p'(x) over x ∈ Y,

and where Sn(Y) and Sp(Y) are determined from the mass assignment (mr, Fr).
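A sketch of this update (my own illustration, assuming frozenset focal elements and a dictionary apriori, not code from the paper) is:

def update_with_mass_assignment(p, Er):
    # p: apriori probabilities per label; Er: mass mrk per focal element Xrk
    p_new = {x: 0.0 for x in p}
    for Xrk, mrk in Er.items():
        K = 1.0 / sum(p[x] for x in Xrk)      # column normaliser Kk
        for x in Xrk:
            p_new[x] += p[x] * mrk * K        # cell entry Kk.pi.mrk
    return p_new

# toy example: apriori over {a, b, c}; evidence {a, b} : 0.6, {b, c} : 0.4
p = {'a': 0.2, 'b': 0.3, 'c': 0.5}
Er = {frozenset({'a', 'b'}): 0.6, frozenset({'b', 'c'}): 0.4}
print(update_with_mass_assignment(p, Er))
# a gets 0.2*0.6/0.5 = 0.24, b gets 0.3*0.6/0.5 + 0.3*0.4/0.8 = 0.51, c gets 0.25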

4.4 GENERALISATION FOR BOTH APRIORI AND EVIDENCES EXPRESSED AS MASS
ASSIGNMENTS

In this case the intersection of the row subset of labels of the apriori assignment with
the column subset of labels of the evidence assignment for a given cell of the tableau
is a subset of X, i.e. an element of P(X). When this intersection is the empty set the mass
assignment for that cell is zero. When the intersection is not empty the mass
assignment is the product of the row apriori assignment and the column evidence
assignment, scaled by the K multiplier for that column. The K multiplier for a
column is the reciprocal of the sum of the apriori row assignments corresponding to those cells of the
tableau in the column which have non-empty label intersections. The update is a
mass assignment over the set of subsets of labels appearing in the cells of the tableau. The
update can therefore be over a different set of subsets of labels from that of the apriori.
Iteration proceeds as before and convergence will be obtained with a mass
assignment over P(X) which will correspond to a family of possible probability
distributions over X.

We can give a pictorial view of the algorithm. The apriori is the mass assignment
(t, T), where T = {T1, ..., Tm}, t = {t1, ..., tm} and each Ti is an element of P(X).
The rows of the tableau are labelled by the apriori focal elements Tv with their
masses tv, and the columns by the focal elements Xrk of the evidence Er with their
masses mrk. The cell in row Tv and column Xrk contains the label set Tv ∩ Xrk with
mass

Kk·tv·mrk   if Tv ∩ Xrk ≠ ∅,   and 0 otherwise,

where

Kk = 1 / Σ ts, the sum over all s such that Ts ∩ Xrk ≠ ∅.
The apriori assignment was also a family of possible probability distributions. For
any one of these, the calculation is that of 4.3 and satisfies the minimum information
principle where the constraints are in terms of necessary and possible supports from
the evidence assignments. A member of the apriori set of possible probability
distributions over the label set X will be updated with the evidences E1, ..., Er to a
final probability distribution over X satisfying the minimum information principle.
Each member of the set of possible apriori probability distributions will, in general, be updated
to a different final probability distribution. The final set of distributions
can be expressed as an assignment over P(X). This is what the algorithm described
above does in this case. The calculation is no more involved than for the other more
simple cases, apart from having to determine the intersections for finding the subset
of labels for each cell and taking note of these in the final update.

The examples which follow will illustrate the method.


In this case the updating tableau can contain loops in a similar manner to the general
assignment case. The loop can be treated in exactly the same way as for the general
assignment method. This will be illustrated in the examples that follow. Each
solution of the loop satisfies the minimum information principle.
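A single updating step of this generalised case can be sketched as follows (again my own illustration with frozenset focal elements, not code from the paper): each apriori focal element is intersected with each evidence focal element, and the non-empty intersections receive the normalised products of masses.

def update_mass_with_mass(apriori, Er):
    updated = {}
    for Xrk, mrk in Er.items():
        # column normaliser: sum of apriori masses with non-empty intersection
        K = 1.0 / sum(tv for Tv, tv in apriori.items() if Tv & Xrk)
        for Tv, tv in apriori.items():
            cell = Tv & Xrk
            if cell:
                updated[cell] = updated.get(cell, 0.0) + K * tv * mrk
    return updated

apriori = {frozenset({'a', 'b'}): 0.6, frozenset({'c'}): 0.4}
Er = {frozenset({'b', 'c'}): 1.0}
print(update_mass_with_mass(apriori, Er))
# {frozenset({'b'}): 0.6, frozenset({'c'}): 0.4}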

5. EXAMPLES OF USE OF ITERATIVE ASSIGNMENT METHOD

5.1 A SIMPLE RULE SYSTEM

a(X) :- b(X), c(X) : [0.9, 0.9], [0, 0]. ------------------------------------- (1)
c(X) :- d(X) : [0.85, 1], [0, 0]. -------------------------------------------- (2)
b(mary) : [0.8, 0.8]. --------------------------------------------------------- (3)
d(mary) : [0.95, 0.95]. ---------------------------------------------------- (4)

This is a simple FRIL type program. X is a variable and a, b, c, d are predicates. The
first two sentences are rules which express general statements about persons and the
third and fourth are facts about a specific person mary.

If a rule contains one list after the colon then this gives the interval containing the
probability of the head of the rule given the body of the rule is true. If the list after
the colon contains two lists then the first gives the interval for the probability of the head of the rule
given the body of the rule is true, and the second gives the interval for the probability
of the head of the rule given the body of the rule is false.

The first rule says that for any person X the probability that (X is a) given that (X is
b) and (X is c) is 0.9. This expresses the fact that 90% of persons who are both b and
c are also a. It also says that (X is a) cannot be true unless both b and c are satisfied.

The second rule says that at least 85% of persons who are d are also c while no
person who is not a d can be a c.

We can ask the query in FRIL

?- a(mary).

to determine the support pair for (Mary is a).

In this example the rules are used to determine a family of apriori assignments over
the two sets of labels
{ABC} and {CD}
where A, B, C, D denote a or ¬a, b or ¬b, c or ¬c, d or ¬d respectively. Rule (2)
can be used to construct a family of apriori assignments for {CD} which can be
updated using (4) for the specific person Mary, and from this a support pair for
Pr{c(mary)} determined. This can be used with (3) to update a family of apriori
assignments determined from (1) for the set of labels {ABC}. From this update
Pr{a(mary)} can be determined.

Alternatively we could update the family of apriori assignments over the set of labels
{ABCD}, constructed using rules (1) and (2), with the specific evidences (3) and (4) and
determine Pr{a(mary)} from the final update.

These two approaches are equivalent, and the first approach decomposes the problem
of finding Pr{a(mary)} into two sub-problems. Decomposition will not be discussed
in detail in this paper but it is important to reduce the computational burden
associated with updating over a large set of labels.

The computation of the first approach is shown below.

Labels with apriori assignment

y1 : cd
y2 : c¬d
y3 : ¬cd
y4 : ¬c¬d

Rule (2) constrains {yi} such that
y1 / (y1 + y3) = [0.85, 1] and y2 / (y2 + y4) = 0
so that
y1 = [0.85, 1]·k where k = y1 + y3 for 0 < k ≤ 1
y2 = 0
Thus
{_d} : k , {¬c¬d} : 1−k ------------------------------------ (5)
{cd} : 0.85k , {¬c_} : 1−k , {cd, ¬c_} : 0.15k -------- (6)
(5) and (6) can be combined using the general assignment method to form a family
of apriori assignments over this set of labels

                     0.85k               1−k                  0.15k
                     {cd}                {¬c_}                {cd, ¬c_}
k : {_d}             {cd} : 0.85k        {¬cd} : 0+x          {_d} : 0.15k−x
1−k : {¬c¬d}         ∅ : 0               {¬c¬d} : 1−k−x       {¬c¬d} : 0+x

where 0 ≤ x ≤ MIN{0.15k, 1−k}

giving the apriori assignment
{cd} : 0.85k , {¬cd} : x , {¬c¬d} : 1−k−x , {_d} : 0.15k−x
which can be updated using
{_d} : 0.95 , {_¬d} : 0.05
by the iterative assignment method as follows

                        0.95                    0.05
Apriori                 {_d}                    {_¬d}           Update
0.85k : {cd}            0.8075                  0               0.8075
x : {¬cd}               0.95x/k                 0               0.95x/k
1−k : {¬c¬d}            0                       0.05            0.05
0.15k−x : {_d}          0.1425 − (0.95x/k)      0               0.1425 − (0.95x/k)
K's                     1/k                     1/(1−k)

From this we calculate

Pr{c(mary)} ∈ [0.8075, MAX {0.95 − (0.95x/k)}]

The upper limit is maximum for x = 0, so that
Pr{c(mary)} ∈ [0.8075, 0.95]
which is to be used for the next stage.

Labels and Apriori


y1 : abc
y2 : ¬abc
y3 : ¬ab¬c
y4 : ¬a¬bc
y5 : ¬a¬b¬c

Also y1 / (y1 + y2) = 0.9

so that y1 = 0.9k where k = y1 + y2, i.e. we must combine
1. {abc} : 0.9k , {¬a_ _} : 1 − 0.9k
2. {_bc} : k , {¬ab¬c, ¬a¬b_} : 1 − k

The general assignment method gives

                            0.9k                 1 − 0.9k
                            {abc}                {¬a_ _}
k : {_bc}                   {abc} : 0.9k         {¬abc} : 0.1k
1−k : {¬ab¬c, ¬a¬b_}        ∅ : 0                {¬ab¬c, ¬a¬b_} : 1−k

giving the apriori family of assignments over the labels {ABC} as
0.9k : {abc}
0.1k : {¬abc}
1−k : {¬ab¬c, ¬a¬b_}
which is updated using
Specific Evidence 1 :- {_b_} : 0.8 , {_ _ _} : 0.2
Specific Evidence 2 :- {_ _c} : 0.8075 , {_ _¬c} : 0.05 , {_ _ _} : 0.1425

0.9k : {abc}                 UPDATE USING              {abc}
0.1k : {¬abc}                Specific Evidence 1       {¬abc}
1−k : {¬ab¬c, ¬a¬b_}         TO GIVE UPDATE            {¬ab¬c}
                                                       {¬a¬b_}

{abc}                        UPDATE USING              {abc}
{¬abc}                       Specific Evidence 2       {¬abc}
{¬ab¬c}                      TO GIVE UPDATE            {¬ab¬c}
{¬a¬b_}                                                {¬a¬bc}
                                                       {¬a¬b¬c}
                                                       {¬a¬b_}

{abc}                        UPDATE USING              {abc}
{¬abc}                       Specific Evidence 1       {¬abc}
{¬ab¬c}                      TO GIVE UPDATE            {¬ab¬c}
{¬a¬bc}                                                {¬a¬bc}
{¬a¬b¬c}                                               {¬a¬b¬c}
{¬a¬b_}                                                {¬a¬b_}

{abc}                        UPDATE USING              {abc}
{¬abc}                       Specific Evidence 2       {¬abc}
{¬ab¬c}                      TO GIVE UPDATE            {¬ab¬c}
{¬a¬bc}                                                {¬a¬bc}
{¬a¬b¬c}                                               {¬a¬b¬c}
{¬a¬b_}                                                {¬a¬b_}

with the last two updating steps iterated.
For all values of k this will give an interval for {abc} and thus for a. We leave the actual
calculation to the reader. The result of this calculation is
a : [0.6675, 0.72]
so that Pr{a(mary)} ∈ [0.6675, 0.72].

The interval for bc can also be calculated and this is [0.6075, 0.8]. It should be noted
that if only the answer for a(mary) is required, the last 5 rows of the last two tables
can be collapsed into one row, with the assignment for this row equal to the sum of
the assignments of the five rows. This simplifies the calculation process.

In this example each stage of the process retains the information given by the
appropriate rule. For example, in the final table Pr{a(mary) | b(mary), c(mary)} = 0.9.
This simply means that there is a member of the family of apriori assignments which
can satisfy the specific evidences. For this example several steps are required for the
final iteration to converge. This is because of the imprecision found for Pr{c(mary)}.
If a point value were used for Pr{c(mary)} then the iteration would have converged in
one step. In a later section we will deal with the non-monotonic logic case in which
the specific evidences are inconsistent with the family of apriori assignments.

5.2 THREE CLOWNS EXAMPLE

Three clowns stood in line. Each clown was either a man or a woman. The audience
was asked to vote on each of the first and last clowns being male. 90% voted that the
first clown, the one on the left, was a man and 20% thought the third clown, the one
on the far right, was a man. Nothing was recorded about the middle clown. What is
the probability that a male clown stands next to a female clown with the male on the
left?

If it were known for sure that the first was male and the third was female then a male
would certainly be standing next to a female with the male on the left. This problem

can be expressed in first order logic and the theorem proved by case analysis. The
refutation resolution method popular in computer theorem proving programs could
also be used but is much more cumbersome. The problem posed above is a
probabilistic version of this.

The set of labels for this problem is


{mmm, mmf, mfm, mff, fmm, fmf, ffm, fff}
Two evidences have been supplied:-
Evidence 1 :- {mmm, mmf, mfm, mff} : 0.9
Evidence 2 :- {mmm, mfm, fmm, ffm} : 0.2

We can combine these two evidences using the general assignment method

                        0.2 (Evidence 2)       0.8
                        {_ _ m}                {_ _ f}
Evidence 1
0.9 : {m _ _}           {m _ m} : x            {m _ f} : 0.9 − x
0.1 : {f _ _}           {f _ m} : 0.2 − x      {f _ f} : x − 0.1

where 0.1 ≤ x ≤ 0.2.

Therefore the support pair for the statement S = "clowns of opposite sex stand next
to each other with a male on the left of the pair" is [MIN(0.9 − x), MAX(0.8 + x)] =
[0.7, 1].

We now consider this example using the iterative assignment method. Above we
used specific information about the three clowns in line. We did not use any apriori
information concerning clowns in general. In fact the apriori information we
assumed was of the form
{mmm, mmf, mfm, mff, fmm, fmf, ffm, fff} : 1
This mass assignment can be used with the iterative assignment method, using the
specific information given for updating, as follows

                    0.9                   0.1
                    {m _ _}               {f _ _}
1 : {_ _ _}         {m _ _} : 0.9         {f _ _} : 0.1
K's                 1                     1

                    0.2                   0.8
                    {_ _ m}               {_ _ f}
0.9 : {m _ _}       {m _ m} : 0.18        {m _ f} : 0.72
0.1 : {f _ _}       {f _ m} : 0.02        {f _ f} : 0.08
K's                 1                     1

The final update is
0.18 : {m _ m}
0.72 : {m _ f}
0.02 : {f _ m}
0.08 : {f _ f}
since {m _ _} : 0.9 is satisfied, so that both updating evidences are satisfied.

The loop in this final mass assignment means that we can add and subtract around
the loop without destroying the constraints, and all these solutions satisfy the
minimum relative entropy criterion with respect to some apriori assignment in the set
of all possible apriori assignments. The solution which is produced by the iterative
assignment method before any adding and subtracting around the loop is performed
is that corresponding to the maximum entropy apriori assignment, i.e. that member of
the set of possible apriori assignments corresponding to maximum entropy.

To obtain the necessary support for the statement S we must minimise the
assignment given to {m _ f}, so that we use the assignment
0.2 : {m _ m}
0.7 : {m _ f}
0 : {f _ m}
0.1 : {f _ f}
since 0.02 is the maximum value we can subtract from 0.72, as otherwise the entry
in the cell with assignment 0.02 would go negative.

The possible support for S is obtained by maximising the assignment given to
{m _ m, m _ f, f _ f}, i.e. minimising the assignment given to {f _ m}. This is also
satisfied by this last assignment, giving the support pair [0.7, 1] for S.

The above analysis is equivalent to using the iterative assignment with all apriori
distributions over the label set {mmm, mmf, mfm, mff, fmm, fmf, ffm, fff} which
will allow both specific evidences to be retained when using the iterative assignment
method. For example
The apriori (0, 0, 0.25, 0.25, 0, 0, 0.25, 0.25) will give 0.9
The apriori (0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0) will give 0.8
The apriori (0, 0, 0.1, 0.4, 0, 0, 0.25, 0.25) will give 0.9
The apriori (0, 0.7, 0.2, 0, 0, 0.1, 0, 0) gives 1
The apriori (0.2, 0.4, 0, 0.3, 0, 0, 0, 0.1) gives 0.7

6. NON-MONOTONIC REASONING

6.1 WHY SHOULD THERE BE A PROBLEM?

Consider the following example. Population statistics tell us that a thirty year old
Englishman has a very high probability of living another 5 years. The statistics also
tell us that a thirty year old Englishman who has lung cancer only has a small
probability of living another 5 years. We are told that John is a thirty year old
Englishman. We can conclude that it is very probable that he will live another 5
years. If we are later told that he has lung cancer then we conclude he has little
chance of living another 5 years. What we could conclude before this additional
piece of information was given we can no longer conclude. From a logic point of
view it appears that we have a situation in which we can approximate the modelling
of this situation by replacing propositions with high probabilities with those
propositions and propositions with low probabilities with their negations. Thus we
have

∀x {Englishman(x) ∧ InThirties(x)} ⊃ Live5yrsmore(x)
∀x {Englishman(x) ∧ InThirties(x) ∧ Cancer(x)} ⊃ ¬Live5yrsmore(x)

If Englishman(John) ∧ InThirties(John)
then we conclude Live5yrsmore(John)
If Englishman(John) ∧ InThirties(John) ∧ Cancer(John)
then we conclude ¬Live5yrsmore(John)

showing a nonmonotonic behaviour.
Thus situations like the above seem to make difficulties if we try to model them
using first order predicate logic.

From a probabilistic point of view there is no problem. We are told that

Pr{Live5yrsmore(x) | Englishman(x) ∧ InThirties(x)} is high;
for any x ------------ (1)
Pr{Live5yrsmore(x) | Englishman(x) ∧ InThirties(x) ∧ Cancer(x)} is low;
for any x --------- (2)

This will not lead to any form of inconsistency. In fact if it is known that
Pr{Live5yrsmore(John) | Englishman(John) ∧ InThirties(John)} is high
this will tell us nothing about
Pr{Live5yrsmore(John) | Englishman(John) ∧ InThirties(John) ∧ Cancer(John)}
which can take any value in the range [0, 1].

We make inferences by selecting the correct sample space using the given specific
information and determine the desired probability using this. In the case of John, who
is known to be an Englishman in his thirties, the answer for the probability of him
living another 5 years will be "high" if this is all we know about him. If we also
know that he has lung cancer then a different sample space is used and the answer is
"low".

In terms of the iterative assignment method, the general statements (1) and (2) above
are used to determine a family of apriori assignments which are updated with the
specific evidences concerning John. These specific evidences could be uncertain in
some sense, i.e. probabilistic statements, in this case.

The next example illustrates this.

6.2 PENGUIN EXAMPLE

We reconsider this example which was discussed above.

fly(X) :- bird(X) : [0.9, 0.9]. ------------------------------------------------------ (1)


bird(X) :- penguin(X). ----------------------------------------------------------- (2)
fly(X) :- penguin(X). ---------------------------------------------------------- (3)
penguin(obj) : [0.4, 0.4] ---------------------------------------------------------- (4)
bird(obj) : [0.9, 0.9]. ------------------------------------------------------------- (5)

The rules (1), (2) and (3) define the family of apriori assignments. (2) and (3)
eliminate certain possible labels as discussed above. The labels are:

y1 : ¬b¬p¬f
y2 : ¬b¬pf
y3 : b¬p¬f
y4 : b¬pf
y5 : bp¬f
so that y4 / (y3 + y4 + y5) = 0.9 and y1 + y2 + y3 + y4 + y5 = 1.
If we let
k = y3 + y4 + y5 ---------------------------------------------------------------- (6)
then
y4 = 0.9k --------------------------------------------------------------------------- (7)
and the family of apriori assignments for a given k, 0 < k ≤ 1, is
{b¬pf} : 0.9k
{¬b¬p_} : 1−k
{b_¬f} : 0.1k
determined by combining (6) and (7) using the general assignment method.
This family of assignments is updated using the specific evidences (4) and (5) with
the iterative assignment method using the scheme

0.9k : {b¬pf}           UPDATE USING            {b¬pf}
0.1k : {b_¬f}           Pr{(b)} = 0.9           {b_¬f}
1−k : {¬b¬p_}           TO GIVE UPDATE          {¬b¬p_}
{b¬pf}                  UPDATE USING            {b¬pf}
{b_¬f}                  Pr{(p)} = 0.4           {bp¬f}
{¬b¬p_}                 TO GIVE UPDATE          {b¬p¬f}
                                                {¬b¬p_}

{bp¬f}                  UPDATE USING            {bp¬f}
{b¬pf}                  Pr{(b)} = 0.9           {b¬pf}
{b¬p¬f}                 TO GIVE UPDATE          {b¬p¬f}
{¬b¬p_}                                         {¬b¬p_}

{bp¬f}                  UPDATE USING            {bp¬f}
{b¬pf}                  Pr{(p)} = 0.4           {b¬pf}
{b¬p¬f}                 TO GIVE UPDATE          {b¬p¬f}
{¬b¬p_}                                         {¬b¬p_}

and so on, iterating with the two specific evidences in turn.

From the final family of assignments we can determine the support pair for fly(obj):
fly(obj) : [assignment for {b¬pf}, assignment for {b¬pf} + assignment for {¬b¬p_}]
= [0.45, 0.55]
This final support pair is in actual fact independent of k, so that this is the actual
support pair for fly(obj),

i.e. Pr{fly(obj)} ∈ [0.45, 0.55].

6.3 COMPLETE MODEL FOR PENGUIN EXAMPLE

Consider the program

bird(X) : [0.7, 0.7].
fly(X) :- bird(X) : [0.9, 0.9].
fly(X) :- bird(X), penguin(X) : [0, 0], [0.95, 0.95], [0, 1], [0.1, 0.1].
bird(X) :- penguin(X) : [1, 1].
fly(X) :- penguin(X) : [0, 0].
penguin(obj) : [0.4, 0.4].
bird(obj) : [0.9, 0.9].

This program says that

the proportion of birds in the relevant population of objects is 70%; 90% of the birds
can fly; no object which is both a bird and a penguin can fly; 95% of birds which are not
penguins can fly; 10% of objects which are not birds can fly; all penguins are birds;
no penguin can fly. The 4 support pairs associated with the third rule correspond to
Pr{fly(X) | bird(X), penguin(X)}, Pr{fly(X) | bird(X), ¬penguin(X)},
Pr{fly(X) | ¬bird(X), penguin(X)}, Pr{fly(X) | ¬bird(X), ¬penguin(X)}
respectively.

It also gives specific information about the object obj, namely that there is a
probability of 0.9 that obj is a bird and a probability of 0.4 that obj is a penguin.

This information allows the following unique distribution over the relevant labels to
be constructed:
Apriori
y1 = 0.27
y2 = 0.03
y3 = 0.0332
y4 = 0.63
y5 = 0.0368
using
y4 / (y3 + y4) = 0.95 ; y2 / (y1 + y2) = 0.1 ; y4 / (y3 + y4 + y5) = 0.9
y3 + y4 + y5 = 0.7 ; y1 + y2 + y3 + y4 + y5 = 1

The iterative assignment update then gives fly(obj) : [0.485, 0.485].

Intuitive solution
In this problem we are presented with two pieces of information:
1. Object obj came from a population with statistics
¬b¬p¬f : 0.27
¬b¬pf : 0.03
b¬p¬f : 0.0332
b¬pf : 0.63
bp¬f : 0.0368
so that
IF object obj has properties bp then Pr(obj can fly) = 0
IF object obj has properties b¬p then Pr(obj can fly) = 0.63 / 0.6632 = 0.95
IF object obj has properties ¬b¬p then Pr(obj can fly) = 0.03 / 0.3 = 0.1

2. Object properties
2(a) b : 0.9 ; ¬b : 0.1
2(b) p : 0.4 ; ¬p : 0.6
2(c) obj cannot be a penguin and not a bird
Combining 2(a) and 2(b), taking account of 2(c) by allowing only the set of labels
{¬b¬p, b¬p, bp},
using the general assignment method gives

                  p : 0.4                  ¬p : 0.6
b : 0.9           bp : 0.4                 b¬p : 0.5
¬b : 0.1          ¬bp : 0 (not allowed)    ¬b¬p : 0.1

giving
bp : 0.4 ; b¬p : 0.5 ; ¬b¬p : 0.1

Expected value of Pr(obj can fly) = 0.5·0.95 + 0.1·0.1 = 0.485,

the value given by the iterative assignment method.

We can write
P'r(f) = Pr(f | bp)·P'r(bp) + Pr(f | b¬p)·P'r(b¬p) + Pr(f | ¬b¬p)·P'r(¬b¬p)
where Pr(.) signifies a probability determined from the population statistics,
information 1, and P'r(.) signifies a probability determined from the specific
information, information 2, and the set of possible labels.

This is a form of Jeffrey's rule.
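As a small numerical check (a sketch of my own, not from the paper), the Jeffrey's rule computation above can be written directly: the conditional fly probabilities come from the population statistics, while the weights come from the specific evidence about the object.

pr_fly_given = {'bp': 0.0, 'b¬p': 0.95, '¬b¬p': 0.1}   # from information 1
pr_case      = {'bp': 0.4, 'b¬p': 0.5,  '¬b¬p': 0.1}   # from information 2

pr_fly = sum(pr_fly_given[c] * pr_case[c] for c in pr_case)
print(pr_fly)   # approximately 0.485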

7. SEMANTIC UNIFICATION

7.1 NESTED SETS, POSSIBILITY DISTRIBUTIONS AND MASS ASSIGNMENTS

Let X = {x1, x2, ..., xn}.

Let A1, A2, ..., An be nested subsets of X such that
A1 ⊂ A2 ⊂ ... ⊂ An where Ai = {x1, ..., xi}.
Let m be a mass assignment over these nested sets:

m(Ai) ≥ 0 , all i

Σ m(Ai) = 1

Let the necessary support and possible support measures for this special case of
nested sets be called necessity and possibility measures, denoted by N(.) and P(.)
respectively. It is easy to show that
P(A ∪ B) = MAX {P(A), P(B)}
N(A ∩ B) = MIN {N(A), N(B)}
for all A, B ∈ P(X)
[ZADEH 1978], [KLIR, FOLGER 1988].

Let pf be a function
pf : X → [0, 1]
called the possibility distribution of f over X.

Let pi = pf(xi) for all xi ∈ X, ordered such that

p1 ≥ p2 ≥ ... ≥ pn.
We will only consider normalised possibility distributions, corresponding to p1 = 1.

Define a mass assignment over the nested sets

A1 = {x1}, A2 = {x1, x2}, ..., An = X

as
mi = m(Ai) where mi = pi − pi+1 with pn+1 = 0
so that

P(A) = Σ m(Ak), the sum over all k with A ∩ Ak ≠ ∅, = MAX pi over xi ∈ A

and more specifically

P({xi}) = Σ mk, the sum from k = i to n, = pi

Corresponding to a possibility distribution pf for f over X there is a unique mass
assignment m over the subsets {Ai} given by the set of formulae above.

Let f be a normalised fuzzy set

f = x1 / χ1 + x2 / χ2 + ... + xn / χn
where χ1 = 1 and χ1 ≥ χ2 ≥ ... ≥ χn.
This induces a possibility distribution pf over X given by
pf(xi) = pi = χi
with an associated mass assignment over the nested sets
A1 = {x1}, A2 = {x1, x2}, ..., An = {x1, x2, ..., xn}
given by
m(A1) = 1 − χ2 ; m(A2) = χ2 − χ3 ; ... ; m(An) = χn
This mass assignment represents the family of possible probability distributions over
X induced by the fuzzy set f.
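As a small illustration (a sketch of my own, not from the paper), the conversion from a fuzzy set to its nested-set mass assignment can be written as follows; it reproduces the mass assignments of the examples in section 7.2.

def fuzzy_to_mass(fuzzy):
    # sort elements by decreasing membership; Ai = {x1, ..., xi} gets pi - p(i+1)
    items = sorted(fuzzy.items(), key=lambda kv: -kv[1])
    assert items[0][1] == 1.0, "only normalised fuzzy sets are considered"
    m = {}
    for i, (x, p) in enumerate(items):
        p_next = items[i + 1][1] if i + 1 < len(items) else 0.0
        if p - p_next > 0:
            Ai = frozenset(x for x, _ in items[:i + 1])
            m[Ai] = p - p_next
    return m

f1 = {'a': 0.2, 'b': 0.4, 'c': 0.8, 'd': 1.0}
print(fuzzy_to_mass(f1))
# {d}: 0.2, {c, d}: 0.4, {b, c, d}: 0.2, {a, b, c, d}: 0.2  (as in section 7.2)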

We can generalise this to the case of continuous fuzzy sets like those discussed in the
introduction but we will not do this in this paper. The continuous case can always be
treated by approximating the continuous fuzzy set f, with membership function χf
defined over R, by a discrete set of pairs {xi / χi} where χf(xi) = χi and the interval R
is approximated by the set of points {x1, x2, ..., xn}.

7.2 EXAMPLES

We can associate with the fuzzy set defined on {a, b, c, d, e}

f1 = a / 0.2 + b / 0.4 + c / 0.8 + d / 1

the mass assignment
{d} : 0.2 ; {c, d} : 0.4 ; {b, c, d} : 0.2 ; {a, b, c, d} : 0.2

We give an example with repeated membership levels. Associated with the fuzzy set

f2 = a / 0.1 + b / 0.3 + c / 0.3 + d / 0.7 + e / 1 + f / 1

is the mass assignment
{e, f} : 0.3 ; {d, e, f} : 0.4 ; {b, c, d, e, f} : 0.2 ; {a, b, c, d, e, f} : 0.1

7.3 VOTING MODEL INTERPRETATION

We will use a voting model with constant thresholds to interpret the meaning of a
fuzzy set. Consider the fuzzy set "tall" defined on the height space [4ft, 8ft] by
means of the membership function χtall. How can we interpret χtall(5ft 10")?
Consider a representative population sample of persons, S say. We ask each member
of S to accept or reject the height 5ft 10" as satisfying the concept "tall". Each
member must accept or reject; there is no allowed abstention. χtall(5ft 10") is put
equal to the proportion of S who accept.

We can therefore interpret the fuzzy set

f1 = a / 0.2 + b / 0.4 + c / 0.8 + d / 1
as
20% of S accept a as f1
40% of S accept b as f1
80% of S accept c as f1
100% of S accept d as f1
100% of S reject e as f1

One possible voting pattern of acceptances is

        1  2  3  4  5  6  7  8  9  10
a       a  a
b       b  b  b  b
c       c  c  c  c  c  c  c  c
d       d  d  d  d  d  d  d  d  d  d

An alternative pattern is one with the same numbers of acceptances for each element
but with the acceptances distributed differently among the voters, for example with
voter 3 accepting a but not accepting b.

The first pattern is more reasonable than the second. In the second pattern voter 3
accepts a, which has a low membership level, but doesn't accept b, which has a higher
membership level. It seems that anyone who accepts a member with a certain
membership level will accept all members with a higher membership level. This we
call the constant threshold assumption. The first pattern satisfies the constant
threshold assumption. From the first pattern we can deduce


20% of S give acceptance to exactly {d}
40% of S give acceptance to exactly {c, d}
20% of S give acceptance to exactly {b, c, d}
20% of S give acceptance to exactly {a, b, c, d}
and this defines a mass assignment over the nested sets
{d}, {c, d}, {b, c, d}, {a, b, c, d}
namely
{d} : 0.2 ; {c, d} : 0.4 ; {b, c, d} : 0.2 ; {a, b, c, d} : 0.2

We can interpret this mass assignment in the following way. If the population S is
told Z has property f1, and a member of the population drawn at random is asked
what the value of this property, taken from {a, b, c, d, e}, is for Z, the answer would
be a family of distributions over {a, b, c, d, e} deduced from the mass assignment
above.

This interpretation is consistent with the general method given above.

This interpretation is not valid if the fuzzy set is non-normalised, since the constant
threshold model cannot then be satisfied.

7.4 SEMANTIC UNIFICATION

We discussed the need to determine an interval containing the conditional probability

Pr{age(mary, middle_aged) | age(mary, about_35)}

for the example given in the introduction. This we term semantic unification.

Consider the statements

X is f1
a is f2

where f1 and f2 are fuzzy sets defined on the universe of discourse F. Then we are
interested in determining Pr{a is f1 | a is f2}.

We can associate the mass assignments (m1, F1) and (m2, F2) with f1 and f2
respectively, where F1 and F2 are the focal elements and are nested sets.

For any member s1i of F1 and any member s2j of F2 we can determine the support
pair for s1i | s2j from the set {[0, 0], [1, 1], [0, 1]}. Let this be [Sn(s1i | s2j), Sp(s1i | s2j)].

Let m1 = {m1i} and m2 = {m2j}.

Therefore the expected value of Pr{a is f1 | a is f2} is contained in the support pair
[Sn(a is f1 | a is f2), Sp(a is f1 | a is f2)]
where

Sn(a is f1 | a is f2) = Σ m1i·m2j·Sn(s1i | s2j), the sum over all i, j

Sp(a is f1 | a is f2) = Σ m1i·m2j·Sp(s1i | s2j), the sum over all i, j
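As a sketch of semantic unification (my own illustration, not code from the paper): the rule for the per-pair support is not spelled out explicitly above, so the code below assumes it is [1, 1] when s2j is contained in s1i, [0, 0] when the two sets are disjoint, and [0, 1] otherwise, which reproduces the values listed in the example of section 7.5.

def pair_support(s1, s2):
    if s2 <= s1:
        return (1.0, 1.0)
    if not (s1 & s2):
        return (0.0, 0.0)
    return (0.0, 1.0)

def semantic_unification(m1, m2):
    sn = sp = 0.0
    for s1, w1 in m1.items():
        for s2, w2 in m2.items():
            lo, hi = pair_support(s1, s2)
            sn += w1 * w2 * lo
            sp += w1 * w2 * hi
    return sn, sp

# section 7.5 example: f1 and f2 as nested-set mass assignments
m1 = {frozenset('d'): 0.2, frozenset('cd'): 0.4,
      frozenset('bcd'): 0.2, frozenset('abcd'): 0.2}
m2 = {frozenset('a'): 0.2, frozenset('ab'): 0.7, frozenset('abc'): 0.1}
print(semantic_unification(m1, m2))   # approximately (0.2, 0.4)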
7.5 EXAMPLE

Consider the fuzzy sets defined on {a, b, c, d, e}

f1 = a / 0.2 + b / 0.4 + c / 0.8 + d / 1
f2 = a / 1 + b / 0.8 + c / 0.1

The corresponding associated mass assignments are

{d} : 0.2 ; {c, d} : 0.4 ; {b, c, d} : 0.2 ; {a, b, c, d} : 0.2
and
{a} : 0.2 ; {a, b} : 0.7 ; {a, b, c} : 0.1
respectively.

Therefore
S({d} | {a}) = [0, 0]
S({c, d} | {a}) = [0, 0]
S({b, c, d} | {a}) = [0, 0]
S({a, b, c, d} | {a}) = [1, 1]
and
S({d} | {a, b}) = [0, 0]
S({c, d} | {a, b}) = [0, 0]
S({b, c, d} | {a, b}) = [0, 1]
S({a, b, c, d} | {a, b}) = [1, 1]
and
S({d} | {a, b, c}) = [0, 0]
S({c, d} | {a, b, c}) = [0, 1]
S({b, c, d} | {a, b, c}) = [0, 1]
S({a, b, c, d} | {a, b, c}) = [1, 1]
so that
Sn(a is f1 | a is f2) = 0.2·0.2 + 0.2·0.7 + 0.2·0.1 = 0.2
Sp(a is f1 | a is f2) = 0.2·0.2 + 0.2·0.7 + 0.2·0.7 + 0.4·0.1 + 0.2·0.1 + 0.2·0.1 = 0.4

so that the support pair for the unification of f1 given f2 is
a is f1 | a is f2 : [0.2, 0.4]

7.6 A SPECIAL CASE

Let F = {e1, e2, ..., e10}

and

f = e1 / 0.1 + e2 / 0.2 + e3 / 0.3 + e4 / 0.4 + e5 / 0.5
  + e6 / 0.6 + e7 / 0.7 + e8 / 0.8 + e9 / 0.9 + e10 / 1.0
then
f | f : [0.55, 1]

Let F = {e1, e2, ..., e10}

and

f = e1 / 0 + e2 / 0.1 + e3 / 0.2 + e4 / 0.3 + e5 / 0.4
  + e6 / 0.5 + e7 / 0.6 + e8 / 0.7 + e9 / 0.8 + e10 / 0.9
then
f | f : [0.45, 1]

These are two approximations for determining f | f where f is a ramp fuzzy set on an
interval R. In the limit, as more and more points in R are used, we obtain
f | f : [0.5, 1].

This illustrates how we can deal with continuous fuzzy sets.

7.7 ITERATIVE ASSIGNMENT METHOD WITH SEMANTIC UNIFICATION

Consider the program discussed in the introduction

married(X) :- age(X, middle_aged), has_children(X) : [0.7, 0.9].
age(mary, about_35).
has_children(mary) : [0.8, 1].

We can ask the query

?- married(mary).

To answer this query we determine the support pair [x, y] below by the method in the
last section applied to
middle_aged | about_35

age(mary, middle_aged) :- age(mary, about_35) : [x, y].

We then solve

married(X) :- age(X, middle_aged), has_children(X) : [0.7, 0.9].
age(mary, middle_aged) : [x, y].
has_children(mary) : [0.8, 1].

using the iterative assignment method as described previously.



8. CONCLUSIONS

This paper provides a general approach to evidential reasoning when the knowledge
representation is in the form of rules and facts with both probabilistic and fuzzy
uncertainties included. The methods provided can be used for other forms of
knowledge representation, for example Bayesian networks [PEARL 1988] and moral
graphs [LAURITZEN, SPIEGELHALTER 1988], with extensions to the case of
uncertain specific information, and valuation-based languages for expert systems
[SHENOY 1989]. The non-monotonic case is not seen to be a problem. Without
decomposition methods the approach given here could easily become
computationally excessive. Decomposition methods have only been lightly touched
on in this paper, although expressing knowledge in the form of rules provides a
natural decomposition. Inference diagrams can be used to construct a
decomposition from a group of rules. For special cases the decomposition allows the
calculus of support logic programming used in FRIL to be used for answering
queries. The methods given here extend FRIL to cases which cannot be treated by
the present version. The next version will take account of these extensions.

9. REFERENCES

Baldwin J.F., (1986), "Support Logic Programming", in: A. Jones et al., Eds., Fuzzy
Sets Theory and Applications, (Reidel, Dordrecht-Boston).

Baldwin J.F., (1987), "Evidential Support Logic Programming", Fuzzy Sets and
Systems, 24, pp 1-26.

Baldwin J.F. et al., (1987), "FRIL Manual", Fril Systems Ltd, St Anne's House, St
Anne's Rd, Bristol BS4 4A, UK.

Baldwin J.F., (1990a), "Computational Models of Uncertainty Reasoning in Expert
Systems", Computers Math. Applic., Vol. 19, No 11, pp 105-119.

Baldwin J.F., (1990b), "Combining Evidences for Evidential Reasoning", Int. J. of
Intelligent Systems, To Appear.

Baldwin J.F., (1990c), "Towards a general theory of intelligent reasoning", 3rd Int.
Conf. IPMU, Paris, July 1990.

Dubois D., Prade H., (1986), "On the unicity of Dempster Rule of Combination",
Int. J. of Intelligent Systems, 1, no. 2, pp 133-142.

Jeffrey R., (1965), "The Logic of Decision", McGraw-Hill, New York.

Klir G.J., Folger T.A., (1988), Fuzzy Sets, Uncertainty, and Information, Prentice-
Hall.

Lauritzen S.L., Spiegelhalter D.J., (1988), "Local computations with probabilities
on graphical structures and their application to expert systems", J. Roy. Stat. Soc.
Ser. B 50(2), 157-224.

Pearl J., (1988), "Probabilistic reasoning in Intelligent Systems", Morgan Kaufmann
Pub. Co.

Shafer G., (1976), "A mathematical theory of evidence", Princeton Univ. Press.

Shenoy P.P., (1989), "A Valuation-Based Language for Expert Systems", Int. J. of
Approx. Reasoning, Vol 3, No. 5.

Zadeh L., (1965), "Fuzzy sets", Information and Control, 8, pp 338-353.

Zadeh L., (1978), "Fuzzy Sets as a basis for a theory of Possibility", Fuzzy Sets and
Systems 1, 3-28.
16
PROBABILISTIC SETS
PROBABILISTIC EXTENSION OF FUZZY SETS

Kaoru Hirota
Dept. of Instrument & Control Engineering College of
Engineering, Hosei University 3-7-2 Kajino-cho, Koganei-
city, Tokyo 184, Japan

Introduction
In the field of pattern recognition or decision making
theory, the following complicated problems have been
left unsolved: (1) ambiguity of objects, (2) variety of
character, (3) subjectivity of observers, (4) evolution of
knowledge or learning. With regard to each problem,
however, there are several general theories: many-valued
logic, fuzzy set theory (in connection with (1) and (2)),
modal logic (in conjunction with (2) and (4)), and
subjective probability (in relation to (3)). It seems,
however, that there are few carefully thought-out
investigations paying attention to all the problems
mentioned above. In this paper we would like to give our
opinion about these problems and to introduce a new
concept called 'probabilistic sets'.
By giving examples in comparison with fuzzy set theory,
the background idea of probabilistic sets is explained
in Section 2. In probabilistic sets, it is essential to
regard the value of membership functions of fuzzy sets
as a random variable. A probabilistic set A on a total
space X is defined by a defining function ILA(X, w),
which is a point (i.e. x E xl-wise (8, BJ-measurable function
from a parameter space (n.B,p) to a characteristic space
(nc. Be)' The parameter space (n, B, P) is a probability
space and is closely related with subjectivity,
personality, and evolution of knowledge. The
characteristic space (n e , Be) is a measurable space
usually adopt ([ 0,1) ,Borel sets) as (n e , Be) . Section 3
describes definitions of probabilistic sets from a
measure-theoretical viewpoint. The concept of
probabilistic sets includes the concept of classical
fuzzy sets . Some other properties are important results
336

is that the family of all probabilistic sets constitutes


a complete pseudo-Boolean algebra. In Section 5, some
new concepts are shown such as moment analysis and
expected cardinal numbers. The possibility of moment
analysis is an essential feature of probabilistic sets
and it is a great advantage in applications.

The background idea of probabilistic sets


Digital computers have been widely used in the field of
pattern recognition, decision making theory, artificial
intelligence and so on. It should be noted, however that
they involved the following complicated problems to be
solved:
(1) ambiguity of objects,
(2) variety of property or character,
(3) subjectivity of observers,
(4) evolution of knowledge or learning.
In order to take up these problems, several general
studies have been made such as fuzzy set theory, many-
valued logic, modal logic, quantum logic, subjective
probability. In particular, fuzzy set theory has been
widely studied. (More than one thousand papers have been
published since L.A. Zadeh presented fuzzy set theory
[14].) We have also been studying these problems
especially by paying attention to inherent and special
characteristics of pattern recognition and decision
making theory. Giving an example, we shall deal with the
background of our idea 'probabilistic sets' and shall
compare it with fuzzy sets.
Let all real numbers be a total space X. Consider all
numbers nearly equal to one and all numbers nearly equal
to minus one. In fuzzy set theory, their membership
functions are shown as in Fig. 1.


Fig. 1. Fuzzy sets; (a) numbers near one, (b) numbers near minus one,

(c) the union (numbers near one or minus one).



Fig. 2. Probabilistic sets; (a) numbers near one, (b) numbers near minus one,
(c) the union (numbers near one or minus one). For each of (a), (b) and (c)
the three panels show the defining function, the mean value and the variance.

In this situation, however, the following discussion
may be possible: if the degree of ambiguity were
accurately given, it would no longer be ambiguous.
Although a mean value or variance may be determined and a
rough tendency may be given, it is impossible in general
to assign definite [0,1]-values. To make matters
worse, the tendency varies according to observers'
subjectivity, situations and so on. Hence we shall
introduce a probability space (Ω, B, P), called a
parameter space, whose elements represent standards of
judgment. It is assumed that if a standard ω (∈ Ω) is
fixed, the degree of ambiguity of the considered objects
(i.e. elements of the total space X) can be definitely
determined. A set of all degrees of ambiguity will be
called a characteristic space (Ωc, Bc). We usually adopt
([0,1], Borel sets) as the characteristic space, because
it is an infinite totally ordered set with a maximum
element 1 and a minimum element 0, and because it is in
harmony with characteristic functions of ordinary sets
and membership functions of fuzzy sets. A probabilistic
set on a total space X is defined by giving a (Ωc, Bc)-
valued random variable on (Ω, B, P) for each object x (∈ X),
and this correspondence will be called a defining
function of the probabilistic set. The probabilistic
sets corresponding to Fig. 1 are shown in Fig. 2 (a-1), (b-1),
(c-1). The parameter space (Ω, B, P) is expected to be
adopted suitably according to each situation, hence in
general no restrictions are placed on the parameter space
except that it is a probability space. For example, in the
case of Fig. 2, the parameter space might exist in the
observers' subconscious and might be changed according
to circumstances; it may be possible
to estimate it by a statistical method. One of the most


important facts in probabilistic set theory is a
possibility of moment analysis by using a probabilistic
measure P of the parameter space. For instance, in
Fig.2, mean values and variances are shown the parameter
space. For instance, in Fig.2, mean values and variances
are shown in (a-2),(b-2),(c-2),and (a-3),(b-3),(c-3),
respectively. The mean value indicates the first
approximation of probabilistic sets and might be
considered to be the same one as a membership function
of 'union' generally has a continuous but non-smooth
(i.e. non-differentiable) point as shown in Fig.l(c) (at
a point of x=O ) . It will be natural to expect a smooth
curve like Fig.2(c-2) as the first approximation of
'numbers nearly equal to one or minus one'. The variance
provides the second information and it indicates a
disordered degree of judgments. Higher moments can be
considered in the same way . Moreover, it can be shown
theoretically that the nth moment around mean value
tends to zero as ntends to infinity (cf. Proposition 9).
Hence, from a practical viewpoint, it is sufficient to
consider only the lower moments, i.e. mean value and
variance. If we consider a probabilistic set with
variance zero, it could be identified with a fuzzy set.
In this sense, it can be concluded that the concepts of
probabilistic sets include classical fuzzy concepts.
To make sure of our opinion, we shall give several
comments. A distinction between the total space X and
the parameter space (Ω, B, P) is very important in
probabilistic set theory. The concept of probabilistic
sets differs intrinsically from Zadeh's way of thinking
[15] on this point.
A notion of fuzzy set of type 2 (Mizumoto and Tanaka
[12]) was introduced in order to resolve the difficulty
of settling a definite ambiguous degree. A fuzzy set of
type n is also characterized by n-step recursively
defined ambiguity. However, the number of steps (i.e. n)
has no upper bound and, to make matters worse, realistic
meanings decline as n increases. In probabilistic sets,
the ambiguity is arranged on the parameter space and
realistic meanings are made clear in connection with the
subjectivity of observers.
A family of probabilistic sets constitutes a complete
pseudo-Boolean algebra (cf. Theorem 1). A pseudo-Boolean
algebra is a subclass of distributive lattices (Fig. 3).

Hence, from a lattice theoretical viewpoint (cf. [1]),
probabilistic set theory takes its position between L-
fuzzy set theory [3] and Boolean algebra valued set
theory [13].
In probabilistic set theory, the parameter space
(Ω, B, P) plays an important role, but it has no
restriction except that it is a probability measure space.
The most important task in applications is the choice of a
suitable parameter space, especially the establishment of
the probability measure P. Finally, we would like to add
that there is no need to recollect a probabilistic
randomness like casting a dice in spite of the diction
'probabilistic' sets.

Definitions of probabilistic sets


The above discussion is informal from a mathematical
point of view. The strict definitions are shown in this
section. The mathematical foundation of this theory is
measure theory and some well-known facts in measure
theory will be used (cf. [5]).
First, we would like to define the following three
terms. (The meanings were discussed in the previous
section.)
Definition 1.
(Ω, B, P) is a parameter space, (Ωc, Bc) = ([0,1], Borel sets)
is a characteristic space, and M = {μ | μ : Ω → Ωc is a (B, Bc)-
measurable function} is the family of characteristic
variables.
It is easily shown that M satisfies the following
properties.
Proposition 1.
For arbitrary μi's (μi ∈ M, i = 1, 2, ..., at most countably
infinite), the following properties are satisfied.
(1)
(2)
μ ≡ c ∈ M, where c ∈ Ωc = [0, 1] (μ a constant function), (3)

|μ1 − μ2| ∈ M, (4)

λμ1 + (1 − λ)μ2 ∈ M where 0 ≤ λ ≤ 1, (5)

μ1^a ∈ M where a ≥ 0, (6)

μ1·μ2 ∈ M, (7)

inf (over i ≥ 1) μi ∈ M, (8)

sup (over i ≥ 1) μi ∈ M, (9)

lim inf (i → ∞) μi = sup (over i ≥ 1) inf (over j ≥ i) μj ∈ M, (10)

lim sup (i → ∞) μi = inf (over i ≥ 1) sup (over j ≥ i) μj ∈ M. (11)

The fundamental definition of probabilistic sets will
be given as follows. Here a total space X = {x}, which
represents the set of all objects discussed in each
situation, is arbitrarily fixed.

Definition 2.
A probabilistic set A on X is defined by a defining
function μA

μA : X × Ω → Ωc,
(x, ω) ↦ μA(x, ω)          (12)

where μA(x, ·) is a (B, Bc)-measurable function for each
fixed x (∈ X).

For any two probabilistic sets A and B, whose
defining functions are μA(x, ω) and μB(x, ω) respectively,
A is said to be included in B (A ⊂ B) if for each x (∈ X)
there exists E (∈ B) which satisfies

P(E) = 1, (13)
μA(x, ω) ≤ μB(x, ω) for all ω ∈ E. (14)

In this situation we will sometimes use a brief notation
as follows,

μA(x, ω) ≤ μB(x, ω) for all x ∈ X and a.e. ω ∈ Ω (15)

If both A ⊂ B and B ⊂ A are satisfied, A and B are
said to be equivalent (A ≡ B). (Indeed this relation ≡
satisfies an equivalence relation, i.e. reflexivity,
symmetry, and transitivity.) All equivalent
probabilistic sets are considered to be the same one and
are not distinguished. All probabilistic sets on X are
said to form the family of probabilistic sets, which is denoted
by R(X).
Note.
An element of R(X) represents an equivalence class of
M by the equivalence relation ≡ for each x (∈ X).
The inclusion relation in R(X) satisfies reflexivity,
anti-symmetry, and transitivity, hence (R(X), ⊂)
constitutes a poset (partially ordered set).

In the following, several operations in R(X) will be
defined. A fundamental operation in R(X) is 'union';
however, it is a little complicated. Let Aγ
(γ ∈ Γ, Γ possibly infinite) be probabilistic sets on X
whose defining functions are μAγ(x, ω) respectively. The
union of {Aγ}γ∈Γ, which is denoted by ∪Aγ, is defined
by a defining function μ∪Aγ(x, ω) which will be given by
the following procedure. For the time being, consider the
case where x (∈ X) is arbitrarily fixed. Then μAγ(x, ·)
may be regarded as a function of ω ∈ Ω (i.e. an element
of M). Since μAγ(x, ·) is an Ωc = [0,1]-valued measurable
function, and since the total measure is finite (i.e.
P(Ω) = 1), μAγ(x, ·) is always P-integrable,

0 ≤ ∫Ω μAγ(x, ω) dP(ω) ≤ 1. (16)

For arbitrarily fixed n indices γ1, γ2, ..., γn (∈ Γ), the
function max{μAγi(x, ·) | 1 ≤ i ≤ n} is also an element of M
(see Proposition 1(2)). Hence it is also P-integrable,

0 ≤ ∫Ω max{μAγi(x, ω) | 1 ≤ i ≤ n} dP(ω) ≤ 1. (17)

The selection of γ1, γ2, ..., γn from Γ is varied. The least
upper bound, denoted by a(x), can be calculated,

a(x) = sup{ ∫Ω max{μAγi(x, ω) | 1 ≤ i ≤ n} dP(ω) |
            n ∈ N (natural numbers), γi ∈ Γ }, (18)

0 ≤ a(x) ≤ 1. (19)

Since a(x) is a least upper bound, there exists a
countably infinite subsequence
{ max{μAγi(x, ω) | 1 ≤ i ≤ nj} | nj ∈ N, γi ∈ Γ }, j = 1, 2, ...
such that

lim (j → ∞) ∫Ω max{μAγi(x, ω) | 1 ≤ i ≤ nj} dP(ω) = a(x). (20)
Although an element x (∈ X) was arbitrarily fixed, this
procedure can be done for each x (∈ X). We shall define
the defining function μ∪Aγ(x, ω) by
μ∪Aγ(x, ω) = sup{ max{μAγi(x, ω) | 1 ≤ i ≤ nj} | 1 ≤ j < ∞ } (21)

The justification of this definition will be ensured by
the following Proposition 2.
Proposition 2.
(1) The union ∪Aγ is determined uniquely by (21), i.e.
if there exists another countably infinite subsequence
which satisfies (20), the result given by the same
equation as (21) also belongs to the same equivalence
class of M (for each x ∈ X) in the sense of Definition 2.
(2) For all γ ∈ Γ, we have Aγ ⊂ ∪Aγ.
(3) If there exists an A which satisfies Aγ ⊂ A for
all γ ∈ Γ, then we have ∪Aγ ⊂ A.
The proof is omitted here, since it requires some
results in measure theory and a rather long description
(cf. [7]).
Although the above stated procedure for the union is rather
complicated, it can be simplified in the case where the
index set Γ is at most countably infinite. For example,
the union of A and B (whose defining functions are
μA(x, ω) and μB(x, ω), respectively) may be defined by

μA∪B(x, ω) = max{μA(x, ω), μB(x, ω)} (22)

for each x ∈ X and each ω ∈ Ω, and the union of {An}, n = 1, 2, ..., may
be defined by
μ∪An(x, ω) = sup{μAn(x, ω) | 1 ≤ n < ∞} (23)

for each x ∈ X and each ω ∈ Ω. The complexity in the general
case arises from the fact that M is not always closed
under more than countably infinite operations (see
Proposition 1).
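The chapter's construction is measure-theoretic; purely as an illustration (a sketch of my own, not the author's formulation), the fragment below models defining functions as ordinary Python functions of (x, ω), forms the union, intersection and complement pointwise as in the countable case above, and estimates the mean value and variance discussed in Section 2 by sampling the parameter space, here taken (as an assumption) to be [0, 1] with the uniform measure.

import random

def union(mu_a, mu_b):
    return lambda x, w: max(mu_a(x, w), mu_b(x, w))

def intersection(mu_a, mu_b):
    return lambda x, w: min(mu_a(x, w), mu_b(x, w))

def complement(mu_a):
    return lambda x, w: 1.0 - mu_a(x, w)

def moments(mu, x, n_samples=10000):
    # Monte Carlo estimate of mean and variance over the parameter space
    samples = [mu(x, random.random()) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    var = sum((s - mean) ** 2 for s in samples) / n_samples
    return mean, var

# 'numbers near one' and 'near minus one', with the width of the membership
# curve depending on the standard of judgment w (purely illustrative shapes)
near_one = lambda x, w: max(0.0, 1.0 - abs(x - 1.0) / (0.5 + w))
near_minus_one = lambda x, w: max(0.0, 1.0 - abs(x + 1.0) / (0.5 + w))
both = union(near_one, near_minus_one)
print(moments(both, 0.0))   # mean and variance of the union at x = 0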
The 'intersection' of {Aγ}γ∈Γ, which is denoted by
∩Aγ, is a dual concept of the 'union' ∪Aγ, and it is
defined as follows. Put

b(x) = inf{ ∫Ω min{μAγi(x, ω) | 1 ≤ i ≤ n} dP(ω) | n ∈ N, γi ∈ Γ }, (24)

0 ≤ b(x) ≤ 1, (25)

and choose a countably infinite subsequence

{ min{μAγi(x, ω) | 1 ≤ i ≤ nj} | nj ∈ N, γi ∈ Γ }, j = 1, 2, ... (26)

such that

lim (j → ∞) ∫Ω min{μAγi(x, ω) | 1 ≤ i ≤ nj} dP(ω) = b(x), (27)

and define

μ∩Aγ(x, ω) = inf{ min{μAγi(x, ω) | 1 ≤ i ≤ nj} | 1 ≤ j < ∞ }. (28)
The justification of this definition will also be en-
sured by the same proposition as Proposition 2. (Change
the symbols ∪, ⊂ to ∩, ⊃ respectively in Proposition 2.)
Some other useful concepts or operations on R(X) can
be defined. They are summarized as follows. The
justification of these definitions is also ensured by
Proposition 1.
Definition 3.
Total set X:
μX(x, ω) = 1 for all x ∈ X and a.e. ω ∈ Ω. (29)
(This notation will be omitted until (36).)
Void set (or null set) φ:
μφ(x, ω) = 0. (30)

Complement of A, A^c:
μA^c(x, ω) = 1 − μA(x, ω). (31)

Difference A − B:
μA−B(x, ω) = max{0, μA(x, ω) − μB(x, ω)}. (32)

Symmetric difference A Δ B:
μAΔB(x, ω) = |μA(x, ω) − μB(x, ω)|. (33)

Algebraic sum A ⊕ B:
μA⊕B(x, ω) = μA(x, ω) + μB(x, ω) − μA(x, ω)·μB(x, ω). (34)

λ sum A +λ B (where 0 ≤ λ ≤ 1):
μA+λB(x, ω) = λμA(x, ω) + (1 − λ)μB(x, ω). (35)

a power A^a (where a ≥ 0):
μA^a(x, ω) = (μA(x, ω))^a. (36)

Superior limit of {An}, n ≥ 1:
lim sup An = ∩ (n = 1 to ∞) ∪ (k = n to ∞) Ak. (37)

Inferior limit of {An}, n ≥ 1:
lim inf An = ∪ (n = 1 to ∞) ∩ (k = n to ∞) Ak. (38)

An ordered pair (μA(x, ω), μB(x, ω)) is said to be a direct
product of A and B, and is denoted by A × B.
Ay is said to be a one point probabilistic set at
y ∈ X if its defining function μAy(x, ω) satisfies
μAy(x, ω) = 0 for x ≠ y. (39)

Ay is said to be a full one point probabilistic set at
y ∈ X if its defining function μAy(x, ω) satisfies
∫Ω μAy(x, ω) dP(ω) = 0 for x ≠ y,
                   = 1 for x = y. (40)

Some properties of probabilistic sets


Some properties of probabilistic sets can be
characterized from a lattice theoretical viewpoint
(cf. [17]).
A family of probabilistic sets (R(X), ⊂) constitutes a
poset (see the note after Definition 2). For arbitrary
A, B (∈ R(X)), there exist a supremum A ∪ B and an infimum
A ∩ B with respect to this partial order ⊂ (see
Proposition 2 (2) and (3)). Hence the poset (R(X), ⊂) forms
a lattice and the following proposition holds. (Note
that the following set of properties is a necessary
and sufficient condition for being a lattice.)
Proposition 3.
For arbitrary A, B, C (∈ R(X)), we have
commutativity
A ∪ B = B ∪ A, (41)
A ∩ B = B ∩ A, (42)
associativity
(A ∪ B) ∪ C = A ∪ (B ∪ C), (43)
(A ∩ B) ∩ C = A ∩ (B ∩ C), (44)
absorption law
A ∪ (A ∩ B) = A, (45)
A ∩ (A ∪ B) = A. (46)
It is also possible to show that there exist pseudo complements in 𝒫(X). Let A and B be two arbitrarily fixed probabilistic sets whose defining functions are μ_A(x, ω) and μ_B(x, ω), respectively, and consider the following equation:

μ_{A'}(x, ω) = 1 if μ_A(x, ω) ≤ μ_B(x, ω),  and μ_{A'}(x, ω) = μ_B(x, ω) if μ_A(x, ω) > μ_B(x, ω).   (47)

For each x ∈ X, μ_{A'}(x, ·) is a (B, B_C)-measurable function and is an element of M, since the set {ω | μ_B(x, ω) − μ_A(x, ω) ≥ 0} belongs to B (i.e. this set is measurable). Hence it is possible to define a probabilistic set A' by (47). It is also clear that A' is the largest probabilistic set among those C which satisfy A ∩ C ⊂ B (C ∈ 𝒫(X)). In this sense, A' is said to be the pseudo complement of A relative to B. Hence (𝒫(X), ⊂) constitutes a pseudo-Boolean algebra (cf. Fig. 3). (A pseudo-Boolean algebra is a relatively pseudo-complemented lattice with a minimum element; in this case the minimum element is ∅.)
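Equation (47) is easy to check numerically. In the illustrative array sketch used above (an assumption, not the chapter's notation), the relative pseudo complement is computed pointwise, and its defining property, that A' is the largest C with A ∩ C ⊂ B, can be spot-checked against random candidates.

import numpy as np

def pseudo_complement(A, B):
    # Relative pseudo complement of A with respect to B, equation (47):
    # 1 where mu_A <= mu_B, and mu_B elsewhere.
    return np.where(A <= B, 1.0, B)

rng = np.random.default_rng(1)
A, B = rng.random((3, 4)), rng.random((3, 4))
A_prime = pseudo_complement(A, B)

# A n A' is contained in B ...
assert np.all(np.minimum(A, A_prime) <= B)
# ... and every candidate C with A n C contained in B satisfies C <= A'.
for _ in range(100):
    C = rng.random((3, 4))
    if np.all(np.minimum(A, C) <= B):
        assert np.all(C <= A_prime)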
Moreover, for an arbitrary {A_γ}_{γ∈Γ} ⊂ 𝒫(X) (Γ possibly infinite), the existence of ∪A_γ and ∩A_γ was shown in a previous section, and they play the role of a supremum and an infimum with respect to the order ⊂. Hence the lattice (𝒫(X), ⊂) is complete, and so we can conclude the following Theorem 1 from a lattice-theoretical viewpoint.

POSET (partially ordered set) ⊃ LATTICE ⊃ MODULAR LATTICE ⊃ DISTRIBUTIVE LATTICE ⊃ PSEUDO-BOOLEAN ALGEBRA ⊃ BOOLEAN ALGEBRA

Fig. 3. An inclusion diagram of various lattices.

Theorem 1.
A family of probabilistic sets (𝒫(X), ⊂) constitutes a complete pseudo-Boolean algebra.
Note.
In ordinary set theory, the family of all subsets constitutes a complete Boolean algebra. The difference between the two is the lack of the complementation law (i.e. A ∪ A^c ≠ X, A ∩ A^c ≠ ∅). In probabilistic set theory it is essential to consider ambiguous states, so we cannot obtain any definite information from knowing that the considered object is not in one particular state. In ordinary set theory, however, we do obtain the information that it is not in that state. (Note that ordinary set theory can be considered to be a two-valued logic.) Hence the lack of the complementation law is unavoidable in probabilistic set theory.
Since the notion of pseudo-Boolean algebra is included in that of distributive lattice (see Fig. 3), the distributive law holds in (𝒫(X), ⊂). Moreover, in connection with its completeness, we can generalize the commutative law, associative law, distributive law, and de Morgan's law as follows. (Proofs are omitted here.)
Proposition 4.
For arbitrary subfamilies of probabilistic sets {A_γ}_{γ∈Γ} and {B_λ}_{λ∈Λ}, we have
generalized associative law

(∪_{γ∈Γ} A_γ) ∪ (∪_{λ∈Λ} B_λ) = ∪_{γ∈Γ, λ∈Λ} (A_γ ∪ B_λ),   (48)

(∩_{γ∈Γ} A_γ) ∩ (∩_{λ∈Λ} B_λ) = ∩_{γ∈Γ, λ∈Λ} (A_γ ∩ B_λ),   (49)

generalized distributive law

(∪_{γ∈Γ} A_γ) ∩ (∪_{λ∈Λ} B_λ) = ∪_{γ∈Γ, λ∈Λ} (A_γ ∩ B_λ),   (50)

(∩_{γ∈Γ} A_γ) ∪ (∩_{λ∈Λ} B_λ) = ∩_{γ∈Γ, λ∈Λ} (A_γ ∪ B_λ),   (51)

generalized de Morgan's law

(∪_{γ∈Γ} A_γ)^c = ∩_{γ∈Γ} A_γ^c,   (52)

(∩_{γ∈Γ} A_γ)^c = ∪_{γ∈Γ} A_γ^c.   (53)

Some other important properties in 𝒫(X) are mentioned in the following without proofs.
Proposition 5.
For arbitrary A, B, C ∈ 𝒫(X), we have
idempotent law
A ∪ A = A,   (54)
A ∩ A = A,   (55)
involution law
(A^c)^c = A,   (56)
elimination law
A ∪ B = A ∪ C and A ∩ B = A ∩ C  ⟹  B = C,   (57)
identity law
A ∪ X = X,   (58)
A ∩ X = A,   (59)
A ∪ ∅ = A,   (60)
A ∩ ∅ = ∅.   (61)
Proposition 6.
For an arbitrary {A_n}_{n=1}^∞ ⊂ 𝒫(X), we have

lim inf_{n→∞} A_n ⊂ lim sup_{n→∞} A_n,   (62)

   (63)

If A_1 ⊂ A_2 ⊂ ··· ⊂ A_n ⊂ ···, then we have

lim sup_{n→∞} A_n = lim inf_{n→∞} A_n = ∪_{n=1}^∞ A_n.   (64)

If A_1 ⊃ A_2 ⊃ ··· ⊃ A_n ⊃ ···, then we have

lim sup_{n→∞} A_n = lim inf_{n→∞} A_n = ∩_{n=1}^∞ A_n.   (65)

If A_{2n+1} = A and A_{2n} = B, then we have

lim sup_{n→∞} A_n = A ∪ B  and  lim inf_{n→∞} A_n = A ∩ B.   (66)

Proposition 7.
Each of (𝒫(X), ∪), (𝒫(X), ∩), (𝒫(X), ·), and (𝒫(X), ⊕) constitutes a commutative monoid (i.e. a commutative semigroup with a unit) and, for arbitrary A, B, C ∈ 𝒫(X), we have

A △ B = (A − B) ∪ (B − A),   (67)

A ⊕ B = (A^c · B^c)^c,   (68)

A · B ⊂ A ∩ B ⊂ A +_λ B ⊂ A ∪ B ⊂ A ⊕ B.   (69)
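The chain (69) can be verified pointwise: for membership values a, b ∈ [0, 1] one has ab ≤ min(a, b) ≤ λa + (1 − λ)b ≤ max(a, b) ≤ a + b − ab. A short numerical check, again using the illustrative array representation introduced earlier, follows.

import numpy as np

rng = np.random.default_rng(2)
A, B = rng.random((3, 4)), rng.random((3, 4))
lam = 0.3                          # any lambda in [0, 1]

prod  = A * B                      # algebraic product A . B
inter = np.minimum(A, B)           # A n B
lsum  = lam * A + (1 - lam) * B    # lambda-sum (35)
union = np.maximum(A, B)           # A U B
asum  = A + B - A * B              # algebraic sum (34)

# Inclusion chain (69), checked on the defining functions.
assert np.all(prod <= inter) and np.all(inter <= lsum)
assert np.all(lsum <= union) and np.all(union <= asum)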

Note.
In ordinary set theory it is possible to define sixteen different kinds of binary operations. (Because the total space X is divided into four regions by arbitrary subsets A and B, there exist 2^4 = 16 combinations.) Among these sixteen binary operations, the symmetric difference A △ B has a very good property from an algebraic viewpoint, namely, it constitutes an Abelian group. In probabilistic set theory, however, (𝒫(X), △) does not enjoy such a good property; on the contrary, it does not even satisfy the associative law.
Proposition 8.
Let X_γ (γ ∈ Γ) be total spaces (Γ possibly infinite); then we have

∪_{γ∈Γ} 𝒫(X_γ) ⊂ 𝒫(∪_{γ∈Γ} X_γ),   (70)

∩_{γ∈Γ} 𝒫(X_γ) = 𝒫(∩_{γ∈Γ} X_γ).   (71)

Some extended concepts of probabilistic sets

1. Probabilistic mappings
A mapping f from X to Y is usually defined as a correspondence from an element x ∈ X to an element y ∈ Y. There also exist some variations, such as a set function (a correspondence from a subset A ⊂ X to an element y ∈ Y) and a multivalued mapping (a correspondence from an element x ∈ X to a subset B ⊂ Y). The concepts of set functions and multivalued mappings play an important role in the fields of measure theory and functional analysis, respectively. In the field of pattern recognition or learning theory, it is essential to consider an ambiguous correspondence (i.e. a probabilistic mapping), which is defined as follows.

Definition 4.
A probabilistic mapping f̃ from X to Y on a parameter space (Ω_m, B_m, P_m) is defined by

f̃ : X × Ω_m → Y,  (x, ω_m) ↦ f̃(x, ω_m).   (72)

Some extended concepts can be defined in connection with probabilistic mappings, such as induced images and induced inverse images of probabilistic sets by a probabilistic mapping, and some of their properties have also been investigated. However, all of them are omitted here.
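A probabilistic mapping in the sense of (72) is simply a mapping that also consumes a random parameter ω_m. The following minimal sketch (the two-point parameter space, the measure P_m, and the particular f̃ are hypothetical choices made for illustration) shows how the image of a fixed x becomes a P_m-distributed family of points of Y rather than a single point.

# Parameter space (Omega_m, B_m, P_m): here Omega_m = {0, 1} with P_m(0) = 0.7, P_m(1) = 0.3.
P_m = {0: 0.7, 1: 0.3}

def f_tilde(x, w):
    # A probabilistic mapping f~ : X x Omega_m -> Y (illustrative choice of X, Y and f~).
    return x + 1 if w == 0 else x - 1   # the image of x depends on the ambiguous parameter w

x = 5
images = {w: f_tilde(x, w) for w in P_m}                          # {0: 6, 1: 4}
expected_image = sum(p * f_tilde(x, w) for w, p in P_m.items())   # 0.7*6 + 0.3*4 = 5.4
print(images, expected_image)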
2. Moment analysis
The parameter space (Ω, B, P) is a (probability) measure space and plays an essential role in applications of probabilistic set theory. By using the measure P of this parameter space, we can carry out moment analysis. The possibility of moment analysis is one of the most important features of probabilistic set theory and cannot be found in other theories.
Definition 5.
Let A be a probabilistic set on X whose defining function is μ_A(x, ω). For each fixed x ∈ X, the mean value E(μ_A)(x), variance V(μ_A)(x), standard deviation σ(μ_A)(x), nth moment M^n(μ_A)(x), nth moment around the mean value M̃^n(μ_A)(x), and nth absolute moment around the mean value M̄^n(μ_A)(x) are defined as follows:

E(μ_A)(x) = ∫_Ω μ_A(x, ω) dP(ω)   ( = M^1(μ_A)(x)),   (73)

V(μ_A)(x) = ∫_Ω (μ_A(x, ω) − E(μ_A)(x))^2 dP(ω)   ( = M̃^2(μ_A)(x)),   (74)

σ(μ_A)(x) = {V(μ_A)(x)}^{1/2},   (75)

M^n(μ_A)(x) = ∫_Ω (μ_A(x, ω))^n dP(ω)   (n ∈ N),   (76)

M̃^n(μ_A)(x) = ∫_Ω (μ_A(x, ω) − E(μ_A)(x))^n dP(ω),   (77)

M̄^n(μ_A)(x) = ∫_Ω |μ_A(x, ω) − E(μ_A)(x)|^n dP(ω).   (78)
The justification of the above definitions is ensured by Proposition 1, and the following properties follow from them; a short numerical sketch is given after Proposition 9.

Proposition 9.
In the situation of Definition 5, we have

0 ≤ E(μ_A)(x) ≤ 1   for all x ∈ X,   (79)

0 ≤ ··· ≤ M^3(μ_A)(x) ≤ M^2(μ_A)(x) ≤ M^1(μ_A)(x) = E(μ_A)(x) ≤ {M^2(μ_A)(x)}^{1/2}   for all x ∈ X,   (80)

   for all x ∈ X,   (81)

   for all x ∈ X,   (82)

   for all x ∈ X.   (83)
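With a finite parameter space, the integrals in (73)-(78) become P-weighted sums, so the moment analysis can be sketched as follows; the discrete measure P and the array layout μ[x, ω] are assumptions made purely for illustration. The final assertions spot-check (79) and part of the chain (80).

import numpy as np

rng = np.random.default_rng(3)
mu_A = rng.random((3, 5))                    # defining function mu_A[x, w], |X| = 3, |Omega| = 5
P = np.array([0.1, 0.2, 0.3, 0.25, 0.15])    # probability measure on Omega

def mean(mu):                                # (73): E(mu_A)(x)
    return mu @ P

def moment(mu, n):                           # (76): nth moment
    return (mu ** n) @ P

def variance(mu):                            # (74)
    return ((mu - mean(mu)[:, None]) ** 2) @ P

def std(mu):                                 # (75)
    return np.sqrt(variance(mu))

def central_moment(mu, n):                   # (77)
    return ((mu - mean(mu)[:, None]) ** n) @ P

def abs_central_moment(mu, n):               # (78)
    return (np.abs(mu - mean(mu)[:, None]) ** n) @ P

E = mean(mu_A)
assert np.all((0 <= E) & (E <= 1))                           # (79)
assert np.all(moment(mu_A, 2) <= moment(mu_A, 1) + 1e-12)    # part of the chain (80)
assert np.all(E <= np.sqrt(moment(mu_A, 2)) + 1e-12)         # E <= {M^2}^(1/2) in (80)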

Definition 6.
Let A and B be probabilistic sets on X. For each fixed x ∈ X, the covariance C(μ_A, μ_B)(x) and the correlation coefficient r(μ_A, μ_B)(x) are defined by

C(μ_A, μ_B)(x) = ∫_Ω (μ_A(x, ω) − E(μ_A)(x)) · (μ_B(x, ω) − E(μ_B)(x)) dP(ω),   (84)

r(μ_A, μ_B)(x) = C(μ_A, μ_B)(x) / {V(μ_A)(x) · V(μ_B)(x)}^{1/2}.   (85)

(If V(μ_A)(x) · V(μ_B)(x) = 0, r(μ_A, μ_B)(x) is not defined.)

Proposition 10.
In the situation of Definition 6, we have

0 ≤ |C(μ_A, μ_B)(x)| ≤ {V(μ_A)(x) · V(μ_B)(x)}^{1/2} ≤ 1,   (86)

0 ≤ |r(μ_A, μ_B)(x)| ≤ 1,   (87)

C(μ_A, μ_A)(x) = V(μ_A)(x),   (88)

C(μ_A, μ_B)(x) = E(μ_A · μ_B)(x) − E(μ_A)(x) · E(μ_B)(x),   (89)

r(μ_A, μ_B)(x) = ±1  ⟺  there exist real numbers a and b such that μ_A(x, ω) = a · μ_B(x, ω) + b for a.e. ω ∈ Ω.   (90)
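Covariance and correlation of two defining functions, equations (84)-(85), follow the same pattern; the sketch below reuses the hypothetical discrete setup and also spot-checks the identity (89) and the bound (87).

import numpy as np

rng = np.random.default_rng(4)
mu_A, mu_B = rng.random((3, 5)), rng.random((3, 5))
P = np.array([0.1, 0.2, 0.3, 0.25, 0.15])

def E(mu):                                               # (73)
    return mu @ P

def V(mu):                                               # (74)
    return ((mu - E(mu)[:, None]) ** 2) @ P

def cov(mu1, mu2):                                       # (84)
    return ((mu1 - E(mu1)[:, None]) * (mu2 - E(mu2)[:, None])) @ P

def corr(mu1, mu2):                                      # (85); undefined when V(mu1) * V(mu2) = 0
    return cov(mu1, mu2) / np.sqrt(V(mu1) * V(mu2))

assert np.allclose(cov(mu_A, mu_B), E(mu_A * mu_B) - E(mu_A) * E(mu_B))   # (89)
assert np.all(np.abs(corr(mu_A, mu_B)) <= 1 + 1e-12)                      # (87)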
352

Definition 7.
Let A_1, A_2, ..., A_n be probabilistic sets on X whose defining functions are μ_{A_1}(x, ω), μ_{A_2}(x, ω), ..., μ_{A_n}(x, ω), respectively. For arbitrary x, y ∈ X, the moment matrix M(x, y) and the variance-covariance matrix V(x, y) of A_1, A_2, ..., A_n are defined by

M(x, y) = [m_{i,j}]_{1≤i,j≤n},  where  m_{i,j} = ∫_Ω μ_{A_i}(x, ω) · μ_{A_j}(y, ω) dP(ω),   (91)

V(x, y) = [v_{i,j}]_{1≤i,j≤n},  where  v_{i,j} = ∫_Ω (μ_{A_i}(x, ω) − E(μ_{A_i})(x)) · (μ_{A_j}(y, ω) − E(μ_{A_j})(y)) dP(ω).   (92)

3. Expected cardinal number
In ordinary set theory the notion of the cardinal number of a finite set is defined as the number of elements of the set. This concept can be extended to probabilistic set theory as follows.
Definition 8.
Let A be a probabilistic set on X whose defining function is μ_A(x, ω). The expected support of A is defined as the following (ordinary) subset of X,

supp A = {x ∈ X | E(μ_A)(x) = ∫_Ω μ_A(x, ω) dP(ω) > 0},   (93)

and the expected cardinal number of A, denoted by #A, is defined by

#A = Σ_{x ∈ supp A} ∫_Ω μ_A(x, ω) dP(ω)   if # supp A ≤ ℵ_0,
#A = # supp A   if # supp A > ℵ_0.   (94)
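For a finite X, the expected support and the expected cardinal number of Definition 8 reduce to a threshold test and a sum of expected memberships. The sketch below continues the illustrative discrete setup; the particular arrays are, of course, hypothetical.

import numpy as np

mu_A = np.array([[0.0, 0.0, 0.0],     # x0: never a member
                 [0.2, 0.5, 0.8],     # x1: partially a member
                 [1.0, 1.0, 1.0]])    # x2: always a full member
P = np.array([0.5, 0.3, 0.2])         # measure on a three-point Omega

E = mu_A @ P                                   # expected membership E(mu_A)(x)
expected_support = np.nonzero(E > 0)[0]        # (93): {x | E(mu_A)(x) > 0}
expected_cardinal = E[expected_support].sum()  # (94), countable (here finite) support case

print(expected_support, expected_cardinal)     # [1 2] and 0.41 + 1.0 ~ 1.41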

Conclusions
The background idea of probabilistic sets has been discussed in comparison with fuzzy sets, and its mathematical structure has been explained without proofs. The main results are: (1) a family of probabilistic sets constitutes a pseudo-Boolean algebra; (2) the possibility of moment analysis is a great advantage in applications (cf. [9]).
The concept of probabilistic sets presented in this paper seems to provide a new mathematical foundation for the fields of pattern recognition and decision making theory. Several studies are being carried out in these fields [9]. We will be glad if our idea is of any help to the people concerned.

References
[1] G. Birkhoff, Lattice Theory, Am. Math. Soc. Colloq. Publ. (Am. Math. Soc., New York, 1969).
[2] J.G. Brown, A note on fuzzy sets, Information and Control 18 (1971) 32-39.
[3] J.A. Goguen, L-fuzzy sets, J. Math. Anal. Appl. 18 (1967) 145-174.
[4] P.R. Halmos, Naive Set Theory (Van Nostrand, New York, 1960).
[5] P.R. Halmos, Measure Theory (Van Nostrand, New York, 1960).
[6] K. Hirota, Kakuritsu-Shugoron to sono Oyourei (Probabilistic sets and their applications), presented at the Behaviormetric Society of Japan 3rd Conference (1975) (in Japanese).
[7] K. Hirota, Kakuritsu-Shugoron (Probabilistic set theory), in: Fundamental Research Works of Fuzzy System Theory and Artificial Intelligence, Research Reports of the Scientific Research Fund from the Ministry of Education in Japan (1976) 193-213 (in Japanese).
[8] K. Hirota, Concepts of probabilistic sets, IEEE Conf. on Decision and Control, New Orleans (1977) 1361-1366.
[9] K. Hirota, Extended fuzzy expression of probabilistic sets: analytical expression of ambiguity and subjectivity in pattern recognition, presented at the Seminar on Applied Functional Analysis (July 1978) 13-18.
[10] K. Hirota et al., A decision making model: a new approach based on the concepts of probabilistic sets, presented at the Int. Conf. on Cybernetics and Society 1978, Tokyo (Nov. 1978) 1348-1353.
[11] K. Hirota et al., The bounded variation quantity (B.V.Q.) and its application to feature extraction, presented at the 4th Int. Conf. on Pattern Recognition, Kyoto (Nov. 1978) 456-461.
[12] M. Mizumoto and K. Tanaka, Some properties of fuzzy sets of type 2, Information and Control 31 (1976) 312-340.
[13] K. Nanba, Shugo-ron (Set Theory) (Science-sha Publ., 1975) (in Japanese).
[14] L.A. Zadeh, Fuzzy sets, Information and Control 8 (1965) 338-353.
[15] L.A. Zadeh, Probability measures of fuzzy events, J. Math. Anal. Appl. 23 (1968) 421-427.
[16] L.A. Zadeh et al. (eds.), Fuzzy Sets and Their Applications to Cognitive and Decision Processes (Academic Press, New York, 1975).
INDEX

A
"and" operators, 108
applications of fuzzy logic control, 82

B
biology, 235
boundary detection, 134

C
canonical propositions, 12, 28
categorical reasoning, 3
certainty qualification, 37, 49
clustering, 123
commonsense knowledge, 42
computer vision, 121
consensus, 274

D
decision making, 241, 263
decision trees, 226
default knowledge, 40
defuzzification, 77, 242
diagnosis, 240

E
evidential reasoning, 297
extension principle, 17

F
FRIL, 299
fuzzy c means, 124
fuzzy chips, 83
fuzzy constraints, 5, 99
fuzzy goals, 99
fuzzy linear programming, 99
fuzzy logic control, 69, 72
  self organization, 85
  neural networks, 86
fuzzy mathematical programming, 97
fuzzy parameters, 112
  applications, 114
fuzzy rules, 53
fuzzy set operations, 71, 187
fuzzy syntactic analysis, 178
fuzzy truth values, 55

G
gradual rules, 53
gray scale, 122

H
hierarchical fuzzy logic control, 80
high level vision, 136
human factors, 202

I
image enhancement, 167
image geometry, 154
image processing, 122
importance, 241, 275
imprecise matching, 59
inference rules, 15, 31
  entailment rule, 15
  conjunction rule, 16
  disjunction rule, 16
  projection rule, 16
  composition rule, 16

K
knowledge acquisition, 281
knowledge representation, 299

L
learning, 281
learning operators, 289
linguistic approximation, 197
linguistic quantifiers, 263
linguistic variables, 222

M
man-machine interactions, 201
measures of fuzziness, 148
medicine, 235, 253
membership functions, 70, 106, 161, 187, 239
meta control, 261
meta reasoning, 259
moment analysis, 350

N
natural language, 185
neural networks, 86, 235, 246
nonmonotonic reasoning, 322

P
parallel rules, 61
possibility, 3, 50
possibility distribution, 30, 46
possibility measure, 41, 227, 244
possibility qualification, 41, 47
probabilistic mappings, 349
probabilistic sets, 335, 339
probability, 297, 335
probabilistic masses, 306
prolog, 299

Q
quantifiers, 3, 263
quantified propositions, 32, 266
questionnaires, 221

R
rule based system, 254
rules of quantification, 8

S
segmentation, 125, 158
semantic unification, 326
smart shell, 283
specificity, 38
support measures, 302
syllogistic reasoning, 4, 18

T
test score semantics, 6
translation rules, 7