Ronald R. Yager, Lotfi A. Zadeh (Eds.), An Introduction to Fuzzy Logic Applications in Intelligent Systems
Consulting Editor
Tom Mitchell
Carnegie Mellon University
UNIVERSAL SUBGOALING AND CHUNKING OF GOAL HIERARCHIES, J.
Laird, P. Rosenbloom, A. Newell, ISBN: 0-89838-213-0
MACHINE LEARNING: A Guide to Current Research, T. Mitchell, J. Carbonell,
R. Michalski, ISBN: 0-89838-214-9
MACHINE LEARNING OF INDUCTIVE BIAS, P. Utgoff, ISBN: 0-89838-223-8
A CONNECTIONIST MACHINE FOR GENETIC HILLCLIMBING,
D. H. Ackley, ISBN: 0-89838-236-X
LEARNING FROM GOOD AND BAD DATA, P. D. Laird, ISBN: 0-89838-263-7
MACHINE LEARNING OF ROBOT ASSEMBLY PLANS, A. M. Segre,
ISBN: 0-89838-269-6
AUTOMATING KNOWLEDGE ACQUISITION FOR EXPERT SYSTEMS,
S. Marcus, Editor, ISBN: 0-89838-294-7
MACHINE LEARNING, META-REASONING AND LOGICS, P. B. Brazdil,
K. Konolige, ISBN: 0-7923-9047-4
CHANGE OF REPRESENTATION AND INDUCTIVE BIAS: D. P. Benjamin,
ISBN: 0-7923-9055-5
KNOWLEDGE ACQUISITION: SELECTED RESEARCH AND
COMMENTARY, S. Marcus, Editor, ISBN: 0-7923-9062-8
LEARNING WITH NESTED GENERALIZED EXEMPLARS, S.L. Salzberg,
ISBN: 0-7923-9110-1
INCREMENTAL VERSION-SPACE MERGING: A General Framework
for Concept Learning, H. Hirsh, ISBN: 0-7923-9119-5
COMPETITIVELY INHIBITED NEURAL NETWORKS FOR ADAPTIVE
PARAMETER ESTIMATION, M. Lemmon, ISBN: 0-7923-9086-5
STRUCTURE LEVEL ADAPTATION FOR ARTIFICIAL NEURAL
NETWORKS, T.C. Lee, ISBN: 0-7923-9151-9
CONNECTIONIST APPROACHES TO LANGUAGE LEARNING, D. Touretzky,
ISBN: 0-7923-9216-7
AN INTRODUCTION TO
FUZZY LOGIC APPLICATIONS
IN INTELLIGENT SYSTEMS
Edited
by
Ronald R. Yager
Iona College
Lotfi A. Zadeh
University of California, Berkeley
1
KNOWLEDGE REPRESENTATION
IN FUZZY LOGIC
Lotfi A. Zadeh
Computer Science Division, Department of EECS
University of California, Berkeley, California 94720
ABSTRACT
The conventional approaches to knowledge representation, e.g., semantic networks, frames, predicate calculus and Prolog, are based on bivalent logic. A serious shortcoming of such approaches is their inability to come to grips with the issue of uncertainty and imprecision. As a consequence, the conventional approaches do not provide an adequate model for modes of reasoning which are approximate rather than exact. Most modes of human reasoning and all of commonsense reasoning fall
into this category.
Fuzzy logic, which may be viewed as an extension of classical logical sys-
tems, provides an effective conceptual framework for dealing with the problem of
knowledge representation in an environment of uncertainty and imprecision. Mean-
ing representation in fuzzy logic is based on test-score semantics. In this semantics,
a proposition is interpreted as a system of elastic constraints, and reasoning is
viewed as elastic constraint propagation. Our paper presents a summary of the basic
concepts and techniques underlying the application of fuzzy logic to knowledge
representation and describes a number of examples relating to its use as a computa-
tional system for dealing with uncertainty and imprecision in the context of
knowledge, meaning and inference.
INTRODUCTION
Knowledge representation is one of the most basic and actively researched
areas of AI (Brachman, 1985,1988; Levesque, 1986, 1987; Moore, 1982, 1984;
Negoita, 1985; Shapiro, 1987; Small, 1988). And yet, there are many important is-
sues underlying knowledge representation which have not been adequately ad-
dressed. One such issue is that of the representation of knowledge which is lexically
imprecise and/or uncertain.
As a case in point, the conventional knowledge representation techniques
do not provide effective tools for representing the meaning of or inferring from the
kind of everyday type facts exemplified by
(a) Usually it takes about an hour to drive from Berkeley to Stanford in light
traffic.
(b) Unemployment is not likely to undergo a sharp decline during the next few months.
(c) Most experts believe that the likelihood of a severe earthquake in the near future is very low.
The italicized words in these assertions are the labels of fuzzy predicates,
fuzzy quantifiers and fuzzy probabilities. The conventional approaches to
knowledge representation lack the means for representing the meaning of fuzzy con-
cepts. As a consequence, the approaches based on first order logic and classical pro-
bability theory do not provide an appropriate conceptual framework for dealing with
the representation of commonsense knowledge, since such knowledge is by its na-
ture both lexically imprecise and noncategorical (Moore, 1982, 1984; Zadeh, 1984).
The development of fuzzy logic was motivated in large measure by the
need for a conceptual framework which can address the issues of uncertainty and
lexical imprecision. The principal objective of this paper is to present a summary of
some of the basic ideas underlying fuzzy logic and to describe their application to the problem of knowledge representation in an environment of uncertainty and imprecision. A more detailed discussion of these ideas may be found in Zadeh (1978a, 1978b, 1986, 1988a) and other entries in the bibliography.
Predicate modifiers: In fuzzy logic, predicates may be modified by predicate modifiers which act as hedges, e.g., very, more or less, quite, rather, extremely. Such predicate modifiers play an essential role in the generation of the values of a linguistic variable, e.g., very young, not very young, more or less young, etc. (Zadeh, 1973).
Quantifiers: In classical logical systems there are just two quantifiers:
universal and existential. Fuzzy logic admits, in addition, a wide variety of fuzzy
quantifiers exemplified by few, several, usually, most, almost always, frequently,
about five, etc. In fuzzy logic, a fuzzy quantifier is interpreted as a fuzzy number or
a fuzzy proportion (Zadeh, 1983a).
Probabilities: In classical logical systems, probability is numerical or
interval-valued. In fuzzy logic, one has the additional option of employing linguistic
or, more generally, fuzzy probabilities exemplified by likely, unlikely, very likely,
around 0.8, high, etc. (Zadeh 1986). Such probabilities may be interpreted as fuzzy
numbers which may be manipulated through the use of fuzzy arithmetic (Kaufmann
and Gupta, 1985).
In addition to fuzzy probabilities, fuzzy logic makes it possible to deal with
fuzzy events. An example of a fuzzy event is: tomorrow will be a warm day, where
warm is a fuzzy predicate. The probability of a fuzzy event may be a crisp or fuzzy
number (Zadeh, 1968).
It is important to note that from the frequentist point of view there is an in-
terchangeability between fuzzy probabilities and fuzzy quantifiers or, more general-
ly, fuzzy measures. In this perspective, any proposition which contains labels of fuz-
zy probabilities may be expressed in an equivalent form which contains fuzzy
quantifiers rather than fuzzy probabilities.
Possibilities: In contrast to classical modal logic, the concept of possibility
in fuzzy logic is graded rather than bivalent. Furthermore, as in the case of probabil-
ities, possibilities may be treated as linguistic variables with values such as possible,
quite possible, almost impossible, etc. Such values may be interpreted as labels of
fuzzy subsets of the real line.
A concept which plays a central role in fuzzy logic is that of a possibility distribution (Zadeh, 1978a; Dubois and Prade, 1988; Klir, 1988). Briefly, if X is a variable taking values in a universe of discourse U, then the possibility distribution of X, Π_X, is the fuzzy set of all possible values of X. More specifically, let π_X(u) denote the possibility that X can take the value u, u ∈ U. Then the membership function of Π_X is numerically equal to the possibility distribution function π_X: U → [0,1], which associates with each element u ∈ U the possibility that X may take u
as its value. More about possibilities and possibility distributions will be said at a
later point in this paper.
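As a minimal illustration, assuming a simple decreasing profile for the fuzzy predicate young (the membership function below is an assumption, not taken from the text), the possibility distribution induced by the proposition "X is young" can be sketched as follows:

# Sketch: possibility distribution pi_X induced by "X is young".
# The shape of mu_young is an illustrative assumption.

def mu_young(age):
    """Grade of membership of an age in the fuzzy set YOUNG (assumed shape)."""
    if age <= 25:
        return 1.0
    if age >= 45:
        return 0.0
    return (45 - age) / 20.0   # linear decrease between 25 and 45

# The proposition "X is young" induces pi_X(u) = mu_young(u) for each u in U.
pi_X = mu_young

if __name__ == "__main__":
    for u in (20, 30, 40, 50):
        print(u, round(pi_X(u), 2))   # possibility that X may take the value u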
It is important to observe that in every instance fuzzy logic adds to the op-
tions which are available in classical logical systems. In this sense, fuzzy logic may
be viewed as an extension of such systems rather than as a system of reasoning which
is in conflict with the classical systems.
Before taking up the issue of knowledge representation in fuzzy logic, it
will be helpful to take a brief look at some of the principal modes of reasoning in
fuzzy logic. These are the following, with the understanding that the modes in ques-
tion are not necessarily disjoint.
1. Categorical Reasoning
In this mode of reasoning, the premises contain no fuzzy quantifiers and no fuzzy
probabilities. A simple example of categorical reasoning is:
Carol is slim
Carol is very intelligent
Carol is slim and very intelligent
In the premises, slim and very intelligent are assumed to be fuzzy predicates. The fuzzy predicate in the conclusion, slim and very intelligent, is the conjunction of slim and very intelligent.
Another example of categorical reasoning is:
Mary is young
John is much older than Mary
John is (much_older ∘ young),
where (much_older ∘ young) represents the composition of the binary fuzzy predicate much_older with the unary fuzzy predicate young. More specifically, let π_much_older and π_young denote the possibility distribution functions associated with the fuzzy predicates much_older and young, respectively. Then, the possibility distribution function of John's age may be expressed as (Zadeh, 1978a)
π_Age(John)(u) = ∨_v (π_much_older(u, v) ∧ π_young(v)),
where ∨ and ∧ stand for max and min, respectively.
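A small sketch of this max-min composition, assuming illustrative membership functions for young and much_older and a discretized range of ages:

# Sketch: pi_Age(John)(u) = max over v of min(pi_much_older(u, v), pi_young(v)).
# Both membership functions below are illustrative assumptions.

ages = range(0, 101, 5)

def pi_young(v):
    return max(0.0, min(1.0, (45 - v) / 20.0))

def pi_much_older(u, v):
    # degree to which age u is much older than age v (assumed ramp over 10..25 years)
    d = u - v
    return max(0.0, min(1.0, (d - 10) / 15.0))

def pi_age_john(u):
    return max(min(pi_much_older(u, v), pi_young(v)) for v in ages)

print([(u, round(pi_age_john(u), 2)) for u in (30, 40, 50, 60)])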
2. Syllogistic Reasoning
In contrast to categorical reasoning, syllogistic reasoning relates to inference from
premises containing fuzzy quantifiers (Zadeh, 1985; Dubois and Prade, 1978a). A
simple example of syllogistic reasoning is the following
most Swedes are blond
most blond Swedes are tall
most² Swedes are blond and tall
where the fuzzy quantifier most is interpreted as a fuzzy proportion and most² is the square of most in fuzzy arithmetic (Kaufmann and Gupta, 1985).
3. Dispositional Reasoning
In dispositional reasoning the premises are dispositions, that is, propositions which are preponderantly but not necessarily always true (Zadeh, 1987). An example of dispositional reasoning is:
heavy smoking is a leading cause of lung cancer
to avoid lung cancer avoid heavy smoking
Note that in this example the conclusion is a maxim which may be interpreted as a dispositional command. Another example of dispositional reasoning is:
usually the probability of failure is not very low
usually the probability of failure is not very high
(2 usually ⊖ 1) the probability of failure is not very low and not very high
In this example, usually is a fuzzy quantifier which is interpreted as a fuzzy proportion and 2 usually ⊖ 1 is a fuzzy arithmetic expression whose value may be computed through the use of fuzzy arithmetic. (⊖ denotes the operation of subtraction in fuzzy arithmetic.) It should be noted that the concept of usuality plays a key role in dispositional reasoning (Zadeh, 1985, 1987), and is the concept that links together
Age satisfies the elastic constraint characterized by the fuzzy predicate young. In effect, this relation serves to calibrate the meaning of the fuzzy predicate young in a particular context by representing its denotation as a fuzzy subset, YOUNG, of the interval [0,100].
With this ED, the test procedure which computes the overall test score may be described as follows:
1. Determine the age of Maria by reading the value of Age in POPULATION, with the variable Name bound to Maria. In symbols, this may be expressed as
Age(Maria) = Age POPULATION[Name = Maria].
In this expression, we use the notation Y R[X = a] to signify that X is bound to a in R and the resulting relation is projected on Y, yielding the values of Y in the tuples in which X = a.
2. Test the elastic constraint induced by the fuzzy predicate young:
τ1 = μ YOUNG[Age = Age(Maria)],
which signifies that the overall test score is taken to be the smaller of the operands of ∧. The overall test score, as expressed by (3.2), represents the compatibility of p ≜ Maria is young and attractive with the data resident in the explanatory database.
In testing the constituent relations in ED, it is helpful to have a collection of standardized translation rules for computing the test score of a combination of elastic constraints C1, ..., Ck from the knowledge of the test scores of each constraint considered in isolation. For the most part, such rules are default rules in the sense that they are intended to be used in the absence of alternative rules supplied by the user.
For purposes of knowledge representation, the principal rules of this type are the following.
expressed symbolically as
F = μ1/u1 + ... + μn/un = Σi μi/ui
or, more simply, as
F = μ1 u1 + ... + μn un,
in which the term μi/ui, i = 1, ..., n, signifies that μi is the grade of membership of ui in F, and the plus sign represents the union.
The sigma-count of F is defined as the arithmetic sum of the μi, i.e.,
Σ Count(F) ≜ Σi μi, i = 1, ..., n,
with the understanding that the sum may be rounded, if need be, to the nearest in-
teger. Furthennore, one may stipulate that the terms whose grade of membership
falls below a specified threshold be excluded from the summation. The purpose of
such an exclusion is to avoid a situation in which a large number of terms with low
grades of membership become count-equivalent to a small number of terms with
high membership.
The relative sigma-count, denoted by Σ Count(F/G), may be interpreted as the proportion of elements of F which are in G. More explicitly,
Σ Count(F/G) = Σ Count(F ∩ G) / Σ Count(G),
in which the intersection F ∩ G is defined by
μ_F∩G(u) = μ_F(u) ∧ μ_G(u), u ∈ U.
Thus, in terms of the membership functions of F and G, the relative sigma-count of F in G is given by
Σ Count(F/G) = Σi (μ_F(ui) ∧ μ_G(ui)) / Σi μ_G(ui).
which signifies that Name is bound to Naomi, Year to Yeari, and the resulting relation is projected on the domain of the attribute Amount, yielding the value of Amount corresponding to the values assigned to the attributes Name and Year.
2. Test the constraint induced by FEW:
μi ≜ μ FEW[Year = Yeari],
which signifies that the variable Year is bound to Yeari and the corresponding value of μ is read by projecting on the domain of μ.
3. Compute Naomi's total income during the past few years:
TIN ≜ Σi μi INi,
in which the μi play the role of weighting coefficients. Thus, we are tacitly assuming that the total income earned by Naomi during a fuzzily specified interval of time is obtained by weighting Naomi's income in year Yeari by the degree to which Yeari satisfies the constraint induced by FEW and summing the weighted incomes.
4. Compute the total income of each Namej (other than Naomi) during the past few years:
TINamej ≜ Σi μi INameji,
where INameji is the income of Namej in Yeari.
5. Find the fuzzy set of individuals in relation to whom Naomi earned far more. The grade of membership of Namej in this set is given by
μ_FM(Namej) = μ FAR.MORE[Income1 = TIN; Income2 = TINamej].
6. Find the fuzzy set of close friends of Naomi by intensifying (Zadeh, 1978a) the relation FRIEND:
CF ≜ CLOSE.FRIEND ≜ ²FRIEND,
in which μ_FRIEND(Namej) represents the grade of membership of Namej in the set of Naomi's friends.
7. Count the number of close friends of Naomi. On denoting the count in question by Σ Count(CF), we have:
Σ Count(CF) = Σj (μ_FRIEND(Namej))².
8. Find the intersection of FM with CF. The grade of membership of Namej in the intersection is given by
μ_FM∩CF(Namej) = μ_FM(Namej) ∧ μ_CF(Namej),
where the min operator ∧ signifies that the intersection is defined as the conjunction of its operands.
9. Compute the sigma-count of FM ∩ CF:
Σ Count(FM ∩ CF) = Σj μ_FM(Namej) ∧ μ_CF(Namej).
10. Compute the relative sigma-count of FM in CF, i.e., the proportion of individuals in CF who are in FM:
ρ ≜ Σ Count(FM ∩ CF) / Σ Count(CF),
which expresses the overall test score and thus represents the compatibility of p with the explanatory database.
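A compact sketch of the sigma-count and relative sigma-count computations of steps 7-10, assuming invented membership grades:

# Sketch: sigma-count and relative sigma-count for discrete fuzzy sets, here
# FM (earned far more) and CF (close friend), with invented membership grades.

FM = {"Ann": 0.9, "Bob": 0.4, "Carl": 0.1}
CF = {"Ann": 0.8, "Bob": 0.7, "Carl": 0.2}

def sigma_count(A):
    return sum(A.values())

intersection = {name: min(FM[name], CF[name]) for name in FM}   # min = conjunction
rho = sigma_count(intersection) / sigma_count(CF)               # overall test score
print(round(rho, 3))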
In application to the representation of dispositional knowledge, the first step in the representation of the meaning of a disposition involves the process of explicitation, that is, making explicit the implicit quantifiers. As a simple example, consider the disposition
d ≜ young men like young women.
In this way, the representation of the meaning of p is decomposed into two simpler problems, namely, the representation of the meaning of p, and the representation of the meaning of q knowing the meaning of p.
The meaning of p is represented by the following test procedure.
1. Divide POPULATION into the population of males, M.POPULATION, and the population of females, F.POPULATION:
M.POPULATION ≜ Name×Age POPULATION[Sex = Male]
F.POPULATION ≜ Name×Age POPULATION[Sex = Female],
where αi may be interpreted as the grade of membership of Namei in the fuzzy set, YW, of young women.
4. For each Namei, i = 1, ..., k, in M.POPULATION, find the age of Namei:
Bi ≜ Age M.POPULATION[Name = Namei].
Σj αj
The concept of a canonical form relates to the basic idea which underlies test-score semantics, namely, that a proposition may be viewed as a system of elastic constraints whose domain is a collection of relations in the explanatory database. Equivalently, let X1, ..., Xn be a collection of variables which are constrained by p. Then, the canonical form of p may be expressed as
cf(p) ≜ X is F,     (4.1)
where X = (X1, ..., Xn) is the constrained variable which is usually implicit in p, and F is a fuzzy relation, likewise implicit in p, which plays the role of an elastic (or fuzzy) constraint on X. The relation between p and its canonical form will be expressed as
p → X is F,     (4.2)
signifying that the canonical form may be viewed as a representation of the meaning of p.
In general, the constrained variable X in cf(p) is not uniquely determined by p, and is dependent on the focus of attention in the meaning-representation process. To place this in evidence, we shall refer to X as the focal variable.
As a simple illustration, consider the proposition
p ≜ Anne has blue eyes.     (4.3)
In this case, the focal variable may be expressed as
X ≜ Color(Eyes(Anne)),
and the elastic constraint is represented by the fuzzy relation BLUE. Thus, we can write
p → Color(Eyes(Anne)) is BLUE.     (4.4)
Here, the focal variable has two components, X = (X1, X2), where
X1 = Height(Brian)
X2 = Height(Mildred);
and the elastic constraint is characterized by the fuzzy relation MUCH.TALLER[Height1; Height2; μ], in which μ is the degree to which Height1 is much taller than Height2. In this case, we have
p → (Height(Brian), Height(Mildred)) is MUCH.TALLER.     (4.6)
Now, if we identify X1 with Agent(GIVE), X2 with Recipient(GIVE), etc., the semantic network representation (4.16) may be regarded as a canonical form in which X = (X1, ..., X5), and
X1 = Richard     (4.17)
X2 = Cindy
X3 is Past
X4 is Pin
X5 is Red
More generally, since any semantic network may be expressed as a collection of triples of the form (Object, Attribute, Attribute Value), it can be transformed at once into a canonical form. However, since a canonical form has a much greater expressive power than a semantic network, it may be difficult to transform a canonical form into a semantic network.
INFERENCE
The concept of a canonical form provides a convenient framework for representing the rules of inference in fuzzy logic. Since the main concern of the paper is with knowledge representation rather than with inference, our discussion of the rules of inference in fuzzy logic in this section has the format of a summary.
In the so-called categorical rules of inference, the premises are assumed to be in the canonical form X is A or the conditional canonical form X is A if Y is B, where A and B are fuzzy predicates (or relations). In the syllogistic rules, the premises are expressed as Q A's are B's, where Q is a fuzzy quantifier and A and B are fuzzy predicates (or relations).
The rules in question are the following.
CATEGORICAL RULES
Examples
X = Age(Mary), Y = Distance(P1, P2)
A, B, C, ... = fuzzy predicates (relations)
Examples
A = small, B = much larger
ENTAILMENT RULE
X is A
A ⊂ B → μ_A(u) ≤ μ_B(u), u ∈ U
X is B
Example
Mary is very young
very young ⊂ young
Mary is young
CONJUNCTION RULE
X is A
X is B
X is A ∩ B → μ_A∩B(u) = μ_A(u) ∧ μ_B(u)
∩ = intersection (conjunction)
Example
pressure is not very high
pressure is not very low
pressure is not very high and not very low
DISJUNCTION RULE
X is A
or X is B
X is A ∪ B → μ_A∪B(u) = μ_A(u) ∨ μ_B(u)
PROJECTION RULE
(X, Y) is R
X is XR → μ_XR(u) = sup_v μ_R(u, v)
XR ≜ projection of R on U
Example
(X, Y) is close to (3, 2)
X is close to 3
COMPOSITIONAL RULE
Example
X is much larger than Y
Y is large
X is much larger ∘ large
NEGATION RULE
not (X is A)
X is ¬A → μ_¬A(u) = 1 − μ_A(u)
¬ ≜ negation
Example
not (Mary is young)
Mary is not young
EXTENSION PRINCIPLE
X is A
f(X) is f(A)
A = μ1/u1 + μ2/u2 + ... + μn/un
f(A) = μ1/f(u1) + μ2/f(u2) + ... + μn/f(un)
Example
X is small
X² is ²small,
where ²small should not be confused with very small: μ_very small = (μ_small)², whereas ²small is the image of small under f(u) = u², as given by the extension principle.
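A brief sketch of the extension principle with f(u) = u², assuming an illustrative discrete fuzzy set small:

# Sketch: extension principle f(A) for a discrete fuzzy set A = sum of mu_i / u_i.
# For f(u) = u**2, each image point receives the maximum membership of its preimages.
# The grades of "small" below are assumptions for illustration.

small = {0: 1.0, 1: 1.0, 2: 0.8, 3: 0.5, 4: 0.2}

def extend(f, A):
    fA = {}
    for u, mu in A.items():
        v = f(u)
        fA[v] = max(fA.get(v, 0.0), mu)
    return fA

print(extend(lambda u: u ** 2, small))          # the fuzzy set ²small, the value of X² when X is small
very_small = {u: mu ** 2 for u, mu in small.items()}   # by contrast, very small squares the grades
print(very_small)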
It should be noted that the use of the canonical form in these rules stands in sharp contrast to the way in which the rules of inference are expressed in classical logic. The advantage of the canonical form is that it places in evidence that inference in fuzzy logic may be interpreted as a propagation of elastic constraints. This point of view is particularly useful in the applications of fuzzy logic to control and decision analysis (Proc. of the 2nd IFSA Congress, 1987; Proc. of the International Workshop, Iizuka, 1988).
As was pointed out already, it is the qualitative mode of reasoning that
plays a key role in the applications of fuzzy logic to control. In such applications,
the input-output relations are expressed as collections of fuzzy if-then rules (Mam-
dani and Gaines, 1981).
For example, if X and Y are input variables and Z is the output variable, the relation between X, Y, and Z may be expressed as
Z is C1 if X is A1 and Y is B1
Z is C2 if X is A2 and Y is B2
...
Z is Cn if X is An and Y is Bn
where Ci, Ai, and Bi, i = 1, ..., n, are fuzzy subsets of their respective universes of discourse.
discourse. For example,
SYLLOGISTIC RULES
In its generic form, a fuzzy syllogism may be expressed as the inference schema
Q1 A's are B's
Q2 C's are D's
Q3 E's are F's
in which A, B, C, D, E and F are interrelated fuzzy predicates and Q1, Q2 and Q3 are fuzzy quantifiers.
The interrelations between A, B, C, D, E and F provide a basis for a classification of fuzzy syllogisms. The more important of these syllogisms are the following:
(a) Intersection/product syllogism:
C = A ∧ B, E = A, F = C ∧ D
(b) Chaining syllogism:
C = B, E = A, F = D
(c) Consequent conjunction syllogism:
A = C = E, F = B ∧ D
(d) Consequent disjunction syllogism:
A = C = E, F = B ∨ D
(e) Antecedent conjunction syllogism:
B = D = F, E = A ∧ C
(f) Antecedent disjunction syllogism:
B = D = F, E = A ∨ C
In the context of expert systems, these and related syllogisms provide a set of inference rules for combining evidence through conjunction, disjunction and chaining (Zadeh, 1983b).
One of the basic problems in fuzzy syllogistic reasoning is the following: given A, B, C, D, E and F, find the maximally specific (i.e., most restrictive) fuzzy quantifier Q3 such that the proposition Q3 E's are F's is entailed by the premises. In the case of (a), (b) and (c), this leads to the following syllogisms:
INTERSECTION/PRODUCT SYLLOGISM
CHAINING SYLLOGISM
Q3 ≜ 0 ∨ (2most ⊖ 1) = 2most ⊖ 1.
The three basic syllogisms stated above are merely examples of a collection of fuzzy syllogisms which may be developed and employed for purposes of inference from commonsense knowledge. In addition to its application to commonsense reasoning, fuzzy syllogistic reasoning may serve to provide a basis for combining uncertain evidence in expert systems (Zadeh, 1983b).
CONCLUDING REMARKS
One of the basic aims of fuzzy logic is to provide a computational frame-
work for knowledge representation and inference in an environment of uncertainty
and imprecision. In such environments, fuzzy logic is effective when the solutions
need not be precise and/or it is acceptable for a conclusion to have a dispositional
rather than categorical validity. The importance of fuzzy logic derives from the fact
that there are many real world applications which fit these conditions, especially in
the realm of knowledge-based systems for decision-making and control.
de Kleer, J., and J. Brown, "A qualitative physics based on confluences," Artificial Intelligence 24, 7-84, 1984.
Dubois, D. and Prade, H., Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York, 1980.
Forbus, K., "Qualitative physics: past, present, and future," Exploring Artificial Intelligence, H. Shrobe, ed., Morgan Kaufman, Los Altos, CA, 1989.
Fujitec, "Artificial intelligence type elevator group control system," JETRO, 26, 1988.
Goodman, I.R. and Nguyen, H.T., Uncertainty Models for Knowledge-Based Systems. North-Holland, Amsterdam, 1985.
Isik, C., "Inference engines for fuzzy rule-based control," International Jour. of Approximate Reasoning 2, 122-187, 1988.
Kacprzyk, J. and Yager, R.R. (eds.), Management Decision Support Systems Using Fuzzy Sets and Possibility Theory. Interdisciplinary Systems Research Series, vol. 83, Verlag TUV Rheinland, Koln, 1985.
Kacprzyk, J. and Orlovski, S.A. (eds.), Optimization Models Using Fuzzy Sets and
Possibility Theory. D. Reidel, Dordrecht, 1987.
Kinoshita, M., T. Fukuzaki, T. Satoh, and M. Miyake, "An automatic operation method for control rods in BWR plants," Proc. Specialists' Meeting on In-core Instrumentation and Reactor Core Assessment, Cadarache, France, 1988.
Kiszka, J.B., M.M. Gupta, and P.N. Nikiforuk, "Energetistic stability of fuzzy dynamic systems," IEEE Transactions on Systems, Man and Cybernetics SMC-15, 1985.
Klir, G.J. and Folger, T.A., Fuzzy Sets, Uncertainty and Information. Prentice Hall, Englewood Cliffs, N.J., 1988.
Kuipers, B., "Qualitative simulation," Artificial Intelligence 29, 289-338, 1986.
Mamdani, E.H. and Gaines, B.R. (eds.), Fuzzy Reasoning and its Applications. Academic Press, London, 1981.
Moore, R.C., "The role of logic in knowledge representation and commonsense reasoning," Proceedings of the National Conference on Artificial Intelligence, 428-433, 1982.
Moore, R.C. and Hobbs, J.R. (eds.), Formal Theories of the Commonsense World. Ablex Publishing, Norwood, N.J., 1984.
Mukaidono, M., Z. Shen, and L. Ding, "Fuzzy Prolog," Proc. 2nd IFSA Congress, Tokyo, Japan, 452-455, 1987.
Peterson, P., "On the logic of few, many, and most," Notre Dame Journal of Formal Logic 20, 155-179, 1979.
Pospelov, G.S., "Fuzzy set theory in the USSR," Fuzzy Sets and Systems 22, 1-24, 1987.
Shapiro, S.C. (ed.), Encyclopedia of Artificial Intelligence. John Wiley & Sons, New York, 1987.
Small, S.L., Cottrell, G.W., and Tanenhaus, M.K. (eds.), Lexical Ambiguity Resolution. Morgan Kaufman Publishers, Los Altos, CA, 1988.
Sugeno, M., ed., Industrial Applications of Fuzzy Control, North Holland, Amsterdam, 1985.
Togai, M., and H. Watanabe, "Expert systems on a chip: an engine for real-time approximate reasoning," IEEE Expert 1, 55-62, 1986.
Zadeh, L.A., "Probability measures of fuzzy events," Jour. Math. Anal. and Applications 23, 421-427, 1968.
Zadeh, L.A., "Outline of a new approach to the analysis of complex systems and decision processes," IEEE Trans. on Systems, Man and Cybernetics SMC-3, 28-44, 1973.
Zadeh, L.A., "The concept of a linguistic variable and its application to approximate reasoning," Part I: Inf. Science 8, 199-249; Part II: Inf. Science 8, 301-357; Part III: Inf. Science 9, 43-80, 1975.
Zadeh, L.A., "Fuzzy sets as a basis for a theory of possibility," Fuzzy Sets and Systems 1, 3-28, 1978a.
Zadeh, L.A., "The role of fuzzy logic in the management of uncertainty in expert systems," Fuzzy Sets and Systems 11, 199-227, 1983b.
Zadeh, L.A., "Syllogistic reasoning in fuzzy logic and its application to reasoning with dispositions," IEEE Trans. on Systems, Man and Cybernetics SMC-15, 754-763, 1985.
Zadeh, L.A., "QSA/FL - Qualitative systems analysis based on fuzzy logic," Proc. AAAI Symposium, Stanford University, 1989.
Zemankova-Leech, M. and Kandel, A., Fuzzy Relational Data Bases - A Key to Expert Systems. Verlag TUV Rheinland, Cologne, 1984.
Zimmermann, H.J., Fuzzy Set Theory and its Applications. Kluwer-Nijhoff, Dordrecht, 1987.
2
EXPERT SYSTEMS USING FUZZY
LOGIC
Ronald R. Yager
Machine Intelligence Institute
Iona College
New Rochelle, NY 10801
ABSTRACT
We show how the theory of approximate reasoning developed by L.A. Zadeh
provides a natural format for representing the knowledge and performing the
inferences in rule based expert systems. We extend the representational ability of
these systems by providing a new structure for including rules which only require the satisfaction of some subset of the antecedent conditions. This is accomplished by the
use of fuzzy quantifiers. We also provide a methodology for the inclusion of a form
of uncertainty in the expert systems associated with the belief attributed to the data
and production rules.
INTRODUCTION
In [1] Buchanan and Duda provide an excellent introduction to the principles
of rule-based expert systems. In [2] Buchanan provides a bibliography on expert
systems. A particularly well cited example of a rule based expert system is MYCIN
[3,4]. In [5] Van Melle has abstracted the basic structure of the MYCIN system and
provided a language for the development of prototypical rule based expert systems
called EMYCIN.
A rule based expert system is essentially an example of a production system
consisting of the following components [1]:
As noted by Buchanan and Duda [1] the fundamental building block for the
information in both the database and the rule base of an expert system are
propositional statements of the form:
the (attribute) of (object) is (value)
For example
The height of John is 6 feet.
The temperature of the patient is 102.
One can combine the ideas of attribute and object into a concept called a variable.
Thus in the above examples John's height and the patient's temperature can be
considered variables. In this notation the fundamental building blocks of the rule-
based expert systems would be
V is A,
where V is a variable (an attribute of an object) and A is its current value.
It is at this point we diverge from the current representational approach to
expert systems knowledge. In the current systems, such as MYCIN, the values of
the variables are left as symbols, words or values with no meaning. That is, the data
Temperature is high is left in this form, no attempt is made to give any meaning to
the value high. That is, the values are considered as atomic items with no further
attempts at understanding their meaning. The matching used to determine the fireability of rules is carried out at this level of semantics. Using the values at this
level of detail provokes some important questions. When two people use the same
word, such as the designer of a system and the user, do they mean the same thing?
Secondly, if a rule has a certain value for a variable in its antecedent, can we still learn something about the consequent variable if we only know that the value of the antecedent variable is close to the value in the rule? The ability to handle these types of problems
requires us to provide a deeper semantics for the values associated with variables.
Just as the predicate logic refines and improves upon the propositional logic by
further decomposing the atomic statements the theory of approximate reasoning [6-
10] further refines the meaning of the values associated with variables.
The approach we suggest is based upon the idea of fuzzy subsets introduced by Zadeh [13]. Assume X is a set of objects. A fuzzy subset A of X is a subset in which the membership grade for each x ∈ X is an element in the unit interval [0,1]. We denote this membership function A(x). In our approach a proposition such as
Age is old
has the effect of associating with the variable Age a possibility distribution [9].
Assume we have the proposition
V is A
where A is some value. We can express A as a fuzzy subset of a base set, the set of values the variable can assume. For example, if A is old we can express A as a fuzzy subset of the interval of ages [0,150]. In particular, X is the set of all values that V can assume. The statement in turn induces a possibility distribution Π_V over the set X such that
Π_V(x) = A(x),
where A(x) is the membership grade of x in A. In particular Π_V(x) is seen to be the possibility that V = x given the data V is A.
In a rule based expert system the fundamental component of the rules are
conditional statements of the form
if V1 is A then V2 is B.
As suggested by Zadeh [8] propositions of this type can also be seen to
... then U is B.
The ability to represent such rules will greatly enhance the ability of any
expert system to capture the types of rules used by experts.
We shall provide a methodology for representing such rules in a manner
consistent with the rest of our formulation and one which allows inferences to be
made about the value of the consequence using the rule and observed values about the
variables in the antecedent. This methodology is based upon Zadeh's [14]
representation of quantifiers and Yager's procedure for evaluating quantified statements
[15].
The class of rules we are concerned with can be described as consisting of the following components, an antecedent and a consequence. The antecedent component consists of a collection of requirements specified in the form of propositions of the type Vi is Ai, where Vi is a variable and Ai is a fuzzy subset of the base set Xi. In addition, the antecedent contains a quantifier, Q, such as most, all, almost all, at least one, at least half, etc. The consequent consists of a proposition of the type U is B.
The rule then reflects the fact that if Q of the antecedent conditions, the Vi is Ai's, are satisfied, then U is B can be added to our knowledge base. The fundamental difference between this type of rule and the types studied in the previous section is that rather than requiring all the antecedent conditions to be satisfied, only Q of them need be satisfied.
Like the other types of conditional rules, these rules also induce a conditional possibility distribution Π_U|V1,V2,...,Vn over the set X1 × X2 × ... × Xn × Y. In particular, for any point (x1, x2, ..., xn, y) where xi ∈ Xi and y ∈ Y,
Π_U|V1,V2,...,Vn(x1, x2, ..., xn, y) = Min[1, 1 − H(x1, x2, ..., xn) + B(y)].
The essential difference lies in the determination of the joint possibility H(x1, ..., xn), the component due to the antecedent. The method for determining this H is based upon ideas developed by Yager [15].
As suggested by Zadeh [14], a linguistic quantifier can be expressed as a fuzzy subset. In particular, there exist three kinds of quantifiers, the first two of which are of interest to us. A kind one quantifier, or absolute quantifier, is exemplified by values such as "about 5" and "at least seven"; a kind two, or relative, quantifier is exemplified by values such as "almost all" and "at least half." As suggested by Zadeh, a kind one quantifier can be expressed as a fuzzy subset of the non-negative reals whereas a kind two quantifier can be expressed as a fuzzy subset of the unit interval. For example, if Q1 is the kind one quantifier "at least 5", then for each x ∈ R+, Q1(x) indicates the degree to which x satisfies the concept "at least 5". Similarly, if Q2 is a kind two quantifier, "most", then for any x ∈ [0,1], Q2(x) indicates the degree to which the proportion x satisfies the concept "most".
Let Q be a quantifier, either kind I or kind II, with base set W; for kind I, W = R+ and for kind II, W = [0,1]. Then Q is said to be monotonically non-decreasing if for any w1, w2 ∈ W such that w2 > w1, Q(w2) ≥ Q(w1). We shall
if V is G then U is B
where
G(x1, ..., xn) = Maxi [Q(i) ∧ D(i)]
where D(i) = the ith largest element in the set D = {A1(x1), A2(x2), ..., An(xn)}.
When Q is the quantifier all, then
Q(i) = 1 if i = n
Q(i) = 0 if i ≠ n.
In this case
G(x1, ..., xn) = 1 ∧ D(n) = nth largest element in D,
hence G(x1, ..., xn) = A1(x1) ∧ A2(x2) ∧ ... ∧ An(xn) = H(x1, ..., xn).
Theorem: When Q is the quantifier at least one, then the rule
if Q [Vi is Ai] then U is B     (I)
is equivalent to the proposition
if V1 is A1 or V2 is A2 or ... or Vn is An then U is B.     (III)
Proof: For rule (III) we have
if V is H then U is B
where V = (V1, V2, V3, ..., Vn) and
H(x1, ..., xn) = A1(x1) ∨ A2(x2) ∨ ... ∨ An(xn).
For rule (I) we have
if V is G then U is B
where
G(x1, ..., xn) = Maxi [Q(i) ∧ D(i)]
where D(i) = the ith largest element in the set D = {A1(x1), A2(x2), ..., An(xn)}.
When Q is the quantifier at least one, then
Q(i) = 1 for all i ≥ 1.
Thus G(x1, ..., xn) = Maxi D(i) = A1(x1) ∨ A2(x2) ∨ ... ∨ An(xn) = H(x1, ..., xn).
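A small sketch of how G and the induced conditional possibility might be computed along these lines, assuming an illustrative relative quantifier and invented antecedent satisfaction values:

# Sketch: G(x1,...,xn) = max over i of min(Q(i), D(i)), where D(i) is the i-th
# largest antecedent satisfaction A_i(x_i), followed by the conditional
# possibility min(1, 1 - G + B(y)).  Quantifier and values are assumptions.

def Q_most(i, n):
    """Hypothetical kind-two quantifier 'most', applied to the proportion i/n."""
    r = i / n
    return max(0.0, min(1.0, (r - 0.3) / 0.5))

def G(satisfactions, Q):
    n = len(satisfactions)
    D = sorted(satisfactions, reverse=True)          # D[i-1] is the i-th largest
    return max(min(Q(i, n), D[i - 1]) for i in range(1, n + 1))

def conditional_possibility(satisfactions, mu_B_y, Q):
    g = G(satisfactions, Q)
    return min(1.0, 1.0 - g + mu_B_y)

sats = [0.9, 0.7, 0.2, 0.6]                          # assumed A_i(x_i) values
print(round(G(sats, Q_most), 2), round(conditional_possibility(sats, 0.5, Q_most), 2))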
CERTAINTY QUALIFICATION
In providing information to the database and rule base of an expert system,
as discussed by Buchanan and Duda [1], a person may not be completely confident as
to the value he is providing for a variable. Thus a user of a system may provide the
information that
V is A with confidence (or certainty) α.
In the above the quantity α, which is a number in the unit interval, expresses the degree to which the informant believes that this information is valid.
We would like to provide a mechanism to include these types of qualified
statements into our system. In the spirit of keeping the very powerful structure
which we have developed the approach will be to assume that a statement
V is A with α confidence
S(F) is defined as
S(F) = ∫ from 0 to αmax of (1 / card F_α) dα,
where F_α = {x | F(x) ≥ α}, card F_α is the number of elements in F_α and αmax is the largest membership grade in F. For the case where F is normal, then
S(F) = ∫ from 0 to 1 of (1 / card F_α) dα.
Yager [18] has shown for the case of normal fuzzy subsets that if F ⊂ G, that is, if F(x) ≤ G(x) for all x ∈ X, then
S(F) ≥ S(G).
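For a finite base set, S(F) can be approximated by discretizing the integral over α, as in the following sketch with an invented fuzzy subset F:

# Sketch: specificity S(F) as the integral over alpha of 1/card(F_alpha),
# approximated on a grid of alpha values for a discrete fuzzy set (invented grades).

def specificity(F, steps=1000):
    alpha_max = max(F.values())
    total, d_alpha = 0.0, alpha_max / steps
    for k in range(steps):
        alpha = (k + 0.5) * d_alpha
        card = sum(1 for m in F.values() if m >= alpha)
        total += d_alpha / card
    return total

F = {"a": 1.0, "b": 0.6, "c": 0.2}
print(round(specificity(F), 3))   # about 0.667 for this F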
The following theorems reinforce our observations about the tradeoff
between specificity and certainty.
Lemma: If A is normal, then the transformation of the proposition V is A with α certainty into the proposition V is B will yield B as a normal set.
Proof: Let x be such that A(x) = 1; then
B(x) = (α ∧ A(x)) + (1−α) = (α ∧ 1) + (1−α) = α + (1−α) = 1.
In the following theorems A is assumed normal.
Theorem: Assume the proposition V is A with α certainty transforms into the proposition V is B; then
S(A) ≥ S(B).
Proof: We shall first show that for each x ∈ X, B(x) ≥ A(x). From the definition,
B(x) = (α ∧ A(x)) + (1−α).
Assume α ≥ A(x); then
B(x) = (α ∧ A(x)) + (1−α) = A(x) + (1−α) ≥ A(x).
Assume α < A(x); then
B(x) = α + (1−α) = 1 ≥ A(x).
Since B(x) ≥ A(x) for each x, it follows that S(A) ≥ S(B).
Thus we see that the act of qualifying a proposition by a certainty has the
effect of reducing the specificity of its unqualified equivalent.
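A compact sketch of the qualification B(x) = (α ∧ A(x)) + (1 − α) and its effect on specificity, assuming an invented fuzzy subset A and two certainty levels:

# Sketch: certainty-qualification B(x) = min(alpha, A(x)) + (1 - alpha) and the
# resulting loss of specificity.  Grades and certainty levels are invented.

A = {"a": 1.0, "b": 0.6, "c": 0.2}

def specificity(F, steps=2000):
    amax = max(F.values())
    d = amax / steps
    return sum(d / sum(1 for m in F.values() if m >= (k + 0.5) * d) for k in range(steps))

def qualify(A, alpha):
    return {x: min(alpha, m) + (1.0 - alpha) for x, m in A.items()}

B1, B2 = qualify(A, 0.9), qualify(A, 0.5)
print(round(specificity(A), 3), round(specificity(B1), 3), round(specificity(B2), 3))
# Expected ordering: S(A) >= S(B1) >= S(B2), i.e., lower certainty gives a less specific result.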
Theorem: Assume V is A with α1 certainty transforms into V is B1 and V is A with α2 certainty transforms into V is B2; if α1 > α2, then
S(B1) ≥ S(B2).
Proof: B1(x) = (α1 ∧ A(x)) + (1−α1) and B2(x) = (α2 ∧ A(x)) + (1−α2). There are three possible situations:
1. A(x) ≤ α2 ≤ α1. In this case
B1(x) = A(x) + (1−α1)
B2(x) = A(x) + (1−α2);
since α1 > α2, then (1−α1) ≤ (1−α2) and hence B2(x) ≥ B1(x).
2. α2 ≤ A(x) ≤ α1. In this case
B1(x) = A(x) + (1−α1)
B2(x) = α2 + (1−α2) = 1 ≥ B1(x).
where
H(x,y) = Min(1, 1 − A(x) + B(y)).

V is A is possible.
This statement characterizes a piece of information that says our knowledge of the value of V is such that it is possible (or consistent) with it to assume that V lies in the set A. Note that it doesn't specifically say V lies in A. Formally this statement gets translated into
V is A+
where A+ is a subset of the power set of the base set X. In particular, for any subset G of X,
A+(G) = Poss[A/G] = Max_x[A(x) ∧ G(x)].
Essentially A+ is made up of the subsets of X which intersect, i.e. are consistent with, A.
Closely related to possibility qualification is certainty qualification. A statement
V is A is certain
translates into
V is AV
where AV is a subset of the power set of the base set of A, X, such that for any subset F of X
AV(F) = Cert(A/F) = 1 − Poss(¬A/F).
We shall now describe the representation of some primary types of commonsense knowledge by the possibilistic reasoning approach.
We shall initially consider the statement
typically V is A.
The interpretation of "typically V is A" afforded by Reiter's default reasoning system [20] is to say "if we have not established V is ¬A then assume V is A." Thus we can translate the above into
if V is A is possible then V is A.
Using our translation rules we get
if V is A+ then V is A.
This translates into
V is ¬(A+) ∪ A.
We shall denote ¬(A+) as A*, hence we get
V is (A* ∪ A).
Furthermore, assume that our knowledge base consists simply of the fact that
V is B.
Combining this with our typical knowledge we get V is D, where
D = (A* ∩ B) ∪ (A ∩ B).
Furthermore, as discussed in [16,18,19], this becomes
D(x) = (B(x) ∧ (1 − Poss[A/B])) ∨ (A(x) ∧ B(x)).
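A minimal sketch of this combination of a typical value with observed knowledge, assuming arbitrary fuzzy subsets A and B:

# Sketch: combining "typically V is A" with the observation "V is B":
# D(x) = (B(x) AND (1 - Poss[A/B])) OR (A(x) AND B(x)), with
# Poss[A/B] = max over x of min(A(x), B(x)).  A and B are arbitrary illustrations.

A = {"x1": 1.0, "x2": 0.7, "x3": 0.0}     # typical values
B = {"x1": 0.2, "x2": 0.9, "x3": 1.0}     # observed values

poss_A_given_B = max(min(A[x], B[x]) for x in A)

D = {x: max(min(B[x], 1.0 - poss_A_given_B), min(A[x], B[x])) for x in A}
print(poss_A_given_B, D)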
Two extremal cases should be noted. If our typical value A is completely inconsistent with our known value, A ∩ B = ∅, then Poss[A/B] = 0
CONCLUSION
REFERENCES
(1) Buchanan, B.G. and Duda, R.O., "Principles of rule-based expert systems,"
Fairchild Technical Report No. 626, Lab. for Artificial Intelligence Research,
Fairchild Camera, Palo Alto, Ca., 1982.
(4) Davis, R., Buchanan, B.G. & Shortliffe, E.H., "Production rules as a representation of a knowledge-based consultation program," Artificial Intelligence 8, 15-45, 1977.
(5) Van Melle, W., "A domain independent system that aids in constructing
knowledge-based consultation program," Ph.D. dissertation, Stanford University
Computer Science Dept., Stanford CS-80-820, 1980.
(6) Zadeh, L.A., "Fuzzy logic and approximate reasoning," Synthese 30,407-428,
1975.
(7) Zadeh, L.A., "The concept of a linguistic variable and its application to
approximate reasoning," Information Science 8 and 9, 199-249, 301-357, 43-80,
1975.
(8) Zadeh, L.A., "A theory of approximate reasoning," in Hayes, J.E., Michie, D.
and Kulich, L.I., (eds) Machine Intelligence 9, 149-194, John Wiley & Sons, New
York,1979.
(9) Zadeh, L.A., "Fuzzy sets as a basis for a theory of possibility," Fuzzy Sets and
Systems I, 3-28, 1978.
(10) Zadeh, L.A., "PRUF-a meaning representation language for natural languages,"
Int. J. of Man-Machine Studies 10, 395-460, 1978.
(11) Yager, R.R., "Querying knowledge base systems with linguistic information via
knowledge trees," Int. J. Man- Machine Studies 19, 1983.
(12) Yager, R. R., "Knowledge trees in complex knowledge bases," Fuzzy Sets and
Systems 15,45-64, 1985.
[16]. Yager, R. R., "Default and approximate reasoning," Proc. 2nd IFSA
Conference, Tokyo, 690-692,1987.
[17]. Yager, R. R., Ovchinnikov, S., Tong, R. and Nguyen, H., Fuzzy Sets and
Applications: Selected Papers by L. A. Zadeh, John Wiley & Sons: New York,
1987.
[20]. Reiter, R., "A logic for default reasoning," Artificial Intelligence 13, 81-132,
1980.
3
Fuzzy rules in knowledge-based systems
- Modelling gradedness,
uncertainty and preference -
The paper starts with ideas of possibility qualification and certainty qualification
for specifying the possible range of a variable whose value is ill-known. The notion
of possibility which is used for that purpose is not the standard one in possibility
theory, although the two notions of possibility can be related. Based on these
considerations four distinct types of rules with different semantics involving
gradedness and uncertainty are then introduced. The combination operations which
appear for taking advantage of the available knowledge are all derived from the
intended semantics of the rules. The processing of these four types of rules is studied
in detail. Fuzzy rules modelling preference in decision processes are also discussed.
1. INTRODUCTION
The applications of fuzzy set and possibility theories to rule-based expert
systems have been mainly developed along two lines in the eighties : i) the
generalization of the certainty factor approach introduced in MYCIN (Buchanan and
Shortliffe, 1984) by enlarging the possible operations to be used for combining the
uncertainty coefficients; ii) the handling of vague predicates in the expression of the
expert rules or of the available information. The first line of research is exemplified
by the inference system RUM (Bonissone et al., 1987) where a control layer chooses the triangular norm operation governing the propagation of uncertainty, or by the inference system MILORD (Godo et al., 1988) where the combination and
propagation operations associated with each rule reflect the expert knowledge. The
second trend has motivated a huge amount of literature especially for discussing the
multiple-valued logic implication connective → to be used in the modelling of a rule of the form "if X is A then Y is B" by means of a fuzzy relation R (defined by μ_R(x,y) = μ_A(x) → μ_B(y)). The choice of the implication function has been
investigated from an algebraic point of view by classifying the implications
according to axiomatic properties, and from a deduction-oriented perspective by
requiring some prescribed kind of results for the generalized modus ponens applied to
fuzzy "if... then ..... rules (e.g. Mizumoto and Zimmermann (1982), Dubois and
Prade (1984), Trillas and Valverde (1985), Bouchon (1987), Smets and Magrez
(1987». Although the available results indeed enable us to jointly choose an
implication function and the conjunction to be used for combining the two premisses
"X is A' .. and "if X is A then Y is B" in order to obtain an expected behavior for the
generalized modus ponens, these approaches do not really consider the intended
semantics of the rules. See (Dubois and Prade, 199Od) and (Dubois, Lang and Prade,
1990) for an extensive overview and a discussion of the generalized modus ponens
and of the certainty factor approaches respectively.
In this paper, extending recently obtained results (Dubois and Prade, 1989a,
1990b, d), we show how the choice of the implication operation is induced by the
type of rule we have to model in the framework of possibility theory. The approach
which is proposed formalizes ideas which have been more empirically studied by
Bouchon (1988), Despres (1989) about the role of different kinds of modifiers in the
expression and the intended meaning of fuzzy rules and can be also somewhat related
to recent works about possibility and necessity qualifications (Magrez and Smets,
1989; Dubois and Prade, 1990a; Fonck, 1990; Yager, 1990).
We first discuss two distinct ways of specifying a possibility distribution, either
by possibility or by certainty qualification. This can be regarded as a new approach in
possibility theory. The consequences of the mode of qualification on the
manipulation of the pieces of knowledge which are thus specified, are emphasized.
The notion of possibility which is used in possibility qualification does not correspond to the standard notion of possibility measure in possibility theory; the links between
the two concepts are clarified in Section 3. Using the ideas of Section 2, Section 4
introduces four different types of rules which are closely related to particular types of
fuzzy truth-values (or, if we prefer, of modifiers). Section 5 discusses the behavior of
these rules in the generalized modus ponens and when used in parallel. Section 6 is
devoted to another kind of fuzzy rules expressing preference.
where the overbar on a subset denotes the complementation and we use the identity (¬A)_α = complement of A_(1−α)+, with B_β+ denoting the strong β-cut of a fuzzy set B, namely {u ∈ U, μ_B(u) > β} (i.e. '≥' is changed into '>' in the definition of the level cut). Clearly (9) applies to any A and thus (9) still holds changing A into ¬A, which gives
∀ u ∈ U, μ_A(u) = inf over α ∈ (0,1] of max(μ_A_(1−α)+(u), 1 − α)     (10)
an inf-combination)
∀ u ∈ U, π_x(u) ≤ inf over α ∈ (0,1] of max(μ_A_(1−α)+(u), 1 − α)
i.e.
     (13)
which extends (6) to the case where A is fuzzy. Interestingly enough, (13) was already discussed by Zadeh (1978b) and Sanchez (1978) for possibility-qualification purposes.
Similarly, "A is certain" has been interpreted as "A_(1−β)+ is at least β-certain", ∀ β ∈ (0,1]. Then "A is α-certain" will be interpreted as "A_(1−β)+ is at least min(α,β)-
certain". Then, using the min-combination (5), we get (Dubois and Prade, 1990a, d):
∀ u ∈ U, π_x(u) ≤ inf over β ∈ (0,1] of max(μ_A_(1−β)+(u), 1 − min(α,β))
i.e.
∀ u ∈ U, π_x(u) ≤ max(inf over β ∈ (0,1] of max(μ_A_(1−β)+(u), 1 − β), 1 − α) = max(μ_A(u), 1 − α)     (14)

which is completely possible for x (i.e. Π(A) = 1), and saying that "A is possible" is short for "the range A is (completely) possible for x" whose intended meaning is really that all the values in A are possible for the variable x. This latter notion of possibility is particularly important, as advocated in this paper, for the specification of possibility distributions in general and more particularly of fuzzy rules.
The notion of E-possibility seems to have been largely ignored in the fuzzy set literature. However, its counterpart in Shafer (1976)'s evidence theory is well-known; it is the commonality function Q, which, by the way, is mainly used for technical reasons and does not seem to have received any practical interpretation until now. Indeed, starting with a basic probability assignment m such that Σ_A m(A) = 1, the commonality of A is defined by Q(A) = Σ over B ⊇ A of m(B); it can be easily checked that the following analogue of (19) holds
Q(A) = 1 ⟺ ∀ u ∈ A, Pl({u}) = 1
where the plausibility function Pl is defined by Pl(C) = Σ over B with C ∩ B ≠ ∅ of m(B). This results from Q(A) = 1 ⟺ ∀ B such that m(B) > 0, B ⊇ A ⟺ ∀ u ∈ A, ∀ B such that m(B) > 0, u ∈ B ⟺ ∀ u ∈ A, Pl({u}) = 1. Moreover Δ(A) can be put under a form which looks analogous to Q(A). Indeed, introducing the fuzzy set F such that μ_F = π_x, we have Δ(A) = inf over u ∈ A of μ_F(u); hence ∀ u ∈ A, μ_F(u) ≥ Δ(A), or equivalently A ⊆ F_Δ(A), and more generally ∀ a ≤ Δ(A), A ⊆ F_a. Besides, ∀ ε > 0, A ⊄ F_Δ(A)+ε (from the definition of Δ(A)). Hence
Δ(A) = sup{a ∈ (0,1], F_a ⊇ A}   (with the convention sup ∅ = 0)
A possibility distribution π_x such that {π_x(u) ∈ (0,1], u ∈ U} is an ordered finite set M = {a1, ..., an} with a1 = 1 > ... > an > an+1 = 0, is equivalent to the basic probability assignment (Dubois and Prade, 1982) defined by
∀ i, m(F_ai) = ai − ai+1
∀ B ≠ F_ai, m(B) = 0.
Then, using the nestedness property a < β ⇒ F_a ⊇ F_β, it can be easily seen that Δ(A) = Q(A), where Q is defined from the function m above, i.e. the two definitions coincide. More generally, for A remaining non-fuzzy, it is easy to see that
A is (at least) a-possible ⟺ ∀ u ∈ A, π_x(u) ≥ a ⟺ Δ(A) ≥ a     (20)
which generalizes (19). The definition of the E-possibility can be extended to fuzzy sets still preserving the equivalence
A is (at least) a-possible ⟺ ∀ u ∈ U, π_x(u) ≥ min(μ_A(u), a) ⟺ Δ(A) ≥ a     (21)
This is satisfied by taking
Δ(A) = inf over u ∈ U of μ_A(u) → π_x(u)     (22)
where a → b is a multiple-valued logic implication connective, defined by a → b = 1 if a ≤ b and a → b = b if a > b, known as Godel's implication. It is easy to see that (22) reduces to (18) when A is an
ordinary subset. Moreover, (21) is ensured by the equivalence a → b ≥ c ⟺ b ≥ min(a,c). By contrast, a lower bound a on the extension of the possibility measure Π (defined by (16)) to a fuzzy event A, i.e.
Π(A) = sup over u ∈ U of min(μ_A(u), π_x(u)) ≥ a     (23)
is equivalent to ∀ β < a, Π(A_β) ≥ a, i.e. ∀ β < a, ∃ u ∈ A_β, π_x(u) ≥ a (see, e.g., Dubois and Prade, 1990a), which clearly departs from Δ(A) ≥ a; the latter means that ∀ β ≥ a, ∀ u ∈ A_β, π_x(u) ≥ a and ∀ β < a, ∀ u ∈ A_β, π_x(u) ≥ β.
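A short sketch contrasting the two evaluations Δ(A) and Π(A) on a non-fuzzy subset A, assuming an invented possibility distribution:

# Sketch: Delta(A) = inf over u in A of pi_x(u)  (every value of A is possible to that degree)
# versus Pi(A) = sup over u in A of pi_x(u)      (some value of A is possible to that degree).
# The possibility distribution pi_x is an arbitrary illustration.

pi_x = {"u1": 1.0, "u2": 0.6, "u3": 0.1, "u4": 0.0}
A = {"u1", "u2"}

delta_A = min(pi_x[u] for u in A)
pi_A = max(pi_x[u] for u in A)
print(delta_A, pi_A)    # here Delta(A) = 0.6 while Pi(A) = 1.0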
We now identify to what evaluation of A is associated the certainty qualification presented above. "A is α-certain" is represented, in the general case where A is fuzzy, by (14), i.e.
∀ u ∈ U, π_x(u) ≤ max(μ_A(u), 1 − α).
Since c ≤ max(a, 1 − b) ⟺ (1 − a) → (1 − c) ≥ b, where → denotes Godel's implication, (14) is equivalent to (Dubois and Prade, 1989a)
CN(A) = inf over u ∈ U of (1 − μ_A(u)) → (1 − π_x(u)) ≥ α     (24)
i.e.
"A is α-certain" ⟺ CN(A) ≥ α.
When A is an ordinary subset, (24) reduces to
CN(A) = 1 − Π(¬A) = inf over u ∉ A of (1 − π_x(u))     (25)
where ¬ denotes the complementation. It means that certainty qualification, when A is non-fuzzy, is in complete agreement with the necessity measure based on π_x and defined by duality with respect to the possibility measure. However, when A is fuzzy, the duality relation CN(A) = 1 − Π(¬A), where Π is extended to fuzzy events by (23), is no longer satisfied. When A is fuzzy, since (14) is equivalent to "A_(1−β)+ is at least min(α,β)-certain", we get
CN(A) ≥ α ⟺ ∀ β < 1, CN(A_β+) ≥ min(α, 1 − β)     (26)
which expresses the relation between certainty-qualification and the measure of necessity of the β-cuts of A. See (Dubois and Prade, 1990a) for a discussion about certainty-qualification in possibilistic logic with fuzzy predicates (which is in full agreement with CN defined by (24)) and (Dubois and Prade, 1989a, 1990d) for the distinct uses of CN(A) and 1 − Π(¬A), the former in certainty-qualification, the latter in fuzzy pattern matching. More particularly, as already said in (Dubois and Prade, 1989a), the statement "A is certain" may either mean that we are certain that the possible values of x are inside A, i.e. π_x ≤ μ_A (which is captured by CN(A) = 1), or that we are certain that the value of x is among the elements of U which completely belong to A, i.e. support(π_x) = {u ∈ U, π_x(u) > 0} ⊆ core(A) = {u ∈ U, μ_A(u) = 1} (which is captured by 1 − Π(¬A) = 1). This latter interpretation, which is more demanding, is clearly related to the fuzzy filtering of fuzzily-known objects; see (Dubois, Prade and Testemale, 1988).
4. REPRESENTATION OF DIFFERENT KINDS OF FUZZY RULES
We now apply the results of Section 2 on possibility and certainty qualifications
to the specification of fuzzy rules relating a variable x ranging on U to a variable y
ranging on V.
Possibility rules: A first kind of fuzzy rule corresponds to statements of the form "the more x is A, the more possible B is a range for y". If we interpret this rule as "∀ u, if x = u, B is a range for y which is at least μ_A(u)-possible", a straightforward application of (13) yields the following constraint on the conditional possibility distribution π_y|x(·,u) representing the rule when x = u:
∀ u ∈ U, ∀ v ∈ V, min(μ_A(u), μ_B(v)) ≤ π_y|x(v,u).     (27)
Certainty rules: A second kind of fuzzy rule corresponds to statements of the form "the more x is A, the more certain y lies in B". Interpreting the rule as "∀ u, if x = u, y lies in B is at least μ_A(u)-certain", by application of (14) we get the following constraint for the conditional possibility distribution modelling the rule:
∀ u ∈ U, ∀ v ∈ V, π_y|x(v,u) ≤ max(μ_B(v), 1 − μ_A(u)).     (28)
In the particular case where A is an ordinary subset and where we know that, if x is in A, B is both a possible and a certain range for y, (27) and (28) yield
∀ u ∈ A, π_y|x(v,u) = μ_B(v)     (29)
∀ u ∉ A, π_y|x(v,u) is completely unspecified.
This corresponds to the usual modelling of a fuzzy rule with a non-fuzzy condition part. Note that B may be any kind of fuzzy set in (27), (28) and then in (29). Thus B may itself include some uncertainty; for instance, the membership function of B may be of the form μ_B = max(μ_B*, 1 − β) in order to express that when x is in A, B* is the (fuzzy) range of y with a certainty β (any value outside the support of B* remains a possible value for y with a degree equal to 1 − β); we may even have an unnormalized possibility distribution which can be put under the form μ_B = min(μ_B*, a) if the possibility that y takes its value in V is bounded from above by a (i.e. there is a possibility 1 − a that y has no value in V when x takes its value in A).
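For illustration, assuming invented membership functions, the lower bound supplied by a possibility rule (27) and the upper bound supplied by a certainty rule (28) can be tabulated side by side:

# Sketch: lower bound from a possibility rule (27) and upper bound from a
# certainty rule (28) on the conditional possibility pi_{y|x}(v, u).
# mu_A and mu_B are invented membership functions on small discrete universes.

mu_A = {"u1": 1.0, "u2": 0.5, "u3": 0.0}
mu_B = {"v1": 1.0, "v2": 0.3}

for u, a in mu_A.items():
    for v, b in mu_B.items():
        lower = min(a, b)             # possibility rule: "the more x is A, the more possible B"
        upper = max(b, 1.0 - a)       # certainty rule:  "the more x is A, the more certain B"
        print(u, v, "lower:", lower, "upper:", upper)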
Gradual rules: This third kind of fuzzy rule has been discussed in (Dubois and Prade, 1989a, 1990b, d). Gradual rules correspond to statements of the form "the more x is A, the more y is B". Statements involving "the less" in place of "the more" are easily obtained by changing A or B into their complements ¬A and ¬B, due to the equivalence between "the more x is ¬A" and "the less x is A" (with μ_¬A = 1 − μ_A). More precisely, the intended meaning of a gradual rule can be understood in the following way: "the greater the degree of membership of the value of x to the fuzzy set A and the more the value of y is considered to be in relation (in the sense of the rule) with the value of x, the greater the degree of membership to B should be for this value of y", i.e.
where μ_(μA(u),1] is the characteristic function of the interval (μ_A(u),1] and where T is the fuzzy set of [0,1] defined by ∀ t ∈ [0,1], μ_T(t) = t, which models the fuzzy truth-value 'true' in fuzzy logic (Zadeh, 1978b). If we remember that "x is A is τ-true", where τ is a fuzzy truth-value modelled by the fuzzy set τ of [0,1], is represented by the possibility distribution (Zadeh, 1978b)
∀ u ∈ U, π_x(u) = μ_τ(μ_A(u))     (33)
(note that τ = T yields the basic assignment (12)), we can interpret the meaning of gradual rules in the following way using (32): ∀ u ∈ U, if x = u then y is B is at least μ_A(u)-true. The membership function of the fuzzy truth-value 'at least α-true' is pictured in Figure 1.a. As can be seen, it is not a crisp "at least α-true" (which would correspond to the ordinary subset [α,1]), but a fuzzy one in agreement with truth-qualification in the sense of (33); indeed "at least 1-true" corresponds to the fuzzy truth-value T.
If we are only looking for the crisp possibility distributions π_y|x (i.e. the {0,1}-valued ones) which satisfy (30), because we assume that it is a crisp relation between y and x which underlies the rule "the more x is A, the more y is B", then we obtain the constraint
∀ u ∈ U, π_y|x(v,u) ≤ { 1 if μ_A(u) ≤ μ_B(v) ; 0 if μ_A(u) > μ_B(v) } = μ_{[μ_A(u),1]}(μ_B(v))     (34)
which expresses that the rule is now viewed as meaning: ∀ u ∈ U, if x = u then 'y is B' is at least μ_A(u)-true, where the truth-qualification is understood in a crisp sense. The reader is referred to (Dubois and Prade, 1990b) for a discussion of this kind of rule.
A fourth type of fuzzy rules: The inequality (30) looks like (27) when exchanging μ_B(v) and π_y|x(v,u), while (31), which is equivalent to (30), is analogous to (28) in the sense that in both cases π_y|x is bounded from above by a multiple-valued logic implication function (in (28) it is Dienes' implication a → b = max(1 − a, b) which appears). This leads to considering the inequality constraint obtained from (28) by exchanging μ_B(v) and π_y|x(v,u), i.e.
∀ u ∈ U, ∀ v ∈ V, max(π_y|x(v,u), 1 − μ_A(u)) ≥ μ_B(v)     (35)
This corresponds to a fourth kind of fuzzy rules, of which we now investigate the
intended meaning. (35) is perhaps more easily understood by taking the complement to 1 of each side of the inequality, i.e.
∀ u ∈ U, ∀ v ∈ V, 1 − μ_B(v) ≥ min(μ_A(u), 1 − π_y|x(v,u))
which can be interpreted as "the more x is A and the less y is related to x, the less y is B", which corresponds to a new type of gradual rule. Using the equivalence min(a, 1 − t) ≤ 1 − b ⟺ 1 − t ≤ a → (1 − b) ⟺ t ≥ 1 − (a → (1 − b)), where → is Gödel's implication, we can still write (35) under the form
∀ u ∈ U, ∀ v ∈ V, π_y|x(v,u) ≥ { 0 if μ_A(u) + μ_B(v) ≤ 1 ; μ_B(v) if μ_A(u) + μ_B(v) > 1 }
= min(μ_{(1−μ_A(u),1]}(μ_B(v)), μ_B(v))
= μ_{(1−μ_A(u),1] ∩ T}(μ_B(v))     (36)
Unsurprisingly, the lower bound of π_y|x(v,u) which is obtained is a multiple-valued logic conjunction function of μ_A(u) and μ_B(v) (indeed f(a,b) = 0 if a + b ≤ 1 and f(a,b) = b otherwise is such that f(0,0) = f(0,1) = f(1,0) = 0 and f(1,1) = 1). From (36) we see that this type of gradual rules can be interpreted in the following way, using (33): ∀ u ∈ U, if x = u then 'y is B' is at least (1 − μ_A(u))-true. The membership function of the corresponding fuzzy truth-value "at least (1 − α)-true" is pictured in Figure 1.d.
Figure 1: Four basic types of fuzzy truth-values: a. "at least α-true" (core point of view); b. "at least α-certainly true"; c. "at least α-possibly true"; d. "at least (1 − α)-true" (support point of view).
As can be observed by comparing Figures 1.a and 1.d, they correspond to two points of view in (fuzzy) truth-qualification of level α: one insisting on the complete possibility of degrees of truth greater than α (core point of view), the other insisting on the complete impossibility of degrees of truth less than or equal to 1 − α (support point of view).
Figures 1.b and 1.c picture the fuzzy truth-values "at least α-certainly true" and "at least α-possibly true", whose respective membership functions are max(μ_T, 1 − α) and min(μ_T, α). It can be seen on Figure 1, and formally checked, that the four fuzzy truth-values we have introduced satisfy the two duality relations
'at least α-certainly true' = comp∘ant('at least α-possibly true')     (37)
'at least α-true (core p. of v.)' = comp∘ant('at least (1 − α)-true (support p. of v.)')     (38)
where comp and ant are two transformations reflecting the ideas of complementation and antonymy respectively, and defined by comp(f(t)) = 1 − f(t) and ant(f(t)) = f(1 − t), ∀ t ∈ [0,1] and f ranging in [0,1]. Note that comp∘ant = ant∘comp. Note that when there are only two degrees of truth, 0 (false) and 1 (true), "at least α-certainly true" corresponds to the possibility distribution π(true) = 1, π(false) = 1 − α and "at least α-possibly true" to π(false) = 0 and π(true) = α, while the two other (fuzzy) truth-values make no sense. Dually, when there are only two degrees of possibility, 0 (complete impossibility) and 1 (complete possibility), the representations of "at least α-true" and "at least (1 − α)-true" respectively coincide with the ordinary subsets [α,1] and (1 − α, 1].
The four fuzzy truth-values pictured in Figure 1 (with α = μ_A(u)) can be viewed as representing modifiers φ (in the sense of Zadeh (1972)) which modify the fuzzy set B into B* such that μ_B* = φ(μ_B), in order to specify the subset of interest for y in the various rules when x = u. To summarize, in the case of
- possibility rules, the possibility distribution π_y|x(·,u) is bounded from below by φ(μ_B) with φ(t) = min(μ_A(u), t), i.e. B is truncated up to the height α = μ_A(u);
- certainty rules, the possibility distribution π_y|x(·,u) is bounded from above by φ(μ_B) with φ(t) = max(t, 1 − μ_A(u)), i.e. B is drowned in a level of indetermination 1 − α;
- gradual rules (core point of view), the possibility distribution π_y|x(·,u) is bounded from above by φ(μ_B) with φ(t) = μ_A(u) → t (where → denotes Gödel's implication), i.e. the core of B is enlarged;
- gradual rules (support point of view), the possibility distribution π_y|x(·,u) is bounded from below by φ(μ_B) with φ(t) = 0 if μ_A(u) + t ≤ 1 and φ(t) = t otherwise, i.e. the support of B is diminished, truncated.
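For a concrete illustration of the four modifiers φ listed above, here is a small Python sketch; the function names, the grade α = μ_A(u) and the sample vector of μ_B values are assumptions made only for the example.

```python
# A small sketch of the four modifiers phi summarized above, assuming alpha = mu_A(u)
# and a vector of membership grades mu_B over V (illustrative values only).
import numpy as np

def possibility_rule(mu_B, alpha):
    """B truncated at height alpha (lower bound on pi_{y|x}(., u))."""
    return np.minimum(mu_B, alpha)

def certainty_rule(mu_B, alpha):
    """B drowned in a level of indetermination 1 - alpha (upper bound)."""
    return np.maximum(mu_B, 1.0 - alpha)

def gradual_rule_core(mu_B, alpha):
    """Goedel implication alpha -> t: the core of B is enlarged (upper bound)."""
    return np.where(mu_B >= alpha, 1.0, mu_B)

def gradual_rule_support(mu_B, alpha):
    """Support of B truncated: t if alpha + t > 1, else 0 (lower bound)."""
    return np.where(alpha + mu_B > 1.0, mu_B, 0.0)

mu_B = np.array([0.0, 0.2, 0.5, 0.8, 1.0])
for phi in (possibility_rule, certainty_rule, gradual_rule_core, gradual_rule_support):
    print(phi.__name__, phi(mu_B, alpha=0.6))
```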
Remark: The similarity between (27) and (30) suggests that "the more x is A, the more y is B", where y is in relation R with x, can be understood as meaning that ∀ u ∈ U, B represents the statement "R(u) is a range for y which is possible at least at the degree μ_A(u)", where R(u) is the fuzzy set of elements in V in relation with u.
should be a possible range for y. In Figure 2.b, we have μ_A2(u_0) = 1 again, but now μ_A1(u_0) = α < 1. The difference between certainty rules and gradual rules focusing on cores appears clearly: for certainty rules, the intersection of B_2 with B_1 is pervaded with a level of uncertainty 1 − α (i.e. min(μ_B2, max(μ_B1, 1 − α))), while the upper bound of B' for the gradual rules stays between B_1 and B_2, overlapping a little more B_2. Similarly, the difference between possibility rules and gradual rules focusing on supports also appears; for the former we obtain μ_B' ≥ max(μ_B2, min(μ_B1, α)), which expresses that the values in B_1 are regarded as a priori
less possible than the ones in B_2; for the latter, some values in the support of B_1 are considered as potentially impossible.
5.2. Generalized Modus Ponens with One Rule
In Section 4, when studying the representation of the four types of rules considered in the paper, we have described the response B' of a rule to a precise input A' = {u_0}. We now consider the generalized modus ponens pattern with a fuzzy input:
x is A'
<rule relating "x is A" with "y is B">
y is B'
As usual, and in agreement with (12), "x is A'" will be understood as
∀ u ∈ U, π_x(u) = μ_A'(u)
while the rule, depending on the case, is represented
by ∀ u ∈ U, ∀ v ∈ V, π_y|x(v,u) ≤ μ_A(u) → μ_B(v)     (case I)
or by ∀ u ∈ U, ∀ v ∈ V, π_y|x(v,u) ≥ μ_A(u) & μ_B(v)     (case II)
Applying the combination/projection principle (Zadeh, 1979; see Dubois and Prade, 1990d for a discussion), i.e. here
∀ v ∈ V, π_y(v) = sup_{u∈U} min(π_x(u), π_y|x(v,u))     (41)
we thus get, with ∀ v, μ_B'(v) = π_y(v),
∀ v ∈ V, μ_B'(v) ≤ sup_{u∈U} min(μ_A'(u), μ_A(u) → μ_B(v))     (case I)
∀ v ∈ V, μ_B'(v) ≥ sup_{u∈U} min(μ_A'(u), μ_A(u) & μ_B(v))     (case II)
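The sup-min combination/projection step (41) can be sketched numerically as follows, here for case I with Dienes' implication (i.e. a certainty rule). The discretization and membership functions are illustrative choices, and the final assertion only checks the upper bound (42) numerically.

```python
# A sketch of (41): B'(v) = sup_u min(A'(u), pi_{y|x}(v, u)), with the rule matrix
# built from Dienes' implication (case I, certainty rules). Universes and membership
# functions are illustrative assumptions, not prescriptions of the paper.
import numpy as np

U = np.linspace(0.0, 10.0, 101)
V = np.linspace(0.0, 10.0, 101)

def tri(z, a, b, c):
    return np.maximum(np.minimum((z - a) / (b - a), (c - z) / (c - b)), 0.0)

mu_A  = tri(U, 2.0, 4.0, 6.0)        # condition of the rule
mu_B  = tri(V, 3.0, 5.0, 7.0)        # conclusion of the rule
mu_Ap = tri(U, 3.0, 4.5, 6.0)        # fuzzy input "x is A'"

# Case I with Dienes' implication: pi_{y|x}(v, u) = max(1 - mu_A(u), mu_B(v))
pi_y_given_x = np.maximum.outer(1.0 - mu_A, mu_B)      # shape (|U|, |V|)

# (41): sup-min composition of the input with the conditional possibility distribution
mu_Bp = np.max(np.minimum(mu_Ap[:, None], pi_y_given_x), axis=0)

# (42): the result is bounded from above by max(mu_B(v), 1 - N(A; A'))
N_A_Ap = np.min(np.maximum(mu_A, 1.0 - mu_Ap))
assert np.all(mu_Bp <= np.maximum(mu_B, 1.0 - N_A_Ap) + 1e-12)
```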
Let us first consider the two kinds of rules belonging to case I.
For certainty rules, we obtain
∀ v ∈ V, μ_B'(v) ≤ sup_{u∈U} min(μ_A'(u), max(1 − μ_A(u), μ_B(v)))
= max(μ_B(v), 1 − N(A ; A'))     (42)
provided that A' is normalized, and where
N(A ; A') = inf_{u∈U} max(μ_A(u), 1 − μ_A'(u))
is the dual of the possibility measure Π(A ; A') of the fuzzy event A defined by (23) (with π_x = μ_A'); N(A ; A') is thus equal to 1 − Π(Ā ; A') and plays a basic role in fuzzy pattern matching, as briefly recalled at the end of Section 3. The inequality (42) expresses the following: our lack of certainty that all the values restricted by μ_A' are (highly) compatible with the requirement modelled by μ_A induces a possibility at most equal to 1 − N(A ; A') that the value of y is outside the support of B. In other words, (42) means that it is N(A ; A')-certain that y is restricted by B.
For gradual rules focusing on cores, we have
∀ v ∈ V, μ_B'(v) ≤ sup_{u∈U} min(μ_A'(u), μ_{[μ_A(u),1] ∪ T}(μ_B(v)))
from which it can be concluded that the least upper bounds derivable from the above inequality are given by (Dubois and Prade, 1988):
• μ_B'(v) ≤ 1, ∀ v ∈ {v ∈ V, μ_B(v) ≥ inf_{u∈U} {μ_A(u) | μ_A'(u) = 1}}
• μ_B'(v) ≤ sup_{u∈U} {μ_A'(u) | μ_A(u) = 0} = Π(support(A) ; A'), ∀ v ∈ {v ∈ V, μ_B(v) = 0}
This shows that when A' is no longer a singleton, the enlarging effect of the core of B may be increased and a non-zero possibility Π(support(A) ; A') may be obtained for values outside the support of B, for the possibility distribution restricting y. The level of possibility Π(support(A) ; A') acknowledges the fact that some possible
the Bi's respectively, see Dubois, Prade and Testemale (1988) for instance, for more
details. The case of implication-based gradual rules raises similar problems. The
processing of a collection of parallel gradual rules (focusing on cores) has been
investigated in (Dubois, Martin-Clouaire and Prade, 1988) and in Martin-Clouaire
(1988) to which the reader is referred. It is possible from the collection of rules to
build a new rule which, when applied to A', yields the optimal result, i.e. the value
of the upper bound expressed by (44) ; this new rule summarizes the knowledge
useful in the collection of rules for dealing with the fact "x is A' ".
Generally speaking, we have to define the consistency and the non-redundancy of the set of fuzzy rules, and this leads to putting some constraints on the coverage of U by the A_i's (see the first of the two above-mentioned references for definitions of these
notions). Clearly further research is needed for a complete investigation of the
practical processing of a collection of rules of a given type, also including the
problem of compound condition parts in the rules which have not been considered
here. Figure 2.c exhibits different behaviors of the four types of rules in the case n = 2 where A' = A_1 ∩ A_2. We notice that we obtain B' ⊆ B_1 ∩ B_2 with gradual rules focusing on cores, which confirms the "interpolation" flavor of this behavior: if A' is between A_1 and A_2 (in the sense of the intersection), then the possible values of y are restricted by a fuzzy set in between B_1 and B_2. A level of uncertainty equal to 0.5 appears for certainty rules; this is due to the fact that, with continuous membership functions, N(A ; A') = 0.5 as soon as A' is a fuzzy set (when A' = A_1 ∩ A_2, we are not completely sure that x belongs to the core of A_1 and to the core of A_2). For the two types of rules corresponding to case II, we obtain B_1 ∪ B_2 ⊆ B' as expected (since here Π(A_i ; A') = 1 and sup{μ_Ai(u) | μ_A'(u) > 0} = 1, for i = 1, 2).
As a final remark in this section, note that we may think of using two types of rules simultaneously, especially for certainty and possibility rules, since it can be checked that the two corresponding inequalities constraining π_y are consistent; namely, using both (42) and (43) we get
∀ v ∈ V, min(μ_B(v), Π(A ; A')) ≤ π_y(v) ≤ max(μ_B(v), 1 − N(A ; A'))     (48)
It corresponds to the case of a piece of knowledge saying both that "the more x is A, the more possible B as a range for y and the more certain y is in B".
Figure 2.c: Two rules in parallel and a fuzzy input (panels: possibility rules; gradual rules focusing on supports)
or even an assignment statement (like "choose for y the value v_0"). When there are many possible states of the world, it is difficult to partition them into rigid classes where specific decisions can be totally recommended. As a result, the description of the states of the world where a decision is relevant is often fuzzy, because decisions can be more or less recommended. Hence the "if" part contains fuzzy predicates, and the preference rule means
"the more the state of the world corresponds to <situation>,
the more recommended is <decision>".
Let x be a vector that contains the precise description of the world, S be a fuzzy set of values of x corresponding to the description of a range of situations, and u(d) the preference degree for decision d. By definition, u(d) = 0 means that d should be rejected, and u(d) = 1 means that d can be applied without any doubt. The fuzzy preference rule just indicates that u(d) can be quantified by μ_S(x).
In fuzzy control (e.g. Mamdani (1977), Sugeno (1985)), fuzzy rules can be viewed as preference rules of the form
if x is A_i then (π_y = μ_Bi)
where B_i is viewed as a fuzzy set of recommended (possible) actions, and an action is the selection of a value for y, the control parameter. Hence, it is a more general kind of preference rule than the one where only one decision is involved in the conclusion part. Instead of proposing a single decision in situation A_i, a weighted ordered set is proposed, as described by π_y. The preference u_i(d) of the assignment y = d, in the presence of x = x_0, can be evaluated as a function of μ_Bi(d) and μ_Ai(x_0), for a single rule i. Among the natural conditions to be fulfilled is that u_i(d) ≤ min(μ_Bi(d), μ_Ai(x_0)), which claims that μ_Ai(x_0) stands as an upper bound on the degree of preference for y = d induced by rule i. When the equality is taken for granted, we get the fuzzy control approach.
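A minimal Python sketch of this reading of fuzzy control rules as preference rules is given below; the trapezoidal membership functions, the two hypothetical rules and the max-aggregation over rules are assumptions made for illustration.

```python
# A sketch of reading fuzzy control rules as preference rules: for a measured x0,
# each rule i recommends actions d with preference u_i(d) = min(mu_Bi(d), mu_Ai(x0)),
# and the recommendations are aggregated here by max (an illustrative choice).
def trapezoid(z, a, b, c, d):
    """Trapezoidal membership function (a <= b <= c <= d), used only for illustration."""
    if z <= a or z >= d:
        return 0.0
    if b <= z <= c:
        return 1.0
    return (z - a) / (b - a) if z < b else (d - z) / (d - c)

# Two hypothetical rules "if x is A_i then (pi_y = mu_Bi)"
rules = [
    {"mu_A": lambda x: trapezoid(x, 0.0, 1.0, 3.0, 5.0),
     "mu_B": lambda d: trapezoid(d, 0.0, 2.0, 4.0, 6.0)},
    {"mu_A": lambda x: trapezoid(x, 3.0, 5.0, 7.0, 9.0),
     "mu_B": lambda d: trapezoid(d, 4.0, 6.0, 8.0, 10.0)},
]

def preference(d, x0):
    """Aggregated preference for the action y = d given the measurement x = x0."""
    return max(min(rule["mu_B"](d), rule["mu_A"](x0)) for rule in rules)

x0 = 4.0
candidates = [i * 0.5 for i in range(21)]            # candidate actions on [0, 10]
best = max(candidates, key=lambda d: preference(d, x0))
print("preferred action:", best, "with degree", preference(best, x0))
```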
Usually a preference rule does not stand alone. The set of states of the world is
partitioned into a family of situations, where in each situation, a decision is
recommended. It corresponds to a decision table of the form
if <situation 1> then <decision 1>
else if <situation 2> then <decision 2>
else...
else if <situation n> then <decision n>
else <decision n + 1>
where <decision n+1> may suggest to refrain from deciding in the case of an (n+1)th situation that is defined by complementarity. If <decision i> corresponds to a single decision, then, when <situation i> is fuzzy, the output is a fuzzy set of recommended decisions {(d_i, μ_Si(x)), i = 1, …, n+1}. This is what happens in the OPAL system for instance (Bensana et al., 1988), where a decision table corresponds to a priority rule
Again this is what happens in the OPAL system (Bensana et al., 1988) for instance.
The problem of cooperation between decision tables has been modelled in terms of
social choice (see Bel et al., 1989) and a software architecture for implementing fuzzy
decision tables and cooperation strategies has been devised (Dubois, Koning and Bel,
1989). It is based on a social choice interpretation of fuzzy set aggregation
connectives that is described elsewhere (Dubois and Koning, 1989).
The problem of preference rules and their implementation in rule-based systems is certainly one of the important topics in Artificial Intelligence for the forthcoming years, as witnessed by some current activity in this area, from the standpoint of utility theory (Keeney, 1988; Klein and Shortliffe, 1990) or cognitive psychology (Pinson, 1987).
7 - CONCLUDING REMARKS
The semantic content of rules in fuzzy expert systems has received little attention until now, in spite of the enormous quantity of existing literature about approximate reasoning and fuzzy controllers. The paper has tried to formally derive different kinds of fuzzy rules based on very simple semantic considerations. Four types of rules have emerged, corresponding to very standard alterations of a possibility distribution: enlarging its core, shrinking its support, truncating its height, or drowning it in a uniform level of uncertainty. The paper has also pointed out that fuzzy decision rules do not behave like fuzzy rules describing relationships.
Besides, rule-based expert systems have always been associated with an efficient local computation strategy, where a partial conclusion obtained from a (compound) fact and a rule has to be combined with other conclusions pertaining to the same matter and derived from other facts and rules. This kind of strategy can be especially dangerous in the presence of vague and uncertain pieces of knowledge, since it may yield conclusions which are not as accurate as can be expected from the available knowledge. Such conclusions may even be incorrect (see Pearl (1988), Heckerman and Horvitz (1988), Dubois and Prade (1989b) for instance). This is due to the fact that each rule and each variable to evaluate cannot always be considered independently in the evaluation process. A possibilistic hypergraph technique coping with this problem has been recently developed by Kruse and Schwecke (1990), and by Dubois and Prade (1990c).
REFERENCES
Baldwin J.F., Pilsworth B.W. (1979) A model of fuzzy reasoning through multi-valued logic and set theory. Int. J. of Man-Machine Studies, 11, 351-380.
Bel G., Bensana E., Dubois D., Koning J.L. (1989) Handling fuzzy priority rules in a
jobshop-scheduling system. Proc. of the 3rd. Inter. Fuzzy Systems Assoc. (IFSA)
Congress, Seattle, Wa., Aug. 6-11, 200-203.
Bensana E., Bel G., Dubois D. (1988) OPAL: a multi-knowledge-based system for industrial job-shop scheduling. Int. J. Prod. Res., 26(5), 795-819.
Bonissone P.P., Gans S.S., Decker K.S. (1987) RUM: a layered architecture for reasoning with uncertainty. Proc. of the 10th Inter. Joint Conf. on Artificial Intelligence (IJCAI-87), Milano, Italy, 891-898.
Bouchon B. (1987) Fuzzy inferences and conditional possibility distributions. Fuzzy
Sets and Systems, 23, 33-41.
Bouchon B. (1988) Stability of linguistic modifiers compatible with a fuzzy logic. In :
Uncertainty and Intelligent Systems (2nd Inter. Conf. on Information Processing and
Management of Uncertainty in Knowledge-Based Systems (IPMU'88), Urbino, Italy,
July 1988) (B. Bouchon, L. Saitta, R.R. Yager, eds.), Springer-Verlag, Berlin, 63-70.
Buchanan B.G., Shortliffe E.H. (1984) Rule-Based Expert Systems - The MYCIN Experiment of the Stanford Heuristic Programming Project. Addison-Wesley, Reading, Mass.
Despres S. (1989) GRIF: a guide for representing fuzzy inferences. Proc. of the 3rd Inter. Fuzzy Systems Assoc. (IFSA) Congress, Seattle, Wa., Aug. 6-11, 353-356.
Di Nola A., Pedrycz W., Sessa S. (1985) Fuzzy relation equations and algorithms of inference mechanism in expert systems. In: Approximate Reasoning in Expert Systems (M.M. Gupta, A. Kandel, W. Bandler, J.B. Kiszka, eds.), North-Holland, Amsterdam, 355-367.
Dubois D., Koning J.L. (1989) Social choice axioms for fuzzy set aggregation. Workshop on Aggregation and Best Choices of Imprecise Opinions, Bruxelles, Jan. 1989. Available in Tech. Report IRIT/90-5/R, Univ. P. Sabatier, Toulouse, France. To appear in Fuzzy Sets and Systems.
Dubois D., Koning J.L., Bel G. (1989) Antagonistic decision rules in knowledge-based
systems (in French). Proc. 7th AFCET Congress on Artificial Intelligence and
Pattern Recognition, Paris.
Dubois D., Lang J., Prade H. (1990) Fuzzy sets in approximate reasoning - Part 2 :
Logical approaches. Fuzzy Sets and Systems, 25th Anniversary Memorial Volume, to
appear.
Dubois D., Martin-Clouaire R., Prade H. (1988) Practical computing in fuzzy logic. In:
Fuzzy Computing - Theory, Hardware and Applications (M.M. Gupta, T. Yamakawa,
eds.), North-Holland, Amsterdam, 11-34.
Dubois D., Prade H. (1982) On several representations of an uncertain body of
evidence. In : Fuzzy Information and Decision Processes (M.M. Gupta, E. Sanchez,
eds.), North-Holland, Amsterdam, 167-181.
Dubois D., Prade H. (1984) Fuzzy logics and the generalized modus ponens revisited.
Cybernetics and Systems, 15, 293-331.
Dubois D., Prade H. (with the collaboration of Farreny H., Martin-Clouaire R.,
Testemale C.) (1988) Possibility Theory - An Approach to Computerized Processing
of Uncertainty. Plenum Press, New York.
Dubois D., Prade H. (1989a) A typology of fuzzy "If... then..." rules. Proc. of the 3rd Inter. Fuzzy Systems Assoc. (IFSA) Congress, Seattle, Wa., Aug. 6-11, 782-785.
Dubois D., Prade H. (1989b) Handling uncertainty in expert systems : pitfalls,
difficulties, remedies. In : Reliability of Expert Systems (E. Hollnagel, ed.), Ellis-
Horwood, Chichester, U.K., 64-118.
Dubois D., Prade H. (1990a) Resolution principles in possibilistic logic. Int. J. of
Approximate Reasoning, 3, 1-21.
Dubois D., Prade H. (1990b) Gradual inference rules in approximate reasoning. Tech. Report IRIT/90-6/R, IRIT, Univ. P. Sabatier, Toulouse, France. Information Sciences, to appear.
Dubois D., Prade H. (1990c) Inference in possibilistic hypergraphs. Tech. Report IRIT/90-6/R, IRIT, Univ. P. Sabatier, Toulouse, France. Extended abstracts of the 3rd Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'90), Paris, July 2-6, 228-230.
Dubois D., Prade H. (1990d) Fuzzy sets in approximate reasoning - Part 1: Inference with possibility distributions. Fuzzy Sets and Systems, 25th Anniversary Memorial Volume, to appear.
Dubois D., Prade H., Testemale C. (1988) Weighted fuzzy pattern matching. Fuzzy Sets
and Systems, 28, 313-331.
Fonck P. (1990) Representation of vague and uncertain facts. Proc. of the 3rd Inter. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'90), Paris, July 2-6, 284-288.
Godo L., López de Mántaras R., Sierra C., Verdaguer A. (1988) Managing linguistically expressed uncertainty in MILORD: application to medical diagnosis. Artificial
1 Introduction
Fuzzy Set Theory, introduced by Zadeh in 1965 [77], has been the subject of much controversy and debate. In recent years, it has found many applications in a variety of fields. Among the most successful applications of this theory has been the area of Fuzzy Logic Control (FLC), initiated by the work of Mamdani and Assilian [36]. FLC has had considerable success in Japan, where many commercial products using this technology have been built.
In this paper, we will review the basic architecture of fuzzy logic controllers and discuss why this technology often provides controllers with performance similar to that of an expert human operator for ill-defined and complex systems. In Section 2, an introductory survey of the basics of fuzzy set theory is presented. Next, the basic architecture of an FLC is described, followed by a brief review of the applications of this theory. Finally, we discuss how a fuzzy logic based control system can learn from experience to fine-tune its performance.
Figure 1: Examples of fuzzy membership functions (μ(x) for the linguistic values Negative and Positive over x)
μ_A(z) = { 1 if z ∈ A ; 0 if z ∉ A }
μ_A : X → [0,1]
where X refers to the universal set defined in a specific problem. If this universal set is countable and finite, then a fuzzy set A in this universe can be defined by listing each member and its degree of membership in the set A:
A = Σ_{i=1}^{n} μ_A(z_i)/z_i .
Similarly, if X is continuous, then a fuzzy set A can be defined by A = ∫_X μ_A(z)/z .
Note that in the above definitions, "/" does not refer to a division and is used as a notation to separate the membership of an element from the element itself. For example, in A = .2/element1 + .6/element2, element1 has a membership value of .2 and element2 has a membership value of .6 in the fuzzy set A. As another example, the linguistic term Positive as shown in Figure 1 may be defined to take the following membership function:
μ_Positive(z) = { 1 if z > 4 ; (z − 1)/3 if 1 ≤ z ≤ 4 ; 0 otherwise }.
The support of a fuzzy set A in the universal set X is a crisp set that contains all the elements of X which have degree of membership greater than zero. In the above example, the support set includes all the real numbers for which μ(z) > 0.
The α-cut of a fuzzy set A is defined as the crisp set of all the elements of the universe X which have memberships in A greater than or equal to α:
A_α = {z ∈ X | μ_A(z) ≥ α}.
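A small Python sketch of these notions follows, using the Positive membership function as reconstructed above; the grid of sample points is an arbitrary choice for illustration.

```python
# A small sketch of the "Positive" membership function as reconstructed above,
# together with its support and an alpha-cut computed on a discrete grid.
def mu_positive(z: float) -> float:
    if z > 4.0:
        return 1.0
    if 1.0 <= z <= 4.0:
        return (z - 1.0) / 3.0
    return 0.0

grid = [i * 0.5 for i in range(-4, 17)]          # sample points in [-2, 8]

support   = [z for z in grid if mu_positive(z) > 0.0]      # {z : mu(z) > 0}
alpha_cut = [z for z in grid if mu_positive(z) >= 0.5]     # A_0.5 = {z : mu(z) >= 0.5}

print("support samples :", support)
print("0.5-cut samples :", alpha_cut)
```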
For example, if the fuzzy set A is described by its membership function:
μ_CON(A)(z) = μ_A²(z)
Figure 3: A simple architecture for a fuzzy logic controller, consisting of a Coder (Fuzzifier), a Knowledge Base, a Decision Making Logic, and a Decoder (Defuzzifier) connected to the Plant.
The selection of the types of fuzzy variables directly affects the type of reasoning to be performed by the rules using these variables. This is described later in 3.3. After the values of the main control parameters are determined, a knowledge base is developed using the above control variables and the values that they may take. If the knowledge base is a rule base, more than one rule may fire, requiring the selection of a conflict resolution method for decision making, as will be described later.
Figure 3 illustrates a simple architecture for a fuzzy logic controller. The system dynamics of the plant in this architecture are measured by a set of sensors. This architecture consists of four modules whose functions are described next.
3. Modeling a process
4. Self Organization
Among the above methods, the first method is the most widely used [36]. In modeling the human expert operator's knowledge, fuzzy control rules of the form
IF X is A THEN Y is B
have been used in studies such as [51, 53]. This method is effective when expert human operators can express the heuristics or the knowledge that they use in controlling a process in terms of rules of the above form. Applications have been developed in process control (e.g., cement
form. Applications have been developed in process control (e.g., cement
kiln operations [23]). Besides the ordinary fuzzy control rules which have been used by Mamdani and others, where the conclusion of a rule is another fuzzy variable, a rule can be developed whereby its conclusion is a function of the input parameters. For example, the following implication can be written:
Usually more than one fuzzy control rule can fire at one time. The methodology which is used in deciding what control action should be taken as the result of the firing of several rules can be referred to as the process of conflict resolution. The following example, using two rules, illustrates this process. Assume that we have the following:
Rule 1: IF X is A_1 and Y is B_1 THEN Z is C_1
Rule 2: IF X is A_2 and Y is B_2 THEN Z is C_2
Now, if we have x_0 and y_0 as the sensor readings for fuzzy variables X and Y, then their truth values are represented by μ_A1(x_0) and μ_B1(y_0) respectively for Rule 1, where μ_A1 represents the membership function for A_1. Similarly for Rule 2, we have μ_A2(x_0) and μ_B2(y_0) as the truth values of the preconditions. The strength of Rule 1 can be calculated by:
w_1 = min(μ_A1(x_0), μ_B1(y_0))
where n is the number of rules with firing strength (w_i) greater than 0 and z_i is the amount of control action recommended by rule i.
z* = Σ_{j=1}^{n} z_j μ_C(z_j) / Σ_{j=1}^{n} μ_C(z_j)
(Figure: the fuzzy sets of the two rules over x, y and z, with sensor readings x_0 = 4 and y_0 = 8.)
3.4.5 An example
Assume that we have the following two rules:
Rule 1: IF X is Al and Y is Bl THEN Z is C1
Rule 2: IF X is A2 and Y is B2 THEN Z is C 2
Suppose x_0 and y_0 are the sensor readings for fuzzy variables X and Y,
and the following are membership functions:
μ_A1(x) = (x − 2)/3 for 2 ≤ x ≤ 5, (8 − x)/3 for 5 < x ≤ 8
μ_A2(x) = (x − 3)/3 for 3 ≤ x ≤ 6, (9 − x)/3 for 6 < x ≤ 9
μ_B1(y) = (y − 5)/3 for 5 ≤ y ≤ 8, (11 − y)/3 for 8 < y ≤ 11
μ_B2(y) = (y − 4)/3 for 4 ≤ y ≤ 7, (10 − y)/3 for 7 < y ≤ 10
μ_C1(z) = (z − 1)/3 for 1 ≤ z ≤ 4, (7 − z)/3 for 4 < z ≤ 7
μ_C2(z) = (z − 3)/3 for 3 ≤ z ≤ 6, (9 − z)/3 for 6 < z ≤ 9
2. the crisp value of the control action using the COA and MOM
methods.
First, the sensor readings x_0 and y_0 have to be matched against the preconditions A_1 and B_1 respectively. This will produce μ_A1(x_0) = 2/3 and μ_B1(y_0) = 1. Similarly, for rule 2, we have μ_A2(x_0) = 1/3 and μ_B2(y_0) = 2/3. The strength of rule 1 is calculated by:
w_1 = min(μ_A1(x_0), μ_B1(y_0)) = min(2/3, 1) = 2/3
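A Python sketch of this example is given below. It uses the membership functions as reconstructed above, min for the firing strengths (consistent with the values 2/3 and 1/3 obtained in the text), Mamdani-style clipping of the consequents, and then COA and MOM defuzzification on a discretized z-axis; the discretization and the exact MOM convention (mean of the maximizing points) are assumptions.

```python
# A numerical sketch of the two-rule example: firing strengths by min, consequents
# clipped at the rule strength, union by max, then COA and MOM defuzzification.
import numpy as np

def tri(z, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return np.maximum(np.minimum((z - a) / (b - a), (c - z) / (c - b)), 0.0)

x0, y0 = 4.0, 8.0
w1 = min(tri(x0, 2, 5, 8), tri(y0, 5, 8, 11))    # mu_A1(x0) = 2/3, mu_B1(y0) = 1
w2 = min(tri(x0, 3, 6, 9), tri(y0, 4, 7, 10))    # mu_A2(x0) = 1/3, mu_B2(y0) = 2/3
print("firing strengths:", w1, w2)

Z = np.linspace(0.0, 10.0, 1001)
# Clip each consequent at its rule strength and take the union (max)
mu_C = np.maximum(np.minimum(tri(Z, 1, 4, 7), w1),
                  np.minimum(tri(Z, 3, 6, 9), w2))

z_coa = float(np.sum(Z * mu_C) / np.sum(mu_C))          # center of area
z_mom = float(Z[mu_C >= mu_C.max() - 1e-9].mean())      # mean of maxima
print("COA defuzzified value:", round(z_coa, 3))
print("MOM defuzzified value:", round(z_mom, 3))
```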
3. Let U = {u_1, u_2, …, u_n} where u_i is the set of input control parameters related to achieving g_i.
4. Let A = {a_1, a_2, …, a_n} where a_i is the set of linguistic values used to describe the values of the input control parameters in u_i.
5. Let C = {c_1, c_2, …, c_n} where c_i is the set of linguistic values used to describe the values of the output Z.
the rules for moving to a specific location. A fuzzy set operation known as concentration [78], as described earlier, can be used here to systematically obtain more focused membership functions for the parameters which represent the achievement of previous goals. The above algorithm has been applied to cart-pole balancing and more details can be found in [8].
stored beforehand. For more complex processes than the ones discussed by Procyk and Mamdani [44] (i.e., other than single-input single-output processes), the generation of these decision tables may be difficult.
system's failure over time. The integrated fuzzy-AHC model has been tested in the domain of cart-pole balancing and the results have been consistently better when compared with the performance of the AHC model alone (e.g., in terms of speed of learning and smoothness of control). However, this model is difficult to apply to other control systems, due mainly to the fact that developing the mathematical functions for the trace function and credit assignment is not trivial. The structure proposed here suffers from a lack of generality and may be difficult to apply to larger scale systems.
7 Discussion
Among the problems which still deserve serious attention is the problem of providing proofs of stability for FLCs. In contrast to analytical control theory, FLC lacks this necessary attribute, although some theoretical work has begun producing interesting results (e.g., [29, 26, 10, 11, 45, 17, 16, 68, 24, 62, 21, 43, 74, 47, 25]).
Another area that requires attention is what we referred to as fuzzy modeling of systems earlier in this paper. Here the attention should be focused on structure identification and parameter identification of the dynamics of a system, in order to develop a model which could later be used to develop the fuzzy logic controller [56, 50].
Finally, as we briefly discussed in the previous section, artificial neural networks and fusion techniques are being developed in order to obtain fuzzy logic controllers which can learn from experience. Despite these open issues, fuzzy logic control has achieved a huge commercial success in recent years. Because these controllers are easy to manufacture and greatly resemble human reasoning, it is expected that there will be many more applications in the near future.
References
[1] International Conference on Fuzzy Logic & Neural Networks, volume one and two, Iizuka, Japan, 1988.
[3] W.H. Bare, R .J. Mulholland, and S.S. Sofer. Design of a self-tuning
rule based controller for a gasoline refinery catalytic reformer. IEEE
Transactions on Automatic Control, 35(2):156-164, 1990.
[4] H. R. Berenji, Y. Y. Chen, C. C. Lee, S. Murugesan, and J . S. Jang.
An experiment-based comparative study of fuzzy logic control. In
American Control Conference, Pittsburgh, 1989.
[6] H.R. Berenji. A reinforcement learning based model for fuzzy logic
control. International Journal of Approximate Reasoning, 1991 (to
appear).
[7] H.R. Berenji. An architecture for designing fuzzy controllers using
neural networks. In Second Joint Technology Workshop on Neural
Networks and Fuzzy Logic, Houston, Texas, April 1990.
[8] H.R. Berenji, Y .Y. Chen, C.C. Lee, J.S. Jang, and S. Murugesan.
A hierarchical approach to designing approximate reasoning-based
controllers for dynamic physical systems. In Sixth Conference on
Uncertainty in Artificial Intelligence, pages 362-369, 1990.
[9] J. A. Bernard. Use of a rule-based system for process control. IEEE Control Systems Magazine, 8, no. 5:3-13, 1988.
[10] M. Braae and D.A. Rutherford. Theoretical and linguistic aspects
of the fuzzy logic controller. Automatica, 15, no. 5:553-577, 1979.
[11] Y.Y. Chen. Stability analysis of fuzzy control - a Lyapunov approach. In IEEE Systems, Man, Cybernetics, Annual Conference, volume 3, pages 1027-1031, 1987.
[12] E. Czogala and T. Rawlik. Modelling of a fuzzy controller with
application to the control of biological processes. Fuzzy Sets and
Systems, 31:13-22, 1989.
[13] J. Efstathiou. Rule-based process control using fuzzy logic. In
E. Sanchez and L.A. Zadeh, editors, Approximate Reasoning in In-
telligence Systems, Decision and Control, pages 145-148. Pergamon,
New York, 1987.
[14] B. P. Graham and R. B. Newell. Fuzzy identification and control of
a liquid level rig. Fuzzy Sets and Systems, 26:255-273, 1988.
[15] B. P. Graham and R. B. Newell. Fuzzy adaptive control of a first
order process. Fuzzy Sets and Systems, 31:47-65, 1989.
[16] M. M. Gupta, G. M. Trojan, and J. B. Kiszka. Controllability of fuzzy control systems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-16, no. 4:576-582, 1986.
[17] M. M. Gupta and W. Pedrycz. Cognitive and fuzzy logic controllers: A retrospective and perspective. In American Control Conference, pages 2245-2251, 1989.
[24] J.B. Kiszka, M.M. Gupta, and P.N. Nikiforuk. Energistic stability
of fuzzy dynamic systems. IEEE Trans. Systems, Man and Cyber-
netics, SMC-15(6), 1985.
[25] J.B. Kiszka, M.M. Gupta, and G.M. Trojan. Multivariable fuzzy controller under Gödel's implication. Fuzzy Sets and Systems, 34:301-321, 1990.
[26] S. V. Komolov, S. P. Makeev, and F. Shaknov. Optimal control of a finite automaton with fuzzy constraints and a fuzzy target. Cybernetics, 16(6):805-810, 1979.
[27] B. Kosko. Fuzzy cognitive maps. International Journal of Man-
Machine Studies, 24:65-75, 1986.
[28] B. Kosko. Fuzzy associative memories. In Kandel A., editor, Fuzzy
Expert Systems. Addison-Wesley, 1987.
[29] G.R. Langari and M. Tomizuka. Stability of fuzzy linguistic control
systems. In IEEE Conference on Decision and Control, Hawaii,
December 1990.
[30] L. I. Larkin. A fuzzy logic controller for aircraft flight control. In
M. Sugeno, editor, Industrial Applications of Fuzzy Control, pages
87-104. North-Holland, Amsterdam, 1985.
[31] C.C. Lee. Self-learning rule-based controller employing approximate
reasoning and neural-net concepts. Int. Journal of Intelligent Sys-
tems, 1990.
[32] C.C. Lee and H.R. Berenji. An intelligent controller based on ap-
proximate reasoning and reinforcement learning. In Proc. of IEEE
Int. Symposium on Intelligent Control, Albany, NY, 1989.
[33] Fujitec Co. Ltd. Flex-8800 series elevator group control system.
Technical report, Fujitec Co. Ltd., Osaka, Japan, 1988.
[34] E. M. Scharf and N. J. Mandic. The application of a fuzzy controller to the control of a multi-degree-of-freedom robot arm. In M. Sugeno, editor, Industrial Applications of Fuzzy Control, pages 41-62. North-Holland, Amsterdam, 1985.
[35] S. Mabuchi. An approach to the comparison of fuzzy subsets with
an a-cut dependent index. IEEE Transactions on Systems, Man,
and Cybernetics, 18(2), 1988.
[36] E. H. Mamdani and S. Assilian. An experiment in linguistic syn-
thesis with a fuzzy logic controller. International Journal of Man-
Machine Studies, 7(1):1-13, 1975.
[37] S. Murakami and M. Maeda. Application of fuzzy controller to automobile speed control system. In M. Sugeno, editor, Industrial Applications of Fuzzy Control, pages 105-124. North-Holland, Amsterdam, 1985.
[38] S. Murakami, F. Takemoto, H. Fujimura, and E. Ide. Weld-line
tracking control of arc welding robot using fuzzy logic controller.
Fuzzy Sets and Systems, 32:221-237, 1989.
[39] A. Ollero and A.J. Garcia-Cerezo. Direct digital control, auto-
tuning and supervision using fuzzy logic. Fuzzy Sets and Systems,
30:135-153, 1989.
[40] H. Ono, T. Ohnishi, and Y. Terada. Combustion control of refuse incineration plant by fuzzy logic. Fuzzy Sets and Systems, 32:193-206, 1989.
[53] M. Sugeno and M. Nishida. Fuzzy control of model car. Fuzzy Sets
and Systems, 16:110-113, 1985.
[54] H. Takagi and I. Hayashi. Artificial-neural-network-driven fuzzy
reasoning. Int. J. of Approximate Reasoning, (to appear).
[74] Hao Ying, William Siler, and James J. Buckley. Fuzzy control the-
ory: A nonlinear case. Automatica, 26(3):513-520, 1990.
H.-J. Zimmermann
RWTH Aachen
Templergraben 55
W-5100 Aachen (Germany)
1. INTRODUCTION
maximize f(x)
is a strict imperative. This also implies that the violation of any single constraint renders the solution infeasible and that all constraints are of equal importance (weight). Strictly speaking, these are rather unrealistic assumptions, which are partly relaxed in "fuzzy linear programming".
Definition 1:
Assume that we are given a fuzzy goal G̃ and a fuzzy constraint C̃ in a space of alternatives X. Then G̃ and C̃ combine to form a decision D̃, which is a fuzzy set resulting from the intersection of G̃ and C̃. In symbols, D̃ = G̃ ∩ C̃, and correspondingly
D̃ = G̃_1 ∩ G̃_2 ∩ … ∩ G̃_n ∩ C̃_1 ∩ C̃_2 ∩ … ∩ C̃_m
and correspondingly
Find x
such that c^T x ≳ z
Ax ≲ b
x ≥ 0     (3)
Find x
such that Bx ≲ d
x ≥ 0     (4)
μ_i(x) can be interpreted as the degree to which x fulfills (satisfies) the fuzzy inequality B_i x ≲ d_i (where B_i denotes the ith row of B).
Assuming that the decision maker is interested not in a fuzzy set but in a crisp "optimal" solution, we could suggest the "maximizing solution" to (5), which is the solution of the possibly nonlinear programming problem
μ_i(x) = 1 if B_i x ≤ d_i
μ_i(x) ∈ (0, 1) if d_i < B_i x ≤ d_i + p_i        i = 1, …, m+1
μ_i(x) = 0 if B_i x > d_i + p_i     (7)
μ_i(x) = 1 if B_i x ≤ d_i
μ_i(x) = 1 − (B_i x − d_i)/p_i if d_i < B_i x ≤ d_i + p_i        i = 1, …, m+1
μ_i(x) = 0 if B_i x > d_i + p_i     (8)
maximize λ
such that λ p_i + B_i x ≤ d_i + p_i,  i = 1, …, m+1
x ≥ 0     (10)
If the optimal solution to (10) is the vector (λ, x_0), then x_0 is the maximizing solution (6) of model (2), assuming membership functions as specified in (8).
The reader should realize that this maximizing solution can be found
by solving one standard (crisp) LP with only one more variable and one more
constraint than model (4). This makes this approach computationally very
efficient.
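A Python sketch of solving the crisp equivalent (10) with an off-the-shelf LP solver is shown below; the data B, d, p are invented for illustration (one row plays the role of the fuzzified objective, in the spirit of the m+1 rows above), and scipy is assumed to be available.

```python
# A sketch of the crisp equivalent (10): maximize lambda subject to
# lambda * p_i + B_i x <= d_i + p_i and x >= 0, solved with an off-the-shelf LP code.
import numpy as np
from scipy.optimize import linprog

B = np.array([[-2.0, -1.0],     # fuzzy goal 2*x1 + x2 >= 10, written as -2x1 - x2 <= -10
              [ 1.0,  2.0],
              [ 3.0,  1.0]])    # two fuzzy constraints
d = np.array([-10.0, 10.0, 12.0])   # aspiration levels d_i
p = np.array([  2.0,  2.0,  3.0])   # admissible tolerances p_i

n = B.shape[1]
# decision vector z = (x_1, ..., x_n, lambda); maximize lambda == minimize -lambda
c = np.zeros(n + 1); c[-1] = -1.0
A_ub = np.hstack([B, p.reshape(-1, 1)])      # B_i x + p_i * lambda <= d_i + p_i
b_ub = d + p
bounds = [(0, None)] * n + [(0, 1)]          # x >= 0, 0 <= lambda <= 1

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
x_opt, lam = res.x[:n], res.x[-1]
print("maximizing solution x:", x_opt, "achieved degree lambda:", lam)
```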
A slightly modified version of models (9) and (10), respectively, results if the membership functions are defined as follows: a variable t_i, i = 1, …, m+1, 0 ≤ t_i ≤ p_i, is defined which measures the degree of violation of the ith constraint. The membership function of the ith row is then
μ_i(x) = 1 − t_i / p_i     (11)
maximize λ
such that λ p_i + t_i ≤ p_i,  i = 1, …, m+1
B_i x − t_i ≤ d_i
t_i ≤ p_i
x, t ≥ 0     (12)
This model is larger than model (10), even though the set of constraints t_i ≤ p_i is actually redundant. Model (12) has some advantages, however, in particular when performing sensitivity analysis.
The main advantage, compared to the unfuzzy problem formulation, is the fact that the decision maker is not forced into a precise formulation for mathematical reasons, even though he might only be able or willing to describe his problem in fuzzy terms. Linear membership functions are obviously only a very rough approximation. Membership functions which monotonically increase or decrease, respectively, in the interval [d_i, d_i + p_i] can also be handled quite easily, as will be shown later.
It should also be observed that the classical assumption of equal importance of constraints has been relaxed: the slope of the membership functions determines the "weight" or importance of the constraint. The slopes, however, are determined by the p_i's: the smaller the p_i, the higher the importance of the constraint. For p_i = 0 the constraint becomes crisp, i.e. no violation is allowed.
maximize λ
such that λ p_i + B_i x ≤ d_i + p_i
Dx ≤ b
x, λ ≥ 0     (13)
μ_G̃(x) = 0 if f(x) ≤ sup_{R_1} f
μ_G̃(x) = (f(x) − sup_{R_1} f) / (sup_{S(R)} f − sup_{R_1} f) if sup_{R_1} f < f(x) < sup_{S(R)} f
μ_G̃(x) = 1 if sup_{S(R)} f ≤ f(x)
Adding this fuzzy set to the fuzzy sets defining the solution space gives again a symmetrical model to which (10) or (12) can be applied. Definition 2 becomes easier to understand if we apply it to a specific given LP-structure: let the membership functions of the fuzzy sets representing the fuzzy constraints be defined in analogy to (8) as
μ_i(x) = 1 if A_i x ≤ b_i
μ_i(x) = 1 − (A_i x − b_i)/p_i if b_i < A_i x ≤ b_i + p_i
μ_i(x) = 0 if A_i x > b_i + p_i     (15)
On the basis of the two following LP's, the membership function of the fuzzy set defined in Definition 2 can then easily be defined:
such that Ax ≤ b
Dx ≤ b'
x ≥ 0     (16)
μ_G̃(x) = 1 if f_0 ≤ c^T x
μ_G̃(x) = (c^T x − f_1) / (f_0 − f_1) if f_1 < c^T x < f_0
μ_G̃(x) = 0 if c^T x ≤ f_1     (18)
maximize λ
such that λ p + Ax ≤ b + p
Dx ≤ b'
λ ≤ 1
λ, x ≥ 0     (19)
Example:
maximize
such that x_1 + x_2 ≤ 4
5x_1 + x_2 ≤ 3
x_1, x_2 ≥ 0
maximize λ
2C EXTENSIONS
1. Linear membership functions were assumed for all fuzzy sets involved.
2. The use of the minimum-operator for the aggregation of fuzzy sets was
considered to be adequate.
μ_H(x) = 1/2 where x = (a + b)/2.
μ_H(x) is strictly convex on [−∞, (a + b)/2] and strictly concave on [(a + b)/2, +∞].
For all x ∈ ℝ: 0 < μ_H(x) < 1, and μ_H(x) approaches asymptotically f(x) = 0 and f(x) = 1, respectively.
Leberling shows that, choosing as lower and upper aspiration levels for the fuzzy objective function z = cx of an LP a = c̲ (lower bound of z) and b = c̄ (upper limit of the objective function), and representing this (fuzzy) goal by a hyperbolic function, one arrives at the following crisp equivalent problem for one fuzzy goal and all crisp constraints:
maximize λ
such that λ − (1/2) (e^Z'(x) − e^−Z'(x)) / (e^Z'(x) + e^−Z'(x)) ≤ 1/2
Dx ≤ b'
x, λ ≥ 0     (20)
with Z'(x) = (Σ_j c_j x_j − (1/2)(c̄ + c̲)) S. For each additional fuzzy goal or constraint one of these exponential rows has, of course, to be added to (20).
For x_{n+1} = tanh⁻¹(2λ − 1), model (20) is equivalent to the following linear model:
maximize x_{n+1}
such that S Σ_j c_j x_j − x_{n+1} ≥ (1/2) S (c̄ + c̲)
Dx ≤ b'
x_{n+1}, x ≥ 0     (21)
This is again a standard linear programming model which can be solved, for
instance, by any available simplex code.
The above equivalence between models with nonlinear membership
functions is not accidental. It has been proven that the following relationship
holds [Werners 1984, p. 143].
Theorem 1
maximize λ     (22)
maximize λ'
If there exists a λ⁰ ∈ ℝ such that (λ⁰, x⁰) is the optimal solution of (22), then there exists a λ'⁰ ∈ ℝ such that (λ'⁰, x⁰) is the optimal solution of (23).
Theorem 1 suggests that quite a number of nonlinear membership functions can be accommodated easily. Unfortunately, the same optimism is not justified concerning other aggregation operators.
The computational efficiency of the approach mentioned so far has rested to a large extent on the use of the min-operator as a model for the logical "and" or the intersection of fuzzy sets, respectively. Axiomatic [Hamacher 1978] as well as empirical [Thole, Zimmermann, Zysno 1979; Zimmermann, Zysno 1980, 1983] investigations have shed some doubt on the general use of the min-operator in decision models. Quite a number of context-free or context-dependent operators have been suggested in the meantime [see, e.g., Zimmermann 1990b, ch. 3]. The disadvantage of these operators is, however, that the resulting crisp equivalent models are no longer linear [see, e.g., Zimmermann 1978, p. 45], which reduces the computational efficiency of these approaches considerably or even renders the equivalent models unsolvable within acceptable time limits. There are, however, some exceptions to this rule, and we will present two of them in more detail.
One of the objections against the min-operator (see, for instance,
Zimmermann and Zysno [1980]) is the fact that neither the logical "and" nor
the min-operator is compensatory in the sense that increases in the degree of
membership in the fuzzy sets "intersected" might not influence at all
membership in the resulting fuzzy set (aggregated fuzzy set or intersection).
There are two quite natural ways to cure this weakness:
or
or
For linear membership functions of the goals and the constraints, (25) is a mixed-integer linear program that can be solved by the appropriate available codes.
If one wants to distinguish between an "and"-aggregation and an "or"-
aggregation (for instance, for the sake of easier modelling) one may want to use
the following operators:
Let μ_i(x) be the membership functions of the fuzzy sets which are to be aggregated in the sense of a fuzzy "and". The membership function of the resulting fuzzy set is defined to be
μ_and(x) = γ · min_{i=1,…,m} μ_i(x) + (1 − γ) · (1/m) Σ_{i=1}^{m} μ_i(x)
with γ ∈ [0, 1].
μ_or(x) = γ · max_{i=1,…,m} μ_i(x) + (1 − γ) · (1/m) Σ_{i=1}^{m} μ_i(x)
These two connectives are not inductive and associative, but they are
commutative, idempotent, strictly monotonic increasing in each component,
continuous, and compensatory [Werners 1984, p. 168]. These are certainly very
useful and acceptable properties.
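These two compensatory connectives are straightforward to implement; the following Python sketch uses γ for the weighting parameter as in the formulas above, with illustrative membership values.

```python
# A small sketch of the compensatory "fuzzy and" / "fuzzy or" connectives above,
# with gamma controlling the trade-off between min/max and the arithmetic mean.
def fuzzy_and(memberships, gamma):
    m = len(memberships)
    return gamma * min(memberships) + (1.0 - gamma) * sum(memberships) / m

def fuzzy_or(memberships, gamma):
    m = len(memberships)
    return gamma * max(memberships) + (1.0 - gamma) * sum(memberships) / m

mu = [0.3, 0.8, 0.6]
print(fuzzy_and(mu, gamma=1.0))   # pure min: 0.3
print(fuzzy_and(mu, gamma=0.5))   # compensatory: about 0.433
print(fuzzy_or(mu, gamma=0.5))    # compensatory: about 0.683
```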
If we use the aggregation operator from definition 2 in model (4), then
the "equivalent model" is:
maximize
(26)
So far, the reference model from which we have departed has always
been the "standard LP". Depending on the type of operator chosen and the
maximize f(x)
(28)
For "symmetric" fuzzy numbers a = (m, m, 0:, 0: k-L as shown in fig. 1 system
(28) reduces to
(29)
o m -(}' m m+{) t
(30)
L( Σ_{j=1}^{n} α_ij x_j − γ_i ) ≤ p_i − Σ_{j=1}^{n} m_ij x_j
R( Σ_{j=1}^{n} β_ij x_j − δ_i ) ≤ q_i − Σ_{j=1}^{n} n_ij x_j
Σ_{j=1}^{n} β_ij x_j − δ_i ≤ q_i − Σ_{j=1}^{n} n_ij x_j     (31)
(31) is a system of crisp linear inequalities which - together with the crisp objective function - can now be solved with any classical LP-method. Not counting the nonnegativity constraints, the number of rows in (31) is, however, four times as large as that of (27). It should also be noted that (28) is a specific interpretation of the fuzzy inequality relation. The authors offer two other interpretations, which lead to slightly different results.
(32)
with the function L: [0, +∞[ → [0, 1] being defined by the formula L(u) = max{0, 1 − u} for u ≥ 0.
concerning the fuzzy sets are made, but the objective function(s) and the nonnegativity constraints are also fuzzified. Rommelfanger [1989] also goes in this direction.
More general treatments of this problem can be found in Delgado et
al. [1989], Dubois [1987], Orlovski [1989] and others.
4. APPLICATIONS
Fuzzy Mathematical Programming has been applied to other areas of
theoretical investigations as well as to practical applications.
4A METHODOLOGICAL APPLICATIONS
Due to the "symmetry" of the majority of the models in FMP, the number of objective functions does not matter. In classical mathematical programming, however, normally only one objective function, which generates the order over the solution space, could be accepted. If there is more than one objective function, multi-objective decision making models or "vectorial optimization" models have to be applied, which normally require a much higher computational effort. It is, therefore, quite natural that FMP has been applied extensively to the area of multicriteria analysis.
4B PRACTICAL APPLICATIONS
Real applications of fuzzy mathematical programming are still pretty rare. This
is certainly not due to weaknesses of FLP. Experience shows that people who
have been using linear programming for quite a while have become so used to
"cutting problems" to LP-models that they do not see the need to allow for
uncertainty. The acceptance of FLP seems to be higher amongst people who
have never used LP before and who are looking for tools to solve their
problems properly. Another reason for not finding interesting applications in
the literature is, of course, that good applications are not published for
competitive reasons and failures are not published for other obvious reasons.
FLP has been applied to blending problems with sensory constraints (such as the blending of chocolate stretch, champagne cuvée, paints, etc.). The paint application was published [Zimmermann et al. 1986]. Another application was in logistics, by Ernst [1982], which we will sketch in the following. He suggests a fuzzy model for the determination of time schedules for containerships, which can be solved by branch and bound, and a model for the scheduling of containers on containerships, which results eventually in an LP. We shall only consider the last model (a real project).
The model contained in a realistic setting approximately 2,000
constraints and originally 21,000 variables, which could then be reduced to
approximately 500 variables. Thus it could be handled adequately on a modern
computer. It is obvious, however, that a description of this model in a textbook would not be possible. We shall, therefore, sketch the contents of the modeling verbally and then concentrate on the aspects that included fuzziness.
The system is the core of a decision support system for the purpose of
scheduling properly the inventory, movement, and availability of containers,
especially empty containers, in and between 15 harbors. The containers were shipped according to known time schedules on approximately 10 big containerships worldwide on 40 routes. The demand for container space in the
harbors was to a high extent stochastic. Thus the demand for empty containers
in different harbors could either be satisfied by large inventories of empty
containers in all harbors, causing high inventory costs, or they could be shipped
from their locations to the locations where they were needed, causing high
shipping costs and time delays.
Thus the system tries to control optimally primarily the movements
and inventories of empty containers, the capacities of the ships, and the
predetermined time schedule of the ships.
This problem was formulated as a large LP model. The objective
function maximized profit (from shipping full containers) minus cost for
moving empty containers minus inventory cost of empty containers. When
comparing data of past periods with the model, it turned out that very often ships transported more containers than their specific maximum capacity. This,
Let
maximize z = c^T x
such that Ax ≲ d
Bx ≤ b
x ≥ 0     (35)
maximize z' = c^T x − Σ_i s_i (p_i − b_i) μ_i(t_i)
such that Ax ≤ d + t
Bx ≤ b
t ≤ p − b
x, t ≥ 0     (36)
capacity, which was desirable for reasons of safety. Then "tolerance" variables t were introduced:
Bx − t ≤ 0.9 b
t ≤ 0.1 b
The objective function became
s was defined to be
s = (average profit of shipping a full container) / (average number of time periods which elapsed between departure and arrival of a container)
By the use of this definition more than 90% of the capacity of the ships
was used only if and when very profitable full containers were available for
shipping at the ports, a policy that seemed to be very desirable to the decision
makers.
5. CONCLUSIONS
Mathematical programming is one of the areas to which fuzzy set theory has
been applied extensively. Even if one considers the area of linear programming
only, numerous new models - linear and nonlinear - have emerged through the
application of fuzzy set theory. A good part of the models are of primarily theoretical interest. Still, even from an application point of view, fuzzy mathematical programming is a valuable extension of traditional crisp optimization models. It is surprising that some areas, such as duality theory, have not yet drawn more interest. There, further developments can still be expected.
REFERENCES
INTRODUCTION
Computer vision is the study of theories and algorithms involving the
sensing and transmission of images; preprocessing of digital images for noise
removal, smoothing, or sharpening of contrast; segmentation of images to isolate
objects and regions; description and recognition of the segmented regions; and
finally interpretation of the scene. We normally think of images in the visible
spectrum, either monochrome or color, but in fact, images can be produced by
a wide range of sensing modalities including X-rays, neutrons, ultrasound,
pressure sensing, laser range finding, infrared, and ultraviolet, to name a few.
The first connection of fuzzy set theory to computer vision was made by
Prewitt [1] who suggested that the results of image segmentation should be fuzzy
subsets rather than crisp subsets of the image plane. In order to apply the rich
assortment of fuzzy set theoretic operators to an image, the gray levels (or
feature values) must be converted to membership values. Let X denote the
domain of the digital image. Then a fuzzy subset of X is a mapping μ_f: X → [0,1], where the value of μ_f(x,y) is dependent upon the original feature vector f(x,y). The calculation of membership functions is central to the application of fuzzy set theory, just as the calculation of conditional probability density functions or basic probability assignments is crucial in the use of probabilistic or Dempster-Shafer belief models.
along with approximations of them, as the basic building blocks for both contrast enhancement and smoothing. Following Nakagawa and Rosenfeld [8], they applied min and max operations on membership values in the neighborhood of each pixel to produce smoothing or edge detection. Other approaches to edge detection using fuzzy set methods can be found in [9, 10].
One problem with this approach is that the parameters which define the membership functions must be supplied, primarily in an interactive fashion, by the user. Pal and Rosenfeld [11], in a two-class segmentation problem, automated this process by using several choices and picking the one which optimized a certain geometric criterion which we will describe later. Recently, we have used normalized histograms of the feature values generated from training data to estimate the particular membership functions [12-14]. This has the advantages that it does not force any particular shape on the resultant distributions, can be extended to deal with multiple features instead of gray level alone, and can easily accommodate the addition of new classes.
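A Python sketch of this histogram-based estimation is given below; the synthetic training gray levels, the number of bins and the peak-to-one normalization are assumptions made for illustration, not the authors' exact procedure.

```python
# A sketch of estimating class membership functions from normalized histograms of
# training feature values; the training data here are synthetic.
import numpy as np

def histogram_membership(samples, bins, value_range):
    """Return a membership function estimated from a normalized histogram."""
    hist, edges = np.histogram(samples, bins=bins, range=value_range)
    hist = hist.astype(float)
    hist /= hist.max() if hist.max() > 0 else 1.0      # scale peak to 1
    def mu(value):
        idx = np.clip(np.searchsorted(edges, value, side="right") - 1, 0, bins - 1)
        return hist[idx]
    return mu

rng = np.random.default_rng(0)
object_grays     = rng.normal(170, 15, 1000)           # hypothetical training gray levels
background_grays = rng.normal(80, 20, 1000)

mu_object     = histogram_membership(object_grays, bins=32, value_range=(0, 255))
mu_background = histogram_membership(background_grays, bins=32, value_range=(0, 255))
print("mu_object(160)     =", round(float(mu_object(160)), 3))
print("mu_background(160) =", round(float(mu_background(160)), 3))
```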
Σ_{i=1}^{c} u_ik = 1 for all k,     Σ_{k=1}^{n} u_ik > 0 for all i.
Assuming that x_k ≠ v_i for all i, k, (U, V) may be a local minimum of J_m only if
u_ik = [ Σ_{j=1}^{c} ( ‖x_k − v_i‖_A / ‖x_k − v_j‖_A )^{2/(m−1)} ]^{−1}     (1)
for all i, k, and
v_i = Σ_{k=1}^{n} u_ik^m x_k / Σ_{k=1}^{n} u_ik^m     (2)
for all i.
BEGIN
  Set c, 2 ≤ c < n
  Set ε, ε > 0
  Set m, 1 ≤ m < ∞
  Initialize U⁰
  Initialize j = 0
  DO UNTIL ( ‖U^j − U^{j−1}‖ < ε )
    Increment j
    Calculate {v_i^j} using (2) and U^{j−1}
    Compute U^j using (1) and {v_i^j}
  END DO UNTIL
END
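A compact Python/NumPy sketch of this iteration is given below; the Euclidean norm, the random initialization, the synthetic two-cluster data and the parameter defaults are assumptions made for illustration.

```python
# A compact NumPy sketch of the FCM iteration defined by (1) and (2).
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)                # columns sum to 1 (fuzzy c-partition)
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)               # (2): cluster centers
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # ||x_k - v_i||
        D = np.fmax(D, 1e-12)                        # avoid division by zero if x_k == v_i
        U_new = 1.0 / np.sum((D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0)), axis=1)
        if np.max(np.abs(U_new - U)) < eps:          # termination test |U_j - U_{j-1}| < eps
            U = U_new
            break
        U = U_new
    return U, V

# Synthetic two-cluster data, purely for illustration
X = np.vstack([np.random.default_rng(1).normal(0, 0.5, (50, 2)),
               np.random.default_rng(2).normal(3, 0.5, (50, 2))])
U, V = fcm(X, c=2)
print("cluster centers:\n", V)
```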
SEGMENTATION
Image segmentation is one of the most critical components of the
computer vision process. Errors made in this stage will impact all higher level
activities. Therefore, methods which incorporate the uncertainty of object and
region definition and the faithfulness of the features to represent various objects
and regions are desirable.
All of the methods for converting image feature values into class membership numbers contain adjustable parameters: the cross-over point b for S and π functions, the fuzzifier m in the c-means, etc. Varying these parameters affects the final fuzzy partition, and hence the ultimate crisp segmentation of the scene. Also, the number of classes desired impacts the resultant distributions, since the memberships are required to sum to one for a fuzzy c-partition. In some cases, these problems are not serious. For example, many segmentation problems involve separating an object from its background. Here the number of classes is obviously two. However, in general situations, the choice of these parameters must be carefully considered.
The basic approach which is taken to pick the number of classes and/or the function-shaping parameters is to iteratively vary these parameters and pick the set of values which optimizes some measure of the final fuzzy partition. The optimization criteria can be based on the geometry of the fuzzy subsets of the image or on properties of the clusters in feature space.
For the fuzzy case, let μ be a mapping from X into [0,1], that is, let μ be a fuzzy subset of X. Let P, Q ∈ X. Then the degree of connectedness of P and Q with respect to μ is
c_μ(P, Q) = max_ρ min_{r ∈ ρ} μ(r)
where ρ ranges over all paths from P to Q. The fuzzy set μ is said to be connected if every pair of points P, Q is connected in μ.
This is just the weighted sum of the lengths of the arcs A_ijk along which the i-th and j-th regions, having constant μ values μ_i and μ_j respectively, meet, weighted by the absolute difference of these values.
p(μ) = Σ_m Σ_n |μ(m,n) − μ(m,n+1)| + Σ_m Σ_n |μ(m,n) − μ(m+1,n)|
For crisp sets, the compactness is largest for a disk, where it is equal to
1/4pi. For a fuzzy disk, where mu depends only on the distance from the origin
(center), it can be shown that
    a(mu) / p^2(mu) >= 1/4pi.
In other words, of all possible fuzzy disks, the compactness is smallest for the
crisp version.
The a priori setting of the number of classes is not always possible, espe-
cially in segmentation of natural scenes. In such cases an algorithm called the
Unsupervised Fuzzy Partition-Optimum Number of Clusters (UFP-ONC) algo-
rithm [32] may be used. The UFP-ONC algorithm is derived from a combination
of the fuzzy c-means algorithm and the fuzzy maximum likelihood estimation
(FMLE). It attempts to obtain a satisfactory solution to the problem of large
variability in cluster shapes and densities, and to the problem of unsupervised
tracking of classification prototypes. There are no initial conditions on the
location of cluster centroids, and classification prototypes are identified during
a process of unsupervised learning [32]. The algorithm is essentially the same
as the FCM algorithm described in the previous section, except that the distance
measure defined by
is used instead of the inner product norm. In (3), F_i is the fuzzy covariance
matrix of cluster i, given by
    F_i = Sum_{k=1}^{n} (u_ik)^m (x_k - v_i)(x_k - v_i)^T / Sum_{k=1}^{n} (u_ik)^m        (4)
or the gamma-model) [39]. The innovative aspect of this work is that a backpropagation
algorithm (and convergence theory) was developed so that both the type of
connective at each node and the parameters associated with the connective
can be learned from training data [35, 36].
Figure 3a shows the original intensity image and Figure 3b shows the
segmented and labeled image when the gamma-model was used as the aggregation
operator. The labels in increasing order of grey level are: road, tree, wall, roof,
grass, and sky. The results are excellent, considering the small number of
features used and the simplicity of the network employed. Note that most of the
misclassifications occur at areas where the true label is not any of the six labels
considered. This segmented image was improved by a shrink-and-expand
operator and this image is shown in Figure 3c. An important point here is that
this method not only partitions the image into connected components of similar
properties, but also labels these components. In other words, it produces both
a segmentation and a region recognition simultaneously, while capturing an
abstract model of the decision making process.
The fuzzy integral has also been used to fuse both objective information
from features and (possibly subjective) information on the importance of subsets
of features for segmentation in [5, 37]. This approach will be described in the
section on object and region recognition.
[Figure 3: (a) original intensity image; (b) segmented and labeled image using the gamma-model; (c) result after the shrink-and-expand operation.]
BOUNDARY DETECTION
Boundary detection is another approach to segmentation. In this
approach, an edge operator is first used on the image to detect edge elements.
The edge elements so detected are considered to be part of the boundaries
between various objects or regions in the image. The boundaries are sometimes
described in terms of analytical curves such as straight lines, circles, and other
higher degree curves.
The FCM algorithm can be used to detect (or fit) straight lines to edge
elements. This is achieved by initializing the FCM with c linear prototypes
rather than c centers. Each linear prototype consists of a point (which acts as
cluster center) and a parameter defining the orientation of the cluster. The
fuzzy covariance matrix F_i of each cluster (as defined in (4)) may be used to
define its orientation, since its principal eigenvector gives the direction of
maximum variance of the cluster. The c prototypes are updated in each iteration
as described in the previous section, except that in each iteration the covariance
matrix of each cluster is also updated. Several distance measures may be used
for the detection of lines. One of them is defined by
    d^2(x_k, v_i) = alpha_i D_ik^2 + (1 - alpha_i) d_ik^2        (5)
where D_ik is the distance of the point from the line and d_ik is the Euclidean
distance between x_k and v_i. alpha_i is chosen as 1 - (lambda_1i / lambda_2i), where lambda_1i and lambda_2i are
the smaller and larger eigenvalues of cluster i [41]. We have shown that the
scaled Mahalanobis distance given by
    d^2(x_k, v_i) = |F_i|^{1/2} (x_k - v_i)^T F_i^{-1} (x_k - v_i)        (6)
is also very effective for the detection of lines or linear clusters [42]. In (6) F_i
is the fuzzy covariance matrix of cluster i as defined in (4). As mentioned
earlier, one problem with the FCM is that the number of clusters needs to be
specified. In the line detection case, one way to overcome this is to specify a
relatively high value of c and then merge compatible clusters after the algorithm
converges [42]. Figure 4 shows an example of this method. Figure 4a shows the
original image. This image is equivalent to the threshold output of an edge
operator (such as the Sobel operator) on an intensity image of the characters
UMC. Figure 4b shows the clustering when c was speclled to be 14. Note that
the leading stroke of both the U and the M are split into two subclusters (in
some examples the initial cluster organization is much worse). Figure 4c shows
the clustering after compatible clusters are merged. The final optimal number
of clusters was determined to be 10, which is correct in this case. In this
implementation, two (or more) clusters were considered compatible if i) their
orientation was the same, ii) the line joining their centers had the same orien-
tation as the clusters and iii) the cluster centers were not more than 4 principal
eigenvalues apart. The lines so found by the algorithm can then be used to
describe large sections of the boundary or the linear substructures in the image.
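As a rough illustration of the geometry involved, the sketch below computes, for a single cluster, the orientation from the principal eigenvector of a covariance matrix and the two distance measures of equations (5) and (6). The sample points, the uniform membership weights, and the use of exponent m in the covariance are illustrative assumptions, not the chapter's exact procedure.

import numpy as np

def line_cluster_distances(X, u, v, m=2.0):
    """Distances of points X to a linear prototype centered at v.

    u: membership of each point in the cluster (used to weight the covariance).
    Returns the distances of equations (5) and (6) per point.
    """
    w = u ** m
    diff = X - v                                          # (n, 2)
    F = (w[:, None, None] * np.einsum('ni,nj->nij', diff, diff)).sum(0) / w.sum()  # eq. (4)
    evals, evecs = np.linalg.eigh(F)                      # ascending eigenvalues
    lam1, lam2 = evals[0], evals[1]                       # smaller, larger
    e_principal = evecs[:, 1]                             # direction of maximum variance
    proj = diff @ e_principal
    D = np.sqrt(np.maximum(np.sum(diff**2, 1) - proj**2, 0.0))   # distance from the line
    d_center = np.linalg.norm(diff, axis=1)
    alpha = 1.0 - lam1 / lam2                             # as in equation (5)
    d_eq5 = alpha * D**2 + (1 - alpha) * d_center**2      # equation (5)
    d_eq6 = np.sqrt(np.linalg.det(F)) * np.einsum('ni,ij,nj->n', diff, np.linalg.inv(F), diff)  # eq. (6)
    return d_eq5, d_eq6

# Example: noisy points along a line (illustrative)
rng = np.random.default_rng(0)
t = np.linspace(0, 5, 40)
X = np.c_[t, 0.5 * t] + rng.normal(0, 0.05, (40, 2))
d5, d6 = line_cluster_distances(X, u=np.ones(40), v=X.mean(0))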
[Figure 4: (a) input edge image of the characters UMC; (b) clustering with c = 14; (c) clustering after compatible clusters are merged.]
Suppose X is a finite set, X = {x_1, ..., x_n}, and let g^i = g_lambda({x_i}). Then the set
{g^1, ..., g^n} is called the fuzzy density function for g_lambda.
Using the above definitions one can easily show that g_lambda can be
constructed from a fuzzy density function by
    g_lambda(A) = ( Prod_{x_i in A} (1 + lambda g^i) - 1 ) / lambda
for any subset A of X. Using the fact that X = Union_i {x_i} and g_lambda(X) = 1, lambda can be determined
from the above equation.
If X = {x_1, ..., x_n} is a finite set, arranged so that h(x_1) >= h(x_2) >= ... >=
h(x_n), then
    Int_X h(x) o g_lambda = max_{i=1}^{n} [ h(x_i) AND g_lambda(X_i) ]
where X_i = {x_1, ..., x_i}. Also, given lambda as calculated above, the values g_lambda(X_i) can
be determined recursively from the definitions [46]. The fuzzy integral is
interpreted as an evaluation of object classes where the subjectivity is embedded
in the fuzzy measure. In comparison with probability theory, the fuzzy integral
corresponds to the concept of expectation. In general, fuzzy integrals are
nonlinear functionals (although monotone) whereas ordinary (e.g., Lebesgue)
integrals are linear functionals.
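A small sketch, under assumed fuzzy densities and partial-evaluation values, of how the lambda-measure and the fuzzy integral above can be computed. The bisection/Brent solve for lambda, the standard recursion g_lambda(X_i) = g^i + g_lambda(X_{i-1}) + lambda g^i g_lambda(X_{i-1}) (only alluded to in the text), and the example numbers are illustrative, not taken from the chapter.

import numpy as np
from scipy.optimize import brentq

def solve_lambda(g):
    """Find lambda > -1 with 1 + lambda = prod(1 + lambda * g_i), lambda != 0."""
    f = lambda lam: np.prod(1.0 + lam * np.asarray(g)) - (1.0 + lam)
    if np.isclose(sum(g), 1.0):
        return 0.0
    # lambda lies in (-1, 0) when the densities sum above 1, in (0, inf) otherwise
    return brentq(f, -0.9999, -1e-9) if sum(g) > 1 else brentq(f, 1e-9, 1e9)

def sugeno_integral(h, g):
    """Fuzzy integral of evaluations h with respect to the lambda-measure built from densities g."""
    lam = solve_lambda(g)
    order = np.argsort(h)[::-1]                # h(x_1) >= h(x_2) >= ...
    h_sorted, g_sorted = np.asarray(h)[order], np.asarray(g)[order]
    G, best = 0.0, 0.0
    for hi, gi in zip(h_sorted, g_sorted):
        G = gi + G + lam * gi * G              # recursive g_lambda(X_i)
        best = max(best, min(hi, G))           # max_i [h(x_i) AND g_lambda(X_i)]
    return best

# Example: three information sources (assumed values)
h = [0.8, 0.4, 0.9]        # partial evaluations h(x_i)
g = [0.3, 0.4, 0.2]        # fuzzy densities g^i
print(sugeno_integral(h, g))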
TABLE 1.
P_1: If X is A Then Y is B
P_2: X is A'
P T : X -> [0,1].
For example, we can define fuzzy truth value restrictions true, very true, false,
unknown, absolutely true, absolutely false, etc.
CONCLUSIONS
The use of fuzzy set theory is growing in computer vision as it is in all
intelligent processing. The representation capability is flexible and intuitively
pleasing, the combination schemes are mathematically justifiable and can be
tailored to the particular problem at hand from low level aggregation to high
level inferencing, and the results of the algorithms are excellent, producing not
only crisp decisions when necessary, but also corresponding degrees of support.
REFERENCES
2. S.K. Pal and R.A. King, "Image enhancement using smoothing with fuzzy sets," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-11, 1981, pp. 494-501.
3. S.K. Pal and R.A. King, "Histogram equalization with S and pi functions in detecting x-ray edges," Electronics Letters, vol. 17, 1981, pp. 302-304.
4. S.K. Pal and R.A. King, "On edge detection of x-ray images using fuzzy sets," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-5, 1983, pp. 69-77.
8. Y. Nakagawa and A. Rosenfeld, "A note on the use of local min and max operators in digital picture processing," IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-8, 1978, pp. 632-635.
9. M.M. Gupta, G.K. Knopf, and P.N. Nikiforuk, "Edge Perception Using Fuzzy Logic," in Fuzzy Computing: Theory, Hardware and Applications, North Holland, 1988.
11. S.K. Pal and A. Rosenfeld, "Image enhancement and thresholding by optimization of fuzzy compactness," Pattern Recognition Letters, vol. 7, 1988, pp. 77-86.
15. J.C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, 1974, pp. 32-57.
23. A. Rosenfeld, "Fuzzy digital topology," Information and Control, vol. 40, 1979, pp. 76-87.
26. D. Dubois and M.C. Jaulent, "Shape understanding via fuzzy models," 2nd IFAC/IFIP/IFORS/IEA Conference on Analysis, Design and Evaluation of Man-Machine Systems, 1985, pp. 302-307.
27. D. Dubois and M.C. Jaulent, "A general approach to parameter evaluation in fuzzy digital pictures," Pattern Recognition Letters, to appear.
28. S. Peleg and A. Rosenfeld, "A min-max medial axis transformation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-3, 1981, pp. 208-210.
30. A. Rosenfeld and A.C. Kak, Digital Picture Processing, Vol. 2, Academic Press, New York, 1982.
32. I. Gath and A.B. Geva, "Unsupervised Optimal Fuzzy Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-11, no. 7, pp. 773-781, July 1989.
33. J. Keller and Y. Seo, "Local fractal geometric features for image segmentation," International Journal of Imaging Systems and Technology, to appear, 1990.
38. H. Tahani and J. Keller, "Information fusion in computer vision using the fuzzy integral," IEEE Transactions on Systems, Man and Cybernetics, vol. 20, no. 3, 1990, pp. 733-741.
42. C.-P. Freg, "Algorithms to detect linear and planar clusters and their applications," MS Project Report, University of Missouri-Columbia, May 1990.
46. M. Sugeno, "Fuzzy measures and fuzzy integrals: A survey," in Fuzzy Automata and Decision Processes, North Holland, Amsterdam, 1977, pp. 89-102.
51. J. Keller, H. Shah, and F. Wong, "Fuzzy computations in risk and decision analysis," Civil Engineering Systems, vol. 2, 1985, pp. 201-208.
52. J. Keller, M. Gray, and J. Givens, "A fuzzy k-nearest neighbor algorithm," IEEE Transactions on Systems, Man, and Cybernetics, vol. 15, 1985, pp. 580-585.
54. A. Nafarieh and J. Keller, "A fuzzy logic rule-based automatic target recognizer," International Journal of Intelligent Systems, to appear, 1990.
56. J. Keller and H. Tahani, "Backpropagation neural networks for fuzzy logic," Information Sciences, to appear, 1990.
Sankar K. Pal *
Software Technology Branch/PT4
National Aeronautics and Space Administration
Lyndon B. Johnson Space Center
Houston, Texas 77058, U.S.A.
INTRODUCTION
An application of the theory of fuzzy subsets to image processing and scene
analysis problems has been described here. The problems considered are
(pre)processing of 2-dimensional image pattern, extraction of primitives, and
recognition and interpretation of image.
A gray tone picture possesses some ambiguity within the pixels due to the
possible multi valued levels of brightness. The incertitude in an image may arise
from grayness ambiguity or spatial (geometrical) ambiguity or both. Grayness
ambiguity means "indefiniteness" in deciding a pixel as white or black. Spatial
ambiguity refers to "indefiniteness" in shape and geometry of a region e.g., where is
the boundary or edge of a region? or is this contour "sharp"?
When the regions in an image are ill-defined (fuzzy), it is natural and also
appropriate to avoid committing ourselves to a specific (hard) decision, e.g.,
segmentation/thresholding and skeletonization, by allowing the segments, skeletons
or contours to be fuzzy subsets of the image. Similarly, for describing and
interpreting ill-defined structural information in a pattern (when the pattern
indeterminacy is due to inherent vagueness rather than randomness), it is natural to
define primitives and relations among them using labels of fuzzy sets. For example,
primitives may be defined in terms of arcs with varying grades of membership from 0
to 1, and the production rules of a grammar may be fuzzified to account for the fuzziness
in the physical relations among the primitives, thereby increasing the generative power of
the grammar.
The first part of the article consists of a definition of an image in the light of
fuzzy set theory, and various information measures (arising from fuzziness) and tools
relevant for processing e.g., fuzzy geometrical properties, correlation, bound
functions and entropy measures. The second part provides formulation of various
algorithms along with management of uncertainties (ambiguities) for image
enhancement, edge detection, skeletonization, filtering, segmentation and object
extraction. Ambiguity in evaluation and assessment of membership function has
* Dr. Pal is on leave from the post of Professor in the Electronics and
Communication Sciences Unit, Indian Statistical Institute, Calcutta 700035,
India
also been described here. The third part describes the way of extracting various fuzzy
primitives in order to describe the contours of different object regions of an image.
Finally, fuzzy grammars are used to demonstrate how syntactic algorithms can be
formulated for identifying different region structures/classes of patterns. The above
features have been illustrated through examples and various image data.
IMAGE DEFINITION
An image X of size MxN and L levels can be considered as an array of fuzzy
singletons, each having a value of membership denoting its degree of brightness
relative to some brightness level l, l = 0, 1, 2, ..., L - 1. In the notation of fuzzy
sets, we may therefore write
    X = {mu_X(x_mn) = mu_mn/x_mn; m = 1, 2, ..., M; n = 1, 2, ..., N}        (1)
or
    X = Union_m Union_n mu_mn/x_mn,  m = 1, 2, ..., M; n = 1, 2, ..., N,
where mu_mn denotes the grade of possessing some property mu_mn (e.g., brightness, edginess,
smoothness) by the (m,n)th pixel intensity x_mn. In other words, a fuzzy subset of
an image X is a mapping mu from X into [0, 1]. For any point p in X, mu(p) is
called the degree of membership of p in mu.
One may use either global or local information of an image in defming a
membership function characterizing some property. For example, brightness or
darkness property can be defined only in terms of gray value of a pixel xmn whereas,
edginess, smoothness or textural properties need the neighborhood information of a pixel
to define their membership functions. Similarly, positional or co-ordinate
information is necessary, in addition to gray level and neighborhood information, to
characterize a dynamic property of an image.
Again, the aforesaid information can be used in a number of ways (in their
various functional forms), depending on an individual's opinion and/or the problem
at hand, to define a requisite membership function for an image property.
Index of Fuzziness
    gamma(X) = (2/MN) Sum_m Sum_n |mu_mn - mu-bar_mn|,  m = 1, 2, ..., M; n = 1, 2, ..., N        (2)
Entropy
    H(X) = (1/(MN ln 2)) Sum_m Sum_n S_n(mu_mn),  m = 1, 2, ..., M; n = 1, 2, ..., N        (4)
where S_n(mu_mn) = -mu_mn ln mu_mn - (1 - mu_mn) ln(1 - mu_mn) is Shannon's function.
mu_mn denotes the degree of possessing some property mu by the (m, n)th pixel
x_mn, and mu-bar_mn denotes the nearest two-tone version of mu_mn.
(5)
where s_i^r, i = 1, 2, ..., k, denote the sequences of r pixels, and mu(s_i^r) denotes the degree
to which the combination s_i^r, as a whole, possesses the property mu.
Hybrid Entropy
    H_hy(X) = -P_w log E_w - P_b log E_b        (6)
Correlation
    C(mu_1, mu_2) = 1 - 4 Sum_m Sum_n (mu_1mn - mu_2mn)^2 / (X_1 + X_2)        (7)
with
    X_i = Sum_m Sum_n {2 mu_imn - 1}^2,  i = 1, 2;  m = 1, 2, ..., M; n = 1, 2, ..., N.
C(mu_1, mu_2) denotes the correlation between two properties mu_1 and mu_2 (defined over
the same domain). mu_1mn and mu_2mn denote the degree of possessing the properties mu_1
and mu_2 respectively by the (m, n)th pixel.
These expressions (equations 2-7) are versions extended to the two-dimensional
image plane of those defined for a fuzzy set. For example, the index of fuzziness was
defined by Kaufmann [1], entropy by DeLuca and Termini [2], rth order entropy and
hybrid entropy by Pal and Pal [3], and correlation by Murthy, Pal and Dutta
Majumdar [4].
The index of fuzziness reflects the ambiguity present in an image by measuring the
distance between its fuzzy property plane and the nearest ordinary plane. The term
"entropy", on the other hand, uses Shannon's function in the property plane, but its
meaning is quite different from that of classical entropy because no probabilistic
concept is needed to define it. H^r(X) gives a measure of the average amount of
difficulty in taking a decision on any subset of size r with respect to an image
property. If r = 1, H^r(X) reduces to the (unnormalized) H(X) of equation (4). H_hy(X)
represents an amount of difficulty in deciding whether a pixel possesses certain
properties or not by making a prevision on its probability of occurrence. In absence
of fuzziness (i.e.,with proper defuzzification), Hhy reduces to two state classical
entropy of Shannon, the states being black and white. Since a fuzzy set is a
generalized version of an ordinary set, the entropy of a fuzzy set deserves to be a
generalized version of classical entropy by taking into account not only the fuzziness
of the set but also the underlying probability structure. In that respect, Hhy can be
regarded as a generalized entropy such that classical entropy becomes its special case
when fuzziness is properly removed.
All these terms, which give an idea of 'indefiniteness' or fuzziness of an image
may be regarded as the measures of average intrinsic information which is received
when one has to make a decision (as in pattern analysis) in order to classify the
ensembles of patterns described by a fuzzy set.
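As a quick illustration of equations (2) and (4), the sketch below evaluates the linear index of fuzziness and the entropy of a membership plane; the example plane is an assumption, and S_n is taken as the Shannon function noted above.

import numpy as np

def index_of_fuzziness(mu):
    """Linear index of fuzziness, equation (2): distance to the nearest two-tone plane."""
    M, N = mu.shape
    return 2.0 / (M * N) * np.minimum(mu, 1.0 - mu).sum()

def fuzzy_entropy(mu):
    """Entropy of the membership plane, equation (4), with Shannon's function S_n."""
    M, N = mu.shape
    p = np.clip(mu, 1e-12, 1 - 1e-12)                    # avoid log(0)
    S = -p * np.log(p) - (1 - p) * np.log(1 - p)
    return S.sum() / (M * N * np.log(2.0))

mu = np.array([[0.1, 0.5, 0.9],
               [0.2, 0.5, 0.8]])      # illustrative membership plane
print(index_of_fuzziness(mu), fuzzy_entropy(mu))

Both measures reach 1 when every mu_mn equals 0.5 and 0 when every mu_mn is 0 or 1, consistent with the normalization properties below.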
gamma(X) and H(X) are normalized in the interval [0, 1] such that
Pr 1: gamma_min = H_min = 0 for mu_mn = 0 or 1 for all (m, n)        (8a)
Pr 2: gamma_max = H_max = 1 for mu_mn = 0.5 for all (m, n)        (8b)
Pr 3: gamma(X*) <= gamma(X) and H(X*) <= H(X), where X* is a sharpened (contrast-enhanced) version of X        (8c)
Figure 1 Variation of fuzziness with mu.
According to property 8(c), these parameters decrease with contrast enhancement
of an image. Now through processing, if we can partially remove the uncertainty on
the grey levels of X, we say that we have obtained an average amount of information
given by delta-gamma = gamma(X) - gamma(X*) or delta-H = H(X) - H(X*) by taking a decision bright or
dark on the pixels of X. The criteria gamma(X*) <= gamma(X) and H(X*) <= H(X), in order to
have positive delta-gamma and delta-H values, follow from Eq. (8c). If the uncertainty is
completely removed, then gamma(X*) = H(X*) = 0. In other words, gamma(X) and H(X) can
be regarded as measures of the average amount of information (about the grey levels
of pixels) which has been lost by transforming the classical (two-tone) pattern into a
fuzzy pattern X.
It is to be noted that gamma(X) or H(X) reduces to zero as long as mu_mn is made 0 or 1
for all (m, n), no matter whether the resulting defuzzification (or transforming
process) is correct or not. In the following discussion it will be clear how H_hy
takes care of this situation.
H^r(X) has the following properties:
Pr 1: H^r attains a maximum if mu_i = 0.5 for all i.
Pr 2: H^r attains a minimum if mu_i = 0 or 1 for all i.
H_hy(X) has the following properties. In the absence of fuzziness, when MNP_b
pixels become completely black (mu_mn = 0) and MNP_w pixels become completely
white (mu_mn = 1), H_hy reduces to the two-state classical entropy of Shannon,
the states being black and white. Thus, H_hy reduces to H_c only when a proper
defuzzification is made. When mu_mn = 0.5 for all (m, n),
H_hy takes a constant value and becomes independent of P_w and P_b. This is
logical in the sense that the machine is unable to take a decision on the pixels since all
mu_mn values are 0.5.
Let us consider an example of a digital image in which, say, 70% pixels look
white, while the remaining 30% look dark. Thus the probability of a white pixel Pw
is 0.7 and that of a dark pixel Pb is 0.3. Suppose, the whiteness of the pixels is not
constant, i.e., there is a variation (grayness) and similar is the case with the black
pixels.
Let us now consider the effect of improper defuzzification on the pattern shown
in case 1 of Table 2. Two types of defuzzification are considered here. In cases
2-4 all the symbols with mu = 0.5 are transformed to zero, when some of them were
actually generated from the symbol '1'. In cases 5-6 of Table 2 some of the mu values
greater than 0.5 which were generated from the symbol 1 (or belong to the white portion
of the image) are wrongly defuzzified and brought down towards zero (instead of 1).
In both situations, it is to be noted that |H - H_hy| does not reduce to zero. Case
7, on the other hand, has all its elements properly defuzzified. As a result, E_1 and E_0
IMAGE GEOMETRY
The various geometrical properties of a fuzzy image subset (characterized by
mu_X(x_mn), or simply by mu) as defined by Rosenfeld [5,6] and Pal and Ghosh [7] are
given below with illustration. These provide measures of ambiguity in the geometry
(spatial domain) of an image.
    comp(mu) = a(mu) / p^2(mu)        (16)
Physically, compactness means the fraction of the maximum area (that can be encircled
by the perimeter) actually occupied by the object. In the nonfuzzy case the value of
compactness is maximum for a circle and is equal to pi/4. In the case of a fuzzy disc,
where the membership value depends only on the distance from the center, this
compactness value is >= pi/4 [6]. Of all possible fuzzy discs compactness is
therefore minimum for its crisp version.
For the fuzzy subset mu of example 1, comp(mu) = 4.1/(2.3*2.3) = 0.775.
D. Height and Width The height of a fuzzy set mu is defined as [5]
    h(mu) = Int ( max_m mu_mn ) dn        (17)
where the integration is taken over a region outside which mu_mn = 0.
Similarly the width of the fuzzy set is defined by
    w(mu) = Int ( max_n mu_mn ) dm        (18)
with the same condition over the integration as above. For digital pictures m and n can
take only discrete values, and since mu = 0 outside the bounded region, the max
operators are taken over a finite set. In this case the definitions take the form
    h(mu) = Sum_n max_m mu_mn        (19)
    w(mu) = Sum_m max_n mu_mn        (20)
E. Length and Breadth The length of a fuzzy set mu is defined as [7]
    l(mu) = max_n ( Int mu_mn dm )        (21)
where the integration is taken over the region outside which mu_mn = 0. In the case of a
digital picture, where m and n can take only discrete values, the expression takes the
form
    l(mu) = max_n ( Sum_m mu_mn )        (22)
Physically speaking, the length of an image fuzzy subset gives its longest expansion
in the column direction. If mu is crisp, mu_mn = 0 or 1; in this case length is the
maximum number of pixels in a column. Comparing equation (22) with (19) we
notice that the length is different from height in the sense that the former takes the
summation of the entries in a column first and then maximizes over different
columns, whereas the latter maximizes the entries in a column and then sums over
different columns.
The breadth of a fuzzy set mu is defined as
    b(mu) = max_m ( Int mu_mn dn )        (23)
where the integration is taken over the region outside which mu_mn = 0. In the case of a
digital picture the expression takes the form
    b(mu) = max_m ( Sum_n mu_mn )        (24)
Physically speaking, the breadth of an image fuzzy subset gives its longest
expansion in the row direction. If mu is crisp, mu_mn = 0 or 1; in this case breadth is
the maximum number of pixels in a row. The difference between width and breadth
is the same as that between height and length.
For the fuzzy subset mu in example 1, the length is l(mu) = 0.4 + 0.7 + 0.5 = 1.6 and
the breadth is b(mu) = 0.6 + 0.5 + 0.6 = 1.7.
F. Index of Area Coverage (IOAC) The index of area coverage of a fuzzy set may be
defined as [7]
    IOAC(mu) = area(mu) / [ l(mu) * b(mu) ]        (25)
In the nonfuzzy case, the IOAC has a value of 1 for a rectangle (placed along the axes of
measurement). For a circle this value is pi r^2 / (2r * 2r) = pi/4. Physically, by
IOAC of a fuzzy image we mean the fraction (which may be improper also) of the
maximum area (that can be covered by the length and breadth of the image) actually
covered by the image.
For the fuzzy subset J.1 of example I, the maximum area that can be covered by
its length and breadth is 1.6*1.7 = 2.72 whereas, the actual area is 4.1, so the IOAC
= 4.1/2.72 = 1.51.
It is to be noted that  l(X)/h(X) <= 1        (26)
                        b(X)/w(X) <= 1        (27)
When equality holds for (26) or (27) the object is either vertically or horizontally
oriented.
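To make the digital forms of these definitions concrete, here is a small NumPy sketch that computes area, perimeter, compactness, height, width, length, breadth, and IOAC for a membership plane; the 3x3 example array is an illustrative assumption, not Example 1 from the text, and the object is assumed not to touch the image border.

import numpy as np

def fuzzy_geometry(mu):
    """Digital fuzzy geometry of a membership plane mu (values in [0, 1])."""
    area = mu.sum()                                            # a(mu) = sum of memberships
    perim = np.abs(np.diff(mu, axis=1)).sum() + np.abs(np.diff(mu, axis=0)).sum()  # p(mu)
    comp = area / perim**2 if perim > 0 else np.inf            # equation (16)
    height = mu.max(axis=0).sum()                              # equation (19)
    width = mu.max(axis=1).sum()                               # equation (20)
    length = mu.sum(axis=0).max()                              # equation (22): max column sum
    breadth = mu.sum(axis=1).max()                             # equation (24): max row sum
    ioac = area / (length * breadth)                           # equation (25)
    return dict(area=area, perimeter=perim, compactness=comp,
                height=height, width=width, length=length, breadth=breadth, IOAC=ioac)

# Illustrative 3x3 membership plane (assumed values)
mu = np.array([[0.1, 0.6, 0.2],
               [0.4, 0.7, 0.5],
               [0.2, 0.6, 0.3]])
print(fuzzy_geometry(mu))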
G. Degree of Adjacency The degree to which two regions S and T of an image are
adjacent is defined as
    a(S, T) = Sum_{p in BP(S)} [ 1 / (1 + |mu(p) - tau(q)|) ] * [ 1 / (1 + d(p)) ]        (28)
Here d(p) is the shortest distance between p and q, q is a border pixel (BP) of T and p
is a border pixel of S. The other symbols have the same meaning as in the
previous discussion.
The degree of adjacency of two regions is maximum (= 1) only when they are
physically adjacent, i.e., d(p) = 0, and their membership values are also equal, i.e., mu(p) =
tau(q). If two regions are physically adjacent then their degree of adjacency is
determined only by the difference of their membership values. Similarly, if the
membership values of two regions are equal, their degree of adjacency is determined by
their physical distance only.
IMAGE PROCESSING OPERATIONS
In this section we will be explaining how the various grayness and geometrical
ambiguity measures can be used for image enhancement, segmentation, edge
detection and skeleton extraction problems. The algorithms which will be described
here provide both fuzzy and non fuzzy (as a special case) outputs.
Figure 2 Standard S function for an L-level image.
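Since Zadeh's standard S function (equation (29), defined earlier in the chapter) is used repeatedly below, a short sketch of it and of the resulting 'bright image' membership plane may help; the band limits, the cross-over point b, the window length w and the test image are illustrative assumptions.

import numpy as np

def standard_S(x, a, b, c):
    """Zadeh's standard S function with cross-over point b = (a + c) / 2."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    up = (x > a) & (x <= b)
    down = (x > b) & (x < c)
    y[up] = 2.0 * ((x[up] - a) / (c - a)) ** 2
    y[down] = 1.0 - 2.0 * ((x[down] - c) / (c - a)) ** 2
    y[x >= c] = 1.0
    return y

# "Bright image" membership plane of an L-level image (illustrative values)
L = 32
X = np.random.default_rng(0).integers(0, L, size=(64, 64))
b = 16                       # assumed cross-over point (threshold candidate)
w = 8                        # window length w = 2 * Delta_b
mu = standard_S(X, b - w / 2, b, b + w / 2)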
For a particular cross-over point, say, b = l_c, we have mu_X(l_c) = 0.5 and the mu_mn
plane would contain values > 0.5 or < 0.5 corresponding to x_mn > l_c or < l_c. The
terms gamma(X) and H(X) then measure the average ambiguity in X by computing
mu_{X AND X-bar}(x_mn) or S_n(mu_X(x_mn)), which is 0 if mu_X(x_mn) = 0 or 1 and is maximum for
mu_X(x_mn) = 0.5.
The selection of a cross-over point at b = l_c implies the allocation of grey levels
< l_c and > l_c to the two clusters, namely, background and object of a bimodal
image. The contribution of the levels towards gamma(X) and H(X) is mostly from those
around l_c and would decrease as we move away from l_c. Again, since the nearest
ordinary plane X-bar (which gives the two-tone version of X) is dependent on the
position of the cross-over point, a proper selection of b may therefore be obtained which
will result in appropriate segmentation of object and background. In other words, if
the grey level of image X has a bimodal distribution, then the above criteria for different
values of b would result in a minimum gamma or H value only when b corresponds to the
appropriate boundary between the two clusters.
For such a position of the threshold (cross-over point), there will be a minimum
number of pixel intensities in X having mu_mn close to 0.5 (resulting in gamma or H close to 1) and
a maximum number of pixel intensities having mu_mn close to 0 or 1 (resulting in gamma or H close to
0), thus contributing least towards gamma(X) or H(X). This optimum (minimum) value
would be greater for any other selection of the cross-over point.
This suggests that modification of the cross-over point will result in variation of
the parameters gamma(X) and H(X), and so an optimum threshold may be estimated for
automatic histogram-thresholding problems without the need to refer directly to the
histogram of X. The above concept can also be extended to an image having a
multimodal distribution of grey levels, in which case one would have several minima in the gamma
and H values corresponding to different threshold points in the histogram.
Let us now consider the geometrical parameters comp(X) and IOAC(X)
(equations 16 and 25). It has been noticed that for crisp sets the value of index of
area coverage (IOAC) is maximum for a rectangle. Again, of all possible fuzzy
rectangles IOAC is minimum for its crisp version. Similarly, in a nonfuzzy case the
compactness is maximum for a circle and of all possible fuzzy discs compactness is
minimum for its crisp version (6). For this reason, we will use minimization (rather
than maximization) of fuzzy compactness/lOAC as a criterion for image
segmentation (9).
Suppose we use equation (29) for obtaining the 'bright image' )..L(X) of an image
X. Then for a particular cross over point of S function, compacUless (J..L) and
IOAC{J..L) reflect the average amount of ambiguity in the geometry (i.e., in spatial
domain) of X. Therefore, modification of the cross over point will result in different
)..L(X) planes (and hence different segmented versions), with varying amount of
compactness or IOAC denoting fuzziness in the spatial domain. The )..L(X) plane
having minimum IOAC or compactness value can be regarded as an optimum fuzzy
segmented version of X.
For obtaining the nonfuzzy threshold one may take the cross over point (which
is considered to be the maximum ambiguous level) as the threshold between object
and background. For images having multiple regions, one would have a set of such
optimum mu(X) planes. The algorithm developed using these criteria is given below.
Algorithm 1
From Algorithm 1 it appears that one needs to scan an L-level image L times
(corresponding to the L cross-over points of the membership function) for computing the
parameters for detecting its threshold. The time of computation can be reduced
significantly by scanning it only once for computing its co-occurrence matrix, row
histogram and column histogram, and by computing mu(l), l = 1, 2, ..., L each time
from the membership function of a particular cross-over point.
The computations of gamma(X) (or H(X)), a(X), p(X), l(X) and b(X) can be made
faster in the following way. Let h(i), i = 1, 2, ..., L be the number of occurrences of the
level i, c[i, j], i = 1, 2, ..., L, j = 1, 2, ..., L the co-occurrence matrix, and mu(i), i = 1, 2, ...,
L the membership vector for a fixed cross-over point of an L-level image X.
Determine gamma(X), area and perimeter as
    gamma(X) = (2/MN) Sum_{i=1}^{L} T(i) h(i)        (30a)
    T(i) = min{ mu(i), 1 - mu(i) }        (30b)
    a(X) = Sum_{i=1}^{L} h(i) mu(i)        (31)
    p(X) = Sum_{i=1}^{L} Sum_{j=1}^{L} c[i, j] |mu(i) - mu(j)|        (32)
For calculating length and breadth the following steps can be used. Compute the
row histogram R[m, l], m = 1, ..., M, l = 1, ..., L, where R[m, l] represents the
number of occurrences of the gray level l in the mth row of the image. Find the
column histogram C[n, l], n = 1, ..., N, l = 1, ..., L, where C[n, l] represents the
number of occurrences of the gray level l in the nth column of the image. Calculate
length and breadth as
    l(X) = max_n Sum_{l=1}^{L} C[n, l] mu(l)        (33)
    b(X) = max_m Sum_{l=1}^{L} R[m, l] mu(l)        (34)
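A sketch of this faster computation, under the assumptions that grey levels run from 0 to L-1, that the co-occurrence matrix counts each adjacent pair once, and that the 2/MN normalization of (30a) follows the linear index of fuzziness; the helper names are illustrative.

import numpy as np

def histogram_and_cooccurrence(X, L):
    """Grey-level histogram h(i) and (horizontal + vertical) co-occurrence matrix c[i, j]."""
    h = np.bincount(X.ravel(), minlength=L)
    c = np.zeros((L, L))
    np.add.at(c, (X[:, :-1], X[:, 1:]), 1)        # horizontal neighbours
    np.add.at(c, (X[:-1, :], X[1:, :]), 1)        # vertical neighbours
    return h, c

def measures_for_crossover(X, L, mu_of_level):
    """gamma(X), a(X), p(X), l(X), b(X) from equations (30)-(34) for one membership vector."""
    M, N = X.shape
    h, c = histogram_and_cooccurrence(X, L)
    R = np.stack([np.bincount(row, minlength=L) for row in X])       # row histogram R[m, l]
    C = np.stack([np.bincount(col, minlength=L) for col in X.T])     # column histogram C[n, l]
    T = np.minimum(mu_of_level, 1 - mu_of_level)
    gamma = 2.0 / (M * N) * (T * h).sum()                            # (30a)-(30b)
    area = (h * mu_of_level).sum()                                   # (31)
    perim = (c * np.abs(mu_of_level[:, None] - mu_of_level[None, :])).sum()  # (32)
    length = (C * mu_of_level).sum(axis=1).max()                     # (33)
    breadth = (R * mu_of_level).sum(axis=1).max()                    # (34)
    return gamma, area, perim, length, breadth

X = np.random.default_rng(0).integers(0, 8, (32, 32))   # illustrative 8-level image
mu_vec = np.clip(np.arange(8) / 7.0, 0, 1)              # illustrative membership vector
print(measures_for_crossover(X, 8, mu_vec))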
Some Remarks
The grayness ambiguity measure, e.g., gamma(X) or H(X), basically sharpens the
histogram of X using its global information only, and it detects a single threshold in
its valley region. Therefore, if the histogram does not have a valley, the above
measures will not be able to select a threshold for partitioning the histogram. This
can readily be seen from Equation (30), which shows that the minima of the gamma(X)
measure will only correspond to those regions of gray level which have minimum
occurrences (i.e., valley regions). comp(X) or IOAC(X), on the other hand, use
local information to determine the fuzziness in the spatial domain of an image. As a
result, these are expected to result in better segmentation by detecting thresholds even in
the absence of a valley in the histogram.
Again, the comp(X) measure attempts to make a circular approximation of the
object region for its extraction, whereas the IOAC(X) goes by the rectangular
approximation. Their suitability to an image should therefore be guided by this
criterion.
Choice of Membership Function
In the aforesaid algorithm w = 2*Delta_b is the length of the interval which is shifted
over the entire dynamic range of the gray scale. As w decreases, the mu(x_mn) plane would
have more intensified contrast around the cross-over point, resulting in a decrease of
ambiguity in X. As a result, the possibility of detecting some undesirable thresholds
(spurious minima) increases because of the smaller value of Delta_b. On the other hand,
an increase of w results in a higher value of fuzziness and thus leads towards the
possibility of losing some of the weak minima.
The criteria regarding the selection of the membership function and the length of
the window (i.e., w) have been reported recently by Murthy and Pal [10], assuming
continuous functions for both the histogram and the membership function. For a fuzzy set
"bright image plane", the membership function mu: [0, w] -> [0, 1] should be such
that
i) mu is continuous, mu(0) = 0, mu(w) = 1,
ii) mu is monotonically non-decreasing, and
iii) mu(x) = 1 - mu(w - x) for all x in [0, w], where w > 0 is the length of the window.
Furthermore, mu should satisfy the bound criteria derived on the basis of the correlation
measure (equation 7). The main properties on which correlation was formulated are
P1: If for higher values of mu_1, mu_2 takes higher values, and for lower values
of mu_1, mu_2 also takes lower values, then C(mu_1, mu_2) > 0.
P2: If mu_1 increases and mu_2 increases then C(mu_1, mu_2) > 0.
P3: If mu_1 increases and mu_2 decreases then C(mu_1, mu_2) < 0.
It is to be mentioned that P2 and P3 should not be considered in isolation of P1.
Had this been the case, one could cite several examples where mu_1 and mu_2 both increase
but C(mu_1, mu_2) < 0, or mu_1 increases and mu_2 decreases but C(mu_1, mu_2) > 0. Subsequently, the
types of membership functions which should not be considered in fuzzy set theory are
categorized with the help of correlation. Bound functions h_1 and h_2 are accordingly
derived [11]. They are
    h_1(x) = 0,          0 <= x <= epsilon
           = x - epsilon,  epsilon <= x <= 1        (35)
    h_2(x) = x + epsilon,  0 <= x <= 1 - epsilon
           = 1,          1 - epsilon <= x <= 1        (36)
where epsilon = 0.25. The bounds for the membership function mu are such that
h_1(x) <= mu(x) <= h_2(x) for x in [0, 1].
For x belonging to any arbitrary interval, the bound functions will be changed
proportionately. For h_1 <= mu <= h_2, C(h_1, h_2) >= 0, C(h_1, mu) >= 0 and C(h_2, mu) >= 0.
The function mu lying in between h_1 and h_2 does not have most of its variation
concentrated (i) in a very small interval, (ii) towards one of the end points of the
interval under consideration, or (iii) towards both the end points of the interval under
consideration.
Figure 3 shows such bound functions. It is to be noted that Zadeh's standard S
function (equation 29) satisfies these bounds.
It has been shown [10] that for detecting a minimum in the valley region of a
histogram, the window length w of the Jl function should be less than the distance
between two peaks around that valley region.
Figure 3 Bound functions for mu(x).
Hr as an Objective Criterion
Let us now explain another way of extracting object by minimizing higher order
fuzzy entropy (equation 5) of both object and background regions. Before explaining
the algorithm, let us describe the membership function and its selection procedure.
Let s be an assumed threshold which partitions the image X into two parts,
namely, object and background. Suppose the gray level ranges [1, s] and [s + 1, L]
denote, respectively, the object and background of the image X. An inverse pi-type
function as shown by the solid line in Figure 4 is used here to obtain the mu_mn values
of X. The inverse pi-type function is seen (from Fig. 4) to be generated by taking the
union of S(x; (s - (L - s)), s, L) and 1 - S(x; 1, s, (s + s - 1)), where S denotes the
standard S function defined by Zadeh (equation 29).
The resulting function, as shown by the solid line, makes mu lie in [0.5, 1]. Since
the ambiguity (difficulty) in deciding whether a level is a member of the object or the
background is maximum for the boundary level s, it has been assigned a membership
value of 0.5 (i.e., the cross-over point). Ambiguity decreases (i.e., the degree of
belongingness to either object or background increases) as the gray value moves away
from s on either side. The mu_mn thus obtained denotes the degree of belongingness of
a pixel x_mn to either object or background.
Since s is not necessarily the mid point of the entire gray scale, the membership
function (solid line of Fig. 4) may not be a symmetric one. It is further to be noted
that one may use any linear or nonlinear equation (instead of Zadeh's standard S
function) to represent the membership function in Fig. 4. Unlike Algorithm 1,
the membership function does not need any parameter selection to control the output.
Algorithm 2
Assume a threshold s, 1 <= s <= L, and execute the following steps.
Step 1: Apply an inverse pi-type function [Fig. 4] to get the fuzzy mu_mn plane,
Figure 4 Inverse pi-type function (solid line) used to obtain the mu_mn plane.
Step 2: Compute the rth order fuzzy entropy of the object, H_O^r, and of the background,
H_B^r, considering only the spatially adjacent sequences of pixels present within the
object and background respectively. Use the 'min' operator to get the membership
value of a sequence of pixels.
Step 3: Compute the total rth order fuzzy entropy of the partitioned image as
H_T^r = H_O^r + H_B^r.
Step 4: Minimize H_T^r with respect to s to get the threshold for object/background
classification.
Referring back to Table 1, we have seen that H^2 reflects the homogeneity
among the supports in a set better than H^1 does. The higher the value of r, the
stronger is the validity of this fact. Thus, considering the problem of object/background
classification, H^r seems to be more sensitive (as r increases) to the
selection of the appropriate threshold; i.e., an improper selection of the threshold is
more strongly reflected by H^r than by H^{r-1}. For example, the thresholds obtained by the H^2
measure have more validity than those obtained by H^1 (which only takes into account the
histogram information). Similar arguments hold good for even higher order (r > 2)
entropy.
Example 2
Figures 5 and 6 show the images of Lincoln and a blurred chromosome along with
their histograms. Table 3 shows the thresholds obtained by the comp(X) and IOAC(X)
measures for various window sizes w when Zadeh's S function is used as the membership
function. The Lincoln image is 64x64 with 32 gray levels, whereas the chromosome
image is 64x64 with 64 gray levels.
Figure 5(a) Input. Figure 5(b) Histogram.
Figure 6(a) Input. Figure 6(b) Histogram.
Image Enhancement
The objective of an enhancement technique is to process a given image so that the
result is more suitable than the original for a specific application. The term 'specific'
is, of course, problem oriented. The techniques used here are based on the
modification of pixels in the fuzzy property domain of an image. Three kinds of
enhancement operations, namely contrast enhancement, smoothing and edge detection,
will be discussed here.
The contrast intensification operator on a fuzzy set A generates another fuzzy set
A' = INT(A) in which the fuzziness is reduced by increasing the values of mu_A(x)
which are above 0.5 and decreasing those which are below it. Define this INT
operator by a transformation T_1 of the membership function mu_mn or p_mn as
(40)
where the position of the cross-over points, the bandwidth and hence the symmetry of the
curves are determined by the fuzzifiers F_e and F_d. When x-bar = x_max (the maximum level
in X), mu_mn represents an S-type function. When x-bar = any arbitrary level l, mu_mn
represents a pi-type function. Zadeh's standard functions do not have the provision
for controlling the cross-over point. The parameters F_e and F_d of equation (40) are
determined from the cross-over point across which contrast enhancement is desired.
After enhancement in the fuzzy property domain, the enhanced spatial domain
image x'_mn may be obtained from
    x'_mn = G^{-1}(mu'_mn),    alpha <= mu'_mn <= 1        (41)
where alpha is the value of mu_mn when x_mn = 0.
Figure 8 INT transformation function for contrast enhancement in the property plane.
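A minimal sketch of INT-based contrast intensification in the property plane. The membership mapping and its inverse here are a simple linear grey-level scaling used as an assumption (the chapter's equation (40) uses the fuzzifiers F_e and F_d instead), and the test image is illustrative.

import numpy as np

def INT(mu):
    """Contrast intensification: push memberships above 0.5 up and below 0.5 down."""
    return np.where(mu <= 0.5, 2.0 * mu**2, 1.0 - 2.0 * (1.0 - mu)**2)

def enhance(X, L, n_iter=2):
    """Enhance an L-level image by repeated INT in the fuzzy property plane."""
    mu = X / (L - 1.0)                 # simple grey level -> [0, 1] mapping (assumption)
    for _ in range(n_iter):
        mu = INT(mu)                   # successive applications sharpen contrast further
    return np.rint(mu * (L - 1.0)).astype(int)   # inverse mapping back to grey levels

X = np.random.default_rng(0).integers(0, 32, (8, 8))
print(enhance(X, L=32))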
Smoothing Algorithm
The idea of smoothing is based on the property that image points which are
spatially close to each other tend to possess nearly equal grey levels. Smoothing of
an image X may be obtained by q successive applications of the 'min' and then the
'max' operator within a neighborhood, such that the smoothed grey level value of the
(m, n)th pixel is [13,14]
    x'_mn = max^q_{Q_1} [ min^q_{Q_1} { x_ij } ],    (i, j) != (m, n), (i, j) in Q_1,  q = 1, 2, ...        (42)
Smoothing operation blurs the image by attenuating the high spatial frequency
components associated with edges and other abrupt changes in grey levels. The
higher the values of Q I and q, the greater is the degree of blurring.
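A sketch of the min-then-max smoothing of equation (42), using a 3x3 neighbourhood Q_1 that excludes the centre pixel as the text indicates; the use of SciPy's generic filters and the example image are assumptions.

import numpy as np
from scipy.ndimage import minimum_filter, maximum_filter

def min_max_smooth(X, q=1):
    """Equation (42): q applications of 'min' followed by q applications of 'max' over Q1."""
    footprint = np.ones((3, 3), dtype=bool)
    footprint[1, 1] = False            # neighbourhood Q1 excludes (m, n) itself
    Y = np.asarray(X, dtype=float)
    for _ in range(q):
        Y = minimum_filter(Y, footprint=footprint)
    for _ in range(q):
        Y = maximum_filter(Y, footprint=footprint)
    return Y

X = np.random.default_rng(0).integers(0, 32, (16, 16))
print(min_max_smooth(X, q=1))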
Edge Detection
If x'_mn denotes the edge intensity corresponding to a pixel x_mn, then the edges of the
image are defined as [13,15]
    Edges = Union_m Union_n x'_mn        (43a)
where
    x'_mn = | x_mn - min_Q { x_ij } |        (43b)
or
    x'_mn = | x_mn - max_Q { x_ij } |        (43c)
with Q denoting the neighborhood of (m, n).
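The corresponding min/max edge operators of (43b) and (43c), sketched with a 3x3 neighbourhood Q (an assumption):

import numpy as np
from scipy.ndimage import minimum_filter, maximum_filter

def min_edge(X):
    """Equation (43b): |x_mn - min over the neighbourhood Q|."""
    X = np.asarray(X, dtype=float)
    return np.abs(X - minimum_filter(X, size=3))

def max_edge(X):
    """Equation (43c): |x_mn - max over the neighbourhood Q|."""
    X = np.asarray(X, dtype=float)
    return np.abs(X - maximum_filter(X, size=3))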
Edginess Measure
Let N^3_{x,y} denote the 3x3 neighbourhood of the pixel (x, y):
    N^3_{x,y} = {(x, y), (x-1, y), (x+1, y), (x, y-1), (x, y+1), (x-1, y-1), (x-1, y+1), (x+1, y-1), (x+1, y+1)}
The edge-entropy H^E_{x,y} of the pixel (x, y), giving a measure of edginess at (x, y),
may be computed as follows. For every pixel (x, y), compute the average,
maximum and minimum values of the gray levels over N^3_{x,y}. Let us denote the average,
maximum and minimum values by Avg, Max, Min respectively. Now define the
following parameters.
in N^3_{x,y}, such that mu(A) = mu(C) = 0.5 and mu(B) = 1. It is to be noted that mu_xy >= 0.5.
Such a mu_xy, therefore, gives the degree to which a gray level is close to the average
value computed over N^3_{x,y}. In other words, it represents a fuzzy set "pixel intensity
close to its average value", averaged over N^3_{x,y}. When all pixel values over N^3_{x,y}
are either equal or close to each other (i.e., they are within the same region), such a
transformation will make all mu_xy equal to 1 or close to 1. In other words, if there is no
edge, pixel values will be close to each other and the mu values will be close to
one (1), thus resulting in a low value of H^1. On the other hand, if there is an edge
(dissimilarity in gray values over N^3_{x,y}), then the mu values will be further away from
unity, thus resulting in a high value of H^1. Therefore, the entropy H^1 over N^3_{x,y}
can be viewed as a measure of edginess (H^E_{x,y}) at the point (x, y). The higher the
value of H^E_{x,y}, the stronger is the edge intensity and the easier is its detection. As
mentioned before, there are several ways in which one can define a pi-type function as
shown in Fig. 9.
Figure 9 pi function for computing edge entropy.
The proposed entropic measure is less sensitive to noise because of the use of a
[Figure: Input image]
Fuzzy Skeletonization
The problem of skeletonization or thinning plays a key role in image analysis
and recognition because of the simplicity of object representation it allows. Let us
now explain a skeletonization technique [23] based on minimization of compactness
property over the fuzzy core line plane. The output is fuzzy and one may obtain its
nonfuzzy (crisp) single pixel width version by retaining only those pixels which have
strong skeleton-membership value compared to their neighbors.
After obtaining a fuzzy segmented version (as described before) of the input
image X, the membership function of a pixel denoting the degree of its belonging to
the subset 'core line' (skeleton) is determined by three factors. These include the
properties of possessing maximum intensity, and occupying vertically and
horizontally middle positions from the edges (pixels beyond which the membership
value in the fuzzy segmented image is zero) of the object.
Let x_max be the maximum pixel intensity in the image and p_o(x_mn) be the
function which assigns the degree of possessing maximum brightness to the (m, n)th
pixel. Then the simplest way to define p_o(x_mn) is
    p_o(x_mn) = x_mn / x_max        (49)
It is to be mentioned here that one may use other monotonically nondecreasing
functions to define p_o(x) with the flexibility of varying the cross-over point. Equation (49)
is the simplest one, with a fixed cross-over point at x_max/2.
Let x_1 and x_2 be the distances of x_mn from the left and right edges respectively.
(The distance is measured by the number of units separating the pixel under
consideration from the first background pixel along that direction.) Then p_h(x_mn),
denoting the degree of occupying the horizontally central position in the object, is
defined as
    p_h(x_mn) = 2 x_2 / [ x_1 (x_1 + x_2) ]        (50)
Similarly, with y_1 and y_2 the distances of x_mn from the upper and lower edges,
    p_v(x_mn) = y_1 / y_2
              = y_2 / y_1
              = 2 y_2 / [ y_1 (y_1 + y_2) ]    if d(y_1, y_2) > 1 and y_1 > y_2        (51)
Equations (50) and (51) assign high values (close to 1.0) to pixels near the core and low values
to pixels away from the core. The factor (x_1 + x_2) or (y_1 + y_2) in the denominator
takes into consideration the extent of the object segment so that there is an
appreciable amount of change in the property value for the pixels not belonging to
the core.
These primary membership functions p_o, p_h and p_v may be combined as either
    mu_c(x_mn) = max { min(p_o, p_h), min(p_o, p_v), min(p_h, p_v) }        (52)
or
    mu_c(x_mn) = w_1 p_o + w_2 p_h + w_3 p_v        (53a)
with
    w_1 + w_2 + w_3 = 1        (53b)
to define the grade of belonging of x_mn to the subset 'core line' of the image.
Equation (52) involves connective properties using max and min operators such
that mu_c = 1 when at least two of the three primary properties take values of unity.
All three primary membership values are given equal weight in computing the mu_c
value. Equation (53), on the other hand, involves a weighted sum (the weights being
denoted by w_1, w_2 and w_3). Usually, one can consider the weight w_1 attributed to
p_o (the property corresponding to pixel intensity) to be higher than the other two, with
w_2 = w_3.
Equation (52) or (53) therefore extracts (using both gray level and spatial
information) the subset 'Core line' such that the membership value decreases as one
moves away towards the edges (boundary) of object regions.
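A rough sketch of combining the three primary memberships via equation (52). The specific forms used here for p_h and p_v (a simple symmetric "middleness" ratio min(x_1, x_2)/max(x_1, x_2)) are illustrative stand-ins, since the exact piecewise cases of (50)-(51) are not fully reproduced above; the function and its name are assumptions.

import numpy as np

def core_line_membership(seg):
    """Combine intensity and middle-position properties as in equation (52).

    seg: fuzzy segmented image (memberships in [0, 1]); zero means background.
    """
    M, N = seg.shape
    p_o = seg / seg.max()                                  # equation (49): degree of max brightness
    p_h = np.zeros_like(seg)
    p_v = np.zeros_like(seg)
    for m in range(M):
        for n in range(N):
            if seg[m, n] == 0:
                continue
            # distances to the first background pixel left/right and up/down
            x1 = next((k for k in range(1, n + 2) if n - k < 0 or seg[m, n - k] == 0), n + 1)
            x2 = next((k for k in range(1, N - n + 1) if n + k >= N or seg[m, n + k] == 0), N - n)
            y1 = next((k for k in range(1, m + 2) if m - k < 0 or seg[m - k, n] == 0), m + 1)
            y2 = next((k for k in range(1, M - m + 1) if m + k >= M or seg[m + k, n] == 0), M - m)
            p_h[m, n] = min(x1, x2) / max(x1, x2)          # illustrative "horizontally middle" degree
            p_v[m, n] = min(y1, y2) / max(y1, y2)          # illustrative "vertically middle" degree
    return np.maximum.reduce([np.minimum(p_o, p_h),
                              np.minimum(p_o, p_v),
                              np.minimum(p_h, p_v)])       # equation (52)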
Optimum alpha-Cut
Given the mu_c(x_mn) plane developed in the previous stage, with the pixels having
been assigned values indicating their degree of membership to 'core line', the
optimum (in the sense of minimizing ambiguity in geometry, or in the spatial domain)
skeleton can be extracted from one of its alpha-cuts having minimum comp(mu) value
(Eq. (16)). The alpha-cut of mu_c(x_mn) is defined as
    mu_c^alpha = { x_mn ; mu_c(x_mn) >= alpha }        (54)
Modification of alpha will therefore result in different fuzzy skeleton planes with varying
comp(mu) value. As alpha increases, the comp(mu) value initially decreases to a certain
minimum and then, for a further increase in alpha, the comp(mu) measure increases.
The initial decrease in the comp(mu) value can be explained by observing that for
every value of alpha, the border pixels having mu-values less than alpha are not taken into
consideration. So, both the area (Eq. (13)) and the perimeter (Eq. (14)) are less than those for
the previous value of alpha. But the decrease in area is more than the decrease in
perimeter, and hence the compactness (Eq. 16) decreases (initially) to a certain
minimum corresponding to a value alpha = alpha', say.
A further increase in alpha (i.e., for alpha > alpha') results in a mu_c^alpha plane consisting of a
number of disconnected regions (because the majority of core line pixels are dropped).
As a result, the decrease in perimeter here is more than the decrease in area and
comp(mu) increases. The mu_c^{alpha'} plane having the minimum compactness value can be
taken as an optimum fuzzy skeleton version of the image X. This is optimum in the
sense that for any other selection of alpha (i.e., alpha != alpha') the comp(mu) value would be
greater.
If a nonfuzzy (crisp) single-pixel width skeleton is desired, it can be obtained
by a contour tracking algorithm [24] which takes into account the direction of the
contour, multiple crossing pixels, lost paths due to spurious wiggles etc., based on the
octal chain code.
Fig. 11 shows the optimum fuzzy skeleton of the biplane image (Fig. 10). This
corresponds to alpha = 0.55. The connectivity of the skeleton in the optimum version
can be preserved, if necessary, by inserting pixels having intensity equal to the
minimum of those of pairs of neighbors in the object.
PRIMITIVE EXTRACTION
In picture recognition and scene analysis problems, the structural information is
very abundant and important, and the recognition process includes not only the
capability to assign the input pattern to a pattern class, but also the capacity to
describe the characteristics of the pattern that make it ineligible for assignment to
another class. In these cases the recognition requirement can only be satisfied by a
description of the pattern rather than by classification.
In such cases complex patterns are described as hierarchical or tree-like structures
of simpler subpatterns and each simpler subpattern is again described in terms of even
simpler subpatterns and so on. Evidently, for this approach to be advantageous, the
simplest subpatterns, called pattern primitives are to be selected.
Another activity which needs attention in this connection is the subject of shape
analysis, which has become an important subject in its own right. Shape analysis is of
prime importance in feature/primitive selection and extraction problems. Shape
analysis also has two approaches, namely, description of shape in terms of scalar
measurements and through structural descriptions. In this connection, it needs to be
mentioned that shape description algorithms should be information-preserving in the
sense that it is possible to reconstruct the shapes with some reasonable
approximation from the descriptors.
This section presents a method [24] to demonstrate an application of the theory
of fuzzy sets in automatic description and primitive extraction of gray-tone edge-
detected images. The ultimate aim is to recognize the pattern using syntactic
approach as described in the next section.
The method described here provides a natural way of viewing the primitives in
terms of arcs with varying grades of membership from 0 to 1.
Encoding
Figure 12 The directions of the octal codes. Figure 13 Membership function for
vertical and horizontal lines.
mu(x) -> 1 as |theta| -> 0 degrees,
mu_ob(x) -> 1 as |theta| -> 45 degrees,
The next task before extraction of primitives and description of contours is the
process of segmentation of the octal coded strings. Splitting up of a chain is
dependent on the constant increase/decrease in code values. For extracting an arc, the
string is segmented at a position whenever a decrease/increase after constant
increase/decrease in values of codes is found [13]. Again, if the number of codes
between two successive changes exceeds a prespecified limit, a straight line is said to
exist between two curves. In the case of a closed curve, a provision may be kept for
increasing the length of the chain by adding first two starting codes to the tail of the
string. This enables one to take the continuity of the chain into account in order to
reflect its proper segmentation [13].
After segmentation one needs to provide a measure of curvature along with
direction of the different arcs and also to measure the length of lines in order to
extract the primitives. The degree of 'arcness' of a line segment x is obtained using
the function
Figure 14 Membership function for arc. Figure 15 Nuclear pattern of brain cell.
(58)
a is the length of the line joining the two extreme points of an arc x (Figure 14), and l
is the arc length, such that the lower the ratio a/l is, the higher is the degree of
'arcness'.
For example, consider a sequence of codes 5 6 6 7 denoting an arc x. For
computing its l, note that if a code represents an oblique line, the corresponding
increase in arc length is sqrt(2); otherwise the increase is by unity. The arc diameter a is
computed by measuring the resulting shifts Delta_m and Delta_n of the spatial coordinates (along the
mth and nth axes) due to the codes in question. For the aforesaid example we have
    Delta_m = 1 + 0 + 0 - 1 = 0,
    Delta_n = -1 - 1 - 1 - 1 = -4,
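A sketch of the a and l computation for a chain-coded segment. The direction table assumes the octal convention of Figure 12 (code 0 pointing up), and the final score shown (1 - a/l) is only an illustrative choice consistent with "lower a/l means higher arcness", not the chapter's equation (58).

import math

# Octal code -> (delta_m, delta_n); assumes code 0 points up (Figure 12 convention)
STEP = {0: (-1, 0), 1: (-1, 1), 2: (0, 1), 3: (1, 1),
        4: (1, 0), 5: (1, -1), 6: (0, -1), 7: (-1, -1)}

def arc_measures(codes):
    """Chord length a, arc length l, and an illustrative arcness score for a chain-code segment."""
    dm = sum(STEP[c][0] for c in codes)
    dn = sum(STEP[c][1] for c in codes)
    a = math.hypot(dm, dn)                                        # line joining the extreme points
    l = sum(math.sqrt(2) if c % 2 == 1 else 1.0 for c in codes)   # sqrt(2) for oblique codes
    return a, l, 1.0 - a / l                                      # low a/l -> high arcness (assumed form)

print(arc_measures([5, 6, 6, 7]))    # the example segment from the text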
Example 3
To explain the aforesaid features, let us consider the Fig. 15 showing a two-tone
contour of nuclear pattern of brain neurosecretory cells [25]. The string descriptions
of Fig. 15 in terms of arcs (of different arcness) and lines are shown below.
derivation chain, k = 1(1)m; and r_ik is the label of the ith production used in the kth
derivation chain, i = 1, 2, ..., l_k.
Clearly, if a production alpha -> beta is visualized as a chain link of strength mu(r), r
being the label of alpha -> beta, then the strength of a derivation chain is the strength of its
weakest link, and hence
mu_{L(FG)}(X) = strength of the strongest derivation chain from S to X,
the primitives a, b, c being 'horizontal', 'vertical' and 'oblique' directed line segments
respectively; the terms 'horizontal', 'vertical' and 'oblique' are taken to be fuzzy with
membership values mu_H, mu_V and mu_ob respectively, as defined in the previous section.
Further, the concatenation considered is of the 'head-tail' type. Hence the only
string generated is X = abc, which is in reality a triangle having membership
    mu_{L(FG_2)}(abc) = min( mu_H(a), mu_V(b), mu_ob(c) ),
which attains its maximum value 1 when abc is an isosceles right triangle. Thus
L(FG_2) is the fuzzy set of isosceles right triangles.
The membership of the pattern triangle given in Fig. 16b is
    min(1.0, 1.0, 0.66) = 0.66
Figure 16 (a) Primitive (b) Production of Triangle and Letter B.
Example 6: Consider the following fuzzy grammar for generating the fuzzy set
representing the English upper case letter B
V_N = {S, A, B, C, D}
V_T = {a, b}
where the primitive a denotes a directed 'vertical' (fuzzy) line segment and b denotes a
directed arc (clockwise). The concatenation considered here is again of the 'head-tail'
type.
Also J, P and mu are as follows:
    r_1: S -> aB    mu(r_1) = mu_V(a)
    r_2: B -> aC    mu(r_2) = mu_V(a)
    r_3: C -> bD    mu(r_3) = mu_cir(b)
    r_4: D -> b     mu(r_4) = mu_cir(b)
The string generated is X = aabb, having the following membership in the set B:
    mu_B(X) = min( mu_V(a)_l, mu_V(a)_u, mu_cir(b)_l, mu_cir(b)_u )
where the suffixes l and u denote the locations ('lower' and 'upper') of the primitives
a and b.
For the pattern given in Fig. 16b,
    mu_V(a)_u = mu_V(a)_l = 0.83,  mu_cir(b)_u = 0.36,  mu_cir(b)_l = 0.5
Acknowledgements
This work was done while the author held an NRC-NASA research
Associateship at the Johnson Space Center, Houston, Texas. The author gratefully
acknowledges Dr. Robert N. Lea for his interest in this work, Ms. Dianne Rader,
Ms. Kim Herhold, and Mr. Todd Carlson for typing the manuscript and Mr. Albert
Leigh for his assistance in getting some results.
References
20. N.R. Pal, On Image Information Measure and Object Extraction, Ph.D. Thesis, Indian Statistical Institute, Calcutta, India, March 1990.
21. S.K. Pal and N.R. Pal, Higher order entropy, hybrid entropy and their applications, Proc. INDO-US Workshop on Spectrum Analysis in One and Two Dimensions, Nov. 27-29, 1990, New Delhi, NBH Oxford Publishing Co., New Delhi (to appear).
22. S.K. Pal, A measure of edge ambiguity using fuzzy sets, Patt. Recog. Lett., vol. 4, pp. 51-56, 1986.
23. S.K. Pal, Fuzzy skeletonization of an image, Patt. Recog. Lett., vol. 10, pp. 17-23, 1989.
24. S.K. Pal, R.A. King and A.A. Hashim, Image description and primitive extraction using fuzzy sets, IEEE Trans. Syst., Man and Cyberns., vol. SMC-13, pp. 94-100, 1983.
25. S.K. Pal and A. Bhattacharyya, Pattern recognition technique in analyzing the effect of thiourea on brain neurosecretory cells, Patt. Recog. Lett., vol. 11, pp. 443-452, 1990.
26. G.F. DePalma and S.S. Yau, Fractionally fuzzy grammars with applications to pattern recognition, in [8], pp. 329-351.
27. A. Pathak and S.K. Pal, Fuzzy grammars in syntactic recognition of skeletal maturity from X-rays, IEEE Trans. Syst., Man and Cyberns., vol. SMC-16, pp. 657-667, 1986.
28. K.S. Fu, Syntactic Pattern Recognition and Applications, Prentice-Hall, N.J., 1982.
8
FUZZY SETS IN NATURAL
LANGUAGE PROCESSING
Vilem Novak
Czechoslovak Academy of Sciences,
Mining Institute, Studentská 1768,
708 00 Ostrava-Poruba, Czechoslovakia
1 INTRODUCTION
2 FUZZY SETS
If the property phi is simple and sharp then the class X forms a set.
However, most properties a man meets in the world are not of this kind. Then
the class X is not separated sharply, i.e. there is no way to name or
imagine all the objects z from X without any doubt whether a given object z
has the property phi, or not. Thus, we encounter the phenomenon of vagueness.
The above mentioned doubt, which probably stems from the inner, still not
understood, complexity of phi, is the core of the phenomenon of vagueness being
encountered. Classical mathematics has no other possibility than to model the
grouping X using (sharp) sets. Therefore, the result cannot be satisfactory
from the very beginning. Unlike classical set theory, fuzzy set theory attempts
at finding a more suitable model of the class (1).
Let us take the objects z from some sufficiently big set U called the
universe. Note that this assumption is not restrictive since such a set always
exists. For example, consider the property phi := 'to be a small number'. Then
there surely exists a number z0 in N which is not small (e.g. z0 = 2^10) and we may
put U = {z in N; z <= z0}.
Our doubt whether an object z in U has the given property phi can be
expressed by means of a certain scale L having the smallest element 0 and the greatest
element 1, respectively. Thus, 1 expresses that phi(z) (z has the property phi) holds with
no doubt while 0 means that phi(z) does not hold at all. We obtain a function
    A: U -> L        (2)
assigning an element b in L to every z in U.
I A .1ftt..... i. part of a MIItftee (nft a word or a whole MIItftee) that it eo...tnaeted
aceordin, to tIM sramatieal ruIet.
2 A Datural Dumber, for timplicit,.
(biresiduation) and
aⁿ = a ⊗ ... ⊗ a  (n times)
(power) for all a, b ∈ ⟨0, 1⟩.
When introducing a new n-ary operation o on L, the following fitting condition must be fulfilled: there are σ1, ..., σn such that the corresponding identity holds for every ai, bi ∈ L, i = 1, ..., n. The justification of the fitting condition can be found in [16,15]. Note that all the basic operations fulfil the fitting condition. Moreover, the following holds true:
Theorem 1. All the operations derived from the operations fulfilling the fitting condition fulfil it as well.
Examples of derived operations are the bounded sum, concentration, dilation and intensification; the intensification is defined by
INT(a) = 2a² if a ∈ ⟨0, 0.5⟩,  INT(a) = 1 − 2(1 − a)² if a ∈ (0.5, 1⟩.
If o is an operation on L fulfilling the fitting condition, it can be extended to fuzzy sets: we assign a fuzzy set C ⊆ U to A1, ..., An when we put
Cx = o(A1x, ..., Anx)   (6)
for every x ∈ U.
For example, we can define the operation of bounded sum of fuzzy sets by putting
C = A ⊕ B if Cx = Ax ⊕ Bx
for every x ∈ U.
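A minimal Python sketch of such pointwise operations on finite fuzzy sets is given below; it is our own illustration (representation as dictionaries, helper names), with the bounded sum taken as min(1, a + b) on the scale ⟨0, 1⟩.

```python
# A minimal sketch of pointwise operations on fuzzy sets over a finite
# universe, represented as dictionaries u -> A(u); the bounded sum on [0, 1]
# is taken here as min(1, a + b), intersection as min, union as max.

def bounded_sum(A, B):
    return {u: min(1.0, A.get(u, 0.0) + B.get(u, 0.0)) for u in set(A) | set(B)}

def intersection(A, B):
    return {u: min(A.get(u, 0.0), B.get(u, 0.0)) for u in set(A) | set(B)}

def union(A, B):
    return {u: max(A.get(u, 0.0), B.get(u, 0.0)) for u in set(A) | set(B)}

A = {1: 0.2, 2: 0.7, 3: 1.0}
B = {2: 0.5, 3: 0.4, 4: 0.9}
print(bounded_sum(A, B))   # e.g. {1: 0.2, 2: 1.0, 3: 1.0, 4: 0.9} (order may vary)
```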
A very important notion is that of a fuzzy cardinality of a fuzzy set. There are several kinds of them [13,22]. We will use the following ones, defined for fuzzy sets with finite support. The absolute fuzzy cardinality of A ⊆ U is the fuzzy set
FCard(A) = {a_n / n; n ∈ N}   (7)
where
a_n = ⋁{β; Card(A_β) = n}
and A_β is a β-cut of A. The relative fuzzy cardinality FCard_A(B) of A with respect to B, where A, B ⊆ U, is defined analogously (8).
The notions introduced above will be used in the sequel. For other notions and operations with fuzzy sets see e.g. [13,4].
In the functional generative description of natural language (FGD) (see [18]), five levels are differentiated, namely phonetic (PH) (how a sentence is composed as a system of sounds), phonemic (PM) (how words of a sentence are composed), morphemic (MR) (how a sentence is composed of its words), surface syntax (SS) (the system of grammatical rules) and tectogrammatical (TR), which is the highest level corresponding to the semantics. The latter is also called the deep structure of the sentence and this structure is the objective of possible application of fuzzy set theory. As has already been stated, words and more complex syntagms of natural language can be understood to be names of properties encountered by a man in the world. In the light of the previous section, fuzzy sets can be used as follows: let 𝒜 be a syntagm of natural language and φ the corresponding property. If the class (1) determined by φ is approximated by a fuzzy set A ⊆ U then the meaning M(𝒜) of 𝒜 is
M(𝒜) = A.   (9)
Thus, our job consists in determining the membership function A.
However, the situation is by no means simple as not every word of natural language corresponds to such a property and, above all, there are various relations between words. Thus, determination of the membership function which corresponds to a complex syntagm may be a very complicated task.
On the tectogrammatical level, the meaning of a sentence is represented as a complex dependency structure which can be depicted in the form of a labelled graph, as in the example below.
(Figure: labelled dependency graph of an example sentence.)
The meaning of a noun S is a fuzzy set M(S) = S, S ⊆ U.
What is the universe U? It is a set of objects chosen in such a way that whenever an object x has the property φ_S named by the noun S then x ∈ U. This can be constructed e.g. as follows. Let K be a set of generic elements called the kernel space. For example, K can be a union of all the objects described in our dictionary, of those we have regarded during the last week, of those we see in our flat, etc. In short, K should contain all the specific objects we have met or can imagine. Let F(K) be the set of all the fuzzy sets on K.
Let EK be the smallest set closed with respect to all the Cartesian powers Kⁿ or F(K)ⁿ, n = 1, 2, ..., and all the Cartesian products of these elements. This set is called the semantic space. Then the universe of S is a sufficiently big subset U ⊆ EK.
A certain problem is the determination of the membership function. There are several methods proposed in the literature (cf. [13]). The membership function corresponding to object nouns (e.g. car, donkey, etc.) could be constructed on the basis of the outer characteristics of elements. For example, we may use proportions of some geometric patterns contained in objects, etc. A very often used method is statistical analysis of expert (subjective) estimations. Several experiments have been described in the literature. Let us mention that fuzzy methods are rather robust and thus exact determination of the membership function is not as important as it might seem at first glance.
Experience suggests that even individual estimation works well when it is done carefully and seriously.
In practical applications, e.g. in artificial intelligence, it is not very useful to model the meaning of nouns because we would have to find a proper representation of their elements in the computer which, in fact, we do not need. The most successful applications are based on modelling the meaning of adjectives and of syntagms of the form ⟨linguistic modifier⟩⟨adjective⟩. Adjectives often come in couples of antonyms 𝒜⁻, 𝒜⁺ with meanings
M(𝒜⁻) = A⁻,  M(𝒜⁺) = A⁺
such that Supp A⁻ ⊆ ⟨m, s) and Supp A⁺ ⊆ (s, n⟩, where Supp A := {x ∈ U; Ax > 0}. We will often call 𝒜⁻ a negative and 𝒜⁺ a positive adjective, respectively. There are also couples of antonyms such that a third member 𝒜⁰ exists. Its meaning is
M(𝒜⁰) = A⁰
where the membership function A⁰ has the property depicted below.
(Figure: membership functions of the negative, middle and positive adjectives A⁻, A⁰, A⁺ on the universe U.)
A general formula for all the three fuzzy sets is the following:
A(x) = 0 if x < a1 or x > a2,
A(x) = 1 if c1 ≤ x ≤ c2,
with smooth quadratic-spline transitions on the intervals a1 ≤ x < c1 (through the turning point b1) and, symmetrically, on c2 < x ≤ a2.
A typical example, widely used in fuzzy set theory, is the modifier very defined as follows:
Very(a) = CON(a), a ∈ ⟨0, 1⟩,
and
very(x) = x + (−1)ᵏ · d · ‖Ker(A)‖,
where k = 1 for A⁺, or for A⁰ if x ≥ s, and k = 2 for A⁻, or for A⁰ if x ≤ s. ‖Ker(A)‖ is the length of the interval ⟨inf(Ker(A)), sup(Ker(A))⟩. The parameter d was experimentally estimated to be a number d ∈ ⟨0.25, 0.40⟩.
Examples of some other linguistic modifiers can be found in the literature.
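The sketch below illustrates such unary modifiers in Python. It is only an illustration under common assumptions: concentration is taken as CON(a) = a² and dilation as DIL(a) = √a (the usual choices, not quoted in the text above), while INT follows the piecewise formula given earlier and 'very' is applied as CON pointwise.

```python
import math

# Illustrative unary modifiers on membership degrees a in [0, 1]:
#   CON(a) = a**2 (concentration; the usual model of 'very'),
#   DIL(a) = sqrt(a) (dilation),
#   INT(a) as in the piecewise intensification formula quoted above.

def CON(a):
    return a ** 2

def DIL(a):
    return math.sqrt(a)

def INT(a):
    return 2 * a ** 2 if a <= 0.5 else 1 - 2 * (1 - a) ** 2

def very(A):
    """Apply 'very' = CON pointwise to a fuzzy set given as a dict u -> A(u)."""
    return {u: CON(d) for u, d in A.items()}

small = {0: 1.0, 1: 0.8, 2: 0.5, 3: 0.2, 4: 0.0}
print(very(small))  # degrees are squared, so 'very small' is more restrictive
```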
The meaning of verbs is a very complicated problem and, so far, only the copula "to be" has been treated, mainly in conditional syntagms of the form
IF p THEN q   (14)
where p and q are syntagms of the form (12). In fuzzy set theory we usually interpret (14) by a fuzzy relation, and this interpretation is widely used in the applied areas of fuzzy set theory (see e.g. [8,9]). Let us remark that many authors interpret (14) as the Cartesian product (16) of the corresponding fuzzy sets. In the applications of approximate reasoning this may work since all the fuzzy methods are very robust. However, putting the meaning of the implication (14) equal to the Cartesian product is linguistically as well as logically incorrect since (16) is symmetric and the implication is not. Another reason why (16) often works in the practical applications may be the fact that the implication (14) often describes only some kind of a relation between the input and output and it is not, in fact, understood to be the implication. This discrepancy needs still more analysis.
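The contrast can be made concrete with a small sketch; the choice of the Lukasiewicz implication as the "genuine" implication below is ours, made only to show the asymmetry the text points out.

```python
# Sketch contrasting two interpretations of "IF x is A THEN y is B" as a
# fuzzy relation R on U x V:
#   Cartesian product:       R(x, y) = min(A(x), B(y))            (symmetric)
#   Lukasiewicz implication: R(x, y) = min(1, 1 - A(x) + B(y))    (asymmetric)

def cartesian_product(A, B):
    return {(x, y): min(ax, by) for x, ax in A.items() for y, by in B.items()}

def lukasiewicz_implication(A, B):
    return {(x, y): min(1.0, 1.0 - ax + by)
            for x, ax in A.items() for y, by in B.items()}

A = {"x1": 0.9, "x2": 0.2}
B = {"y1": 0.3, "y2": 1.0}
print(cartesian_product(A, B)[("x1", "y1")])        # 0.3
print(lukasiewicz_implication(A, B)[("x1", "y1")])  # 0.4
# The product is symmetric in its two arguments, the implication is not,
# which is exactly the discrepancy discussed in the text.
```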
Let us also mention the problem of linguistic quantifiers (we will denote them by the letter Q). They do not form a uniform group from the linguistic point of view. We place among them numerals including the indefinite ones (e.g. several), some adverbs (e.g. many, few, most), some pronouns (e.g. every), some nouns (e.g. majority, minority) and others. From the point of view of fuzzy set theory, their meaning generally is a fuzzy number, i.e. a fuzzy set M(Q) = Q ⊆ R on the real line. It is proposed in [22] how to interpret the syntagms of the form
Q 𝒜's   (17)
or
Q 𝒜's are ℬ's.   (18)
In (17), the quantifier Q is interpreted as a fuzzy characterization of the absolute fuzzy cardinality (7) of the fuzzy set
A = M(𝒜)
while in (18), it characterizes the relative fuzzy cardinality (8) of M(𝒜) with respect to B = M(ℬ). More exactly, the truth values are given by formulas (19) and (20), which match Q against the respective fuzzy cardinalities. However, the problem is not finished yet since (19) and (20) make sense only for fuzzy sets with finite support.
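A small Python sketch of the absolute fuzzy cardinality (7) over a finite fuzzy set is given below; the grid discretization, the names and the closing comment on quantifier evaluation are our own illustration, not the formulas (19) and (20) themselves.

```python
# A rough sketch (our own discretization) of the absolute fuzzy cardinality
# FCard(A) from formula (7): a_n = sup{beta : Card(A_beta) = n}, where
# A_beta = {u : A(u) >= beta} is the beta-cut of the finite fuzzy set A.

def fcard(A, max_n):
    betas = [i / 1000 for i in range(1, 1001)]          # grid over (0, 1]
    def cut_cardinality(beta):
        return sum(1 for d in A.values() if d >= beta)
    return {n: max([b for b in betas if cut_cardinality(b) == n], default=0.0)
            for n in range(max_n + 1)}

A = {"u1": 0.9, "u2": 0.7, "u3": 0.2}
print(fcard(A, 3))   # {0: 1.0, 1: 0.9, 2: 0.7, 3: 0.2}
# A quantified syntagm "Q A's" can then be evaluated by matching the
# quantifier's fuzzy number Q against this fuzzy cardinality, e.g. via
# sup_n min(Q(n), a_n).
```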
In the literature on fuzzy set theory (see e.g. [19,20,22,13] and others), one may find the semantics of the compound syntagms of the form
𝒜 and ℬ   (21)
and
𝒜 or ℬ   (22)
defined using the operations of intersection and union of fuzzy sets, respectively. However, this is only a tentative solution since the syntagms (21) and (22) are special cases of the very complicated phenomenon known in linguistics as coordination. The use of the operations of intersection and union of fuzzy sets may work in some special cases of close coordination in syntagms, e.g. Peter and Paul ..., etc.
An even worse situation is encountered with negation. The simple use of the operation of complement of fuzzy sets works only with some kinds of adjectives and nouns. However, negation contained in more complex syntagms is incidental to the phenomenon of the topic-focus articulation, when only the focus is being negated. A fully comprehensive description of this phenomenon in linguistics has not yet been given.
(24)
is minimal for some suitable, previously set number p. This method is often used in technical applications.
A quite effective procedure was proposed by F. Esragh and E. H. Mamdani in [5]. This procedure is suitable for the syntagms of the form (21) and (22) where 𝒜 and ℬ may consist of an adjective, a noun and a linguistic modifier, and the universe U is ordered. According to this method, the membership function A is divided into parts by the effective turning points (i.e. special points where the membership function changes its course), the parts are approximated by the above partial syntagms 𝒜, ℬ using (24), and the resulting syntagm is obtained by joining 𝒜 and ℬ using the corresponding connective. In particular, if the two neighbouring parts of the membership function form a "hill" then the corresponding syntagms are joined by the connective and, and if they form a "valley" then they are joined by the connective or. Note that this works only in the case when the connective and is interpreted as the intersection and or as the union of fuzzy sets.
5 CONCLUSION
References
[1] Bezdek, J. (ed.), Analysis of Fuzzy Information - Vol. 1: Mathematics and Logic, CRC Press, Boca Raton, FL, 1987.
[2] Bezdek, J. (ed.), Analysis of Fuzzy Information - Vol. 2: Artificial Intelligence and Decision Systems, CRC Press, Boca Raton, FL, 1987.
[3] Bezdek, J. (ed.), Analysis of Fuzzy Information - Vol. 3: Applications in Engineering and Science, CRC Press, Boca Raton, FL, 1987.
[4] Dubois, D., Prade, H., Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1980.
[5] Esragh, F., Mamdani, E.H., A general approach to linguistic approximation, Int. J. Man-Mach. Stud., 11 (1979), 501-519.
[6] Gaines, B.R., Boose, J.H. (eds.), Machine Learning and Uncertain Reasoning, Academic Press, London, 1990.
[7] Gärdenfors, P. (ed.), Generalized Quantifiers, D. Reidel, Dordrecht, 1987.
[8] Gupta, M.M., Yamakawa, T. (eds.), Fuzzy Computing: Theory, Hardware and Applications, North-Holland, Amsterdam, 1988.
By Waldemar Karwowski
Center for Industrial Ergonomics
University of Louisville
Louisville, KY 40292, USA
and
Gavriel Salvendy
School of Industrial Engineering
Purdue University
West Lafayette, IN 47907, USA
INTRODUCTION
According to Harre (1972) there are two major purposes of models in science: 1) logical, which enables one to make certain inferences which would not otherwise be possible; and 2) epistemological, to express and extend
our knowledge of the world. Models are helpful for explanation and theory
formation, as well as simplification and concretization. Zimmermann (1980)
classifies models into three groups: 1) formal models (purely axiomatic systems
with purely fictitious hypotheses), 2) factual models (conclusions from the models
have a bearing on reality and they have to be verified by empirical evidence), and 3)
prescriptive models (which postulate rules according to which people should
behave). The quality of a model depends on the properties of the model and the
functions for which the model is designed (Zimmermann, 1980). In general, good
models must have three major properties: 1) formal consistency (all conclusions
follow from the hypothesis), 2) usefulness, and 3) efficiency (the model should
fulfill the desired function at a minimum effort, time, and cost).
Although the usefulness of the mathematical language for modeling
purposes is undisputed, there are limits of the possibility of using the classical
mathematical language which is based on the dichotomous character of set theory
(Zimmermann, 1980). Such restriction applies especially to the man-machine
systems. This is due to vagueness of the natural language, and the fact that in
empirical research natural language cannot be substituted by formal languages.
Formal languages are rather simple and poor, and are useful only for specific
purposes. Mathematics and logic as research languages widely applied today in
202
natural sciences and engineering are not very useful for modeling purposes in
behavioral sciences and especially in human factors studies. Rather, a new
methodology, based on the theory of fuzzy sets and systems is needed to account
for the ever present fuzziness of man-machine systems.
As suggested by Smithson (1982), the potential advantages for
applications of a fuzzy approach in human sciences are: 1) fuzziness, itself, may be
a useful metaphor or model for human language and categorizing processes, and 2)
fuzzy mathematics may be able to augment conventional statistical techniques in
the analysis of fuzzy data. Fuzzy methods are useful supplements for statistical
techniques such as reliability analysis and regressions, and structurally oriented
methods such as hierarchical clustering and multidimensional scaling.
HUMAN FACTORS
Human factors discipline is concerned with "the consideration of human
characteristics, expectations, and behaviors in the design of the things people use
in their work and everyday lives and of the environments in which they work and
live" (McCormick, 1970). The "things" that are designed are complex man-
machine systems. According to Pew and Baron (1983) the ultimate reasons for
building models in general, and man-machine models in particular, are to provide
for:
scientists can learn much from the engineering and physical sciences."
Research techniques applied in man-machine research typically include the
following methods: 1) direct observation (operator opinions, activity sampling
techniques, process analysis, etc.), 2) accident study method (risk analysis, critical-
incident technique) 3) statistical methods, 4) experimental methods (design of
experiments), 5) psychophysical methods (psychophysical scaling and
measurement), and 6) articulation testing methods (Chapanis, 1959). Today we are
still at the beginning stage of building robust mathematical models for the analysis
of complex human-machine systems. This is partially due to lack of appropriate
design theory, as well as complexity of human behavior (Topmiller, 1981). The
human being is too complex a "system" to be fully understood or describable in all
his/her properties, limits, tolerances, and performance capabilities, and no
comprehensive mathematical tool has been available up to now to describe and
integrate all the above mentioned measures and findings about human behavior
(Bernotat, 1984).
FUZZY MODELS
Human work taxonomy can be used to describe five different levels ranging from primarily physical tasks to primarily information processing tasks (Rohmert, 1979).
(Figure: fuzziness in man-machine systems. The perceived task environment (fuzziness) is interpreted by the human operator (human functioning), whose response feeds complex work systems (fuzziness); perceived task demands and perceived task workload are assessed by mind and intuition.)
The concept of a fuzzy set extends the range of membership values for the characteristic function f_X and allows graded membership, usually defined on the interval [0, 1]. Consequently, an element may belong to a set with a certain degree of membership, not necessarily 0 or 1. The "excluded middle" concept is then abandoned, and more flexibility is given in specifying the characteristic function. In view of the above, mathematical logic can also be modified. Interestingly, classical logic was actually extended as early as 1930 by Lukasiewicz, who proposed the infinite-valued logic. As stated by Giles (1981), "Lukasiewicz logic is exactly appropriate for the formulation of the 'fuzzy set theory' first described by Zadeh; indeed, it is not too much to claim that it is related to fuzzy set theory exactly as classical logic is related to ordinary set theory."
The theory of fuzzy sets has been successfully applied in the modeling of ill-defined systems in a variety of disciplines (cognitive psychology, information processing and control, decision-making sciences, biological and medical sciences, sociology and linguistics, image processing and pattern recognition, and artificial intelligence).
Willaeys and Malvache (1979) investigated the perception of visual and
The subject did not know the file to be edited. The task was performed
from the subject's own office and desk. The subject was familiar with and
regularly used the VI screen editor.
Knowledge elicitation
The knowledge engineers can use sample runs to infer the rules by which
the subjects select their preferred methods of editing text. An additional benefit
from a GOMS perspective would be in structuring knowledge elicitation. For
example, the expert could be prompted to present the methods and the selection
rules and respond in the following manner: "IF the condition X exists and the
condition Y exists, THEN use method Z." For example, while performing a
task, the subject could be asked to describe why he chose a particular method:
[Subject: "The word (to be changed) is more than half of a screen down, so I will use the control-D method and then return-key to the word."]
[Knowledge Engineer: "How strongly do you feel that it is more than half?"]
[Subject: "Very strong, say 0.8."]
The actual distance to the word was measured directly and found to be 39 lines. So the degree of membership of belonging to the "more than half" class was 0.8 for 39 lines. By having the subject perform many tasks while verbalizing the rules, the methods used and the memberships of the fuzzy quantifiers can be found.
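A hypothetical sketch of how such elicited judgments could be turned into a membership function is shown below; only the pair (39 lines, degree 0.8) comes from the text, while the other observations and the piecewise-linear interpolation are assumptions made for illustration.

```python
# Hypothetical sketch: each trial yields (measured value, stated degree) for a
# fuzzy descriptor such as "more than half a screen"; a piecewise-linear
# interpolation over the sorted observations gives an approximate curve.

def build_membership(observations):
    """observations: list of (value, degree) pairs elicited from the expert."""
    pts = sorted(observations)
    def mu(x):
        if x <= pts[0][0]:
            return pts[0][1]
        if x >= pts[-1][0]:
            return pts[-1][1]
        for (x0, d0), (x1, d1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return d0 + (d1 - d0) * (x - x0) / (x1 - x0)
    return mu

# (39, 0.8) is quoted in the text; the remaining points are placeholders.
more_than_half = build_membership([(20, 0.1), (30, 0.5), (39, 0.8), (50, 1.0)])
print(round(more_than_half(35), 2))  # interpolated degree between 0.5 and 0.8
```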
cursor to the left side of the page and down one line; 4) Arrow Up or Down:
moves the cursor directly up or down; and 5) Pattern Search: places the cursor on
the first occurrence of the pattern.
The subject verbalized five cursor placement rules and seven fuzzy
descriptors. The following rules were used: 1) If the word is more than half of a
screen from the cursor and on the same screen or if the word is more than half of a
screen from the cursor and across the printed page then use method #1; 2) If the
word is more than 70 lines and the pattern is not distinct then use method #2; 3)
If the word is less than half of a screen and on the left half of the page use method
#3; 4) If the word is less than half of a screen and on the right half of the page
use method #4; and 5) If the word is distinct and more than 70 lines away use
method #5.
An example of the compatibility functions for the "right hand side of the
screen" descriptor elicited in the experiment is given in Figure 2. The knowledge
engineer assumed that the subject did not have the perfect cognitive ability to
divide a screen directly in half, and rather elicited the knowledge as fuzzy
knowledge. For all descriptors, the membership functions were perceived numbers
of lines or characters, except the distinct and non-distinct descriptors. The distinct
and non-distinct descriptors were given as counts of failed pattern recognitions and
served, basically, to predict the patience of the user.
Figure 2. Fuzzy descriptor for the "right hand side of the screen"
(after Karwowski et al., 1990).
where π_X(u) is the possibility distribution induced by the proposition "X is A", and A is a fuzzy set in the universe U.
The following sub-task is used to illustrate the process of predicting the
rule selection based on the linguistic inexactness of expert's actions. Sub-Task:
Move down 27 lines to a position in column 20. The following rules
(R) apply:
Rule #3: Membership value of less than half of the screen = 0.3, and
Membership value of left hand side of the line = 0.4.
[The possibility that the rule applies is 0.3 and 0.4.]
possibilistic measure of uncertainty that the subject would use Rule #1, i.e. the
CONTROL-D method. All fuzzy model predictions in the experiment were
checked against the selection rule decisions made by the subjects.
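As a rough sketch (not the authors' implementation), the prediction step can be illustrated as below: each rule's degree of applicability is the minimum of the memberships of its conditions, and the predicted method is the rule with the highest degree. The values for Rule #3 come from the sub-task above; the Rule #1 value and the rule labels are placeholders.

```python
# Each selection rule's applicability = min of its condition memberships;
# the predicted method is the rule with the highest applicability.
# Values other than Rule #3's (0.3, 0.4) are illustrative placeholders.

def rule_applicability(condition_memberships):
    return min(condition_memberships)

rules = {
    "Rule #1 (CONTROL-D)": [0.8],          # e.g. "more than half a screen down"
    "Rule #3 (RETURN key)": [0.3, 0.4],    # "less than half" AND "left hand side"
}

applicability = {r: rule_applicability(ms) for r, ms in rules.items()}
predicted = max(applicability, key=applicability.get)
print(applicability, "->", predicted)
```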
8 15 3 3,4 3
27 20 3 1 3
12 14 4 4 3
21 20 4 4 3
44 21 5 1 1
11 24 1 3 3
10 29 3 3 3
31 29 1 1 3
26 18 1 1 3
7 24 4 4 3
29 22 1 1 3
101 25 2 2 2
100 22 5 5 5
7 5 4 3 3
4 42 4 4 4
70 21 1 1 1
12 20 4 4 3
It was also observed that the use of fuzzy concepts seemed very natural
within the knowledge elicitation process. It seemed much easier to ask for fuzzy
memberships in the linguistic terms, than it would be to try and ascertain exact cut-
offs for selection rules. This observation supports the results of the study by
This was not ideal because another rule was noted (but not verbalized): "If there is
a 'very distinct' word 'near' the word to be located, search for that pattern instead. "
Table 3 shows the word number, distinctness ratings, methods actually
used, and the non-fuzzy model predictions (differentiated by using the concept of
distinctness to predict the subject's keystrokes). It is obvious that in the case of
subject #2, the results were not conclusive, and did not imply that fuzziness helps
in the GOMS modeling. The non-fuzzy model correctly predicted 55% of the
keystrokes, while the fuzzy model predicted 60% of the keystrokes. This low
rating may be due to the fact that the rules elicited were not those used, and that the
relationship between concept of distinctness and the methods used could depend on
the distance to the searched word.
1 34 Y 0.75 S S* S*
2 12 Y 0.8 S S* S*
3 52 Y 0.75 SN S S
4 116 N 0.6 SN SN* S
5 8 N 0.3 S SN SN
6 30 N 0.35 S SN SN
7 44 N 0.65 S SN S*
8 118 Y 0.4 S S* SN
9 12 N 0.55 S SN S*
10 54 N 0 SN SN* SN*
11 13 N 0 SN SN* SN*
12 16 N 0.35 S SN SN
13 171 N 0 SN SN* SN*
14 25 N 0.35 S SN SN
15 4 N 0 SN SN* SN*
16 4 Y 0.4 S S* SN
17 38 N 0.6 S SN S*
18 198 Y 0.45 SN S SN*
19 16 N 0 SN SN* SN*
20 14 Y 0.8 S S* S*
1 20 11 (55.0%) 12 (60.0%)
2 74 35 (47.0%) 63 (85.1%)
3 26 19 (73.0%) 22 (84.6%)
4 27 19 (70.4%) 21 (76.9%)
5 153 92 (60.1%) 129 (84.3%)
CONCLUSIONS
Fuzzy methodologies can be very useful in the analysis and design of man-machine systems in general, and human-computer interaction systems in particular, by allowing one to model the vague and imprecise relationship between the user
and computer. In order for this premise to succeed, one must identify the sources
of fuzziness in the data and communication schemes relevant to the human-
computer interaction. By incorporating the concept of fuzziness and linguistic
inexactness based on possibility theory into the model of system performance,
better performance prediction for human-computer systems may be achieved.
The imprecision-tolerant communication scheme for human-computer interaction
ACKNOWLEDGEMENTS
We are indebted to Mrs. Laura Abell, Secretary at the Center for Industrial
Ergonomics, University of Louisville, for her work on preparation of the
manuscript.
REFERENCES
AUDLEY, R. J., ROUSE, W., SENDERS, T., and SHERIDAN, T. 1979, Final report of mathematical modelling group, in N. Moray (ed.), Mental Workload: Its Theory and Measurement, (Plenum Press, New York), 269-285.
BENSON, W. H. 1982, in Fuzzy Sets and Possibility Theory, R. R. Yager (ed.), (Pergamon Press, New York).
BERNOTAT, R. 1984, Generation of ergonomic data and their application to equipment design, in H. Schmidtke (ed.), Ergonomic Data for Equipment Design, (Plenum Press, New York), 57-75.
BEZDEK, J. 1981, Pattern Recognition with Fuzzy Objective Function Algorithms (Plenum Press: New York).
BOY, G. A., and KUSS, P. M. 1986, A fuzzy method for modeling of human-computer interactions in information retrieval tasks, in W. Karwowski and A. Mital (eds.), Applications of Fuzzy Set Theory in Human Factors, (Elsevier: Amsterdam), 117-133.
BROWNELL, H. H. and CARAMAZZA, A. 1978, Categorizing with overlapping categories, Memory and Cognition, 6, 481-490.
CARD, S. K., MORAN, T. P., and NEWELL, A. 1983, The Psychology of Human-Computer Interaction (London: Lawrence Erlbaum Associates).
LAKOFF, G. 1973, A study in meaning criteria and the logic of fuzzy concepts, Journal of Philosophical Logic, 2, 458-508.
MAMDANI, E. H., and GAINES, B. R. (eds.), 1981, Fuzzy Reasoning and Its Applications, (Academic Press: London).
MCCLOSKEY, M. E. and GLUCKSBERG, S. 1978, Memory and Cognition, 6, 462-472.
MCCORMICK, E. J. 1970, Human Factors Engineering, (McGraw-Hill, New York).
ODEN, G. C. 1977, Human perception and performance, Journal of Experimental Psychology, 3, 565-575.
PEW, R. W. and BARON, S. 1983, Automatica, 19, 663-676.
ROHMERT, W. 1979, in N. Moray (ed.), Mental Workload: Its Theory and Measurement, (Plenum Press, New York), 481.
SAATY, T. L. 1977, Exploring the interface between hierarchies, multiple objectives and fuzzy sets, Fuzzy Sets and Systems, 1, 57-68.
SCHMUCKER, K. J. 1984, Fuzzy Sets, Natural Language Computations, and Risk Analysis, (Computer Science Press, Maryland).
SIMCOX, W. A. 1984, A method for pragmatic communication in graphic displays, Human Factors, 26, 483-487.
SINGLETON, W. T. 1982, The Body at Work: Biological Ergonomics, (University Press, Cambridge).
SMITHSON, M. 1982, Applications of fuzzy set concepts to behavioral sciences, Mathematical Social Sciences, 2, 257-274.
SMITHSON, M. 1987, Fuzzy Set Analysis for Behavioral and Social Sciences (Springer-Verlag: New York).
TERANO, T., MURAYAMA, Y., AKIYAMA, N. 1983, Human reliability and safety evaluation of man-machine systems, Automatica, 19, 719-722.
TOPMILLER, D. A. 1981, in Manned Systems Design: Methods, Equipment and Applications, J. Moraal and K. F. Kraiss (eds.), (Plenum Press, New York), 3-21.
WILLAEYS, D. and MALVACHE, N. 1979, in Advances in Fuzzy Set Theory and Applications, M. M. Gupta, R. K. Ragade and R. R. Yager (eds.), (North-Holland, Amsterdam).
ZADEH, L. A. 1965, Fuzzy sets, Information and Control, 8, 338-353.
ZADEH, L. A. 1973, Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. Systems, Man, and Cybernetics, SMC-3, 28-44.
ZADEH, L. A. 1974, Numerical versus linguistic variables, Newspaper of the Circuits and Systems Society, 7, 3-4.
ZADEH, L. A. 1978, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems, 1, 3-28.
ZIMMERMANN, H. J. 1980, Testability and meaning of mathematical models in social sciences, Mathematical Modeling, 1, 123-139.
ZIMMERMANN, H. J. 1985, Fuzzy Set Theory and Its Applications, (Kluwer-Nijhoff Publishing, Boston).
10
QUESTIONNAIRES AND FUZZINESS
Bernadette Bouchon-Meunier
CNRS, LAFORIA, Université Paris VI, Tour 46
4 place Jussieu, 75252 Paris Cedex 05, France
INTRODUCTION
Questionnaires represent hierarchical processes disjoining the elements of a
given set by using successive tests or operators [12]. They involve the probabilities
of the results of the tests, or the probabilities of the modalities of the operators. In the
case where the tests or operators depend on imprecise factors, such as the accuracy of
physical measurements or the linguistic description of variables, the questionnaires
take into account coefficients evaluating the fuzziness of the data. The construction
of such questionnaires is submitted to several kinds of constraints and requires
appropriate algorithms.
When the involved tests or operators are not precisely described, arborescent
questionnaires must take into account both uncertainty and imprecision and they
must lead to conclusions which are acceptable in spite of the imprecision. We present
here several utilizations of questionnaires in a fuzzy framework.
We also consider a set Q = {q1, ..., qm} of so-called "questions", which represent tests or operators. A question qi is a link between a linguistic variable Xi defined on a universe Ui and a family of a(i) labels, denoted by qi1, ..., qia(i), and associated with possibility distributions fi1, ..., fia(i), defined on Ui and lying in [0, 1] (see Figure 1).
(Figure 1: examples of possibility distributions fi1, ..., fia(i) associated with the labels of a question; continuous distributions over "redness of the skin" and discrete distributions over labels such as pallor, normality and flushedness.)
Two different types of problems can be regarded, depending on whether the questions of Q are deterministic or not with regard to the elements of D:
- either there is a possibilistic relationship between lists of answers to questions of
Q and the elements of D, yielding the possibility of d E D to be concerned in a
studied situation, according to the obtained answers, and the certainty we can have
in this assertion [7]. We construct a questionnaire by successively choosing
questions of Q bringing as much information as possible on the elements of D and
we stop asking new questions when an element of D is sufficiently well identified
(selective construction).
- or there is a precise relationship between lists of answers to questions of Q and
elements of D, and we construct a questionnaire by ordering the questions of Q in
such a way that every element of D can be associated with a terminal vertex of the
questionnaire (holistic construction) [1, 3].
Let us suppose given the possibility π(dj / qik) that we are in front of the case dj of D, for 1 ≤ j ≤ n, when we obtain the label qik for question qi, for 1 ≤ i ≤ m, 1 ≤ k ≤ a(i). As there is no absolute certainty that this answer implies that dj must be identified, we also suppose given the necessity N(dj / qik) quantifying this certainty.
We can also suppose given some knowledge about the fact that the element dj can be thought of when an answer different from qik is obtained to question qi: let π(dj / ¬qik) and N(dj / ¬qik) denote the possibility and the certainty that dj is acceptable when qik is not obtained. If these values are not precisely known, they will be replaced [10] by the interval [0, 1] to which they belong.
We fix thresholds s and t in [0, 1], defining the acceptable values [s, 1] and [t, 1] for the lowest acceptable possibility and the lowest acceptable certainty of an element of D to be satisfying when given labels are obtained for a question.
diagnosis assistance, in species identification for instance. The first step corresponds to the construction of the sequence of questions providing the best recognition of classes on a training set of examples; the second step is associated with the identification of the convenient class for an example not belonging to the training set.
The first question to be asked will be qi, for 1 ≤ i ≤ m, processing the most efficient information about the elements of D, and we propose to evaluate this efficiency by means of the average certainty Cer(qi) provided by qi on the recognition of any element of D.
Then, the first question to be asked will be qi such that Cer(qi) is maximum.
(Figure 2: fragment of a questionnaire whose terminal vertices identify elements of D, e.g. d1 = Poisoning.)
about D, with regard to all its possible labels, or, equivalently, which gives the
highest absolute certainty C( qi ).
For a new given particular situation c0, an element of D must be identified from the answers to the various questions of the questionnaire we have constructed.
As the labels associated with every question are not precise, we must accept that an answer is provided in a way somewhat different from the expressions we expect in the list of authorized labels. Let us denote by q'i the label obtained as an answer to question qi, 1 ≤ i ≤ m, more or less different from all the qik, 1 ≤ k ≤ a(i), and by gi the possibility distribution describing q'i, defined on Ui and lying in [0, 1] (see Figure 3).
(Figure 3: possibility distribution gi of an answer q'i compared with the label distributions fik.)
The compatibility of this answer q'i with one of the labels qik proposed for qi, with 1 ≤ k ≤ a(i), is measured by the classical possibility and necessity measures of adequation [9], respectively denoted by π(qik ; q'i) and N(qik ; q'i).
For the particular situation c0, every element dj of D receives a possibility πk(dj) of being concerned, according to the proximity of its answer with qik, and the certainty of this assertion Nk(dj). This evaluation will be performed for the labels qik such that π(qik ; q'i) ≥ s and N(qik ; q'i) ≥ t. It is then possible to have several sequences of questions to use, i.e. several paths of the questionnaire to follow, before the recognition of a particular element of D.
The element djo will be definitely identified for the situation c0, by means of the sequence of questions Sr, if there exist labels x1k(1), ..., xrk(r) yielding
Pos(djo / x1k(1), ..., xrk(r)) ≥ s
and
Nec(djo / x1k(1), ..., xrk(r)) ≥ t.
Let us suppose that we want to use all the tests or operators of Q, and we have to order them in such a way that the questionnaire we construct associates an element of D with each terminal node. The questionnaire could be arborescent or not. We suppose that Q and D are compatible, which means that such a construction is possible.
The problem we consider is the choice of the questions providing the most efficient questionnaire with regard to the recognition to make. Its quality can be evaluated [3] with respect to the fuzziness which is involved in the characterizations deduced from the fuzzy tests or operators, and improved, when several constructions of questionnaires are possible, by an appropriate choice of the order of some questions when possible. Applications can be found in search trees, in species identification, for instance.
Several aspects of such a choice can be proposed [3, 4, 6] and we propose
one method hereunder.
Let us suppose that the labels qik associated with the questions qi of Q are conveniently defined in such a way that they determine a fuzzy partition Πi of the universe Ui on which the concerned linguistic variable Xi is defined. The classes of this fuzzy partition are fuzzy subsets of Ui defined by membership functions equal to fik, 1 ≤ k ≤ a(i), in every point of Ui. We suppose given the probability distribution pi of the variable Xi for the studied population.
The problem we consider is the identification of a crisp (non-fuzzy) partition of Ui, able to represent the information contained in Πi. We may think of several applications of this problem: in knowledge acquisition, if the training set deals with crisp data and then non-fuzzy tests or operators, and the new examples are described by means of fuzzy questions; in decision-making, when a crisp decision must be taken from fuzzy tests or operators, or from the answers provided by the inquired person to a crisp question qi by indicating preference grades for the elements qik which are proposed to her; in preference elicitation, when the inquirer makes a choice between two fuzzy questions about the same variable.
For a given threshold r in [0, 1], we associate with Πi a crisp partition Πi* of level r, by defining crisp classes as qik* = { u | fik(u) ≥ r }, 1 ≤ k ≤ a(i). Obviously,
such a crisp partition does not exist for every value of r and some thresholds correspond to several possible crisp partitions. We suppose that the tests or operators are defined in such a way that there always exists a value r providing a crisp partition. We can consider the average weight of each fuzzy label by introducing its r-probability pir(qik) as the average value of its associated possibility distribution, for the values at least equal to r.
This generalization of the concept of probability to a fuzzy subset of the universe allows us to measure the fuzzy information Iir(Πi) processed by Πi for the threshold r with respect to the crisp partition Πi*. We use this tool as a measure of the proximity between Πi and Πi*.
Let us consider the case where we are given a set of fuzzy operators Q and we look for the crisp partition associated with each of them, losing as little information as possible when passing from fuzzy descriptions to crisp descriptions.
For every qik associated with the fuzzy partition Πi of Ui, we choose the crisp partition Πi* such that the fuzzy information Iir(Πi), processed by Πi for the threshold r with respect to Πi*, is maximum.
If several tests or operators are available for the same linguistic variable Xi on Ui, the most interesting is the one processing the greatest absolute fuzzy information with regard to all the possible crisp partitions which could be associated with it.
REFERENCES
[1] AKDAG, H., BOUCHON, B. (1988) - Using fuzzy set theory in the analysis of structures of information, Fuzzy Sets and Systems, 3, 28.
[2] AURAY, J.P., DURU, G., TERRENOIRE, M., TOUNISSOUX, D., ZIGHED, A. (1985) - Un logiciel pour une méthode de segmentation non arborescente, Informatique et Sciences Humaines, vol. 64.
[3] BOUCHON, B. (1981) - Fuzzy questionnaires, Fuzzy Sets and Systems, 6, pp. 1-9.
[4] BOUCHON, B. (1985) - Questionnaires in a fuzzy setting, in Management decision support systems using fuzzy sets and possibility theory, eds. J. Kacprzyk and R.R. Yager, Verlag TÜV Rheinland, 189-197.
[5] BOUCHON, B. (1987) - Preferences deduced from fuzzy questions, in Soft optimization models using fuzzy sets and possibility theory (J. Kacprzyk and S.A. Orlovski, eds.), D. Reidel Publishing Company, pp. 110-120.
[6] BOUCHON, B. (1988) - Questionnaires with fuzzy and probabilistic elements, in Combining fuzzy imprecision with probabilistic uncertainty in decision making (J. Kacprzyk, M. Fedrizzi, eds.), Springer Verlag, pp. 115-125.
[7] BOUCHON, B. (1990) - Sequences of questions involving linguistic variables, in Approximate reasoning tools for artificial intelligence (M. Delgado, J.L. Verdegay, eds.), Verlag TÜV Rheinland.
[8] BOUCHON, B., COHEN, G. (1986) - Partitions and fuzziness, Journal of Mathematical Analysis and Applications, vol. 113, 1986.
[9] DUBOIS, D., PRADE, H. (1987) - Théorie des possibilités, applications à la représentation des connaissances en informatique, Masson.
[10] FARRENY, H., PRADE, H., WYSS, E. (1986) - Approximate reasoning in a rule-based expert system using possibility theory: a case study, in Information Processing (H.J. Kugler, ed.), Elsevier Science Publishers B.V.
[11] PAYNE, R. (1985) - Genkey: a general program for constructing aids to identification, Informatique et Sciences Humaines, vol. 64.
[12] PICARD, C.F. (1980) - Graphs and questionnaires, North Holland, Amsterdam.
[13] TERRENOIRE, M. (1970) - Pseudoquestionnaires et information, C.R. Acad. Sci. 271 A, pp. 884-887.
[14] TERRENOIRE, M. (1970) - Pseudoquestionnaires, Thèse de Doctorat d'Etat, Lyon.
[15] WYSS, E. (1988) - TAIGER, un générateur de systèmes experts adapté au traitement de données incertaines et imprécises, Thèse, Institut National Polytechnique de Toulouse.
Annex 1:
Possibility and necessity coefficients associated with every element dj of D, when the label qik is obtained as an answer to the test or the operator qi of Q, are respectively denoted by π(dj / qik) and N(dj / qik). They belong to [0, 1] and they are such that N(dj / qik) ≤ π(dj / qik), with N(dj / qik) = 0 if π(dj / qik) < 1 and π(dj / qik) = 1 if N(dj / qik) ≠ 0.
The possibility Pos(djo / x1k(1), ..., xrk(r)) that the element djo of D must be identified, and the certainty Nec(djo / x1k(1), ..., xrk(r)) that this identification is satisfying, when labels x1k(1), ..., xrk(r) are obtained as answers to tests or operators x1, ..., xr, are evaluated by means of the following coefficients:
Pos(djo / x1k(1), ..., xrk(r)) = min(1≤i≤r) π(djo / xik(i)),   (2)
Nec(djo / x1k(1), ..., xrk(r)) = max(1≤i≤r) N(djo / xik(i)).   (3)
We define as follows the average certainty Cer(x1k(1), ..., xsk(s)), provided about the recognition of any element of D, by a sequence of labels x1k(1), ..., xsk(s) obtained as answers to tests or operators x1, ..., xs of Q:
Cer(x1k(1), ..., xsk(s)) = Σ(1≤j≤n) δj Nec(dj / x1k(1), ..., xsk(s)) pj,   (4)
with δj = 1 if Pos(dj / x1k(1), ..., xsk(s)) ≥ s, and 0 otherwise.
Possibility measure of the adequation of any answer q'i with a given label qik, for a question (test or operator) qi of Q:
π(qik ; q'i) = sup(u in Ui) min(fik(u), gi(u)), 1 ≤ k ≤ a(i),   (6)
Necessity measure of this adequation:
N(qik ; q'i) = inf(u in Ui) max(fik(u), 1 − gi(u)), 1 ≤ k ≤ a(i).   (7)
For the particular situation c0, the possibility πk(dj) that dj is concerned according to the proximity of the obtained answer q'i with qik, and the certainty of this assertion Nk(dj), will be evaluated by the following coefficients [10, 15]:
πk(dj) = max[ min{π(dj / qik), π(qik ; q'i)}, min{π(dj / ¬qik), 1 − N(qik ; q'i)} ],   (8)
Nk(dj) = min[ max{N(dj / qik), 1 − π(qik ; q'i)}, max{N(dj / ¬qik), N(qik ; q'i)} ].   (9)
As indicated in [10], the values can be replaced by the interval to which they belong in the case where they are not precisely known.
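A small Python sketch of the adequation measures (6)-(7) on a discretized universe is given below; the representation of the distributions as dictionaries and the example data are assumptions made for illustration.

```python
# A minimal sketch of the adequation measures, for possibility distributions
# given on a finite universe as dicts u -> degree:
#   pi(f, g) = sup_u min(f(u), g(u))        (possibility of matching)
#   N(f, g)  = inf_u max(f(u), 1 - g(u))    (necessity of matching)

def possibility(f, g):
    U = set(f) | set(g)
    return max(min(f.get(u, 0.0), g.get(u, 0.0)) for u in U)

def necessity(f, g):
    U = set(f) | set(g)
    return min(max(f.get(u, 0.0), 1.0 - g.get(u, 0.0)) for u in U)

label_redness = {0: 0.0, 1: 0.4, 2: 1.0, 3: 0.6, 4: 0.0}   # f_ik (a label)
answer        = {1: 0.2, 2: 0.8, 3: 1.0, 4: 0.3}           # g_i  (answer q'_i)
print(possibility(label_redness, answer))  # 0.8
print(necessity(label_redness, answer))    # 0.6
```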
Annex 2
A fuzzy partition of the universe Ui on which the concerned linguistic variable Xi is defined satisfies:
Σ(1≤k≤a(i)) fik(u) = 1, for every point u in Ui,
and Σ(u in Ui) fik(u) > 0, for every k, 1 ≤ k ≤ a(i).
The r-probability pir(qik) of a fuzzy label qik with regard to the crisp class qik* is defined by:
pir(qik) = Σ(u in qik*) fik(u) pi(u).
The fuzzy information Iir(Πi) processed by a fuzzy partition Πi of Ui for the threshold r with respect to the crisp partition Πi* is defined as follows:
Iir(Πi) = Σ(1≤k≤a(i)) L(pir(qik)) / [ Σ(1≤k≤a(i)) pir(qik) ],
with the function L(x) = −x log(x).
Properties of this fuzzy information lead to its maximization in order to have the best compatibility between a fuzzy partition and any possible associated crisp partition for a given threshold r.
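The definitions of Annex 2 can be sketched directly in Python for a finite universe; the data and function names below are illustrative assumptions, not part of the original text.

```python
import math

# r-probability and fuzzy information of Annex 2 on a finite universe:
# f maps each label to {u: membership}, p maps u to its probability, and the
# crisp class of a label at threshold r is {u : f_label(u) >= r}.

def r_probability(f_label, p, r):
    crisp_class = [u for u, d in f_label.items() if d >= r]
    return sum(f_label[u] * p[u] for u in crisp_class)

def fuzzy_information(f, p, r):
    L = lambda x: -x * math.log(x) if x > 0 else 0.0
    pr = [r_probability(f_label, p, r) for f_label in f.values()]
    return sum(L(x) for x in pr) / sum(pr)

# Two labels forming a fuzzy partition of U = {u1, ..., u4} (degrees sum to 1).
f = {"low":  {"u1": 1.0, "u2": 0.7, "u3": 0.2, "u4": 0.0},
     "high": {"u1": 0.0, "u2": 0.3, "u3": 0.8, "u4": 1.0}}
p = {"u1": 0.25, "u2": 0.25, "u3": 0.25, "u4": 0.25}
print(fuzzy_information(f, p, r=0.5))
```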
Elie Sanchez
Faculty of Medicine, University of Marseille, and
*Neurinfo Research Department
Institut Méditerranéen de Technologie
13451 Marseille Cedex 13, France
ABSTRACT
This tutorial paper has been written for biologists, physicians or beginners
in fuzzy sets theory and applications. This field is introduced in the framework of
medical diagnosis problems. The paper describes and illustrates with practical
examples, a general methodology of special interest in the processing of borderline
cases, that allows a graded assignment of diagnoses to patients. A pattern of medical
knowledge consists of a tableau with linguistic entries or of fuzzy propositions.
Relationships between symptoms and diagnoses are interpreted as labels of fuzzy
sets. It is shown how possibility measures (soft matching) can be used and combined
to derive diagnoses after measurements on collected data.
The concepts and methods are illustrated in a biomedical application on
inflammatory protein variations. In the case of poor diagnostic classifications, appropriate weightings are introduced, acting on the characterizations of proteins, in order to decrease their relative influence. As a consequence, when pattern matching is
achieved, the final ranking of inflammatory syndromes assigned to a given patient
might change to better fit the actual classification. Defuzzification of results (i.e. diagnostic groups assigned to patients) is performed as a non-fuzzy set partition issued from a "separating power", and not by the center of gravity method commonly employed in fuzzy control.
A model of a fuzzy connectionist expert system is then introduced, in which
an artificial neural network is designed to build the knowledge base of an expert
system, from training examples (this model can also be used for specifications of
rules in fuzzy logic control). Two types of weights are associated with the
connections: primary linguistic weights, interpreted as labels of fuzzy sets, and
secondary numerical weights. Cell activation is computed through MIN-MAX fuzzy
equations of the weights. Learning consists in finding the (numerical) weights and
the network topology. This feedforward network is described and illustrated in the
same biomedical domain as in the first part.
INTRODUCTION
In many situations, physicians use subjective or intuitive judgments. They
cannot always logically, or in simple terms, explain how they derive conclusions,
because of the complex mental processes inherent to the nature of the cases to be
diagnosed, or to the difficulty of recalling their years of training and experience.
Interpretation of biological analyses suffers from some arbitrariness,
particularly at the boundaries of the quantities that are measured, or evaluated. It is
customary to use symbols like +++, ++, +, N, −, − −, − − −, or ↑↑↑, ↑↑, ↑, N, ↓, ↓↓, ↓↓↓, to denote variations ('N' stands for 'Normal'). In general, limits of values
that characterize abnormalities, or normality, define numerical intervals that are used
to describe standards in variations. First of all, normal or non pathological states,
have to be determined. They constitute the reference to which abnormalities are
specifIed.
Biologists are familiar with normal variation ranges that are a prerequisite to
a proper interpretation of all laboratory tests. Notions of statistical normality are
usually derived from frequency distributions, not always confined to Gaussian
distributions. But, depending on the measurement procedures of a given laboratory,
on the epidemiologist, the biologist or the clinician who manipulates and interprets
measurements, but also on the nature of the populations under study, and on
conditions of physiological (biological) normality, one has to commonly rely on
fiducial limits (see for example [1,2] for discussions on normality).
The main drawback in working with intervals to represent normality, or
ranges of variations for abnormalities, is the weak reliability on thresholds.
Moreover, such boundaries are more or less physician dependent in practice. For
example [3], "the normal base-line value for a given individual's lactic acid
dehydrogenase may be at the extreme low point of the normal range for the general
populations. Thus he (the physician) could develop an elevation due to a disease
process that is significant and still within the normal range of the population." Still
in [3], under a table defining the range of normal values for blood chemistry, one
may read: "these ranges are a guide to the normal concentrations of blood
constituents. For accurate interpretations, always refer to normal values established
by individual laboratories, since individual differences in procedures may affect the
actual ranges."
A problem that is often posed lies in the ill-definition and in the treatment
of the boundaries of the intervals. To cope with borderline cases, fuzzy set theory
provides very natural and appropriate tools. So it is here assumed that imprecision in
the description of variations is of a fuzzy type and terms like "Normal, Slightly
Decreased, Very Increased, etc.," will be treated as labels of fuzzy sets in (possibly
different) universes of discourse. These fuzzy sets represent linguistic intervals, and
around cutoff boundaries, very close points will not be totally accepted or rejected
like in yes-or-no procedures, according to their position with respect to the frontier.
A coding with ↑'s or ↓'s is sometimes too restrictive: it is not always possible to choose between ↑ and ↑↑ for example, and in some patterns one may find "from ↑ to ↑↑." A scale with degrees ranging from 0 to 1 is very convenient. Note
that it is not needed to set up precise values in [0,1] : in interpreting patterns, it is
sufficient to have a rough idea of the curve expressing the compatibility between
measurements and concepts.
A general methodology, illustrated with an application, will now be described; it is of special interest in the processing of borderline cases and offers the physician practical assistance in obtaining the same results for the same abnormal profiles.
(Tableau: pattern of medical knowledge, with serum proteins S1, ..., Sn as columns, diagnostic groups as rows, and fuzzy sets F1, ..., Fn as entries.)
In the Pattern of Medical Knowledge, the fuzzy sets are fuzzy intervals that extend the definition of
usual (crisp) intervals. Fuzzy intervals are here of three types, they fuzzify crisp
intervals and they mean "fuzzily greater (or smaller)" than a given value a, or "fuzzily
between" two values b and c (for example fuzzy intervals representing NORMAL
ranges are usually of this last type). Values like a, b, c, have a grade of membership
equal to 0.5. In particular, a fuzzy [b,c]-type interval can reduce to a fuzzy number D,
meaning "around a value d" (see fig.4). In this case, the bandwidth is the separation
between the two values having a 0 .5 grade of membership, it is a convenient
fuzziness indicator of the fuzzy number D.
Fig. 4 - Fuzzy number D, meaning "around d."
(Figure: possibility measure π(Fi, Di) of the fuzzy datum Di, around the measured value di, against the fuzzy set Fi for protein Si.)
RELATIVE WEIGHTING
Practically, some attributes might be less important than others in the characterization of a diagnosis. For a given diagnosis, relative importance among attributes can be translated by means of weights (α, β, γ, ...) ranging in [0, 1]. A value "0" weight assigned to an attribute means that this attribute is not important at all in the evaluation of the diagnosis and hence it can be deleted, whereas a value "1" weight does not modify the importance of the protein. Intermediate grades of importance can be tuned by adjusting values of weights within the unit interval.
In the pattern, fuzzy propositions ("S is F," in the generic form) characterizing a given group appear as conjunctions (ANDs). Assignment of a weight α to take into account the relative importance of protein variations can assume the following form [6,7], for F a fuzzy set in a universe of discourse U:
F^α = MAX(1 − α, F),
i.e. ∀x ∈ U, μ_F^α(x) = MAX[(1 − α), μ_F(x)].
Generally, a t-conorm could replace the MAX operator in the above formula [8]. Limit cases have the following meanings:
α = 0: ∀x ∈ U, μ_F^0(x) = 1, i.e. F^0 is neutral for conjunctions and therefore it can be deleted;
α = 1: ∀x ∈ U, μ_F^1(x) = μ_F(x), i.e. the weight has no effect.
In the case of Vasculitis, the following weights have been assigned, yielding the modified rule:
Vasculitis IF C3-Complement Fraction is (Decreased or Normal)^0.1
AND Alpha-1-Antitrypsine is (Decreased or Normal)^1.0
AND Orosomucoid is (Increased)^0.8
AND Haptoglobin is (Very Increased)^0.3
AND C-Reactive Protein is (Very Increased)^0.8.
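As a rough sketch (not the paper's own computation), the effect of such weights on a MIN aggregation can be illustrated as below. Here the weight is applied directly to the matching scores, a simplification of the procedure in the text (where the weight modifies the fuzzy set before matching), so the weighted figures differ slightly from those reported later for this example.

```python
# Relative weighting F^a = max(1 - a, F): a = 0 makes a condition neutral for
# a MIN (AND) aggregation, a = 1 leaves it unchanged. The weight is applied
# here to the matching scores themselves, which is only an approximation.

def matching_degree(condition_degrees):
    return min(condition_degrees)   # conjunction of the conditions

# Possibility degrees of one patient's proteins against the Vasculitis rule
# (see the illustrative example later in the text), and the rule's weights.
raw = {"C3": 0.04, "A1AT": 1.0, "Oro": 0.82, "Hpt": 1.0, "CRP": 1.0}
weights = {"C3": 0.1, "A1AT": 1.0, "Oro": 0.8, "Hpt": 0.3, "CRP": 0.8}

weighted = {k: max(1.0 - weights[k], raw[k]) for k in raw}
print(matching_degree(raw.values()))       # 0.04 -> Vasculitis rejected
print(matching_degree(weighted.values()))  # ~0.82 here; the text reports 0.85
# because there the weight modifies the fuzzy set before matching.
```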
Note that C3-Complement Fraction could have been neglected (weight close to 0) and that no weight might be assigned to "Decreased or Normal" in Alpha-1-Antitrypsine (weight equal to 1, i.e. no effect of the weight). For example, the modified fuzzy variations of Haptoglobin (with weight 0.3 to "Very Increased"), and the corresponding modified fuzzy measure, are presented in Figure 6.
(Figure 6: membership function μ_F of "Very Increased" for Haptoglobin and its weighted modification F^0.3.)
DEFUZZIFICATION
If the patients are to be assigned non-fuzzy diagnoses, we must defuzzify the fuzzy set D of diagnoses that has been evaluated following a MIN aggregation. For this purpose, one may use the concept of separating power [9], which is different from the center of gravity method commonly employed in fuzzy control. The separating power s(D) allows one to evaluate to which extent a fuzzy set, like D, of a universe of discourse U (U is here the set of the given diagnoses under study), separates optimally U into a non-fuzzy partition (A, A'), where A' is the complement set of A. The set A is defined as follows:
s(D) = D • A = sup {D • B such that B ⊆ U, B ≠ ∅}, in which
D • B = | card(D_B) / card(B) − card(D_B') / card(B') |,
where D_B denotes the restriction of D to B, card(B) is the cardinality of B, and card(D_B) is the fuzzy cardinality of D_B; for example, card(D_B) = Σ(λ ∈ B) D(λ).
Applying the separating power to the fuzzy set D, one derives the optimal partition ((A, A') above) of D. A is finally the (non-fuzzy) set of diagnoses assigned to patients.
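A brute-force sketch of this defuzzification over a small, finite set of diagnoses is given below; the diagnosis names and degrees are illustrative, and sigma-counts are used for the fuzzy cardinalities as in the example above.

```python
from itertools import combinations

# Separating power: for a fuzzy set D over a finite universe U of diagnoses,
# each nonempty proper subset B scores |card(D|B)/card(B) - card(D|B')/card(B')|
# with sigma-count cardinalities; the best-scoring B is the crisp diagnosis set.

def separating_power(D):
    U = list(D)
    best_score, best_B = -1.0, None
    for r in range(1, len(U)):
        for B in combinations(U, r):
            Bc = [u for u in U if u not in B]
            score = abs(sum(D[u] for u in B) / len(B)
                        - sum(D[u] for u in Bc) / len(Bc))
            if score > best_score:
                best_score, best_B = score, set(B)
    return best_score, best_B

D = {"Vasculitis": 0.85, "Collagen": 0.43, "Normal": 0.05, "Nephrotic": 0.10}
print(separating_power(D))  # selects the diagnoses standing out from the rest
```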
(Linguistic pattern, row for Vasculitis: C3 Decreased or Normal, A1AT Decreased or Normal, Orosomucoid Increased, Haptoglobin Very Increased, C-Reactive Protein Very Increased.)
The fuzzy sets corresponding to this linguistic pattern have been established for each
entry of the tableau (one of its rows is in Table 8).
Table 8 - Fuzzy sets characterisation of Vasculitis in the P.B.I.S. pattern.
In this study, fuzzy numbers issued from measurements over patients have
been compared with the corresponding fuzzy sets in the P.B.I.S. pattern, by means of
244
compatibility measures or indexes π(F, D), P(F, D) and V(F, D), illustrated in Fig. 9 for a fuzzy number D around a measured value d.
Fig. 9 - Compatibility measures or indexes.
For each of the eleven groups, comparison of a patient's condition with the pattern yields five (one for each protein) triples of numbers (Vi, Pi, Xi), i = 1, ..., 5, which are aggregated by means of the MIN operator, expressing conjunctions:
(V, P, X) = (MINi Vi, MINi Pi, MINi Xi). Finally, for each patient, one has three different rankings of diagnoses derived from V, P, X.
ILLUSTRATIVE EXAMPLE
This patient case, which we report from [4], has been medically diagnosed as Vasculitis. The protein profile of this patient is given as follows.
                  C3     A1AT    Oro    Hpt    CRP
raw data (g/l)    1.50   2.26    1.75   10.0   0.060
normalized data   1.85   1.00    1.99   5.59   10.0
For simplicity, we only present the matching results from the possibility measure (the Xi's) and the corresponding crisp partition (A, A') that has been found to be associated with the fuzzy set of diagnostic groups D.
A = (Collagen Diseases)   (mini Xi = 0.43)   s(D) = 0.40.
A' consists of all the remaining diagnostic groups. Vasculitis does not appear here, for one of the possibility measures, X1 = X(Vasculitis, C3), is nearly equal to zero. Hence, the MIN operator acting on the Xi's produces a value practically equal to zero, whatever values are computed from the other possibility measures associated with Vasculitis. In fact, for Vasculitis, the possibility measure results are as follows.
          C3     A1AT   Oro    Hpt    CRP    mini Xi
Xi's      0.04   1      0.82   1      1      0.04
The four proteins (A1AT, Oro, Hpt and CRP) have a high grade of matching and Vasculitis is rejected because of the only mismatch due to C3. But as already pointed out, in the case of Vasculitis, C3 can be nearly neglected (weight equal to 0.1). Hence, in a weighted process, one will derive:
A = (Vasculitis)   (mini Xi = 0.85)   s(D) = 0.78.
The right diagnostic group of Vasculitis appears now, and it is computed with a better separating power (0.78) than in the case of a non-weighted process (0.40).
We recall now the weights of importance associated with the five proteins in the characterisation of Vasculitis. For this patient's case, we also give the matching results in the non-weighted process, followed by the ones of the weighted process, using only the Xi's.
                        C3     A1AT   Oro    Hpt    CRP    mini Xi
Weights (Vasculitis)    0.1    1.0    0.8    0.3    0.8
Xi's (non-weighted)     0.04   1      0.82   1      1      0.04
Xi's (weighted)         0.9    1      0.85   1      1      0.85
a numerical weight equal to 1). As soon as a connection is issued from an input cell, a linguistic weight exists, but a numerical weight does not always exist, in the case of no hidden cell. Hidden cells have only numerical weights associated with connections towards output cells.
Fig. 12 - General connection with a primary (linguistic) weight
and a secondary (numerical) weight
Input cells can take on numerical values or fuzzy numbers, in their underlying universe of discourse. When the input cells Sj's are given, the output cells Ai's are computed according to the following formula (combination of weights for inferencing):
Ai = MINj MAX [ bij, μ_wij(dj) ]   for numerical dj's,
or else, Ai = MINj MAX [ bij, π(wij, Dj) ]   for fuzzy numbers Dj's,
where: - dj's are numerical values assigned to the Sj's,
- Dj's are fuzzy numbers meaning "around dj,"
- μ_wij(dj) is the grade of membership of dj in wij,
- π(wij, Dj) is the possibility measure of Dj GIVEN wij.
Of course, a mixed formula for an Ai can involve both numerical dj's and fuzzy numbers Dj's, and in the above formula, t-norms and t-conorms could replace the MIN and MAX operators, respectively.
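A minimal sketch of this MIN-MAX activation is given below; the trapezoidal membership functions and the numerical weights are illustrative assumptions, not the actual P.B.I.S. parameters.

```python
# MIN-MAX activation: each connection from an input cell S_j carries a
# linguistic weight w_ij (a membership function) and a numerical weight b_ij;
# the output cell activation is A_i = MIN_j MAX(b_ij, mu_wij(d_j)).

def trapezoid(a, b, c, d):
    """Fuzzy interval: 0 outside (a, d), 1 on [b, c], linear in between."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)
    return mu

def activation(inputs, connections):
    """connections: list of (mu_w, b) pairs; inputs: matching list of d_j."""
    return min(max(b, mu_w(d)) for (mu_w, b), d in zip(connections, inputs))

normal_C3 = trapezoid(0.6, 0.8, 1.2, 1.4)        # illustrative 'normal' interval
increased_CRP = trapezoid(2.0, 4.0, 50.0, 60.0)  # illustrative 'increased' interval
connections = [(normal_C3, 0.2), (increased_CRP, 0.1)]   # (mu_w, b) per input
print(activation([1.0, 10.0], connections))   # 1.0: both conditions satisfied
```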
Let us consider now training examples, i.e. for an Ai, it is given the corresponding Sj's connected to it.
BIOMEDICAL APPLICATION
We now illustrate the fuzzy connectionist network, using the same biomedical domain as before: inflammatory protein variations. We consider the same five proteins: C3-Complement Fraction (C3), Alpha-1-Antitrypsine (A1AT), Orosomucoid (Om), Haptoglobin (Hpt), C-Reactive Protein (CRP) and, for simplicity, only four diagnostic groups composed of the Normal condition and of three biological inflammatory syndromes: Bacterial Infection, Vasculitis, Nephrotic Syndromes.
The P.B.I.S. network we present here is depicted in Fig. 13, in which the five proteins correspond to the input cells S_1, ..., S_5 and the four groups to the output cells A_1, ..., A_4. There are seven hidden cells associated with numerical weights. The linguistic weights have the following meaning.
w_11: normal,
w_12: normal,   w_22: increased,   w_32: decreased or normal,   w_42: decreased or normal,
w_13: normal,   w_23: increased,   w_33: increased,   w_43: decreased or normal,
w_14: normal,   w_24: increased,   w_34: very increased,   w_44: slightly increased or increased,
w_15: normal,   w_25: very increased,   w_35: very increased.
[Fig. 13 - The P.B.I.S. network, with output cells NORMAL, BACTERIAL INFECTION, VASCULITIS and NEPHROTIC SYNDROMES.]
REFERENCES
[1] J.L. Beaumont, L.A. Carlson, G.R. Cooper, Z. Fejfar, D.S. Fredrikson and T.
Strasser, "Classification of Hyperlipidemias and Hyperlipoproteinemias," Bull.
W.H.O., 43 (1970), pp. 891-915.
[2] D.S. Fredrickson, R.I. Levy and R.S. Lee, "Fat transport in lipoproteins," N.
Engl. J. Med., 276 (1967), pp. 32-44, 94-103,148-156,215-226,273-281.
[3] R.M. French, "Guide to Diagnostic Procedures," McGraw-Hill, New York
(1975).
[4] E. Sanchez and R. Bartolin, "Fuzzy Inference and Medical Diagnosis, a Case
Study," Proc. of the First Annual Meeting of the Biomedical Fuzzy Systems
Association, Kurashiki, Japan (1989), in J. of the Biom. Fuzzy Syst. Ass.,
Vol. 1, N. 1 (1990), pp. 4-21.
[5] L.A. Zadeh, "Fuzzy Sets as a Basis for a Theory of Possibility," Fuzzy Sets and
Systems, 1 (1978), pp. 3-28.
[6] E. Sanchez, "Soft Queries in Knowledge Systems," Proc. of the Second IFSA
World Congress, Tokyo (1987), pp. 597-599.
[7] E. Sanchez, "Importance in Knowledge Systems," Information Systems, Vol. 14,
N°6 (1989), pp. 455-464.
[8] E. Sanchez, "Handling Requests in Intelligent Retrieval," in Contributions on
Approximate Reasoning and Artificial Intelligence, M. Delgado and J.L.
Verdegay, eds. (to appear).
[9] C. Dujet,"Valuation et Separation dans les Ensembles Flous," Structures de
l'Information, 18 - Publications du CNRS (1980), pp. 95-105.
[10] M. Cayrol, H. Farreny and H. Prade, "Fuzzy Pattern Matching," Kybernetes, 11
(1982), pp. 103-116.
[11] E. Sanchez, "On Truth-qualification in Natural Languages," Proc. Int. Con/. on
Cybernetics and Society, Tokyo (1978), pp. 1233-1236.
[12] E. Sanchez, "Mesures de Possibilite, qualifications de Verite et Classification de
Formes Linguistiques en Medecine," in Actes Table Ronde C.N.R.S., Lyon
(1980).
[13] R. Bartolin, "Aide au Diagnostic Medical par Mesures de Comparaisons Floues
et Pouvoir Separateur. Approche Linguistique des Profils Proteiques
Inflammatoires Biologiques," These d'Etat en Biologie Humaine, Marseille (1987).
[14] L.A. Zadeh, "Interpolative Reasoning Based on Fuzzy Logic and its Application
to Control and Systems Analysis," invited lecture, abstract in the Proc. of the
Int. Conf. on Fuzzy Logic & Neural Networks, Iizuka, Japan (1990).
[15] E. Sanchez, "Connectionism, Artificial Intelligence and Fuzzy Control," invited
lecture, abstract in the Proc. of the Second Annual Meeting of the Biomedical
Fuzzy Systems Association, Kawasaki Medical School, Kurashiki, Japan (1990).
[16] A.G. Barto, R.S. Sutton and C.W. Anderson, "Neuronlike Adaptive Elements that Can Solve Difficult Learning Control Problems," IEEE Trans. S.M.C., vol. 13, N°5 (1983) pp. 834-846.
[17] C.C. Lee, "A Self-learning Rule-based Controller Employing Approximate Reasoning and Neural Net Concepts," Memo U.C. Berkeley, N°UCB/ERL M89/84 (1989), to appear in the Int. J. of Intelligent Systems.
[18] C.C. Lee, "Intelligent Conttol Based on Fuzzy Logic and Neural Network
Theory," Proc. of the Int. Con/. on Fuzzy Logic &. Neural Networks, Iizuka,
Japan (1990) pp.759-764.
[19] S.I. Gallant, "Connectionist Expert Systems," Comm. of the ACM, vol. 31, N°2 (1988) pp. 152-169.
[20] M. Frydenberg and S.I. Gallant, "Fuzziness and Expert System Generation," Lect. Notes in Computer Science (B. Bouchon and R.R. Yager, Eds.), Springer-Verlag, vol. 286 (1987) pp. 137-143.
[21] B. Kosko, "Fuzzy Associative Memories," in Fuzzy Expert Systems (A.
Kandel, Ed.), Addison-Wesley, Reading, Mass. (1986).
[23] M. Togai, "Fuzzy Neural Net Processor and its Programming Environment,"
Preprints of the 1988 first joint technology workshop on neural networks and
fuzzy logic, NASA, Johnson Space Center, Houston, TX (1988).
[24] H. Takagi and I. Hayashi, "Artificial-Neural-Network-Driven Fuzzy Reasoning," Proc. of the Int. Workshop on Fuzzy Systems Applications, Kyushu Institute of Technology, Iizuka, Japan (1988) pp. 217-218.
[25] R.R. Yager, "On the Interface of Fuzzy Sets and Neural Networks," Proc. of the
Int. workshop on fuzzy systems applications, Kyushu Institute of Technology,
Iizuka, Japan (1988) pp. 215-216.
[26] D.L. Hudson, M.E. Cohen and M.F. Anderson, "Determination of Testing
Efficacy in Carcinoma of the Lung Using a Neural Network Model," in
Computer applications in medical care (R.A. Greenes, Ed.), vol. 12 (1988)
pp.251-255.
[27] S.S. Chen, "Knowledge Acquisition on Neural Networks," LeCI. Notes in
Computer Sciences (B. Bouchon, L. Saitta and R.R. Yager, Eds.), Springer-
Verlag, vol. 313 (1988) pp.281-289.
[28] T. Yamakawa and S. Tomoda, "A Fuzzy Neuron and its Application to Pattern
Recognition," Proc. of the Third IFSA Congress, Seattle, WA (1989) pp.30-38.
[29] K. Yoshida, Y. Hayashi and A. Imura, "A Connectionist Expert System for Diagnosing Hepatobiliary Disorders," Proc. of MEDINFO 89, Beijing and Singapore (1989) pp. 116-120.
[30] J. Yen, "Using Fuzzy Logic to Integrate Neural Networks and Knowledge-based
Systems," Proc. of the Neural networks and fuzzy logic workshop, NASA,
Johnson Space Center, Houston, TX (1990).
[31] E. Sanchez, "Fuzzy Connectionist Expert Systems," Proc. of the Int. Con/. on
Fuzzy Logic &. Neural Networks, Iizota. Japan (1990) pp.31-35.
[32] E. Sanchez, "Resolution of Composite Fuzzy Relation Equations," Information
and Control, vol. 30, N°l (1976) pp.38-48.
[33] A. Di Nola, W. Pedrycz, E. Sanchez and S. Sessa, "Fuzzy Relation Equations and their Applications to Knowledge Engineering," Kluwer Acad. Pub., Dordrecht (1989).
12
THE REPRESENTATION AND USE
OF UNCERTAINTY AND
METAKNOWLEDGE IN MILORD
R. Lopez de Mantaras, C. Sierra, J. Agusti
INTRODUCTION
One of the most interesting aspects of Expert Systems research is to
gain some insights about human problem solving strategies by trying to
emulate them in programs. Experts in a domain are better than novices in
performing problem solving tasks. This is due to their greater experience in solving problems, which provides them with better strategies. Such strategies are
knowledge about how to use the knowledge they have in their domain of
expertise. This kind of knowledge is called metaknowledge and is represented
by means of meta-rules in the MILORD system for diagnostic reasoning.
Diagnostic reasoning relies heavily on metaknowledge to focus attention on the most plausible hypotheses or goals in a given situation and to control the inference process. Furthermore, uncertainty also plays an important role at the control level; for example, decisions are taken depending on the uncertainty of the facts supporting them.
On the other hand, psychological experiments (Kuipers et al., 1988) show that human problem solvers do not use numbers to deal with uncertainty but symbolic descriptions expressing categorical and ordinal relations, and that in complex situations the propagation and combination of uncertainty is a local, context-dependent process. MILORD has a modular structure that allows uncertainty to be represented and managed by means of local operators defined over a set of ordered linguistic terms defined by the expert.
In this paper we describe the MILORD system, focusing on the metaknowledge and on the role that uncertainty plays in such a modular system, that is, its role in the local deductive mechanisms within each module and as a control feature in the task of selecting and combining modules to achieve a solution.
Before describing MILORD, the paper starts by presenting
fundamental concepts on control structures for rule-based systems.
control meta-rules might be applied that could suggest a new sequence of goals
to be considered and combined with the previous one. This combined strategy will then be executed, and so on. Later in the paper we will describe this process in more detail.
UNCERTAINTY MANAGEMENT
Most AI research on reasoning under uncertainty is concerned with normative methods to propagate and combine certainty values, and there is some disagreement between the proponents of the different methods (Bayesians, Dempster-Shaferians, fuzzy logicians, etc.). However, these methods do not really claim to closely mimic human problem solving under uncertainty. Although human problem solvers are almost always uncertain about the possible solution in complex domains, they often achieve their goals despite uncertainty by using methods that are particularized to the type of problem solving that they are performing at a given time. In fact, as Cohen et al. (1987) put it, managing uncertainty consists in selecting actions that simultaneously achieve solutions and reduce their uncertainty. This view leads us to consider uncertainty as playing an important role at the control level because it is useful to constrain the focus of attention (which part of the problem to work on next) and action selection (how to work on it), as will be shown in the framework of MILORD.
Furthermore, we believe that large complex expert systems draw their
problem solving capabilities more from the power of the structure and control
of their knowledge bases than from the particular uncertainty management
formalism they use. On the other hand, the structure in the knowledge bases
makes the propagation and combination of uncertainty a local, context
dependent process.
For example, a module that determines the dose of penicillin that has to be
given to a patient must not be presented in any acceptable combination for a
patient allergic to penicillin.
The modularization of KB's leads to the concept of locality in the
modules of a KB. It is possible to define the contents of a module independently
of the definition of the rest of the modules. This possibility, methodologically
desirable, allows the use of different local logics and reasoning mechanisms
adapted to the subtasks that the system is performing.
where modid stands for an identifier of the module and modexpr for the body of
the definition made out of the components specified above. Let us look at an
example of module definition.
Module gram_esputum =
  begin
    import Class, Morphology
    export morpho, esputum_ok
    deductive knowledge:
      Rules:
        R001  If class > 4 then esputum_ok is sure
    end deductive
  end
There is also the possibility of defining generic modules that represent functional abstractions of several non-generic modules.
LOCAL LOGICS
It is clear that experts use different approaches to the management of
uncertainty depending on the task they are performing. Usually, expert system building tools provide a fixed way of dealing with uncertainty, proposing a unique and global method for representing and combining evidence. In the COLAPSES language it is possible to define different deduction procedures for each one of the modules. If, from a methodological point of view, a task is associated with a module, then a different logic can be used depending on the task.
The definition of local logics is made by the following primitive of the COLAPSES language:
Inference system:
  Truth values = list of linguistic terms
  Renaming = morphisms between linguistic terms
  Connectives:
    Conjunction = function definition
    Disjunction = function definition
  Inference patterns:
    Modus ponens = function definition
Module B =
  begin
    Module A =
      begin
        Import C
        Export P
        Deductive knowledge:
          Rules:
            R1 if C then conclude P is possible
          Inference system:
            Truth values = (false, possible, true)
        End deductive
      end
    Import D
    Export Q
    Deductive knowledge:
      Rules:
        R1 if NP and D then conclude Q is quite-possible
      Inference system:
        Truth values = (impossible, moderately-possible, quite-possible, sure)
        Renaming = Nfalse ==> impossible
                   Npossible ==> quite-possible
                   Ntrue ==> sure
    end deductive
  end
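The renaming morphism and the two local term sets of this example can be modelled directly; a minimal sketch, in which the min-style conjunction over the ordered terms is an illustrative assumption (the actual connectives are function definitions supplied by the expert):

    class Module:
        def __init__(self, truth_values, renaming=None):
            self.truth_values = truth_values          # ordered from least to most certain
            self.renaming = renaming or {}

        def conjunction(self, a, b):                  # e.g. used by modus ponens
            idx = self.truth_values.index
            return min(a, b, key=idx)                 # the weaker of the two local terms

        def import_value(self, v):
            # Translate a truth value exported by a sub-module into the local term set.
            return self.renaming.get(v, v)

    A = Module(["false", "possible", "true"])
    B = Module(["impossible", "moderately-possible", "quite-possible", "sure"],
               renaming={"false": "impossible", "possible": "quite-possible", "true": "sure"})

    p = B.import_value("possible")                    # value of P exported by module A
    print(B.conjunction(p, "sure"))                   # -> "quite-possible"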
- often the experts supplying the knowledge are not able to define the
meaning of the linguistic values using a numerical scale, although they have
no difficulty in ordering them.
- different experts might not agree on the representation of some or all
the linguistic values.
- the necessary approximation process does not always ensure that the resulting operators satisfy the properties which were originally required of the functions used to generate them.
These disadvantages lead us to propose an alternative approach (Lopez de Mantaras et al., 1990). The central idea consists in treating linguistic terms as mere labels without assuming any underlying numerical representation, and then eliciting the connective operators directly on the set of labels. The only a priori requirement is that these labels should represent a totally ordered set of linguistic expressions about uncertainty. For each logical connective, a set of desirable properties of the corresponding operator is listed. These properties act as constraints on the set of possible solutions. In this way, all operators fulfilling the set of properties are generated. Afterwards, the domain expert may select the one that he thinks best fits his own way of managing uncertainty. This approach can be easily implemented by formulating it as a constraint satisfaction problem, and most of the disadvantages of the former approach are avoided.
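A minimal sketch of this elicitation viewed as a constraint satisfaction problem: all commutative, monotone conjunction operators with the top label as neutral element are enumerated over a small, hypothetical ordered term set. The property list used here is only an illustrative choice; the actual properties are those agreed with the expert.

    from itertools import product

    labels = ["impossible", "possible", "sure"]            # hypothetical ordered term set
    n = len(labels)
    pairs = [(i, j) for i in range(n) for j in range(i, n)]  # upper triangle (commutativity)

    def satisfies(table):
        get = lambda i, j: table[(min(i, j), max(i, j))]
        for i in range(n):
            if get(i, n - 1) != i:                         # "sure" acts as neutral element
                return False
            for j in range(n):
                if i + 1 < n and get(i + 1, j) < get(i, j):  # non-decreasing in each argument
                    return False
        return True

    candidates = [dict(zip(pairs, values))
                  for values in product(range(n), repeat=len(pairs))
                  if satisfies(dict(zip(pairs, values)))]
    print(len(candidates), "conjunction operators satisfy the constraints")

The expert would then inspect the generated tables and select the one closest to his or her own handling of uncertainty.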
EVIDENCE INCREASING
The current uncertainty of facts can be used to control the deduction steps in order to increase the evidence for a given hypothesis. So, for example, if we have an alcoholic patient with a cavitation in the chest X-ray and there is low evidence for tuberculosis, then the Ziehl-Neelsen test to determine more clearly whether he has tuberculosis should not be done. But if he presents a risk factor for AIDS, then we shall increase our evidence for tuberculosis and the test will be suggested. This is expressed as follows:
If tuberculosis > moderately-possible
then conclude Test Ziehl-Neelsen
STRATEGY FOCUSING
The uncertainty of facts can determine the set of hypotheses to be followed in the sequel.
Example:
If Strategy(X) and Strategy(Y) and Certainty(X) > Certainty(Y)
and Goals(X) ∩ Goals(Y) ≠ ∅
Then Ockham(X, Y)
KNOWLEDGE ADEQUATION
As indicated at the beginning of the paper, a KB is a set of knowledge units that have to be adapted to the current case. For example, alcoholism is a useful concept when determining a bacterial pneumonia, but it is useless for non-bacterial diseases. Then, for example, a possible use of the uncertainty of the fact bacterianicity is to decide about the use of a given concept in the whole KB, i.e., to adequate the general knowledge to the particular problem.
Example:
If no bacterial disease
then do not consider alcoholism in the search for the solution
SOLUTION ACCEPTANCE
The degree of uncertainty of a fact can also be used to stop the
execution of the system. For example
METACONTROL AND LOCALITY
The structured definition of KB's helps not only in the definition of safe and maintainable KB's but also provides some new features that were impossible to achieve in the previous generation of systems. Among them, the most important is the possibility of defining local meta-logical components for each one of the modules.
The definition of strategies (ordered sets of elementary steps to solve a problem) in a previous version of the MILORD system (Godo et al., 1989) was made globally. Only one strategy could be active at any moment. Presently, as many strategies as nodes in the module graph structure can be active. This flexibility is linked with the fact that each module can have a different treatment of uncertainty. So, uncertainty plays a different role as a control feature depending on the association between module and logic.
Furthermore, given the fact that the system consists of a hierarchy of submodules, the meta-logical components act one upon the other in a pyramidal fashion. This allows us to have as many meta-logic levels as necessary in an application. Further research will be pursued along this line. A richer representation of the logic components in the meta-logic will also be investigated, and sound semantics from the logical point of view will be defined.
CONCLUSION
One interesting aspect of building expert systems is to learn something about human problem solving strategies by trying to reproduce them in programs. Human problem solvers are uncertain in many situations and do not use a single normative method to handle uncertainty. Instead, they take advantage of a good organization of the problem solving task to obtain good solutions using qualitative approximations. This suggests considering uncertainty as playing an important role at the control level by guiding the problem solving strategies. In order to illustrate these points, we have described a modular architecture and language that extensively exploits uncertainty as a control feature and uses local, context-dependent combination and propagation operators for uncertainty.
BIBLIOGRAPHY
1. Agusti J., Sierra C., Sannella D. (1989): "Adding generic modules to flat-rules based languages: a low cost approach", in Methodologies for Intelligent Systems 4 (Z. Ras, ed.), Elsevier Science Pub., pp. 43-51.
2. Aiello L., Levi G. (1988): "The uses of Metaknowledge in AI Systems", in Meta-Level Architectures and Reflection (P. Maes, D. Nardi, eds.), North-Holland, 243-254.
3. Cohen P.R., Day D., De Lisio J., Greenberg M., Kjeldsen R., Suthers D., Berman P. (1987): "Management of Uncertainty in Medicine", International Journal of Approximate Reasoning 1: 103-116.
4. Godo L., Lopez de Mantaras R., Sierra C., Verdaguer A. (1989): "MILORD, the architecture and management of linguistically expressed uncertainty", Int. Journal of Intelligent Systems, vol. 4, n. 4, pp. 471-501.
5. Kuipers B., Moskowitz A.J., Kassirer J.P. (1988): "Critical Decisions under Uncertainty: Representation and Structure", Cognitive Science 12, 177-210.
6. Lopez de Mantaras R., Godo L., Sangüesa R. (1990): "Connective Operators Elicitation for Linguistic Term Sets", Proc. Intl. Conference on Fuzzy Logic and Neural Networks, Iizuka, Japan, 729-733.
7. Lopez de Mantaras R. (1990): "Approximate Reasoning Models", Ellis Horwood Series in Artificial Intelligence, London.
8. Sierra C., Agusti J. (1990): "COLAPSES: Syntax and Semantics", CEAB Research Report 90/8.
13
FUZZY LOGIC
WITH LINGUISTIC QUANTIFIERS
IN GROUP DECISION MAKING
Abstract
We present how fuzzy logic with linguistic quantifiers, mainly its calculi of
linguistically quantified propositions, can be used in group decision making.
Basically, the fuzzy linguistic quantifiers (exemplified by most, almost all, ...) are employed to represent a fuzzy majority, which is in many cases closer to a real human perception of the very essence of majority. Fuzzy logic provides here means for a formal handling of such a fuzzy majority, which was not possible using traditional formal apparatus. Using a fuzzy majority, and assuming fuzzy individual and social preference relations, we redefine solution concepts in group decision making, and present new «soft» degrees of consensus.
Keywords
Fuzzy logic, linguistic quantifier, fuzzy preference relation, fuzzy majority,
group decision making, social choice.
1. INTRODUCTION
Decision making, whose essence is basically to find a best option from among
some feasible (relevant, available, ... ) ones, is what human beings constantly face
in all their activities. In virtually all nontrivial situations decision making does
require intelligence.
matter which group choice procedure we will employ, it will satisfy some set of
plausible conditions but not another set of equally plausible ones. This general
property pertains to all possible choice procedures, so that attempts to develop new,
more sophisticated choice procedures do not seem very promising in this respect. Much more promising seems to be to modify some basic assumptions underlying
the group decision making process. This line of reasoning is pursued here.
Since the process of decision making, notably of group type, is centered on the
human beings, with their inherent subjectivity, imprecision and vagueness in the
articulation of opinions, etc., fuzzy sets have been used in this field for a long time.
A predominant research direction here is based on the introduction of an individual or social fuzzy preference relation which is then used to find some choice sets. There is
a rich literature on this topic (cf. Tanino, 1984, 1988, or many articles in Kacprzyk
and Fedrizzi, 1990), and since this is not explicitly related to the use of fuzzy logic,
we will not discuss these issues in more detail here (though we will assume that the
preference relations are fuzzy). We will concentrate on other elements of group
decision making models where a contribution of fuzzy logic can be explicitly
demonstrated.
One of the basic elements underlying group decision making is the concept of a majority (notice that the solution is to be some option(s) best acceptable by the group as a whole, that is, by most of its members, since in no real situation would it be accepted by all). Some of the above mentioned problems with group decision
making are closely related to a (too) strict perception of majority (e.g., at least a
half). A natural line of reasoning is to try to somehow make that strict concept of
majority closer to its human perception. And here, we find many examples in all kinds of human judgments that what human beings consider as a required
majority to, say, justify the choice of a course of action is often much more vague.
A good example in a biological context may be found in Loewer and Laddaga
(1985): «... It can correctly be said that there is a consensus among biologists that
Darwinian natural selection is an important cause of evolution though there is
currently no consensus concerning Gould's hypothesis of speciation. This means
that there is a widespread agreement among biologists concerning the first matter
but disagreement concerning the second ... ». A rigid majority as, e.g., more than
75% would evidently not reflect the essence of the above statement. It should be
noted that there are naturally situations when a strict majority is necessary, for
obvious reasons, as in all political elections.
To briefly summarize the above considerations, we can say that a possibility to
accommodate a less rigid («soft») majority (as, say, an equivalent of a widespread
agreement in the above citation) would certainly help make group decision models
more human-consistent.
It is easy to see that the most natural manifestations of such a «soft» majority are the so-called linguistic quantifiers as, e.g., most, almost all, much more than a half, etc. One can readily notice that no conventional formal (e.g., logical) apparatus provides means for handling such quantifiers since, e.g., in virtually all conventional logics only two quantifiers, at least one and all, are accounted for.
Fortunately enough, some fuzzy-logic-based calculi of linguistically quantified propositions have been proposed in recent years (Yager, 1983a, b; Zadeh, 1983), which make it possible to handle fuzzy linguistic quantifiers. These calculi have been applied by the authors to introduce a fuzzy majority (represented
Q y's are F    (1)
QB y's are F    (2)
that is, say, «most (Q) of the important (B) experts (y's) are convinced (F)».
For our purposes, the main problem is now to find the truth of such linguistically quantified statements, i.e., either truth(Q y's are F) or truth(QB y's are F), knowing truth(y_i is F), ∀ y_i ∈ Y. This may be done using two basic calculi, one due to Zadeh (1983) and one due to Yager (1983a, b). In the following we will present the essence of Zadeh's calculus since it is simpler and more transparent,
hence better suited for the purposes of this volume, though we should bear in mind that in many instances Yager's calculus may be more «adequate» (cf. Kacprzyk, 1986, 1987b; Kacprzyk and Fedrizzi, 1989).
In Zadeh's (1983) method, a fuzzy linguistic quantifier Q is assumed to be a fuzzy set defined in [0, 1].
For instance, Q = «most» may be given as
which may be meant as: if at least 80% of some elements satisfy a property, then most of them certainly (to degree 1) satisfy it; when less than 30% of them satisfy a property, then most of them certainly do not satisfy it (satisfy it to degree 0); and between 30% and 80%, the more of them satisfy that property, the higher the degree of satisfaction by most of the elements.
Notice that we will consider here the proportional quantifiers exemplified by «most», «almost all», etc., as they are more important for modelling a fuzzy majority than the absolute quantifiers exemplified by «about 5», «much more than 10», etc. The reasoning for the absolute quantifiers is, however, analogous.
Property F is defined as a fuzzy set in Y. For instance, if Y = {X, Y, Z} is the set of experts and F is the property «convinced», then F may be exemplified by F = «convinced» = 0.1/X + 0.6/Y + 0.8/Z, which means that expert X is convinced to degree 0.1, Y to degree 0.6 and Z to degree 0.8. If now Y = {y_1, ..., y_p}, then it is assumed that truth(y_i is F) = μ_F(y_i), i = 1, ..., p.
The value of truth(Q y's are F) is determined in the following two steps (Zadeh, 1983):
= L JlS<Y)
i-I
1\ Jl..<Y) / L JlB(Yj)
I-I
(6)
The essence of these two steps is similar to that of (4) and (5).
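A minimal sketch of the two-step calculation, using the expert example F = 0.1/X + 0.6/Y + 0.8/Z given above; the piecewise-linear form of «most» follows the verbal description of (3), whose exact analytic expression is an assumption here:

    def mu_most(r):
        # "most": degree 1 at or above 80%, 0 at or below 30%, linear in between (assumed form).
        if r >= 0.8:
            return 1.0
        if r <= 0.3:
            return 0.0
        return (r - 0.3) / 0.5

    def truth_Q_ys_are_F(mu_F, mu_Q):
        r = sum(mu_F.values()) / len(mu_F)        # step 1: proportion of y's that are F
        return mu_Q(r)                            # step 2: compatibility with the quantifier

    def truth_QB_ys_are_F(mu_F, mu_B, mu_Q):      # importance-weighted version, cf. (6)
        num = sum(min(mu_B[y], mu_F[y]) for y in mu_F)
        return mu_Q(num / sum(mu_B.values()))

    convinced = {"X": 0.1, "Y": 0.6, "Z": 0.8}
    print(truth_Q_ys_are_F(convinced, mu_most))   # (0.5 - 0.3)/0.5 = 0.4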
C = {s_j ∈ S: ¬∃ s_i ∈ S such that r_ij^k > 0.5 for at least r individuals}    (9)
i.e., as a set of options not sufficiently (at least to degree α) defeated by the required majority.
Suppose now that the required majority is imprecisely specified as, e.g., given
by a fuzzy linguistic quantifier as, say, most defined by (3).
While trying to redefine the above concepts of cores under a fuzzy majority, we
start by denoting
h_ij^k = 1   if r_ij^k < 0.5
       = 0   otherwise                                    (11)
where here and later on in this section, if not otherwise specified, i, j = 1, ..., n and k = 1, ..., m. Thus, h_ij^k reflects whether option s_j defeats s_i or not.
Then
(12)
(13)
(14)
(15)
i.e., a fuzzy set of options that are not defeated by Q (say, most) individuals.
Analogously, by introducing a threshold on the degree of defeat in (11), we can define the fuzzy α/Q-core. First, we denote
h_ij^k(α) = 1   if r_ij^k < α ≤ 0.5
          = 0   otherwise                                 (16)
and then, following the line of reasoning of (12)-(15), and using h_j^k(α) and h_j(α),
(17)
i.e., a fuzzy set of options that are not sufficiently (at least to degree 1 - α) defeated by Q individuals.
We can also explicitly introduce the strength of defeat into (11) and define the fuzzy s/Q-core. Namely, we can introduce a function like
ĥ_ij^k = 2 (0.5 - r_ij^k)   if r_ij^k < 0.5
       = 0                   otherwise                    (18)
and then, following the line of reasoning of (12)-(15), but using the ĥ_ij^k, ĥ_j^k and ĥ_j obtained from (18) instead of h_ij^k, h_j^k and h_j, respectively, we define the fuzzy s/Q-core as
(19)
i.e., as a fuzzy set of options that are not strongly defeated by Q individuals.
Example 2. Suppose that we have four individuals, k = 1,2, 3,4, whose fuzzy
preference relations are
[The four 4 × 4 individual fuzzy preference relations R^1, R^2, R^3 and R^4 over the options s_1, ..., s_4 are displayed here.]
Suppose now that the fuzzy linguistic quantifier is Q = «most» defined by (3). Then, say,
C_«most» = (17/30)/s_2 + 1/s_4
C_{0.3/«most»} = 0.9/s_4
C_{s/«most»} = 0.4/s_4
that is, for instance, in the case of C_«most», option s_2 belongs to the fuzzy Q-core to the extent 17/30 and option s_4 to the extent 1, and analogously for C_{0.3/«most»} and C_{s/«most»}. Notice that though the results are different, for obvious reasons, s_4 is clearly the best choice, which is evident if we examine the given individual fuzzy preference relations.
r_ij = (1/m) Σ_{k=1}^{m} a_ij^k   if i ≠ j
     = 0                           otherwise              (20)
where
a_ij^k = 1   if r_ij^k > 0.5
       = 0   otherwise                                    (21)
Notice that R need not be reciprocal (for reciprocal R^1, ..., R^m). For other approaches to the determination of R, see, e.g., Blin and Whinston (1973).
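A minimal sketch of (20)-(21): each entry of the social fuzzy preference relation R is the fraction of individuals whose preference r_ij^k exceeds 0.5 (the two-option data below are hypothetical, used only to exercise the function):

    def social_preference(Rs):
        # Rs: list of individual fuzzy preference relations, Rs[k][i][j] = r_ij^k.
        m, n = len(Rs), len(Rs[0])
        R = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    R[i][j] = sum(1.0 for k in range(m) if Rs[k][i][j] > 0.5) / m
        return R

    # Hypothetical example: 3 of 4 individuals prefer s_1 over s_2 to a degree above 0.5.
    Rs = [[[0.0, 0.8], [0.2, 0.0]], [[0.0, 0.7], [0.3, 0.0]],
          [[0.0, 0.6], [0.4, 0.0]], [[0.0, 0.4], [0.6, 0.0]]]
    print(social_preference(Rs))   # [[0.0, 0.75], [0.25, 0.0]]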
We will now discuss the second step, i.e., R → solution, that is, how to determine a solution from a social fuzzy preference relation. A solution concept of much intuitive appeal is here the consensus winner (Nurmi, 1981), which will be extended here under a fuzzy majority expressed by a fuzzy linguistic quantifier.
We start with
(23)
which is the mean degree to which option s_i is preferred over all the other options. Next
(24)
(25)
i.e., as a fuzzy set of options that are preferred over Q other options.
And analogously as in the case of the core, we can introduce a threshold into (22), i.e.,
and then, following the reasoning of (23) and (24), and replacing g_i and z_b by g_i(α) and z_b(α), respectively, we can define the fuzzy α/Q-consensus winner as
i.e., as a fuzzy set of options that are preferred over Q (say, most) other options.
Furthermore, we can also explicitly introduce the strength of preference into (22) by, e.g., defining
and then, following the reasoning of (23) and (24), and replacing g_i and z_b by their strength-based counterparts, respectively, we can define the fuzzy s/Q-consensus winner as
(29)
i.e., as a fuzzy set of options that are strongly preferred over Q other options.
For more details on the above solution concepts, as well as on some other ones,
see, e.g., Kacprzyk (1985b, c; 1986a) and Kacprzyk and Nurmi (1988).
Example 3. For the same individual fuzzy preference relations as in Example 2, and using (20) and (21), we obtain the following social fuzzy preference relation
R = [the 4 × 4 social fuzzy preference relation is displayed here]
which is not to be read similarly to the fuzzy cores in Example 2. Notice that here once again option s_4 is clearly the best choice, which is obvious by examining the social fuzzy preference relation.
This concludes our brief exposition of how to employ fuzzy linguistic
quantifiers to model the fuzzy majority in group decision making. For readability
and simplicity we have only shown the application of Zadeh's calculus of
linguistically quantified propositions. The use of Yager's calculus is presented in the
source papers by Kacprzyk (1984; 1985b, c; 1986a; 1987a) or in the surveys by
Kacprzyk and Nurmi (1989) or Fedrizzi, Kacprzyk and Nurmi (1989). On the other
hand, information on some newer solution concepts based on individual and social
fuzzy preference relations which are the so-called fuzzy tournaments may be found in
Nurmi and Kacprzyk (1990).
where here and later on in this section, if not otherwise specified, k1 = 1, ..., m - 1; k2 = k1 + 1, ..., m; i = 1, ..., n - 1; j = i + 1, ..., n.
Relevance of options is assumed to be a fuzzy set defined in the set of options such that μ_B(s_i) ∈ [0, 1] is a degree of relevance of option s_i: from 0 standing for «definitely irrelevant» to 1 for «definitely relevant», through all intermediate values. Relevance of a pair of options, (s_i, s_j) ∈ S × S, may be defined in various ways, among which
(31)
b_{k1,k2} ∈ [0, 1] may also be defined in various ways, among which the mean value of type (31) is the most straightforward, and will be used here too.
The degree of agreement between individuals k1 and k2 as to their preferences between all the relevant pairs of options is
(33)
(34)
Since the strict agreement (30) may be viewed as too rigid, we can use the degree of sufficient agreement (at least to degree α ∈ [0, 1]) of individuals k1 and k2 as to their preferences between options s_i and s_j, defined by
v_ij^α(k1, k2) = 1   if |r_ij^{k1} - r_ij^{k2}| ≤ 1 - α ≤ 1
               = 0   otherwise                            (36)
Then, following the reasoning of (31)-(35), we obtain the degree of sufficient agreement (at least to degree α) of Q2 pairs of individuals as to their preferences between Q1 pairs of relevant options (with replacements similar to those in Section 3), called the degree of α/Q1/Q2/I/B-consensus, given by
We can also explicitly introduce the strength of agreement into (30), and analogously define the degree of strong agreement of individuals k1 and k2 as to their preferences between options s_i and s_j, e.g., as
such that x' < x'' → s(x') ≤ s(x''), for all x', x'' ∈ [0, 1], and s(x) = 1 for some x ∈ [0, 1].
Then, following the reasoning of (31)-(35) (with replacements similar to those in Section 3), we obtain the degree of strong agreement of Q2 pairs of important individuals as to their preferences between Q1 pairs of relevant options, called the degree of s/Q1/Q2/I/B-consensus, as
Therefore:
con(«most», «most», I, B) ≈ 0.35
con_{0.90}(«most», «most», I, B) ≈ 1.0
con_s(«most», «most», I, B) ≈ 0.75
For more information on these degrees of consensus, see Fedrizzi and Kacprzyk
(1988), Kacprzyk (1987a) and Kacprzyk and Fedrizzi (1986, 1988, 1989). Moreover,
the use of Yager's fuzzy - logic - based calculus of linguistically quantified
propositions is given in Kacprzyk and Fedrizzi (1989).
5. CONCLUDING REMARKS
In this paper we have tried to show how fuzzy logic with linguistic quantifiers
can be used to model a fuzzy majority, and then to define new solution concepts and
degrees of consensus based on the fuzzy majority. Fuzzy quantifiers are certainly a
natural way of representing a fuzzy majority which cannot practically be adequately
represented by conventional formal means. On the other hand, fuzzy-logic-based calculi of linguistically quantified propositions, in particular the one employed in this paper, offer much simplicity and intuitive appeal, and can help attain more human-consistent, hence more adequate and more easily implementable, group decision making and consensus formation models.
BIBLIOGRAPHY
ARROW KJ. (1963), Social Choice and Individual Values, 2nd ed. Yale University
Press, New Haven.
BLIN J.M. and A.P. WHINSTON (1973), Fuzzy sets and social choice. Journal of
Cybernetics, 4, 17 - 22.
BRAYBROOK D. and C. LINDBLOM (1963), A Strategy of Decision. Free Press,
New York.
CALVERT R. (1986), Models of Imperfect Information in Politics. Harwood
Academic Publishers, Chur.
FEDRIZZI M. (1986), Group decisions and consensus: a model using fuzzy sets theory (in Italian). Rivista per le scienze econ. e soc., A. 9, F. 1, 12-20.
FEDRIZZI M. and J. KACPRZYK (1988), On measuring consensus in the setting of
fuzzy preference relations. In J. Kacprzyk and M. Roubens (Eds.), Non -
Conventional Preference Relations in Decision Making. Springer - Verlag,
Berlin - New York - Tokyo, 129 - 141.
FEDRIZZI M., J. KACPRZYK and S. ZADROZNY (1988), An interactive multi - user
decision support system for consensus reaching processes using fuzzy logic
with linguistic quantifiers. Decision Support Systems 4, 313 -327.
KACPRZYK J. (1984), Collective decision making with a fuzzy majority rule. Proc.
WOGSC Congress, AFCET, Paris, 153-159.
KACPRZYK J. (1985a), Zadeh's commonsense knowledge and its use in
multicriteria, multistage and multiperson decision making. In M.M. Gupta et
al. (Eds.), Approximate Reasoning in Expert Systems, North - Holland,
Amsterdam,105-121.
KACPRZYK J. (1985b), Some «commonsense» solution concepts in group decision
making via fuzzy linguistic quantifiers. In J. Kacprzyk and R.R. Yager
(Eds.), Management Decision Support Systems Using Fuzzy Sets and
Possibility Theory. Verlag TÜV Rheinland, Cologne, 125-135.
Universita di Torino
Dipartimento di Informatica
Corso Svizzera 185
10149 TORINO (Italy)
E-mail: [email protected]
ABSTRACT
In this paper we briefly survey the problems arising in learning concept descriptions from examples in domains affected by uncertainty and vagueness. A programming environment, called SMART-SHELL, is also presented: it addresses these problems by exploiting fuzzy logic. This is achieved by supplying the learning system with the capability of handling a fuzzy relational database, containing the extensional representation of the acquired logic formulas.
INTRODUCTION
Knowledge acquisition has been recognized as a major problem for the quick and low-cost development of expert systems. In fact, knowledge elicitation is a hard and time consuming task, especially in domains where there is a lack or shortage of human experts and/or the knowledge is difficult to formalize. As a consequence, automated learning methods became appealing and machine learning is now receiving increasing attention.
Even though the complete automation of the knowledge acquisition process is beyond the possibilities of current AI technology, developing tools allowing a substantial part of the necessary knowledge to be first acquired and, next, maintained and updated, is both a medium-term reachable goal and a very useful one. These tools are likely to become, in the future, a fundamental part of expert system builders, provided that adequate interfaces towards knowledge engineers and domain experts are supplied.
Traditionally, machine learning tasks have ranged from acquiring concept descriptions from examples [1-4] to improving planning heuristics [5-7], and knowledge representation schemes have included logical formulas, decision trees (or networks), production rules and semantic networks [8-10, 11, 31]. Research on scientific discovery [12-14] and concept formation [7, 15-17] has also received attention. In all these problems, the notion of learning as a search process, in a space of descriptions or hypotheses, plays a central role [18], especially in inductive approaches.
Recently, new trends have emerged, such as the proposal of chunking as a general cognitive architecture [19] and the use of deductive methods to perform "justified" learning [5, 20-24]. As new, more complex tasks are faced, methods
become more refined, and integrated models of learning are proposed with the hope of coping with the complexity of real world tasks [25-28].
A great deal of activity is also going on in the field of connectionist models of learning, as appears, for instance, from [29]. Another interesting approach is constituted by the genetic algorithm [30], presented as a general-purpose learning method for parallel rule systems.
Unfortunately, many learning systems only work in ideal domains, in which noise in the data and uncertainty in the task are absent. However, the effective use of learning systems in real-world applications substantially depends upon the ability these systems show in handling noise. Some kinds of problems arising in real applications are summarized in [31].
Several systems are provided with mechanisms for facing statistical noise, such as the pruning techniques proposed to limit the sizes of decision trees [32-34]. Similar methods have also been proposed for knowledge represented in the form of production rules, as in the AQ15 system, where the initially acquired rules are truncated to limit complexity and avoid overfitting [35]. This kind of noise mainly concerns random errors in assigning a value to an attribute or a label to a training event. Several experiments have been performed to investigate the effects of this noise on the effectiveness of the acquired knowledge [36].
However, statistical noise is not the only source of problems; in fact, relevant concepts and relations can be ill-defined and vague. For this purpose, fuzzy set theory seems the most appropriate tool for handling this type of uncertainty. We have to notice that a continuous-valued semantics, associated with the description language, is a major source of complexity in learning methodologies. Hence, very few systems are able to handle it explicitly, and most of these limit themselves to attaching weights to the pieces of acquired knowledge [37, 38].
Fuzzy sets occur in symbolic learning methodologies with different roles. In [39], they are used to describe concepts and the varying degrees of typicality of their instances. In [40] the intensional descriptions of a set of classes, to be discriminated from each other, are expressed as fuzzy languages, learned from a set of examples. This approach has been applied to problems in medical diagnosis [41].
Finally, in ML-SMART, a system which learns concept descriptions from examples [42, 43] and a domain theory [26, 27], the use of fuzzy set theory has proved to be well suited to transforming continuous-valued features into a set of categorical attributes and, in general, to defining the vague semantics associated with real-world terms and predicates, both in the description of the examples and in the domain theory. ML-SMART is a learning system which uses a full memory approach,
supported by the special-purpose shell SMART-SHELL [44, 45], especially designed to ease the development of different learning systems. SMART-SHELL mainly consists of a logic programming environment, interfaced with a relational data-base through a set of operators implementing the basic primitives necessary for a learner. The logic environment has been realized in Common Lisp, whereas the data-base manager has been tailored to the specific class of applications. This data-base differs from commercial relational data-bases in the sense that many standard features have not been implemented, not being relevant to the particular use it is oriented to, whereas other important aspects have been enhanced: a query language based on full first-order logic, including a set of non-standard quantifiers, and the capability of handling continuous-valued semantics.
This paper is organized as follows. Section 2 briefly describes the learning
framework used in systems like ML-SMART. Section 3 describes the logic language
used for representing both the background knowledge and the acquired knowledge,
whereas Section 4 illustrates the behaviour of the basic operators, interfacing the
data-base and the learning system. Finally, Section 5 presents some conclusions.
Fig. 1 - Example of instances from a block world domain.
[Fig. 2 - Extensions (relations with fields F, H, x1) of the operational predicates Circle(x1) and Large(x1) (a), and of the formula Triangle(x1) ∧ Large(x1) (b).]
The learning process can be modeled as a search through the space of formulas which can be generated in this way [18, 43]. However, the set of formulas having a non-empty extension may be too large and cannot be searched exhaustively. For this reason, ML-SMART uses several strategies for limiting the number of formulas which are actually created and tested. In particular, it develops a tree of formulas using a set of specialization operators; the root of the tree is the maximally general formula "true", which obviously holds for all the events in F0, and the leaves are either formulas corresponding to acceptable concept descriptions or formulas which are no longer interesting. Three kinds of criteria are used to bias the inductive process in order to limit the size of the tree:
- Simplicity and readability of the formulas.
- Statistical criteria: formulas verified by many examples are preferred.
- If background knowledge is available, formulas which can be deduced from it are preferred. Moreover, formulas contradicting the background knowledge cannot be generated.
An extensive description of the methodology can be found in [42, 43].
The SMART-SHELL environment provides the basic operators of specialization (and generalization) necessary to implement a problem solver of the type of ML-SMART, as well as a forward/backward inference engine, and the primitives necessary for implementing the search strategies. The relational data-base is a special-purpose one, implemented in such a way as to achieve high speed on the most critical operations. On top of this data-base, a logic environment has been implemented, as well as a user interface, designed to ease the process of supplying the system with the background knowledge and application description.
The tool SMART-SHELL basically consists of three main modules: SMART-CONF, SMART-RUN and SMART-DATABASE, which provide the user with a knowledge editor, a set of high level primitives and a data-base manager, respectively. The scheme of the system is reported in Fig. 3.
The module SMART-CONF consists of a user-friendly interface, usable to describe both the background knowledge in input to the learner and other kinds of knowledge (control knowledge) which have to be used by the learning strategies. Moreover, it contains a set of compilation procedures which translate this kind of knowledge into a more efficient form, internally used by the other two modules.
[Fig. 3 - Scheme of the SMART-SHELL system: the modules SMART-CONF, SMART-RUN and SMART-DATABASE, with applications, background and control knowledge, and evaluable predicates as inputs.]
KNOWLEDGE REPRESENTATION
where p is a predicate belonging to a predicate set P, the terms t_1, t_2, ..., t_k and s_1, s_2, ..., s_n can be variables, constants or functions, and φ is a logical expression built up using predicates in the set P, the connectives ∧ and ¬, and the quantifiers ATM, ATL and EX. These quantifiers stand for ATMost, ATLeast and EXactly, respectively, and can be considered as an extension of the standard existential quantifier (similar to the numeric quantifiers used in the system INDUCE [44]). Fuzzy quantifiers are a very important extension to logical languages and have been proposed and deeply analyzed by Zadeh [49]. More precisely, let ψ(x_1, x_2, ..., x_m) be a logical expression built up using only the connectives ∧ and ¬; then, the expression:
is true of a given example f iff there exist at least n different bindings, between the variables y_1, y_2, ..., y_k and the objects occurring in f, satisfying ψ. In an analogous way, ATM n <y_1, y_2, ..., y_k> [ψ(x_1, x_2, ..., x_m)] and EX n <y_1, y_2, ..., y_k> [ψ(x_1, x_2, ..., x_m)] require at most n and exactly n different bindings in order to be satisfied. Notice that, for n = 1, the quantifier ATL corresponds to the existential quantifier ∃, whereas, for n = 0, the quantifier ATM n corresponds to ¬∃. Quantifiers can be nested according to the usual rules of the predicate calculus. For instance, the expression:
is an example of a wff of the language L (provided that the predicates Triangle and Circle belong to P).
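A minimal sketch of the crisp (two-valued) reading of these quantifiers as counts of bindings; whether distinct variables must be bound to distinct objects is not specified here, so repeated objects are allowed in this sketch:

    from itertools import product

    def count_bindings(objects, variables, psi):
        # Number of different bindings of the variables to objects satisfying psi.
        return sum(1 for combo in product(objects, repeat=len(variables))
                   if psi(dict(zip(variables, combo))))

    def ATL(n, objects, variables, psi):   # at least n bindings
        return count_bindings(objects, variables, psi) >= n

    def ATM(n, objects, variables, psi):   # at most n bindings
        return count_bindings(objects, variables, psi) <= n

    def EX(n, objects, variables, psi):    # exactly n bindings
        return count_bindings(objects, variables, psi) == n

    # Hypothetical instance: three objects and their shapes.
    shape = {"a": "triangle", "b": "circle", "c": "circle"}
    is_circle = lambda binding: shape[binding["y1"]] == "circle"
    print(ATL(1, shape, ["y1"], is_circle))   # True: ATL 1 behaves like the existential quantifier
    print(EX(2, shape, ["y1"], is_circle))    # True: exactly two circles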
However, some structural restrictions are imposed on the formulas of L. In particular, the set of basic predicates P is divided into two disjoint subsets, P(o) and P(n). The set P(o) contains predicates whose extension is evaluable, on the learning set, by means of queries to the data-base manager; examples of predicates belonging to P(o) are the ones reported in Fig. 2(a). By contrast, predicates in P(n) are defined by means of implication rules such as (1). We recall that a predicate is evaluable when the data-base manager has a procedure for computing, through a selection operation, its extension from a given relation [47]; examples of standard evaluable predicates are the arithmetic predicates >, < and =. According to the definition given in [26], predicates in P(o) will be said to be operational and predicates in P(n) non-operational.
Given the set of concepts H0 = {h_1, h_2, ..., h_n}, each concept h_i corresponds to a non-operational predicate, which is true of f iff f is an instance of h_i. Concept descriptions are wffs of the type:
φ(o) → h_i   (h_i ∈ H0)    (4)
where φ(o) is a conjunctive wff containing only operational predicates. As, in general, more than one formula (4) is needed to completely define h_i, all these
(5)
Expression (5) states that variables occurring in a negated predicate must also occur in a non-negated one in the same formula. In this way, a simple extension of SLD-resolution can be used as the inference engine.
In order to cope with the vagueness invariably associated with real-world applications, a continuous-valued semantics has been associated to the language L. Each formula φ(x_1, ..., x_n) ∈ L has a corresponding truth degree μ ∈ [0, 1], computed by combining the truth degrees of the predicates occurring in φ. For this reason, the relation φ*, associated to the formula φ, has been extended (with respect to the format described in Fig. 2) by adding a new field M, containing the truth value μ of φ(x_1, ..., x_n) when x_1, ..., x_n are bound to the objects specified in the corresponding tuple.
The semantics of an operational predicate can be defined in two ways: extensionally, by giving the corresponding relation on the data-base, or intensionally, by defining a function on attribute values. These two specification forms can both be used in the system. In particular, the implicit form is more compact and efficient but needs an analytic definition, whereas the explicit form can always be given by simply filling in a table when an analytic expression is not available. An example of extensional semantic definition is given in Fig. 4.
[Fig. 4 - Extensional definition of the predicate IN(x1, x2) ("object x1 is inside object x2"): a relation with fields F, M, x1, x2.]
Furthermore, to ease the writing of semantic functions, the learning events F0 are usually described by means of a set of numerical and categorical attributes a_1, ..., a_n; to this aim, a new type of relation has been introduced in SMART-SHELL: the attribute values can all be collected into a unique (n+3)-ary relation, called OBJ. The fields F, H and X contain the identifier f of an event, the classification h of f and the identifier x of a part of f, respectively, whereas the other n columns store the values of the defined attributes for the object x. As an example, the relation OBJ for the set of instances in Fig. 1 is reported in Fig. 5.
[Fig. 5 - The relation OBJ for the set of instances in Fig. 1: fields F, H, X followed by the attribute values of each object (shape, a numerical attribute, Clear/Shaded).]
(6)
where A_i is the domain of the attribute a_i. A library of primitive functions has been defined to this aim. For instance, the semantics of a Boolean predicate, such as Triangle(x), can be specified as follows:
if shape(x) = triangle then μ = 1 else μ = 0.
Analogously, the continuous-valued semantics of the predicate small(x) can be assigned as a membership function of the object x in the fuzzy set "small". This can be done according to the following syntax:
fuzzy(0, 10, 30, 40, area(x))    (7)
Expression (7) states that the fuzzy set "small" has been defined over the base variable area(x) and has a trapezoidal shape, specified by the four values 0, 10, 30 and 40, expressed in some suitable measure units. The corresponding fuzzy set is reported in Fig. 6. What is interesting, in SMART-SHELL, is that the user can give a default semantics for a fuzzy set definition; then the system itself, by analyzing the available examples, can adjust this definition or even learn it from scratch. This facility eases the burden of the domain expert in precisely defining the meaning of the terms he/she uses.
Fig. 6 - The fuzzy set "small" (membership degree vs. Area) defining the semantics of predicate "small(x)".
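A minimal sketch of the trapezoidal membership function encoded by expression (7); the behaviour exactly at the four breakpoints is an assumption:

    def trapezoid(a, b, c, d):
        # Membership function of a trapezoidal fuzzy set with support [a, d] and core [b, c].
        def mu(x):
            if x <= a or x >= d:
                return 0.0
            if b <= x <= c:
                return 1.0
            if x < b:
                return (x - a) / (b - a)     # rising edge
            return (d - x) / (d - c)         # falling edge
        return mu

    small = trapezoid(0, 10, 30, 40)          # fuzzy(0, 10, 30, 40, area(x)) from expression (7)
    print(small(5), small(20), small(35))     # 0.5, 1.0, 0.5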
Given the truth values u and v of two wffs φ and ψ of L, the semantics of φ∧ψ and of φ∨ψ is computed according to a pair of corresponding t-norm α(u, v) and t-conorm β(u, v); this evaluation reduces to the classical two-valued one in the case of Boolean predicates.
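A minimal sketch of this combination, with min/max as one common t-norm/t-conorm pair (the actual pair α, β is a choice of the system, not fixed here):

    # Truth degree of phi AND psi and of phi OR psi from the degrees u and v of phi and psi.
    def t_norm(u, v):        # alpha(u, v); min is one admissible choice
        return min(u, v)

    def t_conorm(u, v):      # beta(u, v); max is one admissible choice
        return max(u, v)

    u, v = 0.6, 0.9
    print(t_norm(u, v), t_conorm(u, v))    # 0.6 0.9
    print(t_norm(1, 1), t_conorm(0, 1))    # 1 1  -- reduces to the two-valued case on {0, 1}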
The evaluation of a formula containing a negated predicate (as in formula (5)) is performed by evaluating the function α(u, 1-v), where u is the evidence of φ(x_1, x_2, ..., y_1, y_2, ..., y_m) and v the evidence of ψ(y_1, y_2, ..., y_m). As far as the fuzzy quantifiers are concerned, a semantics of the type proposed by Yager in [48] has been adopted. Let us consider the formula:
The evaluation of the other two fuzzy quantifiers can be derived from the following relationships:
ATM m <y> [ψ(x)] = ¬ ATL (m+1) <y> [ψ(x)]    (10)
EX m <y> [ψ(x)] = ATL m <y> [ψ(x)] ∧ ATM m <y> [ψ(x)]    (11)
For efficiency reasons, the procedures for evaluating the predicate semantics and for updating the evidence of the formulas are handled by the data-base manager program. The truth evaluation of a non-operational predicate activates a deductive procedure which builds a corresponding operational formula, evaluable as described above.
The basic operations available in the inductive part of the system are specialization and generalization. Each one of them can be performed by applying different operators, as described in the following.
Specialization operators
Specialization by detailing. Given a formula φ(x_1, x_2, ..., y_1, y_2, ..., y_n), one way of obtaining from it a more specific formula ψ is by adding to it a predicate containing a subset of the variables occurring in φ:
In this way, the original description is enriched with some new details on the same objects considered before. Given the relation φ*, the extension ψ* of ψ is built up by selecting from φ* those tuples satisfying the predicate p(y_1, ..., y_n). The relation ψ* will have the same number of columns as φ*. An example is given in Fig. 7.
[Fig. 7 - Specialization by detailing: the extension φ* of φ(x1, x2) = Triangle(x1) ∧ Large(x1) ∧ Circle(x2) and the extension ψ* of ψ(x1, x2) = Triangle(x1) ∧ Large(x1) ∧ Circle(x2) ∧ On(x1, x2), as relations with fields F, M, x1, x2.]
Specja1i za tjon by gegation. Let <p(xl ..... xk,yl •...• yn) and P(xl .....xk) be two
formulas. Then. the ~ew formula
is obtained by negating the assertion p for the objects bound to <x I •...• xk> in <po
This operator is based on the negation as failure paradigm : given the extensions <p*
and p*. the resulting relation is obtained from <p* by removing those tuples which do
not verify p. An example is given in Fig. 8.
",*
[Fig. 8 - Specialization by negation: the extension φ* (fields F, M, x1) and the extension ψ* of φ(x1) ∧ ¬Clear(x1).]
Specialization by conjunction. Let φ(x_1, ..., x_k) and ψ(y_1, ..., y_n) be two formulas; the formula p(x_1, ..., x_k, y_1, ..., y_n) = φ(x_1, ..., x_k) ∧ ψ(y_1, ..., y_n) is more specific than both φ and ψ. A natural join is performed between the two relations φ* and ψ*. The resulting relation, an example of which is reported in Fig. 9, will have k+n+2 columns.
<p*
F M xl
0.6 a
~~ 8:& h
J
~ (xl.x2) .. TriangJe(xl) 1\
1\ Square(x2)1\ Large(x2)
Large(xl) 1\
F M xl x2
1.0 c
i*~ 1.0
1.0
i
m
<p(xl.x2)ETri~gJe(xl) I\Large(xl) 1\
.,
<p F M xl x2 1\ Triangle(xl) I\Large(xl)
0.6 a e
~l 8:&
0.6 J
a
!l ~
k
AlL 2 <x2>
'I'
• F M xl
EXI 0.6 a
Generalization operators
Only one basic generalization operator has been considered, i.e., the one that performs a disjunction of two formulas having the same number of variables. Let φ(x_1, ..., x_k) and ψ(x_1, ..., x_k) be two formulas with the same number of variables; the formula p(x_1, ..., x_k) = ψ(x_1, ..., x_k) ∨ φ(x_1, ..., x_k) is a generalization of both. A merging operator, similar to the union operator of relational algebra, is used for implementing this operation. In Fig. 11 an example is reported.
[Fig. 11 - Generalization by disjunction: the extensions φ* and ψ* (fields F, M, x1) and their union p*.]
Using the described basic operators, any kind of formula in the relational calculus, extended with the above-defined non-standard quantifiers, can be built up.
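A minimal sketch of three of these operators acting on extensions; each tuple carries the example identifier F, the truth degree M and the variable bindings, and min is used as the t-norm (both the tuple layout and the join condition are simplifying assumptions):

    def t_norm(u, v):
        return min(u, v)

    def detail(phi_star, mu_pred):
        # Specialization by detailing: keep the tuples of phi* whose bindings satisfy the
        # added predicate, combining truth degrees with the t-norm.
        out = []
        for f, m, binding in phi_star:
            mu = mu_pred(binding)
            if mu > 0.0:
                out.append((f, t_norm(m, mu), binding))
        return out

    def conjoin(phi_star, psi_star):
        # Specialization by conjunction: join the two extensions on the example identifier F.
        return [(f1, t_norm(m1, m2), {**b1, **b2})
                for f1, m1, b1 in phi_star
                for f2, m2, b2 in psi_star if f1 == f2]

    def disjoin(phi_star, psi_star):
        # Generalization by disjunction: union-like merge of extensions over the same variables.
        return phi_star + [t for t in psi_star if t not in phi_star]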
where the non-operational predicate P1(n) has been replaced by the body of the rule (13), after applying the unification with the terms occurring in g. However, suppose we know the extension g* on F0 of the operational subformula P1(o) ∧ P2(o) ∧ ... in the goal g; then, the extension g'* can be easily computed by specializing g* with the formula q1(o) ∧ q2(o), i.e., by using the specialization operators defined in the previous section.
This basic deductive step corresponds to the one used in deductive data-bases, which utilize the method of "queries and subqueries" [50]; in particular, SMART-SHELL incorporates a deductive data-base of this form, which has been obtained by extending Robinson's LOGLISP [51].
Inductive specialization and deductive steps can also be easily interleaved, realizing an effective integration of analytical and empirical learning [27]: in this framework, specialization steps allow one to modify the partial operational descriptions obtained from the theory, thus improving their classification performance. On the other hand, the deductive use of background knowledge supplies a skeleton for the inductive process, limits the search space and gives structural meaning to the obtained concept descriptions.
Finally, as the semantics of the predicates can be freely defined by the user, he is also allowed to change it dynamically, in the sense that the shell provides a mechanism to first define non-operational predicates through a set of Horn clauses and then move them to an operational state, by deducing their operational form in a context-free environment. In this case the predicates' semantics is given extensionally, by means of the relations built up during the former process.
CONCLUSIONS
In this paper we have described the tool SMART-SHELL, designed to ease the development of learning systems oriented to classification and diagnostic expert systems. The learning framework is based on an integrated paradigm allowing empirical learning (i.e., induction) and analytic learning (i.e., explanation-based learning) to be interleaved. This paradigm, which proved very effective in practice, can easily be implemented using a deductive data-base. The environment SMART-SHELL can therefore be considered a special-purpose deductive data-base, extended in order to support the development of knowledge-intensive learners. An important feature of the system is its capability of handling fuzzy relations.
So far, SMART-SHELL has been used to develop four families of learners, the best known being ML-SMART; they have been applied in several real-world domains, such as pattern recognition [43] and fault diagnosis of electromechanical equipment [45], among others. In these applications SMART-SHELL proved reliable and usable even by people who did not participate in the implementation of the tool itself (one version of ML-SMART has been developed by SOGESTA S.p.A.). In the hands of a trained programmer the tool proved excellent at speeding up prototyping; several prototypes have been developed in a few weeks.
The facility for handling fuzzy logic was also a key to this success, especially in diagnostic problems, where coping with the vagueness of the terms used by a human expert and with the approximate nature of the measured cues was a must.
Moreover, the possibility of automatically acquiring the required fuzzy set definitions greatly enhances the system's usefulness.
REFERENCES
1. F. Hayes-Roth and J. McDermott, "An Interference Matching Technique for
Inducing Abstractions," Communications of the ACM, vol. 21, no. 5, pp. 401-
410, 1978.
2. R. S. Michalski, "Pattern Recognition as Rule-guided Inductive Inference,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 2, pp.
349-361, 1980.
3. S. A. Vere, "Induction of Concepts in the Predicate Calculus," in Proc. of
the Fourth IJCAI, pp. 281-287, Tbilisi, USSR, 1975.
4. P. H. Winston, "Learning Structural Descriptions from Examples," in The
Psychology of Computer Vision, ed. P.H. Winston, McGraw Hill, New York,
1975.
5. S. Minton, J.G. Carbonell, "Strategies for Learning Search Control
Rules: An Explanation-Based Approach," in Proc. IJCAI-87, pp. 228-235,
Milano, Italy, 1987.
6. L. Rendell, "A General Framework for Induction and a Study of Selective
Induction," Machine Learning, vol. I, pp. 177-226, 1986.
7. D.B. Lenat, "The Role of Heuristics in Learning by Discovery: Three Case
Studies," in Machine Learning, An Artificial Intelligence Approach, ed. R. S.
Michalski, J. G. Carbonell, T. M. Mitchell, pp. 243-306, Tioga Publishing
Company, 1983.
8. R. S. Michalski, J. Carbonell, and T. Mitchell, Machine Learning. An
Artificial Intelligence Approach. Vol. I, Tioga Publishing Company, Palo
Alto, CA, 1983.
9. R. S. Michalski, J. Carbonell, and T. Mitchell, Machine Learning. An
Artificial Intelligence Approach, Vol. 2, Morgan Kaufmann, Los Altos, CA,
1985.
10. R. S. Michalski and Y. Kodratoff, eds. : "Machine Learning: An Artificial
Intelligence Approach", vol. 3, Morgan Kaufmann, Palo Alto, CA, 1988.
11. Artificial Intelligence, Special Issue on Machine Learning, J. Carbonell
(Ed.), vol. 40, no. 1-3, 1989.
12. P. Langley, G.L. Bradshaw, and H.A. Simon, "Rediscovering Chemistry with
the Bacon System," in Machine Learning. An Artificial Intelligence Approach,
ed. R. S. Michalski, J. G. Carbonell, T. M. Mitchell, pp. 307- 330, Tioga
Publishing Company, 1983.
13. P. Langley, J.M. Zytkow, H.A. Simon, and G.L. Bradshaw, "The Search for
Regularities: Four Aspects of Scientific Discovery," in Machine Learning, An
Artificial Intelligence Approach, Vol. 2, ed. R. S. Michalski, J. G. Carbonell,
T. M. Mitchell, pp. 425-470, Morgan Kaufmann, Los Altos, CA, 1985.
14. B.C. Falkenhainer and R.S. Michalski, "Integrating Quantitative and Qualitative
Discovery: The ABACUS System," Machine Learning, vol. 1, no. 4, pp. 367-402,
1986.
15. R.S. Michalski and R.E. Stepp, "Learning from Observation: Conceptual
Clustering," in Machine Learning. An Artificial Intelligence Approach, ed. R.
S. Michalski, J. G. Carbonell, T. M. Mitchell, pp. 331-364, Tioga
Publishing Company, 1983.
16. M. Lebowitz, "Experiments with Incremental Concept Formation: UNIMEM,"
Machine Learning, vol. 2, no. 2, pp. 103-138, 1987.
17. D.H. Fisher, "Knowledge Acquisition Via Incremental Conceptual
Clustering," Machine Learning, vol. 2, no. 2, pp. 139-162, 1987.
J. F. BALDWIN
Engineering Mathematics Dept
University of Bristol
Bristol BS8 1TR
England
1. INTRODUCTION
The use of "very likely" rather than a point probability value further complicates
matters. As a first approximation we might equate "very likely" with the interval
[0.9, 1]. This means that Pr(x wears large shoes | x is tall) lies in the interval [0.9, 1]. We could further express this by saying that the necessary support in favour of (x wears large shoes | x is tall) is 0.9 and the necessary support in favour of (x does not wear large shoes | x is tall) is 0. The term necessary support can be replaced with the term
"belief". We can also express this in the form of a mass assignment over the power
set of
{(x wears large shoes | x is tall), (x does not wear large shoes | x is tall)}
namely,
(x wears large shoes | x is tall) : 0.9
(x does not wear large shoes | x is tall) : 0
{(x wears large shoes | x is tall), (x does not wear large shoes | x is tall)} : 0.1
where an assignment of mass m to a set Y means that m is the probability associated with exactly Y and not committed to any particular subset of Y. The meaning of these various terms will be expanded upon later in the paper. In order to capture more adequately the true semantics of the vague statement "very likely" we need to model this linguistic term using a fuzzy set [ZADEH 1965].
1.2 AN EXAMPLE
Consider the following simple example. A bag contains 70% red balls and 30% blue
balls. Each ball is either large or small. 60% of the red balls are large and 40% of the
blue balls are large.
Problem 1a
What is the probability that a ball drawn randomly from the bag is large?
Of course this is a very elementary problem and can be solved by fusing the pieces of information concerning the balls in the bag. If {y1, y2, y3, y4} stand for the probabilities {Pr(rl), Pr(rs), Pr(bl), Pr(bs)} respectively, where r, b, l, s signify "red", "blue", "large", "small" respectively, then
y1 + y2 = 0.7 ; y3 + y4 = 0.3
y1 / (y1 + y2) = 0.6 ; y3 / (y3 + y4) = 0.4
so that y1 = 0.42, y2 = 0.28, y3 = 0.12 and y4 = 0.18,
from which Pr(l) = y1 + y3 = 0.54.
This is simply a probability logic problem. In the sequel this fusion of probabilistic
information will be done by means of a general assignment method.
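For readers who want to check the arithmetic, a small Python sketch (ours, not part of the original text) reproduces the numbers of Problem 1a:

# Plain arithmetic check of Problem 1a.
p_red, p_blue = 0.7, 0.3
p_large_given_red, p_large_given_blue = 0.6, 0.4
y1 = p_red * p_large_given_red          # Pr(red and large)  = 0.42
y2 = p_red * (1 - p_large_given_red)    # Pr(red and small)  = 0.28
y3 = p_blue * p_large_given_blue        # Pr(blue and large) = 0.12
y4 = p_blue * (1 - p_large_given_blue)  # Pr(blue and small) = 0.18
print(round(y1 + y3, 2))                # Pr(large) = 0.54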
Problem 1b
A ball drawn at random from the population is known to be large. What is the
probability that it is red?
The solution is given by y1 / (y1 + y3) = 0.7778 and comes from fusing the given information using probability logic and calculating the required conditional probability.
Of course y1 / (y1 + y3) = Pr(l | r)Pr(r) / Pr(l), which is Bayes' theorem applied to this problem. This can therefore be viewed as an updating problem in which the apriori distribution {y1, y2, y3, y4} is updated using the certain information that the ball in question is large.
Problem 2
The balls in the bag are shown, as a black and white image on a screen, one by one to an observer. The observer is then asked if the third ball shown was red. The observer believes that the third ball was large but is not certain of this fact. He expresses this belief as Pr(third ball shown is large) = 0.8. He does not have information about the colours of the balls shown. What should his belief be that it is red?
One possible answer to this problem is obtained by using Jeffrey's rule [JEFFREY 1967], namely
Pr(third ball is r) = Pr(r | l)Pr(third ball is l) + Pr(r | s)Pr(third ball is s)
= 0.7778 * 0.8 + {y2 / (y2 + y4)} * 0.2
= 0.7778 * 0.8 + 0.6087 * 0.2 = 0.744
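Again purely as a numerical check (a sketch, not from the original paper), the Bayes and Jeffrey's-rule computations above can be reproduced as follows:

# Verifying Problem 1b and the Jeffrey's-rule answer to Problem 2.
y1, y2, y3, y4 = 0.42, 0.28, 0.12, 0.18
p_red_given_large = y1 / (y1 + y3)          # Bayes: 0.7778
p_red_given_small = y2 / (y2 + y4)          # 0.6087
# Jeffrey's rule with Pr'(large) = 0.8 for the observed ball:
p_red = p_red_given_large * 0.8 + p_red_given_small * 0.2
print(round(p_red_given_large, 4), round(p_red, 3))   # 0.7778 and 0.744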
It looks as if we have used the theorem of total probabilities, namely
Pr(third ball is r) = Pr(third ball is r | third ball is l)Pr(third ball is l)
+ Pr(third ball is r | third ball is s)Pr(third ball is s)
with the assumption that
Pr(third ball is r | third ball is l) = Pr(r | l) and
Pr(third ball is r | third ball is s) = Pr(r | s).
We do not have to make this assumption if the following philosophy is accepted. The apriori distribution over the labels {rl, rs, bl, bs} is {y1, y2, y3, y4}. This is to be updated using the specific information P'r(l) = 0.8, where the prime is used to signify that this is not the proportion of large balls in the population but a belief in one particular ball being large. We could update to {y'1, y'2, y'3, y'4} by choosing the {y'i} such that the relative information
Σi y'i ln(y'i / yi)
is minimised. This will be discussed further later. This forms the basis of the iterative assignment method to be discussed in detail in a later section.
Suppose it is known that at least 70% and at most 90% of middle-aged persons who have children are married. We will define the fuzzy term "middle aged" using the fuzzy set middle_aged with membership function
χmiddle_aged(x) = (1/5)x - 7   for 35 ≤ x ≤ 40
               = 1            for 40 ≤ x ≤ 50
               = -(1/5)x + 11 for 50 ≤ x ≤ 55
               = 0            elsewhere
We also define the fuzzy term "about_35" using the fuzzy set about_35 with membership function
χabout_35(x) = (1/5)x - 6   for 30 ≤ x ≤ 35
             = -(1/5)x + 8  for 35 ≤ x ≤ 40
             = 0            elsewhere
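A direct transcription of these two piecewise-linear membership functions into Python (an illustrative sketch; the function names are ours):

def chi_middle_aged(x):
    if 35 <= x <= 40:
        return x / 5 - 7
    if 40 <= x <= 50:
        return 1.0
    if 50 <= x <= 55:
        return -x / 5 + 11
    return 0.0

def chi_about_35(x):
    if 30 <= x <= 35:
        return x / 5 - 6
    if 35 <= x <= 40:
        return -x / 5 + 8
    return 0.0

# For instance, someone aged 38 is "about 35" to degree 0.4 and "middle aged" to degree 0.6:
print(chi_about_35(38), chi_middle_aged(38))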
Suppose it is known that Mary is about 35 and it is believed with a probability of at
least 0.8 that Mary has children.
The methods of inference developed below will allow us to answer this query for this
program. In deriving this interval both the probabilistic and fuzzy types of
uncertainty must be taken into account. For example, the rule talks about middle
aged persons while the age of Mary is given as "about 35". From a syntactic point of
view it would appear that the rule has no relevance to Mary but from a semantic
point of view it does since someone who is "about 35" is to some degree middle
aged. This degree depends on the definitions of the fuzzy sets "middle aged" and
"about 35". In order to answer the query given it is necessary to determine an
interval containing the conditional probability Pr{age(mary, middle_aged) 1
age(mary 1 abouc35). We term this process "semantic unification", [BALDWIN
1990a].
When the second argument of the age predicate is always a crisp set then this interval is [0, 0], [1, 1] or [0, 1]. For example,
Pr{age(mary, [40, 50]) | age(mary, [35, 39])} = 0,
Pr{age(mary, [35, 45]) | age(mary, [37, 42])} = 1, and
Pr{age(mary, [40, 50]) | age(mary, [45, 55])} is contained in the interval [0, 1].
Semantic unification extends this to the case of fuzzy sets.
In this paper we will discuss methods for answering queries of the types given above
from a knowledge base expressed in rule form containing statements representing
general tendencies and also specific facts. The specific facts expressed in
probabilistic terms will be used as evidences to update the family of possible apriori
distributions obtained from the relevant general statements, and this update used to provide the answer to the given query. The rules and facts can contain both probabilistic and fuzzy uncertainties.
The inference method for processing the knowledge base to answer queries will use
the following three methods:
(1) general assignment method
(2) iterative assignment method
(3) semantic unification.
In special cases the inference process simplifies to using the theorem of total
probabilities if only general statements from the same population are used or
Jeffrey's rule when both general statements and specific evidences are used. This is
the inference mechanism of the AI language FRIL.
Each of these methods requires the information in the form of a mass assignment
over a frame of discernment whose elements are labels formed from the information.
We discuss this more fully in the appropriate sections that follow. A general
treatment will not be given here and the reader is expected to generalise for him/herself from those cases discussed. Other aspects can be found in [BALDWIN 1990b, 1990c].
A knowledge base statement, either in the form of a rule or a fact, is converted into
the form of a mass assignment over a set of labels. Each label is a concatenation of
instantiations of the proposition variables and the proposition variables come from
all the information of the knowledge base relevant to answering the given query. The
following example will illustrate this. A general theory in terms of inference
diagrams and logic proof paths can be given but space does not allow this to be
included here.
Consider the knowledge base
fly(X) :- bird(X) : [0.9,0.95].
bird(X) :- penguin(X).
fly(X) :- penguin(X) : [0,0].
penguin(obj) : [0.4,0.4].
bird(obj) : [0.9, 1].
etc
[Figure: inference diagram for this knowledge base, with the support pairs [0.9, 0.95], [0, 0], [0.9, 1] and [0.4, 0.4] attached to the corresponding rules and facts.]
For any knowledge base consisting of facts and rules and for any query, a frame of
discernment can be established using this method of constructing an inference
diagram and extracting the propositional variables. The inference diagram is
obtained by using the unification and backtracking mechanisms of Prolog with
extensions to include semantic unification as discussed later.
Consider the two facts
penguin(obj) : [0.4,0.4].
bird(obj) : [0.9, 1].
given above.
The first is equivalent to the following mass assignment over the set of labels L
m({¬bp¬f, ¬bpf, bp¬f, bpf}) = 0.4
m({¬b¬p¬f, ¬b¬pf, b¬p¬f, b¬pf}) = 0.6
which can be written as
{_ p _} : 0.4
{_ ¬p _} : 0.6
where _ can be instantiated to the appropriate proposition or its negation.
Similarly the second evidence is equivalent to the mass assignment
{b _ _} : 0.9
{_ _ _} : 0.1
Consider the general statements
fly(X) :- bird(X) : [0.9,0.95].
bird(X) :- penguin(X).
fly(X) :- penguin(X) : [0,0].
The second of these clauses says that the labels {¬bpf, ¬bp¬f} are not possible. The third says that {bpf} is not possible. The two statements combined say that the labels {¬bpf, ¬bp¬f, bpf} are not possible. We can therefore express the first clause as a mass assignment over the reduced set of labels
L' = {¬b¬p¬f, ¬b¬pf, b¬p¬f, b¬pf, bp¬f}, by combining the following two evidences, each expressed as a mass assignment over L'
(1) {b¬p _, bp¬f} : k , {¬b¬p _} : 1-k
(2) {b¬pf} : 0.9k , {¬b¬p _, b _ ¬f} : 1-0.9k
corresponding to
Pr{bird(obj)} = k, Pr{¬bird(obj)} = 1-k
and
Pr{bird(obj) ∧ fly(obj)} = 0.9k, Pr{¬(bird(obj) ∧ fly(obj))} = 1-0.9k
The combination of these two evidences, using the general assignment method
defined below gives the mass assignment over L' as
{b¬pf} : 0.9k
{¬b¬p _} : 1-k
{b _ ¬f} : 0.1k
for the combined relevant general statements in the knowledge base. The conditional statements of rules can always be treated in this way. Pure logic rules simply reduce the set of possible labels.
We use the concept of belief and plausibility measures of [SHAFER 1976] to define
necessary support and possible support measures. Names are changed to be
consistent with the notation used in support logic programming, [BALDWIN 1986]
and the FRIL language [Baldwin et al 1987], and to avoid confusion with
conclusions and derived results based on the use of the Dempster rule of combining
evidences. The methods given here do not use the Dempster rule and the necessary
and possible supports are more in keeping with upper and lower probabilities,
[DUBOIS, PRADE 1986].
Axiom 1 (boundary conditions): Sn(∅) = 0 and Sn(X) = 1, where ∅ is the empty set.
Axiom 2: Sn(A1 ∪ A2 ∪ ... ∪ An) ≥ Σi Sn(Ai) - Σi<j Sn(Ai ∩ Aj) + ... + (-1)^(n+1) Sn(A1 ∩ A2 ∩ ... ∩ An)
for every collection of subsets of X.
For each A ∈ P(X), Sn(A) is interpreted as the necessary support, based on available evidence, that a given label of X belongs to the set A of labels.
When the sets A1, A2, ..., An in axiom 2 are pairwise disjoint, i.e.
Ai ∩ Aj = ∅ for all i, j ∈ {1, 2, ..., n} such that i ≠ j,
the axiom requires that the necessary support associated with the union of the sets is
not smaller than the sum of the necessary supports pertaining to the individual sets.
The basic axiom of necessary support measures is thus a weaker version of the
additivity axiom of probability theory.
It is easy to show that axiom 2 above implies that for every A, B ∈ P(X), if A ⊆ B then
Sn(A) ≤ Sn(B)
and also that
Sn(A) + Sn(Ā) ≤ 1.
Necessary support measures and possible support measures are mutually dual, Sp(A) = 1 - Sn(Ā), and it is easy to show that
Sp(A) + Sp(Ā) ≥ 1
In terms of the mass assignment m,
Sn(A) = Σ{B : B ⊆ A} m(B)
and
Sp(A) = Σ{B : A ∩ B ≠ ∅} m(B)
which are applicable for all A ∈ P(X).
Focal Elements
Every set A ∈ P(X) for which m(A) > 0 is called a focal element of m. We can represent the mass assignment as (m, F) where F is the set of focal elements.
A support pair for A ∈ P(X) is given by [MIN Sn(A), MAX Sp(A)] and this defines an interval containing Pr(A), where the MIN and MAX are taken over the set of values of any possible parameters that Sn(A) and Sp(A) may depend on. This will be
illustrated later.
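As a small illustration (ours, not the paper's), the two support measures can be computed directly from a mass assignment represented as a Python dictionary; the toy labels and masses below are invented:

def Sn(m, A):
    A = frozenset(A)
    return sum(mass for B, mass in m.items() if B <= A)   # sum of m(B) over B contained in A

def Sp(m, A):
    A = frozenset(A)
    return sum(mass for B, mass in m.items() if B & A)    # sum of m(B) over B meeting A

m = {frozenset({"bp"}): 0.3,
     frozenset({"bp", "b~p"}): 0.5,
     frozenset({"~bp", "~b~p"}): 0.2}
print(Sn(m, {"bp"}), Sp(m, {"bp"}))                 # 0.3 and 0.8
print(Sn(m, {"bp", "b~p"}), Sp(m, {"bp", "b~p"}))   # 0.8 and 0.8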
3. GENERAL ASSIGNMENT METHOD
Let m1 and m2 be two mass assignments over the power set P (X) where X is a set
of labels. Evidence 1 and evidence 2 are denoted by (m1, F1) and (m2, F2) respectively, where F1 and F2 are the sets of focal elements of P(X) for m1 and m2 respectively.
m'(L1i ∩ L2j) = 0 if L1i ∩ L2j = ∅, the empty set, for i = 1, ..., n1 ; j = 1, ..., n2
If there are more than two evidences to combine then they are combined two at a
time. For example, to combine (m1, F1), (m2, F2), (m3, F3) and (m4, F4) use
(m, F) = (((m1, F1) ⊕ (m2, F2)) ⊕ (m3, F3)) ⊕ (m4, F4)
[Tableau: rows indexed by the focal elements L1i of evidence 1 with masses m1(L1i), columns by the focal elements L2j of evidence 2 with masses m2(L2j).]
The set of labels in a cell is the intersection of the subset of labels of evidence 1 associated
with the row of the cell and the subset of labels of evidence 2 associated with the
column of the cell.
The mass assignment entry in a cell is 0 if the intersection of the subset of labels of
evidence 2 associated with the column of the cell and the subset of labels of evidence
1 associated with the row of the cell is empty.
The mass assignment in a cell is associated with the subset of labels in the cell.
The sum of the cell mass assignment entries in a row must equal the mass
assignment associated with ml in that row.
The sum of the cell mass assignment entries in a column must equal the mass
assignment associated with m2 in that column.
If there are no loops, where a loop is formed by a movement from a non-zero assignment cell to other non-zero assignment cells by alternating vertical and horizontal moves returning to the starting point, then the general assignment problem
gives a unique solution for the mass assignment cell entries. If a loop exists then it is
possible to add and subtract a quantity from the assignment values around the loop
without violating the row and column constraints and the solution will not then be
unique. If a non-unique solution exists then the family of solutions can be
parametrised with known constraints on the parameter values. These possible
parameter values must be taken into account when determining support pairs from
the necessary and possible support measures.
[Tableau: combining evidences (1) and (2) over L' by the general assignment method, giving cells {b¬pf} : 0.9k, {b _ ¬f} : 0.1k and {¬b¬p _} : 1-k.]
Consider combining the specific evidences expressed as mass assignments over the
set
X = {bp, b¬p, ¬bp, ¬b¬p}
in the example above, namely
penguin(obj) : [0.4,0.4].
bird(obj) : [0.9, 1].
We have
(1) {_ p} : 0.4 , {_ ¬p} : 0.6
(2) {b _} : 0.9 , {_ _} : 0.1
Using the general assignment method we obtain

              {b} : 0.9          {b, ¬b} : 0.1
{_ p} : 0.4   {bp} : 0.4 - x     {p} : x
{_ ¬p} : 0.6  {b¬p} : 0.5 + x    {¬p} : 0.1 - x

where 0 ≤ x ≤ 0.1. An abbreviated form of labelling is used for convenience.
The necessary and possible supports for the various elements of X are given by
Sn(bp) = 0.4 - x ; Sp(bp) = 0.4
Sn(b¬p) = 0.5 + x ; Sp(b¬p) = 0.6
Sn(¬bp) = 0 ; Sp(¬bp) = x
Sn(¬b¬p) = 0 ; Sp(¬b¬p) = 0.1 - x
from which we can calculate the support pairs
bp : [0.3, 0.4] ; b¬p : [0.5, 0.6] ; ¬bp : [0, 0.1] ; ¬b¬p : [0, 0.1]
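These support pairs can be checked mechanically; the following sketch (ours, not the paper's) sweeps the loop parameter x over its admissible range, using the set-based definitions of Sn and Sp:

def tableau(x):
    return {frozenset({"bp"}): 0.4 - x,
            frozenset({"bp", "~bp"}): x,          # the cell abbreviated {p} in the text
            frozenset({"b~p"}): 0.5 + x,
            frozenset({"b~p", "~b~p"}): 0.1 - x}  # the cell abbreviated {~p}

def Sn(m, A): return sum(v for B, v in m.items() if B <= frozenset(A))
def Sp(m, A): return sum(v for B, v in m.items() if B & frozenset(A))

xs = [i / 1000 for i in range(101)]               # grid over 0 <= x <= 0.1
for label in ("bp", "b~p", "~bp", "~b~p"):
    lo = min(Sn(tableau(x), {label}) for x in xs)
    hi = max(Sp(tableau(x), {label}) for x in xs)
    print(label, ":", [round(lo, 2), round(hi, 2)])
# prints bp : [0.3, 0.4], b~p : [0.5, 0.6], ~bp : [0.0, 0.1], ~b~p : [0.0, 0.1]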
To answer the query we combine the following mass assignments over the label set
L' using the general assignment method.
(1) {b¬p¬f, b¬pf, bp¬f} : 0.7 ; {¬b¬p¬f, ¬b¬pf} : 0.3   using (4)
(2) {bp¬f} : 0.05 ; {¬b¬p¬f, ¬b¬pf, b¬p¬f, b¬pf} : 0.95   using (5)
(3) {b¬pf} : 0.9k ; {¬b¬p¬f, ¬b¬pf, b¬p¬f, bp¬f} : 1 - 0.9k   using (1)
where k is the probability assigned to {b¬p¬f, b¬pf, bp¬f}
For this particular example the solution is easily found by elementary analysis to be
{b¬pf} : 0.63
{bp¬f} : 0.05
{b¬p¬f} : 0.02
{¬b¬pf, ¬b¬p¬f} : 0.3
so that the answer to the query is that Pr{x can fly} lies in the interval [0.63, 0.93], since the 0.3 associated with {¬b¬pf, ¬b¬p¬f} could all be associated with ¬b¬pf, although this is not necessarily the case. This conclusion is expressed in the form of a support pair. We can obtain this result using the general assignment method as follows:
[Tableau: combining evidences (1)-(3) by the general assignment method; the resulting non-zero cells are]
{bp¬f} : 0.05
{b¬pf} : 0.63
{b¬p¬f} : 0.02
{¬b¬pf, ¬b¬p¬f} : 0.3
Suppose an apriori mass assignment ma is given over a focal set A whose elements are members of the power set P(X), where X is a set of labels. This assignment represents general tendencies and is derived from statistical considerations of some sample space or from general rules applicable to such a space.
Suppose we also have a set of specific evidences {E1, E2, ..., En} where, for each i, Ei is (mi, Fi), Fi being the set of focal elements of P(X) for Ei and mi the mass
assignment for these focal elements. These evidences are assumed to be relevant to some object and derived by consideration of this object alone, not influenced by the sample space of objects from which the object came.
We wish to update the apriori assignment ma with {E1, ..., En} to give the updated mass assignment m such that the minimum information principle, concerned with the relative information of m given ma, is satisfied.
p' is said to satisfy the minimum information principle for updating the distribution p over X with specific evidences E1, ..., En, where each Ei is expressed as a distribution over a partition of X.
This iterative process in fact converges to the solution which satisfies the minimum information principle of minimising the relative information with respect to the apriori p subject to the constraints E1, ..., En. The multi-constraint optimisation problem is therefore solved by a succession of single-constraint optimisation problems, iterated to convergence.
The single constraint optimisation problem has a particularly simple algorithm for its
solution which we will now consider.
Let p(r-1) = p, say, be updated to p(r) = p', say, using Er in such a way that the relative information
Σ{x ∈ X} p'(x) ln(p'(x) / p(x))
is minimised subject to the constraint Er being satisfied.
Let the partition for evidence Er be {X1, ..., Xk} with probability distribution {Pr(Xi)} given.
Let Ki = Σ{x ∈ Xi} p(x)   for i = 1, ..., k.
Then
p'(x) = p(x) Pr(Xi) / Ki   for each label x of X, where Xi is the block of the partition containing x.
[Tableau: columns are the blocks Xj with probabilities Pr(Xj) and multipliers 1/Kj = 1 / Σ{lk ∈ Xj} p(lk); rows are the labels of X with their apriori probabilities.]
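A minimal sketch of this single-constraint update, cycled over the evidences until convergence (our illustration; the dictionary-based representation is an assumption):

def update_once(p, partition, target):
    # p: dict label -> prob; partition: list of sets of labels; target: list of Pr(Xi).
    p_new = {}
    for block, prob in zip(partition, target):
        K = sum(p[x] for x in block)           # Ki = apriori mass in the block
        for x in block:
            p_new[x] = p[x] * prob / K if K > 0 else 0.0
    return p_new

def iterative_assignment(p, evidences, sweeps=50):
    for _ in range(sweeps):                    # cycle through the constraints repeatedly
        for partition, target in evidences:
            p = update_once(p, partition, target)
    return p

# The bag-of-balls example: apriori {rl, rs, bl, bs} updated with Pr'(large) = 0.8.
p0 = {"rl": 0.42, "rs": 0.28, "bl": 0.12, "bs": 0.18}
ev = [([{"rl", "bl"}, {"rs", "bs"}], [0.8, 0.2])]
print(iterative_assignment(p0, ev))            # rl = 0.42*0.8/0.54 = 0.6222..., etc.

With this single piece of evidence the update reproduces Jeffrey's rule: the total mass on red becomes 0.6222 + 0.1217 ≈ 0.744, as computed earlier.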
If the evidences are expressed as mass assignments over X with the apriori
assignment still being a probability distribution over the set of labels X then a more
complicated case must be considered
Let p(r-1) = p be denoted as in 4.2 and let Er be the mass assignment
{Xrk : mrk, for k = 1, ..., nr}
where Xrk is a subset of X for all k and mrk is the mass assigned to Xrk.
[Tableau: rows are the labels of X with their apriori probabilities; columns are the focal elements Xrk of Er with masses mrk and multipliers 1/Kk = 1 / Σ{ls ∈ Xrk} p(ls); the final column gives the update.]
Each column constraint is satisfied. The labels in the tableau cells are all labels of X
since the focal elements of Er when intersected with a label in the apriori gives the
apriori label. The update of this label is the sum of all the cell assignments associated
with this label. A cell in a row where the apriori label is not a member of the cell
column focal element of Er has a zero mass assignment associated with the empty
set.
In this case the update solution p' satisfies the following relative information optimisation problem.
In this case the intersection of the row subset of labels of the apriori assignment with
315
the column subset of labels of the evidence assignment for a given cell of the tableau
is a subset of P(X). In the case when this intersection is the empty set the mass assignment for that cell is zero. When the intersection is not empty then the mass
assignment is the product of the row apriori assignment and the evidence column
assignment scaled with the K multiplier for that column. The K multiplier for a
column is the sum of the apriori row assignments corresponding to those cells of the
tableau in the column which have non-empty label intersections. The update is a
mass assignment over the set of subsets of labels of the cells of the tableau. The
update can therefore be over a different set of subsets of labels to that of the apriori.
Iteration proceeds as before and convergence will be obtained with a mass
assignment over P (X) which will correspond to a family of possible probability
distributions over X.
[Tableau: rows are the apriori focal elements Ts with their masses; the cell in row Ts and column Xrk contains Ts ∩ Xrk, and each column carries the scaling factor Kk determined by the apriori masses of the rows with non-empty intersection with Xrk.]
In this case the apriori assignment is also a family of possible probability distributions. For any one of these, the calculation is that of 4.3 and satisfies the minimum information principle where the constraints are in terms of necessary and possible supports from the evidence assignments. A member of the apriori set of possible probability distributions over the label set X will be updated with the evidences E1, ..., Er to a final probability distribution over X satisfying the minimum information principle. Each member of the set of possible apriori probability distributions will be updated, in general, to a different final probability distribution. The final set of distributions can be expressed as an assignment over P(X). This is what the algorithm described above does in this case. The calculation is no more involved than for the other simpler cases, apart from having to determine the intersections to find the subset of labels for each cell and taking note of these in the final update.
Loops can arise here just as in the general assignment case. The loop can be treated in exactly the same way as for the general
assignment method. This will be illustrated in the examples that follow. Each
solution of the loop satisfies the minimum information principle.
This is a simple FRIL-type program. X is a variable and a, b, c, d are predicates. The first two sentences are rules which express general statements about persons and the third and fourth are facts about a specific person mary.
If a rule contains one list after the colon then this gives the interval containing the
probability of the head of the rule given the body of the rule is true. If the list after
the colon contains two lists then the first gives the probability of the head of the rule
given the body of the rule is true and the second gives the interval for the probability
of the head of the rule given the body of the rule is false.
The first rule says that for any person X the probability that (X is a) given that (X is
b) and (X is c) is 0.9. This expresses the fact that 90% of persons who are both b and c are also a. It also says that (X is a) cannot be true unless both b and c are satisfied.
The second rule says that at least 85% of persons who are d are also c while no
person who is not a d can be a c.
In this example the rules are used to determine a family of apriori assignments over
the two sets of labels
{ABC} and {CD}
where A, B, C, D denote a or ¬a, b or ¬b, c or ¬c, d or ¬d respectively. Rule (2)
can be used to construct a family of apriori assignments for {CD} which can be
updated using (4) for the specific person Mary and from this a support pair for
Pr{c(mary)} determined. This can be used with (3) to update a family of apriori
assignments determined from (1) for the set of labels {ABC}. From this update the
Pr{a(mary)} can be determined.
Alternatively we could update the family of apriori assignments over the set of labels {ABCD} constructed using rules (1) and (2) with specific evidences (3) and (4) and determine Pr{a(mary)} from the final update.
These two approaches are equivalent, and the first approach decomposes the problem of finding Pr{a(mary)} into two subproblems. Decomposition will not be discussed in detail in this paper but it is important for reducing the computational burden associated with updating over a large set of labels.
                    0.95 : {_ d}         0.05 : {_ ¬d}    Update
Apriori
0.85k : {cd}        0.8075               0                0.8075
x : {¬cd}           0.95x/k              0                0.95x/k
1-k : {¬c¬d}        0                    0.05             0.05
0.15k-x : {_ d}     0.1425 - 0.95x/k     0                0.1425 - 0.95x/k
K's                 1/k                  1/(1-k)
The interval for bc can also be calculated and this is [0.6075, 0.8]. It should be noted
that if only the answer for a(mary) is required the last 5 rows of the last two tables
can be collapsed into one row with the assignment for this row equal to the sum of
the assignments of the five rows. This simplifies the calculation process.
In this example each stage of the process retains the information given by the appropriate rule. For example, in the final table Pr{a(mary) | b(mary), c(mary)} = 0.9. This simply means that there is a member of the family of apriori assignments which can satisfy the specific evidences. For this example several steps are required for the final iteration to converge. This is because of the imprecision found for Pr{c(mary)}. If a point value were used for Pr{c(mary)} then the iteration would have converged in
one step. In a later section we will deal with the non-monotonic logic case in which
the specific evidences are inconsistent with the family of apriori assignments.
Three clowns stood in line. Each clown was either a man or a woman. The audience was asked to vote on each of the first and last clowns being male. 90% voted that the first clown, the one on the left, was a man and 20% thought the third clown, the one on the far right, was a man. Nothing was recorded about the middle clown. What is the probability that a male clown stands next to a female clown with the male on the left?
If it were known for sure that the first was male and the third was female then a male would certainly be standing next to a female with the male on the left. This problem can be expressed in first order logic and the theorem proved by case analysis. The
refutation resolution method popular in computer theorem proving programs could
also be used but is much more cumbersome. The problem posed above is a
probabilistic version of this.
We can combine these two evidences using the general assignment method
                     Evidence 2
                     {_ _ m} : 0.2       {_ _ f} : 0.8
Evidence 1
{m _ _} : 0.9        {m _ m} : x         {m _ f} : 0.9 - x
{f _ _} : 0.1        {f _ m} : 0.2 - x   {f _ f} : x - 0.1
where 0.1 ≤ x ≤ 0.2 (so that all cell masses are non-negative).
Therefore the support pair for the statement S = "clowns of opposite sex stand next to each other with the male on the left of the pair" is [MIN(0.9 - x), MAX(0.8 + x)] = [0.7, 1].
We now consider this example using the iterative assignment method. Above we
used specific information about the three clowns in line. We did not use any apriori
information concerning clowns in general. In fact the apriori information we
assumed was of the form
{mmm, mmf, mfm, mff, fmm, fmf, ffm, fff} : 1
This mass assignment could be used with the iterative assignment method, using the specific information given, for updating as follows
                                {m _ _} : 0.9      {f _ _} : 0.1
1 : {mmm, mmf, ..., fff}        {m _ _} : 0.9      {f _ _} : 0.1
K's                             1                  1

                  {_ _ m} : 0.2      {_ _ f} : 0.8
0.9 : {m _ _}     {m _ m} : 0.18     {m _ f} : 0.72
0.1 : {f _ _}     {f _ m} : 0.02     {f _ f} : 0.08
K's               1                  1
The final update is
0.18 : {m _ m}
0.72 : {m _ f}
0.02 : {f _ m}
0.08 : {f _ f}
since {m _ _} : 0.9 is still satisfied, so that both updating evidences are satisfied.
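As an illustration (ours, not the paper's), the same final update is obtained by proportional scaling of the maximum-entropy (uniform) apriori over the eight gender triples:

from itertools import product

p = {t: 1 / 8 for t in product("mf", repeat=3)}            # uniform apriori over mmm...fff

def scale(p, position, value, target):
    # Rescale so that the marginal probability of `position` taking `value` equals `target`.
    marg = sum(q for t, q in p.items() if t[position] == value)
    return {t: q * (target / marg if t[position] == value else (1 - target) / (1 - marg))
            for t, q in p.items()}

p = scale(p, 0, "m", 0.9)                                   # evidence on the first clown
p = scale(p, 2, "m", 0.2)                                   # evidence on the third clown
for a in "mf":
    for c in "mf":
        mass = sum(q for t, q in p.items() if t[0] == a and t[2] == c)
        print(a, "_", c, ":", round(mass, 2))               # 0.18, 0.72, 0.02, 0.08

Because the two evidences bear on different coordinates of a product distribution, a single pass through them already satisfies both constraints.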
The loop in this final mass assignment means that we can add and subtract around the loop without destroying the constraints, and all these solutions satisfy the minimum relative entropy criterion with respect to some apriori assignment in the set of all possible apriori assignments. The solution which is produced by the iterative assignment method, before any adding and subtracting around the loop is performed, is that corresponding to the maximum entropy apriori assignment, i.e. that member of the set of possible apriori assignments with maximum entropy.
To obtain the necessary support for the statement S we must minimise the assignment given to {m _ f}, so that we use the assignment
0.2 : {m _ m}
0.7 : {m _ f}
0 : {f _ m}
0.1 : {f _ f}
since 0.02 is the maximum value we can subtract from 0.72; otherwise the entry in the cell with assignment 0.02 would go negative.
The above analysis is equivalent to using the iterative assignment method with all apriori distributions over the label set {mmm, mmf, mfm, mff, fmm, fmf, ffm, fff} which allow both specific evidences to be retained. For example,
the apriori (0, 0, 0.25, 0.25, 0, 0, 0.25, 0.25) will give 0.9,
the apriori (0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0) will give 0.8,
the apriori (0, 0, 0.1, 0.4, 0, 0, 0.25, 0.25) will give 0.9,
the apriori (0, 0.7, 0.2, 0, 0, 0.1, 0, 0) gives 1, and
the apriori (0.2, 0.4, 0, 0.3, 0, 0, 0, 0.1) gives 0.7.
6. NON MONOTONIC REASONING
Consider the following example. Population statistics tell us that a thirty year old
Englishman has a very high probability of living another 5 years. The statistics also
tell us that a thirty year old Englishman who has lung cancer only has a small
probability of living another 5 years. We are told that John is a thirty year old
Englishman. We can conclude that it is very probable that he will live another 5
years. If we are later told that he has lung cancer then we conclude he has little
chance of living another 5 years. What we could conclude before this additional
piece of information was given we can no longer conclude. From a logical point of view, it appears that we can approximate the modelling of this situation by replacing propositions with high probabilities by those propositions, and propositions with low probabilities by their negations. Thus we have
We make inferences by selecting the correct sample space using the given specific
information and determine the desired probability using this. In the case of John who
is known to be an Englishman in his thirties the answer for the probability of him
living another 5 years will be "high" if this is all we know about him. If we also
know that he has lung cancer then a different sample space is used and the answer is
"low".
In terms of the iterative assignment method, the general statements (1) and (2) above are used to determine a family of apriori assignments which are updated with the specific evidences concerning John. These specific evidences could themselves be uncertain, i.e. probabilistic statements, in this case.
The rules (1), (2) and (3) define the family of apriori assignments. (2) and (3) eliminate certain possible labels as discussed above. The labels are:
y1 : ¬b¬p¬f
y2 : ¬b¬pf
y3 : b¬p¬f
y4 : b¬pf
y5 : bp¬f
so that y4 / (y3 + y4 + y5) = 0.9 and y1 + y2 + y3 + y4 + y5 = 1.
If we let
k = y3 + y4 + y5   (6)
then
y4 = 0.9k   (7)
and the family of apriori assignments for a given k, 0 < k ≤ 1, is
{b¬pf} : 0.9k
{¬b¬p _} : 1-k
{b _ ¬f} : 0.1k
determined by combining (6) and (7) using the general assignment method.
This family of assignments is updated using the specific evidences (4) and (5) with the iterative assignment method, using the scheme given above. From the final family of assignments we can determine the support pair for f(obj):
f(obj) : [assignment for {b¬pf}, assignment for {b¬pf} + assignment for {¬b¬p _}]
= [0.45, 0.55]
This final support pair is in fact independent of k, so that this is the actual support pair for "f", i.e. Pr{fly(obj)} ∈ [0.45, 0.55].
It also gives specific information about the object obj, namely that there is a probability of 0.9 that obj is a bird and a probability of 0.4 that obj is a penguin.
This information allows the following unique apriori distribution over the relevant labels to be constructed:
y1 = 0.27
y2 = 0.03
y3 = 0.0332
y4 = 0.63
y5 = 0.0368
using
y4 / (y3 + y4) = 0.95 ; y2 / (y1 + y2) = 0.1 ; y4 / (y3 + y4 + y5) = 0.9
y3 + y4 + y5 = 0.7 ; y1 + y2 + y3 + y4 + y5 = 1
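A quick numerical check of this distribution, solving the constraints directly (plain Python, for illustration only):

s = 0.7                      # y3 + y4 + y5
y4 = 0.9 * s                 # from y4 / (y3 + y4 + y5) = 0.9
y3 = y4 / 0.95 - y4          # from y4 / (y3 + y4) = 0.95
y5 = s - y3 - y4
y2 = 0.1 * (1 - s)           # from y2 / (y1 + y2) = 0.1 and y1 + y2 = 1 - s
y1 = (1 - s) - y2
print(round(y1, 4), round(y2, 4), round(y3, 4), round(y4, 4), round(y5, 4))
# 0.27 0.03 0.0332 0.63 0.0368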
The iterative assignment update then gives (fly Mary) : (0.485 0.485).
Intuitive solution
In this problem we are presented with two pieces of information:
1. Object Mary came from a population with statistics
¬b¬p¬f : 0.27
¬b¬pf : 0.03
b¬p¬f : 0.0332
b¬pf : 0.63
bp¬f : 0.0368
so that
IF object obj has properties bp then Pr(obj can fly) = 0
IF object obj has properties b¬p then Pr(obj can fly) = 0.63 / 0.6632 = 0.95
IF object obj has properties ¬b¬p then Pr(obj can fly) = 0.03 / 0.3 = 0.1
2. Object properties
2(a) b : 0.9 ; ¬b : 0.1
2(b) p : 0.4 ; ¬p : 0.6
2(c) obj cannot be a penguin and not a bird
Combining 2(a) and 2(b), taking account of 2(c) by allowing only the set of labels
{¬b¬p, b¬p, bp},
using the general assignment method gives
              p : 0.4                 ¬p : 0.6
b : 0.9       bp : 0.4                b¬p : 0.5
¬b : 0.1      ¬bp : 0 (not allowed)   ¬b¬p : 0.1

giving
bp : 0.4 ; b¬p : 0.5 ; ¬b¬p : 0.1
We can write
P'r(f) = Pr(f | bp)P'r(bp) + Pr(f | b¬p)P'r(b¬p) + Pr(f | ¬b¬p)P'r(¬b¬p)
where Pr(.) signifies a probability determined from the population statistics, information 1, and P'r(.) signifies a probability determined from the specific information, information 2, and the set of possible labels.
7. SEMANTIC UNIFICATION
m(Ai) ≥ 0 for all i
Σi m(Ai) = 1
Let the necessary support and possible support measures for this special case of nested sets be called necessity and possibility measures, denoted by N(.) and P(.) respectively. It is easy to show that
P(A ∪ B) = MAX{P(A), P(B)}
N(A ∩ B) = MIN{N(A), N(B)}
for all A, B ∈ P(X)
[ZADEH 1978], [KLIR, FOLGER 1988].
Let Pf be a function
Pf : X → [0, 1]
called the possibility distribution of f over X.
We can generalise this to the case of continuous fuzzy sets like those discussed in the
introduction but we will not do this in this paper. The continuous case can always be
treated by approximating the continuous fuzzy set f, with membership function χf defined over R, by a discrete set of pairs {xi / χi} where χf(xi) = χi and the interval R is approximated by the set of points {x1, x2, ..., xn}.
7.2 EXAMPLES
We will use a voting model with constant thresholds to interpret the meaning of a fuzzy set. Consider the fuzzy set "tall" defined on the height space [4ft, 8ft] by means of the membership function χtall. How can we interpret χtall(5ft 10")? Consider a representative population sample of persons, S say. We ask each member of S to accept or reject the height 5ft 10" as satisfying the concept "tall". Each member must accept or reject; there is no allowed abstention. χtall(5ft 10") is put equal to the proportion of S who accept.
An alternative pattern is
[Table: a voting pattern over voters 1-10 in which a again receives 2 votes, b 4 votes, c 8 votes and d 10 votes, but with the individual acceptances scattered across the voters.]
The first pattern is more reasonable than the second. In the second pattern voter 3 accepts a, which has a low membership level, but does not accept b, which has a higher membership level. It seems reasonable that anyone who accepts a member with a certain membership level will accept all members with a higher membership level. This we call the constant threshold assumption. The first pattern satisfies the constant
We can interpret this mass assignment in the following way. If the population S is told that Z has property f1, and a member of the population drawn at random is asked what the value of this property, taken from {a, b, c, d, e}, is for Z, the answer would be a family of distributions over {a, b, c, d, e} deduced from the mass assignment above.
This interpretation is not valid if the fuzzy set is non-normalised since the constant
threshold model cannot be satisfied.
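For concreteness, one standard way to pass from a discrete normalised fuzzy set to the nested-focal-element mass assignment suggested by the constant-threshold voting model can be sketched as follows (our illustration; the membership values used are those of the voting pattern above, and the element e with membership 0 is omitted):

def mass_assignment(fuzzy):
    # fuzzy: dict element -> membership in (0, 1], with maximum membership 1.
    levels = sorted(set(fuzzy.values()), reverse=True) + [0.0]
    m = {}
    for hi, lo in zip(levels, levels[1:]):
        cut = frozenset(x for x, mu in fuzzy.items() if mu >= hi)   # level set at hi
        m[cut] = hi - lo                                            # mass = drop in level
    return m

f1 = {"a": 0.2, "b": 0.4, "c": 0.8, "d": 1.0}
for focal, mass in mass_assignment(f1).items():
    print(sorted(focal), ":", round(mass, 2))
# {d}: 0.2, {c, d}: 0.4, {b, c, d}: 0.2, {a, b, c, d}: 0.2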
7.4 SEMANTIC UNIFICATION
We can associate the mass assignments (m1, F1) and (m2, F2) with f1 and f2 respectively, where F1 and F2 are the sets of focal elements and are nested sets.
For any member s1i of F1 and any member s2j of F2 we can determine the support pair for s1i | s2j from the set {[0, 0], [1, 1], [0, 1]}. Let this be [Sn(s1i | s2j), Sp(s1i | s2j)].
Therefore the expected value of Pr(a is f1 | a is f2) is contained in the support pair
[Sn(a is f1 | a is f2), Sp(a is f1 | a is f2)]
where
Sn(a is f1 | a is f2) = Σi Σj m1(s1i) m2(s2j) Sn(s1i | s2j)
Sp(a is f1 | a is f2) = Σi Σj m1(s1i) m2(s2j) Sp(s1i | s2j)
Therefore
S({d} | {a}) = [0, 0]
S({c, d} | {a}) = [0, 0]
S({b, c, d} | {a}) = [0, 0]
S({a, b, c, d} | {a}) = [1, 1]
and
S({d} | {a, b}) = [0, 0]
S({c, d} | {a, b}) = [0, 0]
S({b, c, d} | {a, b}) = [0, 1]
S({a, b, c, d} | {a, b}) = [1, 1]
and
S({d} | {a, b, c}) = [0, 0]
S({c, d} | {a, b, c}) = [0, 1]
S({b, c, d} | {a, b, c}) = [0, 1]
S({a, b, c, d} | {a, b, c}) = [1, 1]
so that
Sn(a is f1 | a is f2) = 0.2*0.2 + 0.2*0.7 + 0.2*0.1 = 0.2
Sp(a is f1 | a is f2) = 0.2*0.2 + 0.2*0.7 + 0.2*0.7 + 0.4*0.1 + 0.2*0.1 + 0.2*0.1 = 0.4
and the support pair for the unification of f1 given f2 is
a is f1 | a is f2 : [0.2, 0.4]
and if
f = e1/0.1 + e2/0.2 + e3/0.3 + e4/0.4 + e5/0.5 + e6/0.6 + e7/0.7 + e8/0.8 + e9/0.9 + e10/1.0
then
f | f : [0.55, 1]
These are two approximations for determining f | f where f is a ramp fuzzy set on an interval R. In the limit, as more and more points in R are used, we obtain
f | f : [0.5, 1].
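The semantic unification computation can be packaged in a few lines; the sketch below (ours) uses the expected-support formula given above, with the masses of f2 read off the worked example:

def semantic_unification(m1, m2):
    # Sn: count pairs where the f2 focal element is contained in the f1 focal element;
    # Sp: count pairs where the two focal elements intersect.
    sn = sum(a * b for s1, a in m1.items() for s2, b in m2.items() if s2 <= s1)
    sp = sum(a * b for s1, a in m1.items() for s2, b in m2.items() if s1 & s2)
    return sn, sp

m1 = {frozenset("d"): 0.2, frozenset("cd"): 0.4,
      frozenset("bcd"): 0.2, frozenset("abcd"): 0.2}      # f1, as above
m2 = {frozenset("a"): 0.2, frozenset("ab"): 0.7,
      frozenset("abc"): 0.1}                              # f2 (masses read off the example)
print(semantic_unification(m1, m2))                       # (0.2, 0.4)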
To answer this query we determine the support pair [x, y] below by the method of the last section applied to
middle_aged | about_35
We then solve
8. CONCLUSIONS
This paper provides a general approach to evidential reasoning when the knowledge representation is in the form of rules and facts with both probabilistic and fuzzy uncertainties included. The methods provided can be used for other forms of knowledge representation, for example Bayesian networks [PEARL 1988] and moral graphs [LAURITZEN, SPIEGELHALTER 1988], with extensions to the case of uncertain specific information, and valuation-based languages for expert systems [SHENOY 1989]. The non-monotonic case is not seen to be a problem. Without decomposition methods the approach given here could easily become computationally excessive. Decomposition methods have only been lightly touched on in this paper, although expressing knowledge in the form of rules provides a natural decomposition. Inference diagrams can be used to construct a decomposition from a group of rules. For special cases the decomposition allows the calculus of support logic programming used in FRIL to be used for answering queries. The methods given here extend FRIL to cases which cannot be treated by the present version. The next version will take account of these extensions.
9. REFERENCES
Baldwin J.F., (1986), "Support Logic Programming", in: A. Jones et al., Eds., Fuzzy Sets Theory and Applications, Reidel, Dordrecht-Boston.
Baldwin J.F, (1987), "Evidential Support Logic Programming", Fuzzy Sets and
Systems, 24, pp 1-26.
Baldwin J.F. et al., (1987), "FRIL Manual", Fril Systems Ltd, St Anne's House, St Anne's Rd, Bristol BS4 4A, UK.
Baldwin J.F., (1990c), "Towards a general theory of intelligent reasoning", 3rd Int. Conf. IPMU, Paris, July 1990.
Dubois D., Prade H., (1986), "On the unicity of Dempster Rule of Combination", Int. J. of Intelligent Systems, 1, no. 2, pp 133-142.
Klir G.J., Folger T.A., (1988), Fuzzy Sets, Uncertainty, and Information, Prentice-Hall.
Shafer G., (1976), A Mathematical Theory of Evidence, Princeton Univ. Press.
Shenoy P.P., (1989), "A Valuation-Based Language for Expert Systems", Int. J. of Approx. Reasoning, vol. 3, no. 5.
Zadeh L., (1978), "Fuzzy Sets as a Basis for a Theory of Possibility", Fuzzy Sets and Systems, 1, pp 3-28.
16
PROBABILISTIC SETS
PROBABILISTIC EXTENSION OF FUZZY SETS
Kaoru Hirota
Dept. of Instrument & Control Engineering, College of Engineering, Hosei University, 3-7-2 Kajino-cho, Koganei-city, Tokyo 184, Japan
Introduction
In the field of pattern recognition or decision making theory, the following complicated problems have been left unsolved: (1) ambiguity of objects, (2) variety of character, (3) subjectivity of observers, (4) evolution of knowledge or learning. With regard to each problem, however, there are several general theories: many-valued logic and fuzzy set theory (in connection with (1) and (2)), modal logic (in conjunction with (2) and (4)), and subjective probability (in relation to (3)). It seems, however, that there are few carefully thought-out investigations paying attention to all the problems
mentioned above. In this paper we would like to give our
opinion about these problems and to introduce a new
concept called 'probabilistic sets'.
By giving examples in comparison with fuzzy set theory,
the background idea of probabilistic sets is explained
in Section 2. In probabilistic sets, it is essential to
regard the value of membership functions of fuzzy sets
as a random variable. A probabilistic set A on a total
space X is defined by a defining function ILA(X, w),
which is a point (i.e. x E xl-wise (8, BJ-measurable function
from a parameter space (n.B,p) to a characteristic space
(nc. Be)' The parameter space (n, B, P) is a probability
space and is closely related with subjectivity,
personality, and evolution of knowledge. The
characteristic space (n e , Be) is a measurable space
usually adopt ([ 0,1) ,Borel sets) as (n e , Be) . Section 3
describes definitions of probabilistic sets from a
measure-theoretical viewpoint. The concept of
probabilistic sets includes the concept of classical
fuzzy sets . Some other properties are important results
[Fig. 1: three panels (a), (b), (c) of fuzzy sets plotted over the axis -1, 0, 1.]
Fig. 1. Fuzzy sets; (a) numbers near one, (b) numbers near minus one.
[Fig. 2: panels (a-1)-(c-3) showing, for each case, the defining function, the mean value and the variance.]
Fig. 2. Probabilistic sets; (a) numbers near one, (b) numbers near minus one,
(c) the union (numbers near one or minus one).
μ1 μ2 ∈ M,   (7)
inf{i ≥ 1} μi ∈ M,   (8)
lim sup{i → ∞} μi = inf{i ≥ 1} sup{j ≥ i} μj ∈ M.   (11)
Definition 2.
A probabilistic set A on X is defined by a defining function μA
μA : X × Ω → Ωc , (x, ω) ↦ μA(x, ω)   (12)
[The inclusion A ⊆ B holds if there exists E ∈ B with]
P(E) = 1,   (13)
μA(x, ω) ≤ μB(x, ω) for all ω ∈ E.   (14)
The selection of γ1, γ2, ..., γn from Γ is varied. The least upper bound, denoted by a(x), can be calculated, with
0 ≤ a(x) ≤ 1.   (19)
for each x ∈ X and each ω ∈ Ω, and the union of {An}∞n=1 may be defined by
μ∪An(x, ω) = sup{μAn(x, ω) | 1 ≤ n < ∞}   (23)
0 ≤ b(x) ≤ 1,   (25)
such that
lim{n → ∞} ∫Ω min{μAi(x, ω) | 1 ≤ i ≤ n} dP(ω) = b(x),   (27)
and define
Complement of A: Ac
Difference A - B:
μA-B(x, ω) = max{0, μA(x, ω) - μB(x, ω)}.   (32)
lim inf{n → ∞} An = ∪{n ≥ 1} ∩{k ≥ n} Ak.   (38)
Theorem 1.
The family of probabilistic sets (𝒫(X), ⊆) constitutes a complete pseudo-Boolean algebra.
Note.
In ordinary set theory, the family of all subsets constitutes a complete Boolean algebra. The difference between the two is the lack of a complementation law (i.e. A ∪ Ac ≠ X, A ∩ Ac ≠ ∅). In probabilistic set theory it is essential to consider ambiguous states, so we cannot get any definite information from knowing that the considered object is not in one state. In ordinary set theory, however, we do get the information that it is not in one
(∪{γ∈Γ} Aγ) ∩ (∪{λ∈Λ} Bλ) = ∪{γ∈Γ, λ∈Λ} (Aγ ∩ Bλ),   (50)
A ∩ A = A,
involution law
Acc = A,   (56)
elimination law
A ∪ B = A ∪ C and A ∩ B = A ∩ C  ⇒  B = C,   (57)
identity law
A ∪ X = X,   (58)
A ∩ X = A,   (59)
A ∪ ∅ = A,   (60)
A ∩ ∅ = ∅.   (61)
Proposition 6.
For arbitrary {An}∞n=1 (⊆ 𝒫(X)), we have
lim inf{n → ∞} An ⊆ lim sup{n → ∞} An.   (62)
Proposition 7.
Each of (𝒫(X), ∪), (𝒫(X), ∩), (𝒫(X), ·) and (𝒫(X), ⊕) constitutes a commutative monoid (i.e. a commutative semigroup with a unit) and, for arbitrary A, B, C (∈ 𝒫(X)), we have
Note.
In ordinary set theory, it is possible to define
sixteen different kinds of binary operations. (Because the total space X can be divided into four regions for arbitrary subsets A and B, there exist 2^4 = 16 combinations.) Among these sixteen binary operations, the symmetric difference A Δ B has a very good property from an algebraic viewpoint, namely, it constitutes an Abelian group. In probabilistic set theory, however, (𝒫(X), Δ) does not have such a good property. On the contrary, it does not even satisfy the associative law.
Proposition 8.
Let Xγ (γ ∈ Γ) be total spaces (possibly infinitely many), then we have
∪{γ∈Γ} 𝒫(Xγ) ⊆ 𝒫(∪{γ∈Γ} Xγ),   (70)
∩{γ∈Γ} 𝒫(Xγ) = 𝒫(∩{γ∈Γ} Xγ).   (71)
Definition 4.
A probabilistic mapping f from X to Y on a parameter space (Ωm, Bm, Pm) is defined by
f : X × Ωm → Y , (x, ωm) ↦ f(x, ωm).   (72)
V(μA)(x) = ∫Ω (μA(x, ω) - E(μA)(x))² dP(ω)   ( ≤ M2(μA)(x) ),   (74)
and, for n ∈ N,
Mn(μA)(x) = ∫Ω (μA(x, ω))ⁿ dP(ω),   (77)
M̄n(μA)(x) = ∫Ω |μA(x, ω) - E(μA)(x)|ⁿ dP(ω).   (78)
The justification of the above-stated definitions is ensured by Proposition 1 and the following properties.
Proposition 9.
In the situation of Definition 5, we have
0 ≤ E(μA)(x) ≤ 1 for all x ∈ X,   (79)
0 ≤ ... ≤ M3(μA)(x) ≤ M2(μA)(x) ≤ M1(μA)(x) = E(μA)(x) ≤ {M2(μA)(x)}^(1/2) for all x ∈ X,   (80)
for all x ∈ X.   (82)
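Purely as an illustration of these definitions, the expectation and the vagueness of a probabilistic set can be estimated by sampling over the parameter space; the defining function used below is an invented example, not one from the chapter:

import random

def mu_A(x):
    # The defining function mu_A(x, omega): for each x it returns a random membership
    # value in [0, 1]; the randomness in omega is supplied by the generator (assumed example).
    centre = max(0.0, 1.0 - abs(x))
    return min(1.0, max(0.0, random.gauss(centre, 0.1)))

def estimate_E_and_V(x, samples=100_000):
    vals = [mu_A(x) for _ in range(samples)]
    e = sum(vals) / samples                        # E(mu_A)(x) = integral of mu_A(x, w) dP(w)
    v = sum((m - e) ** 2 for m in vals) / samples  # V(mu_A)(x) = integral of (mu_A - E)^2 dP(w)
    return e, v

print(estimate_E_and_V(0.0))   # E close to 1, small V (both lie in [0, 1], cf. Proposition 9)
print(estimate_E_and_V(0.8))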
Definition 6.
C(μA, μB)(x) = ∫Ω (μA(x, ω) - E(μA)(x)) · (μB(x, ω) - E(μB)(x)) dP(ω),   (84)
r(μA, μB)(x) = C(μA, μB)(x) / √(V(μA)(x) · V(μB)(x)).   (85)
(If V(μA)(x) · V(μB)(x) = 0, r(μA, μB)(x) is not defined.)
Proposition 10.
In the situation of Definition 6, we have
0 ≤ |C(μA, μB)(x)| ≤ √(V(μA)(x) · V(μB)(x)) ≤ 1,   (86)
0 ≤ |r(μA, μB)(x)| ≤ 1.   (87)
Definition 7.
Let A1, A2, ..., An be probabilistic sets on X whose defining functions are μA1(x, ω), μA2(x, ω), ..., μAn(x, ω) respectively. For arbitrary x, y (∈ X), the moment matrix M(x, y) and the variance-covariance matrix V(x, y) of A1, A2, ..., An are defined by
Conclusions
The background idea of probabilistic sets was discussed
in comparison with fuzzy sets, and its mathematical
structure was explained without proofs. The main results are: (1) a family of probabilistic sets constitutes a pseudo-Boolean algebra; (2) the possibility of moment analysis is a great advantage in applications (cf. [9]).
The concepts of probabilistic sets presented in this
References
[1] G. Birkhoff, Lattice Theory, Am. Math. Soc. Colloq. Publ. (Am. Math. Soc., New York, 1969).
[2] J.G. Brown, A note on fuzzy sets, Information and Control 18 (1971) 32-39.
[3] J.A. Goguen, L-fuzzy sets, J. Math. Anal. Appl. 18 (1967) 145-174.
[4] P.R. Halmos, Naive Set Theory (Van Nostrand, New York, 1960).
[5] P.R. Halmos, Measure Theory (Van Nostrand, New York, 1960).
[6] K. Hirota, Kakuritsu-Shugoron to sono Oyourei (Probabilistic sets and its applications), Presented at the Behaviormetric Society of Japan 3rd Conference (1975) (in Japanese).
[7] K. Hirota, Kakuritsu-Shugoron (Probabilistic set theory), in: Fundamental research works of fuzzy system theory and artificial intelligence, Research Reports of Scientific Research Fund from the Ministry of Education in Japan (1976) 193-213 (in Japanese).
[8] K. Hirota, Concepts of probabilistic sets, IEEE Conf. on Decision and Control (New Orleans) (1977) 1361-1366.
[9] K. Hirota, Extended fuzzy expression of probabilistic sets: Analytical expression of ambiguity and subjectivity in pattern recognition, Presented at Seminar on Applied Functional Analysis (July 1978) 13-18.
[10] K. Hirota et al., A decision making model: A new approach based on the concepts of probabilistic sets, Presented at Int. Conf. on Cybernetics and Society 1978, Tokyo (Nov. 1978) 1348-1353.
[11] K. Hirota et al., The bounded variation quantity (B.V.Q.) and its application to feature extractions, Presented at the 4th Int. Conf. on Pattern Recognition, Kyoto (Nov. 1978) 456-461.
[12] M. Mizumoto and K. Tanaka, Some properties of fuzzy sets of type 2, Information and Control 31 (1976) 312-340.
[13] K. Nanba, Shugo-ron (Set Theory) (Science-sha Publ., 1975) (in Japanese).
[14] L.A. Zadeh, Fuzzy sets, Information and Control 8 (1965) 338-353.
[15] L.A. Zadeh, Probability measures of fuzzy events, J. Math. Anal. Appl. 23 (1968) 421-427.
[16] L.A. Zadeh et al., Fuzzy Sets and their Applications to Cognitive and Decision Processes (Academic Press, New York, 1975).
INDEX
A
"and" operators, 108
applications of fuzzy logic control, 82
B
biology, 235
boundary detection, 134
C
canonical propositions, 12, 28
categorical reasoning, 3
certainty qualification, 37, 49
clustering, 123
commonsense knowledge, 42
computer vision, 121
consensus, 274
D
decision making, 241, 263
decision trees, 226
default knowledge, 40
defuzzification, 77, 242
diagnosis, 240
E
evidential reasoning, 297
extension principle, 17
F
FRIL, 299
fuzzy c means, 124
fuzzy chips, 83
fuzzy constraints, 5, 99
fuzzy goals, 99
fuzzy linear programming, 99
fuzzy logic control, 69, 72
  self organization, 85
  neural networks, 86
fuzzy mathematical programming, 97
fuzzy parameters, 112
  applications, 114
fuzzy rules, 53
fuzzy set operations, 71, 187
fuzzy syntactic analysis, 178
fuzzy truth values, 55
G
gradual rules, 53
gray scale, 122
H
hierarchical fuzzy logic control, 80
high level vision, 136
human factors, 202
I
image enhancement, 167
image geometry, 154
image processing, 122
importance, 241, 275
imprecise matching, 59
inference rules, 15, 31
  entailment rule, 15
  conjunction rule, 16
  disjunction rule, 16
  projection rule, 16
  composition rule, 16
K
knowledge acquisition, 281
knowledge representation, 299
L
learning, 281
learning operators, 289
linguistic approximation, 197
linguistic quantifiers, 263
linguistic variables, 222
M
man-machine interactions, 201
measures of fuzziness, 148
medicine, 235, 253
membership functions, 70, 106, 161, 187, 239
meta control, 261
meta reasoning, 259
moment analysis, 350
N
natural language, 185
neural networks, 86, 235, 246
nonmonotonic reasoning, 322
P
parallel rules, 61
possibility, 3, 50
possibility distribution, 30, 46
possibility measure, 41, 227, 244
possibility qualification, 41, 47
probabilistic mappings, 349
probabilistic masses, 306
probabilistic sets, 335, 339
probability, 297, 335
prolog, 299
Q
quantifiers, 3, 263
quantified propositions, 32, 266
questionnaires, 221
R
rule based system, 254
rules of quantification, 8
S
segmentation, 125, 158
semantic unification, 326
smart shell, 283
specificity, 38
support measures, 302
syllogistic reasoning, 4, 18
T
test score semantics, 6
translation rules, 7