Lausanne, EPFL
2002
Abstract
This thesis presents Fuzzy CoCo, a novel approach for system design, conducive to explain-
ing human decisions. Based on fuzzy logic and coevolutionary computation, Fuzzy CoCo is
a methodology for constructing systems able to accurately predict the outcome of a human
decision-making process, while providing an understandable explanation of the underlying
reasoning.
Fuzzy logic provides a formal framework for constructing systems exhibiting both good
numeric performance (precision) and linguistic representation (interpretability). From a nu-
meric point of view, fuzzy systems exhibit nonlinear behavior and can handle imprecise and
incomplete information. Linguistically, they represent knowledge in the form of rules, a natu-
ral way for explaining decision processes.
Fuzzy modeling—meaning the construction of fuzzy systems—is an arduous task, de-
manding the identification of many parameters. This thesis analyses the fuzzy-modeling prob-
lem and different approaches to coping with it, focusing on evolutionary fuzzy modeling—
the design of fuzzy inference systems using evolutionary algorithms—which constitutes the
methodological base of my approach. In order to support this analysis, the parameters of a
fuzzy system are classified into four categories: logic, structural, connective, and operational.
The central contribution of this work is the use of an advanced evolutionary technique—
cooperative coevolution—for dealing with the simultaneous design of connective and opera-
tional parameters. Cooperative coevolutionary fuzzy modeling succeeds in overcoming several
limitations exhibited by other standard evolutionary approaches: stagnation, convergence to
local optima, and computational costliness.
Designing interpretable systems is a prime goal of my approach, which I study thoroughly
herein. Based on a set of semantic and syntactic criteria, regarding the definition of linguistic
concepts and their causal connections, I propose a number of strategies for producing more
interpretable fuzzy systems. These strategies are implemented in Fuzzy CoCo, resulting in a
modeling methodology providing high numeric precision, while incurring as small a loss of interpretability as possible.
After testing Fuzzy CoCo on a benchmark problem—Fisher’s Iris data—I successfully
apply the algorithm to model the decision processes involved in two breast-cancer diagnostic
problems: the WBCD problem and the Catalonia mammography interpretation problem. For
the WBCD problem, Fuzzy CoCo produces systems of both high performance and high interpretability, comparable to (if not better than) the best systems demonstrated to date. For the
Catalonia problem, an evolved high-performance system was embedded within a web-based
tool—called COBRA—for aiding radiologists in mammography interpretation.
Several aspects of Fuzzy CoCo are thoroughly analyzed to provide a deeper understand-
ing of the method. These analyses show the consistency of the results. They also help derive
a stepwise guide to applying Fuzzy CoCo, and a set of qualitative relationships between some
of its parameters that facilitate setting up the algorithm.
Finally, this work proposes and preliminarily explores two extensions to the method:
Island Fuzzy CoCo and Incremental Fuzzy CoCo, which together with the original CoCo con-
stitute a family of coevolutionary fuzzy modeling techniques. The aim of these extensions is
to guide the choice of an adequate number of rules for a given problem. While Island Fuzzy
CoCo performs an extended search over different problem sizes, Incremental Fuzzy CoCo
bases its search power on a mechanism of incremental evolution.
Résumé
This thesis presents Fuzzy CoCo, a novel approach for designing systems conducive to explaining human decisions. Based on fuzzy logic and coevolutionary computation, Fuzzy CoCo is a methodology for constructing systems able to predict the outcome of a human decision-making process while providing an understandable explanation of the underlying reasoning.

Fuzzy logic provides a formal framework for constructing systems that offer both good numeric performance (precision) and a linguistic representation (interpretability). From a numeric point of view, fuzzy systems are nonlinear systems capable of handling imprecise and incomplete information. Linguistically, they represent knowledge in the form of rules, a natural way of explaining decision processes.

Fuzzy modeling—that is, the design of fuzzy systems—is a difficult task, demanding the identification of many parameters. This thesis analyzes the fuzzy-modeling problem and the different approaches available to solve it, focusing on evolutionary fuzzy modeling—the design of fuzzy systems using evolutionary algorithms—which constitutes the methodological basis of my approach. To support this analysis, the parameters of a fuzzy system are classified into four categories: logic, structural, connective, and operational.

The central contribution of this work is the use of an advanced evolutionary technique—cooperative coevolution—to deal with the simultaneous design of connective and operational parameters. Cooperative coevolutionary fuzzy modeling succeeds in overcoming several limitations exhibited by other evolutionary approaches: stagnation, convergence to local optima, and high computation time.

Designing interpretable systems is one of the main goals of my approach, which I study thoroughly. Based on a set of semantic and syntactic criteria concerning the definition of linguistic concepts and their causal connections, I propose a number of strategies for producing fuzzy systems that are easier to interpret. These strategies are implemented in Fuzzy CoCo, resulting in a modeling methodology that provides high numeric precision while keeping interpretability as high as possible.

After testing Fuzzy CoCo on a benchmark problem—Fisher's Iris data—I successfully applied the algorithm to model the decision processes involved in two breast-cancer diagnosis problems: the problem known as WBCD and the Catalonia mammography interpretation problem. For the WBCD problem, Fuzzy CoCo produces highly accurate and highly interpretable systems, comparable to (if not better than) the best systems reported to date. For the Catalonia problem, a high-performance evolved system was embedded in an online tool—called COBRA—that assists radiologists in mammography interpretation.

Several aspects of Fuzzy CoCo are thoroughly analyzed in order to provide a deeper understanding of the method. These analyses show the consistency of the results obtained. Based on them, I propose a guide for applying Fuzzy CoCo, as well as a set of qualitative relationships between some of its parameters to facilitate their choice when setting up the algorithm.

Finally, this work proposes and explores, in a preliminary fashion, two extensions to the method: Island Fuzzy CoCo and Incremental Fuzzy CoCo. Combined with the original Fuzzy CoCo, they constitute a family of coevolutionary fuzzy modeling techniques. The aim of these extensions is to guide the choice of the number of rules for a given problem; while Island Fuzzy CoCo performs an extended search over different problem sizes, Incremental Fuzzy CoCo bases its search power on a mechanism of incremental evolution.
Table of Contents

Abstract
Résumé
List of Figures

1 Introduction
1.1 General Context
1.1.1 Problem description
1.1.2 Proposed solution
1.1.3 Outline of the thesis
1.2 Fuzzy Systems
1.2.1 Basic notions of fuzzy sets and fuzzy logic
1.2.2 Conditional fuzzy statements
1.2.3 Fuzzy inference
1.2.4 Fuzzy inference systems
1.3 Evolutionary computation
1.3.1 Genetic algorithms
1.3.2 Genetic programming
1.3.3 Evolution strategies
1.3.4 Evolutionary programming
1.3.5 Classifier systems
Bibliography
5.1 Fuzzy CoCo set-up used to analyze the effects of some parameters.
5.2 Qualitative relationships between Fuzzy CoCo parameters.
5.3 Comparison of Fuzzy CoCo with knn.
5.4 Generality of the three-rule fuzzy controller.
5.5 Generality of the three-rule fuzzy classifier.
5.6 Generality of the seven-rule WBCD fuzzy system.
6.3 Results of Island Fuzzy CoCo for the WBCD problem.
6.4 Fuzzy CoCo set-up in Incremental Fuzzy CoCo for the WBCD problem.
6.5 Comparison of Incremental Fuzzy CoCo with Fuzzy CoCo.
Chapter 1

Introduction
Human thinking and in particular our capacity to make decisions has long interested scien-
tists from many disciplines: philosophy, medicine, psychology, mathematics, and engineering,
among others. Philosophers are interested in the motivations and the implications of decisions,
physicians and psychologists in the different mechanisms leading to them, and mathematicians
and engineers in obtaining a model that permits the reproduction of this capacity. The central
aim of my thesis is to propose a methodology to attain this latter goal: modeling and repro-
ducing human decision-making processes.
build systems to predict the outcome of a given decision. Albeit useful and widely used, these
methods and systems lack the explanatory power required to transmit knowledge to humans.
It thus becomes necessary to produce systems that, besides proposing accurate predic-
tions, also provide human-understandable explanations of the decisions made. Note that the
goal herein is not to model the actual human reasoning process, but only to express the knowl-
edge behind a decision in a manner conducive to human understanding. As with many other
human activities, decision-making involves information that is inherently imprecise, often am-
biguous, and sometimes incomplete. The systems developed should be able to deal with such
information, and still provide accurate predictions and understandable explanations.
Many human tasks may benefit from, and sometimes require, decision explanation sys-
tems. Among them one can cite customer service, diagnosis, prognosis, and planning. In customer service, the person or system in charge usually requires vast knowledge to answer simple questions. Diagnostic tasks involve the interpretation of many facts to identify the condition of a system. Prognosis implies the analysis of current conditions to predict the future devel-
opment of a system. Planning tasks involve deciding on actions that, once performed, would
drive a system to a desired condition. There exists a domain that concerns all these tasks and
several others, where decisions must be made and accompanied by explanations: medicine.
Indeed, in medical practice it is customary to perform patient examination, diagnosis of risks
and diseases, prognosis of a disease development, and treatment planning. Medicine is thus a
domain where explanatory systems might bring many advantages.
Building explanatory systems is a difficult task. A widespread approach, used to build so-called expert systems, is based on knowledge gathered directly from experts. One applies a
process, known as knowledge engineering, which starts by collecting raw knowledge from di-
verse experts, continuing by systematically organizing and formalizing this knowledge, finally
producing an explanatory system capable of providing sufficiently accurate predictions. Such
knowledge construction is a lengthy task that involves many people from diverse domains, and
long sessions of knowledge gathering, formalization, and validation with experts. Despite the
fact that it is costly and time-consuming, knowledge engineering is still the best alternative for designing large, hierarchical explanatory systems (i.e., involving hundreds of input variables and tens of chained, hierarchical decisions).
Other approaches for building explanatory systems, in which this thesis is interested, take
advantage of available data to support many design stages, extracting the knowledge embed-
ded in the data and representing such knowledge in a manner accessible to humans. These
approaches render knowledge engineering more automatic, as the role of the expert is reduced
to delimiting the problem and validating the soundness and the coherence of the extracted
knowledge.
Fuzzy systems offer good numeric performance and provide a scheme to represent knowledge in a way that resembles human
communication and reasoning. Fuzzy systems exhibit some characteristics that render them
adequate to solving the problem tackled in this thesis:
1. They represent knowledge in the form of rules, a natural way to explain decision pro-
cesses.
2. They express concepts with linguistic labels, close to human representation (e.g., “high
fever” instead of “temperature higher than 39.3 degrees”).
3. They can handle imprecise and incomplete information.

4. Fuzzy systems are adequate for modeling nonlinear behaviors, exhibited by almost all natural processes.
The construction of fuzzy models of large and complex systems is a hard task, demand-
ing the identification of many parameters. To better understand this problem—i.e., the fuzzy
modeling problem—I propose a classification of fuzzy parameters into four classes: logic,
structural, connective, and operational. This classification serves as a conceptual framework
to decompose the fuzzy modeling problem, to understand how existing modeling techniques
deal with this problem, and to propose novel techniques to solve it efficiently.
As a general methodology for constructing fuzzy systems I use evolutionary computa-
tion, a set of computational techniques based on the principle of natural selection. Evolu-
tionary algorithms are widely used to search for adequate solutions in complex spaces that
resist analytical solutions. Specifically, I use an advanced evolutionary technique, cooperative
coevolution, which deals particularly well with requirements of complexity and modularity,
while exhibiting reasonable computational cost.
The search for interpretability in evolutionary fuzzy modeling is represented by several
constraints taken into account when designing the evolutionary algorithm. However, there
is no well-established definition for interpretability of fuzzy systems serving to define these
constraints. Based on some works that have attempted to define objective criteria to reinforce
interpretability, I define two groups of criteria—semantic and syntactic—and propose some
strategies to satisfy them.
The resulting approach, Fuzzy CoCo, is a fuzzy modeling technique, based on coopera-
tive coevolution, conceived to provide high numeric precision (accuracy), while incurring as small a loss of linguistic descriptive power (interpretability) as possible. In Fuzzy CoCo, two
coevolving, cooperative species are defined: rules and membership functions (which define
linguistic concepts). The interpretability of the resulting systems is reinforced by applying the
strategies proposed with this aim. Fuzzy CoCo proves to be very efficient in designing highly
accurate and interpretable systems to solve hard problems, in particular modeling medical
diagnostic decision processes.
1.2 Fuzzy Systems

Fuzzy logic rests on the idea that, in contrast to Boolean logic, a statement can be partially true (or false), and composed of imprecise concepts. For example, in the expression "I live near Geneva," the fuzzy value "near," applied to the fuzzy variable "distance," is not only imprecise but also subject to interpretation. The foundations of fuzzy logic were established in 1965 by Lotfi Zadeh in
his seminal paper about fuzzy sets [193]. While research and applications of fuzzy sets have
grown to cover a wide spectrum of disciplines [185], the present thesis concentrates on their use
in rule-based systems.
In this section I present some concepts of fuzzy logic and fuzzy systems necessary for my
work. In the next subsection I introduce basic notions of fuzzy sets and fuzzy logic. Then, in
Section 1.2.2 I briefly present concepts related to propositional fuzzy logic: linguistic variables
and conditional fuzzy statements. Next, I describe the steps involved in the fuzzy inference
process in Section 1.2.3 to finally present, in Section 1.2.4, what a fuzzy inference system is
and explain its functioning based on a simple example.
For more detailed introductions to fuzzy logic and fuzzy inference systems, the reader is
referred to [101, 185, 186].
Figure 1.1 Membership function of a crisp set. The crisp set is defined as A = {x | 20 ≤ x ≤ 26}.
Fuzzy sets are a generalization of crisp sets. A fuzzy set F defined on a universe of dis-
course U is characterized by a membership function µF (x) which takes values in the interval
[0, 1] (i.e., µF : U → [0, 1]). Note that the term membership function makes more sense in
the context of fuzzy sets as it stresses the idea that µF (x) denotes the degree to which x is a
member of the set F . The operation that assigns a membership value µ(x) to a given value x
is called fuzzification. The example in Figure 1.2 shows the membership function of the fuzzy
set F = {x | x is between 20 and 26} (i.e., the fuzzy set representing approximately the same
concept as the crisp set of Figure 1.1), together with the fuzzification of an example value.
Figure 1.2 Membership function of a fuzzy set. The fuzzy set is defined as F = {x | x is between 20 and 26}. In the figure, an example value 19 is fuzzified as (i.e., it is assigned the membership value) µF(19) = 0.67.
Membership functions might formally take any arbitrary form as they express only
an element-wise membership condition. However, they usually exhibit smooth, monotonic
shapes. This is due to the fact that membership functions are generally used to represent
linguistic units described in the context of a coherent universe of discourse (i.e., the closer
the elements, the more similar the characteristics they represent, as is the case for variables
with physical meaning). The most commonly used membership functions are triangular,
trapezoidal, and bell-shaped (see Figure 1.3).
Figure 1.3 Common membership functions. (a) triangular, (b) trapezoidal, and (c) bell-shaped.
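To make these shapes concrete, the following minimal Python sketch implements triangular and trapezoidal membership functions (the function names are illustrative, not taken from the thesis); with the breakpoints 17, 20, 26, and 29 of Figure 1.2 it reproduces the fuzzification µF(19) ≈ 0.67.

def triangular(x, a, b, c):
    # Triangle with feet at a and c and apex at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def trapezoidal(x, a, b, c, d):
    # Trapezoid rising on [a, b], flat at 1 on [b, c], falling on [c, d].
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Fuzzification of the example value of Figure 1.2: (19 - 17) / (20 - 17) = 0.67
print(round(trapezoidal(19, 17, 20, 26, 29), 2))   # 0.67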
I mentioned before that, in this thesis, I concentrate on the use of fuzzy rule-based systems.
The rules of such a system are expressed as logical implications (i.e., in the form of if . . . then
statements). We can relate our previous discussion about fuzzy sets to our need for a formal basis for fuzzy logic by taking advantage of the fact that “it is well established that propositional
logic is isomorphic to set theory under the appropriate correspondence between components
of these two mathematical systems. Furthermore, both of these systems are isomorphic to a
Boolean algebra” [101]. Some of the most important equivalences between these isomorphic
domains are: set intersection corresponds to logical conjunction (AND), set union to logical disjunction (OR), and set complement to negation (NOT).
1.2.1.3 Operations
In this subsection I present the extension of the most commonly used crisp operations to the fuzzy domain. This extension imposes as a prime condition that all fuzzy operations which are extensions of crisp concepts must reduce to their usual meaning when the fuzzy sets reduce to crisp sets (i.e., when they have only 1 and 0 as membership values). In most cases there exist several possible operators satisfying this condition, some of which are of purely theoretical interest. For the purpose of this thesis, I present below the fuzzy operators most
commonly used in the frame of fuzzy inference systems. For the following definitions, assume
A and B are two fuzzy subsets of U; x denotes an arbitrary element of U.
Operators for intersection/AND operations (µA∩B (x) = µA (x) ∧ µB (x)). Also known as
t-norm operators, the most common—minimum, product, and bounded product, illustrated in
Figure 1.4—are defined as follows:
minimum: µA∩B(x) = min(µA(x), µB(x)),
product: µA∩B(x) = µA(x) · µB(x),
bounded product: µA∩B(x) = max(0, µA(x) + µB(x) − 1).
Figure 1.4 Common t-norm (AND) operators. (a) minimum, (b) product, and (c) bounded product.
Operators for union/OR operations (µA∪B (x) = µA (x) ∨ µB (x)). Also known as t-conorm
operators, the most common—maximum, probabilistic sum, and bounded sum, exemplified in
Figure 1.5—are defined as follows:
maximum: µA∪B(x) = max(µA(x), µB(x)),
probabilistic sum: µA∪B(x) = µA(x) + µB(x) − µA(x) · µB(x),
bounded sum: µA∪B(x) = min(1, µA(x) + µB(x)).
Figure 1.5 Common t-conorm (OR) operators. (a) maximum, (b) probabilistic sum, and (c) bounded
sum.
Complement/NOT operator (µ∼A(x) = ¬µA(x)). The so-called standard fuzzy complement operator is almost universally used in fuzzy inference systems. It is defined as follows: µ∼A(x) = 1 − µA(x).
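As a minimal illustration, the operators above can be written in a few lines of Python; the function names are mine and the example degrees are arbitrary.

def fuzzy_and_min(a, b): return min(a, b)                 # minimum t-norm
def fuzzy_and_prod(a, b): return a * b                    # (algebraic) product
def fuzzy_and_bprod(a, b): return max(0.0, a + b - 1.0)   # bounded product

def fuzzy_or_max(a, b): return max(a, b)                  # maximum t-conorm
def fuzzy_or_psum(a, b): return a + b - a * b             # probabilistic sum
def fuzzy_or_bsum(a, b): return min(1.0, a + b)           # bounded sum

def fuzzy_not(a): return 1.0 - a                          # standard complement

# Example: combining two membership degrees of the same element
mu_a, mu_b = 0.25, 0.6
print(fuzzy_and_min(mu_a, mu_b), fuzzy_or_max(mu_a, mu_b), fuzzy_not(mu_a))   # 0.25 0.6 0.75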
Figure 1.6 Example of a linguistic variable: Temperature has three possible linguistic values, labeled
Cold, Warm, and Hot, plotted above as degree of membership versus input value. In the figure, the
example input value 19 degrees is assigned the membership values µCold (19) = 0.33, µWarm (19) =
0.67, and µHot (19) = 0.
Partially Sunny, defined for the variables Temperature and Sunshine respectively. Figure 1.7
shows the bidimensional membership function characterizing the fuzzy condition “Today is
warm and partially sunny”. Note that fuzzy conditions can also be constructed using OR and
NOT operators (e.g., “Today is hot or not cloudy”).
Figure 1.7 Bidimensional membership function. The fuzzy condition “Today is warm and partially
sunny” is characterized by the AND operation—the minimum operator in this case—applied on the
fuzzy values Warm (W) and Partially Sunny (PS) of variables Temperature and Sunshine respectively.
1.2.3 Fuzzy inference

In fuzzy inference, fuzzy facts are transformed into fuzzy conclusions by a fuzzy reasoning mechanism, and knowledge is represented by fuzzy rules. This section describes three processes participating in the fuzzy
reasoning mechanism: implication, aggregation, and defuzzification.
1.2.3.1 Implication
In a fuzzy inference system, the fuzzy rules have the form:
if (input fuzzy condition) then (output fuzzy assignment),
where output fuzzy assignment designates a fuzzy value resulting from applying a causal im-
plication operator to the input and output fuzzy sets. The term causal implication, also known
as engineering implication [101], refers to the fact that it preserves cause-effect relationships.
The input condition and the output assignment are called, respectively, antecedent and conse-
quent. To understand the concept of causal implication, let us define the rule
if u is A then v is C,
where u ∈ U and v ∈ V (U and V are, respectively, the input and output universes of dis-
course). The implication operation A → C for a given input value x is characterized by the fuzzy set C′ = µA→C(x) defined on the output universe V. The most commonly used implication operators, minimum and product, are defined as follows (Figure 1.8 illustrates these operators):
minimum: µC′(v) = min(µA(x), µC(v)),
product: µC′(v) = µA(x) · µC(v).
A rule that does not fire, i.e., whose antecedent is absolutely false (µA (x) = 0), produces an
empty implication. Consequently, this rule does not participate in the inference of the whole
system.
1.2.3.2 Aggregation
The rules of a fuzzy inference system are related by an also connective, which implies that
all the rules are evaluated. Due to the overlap between fuzzy values, several rules might fire
simultaneously with different truth levels, each proposing an output fuzzy set. In general,
fuzzy systems are expected to propose a single decision instead of a number of different fuzzy
sets. To solve this, these fuzzy sets are integrated, by means of an aggregation operation, to
obtain a single fuzzy set that describes the output of the fuzzy system. Generally, t-conorms
(i.e., OR operators) are used as aggregation operators. Figure 1.9 illustrates the aggregation
operation and shows its results using maximum and bounded-sum operators.
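A minimal Python sketch of the implication and aggregation steps over a discretized output universe; the rule truth levels, the consequent sets, and the five-point discretization below are illustrative assumptions, not values from the thesis.

def imply_min(truth, consequent):
    # Minimum implication: clip the consequent fuzzy set at the rule's truth level.
    return [min(truth, mu) for mu in consequent]

def imply_product(truth, consequent):
    # Product implication: scale the consequent fuzzy set by the rule's truth level.
    return [truth * mu for mu in consequent]

def aggregate_max(implied_sets):
    # Maximum aggregation: pointwise OR of all the implied fuzzy sets.
    return [max(values) for values in zip(*implied_sets)]

def aggregate_bounded_sum(implied_sets):
    # Bounded-sum aggregation: pointwise bounded sum of all the implied fuzzy sets.
    return [min(1.0, sum(values)) for values in zip(*implied_sets)]

# Two rules firing with truth levels 0.67 and 0.2 on a 5-point output universe
low  = [1.0, 0.5, 0.0, 0.0, 0.0]
high = [0.0, 0.0, 0.0, 0.5, 1.0]
implied = [imply_min(0.67, low), imply_min(0.2, high)]
print(aggregate_max(implied))          # [0.67, 0.5, 0.0, 0.2, 0.2]
print(aggregate_bounded_sum(implied))  # same here, since the clipped sets do not overlap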
1.2.3.3 Defuzzification
Although a single output fuzzy set—as that obtained by aggregating rule outputs—contains
useful qualitative information, most of the time the output of a fuzzy system must be a crisp
value. The process that produces a crisp output from a fuzzy set is called defuzzification.
Many defuzzification methods have been proposed in the literature; however, the two most
Figure 1.8 Common implication operators. Given the rule “if u is A then v is C”, the figure shows: (a) the input fuzzy value (antecedent) A with a given input element x, (b) the output fuzzy value (consequent) C, and the resulting implication C′ = µA→C(x) when applying (c) the minimum and (d) the product implication operators.
commonly used methods are the Center of Areas (COA), also called center of gravity or cen-
troid, and the Mean of Maxima (MOM). In the following, I define these two methods, whose results
are exemplified in Figure 1.10.
Given an output fuzzy set Y = µY(v) defined on the universe V of the variable v, the defuzzified output y is given by the expressions:

yCOA = ∫V v · µY(v) dv / ∫V µY(v) dv,

yMOM = mean{ v ∈ V | µY(v) = maxv′ µY(v′) }.

If the output universe V is discrete, the integral expressions are replaced by the corresponding sums:

yCOA = Σv∈V v · µY(v) / Σv∈V µY(v),

yMOM = mean{ v ∈ V | µY(v) = maxv′ µY(v′) }.
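For a discretized output universe, both methods fit in a few lines of Python; the sample universe and the aggregated fuzzy set below are illustrative, not taken from the thesis.

def defuzzify_coa(universe, mu):
    # Center of Areas: membership-weighted average over the output universe.
    weight = sum(mu)
    return sum(v * m for v, m in zip(universe, mu)) / weight if weight > 0 else 0.0

def defuzzify_mom(universe, mu):
    # Mean of Maxima: average of the points where the membership is maximal.
    peak = max(mu)
    maxima = [v for v, m in zip(universe, mu) if m == peak]
    return sum(maxima) / len(maxima)

universe   = [0, 25, 50, 75, 100]
aggregated = [0.67, 0.5, 0.0, 0.2, 0.2]
print(round(defuzzify_coa(universe, aggregated), 1))   # 30.3
print(defuzzify_mom(universe, aggregated))             # 0.0 (single maximum at v = 0)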
Figure 1.9 Aggregation operation. The output variable v of a system has three fuzzy values, L, M, and
H, presented in (a). In a given instant, four fuzzy rules are fired with activation levels of 0.4, 0.6, 0.2,
and 0.1, producing the fuzzy sets L1 , M2 , M3 , and H4 , respectively, as shown in (b). The aggregation
operation integrates these fuzzy sets applying a t-conorm operator at each point of the output universe.
The figure shows the resulting fuzzy sets when using maximum (c) and bounded-sum (d) operators.
Note that the latter operator ascribes greater importance to the number of active rules with identical
(or similar) consequent than the former does.
1.2.4 Fuzzy inference systems

Generally speaking, the term fuzzy system applies to any system whose operation is mainly
based on fuzzy theory concepts such as reasoning, arithmetic, algebra, topology, or program-
ming, among others. However, in the frame of this thesis I use the terms fuzzy inference
system and fuzzy system to designate a rule-based system that uses: (1) linguistic variables, (2)
fuzzy sets to represent the linguistic values of such variables, (3) fuzzy rules to describe causal
relationships between them, and (4) fuzzy inference to compute the response of the system to
a given input.
Figure 1.10 Defuzzification. This process produces a crisp value y from a given fuzzy set Y . The
figure shows the resulting values obtained using two common methods: yCOA for the Center of Areas
and yMOM for the Mean of Maxima.
The basic structure of a fuzzy system consists of four main components, as depicted in Fig-
ure 1.11: (1) a knowledge base, which contains both an ensemble of fuzzy rules, known as the
rule base, and an ensemble of membership functions known as the database; (2) a fuzzifier,
which translates crisp inputs into fuzzy values; (3) an inference engine that applies a fuzzy rea-
soning mechanism to obtain a fuzzy output; and (4) a defuzzifier, which translates this latter
output into a crisp value. These components perform the aforementioned processes necessary
to fuzzy inference.
The tourist prediction system. A simple fuzzy system is proposed to predict the number
of tourists visiting a resort (for more details on building fuzzy systems, see Section 2.1). The
prediction is based on two weather-related input variables: (1) the temperature, measured in
degrees and (2) the sunshine, expressed as a percentage of the maximum expected sunshine.
The system outputs an estimated number of tourists, expressed as a percentage of the resort’s capacity. The example presented here is developed for a slightly cold, partially sunny
day: the observed temperature and sunshine are 19 degrees and 60%, respectively.
Knowledge base. The knowledge describing the system’s behavior is represented by the
membership functions defining the linguistic variables (the “database” unit in Figure 1.11)
and by a set of rules (the “rule-base” unit in Figure 1.11).
Database. Three linguistic variables are defined: two inputs, Temperature and Sunshine, and one output, Tourists. Each variable has three membership functions:
Cold, Warm, and Hot for the first input, Cloudy, Partially Sunny, and Sunny for the sec-
ond input, and Low, Medium, and High for the output. Figure 1.12 shows these variables.
Figure 1.12 Tourist prediction example: database. The three variables Temperature, Sunshine, and Tourists have three membership functions each. The figure shows the fuzzification of two given
input values: 19 degrees for Temperature and 60% for Sunshine.
Rule base. The following three rules describe the behavior of the prediction system:
Rule 1: if (Temperature is Hot) or (Sunshine is Sunny) then (Tourists is High)
Rule 2: if (Temperature is Warm) and (Sunshine is Partially Sunny) then (Tourists
is Medium)
Rule 3: if (Temperature is Cold) or (Sunshine is Cloudy) then (Tourists is Low)
Fuzzifier. This unit computes the membership values of each input variable in accordance
with the fuzzy values defined in the database (see Figure 1.12).
Temperature: µCold(19) = 0.33, µWarm(19) = 0.67, µHot(19) = 0.
Sunshine: µCloudy(60) = 0, µPartSunny(60) = 0.8, µSunny(60) = 0.2.
Inference engine. This unit interprets the rules contained in the rule base. The inference is
performed in three steps: (1) antecedent activation, (2) implication, and (3) aggregation.
Antecedent activation. The inference engine takes the membership values of the input
fuzzy conditions of each rule—computed previously by the fuzzifier—and then applies
the fuzzy operators indicated to obtain the rule’s truth level.
Rule 1: if (Temperature is Hot) or (Sunshine is Sunny)
µRule 1 = max(µHot(19), µSunny(60)) = max(0, 0.2) = 0.2
Implication. The inference engine then applies the implication operator to each rule in
order to obtain the fuzzy output values. In this example, I apply the minimum implication
operator. Figure 1.13 shows the fuzzy sets resulting from this operation.
Figure 1.13 Tourist prediction example: implication. The inference engine applies the minimum
implication operator to each rule in order to obtain the fuzzy output values. Rule truth values are
respectively µRule 1 = 0.2, µRule 2 = 0.33, and µRule 3 = 0.67.
Aggregation. Finally, the inference engine aggregates the three fuzzy sets obtained in
the implication stage in a single fuzzy output. In this example, I apply maximum as
aggregation operator. Figure 1.14 shows the fuzzy set resulting from this operation.
Defuzzifier. As mentioned before, an output fuzzy set such as that shown in Figure 1.14 carries
important qualitative information. However, the tourist prediction problem requires a crisp
value quantifying the expected percentage of tourists. The defuzzifier of this example com-
putes this value applying the COA defuzzification method (Section 1.2.3). The output of the
tourist prediction system for the given input conditions—i.e. a temperature of 19 degrees and
a sunshine of 60%—is thus approximately 32%.
Figure 1.14 Tourist prediction example: aggregation and defuzzification. Aggregation integrates the
fuzzy sets of Figure 1.13 into a single output fuzzy set. This fuzzy set, called here µTourists (19, 60), rep-
resents the fuzzy (qualitative) prediction of the expected amount of tourists. The defuzzification process
translates the output fuzzy value, µTourists (19, 60), into a crisp value, Tourists(19, 60), quantifying this
expected amount. In this example, the final output value is approximately 32%.
The fuzzy system presented above can be seen as a 2-input, 1-output block such as the one shown in Figure 1.15.
Figure 1.15 Tourist prediction example: input-output block representation. The tourist-prediction
block has two inputs—temperature (T) and sunshine (S)—and one output—tourists(T, S).
Mamdani fuzzy systems. These fuzzy systems have fuzzy sets as rule consequents, as those
presented in this section. They are called Mamdani fuzzy systems as they were first proposed
by Mamdani [94, 95]. Their main advantage is high interpretability, due to the fact that output
variables are defined linguistically and each rule consequent takes on a value from a set of
labels with associated meaning. However, they also have some drawbacks, among which the
most important are their limited accuracy and their high computational cost. The accuracy of
the system is usually limited by the rigidity of linguistic values. The high computational cost
is related to the operations required to compute the center of areas of the fuzzy sets, i.e., two
integrals or sums and one division for each inference.
TSK fuzzy systems. These systems derive their name from Takagi, Sugeno, and Kang who
proposed an alternative type of fuzzy systems [168, 170], in which the consequent of each
fuzzy rule is represented by a function—usually linear—of the input variables. Thanks to
their functional description, TSK-type systems usually exhibit greater accuracy than Mamdani
systems. However, the interpretability of their rules is significantly reduced as they no longer
represent linguistic concepts.
Coming back to the tourist-prediction example, we can convert the system into a TSK system by replacing the three linguistic consequents—Low, Medium, and High—with carefully calculated, corresponding linear functions f1, f2, and f3 of the input variables. With these consequents, the implication step proposes the following predictions: f1(19, 60) = 38%, f2(19, 60) = 52%, and f3(19, 60) = 26%.
The aggregation step does not modify these values. Then, the defuzzifier receives this discrete
output set to produce the following output:
tourists(19, 60) = [ Σi=1..3 fi(19, 60) · µRule i ] / [ Σi=1..3 µRule i ]
= (0.2 × 38% + 0.33 × 52% + 0.67 × 26%) / (0.2 + 0.33 + 0.67)
≈ 35%
Singleton fuzzy systems. The rule consequents of this type of systems are constant values.
Singleton fuzzy systems can be considered as a particular case of either Mamdani or TSK fuzzy
systems. In fact, a constant value is equivalent to both a singleton fuzzy set—i.e., a fuzzy set
that concentrates its membership value in a single point of the universe—and a linear function
in which the coefficients of the input variables are all zero. The singleton representation constitutes
a trade-off between the interpretability offered by Mamdani systems with their meaningful
labels and the accuracy provided by TSK systems with their linear functions. Moreover, thanks
to the discrete representation of the output variable, the defuzzification process demands less
computation than in the other two cases.
In the tourist-prediction example, the Mamdani-type system is easily converted into a singleton-type system by replacing the fuzzy values Low, Medium, and High with their corresponding centers of area: 0%, 50%, and 100%. With these new consequents, the inference yields:
tourists(19, 60) = [ Σi=1..3 fi(19, 60) · µRule i ] / [ Σi=1..3 µRule i ]
= (0.2 × 100% + 0.33 × 50% + 0.67 × 0%) / (0.2 + 0.33 + 0.67)
≈ 30%
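Both the TSK and the singleton computations above are instances of the same weighted average; the following minimal Python sketch uses the truth levels and consequent values quoted in the example (the function name is mine).

def weighted_average(truth_levels, consequents):
    # Fuzzy-mean defuzzification used by TSK and singleton fuzzy systems.
    return sum(t * c for t, c in zip(truth_levels, consequents)) / sum(truth_levels)

truths     = [0.2, 0.33, 0.67]        # rule truth levels of the example
tsk_values = [38.0, 52.0, 26.0]       # f_i(19, 60), in percent
singletons = [100.0, 50.0, 0.0]       # centers of areas of High, Medium, and Low

print(weighted_average(truths, tsk_values))   # about 35.15 (the ~35% above)
print(weighted_average(truths, singletons))   # about 30.4 (the ~30% above)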
Figure 1.16 Tourist prediction example: connectionist representation. Five layers of connected func-
tional blocks represent the steps and the elements of the fuzzy inference process: (1) fuzzification (mem-
bership functions), (2) antecedent activation (logical operators), (3) implication, (4) aggregation, and
(5) defuzzification.
1.3 Evolutionary computation

In natural reproduction, each offspring inherits a mélange of both parents’ genetic information. Mutation introduces small changes into the
inherited chromosomes; it often results from copying errors during reproduction. Selection is
a process guided by the Darwinian principle of survival of the fittest. The fittest individuals
are those best adapted to their environment, which thus survive and reproduce.
Evolutionary computation makes use of a metaphor of natural evolution. According to
this metaphor, a problem plays the role of an environment wherein lives a population of indi-
viduals, each representing a possible solution to the problem. The degree of adaptation of each
individual (i.e., candidate solution) to its environment is expressed by an adequacy measure
known as the fitness function. The phenotype of each individual, i.e., the candidate solution
itself, is generally encoded in some manner into its genome (genotype). Like evolution in
nature, evolutionary algorithms potentially produce progressively better solutions to the prob-
lem. This is possible thanks to the constant introduction of new “genetic” material into the
population, by applying so-called genetic operators that are the computational equivalents of
natural evolutionary mechanisms.
There are several types of evolutionary algorithms, among which the best known are ge-
netic algorithms, genetic programming, evolution strategies, and evolutionary programming;
though different in the specifics they are all based on the same general principles. The archety-
pal evolutionary algorithm proceeds as follows: An initial population of individuals, P (0), is
generated at random or heuristically. Every evolutionary step t, known as a generation, the
individuals in the current population, P (t), are decoded and evaluated according to some
predefined quality criterion, referred to as the fitness, or fitness function. Then, a subset of
individuals, P′(t)—known as the mating pool—is selected to reproduce, with selection of in-
dividuals done according to their fitness. Thus, high-fitness (“good”) individuals stand a better
chance of “reproducing,” while low-fitness ones are more likely to disappear.
Selection alone cannot introduce any new individuals into the population, i.e., it cannot
find new points in the search space. These points are generated by altering the selected population P′(t) via the application of crossover and mutation, so as to produce a new population, P″(t). Crossover tends to enable the evolutionary process to move toward “promising” regions of the search space. Mutation is introduced to prevent premature convergence to local optima, by randomly sampling new points in the search space. Finally, the new individuals P″(t) are introduced into the next-generation population, P(t + 1); although a part of P(t) can be preserved, usually P″(t) simply becomes P(t + 1). The termination condition may be specified
as some fixed, maximal number of generations or as the attainment of an acceptable fitness
level. Figure 1.17 presents the structure of a generic evolutionary algorithm in pseudo-code
format.
Evolutionary techniques exhibit a number of advantages over other search methods as
they combine elements of directed and stochastic search. First, they usually need a smaller
amount of knowledge and fewer assumptions about the characteristics of the search space.
Second, they can more easily avoid getting stuck in local optima. Finally, they strike a good
balance between exploitation of the best solutions, and exploration of the search space. The
strength of evolutionary algorithms relies on their population-based search, and on the use of
the genetic mechanisms described above. The existence of a population of candidate solutions
entails a parallel search, with the selection mechanism directing the search to the most promis-
ing regions. The crossover operator encourages the exchange of information between these
search-space regions, while the mutation operator enables the exploration of new directions.
begin EA
t := 0
Initialize population P(t)
while not done do
   Evaluate P(t)
   P′(t) = Select[P(t)]
   P″(t) = ApplyGeneticOperators[P′(t)]
   P(t + 1) = Merge[P″(t), P(t)]
   t := t + 1
end while
end EA

Figure 1.17 Structure of a generic evolutionary algorithm in pseudo-code.
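As an illustration only, the pseudo-code can be instantiated as a compact Python program; the one-max fitness function and every parameter value below are assumptions made for the sake of the example, not part of the thesis.

import random

def fitness(individual):
    # Toy fitness: number of ones in the bit string (the "one-max" problem).
    return sum(individual)

def select(population, tournament_size=2):
    # Tournament selection: keep the fitter of a few randomly drawn individuals.
    return max(random.sample(population, tournament_size), key=fitness)

def crossover(parent_a, parent_b):
    # One-point crossover producing a single offspring.
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(individual, rate=0.01):
    # Bit-flip mutation applied gene by gene.
    return [1 - gene if random.random() < rate else gene for gene in individual]

def evolve(pop_size=50, genome_length=40, generations=100):
    population = [[random.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # P'(t): mating pool (implicit in the repeated tournaments);
        # P''(t): offspring produced by crossover and mutation.
        offspring = [mutate(crossover(select(population), select(population)))
                     for _ in range(pop_size)]
        # Merge step with elitism: the best parent survives unchanged.
        population = [max(population, key=fitness)] + offspring[:pop_size - 1]
    return max(population, key=fitness)

print(fitness(evolve()))   # typically close to the genome length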
Figure 1.18 Genetic programming parse trees, representing the following programs in LISP-like
syntax: a) (/ (* A B) (+ 2 C)) and b) (* A (/ B (+ 2 C))). Both programs implement the expression
AB/(2 + C). It is important to note that though LISP is the language chosen by Koza to implement
genetic programming, it is not the only possibility. Any language capable of representing programs as
parse trees is adequate. Moreover, machine language has been used as well [118].
Evolution in genetic programming proceeds along the general lines of the generic evolu-
tionary algorithm (Figure 1.17), with the genetic operators adapted to the tree representation.
Reproduction is performed in both asexual and sexual ways. Asexual reproduction, or cloning,
allows some of the fittest individuals to survive into the next generation; this is equivalent to
so-called elitist selection in genetic algorithms. Sexual reproduction, i.e., crossover, starts out
by selecting a random crossover point in each parent tree and then exchanging the subtrees
“hanging” from these points, thus producing two offspring trees (Figure 1.19). Mutation in
genetic programming is considered as a secondary genetic operator and is applied much less
frequently [82]. It is performed by removing a subtree at a randomly selected point and insert-
ing at that point a new random subtree.
One important issue in genetic programming is related to the size of the trees. Under
the influence of the crossover operator, the depth of the trees can quickly increase, leading
to a fitness plateau. The presence of huge programs in the population also has direct conse-
quences vis-a-vis computer memory and evaluation speed. Most implementations of genetic
programming include mechanisms to prevent trees from becoming too deep. However, these
Figure 1.19 Crossover in genetic programming. The two shadowed subtrees of the parent trees are
exchanged to produce two offspring trees. Note that the two parents, as well as the two offspring, are
typically of different size.
mechanisms also present a disadvantage, in that they reduce the genetic diversity contained in larger trees. There exist a number of books devoted to other advanced topics and applications of genetic programming, which is a constantly expanding field [6].
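A minimal Python sketch of subtree crossover on parse trees stored as nested tuples (operator, left, right); the trees are those of Figure 1.18, and the helper names are mine.

import random

TREE_A = ('/', ('*', 'A', 'B'), ('+', 2, 'C'))     # (/ (* A B) (+ 2 C)) of Figure 1.18a
TREE_B = ('*', 'A', ('/', 'B', ('+', 2, 'C')))     # (* A (/ B (+ 2 C))) of Figure 1.18b

def subtrees(tree, path=()):
    # Enumerate (path, subtree) pairs; a path is a sequence of child positions.
    yield path, tree
    if isinstance(tree, tuple):
        for position, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (position,))

def graft(tree, path, new_subtree):
    # Return a copy of `tree` with the subtree at `path` replaced by `new_subtree`.
    if not path:
        return new_subtree
    nodes = list(tree)
    nodes[path[0]] = graft(tree[path[0]], path[1:], new_subtree)
    return tuple(nodes)

def crossover(parent_a, parent_b):
    # Pick one random subtree in each parent and exchange them (as in Figure 1.19).
    path_a, sub_a = random.choice(list(subtrees(parent_a)))
    path_b, sub_b = random.choice(list(subtrees(parent_b)))
    return graft(parent_a, path_a, sub_b), graft(parent_b, path_b, sub_a)

offspring_1, offspring_2 = crossover(TREE_A, TREE_B)
print(offspring_1)
print(offspring_2)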
Figure 1.20 A finite state machine with states {Z, T, R}. The input symbols belong to the set {0, 1},
whereas the output alphabet is the set {m, n, p, q}. The edges representing the state transitions are
labeled a/b, where a represents the input symbol triggering the transition, and b represents the output
symbol. For example, when the machine is in state R and the input is 0 it switches to state T and outputs
q. A double circle indicates the initial state.
single offspring search. Mutation was the only genetic operator, and the standard deviation
vector σ was constant or modified by some deterministic algorithm. Later, recombination was
added as evolution strategies were extended to encompass populations of individuals.
A good source for further information on evolution strategies is the book by Schwe-
fel [159].
according to the given payoff function. Once the individual has been exposed to the whole
sequence of symbols, its overall performance (e.g., average payoff per symbol) is used as the
fitness value.
Like with evolution strategies, evolutionary programming first generates offspring and
then selects the next generation. There is no sexual reproduction (crossover), but rather each
parent machine is mutated to produce a single offspring. There are five possible mutation oper-
ators: change of an output symbol, change of a state transition, addition of a state, deletion of a
state, and change of the initial state. The two latter operations are not allowed when the parent
machine has only one state. Mutation operators and the number of mutations per offspring are
chosen with respect to a probability distribution. The offspring are then evaluated in the same
way as their parents, and the next generation is selected from the ensemble of parents and
offspring. This process is iterated until a new symbol is required in the environment. The best
individual obtained up to this moment provides the prediction, the new symbol is added to the
environment, and the algorithm is executed again. Note that as opposed to most evolutionary-
computation applications where fitness is fixed from the outset, evolutionary programming
inherently incorporates a dynamic fitness, i.e., the environment changes in time. Fogel’s book
[45] is a good reference on evolutionary programming.
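A small Python sketch of a finite state machine like the one in Figure 1.20, together with one of the five mutation operators (changing an output symbol); apart from the transition (R, 0) → (T, q) mentioned in the caption, the transition table and the initial state are invented for the sake of the example.

import random

# Transition table: (state, input symbol) -> (next state, output symbol).
# Only (R, 0) -> (T, q) comes from the caption of Figure 1.20; the rest is assumed.
TRANSITIONS = {
    ('Z', 0): ('Z', 'n'), ('Z', 1): ('T', 'm'),
    ('T', 0): ('R', 'p'), ('T', 1): ('Z', 'q'),
    ('R', 0): ('T', 'q'), ('R', 1): ('R', 'n'),
}
INITIAL_STATE = 'Z'                 # assumed; the figure marks the real one with a double circle
OUTPUT_ALPHABET = ['m', 'n', 'p', 'q']

def run(transitions, start, inputs):
    # Feed a sequence of input symbols and collect the machine's output symbols.
    state, outputs = start, []
    for symbol in inputs:
        state, out = transitions[(state, symbol)]
        outputs.append(out)
    return outputs

def mutate_output_symbol(transitions):
    # One of the five EP mutations: change the output symbol of a random transition.
    child = dict(transitions)
    key = random.choice(list(child))
    next_state, _ = child[key]
    child[key] = (next_state, random.choice(OUTPUT_ALPHABET))
    return child

print(run(TRANSITIONS, INITIAL_STATE, [0, 1, 1, 0]))   # ['n', 'm', 'q', 'n'] with this table
offspring = mutate_output_symbol(TRANSITIONS)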
Each matched classifier makes a bid proportional to its strength. Rules that have accumulated
a large “capital” (i.e., strength) are preferred over other rules.
The genetic algorithm adapts the classifier system by introducing new classifiers (rules).
There exist two approaches for the application of evolutionary techniques in the design of
rule-based systems in general: the Michigan approach and the Pittsburgh approach; these two
approaches are also applied to classifier systems. In the Michigan approach, each individual
represents a single rule, and the classifier list is represented by the entire population. The
strengths calculated by the bucket-brigade algorithm are used as fitness function to evaluate
the quality of each classifier. In the Pittsburgh approach, the genetic algorithm maintains a
population of candidate classifier lists, with each individual representing an entire list. A good
introduction to classifier systems is given by Goldberg [51].
Chapter 2

Evolutionary Fuzzy Modeling
1. Logical parameters. Functions and operators which define the type of transformations undergone by crisp and fuzzy quantities during the inference process. They include the shape of the membership functions, the fuzzy logic operators applied for AND, OR, implication, and aggregation operations, and the defuzzification method.
2. Structural parameters. Related mainly to the size of the fuzzy system. They include the number of variables participating in the inference, the number of membership functions defining each linguistic variable, and the number of rules used to perform the inference.
3. Connective parameters. Related to the topology of the system, these parameters define the connections between the different linguistic instances. They include the antecedents, the consequents, and the weights of the rules.
4. Operational parameters. These parameters define the mapping between the linguistic
and the numeric representations of the variables. They characterize the membership
functions of the linguistic variables.
Class        Parameters
Logical      Reasoning mechanism; fuzzy operators; membership function types; defuzzification method
Structural   Relevant variables; number of membership functions; number of rules
Connective   Antecedents of rules; consequents of rules; rule weights
Operational  Membership function values
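To fix ideas, the four parameter classes can be captured as a configuration object; the following Python sketch is purely illustrative (field names and default values are not from the thesis).

from dataclasses import dataclass, field
from typing import List

@dataclass
class LogicalParameters:
    # Usually fixed a priori by the designer.
    reasoning_mechanism: str = "Mamdani"
    and_operator: str = "minimum"
    or_operator: str = "maximum"
    membership_function_type: str = "trapezoidal"
    defuzzification: str = "COA"

@dataclass
class StructuralParameters:
    # Size of the system.
    relevant_variables: List[str] = field(default_factory=list)
    membership_functions_per_variable: int = 3
    number_of_rules: int = 3

@dataclass
class ConnectiveParameters:
    # Topology: which labels appear in which rule, and with what weight.
    antecedents: List[List[str]] = field(default_factory=list)
    consequents: List[str] = field(default_factory=list)
    rule_weights: List[float] = field(default_factory=list)

@dataclass
class OperationalParameters:
    # Numeric definition of the membership functions (e.g., trapezoid breakpoints).
    membership_function_breakpoints: List[List[float]] = field(default_factory=list)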
In fuzzy modeling, logical parameters are usually predefined by the designer based on
experience and on problem characteristics. Typical choices for the reasoning mechanism are
Mamdani-type, Takagi-Sugeno-Kang (TSK)-type, and singleton-type (Section 1.2.4) [185].
Common fuzzy operators are minimum, maximum, product, bounded product, bounded sum,
and probabilistic sum (Section 1.2.1). The most common membership functions are triangular, trapezoidal, and bell-shaped. As for defuzzification, several methods have been proposed, with the Center of Areas (COA) and the Mean of Maxima (MOM) being the most popular [101, 185]
(Section 1.2.3).
Structural, connective, and operational parameters may be either predefined, or obtained
by synthesis or search methodologies. Generally, the search space, and thus the computational
effort, grows exponentially with the number of parameters. Therefore, one can either invest
more resources in the chosen search methodology, or infuse more a priori, expert knowledge
into the system (thereby effectively reducing the search space). The aforementioned trade-
off between accuracy and interpretability is usually expressed as a set of constraints on the
parameter values, thus complexifying the search process.
use of more automatic approaches to fuzzy modeling, in which only a part of the fuzzy model
is built from a priori knowledge.
There exist a great number of fuzzy modeling methods differing in the search strategy
they apply and in the number of parameters they can search for—directly related to the part of the system they require to be predefined. Below, I briefly describe the direct approach to
fuzzy modeling (Section 2.1.2.1) as well as other approaches based on classic identification
algorithms (Section 2.1.2.2), on constructive learning methods (Section 2.1.2.3), and on bio-
inspired techniques (Section 2.1.2.4).
3. Determination of the linguistic labels into which these variables are partitioned (struc-
tural parameters);
5. Definition of the rules that describe the model’s behavior (connective parameters);
Unfortunately, there is no general methodology for the implementation of the direct approach,
which is more an art of intuition and experience than precise theory. This approach has been,
however, successfully used since the first fuzzy system applications [94, 95] to present-day
research [15, 43, 196].
One simple, and rather intuitive, improvement of the direct approach is the use of quan-
titative input-output information to update the membership-function values and/or the rule
weights in order to fine-tune the knowledge contained in the fuzzy model [113].
The simplest methods apply linear least-squares parameter estimation as they assume
that the parameters appear in a linear fashion in the model. Such a linearity assumption limits their applicability in fuzzy modeling and calls for the development of methods applying nonlinear least-squares parameter estimation [90]. Recent works using this approach apply
identification methods such as orthogonal least-squares [163], gradient descent [34], quasi-
Newton [155], Levenberg-Marquardt [35], or auto-regressive (AR) modeling [21].
• Fuzzy-rule extraction from neural networks. This approach attempts to extract, in the
form of fuzzy rules, the knowledge embedded in trained neural networks [32, 107, 161].
The main drawback of these techniques is that accessing the knowledge requires a prior rule-extraction phase.
• Neuro-fuzzy systems. These are fuzzy inference systems implemented as neural net-
works, taking advantage of their structural similarity (see Section 1.2.4). The main
advantage of this kind of representation is that such hybrid systems can be optimized
via powerful, well-known neural-network learning algorithms. ANFIS [71] is a well-known neuro-fuzzy system consisting of a six-layer generalized network with supervised learning. Most of the current research in this area is derived from the original
neuro-fuzzy concept, either in new flavors (i.e., by changing the network structure or the
learning strategy) [19, 119, 173], or in adaptation of existing methods to face new hard
problems [91]. The main drawback of this approach is that the methods are intended to
maximize accuracy, neglecting human interpretability. In many applications this is not
acceptable.
• The Michigan approach. Each individual represents a single rule. The fuzzy inference
system is represented by the entire population. Since several rules participate in the
inference process, the rules are in constant competition for the best action to be proposed,
and cooperate to form an efficient fuzzy system. The cooperative-competitive nature of
this approach makes it difficult to decide which rules are ultimately responsible for good system behavior, and necessitates an effective credit-assignment policy to ascribe fitness values to individual rules.
facilitate the task of assigning linguistic terms [125, 129, 134]. The focus is on the meaning of
the ensemble of labels instead of the absolute meaning of each term in isolation.
Figure 2.1 Semantically correct fuzzy variable: Triglycerides has three possible fuzzy values, labeled Normal, High, and Very High, plotted as degree of membership versus input value. The values P1 = 200, P2 = 400, and P3 = 1000 mg/dL, setting the trapezoid and triangle apices, define the membership functions. In the figure, an example input value of 250 mg/dL is assigned the membership values µNormal(250) = 0.75, µHigh(250) = 0.25, and µVeryHigh(250) = 0. Note that µNormal(250) + µHigh(250) + µVeryHigh(250) = 1.
• Distinguishability. Each linguistic label should have semantic meaning and the fuzzy set
should clearly define a range in the universe of discourse of the variable. In the example
of Figure 2.1, to describe variable Triglycerides we used three meaningful labels: Nor-
mal, High, and Very High. Their membership functions are defined using parameters
P1 , P2 , and P3 .
• Justifiable number of elements. The number of linguistic labels—i.e., the number of
membership functions of a variable—should be compatible with the number of concep-
tual entities a human being can handle. This number, which should not exceed the limit of 7 ± 2 distinct terms, is directly related to the expertise of the human interacting with the system. In the example of Figure 2.1, while a patient would not feel comfortable with more than the three labels defined, a physician could certainly handle more labels.
• Coverage. Any element from the universe of discourse should belong to at least one of
the fuzzy sets. That is, its membership value must be different from zero for at least one
of the linguistic labels. More generally, a minimum level of coverage may be defined,
giving rise to the concept of strong coverage. Referring to Figure 2.1, we see that any
value along the x-axis belongs to at least one fuzzy set; no value lies outside the range
of all sets.
• Normalization. Since all labels have semantic meaning, for each label at least one
element of the universe of discourse should have a membership value equal to one. In
Figure 2.1, we observe that all three sets Normal, High, and Very High have elements
with membership value equal to 1.
• Complementarity. For each element of the universe of discourse, the sum of all its mem-
bership values should be equal to one (as in the example in Figure 2.1). This guarantees
uniform distribution of meaning among the elements.
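The semantic criteria above lend themselves to simple numeric checks. The Python sketch below does this for the Triglycerides variable of Figure 2.1, assuming shoulder and triangle membership functions built from the breakpoints P1 = 200, P2 = 400, and P3 = 1000; the exact shapes are an assumption.

def normal(x):       # left shoulder: 1 up to P1, falling to 0 at P2
    return 1.0 if x <= 200 else max(0.0, (400 - x) / 200)

def high(x):         # triangle rising on [P1, P2] and falling on [P2, P3]
    if 200 <= x <= 400:
        return (x - 200) / 200
    if 400 < x <= 1000:
        return (1000 - x) / 600
    return 0.0

def very_high(x):    # right shoulder: rising from 0 at P2 to 1 at P3
    return 0.0 if x <= 400 else min(1.0, (x - 400) / 600)

labels = [normal, high, very_high]
for x in range(0, 1201, 10):                       # sample the universe of discourse
    memberships = [mu(x) for mu in labels]
    assert max(memberships) > 0.0                  # coverage
    assert abs(sum(memberships) - 1.0) < 1e-9      # complementarity
assert any(mu(x) == 1.0 for mu in labels for x in range(0, 1201, 10))   # normalization
print(normal(250), high(250), very_high(250))      # 0.75 0.25 0.0, as in Figure 2.1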
Figure 2.2 Example of a fuzzy rule and its firing range. The rule if Triglycerides is High and Age
is Middle then Cardiac risk is Moderate, marked as R, is (partially) fired by input values falling inside the dashed-line rectangle (i.e., µ(R) > 0). The solid-line rectangle denotes the region where µ(R) ≥ 0.5.
• Completeness. For any possible input vector, at least one rule should be fired to prevent
the fuzzy system from getting blocked.
• Rule-base simplicity. The set of rules must be as small as possible. If, however, the rule base is still large, making a global understanding of the system difficult, the number of rules that fire simultaneously for any input vector must remain low in order to furnish a
simple local view of the behavior.
• Single-rule readability. The number of conditions involved in the antecedent of a rule
should be compatible with the aforementioned number of conceptual entities a human
being can handle (i.e., number of entities ≤ 7 ± 2).
• Consistency. If two or more rules are simultaneously fired, their consequents should not
be contradictory, i.e., they should be semantically close.
Semantic criteria constrain the definition of membership functions, while syntactic criteria bind the fuzzy rule base. I present below some strategies to apply these restrictions when defining a fuzzy model.
• Linguistic labels shared by all rules. A number of fuzzy sets is defined for each variable,
which are interpreted as linguistic labels and shared by all the rules [56]. In other words,
each variable has a unique semantic definition. This results in a grid partition of the input
space as illustrated in Figure 2.3. To satisfy the completeness criterion, we normally use
a fully defined rule base, meaning that it contains all the possible rules. The example
system shown in Figure 2.4a contains all nine possible rules of the form if Triglycerides
is label and Age is label then Cardiac risk is . . . . Label sharing by itself facilitates
but does not guarantee the semantic integrity of each variable. More conditions are
necessary.
[Figure: 3 × 3 grid partition of the Triglycerides (Normal, High, VeryHigh) versus Age input space.]
Figure 2.3 Grid partition of the input space. In this example, two semantically correct input variables, each with three labels, divide the input space into a grid of nine regions.
• Don’t-care conditions. A fully defined rule base, as that shown in Figure 2.4a, becomes
impractical for high-dimension systems. The number of rules in a fully defined rule base increases exponentially with the number of input variables (e.g., a system with five variables, each with three labels, would contain 3⁵ = 243 rules). Moreover, given
that each rule antecedent contains a condition for each variable, the rules might be too
lengthy to be understandable, and too specific to describe general circumstances.
To tackle these two problems some authors use “don’t-care” as a valid input label [69,
125,129]. Variables in a given rule that are marked with a don’t-care label are considered
as irrelevant. For example, in the rule base shown in Figure 2.4b two rules, RA and RB ,
containing don't-care labels cover almost half of the input space. The rule RA covers the space of three rules (i.e., R3, R6, and R9 in Figure 2.4a) and is interpreted as:
if Age is Old then Cardiac risk is Moderate.
In the same way, the rule RB is interpreted as:
if Triglycerides is VeryHigh then Cardiac risk is . . . .
Although don’t-care labels allow a reduction of rule-base size, their main advantage is
the improvement of rule readability.
[Figure: three rule-base layouts over the Triglycerides–Age grid (Age labels Young, Middle, Old): a) the nine rules R1–R9; b) rules R1, R2, R4, and R5 plus the don't-care rules RA and RB; c) rules R5, RA, and RB plus the default rule R0.]
Figure 2.4 Strategies to define a complete rule base. a) Fully defined rule base. b) Don't-care labels. c) Default rule. By definition, the activation of the (fuzzy) default rule is µ(R0) = 1 − max(µ(Ri)), with i = {1, 2, 3, . . .}. The rectangles denote the region where µ(Ri) ≥ 0.5.
• Default rule. In many cases, the behavior of a system exhibits only a few regions of
interest, which can be described by a small number of rules (e.g., R5 , RA , and RB in
Figure 2.4b). To describe the rest of the input space, a simple default action, provided
by the default rule, would be enough [178]. The example in Figure 2.4c shows that
the default rule, named R0 , covers the space of rules R1 , R2 , and R4 . By definition, a
default condition is true when all other rule conditions are false. In a fuzzy context, the
default rule is as true as all the others are false. Consequently, the activation degree of the default rule is given by µ(R0) = 1 − max(µ(Ri)), where µ(Ri) is the activation degree of the i-th rule (see the sketch following this list).
• Linguistic fitness. Some linguistic criteria can be reinforced by taking them into account when computing the fitness value of a given fuzzy system. Size factors related to simplicity and readability, such as the number of rules effectively used or the number of conditions contained in the rule antecedents, can be easily quantified and included in the fitness function.
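Referring back to the default-rule strategy above, the following minimal Python sketch computes the default-rule activation from the activations of the explicit rules. The min operator used for the rule conjunction is one common choice of fuzzy AND, and the membership values in the usage example are made up for illustration.

def rule_activation(antecedent_memberships):
    """Activation degree of a rule: fuzzy AND over its antecedent membership values
    (here the common min interpretation)."""
    return min(antecedent_memberships)

def default_rule_activation(rule_activations):
    """mu(R0) = 1 - max(mu(Ri)): the default rule is as true as all the others are false."""
    return 1.0 - max(rule_activations) if rule_activations else 1.0

# Hypothetical antecedent memberships for rules R5, RA, and RB of Figure 2.4c:
mu = [rule_activation(m) for m in ([0.2, 0.6], [0.1], [0.0])]
print(default_rule_activation(mu))   # -> 0.8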
few decades, to computerized diagnostic tools, intended to aid the physician in making sense
out of the welter of data.
A prime target for such computerized tools is in the domain of cancer diagnosis. Specif-
ically, where breast cancer is concerned, the treating physician is interested in ascertaining
whether the patient under examination exhibits the symptoms of a benign case, or whether her
case is a malignant one.
A good computerized diagnostic tool should possess two characteristics, which are often
in conflict. First, the tool must attain the highest possible performance, i.e., diagnose the
presented cases correctly as being either benign or malignant. Second, it would be highly
beneficial for such a diagnostic system to be interpretable. This means that the physician is not
faced with a black box that simply spouts answers (albeit correct) with no explanation; rather,
we would like for the system to provide some insight as to how it derives its outputs.
Thanks to their linguistic representation and their numeric behavior, fuzzy systems can
provide both characteristics. Moreover, an evolutionary fuzzy modeling approach, carefully set up following the strategies presented in Section 2.3.3, can deal with the trade-off between performance and interpretability. In what follows, I first describe the Wisconsin breast cancer diagnosis
(WBCD) problem. Section 2.4.2 then describes a genetic-fuzzy approach to the WBCD prob-
lem, while Section 2.4.3 delineates the results. In Section 2.4.4, I discuss the issue of obtaining
a confidence measure of the system’s output, going beyond a mere binary, benign-malignant
classification. Finally, Section 2.4.5 briefly describes two further experiments that I carried
out.
case    v1   v2   v3   · · ·   v9   diagnostic
1        5    1    1   · · ·    1   benign
2        5    4    4   · · ·    1   benign
· · ·
683      4    8    8   · · ·    1   malignant
Note that the diagnostics do not provide any information about the degree of benignity or
malignancy.
There are several studies based on this database. Bennett and Mangasarian [10] used
linear programming techniques, obtaining a 99.6% classification rate on 487 cases (the reduced
database available at the time). However, their solution exhibits little understandability, i.e.,
diagnostic decisions are essentially black boxes, with no explanation as to how they were
attained. With increased interpretability in mind as a prime objective, a number of researchers
have applied the method of extracting Boolean rules from neural networks [160–162, 169].
Their results are encouraging, exhibiting both good performance and a reduced number of
rules and relevant input variables. Nevertheless, these systems use Boolean rules and are not
capable of furnishing the user with a measure of confidence for the decision made. My own
work on the evolution of fuzzy rules for the WBCD problem has shown that it is possible
to obtain diagnostic systems exhibiting high performance, coupled with interpretability and a
confidence measure [123–126].
[Figure: block diagram of the proposed diagnosis system: a fuzzy subsystem computes a continuous appraisal value, which a threshold unit converts into the benign/malignant diagnostic.]
Figure 2.5 Proposed diagnosis system. Note that the fuzzy subsystem displayed to the left is in fact the entire fuzzy inference system of Figure 1.11.
In order to evolve the fuzzy model we must make some preliminary decisions about the
fuzzy system itself and about the genetic algorithm encoding.
Figure 2.6 Input fuzzy variables for the WBCD problem. Each fuzzy variable has two possible fuzzy
values labeled Low and High, and orthogonal membership functions, plotted above as degree of mem-
bership versus input value. P and d define the start point and the length of membership function edges,
respectively.
• Connective parameters: the antecedents of the rules are searched for by the evolutionary algorithm. The consequents of the rules are predefined: the algorithm finds rules for the benign diagnostic, while the default-rule consequent is the malignant diagnostic. All rules have unitary weight.
• Operational parameters: the input membership-function values are to be found by the
evolutionary algorithm. For the output singletons I used the values 2 and 4, for benign
and malignant, respectively.
approach, using a simple genetic algorithm [177] to search for individuals whose genomes,
encoding these three parameters, are defined as follows:
• Membership-function parameters. There are nine variables (v1 – v9 ), each with two
parameters P and d, defining the start point and the length of the membership-function
edges, respectively (Figure 2.6).
where Aij represents the membership function applicable to variable vj . Aij can take on
the values: 1 (Low), 2 (High), or 0 or 3 (Don’t Care).
• Relevant variables are searched for implicitly by letting the algorithm choose Don't-care labels as valid antecedents; in such a case the respective variable is considered irrelevant (see Section 2.3.3).
Table 2.2 delineates the parameter encoding and Figure 2.7 shows a sample genome en-
coding a whole fuzzy system.
Table 2.2 Parameter encoding of an individual's genome. Total genome length is 54 + 18Nr, where Nr denotes the number of rules (Nr is set a priori to a value between 1 and 5, and is fixed during the genetic-algorithm run).
To evolve the fuzzy inference system, I used a genetic algorithm with a fixed popu-
lation size of 200 individuals, and fitness-proportionate selection (Section 1.3). The algo-
rithm terminates when the maximum number of generations, Gmax , is reached (I set Gmax =
2000 + 500 × Nr , i.e., dependent on the number of rules used in the run), or when the increase
in fitness of the best individual over five successive generations falls below a certain threshold
(in these experiments, I used threshold values between 2 × 10⁻⁷ and 4 × 10⁻⁶).
My fitness function combines three criteria: (1) Fc : classification performance, computed
as the percentage of cases correctly classified; (2) Fe : the quadratic difference between the
continuous appraisal value (in the range [2, 4]) and the correct discrete diagnosis given by the
WBCD database (either 2 or 4); and (3) Fv : the average number of variables per active rule.
The fitness function is given by F = Fc − αFv − βFe , where α = 0.05 and β = 0.01 (these
latter values were derived empirically). Fc , the ratio of correctly diagnosed cases, is the most
important measure of performance. Fv measures the interpretability, penalizing systems with
a large number of variables per rule (on average). Fe adds selection pressure towards systems
with low quadratic error.
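A minimal Python sketch of this fitness computation follows. Two details are my assumptions for the sake of the example: Fe is averaged over the cases, and classification uses the threshold of 3 described later in Section 2.4.4; the case data in the usage example are made up.

def wbcd_fitness(appraisals, labels, vars_per_rule, alpha=0.05, beta=0.01):
    """F = Fc - alpha*Fv - beta*Fe for the WBCD problem.
    appraisals: continuous outputs in [2, 4]; labels: correct diagnoses (2 = benign, 4 = malignant);
    vars_per_rule: number of variables in each active rule."""
    n = len(labels)
    # Fc: fraction of correctly classified cases (threshold at 3, as in Section 2.4.4).
    predictions = [2 if a < 3 else 4 for a in appraisals]
    f_c = sum(p == y for p, y in zip(predictions, labels)) / n
    # Fe: quadratic difference between appraisal and correct diagnosis (averaged over cases,
    # an assumption made here for concreteness).
    f_e = sum((a - y) ** 2 for a, y in zip(appraisals, labels)) / n
    # Fv: average number of variables per active rule.
    f_v = sum(vars_per_rule) / len(vars_per_rule)
    return f_c - alpha * f_v - beta * f_e

# Toy example: three cases and a single-rule system with 4 variables.
print(wbcd_fitness([2.1, 2.86, 3.9], [2, 2, 4], [4]))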
(a) Genome encoding (first 18 positions, followed by the rule-antecedent part):
P1 d1 P2 d2 P3 d3 P4 d4 P5 d5 P6 d6 P7 d7 P8 d8 P9 d9
 4  3  1  5  2  7  1  1  1  6  3  7  4  6  7  1  1  5   . . .

(b) Database:
     v1  v2  v3  v4  v5  v6  v7  v8  v9
P     4   1   2   1   1   3   4   7   1
d     3   5   7   1   6   7   6   1   5

Rule base:
Rule 1   if (v2 is Low) and (v5 is High) and (v7 is Low) and (v9 is Low) then (output is benign)
Default  else (output is malignant)
Figure 2.7 Example of a genome for a single-rule system. (a) Genome encoding. The first 18 positions
encode the parameters P and d for the nine variables v1 –v9 . The rest encode the membership function
applicable for the nine antecedents of each rule. (b) Interpretation. Database and rule base of the
single-rule system encoded by (a). The parameters P and d are interpreted as illustrated in Figure 2.6.
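A small Python sketch of this decoding step is given below. It works at the level of already-decoded integer parameters rather than the raw 54 + 18Nr-bit string, and the last nine rule genes in the example are hypothetical, chosen to be consistent with Rule 1 of Figure 2.7, since the figure elides them.

def decode_genome(genome, n_rules, n_vars=9):
    """Decode a WBCD genome: the first 2*n_vars positions hold the (P, d) pairs,
    the rest holds n_vars antecedent codes per rule (1 = Low, 2 = High, 0/3 = Don't Care)."""
    mf = genome[: 2 * n_vars]
    database = {f"v{j + 1}": {"P": mf[2 * j], "d": mf[2 * j + 1]} for j in range(n_vars)}
    labels = {1: "Low", 2: "High"}
    rules = []
    for r in range(n_rules):
        codes = genome[2 * n_vars + r * n_vars : 2 * n_vars + (r + 1) * n_vars]
        antecedents = [(f"v{j + 1}", labels[c]) for j, c in enumerate(codes) if c in labels]
        rules.append(antecedents)   # the consequent is predefined: benign
    return database, rules

# Membership-function part taken from Figure 2.7; the last nine genes are a hypothetical
# encoding consistent with Rule 1 of that figure.
genome = [4, 3, 1, 5, 2, 7, 1, 1, 1, 6, 3, 7, 4, 6, 7, 1, 1, 5,
          0, 1, 0, 0, 2, 0, 1, 0, 1]
db, rules = decode_genome(genome, n_rules=1)
print(rules[0])   # [('v2', 'Low'), ('v5', 'High'), ('v7', 'Low'), ('v9', 'Low')]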
2.4.3 Results
The evolutionary experiments performed fall into three categories, in accordance with the data
repartitioning into two distinct sets: training and test (or evaluation). The three experimental
categories are: (1) the training set contains all 683 cases of the WBCD database, while the test
set is empty; (2) the training set contains 75% of the WBCD cases, and the test set contains
the remaining 25% of the cases; (3) the training set contains 50% of the WBCD cases, and
the test set contains the remaining 50% of the cases. In the last two categories, the choice of
training-set cases is done randomly, and is performed anew at the outset of every evolutionary
run. The number of rules per system was also fixed at the outset, to be between one and five,
i.e., evolution seeks a system with an a priori given number of rules (the choice of number of
rules per system determines the final structure of the genome, as presented in Table 2.2).
A total of 120 evolutionary runs were performed, all of which found systems whose
classification performance exceeds 94.5%. In particular, considering the best individual per
run (i.e., the evolved system with the highest classification success rate), 78 runs led to a
fuzzy system whose performance exceeds 96.5%, and of these, 8 runs found systems whose
performance exceeds 97.5%; these results are summarized in Figure 2.8. Table 2.3 presents the
average performance obtained by the genetic algorithm over all 120 evolutionary runs, divided
according to the three experimental categories discussed above. A more detailed account of
these results is presented in Table 2.7 at the end of this chapter, which lists the top 45 evolved systems.
Table 2.4 shows the results of the best systems obtained with the fuzzy-genetic approach.
The number of rules per system was fixed at the outset to be between one and five, i.e., evo-
lution seeks a system with an a priori given number of rules. A comparison of these systems
with other approaches is presented in Section 4.1.2.
Figure 2.8 Summary of results of 120 evolutionary runs. The histogram depicts the number of systems
exhibiting a given performance level at the end of the evolutionary run. The performance considered is
that of the best individual of the run, measured as the overall percentage of correctly classified cases
over the entire database.
Table 2.3 Summary of results of 120 evolutionary runs, divided according to the three experimental
categories discussed in the text (i.e., the three classes which differ in the training-set to test-set ratio).
The table lists the average performance over all 120 runs, where the averaging is done over the best
individual of each run. The performance value denotes the percentage of cases correctly classified.
Three such performance values are shown: (1) performance over the training set; (2) performance
over the test set; and (3) overall performance, considering the entire database. In addition, the average
number of variables per rule is also shown.
I next describe three of my top-performance systems, which serve to exemplify the solu-
tions found by my evolutionary approach. The first system, delineated in Figure 2.9, consists
of three rules (note that the default rule is not counted as an active rule). Taking into account
all three criteria of performance—classification rate, number of rules per system, and average
number of variables per rule— this system can be considered the top one over all 120 evolu-
tionary runs. It obtains 98.2% correct classification rate over the benign cases, 97.07% correct
classification rate over the malignant cases,³ and an overall classification rate (i.e., over the
entire database) of 97.8%.
³ The WBCD database contains 444 benign cases and 239 malignant cases.
Table 2.4 Results of the best systems evolved by the fuzzy-genetic approach. Shown below are the
classification performance values of the top systems obtained by these approaches, along with the
average number of variables-per-rule. Results are divided into five classes, in accordance with the
number of rules-per-system, going from one-rule systems to five-rule ones.
Database
v1 v2 v3 v4 v5 v6 v7 v8 v9
P 3 5 2 2 8 1 4 5 4
d 5 2 1 2 4 7 3 5 2
Rule base
Rule 1 if (v3 is Low) and (v7 is Low) and (v8 is Low) and (v9 is Low) then
(output is benign)
Rule 2 if (v1 is Low) and (v2 is Low) and (v3 is High) and (v4 is Low) and
(v5 is High) and (v9 is Low) then (output is benign)
Rule 3 if (v1 is Low) and (v4 is Low) and (v6 is Low) and (v8 is Low) then
(output is benign)
Default else (output is malignant)
Figure 2.9 The best evolved, fuzzy diagnostic system with three rules. It exhibits an overall classifica-
tion rate of 97.8%, and an average of 4.7 variables per rule.
A thorough test of this three-rule system revealed that the second rule (Figure 2.9) is never
actually used, i.e., is fired by none of the input cases. Thus, it can be eliminated altogether from
the rule base, resulting in a two-rule system (also reducing the average number of variables
per rule from 4.7 to 4).
Can the genetic algorithm automatically discover a two-rule system, i.e., without recourse
to any post-processing (such as that described in the previous paragraph)? My results have
shown that this is indeed the case—one such solution is presented in Figure 2.10. It obtains
97.3% correct classification rate over the benign cases, 97.49% correct classification rate over
the malignant cases, and an overall classification rate of 97.36%.
Finally, Figure 2.11 delineates the best one-rule system found through my evolutionary
approach. It obtains 97.07% correct classification rate over the benign cases, 97.07% correct
classification rate over the malignant cases, and an overall classification rate of 97.07%.
Database
v1 v2 v3 v4 v5 v6 v7 v8 v9
P 1 1 1 6 2 3
d 5 3 2 7 4 1
Rule base
Rule 1 if (v1 is Low) and (v3 is Low) then (output is benign)
Rule 2 if (v2 is Low) and (v5 is Low) and (v6 is Low) and (v8 is Low) then
(output is benign)
Default else (output is malignant)
Figure 2.10 The best evolved, fuzzy diagnostic system with two rules. It exhibits an overall classifica-
tion rate of 97.36%, and an average of 3 variables per rule.
Database
v1 v2 v3 v4 v5 v6 v7 v8 v9
P 4 4 2 2
d 3 1 5 7
Rule base
Rule 1 if (v1 is Low) and (v2 is Low) and (v6 is Low) and (v8 is Low) then
(output is benign)
Default else (output is malignant)
Figure 2.11 The best evolved, fuzzy diagnostic system with one rule. It exhibits an overall classification
rate of 97.07%, and a rule with 4 variables.
v1 v2 v3 v4 v5 v6 v7 v8 v9
Value 4 3 1 1 2 1 4 8 1
The membership value of each variable is then computed in accordance with the (evolved)
database of Figure 2.9:
v1 v2 v3 v4 v5 v6 v7 v8 v9
µLow 0.8 1 1 1 1 1 1 0.4 1
µHigh 0.2 0 0 0 0 0 0 0.6 0
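These values follow from the orthogonal membership functions of Figure 2.6 applied to the database of Figure 2.9. The short Python sketch below shows this fuzzification step; the piecewise-linear form of µLow, falling from 1 at P to 0 at P + d, is my reading of Figure 2.6, and it reproduces the table above.

def fuzzify(value, p, d):
    """Orthogonal Low/High membership functions defined by start point P and edge length d."""
    if value <= p:
        mu_low = 1.0
    elif value >= p + d:
        mu_low = 0.0
    else:
        mu_low = 1.0 - (value - p) / d
    return mu_low, 1.0 - mu_low   # (mu_Low, mu_High)

# Case values and the (P, d) database of Figure 2.9:
values = [4, 3, 1, 1, 2, 1, 4, 8, 1]
P = [3, 5, 2, 2, 8, 1, 4, 5, 4]
d = [5, 2, 1, 2, 4, 7, 3, 5, 2]
for v, p, dd in zip(values, P, d):
    print(fuzzify(v, p, dd))
# v1 -> (0.8, 0.2), v8 -> (0.4, 0.6), and all other variables -> (1.0, 0.0)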
This completes the fuzzification phase (the “fuzzifier” unit of Figure 1.11). Having com-
puted these membership values, the inference engine (Figure 1.11) can now go on to compute
the so-called truth value of each rule. This truth value is computed by applying the fuzzy AND
operator (Section 1.2) to combine the antecedent clauses (the membership values) in a fuzzy
manner; this results in the output truth value, namely, a continuous value which represents the
rule’s degree of activation. Thus, a rule is not merely either activated or not, but in fact is ac-
tivated to a certain degree—represented by a value between 0 and 1. In this example, the rule activation values of the two remaining rules are computed in this manner (remember that rule 2 was eliminated, since it was found never to fire).
The inference engine (Figure 1.11) now goes on to apply the aggregation operator (Sec-
tion 1.2), combining the continuous rule activation values to produce a fuzzy output with a
certain truth value (the point marked “fuzzy output” in Figure 1.11). The defuzzifier then
kicks in (Figure 1.11), producing the final continuous value of the fuzzy inference system; this
latter value is the appraisal value that is passed on to the threshold unit (Figure 2.5). In this
example the appraisal value is 2.86.
In general, the appraisal value computed by my evolved fuzzy systems is in the range
[2, 4]. I chose to place the threshold value at 3, with inferior values classified as benign, and
superior values classified as malignant. Thus, in this example, the appraisal value of 2.86 is
classified as benign—which is correct.
This case in the WBCD database produces an appraisal value (2.86) which is among the
closest to the threshold value. Most other cases result in an appraisal value that lies close to
one of the extremes (i.e., close to either 2 or 4). Thus, in a sense, we can say that we are
somewhat less confident where this case is concerned, with respect to most other entries in
the WBCD database; specifically, the appraisal value can accompany the final output of the
diagnostic system, serving as a confidence measure. This demonstrates a prime advantage
of fuzzy systems, namely, the ability to output not only a (binary) classification, but also a
measure representing the system’s confidence in its output.
favoring systems with few variables-per-rule via the Fv coefficient: lower Fv , meaning fewer
variables-per-rule, entails higher overall fitness.
Can higher-performance systems be obtained by eliminating the Fv factor (albeit at the
cost of reduced interpretability due to more complicated rules)? This was the aim of the first
of the two experiments described herein. I eliminated not only the Fv measure but also the
Fe factor, the resulting fitness function thus containing solely Fc . My intent was to provide
selection pressure for but one goal: overall classification performance. With this aim in mind
I was also more “generous” with the number of rules per system: whereas previously this was
set to a (fixed) value between one and five, herein I set this value to be between three and
seven.
I performed a total of 31 evolutionary runs, the results of which are summarized in Ta-
ble 2.5. Note that my previously best system (Figure 2.9) obtained 97.8% overall classification
performance, while Table 2.5 shows an evolved system with a 98.24% classification rate. This
latter system is thus able to correctly classify 3 additional cases. This small improvement in
performance carries, however, a price: the slightly better system comprises four rules with an
average of 5.8 variables per rule, whereas the previous one (Figure 2.9) contains but three rules
with an average of 4.7 variables per rule; I have thus traded off interpretability for performance.
The judgment of whether this is worthwhile or not is entirely dependent on the human user. It
is noteworthy that this choice (interpretability versus performance) can be easily implemented
in this approach.
Table 2.5 Results of evolutionary runs in which the variables-per-rule constraint has been removed.
Results are divided into five classes, in accordance with the number of rules-per-system, going from
three-rule systems to seven-rule ones. I performed 5–7 runs per class, totaling 31 runs in all; shown
below are the resulting best systems as well as the average per class. Results include the overall
classification performance and the average number of variables-per-rule in parentheses.
As explained in Section 2.4.2, the active rules diagnose benignity, with the default diagno-
sis being malignancy; this means that the if conditions have benign as a consequent, whereas
the else condition has malignant as a consequent. The second experiment sought to find out
what would happen if this were reversed, i.e., could better systems be evolved with benignity as
the default diagnosis? Table 2.6 delineates the results of 27 evolutionary runs. While I did not
improve upon the results of the malignancy-default systems of Section 2.4.3, I did note a ten-
dency toward a smaller number of variables-per-rule. The highest-performance system found
in this experiment is fully specified in Figure 2.12. It comprises five rules with an average of
2.8 variables-per-rule, exhibiting the same overall performance (97.8%) as the three-rule, 4.7
average-variables-per-rule system of Figure 2.9. This nicely illustrates the tradeoff between
these two parameters: number of rules per system and average number of variables per rule.
Table 2.6 Results of evolutionary runs in which the default diagnosis is benign (rather than malignant).
Results are divided into five classes, in accordance with the number of rules-per-system, going from
one-rule systems to five-rule ones. I performed 4–6 runs per class, totaling 27 runs in all; shown below
are the resulting best systems as well as the average per class. Results include the overall classification
performance and the average number of variables-per-rule in parentheses.
Database
v1 v2 v3 v4 v5 v6 v7 v8 v9
P 4 2 2 8 4 2 2 5 6
d 8 5 5 1 8 6 3 4 5
Rule base
Rule 1 if (v2 is High) and (v7 is High) then (output is malignant)
Rule 2 if (v2 is High) and (v3 is High) and (v4 is Low) and (v8 is High) and
(v9 is Low) then (output is malignant)
Rule 3 if (v3 is High) and (v6 is High) then (output is malignant)
Rule 4 if (v3 is Low) and (v5 is High) and (v8 is High) then (output is malig-
nant)
Rule 5 if (v1 is High) and (v6 is High) then (output is malignant)
Default else (output is benign)
Figure 2.12 The best evolved, fuzzy diagnostic system with active rules encoding malignant cases. It
exhibits an overall classification rate of 97.80%, and an average of 2.8 variables per rule.
Table 2.7 Summary of top 45 evolutionary runs (of 120) described in Section 2.4.3, divided according to the three experimental categories discussed
in the text (i.e., the three classes which differ in the training-set to test-set ratio). For each of the 45 evolved systems, the table lists its fitness value,
its performance, and its average number of variables-per-rule. As explained in Section 2.4.3, the performance value denotes the percentage of cases
correctly classified. Three such performance values are shown: (1) performance over the training set; (2) performance over the test set; and (3)
overall performance, considering the entire database.
         Category 1: 100% training        Category 2: 75% training / 25% test            Category 3: 50% training / 50% test
Rules    fitness   perf.    vars          fitness   train    test     overall  vars       fitness   train    test     overall  vars
         0.9548    96.78    3             0.9553    97.46    95.91    97.07    4          0.9637    97.66    95.6     96.63    3
  1      0.9548    96.78    3             0.9597    97.27    95.32    96.78    3          0.9607    97.37    95.01    96.19    3
         0.9533    96.63    3             0.9557    96.88    96.49    96.78    3          0.9607    97.37    95.01    96.19    3
         0.9576    97.36    3.5           0.9598    97.27    97.66    97.36    3          0.9603    97.95    95.89    96.93    4
  2      0.9593    97.22    3             0.9548    97.07    96.49    96.93    3.5        0.9579    97.08    96.77    96.93    3
         0.9578    97.07    3             0.9648    97.46    94.74    96.78    2.5        0.9636    97.95    94.43    96.19    3.5
         0.9548    97.8     4.67          0.9577    97.27    97.66    97.36    3.33       0.9659    97.66    95.89    96.78    2.67
  3      0.9554    97.66    4.33          0.9557    97.27    97.08    97.22    3.67       0.9546    97.37    95.89    96.63    4
         0.9594    97.22    3             0.9577    97.27    95.91    96.93    3.33       0.9626    97.95    95.01    96.49    3.67
         0.9543    97.8     4.75          0.952     97.27    98.25    97.51    4.25       0.9755    99.12    95.6     97.36    3.5
  4      0.9529    97.51    4.5           0.9563    97.07    96.49    96.93    3.25       0.971     98       95.89    97.36    3.75
         0.9594    97.22    3             0.9591    97.66    94.15    96.78    3.75       0.965     98.25    95.31    96.78    3.75
         0.9599    97.51    3.4           0.9587    97.66    97.08    97.51    3.8        0.9586    97.66    96.77    97.22    3.8
  5      0.9483    97.36    5             0.9575    97.66    97.08    97.51    4          0.9756    98.83    94.72    96.78    3
         0.9584    97.36    3.4           0.9561    97.27    96.49    97.07    3.6        0.9623    98.25    95.01    96.63    4.2
Chapter 3
Coevolutionary Fuzzy Modeling
In the simplified models of evolution discussed in Section 1.3, we consider individuals be-
longing to a single species—i.e., sharing the same genetic encoding, and reproducing with
each other. We assume this species evolves in isolation, in an almost unchanging environment.
In nature, species live in the niches afforded by other species, modifying themselves and the
environment and being affected by such modifications.
Over time, the evolution of many species has been influenced by interactions with other
species. Species that have mutually influenced one another’s evolution are said to have coe-
volved [145]. Flowers coevolved with the insects that pollinated them and fed on their nectar
in a mutualist relationship where reproductive success (fitness) of one species is beneficial for
other species’ survival. On the other hand, predator-prey interaction constitutes an example
of competitive coevolution where the survival of individuals of one species requires the death
of individuals from other species. Although species-specific coevolution is easily identifiable
(e.g., yucca plants and the so-called yucca moths cannot reproduce without each other), it is rare, contrary to the widespread diffuse coevolution, in which species are influenced by a wide variety of predators, parasites, prey, and mutualists.
Coevolution has served as inspiration for a family of evolutionary algorithms ca-
pable of surmounting some of the limitations encountered by evolutionary computation. These
coevolutionary algorithms deal particularly well with increasing requirements of complexity
and modularity while keeping computational cost bounded.
In this chapter I present Fuzzy CoCo, an original approach which applies cooperative
coevolution to tackle the fuzzy-modeling problem. I begin by presenting some general notions
of coevolutionary computation in the next section. Then, in Section 3.2, I discuss cooperative
coevolution. In Section 3.3 I present in detail Fuzzy CoCo, my coevolutionary fuzzy modeling
approach. I finally illustrate in Section 3.4 some of the capabilities of Fuzzy CoCo by applying
it to a well-known classification problem.
51
Coevolution provides some advantages over non-coevolutionary approaches that render coevolution an interesting alternative when confronting certain problems. Among these advantages,
we can mention [77, 121, 140]:
• Coevolution favors the discovery of complex solutions whenever complex solutions are
required.
In nature, cooperation between species (mutualism) can be found in organisms ranging from cells (e.g., eukaryotic organisms probably resulted from the mutualistic interaction between prokaryotes and some cells they infected) to higher animals (e.g., African tick birds obtain a steady food supply by cleaning parasites from the skin of giraffes, zebras, and other animals), including the common mutualism between plants and animals (e.g., pollination and seed dispersal in exchange for food) [145].
Cooperative coevolutionary algorithms involve a number of independently evolving
species which together form complex structures, well-suited to solve a problem. The fitness
of an individual depends on its ability to collaborate with individuals from other species. In
this way, the evolutionary pressure stemming from the difficulty of the problem favors the
development of cooperative strategies and individuals. As in nature, the species are genetically isolated: they evolve in separate populations, their genomes are genetically incompatible, or both.
Figure 3.1 shows the general architecture of Potter’s cooperative coevolutionary frame-
work, and the way each evolutionary algorithm computes the fitness of its individuals by com-
bining them with selected representatives from the other species. The representatives can be
selected via a greedy strategy as the fittest individuals from the last generation.
[Figure: four coevolving species, each with its own population and evolutionary algorithm (EA); the Species 1 individual to be evaluated is merged with representatives of the other species before fitness evaluation.]
Figure 3.1 Potter’s cooperative coevolutionary system. The figure shows the evolutionary process
from the perspective of Species 1. The individual being evaluated is combined with one or more repre-
sentatives of the other species so as to construct several solutions which are tested on the problem. The
individual’s fitness depends on the quality of these solutions.
Results presented by Potter and De Jong [143] show that their approach addresses ad-
equately issues like problem decomposition and interdependencies between subcomponents.
The cooperative coevolutionary approach performs as well as, and sometimes better than,
single-population evolutionary algorithms. Finally, cooperative coevolution usually requires
less computation than single-population evolution as the populations involved are smaller, and
convergence—in terms of number of generations—is faster.
Fuzzy modeling requires the definition of the linguistic knowledge describing the behavior of a fuzzy system, and of the
values mapping this symbolic description into a real-valued world (a complete definition also
requires structural parameters, such as relevant variables and number of rules). Thus, fuzzy
modeling can be thought of as two separate but intertwined search processes: (1) the search
for the membership functions (i.e., operational parameters) that define the fuzzy variables, and
(2) the search for the rules (i.e., connective parameters) used to perform the inference.
Fuzzy modeling presents several features discussed in Section 3.2 which justify the ap-
plication of cooperative coevolution: (1) The required solutions can be very complex, since
fuzzy systems with a few dozen variables may call for hundreds of parameters to be defined.
(2) The proposed solution—a fuzzy inference system—can be decomposed into two distinct
components: rules and membership functions. (3) Membership functions are represented by
continuous, real values, while rules are represented by discrete, symbolic values. (4) These two
components are interdependent because the membership functions defined by the first group
of values are indexed by the second group (rules).
Consequently, in Fuzzy CoCo, the fuzzy modeling problem is solved by two coevolving,
cooperating species. Individuals of the first species encode values which define completely all
the membership functions for all the variables of the system. For example, with respect to the
variable Triglycerides shown in Figure 2.1, this problem is equivalent to finding the values of
P1 , P2 , and P3 .
Individuals of the second species define a set of rules of the form:
if (v1 is A1) and . . . and (vn is An) then (output is C),
where the term Ai indicates which one of the linguistic labels of fuzzy variable vi is used by
the rule. For example, a valid rule could contain the expression
if . . . and (Temperature is Warm) and . . . then . . .
which includes the membership function Warm, whose defining parameters are contained in
the first species.
Figure 3.2 Pseudo-code of Fuzzy CoCo. Two species coevolve in Fuzzy CoCo: membership functions and rules. The elitism strategy extracts ES individuals to be reinserted into the population after the evolutionary operators (i.e., selection, crossover, and mutation) have been applied. Selection results in a reduced population P′S(g) (usually, the size of P′S(g) is P′S = PS − ES). The line “Evaluate population PS(g)” is elaborated in Figure 3.3.
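One possible Python rendering of the generational loop sketched in Figure 3.2 is given below. It is a sketch only, not the thesis' exact pseudo-code: the helpers select, crossover, mutate, and evaluate_with_cooperators are placeholders standing for the operators named in the caption and the evaluation scheme of Figure 3.3.

def fuzzy_coco_generation(population, other_species_cooperators, es,
                          select, crossover, mutate, evaluate_with_cooperators):
    """One generation of one species in Fuzzy CoCo (a sketch of the loop of Figure 3.2)."""
    # Keep the ES fittest individuals aside (elitism).
    elite = sorted(population, key=lambda ind: ind.fitness, reverse=True)[:es]
    # Selection produces the reduced population P'S(g) of size PS - ES.
    reduced = select(population, len(population) - es)
    # Apply the genetic operators, then reinsert the elite.
    offspring = mutate(crossover(reduced)) + elite
    # Evaluate each individual by combining it with cooperators of the other species.
    for ind in offspring:
        ind.fitness = evaluate_with_cooperators(ind, other_species_cooperators)
    return offspring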
3.3.2 Elitism
I introduced elitism to avoid the divergent behavior of Fuzzy CoCo, observed in preliminary
trial runs. Contrary to Juillé’s statement about premature convergence of cooperative coevolu-
tion [77]—due in his case to the fact that he was searching for solution-test pairs, two inher-
ently competitive species—non-elitist versions of Fuzzy CoCo often tend to lose the genetic
information of good individuals found during evolution, consequently producing populations
with mediocre individuals scattered throughout the search space. This is probably due to the
relatively small size of the populations which renders difficult the preservation (exploitation)
of good solutions while exploring the search space.
The introduction of simple elitism produces an undesirable effect on Fuzzy CoCo’s per-
formance: populations converge prematurely even with reduced values of the elitism rate Er .
To offset this effect without losing the advantages of elitism, it is necessary to increase the mu-
tation probability Pm by an order of magnitude so as to improve the exploration capabilities
of the algorithm. As the dispersion effect is less important when Fuzzy CoCo is allowed to
manage relatively large populations, the values of both Er and Pm should be reduced in that case.
Figure 3.3 Fitness evaluation in Fuzzy CoCo. (a) Several individuals from generation g − 1 of each
species are selected both randomly and according to their fitness to be the representatives of their
species during generation g; these representatives are called “cooperators.” (b) During the evalua-
tion stage of generation g (after selection, crossover, and mutation—see Figure 3.2), individuals are
combined with the selected cooperators of the other species to construct fuzzy systems. These systems
are then evaluated on the problem domain and serve as a basis for assigning the final fitness to the
individual being evaluated.
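In Python, the evaluation stage of Figure 3.3b could be sketched as follows. This is an illustration under my assumptions: the Ncf fittest and Ncr random cooperators come from the other species, and the individual's fitness is taken as the best quality obtained over the constructed systems, one common choice for assigning the final fitness; build_fuzzy_system and quality are hypothetical placeholders.

import random

def evaluate_individual(individual, fittest_cooperators, random_pool, ncr,
                        build_fuzzy_system, quality):
    """Combine one individual with the Ncf fittest and Ncr random cooperators of the other
    species, build the corresponding fuzzy systems, and keep the best quality obtained."""
    cooperators = list(fittest_cooperators) + random.sample(random_pool, ncr)
    return max(quality(build_fuzzy_system(individual, coop)) for coop in cooperators)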
The example presented in the next section illustrates both the application of Fuzzy CoCo and the way interpretability
criteria are introduced in it.
Apart from Fuzzy CoCo, the use of cooperative coevolution to develop fuzzy models is scarce, and the few existing works limit themselves to a brief description, neither developing a structured approach nor studying the characteristics of the proposed algorithms. Because of that, I can cautiously claim that Fuzzy CoCo is the first consistent work in cooperative coevolutionary fuzzy modeling.
The earliest work reporting on coevolution for building fuzzy systems does not strictly fit the conditions to be considered coevolutionary. Indeed, Włoszek and Domański [182, 183] describe an island-based evolutionary system implementing in each island
a Michigan-type modeling process (Section 2.2.2). After each evolutionary step, they allow
the populations to exchange rules using an aggressive migration policy—up to 25% of the rule
base.
In 1997, Hopf [66] applied Potter’s coevolutionary model [140,141] to build the rule base
of a fuzzy system. In his approach, Hopf implements a species for each rule composing the
rule base. A candidate rule base is thus formed by an individual of each species. The pre-
defined membership functions are distributed regularly over each variable’s universe of dis-
course. This behavior-learning approach (Section 2.2.1) is applied to approximate an abstract
function obtaining results that compare favorably with those of a simple genetic algorithm.
Jun, Joung, and Sim [78] applied a method similar to Fuzzy CoCo in that it implements two species encoding, respectively, rules and membership functions. They applied different
evolutionary algorithms to control each species evolution: (1) a discrete alphabet representa-
tion with (µ + λ) selection and no crossover operator for the rule species, and (2) a simple
genetic algorithm with binary representation for the membership-function species. They ap-
plied their approach to design a fuzzy controller for the navigation of two simulated mobile
robots which “find their goal positions in relatively short time and avoid obstacles success-
fully”. The main difference between their approach and Fuzzy CoCo is the fitness evaluation
stage and the associated computational cost. While the evaluation of each individual in Fuzzy
CoCo is based on a reduced number of cooperating relations (i.e., Ncf + Ncr), each individual
in their approach must cooperate with every individual in the other population to obtain its
fitness. The cost of this meticulous exploration is a huge computational load.
Two more recent works have applied cooperative coevolution to design fuzzy controllers.
Both Jeong and Oh [74] and Juang, Lin, and Lin [76] implement a species-per-rule strategy in
which each individual represents a single rule defined by its own input and output membership
functions. As usual in fuzzy-control applications, where the number of involved variables is
low, these two works neglect interpretability in favor of precision. Note that Juang, Lin, and
Lin select only random cooperators (i.e., Ncf = 0), while Jeong and Oh select only the fittest
individual as cooperators (i.e., Ncf = 1 and Ncr = 0).
[Figure: scatter plot of the iris data in the PL–PW plane (PL from 1 to 7 on the x-axis, PW from 0 to 2.5 on the y-axis).]
Figure 3.4 Iris data distribution in the PL–PW subspace. The three classes: setosa, versicolor, and
virginica correspond to marks x, o, and +, respectively. Note that in both axes, almost all the versicolor
cases are in between setosa and virginica cases. This fact can be exploited to facilitate the design of
classification systems.
Fisher’s iris data has been widely used to test classification and modeling algorithms,
recently including fuzzy models [65, 67, 152, 164, 184]. I propose herein two types of fuzzy
logic-based systems to solve the iris data classification problem: (1) fuzzy controller-type (as
used by Shi et al. [164] and Russo [152]), and (2) fuzzy classifier-type (as used by Hong
and Chen [65], Hung and Lin [67], and Wu and Chen [184]). Both types consist of a fuzzy
inference system and a selection unit.
In the fuzzy controller (see Figure 3.5) the fuzzy subsystem computes a single continuous
value estimating the class to which the input vector belongs. Note that each class is assigned
a numeric value; based on the iris data distribution, I assigned values 1, 2, and 3 to the classes
setosa, versicolor, and virginica, respectively (such an assignment makes sense only under
the assumption that versicolor is an intermediate species in between setosa and virginica, see
Figure 3.4). The selection unit approximates this value to the nearest class value using a stair
function.
Figure 3.5 Fuzzy controller used to solve Fisher’s iris data problem. The fuzzy subsystem is used to
compute a continuous value that describes the class to which a given input vector belongs (the three
classes: setosa, versicolor, and virginica correspond to values 1, 2, and 3, respectively). The stair-
function approximates the computed value to the nearest class value.
In the fuzzy classifier (see Figure 3.6) the fuzzy inference subsystem computes a con-
tinuous membership value for each class. The selection unit chooses the most active class,
provided that its membership value exceeds a given threshold (which I set to 0.5).
Figure 3.6 Fuzzy classifier used to solve Fisher’s iris data problem. The fuzzy subsystem is used to
compute the membership value of a given input vector for each of the three classes: setosa, versicolor,
and virginica. The maximum-and-threshold subsystem chooses the class with the maximum membership
value, provided this value exceeds a given threshold.
The two fuzzy subsystems thus differ in the number of output variables: a single out-
put (with values {1,2,3}) for the controller-type and three outputs (with values {0,1}) for the
classifier-type. In general, controller-type systems take advantage of data distribution while
classifier-type systems offer higher interpretability because the output classes are independent; these latter systems are, however, harder to design.
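The two selection units can be summarized by the following Python sketch. The 0.5 threshold for the classifier-type unit is the one mentioned above; the function names, the CLASSES list, and returning None for a below-threshold output are my own choices for the sake of the example.

CLASSES = ["setosa", "versicolor", "virginica"]

def controller_selection(estimation):
    """Stair function: round the continuous class estimate (1 to 3) to the nearest class."""
    index = min(max(round(estimation), 1), 3) - 1
    return CLASSES[index]

def classifier_selection(memberships, threshold=0.5):
    """Pick the most active class, provided its membership value exceeds the threshold."""
    best = max(CLASSES, key=lambda c: memberships[c])
    return best if memberships[best] > threshold else None

print(controller_selection(2.3))                                                   # versicolor
print(classifier_selection({"setosa": 0.1, "versicolor": 0.7, "virginica": 0.3}))  # versicolor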
• Structural parameters: three input membership functions (Low, Medium, and High);
three output singletons for the controller-type system and two output singletons for the
classifier-type system; a user-configurable number of rules. The relevant variables are
one of Fuzzy CoCo’s evolutionary objectives.
• Connective parameters: the antecedents and the consequent of the rules are searched by
Fuzzy CoCo. The algorithm also searches for the consequent of the default rule. All
rules have unitary weight.
• Operational parameters: the input membership function values are to be found by Fuzzy
CoCo. For the output singletons I used the values 1, 2, and 3 for the controller-type
system, and the values 0 and 1 for the classifier-type system.
Fuzzy CoCo thus searches for four parameters: input membership-function values, rele-
vant input variables, and antecedents and consequents of rules. The genomes of the two species
are constructed as follows:
Figure 3.7 Input fuzzy variables for the iris problem. Each fuzzy variable has three possible fuzzy
values labeled Low, Medium, and High, and orthogonal membership functions, plotted above as
degree of membership versus input value. Parameters P1 , P2 , and P3 define the membership-function
edges.
• Species 2: Rules (Controller-type systems). The i-th rule has the form:
if (SL is AiSL) and . . . and (PW is AiPW) then (output is Ci),
Aij can take on the values: 1 (Low), 2 (Medium), 3 (High), or 0 (Don't Care). Ci can take on the values: 1 (setosa), 2 (versicolor), or 3 (virginica). Relevant variables are searched for implicitly by allowing the algorithm to choose the Don't Care label (i.e., Aij = 0) as a valid antecedent; in such a case the respective variable is considered irrelevant, and is removed from the rule. The default rule is defined by its consequent parameter C0.
• Species 2: Rules (Classifier-type systems). The i-th rule has the form:
if (SL is AiSL) and . . . and (PW is AiPW)
then {(setosa is Ciset), (versicolor is Civer), (virginica is Civir)},
Aij can take on the values: 1 (Low), 2 (Medium), 3 (High), or 0 (Other). Cij can take on the values: 0 (No) or 1 (Yes). Relevant variables are searched for implicitly by allowing the algorithm to choose non-existent membership functions (i.e., Aij = 0) as valid antecedents; in such a case the respective variable is considered irrelevant, and is removed from the rule. The default rule is defined by its consequent parameters C0set, C0ver, and C0vir.
Table 3.1 delineates the parameters encoding both species’ genomes, which together de-
scribe an entire fuzzy system.
Table 3.1 Genome encoding of parameters for both species. Genome length for Species 1 (membership
functions) is 60 bits. Genome length for Species 2 (rules) is 10 × Nr + 2 for the controller-type system
and 11 × Nr + 3 for the classifier-type system, where Nr denotes the number of rules. Values Vmin and
Vmax for parameters Pi are defined according to the ranges of the four variables SL, SW, PL, and PW.
Table 3.2 delineates values and ranges of values of the evolutionary parameters. The
algorithm terminates when the maximum number of generations, Gmax , is reached (I set
Gmax = 500 + 100 × Nr , i.e., dependent on the number of rules used in the run), or when the
increase in fitness of the best individual over five successive generations falls below a certain
threshold (10⁻⁴ in my experiments). Note that mutation rates are relatively higher than with
a simple genetic algorithm, which is typical of coevolutionary algorithms [140, 143]. This is
due in part to the small population sizes and to elitism.
Table 3.2 Fuzzy CoCo set-up. Population size was fixed to 60 for controller-type systems and to 70 for
classifier-type systems.
Parameter                        Values
Population size PS               {60, 70}
Maximum generations Gmax         500 + 100 × Nr
Crossover probability Pc         1
Mutation probability Pm          {0.02, 0.05, 0.1}
Elitism rate Er                  {0.1, 0.2}
“Fit” cooperators Ncf            1
Random cooperators Ncr           {1, 2}
where α = 1/150 and β = 0.3. Fc , the ratio of correctly classified samples, is the most
important measure of performance. Fmse adds selection pressure towards systems with low
quadratic error, in which misclassifications are closer to “crossing the line” and becoming cor-
rect classifications. Fv measures the interpretability, penalizing systems with a large number
of variables per rule (on average). Fv penalization is only applied to perfect classifiers as the
number of variables in the iris data problem is low and 100% classification rate can be attained.
The value β was set small enough to penalize systems exhibiting a large quadratic error. The
value α was calculated to allow Fv to penalize rule multiplicity, but without decreasing fitness
to a value below lower-performance systems.
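The exact combination of Fc, Fmse, and Fv is not spelled out in this passage. Assuming, by analogy with the WBCD fitness of Section 2.4.2, an additive form F = Fc − αFv − βFmse, with the Fv penalty applied only to perfect classifiers as stated above, a Python rendering could look as follows (the numbers in the usage example are made up).

def iris_fitness(f_c, f_mse, f_v, alpha=1 / 150, beta=0.3):
    """Assumed combination F = Fc - alpha*Fv - beta*Fmse (analogous to the WBCD fitness).
    The rule-size penalty Fv is applied only to perfect classifiers, as described in the text."""
    penalty_v = alpha * f_v if f_c == 1.0 else 0.0
    return f_c - penalty_v - beta * f_mse

# A perfect classifier with 1.7 variables per rule and a small quadratic error:
print(iris_fitness(f_c=1.0, f_mse=0.01, f_v=1.7))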
3.4.3 Results
In this section I present the fuzzy systems evolved using Fuzzy CoCo for the two setups de-
scribed above. I compare my systems with those presented in recently published articles,
which thus represent the state-of-the-art. I detail some high-performance systems obtained for
each problem in order to illustrate the type of systems found by Fuzzy CoCo.
I performed a total of 145 evolutionary runs, searching for controller-type systems with 2,
3, and 4 rules, all of which found systems whose classification performance exceeds 97.33%
(i.e., the worst system misclassifies only 4 cases). The average classification performance
of these runs was 98.98%, corresponding to 1.5 misclassifications. 121 runs led to a fuzzy
system misclassifying 2 or fewer cases, and of these, 4 runs found perfect classifiers. Figure 3.8
summarizes my results.
[Figure: histogram of the accumulated number of evolved systems (0–90) versus the number of misclassifications (0–4), with separate series for 2-, 3-, and 4-rule systems.]
Figure 3.8 Summary of results of 145 Fuzzy CoCo runs searching for controller-type systems. The
histogram depicts the total number of systems for a given number of misclassifications, at the end of the
Fuzzy CoCo run.
Table 3.3 compares my best controller-type systems with the top systems obtained by two
other evolutionary fuzzy modeling approaches. Shi et al. [164] used a simple genetic algorithm
with adaptive crossover and adaptive mutation operators. Russo’s FuGeNeSys method [152]
combines evolutionary algorithms and neural networks to produce fuzzy systems. The search
power of this latter method lies mainly in artificial evolution, while neural-based learning tech-
niques are applied only to improve the performance of promising (high-performance) systems.
The main drawback of these two methods is the low interpretability of the generated systems.
As they do not define constraints on the input membership-function shapes, almost none of the
semantic criteria presented in Section 2.3 are respected.
As evident in Table 3.3, the evolved fuzzy systems described in this section surpass or
equal those obtained by the two other approaches in terms of performance, while maintaining
high interpretability. Thus, my approach not only produces systems exhibiting high perfor-
mance, but also ones with fewer rules and fewer antecedents per rule (such systems are thus more interpretable).
Fuzzy CoCo found controller-type systems with 3 and 4 rules exhibiting perfect perfor-
mance (no misclassifications). Among these, I consider as best the system with fewest rules
and variables. Figure 3.9 presents one such three-rule system, with an average of 1.7 variables
per rule.
Table 3.3 Comparison of the best controller-type systems evolved by Fuzzy CoCo with the top fuzzy sys-
tems obtained by Russo’s FuGeNeSys method [152] and with those obtained using a single-population
evolutionary approach by Shi et al. [164]. Shown below are the classification rates of the top systems
obtained by these approaches, along with the average number of variables per rule in parentheses.
Results are divided into four classes, in accordance with the number of rules per system, going from
two-rule systems to five-rule ones. The highlighted system is the top-performance one, detailed in
Figure 3.9. A dash implies that the relevant system was not designed.
Rules per   Shi et al. [164]   FuGeNeSys [152]   Fuzzy CoCo
system      best               best              average         best
2           –                  –                 98.71% (1.9)    99.33% (2)
3           –                  –                 99.10% (1.3)    100% (1.7)
4           98.00% (2.6)       –                 99.12% (1.3)    100% (2.5)
5           –                  100% (3.3)        –               –
Database
SL SW PL PW
P1 5.68 3.16 1.19 1.55
P2 6.45 3.16 1.77 1.65
P3 7.10 3.45 6.03 1.74
Rule base
Rule 1   if (PL is High) then (output is virginica)
Rule 2   if (SW is Low) and (PW is Low) then (output is virginica)
Rule 3   if (SL is Medium) and (PW is Medium) then (output is setosa)
Default else (output is setosa)
Figure 3.9 The best evolved, controller-type system with three rules. It exhibits a classification rate
of 100%, and an average of 1.7 variables per rule. Note that there is no rule classifying explicitly
for versicolor class. The system interprets as versicolor cases where rules classifying both setosa and
virginica classes have similar activation levels. For example, if the activation level is 0.5 for both
classes 1 (setosa) and 3 (virginica), then the defuzzifier will output 2 (versicolor).
Figure 3.10 Summary of results of 144 Fuzzy CoCo runs searching for classifier-type systems. The
histogram depicts the total number of systems for a given number of misclassifications, at the end of the
Fuzzy CoCo run.
Hong and Chen [65] and Wu and Chen [184] proposed constructive learning methods that produce systems with either a few [184] or simple [65] rules. They do not, however, constrain the input mem-
bership functions, thus rendering the obtained systems less interpretable. Hung and Lin [67]
proposed a neuro-fuzzy hybrid approach to learn classifier-type systems. As their learning
strategy hinges mainly on the adaptation of the connection weights, their systems exhibit low
interpretability.
The evolved fuzzy systems described herein surpass or equal those obtained by these
three approaches in terms of both performance and interpretability. As evident in Table 3.4,
my approach not only produces systems exhibiting higher performance, but also ones with fewer rules and fewer antecedents per rule (which are thus more interpretable).
Table 3.4 Comparison of the best classifier-type systems evolved by Fuzzy CoCo with systems obtained
applying constructive learning methods proposed by Hong and Chen [65] and by Wu and Chen [184],
and with those obtained by Hung and Lin’s neuro-fuzzy approach [67]. Shown below are the clas-
sification rates of the top systems obtained by these approaches, along with the average number of
variables per rule in parentheses. Results are divided into four classes, in accordance with the number
of rules per system, going from two-rule systems to eight-rule ones. The highlighted system is the top-
performance one, detailed in Figure 3.11. A dash implies that the relevant system was not designed.
Fuzzy CoCo found classifier-type systems with 3 and 4 rules exhibiting the highest clas-
sification performance to date (i.e., 99.33%, corresponding to 1 misclassification). I consider
as most interesting the system with the smallest number of conditions (i.e., the total number of
variables in the rules). Figure 3.11 presents one such system. This three-rule system presents
an average of 2.3 variables per rule, corresponding to a total of 7 conditions.
Database
SL SW PL PW
P1 4.65 2.68 4.68 0.39
P2 4.65 3.74 5.26 1.16
P3 5.81 4.61 6.03 2.03
Rule base
Rule 1   if (PW is Low)
         then {(setosa is Yes), (versicolor is No), (virginica is No)}
Rule 2   if (PL is Low) and (PW is Medium)
         then {(setosa is No), (versicolor is Yes), (virginica is No)}
Rule 3   if (SL is High) and (SW is Medium) and (PL is Low) and (PW is High)
         then {(setosa is No), (versicolor is Yes), (virginica is No)}
Default  else {(setosa is No), (versicolor is No), (virginica is Yes)}
Figure 3.11 The best evolved, classifier-type system with three rules. It exhibits a classification rate
of 99.33%, and an average of 2.3 variables per rule.
3.5 Summary
In this chapter I presented a novel approach to fuzzy modeling based on cooperative coevo-
lution. The method, called Fuzzy CoCo, involves two separate coevolving species: rules and membership functions. A greedy fitness-evaluation strategy, based on the use of selected cooperators from each species, allows the method to adequately balance exploration and exploitation of the search while keeping the computational cost bounded.
I propose Fuzzy CoCo as a methodology for modeling fuzzy systems and have conceived
it to allow a high degree of freedom in the type of fuzzy systems it can design. Fuzzy CoCo can
be used to model Mamdani-type, TSK-type, and singleton-type fuzzy models. The rules
can contain an arbitrary number of antecedents (i.e., zero, one, or many) for the same variable.
The designer is free to choose the type of membership functions used for each variable and the
way they are parametrized. The membership functions can be defined either as shared by all
fuzzy rules, or per-rule. Fuzzy CoCo is thus highly general and generic.
The configurability of Fuzzy CoCo facilitates the management of the interpretability-
accuracy trade-off. To satisfy interpretability criteria, the user must impose conditions on
the input and output membership functions as well as on the rule definition. In Fuzzy CoCo
these conditions are translated both into restrictions on the choice of fuzzy parameters and into
criteria included in the fitness function.
I also illustrated the use of Fuzzy CoCo by applying it to a well-known classification
problem: the Iris problem. Two types of fuzzy systems—controller-type and classifier-type—
were successfully modeled to solve the problem. Further applications of Fuzzy CoCo to hard
medical diagnosis problems are presented in the next chapter.
Chapter 4
Breast Cancer Diagnosis by Fuzzy CoCo
Although computerized tools for medical diagnosis have been developed since the early 60s,
their number and capabilities have grown impressively in recent years, due mainly to the
availability of medical data and increased computing power. Most of these systems are con-
ceived to provide high diagnostic performance. However, interest has recently shifted to sys-
tems capable of providing, besides a correct diagnosis, insight on how the answer was ob-
tained. Thanks to their linguistic representation and their numeric behavior, fuzzy systems can
provide both performance and explanation.
In this chapter, I apply Fuzzy CoCo to model the decision processes involved in two
breast-cancer diagnostic problems. First, in Section 4.1, I describe the application of Fuzzy
CoCo to the Wisconsin Breast Cancer Diagnostic problem (already presented in Section 2.4).
Then, in Section 4.2, I present the application of Fuzzy CoCo to the design of COBRA, a
fuzzy-based tool to assist in mammography interpretation.
Fuzzy CoCo is used to search for four parameters: input membership-function values,
relevant input variables, and antecedents and consequents of rules. These search goals are more
ambitious than those defined for the fuzzy-genetic approach (Section 2.4.2) as the consequents
of rules are added to the search space. The genomes of the two species are constructed as
follows:
• Species 1: Membership functions. There are nine variables (v1 – v9 ), each with two
parameters, P and d, defining the start point and the length of the membership-function
edges, respectively (Figure 2.6).
• Species 2: Rules. Each antecedent Aij can take on the values: 1 (Low), 2 (High), or 0 or 3
(Don't care). The consequent bit Ci can take on the values: 0 (Benign) or 1 (Malignant).
Relevant variables are searched for implicitly by letting the algorithm choose Don't-care
labels as valid antecedents; in such a case the respective variable is considered irrelevant.
Table 4.1 delineates the parameter encoding for both species’ genomes, which together
describe an entire fuzzy system. Note that in the fuzzy-genetic approach (Section 2.4.2) both
membership functions and rules were encoded in the same genome, i.e., there was only one
species.
Table 4.1 Genome encoding of parameters for both species. Genome length for membership functions
is 54 bits. Genome length for rules is 19 × Nr + 1, where Nr denotes the number of rules.
Species 2: Rules
Parameter Values Bits Qty Total bits
A {0,1,2,3} 2 9 × Nr 18 × Nr
C {1,2} 1 Nr + 1 Nr + 1
Total Genome Length 19 × Nr + 1
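The sketch below illustrates how a rules-species genome with the layout of Table 4.1 could be decoded into a readable rule base. The exact bit ordering is not specified above, so the layout used here (all antecedent blocks first, then the Nr + 1 consequent bits) and the helper names are assumptions for illustration only.

```python
# Decoding a rules-species genome of length 19*Nr + 1 (Table 4.1): per rule,
# nine 2-bit antecedents (0 or 3 = don't care, 1 = Low, 2 = High), followed
# by Nr + 1 one-bit consequents (0 = Benign, 1 = Malignant), the last one
# being the default rule's consequent. Bit ordering is assumed, not quoted.
import random

LABELS = {0: None, 3: None, 1: "Low", 2: "High"}  # None = don't care

def decode_rules(bits, n_rules):
    assert len(bits) == 19 * n_rules + 1
    rules = []
    for r in range(n_rules):
        chunk = bits[18 * r: 18 * (r + 1)]
        antecedents = {}
        for v in range(9):  # variables v1..v9
            code = 2 * chunk[2 * v] + chunk[2 * v + 1]
            if LABELS[code] is not None:
                antecedents[f"v{v + 1}"] = LABELS[code]
        rules.append(antecedents)
    outputs = ["malignant" if c else "benign" for c in bits[18 * n_rules:]]
    return list(zip(rules, outputs[:-1])), outputs[-1]  # rules, default output

genome = [random.randint(0, 1) for _ in range(19 * 2 + 1)]  # random 2-rule genome
rules, default = decode_rules(genome, n_rules=2)
print(rules, "default ->", default)
```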
To evolve the fuzzy inference system, I applied Fuzzy CoCo with the same evolutionary
parameters for both species. Table 4.2 delineates the values and ranges of values used for these
parameters. The algorithm terminates when the maximum number of generations, Gmax , is
reached (I set Gmax = 1000 + 100 × Nr , i.e., dependent on the number of rules used in the
run), or when the increase in fitness of the best individual over five successive generations falls
below a certain threshold (10^-4 in my experiments).
Table 4.2 Fuzzy CoCo evolutionary parameters for the WBCD problem.
Parameter Values
Population size P s [30-90]
Maximum generations Gmax 1000 + 100Nr
Crossover probability Pc 1
Mutation probability Pm [0.02-0.3]
Elitism rate Er [0.1-0.6]
“Fit” co-operators Ncf 1
Random co-operators Ncr {1,2,3,4}
4.1.2 Results
A total of 495 evolutionary runs were performed, all of which found systems whose classifi-
cation performance exceeds 96.7%. In particular, considering the best individual per run (i.e.,
the evolved system with the highest classification success rate), 241 runs led to a fuzzy system
whose performance exceeds 98.0%, and of these, 81 runs found systems whose performance
exceeds 98.5%; these results are summarized in Figure 4.1.
Table 4.3 compares the best systems found by Fuzzy CoCo with the top systems obtained
by the fuzzy-genetic approach (Section 2.4) [125, 130] and with the systems obtained by Se-
tiono’s NeuroRule approach [161] (note that the results presented by these two works were
the best reported to date for genetic-fuzzy and neuro-Boolean rule systems, respectively, and
that they were compared with other previous approaches such as [160, 162, 169]). The evolved
[Histogram: number of systems versus classification performance (94-99%).]
Figure 4.1 Summary of results of 495 evolutionary runs. The histogram depicts the number of systems
exhibiting a given performance level at the end of the evolutionary run. The performance considered is
that of the best individual of the run, measured as the overall percentage of correctly classified cases
over the entire database.
fuzzy systems described herein can be seen to surpass those obtained by other approaches in
terms of performance, while still containing simple, interpretable rules. As shown in Table 4.3,
I obtained higher-performance systems for all rule-base sizes but one, i.e., from two-rule sys-
tems to seven-rule ones, while all my one-rule systems perform as well as the best system
reported by Setiono.
Table 4.3 Comparison of the best systems evolved by Fuzzy CoCo with the top systems obtained using
single-population evolution [125] and with those obtained by Setiono’s NeuroRule approach [161].
Shown below are the classification performance values of the top systems obtained by these approaches,
along with the number of variables of the longest rule in parentheses. Results are divided into seven
classes, in accordance with the number of rules per system, going from one-rule systems to seven-rule
ones.
Rules per system | NeuroRule [161] best | Single-population GA [125]a best | Fuzzy CoCo average | Fuzzy CoCo best
1 | 97.36% (4) | 97.07% (4) | 97.36% (4) | 97.36% (4)
2 | – | 97.36% (4) | 97.73% (3.9) | 98.54% (5)
3 | 98.10% (4) | 97.80% (6) | 97.91% (4.4) | 98.54% (4)
4 | – | 97.80% (-) | 98.12% (4.2) | 98.68% (5)
5 | 98.24% (5) | 97.51% (-) | 98.18% (4.6) | 98.83% (5)
6 | – | 98.10% (9) | 98.18% (4.3) | 98.83% (5)
7 | – | 97.95% (8) | 98.25% (4.7) | 98.98% (5)
a Data extracted from Tables 2.4 and 2.5
Database
v1 v2 v3 v4 v5 v6 v7 v8 v9
P 2 1 1 1 6 1 3 5 2
d 7 8 4 8 1 4 8 4 1
Rule base
Rule 1 if (v1 is Low) and (v3 is Low) then (output is benign)
Rule 2 if (v4 is Low) and (v6 is Low) and (v8 is Low) and (v9 is Low) then
(output is benign)
Rule 3 if (v1 is Low) and (v3 is High) and (v5 is High) and (v8 is Low) and
(v9 is Low) then (output is benign)
Rule 4 if (v1 is Low) and (v2 is High) and (v4 is Low) and (v5 is Low) and
(v8 is High) then (output is benign)
Rule 5 if (v2 is High) and (v4 is High) then (output is malignant)
Rule 6 if (v1 is High) and (v3 is High) and (v6 is High) and (v7 is High) then
(output is malignant)
Rule 7 if (v2 is High) and (v3 is High) and (v4 is Low) and (v5 is Low) and
(v7 is High) then (output is malignant)
Default else (output is malignant)
Figure 4.2 The best evolved, fuzzy diagnostic system with seven rules. It exhibits an overall classifica-
tion rate of 98.98%, and its longest rule includes 5 variables.
I next describe two of my top-performance systems, which serve to exemplify the solu-
tions found by Fuzzy CoCo. The first system, delineated in Figure 4.2, presents the highest
classification performance evolved to date. It consists of seven rules with the longest rule
including 5 variables. This system obtains an overall classification rate (i.e., over the entire
database) of 98.98%.
In addition to the above seven-rule system, evolution found systems with between 2 and
6 rules exhibiting excellent classification performance, i.e., higher than 98.5% (Table 4.3).
Among these systems, I consider as the most interesting the system with the smallest number
of conditions (i.e., total number of variables in the rules). Figure 4.3 presents one such two-
rule system, containing a total of 8 conditions, and which obtains an overall classification rate
of 98.54%; its longest rule has 5 variables.
The improvement attained by Fuzzy CoCo, while seemingly slight (0.5-1%), is in fact
quite significant. A 1% improvement implies 7 additional cases that are classified correctly.
At the performance rates in question (above 98%) every additional case is hard-won. Indeed,
try as I did with the fuzzy-genetic approach—tuning parameters and tweaking the setup—I ar-
rived at a performance impasse. Fuzzy CoCo, however, readily churned out better-performance
systems, which were able to classify a significant number of additional cases; moreover, these
systems were evolved in less time.
Database
v1 v2 v3 v4 v5 v6 v7 v8 v9
P 3 - 1 3 4 5 - 7 2
d 8 - 3 1 2 2 - 4 1
Rule base
Rule 1 if (v1 is Low) and (v3 is Low) and (v5 is Low) then (output is benign)
Rule 2 if (v1 is Low) and (v4 is Low) and (v6 is Low) and (v8 is Low) and (v9
is Low) then (output is benign)
Default else (output is malignant)
Figure 4.3 The best evolved, fuzzy diagnostic system with two rules. It exhibits an overall classification
rate of 98.54%, and a maximum of 5 variables in the longest rule.
Table 4.5 Variables corresponding to radiologic features. There are two groups of variables that
describe the mammographic existence of microcalcifications and masses.
Microcalcifications
v4 Disposition: 1 Round; 2 Indefinite; 3 Triangular or Trapezoidal; 4 Linear or Ramified
v5 Other signs of group form: 1 None; 2 Major axis in direction of nipple; 3 Undulating contour; 4 Both previous
v6 Maximum diameter of group: [3-120] mm
v7 Number: 1 <10; 2 10 to 30; 3 >30
v8 Morphology: 1 Ring shaped; 2 Regular sharp-pointed; 3 Too small to determine; 4 Irregular sharp-pointed; 5 Vermicular, ramified
v9 Size irregularity: 1 Very regular; 2 Sparingly regular; 3 Very irregular

Mass
v10 Morphology: 1 Oval; 2 Round; 3 Lobulated; 4 Polilobulated; 5 Irregular
v11 Margins: 1 Well delimited; 2 Partially well delimited; 3 Poorly delimited; 4 Spiculated
v12 Density greater than parenchyma: 1 Not; 2 Yes
v13 Focal distortion: 1 Not; 2 Yes
v14 Focal asymmetry: 1 Not; 2 Yes
v15 Maximum diameter: [5-80] mm
Figure 4.4 The COBRA system comprises a user interface, a reading form—used to collect the pa-
tient’s data from either the user or the database, a database containing selected cases, and a diagnostic
decision unit which is the core of the system. In the decision unit the fuzzy system estimates the malig-
nancy of the case and the threshold unit outputs a biopsy recommendation.
Figure 4.5 User interface: reading form. The snapshot illustrates the reading form through which the
COBRA system collects data about a case.
• Structural parameters: two input membership functions (Low and High; see Figure 4.7);
two output singletons (benign and malignant); a user-configurable number of rules. The
relevant variables are one of Fuzzy CoCo's evolutionary objectives. The Low and High
linguistic labels may later be replaced by labels with medical meaning appropriate to
the specific context of each variable.
• Connective parameters: the antecedents and the consequent of the rules are searched by
Fuzzy CoCo. The algorithm also searches for the consequent of the default rule. All
rules have unitary weight.
Figure 4.6 User interface: biopsy recommendation. The snapshot shows a diagnostic recommendation
for a given patient. Note that besides recommending that a biopsy be performed, COBRA gives information
about the appraisal value (indeterminate, 2.8), about the rules involved in the decision (rules 7, 8, and
default), and about their truth value (0.33, 0.4, and 0.6, respectively).
• Species 1: Membership functions. The fifteen input variables (v1 – v15 ) present three
different types of values: continuous (v1 ,v6 , and v15 ), discrete (v3 – v5 and v7 – v11 ), and
binary (v2 and v12 – v14 ). It is not necessary to encode membership functions for binary
variables as they can only take on two values. The membership-function genome en-
codes the remaining 11 variables—three continuous and eight discrete—each with two
parameters P1 and P2 , defining the membership-function apices (Figure 4.7). Table 4.6
delineates the parameters encoding the membership-function genome.
Figure 4.7 Input fuzzy variables. Each fuzzy variable has two possible fuzzy values labeled Low and
High, and orthogonal membership functions, plotted above as degree of membership versus input value.
P1 and P2 define the membership-function apices.
Table 4.6 Genome encoding for membership-function species. Genome length is 106 bits.
Variable type Qty Parameters Bits Total bits
Continuous 3 2 7 42
Discrete 8 2 4 64
Total Genome Length 106
Table 4.8 delineates values and ranges of values of the evolutionary parameters. The algorithm
terminates when the maximum number of generations, Gmax , is reached (I set Gmax = 700 +
200 × Nr , i.e., dependent on the number of rules used in the run), or when the increase in
fitness of the best individual over five successive generations falls below a certain threshold
(10^-4 in these experiments).
The fitness definition takes into account medical diagnostic criteria. The most commonly em-
ployed measures of the validity of diagnostic procedures are the sensitivity and specificity, the
likelihood ratios, the predictive values, and the overall classification (accuracy) [18]. Table 4.9
provides expressions for four of these measures, which are important for evaluating the perfor-
mance of my systems. Three of them are used in the fitness function; the last one is used in
Section 4.2.4 to support the analysis of the results. Besides these criteria, the fitness function
provides extra selective pressure based on two syntactic criteria: simplicity and readability
(see Section 2.3).
Table 4.9 Diagnostic performance measures. The values used to compute the expressions are: True
positive (TP): the number of positive cases correctly detected, true negative (TN): the number of neg-
ative cases correctly detected, false positive (FP): the number of negative cases diagnosed as positive,
and false negative (FN): the number of positive cases diagnosed as negative.
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Positive predictive value (PPV) = TP / (TP + FP)
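As a quick check, these expressions can be computed directly from the raw confusion counts; the snippet below reproduces the values reported later in Table 4.11 for the 17-rule system.

```python
# The four measures of Table 4.9, computed from raw confusion counts.

def sensitivity(tp, tn, fp, fn): return tp / (tp + fn)
def specificity(tp, tn, fp, fn): return tn / (tn + fp)
def accuracy(tp, tn, fp, fn):    return (tp + tn) / (tp + tn + fp + fn)
def ppv(tp, tn, fp, fn):         return tp / (tp + fp)

# Counts of the 17-rule COBRA system (Table 4.11): 186 TP, 1 FN, 226 TN, 103 FP
c = dict(tp=186, tn=226, fp=103, fn=1)
print(f"sensitivity={sensitivity(**c):.4f}  specificity={specificity(**c):.4f}  "
      f"accuracy={accuracy(**c):.4f}  PPV={ppv(**c):.4f}")
# sensitivity=0.9947  specificity=0.6869  accuracy=0.7984  PPV=0.6436
```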
The fitness function combines the following five criteria: 1) Fsens : sensitivity, or true-
positive ratio, computed as the percentage of positive cases correctly classified; 2) Fspec : speci-
ficity, or true-negative ratio, computed as the percentage of negative cases correctly classified
(note that there is usually an important trade-off between sensitivity and specificity which ren-
ders difficult the satisfaction of both criteria); 3) Facc : classification performance, computed
as the percentage of cases correctly classified; 4) Fr : rule-base size fitness, computed as the
percentage of unused rules (i.e., the number of rules that are never fired and can thus be re-
moved altogether from the system); and 5) Fv : rule-length fitness, computed as the average
percentage of don’t-care antecedents—i.e., unused variables—per rule. This order also repre-
sents their relative importance in the final fitness function, from most important (Fsens ) to least
important (Fr and Fv ).
The fitness function is computed in three steps—basic fitness, accuracy reinforcement,
and size reduction—as explained below:
1) Basic fitness: F1 = (Fsens + αFspec) / (1 + α), where the weight factor α = 0.3 reflects the greater importance of sensitivity.
2) Accuracy reinforcement: F2 = (F1 + βF'acc) / (1 + β), where β = 0.01 and F'acc = Facc when Facc > 0.7; F'acc = 0 elsewhere. This step slightly reinforces the fitness of high-accuracy systems.
3) Size reduction: F = (F2 + γFsize) / (1 + 2γ), where γ = 0.01 and Fsize = (Fr + Fv) if Facc > 0.7 and Fsens > 0.98; Fsize = 0 elsewhere. This step rewards top systems exhibiting a concise rule set, thus directing evolution toward more interpretable systems.
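The sketch below assembles the three steps into a single function. The inputs Fsens, Fspec, Facc, Fr, and Fv are assumed to be already computed as fractions in [0, 1], and the Fr and Fv values used in the example call are illustrative, not taken from an evolved system.

```python
# The three-step COBRA fitness described above; inputs are fractions in [0, 1].
ALPHA, BETA, GAMMA = 0.3, 0.01, 0.01

def cobra_fitness(f_sens, f_spec, f_acc, f_r, f_v):
    f1 = (f_sens + ALPHA * f_spec) / (1 + ALPHA)           # basic fitness
    f_acc_p = f_acc if f_acc > 0.7 else 0.0                # accuracy reinforcement
    f2 = (f1 + BETA * f_acc_p) / (1 + BETA)
    f_size = (f_r + f_v) if (f_acc > 0.7 and f_sens > 0.98) else 0.0
    return (f2 + GAMMA * f_size) / (1 + 2 * GAMMA)          # size reduction

# Example: measures of the 17-rule system of Table 4.11 (Fr and Fv illustrative)
print(round(cobra_fitness(0.9947, 0.6869, 0.7984, f_r=0.3, f_v=0.7), 4))
```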
4.2.4 Results
A total of 65 evolutionary runs were performed, all of which found systems whose fitness
exceeds 0.83. In particular, considering the best individual per run (i.e., the evolved system
with the highest fitness value), 42 runs led to a fuzzy system whose fitness exceeds 0.88, and
of these, 6 runs found systems whose fitness exceeds 0.9; these results are summarized in
Figure 4.8.
Table 4.10 shows the results of the best systems obtained. The maximum number of rules
per system was fixed at the outset to be between ten and twenty-five.
[Histogram: number of systems versus best-of-run fitness (0.82-0.92).]
Figure 4.8 Summary of results of 65 evolutionary runs. The histogram depicts the number of systems
exhibiting a given fitness value at the end of the evolutionary run. The fitness considered is that of the
best individual of the run.
Table 4.10 Results of the best systems evolved. Results are divided into four classes, in accordance
with the maximum number of rules-per-system, going from 10-rule systems to 25-rule ones. Shown
below are the fitness values of the top systems as well as the average fitness per class, along with the
number of rules effectively used by the system (Reff ) and the average number of variables per
rule (Vr ). The performance of the highlighted systems is presented in more detail in Table 4.11.
Maximum number of rules | Best individual: Fitness, Reff, Vr | Average per class: Fitness, Reff, Vr
10 0.8910 9 2.22 0.8754 9.17 2.52
15 0.8978 12 2.50 0.8786 12.03 2.62
20 0.9109 17 2.41 0.8934 14.15 2.59
25 0.9154 17 2.70 0.8947 15.78 2.76
As mentioned before, my fitness function includes two syntactic criteria to favor the evo-
lution of good diagnostic systems exhibiting interpretable rule bases (see Section 2.3). Con-
cerning the simplicity of the rule base, rules that are encoded in a genotype but that never fire
are removed from the phenotype (the final system), rendering it more interpretable. Moreover,
to improve readability, the rules are allowed (and encouraged) to contain don’t-care conditions.
The relatively low values of Reff and Vr in Table 4.10 confirm the reinforced interpretability of
the evolved fuzzy systems.
Table 4.11 shows the diagnostic performance of two selected evolved systems. The first
system, which is the top one over all 65 Fuzzy CoCo runs, is a 17-rule system exhibiting a
sensitivity of 99.47% (i.e., it detects all but one of the positive cases), and a specificity of
68.69% (i.e., 226 of the 329 negative cases are correctly detected as benign). The second sys-
tem is the best found when searching for ten-rule systems. The sensitivity and the specificity
of this 9-rule system are, respectively, 98.40% and 64.13%. As mentioned before, the usual
positive predictive value (PPV) of mammography ranges between 15 and 35%. As shown in
Table 4.11, Fuzzy CoCo increases this value beyond 60%—64.36% for the 17-rule system—
while still exhibiting a very high sensitivity.
Table 4.11 Diagnostic performance of two selected evolved systems. Shown below are the sensitiv-
ity, the specificity, the accuracy, and the positive predictive value (PPV) of two selected systems (see
Table 4.9). In parentheses are the values, expressed in number of cases, leading to such performance
measures. The 17-rule system is the top system. The 9-rule system is the best found when searching for
ten-rule systems.
17-rule 9-rule
Sensitivity 99.47% (186/187) 98.40% (184/187)
Specificity 68.69% (226/329) 64.13% (211/329)
Accuracy 79.84% (412/516) 76.55% (395/516)
PPV 64.36% (186/289) 60.93% (184/302)
To assess the generalization capabilities of the fuzzy systems obtained, I apply ten-fold
cross-validation [181]. For each ten-fold test the data set is first partitioned into ten equal-
sized sets, then each set is in turn used as the test set while Fuzzy CoCo trains on the other
nine sets. Table 4.12 presents the average results obtained performing six cross-validation runs
for several rule-base sizes. The percent difference between training and test sets is relatively
low, even for large rule bases, indicating low overfitting. A further discussion on generalization
in fuzzy systems is presented in Section 5.4.
Table 4.12 Results of ten-fold cross-validation. Results are divided into six classes, in accordance
with the maximum number of rules-per-system, going from 3-rule systems to 30-rule ones. Shown below
is the average fitness per class obtained in both training and test sets (the fitness represents the average
of the average fitnesses for the six cross-validation runs), along with the difference between training
and test performances, expressed both in absolute value and percentage.
Maximum number of rules | Training set | Test set | Difference | Percentage
3 0.8269 0.7865 0.0404 4.89
7 0.8511 0.8036 0.0474 5.58
10 0.8712 0.8030 0.0682 7.83
15 0.8791 0.8125 0.0666 7.58
20 0.8814 0.8244 0.0569 6.47
30 0.8867 0.8104 0.0763 8.60
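As a side note on the protocol behind Table 4.12, the ten-fold splitting can be sketched as follows; run_fuzzy_coco and fitness_on are hypothetical stand-ins for a full Fuzzy CoCo run and for the fitness evaluation of an evolved system.

```python
# Ten-fold cross-validation as described above: partition the case indices
# into ten roughly equal-sized folds, train on nine of them, test on the tenth.
import random

def ten_fold_splits(n_cases, k=10, seed=0):
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def cross_validate(cases, run_fuzzy_coco):
    """run_fuzzy_coco(train_cases) -> evolved system with a fitness_on() method
    (both are hypothetical placeholders)."""
    scores = []
    for train, test in ten_fold_splits(len(cases)):
        system = run_fuzzy_coco([cases[i] for i in train])
        scores.append(system.fitness_on([cases[i] for i in test]))
    return sum(scores) / len(scores)
```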
In summary, Fuzzy CoCo was able to evolve high-performance systems for two hard
problems, with all systems exhibiting high interpretability.
Chapter 5
Analyzing Fuzzy CoCo
(a) Define the logical parameters. As noted in Section 2.1, logical parameters are
predefined by the designer based on experience and on problem characteristics.
The user must define the type of fuzzy system (e.g., singleton-type), the operators
used for AND, OR, implication, and aggregation operations (e.g., min-max operators),
the type of membership functions (e.g., orthogonal, trapezoidal ones), and the
defuzzification method (e.g., COA).
(b) Choose the structural parameters.
• A set of relevant variables should be defined. Usually, this set includes all the
available variables, as Fuzzy CoCo can be set to automatically reduce their
number.
• Fuzzy CoCo requires predefining the number of membership functions. Al-
though this number could be set to a relatively high value, letting Fuzzy CoCo then
seek automatically an efficient subset of the membership functions, this
strategy must be used carefully. The increase in the size of the search space
may prevent Fuzzy CoCo from converging towards good results.
• The number of rules is fixed (by the designer) for a given Fuzzy-CoCo run. A
discussion about this number is presented in Section 5.2.1.
(c) Encode the connective parameters into the rules genome. The rules may be either
complete (i.e., containing at least one antecedent for each variable) or incomplete
(i.e., using don’t-care labels). The antecedents in rules may be connected merely
by the AND operator, or may also contain OR and NOT operators. Fuzzy CoCo
thus offers the designer the freedom to choose any type of rule, given that there
exists a proper way to encode it. If the problem requires good interpretability, the
syntactic criteria presented in Section 2.3 must be taken into account to constrain
the definition of the rules genome.
(d) Encode the operational parameters into the membership-function genome. The
membership functions can be of arbitrary form. The only condition imposed by
Fuzzy CoCo is that all possible labels implied by the rules species should be de-
fined. Besides, to reinforce the interpretability of the system, the semantic criteria
presented in Section 2.3 should be used to define some restrictions on the definition
of the membership functions.
Note that Fuzzy CoCo is a methodology for improving the performance and speed of the
fuzzy-modeling process. It cannot, on its own, correct wrong decisions made during the
definition of the fuzzy-system parameters. Thus, the designer must possess knowledge of
the problem or a good evaluation heuristic.
(a) Population size PS . Fuzzy CoCo requires smaller populations than a simple ge-
netic algorithm; typically, 50 to 80 percent smaller. This markedly reduces the
computational cost. In the WBCD example, typical population sizes in Fuzzy
CoCo are 40 to 80, while the standard fuzzy-genetic approach uses 200 individ-
uals. A deeper analysis of the effects of PS is presented in Section 5.2.2.
(b) Number of cooperators Nc . Typical values range from 1 to 6 (1 to 3 “fit” coopera-
tors and 0 to 4 random cooperators). The number of “fit” cooperators Ncf directly
affects exploitation, while the number of random cooperators Ncr directly affects
exploration. Both of them affect the computational cost. The effects of this param-
eter are analyzed in Section 5.2.3.
(c) Crossover probability Pc . There is no special consideration concerning the value
of Pc in Fuzzy CoCo. Standard values—0.5 to 1—are used [106].
(d) Mutation probability Pm . As discussed in Section 3.3, due to an exploration-
exploitation tradeoff with the elitism rate, Pm values in Fuzzy CoCo are
usually an order of magnitude higher than in a simple genetic algorithm.
While the value of Pm proposed by Potter and DeJong (Pm = 1/Lg where
Lg = length(genome)) [143] can be applied with relatively large populations,
it has to be increased by up to 10 times when Fuzzy CoCo is applied with small
populations. See Section 5.2.4 for an analysis of the effects of Pm on Fuzzy CoCo
performance.
(e) Elitism Rate Er . Typical values for Er are between 0.1 and 0.4, where larger
values are required in systems with few rules and small populations. Er encourages
exploitation of solutions found. Section 5.2.5 analyzes the effects of this parameter
on the performance of Fuzzy CoCo.
(f) Maximum number of generations Gmax . Due to the speed gain offered by Fuzzy
CoCo, the value of Gmax, directly related to computational cost, can be up to five
times smaller than in single-population algorithms. For example, for a 5-rule sys-
tem in the WBCD problem, while Fuzzy CoCo runs 1500 generations, the fuzzy-
genetic approach runs 4500 generations.
Table 5.1 Fuzzy CoCo set-up for the WBCD problem used to analyze the effects of some parameters.
Parameter Default value Tested values
Number of rules Nr 5 {2, 3, 5, 8, 12}
Population size PS 70 {20, 30, 50, 80, 120}
Crossover probability Pc 1 —
Mutation probability Pm 0.05 {0.001, 0.005, 0.01, 0.05, 0.1}
Elitism rate Er 0.1 {0.03, 0.1, 0.2, 0.4, 0.6}
“Fit” cooperators Ncf 1 —
Random cooperators Ncr 1 {0, 1, 2, 3, 4}
Maximum fitness evaluations Fmax 4 × 10^5 —
The first parameter analyzed—number of rules—principally affects the size and the shape
of the search space. The next two parameters, population size and number of cooperators,
define the number of fuzzy systems evaluated per generation. Then, I analyze the effects of
mutation probability and elitism rate. Finally, I derive some qualitative relationships between
these parameters.
[Plots (a) and (b): average fitness versus number of fitness evaluations, for Nr ∈ {2, 3, 5, 8, 12} rules.]
Figure 5.1 Evolution of fitness as a function of the maximum number of rules. The figures show the
average fitness of Fuzzy CoCo runs for different values of Nr . The inset box in (a)—corresponding to
a steadier evolution of the fitness—is enlarged in (b). The abscissa represents the computational effort
measured in number of fitness evaluations.
For a given problem, searching for a compact fuzzy rule base is harder than searching
for a slightly larger system, even if the genome is larger for this latter search. The reason
for this apparent contradiction is that a less compact fuzzy system—i.e., with more rules—
can cover a larger part of the problem space. However, if evolution seeks too many rules, the
fitness landscape becomes too “flat” (intuitively, an abundance of low-performance hills, rather
than a few high-performance mountains), thus rendering the search more difficult. This idea
is reinforced by the systems found for the COBRA problem (Table 4.10), which effectively
utilize between 63% and 91% of the number of rules encoded into the genome (respectively,
15.78 rules for 25-rule runs and 9.13 for 10-rule runs). It seems clear that for each problem
there exists a range of ideal rule-base sizes (between 4 and 7 for the WBCD problem, and
about 17 for the COBRA system). Besides trial and error, I am aware of no current method to
determine this range.
[Plots (a) and (b): average fitness versus number of fitness evaluations, for population sizes PS ∈ {20, 30, 50, 80, 120}.]
Figure 5.2 Evolution of fitness as a function of the population size. The figures show the average
fitness of Fuzzy CoCo runs for different values of PS . The inset box of (a)—corresponding to a
steadier evolution of the fitness—is enlarged in (b). The abscissa represents the computational effort
measured in number of fitness evaluations.
The above analysis suggests that when applying Fuzzy CoCo to a given problem the
use of small populations would provide a fast, accurate estimate of attainable fitness values.
This can be useful for coarse-tuning the fitness function and the genetic operators. Medium-
size populations would then be used to fine-tune the algorithm. The final search should be
performed using large populations in order to provide the algorithm with the diversity required
to adequately explore the search space.
[Plots (a) and (b): average fitness versus number of fitness evaluations, for Nc ∈ {1, 2, 3, 4, 5} cooperators.]
Figure 5.3 Evolution of fitness as a function of the number of cooperators. The figures show the
average fitness of Fuzzy CoCo runs for different values of Nc = Ncf + Ncr . The inset box of (a)—
corresponding to a steadier evolution of the fitness—is enlarged in (b). The abscissa represents the
computational effort measured in number of fitness evaluations.
Thus, concerning the number of cooperators, it seems that a light setup, with few co-
operators, might be used in the earliest stages of the design of a solution based on Fuzzy
CoCo. Once the fitness function has been adjusted the search can be performed using more
cooperators. Note, however, that the present analysis did not evaluate the combined effect of
simultaneously having a large population and a high number of cooperators, both of which are computationally costly.
[Plots (a) and (b): average fitness versus number of fitness evaluations, for mutation probabilities Pm ∈ {0.001, 0.005, 0.01, 0.05, 0.1}.]
Figure 5.4 Evolution of fitness as a function of the mutation probability. The figures show the average
fitness of Fuzzy CoCo runs for different values of Pm . The inset box of (a)—corresponding to a steadier
evolution of the fitness—is enlarged in (b). The abscissa represents the computational effort measured
in number of fitness evaluations.
[Plots (a) and (b): average fitness versus number of fitness evaluations, for elitism rates Er ∈ {0.03, 0.1, 0.2, 0.4, 0.6}.]
Figure 5.5 Evolution of fitness as a function of the elitism rate. The figures show the average fitness of
Fuzzy CoCo runs for different values of Er . The inset box of (a)—corresponding to a steadier evolution
of the fitness—is enlarged in (b). The abscissa represents the computational effort measured in number
of fitness evaluations.
values of Pm and Er ) in order to characterize completely the performance of the method. The
same exhaustive analysis should be performed for different problems so as to identify whether
or not these effects depend on the problem under study.
Instead of such an arduous analysis, I have derived some qualitative relationships between
various parameters of Fuzzy CoCo. Shown in Table 5.2, these relationships are based on the
simulations carried out on the different problems presented in this thesis.
Table 5.2 Qualitative relationships between Fuzzy CoCo parameters. Delineated below are criteria to
guide the choice of the number of cooperators Nc , the mutation probability Pm (expressed as a function
of the genome’s length, Lg ), and the elitism rate Er , all three as a function of the desired number of
rules and of the desired population size (expressed in terms of percentage of the typical population size
of a single-population algorithm). For example, if the user wishes to employ a large population with
few rules then she should set Nc , Pm , and Er to values within the ranges specified in the upper-right
quadrant of the table.
[Plot: fitness (0.95-0.99) versus generations (0-1000) for the best, average, and worst runs.]
Figure 5.6 Evolution of the best, the worst, and the average runs of a given Fuzzy CoCo setup. The
curves correspond to 40 evolutionary runs, searching for 7-rule systems for the WBCD problem. The
difference between the best and the worst runs is around 1%.
The consistency of the results does not provide information concerning the quality of
the systems obtained by Fuzzy CoCo. This is usually done by comparing the results with
previously published works. Because previous results for the Catalonia database are only
partially available, I have opted for the use of a standard test that provides a rough estimation
of the attainable performance: an intuitive, simple method known as k-nearest-neighbor (knn).
The knn method [181] is a classification technique based on the memorization of a train-
ing database. Given an unclassified input case, the system searches the memory for the k
training cases that most strongly resemble the new one. The resemblance is usually measured
through a distance metric—e.g., Manhattan or Euclidean distances. The closest training cases
are called the nearest neighbors. The unknown case is assigned the class of the majority of its
k-nearest neighbors.
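For reference, a minimal knn with leave-one-out evaluation could look as follows, assuming Euclidean distance and simple majority voting; the toy data at the end are not taken from the thesis databases.

```python
# Illustrative k-nearest-neighbor classifier with leave-one-out evaluation.
from collections import Counter
import math

def knn_predict(train, x, k):
    """train: list of (feature_vector, label); x: feature vector to classify."""
    nearest = sorted(train, key=lambda t: math.dist(t[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]          # the evaluated case is left out
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)

# Toy two-class example (not the Iris, WBCD, or Catalonia data)
data = [((0.1, 0.2), "benign"), ((0.2, 0.1), "benign"),
        ((0.9, 0.8), "malignant"), ((0.8, 0.9), "malignant")]
print(leave_one_out_accuracy(data, k=1))        # -> 1.0
```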
Table 5.3 shows the performance obtained applying knn to the problems presented in this
thesis and their comparison with the results obtained by Fuzzy CoCo. The results for knn are
computed using “leave-one-out” cross-validation (i.e., the case being evaluated is left out and
the knn algorithm is applied to the remaining cases) [181].
Table 5.3 Comparison of Fuzzy CoCo with knn. Knn is used to roughly estimate an attainable fitness
value. Shown below are the classification performance values obtained for each problem, along with
the number of misclassifications in parentheses. The values for the COBRA system correspond to the
basic fitness F1 described in Section 4.2 that depends on sensitivity and specificity.
Problem | knn prediction | Fuzzy CoCo: Rules, Runs, Average, Best
Iris (controller) 0.9733 (4) 3 49 0.9910 (1.35) 1.0000 (0)
Iris (classifier) 0.9733 (4) 3 48 0.9751 (3.74) 0.9933 (1)
WBCD 0.9736 (18) 7 40 0.9825 (11.95) 0.9898 (7)
Cobra 0.7441 25 15 0.8947 0.9154
These comparisons help put into perspective the results obtained for a given problem. For the Iris
and the WBCD problems, we can expect classification performances over 97.33%. However,
this does not imply that slightly higher performance values can be easily attained, as they
correspond to new correctly-classified cases that are hard to obtain, usually at the cost of increased
complexity of the solution. On the other hand, fitness values obtained for the COBRA system,
close to 0.89, while seemingly low in comparison with those of the WBCD problem, are in
fact good results that correspond to an increase of more than 200 correctly-diagnosed cases
with respect to knn results. Note that other machine-learning methods can be used to estimate
the attainable fitness values.
input and the output spaces. Moreover, high performance is often attained at the expense of
local generality.
Even though generality is defined at the rule level, some parameters of the whole fuzzy
system may affect its generality as discussed below:
• Membership functions per variable. The number of membership functions defining a
linguistic variable, also known as granularity, influences the fineness of the partition
and hence the capacity of the rules to target many reduced portions of the input space.
In general, a reduced number of membership functions favors the existence of general
rules.
• Variables per rule. Intuitively, the fewer variables in a rule's premise, the larger the portion
of the input space it may cover. As explained in Section 2.3, several long rules are needed
to cover the same space as a single short rule.
• Rules per system. The coverage that a fuzzy system can attain for a given input space
depends on the cumulative coverage of its rules. If the system has only a few rules, these
rules will tend to be general in order to guarantee adequate coverage. Good performance
requires good coverage. One can say that a small fuzzy system cannot do both: perform
well and overlearn.
Note that generality is a meaningful concept both linguistically and numerically speak-
ing. Because of this, the strategies proposed to reinforce the interpretability of fuzzy systems
(Section 2.3.3) also favor the emergence of general rules. That is explained below:
• The use of common shared labels prevents each rule from defining its own membership
functions. This avoids the existence of erratic rules tuned to very specific conditions.
• Don’t-care conditions promote the existence of rules with few variables in their premise.
• The default rule is, by definition, a general rule. It provides coverage for wide (and
sometimes disjoint) regions that would require many specific rules to describe them.
The example presented in Figure 5.7 illustrates how these strategies improve the generality
of the rule base while preserving classification performance. The rule in the center of the
input space, albeit specific in terms of extension, is quite general as it represents one third
of the cases contained in the hypothetical database. The rule containing a don't-care
condition, marked A, covers the space of three more-specific rules. The default rule, marked
D, covers the space of several very specific rules, including coverage for a region without any
instance.
Figure 5.7 Improving the generality of a fuzzy system. The figure illustrates how a rule containing
a don’t-care condition and a default rule (marked A and D, respectively) increases the generality of
the system. The numbers in the rules represent the cases covered by the rule. The colors represent the
output class proposed by each rule. Note that the classification performance of the system is preserved.
Table 5.4 Generality of the three-rule fuzzy controller. The activation profile of each rule consists
of: number of firing instances, winning instances, instances where it fires alone, average activation
level, and maximum activation. Note that the average activation level is computed using only the firing
instances. The Iris database has 150 instances.
Two of the active rules (i.e., rules 1 and 2) and the default rule are clearly general as they
are involved in the classification of many instances. In contrast, rule 3 is more specific as it
is fired only by 4 instances. However, despite its specificity, rule 3 never fires alone and it is
never the winning rule. Note that the four instances for which rule 3 fires are hard to classify.
Indeed, they lie close to the border between the classes versicolor and virginica in the input
space.
Table 5.5 Generality of the three-rule fuzzy classifier. The activation profile of each rule consists
of: number of firing instances, number of winning instances, number of instances where it fires alone,
average activation level, and maximum activation. Note that the average activation level is computed
using only the firing instances. The Iris database has 150 instances.
It is clear from their firing profiles that all the rules are general. Even rule 3, whose
premise is very specific—it defines conditions for all four input variables (see Figure 3.11)—
fires for 47 instances.
5.5 Summary
In this chapter I presented some analyses that should guide future users of Fuzzy CoCo in
their experimental design. First, I presented the steps required to define all the parameters
of the algorithm. Secondly, I proposed some qualitative criteria to define the values of the
most important parameters, based on an analysis of their effect on Fuzzy CoCo performance.
Then, to analyze the quality of the results independently of the availability of previous results, I
proposed the use of a simple machine-learning method (knn in this case) to roughly estimate
an attainable performance. Such an estimate helps put into perspective the real performance of a
system for a given problem. Finally, I introduced the concept of local generality as a measure
Table 5.6 Generality of the seven-rule WBCD fuzzy system. The activation profile of each rule consists
of: number of firing instances, number of winning instances, number of instances where it fires alone,
average activation level, and maximum activation. Note that the average activation level is computed
using only the firing instances. The WBCD database has 683 instances.
that can assess, or even reinforce, the global generalization capabilities of the fuzzy systems
designed using Fuzzy CoCo.
Chapter 6
Extensions of the Methodology
As mentioned in Section 5.1, Fuzzy CoCo requires the user to define a maximum number
of rules for a given run. However, for each problem there exists a range of ideal rule-base
sizes that is hard to determine. The user is thus obliged to find this range by trial and error. I
propose in this chapter two extensions to Fuzzy CoCo, intended to simplify the task of finding
an adequate size of the rule base.
The first extension, called Island Fuzzy CoCo, is based on the Island model [172, 180].
It takes advantage of the exploration performed separately by concurrent instances of Fuzzy
CoCo, where each instance is set to search for systems of different sizes. Island Fuzzy CoCo
is presented in Section 6.1. Section 6.2 presents the second extension, inspired by the iter-
ative rule learning approach [59], and called Incremental Fuzzy CoCo. In this method, the
number of rules of the sought-after system increases each time that evolution satisfies certain
criteria. In this way, the search for more complex systems starts on the basis of some “good”
individuals.
• In complex runs (i.e., searching for large systems), systems often appear that effectively
utilize fewer rules than the maximum allowed. Such systems exhibit fitness values similar
or superior to those of the simpler runs corresponding to their effective number of rules.
Island Fuzzy CoCo, the approach proposed herein, is similar to the so-called island model
where several (sub)populations, called islands or demes, evolving separately most of the time,
occasionally exchange individuals according to a certain migration policy [172, 180]. Below, I
sketch Island Fuzzy CoCo and describe two preliminary tests performed to explore its behav-
ior.
Figure 6.1 Island Fuzzy CoCo. Several instances of Fuzzy CoCo, called here FCCc , run concurrently.
The index c represents the number of rules of the systems evolving in the island. The exchange of
individuals (migration) is controlled by a migration policy, which in this example allows exchange of
individuals between all the islands.
Migration introduces three new groups of parameters to be defined: the migration topol-
ogy, i.e., a set of allowed migration paths; the migration interval (Gm ), i.e., the number of
generations between exchanges; and the migration size (Nm ), i.e., the number of individuals
that migrate between the islands. In other words, from time to time, each island proposes
interesting candidates to some other islands and chooses a number of immigrants among the
candidates proposed to it. Figure 6.2 presents the Island Fuzzy CoCo algorithm in pseudo-code
format.
The main criterion to select the Nm migrants for a given destination island is their performance
relative to the current performance of the population in the destination island.
Two possible criteria, for an island, to select its Nm immigrants from a pool of candidates (MD)
are: (1) performance relative to the current average performance in the island, and
(2) keeping a balanced immigration, either by choosing the same number of immigrants from
each origin island or by establishing a maximum number of migrants per origin island. Even
Figure 6.2 Pseudo-code of Island Fuzzy CoCo. In each island an instance of Fuzzy CoCo runs for
a given number of rules. Every Gm generations, each island proposes Nm emigrants to other islands
(called here MD) and chooses Nm immigrants from the pool of candidates proposed to it
(MI = ∪j MI^j). "Fuzzy-CoCo evolve" corresponds to the evolutionary operators applied in Fuzzy CoCo
(see Figure 3.2).
though each island may use its own, specific fitness criteria to evaluate its individuals, migrants
must be selected on the basis of a common fitness measure (e.g., performance).
Given that the complexities of individuals in the origin and destination islands differ,
the genomes of the migrants must be adapted: unused rules are eliminated when downsizing
migrants, and active rules are duplicated when enlarging them. Special care must be taken to
preserve as much as possible the fitness of the migrants after these operations.
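A minimal sketch of one migration exchange under this scheme is given below: candidates must beat the destination island's current average fitness, and up to Nm of them are drawn fitness-proportionally. The Island and Individual structures are illustrative, and the genome resizing just described is omitted.

```python
# One origin->destination migration step in Island Fuzzy CoCo (illustrative
# data structures; genome adaptation to the destination's rule count omitted).
import random
from dataclasses import dataclass, field

@dataclass
class Individual:
    genome: list
    fitness: float

@dataclass
class Island:
    n_rules: int
    population: list = field(default_factory=list)  # list of Individual

    def average_fitness(self):
        return sum(i.fitness for i in self.population) / len(self.population)

def proportional_sample(pool, n):
    if not pool:
        return []
    return random.choices(pool, weights=[i.fitness for i in pool],
                          k=min(n, len(pool)))

def migrate(origin, destination, n_m):
    # Candidates: origin individuals better than the destination's current average
    candidates = [i for i in origin.population
                  if i.fitness > destination.average_fitness()]
    offered = proportional_sample(candidates, n_m)   # emigrant proposal
    accepted = proportional_sample(offered, n_m)     # immigrant selection
    destination.population.extend(accepted)          # genomes not resized here
```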
Fuzzy CoCo involves two coevolving species, but migration is decided according to the
fitness computed for entire fuzzy systems consisting of a pair (individual-cooperator). Be-
sides, the complexity of the system, which is a key concept in Island Fuzzy CoCo, is based only
on individuals from the rules species. There exist thus at least three possible ways to choose
the migrants: (1) migrants are chosen separately for each species according to their fitness,
independently of their relation to the other species; (2) migrants are chosen for each species, but
emigrate together with their cooperators (i.e., selected members of the other species), and (3)
migrants are selected only from the rule species and migrate together with their cooperators.
Table 6.1 Fuzzy CoCo set-up used for all the islands for the Iris classifier and the WBCD problems.
Parameter Iris WBCD
Population size P s 70 50
Maximum generations Gmax 1000 1500
Crossover probability Pc 1 1
Mutation probability Pm 0.1 0.1
Elitism rate Er 0.2 0.2
“Fit” cooperators Ncf 1 1
Random cooperators Ncr 1 1
• Emigrant selection. The pool of possible emigrants to a given island is composed of indi-
viduals whose fitness value is greater than the current average fitness of the destination is-
land. For each destination island, up to Nm emigrants are selected fitness-proportionally
from each species. Note that migrants carry with them their cooperators. However, if
several individuals carry the same cooperator, only one copy of the cooperator is placed
in the migrating group.
• Immigrant selection. The pool of possible immigrants to an island consists of all the
candidates proposed to it that exhibit a fitness value greater than the current average fit-
ness on the island. Each island selects up to Nm immigrants using fitness-proportionate
selection. No mechanism to balance immigration is applied.
• Fitness function. All the islands use the same fitness function for both evolution and
migration purposes. The Iris classifier islands simply use the classification performance
Fc . The fitness function of the WBCD islands includes an interpretability-related term.
It is given by F = Fc − αFv , where α = 0.0015, Fc is the classification performance,
and Fv penalizes systems with a large number of variables in their rules.
Table 6.2 Results of Island Fuzzy CoCo for the Iris classifier problem. Shown below are the average,
over ten runs, of the maximum fitness obtained for each combination of the migration parameters:
migration interval Gm and migration size Nm expressed in generations and individuals, respectively.
Gm \ Nm         5          20         50
1               98.59%     98.53%     98.60%
10              98.73%     98.67%     98.80%
100             98.73%     98.93%     98.73%
No migration    98.83%
From the analysis of these global results—i.e., extracting the best system found in each
run—one cannot see that migration affords a real advantage over simple Fuzzy
CoCo (i.e., without migration). Only those runs that allow 20 migrants every 100 generations
perform slightly better. However, as Island Fuzzy CoCo searches simultaneously for different
system sizes, it is necessary to analyze the results for each island. Figure 6.3 compares, island
by island, the results obtained using the best migration policy with those obtained without
migration.
Migration appears to have a beneficial effect for those islands containing large systems.
However, the low number of islands required for this problem makes it difficult to draw
conclusions.
[Plot: average fitness (96-99%) versus number of rules per island (2-5), with and without migration.]
Figure 6.3 Results, island by island, for the Iris classifier problem. The figure shows the average
fitness obtained, for each number of rules, using the best migration policy (i.e., Gm = 100 generations
and Nm = 20 individuals), compared with those obtained without migration.
Table 6.3 Results of Island Fuzzy CoCo for the WBCD problem. Shown below are (a) the average
over ten runs and (b) the maximum fitness, obtained for each combination of the tested migration
parameters: migration interval Gm and migration size Nm expressed in generations and individuals,
respectively.
From these global results, one can see that only those runs using either Gm = 50 or
Nm = 20 find systems with similar performance to those found without migration. Runs
combining both values find the best systems. To evaluate the effect that migration has on each
island, Figure 6.4 compares the results of the best migration policy with those obtained without
migration.
These results confirm, as observed in the Iris-classifier test, that an adequate migration
[Plots: average fitness (97.2-98.4%) versus number of rules per island (1-7), with and without migration.]
Figure 6.4 Results, island by island, for the WBCD problem. The figure shows the average fitness
obtained, for each number of rules, using the best migration policy (i.e., Gm = 50 generations and
Nm = 20 individuals), compared with those obtained without migration.
policy may have a clearly positive effect on the performance of islands containing complex
individuals, in this case four-rule to seven-rule systems.
Migration also affects the dynamics of evolution in each island. Figure 6.5 shows the
evolution of performance in different islands with and without migration. Note that even for
small systems, where migration does not improve the final performance, the fitness evolves at
least as fast as in the absence of migration.
Figure 6.5 Effect of migration on evolutionary dynamics. The figure shows the average fitness in
different islands with and without migration. The abscissa represents the number of generations.
• Island Fuzzy CoCo should be implemented on parallel platforms, with each island exe-
cuted on a separate processor. The communication between the islands is sparse, as it is
reduced to information about some migrants (i.e., genome and fitness value) and about
the global performance of island populations.
• The principle of Island Fuzzy CoCo, i.e., the use of islands containing individuals with
different levels of complexity, may be used for other optimization methods dealing with
variable complexity (e.g., the evolution of finite state machines [103] and the evolution
of fuzzy systems with different input-space dimensions [149]).
Below, I sketch Incremental Fuzzy CoCo and describe a first test performed to explore its potential search capabilities.
Figure 6.6 Pseudo-code of Incremental Fuzzy CoCo. A simple Fuzzy CoCo evolves a population of
systems of size R. Each time an increment criterion is satisfied, the sought-after complexity is increased
and part of the population is used to seed a new instance of Fuzzy CoCo.
The criteria used to decide when to increase the complexity may be one or more of the follow-
ing: number of generations elapsed, average performance of the entire population or of a se-
lected elite, stagnation of evolution, or explicit user interaction. A part of the current population
is used to seed the new, complexity-increased population. Due to the increased complexity
in the new population, the genome of the selected seed must be adapted by duplicating some
active rules. Special care must be taken to ensure that fitness is preserved after this operation.
The remaining individuals in the initial population are generated randomly. In this way it is
possible to keep the flexibility of the search while launching the new search with known “good
individuals.”
• Range of the number of rules. The algorithm starts with one-rule systems and works its
way up to ten-rule systems.
• Fuzzy CoCo setup. All the instances of Fuzzy CoCo use the same setup. Table 6.4
delineates the values used for the Fuzzy CoCo parameters.
Table 6.4 Fuzzy CoCo set-up for all the instances in Incremental Fuzzy CoCo for the WBCD problem.
Parameter Value
Population size P s 100
Maximum generations Gmax 1000
Crossover probability Pc 1
Mutation probability Pm 0.15
Elitism rate Er 0.1
“Fit” cooperators Ncf 1
Random cooperators Ncr 1
• Number of seeding individuals. The best five percent of the evolved population—the
elite—is used to seed the new initial population. For the rule species, the genomes are
adapted to the new size by adding a new rule. Each individual of the selected elite is
used to generate three new individuals: the first is obtained by duplicating one of the
active rules, the other two by adding a random rule. The remaining individuals (i.e.,
85% of the population) are randomly initialized. In this way, the first 5% are known to
perform well, the following 10% explore promising regions of the new enlarged search
space, while the remaining 85% explore new regions (a sketch of this seeding step is given below).
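The sketch below illustrates this seeding step for the rules species. Rule genomes are represented as plain lists of rules, random_rule is a hypothetical helper, and the rule chosen for duplication is picked at random rather than from the truly active rules.

```python
# Seeding the (R+1)-rule population from the best 5% of the R-rule population:
# each elite genome yields one copy with a duplicated rule and two copies with
# a random extra rule; the rest of the population is random.
import random

def seed_population(old_population, pop_size, random_rule, elite_frac=0.05):
    """old_population: list of (rules, fitness); rules is a list of rule objects."""
    ranked = sorted(old_population, key=lambda p: p[1], reverse=True)
    elite = [rules for rules, _ in ranked[:max(1, int(elite_frac * len(ranked)))]]
    new_pop = []
    for rules in elite:
        new_pop.append(rules + [random.choice(rules)])   # duplicate one rule
        new_pop.append(rules + [random_rule()])          # explore near the elite
        new_pop.append(rules + [random_rule()])
    while len(new_pop) < pop_size:                       # remaining ~85%: random
        new_pop.append([random_rule() for _ in range(len(elite[0]) + 1)])
    return new_pop[:pop_size]
```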
6.2.2.2 Results.
Thirty-two runs were performed, all but one of which found systems whose classification
performance exceeds 98.0%. In particular, considering the best individual per run (i.e., the
evolved system with the highest classification success rate), 20 runs led to a fuzzy system
whose performance exceeds 98.5%, and of these, 2 runs found systems whose performance
exceeds 98.7%; these results are summarized in Figure 6.7. The best system found obtains
an overall classification rate of 98.83%. The average performance over the 32 runs is 98.50%.
(Note: The results presented here were obtained by Olivier Rutti in the course of a student
project.)
[Histogram: number of systems versus classification performance (97.5-99%).]
Figure 6.7 Summary of results of 32 evolutionary runs. The histogram depicts the number of systems
exhibiting a given classification performance level at the end of the evolutionary run.
Figure 6.8 shows the evolution of the classification performance during Incremental
Fuzzy CoCo runs.
[Plots (a) and (b): classification performance (96-98.5%) versus generations (0-2500); the intervals labeled 1 to 10 indicate the current number of rules.]
Figure 6.8 Evolution of classification performance in Incremental Fuzzy CoCo. The figures show
the classification performance for: (a) the average over 32 runs, and (b) the best run. The abscissa
represents the number of generations elapsed. The numbers 1 to 10 represent the number of rules of the
systems evolved in the corresponding interval.
Taking the results obtained by Incremental Fuzzy CoCo at the end of the search for seven-
rule systems (corresponding to 1650 generations), we can compare them with the results of
Fuzzy CoCo when searching for seven-rule systems presented in Section 4.1.2. Table 6.5
presents this comparison, while Figure 6.9 extends the comparison to other rule base sizes.
The best systems found by Incremental Fuzzy CoCo are worse than those found by Fuzzy
CoCo. However, the average performance of the former is better, and its results are obtained
in fewer generations than those of the latter.
Table 6.5 Comparison of seven-rule systems evolved by Incremental Fuzzy CoCo with those obtained
using Fuzzy CoCo.
Generations Average Best
Incremental Fuzzy CoCo 1650 98.43% 98.54%
Simple Fuzzy CoCo 1700 98.25% 98.98%
[Plots: classification performance (97.9-98.9%) versus rule-base size (3-7), Incremental Fuzzy CoCo versus simple Fuzzy CoCo.]
Figure 6.9 Comparison of systems evolved by Incremental Fuzzy CoCo with those obtained using
Fuzzy CoCo. The figure shows the classification performance obtained at the end of the search for each
rule-base size.
• Incremental Fuzzy CoCo seems to exhibit better repeatability than Fuzzy CoCo, as sug-
gested by the narrow distribution of the results (see Figure 6.7). This is true not only for
the global results but also for different rule-base sizes as shown in Figure 6.9. However,
this approach fails to find top systems as good as those found by Fuzzy CoCo. One of
the possible reasons is, again, that all the instances of Fuzzy CoCo were identically set
up.
• The use of a fixed number of generations to trigger the increase in complexity may prove
a weak criterion (a toy sketch of this fixed-generation schedule is given after this list):
if this number is too high, the population can converge toward mediocre performance; if it
is too low, the search is stopped before the search space has been properly explored. This
criterion directly affects the diversity and the quality of the initial population of the
following instance.
• Intuitively, the seeding strategy (5% elite, 10% modified elite) is adequate, but the rate of
seeding individuals needs to be tuned, as it may be excessive if the elite is not diverse
enough.
• This method requires a deeper study to better understand the effects of its parameters
(i.e., the Fuzzy CoCo setup, the increment criterion, and the seeding strategy) on its
performance. The goal of such a study should be to find a setup that improves the quality
of the best systems while preserving as much as possible the repeatability of the results.
• Even if this goal is not attained, Incremental Fuzzy CoCo can be useful, as it is, for:
– automatically searching for an adequate range of rule-base sizes, and
– estimating attainable performance values for a given problem and for different
numbers of rules.
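To make the fixed-generation increment criterion discussed above concrete, the following toy driver allots the same number of generations to every rule-base size before enlarging the systems; evolve() and reseed() are placeholder callables standing in for one evolutionary instance and for the seeding step, not the thesis' actual interface.

def incremental_search(min_rules, max_rules, generations_per_size, evolve, reseed):
    # evolve(num_rules, generations, population) -> (final_population, best_fitness)
    # reseed(population, num_rules) -> enlarged initial population for the next instance
    population = None
    history = []
    for num_rules in range(min_rules, max_rules + 1):
        population, best = evolve(num_rules, generations_per_size, population)
        history.append((num_rules, best))
        if num_rules < max_rules:
            # the complexity is increased after a fixed number of generations,
            # whether or not the current population has converged
            population = reseed(population, num_rules + 1)
    return history

# Toy usage with dummy callables (the returned fitness values are placeholders).
dummy_evolve = lambda n, g, pop: (pop or ['individual'] * 50, 0.90 + 0.01 * n)
dummy_reseed = lambda pop, n: pop
print(incremental_search(3, 7, 300, dummy_evolve, dummy_reseed))

An adaptive criterion, e.g., increasing the size only when the best fitness stagnates, would replace the fixed generations_per_size in this loop.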
Chapter 7
Conclusions and Future Work
7.1 Summary
I presented a novel approach for system design—Fuzzy CoCo—based on fuzzy logic and
coevolutionary computation, which is conducive to explaining human decisions. My algorithm
has been able to find accurate and interpretable systems for hard, real-world problems. The
analysis of Fuzzy CoCo and of the systems it produced shows, among other features, the
consistency of the results. I also proposed two extensions to the method, which deserve further
exploration.
Evolutionary fuzzy modeling—i.e., the design of fuzzy inference systems using evolu-
tionary algorithms—constitutes the methodological base of my work. In Chapter 2 I studied
extant evolutionary fuzzy modeling approaches and some criteria and considerations involved
in this task. I emphasized an aspect usually neglected in most approaches: interpretability. In
particular, I presented some strategies to satisfy semantic and syntactic criteria that reinforce
the interpretability of the systems produced. To illustrate these concepts, I applied a basic
fuzzy-genetic approach to solve a medical diagnostic problem: the WBCD problem. The
systems obtained were the best explanatory systems presented at the time for this problem.
The aforementioned study brought to the fore some limitations of evolutionary fuzzy
modeling, but, at the same time, it provided clues on how to overcome them. Based on these
clues, I proposed the use of cooperative coevolution to surmount the problem of dealing with
different types of parameters in the same genome. This is the origin of Fuzzy CoCo, pre-
sented in detail in Chapter 3, together with a simple application example: the Iris classification
problem. The systems obtained for this problem surpassed previous fuzzy modeling results.
Fuzzy CoCo was then used in Chapter 4 to model the decision processes involved in two
breast-cancer diagnostic problems: the WBCD problem and the Catalonia mammography in-
terpretation problem. For the WBCD problem, Fuzzy CoCo produced markedly better results
using less or similar computational resources than the fuzzy-genetic approach. For the Catalo-
nia problem, an evolved system was embedded within a web-based tool—called COBRA—for
aiding radiologists in mammography interpretation.
In order to attain a deeper understanding of Fuzzy CoCo, I performed in Chapter 5 several
analyses regarding the performance of the methodology and of the systems it produces. These
analyses involve aspects like the application of the method, the effects that some parameters
have on performance, the consistency and the quality of the systems designed using Fuzzy
CoCo, as well as their local generality. Finally, I proposed two extensions: Island Fuzzy CoCo
and Incremental Fuzzy CoCo, which together with the original CoCo constitute a family of
coevolutionary fuzzy modeling techniques. The extensions are intended to guide the choice
of an adequate number of rules for a given problem—a critical, hard-to-define parameter of
Fuzzy CoCo. The encouraging preliminary results obtained with these extensions motivate
further investigation.
• The interpretability considerations presented in Section 2.3 contain several original el-
ements. First, observing that most of the interpretability criteria presented in the litera-
ture were mainly oriented toward constraining the definition of membership functions,
I grouped them under the label semantic criteria, as they affect the coherence of the
linguistic concepts. Then, I identified and proposed some other criteria regarding the
rule base, which are called syntactic criteria, as they affect the (causal) connection be-
tween linguistic concepts. Finally, I proposed a set of modeling strategies that, when
applied, should reinforce the linguistic integrity—both semantic and syntactic—of the
fuzzy systems produced. Note that these considerations are valid for any fuzzy modeling
technique.
• The systems obtained to solve the Catalonia mammography problem (Section 4.2)
served as a base for the development of the COBRA system—the Catalonia On-
• Finally, I proposed two extensions to Fuzzy CoCo, intended to simplify the task of find-
ing an adequate size of the rule base. Two key elements of the first extension, Island
Fuzzy CoCo, are: (1) the existence of evolutionary islands containing individuals of dif-
ferent sizes, and (2) a migration mechanism that adapts the size of the migrants to render
them compatible with their destination island (a minimal sketch of such a size adaptation
is given at the end of this chapter). The second extension, Incremental Fuzzy CoCo,
bases its search power on a mechanism of incremental evolution. In contrast with
iterative rule learning methods, the search does not settle on a good system early on;
good systems are only used to signal promising search regions, as they seed the new
initial population. These methods represent two new members of a family of coevolutionary
fuzzy modeling techniques based on Fuzzy CoCo.
• Studying the effects of the evolutionary parameters on the performance of each species
to determine specific optimal setups.
• Investigating other types of evolutionary algorithms for each species, for example, real-
coded ones—such as evolution strategies—for the membership-function species, or tree-
based representations for the rule species.
• Tuning the setup, as the existence of migration between islands may change the effect of
evolutionary parameters on performance. This means revisiting the analysis presented
in Section 5.2.
• Defining a mechanism for dynamically finding the range of rule-base sizes over which to
perform the search, e.g., stopping islands that stagnate at low performance values, or
creating islands for larger individuals if performance suggests this is viable.
• Comparing the performance of Island Fuzzy CoCo with that of simple Fuzzy CoCo (i.e.,
with zero migration) on harder problems. The island version shows a greater performance
improvement for islands containing systems with a high number of rules. Given that
harder problems usually require larger systems, the algorithm seems well suited to solve
them.
• Parallel implementation. As mentioned before, the algorithm is well suited for paral-
lelization, implementing each island on a separate processor. There is no need for cen-
tral control, and the communication between the islands is reduced to information about
some migrants (their genome and fitness values) and about the global performance of
island populations.
• The setup of each instance of Fuzzy CoCo should be adapted to the features of the
systems it evolves. Parameters such as population size, mutation probability, elitism
rate, and number of cooperators should change as the complexity of the encoded systems
increases.
• This method requires a more thorough study to better understand the effects on perfor-
mance of its specific new parameters (i.e., the increment criterion and the seeding strategy).
• Just as the complexity of the search is increased, there should also be criteria for
decreasing it, for example, when the effective number of rules used by (most of) the best
systems is lower than the currently allowed maximum number of rules.
• Incremental Fuzzy CoCo can serve for early exploration of the problem space. It can
estimate both attainable performance values and an adequate range of sizes of the rule
base.
• Island Fuzzy CoCo could then explore in depth the range of complexities found, in order to
validate the previously obtained estimate.
• Simple Fuzzy CoCo could perform specific searches for systems with a given, user-
defined number of rules.
• An extended version of the user interface developed for the COBRA system could serve
for the interaction with both the fuzzy modeler and the end user.
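As an illustration of the migration mechanism of Island Fuzzy CoCo mentioned above, the following sketch adapts a migrant's rule genome to the rule-base size of its destination island; the list-of-rules representation and the random_rule() helper are assumptions made only for this example.

import random

def adapt_migrant(rule_genome, target_size, random_rule):
    # Shrink or grow a migrating rule genome so that it matches the
    # rule-base size of the destination island.
    migrant = list(rule_genome)
    if len(migrant) > target_size:
        # destination island hosts smaller systems: keep a random subset of rules
        migrant = random.sample(migrant, target_size)
    else:
        # destination island hosts larger systems: pad with random rules
        while len(migrant) < target_size:
            migrant.append(random_rule())
    return migrant

# Toy usage: move a seven-rule migrant to a four-rule island and a three-rule migrant to a six-rule island.
toy_rule = lambda: {'antecedent': random.random()}
smaller = adapt_migrant([toy_rule() for _ in range(7)], 4, toy_rule)
larger = adapt_migrant([toy_rule() for _ in range(3)], 6, toy_rule)

Which rules to discard when shrinking a migrant (for instance, inactive rules first) is a design choice left open in this sketch.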
Appendix A
Evolutionary Computation in Medicine: An Overview
diagnosis, prognosis, imaging, signal processing, planning, and scheduling, while Section A.3
provides an extensive bibliography, classified both according to the medical task addressed
and according to the evolutionary technique used.
There are two major approaches to data mining: supervised and unsupervised. In the
supervised approach, specific examples of a target concept are given, and the goal is to learn
how to recognize members of the class using the description attributes. In the unsupervised
approach, a set of examples is provided without any prior classification, and the goal is to
discover underlying regularities and patterns, most often by identifying clusters or subsets of
similar examples.
Clinical databases have accumulated large amounts of data on patients and their medical
conditions. The clinical history of a patient generates data that goes beyond the disease being
treated. This information, stored along with that of other patients, constitutes a good place to
look for new relationships and patterns, or to validate proposed hypotheses.
The range of applications of data mining in medicine is very wide, with the two most
popular applications being diagnosis and prognosis. Diagnosis is the process of selectively
gathering information concerning a patient, and interpreting it according to previous knowl-
edge, as evidence for or against the presence or absence of disorders [92]. In a prognostic
process, a patient’s information is also gathered and interpreted, but the objective is to predict
the future development of the patient’s condition. Due to the predictive nature of this process,
prognostic systems are frequently used as tools to plan medical treatments [93].
The role played by data mining in the context of diagnosis and prognosis is the discovery
of the knowledge necessary to interpret the gathered information. In some cases this knowl-
edge is expressed as probabilistic relationships between clinical features and the proposed
diagnosis or prognosis. In other cases a rule-based representation is chosen so as to provide
the physician with an explanation of the decision. Finally, in yet other cases, the system is
designed as a black-box decision maker that is totally unconcerned with the interpretation of
its decisions.
Evolutionary computation is usually applied in medical data mining as a parameter finder.
Evolutionary techniques search for the parameter values of the knowledge representation set
up by the designer so that the mined data are optimally interpreted. For example, evolutionary
algorithms can search for the weights of a neural network, the membership function values of
a fuzzy system, or the coefficients of a linear regressor.
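As a minimal illustration of this "parameter finder" role, the toy evolutionary loop below searches the two coefficients of a fixed linear model y = a*x + b so that it fits a set of samples; it is a generic sketch, not an algorithm taken from any of the surveyed papers.

import random

def evolve_linear_coefficients(xs, ys, pop_size=30, generations=200, sigma=0.1):
    # Evolutionary search over the parameters (a, b) of a fixed representation:
    # the knowledge representation is set up by the designer, and the evolutionary
    # loop only finds the parameter values that best interpret the data.
    def error(c):
        a, b = c
        return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))
    pop = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=error)
        parents = pop[:pop_size // 2]                       # truncation selection
        children = [(a + random.gauss(0, sigma), b + random.gauss(0, sigma))
                    for a, b in parents]                    # Gaussian mutation
        pop = parents + children
    return min(pop, key=error)

# Example: recover y = 2x + 1 from noisy samples.
data_x = [i / 10 for i in range(50)]
data_y = [2 * x + 1 + random.gauss(0, 0.05) for x in data_x]
print(evolve_linear_coefficients(data_x, data_y))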
(a) Diagnosis [7–9, 11, 12, 14, 46, 52, 54, 70, 85, 88, 114, 122, 125, 138, 146, 153, 188].
Diagnosis is the process of selectively gathering information concerning a patient,
and interpreting it according to previous knowledge, as evidence for or against
the presence or absence of disorders (Section A.2.1). The papers in this category
apply evolutionary algorithms to solve numerous diagnostic problems, including:
patient’s general condition evaluation, location of primary tumor, detection of hya-
line membrane disease in preterm newborn infants, detection of breast cancer cells
A.3 Classified Bibliography
The classified bibliography cross-references the medical task addressed (data mining, comprising diagnosis and prognosis; medical imaging and signal processing; and planning and scheduling) with the evolutionary technique used; the table itself is not reproduced here.
2. Medical imaging and signal processing [3, 5, 20, 24, 28–31, 42, 53, 55, 58, 60, 83, 84, 100,
108,115,154,156,167,174,176,179,197]. The fields of medical imaging and signal pro-
cessing have developed tools to deal with huge amounts of data expressed as images or
other types of temporal signals. All but one of the papers in this category deal with prob-
lems related to clinical tests, including: thorax radiography, retinal and cardiac angiog-
raphy, computerized tomography, magnetoencephalography, ultrasound imaging, elec-
troencephalography, electrocardiography, radiographic cephalography, laser profilome-
try, and many applications of both mammography and magnetic resonance. The “odd
paper out” presents an application of surgery assistance [5].
3. Planning and scheduling [12, 23, 38, 57, 87, 135, 136, 187, 189–192]. Planning and
scheduling involve the assignment of resources to accomplish one or more tasks subject
to several constraints (Section A.2.3). The papers in this category use evolutionary
techniques to solve problems such as: allocation of hospital resources, electrical carotid
sinus nerve stimulation, radiologist allocation, three-dimensional radiation therapy
treatment planning, dosimetric preplanning and treatment planning of permanent
prostate implants, patient scheduling in highly constrained situations, and stereotactic
radiosurgery planning.
2. Genetic programming [11, 52, 114]. In genetic programming, solutions are encoded as
computer programs rather than as fixed-length character strings (Section 1.3.2).
3. Evolution strategies [7–9, 135]. Evolution strategies are well suited for parameter-
optimization problems. They rely mainly on the mutation operator. A major characteristic
of evolution strategies is that mutation values are evolved along with the parameters
being optimized (Section 1.3.3); a generic sketch of this self-adaptation is given after this list.
4. Evolutionary programming [46, 114, 188]. In evolutionary programming, individuals
are represented by finite state machines, which provide a meaningful representation of
behavior based on interpretation of environmental symbols (Section 1.3.4).
5. Classifier systems [14, 122, 146]. Classifier systems are evolution-based learning sys-
tems. They can be viewed as restricted versions of classical rule-based systems that
interact with their environment through input and output interfaces (Section 1.3.5).
6. Hybrid approaches
(a) Evolutionary-fuzzy systems [7–9, 70, 86, 125, 176]. In evolutionary-fuzzy systems,
the capability of expressing knowledge in a linguistic, "human-friendly" way,
offered by fuzzy logic, is combined with the search and optimization power of
evolutionary algorithms, yielding systems with both high performance and high
interpretability.
(b) Evolutionary-neural systems [7–9, 22, 24, 25, 31, 33, 46, 58, 72, 73, 98, 110, 122, 166,
188]. In evolutionary-neural systems, evolution and learning work in concert to
attain adaptation.
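As a generic illustration of the self-adaptation mentioned in item 3 above, the sketch below implements a (mu, lambda) evolution strategy in which each individual carries its own mutation step size, evolved together with the parameters being optimized; this is textbook material, not code from any of the surveyed papers.

import math
import random

def evolution_strategy(fitness, dim, mu=5, lam=20, generations=200):
    # (mu, lambda)-ES with self-adaptive step size: an individual is a pair
    # (parameter vector, step size); the step size is mutated first and then
    # used to mutate the parameters, so good step sizes spread with good solutions.
    tau = 1.0 / math.sqrt(dim)
    parents = [([random.uniform(-1, 1) for _ in range(dim)], 0.3) for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            x, sigma = random.choice(parents)
            sigma = sigma * math.exp(tau * random.gauss(0, 1))
            child = [xi + sigma * random.gauss(0, 1) for xi in x]
            offspring.append((child, sigma))
        offspring.sort(key=lambda ind: fitness(ind[0]))   # minimization
        parents = offspring[:mu]                          # comma selection
    return parents[0]

# Example: minimize the 5-dimensional sphere function.
best_x, best_sigma = evolution_strategy(lambda x: sum(v * v for v in x), dim=5)
print(best_x, best_sigma)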
Bibliography
[4] T. Bäck, U. Hammel, and H.-P. Schwefel. Evolutionary computation: Comments on the
history and current state. IEEE Transactions on Evolutionary Computation, 1(1):3–17,
April 1997.
[5] S. Baluja and D. Simon. Evolution-based methods for selecting point data for object
localization: Applications to computer-assisted surgery. Applied Intelligence, 8(1):7–
19, January 1998.
[6] Wolfgang Banzhaf, Peter Nordin, Robert E. Keller, and Frank D. Francone. Genetic
Programming – An Introduction; On the Automatic Evolution of Computer Programs
and its Applications. Morgan Kaufmann, dpunkt.verlag, January 1998.
[10] K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming.
In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56–
57. Elsevier Science, 1992.
[11] A. S. Bickel and R. W. Bickel. Tree structured rules in genetic algorithms. In John J.
Grefenstette, editor, Genetic Algorithms and their Applications: Proceedings of the sec-
ond International Conference on Genetic Algorithms, pages 77–81, MIT, Cambridge,
MA, USA, July 1987. Lawrence Erlbaum Associates.
[13] C. L. Blake and C. J. Merz. UCI repository of machine learning databases, 1998.
[14] P. Bonelli and A. Parodi. An efficient classifier system and its experimental compari-
son with two representative learning methods on three medical domains. In R. Belew
and L. Booker, editors, Proceedings of the Fourth International Conference on Genetic
Algorithms, pages 288–295, San Mateo, CA, July 1991. Morgan Kaufmann.
[15] C. Bonivento, A. Davalli, and C. Fantuzzi. Tuning of myoelectric prostheses using fuzzy
logic. Artificial Intelligence in Medicine, 21(1–3):221–225, January–March 2001.
[17] G. E. P. Box and J. S. Hunter. Condensed calculations for evolutionary operation pro-
grams. Technometrics, 1:77–95, 1959.
[19] D. Chakraborty and N.R. Pal. Integrated feature analysis and fuzzy rule-based system
identification in a neuro-fuzzy paradigm. IEEE Transactions on Systems, Man, and
Cybernetics, Part B: Cybernetics, 31(3):391–400, June 2001.
[21] B.-S. Chen, S.-C. Feng, and K.-C. Wang. Traffic modeling, prediction, and congestion
control for high-speed networks: A fuzzy AR approach. IEEE Transactions on Fuzzy
Systems, 8(5):491–508, October 2000.
[22] H.-Y. Chen, T. C. Chen, D. I. Min, G. W. Fischer, and Y.-M. Wu. Prediction of
tacrolimus blood levels by using the neural network with genetic algorithm in liver
transplantation patients. Therapeutic Drug Monitoring, 21(1):50–56, February 1999.
[23] Y. Chen, M. Narita, M. Tsuji, and S. Sa. A genetic algorithm approach to optimization
for the radiological worker allocation problem. Health Physics, 70(2):180–6, February
1996.
[24] Y.-T. Chen, K.-S. Cheng, and J.-K. Liu. Improving cephalogram analysis through
feature subimage extraction. IEEE Engineering in Medicine and Biology Magazine,
18(1):25–31, January-February 1999.
[26] O. Cordón, F. Herrera, and M. Lozano. On the combination of fuzzy logic and evolu-
tionary computation: A short review and bibliography. In Pedrycz [133], pages 33–56.
[27] P. Darwen and X. Yao. Every niching method has its niche: Fitness sharing and implicit
sharing compared. In H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel,
editors, Parallel Problem Solving from Nature – PPSN IV, pages 398–407, Berlin, 1996.
Springer.
[28] K. K. Delibasis, P. E. Undrill, and G. Cameron. Designing texture filters with genetic
algorithms: An application to medical images. Signal Processing, 57:19–33, 1997.
[35] M. O. Efe and O. Kaynak. A novel optimization procedure for training of fuzzy in-
ference systems by combining variable structure systems technique and Levenberg–Mar-
quardt algorithm. Fuzzy Sets and Systems, 122(1):153–165, August 2001.
[37] J. Espinosa and J. Vandewalle. Constructing fuzzy models with linguistic integrity from
numerical data-AFRELI algorithm. IEEE Transactions on Fuzzy Systems, 8(5):591–
600, October 2000.
[43] M. Si Fodil, P. Siarry, F. Guely, and J.-L. Tyran. A fuzzy rule base for the improved
control of a pressurized water nuclear reactor. IEEE Transactions on Fuzzy Systems,
8(1):1–10, February 2000.
[44] D. B. Fogel, editor. Evolutionary Computation: The Fossil Record. IEEE Press, Piscat-
away, NJ, 1998.
[50] R. M. Friedberg, B. Dunham, and J. H. North. A learning machine. II. IBM Journal of
Research and Development, 3:282–287, 1959.
[53] J. J. Grefenstette and M. J. Fitzpatrick. Genetic search with approximate function evalu-
ation. In John J. Grefenstette, editor, Proceedings of the 1st International Conference on
Genetic Algorithms and their Applications, pages 112–120, Pittsburgh, PA, July 1985.
Lawrence Erlbaum Associates.
[58] H. Handels, Th. Roß, J. Kreusch, H. H. Wolff, and S. J. Pöppl. Feature selection for
optimized skin tumor recognition using genetic algorithms. Artificial Intelligence in
Medicine, 16(3):283–297, 1999.
[59] F. Herrera, M. Lozano, and J. L. Verdegay. Generating fuzzy rules from examples using
genetic algorithms. In B. Bouchon-Meunier, R. R. Yager, and L. A. Zadeh, editors,
Fuzzy Logic and Soft Computing, pages 11–20. World Scientific, 1995.
[62] J. H. Holland. Outline for a logical theory of adaptive systems. Journal of the ACM,
9(3):297–314, July 1962.
[63] J. H. Holland. Processing and processors for schemata. In E. L. Jacks, editor, Associa-
tive information processing, pages 127–146. New York: American Elsevier, 1971.
[65] T. P. Hong and J. B. Chen. Processing individual fuzzy attributes for fuzzy rule induc-
tion. Fuzzy Sets and Systems, 112(1):127–140, May 2000.
[66] Jörn Hopf. Cooperative coevolution of fuzzy rules. In N. Steele, editor, Proceedings
of the 2nd International ICSC Symposium on Fuzzy Logic and Applications (ISFL-97),
pages 377–381, Zurich, Switzerland, 1997. ICSC Academic Press.
[67] C. A. Hung and S. F. Lin. An incremental learning neural network for pattern clas-
sification. International Journal of Pattern Recognition and Artificial Intelligence,
13(6):913–928, September 1999.
[68] P. Husbands and F. Mill. Simulated co-evolution as the mechanism for emergent plan-
ning and scheduling. In R. Belew and K. Booker, editors, Proceedings of the 4th In-
ternational Conference on Genetic Algorithms, pages 264–270, San Diego, CA, July
1991. Morgan Kaufmann.
[71] J.-S. R. Jang and C.-T. Sun. Neuro-fuzzy modeling and control. Proceedings of the
IEEE, 83(3):378–406, March 1995.
[74] Jonghyeok Jeong and Se-Young Oh. Automatic rule generation for fuzzy logic con-
trollers using rule-level coevolution of subpopulations. In 1999 Congress on Evolution-
ary Computation, pages 2151–2156, Piscataway, NJ, 1999. IEEE Service Center.
[75] Y. Jin. Fuzzy modeling of high-dimensional systems: complexity reduction and inter-
pretability improvement. IEEE Transactions on Fuzzy Systems, 8(2):335–344, April
2000.
[76] C.-F. Juang, J.-Y. Lin, and C.-T. Lin. Genetic reinforcement learning through symbi-
otic evolution for fuzzy controller design. IEEE Transactions on Systems, Man and
Cybernetics, Part B: Cybernetics, 30(2):290–302, April 2000.
[77] H. Juillé. Methods for Statistical Inference: Extending the Evolutionary Computation
Paradigm. PhD thesis, Brandeis University, May 1999.
[78] H.-B. Jun, C.-S. Joung, and K.-B. Sim. Co-evolution of fuzzy rules and membership
functions. In Proceedings of 3rd Asian Fuzzy Systems Symposium (AFSS’98), 1998.
[79] C. L. Karr. Genetic algorithms for fuzzy controllers. AI Expert, 6(2):26–33, February
1991.
[82] J. R. Koza. Genetic Programming. The MIT Press, Cambridge, MA, 1992.
[85] L. Kuncheva. Genetic algorithm for feature selection for parallel classifiers. Information
Processing Letters, 46(4):163–168, June 1993.
[86] L. Kuncheva and K. Andreeva. Dream: A shell-like software system for medical
data analysis and decision support. Computer Methods and Programs in Biomedicine,
40(2):73–81, June 1993.
[87] M. Langer, R. Brown, S. Morrill, R. Lane, and O. Lee. A generic genetic algorithm for
generating beam weights. Medical Physics, 23(6):965–71, June 1996.
[88] J. Laurikkala and M. Juhola. A genetic-based machine learning system to discover the
diagnostic rules for female urinary incontinence. Computer Methods and Programs in
Biomedicine, 55(3):217–228, March 1998.
[89] C.-T. Lin and C. S. G. Lee. Neural-network-based fuzzy logic control and decision
system. IEEE Transactions on Computers, 40(12):1320–1336, December 1991.
[90] P. Lindskog. Fuzzy identification from a grey box modeling point of view. In H. Hel-
lendoorn and D. Driankov, editors, Fuzzy Model Identification, pages 3–50. Springer-
Verlag, 1997.
[91] Y.L. Loukas. Adaptive neuro-fuzzy inference system: An instant and architecture-free
predictor for improved QSAR studies. Journal of Medicinal Chemistry, 44(17):2772–
2783, August 2001.
[94] E. H. Mamdani. Application of fuzzy algorithms for control of a simple dynamic plant.
Proceedings of the IEE, 121(12):1585–1588, 1974.
[95] E. H. Mamdani and S. Assilian. An experiment in linguistic synthesis with a fuzzy logic
controller. International Journal of Man-Machine Studies, 7(1):1–13, 1975.
[96] O. L. Mangasarian, R. Setiono, and W.-H Goldberg. Pattern recognition via linear
programming: Theory and application to medical diagnosis. In T. F. Coleman and
Y. Li, editors, Large-Scale Numerical Optimization, pages 22–31. SIAM, 1990.
[101] J. M. Mendel. Fuzzy logic systems for engineering: A tutorial. Proceedings of the
IEEE, 83(3):345–377, March 1995.
[102] C. J. Merz and P. M. Murphy. UCI repository of machine learning databases, 1996.
[105] D. Michie, D. J. Spiegelhalter, and C. C. Taylor, editors. Machine Learning, Neural and
Statistical Classification. Ellis Horwood, 1994.
[106] M. Mitchell. An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA, 1996.
[107] S. Mitra and Y. Hayashi. Neuro-fuzzy rule generation: Survey in soft computing frame-
work. IEEE Transactions on Neural Networks, 11(3):748–768, May 2000.
[108] S. K. Mitra and S. N. Sarbadhikari. Iterative function system and genetic algorithm-
based EEG compression. Medical Engineering and Physics, 19(7):605–617, October
1997.
[111] D. Nauck and R. Kruse. A neuro-fuzzy method to learn fuzzy classification rules from
data. Fuzzy Sets and Systems, 89(3):277–288, August 1997.
[112] D. Nauck and R. Kruse. Neuro-fuzzy systems for function approximation. Fuzzy Sets
and Systems, 101(2):261–271, January 1999.
[113] B. N. Nelson. Automatic vehicle detection in infrared imagery using a fuzzy inference-
based classification system. IEEE Transactions on Fuzzy Systems, 9:53–61, February
2001.
[114] P. S. Ngan, M. L. Wong, W. Lam, K. S. Leung, and J. C. Cheng. Medical data mining
using evolutionary computation. Artificial Intelligence in Medicine, 16(1):73–96, May
1999.
[116] T. Niwa and M. Tanaka. Analysis on the island model parallel genetic algorithms for
the genetic drifts. In B. McKay, X. Yao, C. S. Newton, J.-H. Kim, and T. Furuhashi,
editors, Proceedings of the 2nd Asia-Pacific Conference on Simulated Evolution and
Learning (SEAL-98), volume 1585 of LNAI, pages 349–356, Berlin, November 24–27
1999. Springer.
[117] S. Nolfi and D. Floreano. Co-evolving predator and prey robots: Do ’arm races’ arise
in artificial evolution? Artificial Life, 4(4):311–335, 1998.
[118] P. Nordin. Evolutionary Program Induction of Binary Machine Code and its Applica-
tion. Krehl Verlag, Munster, Germany, 1997.
[119] C.W. Omlin, L. Giles, and K.K. Thornber. Fuzzy knowledge and recurrent neural net-
works: A dynamical systems perspective. Hybrid Neural Systems. Lecture Notes in
Artificial Intelligence, 1778:123–143, 2000.
[122] C. S. Pattichis and C. N. Schizas. Genetics-based machine learning for the assessment
of certain neuromuscular disorders. IEEE Transactions on Neural Networks, 7(2):427–
439, March 1996.
[123] C. A. Peña-Reyes and M. Sipper. Evolving fuzzy rules for breast cancer diagnosis. In
Proceedings of 1998 International Symposium on Nonlinear Theory and Applications
(NOLTA’98), volume 2, pages 369–372. Presses Polytechniques et Universitaires Ro-
mandes, Lausanne, 1998.
[124] C. A. Peña-Reyes and M. Sipper. Designing breast cancer diagnostic systems via a hy-
brid fuzzy-genetic methodology. In 1999 IEEE International Fuzzy Systems Conference
Proceedings, volume 1, pages 135–139. IEEE Neural Network Council, 1999.
[126] C. A. Peña-Reyes and M. Sipper. Applying Fuzzy CoCo to breast cancer diagnosis. In
Proceedings of the 2000 Congress on Evolutionary Computation (CEC00), volume 2,
pages 1168–1175. IEEE Press, Piscataway, NJ, USA, 2000.
[128] C. A. Peña-Reyes and M. Sipper. The flowering of Fuzzy CoCo: Evolving fuzzy iris
classifiers. In V. Kurková, N. C. Steele, R. Neruda, and M. Kárný, editors, Proceedings
of the 5th International Conference on Artificial Neural Networks and Genetic Algo-
rithms (ICANNGA 2001), pages 304–307. Springer-Verlag, 2001.
[130] C.-A. Peña-Reyes and M. Sipper. Combining evolutionary and fuzzy techniques in
medical diagnosis. In M. Schmitt, H. N. Teodorescu, A. Jain, A. Jain, S. Jain, and
L. C. Jain, editors, Computational Intelligence Techniques in Medical Diagnosis and
Prognosis, volume 96 of Studies in Fuzziness and Soft Computing, chapter 14, pages
391–426. Springer-Verlag, Heidelberg, 2002.
[131] C.-A. Peña-Reyes and M. Sipper. Fuzzy CoCo: Balancing accuracy and interpretability
of fuzzy models by means of coevolution. In J. Casillas, O. Cordón, F. Herrera, and
L. Magdalena, editors, Trade-off between Accuracy and Interpretability in Fuzzy Rule-
Based Modelling, Studies in Fuzziness and Soft Computing. Physica-Verlag, 2002. (In
press).
[134] W. Pedrycz and J. Valente de Oliveira. Optimization of fuzzy models. IEEE Transac-
tions on Systems, Man and Cybernetics, Part B: Cybernetics, 26(4):627–636, August
1996.
[136] V. Podgorelec and P. Kokol. Genetic algorithm based system for patient scheduling in
highly constrained situations. Journal of Medical Systems, 21(6):417–427, December
1997.
[138] R. Poli, S. Cagnoni, and G. Valli. Genetic design of optimum linear and nonlinear QRS
detectors. IEEE Transactions on Biomedical Engineering, 42(11):1137–41, November
1995.
[139] L. Porta, R. Villa, E. Andia, and E. Valderrama. Infraclinic breast carcinoma: Ap-
plication of neural networks techniques for the indication of radioguided biopsias. In
J. Mira, R. Moreno-Díaz, and J. Cabestany, editors, Biological and Artificial Compu-
tation: From Neuroscience to Technology, volume 1240 of Lecture Notes in Computer
Science, pages 978–985. Springer, 1997.
[140] M. A. Potter. The Design and Analysis of a Computational Model of Cooperative Co-
evolution. PhD thesis, George Mason University, 1997.
[142] M. A. Potter and K. A. De Jong. Evolving neural networks with collaborative species.
In Proc. of the 1995 Summer Computer Simulation Conf., pages 340–345. The Society
of Computer Simulation, 1995.
[145] W. K. Purves, G. H. Orians, H. C. Heller, and D. Sadava. Life. The Science of Biology,
chapter 52: Community Ecology. Sinauer, 5th edition, 1998.
[146] R. Rada, Y. Rhine, and J. Smailwood. Rule refinement. In Proceedings of the Society of
Computer Applications in Medical Care, volume 8 of Annual symposium on computer
application in medical care; proceedings, pages 62–65, 1984.
[149] I. Rojas, J. Gonzalez, H. Pomares, F.J. Rojas, F.J. Fernández, and A. Prieto. Multi-
dimensional and multideme genetic algorithms for the construction of fuzzy systems.
International Journal of Approximate Reasoning, 26:179–210, 2001.
[150] I. Rojas, H. Pomares, J. Ortega, and A. Prieto. Self-organized fuzzy system generation
from training examples. IEEE Transactions on Fuzzy Systems, 8(1):23–36, February
2000.
[151] C. D. Rosin and R. K. Belew. New methods for competitive coevolution. Evolutionary
Computation, 5(1):1–29, 1997.
[152] M. Russo. FuGeNeSys - A fuzzy genetic neural system for fuzzy modeling. IEEE
Transactions on Fuzzy Systems, 6(3):373–388, August 1998.
[155] A. Lo Schiavo and A. M. Luciano. Powerful and flexible fuzzy algorithm for nonlin-
ear dynamic system identification. IEEE Transactions on Fuzzy Systems, 9:828–835,
December 2001.
[157] H.-P. Schwefel. Kybernetische Evolution als Strategie der experimentellen Forschung
in der Strömungstechnik. Master's thesis, Technical University of Berlin, Hermann
Föttinger Institute for Hydrodynamics, March 1965.
[158] H.-P. Schwefel. Evolutionsstrategie und numerische Optimierung. PhD thesis, Techni-
cal University of Berlin, Department of Process Engineering, Berlin, May 1975.
[159] H.-P. Schwefel. Evolution and Optimum Seeking. Sixth-Generation Computer Technol-
ogy Series. John Wiley & Sons., New York, 1995.
[160] R. Setiono. Extracting rules from pruned neural networks for breast cancer diagnosis.
Artificial Intelligence in Medicine, 8:37–51, 1996.
[161] R. Setiono. Generating concise and accurate classification rules for breast cancer diag-
nosis. Artificial Intelligence in Medicine, 18(3):205 – 219, 2000.
[162] R. Setiono and H. Liu. Symbolic representation of neural networks. IEEE Computer,
29(3):71–77, March 1996.
[163] M. Setnes. Supervised fuzzy clustering for rule extraction. IEEE Transactions on Fuzzy
Systems, 8(4):416–424, August 2000.
[164] Y. Shi, R. Eberhart, and Y. Chen. Implementation of evolutionary fuzzy systems. IEEE
Transactions on Fuzzy Systems, 7(2):109–119, April 1999.
[165] B. Sierra and P. Larrañaga. Predicting survival in malignant skin melanoma using
bayesian networks automatically induced by genetic algorithms. An empirical compari-
son between different approaches. Artificial Intelligence in Medicine, 14(1-2):215–230,
September-October 1998.
[168] M. Sugeno and G. T. Kang. Structure identification of fuzzy model. Fuzzy Sets and
Systems, 28(1):15–33, 1988.
[169] I. Taha and J. Ghosh. Evaluation and ordering of rules extracted from feedforward
networks. In Proceedings of the IEEE International Conference on Neural Networks,
pages 221–226, 1997.
[170] T. Takagi and M. Sugeno. Fuzzy identification of systems and its applications to
modeling and control. IEEE Transactions on Systems, Man and Cybernetics, 15:116–
132, 1985.
[173] E. C. C. Tsang, X. S. Wang, and D.S. Yeung. Improving learning accuracy of fuzzy de-
cision trees by hybrid neural networks. IEEE-Transactions on Fuzzy Systems, 8(5):601–
614, October 2000.
[176] R. P. Velthuizen, L. O. Hall, and L. P. Clarke. Feature extraction for MRI segmentation.
Journal of Neuroimaging, 9(2):85–90, April 1999.
[177] M. D. Vose. The Simple Genetic Algorithm. MIT Press, Cambridge, MA, August 1999.
[178] P. Vuorimaa. Fuzzy self-organizing map. Fuzzy Sets and Systems, 66:223–231, 1994.
[180] D. Whitley, S. Rana, and R. Heckendorn. Exploiting separability in search: The island
model genetic algorithm. Journal of Computing and Information Technology, 7(1):33–
47, 1999. (Special Issue on Evolutionary Computing).
[181] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Tech-
niques with Java Implementations. Data Management Systems. Morgan Kaufmann, San
Francisco, USA, 2000.
[182] A. Włoszek and P. D. Domański. Application of the coevolutionary system to the rule
generation in fuzzy systems. In Proceedings of IKAE96, pages 203–210, Murzasichle,
Poland, 1996.
[183] A. Włoszek and P. D. Domański. Application of the coevolutionary system to the fuzzy
model design. In Proceedings of the Sixth IEEE International Conference on Fuzzy
Systems, volume 1, pages 391–395, 1997.
[184] T.-P. Wu and S.-M. Chen. A new method for constructing membership functions and
fuzzy rules from training examples. IEEE Transactions on Systems, Man and Cyber-
netics, Part B: Cybernetics, 29(1):25–40, February 1999.
[185] R. R. Yager and D. P. Filev. Essentials of Fuzzy Modeling and Control. John Wiley &
Sons., New York, 1994.
[186] R. R. Yager and L. A. Zadeh. Fuzzy Sets, Neural Networks, and Soft Computing. Van
Nostrand Reinhold, New York, 1994.
[187] G. Yang, L. E. Reinstein, S. Pai, Z. Xu, and D. L. Carroll. A new genetic algo-
rithm technique in optimization of permanent 125 I prostate implants. Medical Physics,
25(12):2308–2315, December 1998.
[188] X. Yao and Y. Liu. A new evolutionary system for evolving artificial neural networks.
IEEE Transactions on Neural Networks, 8(3):694–713, May 1997.
[189] Y. Yu. Multiobjective decision theory for computational optimization in radiation ther-
apy. Medical Physics, 24(9):1445–1454, September 1997.
[190] Y. Yu and M. C. Schell. A genetic algorithm for the optimization of prostate implants.
Medical Physics, 23(12):2085–91, December 1996.
[191] Y. Yu, M. C. Schell, and J. B. Zhang. Decision theoretic steering and genetic algo-
rithm optimization: Application to stereotactic radiosurgery treatment planning. Medi-
cal Physics, 24(11):1742–50, November 1997.
[194] L. A. Zadeh. Outline of a new approach to the analysis of complex systems and decision
processes. IEEE Transactions on Systems, Man and Cybernetics, SMC-3(1):28–44,
January 1973.
[195] L. A. Zadeh. The concept of a linguistic variable and its applications to approximate
reasoning. Information Sciences, Part I, 8:199–249; Part II, 8:301–357; Part III, 9:43–80,
1975.
[197] B. Zheng, Y. H. Chang, X. H. Wang, W. F. Good, and D. Gur. Feature selection for
computerized mass detection in digitized mammograms by using a genetic algorithm.
Academic Radiology, 6(6):327–332, June 1999.
Curriculum Vitæ
PERSONAL DATA
EDUCATION
ACADEMIC POSITIONS
• PhD Candidate and teaching assistant
Logic Systems Laboratory (LSL)
Swiss Federal Institute of Technology at Lausanne - EPFL
Lausanne, Switzerland, January 1999 - September 2002
• Fellowship holder
Swiss Federal Institute of Technology at Lausanne - EPFL
Lausanne, Switzerland, July 1996 - December 1998
• Assistant Instructor
Pontificia Universidad Javeriana and Corporación Universitaria Autónoma de Occidente
Cali, Colombia, January 1995 - May 1996
RESEARCH INTERESTS
• Bioinspired systems
• Computational Intelligence - Soft Computing
• Fuzzy systems
• Evolutionary computation
• Medical applications of computational intelligence techniques
• Adaptation, learning, and intelligence
• Autonomous robotics
• Evolutionary Fuzzy Modeling, Course, Universidad de los Andes, Bogotá and Pontificia
Universidad Javeriana, Cali, Colombia, March 2000.
• Test of the applicability of Fuzzy CoCo to function approximation, Akhil Gupta, Indian
Institute of Technology, Kanpur (Summer Internship Program, EPFL), summer 2001.
• Island model for the cooperative coevolutionary algorithm Fuzzy CoCo, Yves Blatter,
EPFL, winter 2000-01.
• A fuzzy logic-based tool for computer-assisted breast cancer diagnosis, Richard Greset,
EPFL, winter 2000-01.
• A study of the Fuzzy CoCo algorithm, Jean-Marc Henry, EPFL, summer 2000.
• Intelligent data analysis, Francesc Font (in cooperation with J.-L. Beuchat), EPFL,
winter 99-00
LANGUAGES
• General manager
Profesionales en comunicaciones PROECOM Ltda.
Cali, Colombia, March 1992 - May 1995
• Research assistant
Empresa Nacional de Telecomunicaciones TELECOM
Research Division
Bogotá, Colombia, 1991
MEMBERSHIPS
• IEEE, The Institute of Electrical and Electronics Engineers
SM-85, M-93, SM-95.
• EvoNet, European Network of Excellence in Evolutionary Computing
• EUNITE, EUropean Network on Intelligent TEchnologies for Smart Adaptive Systems
• AL-IC, Latin-American Association of Computational Intelligence
Co-founder in 2002
• ACIS, Colombian Association of Researchers in Switzerland
President since 1999
• Red-CIC, Colombian Network on Computational Intelligence and Bioinspired Systems.
General coordinator since its foundation in 2000
• Gi3, Research Group on Computational Intelligence and Bio-inspired Systems
Co-founder in 1997
• Red Caldas, Colombian Network of Researchers Abroad
Member since 1992, administrator of the electronic list in 1998-99
PUBLICATIONS
Journal Papers
1. C. A. Peña-Reyes and M. Sipper, "Fuzzy Modeling by Fuzzy CoCo," International Jour-
nal of Computational Intelligence and Applications, 2002. (Submitted.)
2. C. A. Peña-Reyes and M. Sipper, "Fuzzy CoCo: A Cooperative-Coevolutionary ap-
proach to Fuzzy Modeling," IEEE Transactions on Fuzzy Systems, vol. 9, no. 5, pp.
727-737, Oct. 2001.
3. C. A. Peña-Reyes and M. Sipper, "Evolutionary Computation in Medicine: An
Overview," Artificial Intelligence in Medicine, vol. 19, no. 1, pp. 1-23, May 2000.
4. H. F. Restrepo, C. A. Peña-Reyes, and A. Perez-Uribe, "Hacia el desarrollo de nuevas
máquinas computacionales: lecciones que aprendemos de la naturaleza," Energía y
Computación, vol. IX, no. 2, pp. 40–47, 2000.
5. C. A. Peña-Reyes and M. Sipper, "A fuzzy-genetic approach to breast cancer diagnosis,"
Artificial Intelligence in Medicine, vol. 17, no. 2, pp. 131-155, Oct. 1999.
Book chapters
1. C.-A. Peña-Reyes and M. Sipper, "Fuzzy CoCo: Balancing accuracy and interpretability
of fuzzy models by means of coevolution," in Trade-off between Accuracy and Inter-
pretability in Fuzzy Rule-Based Modelling, (J. Casillas, O. Cordon, F. Herrera, and L.
Magdalena, eds.), 2002. (In press.)
Conferences
7. C. A. Peña-Reyes and M. Sipper, "Evolving Fuzzy Rules for Breast Cancer Diagnosis,"
in Proceedings of 1998 International Symposium on Nonlinear Theory and Applications
(NOLTA’98), pp. 369-372, 1998.