08 Data Mining-Other Classifications
08 Data Mining-Other Classifications
Challenges in case-based reasoning include finding a good similarity metric (e.g., for
matching subgraphs) and suitable methods for combining solutions. Other challenges
include the selection of salient features for indexing training cases and the development
of efficient indexing techniques. A trade-off between accuracy and efficiency evolves as
the number of stored cases becomes very large. As this number increases, the case-based
reasoner becomes more intelligent. After a certain point, however, the systems efficiency
will suffer as the time required to search for and process relevant cases increases. As with
nearest-neighbor classifiers, one solution is to edit the training database. Cases that are
redundant or that have not proved useful may be discarded for the sake of improved
performance. These decisions, however, are not clear-cut and their automation remains
an active area of research.
9.6
9.6.1
Genetic Algorithms
Genetic algorithms attempt to incorporate ideas of natural evolution. In general,
genetic learning starts as follows. An initial population is created consisting of randomly
generated rules. Each rule can be represented by a string of bits. As a simple example,
suppose that samples in a given training set are described by two Boolean attributes,
A1 and A2 , and that there are two classes, C1 and C2 . The rule IF A1 AND NOT A2
THEN C2 can be encoded as the bit string 100, where the two leftmost bits represent
attributes A1 and A2 , respectively, and the rightmost bit represents the class. Similarly,
the rule IF NOT A1 AND NOT A2 THEN C1 can be encoded as 001. If an attribute
has k values, where k > 2, then k bits may be used to encode the attributes values.
Classes can be encoded in a similar fashion.
Based on the notion of survival of the fittest, a new population is formed to consist
of the fittest rules in the current population, as well as offspring of these rules. Typically,
the fitness of a rule is assessed by its classification accuracy on a set of training samples.
Offspring are created by applying genetic operators such as crossover and mutation.
In crossover, substrings from pairs of rules are swapped to form new pairs of rules. In
mutation, randomly selected bits in a rules string are inverted.
The process of generating new populations based on prior populations of rules continues until a population, P, evolves where each rule in P satisfies a prespecified fitness
threshold.
427
Genetic algorithms are easily parallelizable and have been used for classification as
well as other optimization problems. In data mining, they may be used to evaluate the
fitness of other algorithms.
9.6.2
Upper approximation of C
Lower approximation of C
Figure 9.14 A rough set approximation of class Cs set of tuples using lower and upper approximation
sets of C. The rectangular regions represent equivalence classes.
searching on the entire training set, the matrix is instead searched to detect redundant
attributes.
9.6.3
(9.24)
By Rule (9.24), a customer who has had a job for at least two years will receive credit
if her income is, say, $50,000, but not if it is $49,000. Such harsh thresholding may seem
unfair.
Instead, we can discretize income into categories (e.g., {low income, medium income,
high income}) and then apply fuzzy logic to allow fuzzy thresholds or boundaries to
be defined for each category (Figure 9.15). Rather than having a precise cutoff between
categories, fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of
membership that a certain value has in a given category. Each category then represents a
fuzzy set. Hence, with fuzzy logic, we can capture the notion that an income of $49,000
is, more or less, high, although not as high as an income of $50,000. Fuzzy logic systems
typically provide graphical tools to assist users in converting attribute values to fuzzy
truth values.
Fuzzy set theory is also known as possibility theory. It was proposed by Lotfi Zadeh
in 1965 as an alternative to traditional two-value logic and probability theory. It lets
us work at a high abstraction level and offers a means for dealing with imprecise data
Fuzzy membership
428
low
medium
high
1.0
0.5
0
0
10K
20K
30K
40K
50K
income
60K
70K
Figure 9.15 Fuzzy truth values for income, representing the degree of membership of income values with
respect to the categories {low, medium, high}. Each category represents a fuzzy set. Note that
a given income value, x, can have membership in more than one fuzzy set. The membership
values of x in each fuzzy set do not have to total to 1.
429
measurement. Most important, fuzzy set theory allows us to deal with vague or inexact
facts. For example, being a member of a set of high incomes is inexact (e.g., if $50,000
is high, then what about $49,000? or $48,000?) Unlike the notion of traditional crisp
sets where an element belongs to either a set S or its complement, in fuzzy set theory,
elements can belong to more than one fuzzy set. For example, the income value $49,000
belongs to both the medium and high fuzzy sets, but to differing degrees. Using fuzzy set
notation and following Figure 9.15, this can be shown as
mmedium income ($49,000) = 0.15 and mhigh income ($49,000) = 0.96,
where m denotes the membership function, that is operating on the fuzzy sets of
medium income and high income, respectively. In fuzzy set theory, membership values for a given element, x (e.g., for $49,000), do not have to sum to 1. This is unlike
traditional probability theory, which is constrained by a summation axiom.
Fuzzy set theory is useful for data mining systems performing rule-based classification. It provides operations for combining fuzzy measurements. Suppose that in
addition to the fuzzy sets for income, we defined the fuzzy sets junior employee and
senior employee for the attribute years employed. Suppose also that we have a rule that,
say, tests high income and senior employee in the rule antecedent (IF part) for a given
employee, x. If these two fuzzy measures are ANDed together, the minimum of their
measure is taken as the measure of the rule. In other words,
m(high income AND senior
This is akin to saying that a chain is as strong as its weakest link. If the two measures
are ORed, the maximum of their measure is taken as the measure of the rule. In other
words,
m(high income OR senior
Intuitively, this is like saying that a rope is as strong as its strongest strand.
Given a tuple to classify, more than one fuzzy rule may apply. Each applicable rule
contributes a vote for membership in the categories. Typically, the truth values for each
predicted category are summed, and these sums are combined. Several procedures exist
for translating the resulting fuzzy output into a defuzzified or crisp value that is returned
by the system.
Fuzzy logic systems have been used in numerous areas for classification, including
market research, finance, health care, and environmental engineering.
9.7