08 Data Mining-Other Classifications

This document discusses several advanced classification methods including genetic algorithms, rough set approaches, and fuzzy set approaches. Genetic algorithms attempt to mimic natural evolution by generating an initial population of randomly generated rules that are assessed for fitness. New populations are formed from the fittest rules through genetic operators like crossover and mutation. Rough set approaches establish equivalence classes within data to approximate classifications. Fuzzy set approaches allow for fuzzy or gradual thresholds rather than sharp cutoffs, allowing elements to belong to multiple fuzzy sets to varying degrees. This handles imprecise data better than traditional binary logic.

Uploaded by Raj Endran

Chapter 9 Classification: Advanced Methods

Challenges in case-based reasoning include finding a good similarity metric (e.g., for
matching subgraphs) and suitable methods for combining solutions. Other challenges
include the selection of salient features for indexing training cases and the development
of efficient indexing techniques. A trade-off between accuracy and efficiency arises as
the number of stored cases becomes very large. As this number increases, the case-based
reasoner becomes more intelligent. After a certain point, however, the system's efficiency
will suffer as the time required to search for and process relevant cases increases. As with
nearest-neighbor classifiers, one solution is to edit the training database. Cases that are
redundant or that have not proved useful may be discarded for the sake of improved
performance. These decisions, however, are not clear-cut and their automation remains
an active area of research.

9.6 Other Classification Methods

In this section, we give a brief description of several other classification methods, including genetic algorithms (Section 9.6.1), rough set approach (Section 9.6.2), and fuzzy set
approaches (Section 9.6.3). In general, these methods are less commonly used for classification in commercial data mining systems than the methods described earlier in this
book. However, these methods show their strength in certain applications, and hence it
is worthwhile to include them here.

9.6.1 Genetic Algorithms
Genetic algorithms attempt to incorporate ideas of natural evolution. In general,
genetic learning starts as follows. An initial population is created consisting of randomly
generated rules. Each rule can be represented by a string of bits. As a simple example,
suppose that samples in a given training set are described by two Boolean attributes,
A1 and A2, and that there are two classes, C1 and C2. The rule IF A1 AND NOT A2
THEN C2 can be encoded as the bit string 100, where the two leftmost bits represent
attributes A1 and A2, respectively, and the rightmost bit represents the class. Similarly,
the rule IF NOT A1 AND NOT A2 THEN C1 can be encoded as 001. If an attribute
has k values, where k > 2, then k bits may be used to encode the attribute's values.
Classes can be encoded in a similar fashion.
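The encoding can be sketched in a few lines of Python. The bit layout is inferred from the two examples above (100 means C2, 001 means C1, so the class bit is 1 for C1 and 0 for C2); the helper names are hypothetical, and the one-bit-per-value scheme for a k-valued attribute is only one plausible reading of "k bits may be used."

```python
# Bit layout inferred from the two examples in the text:
# bit 0 -> A1, bit 1 -> A2, bit 2 -> class (1 = C1, 0 = C2).
CLASS_TO_BIT = {"C1": "1", "C2": "0"}
BIT_TO_CLASS = {bit: cls for cls, bit in CLASS_TO_BIT.items()}


def encode_rule(a1: bool, a2: bool, cls: str) -> str:
    """Encode 'IF [NOT] A1 AND [NOT] A2 THEN cls' as a 3-bit string."""
    return f"{int(a1)}{int(a2)}{CLASS_TO_BIT[cls]}"


def decode_rule(bits: str) -> str:
    """Turn a 3-bit string back into a readable rule."""
    a1 = "A1" if bits[0] == "1" else "NOT A1"
    a2 = "A2" if bits[1] == "1" else "NOT A2"
    return f"IF {a1} AND {a2} THEN {BIT_TO_CLASS[bits[2]]}"


def encode_value(value: str, domain: list[str]) -> str:
    """Hypothetical k-bit encoding of one value of a k-valued attribute (one bit per value)."""
    return "".join("1" if v == value else "0" for v in domain)


print(encode_rule(True, False, "C2"))   # 100
print(encode_rule(False, False, "C1"))  # 001
print(decode_rule("100"))               # IF A1 AND NOT A2 THEN C2
print(encode_value("medium", ["low", "medium", "high"]))  # 010
```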
Based on the notion of survival of the fittest, a new population is formed to consist
of the fittest rules in the current population, as well as offspring of these rules. Typically,
the fitness of a rule is assessed by its classification accuracy on a set of training samples.
Offspring are created by applying genetic operators such as crossover and mutation.
In crossover, substrings from pairs of rules are swapped to form new pairs of rules. In
mutation, randomly selected bits in a rule's string are inverted.
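The two genetic operators can be illustrated on bit-string rules. This is a minimal sketch that assumes single-point crossover (the text only says that substrings are swapped) and an independent per-bit mutation probability; the function names and the `rate` parameter are illustrative, not taken from the source.

```python
import random


def crossover(parent1: str, parent2: str, point: int) -> tuple[str, str]:
    """Single-point crossover: swap the substrings that follow `point`."""
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])


def mutate(rule: str, rate: float, rng: random.Random) -> str:
    """Invert each bit independently with probability `rate`."""
    flipped = []
    for bit in rule:
        if rng.random() < rate:
            bit = "1" if bit == "0" else "0"
        flipped.append(bit)
    return "".join(flipped)


rng = random.Random(42)
print(crossover("100", "001", 1))  # ('101', '000')
print(mutate("100", 1.0, rng))     # 011 (a rate of 1.0 flips every bit)
print(mutate("100", 0.1, rng))     # random; usually unchanged
```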
The process of generating new populations based on prior populations of rules continues until a population, P, evolves where each rule in P satisfies a prespecified fitness
threshold.
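Putting the pieces together, the loop below is a toy sketch of the process just described, not a production rule learner. The training set, population size, mutation rate, and "keep the better half" selection scheme are all assumptions chosen for illustration. Fitness is each rule's accuracy on the training samples its IF part covers, and a `max_generations` safeguard is added because the threshold may never be reached.

```python
import random

# Hypothetical training set of (A1, A2, class); here C1 holds exactly when A1 == A2.
TRAINING = [(1, 1, "C1"), (0, 0, "C1"), (1, 0, "C2"), (0, 1, "C2"), (1, 1, "C1"), (1, 0, "C2")]
BIT_TO_CLASS = {"1": "C1", "0": "C2"}  # same 3-bit layout as the examples in the text


def fitness(rule: str) -> float:
    """Classification accuracy of the rule on the training samples its IF part covers."""
    a1, a2 = int(rule[0]), int(rule[1])
    covered = [cls for x1, x2, cls in TRAINING if (x1, x2) == (a1, a2)]
    if not covered:
        return 0.0
    predicted = BIT_TO_CLASS[rule[2]]
    return sum(cls == predicted for cls in covered) / len(covered)


def crossover(p1: str, p2: str, rng: random.Random) -> list[str]:
    point = rng.randint(1, len(p1) - 1)  # single-point crossover
    return [p1[:point] + p2[point:], p2[:point] + p1[point:]]


def mutate(rule: str, rate: float, rng: random.Random) -> str:
    return "".join(("1" if b == "0" else "0") if rng.random() < rate else b for b in rule)


def evolve(pop_size=8, threshold=1.0, mutation_rate=0.05, max_generations=500, seed=0):
    rng = random.Random(seed)
    # Initial population of randomly generated 3-bit rules.
    population = ["".join(rng.choice("01") for _ in range(3)) for _ in range(pop_size)]
    for generation in range(max_generations):
        # Stop once every rule in the population satisfies the fitness threshold.
        if all(fitness(r) >= threshold for r in population):
            return population, generation
        # Survival of the fittest: keep the better half as parents...
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        # ...and fill the rest of the new population with their offspring.
        offspring = []
        while len(parents) + len(offspring) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            offspring += [mutate(c, mutation_rate, rng) for c in crossover(p1, p2, rng)]
        population = parents + offspring[: pop_size - len(parents)]
    return population, max_generations  # safeguard: threshold not reached


population, generations = evolve()
print(generations, population)
```

Keeping the parents unchanged (elitism) guarantees that the best rules found so far are never lost; real systems often use fitness-proportional or tournament selection instead.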

Genetic algorithms are easily parallelizable and have been used for classification as
well as other optimization problems. In data mining, they may be used to evaluate the
fitness of other algorithms.

9.6.2 Rough Set Approach

Rough set theory can be used for classification to discover structural relationships within
imprecise or noisy data. It applies to discrete-valued attributes. Continuous-valued
attributes must therefore be discretized before rough set theory can be applied.
Rough set theory is based on the establishment of equivalence classes within the
given training data. All the data tuples forming an equivalence class are indiscernible,
that is, the samples are identical with respect to the attributes describing the data. Given
real-world data, it is common that some classes cannot be distinguished in terms of the
available attributes. Rough sets can be used to approximately or roughly define such
classes. A rough set definition for a given class, C, is approximated by two sets: a lower
approximation of C and an upper approximation of C. The lower approximation of C
consists of all the data tuples that, based on the knowledge of the attributes, are certain to
belong to C without ambiguity. The upper approximation of C consists of all the tuples
that, based on the knowledge of the attributes, cannot be described as not belonging to
C. The lower and upper approximations for a class C are shown in Figure 9.14, where
each rectangular region represents an equivalence class. Decision rules can be generated
for each class. Typically, a decision table is used to represent the rules.
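The lower and upper approximations follow directly from the equivalence classes. In this minimal sketch (the data set and function names are hypothetical), an equivalence class is added to the lower approximation when all of its tuples belong to C, and to the upper approximation when at least one of them does.

```python
from collections import defaultdict

# Hypothetical discretized training data: (attribute values, class label).
DATA = [
    (("youth", "high"), "yes"),
    (("youth", "high"), "no"),
    (("youth", "low"), "no"),
    (("senior", "low"), "yes"),
    (("senior", "low"), "yes"),
    (("middle", "high"), "yes"),
]


def equivalence_classes(data):
    """Group tuple indices that are indiscernible (identical attribute values)."""
    groups = defaultdict(set)
    for i, (values, _) in enumerate(data):
        groups[values].add(i)
    return list(groups.values())


def approximations(data, target):
    """Return (lower, upper) approximations of the set of tuples in class `target`."""
    in_class = {i for i, (_, label) in enumerate(data) if label == target}
    lower, upper = set(), set()
    for eq in equivalence_classes(data):
        if eq <= in_class:  # every tuple in the block certainly belongs to C
            lower |= eq
        if eq & in_class:   # the block cannot be ruled out of C
            upper |= eq
    return lower, upper


lower, upper = approximations(DATA, "yes")
print(sorted(lower))  # [3, 4, 5]
print(sorted(upper))  # [0, 1, 3, 4, 5]
```

Tuples 0 and 1 are indiscernible but carry different labels, so they fall in the upper approximation only; this is exactly the ambiguity the rough set definition captures.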
Rough sets can also be used for attribute subset selection (or feature reduction, where
attributes that do not contribute to the classification of the given training data can be
identified and removed) and relevance analysis (where the contribution or significance
of each attribute is assessed with respect to the classification task). The problem of finding the minimal subsets (reducts) of attributes that can describe all the concepts in
the given data set is NP-hard. However, algorithms to reduce the computation intensity
have been proposed. In one method, for example, a discernibility matrix is used that
stores the differences between attribute values for each pair of data tuples. Rather than
searching on the entire training set, the matrix is instead searched to detect redundant
attributes.

[Figure 9.14: nested regions labeled C, Upper approximation of C, and Lower approximation of C.]
Figure 9.14 A rough set approximation of class C's set of tuples using lower and upper approximation
sets of C. The rectangular regions represent equivalence classes.
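A discernibility matrix and a brute-force reduct search can be sketched as follows. This assumes the common decision-relative variant, in which only pairs of tuples with different class labels are compared; the data set and names are hypothetical. A subset of attributes preserves the classification if it contains at least one attribute from every nonempty matrix entry, and the search tries subsets in increasing size, which is exponential in the number of attributes, consistent with the problem being NP-hard.

```python
from itertools import combinations

# Hypothetical discretized data: attributes (age, income, student) and a class label.
ATTRIBUTES = ["age", "income", "student"]
DATA = [
    (("youth", "high", "no"), "no"),
    (("youth", "high", "yes"), "yes"),
    (("senior", "low", "yes"), "yes"),
    (("senior", "high", "no"), "yes"),
    (("youth", "low", "no"), "no"),
]


def discernibility_matrix(data):
    """Map each pair of tuples with different classes to the attribute indices that differ."""
    matrix = {}
    for (i, (xi, ci)), (j, (xj, cj)) in combinations(enumerate(data), 2):
        if ci != cj:
            matrix[(i, j)] = {k for k in range(len(xi)) if xi[k] != xj[k]}
    return matrix


def reducts(data, n_attrs):
    """Brute-force search for minimal attribute subsets that hit every nonempty entry."""
    entries = [e for e in discernibility_matrix(data).values() if e]
    found = []
    for size in range(1, n_attrs + 1):
        for subset in combinations(range(n_attrs), size):
            s = set(subset)
            if any(r <= s for r in found):
                continue  # s contains a smaller reduct, so it is not minimal
            if all(s & e for e in entries):
                found.append(s)
    return found


print([[ATTRIBUTES[k] for k in sorted(r)] for r in reducts(DATA, len(ATTRIBUTES))])
# [['age', 'student']]
```

Here income never distinguishes a pair of differently labeled tuples that age or student cannot already separate, so it is identified as redundant.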

9.6.3 Fuzzy Set Approaches

Rule-based systems for classification have the disadvantage that they involve sharp cutoffs for continuous attributes. For example, consider the following rule for customer
credit application approval. The rule essentially says that applications for customers
who have had a job for two or more years and who have a high income (i.e., of at least
$50,000) are approved:
IF (years employed ≥ 2) AND (income ≥ 50,000) THEN credit = approved.    (9.24)

By Rule (9.24), a customer who has had a job for at least two years will receive credit
if her income is, say, $50,000, but not if it is $49,000. Such harsh thresholding may seem
unfair.
Instead, we can discretize income into categories (e.g., {low income, medium income,
high income}) and then apply fuzzy logic to allow fuzzy thresholds or boundaries to
be defined for each category (Figure 9.15). Rather than having a precise cutoff between
categories, fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of
membership that a certain value has in a given category. Each category then represents a
fuzzy set. Hence, with fuzzy logic, we can capture the notion that an income of $49,000
is, more or less, high, although not as high as an income of $50,000. Fuzzy logic systems
typically provide graphical tools to assist users in converting attribute values to fuzzy
truth values.
[Figure 9.15: fuzzy membership (0 to 1.0) plotted against income (0 to 70K) for the fuzzy sets low, medium, and high.]
Figure 9.15 Fuzzy truth values for income, representing the degree of membership of income values with
respect to the categories {low, medium, high}. Each category represents a fuzzy set. Note that
a given income value, x, can have membership in more than one fuzzy set. The membership
values of x in each fuzzy set do not have to total to 1.

Fuzzy set theory is also known as possibility theory. It was proposed by Lotfi Zadeh
in 1965 as an alternative to traditional two-valued logic and probability theory. It lets
us work at a high abstraction level and offers a means for dealing with imprecise data
measurement. Most important, fuzzy set theory allows us to deal with vague or inexact
facts. For example, being a member of a set of high incomes is inexact (e.g., if $50,000
is high, then what about $49,000? Or $48,000?). Unlike the notion of traditional crisp
sets where an element belongs to either a set S or its complement, in fuzzy set theory,
elements can belong to more than one fuzzy set. For example, the income value $49,000
belongs to both the medium and high fuzzy sets, but to differing degrees. Using fuzzy set
notation and following Figure 9.15, this can be shown as
$$\mu_{\text{medium income}}(\$49{,}000) = 0.15 \quad\text{and}\quad \mu_{\text{high income}}(\$49{,}000) = 0.96,$$
where μ denotes the membership function, operating on the fuzzy sets medium income and
high income, respectively. In fuzzy set theory, membership values for a given element, x
(e.g., $49,000), do not have to sum to 1. This is unlike traditional probability theory,
which is constrained by a summation axiom.
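Piecewise-linear membership functions are a simple way to sketch fuzzy sets like those in Figure 9.15. The exact curves of the figure are not recoverable here, so the breakpoints below are hypothetical: they were picked so that the two values quoted above (0.15 and 0.96 for $49,000) come out, but the figure's actual shapes may differ.

```python
def ramp_down(x, start, end):
    """1 at or below `start`, falling linearly to 0 at `end`."""
    if x <= start:
        return 1.0
    if x >= end:
        return 0.0
    return (end - x) / (end - start)


def ramp_up(x, start, end):
    """0 at or below `start`, rising linearly to 1 at `end`."""
    return 1.0 - ramp_down(x, start, end)


def trapezoid(x, a, b, c, d):
    """0 outside [a, d], 1 on [b, c], linear on the two shoulders."""
    return min(ramp_up(x, a, b), ramp_down(x, c, d))


# Hypothetical breakpoints in dollars (not read off Figure 9.15).
def mu_low(x):
    return ramp_down(x, 10_000, 30_000)


def mu_medium(x):
    return trapezoid(x, 10_000, 30_000, 32_000, 52_000)


def mu_high(x):
    return ramp_up(x, 25_000, 50_000)


x = 49_000
print(mu_low(x), round(mu_medium(x), 2), round(mu_high(x), 2))  # 0.0 0.15 0.96
```

The memberships of $49,000 add up to 1.11 here, which illustrates that fuzzy membership values need not sum to 1.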
Fuzzy set theory is useful for data mining systems performing rule-based classification. It provides operations for combining fuzzy measurements. Suppose that in
addition to the fuzzy sets for income, we defined the fuzzy sets junior employee and
senior employee for the attribute years employed. Suppose also that we have a rule that,
say, tests high income and senior employee in the rule antecedent (IF part) for a given
employee, x. If these two fuzzy measures are ANDed together, the minimum of their
measure is taken as the measure of the rule. In other words,
$$\mu_{\text{high income AND senior employee}}(x) = \min\big(\mu_{\text{high income}}(x),\ \mu_{\text{senior employee}}(x)\big).$$

This is akin to saying that a chain is as strong as its weakest link. If the two measures
are ORed, the maximum of their measure is taken as the measure of the rule. In other
words,
$$\mu_{\text{high income OR senior employee}}(x) = \max\big(\mu_{\text{high income}}(x),\ \mu_{\text{senior employee}}(x)\big).$$

Intuitively, this is like saying that a rope is as strong as its strongest strand.
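The min/max combination rules translate directly into code. In this sketch the membership functions for high income and senior employee are hypothetical linear ramps (25K to 50K dollars, and 2 to 6 years employed), and the applicant's values are made up for illustration.

```python
def ramp_up(x, start, end):
    """0 at or below `start`, rising linearly to 1 at `end`."""
    if x <= start:
        return 0.0
    if x >= end:
        return 1.0
    return (x - start) / (end - start)


def mu_high_income(income):       # hypothetical membership function
    return ramp_up(income, 25_000, 50_000)


def mu_senior_employee(years):    # hypothetical membership function
    return ramp_up(years, 2, 6)


def fuzzy_and(*degrees):
    """Fuzzy AND: the weakest link determines the rule's measure."""
    return min(degrees)


def fuzzy_or(*degrees):
    """Fuzzy OR: the strongest strand determines the rule's measure."""
    return max(degrees)


applicant = {"income": 49_000, "years_employed": 4}
high = mu_high_income(applicant["income"])                # 0.96
senior = mu_senior_employee(applicant["years_employed"])  # 0.5
print(round(fuzzy_and(high, senior), 2))  # 0.5
print(round(fuzzy_or(high, senior), 2))   # 0.96
```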
Given a tuple to classify, more than one fuzzy rule may apply. Each applicable rule
contributes a vote for membership in the categories. Typically, the truth values for each
predicted category are summed, and these sums are combined. Several procedures exist
for translating the resulting fuzzy output into a defuzzified or crisp value that is returned
by the system.
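The voting and defuzzification steps can be sketched as follows. Each rule's strength is the fuzzy AND (minimum) of its antecedent degrees, and the strengths are summed per predicted category. The output categories, their representative crisp scores, the rule base, and the seniority degrees are hypothetical; the two income degrees are the ones quoted in the text for $49,000. The weighted average used for defuzzification is only one simple choice among the several procedures that exist.

```python
from collections import defaultdict

# Hypothetical representative crisp scores for each output category (0-100 scale).
CATEGORY_SCORE = {"reject": 0.0, "review": 50.0, "approve": 100.0}


def fire_rules(rules):
    """Sum the truth values of all applicable rules per predicted category."""
    votes = defaultdict(float)
    for antecedent_degrees, category in rules:
        strength = min(antecedent_degrees)  # fuzzy AND of the IF part
        if strength > 0:
            votes[category] += strength
    return dict(votes)


def defuzzify(votes):
    """Weighted average of representative scores (one simple defuzzification scheme)."""
    total = sum(votes.values())
    if total == 0:
        raise ValueError("no rule fired")
    return sum(CATEGORY_SCORE[c] * w for c, w in votes.items()) / total


# Income degrees from the text; seniority degrees are assumed values.
high_income, medium_income, senior, junior = 0.96, 0.15, 0.5, 0.5
rules = [
    ([high_income, senior], "approve"),
    ([medium_income, senior], "review"),
    ([medium_income, junior], "reject"),
]
votes = fire_rules(rules)
print(votes)                      # {'approve': 0.5, 'review': 0.15, 'reject': 0.15}
print(round(defuzzify(votes), 1))  # 71.9
```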
Fuzzy logic systems have been used in numerous areas for classification, including
market research, finance, health care, and environmental engineering.

9.7 Additional Topics Regarding Classification

Most of the classification algorithms we have studied handle multiple classes, but some,
such as support vector machines, assume only two classes exist in the data. What adaptations can be made when there are more than two classes? This question is
addressed in Section 9.7.1 on multiclass classification.

