Classification-Clustering
Classification: Example
Next, we need to compute the expected information
requirement for each attribute.
Let’s start with the attribute age.
We need to look at the distribution of yes and no tuples for each category of age:
• For the category “youth,” there are two yes tuples and three no tuples.
• For the category “middle aged,” there are four yes tuples and zero no tuples.
• For the category “senior,” there are three yes tuples and two no tuples.
Compute the expected information needed to classify a tuple in D if the tuples are partitioned according to age:
Info_age(D) = 5/14 × I(2,3) + 4/14 × I(4,0) + 5/14 × I(3,2) ≈ 0.694 bits,
where I(p, n) denotes the expected information of a partition containing p yes tuples and n no tuples.
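As a minimal sketch of this computation (assuming the 14-tuple training set implied by the counts above, with 9 yes and 5 no tuples overall; the function names are illustrative only):

from math import log2

def info(counts):
    # Entropy Info(D) of a class distribution given as a list of counts.
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Class distribution (yes, no) per age category, as listed above.
partitions = {"youth": (2, 3), "middle aged": (4, 0), "senior": (3, 2)}
n = sum(sum(p) for p in partitions.values())                       # 14 tuples in D

info_D = info([9, 5])                                              # Info(D)      ~ 0.940 bits
info_age = sum(sum(p) / n * info(p) for p in partitions.values())  # Info_age(D)  ~ 0.694 bits
gain_age = info_D - info_age                                       # Gain(age)    ~ 0.246 bits
print(info_D, info_age, gain_age)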
Quiz Time !!!
What is the complexity of a decision tree?
Basic Classification Concepts
• Bayesian classifiers are statistical classifiers.
• They can predict class membership probabilities such as
the probability that a given tuple belongs to a particular
class.
Classification: Bayes’ Theorem
Let X be a data tuple.
In Bayesian terms, X is considered “evidence.”
As usual, it is described by measurements made on a
set of n attributes.
Let H be some hypothesis such as that the data tuple
X belongs to a specified class C.
For classification problems, we want to determine
P(H|X), the probability that the hypothesis H holds
given the “evidence” or observed data tuple X.
In other words, we are looking for the probability that
tuple X belongs to class C, given that we know the
attribute description of X.
Classification: Bayes’ Theorem
• P(H|X) is the posterior probability, or a posteriori
probability, of H conditioned on X.
• For example, suppose our world of data tuples is
confined to customers described by the attributes
age and income, respectively, and that X is a 35-
year-old customer with an income of $40,000.
• Suppose that H is the hypothesis that our customer
will buy a computer.
• Then P(H|X) reflects the probability that customer X
will buy a computer given that we know the
customer’s age and income.
Classification: Bayes’ Theorem
In contrast, P(H) is the prior probability, or a priori
probability, of H.
For our example, this is the probability that any given
customer will buy a computer, regardless of age,
income, or any other information, for that matter.
The posterior probability, P(H|X), is based on more
information (e.g., customer information) than the prior
probability, P(H), which is independent of X.
Similarly, P(X|H) is the posterior probability of X
conditioned on H.
That is, it is the probability that a customer, X, is 35
years old and earns $40,000, given that we know the
customer will buy a computer.
P(X) is the prior probability of X.
Using our example, it is the probability that a person from our set of customers is 35 years old and earns $40,000.
Classification: Naïve Bayesian Classification
The naïve Bayesian classifier, or simple Bayesian
classifier, works as follows:
1. Let D be a training set of tuples and their associated class labels. As usual, each tuple is represented by an n-dimensional attribute vector, X = (x1, x2, ..., xn), depicting n measurements made on the tuple from n attributes, respectively, A1, A2, ..., An.
2. Suppose that there are m classes, C1, C2, ..., Cm. Given a tuple, X, the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X. That is, the naïve Bayesian classifier predicts that tuple X belongs to the class Ci if and only if
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i.
Classification: Naïve Bayesian Classification
3. Thus, we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the maximum a posteriori hypothesis. By Bayes’ theorem,
P(Ci|X) = P(X|Ci) P(Ci) / P(X).
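As a small numeric sketch of the theorem (all probabilities below are hypothetical placeholders, not values taken from the example data):

p_h = 0.6          # hypothetical prior P(H): a customer buys a computer
p_x_given_h = 0.2  # hypothetical likelihood P(X|H): a buyer is 35 and earns $40,000
p_x = 0.15         # hypothetical evidence P(X): any customer is 35 and earns $40,000

p_h_given_x = p_x_given_h * p_h / p_x   # posterior P(H|X) via Bayes' theorem
print(p_h_given_x)                      # 0.8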
Classification: Naïve Bayesian Classification
4. Given data sets with many attributes, it would be extremely computationally expensive to compute P(X|Ci). To reduce computation in evaluating P(X|Ci), the naïve assumption of class-conditional independence is made. This presumes that the attributes’ values are conditionally independent of one another, given the class label of the tuple (i.e., that there are no dependence relationships among the attributes). Thus,
P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci).
Classification: Naïve Bayesian Classification
5. To predict the class label of X, P(X|Ci)P(Ci) is evaluated for each class Ci. The classifier predicts that the class label of tuple X is the class Ci if and only if
P(X|Ci)P(Ci) > P(X|Cj)P(Cj) for 1 ≤ j ≤ m, j ≠ i.
Classification: Example
The data tuples are described by the attributes age,
income, student, and credit rating.
The class label attribute, buys_computer, has two distinct values, namely {yes, no}.
Let C1 correspond to the class buys_computer = yes and C2 correspond to buys_computer = no. The tuple we wish to classify is X.
Classification: Example
We need to maximize P(X|Ci)P(Ci), for i=1, 2.
P(Ci), the prior probability of each class, can be computed based on the training tuples as the fraction of training tuples belonging to each class.
Classification: Example
Similarly, the class-conditional probabilities P(xk|Ci) are computed for each attribute value of X, and P(X|Ci)P(Ci) is then evaluated for each class; X is assigned to the class giving the largest value.
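Putting steps 1 through 5 together, here is a compact sketch of a naïve Bayesian classifier. The tiny training set below is hypothetical (it is not the slide’s dataset); the attribute names simply echo the example:

from collections import Counter, defaultdict

# Hypothetical training tuples: (attribute-value dict, class label).
train = [
    ({"age": "youth",       "student": "yes"}, "yes"),
    ({"age": "youth",       "student": "no"},  "no"),
    ({"age": "senior",      "student": "yes"}, "yes"),
    ({"age": "senior",      "student": "no"},  "no"),
    ({"age": "middle aged", "student": "yes"}, "yes"),
]

# Estimate priors P(Ci) and conditionals P(xk|Ci) by relative frequency.
class_counts = Counter(label for _, label in train)
cond_counts = defaultdict(Counter)            # (class, attribute) -> value counts
for attrs, label in train:
    for a, v in attrs.items():
        cond_counts[(label, a)][v] += 1

def predict(x):
    # Return the class maximizing P(X|Ci) * P(Ci) under class-conditional independence.
    best, best_score = None, -1.0
    n = len(train)
    for c, cc in class_counts.items():
        score = cc / n                                # prior P(Ci)
        for a, v in x.items():
            score *= cond_counts[(c, a)][v] / cc      # P(xk|Ci)
        if score > best_score:
            best, best_score = c, score
    return best

print(predict({"age": "youth", "student": "yes"}))    # -> "yes"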
Naïve Bayes
Figure 1: The Train Dataset
Features: Classification vs. Clustering
What Is a Good Clustering?
Requirements for Clustering in Data Mining
• Scalability
• Ability to deal with different types of attributes
• Discovery of clusters with arbitrary shape
• Minimal domain knowledge required to determine input parameters
• Ability to deal with noise and outliers
• Insensitivity to order of input records
• Robustness with respect to high dimensionality
• Incorporation of user-specified constraints
• Interpretability and usability
Major Clustering Approaches
Partitioning approach:
Construct various partitions and then evaluate them by some criterion (e.g., minimizing the sum of squared errors)
Hierarchical approach:
Create a hierarchical decomposition of the set of data (or objects) using some criterion
Density-based approach:
Based on connectivity and density functions
K-Means Clustering
K-Means Clustering (contd.): Example
Figure: K-means with K = 2. Arbitrarily choose K objects as the initial cluster centers, assign each object to the most similar center, then update the cluster means; the assignment and update steps repeat until the clusters stabilize.
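A minimal sketch of the procedure the figure illustrates (the 2-D points and the helper name kmeans are placeholders, not the slide’s data):

import random

def kmeans(points, k, iters=100, seed=0):
    # Plain k-means on 2-D points: choose k initial centers arbitrarily,
    # assign each point to the nearest center, then recompute the means.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins the cluster with the closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: (p[0] - centers[j][0]) ** 2
                                            + (p[1] - centers[j][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:        # stop once the means no longer change
            break
        centers = new_centers
    return centers, clusters

pts = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]
print(kmeans(pts, k=2)[0])

Each iteration scans all n points against k centers, which is where the O(nkt) running time quoted in the comments below comes from.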
K-means Example
Comments on the K-Means Method
Strengths
• Relatively efficient: O(nkt), where n is the number of objects, k the number of clusters, and t the number of iterations. Normally, k, t << n.
Weaknesses
• Applicable only when the mean is defined (what about categorical data?)
• Need to specify k, the number of clusters, in advance
• Trouble with noisy data and outliers
• Not suitable for discovering clusters with non-convex shapes
Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used: the most centrally located object in a cluster.
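As a sketch of the medoid idea just mentioned (the cluster below is hypothetical):

def medoid(points):
    # The medoid is the most centrally located object: the point whose total
    # distance to all other points in the cluster is smallest.
    def total_dist(p):
        return sum(((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5 for q in points)
    return min(points, key=total_dist)

cluster = [(1, 1), (2, 1), (2, 2), (8, 9)]   # hypothetical cluster with one outlier
print(medoid(cluster))                       # (2, 2): the outlier does not drag it away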
Example: K-Means
Figure: Initial Choice of Centroids
Summary
Classification and prediction are two forms of
data analysis that can be used to extract models
describing important data classes or to predict future
data trends.
Effective and scalable methods have been developed for decision tree induction, naïve Bayesian classification, Bayesian belief networks, rule-based classifiers, etc.
No single method has been found to be superior over all others for all data sets.