UNIT - III

 Supervised learning (classification)


 Supervision: The training data (observations, measurements,
etc.) are accompanied by labels indicating the class of the
observations
 New data is classified based on the training set
 Unsupervised learning (clustering)
 The class labels of the training data are unknown
 Given a set of measurements, observations, etc., the aim is to
establish the existence of classes or clusters in the data
Issues regarding classification and prediction:
1. Preparing the data for classification and prediction
2. Comparing classification and prediction methods
 Decision tree induction is the learning of decision trees from
class-labeled training tuples.

 A decision tree is a flowchart-like tree structure where


 each internal node (non-leaf node) denotes a test on an attribute

 each branch represents an outcome of the test

 each leaf node (terminal node) holds a class label

 The topmost node in a tree is the root node

 Internal nodes are represented by rectangles


 leaf nodes are represented by ovals
 Given a tuple X for which the class label is unknown, the attribute
values of the tuple are tested against the decision tree.

 A path is traced from root to leaf node, which holds the class
prediction for that tuple.
 The decision tree induction algorithms are:
 ID3 (Iterative Dichotomiser)

 C4.5 (a successor of ID3)

 CART (Classification and Regression Trees)


The splitting criterion indicates the splitting attribute and may also indicate
either a split-point or a splitting subset.
 Tree growth stops when:
 Each leaf node contains examples of only one class
 The algorithm has run out of attributes to split on
 No further split yields a significant information gain


 The C4.5 algorithm introduces a number of improvements over the
original ID3 algorithm.
 The C4.5 algorithm can handle missing data.
 If the training records contain unknown attribute values, C4.5
evaluates the gain for an attribute by considering only the records
where the attribute is defined.
 Both categorical and continuous attributes are supported by C4.5
 Values of a continuous attribute are sorted and partitioned
 For the corresponding records of each partition, the gain is calculated,
and the partition that maximizes the gain is chosen for the next split.
 The ID3 algorithm may construct a deep and complex tree, which
can cause overfitting.
 The C4.5 algorithm addresses the overfitting problem in ID3 by using
a bottom-up technique called pruning to simplify the tree by removing
branches that contribute little to classification accuracy.
Similarly, the gain ratios for the other attributes (age, student, credit_rating)
are computed, and the attribute with the maximum gain ratio is selected as the
splitting attribute.
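As a minimal illustration of these computations, the following Python sketch derives entropy, information gain, and gain ratio from class counts. The class distribution [9, 5] comes from the example below; the per-partition counts passed to the functions are hypothetical placeholders, not values taken from the Table.

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, partitions):
    """Information gain of a split; partitions is a list of class-count lists."""
    total = sum(parent_counts)
    expected = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent_counts) - expected

def gain_ratio(parent_counts, partitions):
    """Gain ratio = information gain / split information (C4.5's criterion)."""
    total = sum(parent_counts)
    split_info = -sum((sum(p) / total) * log2(sum(p) / total)
                      for p in partitions if sum(p) > 0)
    return info_gain(parent_counts, partitions) / split_info

# Class distribution of D: 9 "yes" and 5 "no" tuples.
D = [9, 5]
# Hypothetical partition of D by some attribute (placeholder counts only).
partitions = [[3, 1], [4, 2], [2, 2]]
print(entropy(D))                 # about 0.940
print(info_gain(D, partitions))
print(gain_ratio(D, partitions))
```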
 Let D be the training data of the Table, where there are nine tuples
belonging to the class buys_computer = yes and the remaining five
tuples belong to the class buys_computer = no. A (root) node N is
created for the tuples in D.

 Gini index to compute the impurity of D:


 Gini(D) = 1 – (9/14)² – (5/14)² = 0.459

 To find the splitting criterion for the tuples in D, we need to
compute the Gini index for each attribute. Let’s start with the
attribute income and consider each of the possible splitting subsets.
Consider the subset {low, medium}. This would result in 10 tuples
in partition D1 satisfying the condition “income ∈ {low, medium}”.
The remaining four tuples of D would be assigned to partition D2.
The Gini index value computed based on this partitioning is
Gini_income∈{low,medium}(D) = (10/14)·Gini(D1) + (4/14)·Gini(D2)
Similarly, the Gini index values for the splits on the remaining subsets are:
for the subsets {low, high} and {medium}, the value is 0.47
for the subsets {medium, high} and {low}, the value is 0.34

Therefore, the best binary split for the attribute income is on


({medium, high} or {low}) because it minimizes the gini index.
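To make the arithmetic concrete, here is a small Python sketch that computes Gini(D) and the weighted Gini index of a binary split from class counts. The overall counts (9 yes, 5 no; partitions of size 10 and 4) follow the example above, but the yes/no breakdown within each partition is an assumption for illustration only.

```python
def gini(counts):
    """Gini impurity of a node given its class counts, e.g. [yes, no]."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_split(partitions):
    """Weighted Gini index of a split; partitions is a list of
    class-count lists, one per partition."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * gini(p) for p in partitions)

# Class distribution of D from the example: 9 "yes", 5 "no".
print(gini([9, 5]))          # 1 - (9/14)^2 - (5/14)^2 = 0.459

# Split income ∈ {low, medium} vs {high}: D1 has 10 tuples, D2 has 4,
# as in the example; the yes/no breakdown below is assumed.
D1, D2 = [7, 3], [2, 2]
print(gini_split([D1, D2]))  # (10/14)*Gini(D1) + (4/14)*Gini(D2)
```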
 Represent the knowledge in the form of IF-THEN rules
 One rule is created for each path from the root to a leaf
 Each attribute-value pair along a path forms a conjunction
 The leaf node holds the class prediction
 Rules are easier for humans to understand
 Example
IF age = “youth” AND student = “yes” THEN buys_computer = “yes”
IF age = “youth” AND student = “no” THEN buys_computer = “no”
IF age = “middle_aged” THEN buys_computer = “yes”
IF age = “senior” AND credit_rating = “fair” THEN buys_computer = “yes”
IF age = “senior” AND credit_rating = “excellent” THEN buys_computer = “no”
 Computationally inexpensive
 Outputs are easy to interpret – sequence of tests
 Show importance of each input variable
 Decision trees handle
 Both numerical and categorical attributes

 Categorical attributes with many distinct values

 Variables with nonlinear effect on outcome

 Variable interactions
 Overfitting can occur because each split reduces the training data
available for subsequent splits
NOTE: Tree pruning methods address the problem of overfitting
Definition: Tree pruning attempts to identify and remove branches that
reflect anomalies, with the goal of improving classification accuracy
on unseen data.

 Performance is poor if the dataset contains many irrelevant variables


 The generated tree may overfit the training data
 Too many branches, some may reflect anomalies due to noise
or outliers
 The result is poor accuracy for unseen samples
 Two approaches to avoid overfitting
 Prepruning: Halt tree construction early—do not split a node if
this would result in the goodness measure falling below a
threshold
 Difficult to choose an appropriate threshold
 Postpruning: Remove branches from a “fully grown” tree—get
a sequence of progressively pruned trees
 Use a set of data different from the training data to decide
which is the “best pruned tree”
 Allow for continuous-valued attributes
 Dynamically define new discrete-valued attributes that
partition the continuous attribute value into a discrete set of
intervals
 Handle missing attribute values
 Assign the most common value of the attribute

 Assign probability to each of the possible values

 Attribute construction
 Create new attributes based on existing ones that are sparsely
represented
 This reduces fragmentation, repetition, and replication
 Classification—a classical problem extensively studied by
statisticians and machine learning researchers
 Scalability: Classifying data sets with millions of examples and
hundreds of attributes with reasonable speed
 Why decision tree induction in data mining?
 relatively faster learning speed (than other classification
methods)
 convertible to simple and easy to understand classification
rules
 can use SQL queries for accessing databases

 comparable classification accuracy with other methods


 SLIQ (Supervised Learning in Quest) - builds an index for each
attribute; only the class list and the current attribute list reside in
memory
 SPRINT (Scalable PaRallelizable INduction of decision Trees) -
constructs an attribute list data structure
 PUBLIC (VLDB’98 — Rastogi & Shim) - integrates tree splitting
and tree pruning: stop growing the tree earlier
 RainForest (VLDB’98 — Gehrke, Ramakrishnan & Ganti)
 separates the scalability aspects from the criteria that determine

the quality of the tree


 maintains an AVC-list (attribute, value, class label) for each

attribute
 BOAT (Bootstrapped Optimistic Algorithm for Tree Construction) -
not based on any special data structures but uses a technique known
as “bootstrapping”
 A statistical classifier: performs probabilistic prediction i.e.,
predicts class membership probabilities

 Foundation: Based on Bayes’ theorem (named after Thomas


Bayes)

 Performance: A simple Bayesian classifier known as the naïve
Bayesian classifier has performance comparable with decision tree
and selected neural network classifiers.

 Class Conditional Independence: Naive Bayesian classifiers


assume that the effect of an attribute value on a given class is
independent of the values of other attributes.
 This assumption is made to simplify the computations
 Let X be a data sample (tuple), called the evidence
 Let H be a hypothesis (our prediction) that X belongs to class C
 Classification is to determine P(H | X), the probability that the
hypothesis H holds given the evidence or observed data tuple X
 Example: Customer X will buy a computer given the
customer’s age and income
 P(H) (prior probability), the initial probability
 E.g., the probability that X will buy a computer, regardless of age,
income, or any other information
 P(X): probability that sample data is observed
 P(X | H) (posterior probability), the probability of observing the
sample X, given that the hypothesis holds
 E.g., given that X will buy a computer, the probability that X is
31...40 years old with medium income
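As a minimal numeric sketch, Bayes’ theorem combines these quantities as P(H|X) = P(X|H)·P(H) / P(X). The probability values below are hypothetical placeholders, used only to show the arithmetic.

```python
def bayes(p_x_given_h, p_h, p_x):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return p_x_given_h * p_h / p_x

# Hypothetical values: P(X|H) = 0.3, P(H) = 0.5, P(X) = 0.2
print(bayes(0.3, 0.5, 0.2))  # 0.75
```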
 The naïve Bayesian classifier, or simple Bayesian classifier, works
as follows:
 1.Let D be a training set of tuples and their associated class labels.
As usual, each tuple is represented by an n-dimensional attribute
vector, X = (x1, x2, …,xn), depicting n measurements made on the
tuple from n attributes, respectively, A1, A2, …, An.
 2.Suppose that there are m classes, C1, C2, …, Cm. Given a tuple,
X, the classifier will predict that X belongs to the class having the
highest posterior probability, conditioned on X. That is, the naïve
Bayesian classifier predicts that tuple X belongs to the class Ci if
and only if P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i.
 Thus we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized
is called the maximum posteriori hypothesis. By Bayes’ theorem,
P(Ci|X) = P(X|Ci) P(Ci) / P(X)
 3. As P(X) is constant for all classes, only P(X|Ci)P(Ci) need be


maximized. If the class prior probabilities are not known, then it is
commonly assumed that the classes are equally likely, that is, P(C1) =
P(C2) = …= P(Cm), and we would therefore maximize P(X|Ci).
Otherwise, we maximize P(X|Ci)P(Ci).

 4. Given data sets with many attributes, it would be extremely


computationally expensive to compute P(X|Ci). In order to reduce
computation in evaluating P(X|Ci), the naive assumption of class
conditional independence is made. This presumes that the values of the
attributes are conditionally independent of one another, given the class
label of the tuple. Thus, P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci).
 We can easily estimate the probabilities P(x1|Ci), P(x2|Ci), …,
P(xn|Ci) from the training tuples. For each attribute, we look at
whether the attribute is categorical or continuous-valued. For
instance, to compute P(X|Ci), we consider the following:

 If Ak is categorical, then P(xk|Ci) is the number of tuples of


class Ci in D having the value xk for Ak, divided by |Ci,D|, the
number of tuples of class Ci in D.

 If Ak is continuous-valued, then we need to do a bit more work,


but the calculation is pretty straightforward.
 A continuous-valued attribute is typically assumed to have a
Gaussian distribution with a mean μ and standard deviation σ,
defined by g(x, μ, σ) = (1 / (√(2π) σ)) · e^(−(x − μ)² / (2σ²)),
so that P(xk|Ci) = g(xk, μCi, σCi).
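A small illustrative Python sketch of this Gaussian estimate; the mean and standard deviation would normally be computed from the training tuples of class Ci, and the numbers used here are hypothetical.

```python
from math import exp, pi, sqrt

def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma) used to estimate P(xk|Ci)
    for a continuous-valued attribute."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

# Hypothetical example: attribute age in class Ci has mean 38, std dev 12.
print(gaussian(35, 38.0, 12.0))
```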

5. In order to predict the class label of X, P(X|Ci)P(Ci) is evaluated
for each class Ci. The classifier predicts that the class label of
tuple X is the class Ci if and only if
P(X|Ci)P(Ci) > P(X|Cj)P(Cj) for 1 ≤ j ≤ m, j ≠ i.
Classify the tuple
X=(age=youth, income=medium, student=yes, credit_rating=fair)
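A hedged Python sketch of how this classification could be carried out for categorical attributes. It assumes the training data is available as a list of (attribute-dict, class-label) pairs matching the Table, which is not reproduced here.

```python
from collections import Counter

def naive_bayes_classify(train, x):
    """train: list of (attributes_dict, class_label); x: attributes_dict.
    Returns the class maximizing P(Ci) * prod_k P(xk|Ci) (categorical only)."""
    class_counts = Counter(label for _, label in train)
    best_class, best_score = None, -1.0
    for ci, ci_count in class_counts.items():
        score = ci_count / len(train)            # prior P(Ci)
        for attr, value in x.items():
            # P(xk|Ci): fraction of class-Ci tuples having this attribute value
            matches = sum(1 for attrs, label in train
                          if label == ci and attrs.get(attr) == value)
            score *= matches / ci_count
        if score > best_score:
            best_class, best_score = ci, score
    return best_class

# Usage with the tuple X above (assuming `train` holds the Table's tuples):
X = {"age": "youth", "income": "medium", "student": "yes",
     "credit_rating": "fair"}
# predicted = naive_bayes_classify(train, X)
```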
 A BBN is a probabilistic Graphical Model that represents
conditional dependencies between random variables through a
Directed Acyclic Graph (DAG).
 The graph consists of nodes and arcs.
 The nodes represent variables, which can be discrete or
continuous.
 The arcs represent causal relationships between variables.
 BBNs are also called belief networks, Bayesian networks, and
probabilistic networks.
 BBNs enable us to model and reason about uncertainty
 BBNs represent joint probability distribution
 Two types of probabilities are used

 Joint Probability

 Conditional probability
 These probabilities can help us make an inference.
 A belief network is defined by two components:
 A directed acyclic graph encoding the dependence
relationships among set of variables
 A set of conditional probability tables (CPT) associating each
node to its immediate parent nodes.
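As a rough illustration of these two components, a belief network can be stored as a DAG plus CPTs, and the joint probability of an assignment factorizes as the product of each node’s probability given its parents. The network structure and probability values below are hypothetical, chosen only to show the mechanics.

```python
# Hypothetical two-node network: Rain -> WetGrass, with CPTs as dicts.
parents = {"Rain": [], "WetGrass": ["Rain"]}
cpt = {
    "Rain": {(): 0.2},                          # P(Rain=True)
    "WetGrass": {(True,): 0.9, (False,): 0.1},  # P(WetGrass=True | Rain)
}

def prob_true(node, assignment):
    """P(node=True | parents), looked up from its CPT."""
    key = tuple(assignment[p] for p in parents[node])
    return cpt[node][key]

def joint(assignment):
    """Joint probability = product over nodes of P(node | its parents)."""
    p = 1.0
    for node, value in assignment.items():
        pt = prob_true(node, assignment)
        p *= pt if value else (1.0 - pt)
    return p

print(joint({"Rain": True, "WetGrass": True}))  # 0.2 * 0.9 = 0.18
```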
 When given a training tuple, a lazy learner simply stores it and
waits until it is given a test tuple.

 They are also referred to as instance-based learners.

 Examples of lazy learners


 k-nearest neighbour classifiers
 k-NN is a supervised machine learning algorithm
 Nearest-neighbour classifiers are based on learning by analogy
i.e., by comparing a given test tuple with training tuples that are
similar to it.
 Intuition: Given some training data and a new data point, we
assign the new data point the class of the training data it is
nearest to.
 Simplest of all machine learning algorithms
 No explicit training required.
 Can be used both for classification and regression.
 The training tuples are described by ‘n’ attributes where each tuple
represents a point in an n-dimensional space. In this way all of the
training tuples are stored in an n-dimensional pattern space.
 When given an unknown tuple, a k-nearest-neighbour classifier
searches the pattern space for the k-training tuples that are closest
to the unknown tuple.
 Closeness is defined in terms of a distance metric: such as
Euclidean distance.
 Euclidean distance between two points or tuples, say
X1 = (x11, x12, …, x1n) and X2 = (x21, x22, …, x2n), is
dist(X1, X2) = √((x11 − x21)² + (x12 − x22)² + … + (x1n − x2n)²)
 How can distance be computed for attributes that are not numeric but
categorical, such as color?
 The distance measure above assumes that the attributes used to
describe the tuples are all numeric.
 For categorical attributes, a simple method is to compare the
corresponding value of the attribute in tuple X1 with that in
tuple X2. If the two are identical (e.g., tuples X1 and X2 both
have the color blue), then the difference between the two is
taken as 0.
 If the two are different (e.g., tuple X1 is blue but tuple X2 is
red), then the difference is considered to be 1.
Name Age Gender Sport
Ajay 32 M Football
Mark 40 M Neither
Sara 16 F Cricket
Zaira 34 F Cricket
Sachin 55 M Neither
Rahul 40 M Cricket
Pooja 20 F Neither
Smith 15 M Cricket
Michael 15 M Football
Angelina 5 F ?

k = 3; Gender encoded as Male = 0, Female = 1


Name Age Gender Distance Class of Sport
Ajay 32 0 27.02 Football
Mark 40 0 35.01 Neither
Sara 16 1 11.00 Cricket
Zaira 34 1 29.00 Cricket
Sachin 55 0 50.00 Neither
Rahul 40 0 35.01 Cricket
Pooja 20 1 15.00 Neither
Smith 15 0 10.04 Cricket
Michael 15 0 10.04 Football

k = 3, so the 3 closest records to Angelina are:
Smith 10.04 Cricket
Michael 10.04 Football
Sara 11.00 Cricket
2 Cricket > 1 Football, so Angelina’s class of sport is Cricket
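The worked example above can be reproduced with a short Python sketch; the data and the Male = 0 / Female = 1 encoding follow the tables above, and the function is an illustration rather than part of the original material.

```python
from math import sqrt
from collections import Counter

# (name, age, gender, sport) with gender encoded Male = 0, Female = 1.
train = [
    ("Ajay", 32, 0, "Football"), ("Mark", 40, 0, "Neither"),
    ("Sara", 16, 1, "Cricket"),  ("Zaira", 34, 1, "Cricket"),
    ("Sachin", 55, 0, "Neither"), ("Rahul", 40, 0, "Cricket"),
    ("Pooja", 20, 1, "Neither"), ("Smith", 15, 0, "Cricket"),
    ("Michael", 15, 0, "Football"),
]

def knn_predict(query, k=3):
    """Return the majority sport among the k nearest training tuples,
    using Euclidean distance over age and the encoded gender."""
    q_age, q_gender = query
    dists = sorted(
        (sqrt((age - q_age) ** 2 + (gender - q_gender) ** 2), sport)
        for _, age, gender, sport in train
    )
    votes = Counter(sport for _, sport in dists[:k])
    return votes.most_common(1)[0][0]

# Angelina: age 5, gender F = 1
print(knn_predict((5, 1)))  # Cricket (neighbours: Smith, Michael, Sara)
```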
