
Classification and Regression Trees (CART - I)

Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT STUDIES

Agenda

• Introduction to Classification and Regression Trees


• Attribute selection measures – Introduction

Introduction

• Classification is one form of data analysis that can be used to extract models describing important data classes or to predict future data trends
• Classification predicts categorical (discrete, unordered) labels, whereas regression analysis is a statistical methodology most often used for numeric (continuous) prediction
• For example, we can build a classification model to categorize bank loan applications as either safe or risky
• A regression model could be used to predict the expenditures (in dollars) of potential customers on computer equipment, given their income and occupation

Problem Description for Illustration

Han, J., Pei, J. and Kamber, M., 2011. Data Mining: Concepts and Techniques. Elsevier.

Root Node, Internal Node, Child Node

• A decision tree uses a tree structure to represent a number of possible decision paths and an outcome for each path
• A decision tree consists of a root node, internal nodes and leaf nodes
• The topmost node in a tree is the root node or parent node
• It represents the entire sample population
• An internal node (non-leaf node) denotes a test on an attribute; each branch represents an outcome of the test
• A leaf node (or terminal node or child node) holds a class label
• It cannot be split further

[Figure: a tree diagram labelling the root (parent) node, an internal node and a child (leaf) node]

Decision Tree Introduction

• A decision tree for the concept buys_computer, indicating whether a customer at All Electronics is likely to purchase a computer
• Each internal (non-leaf) node represents a test on an attribute
• Each leaf node represents a class (either buys_computer = yes or buys_computer = no)

Figure 1.1: Decision Tree


CART Introduction

• CART is a supervised learning technique
• CART adopts a greedy (i.e., non-backtracking) approach in which decision trees are constructed in a top-down, recursive, divide-and-conquer manner
• It is a highly interpretable model

Decision Tree Algorithm
Input:
• Data partition, D, which is a set of training tuples and their associated class labels
• Attribute list, the set of candidate attributes
• Attribute selection method, a procedure to determine the splitting criterion that “best” partitions the data tuples into individual classes. This criterion consists of a splitting attribute and, possibly, either a split-point or a splitting subset.

Output: A decision tree

Decision Tree Algorithm

• The algorithm is called with three parameters: D, attribute list, and Attribute selection method
• D is defined as a data partition. Initially, it is the complete set of training tuples and their associated class labels
• The parameter attribute list is a list of attributes (independent variables) describing the tuples
• Attribute selection method specifies a heuristic procedure for selecting the attribute that “best” discriminates the given tuples according to class

Decision Tree Algorithm

• This procedure employs an attribute selection measure, such as information gain, gain ratio or the Gini index
• Whether the tree is strictly binary is generally driven by the attribute selection measure
• Some attribute selection measures, such as the Gini index, force the resulting tree to be binary. Others, like information gain, do not, thereby allowing multiway splits (i.e., two or more branches grown from a node).

Decision Tree Method

[Figure: pseudocode of the decision tree induction algorithm (steps 1 to 15), where N denotes a node, C a class, and D the set of training tuples]

Decision Tree Method step 1 to 6
• The tree starts as a single node, N,
representing the training tuples in D (step
1).
• If the tuples in D are all of the same class,
then node N becomes a leaf and is
labelled with that class (steps 2 and 3)
• Steps 4 and 5 are terminating conditions
• Otherwise, the algorithm calls Attribute
selection method to determine the
splitting criterion
• The splitting criterion (like Gini) tells us
which attribute to test at node N by
determining the “best” way to separate
or partition the tuples in D into individual
classes (step 6)

Decision Tree Method - Step 7 - 11
• The splitting criterion indicates the splitting
attribute and may also indicate either a
split-point or a splitting subset
• The splitting criterion is determined so
that, ideally, the resulting partitions at each
branch are as “pure” as possible. A
partition is pure if all of the tuples in it
belong to the same class.
• The node N is labelled with the splitting
criterion, which serves as a test at the node
(step 7).
• A branch is grown from node N for each of
the outcomes of the splitting criterion.
• The tuples in D are partitioned accordingly
(steps 10 to 11)

Three possibilities for partitioning tuples based on the
splitting criterion
• There are three possible scenarios, as illustrated in Figures (a), (b) and (c)
• Let A be the splitting attribute. A has v distinct values, {a1, a2, ..., av}, based on the training data
• If A is discrete-valued (Figure (a)), then one branch is grown for each known value of A

Figure (a)
Three possibilities for partitioning tuples based on the
splitting criterion
• If A is continuous-valued (Figure (b)), then two branches are grown, corresponding to A ≤ split point and A > split point
• Here, split point is the split-point returned by the Attribute selection method as part of the splitting criterion

Figure (b)

Three possibilities for partitioning tuples based on the
splitting criterion
• If A is discrete-valued and a binary tree must be produced (Figure (c)), then the test is of the form A ∈ 𝑆𝐴, where 𝑆𝐴 is the splitting subset for A

Figure (c)

Decision Tree Method – termination condition

• The algorithm uses the same process recursively to form a decision tree for the tuples at each resulting partition, 𝐷𝑗, of D (step 14).
• The recursive partitioning stops only when any one of the following terminating conditions is true:
1. All of the tuples in partition D (represented at node N) belong to the same class (steps 2 and 3), or

Decision Tree Method – termination condition
2. There are no remaining attributes on which the tuples may be further partitioned (step 4).
• In this case, majority voting is employed (step 5).
• This involves converting node N into a leaf and labelling it with the most common class in D.
• Alternatively, the class distribution of the node tuples may be stored.
3. There are no tuples for a given branch, that is, a partition 𝐷𝑗 is empty (step 12).
• In this case, a leaf is created with the majority class in D (step 13).
• The resulting decision tree is returned (step 15); a minimal code sketch of the whole recursive procedure follows below.
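
The following is a minimal Python sketch of the recursive procedure described in steps 1 to 15 above, not the textbook pseudocode itself. The tuple/dictionary representation, the build_tree name and the pluggable attribute_selection_method argument are illustrative assumptions, and only the discrete, multiway-split case from Figure (a) is shown.

```python
from collections import Counter

def build_tree(D, attribute_list, attribute_selection_method):
    """Recursive divide-and-conquer tree construction (illustrative sketch).

    D is a list of (row, class_label) pairs, where row is a dict mapping
    attribute names to values; attribute_selection_method(D, attribute_list)
    returns the "best" splitting attribute (e.g., by information gain or Gini).
    """
    labels = [label for _, label in D]

    # Steps 2-3: all tuples in D belong to the same class -> leaf node
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}

    # Steps 4-5: no remaining attributes -> leaf labelled with the majority class
    if not attribute_list:
        return {"leaf": Counter(labels).most_common(1)[0][0]}

    # Step 6: let the attribute selection measure choose the splitting attribute
    A = attribute_selection_method(D, attribute_list)

    # Step 7: node N is labelled with the splitting criterion
    node = {"attribute": A, "branches": {}}
    remaining = [a for a in attribute_list if a != A]

    # Steps 8-13: grow one branch per outcome of the test on A.  In the textbook
    # algorithm the outcomes are all known values of A in the full training data,
    # which is how an empty partition Dj (step 12) can arise; here we iterate over
    # the values present in D, so the empty-partition case is kept only for fidelity.
    for value in {row[A] for row, _ in D}:
        Dj = [(row, label) for row, label in D if row[A] == value]
        if not Dj:
            # Step 13: empty partition -> leaf with the majority class of D
            node["branches"][value] = {"leaf": Counter(labels).most_common(1)[0][0]}
        else:
            # Step 14: recurse on each resulting partition Dj
            node["branches"][value] = build_tree(Dj, remaining,
                                                 attribute_selection_method)

    # Step 15: return the (sub)tree rooted at this node
    return node
```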

Attribute Selection Measures

• Attribute selection measures are also known as splitting rules because they determine how the tuples at a given node are to be split
• An attribute selection measure is a heuristic for selecting the splitting criterion that “best” separates a given data partition, D, of class-labelled training tuples into individual classes
• The attribute selection measure provides a ranking for each attribute describing the given training tuples
• The attribute having the best score for the measure is chosen as the splitting attribute for the given tuples

Attribute Selection Measures
• If the splitting attribute is continuous-valued or if we are restricted to binary trees, then, respectively, either a ‘split-point’ or a ‘splitting subset’ must also be determined as part of the splitting criterion

• There are three popular attribute selection measures:
  – information gain,
  – gain ratio, and
  – Gini index

• The CART algorithm uses the information gain and Gini index measures for attribute selection

Information Gain

• This measure originates in information theory, where it was used to study the value or “information content” of messages
• The attribute with the highest information gain is chosen as the splitting attribute for node N
• This attribute minimizes the information needed to classify the tuples in the resulting partitions and reflects the least randomness or “impurity” in these partitions
• This approach minimizes the expected number of tests needed to classify a given tuple

Information Gain-Entropy Measure
• The expected information needed to classify a tuple in D is given by

  Info(D) = − Σᵢ₌₁ᵐ 𝑝𝑖 log₂(𝑝𝑖)

• Here 𝑝𝑖 is the probability that an arbitrary tuple in D belongs to class 𝐶𝑖 and is estimated by |𝐶𝑖,𝐷|/|D|
• A log function to base 2 is used because the information is encoded in bits
• Info(D) (the entropy of D) is just the average amount of information needed to identify the class label of a tuple in D

Attribute Selection Measures

• It is quite likely that the partitions will be impure (e.g., a partition may contain a collection of tuples from different classes rather than from a single class)
• How much more information would we still need (after the partitioning) in order to arrive at an exact classification?
• This amount is measured by

  𝐼𝑛𝑓𝑜𝐴(D) = Σⱼ₌₁ᵛ ( |𝐷𝑗| / |D| ) × Info(𝐷𝑗)

• The term |𝐷𝑗| / |D| acts as the weight of the 𝑗th partition. 𝐼𝑛𝑓𝑜𝐴(D) is the expected information required to classify a tuple from D based on the partitioning by A.
Information Gain

• The smaller the expected information (still) required, the greater the purity of the partitions
• Information gain is defined as the difference between the original information requirement (i.e., based on just the proportion of classes) and the new requirement (i.e., obtained after partitioning on A). That is,

  Gain(A) = Info(D) − 𝐼𝑛𝑓𝑜𝐴(D)

• The attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at node N
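
As a small illustration of the formulas above, the sketch below computes Info(D), 𝐼𝑛𝑓𝑜𝐴(D) and Gain(A) for a toy data set. The function names and the toy "student" attribute are assumptions made for this example, not part of the lecture.

```python
import math
from collections import Counter

def info(labels):
    """Info(D): expected bits needed to classify a tuple, -sum(p_i * log2(p_i))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_A(rows, labels, A):
    """Info_A(D): weighted entropy of the partitions obtained by splitting on A."""
    n = len(labels)
    total = 0.0
    for value in {row[A] for row in rows}:
        subset = [label for row, label in zip(rows, labels) if row[A] == value]
        total += (len(subset) / n) * info(subset)
    return total

def gain(rows, labels, A):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(labels) - info_A(rows, labels, A)

# Toy example with two classes ('yes'/'no') and one attribute 'student'
rows = [{"student": "yes"}, {"student": "yes"}, {"student": "no"}, {"student": "no"}]
labels = ["yes", "yes", "no", "yes"]
print(round(gain(rows, labels, "student"), 3))  # 0.311
```

On this toy data, Info(D) ≈ 0.811 bits and 𝐼𝑛𝑓𝑜student(D) = 0.5 bits, so Gain(student) ≈ 0.311.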

Gini Index
• The Gini index measures the impurity of D, a data partition or set of training tuples, as

  Gini(D) = 1 − Σᵢ₌₁ᵐ 𝑝𝑖²

• Here 𝑝𝑖 is the probability that a tuple in D belongs to class 𝐶𝑖 and is estimated by |𝐶𝑖,𝐷|/|D|
• The sum is computed over the m classes
• The Gini index considers a binary split for each attribute

Gini Index

• When considering a binary split, we compute a weighted sum of the impurity of each resulting partition
• For example, if a binary split on A partitions D into 𝐷1 and 𝐷2, the Gini index of D given that partitioning is

  𝐺𝑖𝑛𝑖𝐴(D) = ( |𝐷1| / |D| ) Gini(𝐷1) + ( |𝐷2| / |D| ) Gini(𝐷2)

• For each attribute, each of the possible binary splits is considered
• For a discrete-valued attribute, the subset that gives the minimum Gini index for that attribute is selected as its splitting subset

Gini Index
• For continuous-valued attributes, each possible split-point must be considered
• The strategy is similar: the midpoint between each pair of (sorted) adjacent values is taken as a possible split-point
• For a possible split-point of A, 𝐷1 is the set of tuples in D satisfying A ≤ split point, and 𝐷2 is the set of tuples in D satisfying A > split point
• The reduction in impurity that would be incurred by a binary split on a discrete- or continuous-valued attribute A is

  ΔGini(A) = Gini(D) − 𝐺𝑖𝑛𝑖𝐴(D)

• The attribute that maximizes the reduction in impurity (or, equivalently, has the minimum Gini index) is selected as the splitting attribute
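
A corresponding sketch for the Gini index: it computes Gini(D), the binary-split 𝐺𝑖𝑛𝑖𝐴(D) for a continuous attribute, and the reduction in impurity ΔGini(A), trying the midpoint of each pair of adjacent sorted values as a candidate split-point. The function names and the toy income data are illustrative assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum(p_i^2) over the m classes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(values, labels, split_point):
    """Gini_A(D) for the binary split A <= split_point vs. A > split_point."""
    left = [lab for v, lab in zip(values, labels) if v <= split_point]
    right = [lab for v, lab in zip(values, labels) if v > split_point]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

def best_split_point(values, labels):
    """Try the midpoint of each pair of adjacent sorted values and keep the one
    that maximizes the reduction in impurity, i.e. Gini(D) - Gini_A(D)."""
    base = gini(labels)
    candidates = sorted(set(values))
    best = None
    for lo, hi in zip(candidates, candidates[1:]):
        mid = (lo + hi) / 2
        reduction = base - gini_split(values, labels, mid)
        if best is None or reduction > best[1]:
            best = (mid, reduction)
    return best  # (split_point, delta_gini)

# Toy example: continuous income values with class labels
print(best_split_point([20, 35, 50, 80], ["no", "no", "yes", "yes"]))  # (42.5, 0.5)
```

On the toy data the best split-point is 42.5 with ΔGini = 0.5, i.e., the split produces two perfectly pure partitions.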

Which attribute selection measure is the best?

• All measures have some bias
• The time complexity of decision tree induction generally increases exponentially with tree height
• Hence, measures that tend to produce shallower trees (e.g., with multiway rather than binary splits, and that favour more balanced splits) may be preferred
• However, some studies have found that shallow trees tend to have a large number of leaves and higher error rates
• Several comparative studies suggest that no one attribute selection measure is significantly superior to the others
Tree Pruning

• When a decision tree is built, many of the branches will reflect anomalies in the training data due to noise or outliers
• Tree pruning uses statistical measures to remove the least reliable branches
• Pruned trees tend to be smaller and less complex and, thus, easier to comprehend
• They are usually faster and better at correctly classifying independent test data than unpruned trees

How does Tree Pruning Work?

• There are two common approaches to tree pruning: pre-pruning and post-
pruning.
• In the pre-pruning approach, a tree is “pruned” by halting its construction early (e.g., by deciding not to further split or partition the subset of training tuples at a given node)
• When constructing a tree, measures such as statistical significance, information gain, or the Gini index can be used to assess the goodness of a split (see the sketch below)
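
As a concrete, purely illustrative example of a pre-pruning rule, the check below halts construction at a node when the partition is too small or the best achievable split is too weak. The threshold names and values are assumptions made for this sketch, not values given in the lecture.

```python
# Illustrative pre-pruning thresholds (assumed values, not from the lecture)
MIN_TUPLES = 5     # do not split a node holding fewer training tuples than this
MIN_SCORE = 0.01   # do not split if the best attribute's score (e.g., information
                   # gain or Gini reduction) falls below this threshold

def should_pre_prune(D, best_score):
    """Return True if tree construction should halt at this node, turning it
    into a leaf labelled with the majority class of D."""
    return len(D) < MIN_TUPLES or best_score < MIN_SCORE
```

In the earlier build_tree sketch this check would sit just after the attribute selection step (step 6), converting the node into a majority-class leaf whenever it returns True.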

How does Tree Pruning Work?

• The post-pruning approach removes subtrees from a “fully grown” tree
• A subtree at a given node is pruned by removing its branches and replacing it with a leaf
• The leaf is labelled with the most frequent class among the subtree being replaced
• For example, consider the subtree at node “A3?” in the unpruned tree of Figure 1.2
• The most common class within this subtree is “class B”
• In the pruned version of the tree, the subtree in question is replaced with the leaf “class B”, as sketched below
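
Below is a minimal sketch of subtree replacement using reduced-error-style post-pruning on a held-out pruning set, reusing the dictionary tree representation from the earlier build_tree sketch. The pruning criterion (keep the leaf if accuracy on the pruning set does not drop) is one common choice and an assumption here; the slides do not commit to a specific criterion, and CART itself typically uses cost-complexity pruning. The majority class of a subtree is approximated by its most common leaf label, since this sketch's nodes do not store training counts.

```python
from collections import Counter

def classify(tree, row):
    """Follow the branches of a tree built by build_tree down to a leaf label."""
    while "leaf" not in tree:
        tree = tree["branches"].get(row[tree["attribute"]], {"leaf": None})
    return tree["leaf"]

def accuracy(tree, data):
    """Fraction of (row, label) pairs in data that the tree classifies correctly."""
    return sum(classify(tree, row) == label for row, label in data) / len(data)

def subtree_majority(tree, counts=None):
    """Most frequent class among the leaves of a subtree (an approximation of the
    most frequent training class, since leaf counts are not stored in this sketch)."""
    counts = Counter() if counts is None else counts
    if "leaf" in tree:
        counts[tree["leaf"]] += 1
    else:
        for child in tree["branches"].values():
            subtree_majority(child, counts)
    return counts.most_common(1)[0][0]

def post_prune(tree, prune_set):
    """Bottom-up subtree replacement: collapse a subtree to a majority-class
    leaf whenever that does not reduce accuracy on the pruning set."""
    if "leaf" in tree or not prune_set:
        return tree
    # Route each pruning example down the branch selected by its attribute value,
    # then prune the children on their own share of the pruning data.
    A = tree["attribute"]
    for value, child in tree["branches"].items():
        subset = [(row, label) for row, label in prune_set if row[A] == value]
        tree["branches"][value] = post_prune(child, subset)
    # Candidate replacement: a single leaf labelled with the subtree's majority class
    leaf = {"leaf": subtree_majority(tree)}
    if accuracy(leaf, prune_set) >= accuracy(tree, prune_set):
        return leaf
    return tree
```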

How does Tree Pruning Work?

Figure 1.2: An unpruned decision tree and a post-pruned decision tree

THANK YOU
