Chapter – 6
Decision Tree Learning
6.1 INTRODUCTION TO DECISION TREE LEARNING MODEL
• Decision tree learning model, one of the most popular supervised
predictive learning models, classifies data instances with high
accuracy and consistency.
• The model performs an inductive inference that reaches a general
conclusion from observed examples.
• This model is used for solving a variety of complex classification applications.
• A decision tree is a concept tree that summarizes the information contained in the training dataset in the form of a tree structure.
• Once the concept model is built, test data can be easily classified.
• This model can be used to classify both categorical target
variables and continuous-valued target variables.
• Given a training dataset X, this model computes a hypothesis function f(X) in the form of a decision tree.
• Inputs to the model are data instances or objects with a set of features or attributes, which can be discrete or continuous, and the output of the model is a decision tree which predicts or classifies the target class for a test data object (a minimal usage sketch follows this list).
• In statistical terms, attributes or features are called independent variables.
• The target feature or target class is called the response variable, which indicates the category we need to predict for a test object.
• The decision tree learning model generates a complete hypothesis space in the form of a tree structure from the given training dataset and allows us to search through the set of possible hypotheses, preferring smaller decision trees as we walk through the tree.
• This kind of search bias is called a preference bias.
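The bullets above describe the model's input/output contract: training instances with features go in, and a tree-shaped hypothesis f(X) comes out that classifies test objects. The following is a minimal sketch of that workflow; scikit-learn's DecisionTreeClassifier and the toy data are assumptions made only for illustration, not the chapter's own example.

from sklearn.tree import DecisionTreeClassifier

# Toy training data: each row is a data instance with two features
# (independent variables); y_train holds the target class (response variable).
X_train = [[25, 40000], [35, 60000], [45, 80000], [20, 20000]]
y_train = ["No", "Yes", "Yes", "No"]

# Fit the hypothesis f(X) as a decision tree from the training dataset.
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X_train, y_train)

# Classify a test data object.
print(model.predict([[30, 50000]]))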
6.1.1 Structure of a Decision Tree
• A decision tree has a structure that consists of a root node, internal
nodes/decision nodes, branches, and terminal nodes/leaf nodes.
The topmost node in the tree is the root node.
• Internal nodes are the test nodes and are also called decision nodes. These nodes represent a choice or test of an input attribute, and the outcomes of the test condition are the branches emanating from the decision node.
• The branches are labelled as per the outcomes or output values of the test condition. Each branch represents a sub-tree or sub-section of the entire tree.
• Every decision node is part of a path to a leaf node. The leaf nodes
represent the labels or the outcome of a decision path. The labels of
the leaf nodes are the different target classes a data instance can
belong to.
• Every path from the root to a leaf node represents a logical rule that corresponds to a conjunction of attribute tests, and the whole tree represents a disjunction of these conjunctions (a short sketch at the end of this subsection illustrates this).
• The decision tree model, in general, represents a collection of logical classification rules in the form of a tree structure.
• Decision networks, otherwise called influence diagrams, have a directed graph structure with nodes and links.
• A decision network is an extension of Bayesian belief networks that represents information about each node's current state, its possible actions, the possible outcomes of those actions, and their utility.
• Figure 6.1 shows symbols that are used in this book to
represent different nodes in the construction of a decision tree.
• A circle is used to represent a root node, a diamond symbol is
used to represent a decision node or the internal nodes, and all
leaf nodes are represented with a rectangle.
• A decision tree involves two major procedures: building the tree from the training dataset and using it to classify (infer the class of) test instances.
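To make the path-as-rule view concrete, here is a small sketch that walks a hypothetical weather-style tree (represented as nested Python dictionaries, an assumption made only for illustration) and prints one IF-THEN rule per root-to-leaf path; the printed rules together form the disjunction of conjunctions described above.

# Hypothetical tree: each internal node is a dict keyed by its test attribute,
# branches are the attribute's outcome values, and leaves are class labels.
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def print_rules(node, conditions=()):
    # Each root-to-leaf path is a conjunction of attribute tests; the full set
    # of printed rules (one per leaf) is the disjunction of these conjunctions.
    if not isinstance(node, dict):          # leaf node: emit the finished rule
        print("IF " + " AND ".join(conditions) + " THEN class = " + node)
        return
    (attribute, branches), = node.items()   # one test attribute per decision node
    for value, subtree in branches.items():
        print_rules(subtree, conditions + (attribute + " = " + value,))

print_rules(tree)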
Advantages of Decision Trees
• 1. Easy to model and interpret
• 2. Simple to understand
• 3. The input and output attributes can be discrete or continuous
predictor variables.
• 4. Can model a high degree of nonlinearity in the relationship
between the target variables and the predictor variables
• 5. Quick to train
Disadvantages of Decision Trees
• Some of the issues that generally arise with decision tree learning are:
• 1. It is difficult to determine how deeply a decision tree can be grown
or when to stop growing it.
• 2. If training data has errors or missing attribute values, then the
decision tree constructed may become unstable or biased.
• 3. If the training data has continuous-valued attributes, handling them is computationally complex; they have to be discretized.
• 4. A complex decision tree may also be over-fitting with the training
data.
• 5. Decision tree learning is not well suited for classifying multiple
output classes.
• 6. Learning an optimal decision tree is also known to be
NP-complete.
6.1.2 Fundamentals of Entropy
• Given the training dataset with a set of attributes or features, the
decision tree is constructed by finding the attribute or feature
that best describes the target class for the given test instances.
• The best split feature is the one which contains more
information about how to split the dataset among all features so
that the target class is accurately identified for the test
instances.
• In other words, the best split attribute is more informative to
split the dataset into sub datasets and this process is continued
until the stopping criterion is reached.
• The split should be as pure as possible at every stage of selecting the best feature.
• The best feature is selected based on the amount of information it carries, which is calculated from probabilities.
• Quantifying information is closely related to information theory. In
the field of information theory, the features are quantified by a
measure called Shannon Entropy which is calculated based on
the probability distribution of the events.
• Entropy is the amount of uncertainty or randomness in the
outcome of a random variable or an event.
• Moreover, entropy describes the homogeneity of the data instances.
• The best feature is selected based on the entropy value.
• For example, when a coin is flipped, heads or tails are the only two outcomes, so its entropy is lower than that of rolling a die, which has six outcomes (see the sketch below).
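As a rough illustration of this coin-versus-die comparison, the following sketch computes Shannon entropy for uniform outcome distributions; the helper function and probabilities are assumptions for illustration only.

import math

def entropy(probabilities):
    # Shannon entropy H = -sum(p * log2(p)) over the outcome probabilities.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin has two equally likely outcomes; a fair die has six, so the
# die roll is more uncertain and its entropy is higher.
print(entropy([1/2, 1/2]))   # 1.0 bit
print(entropy([1/6] * 6))    # about 2.585 bits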
6.2 DECISION TREE INDUCTION ALGORITHMS
• There are many decision tree algorithms, such as ID3, C4.5,
CART, CHAID, QUEST, GUIDE, CRUISE, and CTREE, that are
used for classification in real-time environment.
• The most commonly used decision tree algorithms are ID3 (Iterative Dichotomizer 3), developed by J. R. Quinlan in 1986, and C4.5, an advancement of ID3 presented by the same author in 1993.
• CART, which stands for Classification and Regression Trees, is another algorithm, developed by Breiman et al. in 1984.
• The accuracy of the tree constructed depends upon the selection
of the best split attribute.
• Different algorithms are used for building decision trees which
use different measures to decide on the splitting criterion.
• Algorithms such as ID3, C4.5 and CART are popular algorithms
used in the construction of decision trees.
• The algorithm ID3 uses 'Information Gain' as the splitting criterion, whereas the algorithm C4.5 uses 'Gain Ratio'.
• The CART algorithm is popularly used for classifying both categorical and continuous-valued target variables. CART uses the Gini Index to construct a decision tree (these three measures are sketched after this list).
• Decision trees constructed using ID3 and C4.5 are also called univariate decision trees, which consider only one feature/attribute to split at each decision node, whereas decision trees constructed using the CART algorithm are multivariate decision trees, which consider a conjunction of univariate splits.
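The following is a minimal sketch of the three splitting measures named above, computed for a hypothetical candidate split of labelled instances; the toy labels and helper names are assumptions made only for illustration.

import math
from collections import Counter

def entropy(labels):
    # Entropy of the class distribution in a list of target labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini_index(labels):
    # Gini Index of the class distribution (the measure used by CART).
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent_labels, subsets):
    # Information Gain (ID3): parent entropy minus the weighted entropy of the
    # subsets produced by a candidate split.
    n = len(parent_labels)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent_labels) - weighted

def gain_ratio(parent_labels, subsets):
    # Gain Ratio (C4.5): information gain normalised by the split information,
    # which penalises attributes with many distinct values.
    n = len(parent_labels)
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets if s)
    return information_gain(parent_labels, subsets) / split_info if split_info else 0.0

# Hypothetical split of ten labelled instances into two subsets by some attribute.
parent = ["Yes"] * 6 + ["No"] * 4
subsets = [["Yes"] * 5 + ["No"], ["Yes"] + ["No"] * 3]
print(information_gain(parent, subsets), gain_ratio(parent, subsets), gini_index(parent))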
6.2.1 ID3 Tree
• ID3 is a supervised learning algorithm which uses a training
dataset with labels and constructs a decision tree.
• ID3 is an example of univariate decision trees as it considers
only one feature at each decision node.
• This leads to axis-aligned splits. The tree is then used to classify
the future test instances.
• It constructs the tree using a greedy approach in a top-down
fashion by identifying the best attribute at each level of the
tree.
• ID3 works well if the attributes or features are discrete/categorical. If some attributes are continuous, they have to be partitioned, that is, discretized into nominal attributes or features.
An axis-aligned split function uses only one feature at a time to separate the feature space of training samples by a hyper-plane that is aligned to the feature axes.
• The algorithm builds the tree using a purity measure called 'Information Gain' with the given training data instances and then uses the constructed tree to classify the test data (a minimal construction sketch follows this list).
• It is applied to training sets with only nominal or categorical attributes and with no missing values for classification.
• ID3 works well for a large dataset.
• If the dataset is small, overfitting may occur. Moreover, it is not
accurate if the dataset has missing attribute values.
• No pruning is done during or after construction of the tree and it
is prone to outliers.
• C4.5 and CART can handle both categorical attributes and
continuous attributes.
• Both C4.5 and CART can also handle missing values, but C4.5
is prone to outliers whereas CART can handle outliers as well.
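Below is a minimal sketch of the greedy, top-down, information-gain-driven construction described in the bullets above, restricted to nominal attributes and with no pruning, as ID3 expects. The toy records and helper names are assumptions made only for illustration, not the textbook's worked example.

import math
from collections import Counter

def entropy(rows, target):
    # Entropy of the target-class distribution over a list of records (dicts).
    n = len(rows)
    return -sum(c / n * math.log2(c / n)
                for c in Counter(r[target] for r in rows).values())

def information_gain(rows, attr, target):
    # Parent entropy minus the weighted entropy of the subsets formed by attr.
    n = len(rows)
    gain = entropy(rows, target)
    for value in set(r[attr] for r in rows):
        subset = [r for r in rows if r[attr] == value]
        gain -= len(subset) / n * entropy(subset, target)
    return gain

def id3(rows, attributes, target):
    # Greedy top-down construction: choose the attribute with the highest
    # information gain, split on it, and recurse on each branch (no pruning).
    classes = [r[target] for r in rows]
    if len(set(classes)) == 1:                 # pure node -> leaf
        return classes[0]
    if not attributes:                         # no tests left -> majority class
        return Counter(classes).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    branches = {}
    for value in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, [a for a in attributes if a != best], target)
    return {best: branches}

# Hypothetical toy dataset with nominal attributes only, as ID3 expects.
data = [
    {"Outlook": "Sunny",    "Wind": "Weak",   "Play": "No"},
    {"Outlook": "Sunny",    "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Strong", "Play": "No"},
]
print(id3(data, ["Outlook", "Wind"], "Play"))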
The algorithm C4.5 is based on Occam's Razor, which says that, given two correct solutions, the simpler solution should be chosen. Moreover, the algorithm requires a larger training set for better accuracy. It uses Gain Ratio as a measure during the construction of decision trees. ID3 is more biased towards attributes with a larger number of values.
Inductive bias is the set of assumptions that a machine learning algorithm makes about the relationship between input variables (features) and output variables (labels) based on the training data.
• ID3 applies a hill-climbing search that does not backtrack and may converge to a locally optimal solution that is not globally optimal.
• A shorter tree is preferred, following Occam's razor principle, which states that the simplest solution is the best solution.
• Overfitting is also a general problem with decision trees.
• Once the decision tree is constructed, it must be validated for better
accuracy and to avoid over-fitting and under-fitting.
• There is always a tradeoff between accuracy and complexity of the
tree.
• The tree must be simple and accurate.
• If the tree is more complex, it can classify the training instances accurately, but when test data is given, the constructed tree may perform poorly, meaning misclassifications are higher and accuracy is reduced.
• This problem is called over-fitting.
• To avoid overfitting of the tree, we need to prune the trees and
construct an optimal decision tree.
• Trees can be pre-pruned or post-pruned.
• If tree nodes are pruned during construction, or the construction is stopped early without exploring a node's branches, it is called pre-pruning, whereas if tree nodes are pruned after the construction is over, it is called post-pruning.
• Basically, the dataset is split into three sets called training dataset,
validation dataset and test dataset.
Pruning reduces the size of decision trees by removing parts of the tree that do not provide power to classify instances.
• Generally, 40% of the dataset is used for training the
decision tree and the remaining 60% is used for validation and
testing.
• Once the decision tree is constructed, it is validated with the
validation dataset and the misclassifications are identified.
• Using the number of instances correctly classified and number
of instances wrongly classified, Average Squared Error (ASE)
is computed.
• The tree nodes are pruned based on these computations and
the resulting tree is validated until we get a tree that performs
better.
• Cross validation is another way to construct an optimal decision
tree. Here, the dataset is split into k-folds, among which k–1 folds
are used for training the decision tree and the kth fold is used
for validation and errors are computed.
• The process is repeated k times, each time holding out a different fold for validation, and the mean of the errors is computed for the different trees.
• The tree with the lowest error is chosen, which improves the performance of the tree (a minimal sketch follows this list).
• This tree can now be tested with the test dataset and predictions
are made.
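As a rough sketch of using cross-validation to pick the better-performing (more pruned) tree, the following assumes scikit-learn, its cost-complexity pruning parameter ccp_alpha, and the bundled Iris dataset; none of these are named in the chapter, which describes the procedure only in general terms.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

best_alpha, best_score = None, -1.0
for alpha in [0.0, 0.01, 0.02, 0.05]:   # candidate amounts of pruning
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    # 5-fold cross-validation: train on k-1 folds, validate on the held-out
    # fold, and average the accuracy over the k repetitions.
    score = cross_val_score(tree, X, y, cv=5).mean()
    if score > best_score:
        best_alpha, best_score = alpha, score

print(best_alpha, best_score)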
For Understanding
Another approach is that, after the tree is constructed using the training set, statistical tests such as error estimation and the Chi-square test are used to estimate whether pruning or splitting is required for a particular node in order to obtain a more accurate tree.