
Rubén Sánchez Corcuera

[email protected]

Decision Trees
Decision trees

Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree.

2
Introduction

■ Learned trees can be represented as sets of if-then rules to improve human readability
■ These learning methods are among the most popular inductive inference algorithms and have been successfully applied to a broad range of tasks, from learning to diagnose medical cases to learning to assess the credit risk of loan applicants.
● We will see that Random Forest, an ensemble method built on decision trees, gives very good results and is commonly used.

3
Introduction

■ Decision trees classify instances by sorting them down the tree from the
root to some leaf node, which provides the classification of the instance.
■ Each node in the tree specifies a test of some attribute of the instance, and
each branch descending from that node corresponds to one of the possible
values for this attribute.
■ That is, an instance is classified by starting at the root node of the tree, testing the attribute specified by this node, and then moving down the tree branch corresponding to the value of the attribute in the given example.
● This process is repeated for the subtree rooted at the new node.

4
Decision tree for playing tennis

5
Introduction

■ Decision trees are made up of groups of rules that describe how attributes
(or features) of instances relate to each other.
■ Each path from the root of the tree to a leaf is like a set of conditions (rules)
based on the attributes. The entire tree is a combination of these different
sets of conditions.

What is the set of rules of the previous decision tree?

Express it as a logic statement (5 mins)

6
Appropriate Problems for Decision Trees

■ Instances are represented by attribute-value pairs


● Instances are described by a fixed set of attributes and their values (e.g., Temperature: Hot).
● The easiest situation for decision tree learning is when each attribute takes on a small number of possible values (Hot, Mild, Cold)
■ The target function has discrete output values
● E.g., a Boolean classification (true, false) assigned to each example
● Decision tree methods easily extend to learning functions with more than two possible output values

7
Appropriate Problems for Decision Trees

■ Disjunctive descriptions may be required


● Decision trees naturally represent disjunctive expressions
■ The training data may contain errors
● Decision tree methods are robust to errors, both in training samples
and in the attribute values
■ The training data may contain missing attribute values
● Decision tree methods can be used even when some training
examples have unknown values

8
The ID3 algorithm
9
ID3

■ Most algorithms that have been developed for learning decision trees are
variations on a core algorithm that employs a top-down, greedy search
through the space of possible decision trees.
■ ID3 learns decision trees by constructing them top down starting with the
question:
● Which attribute should be tested at the root of the tree?
■ To answer this question, each instance attribute is evaluated using a
statistical test to determine how well it alone classifies the training
examples.

10
ID3

■ A descendant of the root node is then created for each possible value of
this attribute, and the training examples are sorted to the appropriate
descendant node
● i.e., down the branch corresponding to the example's value for this
attribute.
■ The entire process is then repeated using the training examples associated
with each descendant node to select the best attribute to test at that point
in the tree. This forms a greedy search for an acceptable decision tree, in
which the algorithm never backtracks to reconsider earlier choices.
Let's see it with an example

11
ID3(Examples, Target_Attribute, Attributes)
    Create a root node Root for the tree
    If all examples are positive, Return the single-node tree Root, with label = +
    If all examples are negative, Return the single-node tree Root, with label = -
    If the set of predicting attributes is empty, Return the single-node tree Root,
        with label = most common value of the target attribute in the examples
    Otherwise Begin
        A ← the attribute that best classifies the examples
        Decision tree attribute for Root = A
        For each possible value, vi, of A:
            Add a new tree branch below Root, corresponding to the test A = vi
            Let Examples(vi) be the subset of examples that have the value vi for A
            If Examples(vi) is empty
                Then below this new branch add a leaf node with label = most common target value in the examples
                Else below this new branch add the subtree ID3(Examples(vi), Target_Attribute, Attributes – {A})
    End
    Return Root

12
ID3

■ The central choice in the ID3 algorithm is selecting which attribute to test
at each node in the tree.
● We want to select the most useful attribute to classify the examples
■ We will define a statistical property, called information gain
● This measures how well a given attribute separates the training
examples according to their target classification
■ ID3 uses this information gain measure to select among the candidate
attributes at each step while growing the tree.

13
ID3: Selecting the best classifier attribute

■ In order to define information gain precisely, we begin by defining a measure commonly used in information theory → ENTROPY
● Entropy characterizes the (im)purity of an arbitrary collection of examples
■ Given a collection S, containing positive and negative examples of some target concept, the entropy of S relative to this boolean classification is:

Entropy(S) = -p+ log2 p+ - p- log2 p-

■ Where p+ is the proportion of positive examples and p- is the proportion of negative examples
● If we have 20 examples and 5 are positive, p+ = 5/20
● We define 0 · log2(0) as 0

14
ID3: Selecting the best classifier attribute

■ Entropy is 0 if all members of S belong to the same class


■ Entropy is 1 when the collection contains an equal number of positive and
negative examples
■ The previous formula can only be applied to Boolean classifications. More generally, if the target attribute can take on c different values, then the entropy of S relative to this c-wise classification is:

Entropy(S) = - Σ (i = 1..c) pi log2 pi,  where pi is the proportion of S belonging to class i

■ The logarithm is still base 2 because entropy is a measure of the expected encoding length measured in bits (binary)
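As a quick illustration of this measure, here is a minimal Python sketch (the function name and the list-of-counts interface are just illustrative choices):

import math

def entropy(class_counts):
    # Entropy of a collection, given the number of examples in each class
    total = sum(class_counts)
    h = 0.0
    for count in class_counts:
        if count > 0:              # by the convention above, 0 * log2(0) is 0
            p = count / total
            h -= p * math.log2(p)
    return h

print(entropy([10, 10]))   # 1.0   -> equal number of positive and negative examples
print(entropy([20, 0]))    # 0.0   -> all members belong to the same class
print(entropy([5, 15]))    # ~0.811, the earlier case with p+ = 5/20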

15
ID3: Selecting the best classifier attribute

■ Given entropy as a measure of the impurity in a collection of training examples, we can now define a measure of the effectiveness of an attribute in classifying the training data.
■ The measure we will use, called information gain, is simply the expected
reduction in entropy caused by partitioning the examples according to this
attribute.

16
ID3: Selecting the best classifier attribute

■ The information gain, Gain(S, A), of an attribute A relative to a collection of examples S is defined as:

Gain(S, A) = Entropy(S) − Σ (v ∈ Values(A)) (|Sv| / |S|) · Entropy(Sv)

■ Where
● Values(A) is the set of all possible values for attribute A
● Sv is the subset of S for which attribute A has value v
■ Note that the first term is just the entropy of the original collection S and
the second term is the expected value of entropy after S is partitioned
using attribute A
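A minimal Python sketch of Gain(S, A), assuming S is given as (value of A, target label) pairs; the representation and function names are just illustrative choices:

import math
from collections import Counter

def entropy(labels):
    # Entropy of a sequence of class labels (0 * log2(0) is taken as 0)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(pairs):
    # Gain(S, A): entropy of S minus the weighted entropy of S partitioned by A
    gain = entropy([label for _, label in pairs])
    for v in {value for value, _ in pairs}:
        subset = [label for value, label in pairs if value == v]
        gain -= len(subset) / len(pairs) * entropy(subset)
    return gain

# An attribute that separates the classes perfectly has maximal gain
print(information_gain([("a", "+"), ("a", "+"), ("b", "-"), ("b", "-")]))  # 1.0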

17
ID3: Selecting the best classifier attribute

■ Gain(S, A) is therefore the expected reduction in entropy caused by knowing the value of attribute A.
■ Put another way, Gain(S, A) is the information provided about the target
function value, given the value of some other attribute A.

Let’s see it with an example

18
ID3: Selecting the best classifier attribute

■ Suppose S is a collection of training-example days described by attributes including Wind, which can have the values Weak or Strong
■ Assume S is a collection containing 14 examples [9+, 5-]
■ Of these 14 examples, suppose:
● 6 of the positive and 2 of the negative examples have Wind = Weak
● The remainder have Wind = Strong

19
ID3: Selecting the best classifier attribute
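Working through these numbers gives Gain(S, Wind) ≈ 0.048; a small self-contained Python sketch of the calculation (the helper name H is just shorthand for the binary entropy defined earlier):

import math

def H(p_pos, p_neg):
    # Binary entropy; a term with probability 0 contributes 0
    return sum(-p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

entropy_S = H(9/14, 5/14)        # ~0.940 for the whole collection [9+, 5-]
entropy_weak = H(6/8, 2/8)       # ~0.811 for Wind = Weak   [6+, 2-]
entropy_strong = H(3/6, 3/6)     #  1.0   for Wind = Strong [3+, 3-]

gain = entropy_S - (8/14) * entropy_weak - (6/14) * entropy_strong
print(round(gain, 3))            # ~0.048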

20
Exercise

■ Let's try calculating it ourselves:


● Suppose S is a collection of training-example days described by
attributes including Wind, which can have the values Weak and Strong
● Assume S is a collection containing 20 examples [16+, 4-]
● Suppose 9 of the positive and 1 of the negative have Wind = Weak and
the remainder Wind = Strong

Calculate the information gain using the previous formula

21
ID3: Hypothesis Space Search in DTL

■ ID3 can be characterized as searching a space of hypotheses for one that fits the training examples provided
● The hypothesis space searched by ID3 is the set of possible decision
trees.
■ ID3 performs a simple-to-complex, hill-climbing search through this
hypothesis space, beginning with the empty tree, then considering
progressively more elaborate hypotheses in search of a decision tree that
correctly classifies the training data.
■ The evaluation function that guides this hill-climbing search is the
information gain measure.

22
ID3: Capabilities and limitations

■ ID3 maintains only a single current hypothesis as it searches through the space of decision trees.
● By determining only a single hypothesis, ID3 loses the capabilities that
follow from explicitly representing all consistent hypotheses.
● For example, it does not have the ability to determine how many
alternative decision trees are consistent with the available training
data, or to pose new instance queries that optimally resolve among
these competing hypotheses

23
ID3: Capabilities and limitations

■ ID3 does not perform backtracking in its search


● Once it selects an attribute to test at a particular level in the tree, it
never backtracks to reconsider this choice.
● It is susceptible to the usual risks of hill-climbing search without
backtracking: converging to locally optimal solutions that are not
globally optimal.
● In the case of ID3, a locally optimal solution corresponds to the
decision tree it selects along the single search path it explores.
● However, this locally optimal solution may be less desirable than trees
that would have been encountered along a different branch of the
search.

24
ID3: Capabilities and limitations

■ ID3 uses all training examples at each step in the search to make
statistically based decisions regarding how to refine its current hypothesis.
● This contrasts with methods that make decisions incrementally, based
on individual training examples.
● One advantage of using statistical properties of all the examples (e.g.,
information gain) is that the resulting search is much less sensitive to
errors in individual training examples.
● ID3 can be easily extended to handle noisy training data by modifying
its termination criterion to accept hypotheses that imperfectly fit the
training data.

25
Exercises

■ Now that we understand how ID3 works, let’s implement it.


■ Follow the pseudocode to implement the algorithm in a Colab notebook.
■ Use the data in 03_a_ID3_dataset.csv to try it.
● Some useful code on how to access external files from colab:
https://colab.research.google.com/notebooks/snippets/accessing_files.
ipynb

26
The CART algorithm
27
CART algorithm

■ CART is the decision tree algorithm implemented in the sklearn library
■ It is a greedy algorithm like ID3 (no backtracking)
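A minimal usage sketch of that implementation (the iris dataset is just a convenient stand-in; criterion="gini" is sklearn's default, and "entropy" switches to an information-gain-style criterion):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(criterion="gini", random_state=0)  # CART-style tree
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on held-out data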

28
CART: Gini index

■ The selection criterion in CART is the Gini index instead of the information
gain used in ID3
■ The Gini index measures the impurity of D, a data partition or set of training tuples:

Gini(D) = 1 − Σ (i = 1..m) pi²

■ Where:
● pi is the probability that a tuple in D belongs to class Ci and is estimated by pi = |Ci,D| / |D|
● The sum is computed over m classes

29
CART: Gini index

■ When considering a binary split, we compute a weighted sum of the impurity of each resulting partition
■ For example, if a binary split on A partitions D into D1 and D2, the Gini index of D given that partitioning is:

Gini_A(D) = (|D1| / |D|) · Gini(D1) + (|D2| / |D|) · Gini(D2)
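A minimal Python sketch of both formulas (function names and the counts-per-class interface are just illustrative choices):

def gini(class_counts):
    # Gini(D) = 1 - sum_i pi^2, from the number of tuples in each class
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

def gini_split(counts_d1, counts_d2):
    # Weighted Gini index of a binary split of D into partitions D1 and D2
    n1, n2 = sum(counts_d1), sum(counts_d2)
    return n1 / (n1 + n2) * gini(counts_d1) + n2 / (n1 + n2) * gini(counts_d2)

print(gini([9, 5]))                 # ~0.459
print(gini_split([7, 3], [2, 2]))   # weighted impurity of an illustrative split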

30
CART: Gini index

■ The Gini index considers a binary split for each attribute.


● I.e., we will end with a binary tree
■ For each attribute, each of the possible binary splits is considered
■ For discrete-valued attributes, the subset that gives the minimum Gini
index for that attribute is selected as its splitting subset
■ For continuous-valued attributes, each possible split-point must be
considered. The strategy is similar to that described earlier for information
gain, where the midpoint between each pair of (sorted) adjacent values is
taken as a possible split-point.
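A tiny sketch of those candidate split-points (the attribute values are made up):

values = sorted([48.0, 52.5, 60.0, 61.0])    # observed values of a continuous attribute
split_points = [(a + b) / 2 for a, b in zip(values, values[1:])]
print(split_points)                          # [50.25, 56.25, 60.5]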

32
CART: Gini index

■ The reduction in impurity that would be incurred by a binary split on a discrete- or continuous-valued attribute A is:

ΔGini(A) = Gini(D) − Gini_A(D)

■ The attribute that maximizes the reduction in impurity (or, equivalently, has
the minimum Gini index) is selected as the splitting attribute.
■ This attribute and either its splitting subset (for a discrete-valued splitting
attribute) or split-point (for a continuous-valued splitting attribute)
together form the splitting criterion.

33
CART: Gini index (Example)

34
CART: Gini index (Example)

■ Let D be the training data shown in the table, where there are nine tuples belonging to the class buys computer = yes and the remaining five tuples belong to the class buys computer = no.
■ A (root) node N is created for the tuples in D.
■ We first use the Gini index to compute the impurity of D:

Gini(D) = 1 − (9/14)² − (5/14)² = 0.459

35
CART: Gini index (Example)

■ To find the splitting criterion for the tuples in D, we need to compute the
Gini index for each attribute.
■ Let’s start with the attribute income and consider each of the possible
splitting subsets.
■ Consider the subset {low, medium}
● This would result in 10 tuples in partition D1 satisfying the condition
income ∈ {low, medium}
● The remaining four tuples of D would be assigned to partition D2
● The Gini index value computed based on this partitioning would be…

36
CART: Gini index (Example)

■ The Gini index value computed based on this partitioning would be:

Gini income ∈ {low, medium}(D) = (10/14) · Gini(D1) + (4/14) · Gini(D2) = 0.443

37
CART: Gini index (Example)

■ Similarly, the Gini index values for splits on the remaining subsets are 0.458
(for the subsets {low, high} and {medium}) and 0.450 (for the subsets
{medium, high} and {low}).
■ Therefore, the best binary split for attribute income is on {low, medium} (or
{high}) because it minimizes the Gini index.
■ Evaluating age, we obtain {youth, senior} (or {middle aged}) as the best split for age with a Gini index of 0.357.
■ The attributes student and credit rating are both binary, with Gini index
values of 0.367 and 0.429, respectively.

38
CART: Gini index (Example)

■ The attribute age and splitting subset {youth, senior} therefore give the minimum Gini index overall, with a reduction in impurity of 0.459 − 0.357 = 0.102.
■ This binary split results in the maximum reduction in impurity of the tuples
in D and is returned as the splitting criterion.
■ Node N is labeled with the criterion, two branches are grown from it, and
the tuples are partitioned accordingly.

39
Tree pruning
40
CART: Tree pruning

■ When a decision tree is built, many of the branches will reflect anomalies in
the training data due to noise or outliers.
■ Tree pruning methods address this problem of overfitting the data.
■ Such methods typically use statistical measures to remove the least-reliable
branches.
■ Pruned trees tend to be smaller and less complex and, thus, easier to
comprehend.
■ They are usually faster and better at correctly classifying independent test
data (i.e., of previously unseen tuples) than unpruned trees.

41
CART: Tree pruning

■ There are two common approaches to tree pruning: pre-pruning and post-pruning.
■ In the pre-pruning approach, a tree is “pruned” by halting its construction
early
● If partitioning the tuples at a node would result in a split that falls
below a prespecified threshold, then further partitioning of the given
subset is halted.
■ Upon halting, the node becomes a leaf.
■ The leaf may hold the most frequent class among the subset tuples or the
probability distribution of those tuples.
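In sklearn's trees, this kind of early stopping is exposed through threshold parameters such as max_depth, min_samples_split and min_impurity_decrease; a hedged sketch with arbitrary threshold values:

from sklearn.tree import DecisionTreeClassifier

# Pre-pruning style thresholds: stop growing deep, don't split small nodes,
# and require a minimum impurity decrease before accepting a split
pre_pruned = DecisionTreeClassifier(
    max_depth=4,
    min_samples_split=20,
    min_impurity_decrease=0.01,
)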

42
CART: Tree pruning

■ Post-pruning removes subtrees from a “fully grown” tree.


■ A subtree at a given node is pruned by removing its branches and replacing
it with a leaf.
■ The leaf is labeled with the most frequent class among the subtree being
replaced.

43
44
CART: Tree pruning
■ CART uses post-pruning with an approach called cost complexity.
■ This approach considers the cost complexity of a tree to be a function of
the number of leaves in the tree and the error rate of the tree.
● Where the error rate is the percentage of tuples misclassified by the
tree.
■ It starts from the bottom of the tree.
■ For each internal node, N, it computes the cost complexity of the subtree at
N, and the cost complexity of the subtree at N if it were to be pruned (i.e.,
replaced by a leaf node).
■ The two values are compared. If pruning the subtree at node N would result
in a smaller cost complexity, then the subtree is pruned. Otherwise, it is
kept.
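sklearn exposes minimal cost-complexity pruning through the ccp_alpha parameter and cost_complexity_pruning_path; a sketch of the usual pattern (the dataset and the held-out split standing in for a pruning set are just illustrative choices):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_prune, y_train, y_prune = train_test_split(X, y, random_state=0)

# Candidate alpha values along the pruning path of the fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Fit one progressively more pruned tree per alpha and keep the one that
# does best on data held out from training (playing the role of a pruning set)
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_prune, y_prune),
)
print(best.get_n_leaves(), best.score(X_prune, y_prune))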
45
CART: Tree pruning

■ A pruning set of class-labeled tuples is used to estimate cost complexity.


■ This set is independent of the training set used to build the unpruned tree
and of any test set used for accuracy estimation.
■ The algorithm generates a set of progressively pruned trees.
■ In general, the smallest decision tree that minimizes the cost complexity is
preferred.

46
CART: Exercises

■ Open the 03_b_CART Colab notebook to do the exercises.

47
Random forests
48
Random forest

■ A Random Forest model is composed of a large number of individual decision trees that operate as an ensemble.
■ An ensemble for classification is a composite model, made up of a
combination of classifiers.
■ The individual classifiers vote, and a class label prediction is returned by the
ensemble based on the collection of votes.
■ Ensembles tend to be more accurate than their component classifiers.
● Wisdom of crowds
■ This is one of my go-to algorithms for the first tests with a new dataset.

49
Random forest

50
Random forest

■ The low correlation between models is the key.


● Similar to how, in investing, a portfolio of low-correlation stocks is a better idea than any of its parts on their own.
■ The ensemble reduces the errors that arise from individual trees (as long as they don't all fail)
● While some trees may be wrong, many other trees will be right, so as a
group the trees are able to move in the correct direction.

51
Random forest

52
Random forest

■ The prerequisites for random forest to perform well are:


● There needs to be some actual signal in our features so that models
built using those features do better than random guessing.
● The predictions (and therefore the errors) made by the individual trees
need to have low correlations with each other.
■ How do we ensure that the behavior of each individual tree is not too
correlated with the behavior of any of the other trees in the model?
● Bagging and feature randomness

53
Random forest: Bagging

Suppose that you are a patient and would like to have a diagnosis made based
on your symptoms. Instead of asking one doctor, you may choose to ask several.
If a certain diagnosis occurs more than any other, you may choose this as the
final or best diagnosis. That is, the final diagnosis is made based on a majority
vote, where each doctor gets an equal vote. Now replace each doctor by a
classifier, and you have the basic idea behind bagging. Intuitively, a majority vote
made by a large group of doctors may be more reliable than a majority vote
made by a small group.

54
Random forest: Bagging

■ Given a set, D, of d tuples, bagging works as follows. For iteration i (i = 1, 2, … , k), a training set, Di, of d tuples is sampled with replacement from the original set of tuples, D.
● Note that the term bagging stands for bootstrap aggregation.
■ Because sampling with replacement is used, some of the original tuples of
D may not be included in Di , whereas others may occur more than once.
■ A classifier model, Mi , is learned for each training set, Di.
■ Random forest takes advantage of this by allowing each individual tree to
randomly sample from the dataset with replacement, resulting in different
trees.

55
Random forest: Bagging

■ Notice that with bagging we are not subsetting the training data into
smaller chunks and training each tree on a different chunk.
■ If we have a sample of size N, we are still feeding each tree a training set of
size N (unless specified otherwise).
■ But instead of the original training data, we take a random sample of size N
with replacement.
● E.g., if our training data was [1, 2, 3, 4, 5, 6] then we might give one of
our trees the following list [1, 2, 2, 3, 6, 6].
● Notice that both lists are of length six and that “2” and “6” are both
repeated in the randomly selected training data we give to our tree
(because we sample with replacement).
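A minimal numpy sketch of drawing one such bootstrap sample (the toy data mirrors the list above; the seed is arbitrary):

import numpy as np

rng = np.random.default_rng(seed=0)
training_data = np.array([1, 2, 3, 4, 5, 6])

# Sample N items with replacement: same length as the original,
# with some values repeated and others left out
bootstrap_sample = rng.choice(training_data, size=len(training_data), replace=True)
print(bootstrap_sample)   # e.g. something like [1 2 2 3 6 6]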

56
Random forest: Feature randomness

■ In a decision tree, when building a node, we choose the feature that provides the best result according to the metric used.
■ In contrast, each tree in a random forest can pick only from a random
subset of features. This forces even more variation amongst the trees in the
model and ultimately results in lower correlation across trees and more
diversification.
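In sklearn's RandomForestClassifier, this random feature subset is controlled by max_features; a minimal sketch (the wine dataset and parameter values are just illustrative choices):

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

# n_estimators = number of trees in the ensemble; max_features = size of the
# random feature subset each tree may consider at every split
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())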

57
Further reading

■ Chapter 3 in [Mitchell, 1997]


■ Sections 8.2 and 8.6 in [Han and Kamber, 2006]
Extra material
■ Decision trees in sklearn:
https://scikit-learn.org/stable/modules/tree.html
■ Random Forests in sklearn:
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble
.RandomForestClassifier.html

58
Exercises

■ Use Random Forest to process the Iris and Wine datasets.


■ Use a Random Forest to classify the CSV file of the previous
exercise.

59
We have our first machine learning model trained…

but how do we know how well it is working?

60
Do you have any questions?
[email protected]

Thanks!
61
