2EL1730-ML-Lecture05-Trees and Ensemble Learning
2EL1730
Lecture 5
Tree-based methods and ensemble learning
Thank you!
2
Last lecture
3
Non-parametric learning
• Non-parametric learning algorithm (does not mean NO parameters)
• The complexity of the decision function grows with the number of data points
• Examples:
• K-nearest neighbors (last lecture)
• Tree-based methods
• Some cases of SVMs
4
k-Nearest Neighbors (kNN) Algorithm
[Figure: decision boundaries of a 1-NN vs. a 3-NN classifier]
Algorithm kNN
• Find the k examples (x*_i, y*_i), i = 1, …, k, closest to the test instance x
• The output is the majority class among the k neighbors (a minimal code sketch follows this slide)
5
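A minimal sketch of the kNN rule described on this slide, written in Python with NumPy; the function and variable names are illustrative, not from the lecture materials.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify one test instance by majority vote among its k nearest neighbors."""
    # Euclidean distance from the test instance to every training example
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Majority class among the k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 0.9]), k=3))  # -> 1
```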
Choice of Parameter k
6
In this Lecture
• Decision trees
• Ensemble learning
– Bagging methods
– Boosting methods
• AdaBoost
7
Decision trees
8
Another Classification Idea (1/2)
9
Another Classification Idea (2/2)
10
Example of a Decision Tree
11
Another Example of Decision Tree
12
Decision Trees – Nodes and Branching
[Decision tree figure: the root node tests Refund (Yes → leaf NO); an internal node tests TaxInc (< 80K → leaf NO, > 80K → leaf YES); leaves carry the class labels]
13
Decision Tree Classification Task
[Figure: a training set of labeled records (Tid, Attrib1, Attrib2, Attrib3, Class; e.g., Tid 6: No, Medium, 60K, No) is used to induce a decision tree model; the model is then applied to a test set of unlabeled records (e.g., Tid 11: No, Small, 55K, ?; Tid 15: No, Large, 67K, ?)]
14
Apply Model to Test Data
Start from the root of the tree.
Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
[Decision tree: Refund = Yes → NO; Refund = No → MarSt: Single, Divorced → TaxInc (< 80K → NO, > 80K → YES); Married → NO]
15
Apply Model to Test Data
[Slides 16–20 repeat the same tree and test record, highlighting one step of the traversal at a time: Refund = No → MarSt = Married → leaf NO, so the test record is assigned the class Cheat = No]
20
Decision Tree Induction – The Idea (1/2)
• Basic algorithm
– Tree is constructed in a top-down recursive manner
– Initially, all the training examples are at the root
– Attributes are categorical (if continuous-valued, they are discretized in advance)
– Examples are partitioned recursively based on the selected attributes
– Split attributes are selected on the basis of a heuristic or statistical measure (e.g., Gini index, information gain)
21
Decision Tree Induction – The Idea (2/2)
22
Decision Tree Induction Algorithms
• Many algorithms
– Hunt’s algorithm (one of the earliest)
– CART
– ID3, C4.5
– SLIQ, SPRINT
23
General Structure of Hunt’s Algorithm
24
Hunt’s Algorithm
[Training data table: Tid, Refund, Marital Status, Taxable Income, Cheat]
25
Hunt’s Algorithm
[Training data table: Tid, Refund, Marital Status, Taxable Income, Cheat]
[Intermediate tree: Refund = Yes → Cheat = No; Refund = No → Marital Status: Single, Divorced → Cheat = Yes; Married → Cheat = No]
26
Hunt’s Algorithm
[Training data table: Tid, Refund, Marital Status, Taxable Income, Cheat]
[Final tree: Refund = Yes → Cheat = No; Refund = No → Marital Status: Married → Cheat = No; Single, Divorced → Taxable Income: < 80K → Cheat = No, >= 80K → Cheat = Yes]
27
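A rough Python sketch of the recursive partitioning idea behind Hunt's algorithm, for categorical attributes only; the attribute choice here is naive (a real implementation would pick the split that optimizes Gini or information gain), and all names are illustrative.

```python
from collections import Counter

def hunt(records, labels, attributes):
    """Recursively grow a decision tree (Hunt's-algorithm style, categorical attributes only)."""
    # Stop if the node is pure or no attributes remain: return a leaf with the majority class
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Pick an attribute to split on (naively the first one here)
    attr = attributes[0]
    remaining = [a for a in attributes if a != attr]
    tree = {"split_on": attr, "children": {}}
    # Partition the records by the values of the chosen attribute and recurse
    for value in set(r[attr] for r in records):
        subset = [(r, y) for r, y in zip(records, labels) if r[attr] == value]
        tree["children"][value] = hunt([r for r, _ in subset],
                                       [y for _, y in subset],
                                       remaining)
    return tree

# Toy usage with the slide's attribute names (values are illustrative)
data = [{"Refund": "No", "Marital": "Married"}, {"Refund": "Yes", "Marital": "Single"}]
labels = ["No", "No"]
print(hunt(data, labels, ["Refund", "Marital"]))  # -> "No" (pure node)
```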
Tree Induction
• Greedy strategy
– Split the records based on an attribute test that
optimizes a certain criterion
• Issues
– Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
– Determine when to stop splitting
28
How to Specify Test Condition?
29
Splitting Based on Nominal Attributes
Example: binary splits of CarType – {Sport, Luxury} vs. {Family}, OR {Family, Luxury} vs. {Sport}
30
Splitting Based on Ordinal Attributes
Example: binary splits of Size – {Small, Medium} vs. {Large}, OR {Medium, Large} vs. {Small}
31
Splitting Based on Continuous Attributes
Example: a binary split (Taxable Income > 80K? Yes / No), or a multi-way split on discretized ranges (e.g., < 10K, …, > 80K)
33
Tree Induction
• Greedy strategy
– Split the records based on an attribute test that optimizes a certain criterion
• Issues
– Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
– Determine when to stop splitting
34
How to Determine the Best Split (1/2)
35
How to Determine the Best Split (1/2)
Purer partition
36
How to Determine the Best Split (2/2)
• Greedy approach:
– Nodes with a homogeneous (pure) class distribution are preferred
– Example: a node with C0: 5, C1: 5 is non-homogeneous (high degree of impurity); a node with C0: 9, C1: 1 is homogeneous (low degree of impurity)
• Need a measure of node impurity:
– Gini Index
– Misclassification error
38
How to Find the Best Split – Gini Index
Before splitting: the node has class counts C0: N00, C1: N01 and impurity M0. Two possible splits, on attribute A or on attribute B, each produce a Yes child and a No child, with impurities M1, M2 (split on A) and M3, M4 (split on B).
M12 = weighted impurity of the children after splitting on A; M34 = weighted impurity after splitting on B.
Gain = M0 – M12 vs. M0 – M34 (the reduction in impurity); choose the split with the larger gain.
42
Measure of Impurity: Gini Index
• Gini index of a node t: Gini(t) = 1 − Σ_j [p(j | t)]², where p(j | t) is the relative frequency of class j at node t
• For a split into k partitions (children): Gini_split = Σ_{i=1}^{k} (n_i / n) · Gini(i), where n_i is the number of records at child i and n at the parent
44
Example
Split on attribute B: node N1 contains C1: 5, C2: 2; node N2 contains C1: 1, C2: 4
Gini(N1) = 1 – (5/7)² – (2/7)² = 0.408
Gini(N2) = 1 – (1/5)² – (4/5)² = 0.32
Gini_B = 7/12 × 0.408 + 5/12 × 0.32 = 0.371
45
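The Gini computations from this example, written out as a small Python check; the helper functions gini and gini_split are made-up names, but they implement the formulas from the previous slide.

```python
def gini(counts):
    """Gini impurity of a node given its class counts: 1 - sum_j p(j|t)^2."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(children):
    """Weighted Gini of a split: sum over children of (n_i / n) * Gini(child)."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)

# Split on attribute B from the slide: N1 = (C1=5, C2=2), N2 = (C1=1, C2=4)
print(round(gini([5, 2]), 3))                  # 0.408
print(round(gini([1, 4]), 3))                  # 0.32
print(round(gini_split([[5, 2], [1, 4]]), 3))  # 0.371
```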
Tree Induction
• Greedy strategy
– Split the records based on an attribute test that optimizes a certain criterion
• Issues
– Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
– Determine when to stop splitting
46
Stopping Criteria for Tree Induction
47
Decision Tree Based Classification
• Advantages
– Inexpensive to construct (training phase)
– Extremely fast at testing phase (classifying unseen data)
– Easy to interpret for small-sized trees
– Accuracy is comparable to other classification techniques for
many simple data sets
48
Overfitting and Tree Pruning
49
scikit-learn
http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier
50
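A short usage sketch of the sklearn.tree.DecisionTreeClassifier class linked above; the dataset and hyper-parameter values are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limit the depth to keep the tree small and reduce overfitting
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on held-out data
```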
Ensemble learning
51
Ensemble Methods
52
Ensemble Learning – General Idea
Step 1: Create multiple data sets D1, D2, …, Dt-1, Dt from the original training data D
Step 2: Build multiple classifiers C1, C2, …, Ct-1, Ct (one per data set)
Step 3: Combine the classifiers into a single classifier C*
53
Ensemble Methods: Summary
• Bagging
– Parallel training of classifiers on independent bootstrap samples of the training data; predictions are combined by averaging or majority vote
– E.g., Random Forest
• Boosting
– Sequential training, iteratively re-weighting the training examples – the current classifier focuses on hard examples
– E.g., AdaBoost
54
Bagging vs. Boosting
Source: https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/
55
Bagging: Bootstrap Estimation
56
Bagging
• Simple idea
– Generate M bootstrap samples from your original training set
– Train a model on each one to get y_m, and average the M predictions (a minimal sketch follows this slide)
58
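The sketch announced above: bootstrap sampling, one model per sample, and averaging of the predictions. Decision trees are used as the base model only for illustration; any regressor could be plugged in, and all names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagging_fit(X, y, M=10, seed=0):
    """Train M base models, each on a bootstrap sample of the training set (X, y as NumPy arrays)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(M):
        idx = rng.integers(0, n, size=n)  # sample n indices with replacement
        models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Average the individual predictions y_m(x)."""
    return np.mean([m.predict(X) for m in models], axis=0)
```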
Random Forest
[Figure: an ensemble of decision trees; take the majority vote of their predictions]
59
Random Forest - Algorithm
60
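A hedged usage example of scikit-learn's RandomForestClassifier, which implements the algorithm summarized on this slide (trees grown on bootstrap samples with random feature subsets at each split, combined by majority/soft vote); the dataset and parameter values are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each grown on a bootstrap sample and using a random subset
# of sqrt(d) features at every split; the forest combines their votes
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```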
Boosting
61
AdaBoost: Making Weak Learners Stronger
• Can you apply this learning module many times to get a strong learner that can get close to zero error rate on the training data?
– ML theorists showed how to do this, and it actually led to an effective new learning procedure (Freund & Schapire, 1996)
– AdaBoost
62
AdaBoost – The Idea
• First, train the base classifier on all the training data with equal importance weights on each case
• Then, re-weight the training data to emphasize the hard cases (instances that were misclassified in the previous step) and train a second model
– Q: How do we re-weight the data?
• Keep training new models on the re-weighted data
• Finally, use a weighted committee of all the models for the test data (a minimal sketch follows this slide)
– Q: How do we weight the models in the committee?
63
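The sketch referred to on this slide: a compact AdaBoost loop for labels in {−1, +1}, using decision stumps as the base classifier. It follows the standard AdaBoost formulation (Freund & Schapire); the exact notation of the later slides may differ, and all names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """AdaBoost with decision stumps; labels y must take values in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)                 # equal importance weights at the start
    stumps, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        # Weighted error rate of the current weak classifier (clipped to avoid log(0))
        err = np.clip(np.sum(w * (pred != y)) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this model in the committee
        w *= np.exp(-alpha * y * pred)          # emphasize misclassified instances
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Weighted committee vote of all the models."""
    return np.sign(sum(a * s.predict(X) for a, s in zip(stumps, alphas)))
```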
How to Train Each Classifier
• Input: the training instances (x_i, y_i), each with a weight w_i^(t)
• Output: a classifier h_t that minimizes the weighted error on the training set
• Weight of instance (e.g., data point) x_i for classifier t: w_i^(t); the weighted error is ε_t = Σ_i w_i^(t) · I[h_t(x_i) ≠ y_i], where the indicator I[·] is 1 if error, 0 otherwise
64
Weight of Instances for Classifier t
[Re-weighting formula using the coefficient γ_t]
66
AdaBoost Pseudocode
[Pseudocode figure; annotation: examine whether we have encountered errors or not]
67
scikit-learn
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html
http://scikit-learn.org/stable/modules/ensemble.html
68
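A short usage sketch of sklearn.ensemble.AdaBoostClassifier from the links above; the dataset and parameter values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# By default the base estimator is a decision stump (a depth-1 tree)
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```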
Next Class
69
Thank You!
70