Decision Trees Algorithms
[ML Introduction]: C is a set of classes.
ID3 algorithm (excerpt):
Then add a leaf node labeled with the most common value of Target in D.
Else add the subtree ID3(D_a, Attributes \ {A*}, Target).
• Return t.
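A minimal Python sketch of this recursion, assuming examples are dicts and the tree is returned as nested dicts; the names id3, majority_label, domains, and select_attribute are illustrative choices, not part of the original pseudocode:

from collections import Counter

def majority_label(examples, target):
    # Most common value of Target in D (ties broken arbitrarily by Counter).
    return Counter(ex[target] for ex in examples).most_common(1)[0][0]

def id3(examples, attributes, target, domains, select_attribute):
    labels = {ex[target] for ex in examples}
    if len(labels) == 1:                      # D is pure: return a leaf with this class
        return labels.pop()
    if not attributes:                        # no attribute left: majority leaf
        return majority_label(examples, target)
    a_star = select_attribute(examples, attributes, target)   # e.g. maximum information gain
    tree = {a_star: {}}
    for value in domains[a_star]:
        d_a = [ex for ex in examples if ex[a_star] == value]
        if not d_a:                           # empty D_a: most common value of Target in D
            tree[a_star][value] = majority_label(examples, target)
        else:                                 # recurse with Attributes \ {A*}
            rest = [a for a in attributes if a != a_star]
            tree[a_star][value] = id3(d_a, rest, target, domains, select_attribute)
    return tree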
Remarks:
• Target designates the feature (= attribute) that comprises the labels according to which an example can be classified. Within Mitchell's algorithm the respective class labels are + and −, modeling the binary classification situation. In the pseudo-code version, Target may comprise more than two classes (illustrated in the snippet below).
• Step 3 of the ID3 algorithm checks the purity of D and, if D is pure, assigns the common class as label to a leaf node.
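As a small illustration of the multi-class case, a purity check (Step 3) and a majority vote work unchanged for more than two labels; the toy label list is made up:

from collections import Counter

labels = ["toxic", "eatable", "eatable", "unknown"]   # hypothetical multi-class Target values
is_pure = len(set(labels)) == 1                       # Step 3: do all examples share one class?
majority = Counter(labels).most_common(1)[0][0]       # most common value of Target
print(is_pure, majority)                              # False eatable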
Example: a training sample D with five examples.

     Color   Size    Points   Eatability
1    red     small   yes      toxic
2    brown   small   no       eatable
3    brown   large   yes      eatable
4    green   small   no       eatable
5    red     large   no       eatable
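The same sample as Python data; the dict-based encoding is an assumption made for the snippets below, which reuse this list D:

# Training sample D from the table above
D = [
    {"color": "red",   "size": "small", "points": "yes", "eatability": "toxic"},
    {"color": "brown", "size": "small", "points": "no",  "eatability": "eatable"},
    {"color": "brown", "size": "large", "points": "yes", "eatability": "eatable"},
    {"color": "green", "size": "small", "points": "no",  "eatability": "eatable"},
    {"color": "red",   "size": "large", "points": "no",  "eatability": "eatable"},
]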
Top-level call of ID3. Analyze a splitting with regard to the feature color:

D|color    toxic   eatable
red          1        1
brown        0        2
green        0        1

p_red = 2/5 = 0.4,   p_brown = 2/5 = 0.4,   p_green = 1/5 = 0.2
H(C | color) = 0.4,   H(C | size) ≈ 0.55,   H(C | points) = 0.4
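These values can be reproduced from the list D defined above; entropy and conditional_entropy are helper names chosen here, using base-2 logarithms:

from collections import Counter
from math import log2

def entropy(labels):
    # H of the class distribution of the given label list
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def conditional_entropy(examples, feature, target="eatability"):
    # H(C | feature) = sum over feature values a of p_a * H(C | feature = a)
    n = len(examples)
    h = 0.0
    for value in {ex[feature] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[feature] == value]
        h += len(subset) / n * entropy(subset)
    return h

for feature in ("color", "size", "points"):
    print(feature, round(conditional_entropy(D, feature), 2))   # 0.4, 0.55, 0.4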
Remarks:
• The smaller H(C | feature) is, the larger the information gain becomes. Hence, the difference H(C) − H(C | feature) need not be computed, since H(C) is constant within each recursion step (see the worked values below).
• In the example, the information gain in the first recursion step is maximum for the two features points and color.
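For the sample D above this amounts to the following, writing Gain(feature) for H(C) − H(C | feature); H(C) follows from the class ratio in D of 1 toxic to 4 eatable:

H(C) = −(1/5)·log2(1/5) − (4/5)·log2(4/5) ≈ 0.72

Gain(color)  = H(C) − H(C | color)  ≈ 0.72 − 0.40 = 0.32
Gain(size)   = H(C) − H(C | size)   ≈ 0.72 − 0.55 = 0.17
Gain(points) = H(C) − H(C | points) ≈ 0.72 − 0.40 = 0.32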
Splitting D with regard to the feature points yields two subsets:

points = yes:
    color    size     eatability
    red      small    toxic
    brown    large    eatable

points = no:
    color    size     eatability
    brown    small    eatable
    green    small    eatable
    red      large    eatable
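With the list D from above, the two subsets are obtained by a simple filter; D_yes and D_no are names chosen here:

D_yes = [ex for ex in D if ex["points"] == "yes"]   # red/small/toxic and brown/large/eatable
D_no  = [ex for ex in D if ex["points"] == "no"]    # three examples, all eatable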
Recursive call of ID3 on the subset points = yes. Splitting with regard to the feature color:

attribute: color
    red:     size small,   eatability toxic
    brown:   size large,   eatability eatable
    green:   -/-  (no examples)

The subset for points = no is not affected and remains as above (all examples eatable).
The resulting decision tree:

attribute: points
    no   →  label: eatable
    yes  →  attribute: color
              red    →  label: toxic
              brown  →  label: eatable
              green  →  label: toxic

Break of a tie: choosing the class toxic for D_green in Step 6 of the ID3 algorithm.
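Written as the nested-dict structure from the ID3 sketch above, the resulting tree and a small classification routine could look as follows; classify is an illustrative helper, not part of the slides:

tree = {"points": {
    "no":  "eatable",
    "yes": {"color": {"red": "toxic", "brown": "eatable", "green": "toxic"}},
}}

def classify(node, example):
    # Descend from the root until a leaf (a plain class label) is reached.
    while isinstance(node, dict):
        attribute = next(iter(node))
        node = node[attribute][example[attribute]]
    return node

print(classify(tree, {"color": "red", "size": "small", "points": "yes"}))   # toxic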
[Figure: ID3's search through the hypothesis space of decision trees: candidate trees are grown incrementally, one splitting attribute (A1, A2, A3, A4, ...) at a time, guided by the labeled training examples.]
To generate a decision tree, the ID3 algorithm needs, per branch, at most as many decisions as there are features:
• no backtracking takes place
• local optimization of decision trees
Is this justified?
Remarks:
• Let A_j be the finite domain (the set of possible values) of feature A_j, j = 1, ..., p, and let C be a set of classes. Then a hypothesis space H that is comprised of all decision trees corresponds to the set of all functions h, h : A_1 × ... × A_p → C. Typically, C = {0, 1}. (See the worked count below.)
• The inductive bias of the ID3 algorithm is of a different kind than the inductive bias of approaches that restrict the hypothesis space: ID3 considers all decision trees over the given features but prefers trees that place highly informative features close to the root, i.e., a preference (search) bias rather than a restriction bias.
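For the three features of the example sample, with |A_color| = 3, |A_size| = 2, |A_points| = 2 and C = {toxic, eatable}, this set of functions is already sizeable:

|H| = |C|^(|A_color| · |A_size| · |A_points|) = 2^(3·2·2) = 2^12 = 4096 distinct functions h : A_color × A_size × A_points → C.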
Illustration for two numeric features; i.e., the feature space X corresponds to a
two-dimensional plane:
[Figure: the tree nodes t1, ..., t9 induce a partition of the plane X into axis-parallel rectangles X(t1), ..., X(t9), with X = X(t1) for the root; each leaf rectangle is labeled with one of the classes c1, c2, c3.]
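A minimal sketch of this idea, assuming made-up thresholds and class labels: each inner node tests one numeric feature against a threshold, so every leaf corresponds to an axis-parallel rectangle of the plane.

def leaf_region(x1, x2):
    # Hypothetical tree over two numeric features; the thresholds 4.0, 2.5, 6.0 are invented.
    if x1 <= 4.0:                              # root: split the plane along x1 = 4.0
        return "c1" if x2 <= 2.5 else "c3"     # two rectangles left of the split
    else:
        return "c2" if x2 <= 6.0 else "c3"     # two rectangles right of the split

print(leaf_region(3.0, 1.0))   # c1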