Decision Tree Classifiers
Avinash Kak
Purdue University
December 6, 2022
12:12 Noon
CONTENTS
1  Introduction
2  Entropy
3  Conditional Entropy
4  Average Entropy
14 Incorporating Bagging
15 Incorporating Boosting
20 Acknowledgments
1: Introduction
result of the test, and the decision di the diagnosis. In drug discovery, each feature fi could
represent the name of an ingredient in the drug, the value vali the proportion of that
ingredient, and the decision di the effectiveness of the drug in a drug trial. In a Wall Street
sort of an application, each feature could represent a criterion (such as the price-to-earnings
12 of this tutorial. ]
Let’s say that your new data record for which you need to make
a decision looks like:

new_val_1   new_val_2   new_val_3   ....   new_val_N

The decision tree will then spit out the best possible decision to
make for this new data record. The “quality” of this decision
would obviously depend on the quality of the training data, as
explained in Section 12.
all the training data taken together to the data as partitioned by the feature test. ]
One then drops from the root node a set of child nodes, one for
each value of the feature tested at the root node for the case of
symbolic features. For the case when a numeric feature is tested
at the root node, one drops from the root node two child nodes,
one for the case when the value of the feature tested at the root
is less than the decision threshold chosen at the root and the
other for the opposite case.
– For each object shown, all that we tell the computer is its class
label. We do NOT tell the computer how to discriminate between
the objects belonging to the different classes.
– For image data, these features could be color and texture attributes
and the presence or absence of shape primitives. [For depth data, the
features could be different types of curvatures of the object surfaces and junctions formed
by the joins between the surfaces, etc.]
– The job given to the computer: From the data thus collected, it
must figure out on its own how to best discriminate
between the objects belonging to the different classes. [That
is, the computer must learn on its own what features to use for discriminating between
https://engineering.purdue.edu/RVL/Publications/Grewe95Interactive.pdf
2: Entropy
each entry in the last column of that table is one of “apple,” “orange,” and “pear,” then your
What is entropy?
H(X) = − Σ_{i=1}^{8} (1/8) log2 (1/8)
     = − Σ_{i=1}^{8} (1/8) log2 2^(−3)
     = 3 bits                                                          (2)
In this case,
H(X) = − Σ_{i=1}^{64} (1/64) log2 (1/64)
     = − Σ_{i=1}^{64} (1/64) log2 2^(−6)
     = 6 bits                                                          (3)
[Figure: a normalized histogram over the 64 possible values of X, with all of the probability mass (1.0) at x = k and zero everywhere else]

p(xi) = 1    if xi = k
      = 0    otherwise                                                 (4)
H(X) = − Σ_{i=1}^{N} p(xi) log2 p(xi)
     = − 1 × log2 1
     = 0 bits                                                          (5)
So we see that the entropy becomes zero when X has zero chaos.
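To make these calculations concrete, here is a minimal Python sketch (illustrative only, not part of any of the software discussed later) that computes the entropy of a discrete distribution and reproduces the 3-bit, 6-bit, and 0-bit results of Eqs. (2), (3), and (5):

import math

def entropy(probs):
    # Shannon entropy, in bits, of a discrete distribution given as a list of probabilities
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1/8] * 8))              # 3.0 bits, as in Eq. (2)
print(entropy([1/64] * 64))            # 6.0 bits, as in Eq. (3)
print(entropy([1.0] + [0.0] * 63))     # 0.0 bits, as in Eq. (5); zero-probability terms contribute nothing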
3: Conditional Entropy
In general,
H(Y|X) = H(X,Y) − H(X)                                                 (6)
The formula shown for H(Y|X) in Eq. (6) is the average of
H(Y | X = a) over all possible instantiations a for X, as I’ll
establish in the next section.
4: Average Entropies
H(Y|X) = Σ_a H(Y | X = a) · p(X = a)
       = Σ_a { − Σ_j p(yj | X = a) log2 p(yj | X = a) } · p(X = a)
       = − Σ_i Σ_j p(yj | xi) log2 p(yj | xi) · p(xi)
       = − Σ_i Σ_j [ p(xi, yj) / p(xi) ] log2 [ p(xi, yj) / p(xi) ] · p(xi)
       = − Σ_i Σ_j p(xi, yj) [ log2 p(xi, yj) − log2 p(xi) ]
       = H(X,Y) + Σ_i Σ_j p(xi, yj) log2 p(xi)
       = H(X,Y) + Σ_i p(xi) log2 p(xi)
       = H(X,Y) − H(X)
a marginal probability when a joint probability is summed with respect to its free variable.]
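As a quick numerical check of the identity just derived, the following sketch (illustrative only; the joint distribution is made up) computes H(Y|X) both ways, once as the weighted average of H(Y | X = a) and once as H(X,Y) − H(X), and the two results agree:

import math
from collections import defaultdict

def H(probs):
    # entropy in bits of a list of probabilities
    return -sum(p * math.log2(p) for p in probs if p > 0)

# a small joint distribution p(x, y), chosen arbitrarily for illustration
p_xy = {('a', 0): 0.3, ('a', 1): 0.1, ('b', 0): 0.2, ('b', 1): 0.4}

# marginal p(x)
p_x = defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p

# H(Y|X) as the average of H(Y | X=a), weighted by p(X=a)
avg = 0.0
for a, pa in p_x.items():
    cond = [p / pa for (x, y), p in p_xy.items() if x == a]
    avg += H(cond) * pa

# H(Y|X) as H(X,Y) - H(X)
diff = H(list(p_xy.values())) - H(list(p_x.values()))

print(avg, diff)     # the two numbers are the same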
In this and the next section, I’ll assume that each feature fi
takes on a discrete value from a known set of such values. I’ll
refer to such features as symbolic features. [When the feature values
cannot be assumed to be discrete in the sense described here, I’ll refer to those as numeric
Eq. (10). ]
The entropy in each term on the right hand side of Eq. (12) can
be calculated by
H(C | v(f) = a) = − Σ_m p(Cm | v(f) = a) · log2 p(Cm | v(f) = a)          (13)
But how do we figure out the p(Cm | v(f) = a) that is needed on
the right hand side?
from the population of all objects belonging to all classes. Say the sensor system is
allowed to measure K different kinds of features: f1, f2, ...., fK. For each feature fk, the
sensor system keeps a count of the number of objects that gave rise to the v(fk) = a
value. Now we estimate p(Cm | v(f) = a) for any choice of f = fk simply by counting off
the number of objects from class Cm that exhibited the v(fk) = a measurement.
Bayes’ Theorem:
p(Cm | v(f) = a) = [ p(v(f) = a | Cm) · p(Cm) ] / p(v(f) = a)             (14)
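Here is a small sketch of the counting-based estimation just described (the training records are made up for illustration): the direct count and the Bayes'-theorem route of Eq. (14) give the same estimate of p(Cm | v(f) = a).

# hypothetical training data: (class label, value of the symbolic feature f) pairs
training = [('apple', 'red'), ('apple', 'red'), ('apple', 'green'),
            ('orange', 'orange'), ('orange', 'red'),
            ('pear', 'green'), ('pear', 'green'), ('pear', 'red')]

def p_class_given_value(records, cls, value):
    # estimate p(Cm | v(f) = a) directly by counting
    with_value = [c for c, v in records if v == value]
    return with_value.count(cls) / len(with_value) if with_value else 0.0

def p_class_given_value_bayes(records, cls, value):
    # the same estimate via Bayes' theorem, Eq. (14)
    n = len(records)
    n_cls = sum(1 for c, v in records if c == cls)
    p_cls = n_cls / n                                                                   # p(Cm)
    p_val_given_cls = (sum(1 for c, v in records if c == cls and v == value) / n_cls
                       if n_cls else 0.0)                                               # p(v(f)=a | Cm)
    p_val = sum(1 for c, v in records if v == value) / n                                # p(v(f)=a)
    return p_val_given_cls * p_cls / p_val if p_val > 0 else 0.0

print(p_class_given_value(training, 'apple', 'red'))         # 0.5
print(p_class_given_value_bayes(training, 'apple', 'red'))   # 0.5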
Now that you know how to use the class entropy to find the
best feature that will discriminate between the classes, we will
extend this idea and show how you can construct a decision
tree. Subsequently, the tree may be used to classify future
samples of data.
If you are not familiar with a decision tree, the best way to
appreciate the idea is to first revisit the traditional approach to
the classification of multi-dimensional data. In the traditional
approach, you start with a feature space whose dimensionality is
the same as that of the data.
neighbor classification.)
When you construct a decision tree, you select for the root node
of the tree a feature test that can be expected to maximally
disambiguate the class labels that could be associated with the
data you are trying to classify.
You then attach to the root node a set of child nodes, one for
each value of the feature you chose at the root node. [This
statement is not entirely accurate. As you will see later, for the case of symbolic features, you
create child nodes for only those feature values (for the feature chosen at the root node) that
reduce the class entropy in relation to the value of the class entropy at the root. ] Now at
each child node you pose the same question that you posed
when you found the best feature to use at the root node: What
feature at the child node in question would maximally
disambiguate the class labels to be associated with a given data
vector, assuming that the data vector passed the root node on
the branch that corresponds to the child node in question? The
feature that is best at each node is the one that causes the
maximal reduction in class entropy at that node.
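For symbolic features, this best-feature criterion can be sketched as follows (illustrative only; the actual DecisionTree module organizes these computations differently internally): for each candidate feature we compute the average class entropy, that is, the entropies H(C | v(f) = a) weighted by p(v(f) = a), and pick the feature for which this average is smallest, which is the same as maximizing the reduction in class entropy.

import math
from collections import Counter, defaultdict

def entropy_of_counts(counts):
    # class entropy in bits from a Counter of class-label counts
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)

def average_class_entropy(samples, feature):
    # H(C | f): the entropies H(C | v(f)=a) averaged with the weights p(v(f)=a)
    by_value = defaultdict(Counter)
    for s in samples:
        by_value[s[feature]][s['class']] += 1
    n = len(samples)
    return sum((sum(cnt.values()) / n) * entropy_of_counts(cnt) for cnt in by_value.values())

def best_feature(samples, features):
    # the best feature minimizes H(C | f), i.e. maximizes the entropy reduction
    return min(features, key=lambda f: average_class_entropy(samples, f))

# hypothetical training samples with two symbolic features
samples = [{'class': 'apple',  'color': 'red',    'shape': 'round'},
           {'class': 'apple',  'color': 'green',  'shape': 'round'},
           {'class': 'orange', 'color': 'orange', 'shape': 'round'},
           {'class': 'pear',   'color': 'green',  'shape': 'oblong'}]

print(best_feature(samples, ['color', 'shape']))    # 'color' disambiguates the classes better here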
Assume that the feature selected at the root node is fj and that
we are now at one of the child nodes hanging from the root. So
the question now is how do we select the best feature to use at
the child node.
That is, for any feature f not previously used at the root, we
find the conditional entropy (with respect to our choice for f ) when we
are on the v(fj ) = aj branch:
H(C | f, v(fj) = aj) = Σ_b H(C | v(f) = b, v(fj) = aj) · p(v(f) = b, v(fj) = aj)          (16)
1 In my lab at Purdue, we refer to such normalizations in the calculation of average entropy as “JZ Normalization”—
p(Cm | v(f) = b, v(fj) = aj) = [ p(v(f) = b | Cm) · p(v(fj) = aj | Cm) · p(Cm) ] / [ p(v(f) = b) · p(v(fj) = aj) ]          (19)
[Figure: the root node with one child node hanging from it for the branch fj = 1 and another for the branch fj = 2]
You will add other child nodes to the root in the same manner,
with one child node for each value that can be taken by the
feature fj .
The children of the root are assigned the entropy H(C | fj) that
resulted in their creation.
A child node of the root that is on the branch v(fj) = aj gets its
own feature test (and is split further) if and only if we can find
a feature fk such that H(C | fk, v(fj) = aj) is less than the entropy
H(C | fj) inherited by the child from the root.
If the condition H(C | f, v(fj) = aj) < H(C | fj) cannot be
satisfied at the child node on the branch v(fj) = aj of the root
for any feature f ≠ fj, the child node remains without a feature
test and becomes a leaf node of the decision tree.
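The recursive growth implied by this stopping criterion can be sketched as follows (a toy illustration with symbolic features, not the module's implementation): a node is split on the feature that minimizes the average class entropy over the samples that reached the node, but only if that entropy is strictly lower than the entropy the node inherited; otherwise the node becomes a leaf.

import math
from collections import Counter, defaultdict

def class_entropy(samples):
    counts = Counter(s['class'] for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def average_class_entropy(samples, feature):
    by_value = defaultdict(list)
    for s in samples:
        by_value[s[feature]].append(s)
    n = len(samples)
    return sum((len(part) / n) * class_entropy(part) for part in by_value.values())

def grow(samples, features, inherited_entropy):
    # split only if some as-yet-unused feature strictly lowers the inherited class entropy
    candidates = {f: average_class_entropy(samples, f) for f in features}
    if not candidates or min(candidates.values()) >= inherited_entropy:
        return {'leaf': dict(Counter(s['class'] for s in samples))}    # leaf node
    f = min(candidates, key=candidates.get)                            # best feature at this node
    children = defaultdict(list)
    for s in samples:
        children[s[f]].append(s)
    remaining = [g for g in features if g != f]
    return {'feature': f,
            'children': {a: grow(part, remaining, candidates[f]) for a, part in children.items()}}

samples = [{'class': 'apple', 'color': 'red',   'shape': 'round'},
           {'class': 'apple', 'color': 'green', 'shape': 'round'},
           {'class': 'pear',  'color': 'green', 'shape': 'oblong'}]

tree = grow(samples, ['color', 'shape'], class_entropy(samples))
print(tree)       # the root tests 'shape'; both children become leaves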
Next, we will subject the data vector to the feature test at the
child node on that branch. We will continue this process until
we have used up all the feature values in the data vector.
That should put us at one of the nodes, possibly a leaf node.
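Continuing with the toy nested-dictionary representation used in the sketch above, descending the tree with a new data vector looks like this (illustrative only):

def classify_by_descent(node, sample):
    # walk down the tree, following the branch that matches the sample's value
    # for the feature tested at each node, until no further descent is possible
    while 'feature' in node:
        branch = sample.get(node['feature'])
        if branch not in node['children']:
            break                                   # no branch for this value: stop at this node
        node = node['children'][branch]
    return node                                     # a leaf, or the internal node where we stopped

tree = {'feature': 'shape',
        'children': {'round':  {'leaf': {'apple': 2}},
                     'oblong': {'leaf': {'pear': 1}}}}

print(classify_by_descent(tree, {'shape': 'round', 'color': 'red'}))    # {'leaf': {'apple': 2}}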
p(Cm | v(fj) = aj, v(fk) = bk, . . .) = [ p(v(fj) = aj, v(fk) = bk, . . . | Cm) × p(Cm) ] / p(v(fj) = aj, v(fk) = bk, . . .)          (21)

p(v(fj) = aj, v(fk) = bk, . . . | Cm) = Π_{f on branch} p(v(f) = value | Cm)          (22)
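Eqs. (21) and (22) amount to a naive-Bayes style product of per-feature conditionals followed by a normalization over the classes. A tiny sketch, with made-up conditional probabilities and priors:

# hypothetical per-feature conditionals p(v(f)=value | Cm) and class priors p(Cm)
p_value_given_class = {'apple': {'color=red': 0.6, 'shape=round': 0.9},
                       'pear':  {'color=red': 0.1, 'shape=round': 0.2}}
priors = {'apple': 0.5, 'pear': 0.5}

branch = ['color=red', 'shape=round']      # the feature tests satisfied along the branch

# numerator of Eq. (21), with the joint likelihood factored as in Eq. (22)
unnormalized = {}
for cls, prior in priors.items():
    likelihood = 1.0
    for test in branch:
        likelihood *= p_value_given_class[cls][test]
    unnormalized[cls] = likelihood * prior

# the denominator of Eq. (21) simply normalizes the posteriors so they sum to 1
total = sum(unnormalized.values())
posteriors = {cls: p / total for cls, p in unnormalized.items()}
print(posteriors)      # {'apple': 0.964..., 'pear': 0.035...}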
H(C | vth(f) = θ) = H(C | v(f) ≤ θ) × p(v(f) ≤ θ) + H(C | v(f) > θ) × p(v(f) > θ)          (23)
The left side in the equation shown above is the average entropy
for the two parts considered separately on the right hand side.
The threshold for which this average entropy is the minimum
is the best threshold to use for the numeric feature f .
The component entropies on the right hand side in Eq. (23) can
be calculated as follows:
H(C | v(f) ≤ θ) = − Σ_m p(Cm | v(f) ≤ θ) × log2 p(Cm | v(f) ≤ θ)          (24)
and
H(C | v(f) > θ) = − Σ_m p(Cm | v(f) > θ) × log2 p(Cm | v(f) > θ)          (25)
The conditional probabilities on the right hand sides can be estimated with the help of Bayes' theorem:

p(Cm | v(f) ≤ θ) = [ p(v(f) ≤ θ | Cm) × p(Cm) ] / p(v(f) ≤ θ)          (26)

and

p(Cm | v(f) > θ) = [ p(v(f) > θ | Cm) × p(Cm) ] / p(v(f) > θ)          (27)
later. After both the numerators have been estimated, you invoke the normalization
constraints shown below for the two LHS's above for estimating the denominators. ]
Σ_m p(Cm | v(f) ≤ θ) = 1          (28)
and

Σ_m p(Cm | v(f) > θ) = 1          (29)
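Here is a minimal sketch of the threshold search implied by Eqs. (23) through (29) (illustrative only, with the probabilities replaced by empirical fractions computed from a handful of made-up training samples): the candidate thresholds for a numeric feature are scanned and the one that minimizes the average entropy of Eq. (23) is retained.

import math
from collections import Counter

def class_entropy(labels):
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_threshold(values, labels):
    # scan the candidate thresholds for a numeric feature and return the one
    # that minimizes the average entropy of Eq. (23)
    best = (float('inf'), None)
    for theta in sorted(set(values))[:-1]:                      # candidate thresholds
        low  = [c for v, c in zip(values, labels) if v <= theta]
        high = [c for v, c in zip(values, labels) if v > theta]
        avg = (len(low) / len(labels)) * class_entropy(low) + \
              (len(high) / len(labels)) * class_entropy(high)   # Eq. (23)
        if avg < best[0]:
            best = (avg, theta)
    return best                                                 # (average entropy, threshold)

# hypothetical numeric feature values with their class labels
values = [1.0, 1.5, 2.0, 3.5, 4.0, 4.5]
labels = ['benign', 'benign', 'benign', 'malignant', 'malignant', 'malignant']
print(best_threshold(values, labels))     # (0.0, 2.0): splitting at 2.0 separates the two classes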
Now we are all set to use this partitioning logic to choose the
best feature for the root node of our decision tree. We proceed
as follows.
After finding the best feature for the root node in the manner
described above, we can drop two branches from it, one for the
training samples for which v(f ) ≤ θ and the other for the
samples for which v(f ) > θ as shown in the figure below:
[Figure: the root node with the branch v(fj) <= θj going to the left child and the branch v(fj) > θj going to the right child]
Now that you know how to construct the root node for the case
when you have just numeric features or a mixture of numeric
Let’s say we used the numeric feature fj , along with its decision
threshold θj , at the root and that the choice of the best feature
to use at the left child turns out to be fk and that its best
decision threshold is θk .
The choice (fk, θk) at the left child of the root must be the best
possible among all possible features and all possible thresholds
for those features, so that the following average entropy is minimized:
H(C | v(fj) ≤ θj, v(fk) ≤ θk) × p(v(fj) ≤ θj, v(fk) ≤ θk)
    + H(C | v(fj) ≤ θj, v(fk) > θk) × p(v(fj) ≤ θj, v(fk) > θk)          (30)
At this point, our decision tree will look like what is shown
below:
[Figure: feature fj is tested at the root; the branch v(fj) < θj leads to a child node where feature fk is tested, and the branch v(fj) >= θj leads to a child node where feature fl is tested; the fk node in turn splits into the branches v(fk) < θk and v(fk) >= θk]
root node of the tree. Next we consider the two children of this node: We must decide what
feature is the best at the left child node and what feature is the best at the right child node.
To answer these questions at the two child nodes, we put the parent’s feature fj back in
contention — which may seem strange at first sight. But note that, strictly speaking, fj at the
root is NOT the same as fj at the child nodes of the root. At the root, the value interval
associated with fj is [vmin (fj ), vmax (fj )). However, at the left child node of the root, the value
interval associated with nominally the same fj is [vmin(fj), θj) and at the right branch the value
import DecisionTree

dt = DecisionTree.DecisionTree(
        training_datafile = training_datafile,
        csv_class_column_index = 2,
        csv_columns_for_features = [3,4,5,6,7,8],
        entropy_threshold = 0.01,
        max_depth_desired = 3,
        symbolic_to_numeric_cardinality_threshold = 10,
        csv_cleanup_needed = 1,
     )
Regarding the parameter max_depth_desired, note that the tree
is grown in a depth-first manner to the maximum depth set by
this parameter.
Here is an example of the syntax used for a test sample and the
call you need to make to classify it:
test_sample = [’g2 = 4.2’,
’grade = 2.3’,
’gleason = 4’,
’eet = 1.7’,
’age = 55.0’,
’ploidy = diploid’]
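To actually obtain the classification, you call the classify() method of the module on the root node returned by the tree-construction call. The two lines below are meant only as a sketch of that usage (they assume the dt instance constructed earlier in this section):

root_node = dt.construct_decision_tree_classifier()
classification = dt.classify(root_node, test_sample)
print(classification)        # the estimated probabilities for the different classes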
For large test datasets, see the next section for the
demonstration scripts that show you how you can classify all
your data records in a CSV file in one go.
https://engineering.purdue.edu/kak/distDT/DecisionTree-3.4.3.html
Please read the documentation at the CPAN site for the API of
this software package. [If clicking on the link shown above does not work for you, you can also
just do a Google search on “Algorithm::DecisionTree” and go to Version 3.43 when you get to the CPAN page
for the module.]
my $dt = Algorithm::DecisionTree->new(
training_datafile => $training_datafile,
csv_class_column_index => 2,
csv_columns_for_features => [3,4,5,6,7,8],
entropy_threshold => 0.01,
max_depth_desired => 8,
symbolic_to_numeric_cardinality_threshold => 10,
csv_cleanup_needed => 1,
);
module, you read in the training data file and initialize the
probability cache by calling:
$dt->get_training_data();
$dt->calculate_first_order_probabilities();
$dt->calculate_class_priors();
Now you are ready to construct a decision tree for your training
data by calling:
$root_node = $dt->construct_decision_tree_classifier();
With that, you are ready to start classifying new data samples:
Let’s say that your data record looks like:
my @test_sample = qw / g2=4.2
grade=2.3
gleason=4
eet=1.7
age=55.0
ploidy=diploid /;
http://search.cpan.org/~avikak/Algorithm-DecisionTree-3.43/
For the case of Perl, see the following scripts in the Examples
directory of the module for bulk classification of data records:
classify_test_data_in_a_file.pl
And for the case of Python, check out the following script in the
Examples directory for doing the same thing:
classify_test_data_in_a_file.py
datafile, the second the test datafile, and the third the file in
which the classification results will be deposited.
bagging_for_bulk_classification.py
boosting_for_bulk_classification.py
classify_database_records.py
Here are the four most important reasons why a given
training data file may be of poor quality: (1) Insufficient
number of data samples to adequately capture the statistical
distributions of the feature values as they occur in the real
world; (2) The distributions of the feature values in the training
file not reflecting the distribution as it occurs in the real world;
(3) The number of the training samples for the different classes
not being in proportion to the real-world prior probabilities of
the classes; and (4) The features not being statistically
independent.
The code fragment shown below illustrates how you invoke the
testing function of the EvalTrainingData class in the Python
version of the module:
import DecisionTree

training_datafile = "training3.csv"
eval_data = DecisionTree.EvalTrainingData(
training_datafile = training_datafile,
csv_class_column_index = 1,
csv_columns_for_features = [2,3],
entropy_threshold = 0.01,
max_depth_desired = 3,
symbolic_to_numeric_cardinality_threshold = 10,
csv_cleanup_needed = 1,
)
eval_data.get_training_data()
eval_data.evaluate_training_data()
The syntax for doing the same thing with the Perl version of the
module is similar:

my $eval_data = EvalTrainingData->new(
training_datafile => $training_datafile,
csv_class_column_index => 1,
csv_columns_for_features => [2,3],
entropy_threshold => 0.01,
max_depth_desired => 3,
symbolic_to_numeric_cardinality_threshold => 10,
csv_cleanup_needed => 1,
);
$eval_data->get_training_data();
$eval_data->evaluate_training_data()
evaluate_training_data1.pl
evaluate_training_data2.pl
the training samples are non-uniformly distributed, it is entirely possible for the estimated
probability densities to be non-zero in a small region around a point even when there are no training
samples specifically in that region. (After you have created a statistical model for, say, the height
distribution of people in a community, the model may return a non-zero probability for the height
values in a small interval even if the community does not include a single individual whose height
falls in that interval.) ]
The point made above implies that the path leading to a node
in the decision tree may test a feature for a certain value or
threshold despite the fact that the portion of the feature space
assigned to that node is devoid of any training data.
The second of the three scripts listed above descends down the
decision tree and shows for each node the training samples that
fall directly in the portion of the feature space assigned to that
node.
The last of the three scripts listed above shows for each training
sample how it affects the decision-tree nodes either directly or
indirectly through the generalization achieved by the
probabilistic modeling of the data.
looks like:
Node 0: the samples are: None
Node 1: the samples are: [’sample_46’, ’sample_58’]
Node 2: the samples are: [’sample_1’, ’sample_4’, ’sample_7’, .....
Node 3: the samples are: []
Node 4: the samples are: []
...
...
The nodes for which no samples are listed come into existence
through the generalization achieved by the probabilistic
modeling of the data.
looks like:
sample_1:
nodes affected directly: [2, 5, 19, 23]
nodes affected through probabilistic generalization:
2=> [3, 4, 25]
25=> [26]
5=> [6]
6=> [7, 13]
7=> [8, 11]
8=> [9, 10]
11=> [12]
13=> [14, 18]
14=> [15, 16]
16=> [17]
19=> [20]
20=> [21, 22]
23=> [24]
sample_4:
nodes affected directly: [2, 5, 6, 7, 11]
nodes affected through probabilistic generalization:
2=> [3, 4, 25]
25=> [26]
5=> [19]
19=> [20, 23]
20=> [21, 22]
23=> [24]
6=> [13]
13=> [14, 18]
14=> [15, 16]
16=> [17]
7=> [8]
8=> [9, 10]
11=> [12]
...
...
...
For each training sample, the display shown above first presents
the list of nodes that are directly affected by the sample. A node
is affected directly by a sample if the latter falls in the portion
of the feature space that belongs to the former. Subsequently,
for each training sample, the display shows a subtree of the
nodes that are affected indirectly by the sample through the
generalization achieved by the probabilistic modeling of the
data. In general, a node is affected indirectly by a sample if it is
a descendant of another node that is affected directly.
max_depth_desired => 8,
symbolic_to_numeric_cardinality_threshold=>10,
how_many_bags => 4,
bag_overlap_fraction => 0.2,
csv_cleanup_needed => 1,
);
get_training_data_for_bagging(): This method reads your training datafile,
randomizes it, and then partitions it into the specified number of bags.
Subsequently, if the constructor parameter bag_overlap_fraction is some
positive fraction, it adds to each bag a number of additional samples drawn at
random from the other bags. As for how many additional samples are added to
each bag: if the parameter bag_overlap_fraction is set to 0.2, the size of
each bag will grow by 20%, with the additional samples drawn from the other bags.
show_training_data_in_bags(): Shows for each bag the name-tags of the training
data samples in that bag.
calculate_class_priors(): Calls on the appropriate method of the main DecisionTree
class to estimate the class priors for the data classes found in each bag.
construct_decision_trees_for_bags(): Calls on the appropriate method of the main
DecisionTree class to construct a decision tree from the training data in each bag.
display_decision_trees_for_bags(): Displays separately the decision tree for each
bag.
classify_with_bagging( test_sample ): Calls on the appropriate methods of the
main DecisionTree class to classify the argument test_sample.
get_majority_vote_classification(): Using majority voting, this method aggregates
the classification decisions made by the individual decision trees into a single
decision. (A small illustration of such voting follows this list.)
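To make the final aggregation step concrete, here is a tiny illustration of majority voting over the per-bag decisions (plain Python, independent of the module; the decisions shown are made up):

from collections import Counter

# hypothetical classification decisions returned by the decision trees for four bags
decisions_from_bags = ['benign', 'malignant', 'benign', 'benign']

# majority voting: the class label returned by the largest number of trees wins
label, votes = Counter(decisions_from_bags).most_common(1)[0]
print(label, votes)      # benign 3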
See the example scripts in the directory ExamplesBagging for how to call these
methods for classifying individual samples and for bulk classification when you
place all your test samples in a single file.
bagging_for_bulk_classification.py
bagging_for_bulk_classification.pl
As the name of the script implies, the first Perl or Python script
named on the previous slide shows how to call the different
methods of the DecisionTreeWithBagging class for
classifying a single test sample.
The second script named on the previous slide is for the case
when you place all of the test samples in a single file. The
demonstration script displays for each test sample a single
aggregate classification decision that is obtained through
majority voting by all the decision trees.
If you are not familiar with boosting, you may want to first
browse through my tutorial “AdaBoost for Learning Binary and
Multiclass Discriminations” that is available at:
https://engineering.purdue.edu/kak/Tutorials/AdaBoost.pdf
boosted = BoostedDecisionTree.BoostedDecisionTree(
training_datafile = training_datafile,
csv_class_column_index = 1,
csv_columns_for_features = [2,3],
entropy_threshold = 0.01,
max_depth_desired = 8,
symbolic_to_numeric_cardinality_threshold = 10,
how_many_stages = 10,
csv_cleanup_needed = 1,
)
max_depth_desired => 8,
symbolic_to_numeric_cardinality_threshold=>10,
how_many_stages => 4,
csv_cleanup_needed => 1,
);
get_training_data_for_base_tree(): In this method name, the string
base_tree refers to the first tree of the cascade. This is the tree for
which the training samples are drawn assuming a uniform
distribution over the entire dataset. This method reads your
training datafile and creates the data structures from the data ingested
for constructing the base decision tree.
show_training_data_for_base_tree(): Shows the training data
samples and some relevant properties of the features used in the
training dataset.
that illustrate how you can use boosting with the help of the
BoostedDecisionTree class. The Perl version of the module
contains the following similarly named scripts in its
ExamplesBoosting subdirectory:
boosting_for_classifying_one_test_sample_1.pl
boosting_for_classifying_one_test_sample_2.pl
boosting_for_bulk_classification.pl
As implied by the names of the first two scripts, these show how
to call the different methods of the BoostedDecisionTree
class.
The third script listed on the previous slide is for the case when
you place all of the test samples in a single file. The
demonstration script displays for each test sample a single
aggregate classification decision that is obtained through
trust-factor weighted majority voting by all the decision trees.
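The weighted variant of the vote can be illustrated just as simply (the labels and trust factors below are made up; this is not the module's internal code): each tree's vote is weighted by its trust factor and the label with the largest total weight wins.

from collections import defaultdict

# hypothetical (label, trust factor) pairs, one per stage of the boosted cascade
votes = [('benign', 0.9), ('malignant', 0.4), ('benign', 0.2), ('malignant', 1.1)]

totals = defaultdict(float)
for label, trust in votes:
    totals[label] += trust               # accumulate the trust-weighted votes

print(max(totals, key=totals.get))       # malignant (1.5 versus 1.1)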
trees, your final classification decision could then be based on, say,
majority voting by the trees.
import RandomizedTreesForBigData
training_datafile = "MyLargeDatabase.csv"
rt = RandomizedTreesForBigData.RandomizedTreesForBigData(
training_datafile = training_datafile,
csv_class_column_index = 48,
csv_columns_for_features = [39,40,41,42],
entropy_threshold = 0.01,
max_depth_desired = 8,
symbolic_to_numeric_cardinality_threshold = 10,
looking_for_needles_in_haystack = 1,
how_many_trees = 5,
csv_cleanup_needed = 1,
)
Except for obvious changes, the syntax is very similar for the
Perl case also.
how_many_trees
how_many_training_samples_per_tree = 50,
how_many_trees = 17,
csv_cleanup_needed = 1,
)
Again, except for obvious changes, the syntax is very similar for
the Perl version of the module.
The first parameter will set the number of samples that will be
drawn randomly from the training database and the second the
number of decision trees that will be constructed.
IMPORTANT: When you set the
how_many_training_samples_per_tree parameter, you are not
allowed to also set the looking_for_needles_in_haystack
parameter, and vice versa.
get_training_data_for_N_trees(): What this method does depends on which of the
two constructor parameters, looking_for_needles_in_haystack or
how_many_training_samples_per_tree, is set. When the former is set, it creates
a collection of training datasets for how_many_trees number of decision trees,
with each dataset being a mixture of the minority class and samples drawn
randomly from the majority class. (A small sketch of this idea follows the list of
methods below.) However, when the latter option is set, all the
datasets are drawn randomly from the training database with no particular
attention given to the relative populations of the two classes.
show_training_data_for_all_trees(): As the name implies, this method shows the
training data being used for all the decision trees. This method is useful for
debugging purposes using small datasets.
construct_all_decision_trees(): Calls on the appropriate method of the main
DecisionTree class to construct the decision trees.
display_all_decision_trees(): Displays all the decision trees in your terminal
window. (The textual form of the decision trees is written out to the standard
output.)
classify_with_all_trees(): A test sample is sent to each decision tree for
classification.
display_classification_results_for_all_trees(): The classification decisions returned
by the individual decision trees are written out to the standard output.
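The "needles in haystack" idea mentioned for get_training_data_for_N_trees() can be sketched as follows (an illustration of the idea only; the module's own sampling logic may differ in its details): every per-tree dataset keeps all of the rare minority-class samples and pairs them with a fresh random draw from the majority class.

import random

def datasets_for_trees(samples, minority_class, how_many_trees, majority_per_tree=50):
    # each per-tree dataset = all minority-class samples + a random draw from the majority class
    minority = [s for s in samples if s['class'] == minority_class]
    majority = [s for s in samples if s['class'] != minority_class]
    k = min(majority_per_tree, len(majority))
    return [minority + random.sample(majority, k) for _ in range(how_many_trees)]

# a hypothetical, highly imbalanced dataset: 3 'fraud' records among 1000
samples = [{'class': 'fraud', 'id': i} for i in range(3)] + \
          [{'class': 'ok', 'id': i} for i in range(3, 1000)]

bags = datasets_for_trees(samples, 'fraud', how_many_trees=5)
print([len(b) for b in bags])     # [53, 53, 53, 53, 53]: 3 fraud records plus 50 random 'ok' records each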
randomized_trees_for_classifying_one_test_sample_2.py
classify_database_records.py
The third script listed on the previous page illustrates how you
can evaluate the classification power of an ensemble of decision
trees as constructed by RandomizedTreesForBigData by
classifying a large number of test samples extracted randomly
from the training database.
With just the two features f1 and f2 , let’s say that our decision
tree looks like what is shown in the figure below:
[Figure: a decision tree over the two features. Node 1, the root, tests feature f1; the branch v(f1) <= θ1 leads to node 2 and the branch v(f1) > θ1 leads to node 5. Nodes 2 and 5 test feature f2; from node 2, the branch v(f2) <= θ2 leads to node 3 and the branch v(f2) > θ2 leads to node 4. Nodes 3 and 4 test f1 again.]
Note that when we create two child nodes at any node in the
tree, we are dividing up a portion of the underlying feature
space, the portion that can be considered to be allocated to the
node in question.
[Figure: the feature space spanned by f1 (horizontal axis, from vmin to vmax) and f2 (vertical axis, from umin to umax), showing the regions allocated to nodes 3, 4, and 5]
The resulting divisions in the feature space will look like what is
shown in the figure on the next slide.
[Figure: the same feature space, with f1 values from vmin to vmax and f2 values from umin to umax, showing the divisions that result after the additional splits, again marked with nodes 3, 4, and 5]
would use the built-in hash data structure for creating such a
hash table. You can do the same in Python with a dictionary.
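One plausible way to carry out such a conversion in Python (a sketch of the general idea only, not necessarily the exact scheme used in the work described in this section) is to enumerate the root-to-leaf paths of the tree and key a dictionary on the sequence of branch outcomes along each path:

def paths_to_lookup_table(node, path=()):
    # flatten a nested-dictionary decision tree into a dict that maps the tuple of
    # branch outcomes on a root-to-leaf path to the decision stored at the leaf
    if 'leaf' in node:
        return {path: node['leaf']}
    table = {}
    for outcome, child in node['children'].items():
        table.update(paths_to_lookup_table(child, path + ((node['feature'], outcome),)))
    return table

tree = {'feature': 'shape',
        'children': {'round':  {'leaf': 'apple'},
                     'oblong': {'leaf': 'pear'}}}

lookup = paths_to_lookup_table(tree)
print(lookup[(('shape', 'round'),)])    # 'apple': classification is now a single dictionary lookup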
It is for such situations that you will find useful the new class
RegressionTree that is now a part of the DecisionTree module.
If interested in regression, look for the RegressionTree class in
Version 3.4.3 of the Python module and in Version 3.43 of the
Perl module.
For both the Perl and the Python cases, the RegressionTree
class has been programmed as a subclass of the main
You will find the RegressionTree class easy to use in your own
scripts. See my Regression Tree tutorial at:
https://engineering.purdue.edu/kak/Tutorials/RegressionTree.pdf
for how to call the constructor of this class and how to invoke
the functionality incorporated in it.
From the human supplied class labels and pose information, the
computer constructed a decision tree in the manner described in
the previous sections of this tutorial. Subsequently, the decision
tree was converted into a hash table for fast classification.
The testing phase consisted of the robot using the hash table
constructed during the learning phase to recognize the objects
and to estimate their poses. The fact that the robot successfully
manipulated the objects established for us the viability of using
decision-tree based learning in the context of robot vision.
https://engineering.purdue.edu/RVL/Publications/Grewe95Interactive.pdf
20: Acknowledgments
In one form or another, decision trees have been around for over sixty years
now. From a statistical perspective, they are closely related to classification
and regression by recursive partitioning of multidimensional data. Early
work that demonstrated the usefulness of such partitioning of data for
classification and regression can be traced in the statistics community to
the work done by Terry Therneau in the early 1980’s, and, in the machine
learning community, to the work of Ross Quinlan in the mid 1990’s.
During the time they were in RVL, I enjoyed several animated conversations
with Josh Zapf and Padmini Jaikumar on the topic of decision tree
induction. As a matter of fact, this tutorial was prompted by my
conversations with Josh regarding Lynne Grewe’s implementation of
decision-tree induction for computer vision applications. (As I mentioned in
Section 19, Lynne Grewe was a former Ph.D. student of mine in Purdue
Robot Vision Lab.) To the best of my memory, Josh and I basically decided
to agree to disagree about some aspects of how Lynne had
implemented the decision-tree algorithm and the theoretical consequences
thereof. Come to think about it, our lives would be very dull if people
always agreed with one another all the time.