
Randomized Decision Trees II

compiled by Alvin Wan from Professor Jitendra Malik's lecture

1 Feature Selection

Note that a depth-limited tree admits only a finite number of possible combinations of splits.

1.1 Randomized Feature Selection

Suppose X is 1000-dimensional. We can randomly select a subset of the features and use it to create a new decision tree. This is called randomized feature selection. If a feature is nominal, such as hair color, the questions simply compare categories: is it black? is it brown?
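As an illustrative sketch (not from the lecture), here is one way this could look in Python: each tree is grown on its own random subset of the 1000 features. The function name, the subset size, and the use of scikit-learn's DecisionTreeClassifier are assumptions made for the example.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_randomized_tree(X, y, n_subset=32, max_depth=5, seed=None):
    # X has shape (n_samples, 1000); keep only n_subset randomly chosen columns.
    rng = np.random.default_rng(seed)
    feature_idx = rng.choice(X.shape[1], size=n_subset, replace=False)
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(X[:, feature_idx], y)
    return tree, feature_idx

# An ensemble is just many such trees, each with its own feature subset;
# at prediction time each tree looks only at its own columns: tree.predict(X[:, feature_idx]).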

2 Boosting

Our first intuition is the wisdom of the crowds. Our second is that we want experts for
different types of samples. In other words, some trees perform better on particular samples.
How do we give each tree a different weight? This note will cover only the algorithm and
not the proof.

2.1 Intuition

Let us consider a trimmed version of the boosting algorithm as it was first proposed.

1. Train weak learner.

2. Get weak hypothesis $h_t : X \to \{-1, +1\}$.

3. Choose $\alpha_t$.

Here, $h_t$ is a single decision tree with error rate $\epsilon_t$, and $\alpha_t$ is the weighting for decision tree $t$.

\[ \alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t} \]

The first thing to notice is that the accuracy of a weak learner is at least 0.5 for a binary classification problem, i.e. $\epsilon_t \le 0.5$. Any less (say, an accuracy of 0.45) and we can simply invert the classification for a higher accuracy ($1 - 0.45 = 0.55$). Consider the worst-case scenario, where $\epsilon_t = 0.5$. Plugging in $\epsilon_t = 0.5$, we get $\alpha_t = 0$, as expected.
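As a quick numerical sketch of the weighting (my own check, not part of the notes; the error rates are illustrative):

import math

def alpha(eps):
    # alpha_t = 0.5 * ln((1 - eps_t) / eps_t)
    return 0.5 * math.log((1 - eps) / eps)

print(alpha(0.5))   # 0.0   -> a coin-flip learner gets zero weight
print(alpha(0.25))  # ~0.55 -> a better learner gets a larger weight
print(alpha(0.1))   # ~1.10 -> an even stronger learner gets more weight still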

We pick this weighting according to the error rate of each decision tree. Create a classifier $h_1$, compute its error $\epsilon_1$ and weight $\alpha_1$. Repeat this for the second, third, etc., trees. Here is our scheme for training an expert: take the samples that $h_1$ classified incorrectly and train $h_2$ on those; take the samples that $h_2$ classified incorrectly and train $h_3$ on those. We can continue in this fashion to produce an expert. This is the intuition, but in reality we will instead give more weight to the samples that $h_1$ classified incorrectly.
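A minimal sketch of this intuition-only scheme, assuming scikit-learn decision stumps (in practice, boosting reweights the samples rather than discarding the correctly classified ones):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_sequential_experts(X, y, n_rounds=3):
    experts = []
    X_cur, y_cur = X, y
    for _ in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=1).fit(X_cur, y_cur)
        experts.append(h)
        wrong = h.predict(X_cur) != y_cur          # samples this learner missed
        if not wrong.any():
            break                                  # nothing left to specialize on
        X_cur, y_cur = X_cur[wrong], y_cur[wrong]  # next learner trains only on the mistakes
    return experts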

2.2 Full Algorithm

Now, let us consider the original boosting algorithm in its full glory.

1. Train weak learner with distribution $D_t$.

2. Get weak hypothesis $h_t : X \to \{-1, +1\}$.

3. Choose $\alpha_t$.

4. Update $D_t \to D_{t+1}$.

We compute a probability distribution $D_t$ over the samples. We know that $\sum_i D_t(i) = 1$, and we can initialize $D_1$ to be

\[ D_1(i) = \frac{1}{n}, \quad \forall i \]

For each sample, we multiply its old weight by a factor, which effectively gives more weight to examples that were classified incorrectly. If $h_t$ classifies sample $i$ correctly, the factor is $e^{-\alpha_t}$, scaling its weight in $D_{t+1}$ down; if it misclassifies the sample, the factor is $e^{\alpha_t}$, scaling its weight up. The better the classifier (the higher $\alpha_t$), the stronger this reweighting.

\[ D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases} \]

We can summarize this update as follows.

\[ D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t \, y_i \, h_t(x_i))}{Z_t} \]
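Putting the four steps together, here is a compact sketch of the full algorithm in Python (my own rendering, not the lecture's code). It assumes labels in {-1, +1}, scikit-learn decision stumps as the weak learners, and the standard sign-of-weighted-vote combination for the final classifier, which the note does not write out explicitly.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=50):
    # y must contain labels -1 and +1.
    n = len(y)
    D = np.full(n, 1.0 / n)                      # D_1(i) = 1/n for all i
    hypotheses, alphas = [], []
    for t in range(T):
        # 1. Train weak learner with distribution D_t (passed as sample weights).
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        # 2. Weighted error rate eps_t of the weak hypothesis.
        eps = np.clip(np.sum(D[pred != y]), 1e-10, 1 - 1e-10)
        # 3. Choose alpha_t = 0.5 * ln((1 - eps_t) / eps_t).
        alpha = 0.5 * np.log((1 - eps) / eps)
        # 4. Update D_{t+1}(i) = D_t(i) * exp(-alpha_t * y_i * h_t(x_i)) / Z_t.
        D = D * np.exp(-alpha * y * pred)
        D = D / D.sum()                          # Z_t normalizes D_{t+1} to sum to 1
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, alphas

def predict(hypotheses, alphas, X):
    # Final classifier: sign of the alpha-weighted vote of the weak hypotheses.
    votes = sum(a * h.predict(X) for h, a in zip(hypotheses, alphas))
    return np.sign(votes)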
