An introduction to random forests
Eric Debreuve / Team Morpheme
Institutions: University Nice Sophia Antipolis / CNRS / Inria
Labs: I3S / Inria CRI SA-M / iBV
Outline
Machine learning
Decision tree
Random forest
Bagging
Random decision trees
Machine learning
Learning/training: build a classification or regression rule from a set of samples
[Diagram: samples (learning set) → machine learning algorithm → learned rule]
Prediction: class/category = rule(sample), or value = rule(sample)
Output: predicted class or predicted value
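A minimal sketch of this learn-then-predict pipeline, using scikit-learn as an illustration (any classifier exposing fit/predict would do; the synthetic data are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Learning set: samples X with their classes y (supervised case)
X, y = make_classification(n_samples=200, n_features=5, n_informative=3, random_state=0)

rule = DecisionTreeClassifier(random_state=0).fit(X, y)  # learning/training: build the rule
predicted_class = rule.predict(X[:1])                    # class/category = rule(sample)
print(predicted_class)
```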
(Un)Supervised learning
Supervised
Learning set = { (sample [acquisition], class [expert]) }
Unsupervised
Learning set = unlabeled samples
Semi-supervised
Learning set = some labeled samples + many unlabeled samples
Ensemble learning
Combining weak classifiers (of the same type)...
... in order to produce a strong classifier
Condition: diversity among the weak classifiers
Example: Boosting
Train each new weak classifier focusing on samples misclassified by
previous ones
Popular implementation: AdaBoost
Weak classifiers: only need to be better than random guess
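A hedged sketch of boosting with AdaBoost in scikit-learn (data and parameter values are illustrative). The default weak learner is a depth-1 decision tree ("stump"), i.e. barely better than a random guess; AdaBoost re-weights the learning samples so that each new weak classifier focuses on those misclassified by the previous ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

strong = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(strong.score(X, y))  # accuracy of the combined (strong) classifier on the learning set
```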
Outline
Machine learning
Decision tree
Random forest
Bagging
Random decision trees
Decision tree
[Diagram: a decision tree with a root node, internal question nodes Q1, Q2, Q3, and leaf/decision nodes D1 to D5]
Leaf nodes
Correspond to the decision to take (or conclusion to make) if reached
Normally, pruning
To avoid over-fitting of learning data
To achieve a trade-off between prediction accuracy and complexity
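As an illustration of this trade-off, a hedged sketch using scikit-learn's cost-complexity pruning (one possible pruning scheme; the ccp_alpha values are arbitrary): larger pruning strength gives a simpler tree, at some cost in accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for alpha in (0.0, 0.01, 0.05):  # arbitrary pruning strengths
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    print(alpha, tree.get_n_leaves(), tree.score(X_test, y_test))  # complexity vs. accuracy
```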
Examples of impurity measures (node with class proportions $p_k$):
Gini index = $\sum_k p_k (1 - p_k) = 1 - \sum_k p_k^2$
Entropy = $-\sum_k p_k \log_2 p_k$
Misclassification error = $1 - \max_k p_k$
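A small sketch of these three measures, for a node whose samples have class proportions p_k (notation and example values assumed):

```python
import numpy as np

def gini(p):                 # sum_k p_k (1 - p_k) = 1 - sum_k p_k^2
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):              # - sum_k p_k log2(p_k)
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def misclassification(p):    # 1 - max_k p_k
    return 1.0 - np.max(p)

p = [0.7, 0.2, 0.1]          # example class proportions at a node
print(gini(p), entropy(p), misclassification(p))
```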
[Table: qualitative comparison of classifiers (SVM column visible; other columns and per-method ratings not recoverable) on: intrinsically multiclass, handles heterogeneous ("apple and orange") features, robustness to outliers, works with "small" learning sets, scalability to large learning sets, prediction accuracy, parameter tuning]
Outline
Machine learning
Decision tree
Random forest
Bagging
Random decision trees
Random forest
Definition
Collection of unpruned CARTs
Rule to combine individual tree decisions
Purpose
Improve prediction accuracy
Principle
Encouraging diversity among the trees
Solution: randomness
Bagging
Random decision trees (rCART)
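A hedged sketch of this definition with scikit-learn (parameter values are illustrative): a collection of randomized, unpruned trees, each grown on a bootstrap sample set and restricted to a random feature subset at each split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of (unpruned) randomized CARTs
    max_features="sqrt",   # random subset of features considered at each node
    bootstrap=True,        # each tree is trained on a bootstrap sample set (bagging)
    random_state=0,
).fit(X, y)
print(forest.score(X, y))
```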
Bagging (Bootstrap AGGregatING): two steps
Bootstrap sample sets
Aggregation
Random forest: q = p
Asymptotic proportion of unique samples in Lk = 100 × (1 − 1/e) % ≈ 63%
The remaining samples can be used for testing
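A quick numerical check of the ~63% figure (a sketch; the sample-set size N is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
bootstrap = rng.integers(0, N, size=N)        # draw N indices with replacement
unique_fraction = np.unique(bootstrap).size / N
print(unique_fraction, 1.0 - 1.0 / np.e)      # both close to 0.632
```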
Prediction
S: a new sample
Aggregation = majority vote among the K predictions/votes Ck(S)
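A minimal sketch of this aggregation, assuming `trees` is a list of K fitted classifiers exposing a `predict` method (names are illustrative):

```python
from collections import Counter

def forest_predict(trees, S):
    votes = [tree.predict([S])[0] for tree in trees]  # C_k(S), k = 1..K
    return Counter(votes).most_common(1)[0][0]        # majority vote
```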
Find the feature F among a random subset of features + threshold value T...
... that splits the samples assigned to N into 2 subsets Sleft and Sright...
... so as to maximize the label purity within these subsets
Assign (F,T) to N
If Sleft (resp. Sright) is pure or too small: make it a leaf node
else: repeat the split search on the corresponding child node (see the sketch below)
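A hedged sketch of this split search, using the Gini impurity as the purity criterion (function names, the exhaustive threshold scan, and the NumPy-array inputs are assumptions):

```python
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_random_split(X, y, n_random_features, rng):
    """Return the (feature F, threshold T) pair minimizing the weighted
    impurity of (Sleft, Sright) among a random subset of features."""
    best_f, best_t, best_impurity = None, None, np.inf
    for f in rng.choice(X.shape[1], size=n_random_features, replace=False):
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if left.size == 0 or right.size == 0:
                continue
            impurity = (left.size * gini(left) + right.size * gini(right)) / y.size
            if impurity < best_impurity:
                best_f, best_t, best_impurity = f, t, impurity
    return best_f, best_t   # (F, T) to assign to node N
```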
[Table: qualitative comparison of CART, kNN, and SVM on: intrinsically multiclass, handles heterogeneous ("apple and orange") features, robustness to outliers, works with "small" learning sets, scalability to large learning sets, prediction accuracy, parameter tuning; the per-method ratings are not recoverable]
[Figure: example classification results with 1, 10, 100, and 500 rCARTs]
Fundamentally discrete
Functional data? (Example: curves)
Outline
Machine learning
Decision tree
Random forest
Bagging
Random decision trees
Kernel-induced features
Learning set L = { Si, i ∈ [1..N] }
Kernel K(x, y)
Features of sample S = { Ki(S) = K(Si, S), i ∈ [1..N] }
Samples S and Si can be vectors or functional data
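A hedged sketch of these kernel-induced features with a Gaussian kernel, then fed to a random forest (the kernel choice, gamma value, and synthetic data are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                  # learning samples S_i (rows); could be sampled curves
y = (X[:, :15].sum(axis=1) > 0).astype(int)     # illustrative labels

K_features = rbf_kernel(X, X, gamma=0.05)       # row i: { K(S_i, S_j), j = 1..N }
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(K_features, y)

S_new = rng.normal(size=(1, 30))                # a new sample S
print(forest.predict(rbf_kernel(S_new, X, gamma=0.05)))   # features K(S_i, S), i = 1..N
```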
Kernel K(x, y)
Symmetric: K(x, y) = K(y, x)
Positive semi-definite (Mercer's condition): $\sum_{i,j} c_i c_j K(x_i, x_j) \ge 0$ for any finite set of points $\{x_i\}$ and real coefficients $\{c_i\}$
Then K(x, y) = ⟨φ(x), φ(y)⟩ for some mapping φ to a feature space
Note: the mapping φ need not be known (it might not even have an explicit representation; e.g., Gaussian kernel)
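A small numerical illustration (a sketch; the Gaussian kernel and gamma value are assumptions): on any finite sample set, the kernel matrix should be symmetric and positive semi-definite.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
K = rbf_kernel(X, X, gamma=0.1)                 # Gaussian kernel matrix K(x_i, x_j)

print(np.allclose(K, K.T))                      # symmetry
print(np.linalg.eigvalsh(K).min() >= -1e-10)    # positive semi-definite (up to numerical precision)
```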
Outline
Machine learning
Decision tree
Random forest
Bagging
Random decision trees
An introduction to random forests
Thank you for your attention