Random Forest is a versatile and widely used machine learning algorithm that belongs to the
class of ensemble methods. Specifically, it is a type of bagging technique, which involves
training many individual models (in this case, decision trees) and combining their outputs to
make a final prediction.
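As a quick illustration, here is a minimal sketch of fitting a Random Forest with scikit-learn's RandomForestClassifier; the synthetic dataset and the parameter values are placeholders for illustration, not taken from these notes.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic binary classification data (placeholder dataset)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees; each tree predicts, and the votes are combined
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, forest.predict(X_test)))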

Bagging, short for bootstrap aggregating, is an ensemble method designed to improve the
stability and accuracy of machine learning algorithms used in statistical classification and
regression. It also helps to avoid overfitting. The key principle of bagging is to generate
multiple bootstrap samples of the original data (drawn with replacement), train a separate
model on each sample, and then aggregate their predictions, typically by majority vote for
classification or averaging for regression.
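To make the bootstrap-and-aggregate idea concrete, here is a rough sketch of bagging decision trees by hand. The helper name bagged_trees_predict and the majority-vote rule assuming binary 0/1 labels are illustrative assumptions, not part of the original notes.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees_predict(X_train, y_train, X_test, n_trees=50, seed=0):
    # Assumes binary labels encoded as 0/1 (illustrative simplification)
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n rows with replacement
        idx = rng.choice(n, size=n, replace=True)
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(X_train[idx], y_train[idx])
        all_preds.append(tree.predict(X_test))
    # Aggregate by majority vote across the individual trees
    votes = np.stack(all_preds)  # shape: (n_trees, n_test_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)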

"OOB" stands for "out-of-bag". In the context of machine learning, an out-of-bag score is a
method of measuring the prediction error of random forests, bagging classifiers, and other
ensemble methods that use bootstrap aggregation (bagging) when sub-samples of the training
dataset are used to train individual models.
Here's how it works:
1. Each tree in the ensemble is trained on a distinct bootstrap sample of the data. By the
nature of bootstrap sampling, some samples from the dataset will be left out during the
training of each tree. These samples are called "out-of-bag" samples.
2. The out-of-bag samples can then be used as a validation set. We can pass each of them
through the trees that didn't see it during training and obtain predictions.
3. These predictions are then compared to the actual values to compute an "out-of-bag
score", which can be thought of as an estimate of the prediction error on unseen data.

One of the advantages of the out-of-bag score is that it allows us to estimate the prediction
error without needing a separate validation set. This can be particularly useful when the
dataset is small and partitioning it into training and validation sets might leave too few
samples for effective learning.
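In scikit-learn this corresponds to the oob_score option of RandomForestClassifier; below is a minimal sketch, again with a placeholder dataset and placeholder parameter values.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# bootstrap=True (the default) is required for out-of-bag scoring
forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                bootstrap=True, random_state=0)
forest.fit(X, y)

# Accuracy estimated only from out-of-bag samples; no separate validation set needed
print("OOB score:", forest.oob_score_)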

Extra Trees is short for "Extremely Randomized Trees". It's a modification of the Random
Forest algorithm that changes the way the splitting points for decision tree branches are
chosen.
In traditional decision tree algorithms (and therefore in Random Forests), the optimal split
point for each feature is calculated, which involves a degree of computation.

For a given node, the feature and the corresponding optimal split point that provide the best
split are chosen.

On the other hand, in the Extra Trees algorithm, for each feature under consideration, a split
point is chosen completely at random. The best-performing feature and its associated random
split are then used to split the node. This adds an extra layer of randomness to the model,
hence the name "Extremely Randomized Trees".

Because of this difference, Extra Trees tend to have more branches (be deeper) than Random
Forests, and the splits are made more arbitrarily. This can sometimes lead to models that
perform better, especially on tasks where the data may not have clear optimal split points.

However, like all models, whether Extra Trees will outperform Random Forests (or any other
algorithm) depends on the specific dataset and task.
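A quick way to compare the two in practice is to cross-validate scikit-learn's ExtraTreesClassifier against RandomForestClassifier on the same data; this is a sketch under the assumption of a synthetic placeholder dataset and default-ish settings.

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Same number of trees for both; the main difference is how split points are chosen
rf = RandomForestClassifier(n_estimators=100, random_state=0)
et = ExtraTreesClassifier(n_estimators=100, random_state=0)

print("Random Forest CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())
print("Extra Trees   CV accuracy:", cross_val_score(et, X, y, cv=5).mean())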
