Random Forest Classifier
Classification Technique
Overview
Random Forest is a supervised ensemble learning algorithm. Ensemble algorithms combine more than one algorithm, of the same or different kinds, to classify objects. The 'forest' that a Random Forest Classifier builds is an ensemble of Decision Trees, most often trained with the 'bagging' method. The general idea of bagging is that a combination of learning models improves the overall result.
A random forest classifier creates a set of decision trees from randomly selected subsets of the training set. It then aggregates the votes from the different decision trees to decide the final class of a test object, as sketched below.
Random Forest adds additional randomness to the model while growing the trees: instead of searching for the most important feature when splitting a node, it searches for the best feature among a random subset of features. The resulting diversity generally yields a better model.
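A minimal sketch of this with scikit-learn (whose documentation is linked at the end of these slides); the iris data and parameter values are illustrative assumptions, not part of the slides:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 bagged decision trees; each split considers a random subset of features.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# The predicted class is the one with the most votes across the trees.
print(clf.predict(X_test[:5]))
print("accuracy:", clf.score(X_test, y_test))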
Explanation
Say we have 1,000 observations in the complete population, with 10 variables. Random forest builds multiple
CART models, each with a different sample and a different initial set of variables. For instance, it may take a random
sample of 100 observations and 5 randomly chosen initial variables to build one CART model, repeat the process (say)
10 times, and then make a final prediction for each observation. The final prediction is a function of the individual
predictions: for regression it can simply be their mean, while for classification it is typically a majority vote.
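A hand-rolled sketch of this procedure, assuming synthetic data: the sample sizes mirror the numbers above, while the toy labelling rule is invented purely for illustration.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))              # 1000 observations, 10 variables
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy labels (an assumption)

trees, feature_sets = [], []
for _ in range(10):                          # 10 CART models
    rows = rng.choice(1000, size=100, replace=True)   # random 100 observations
    cols = rng.choice(10, size=5, replace=False)      # 5 random variables
    trees.append(DecisionTreeClassifier().fit(X[rows][:, cols], y[rows]))
    feature_sets.append(cols)

# Final prediction: aggregate the 10 individual predictions (majority vote).
votes = np.array([t.predict(X[:, c]) for t, c in zip(trees, feature_sets)])
final = (votes.mean(axis=0) > 0.5).astype(int)
print("ensemble training accuracy:", (final == y).mean())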
Each tree in a forest is grown as follows:
• If the number of cases in the training set is N, sample N cases at random, but with replacement, from the original
data. This sample will be the training set for growing the tree.
• If there are M input variables, a number m < M is specified such that at each node, m variables are selected at
random out of the M, and the best split on these m is used to split the node. The value of m is held constant while
the forest is grown.
• Each tree is grown to the largest extent possible. There is no pruning.
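These three rules map directly onto scikit-learn's RandomForestClassifier parameters; a sketch, with illustrative values assumed:

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
    n_estimators=500,     # number of trees grown
    bootstrap=True,       # rule 1: sample the training cases with replacement
    max_features="sqrt",  # rule 2: consider m = sqrt(M) random features per split
    max_depth=None,       # rule 3: grow each tree to the largest extent possible
    min_samples_leaf=1,   #         (no pruning; leaves may hold a single case)
)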
The forest error rate depends on two things:
• The correlation between any two trees in the forest. Increasing the correlation increases the forest error rate.
• The strength of each individual tree in the forest. A tree with a low error rate is a strong classifier. Increasing the
strength of the individual trees decreases the forest error rate.
Reducing m reduces both the correlation and the strength; increasing it increases both. Somewhere in between is an
"optimal" range of m (usually quite wide). Using the OOB error rate (explained in later slides), an optimal value of m can
quickly be found, as sketched below. This is the only adjustable parameter to which random forests are somewhat sensitive.
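A sketch of that search using scikit-learn's built-in OOB estimate; the breast-cancer dataset (M = 30 input variables) is assumed purely for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

for m in (2, 5, 10, 20, 30):
    clf = RandomForestClassifier(
        n_estimators=300, max_features=m, oob_score=True, random_state=0
    ).fit(X, y)
    print(f"m = {m:2d}  OOB error = {1 - clf.oob_score_:.4f}")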
Features
• It is unexcelled in accuracy among current algorithms.
• It runs efficiently on large databases.
• It can handle thousands of input variables without variable deletion.
• It gives estimates of what variables are important in the classification.
• It generates an internal unbiased estimate of the generalization error as the forest building progresses.
• It has an effective method for estimating missing data. It maintains accuracy even when a large proportion of the data are
missing.
• It has methods for balancing error in class-imbalanced data sets.
• Generated forests can be saved for future use on other data.
• Prototypes are computed that give information about the relation between the variables and the classification.
• The capabilities of the above can be extended to unlabeled data, leading to unsupervised clustering, data views and outlier
detection.
• It offers an experimental method for detecting variable interactions.
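The variable-importance estimates mentioned in this list are exposed by scikit-learn as feature_importances_; a sketch, again assuming the bundled breast-cancer data:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(data.data, data.target)

# Impurity-based importances: one value per input variable, summing to 1.
ranked = sorted(zip(data.feature_names, clf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name:30s} {score:.3f}")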
Out-Of-Bag (OOB)
When the training set for the current tree is drawn by sampling with replacement, about one-third of the
observations are left out of the sample.
This OOB (out-of-bag) data is used to get a running unbiased estimate of the classification error as trees are added to
the forest. It is also used to get estimates of variable importance.
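Why about one-third: each case is missed by a single draw with probability 1 - 1/N, so it is left out of a whole bootstrap sample with probability (1 - 1/N)^N, which tends to 1/e ≈ 0.368 as N grows. A quick numeric check (the sample size is an arbitrary assumption):

import numpy as np

rng = np.random.default_rng(0)
N = 10_000
sample = rng.integers(0, N, size=N)              # bootstrap sample of size N
oob_fraction = 1 - np.unique(sample).size / N    # cases never drawn
print(f"empirical OOB fraction: {oob_fraction:.3f}")   # close to 0.368
print(f"theoretical limit 1/e:  {1 / np.e:.3f}")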
Out-Of-Bag (OOB) Error Estimate
Each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are
left out of the bootstrap sample and not used in the construction of the kth tree.
Put each case that was left out of the construction of the kth tree down the kth tree to get a classification. In this way,
a test-set classification is obtained for each case in about one-third of the trees. At the end of the run, take j to be the
class that got the most votes over the trees for which case n was out of bag. The proportion of times that j is not equal
to the true class of n, averaged over all cases, is the OOB error estimate. This has proven to be unbiased in many tests.
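scikit-learn performs exactly this tally when oob_score=True; a sketch, with the dataset assumed for illustration (oob_decision_function_ holds each case's OOB vote shares):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
clf.fit(X, y)

# For each case n, j is the class with the most votes among the trees for
# which n was out of bag; the OOB error is how often j misses the true class.
j = np.argmax(clf.oob_decision_function_, axis=1)
print("OOB error estimate:", np.mean(j != y))    # equals 1 - clf.oob_score_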
Overfitting
Adding more trees does not cause a Random Forest to overfit: as the number of trees grows, the generalization error
converges to a limit rather than rising, so you can run as many trees as you want. It is also fast, as illustrated below.
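A sketch of that convergence: the OOB error flattens out as trees are added rather than climbing (the dataset and tree counts are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
for n in (50, 100, 200, 400, 800):
    clf = RandomForestClassifier(
        n_estimators=n, oob_score=True, random_state=0
    ).fit(X, y)
    print(f"{n:4d} trees  OOB error = {1 - clf.oob_score_:.4f}")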
Summary
Random Forest is a great algorithm to train early in the model-development process, to see how it performs, and it is hard
to build a "bad" Random Forest because of its simplicity. The algorithm is also a great choice if you need to develop a
model in a short period of time. On top of that, it provides a pretty good indicator of the importance it assigns to your
features.
Random Forests are also very hard to beat in terms of performance. Of course, you can probably always find a model that
performs better, such as a neural network, but such models usually take much more time to develop. On top of that,
Random Forests can handle many different feature types: binary, categorical, and numerical.
Python’s sklearn Documentation
http://scikit-learn.org/stable/modules/ensemble.html
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
