
Random Forest

Prof. Kailash Singh


Department of Chemical Engineering
MNIT Jaipur

What is Random Forest

• Random Forest is a versatile machine learning algorithm used for
  both classification and regression tasks.
• It works by building a large number of decision trees at training
  time and outputting either:
  – The majority class, by voting (for classification).
  – The mean prediction (for regression).
• Random forests are widely used for classification and regression
  because of their ability to handle complex data, reduce
  overfitting, and provide reliable predictions in different
  settings (both uses are sketched below).
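
A minimal sketch of the two modes, using scikit-learn (the library
used later in these slides); the toy datasets and parameter values
here are illustrative only:

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.datasets import make_classification, make_regression

# Classification: the forest outputs the majority class across trees
Xc, yc = make_classification(n_samples=200, n_features=8, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(Xc, yc)
print(clf.predict(Xc[:3]))   # predicted class labels (majority vote)

# Regression: the forest outputs the mean of the trees' predictions
Xr, yr = make_regression(n_samples=200, n_features=8, random_state=0)
reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(Xr, yr)
print(reg.predict(Xr[:3]))   # predicted values (mean over trees)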

A schematic of Random Forest

[Figure: schematic of Random Forest]

What is Ensemble Learning

• In ensemble learning, different models team up to enhance
  predictive performance.
• The idea is to leverage the collective wisdom of the group to
  overcome individual limitations and make more informed decisions
  in various machine learning tasks.
• Some popular ensemble methods include XGBoost, AdaBoost, LightGBM,
  Random Forest, Bagging, and Voting (a voting ensemble is sketched
  below).
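
As an illustration, a voting ensemble can be built with scikit-learn's
VotingClassifier; the choice of member models below is only an
example:

from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Three different models "team up"; with hard voting, the majority
# class among the members becomes the ensemble's prediction.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=500)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
], voting="hard")
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))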

What is Bagging and Boosting

• Bagging is an ensemble learning method in which multiple weak
  models are trained on different subsets of the training data.
• Each subset is sampled with replacement, and the prediction is
  made by averaging the predictions of the weak models for
  regression problems and by majority vote for classification
  problems.
• Boosting trains multiple base models sequentially. Each model
  tries to correct the errors made by the previous models: it is
  trained on a modified version of the dataset in which the
  instances misclassified by the previous models are given more
  weight. The final prediction is made by weighted voting. (Both
  approaches are sketched below.)
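
A hedged sketch of both ideas using scikit-learn's ready-made
wrappers; the tree depths and tree counts below are arbitrary example
values:

from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Bagging: weak trees trained independently on bootstrap samples
bag = BaggingClassifier(DecisionTreeClassifier(max_depth=2),
                        n_estimators=50, random_state=0).fit(X, y)

# Boosting (AdaBoost): trees trained sequentially, with misclassified
# instances re-weighted so later trees focus on the earlier errors
boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                           n_estimators=50, random_state=0).fit(X, y)

print("bagging accuracy:", bag.score(X, y))
print("boosting accuracy:", boost.score(X, y))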

How Random Forest Works

• Step 1: Bootstrapping
– Draw several bootstrapped samples from the training data
(sampling with replacement).
– Build a decision tree from each bootstrapped sample.
• Step 2: Random Feature Selection
– For each node in a tree, instead of considering all features,
select a random subset of features and choose the best one.
– This process reduces correlation between individual trees,
making the model more robust.
• Step 3: Tree Voting/Averaging
  – In classification, trees "vote" for the class.
  – In regression, each tree produces a numeric prediction, and the
    average of these predictions becomes the final result.
(A from-scratch sketch of these three steps follows.)
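
A didactic from-scratch sketch of the three steps, built out of plain
decision trees (names such as n_trees are illustrative, not part of
any library API):

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
n_trees, trees = 25, []

for _ in range(n_trees):
    # Step 1: bootstrapping -- draw rows with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: random feature selection at each split, delegated to
    # the tree via max_features="sqrt"
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(10**6)))
    trees.append(tree.fit(X[idx], y[idx]))

# Step 3: each tree votes; the majority class is the forest's output
votes = np.stack([t.predict(X) for t in trees])  # (n_trees, n_samples)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy:", (majority == y).mean())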

Random Forest Hyperparameters

• Number of Trees (n_estimators):
  – The number of trees in the forest.
  – Larger forests tend to give better performance, but at the cost
    of higher computation.
• Max Features (max_features):
  – The maximum number of features to consider when splitting a node.
  – Higher values can give better accuracy but make the trees more
    correlated and may increase overfitting.
• Max Depth (max_depth):
  – The maximum depth of each decision tree.
  – Deeper trees capture more complexity but can also lead to
    overfitting.
• Min Samples Split (min_samples_split):
  – The minimum number of samples required to split an internal node.
• Min Samples Leaf (min_samples_leaf):
  – The minimum number of samples required to be at a leaf node.
(Each of these corresponds to a scikit-learn argument, as sketched
below.)
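
A minimal sketch of that mapping; the values below are arbitrary
examples, not recommendations:

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
    n_estimators=200,      # Number of Trees
    max_features="sqrt",   # Max Features considered per split
    max_depth=10,          # Max Depth of each tree
    min_samples_split=4,   # Min Samples Split
    min_samples_leaf=2,    # Min Samples Leaf
    random_state=42,
)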

Python program

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load dataset (e.g., iris dataset)
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2)

# Create a random forest classifier
clf = RandomForestClassifier(n_estimators=100, max_depth=5,
                             random_state=42)
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)

# Evaluate model
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")

Advantages of Random Forest

• High accuracy from combining many decision trees.
• Reduces overfitting by averaging.
• Can handle missing data by using majority voting or averaging
  predictions.
• Provides a way to measure the importance of each feature in
  prediction (sketched below).
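
A short sketch of the feature-importance point: after fitting, a
scikit-learn forest exposes impurity-based importances through its
feature_importances_ attribute.

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

data = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(data.data, data.target)

# One importance score per input feature, summing to 1
for name, imp in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")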

Limitations

• Computational Complexity: Slower to train and predict due to the
  multiple trees.
• Memory Usage: Requires more memory, especially for a large number
  of trees or large datasets.
• Interpretability: Harder to interpret than a single decision tree.

Applications of Random Forest

• Healthcare: Disease prediction, personalized medicine.
• Finance: Credit scoring, risk analysis.
• E-commerce: Customer segmentation, recommendation systems.
• Image Recognition: Classification of objects in images.
• Genomics: Feature selection and classification in bioinformatics.

Questions for students

• What is Random Forest used for?
• What is the difference between a decision tree and a random
  forest?
• What is the difference between XGBoost and Random Forest?
