Hyperparameter Tuning the Random Forest in Python
by Will Koehrsen | Towards Data Science
So we’ve built a random forest model to solve our machine learning problem (perhaps
by following this end-to-end guide) but we’re not too impressed by the results. What
are our options? As we saw in the first part of this series, our first step should be to
gather more data and perform feature engineering. These steps usually have the
greatest payoff in terms of time invested versus improved
performance, but when we have exhausted all data sources, it’s time to move on to
model hyperparameter tuning. This post will focus on optimizing the random forest
model in Python using Scikit-Learn tools. Although this article builds on part one, it
fully stands on its own, and we will cover many widely-applicable machine learning
concepts.
I have included Python code in this article where it is most instructive. Full code and
data to follow along can be found on the project Github page.
Hyperparameter tuning relies more on experimental results than theory, and thus the
best method to determine the optimal settings is to try many different combinations and
evaluate the performance of each model. However, evaluating each model only on the
training set can lead to one of the most fundamental problems in machine learning:
overfitting.
If we optimize the model for the training data, then our model will score very well on
the training set, but will not be able to generalize to new data, such as in a test set.
When a model performs well on the training set but poorly on the test set, this is
known as overfitting, or essentially creating a model that knows the training set very
well but cannot be applied to new problems. It’s like a student who has memorized the
simple problems in the textbook but has no idea how to apply concepts in the messy
real world.
An overfit model may look impressive on the training set, but will be useless in a real
application. Therefore, the standard procedure for hyperparameter optimization
accounts for overfitting through cross validation.
Cross Validation
The technique of cross validation (CV) is best explained by example using the most
common method, K-Fold CV. When we approach a machine learning problem, we
make sure to split our data into a training and a testing set. In K-Fold CV, we further
split our training set into K number of subsets, called folds. We then iteratively fit the
model K times, each time training on K-1 of the folds and evaluating on the remaining
fold (called the validation data). As an example, consider fitting a model with K = 5. In
the first iteration we train on the first four folds and evaluate on the fifth. The second
time we train on the first, second, third, and fifth folds and evaluate on the
fourth. We repeat this procedure 3 more times, each time evaluating on a different
fold. At the very end of training, we average the performance on each of the folds to
come up with final validation metrics for the model.
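As a concrete illustration, Scikit-Learn can handle the fold bookkeeping for us. Below is a minimal sketch using cross_val_score; the toy data, estimator, and scoring metric are placeholders, not the project's actual setup.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Toy data standing in for real features and labels (placeholder only)
X = np.random.rand(100, 6)
y = np.random.rand(100)

# 5-fold CV: train on 4 folds, validate on the held-out fold, repeat 5 times
scores = cross_val_score(RandomForestRegressor(random_state = 42), X, y,
                         cv = 5, scoring = 'neg_mean_absolute_error')

# Average the 5 validation scores to get the final cross validation metric
print(scores.mean())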
As a brief recap before we get into model tuning, we are dealing with a supervised
regression machine learning problem. We are trying to predict the temperature
tomorrow in our city (Seattle, WA) using historical weather data. We have 4.5
years of training data, 1.5 years of test data, and are using 6 different features
(variables) to make our predictions. (To see the full code for data preparation, see the
notebook).
In previous posts, we examined the data for anomalies, so we know our data is
clean. Therefore, we can skip the data cleaning and jump straight into hyperparameter
tuning.
To look at the available hyperparameters, we can create a random forest and examine
the default values.
from sklearn.ensemble import RandomForestRegressor
from pprint import pprint

rf = RandomForestRegressor(random_state = 42)
pprint(rf.get_params())
{'bootstrap': True,
'criterion': 'mse',
'max_depth': None,
'max_features': 'auto',
'max_leaf_nodes': None,
'min_impurity_decrease': 0.0,
'min_impurity_split': None,
'min_samples_leaf': 1,
'min_samples_split': 2,
'min_weight_fraction_leaf': 0.0,
'n_estimators': 10,
'n_jobs': 1,
'oob_score': False,
'random_state': 42,
'verbose': 0,
'warm_start': False}
Wow, that is quite an overwhelming list! How do we know where to start? A good place
is the documentation on the random forest in Scikit-Learn. This tells us the most
important settings are the number of trees in the forest (n_estimators) and the number
of features considered when splitting a node (max_features). We could go read
the research papers on the random forest and try to theorize the best hyperparameters,
but a more efficient use of our time is just to try out a wide range of values and see
what works! We will try adjusting the following set of hyperparameters:
n_estimators = number of trees in the forest
max_features = max number of features considered for splitting a node
max_depth = max number of levels in each decision tree
min_samples_split = min number of data points placed in a node before the node
is split
min_samples_leaf = min number of data points allowed in a leaf node
bootstrap = method for sampling data points (with or without replacement)
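The code that builds the grid of candidate values is not reproduced in this excerpt; a sketch consistent with the 4320 combinations counted below (the specific ranges are assumptions on my part) might look like the following, after which we can print the grid:

import numpy as np

# Number of trees in the random forest (10 values)
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
# Number of features to consider at every split (2 values; 'auto' matches this article's Scikit-Learn version)
max_features = ['auto', 'sqrt']
# Maximum number of levels in each tree (11 values plus None = 12)
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
# Minimum number of samples required to split a node (3 values)
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node (3 values)
min_samples_leaf = [1, 2, 4]
# Method of sampling data points for each tree (2 values)
bootstrap = [True, False]

# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}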
pprint(random_grid)
On each iteration, the algorithm will choose a different combination of the hyperparameters.
Altogether, there are 2 * 12 * 2 * 3 * 3 * 10 = 4320 settings! However, the benefit of a
random search is that we are not trying every combination, but selecting at random to
sample a wide range of values.
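The search itself is also not shown in this excerpt; a sketch using Scikit-Learn's RandomizedSearchCV (the number of sampled settings and folds are assumptions, and train_features/train_labels come from the data preparation notebook) could be:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Base model to tune
rf = RandomForestRegressor(random_state = 42)

# Randomly sample hyperparameter combinations from the grid, scoring each with 3-fold CV
rf_random = RandomizedSearchCV(estimator = rf, param_distributions = random_grid,
                               n_iter = 100, cv = 3, verbose = 2,
                               random_state = 42, n_jobs = -1)

# Fit the random search model on the training data
rf_random.fit(train_features, train_labels)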
We can view the best parameters from fitting the random search:
rf_random.best_params_
{'bootstrap': True,
'max_depth': 70,
'max_features': 'auto',
'min_samples_leaf': 4,
'min_samples_split': 10,
'n_estimators': 400}
From these results, we should be able to narrow the range of values for each
hyperparameter.
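The evaluate helper used below, and the baseline model it is first applied to, are not shown in full in this excerpt. Here is a minimal sketch consistent with the printed output, assuming accuracy is defined as 100% minus the mean absolute percentage error and that the training data lives in train_features and train_labels:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def evaluate(model, test_features, test_labels):
    # Accuracy here is 100% minus the mean absolute percentage error (assumed definition)
    predictions = model.predict(test_features)
    errors = abs(predictions - test_labels)
    mape = 100 * np.mean(errors / test_labels)
    accuracy = 100 - mape
    print('Model Performance')
    print('Average Error: {:0.4f} degrees.'.format(np.mean(errors)))
    print('Accuracy = {:0.2f}%.'.format(accuracy))
    return accuracy

# Baseline for comparison: a forest with the default hyperparameters (n_estimators = 10 above)
base_model = RandomForestRegressor(n_estimators = 10, random_state = 42)
base_model.fit(train_features, train_labels)
base_accuracy = evaluate(base_model, test_features, test_labels)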
Evaluating this baseline model on the test set gives:
Model Performance
Average Error: 3.9199 degrees.
Accuracy = 93.36%.
We can compare that to the best model found by the random search:
best_random = rf_random.best_estimator_
random_accuracy = evaluate(best_random, test_features, test_labels)
Model Performance
Average Error: 3.7152 degrees.
Accuracy = 93.73%.
Improvement of 0.40%.
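Random search narrowed down the promising range for each hyperparameter; the next step is Grid Search with Cross Validation, which evaluates every combination in an explicitly specified grid. The exact grid is not reproduced in this excerpt; a sketch built around the random search results (the specific candidate values are assumptions) might look like:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Candidate values centered on the best settings found by random search (illustrative)
param_grid = {'bootstrap': [True],
              'max_depth': [60, 70, 80, 90],
              'max_features': [2, 3],
              'min_samples_leaf': [3, 4, 5],
              'min_samples_split': [8, 10, 12],
              'n_estimators': [100, 200, 300, 400]}

# Exhaustively try every combination in the grid, scoring each with 3-fold CV
rf = RandomForestRegressor(random_state = 42)
grid_search = GridSearchCV(estimator = rf, param_grid = param_grid,
                           cv = 3, n_jobs = -1, verbose = 2)
grid_search.fit(train_features, train_labels)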
grid_search.best_params_
{'bootstrap': True,
'max_depth': 80,
'max_features': 3,
'min_samples_leaf': 5,
'min_samples_split': 12,
'n_estimators': 100}
best_grid = grid_search.best_estimator_
grid_accuracy = evaluate(best_grid, test_features, test_labels)
Model Performance
Average Error: 3.6561 degrees.
Accuracy = 93.83%.
Improvement of 0.50%.
It seems we have about maxed out performance, but we can give it one more try with a
grid further refined from our previous results. The code is the same as before, just with
a different grid, so I only present the results:
Model Performance
Average Error: 3.6602 degrees.
Accuracy = 93.82%.
Improvement of 0.49%.
Comparisons
We can make some quick comparisons between the different approaches used to
improve performance, showing the returns on each. The following table shows the final
results from all the improvements we made (including those from the first part):
Model is the (very unimaginative) name for each model, accuracy is the percentage
accuracy, error is the average absolute error in degrees, n_features is the number of
features in the dataset, n_trees is the number of decision trees in the forest, and time is
the training and predicting time in seconds.
four_years_all: model trained using 4.5 years of data and expanded features (see
Part One for details)
four_years_red: model trained using 4.5 years of data and subset of most
important features
first_grid: best model from first grid search with cross validation (selected as the
final model)
second_grid: best model from second grid search
Overall, gathering more data and feature selection reduced the error by 17.69%
while hyperparameter tuning further reduced the error by 6.73%.
Training Visualizations
To further analyze the process of hyperparameter optimization, we can change one
setting at a time and see the effect on the model performance (essentially conducting a
controlled experiment). For example, we can create a grid with a range of number of
trees, perform grid search CV, and then plot the results. Plotting the training and
testing error and the training time will allow us to inspect how changing one
hyperparameter impacts the model.
First we can look at the effect of changing the number of trees in the forest. (see
notebook for training and plotting code)
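The full training and plotting code lives in the notebook; a minimal sketch of this one-variable experiment, using GridSearchCV over a range of tree counts (the range and scoring metric are assumptions), could look like:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Vary only the number of trees, holding every other hyperparameter fixed
tree_grid = {'n_estimators': [int(x) for x in np.linspace(1, 300, 30)]}
tree_search = GridSearchCV(RandomForestRegressor(random_state = 42), tree_grid,
                           cv = 3, scoring = 'neg_mean_absolute_error',
                           return_train_score = True, n_jobs = -1)
tree_search.fit(train_features, train_labels)

# Plot training and validation error against the number of trees
results = tree_search.cv_results_
n_trees = tree_grid['n_estimators']
plt.plot(n_trees, -results['mean_train_score'], label = 'training error')
plt.plot(n_trees, -results['mean_test_score'], label = 'validation error')
plt.xlabel('Number of Trees')
plt.ylabel('Mean Absolute Error (degrees)')
plt.legend()
plt.show()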
As the number of trees increases, our error decreases up to a point. There is not much
benefit in accuracy to increasing the number of trees beyond 20 (our final model had
100) and the training time rises consistently.
We can also examine curves for the number of features to split a node:
Number of Features Training Curves
Together with the quantitative stats, these visuals can give us a good idea of the trade-
offs we make with different combinations of hyperparameters. Although there is
usually no way to know ahead of time what settings will work the best, this example
has demonstrated the simple tools in Python that allow us to optimize our machine
learning model.