DATA MINING and MACHINE LEARNING. PREDICTIVE TECHNIQUES: ENSEMBLE METHODS, BOOSTING, BAGGING, RANDOM FOREST, DECISION TREES and REGRESSION TREES. Examples with MATLAB
César Pérez López
Currently the weak learner types are:
'Discriminant' (recommended for Subspace ensemble)
'KNN' (only for Subspace ensemble)
'Tree' (for any ensemble except Subspace)
There are two ways to set the weak learner type in the ensemble.
To create an ensemble with default weak learner options, pass the character vector naming the weak learner. For example:
ens = fitensemble(X,Y,'AdaBoostM2',50,'Tree');
% or
ens = fitensemble(X,Y,'Subspace',50,'KNN');
To create an ensemble with nondefault weak learner options, create a nondefault weak learner using the appropriate template method. For example, if you have missing data, and want to use trees with surrogate splits for better accuracy:
templ = templateTree('Surrogate','all');
ens = fitensemble(X,Y,'AdaBoostM2',50,templ);
To grow trees with leaves containing a number of observations that is at least 10% of the sample size:
templ = templateTree('MinLeafSize',size(X,1)/10);
ens = fitensemble(X,Y,'AdaBoostM2',50,templ);
Alternatively, choose the maximal number of splits per tree:
templ = templateTree('MaxNumSplits',4);
ens = fitensemble(X,Y,'AdaBoostM2',50,templ);
While you can give fitensemble a cell array of learner templates, the most common usage is to give just one weak learner template.
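If you do pass a cell array, fitensemble trains numberens learners from each template, so the totals multiply. A minimal sketch (the template settings are illustrative choices, and X and Y are assumed to be classification data with three or more classes, as AdaBoostM2 expects):

shallow = templateTree('MaxNumSplits',1);   % stumps
deep = templateTree('MaxNumSplits',15);     % deeper trees
ens = fitensemble(X,Y,'AdaBoostM2',50,{shallow,deep});
% ens.NumTrained is 100: 50 learners per template times 2 templates.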
Decision trees can handle NaN values in X. Such values are called missing. If you have some missing values in a row of X, a decision tree finds optimal splits using nonmissing values only. If an entire row consists of NaN, fitensemble ignores that row. If you have data with a large fraction of missing values in X, use surrogate decision splits.
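As a minimal sketch of this behavior, with hypothetical toy data (not from the text), train a small boosted ensemble on predictors that contain NaN values, using surrogate splits as recommended above:

X = [1 2; NaN 3; 4 NaN; 5 6; 7 8; 2 9; 3 1; 6 4];  % two rows contain NaN
Y = [0; 0; 1; 1; 0; 1; 0; 1];                      % binary response
templ = templateTree('Surrogate','all');           % surrogate splits handle NaN
ens = fitensemble(X,Y,'AdaBoostM1',10,templ);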
Common Settings for Tree Weak Learners
The depth of a weak learner tree makes a difference for training time, memory usage, and predictive accuracy. You control the depth with these parameters:
MaxNumSplits — The maximal number of branch node splits per tree. Set large values of MaxNumSplits to get deep trees. The default for bagging is size(X,1) - 1. The default for boosting is 1.
MinLeafSize — Each leaf has at least MinLeafSize observations. Set small values of MinLeafSize to get deep trees. The default is 1 for classification and 5 for regression.
MinParentSize — Each branch node in the tree has at least MinParentSize observations. Set small values of MinParentSize to get deep trees. The default is 2 for classification and 10 for regression.
If you supply both MinParentSize and MinLeafSize, the learner uses the setting that gives larger leaves (shallower trees):
MinParent = max(MinParent,2*MinLeaf)
If you additionally supply MaxNumSplits, then the software splits a tree until one of the three splitting criteria is satisfied.
Surrogate — Grow decision trees with surrogate splits when Surrogate is 'on'. Use surrogate splits when your data has missing values.
PredictorSelection — fitensemble and TreeBagger grow trees using the standard CART[1] algorithm by default. If the predictor variables are heterogeneous, or there are predictors having many levels and others having few levels, then standard CART tends to select predictors having many levels as split predictors. For split-predictor selection that is robust to the number of levels that the predictors have, consider specifying 'curvature' or 'interaction-curvature'. These specifications conduct chi-square tests of association between each predictor and the response, or between each pair of predictors and the response, respectively. The predictor that yields the minimal p-value is the split predictor for a particular node. (A sketch combining several of these settings follows this list.)
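As a hedged sketch of how these settings combine (the parameter values are illustrative choices, not recommendations from the text; X and Y are assumed classification data):

% Moderately deep trees with surrogate splits and curvature-based
% split-predictor selection.
t = templateTree('MaxNumSplits',20,'MinLeafSize',5, ...
    'Surrogate','on','PredictorSelection','curvature');
ens = fitensemble(X,Y,'Bag',100,t,'Type','classification');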
The syntax of fitensemble is:
ens = fitensemble(X,Y,model,numberens,learners)
X is the matrix of data. Each row contains one observation, and each column contains one predictor variable.
Y is the responses, with the same number of observations as rows in X.
model is a character vector, such as 'bag', naming the type of ensemble.
numberens is the number of weak learners in ens from each element of learners. The number of elements in ens is numberens times the number of elements in learners.
learners is a character vector, such as 'tree', naming a weak learner, a weak learner template, or a cell array of such character vectors and templates.
The result of fitensemble is an ensemble object, suitable for making predictions on new data.
Where to Set Name-Value Pairs. There are several name-value pairs you can pass to fitensemble, and several that apply to the weak learners (templateDiscriminant, templateKNN, and templateTree). To determine whether a name-value pair argument belongs to the ensemble or to the weak learner:
Use template name-value pairs to control the characteristics of the weak learners.
Use fitensemble name-value pair arguments to control the ensemble as a whole, either for algorithms or for structure.
For example, for an ensemble of boosted classification trees with each tree deeper than the default, set the templateTree name-value pair arguments MinLeafSize and MinParentSize to smaller values than the defaults, or set MaxNumSplits to a larger value than the default. The trees are then leafier (deeper).
To name the predictors in the ensemble (part of the structure of the ensemble), use the PredictorNames name-value pair in fitensemble.
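A minimal sketch of the two levels (the predictor names 'hp' and 'weight' are hypothetical and assume a two-column X with a binary response Y):

% Weak-learner level: deeper trees, set through the template.
t = templateTree('MinLeafSize',1,'MinParentSize',2);
% Ensemble level: predictor names, passed to fitensemble itself.
ens = fitensemble(X,Y,'AdaBoostM1',100,t, ...
    'PredictorNames',{'hp','weight'});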
This example shows how to create a classification tree ensemble for the ionosphere data set, and use it to predict the classification of a radar return with average measurements.
Load the ionosphere data set.
load ionosphere
Train a classification ensemble. Because this is a binary classification problem, fitcensemble aggregates 100 classification trees using LogitBoost by default.
Mdl = fitcensemble(X,Y)
Mdl =
classreg.learning.classif.ClassificationEnsemble
ResponseName: 'Y'
CategoricalPredictors: []
ClassNames: {'b' 'g'}
ScoreTransform: 'none'
NumObservations: 351
NumTrained: 100
Method: 'LogitBoost'
LearnerNames: {'Tree'}
ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
FitInfo: [100×1 double]
FitInfoDescription: {2×1 cell}
Mdl is a ClassificationEnsemble model.
Plot a graph of the first trained classification tree in the ensemble.
view(Mdl.Trained{1},'Mode','graph');
(Figure: graph view of the first trained classification tree. See http://www.mathworks.com/help/examples/stats/win64/TrainAClassificationEnsembleExample_01.png)
By default, fitcensemble grows shallow trees for boosting algorithms. You can alter the tree depth by passing a tree template object to fitcensemble. For more details, see templateTree.
Predict the quality of a radar return with average predictor measurements.
label = predict(Mdl,mean(X))
label =
  1×1 cell array
    'g'
This example shows how to create a regression ensemble to predict mileage of cars based on their horsepower and weight, trained on the carsmall data.
Load the carsmall data set.
load carsmall
Prepare the predictor data.
X = [Horsepower Weight];
The response data is MPG. The only available boosted regression ensemble type is LSBoost. For this example, arbitrarily choose an ensemble of 100 trees, and use the default tree options.
Train an ensemble of regression trees.
Mdl = fitensemble(X,MPG,'LSBoost',100,'Tree')
Mdl =
classreg.learning.regr.RegressionEnsemble
ResponseName: 'Y'
CategoricalPredictors: []
ResponseTransform: 'none'
NumObservations: 94
NumTrained: 100
Method: 'LSBoost'
LearnerNames: {'Tree'}
ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
FitInfo: [100×1 double]
FitInfoDescription: {2×1 cell}
Regularization: []
Plot a graph of the first trained regression tree in the ensemble.
view(Mdl.Trained{1},'Mode','graph');
(Figure: graph view of the first trained regression tree. See http://www.mathworks.com/help/examples/stats/win64/TrainARegressionEnsemble1Example_01.png)
By default, fitensemble grows stumps for boosted trees.
Predict the mileage of a car with 150 horsepower weighing 2750 lbs.
mileage = predict(Mdl,[150 2750])
mileage =
22.4236
This example shows how to choose the appropriate split predictor selection technique for your data set when growing a random forest of regression trees. This example also shows how to decide which predictors are most important to include in the training data.
Load and Preprocess Data
Load the carbig data set. Consider a model that predicts the fuel economy of a car given its number of cylinders, engine displacement, horsepower, weight, acceleration, model year, and country of origin. Consider Cylinders, Model_Year, and Origin as categorical variables.
load carbig
Cylinders = categorical(Cylinders);
Model_Year = categorical(Model_Year);
Origin = categorical(cellstr(Origin));
X = table(Cylinders,Displacement,Horsepower,Weight,Acceleration,Model_Year,...
Origin,MPG);
Determine Levels in Predictors
The standard CART algorithm tends to split predictors with many unique values (levels), e.g., continuous variables, over those with fewer levels, e.g., categorical variables. If your data is heterogeneous, or your predictor variables vary greatly in their number of levels, then consider using the curvature or interaction tests for split-predictor selection instead of standard CART.
For each predictor, determine the number of levels in the data. One way to do this is to define an anonymous function that:
Converts all variables to the categorical data type using categorical
Determines all unique categories while ignoring missing values using categories
Counts the categories using numel
Then, apply the function to each variable using varfun.
countLevels = @(x)numel(categories(categorical(x)));
numLevels = varfun(countLevels,X(:,1:end-1),'OutputFormat','uniform');
Compare the number of levels among the predictor variables.
figure;
bar(numLevels);
title('Number of Levels Among Predictors');
xlabel('Predictor variable');
ylabel('Number of levels');
h = gca;
h.XTickLabel = X.Properties.VariableNames(1:end-1);
h.XTickLabelRotation = 45;
h.TickLabelInterpreter = 'none';
(Figure: bar chart of the number of levels among the predictors. See http://www.mathworks.com/help/examples/stats/win64/SelectPredictorsForRandomForestsExample_01.png)
The continuous variables have many more levels than the categorical variables. Because the number of levels among the predictors varies so much, using standard CART to select split predictors at each node of the trees in a random forest can yield inaccurate predictor importance estimates.
Grow Robust Random Forest
Grow a random forest of 200 regression trees. Specify sampling all variables at each node, and specify the interaction-curvature test to select split predictors. Because there are missing values in the data, specify surrogate splits to increase accuracy.
t = templateTree('NumVariablesToSample','all',...
'PredictorSelection','interaction-curvature','Surrogate','on');
rng(1); % For reproducibility
Mdl = fitrensemble(X,'MPG','Method','bag','NumLearningCycles',200,...
'Learners',t);
Mdl is a RegressionBaggedEnsemble model.
Estimate the model R² using out-of-bag predictions.
yHat = oobPredict(Mdl);
R2 = corr(Mdl.Y,yHat)^2
R2 =
0.8739
Mdl explains 87.39% of the variability around the mean.
Predictor Importance Estimation
Estimate predictor importance values by permuting out-of-bag observations among the trees.
impOOB = oobPermutedPredictorImportance(Mdl);
impOOB is a 1-by-7 vector of predictor importance estimates corresponding to the predictors in Mdl.PredictorNames. The estimates are not biased toward predictors containing many levels.
Compare the predictor importance estimates.
figure;
bar(impOOB);
title('Unbiased Predictor Importance Estimates');
xlabel('Predictor variable');
ylabel('Importance');
h = gca;
h.XTickLabel = Mdl.PredictorNames;
h.XTickLabelRotation = 45;
h.TickLabelInterpreter = 'none';
(Figure: bar chart of unbiased predictor importance estimates. See http://www.mathworks.com/help/examples/stats/win64/SelectPredictorsForRandomForestsExample_02.png)
Greater importance estimates indicate more important predictors. The bar graph suggests that Model_Year is the most important predictor, followed by Weight. Model_Year has only 13 distinct levels, whereas Weight has over 300.
Compare the predictor importance estimates obtained by permuting out-of-bag observations with the estimates obtained by summing gains in the mean squared error due to splits on each predictor. Also, obtain predictor association measures estimated by surrogate splits.
[impGain,predAssociation] = predictorImportance(Mdl);
figure;
plot(1:numel(Mdl.PredictorNames),[impOOB' impGain']);
title('Predictor Importance Estimation Comparison')
xlabel('Predictor variable');
ylabel('Importance');
h = gca;
h.XTickLabel = Mdl.PredictorNames;
h.XTickLabelRotation = 45;
h.TickLabelInterpreter = 'none';
legend('OOB permuted','MSE improvement')
grid on
(Figure: comparison of OOB-permuted and MSE-improvement importance estimates. See http://www.mathworks.com/help/examples/stats/win64/SelectPredictorsForRandomForestsExample_03.png)
impGain is commensurate with impOOB. According to the values of impGain, Model_Year and Weight do not appear to be the most important predictors.
predAssociation is a 7-by-7 matrix of predictor association measures. Rows and columns correspond to the predictors in Mdl.PredictorNames. You can infer the strength of the relationship between pairs of predictors using the elements of predAssociation. Larger values indicate more highly correlated pairs of predictors.
figure;
imagesc(predAssociation);
title('Predictor Association Estimates');
colorbar;
h = gca;
h.XTickLabel = Mdl.PredictorNames;
h.XTickLabelRotation = 45;
h.TickLabelInterpreter = 'none';
h.YTickLabel = Mdl.PredictorNames;
predAssociation(1,2)
ans =
0.6830
(Figure: heat map of predictor association estimates. See http://www.mathworks.com/help/examples/stats/win64/SelectPredictorsForRandomForestsExample_04.png)
The largest association is between Cylinders and Displacement, but the value is not high enough to indicate a strong relationship between the two predictors.
Grow Random Forest Using Reduced Predictor Set
Because prediction time increases with the number of predictors in random forests, it is good practice to create a model using as few predictors as possible.
Grow a random forest of 200 regression trees using the best two predictors only.
MdlReduced = fitrensemble(X(:,{'Model_Year' 'Weight' 'MPG'}),'MPG','Method','bag',...
'NumLearningCycles',200,'Learners',t);
Compute the R² of the reduced model.
yHatReduced = oobPredict(MdlReduced);
r2Reduced = corr(MdlReduced.Y,yHatReduced)^2
r2Reduced =
0.8525
The R² for the reduced model is close to the R² of the full model. This result suggests that the reduced model is sufficient for prediction.
Usually you cannot evaluate the predictive quality of an ensemble based on its performance on training data. Ensembles tend to overtrain, meaning they produce overly optimistic estimates of their predictive power. This means that the result of resubLoss for classification (resubLoss for regression) usually indicates lower error than you get on new data.
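A minimal sketch of this point, reusing the ionosphere data from the earlier example (the 5-fold choice is an illustrative assumption):

load ionosphere
Mdl = fitcensemble(X,Y);           % boosted classification ensemble
trainErr = resubLoss(Mdl)          % resubstitution error: optimistic
cvMdl = crossval(Mdl,'KFold',5);   % cross-validate the trained ensemble
cvErr = kfoldLoss(cvMdl)           % a more honest generalization estimate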