
Regression Trees Chapter2

The document introduces regression trees and machine learning with tree-based models in R. It covers training a regression tree with rpart(), splitting data into train/validation/test sets, common regression metrics such as MAE and RMSE, the decision-tree hyperparameters that can be tuned (minsplit, maxdepth, and cp), running a grid search over hyperparameter combinations, and selecting the best model by validation-set performance.

Introduction to regression trees
MACHINE LEARNING WITH TREE-BASED MODELS IN R

Erin LeDell
Instructor
Train a Regression Tree in R
rpart(formula = ___,
data = ___,
method = ___)
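As a sketch of how the blanks might be filled in (the data frame and column names here are made up for illustration; `method = "anova"` is what requests a regression tree):

```r
library(rpart)

# A small stand-in training set (replace with your own data)
set.seed(42)
train <- data.frame(x1 = runif(100), x2 = runif(100))
train$response <- 2 * train$x1 + rnorm(100, sd = 0.1)

# method = "anova" tells rpart() to grow a regression tree
model <- rpart(formula = response ~ .,
               data    = train,
               method  = "anova")
```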

Train/Validation/Test Split
training set

validation set

test set
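One way to produce such a split is to shuffle the row indices and slice them into three parts. A sketch assuming a 70/15/15 ratio (a common but not universal choice; `df` is a placeholder for the full dataset):

```r
set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100))  # stand-in for the full dataset

# Shuffle the row indices, then carve out 70% / 15% / 15%
idx     <- sample(nrow(df))
n_train <- floor(0.70 * nrow(df))
n_valid <- floor(0.15 * nrow(df))

train <- df[idx[1:n_train], ]
valid <- df[idx[(n_train + 1):(n_train + n_valid)], ]
test  <- df[idx[(n_train + n_valid + 1):nrow(df)], ]
```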

Let's practice!

Performance metrics for regression

Common metrics for regression
Mean Absolute Error (MAE):

    MAE = (1/n) Σ |actual − predicted|

Root Mean Square Error (RMSE):

    RMSE = √( (1/n) Σ (actual − predicted)² )
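Both metrics can be computed directly in base R, which makes the formulas concrete (the numbers below are made up):

```r
actual    <- c(3.0, 5.0, 2.5, 7.0)
predicted <- c(2.5, 5.0, 3.0, 8.0)

# MAE: average absolute error
mae <- mean(abs(actual - predicted))

# RMSE: square root of the average squared error
rmse <- sqrt(mean((actual - predicted)^2))

mae   # 0.5
rmse  # ~0.6124
```

Because errors are squared before averaging, RMSE penalizes large errors more heavily than MAE.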

Evaluate a regression tree model
pred <- predict(object = model,    # fitted rpart model
                newdata = test)    # test dataset

library(Metrics)

# Compute the RMSE
rmse(actual = test$response,   # the actual values
     predicted = pred)         # the predicted values

[1] 2.278249

Let's practice!

What are the hyperparameters for a decision tree?

Decision tree hyperparameters
?rpart.control

Decision tree hyperparameters
minsplit: minimum number of data points required to attempt a split

cp: complexity parameter

maxdepth: maximum depth of the decision tree
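These can be passed to rpart() through rpart.control(); the values below are the package's documented defaults (minsplit = 20, cp = 0.01, maxdepth = 30), spelled out here on a made-up dataset for illustration:

```r
library(rpart)

# Stand-in dataset
set.seed(1)
dat <- data.frame(x = runif(200))
dat$y <- sin(4 * dat$x) + rnorm(200, sd = 0.1)

model <- rpart(y ~ x, data = dat, method = "anova",
               control = rpart.control(minsplit = 20,   # min points to attempt a split
                                       cp       = 0.01, # complexity parameter
                                       maxdepth = 30))  # max tree depth
```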

Cost-Complexity Parameter (CP)
plotcp(grade_model)

Cost-Complexity Parameter (CP)
print(model$cptable)

          CP nsplit rel error    xerror       xstd
1 0.06839852      0 1.0000000 1.0080595 0.09215642
2 0.06726713      1 0.9316015 1.0920667 0.09543723
3 0.03462630      2 0.8643344 0.9969520 0.08632297
4 0.02508343      3 0.8297080 0.9291298 0.08571411
5 0.01995676      4 0.8046246 0.9357838 0.08560120
6 0.01817661      5 0.7846679 0.9337462 0.08087153
7 0.01203879      6 0.7664912 0.9092646 0.07982862
8 0.01000000      7 0.7544525 0.9407895 0.08399125

Cost-Complexity Parameter (CP)
# Prune the model back to the optimal cp value
model_opt <- prune(tree = model, cp = cp_opt)
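cp_opt is not defined on this slide; one common way to choose it is to take the CP value whose row in the cptable has the lowest cross-validated error (xerror). A sketch on a made-up dataset:

```r
library(rpart)

# Fit a tree on stand-in data
set.seed(1)
dat <- data.frame(x = runif(300))
dat$y <- sin(4 * dat$x) + rnorm(300, sd = 0.1)
model <- rpart(y ~ x, data = dat, method = "anova")

# CP value of the row with the lowest cross-validated error
cp_opt <- model$cptable[which.min(model$cptable[, "xerror"]), "CP"]

# Prune the tree back to that complexity
model_opt <- prune(tree = model, cp = cp_opt)
```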

Let's practice!

Grid Search for model selection

Grid Search
What is a model hyperparameter?

What is a "grid"?

What is the goal of a grid search?

How is the best model chosen?

Set up the grid
# Establish a list of possible values for minsplit & maxdepth
splits <- seq(1, 30, 5)
depths <- seq(5, 40, 10)

# Create a data frame containing all combinations
hyper_grid <- expand.grid(minsplit = splits,
                          maxdepth = depths)

hyper_grid[1:10, ]

   minsplit maxdepth
1         1        5
2         6        5
3        11        5
4        16        5
5        21        5
6        26        5
7         1       15
8         6       15
9        11       15
10       16       15
Grid Search in R: Train models
# Create an empty list to store models
models <- list()

# Execute the grid search
for (i in 1:nrow(hyper_grid)) {

    # Get minsplit, maxdepth values at row i
    minsplit <- hyper_grid$minsplit[i]
    maxdepth <- hyper_grid$maxdepth[i]

    # Train a model and store it in the list
    models[[i]] <- rpart(formula = response ~ .,
                         data = train,
                         method = "anova",
                         minsplit = minsplit,
                         maxdepth = maxdepth)
}
# Create an empty vector to store RMSE values
rmse_values <- c()

# Compute validation RMSE for each model
for (i in seq_along(models)) {

    # Retrieve the i^th model from the list
    model <- models[[i]]

    # Generate predictions on the validation set
    pred <- predict(object = model,
                    newdata = valid)

    # Compute validation RMSE and add it to the vector
    rmse_values[i] <- rmse(actual = valid$response,
                           predicted = pred)
}
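Once rmse_values is filled in, the best model is simply the one with the smallest validation RMSE. A self-contained sketch with made-up values standing in for the objects built above:

```r
# Stand-ins for the objects built by the grid search above
rmse_values <- c(2.31, 2.10, 2.45)              # hypothetical validation RMSEs
models      <- list("fit_a", "fit_b", "fit_c")  # placeholders for rpart fits

best_i     <- which.min(rmse_values)  # index of the lowest RMSE
best_model <- models[[best_i]]

best_i  # 2
```

With real rpart fits in `models`, the winning hyperparameters can be read back from `hyper_grid[best_i, ]`.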

Let's practice!