INEG4163 Homework (Classification and Regression Tree)
Step 1: Go to Blackboard, download “data3.RData” from the “data” folder, and save it to
your computer (remember where you save it).
Step 2: Open R, and load this data by running the following R code:
load("path to the data on your computer/data3.RData")
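To confirm that a load() call worked, it can help to capture its return value, which is the vector of object names it restored. The sketch below does not use the homework file: it writes a tiny, made-up data frame (example_df, a hypothetical stand-in) to a temporary .RData file and loads it back.

```r
# Hypothetical stand-in: save a tiny data frame to a temporary .RData file,
# then load it back the same way the homework file would be loaded.
tmp_path <- file.path(tempdir(), "example.RData")
example_df <- data.frame(o3 = c(30, 42, 51), T = c(60, 72, 85))
save(example_df, file = tmp_path)
rm(example_df)

# load() restores the saved objects and invisibly returns their names.
loaded_names <- load(tmp_path)
print(loaded_names)          # names of the objects that were restored
str(get(loaded_names[1]))    # structure of the first restored object
```

With the homework file, the same pattern should list the object that the later steps refer to as data3.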
Step 3: After the data has been loaded, select the columns that we would like to use by
running the following code (because we don’t need all columns in this homework):
data = data3[,c("o3","T","w_x","w_y","L1","L2","L3","L4","L5")]
Here, “o3” is the response, while the other columns are the features.
Question 1). Use the “rpart” function to build a regression tree with cp = 0.0005 (using all
features). Plot the tree and insert your plot below. Note: when you plot the tree, set
“uniform = TRUE” for better visualization. (3 points)
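A minimal sketch of this kind of call is shown below. It is not the homework solution: it assumes R's built-in airquality data as a stand-in for the homework data, with Ozone playing the role of “o3”. With the data frame from Step 3, the formula would be o3 ~ . and the data argument would be data.

```r
library(rpart)

# Stand-in data (assumption): built-in airquality, keeping rows with a known response.
aq <- airquality[!is.na(airquality$Ozone), c("Ozone", "Solar.R", "Wind", "Temp")]

# method = "anova" requests a regression tree; cp is passed on to rpart.control().
fit <- rpart(Ozone ~ ., data = aq, method = "anova", cp = 0.0005)

# uniform = TRUE spaces the tree levels evenly, which keeps deep splits readable.
plot(fit, uniform = TRUE, margin = 0.1)
text(fit, cex = 0.7)   # add split rules and node means to the plot
```

The formula Ozone ~ . uses every remaining column as a feature, which mirrors the "using all features" requirement.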
Question 2). Investigate the relationship between the tree performance and the choice of
cp using the function “plotcp”. Insert the plot below. (3 points)
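As an illustration of what “plotcp” shows, the sketch below refits a tree on the stand-in airquality data (an assumption, not the homework data) and displays the cross-validated error for each cp value. The seed is set only because the cross-validation folds are random.

```r
library(rpart)

set.seed(1)  # cross-validation folds are random, so fix the seed for repeatability

# Stand-in data (assumption): built-in airquality, keeping rows with a known response.
aq <- airquality[!is.na(airquality$Ozone), c("Ozone", "Solar.R", "Wind", "Temp")]
fit <- rpart(Ozone ~ ., data = aq, method = "anova", cp = 0.0005)

# Table with one row per candidate cp: CP, nsplit, rel error, xerror, xstd.
printcp(fit)

# Cross-validated relative error (with one-standard-error bars) against cp.
plotcp(fit)
```

In the plot, the horizontal dotted line sits one standard error above the minimum cross-validated error, which is a common reference for choosing cp.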
Question 3). Based on the plot above, choose an appropriate value for cp, prune the tree
using the “prune” function, and show the pruned tree below (3 points)
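One common way to pick cp from the cross-validation table is the one-standard-error rule: take the simplest tree whose cross-validated error is within one standard error of the minimum. The sketch below applies that rule to the stand-in airquality data (an assumption). Reading the value off the “plotcp” plot by eye is an equally valid approach.

```r
library(rpart)

set.seed(1)  # cross-validation folds are random

# Stand-in data (assumption): built-in airquality, keeping rows with a known response.
aq <- airquality[!is.na(airquality$Ozone), c("Ozone", "Solar.R", "Wind", "Temp")]
fit <- rpart(Ozone ~ ., data = aq, method = "anova", cp = 0.0005)

cp_table <- fit$cptable
best_row <- which.min(cp_table[, "xerror"])
threshold <- cp_table[best_row, "xerror"] + cp_table[best_row, "xstd"]

# One-standard-error rule: first (simplest) row whose xerror is under the threshold.
chosen_row <- min(which(cp_table[, "xerror"] <= threshold))
chosen_cp <- cp_table[chosen_row, "CP"]

pruned <- prune(fit, cp = chosen_cp)

# plot() fails on a tree that is only a root, so guard against that case.
if (nrow(pruned$frame) > 1) {
  plot(pruned, uniform = TRUE, margin = 0.1)
  text(pruned, cex = 0.8)
}
```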
Question 4). Based on the tree above, can you tell which feature is perhaps the most
important one, and why? (1 point)
Based on the tree above, the most important feature is probably T, because the root node
splits on the value of T. The root split is the one that most reduces the squared error
over the whole data set.
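The reasoning above can be cross-checked in code. The sketch below, again on the stand-in airquality data (an assumption), reads the variable used at the root split and prints the importance scores that rpart stores on the fitted object. The two views need not always agree, because the importance scores also credit surrogate splits.

```r
library(rpart)

# Stand-in data (assumption): built-in airquality, keeping rows with a known response.
aq <- airquality[!is.na(airquality$Ozone), c("Ozone", "Solar.R", "Wind", "Temp")]
fit <- rpart(Ozone ~ ., data = aq, method = "anova", cp = 0.0005)

# The first row of fit$frame describes the root node; var is the split variable.
root_var <- as.character(fit$frame$var[1])
print(root_var)

# Named vector of importance scores, sorted from most to least important.
print(fit$variable.importance)
```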