
lesson-3.2-introduction-to-regression-structured-projects

The document discusses a structured data project focused on predicting bulldozer sale prices using regression techniques. It covers important concepts such as cross-validation, training and test splits, and various regression metrics like R2, MAE, and MSE. The document emphasizes the significance of understanding model generalization and choosing appropriate evaluation metrics for performance assessment.

Uploaded by

soulopp27
Copyright
© All Rights Reserved

Structured Data Project 2: Predicting the sale price of Bulldozers (regression)

Data
🕰 🚜 💰
Where can you get help?

• Follow along with the code
• Try it for yourself
• Press SHIFT + TAB to read the docstring
• Search for it
• Try again
• Ask
Cross-validation

Normal Train & Test Split
100 patient records → Training split (80%): 80 patient records | Test split (20%): 20 patient records
Model is trained on training data, and evaluated on the test data.

5-fold Cross-validation
100 patient records → 5 different 80/20 splits, each holding out a different 20 patient records as the test split
Model is trained on 5 different versions of training data, and evaluated on 5 different versions of the test data.
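The two splitting strategies above can be sketched with Scikit-Learn. This is a minimal illustration, not the course's own code: the 100 records are synthetic (generated with `make_regression`), and `LinearRegression` is an assumed stand-in for whatever model the project actually uses.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the 100 records on the slide (assumption, not real data).
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=42)

# Normal train & test split: 80 records for training, 20 for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=20, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
single_split_score = model.score(X_test, y_test)  # R^2 on the single test split

# 5-fold cross-validation: the model is refit on 5 different training splits
# and scored on the 5 corresponding held-out test splits.
cv_scores = cross_val_score(model, X, y, cv=5)

print(f"Single split R^2: {single_split_score:.3f}")
print(f"5-fold R^2 scores: {np.round(cv_scores, 3)}")
print(f"Mean 5-fold R^2: {cv_scores.mean():.3f}")
```

Reporting the mean of the five scores gives a less split-dependent estimate than a single train/test split, at the cost of fitting the model five times.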
The most important concept in machine learning (the 3 sets)

Course materials (training set)
Practice exam (validation set)
Final exam (test set)

Generalization: The ability for a machine learning model to perform well on data it hasn’t seen before.
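One way to produce the three sets is to call `train_test_split` twice. The sketch below is illustrative only: the 70/15/15 proportions and the toy arrays are assumptions, since the slide does not specify sizes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples with 2 features each (assumption, for illustration only).
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Step 1: hold out the test set (the "final exam"), 15 samples.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=15, random_state=42
)

# Step 2: split the remainder into training ("course materials")
# and validation ("practice exam") sets, 70 and 15 samples.
X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=15, random_state=42
)

print(len(X_train), len(X_valid), len(X_test))
```

The model is fit on the training set, tuned against the validation set, and scored once on the test set, so the test score reflects data the model has not seen.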
Classification and Regression metrics

Classification          Regression
Accuracy (default)      R2 (r-squared) (default)
Precision               Mean absolute error (MAE)
Recall                  Mean squared error (MSE)
F1                      Root mean squared error (RMSE)

(default) = default evaluation in Scikit-Learn
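In Scikit-Learn, the default evaluation is what an estimator's `score()` method returns: accuracy for classifiers and R2 for regressors. The sketch below checks this on synthetic data; the datasets and the choice of `LogisticRegression` and `LinearRegression` are assumptions made only for illustration.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score

# Classifier: score() should equal accuracy_score on the same data.
X_clf, y_clf = make_classification(n_samples=100, n_features=5, random_state=42)
clf = LogisticRegression().fit(X_clf, y_clf)
clf_default = clf.score(X_clf, y_clf)
clf_accuracy = accuracy_score(y_clf, clf.predict(X_clf))

# Regressor: score() should equal r2_score on the same data.
X_reg, y_reg = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=42)
reg = LinearRegression().fit(X_reg, y_reg)
reg_default = reg.score(X_reg, y_reg)
reg_r2 = r2_score(y_reg, reg.predict(X_reg))

print(f"Classifier score(): {clf_default:.3f}, accuracy_score: {clf_accuracy:.3f}")
print(f"Regressor score():  {reg_default:.3f}, r2_score:       {reg_r2:.3f}")
```

The other metrics in the table live in `sklearn.metrics` and have to be requested explicitly, for example through `precision_score` or `mean_absolute_error`.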


Which regression metric should you use?
• R2 is similar to accuracy. It gives you a quick indication of how well your model might be doing.
Generally, the closer your R2 value is to 1.0, the better the model. But it doesn't tell you exactly
how wrong your model is in terms of how far off each prediction is.
• MAE gives a better indication of how far off your model's predictions are on average.
• As for MAE or MSE: because MSE squares the differences between predicted values and actual
values, it amplifies larger differences. Let's say we're predicting the sale price of bulldozers
(which we are).
  • Pay more attention to MAE: when being $10,000 off is twice as bad as being $5,000 off.
  • Pay more attention to MSE: when being $10,000 off is more than twice as bad as being
  $5,000 off.
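The $5,000 versus $10,000 comparison can be made concrete with a small worked example. The sale prices and both sets of predictions below are hypothetical, chosen so that the two models have the same MAE but different MSE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical true sale prices in dollars (not taken from any dataset).
y_true = np.array([100_000, 150_000, 200_000, 250_000])

# Model A: every prediction is $5,000 off.
preds_a = y_true + 5_000

# Model B: two predictions are exact, two are $10,000 off.
preds_b = y_true + np.array([0, 10_000, 0, 10_000])

mae_a = mean_absolute_error(y_true, preds_a)  # 5,000
mae_b = mean_absolute_error(y_true, preds_b)  # 5,000 (same as model A)

mse_a = mean_squared_error(y_true, preds_a)   # 25,000,000
mse_b = mean_squared_error(y_true, preds_b)   # 50,000,000 (twice model A)

# RMSE is the square root of MSE, which brings the units back to dollars.
rmse_a = np.sqrt(mse_a)
rmse_b = np.sqrt(mse_b)

print(f"Model A: MAE={mae_a:,.0f}  MSE={mse_a:,.0f}  RMSE={rmse_a:,.2f}")
print(f"Model B: MAE={mae_b:,.0f}  MSE={mse_b:,.0f}  RMSE={rmse_b:,.2f}")
```

MAE treats the two models as equally good, while MSE and RMSE penalize model B for its larger individual errors. Which behavior is preferable depends on whether large misses are disproportionately costly in the problem at hand.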
