09_Regression
Regression
Regression in sklearn (our fav library)
■ There are multiple methods for regression supported in sklearn:
● Nearest Neighbour regression
● Decision Trees Regression
■ Some guidance on choosing among the robust linear regressors (HuberRegressor, RANSAC, Theil Sen):
● HuberRegressor should be faster than RANSAC and Theil Sen unless the
number of samples is very large, i.e. n_samples >> n_features.
● HuberRegressor should be more robust than RANSAC and Theil Sen on
default parameters.
● RANSAC is faster than Theil Sen and scales much better with the number
of samples.
● RANSAC will deal better with large outliers in the y direction (most
common situation).
● Theil Sen will cope better with medium-size outliers in the X direction, but
this property will disappear in high-dimensional settings.
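The comparison above can be tried out on a toy problem. The sketch below is only an illustration under stated assumptions: the data are synthetic (a line y = 3x + 2 with small noise), 10% of the targets are shifted by a large amount in the y direction, and all estimators keep their default parameters apart from a fixed random seed. It fits ordinary least squares next to the three robust regressors and prints the recovered slope and intercept.

```python
import numpy as np
from sklearn.linear_model import (
    HuberRegressor,
    LinearRegression,
    RANSACRegressor,
    TheilSenRegressor,
)

rng = np.random.RandomState(0)

# Synthetic data (assumption): y = 3x + 2 plus small Gaussian noise.
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=0.5, size=200)

# Corrupt 10% of the targets with large outliers in the y direction.
outlier_idx = rng.choice(200, size=20, replace=False)
y[outlier_idx] += 100.0

models = {
    "OLS": LinearRegression(),
    "Huber": HuberRegressor(),
    "RANSAC": RANSACRegressor(random_state=0),
    "TheilSen": TheilSenRegressor(random_state=0),
}

intercepts = {}
for name, model in models.items():
    model.fit(X, y)
    # RANSAC wraps a base estimator; the fitted line lives in estimator_.
    fitted = model.estimator_ if isinstance(model, RANSACRegressor) else model
    intercepts[name] = float(fitted.intercept_)
    print(f"{name:9s} slope = {float(fitted.coef_[0]):.3f}  "
          f"intercept = {intercepts[name]:.3f}")
```

On data like this, the ordinary least squares intercept is pulled towards the outliers, while the robust fits should stay much closer to the true value of 2. Which robust regressor is fastest depends on the data size, so the timing claims above are best checked on your own dataset.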
Evaluating Regression Models
■ Why can’t we use accuracy to evaluate our regression models?
● We have a continuous target variable.
● If we evaluate accuracy for each one of the data points we will
obtain awful results, because a prediction almost never matches a continuous value exactly.
■ We need other types of metrics to properly evaluate our models.
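A small sketch of why accuracy breaks down for a continuous target. The prices and predictions below are made-up numbers used only for illustration: every prediction is close to the actual value, yet none matches exactly, so a classification-style accuracy reports zero.

```python
# Hypothetical house prices (in thousands) and a model's predictions.
y_true = [250.0, 310.5, 199.9, 420.0, 275.3]
y_pred = [248.7, 312.1, 201.4, 418.2, 276.0]

# Classification-style accuracy: fraction of exact matches.
exact_matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = exact_matches / len(y_true)

# An error-based view: how far off the predictions are on average.
average_gap = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"accuracy    = {accuracy:.2f}")     # 0.00
print(f"average gap = {average_gap:.2f}")  # 1.38
```

The model is off by less than 1.4 thousand on average, which accuracy cannot express. This is what the error-based metrics are meant to capture.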
Mean Absolute Error (MAE)
■ One of the most used metrics.
■ We try to calculate the difference between the predicted values
and the actual ones.
■ If $\hat{y}_i$ is the predicted value and $y_i$ the expected one, the error would
be
● $e_i = |y_i - \hat{y}_i|$
■ As it would not be useful to present it as the total error, we
calculate the mean:
● $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$

Mean Absolute Percentage Error (MAPE)
■ When the target variable has a single dimension, some users tend
to normalize it, whereas others don't.
■ The value of MAE will vary between normalized and
non-normalized approaches.
■ Defining the error as a percentage variation from the actual values
solves these situations:
● $\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right|$
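The effect of normalizing the target can be sketched in a few lines. The numbers are invented, the helper names `mae` and `mape` are ours, and the normalization is assumed to be a pure rescaling (dividing by the largest actual value). A normalization that also shifts the values, such as min-max scaling, would change the percentage error as well. sklearn offers `mean_absolute_error` and `mean_absolute_percentage_error` in `sklearn.metrics` for real use.

```python
def mae(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute error relative to each actual value (undefined if one is 0)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values in their original units.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 380.0]

# The same values after dividing by the largest actual value (assumed normalization).
scale = max(y_true)
y_true_scaled = [t / scale for t in y_true]
y_pred_scaled = [p / scale for p in y_pred]

print(f"MAE  raw = {mae(y_true, y_pred):.4f}, scaled = {mae(y_true_scaled, y_pred_scaled):.4f}")
print(f"MAPE raw = {mape(y_true, y_pred):.4f}, scaled = {mape(y_true_scaled, y_pred_scaled):.4f}")
```

The MAE shrinks by the scaling factor, while the percentage error stays the same, so it can be compared between the two approaches.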
Root Mean Squared Error (RMSE)
■ RMSE is another widely used metric for regression models.
■ It is similar to the MSE, but the result is square-rooted, so it is expressed in the same units as the target.

R-squared (R2)
■ R-squared measures what proportion of the variance of the target variable
is explained by the model.
● It is also known as the Coefficient of Determination.
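Both metrics can be written out by hand to make the definitions concrete. This is a minimal sketch on made-up values. The helper names are ours, and R-squared is computed with the usual definition of one minus the ratio of the residual sum of squares to the total sum of squares. In practice, `mean_squared_error` and `r2_score` from `sklearn.metrics` do the same job.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print(f"MSE  = {mse(y_true, y_pred):.4f}")        # 0.5000
print(f"RMSE = {rmse(y_true, y_pred):.4f}")       # 0.7071
print(f"R2   = {r_squared(y_true, y_pred):.4f}")  # 0.9000
```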
■ An overfitted model can have a high R-squared. The adjusted R-squared measure
helps with this problem, because it penalizes the number of predictors in the model.
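A short sketch of the adjustment, using the commonly quoted formula 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The function name and the example numbers are assumptions for illustration, and the value is computed by hand from an R-squared that is taken as given.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R-squared by the number of predictors used to reach it."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Same hypothetical R-squared and sample size, different model sizes.
r2 = 0.90
n_samples = 20

small_model = adjusted_r_squared(r2, n_samples, n_predictors=2)
large_model = adjusted_r_squared(r2, n_samples, n_predictors=10)

print(f"2 predictors : adjusted R2 = {small_model:.4f}")
print(f"10 predictors: adjusted R2 = {large_model:.4f}")
```

With the same raw R-squared, the model with more predictors gets the lower adjusted score, which is what makes the measure useful when extra features may only be fitting noise.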
Exercise
Do you have any questions?
[email protected]
Thanks!