Linear Regression Final Exam
$R_j^2$ is the coefficient of determination obtained when the $j$th feature is regressed on the rest of the features in our data.
The phenomenon of multicollinearity involves high correlation between two or more features. Pairwise correlation captures only the correlation between pairs of features, whereas VIF detects correlation between a feature and a linear combination of the other features. Hence, VIF is the better diagnostic.
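To make this concrete, here is a minimal sketch of computing $\mathrm{VIF}_j = 1/(1 - R_j^2)$ for each column of a numeric feature matrix (the helper `vif` and the use of scikit-learn are my own illustration, not part of the exam):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 from
    regressing feature j on all of the remaining features."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)  # all features except j
        r2_j = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        vifs.append(1.0 / (1.0 - r2_j))   # blows up as R_j^2 -> 1
    return vifs
```

A VIF much larger than the rest flags feature $j$ as (almost) a linear combination of the other features, which pairwise correlations alone can miss.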
Q22. $R^2_{adj}$ and PRESS are good criteria for variable selection because they account for whether a particular variable actually improves the performance of our model. $R^2$ increases whether or not a variable added to the model improves its performance. MSE and SSE are accuracy measures of the model and do not by themselves play a role in variable selection.
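For reference, the standard forms of these two criteria for $n$ observations and $p$ predictors are

$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}, \qquad \mathrm{PRESS} = \sum_{i=1}^{n} \big(y_i - \hat{y}_{i(-i)}\big)^2,$$

where $\hat{y}_{i(-i)}$ is the prediction for observation $i$ from the model fitted without observation $i$; a variable is worth keeping only if adding it improves these criteria.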
Q23.
Q25. This statement is true. In forward stepwise regression we add one variable at a time. For a given target variable there may be many candidate predictors in our dataset, but not all of them are important contributors. Including every feature therefore does not make a good model, and it also hurts model interpretability. Hence, to keep only the important features, forward stepwise regression first identifies the single best predictor and then gradually adds the rest one by one on the basis of their importance, stopping when no remaining variable improves the selection criterion (a sketch follows below).
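The procedure can be sketched as follows (a minimal sketch, assuming numpy arrays `X` and `y`; using adjusted $R^2$ as the criterion is my assumption here, since AIC, BIC, or PRESS are equally common choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def forward_stepwise(X, y):
    """Greedy forward selection: start empty, repeatedly add the
    feature that most improves adjusted R^2; stop when none helps."""
    n, p = X.shape
    selected, best_adj_r2 = [], -np.inf
    while len(selected) < p:
        best_j = None
        for j in set(range(p)) - set(selected):
            cols = selected + [j]
            r2 = LinearRegression().fit(X[:, cols], y).score(X[:, cols], y)
            adj = 1 - (1 - r2) * (n - 1) / (n - len(cols) - 1)
            if adj > best_adj_r2:
                best_adj_r2, best_j = adj, j
        if best_j is None:  # no remaining feature improves the criterion
            break
        selected.append(best_j)
    return selected
```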
Q26. An $R^2$ of 0.991 is very high; it means that our model has fit the data almost perfectly, including any random effects in the data such as outliers or noise. This is a case of overfitting, as the model is completely attuned to the training data itself. Hence, the comment made by the classmate is correct.
$R^2_{adj}$ is a better descriptive measure than $R^2$ because, unlike $R^2$, it does not keep increasing as random independent variables are added (illustrated below).
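A quick synthetic illustration of this point (the data here is invented, purely for demonstration): as pure-noise columns are appended, $R^2$ never decreases, while $R^2_{adj}$ penalizes the useless additions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=(n, 1))
y = 3 * x[:, 0] + rng.normal(size=n)      # one real signal plus noise

X = x
for k in range(10):                        # append 10 pure-noise features
    X = np.hstack([X, rng.normal(size=(n, 1))])
    p = X.shape[1]
    r2 = LinearRegression().fit(X, y).score(X, y)
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    print(f"p={p:2d}  R^2={r2:.4f}  adj R^2={adj:.4f}")
```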
Q27. This information means that there exists a linear combination among the independent variables, i.e. the columns of the design matrix are (nearly) linearly dependent. The problem might have developed due to multicollinearity producing almost perfectly linearly dependent columns.
It could also be the result of a singular matrix created when the student used an incorrect indicator-variable encoding and included an additional indicator column, which makes the columns exactly linearly dependent (the dummy-variable trap, sketched below).
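A minimal sketch (with invented data) of this dummy-variable trap: including an indicator column for every level of a factor alongside the intercept makes $X^\top X$ singular.

```python
import numpy as np

# Intercept plus an indicator for *every* level of a two-level factor:
# the two indicator columns sum to the intercept column.
X = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
], dtype=float)

xtx = X.T @ X
print(np.linalg.matrix_rank(xtx))  # prints 2, not 3: X'X is singular
# np.linalg.inv(xtx) would raise LinAlgError: Singular matrix.
# Dropping one indicator (k-1 dummies for k levels) restores full rank.
```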