
Linear Regression Final Exam

The document discusses various statistical concepts related to regression analysis, including multicollinearity, variable selection criteria such as adjusted R2 and PRESS, and the fact that a very high R2 can indicate overfitting. It highlights why VIF is preferred over pairwise correlation for assessing relationships among features and emphasizes the need for careful variable selection to maintain model interpretability. It also addresses residual diagnostics such as misspecified mean structure and non-constant variance.

Uploaded by Kauesh Chaudhary

Q21.

Rj2 is the coefficient of determination obtained when the j-th feature is regressed on all of the remaining features in the data; the variance inflation factor is then VIFj = 1 / (1 - Rj2).
Multicollinearity is the phenomenon of high correlation among two or more features. Pairwise correlation detects only correlation between pairs of features, whereas VIF detects correlation between a feature and a linear combination of all the other features. Hence, VIF is the better diagnostic.
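As a quick numerical sketch (synthetic data, numpy only; the helper name vif is illustrative), each VIFj can be computed exactly as described: regress column j on the remaining columns and apply 1 / (1 - Rj2). Note that a feature built from two others can have a huge VIF even though it was generated from features that are mutually independent:

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (n x p).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns (with an intercept).
    """
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # add intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + 0.05 * rng.normal(size=200)   # nearly a linear combination
X = np.column_stack([x1, x2, x3])
print(vif(X))   # x3's VIF is very large, yet no single pair need be highly correlated
```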
Q22. Radj2 and PRESS are good criteria for variable selection because they account for whether a particular variable actually improves the model: Radj2 penalizes the number of predictors, and PRESS measures leave-one-out prediction error. R2, by contrast, increases whenever a variable is added, whether or not that variable improves model performance. SSE likewise always decreases as variables are added, and MSE and SSE are in-sample accuracy measures of a fixed model; on their own they do not play a role in variable selection.
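A sketch of how these criteria can be computed with numpy (the helper name fit_stats and the synthetic data are illustrative). PRESS uses the standard leave-one-out identity e_i / (1 - h_ii) with the hat-matrix diagonal, so no refitting loop is needed:

```python
import numpy as np

def fit_stats(X, y):
    """R^2, adjusted R^2 and PRESS for an OLS fit with intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ beta
    sst = ((y - y.mean()) ** 2).sum()
    r2 = 1 - e @ e / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    H = A @ np.linalg.pinv(A.T @ A) @ A.T          # hat matrix
    press = ((e / (1 - np.diag(H))) ** 2).sum()    # leave-one-out residuals
    return r2, r2_adj, press

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)
noise = rng.normal(size=(n, 5))                    # irrelevant predictors
r2_a, adj_a, press_a = fit_stats(x[:, None], y)
r2_b, adj_b, press_b = fit_stats(np.column_stack([x[:, None], noise]), y)
# R^2 never decreases when columns are added; Radj2 and PRESS can get worse.
```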
Q23.

(a) Here the residuals have a misspecified mean structure: a systematic trend remains in the residual plot.

(b) Here the residuals have non-constant variance (heteroscedasticity): their spread changes with the fitted values.

(c) Here we can see serial correlation in the residuals: successive residuals track one another.

(d) The residuals are not normally distributed.
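The serial-correlation case in particular can be checked numerically rather than visually. A minimal sketch on synthetic residual series using the Durbin-Watson statistic (values near 2 indicate no serial correlation; values near 0 indicate strong positive serial correlation):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by their sum of squares."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(2)
white = rng.normal(size=500)            # uncorrelated residuals
ar = np.zeros(500)                      # AR(1) residuals with rho = 0.9
for t in range(1, 500):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()

print(durbin_watson(white))   # close to 2
print(durbin_watson(ar))      # well below 2
```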


Q24.

(a) Model: Y = β0 + β1*X1 + β2*X2 + e

(b) Model: Y = β0 + β1*X1 + β2*X2 + β3*X1*X2 + e
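Both models can be fit by ordinary least squares; a sketch on synthetic data (the true coefficients 1, 2, -3, 4 are made up for illustration). The interaction model (b) simply appends the product column X1*X2 to the design matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 - 3 * x2 + 4 * x1 * x2 + rng.normal(size=n)

# (a) additive model
Xa = np.column_stack([np.ones(n), x1, x2])
# (b) model with the X1*X2 interaction term
Xb = np.column_stack([np.ones(n), x1, x2, x1 * x2])

beta_b, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(beta_b)   # recovers roughly (1, 2, -3, 4)
```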

Q25. This statement is true. The forward stepwise regression procedure adds one variable at a time. A dataset may contain many candidate predictors for the target variable, but not all of them contribute meaningfully; including every feature does not make a good model, and it directly harms interpretability as well. Hence, to keep only the important features, forward stepwise regression first identifies the single best predictor and then gradually adds the remaining variables one by one on the basis of their incremental contribution.
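The procedure above can be sketched in a few lines (synthetic data; the helper names sse and forward_stepwise are illustrative, and this minimal version uses SSE reduction as the greedy criterion rather than a formal F-test):

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ beta
    return e @ e

def forward_stepwise(X, y, k):
    """Greedy forward selection: at each step, add the column that
    most reduces the residual sum of squares."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = min(remaining, key=lambda j: sse(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(4)
n, p = 200, 8
X = rng.normal(size=(n, p))
y = 5 * X[:, 2] + 2 * X[:, 6] + rng.normal(size=n)   # only columns 2 and 6 matter

print(forward_stepwise(X, y, 2))   # picks column 2 first (largest effect), then 6
```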

Q26. An R2 of 0.991 is very high: it means the model has fit the training data almost perfectly, including random effects such as outliers and noise. This is a case of overfitting, since the model is completely attuned to the training data itself. Hence, the comment made by the classmate is correct.

Adjusted R2 is a better descriptive measure than R2 because it does not keep increasing, as R2 does, whenever additional random independent variables are added; it rises only when a new variable improves the fit by more than its cost in degrees of freedom.
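The point can be demonstrated numerically (synthetic data): fitting many purely random predictors to a pure-noise response still yields a high R2, while adjusted R2 correctly collapses:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 30, 25
y = rng.normal(size=n)                 # pure noise: nothing real to explain
X = rng.normal(size=(n, p))            # 25 random, irrelevant predictors

A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
e = y - A @ beta
r2 = 1 - e @ e / ((y - y.mean()) ** 2).sum()
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(r2)       # high despite zero signal
print(r2_adj)   # far lower
```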

Q27. This message means that the columns of the design matrix are (almost) linearly dependent, so X'X is (nearly) singular and cannot be reliably inverted. The problem may have developed from severe multicollinearity producing almost perfectly linearly dependent columns.

It could also be a singular matrix created when the student used indicator variables incorrectly: including one indicator column for every category along with the intercept makes the indicator columns sum to the intercept column, which creates exactly linearly dependent columns (the dummy-variable trap).
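The dummy-variable trap is easy to reproduce (a minimal sketch with a made-up three-level categorical variable):

```python
import numpy as np

n = 12
group = np.arange(n) % 3                   # three categories, all present
D = np.eye(3)[group]                       # one indicator column per category

# Intercept plus ALL indicator columns: d0 + d1 + d2 equals the intercept
# column, so X is rank deficient and X'X is exactly singular.
X_bad = np.column_stack([np.ones(n), D])
print(X_bad.shape[1], np.linalg.matrix_rank(X_bad))   # 4 columns, rank 3

# Dropping one indicator level (the reference category) restores full rank.
X_ok = np.column_stack([np.ones(n), D[:, 1:]])
print(X_ok.shape[1], np.linalg.matrix_rank(X_ok))     # 3 columns, rank 3
```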
