Quiz_2_2021_sol
No clarifications on any question will be provided. Please make any necessary assumptions and solve the quiz.
1. A machine learning model is being developed to perform regression analysis. The three models being
considered are Ridge Regression, Lasso Regression, and K-Nearest Neighbors regression. The training
is performed on a data set of size one thousand. Is it possible to choose hyperparameters for the three
models such that all three give exactly the same prediction? Briefly explain whether this is possible. [10 marks]
Solution:
Yes, it is possible to have the three models give exactly the same answer with a specific choice of the
hyperparameters.
The hyperparameter t for Ridge regression (with the constraint ||β||₂² ≤ t) and Lasso regression (with the
constraint ||β||₁ ≤ t) should be 0, and the k value in kNN regression should be 1000 (the size of the dataset).
With t = 0, all coefficients are forced to zero, leaving only the intercept; with k = 1000, each prediction
averages over the entire training set. This leads to all three regression results being equal to the mean of
the output variable in the data set.
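This can be checked numerically. The sketch below (a NumPy illustration; the toy data is an assumption, and Ridge is solved in penalty form with a very large alpha, which is equivalent to the budget t = 0) shows that the fully shrunk Ridge model and kNN with k = n both collapse to predicting the mean of y:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))
y = 3.0 * X[:, 0] - X[:, 1] + rng.normal(size=n)

# Ridge in penalty form with a huge alpha (equivalent to budget t = 0):
# all coefficients shrink to ~0, leaving an intercept-only model.
Xc = X - X.mean(axis=0)
alpha = 1e12
beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(2), Xc.T @ (y - y.mean()))
ridge_pred = y.mean() + Xc @ beta        # beta ~ 0, so this is ~mean(y)

# Lasso with t = 0 behaves identically: beta = 0, prediction = mean(y).
# kNN regression with k = n averages over the entire training set,
# so every query point also receives mean(y).
knn_pred = np.full(n, y.mean())

assert np.allclose(ridge_pred, knn_pred, atol=1e-6)
```

With a finite but very large alpha the Ridge coefficients are not exactly zero, hence the small tolerance in the comparison.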
2. A highly accurate (accuracy of 99%) classification algorithm is developed that can predict colour
blindness of a person. If colour blindness is found in around 1 in 12 men and 1 in 200 women, comment
on whether the above classification algorithm is good enough for classifying colour blindness in men
and women. Please provide a brief explanation for your answer, with possible remedial solutions. [10 marks]
Solution:
No, this is not a good enough classifier for the classification of colour blindness, especially for women.
Since 199 out of every 200 women are not colour blind, even a classifier that always predicts "not colour
blind" would be (199/200) x 100 = 99.5% accurate, which is higher than the reported 99%. This is because
the two possible outcomes (colour blind and not colour blind) are highly skewed towards one outcome
(not colour blind). To address this, we need to examine the confusion matrix and use metrics such as
precision and recall rather than accuracy alone.
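A quick simulation illustrates the point for women; the population size and the always-negative "classifier" are assumptions chosen for illustration:

```python
import numpy as np

# Hypothetical population of 200,000 women with the stated 1-in-200 rate.
n = 200_000
is_blind = np.zeros(n, dtype=bool)
is_blind[: n // 200] = True              # 1,000 colour-blind women

# A "classifier" that always predicts "not colour blind".
pred = np.zeros(n, dtype=bool)

accuracy = (pred == is_blind).mean()     # 0.995: beats the 99% model...
recall = (pred & is_blind).sum() / is_blind.sum()
print(accuracy, recall)                  # 0.995 0.0 - not one case detected
```

The confusion matrix makes the failure visible immediately: all 1,000 positives land in the false-negative cell, which plain accuracy hides.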
3. Which of the two, linear regression or K-Nearest Neighbors regression, would be a better model for
performing regression analysis on the following data: [10 marks]
a. A highly nonlinear output variable
b. Non-uniformly distributed data in the predictor space
Solution:
a. For this, K-Nearest Neighbors regression would be a better model, since the linear model will not be
able to capture the high nonlinearity.
b. For this, linear regression would be a better model, since the K-Nearest Neighbors model will
perform poorly in regions where the data is sparse.
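A small simulation illustrates part (a); the sine-wave target and k = 5 are arbitrary assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + 0.1 * rng.normal(size=200)     # highly nonlinear target

# Linear fit: the best straight line through a full sine wave.
a, b = np.polyfit(x, y, 1)
lin_mse = np.mean((y - (a * x + b)) ** 2)

# k-NN regression (k = 5), implemented directly.
k = 5
knn = np.array([y[np.argsort(np.abs(x - xi))[:k]].mean() for xi in x])
knn_mse = np.mean((y - knn) ** 2)

assert knn_mse < lin_mse   # kNN tracks the nonlinearity; the line cannot
```

For part (b) the situation reverses: in sparse regions the k nearest neighbours can lie far from the query point, while a (correctly specified) linear model borrows strength from the whole data set.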
4. a.) In a regression analysis, two loss functions, "Mean Square Error" and "Mean Absolute Error", were
used. Please provide one advantage and one disadvantage of each. [10+10 marks]
b.) What would be the trivial machine learning model if the "Mean Bias Error" loss function is used
for the regression analysis?
Solution:
a.)
Mean Square Error
  Advantage: It has mathematical properties that make it easy to calculate gradients.
  Disadvantage: Due to squaring, predictions that are far from the actual values are penalized heavily in
  comparison to less deviated predictions, so it is sensitive to outliers.
Mean Absolute Error
  Advantage: MAE is more robust to outliers since it does not square the errors.
  Disadvantage: MAE is not differentiable at zero, so it needs more complicated tools, such as linear
  programming, to optimize.
b.) The trivial model would be ŷ = ȳ (i.e., the mean value of y from the training dataset). Since the Mean
Bias Error is (1/n) Σ (yᵢ - ŷᵢ), this constant prediction gives exactly zero Mean Bias Error on the training data.
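The contrast between the two gradients, and the zero-MBE trivial model, can be sketched as follows (the toy data with one outlier is an assumption):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 10.0])          # 10.0 is an outlier
yhat = np.full_like(y, y.mean())             # constant prediction c = mean(y)

# MSE gradient w.r.t. each prediction is smooth: 2*(yhat - y)/n.
mse_grad = 2 * (yhat - y) / len(y)           # the outlier dominates

# MAE (sub)gradient is only sign(yhat - y)/n: robust, but not
# differentiable where yhat equals y.
mae_grad = np.sign(yhat - y) / len(y)        # every point weighs the same

# Mean Bias Error: MBE = mean(y - yhat). The constant model yhat = mean(y)
# drives it to exactly zero on the training set.
mbe = np.mean(y - yhat)
print(mse_grad, mae_grad, mbe)               # mbe is exactly 0.0
```

Note how the outlier's MSE gradient (magnitude 3.0) is twice as large as that of the nearest ordinary point, while under MAE every residual contributes the same magnitude 0.25.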
5. In Bagging classification trees with a hundred predictors, a very large majority of predictions were the
same. This made classification based on a majority vote very simple. However, it also means that all
the trees (in bagging classification) are highly correlated. How could this be addressed? Provide a brief
description. [10 marks]
Solution:
Suppose there is one very strong predictor in the data set. If it is also very strongly correlated with
several other predictors, then a large majority of the bagged classification trees will use it near the top
and produce the same classification. This can be addressed by Random Forests, which force each split to
choose from only a small set of randomly chosen predictors (typically about √p of the p predictors), so
that the strong predictor is unavailable at most splits and the trees become de-correlated.
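A minimal sketch of the mechanism (not a full forest implementation; the subset size m = √p follows the usual Random Forest default):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 100                                  # a hundred predictors
n_splits = 500

# Bagging: every split sees all p predictors, so one very strong
# predictor (say index 0) gets chosen near the top of almost every tree.
bagging_candidates = np.arange(p)        # same candidate set at every split

# Random forest: each split draws only m ~ sqrt(p) candidate predictors.
# The strong predictor is then unavailable at most splits.
m = int(np.sqrt(p))                      # m = 10
frac = np.mean([0 in rng.choice(p, size=m, replace=False)
                for _ in range(n_splits)])
print(round(frac, 2))                    # close to m / p = 0.1
```

Because the strong predictor is eligible at only about 10% of splits, the other predictors get a chance to drive the trees, which de-correlates them and lowers the variance of the majority vote.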
6. A data set with moderate random noise (in the output variable) is used for regression analysis. How
would the variance and bias change for linear regression versus tenth-degree polynomial regression
when: [10 marks]
a. You have limited data
b. You have a very large data set
Solution:
                 Linear regression    Tenth-degree polynomial regression
a.) Variance     Low                  High
a.) Bias         Moderate             Low
b.) Variance     Low                  Low (because of the very large dataset)
b.) Bias         Moderate             Low
Thus, if you have very limited data, use a less flexible model; however, if you have a very large dataset,
you can use a more flexible model.
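This can be verified with a small resampling experiment; the sine-like true function and the noise level are assumptions for the sketch. The variance of a model is measured as the spread of its predictions across repeated training samples:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(2 * x)                     # assumed true function

x_test = np.linspace(0.1, 2.9, 50)

def pred_variance(n, degree, trials=200):
    # Refit the model on `trials` fresh samples of size n and measure
    # how much the fitted curve varies at fixed test points.
    preds = []
    for _ in range(trials):
        x = rng.uniform(0, 3, n)
        y = f(x) + 0.3 * rng.normal(size=n)  # moderate noise
        preds.append(np.polyval(np.polyfit(x, y, degree), x_test))
    return np.mean(np.var(preds, axis=0))

small_lin, small_poly = pred_variance(20, 1), pred_variance(20, 10)
large_lin, large_poly = pred_variance(2000, 1), pred_variance(2000, 10)

assert small_poly > small_lin    # degree 10 is unstable on limited data
assert large_poly < small_poly   # its variance shrinks on a large dataset
```

The linear model's bias (it can never bend to follow the sine) stays regardless of sample size, which is why the flexible model wins once the data is plentiful enough to tame its variance.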
7. What is the key difference between the dimensionality reduction achieved by Lasso Regression and
principal component analysis? [10 marks]
Solution: Lasso Regression removes some variables based on their ability to predict the output variable;
however, principal component analysis reduces dimensionality only based on the variance of the input
(predictor) variables, without considering their ability to predict the output variable.
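A sketch of the contrast, using a hand-rolled coordinate-descent Lasso with a hypothetical penalty lam (both the data set and the penalty value are chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = 10 * rng.normal(size=n)        # high variance, unrelated to y
x2 = rng.normal(size=n)             # low variance, drives y
y = 2 * x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# PCA looks only at predictor variance: the first component is ~pure x1,
# even though x1 says nothing about y.
w, V = np.linalg.eigh(np.cov(X.T))
pc1 = V[:, np.argmax(w)]
print(np.abs(pc1))                  # ~[1, 0]: keeps the useless x1

# Lasso (coordinate descent with soft-thresholding) looks at y: it keeps
# x2 and shrinks the x1 coefficient to (essentially) zero.
lam = 100.0
beta = np.zeros(2)
for _ in range(100):
    for j in range(2):
        r = y - X @ beta + X[:, j] * beta[j]   # partial residual
        rho = X[:, j] @ r
        beta[j] = np.sign(rho) * max(abs(rho) - lam, 0) / (X[:, j] @ X[:, j])
print(beta)                         # beta[0] ~ 0, beta[1] ~ 1.8
```

The two methods therefore "reduce dimensionality" in opposite senses here: PCA would discard x2 as low-variance, while Lasso discards x1 as useless for predicting y.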
8. For a data set with two predictors (X1, X2), the covariance matrix is given by C = (1/3) [2 1; 1 1].
This corresponds to eigenvalues and eigenvectors as follows: [20 marks]
Eigenvalues: λ1 = 0.8727 and λ2 = 0.1273
Eigenvectors: a1 = [-0.85065, -0.52571]^T and a2 = [0.52571, -0.85065]^T
What would be the eigenvalues and eigenvectors for data sets which have covariance matrices:
a. C1 = (1/3) [2+0.6 1; 1 1+0.6]
b. C2 = (1/3) [2 1; 1 1] + (1/9) [5 3; 3 2]
Solution:
a. C1 = (1/3) [2+0.6 1; 1 1+0.6] = (1/3) [2 1; 1 1] + (0.6/3) I
Adding a multiple of the identity keeps the eigenvectors the same; the eigenvalues become
λ1 + (1/3)(0.6) = 1.0727 and λ2 + (1/3)(0.6) = 0.3273.
b. C2 = (1/3) [2 1; 1 1] + (1/9) [5 3; 3 2] = (1/3) [2 1; 1 1] + ((1/3) [2 1; 1 1])² = C + C²,
since [2 1; 1 1]² = [5 3; 3 2]. Therefore, the eigenvectors remain the same; the eigenvalues become
λ1 + λ1² = 1.63 and λ2 + λ2² = 0.144.
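Both results can be verified numerically:

```python
import numpy as np

M = np.array([[2.0, 1.0], [1.0, 1.0]])
C = M / 3
lam = np.linalg.eigvalsh(C)[::-1]        # descending: [0.8727, 0.1273]

# a) C1 = C + (0.6/3) I: same eigenvectors, every eigenvalue shifted by 0.2.
C1 = (M + 0.6 * np.eye(2)) / 3
lam1 = np.linalg.eigvalsh(C1)[::-1]
assert np.allclose(lam1, lam + 0.2)      # [1.0727, 0.3273]

# b) (1/9) [5 3; 3 2] equals C @ C, so C2 = C + C^2: same eigenvectors,
# eigenvalues lam + lam^2.
C2 = C + np.array([[5.0, 3.0], [3.0, 2.0]]) / 9
assert np.allclose(C2, C + C @ C)
lam2 = np.linalg.eigvalsh(C2)[::-1]
assert np.allclose(lam2, lam + lam**2)   # [1.6343, 0.1435]
```

The key fact in both parts is that any polynomial in C (including C + cI and C + C²) shares C's eigenvectors, with each eigenvalue mapped through the same polynomial.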