30 Questions To Test A Data Scientist On Linear Regression PDF
Introduction
Linear Regression is still the most prominently used statistical technique in the data science industry and in
academia to explain relationships between features.
A total of 1,355 people registered for this skill test. It was specially designed for you to test your knowledge of
linear regression techniques. If you are one of those who missed out on this skill test, here are the questions and
solutions. You missed the real-time test, but you can read this article to find out how many you could have answered
correctly.
Overall Distribution
https://www.analyticsvidhya.com/blog/2017/07/30-questions-to-test-a-data-scientist-on-linear-regression/ Page 1 of 13
30 Questions to test a data scientist on Linear Regression 09/02/20, 9'16 PM
Helpful Resources
Here are some resources to get in depth knowledge in the subject.
5 Questions which can teach you Multiple Regression (with R and Python)
(https://www.analyticsvidhya.com/blog/2015/10/regression-python-beginners/)
Going Deeper into Regression Analysis with Assumptions, Plots & Solutions
(https://www.analyticsvidhya.com/blog/2016/07/deeper-regression-analysis-assumptions-plots-solutions/)
A) TRUE
B) FALSE
Solution: (A)
Yes, linear regression is a supervised learning algorithm because it uses true labels for training. A supervised
learning algorithm needs an input variable (x) and an output variable (Y) for each example.
A) TRUE
B) FALSE
Solution: (A)
A) TRUE
B) FALSE
Solution: (A)
True. A neural network can be used as a universal approximator, so it can definitely implement a linear
regression algorithm.
4) Which of the following methods do we use to find the best fit line for data in Linear Regression?
Solution: (A)
In linear regression, we try to minimize the least square errors of the model to identify the line of best fit.
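The least squares idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the test's reference code: the helper names `fit_line` and `sum_squared_error` and the data values are made up for the example, and it assumes a single input variable.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_error(xs, ys, intercept, slope):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
best = sum_squared_error(xs, ys, b0, b1)
# A slightly different line gives a larger squared error than the fitted one.
worse = sum_squared_error(xs, ys, b0 + 0.1, b1 - 0.05)
print(b0, b1, best, worse)
```

Any other choice of intercept and slope produces a squared error at least as large as `best`, which is what "line of best fit" means here.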
5) Which of the following evaluation metrics can be used to evaluate a model while modeling a
continuous output variable?
A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error
Solution: (D)
Since linear regression gives continuous values as output, we use the mean squared error metric to
evaluate model performance in such cases. The remaining options are used for classification problems.
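As a small worked example, mean squared error averages the squared differences between actual and predicted values. The function name and the numbers below are illustrative assumptions, not part of the skill test.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical continuous targets and model outputs.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 9.5]

mse = mean_squared_error(actual, predicted)
print(mse)  # (0.25 + 0.0 + 1.0 + 0.25) / 4 = 0.375
```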
6) True-False: Lasso Regularization can be used for variable selection in Linear Regression.
A) TRUE
B) FALSE
Solution: (A)
True. Lasso regression applies an absolute penalty, which makes some of the coefficients exactly zero.
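Why the absolute penalty produces exact zeros can be seen in the simplest possible setting: one feature and no intercept. The sketch below assumes the objectives written in the comments (a common textbook scaling, chosen here for convenience); the function names and data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning exactly 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coefficient(xs, ys, penalty):
    # Minimizes 0.5 * sum((y - w * x)^2) + penalty * |w| for a single feature.
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, penalty) / sxx


def ridge_coefficient(xs, ys, penalty):
    # Minimizes 0.5 * sum((y - w * x)^2) + 0.5 * penalty * w^2 for a single feature.
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Hypothetical weakly informative feature.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.3, -0.1, 0.2, 0.1]

w_lasso = lasso_coefficient(xs, ys, penalty=2.0)
w_ridge = ridge_coefficient(xs, ys, penalty=2.0)
print(w_lasso, w_ridge)
```

With this penalty the lasso coefficient is exactly zero, so the feature is dropped, while the ridge coefficient is only shrunk toward zero.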
A) Lower is better
B) Higher is better
C) A or B depend on the situation
D) None of these
Solution: (A)
Residuals refer to the error values of the model. Therefore residuals that are smaller in magnitude are desired.
8) Suppose that we have N independent variables (X1, X2, ..., Xn) and the dependent variable is Y. Now imagine that
you are applying linear regression by fitting the best fit line using least square error on this data.
You found that the correlation coefficient for one of its variables (say X1) with Y is -0.95.
Solution: (B)
The absolute value of the correlation coefficient denotes the strength of the relationship. Since the absolute
correlation is very high, the relationship between X1 and Y is strong.
Suppose you are given two variables V1 and V2 that follow the two characteristics below.
9) Looking at the above two characteristics, which of the following options is correct for the Pearson
correlation between V1 and V2?
Solution: (D)
We cannot comment on the correlation coefficient by using only statement 1. We need to consider both
statements. Consider V1 as x and V2 as |x|. The correlation coefficient would not be close to 1 in such
a case.
10) Suppose Pearson correlation between V1 and V2 is zero. In such case, is it right to conclude
that V1 and V2 do not have any relation between them?
A) TRUE
B) FALSE
Solution: (B)
The Pearson correlation coefficient between two variables might be zero even when they have a relationship between
them. A correlation coefficient of zero only means that they have no linear relationship. Examples are
y=|x| or y=x^2 when x is distributed symmetrically around zero.
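A quick numerical check of these two examples, as a sketch: the `pearson` helper below is written out by hand for illustration, and the x values are an assumed symmetric sample.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# x values symmetric around zero (an assumption needed for the result).
xs = [-3, -2, -1, 0, 1, 2, 3]
ys_square = [x ** 2 for x in xs]
ys_abs = [abs(x) for x in xs]

r_square = pearson(xs, ys_square)
r_abs = pearson(xs, ys_abs)
print(r_square, r_abs)  # both are zero up to floating-point rounding
```

In both cases y is completely determined by x, yet the correlation is zero because the relationship is not linear.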
11) Which of the following offsets do we use in linear regression’s least square line fit? Suppose the
horizontal axis is the independent variable and the vertical axis is the dependent variable.
A) Vertical offset
B) Perpendicular offset
C) Both, depending on the situation
D) None of above
Solution: (A)
We always consider residuals as vertical offsets. We calculate the direct differences between the predicted values and
the actual Y labels. Perpendicular offsets are useful in the case of PCA.
12) True-False: Overfitting is more likely when you have a huge amount of data to train?
A) TRUE
B) FALSE
Solution: (B)
With a small training dataset, it’s easier to find a hypothesis that fits the training data exactly, i.e. overfitting.
13) We can also compute the coefficients of linear regression with the help of an analytical method
called “Normal Equation”. Which of the following is/are true about Normal Equation?
A) 1 and 2
B) 1 and 3
C) 2 and 3
D) 1,2 and 3
Solution: (D)
Instead of gradient descent, the Normal Equation can also be used to find the coefficients. Refer to this article
(http://eli.thegreenplace.net/2014/derivation-of-the-normal-equation-for-linear-regression/) to read more about
the normal equation.
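For a single input variable, the normal equation theta = (X^T X)^-1 X^T y can be written out by hand, since X^T X is only a 2x2 matrix. The sketch below assumes a design matrix with a column of ones and a column of x values; the function name and data are hypothetical.

```python
def normal_equation_simple(xs, ys):
    """Solve (X^T X) theta = X^T y for X = [1, x], returning (intercept, slope)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy].
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular; all x values are identical")
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Hypothetical data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

intercept, slope = normal_equation_simple(xs, ys)
print(intercept, slope)
```

The coefficients come out in one step, with no learning rate and no iteration loop. The cost is the matrix inversion, which grows quickly as the number of features increases.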
14) Which of the following statements is true about the sum of residuals of A and B?
The graphs below show two fitted regression lines (A & B) on randomly generated data. Now, I want to find the sum
of residuals in both cases A and B.
Note:
Solution: (C)
For a least squares fit that includes an intercept, the sum of residuals is always zero, so both have the same sum of residuals.
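This property can be checked numerically. The sketch below uses made-up data and a hand-written `fit_line` helper, and it assumes the fitted line includes an intercept term; a line that is not the least squares fit is shown for contrast.

```python
def fit_line(xs, ys):
    """Least squares (intercept, slope) for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical noisy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.5, 3.7, 2.9, 6.1, 5.8, 8.4]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)

# Shifting the line up by 0.5 breaks the least squares condition,
# so the residuals no longer cancel out.
shifted_sum = sum(y - (intercept + 0.5 + slope * x) for x, y in zip(xs, ys))
print(residual_sum, shifted_sum)
```

The first sum is zero up to floating-point rounding, while the shifted line gives a sum of -3.0 (six points, each residual lowered by 0.5).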
Suppose you have fitted a complex regression model on a dataset. Now, you are using Ridge regression with
penalty x.
15) Choose the option which describes bias in the best manner.
A) In case of very large x; bias is low
B) In case of very large x; bias is high
C) We can’t say about bias
D) None of these
Solution: (B)
If the penalty is very large, it means the model is less complex, therefore the bias would be high.
16) What will happen when you apply very large penalty?
In lasso some of the coefficient values become zero, but in case of Ridge, the coefficients become close to zero
but not zero.
17) What will happen when you apply very large penalty in case of Lasso?
A) Some of the coefficients will become zero
B) Some of the coefficients will be approaching zero but not absolute zero
C) Both A and B depending on the situation
D) None of these
Solution: (A)
As already discussed, lasso applies absolute penalty, so some of the coefficients will become zero.
18) Which of the following statements is true about outliers in Linear regression?
A) Linear regression is sensitive to outliers
B) Linear regression is not sensitive to outliers
C) Can’t say
D) None of these
Solution: (A)
The slope of the regression line will change due to outliers in most of the cases. So Linear Regression is
sensitive to outliers.
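The effect of a single outlier on the slope can be shown with a small, made-up dataset. The `fit_slope` helper is an illustrative least squares implementation for one input variable, not code from the skill test.

```python
def fit_slope(xs, ys):
    """Least squares slope for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_clean = [2.0, 4.0, 6.0, 8.0, 10.0]      # exactly y = 2x
ys_outlier = [2.0, 4.0, 6.0, 8.0, 30.0]    # last point replaced by an outlier

slope_clean = fit_slope(xs, ys_clean)
slope_outlier = fit_slope(xs, ys_outlier)
print(slope_clean, slope_outlier)
```

Changing one of five points moves the slope from 2 to 6, because squared errors give large deviations a disproportionate weight.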
19) Suppose you plotted a scatter plot between the residuals and predicted values in linear
regression and you found that there is a relationship between them. Which of the following
conclusion do you make about this situation?
Solution: (A)
There should not be any relationship between predicted values and residuals. If any relationship exists
between them, it means that the model has not perfectly captured the information in the data.
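As an illustration, fitting a straight line to data that is actually quadratic leaves a clear pattern in the residuals. The data and helper below are assumptions made for this sketch.

```python
def fit_line(xs, ys):
    """Least squares (intercept, slope) for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data following y = x^2, which a straight line cannot capture.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

for p, r in zip(predicted, residuals):
    print(p, r)
```

Plotted against the predicted values, the residuals form a U shape (positive at both ends, negative in the middle), which signals that the linear model is missing the curvature in the data.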
Suppose that you have a dataset D1 and you design a linear regression model of degree 3 polynomial and you
found that the training and testing error is “0”, or in other terms it perfectly fits the data.
20) What will happen when you fit degree 4 polynomial in linear regression?
A) There are high chances that degree 4 polynomial will overfit the data
B) There are high chances that degree 4 polynomial will underfit the data
C) Can’t say
D) None of these
Solution: (A)
Since a degree 4 model is more complex than the degree 3 model, it will again perfectly fit the training data
and is likely to overfit. In such a case the training error will be zero but the test error may not be zero.
21) What will happen when you fit degree 2 polynomial in linear regression?
A) There are high chances that degree 2 polynomial will overfit the data
B) There are high chances that degree 2 polynomial will underfit the data
C) Can’t say
D) None of these
Solution: (B)
If a degree 3 polynomial fits the data perfectly, it’s highly likely that a simpler model (degree 2 polynomial) might
underfit the data.
22) In terms of bias and variance, which of the following is true when you fit degree 2 polynomial?
Solution: (C)
Since a degree 2 polynomial will be less complex as compared to degree 3, the bias will be high and variance will
be low.
Which of the following is true about the graphs below (A, B, C left to right) between the cost function and the number of
iterations?
23) Suppose l1, l2 and l3 are the three learning rates for A,B,C respectively. Which of the following
is true about l1,l2 and l3?
A) l2 < l1 < l3
B) l1 > l2 > l3
C) l1 = l2 = l3
D) None of these
Solution: (A)
With a high learning rate, the steps will be large: the objective function will decrease quickly at first, but it will not
find the global minimum, and the objective function starts increasing after a few iterations.
With a low learning rate, the steps will be small, so the objective function will decrease slowly.
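The behaviour of different learning rates can be reproduced on a toy problem. This is a sketch under stated assumptions: a one-parameter model y = w * x, a mean squared error cost, made-up data, and three arbitrarily chosen learning rates. On this simple convex cost an overly large rate makes the cost grow from the very first step, whereas on other cost surfaces it may drop briefly before rising.

```python
def cost(w, xs, ys):
    """Half the mean squared error of the model y = w * x."""
    n = len(xs)
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


def gradient(w, xs, ys):
    """Derivative of the cost with respect to w."""
    n = len(xs)
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / n


def run_gradient_descent(learning_rate, xs, ys, iterations=20):
    """Return the cost after each iteration, starting from w = 0."""
    w = 0.0
    history = [cost(w, xs, ys)]
    for _ in range(iterations):
        w -= learning_rate * gradient(w, xs, ys)
        history.append(cost(w, xs, ys))
    return history


# Hypothetical data from y = 2x, so the optimum is w = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

slow = run_gradient_descent(0.01, xs, ys)       # small steps, slow decrease
good = run_gradient_descent(0.2, xs, ys)        # reaches the minimum quickly
diverging = run_gradient_descent(0.5, xs, ys)   # steps overshoot, cost grows

print(slow[-1], good[-1], diverging[-1])
```

After 20 iterations the small rate is still far from the minimum, the moderate rate has essentially converged, and the large rate ends with a cost far above its starting value.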
We have been given a dataset with n records in which we have input attribute x and output attribute y.
Suppose we use a linear regression method to model this data. To test our linear regressor, we randomly split the data into
a training set and a test set.
24) Now we increase the training set size gradually. As the training set size increases, what do
you expect will happen with the mean training error?
A) Increase
B) Decrease
C) Remain constant
D) Can’t Say
Solution: (D)
Training error may increase or decrease depending on the values that are used to fit the model. If the values used
to train gradually contain more outliers, then the error might just increase.
25) What do you expect will happen with bias and variance as you increase the size of training
data?
Solution: (D)
As we increase the size of the training data, the bias would increase while the variance would decrease.
Consider the following data where one input(X) and one output(Y) is given.
26) What would be the root mean square training error for this data if you run a Linear Regression
model of the form (Y = A0+A1X)?
A) Less than 0
B) Greater than zero
C) Equal to 0
D) None of these
Solution: (C)
We can perfectly fit a line to the given data, so the mean error will be zero.
Suppose you have been given the following scenario for training and validation error for Linear Regression.
Scenario | Learning Rate | Number of iterations | Training Error | Validation Error
27) Which of the following scenario would give you the right hyper parameter?
A) 1
B) 2
C) 3
D) 4
Solution: (B)
Option B would be the better option because it leads to lower training as well as validation error.
28) Suppose you got the tuned hyper parameters from the previous question. Now, Imagine you
want to add a variable in variable space such that this added feature is important. Which of the
following thing would you observe in such case?
Solution: (D)
If the added feature is important, the training and validation error would decrease.
Suppose you are in a situation where you find that your linear regression model is underfitting the data.
29) In such situation which of the following options would you consider?
A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1, 2 and 3
Solution: (A)
In case of underfitting, you need to introduce more variables in the variable space, or you can add some polynomial
degree variables to make the model more complex so that it can fit the data better.
A) L1
B) L2
C) Any
D) None of these
Solution: (D)
I won’t use any regularization methods because regularization is used in case of overfitting.
End Notes
I tried my best to make the solutions as comprehensive as possible but if you have any questions / doubts
please drop in your comments below. I would love to hear your feedback about the skilltest. For more such
skilltests, check out our current hackathons (https://datahack.analyticsvidhya.com/contest/all/).
(https://www.analyticsvidhya.com/blog/author/facebook_user_4/)
Ankit is currently working as a data scientist at UBS who has solved complex data
mining problems in many domains. He is eager to learn more about data science and
machine learning algorithms.
(ankit.gupta968)
(https://www.linkedin.com/in/ankit-gupta-84b737ba?trk=nav_responsive_tab_profile)
(https://github.com/anki1909)
5 COMMENTS
MEJ Reply
July 3, 2017 at 10:31 pm (https://www.analyticsvidhya.com/blog/2017/07/30-questions-to-test-a-data-scientist-on-linear-
regression/#comment-131504)
For question 4, isn’t (D) the right answer? Can’t we use OLS or MLE to find best fit line in Linear Regression? I had
thought MLE would be better for complex data.
MATEUSZ Reply
July 14, 2017 at 8:28 pm (https://www.analyticsvidhya.com/blog/2017/07/30-questions-to-test-a-data-scientist-on-linear-
regression/#comment-132133)
A) Lower is better
B) Higher is better
C) A or B depend on the situation
D) None of these
The correct answer is D. Lower Residuals SQUARES are better than higher residuals squares!
ASW Reply
July 22, 2017 at 11:31 am (https://www.analyticsvidhya.com/blog/2017/07/30-questions-to-test-a-data-scientist-on-linear-
regression/#comment-132620)
SHWETA Reply
September 8, 2017 at 5:46 pm (https://www.analyticsvidhya.com/blog/2017/07/30-questions-to-test-a-data-scientist-on-linear-
regression/#comment-136653)
Hey Ankit
Thanks in advance!!
Thanks for making it possible to train our knowledge regarding regression techniques.
Suppose that you have a dataset D1 and you design a linear regression model of degree 3 polynomial and you
found that the training and testing error is “0” or in another terms it perfectly fits the data.”
But one question, a degree 3 polynomial regression isn’t considered as a linear regression model right?
Cheers,
Lena
© Copyright 2013-2020 Analytics Vidhya