Subjective Questions

The document contains questions related to linear regression analysis and model building. It asks the respondent to summarize inferences made from analyzing categorical variables, explain why drop_first=True is used for dummy variable creation, and identify the feature with highest correlation to the target variable based on a pair plot. It also asks to validate linear regression assumptions on the training set, list the top 3 significant features in the final model, and explain the linear regression algorithm and Anscombe's quartet in detail. General questions about Pearson's R, scaling, VIF values becoming infinite, and Q-Q plots are also included.

Uploaded by

Nitish Gupta


Assignment-based Subjective Questions

1. From your analysis of the categorical variables from the dataset, what
could you infer about their effect on the dependent variable?

Answer: Here are some of the inferences I drew from analysing the effect of the categorical
variables on the dependent variable (Count):

1. Fall has the highest median count, which is expected as weather conditions are most
favourable for riding a bike then, followed by summer.
2. Median bike rentals are increasing year on year: 2019 has a higher median than
2018. This might be because bike rentals are getting more popular and people are
becoming more aware of the environment.
3. The overall spread in the month plot mirrors the season plot, as the fall months have
higher medians.
4. People rent more on non-holidays than on holidays; the reason might be that on holidays
they prefer to spend time with family and use a personal vehicle instead of a rental bike.
5. The overall median is about the same across all days, but the spread for Saturday and
Wednesday is bigger. For Saturday, this may be because it is a non-working day and those
who have other plans might not rent bikes.
6. Working and non-working days have almost the same median, although the spread is
bigger for non-working days, as people might have plans and not want to rent
bikes because of that.
7. Clear weather is the most favourable for bike renting, as the temperature is comfortable
and humidity is lower.

2. Why is it important to use drop_first=True during dummy variable
creation?

Answer: A variable with n levels can be represented by n-1 dummy variables. So, even if we
remove the first column, we can still represent the data: if the dummy variables for levels 2
to n are all 0, it means that the observation belongs to the 1st level. Keeping all n columns
would add a redundant column that is perfectly collinear with the others (together with the
intercept), which is why drop_first=True is used.
Example: For 'Relationship' with three levels, namely 'Single', 'In a Relationship', and 'Married', I
would create a dummy table like the following:

Relationship        Single  In a Relationship  Married
Single              1       0                  0
In a Relationship   0       1                  0
Married             0       0                  1

But I can clearly see that there is no need to define three different columns. If I drop a level,
say 'Single', I would still be able to explain the three levels.
Let us drop the dummy variable 'Single' from the columns and see what the table looks like:

Relationship        In a Relationship  Married
Single              0                  0
In a Relationship   1                  0
Married             0                  1

If both the dummy variables, namely 'In a Relationship' and 'Married', are equal to zero,
that means that the person is single. If 'In a Relationship' is one and 'Married' is zero, that
means that the person is in a relationship, and finally, if 'In a Relationship' is zero and
'Married' is one, that means that the person is married.
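
The 'Relationship' example can be reproduced with a short pandas sketch. The data frame below is made up for illustration. One assumption worth noting: pd.get_dummies drops the first level in category order, so the order is fixed explicitly here to make 'Single' the dropped level.

```python
import pandas as pd

# Made-up data; the column name and levels follow the example in the text.
levels = ["Single", "In a Relationship", "Married"]
df = pd.DataFrame(
    {"Relationship": ["Single", "Married", "In a Relationship", "Single"]}
)

# Fix the level order so that 'Single' is the first level (the one dropped).
df["Relationship"] = pd.Categorical(df["Relationship"], categories=levels)

# All three dummy columns (one of them is redundant).
full = pd.get_dummies(df["Relationship"], dtype=int)

# n-1 dummy columns: 'Single' is implied when both remaining columns are 0.
reduced = pd.get_dummies(df["Relationship"], drop_first=True, dtype=int)

print(full)
print(reduced)
```

In the reduced table every row of `full` can still be recovered, which is the point of the answer above.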

3. Looking at the pair-plot among the numerical variables, which one has
the highest correlation with the target variable?

Answer: ‘temp’ had the highest correlation coefficient of 0.63.

4. How did you validate the assumptions of Linear Regression after
building the model on the training set?

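Answer: By plotting the distribution of the residuals (actual minus predicted values) on the
training set. It came out to be approximately normal with a mean value of 0, which is
consistent with the assumption of normally distributed error terms.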
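
A minimal sketch of such a residual check is given below. It uses synthetic data and a plain NumPy least-squares fit, both of which are assumptions for illustration and not the model from the assignment. It computes the residual mean, a skewness figure, and histogram counts that could be plotted.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Synthetic training data (illustrative only): two features plus normal noise.
X = rng.normal(size=(n, 2))
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

# Fit ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# Residuals on the training set: actual minus predicted.
residuals = y - A @ beta

resid_mean = residuals.mean()
z = (residuals - resid_mean) / residuals.std()
skewness = np.mean(z ** 3)  # close to 0 for a symmetric distribution

# Histogram counts; plotting these gives the residual distribution plot.
counts, edges = np.histogram(residuals, bins=20)

print("mean of residuals:", resid_mean)
print("skewness of residuals:", skewness)
```

With an intercept in the model the residual mean is zero by construction, so the shape of the histogram (symmetry, bell shape) is the more informative part of the check.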

5. Based on the final model, which are the top 3 features contributing
significantly towards explaining the demand of the shared bikes?

Answer: The following are the top 3 features contributing significantly towards explaining
the demand for the shared bikes:
• atemp (0.412)
• yr (0.236)
• weathersit Light rain (-0.275)
General Subjective Questions
1. Explain the linear regression algorithm in detail.

Answer: A linear regression algorithm tries to explain the relationship between the independent
variables and the dependent variable using a straight line. The dependent variable must be
numerical; categorical independent variables have to be converted to numbers first (for
example, with dummy variables).
The following steps are performed while doing linear regression:
• The dataset is divided into training and test data.
• The training data is divided into features (independent) and target (dependent) datasets.
• A linear model is fitted using the training dataset. The coefficients of the best-fit line
are the ones that minimise a cost function; a typical example of a cost function is the
residual sum of squares. The minimisation can be done iteratively with the gradient descent
algorithm, although common Python APIs such as scikit-learn's LinearRegression and
statsmodels' OLS solve the least-squares problem directly.
• In the case of multiple features, the fitted model is a hyperplane instead of a line.
The predicted variable takes the following form:

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_n x_n$
• The predictions are then compared with the test data and the assumptions are
checked.
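
The fitting step can be sketched as follows. The data is synthetic, and the learning rate and iteration count are arbitrary choices for illustration. The sketch minimises the mean squared error (the residual sum of squares divided by the number of rows) with gradient descent and compares the result with a direct least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic data (illustrative): y = 3 + 2*x1 - 1*x2 + noise.
X = rng.normal(size=(n, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Design matrix with a leading column of ones for the intercept beta_0.
A = np.column_stack([np.ones(n), X])

# Gradient descent on the cost J(beta) = mean((A @ beta - y) ** 2).
beta_gd = np.zeros(A.shape[1])
learning_rate = 0.1  # arbitrary but small enough to converge here
for _ in range(2000):
    gradient = (2.0 / n) * A.T @ (A @ beta_gd - y)
    beta_gd -= learning_rate * gradient

# Direct least-squares solution for comparison.
beta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

print("gradient descent:", beta_gd)
print("least squares:   ", beta_ls)
```

Both approaches minimise the same cost function, so they should agree up to numerical tolerance.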

2. Explain the Anscombe’s quartet in detail.

Answer: Anscombe's quartet comprises four data sets that have nearly identical simple
descriptive statistics but look quite different when visualized graphically. The
simple statistics are the mean and sample variance of x and y, the correlation coefficient, the
linear regression line and the R-squared value. Anscombe's quartet shows that multiple data sets
with many similar statistical properties can still be vastly different from one another when
graphed.
(Figure: scatter plots of the four data sets. Image source - https://en.wikipedia.org/wiki/Anscombe%27s_quartet)
1. The first plot (top left) appears to be a simple linear relationship.
2. The second plot (top right) is not normally distributed and shows a nonlinear
relationship, so the correlation coefficient is not relevant.
3. The third plot (bottom left) is linear but has a different regression line. This
happens because of a single outlier in the data, which pulls the fitted line away from
the other points.
4. The fourth plot (bottom right) does not show a linear relationship; however, a single
outlying point is enough to produce the same summary statistics.

In a nutshell, it is better practice to visualize the data and deal with outliers before analysing it.
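
The claim about near-identical statistics can be checked directly. The sketch below uses the standard published values of the four data sets and computes the mean of x, the mean of y and the correlation coefficient for each; the variable names are illustrative.

```python
import numpy as np

# Anscombe's quartet (standard published values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {}
for name, (x, y) in quartet.items():
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    summary[name] = (x.mean(), y.mean(), r)
    print(f"{name}: mean x = {x.mean():.2f}, mean y = {y.mean():.2f}, r = {r:.3f}")
```

The printed summaries are nearly the same for all four sets, which is why only a plot reveals how different they are.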

3. What is Pearson’s R?

Answer: Pearson's R measures the strength of the linear association between two variables. It is
the covariance of the two variables divided by the product of their standard deviations. It has
a value from -1 to +1.
• A value of 1 means a total positive linear correlation. It means that if one variable
increases then the other will also increase.
• A value of 0 means no linear correlation.
• A value of -1 means a total negative linear correlation. It means that if one variable
increases then the other will decrease.
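
The definition above (covariance divided by the product of the standard deviations) can be written out as a small function. This is a sketch with made-up numbers; `pearson_r` is an illustrative helper name, not a library function.

```python
import numpy as np

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    covariance = np.mean((x - x.mean()) * (y - y.mean()))
    return covariance / (x.std() * y.std())

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # y increases exactly linearly with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # y decreases exactly linearly with x
r_mid = pearson_r(x, [2, 1, 4, 3, 5])    # noisy increasing relationship

print(r_pos, r_neg, r_mid)
```

The result matches `np.corrcoef(x, y)[0, 1]`, which computes the same quantity.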

4. What is scaling? Why is scaling performed? What is the difference
between normalized scaling and standardized scaling?

Answer: Scaling transforms a variable so that its values fall within a certain range. Scaling
is a pre-processing step in linear regression analysis. The reason we scale variables is to make
the computation of gradient descent faster. The step size of gradient descent is generally
kept low for accuracy; if the data has some small variables (values in the range of 0-1) and some
big variables (values in the range of 0-1000), then the time taken by the gradient descent
algorithm to converge will be huge.

Normalised scaling | Standardized scaling
Also called min-max scaling; scales the variable such that the range is 0-1 | Values are centred around the mean with unit standard deviation
Good for non-Gaussian distributions | Good for Gaussian distributions
Values are bounded between 0 and 1 | Values are not bounded
Sensitive to outliers, since the minimum and maximum set the range | Less affected by outliers, since values are not squeezed into a fixed range
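
The two kinds of scaling can be compared on a small made-up array that contains one large value. The sketch below uses plain NumPy; the variable names are illustrative.

```python
import numpy as np

# Made-up values with one large outlier.
values = np.array([2.0, 4.0, 6.0, 8.0, 100.0])

# Normalised (min-max) scaling: result lies in the range 0-1.
min_max = (values - values.min()) / (values.max() - values.min())

# Standardized scaling: mean 0 and unit standard deviation, not bounded.
standardized = (values - values.mean()) / values.std()

print("min-max:     ", min_max)
print("standardized:", standardized)
```

In the min-max result the outlier maps to 1 and the remaining values are squeezed close to 0, which illustrates the sensitivity to outliers noted in the table.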
5. You might have observed that sometimes the value of VIF is infinite.
Why does this happen?

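Answer: The formula for VIF is

$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$

where $R_i^2$ is the R-squared obtained by regressing feature i on all the other features.
Basically, if $R_i^2$ is 1 then the denominator is 0 and the VIF becomes infinite. It means
that there is perfect correlation between the features: feature i is an exact linear
combination of the others.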
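
The formula can be illustrated with a small sketch on synthetic data in which one column is the exact sum of two others. The `vif` helper below is a hypothetical NumPy implementation written for this example, and the 1e-12 cut-off for treating the denominator as zero is an arbitrary choice.

```python
import numpy as np

def vif(X, i):
    """VIF of column i: regress it on the other columns and apply 1 / (1 - R^2)."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    denominator = ss_res / ss_tot  # equals 1 - R^2
    # Treat a numerically zero denominator as a perfect fit (infinite VIF).
    return np.inf if denominator < 1e-12 else 1.0 / denominator

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2               # exact linear combination of x1 and x2
x4 = rng.normal(size=500)  # unrelated to the other columns
X = np.column_stack([x1, x2, x3, x4])

vifs = [vif(X, i) for i in range(X.shape[1])]
print(vifs)
```

The three columns involved in the exact relationship get an infinite VIF, while the unrelated column stays close to 1.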

6. What is a Q-Q plot? Explain the use and importance of a Q-Q plot in
linear regression.

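Answer: A Q-Q (quantile-quantile) plot is a scatter plot of two sets of quantiles against each
other. Its purpose is to check whether the two sets of data came from the same distribution. It
is a visual check of the data: if both sets come from the same distribution, the points fall
approximately on a straight line. In linear regression it is typically used to compare the
quantiles of the residuals with those of a normal distribution, to check the assumption that
the error terms are normally distributed.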
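
A minimal sketch of how the points of a normal Q-Q plot are computed is given below. The two samples are synthetic stand-ins for residuals, and the helper names are illustrative. As a rough numeric summary of "how straight" the plot is, the sketch uses the correlation between the theoretical and sample quantiles.

```python
import numpy as np
from statistics import NormalDist

def qq_points(sample):
    """Return (theoretical normal quantiles, sorted sample values)."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    n = len(ordered)
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions in (0, 1)
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, ordered

def qq_straightness(sample):
    """Correlation of the Q-Q points; close to 1 when they lie on a line."""
    theoretical, ordered = qq_points(sample)
    return np.corrcoef(theoretical, ordered)[0, 1]

rng = np.random.default_rng(1)
normal_resid = rng.normal(size=500)       # stand-in for well-behaved residuals
skewed_resid = rng.exponential(size=500)  # stand-in for skewed residuals

r_normal = qq_straightness(normal_resid)
r_skewed = qq_straightness(skewed_resid)
print(r_normal, r_skewed)
```

Plotting the two arrays returned by `qq_points` against each other gives the Q-Q plot itself; the skewed sample bends away from the straight line.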
