Linear Regression Assignment Questions and Answers

SUBJECTIVE QUESTIONS


Assignment-based Subjective Questions

1. From your analysis of the categorical variables from the dataset, what could you
infer about their effect on the dependent variable?

Answer: I analyzed the categorical columns using boxplots; the key insights from the
visualizations are listed below (a sketch of the plotting approach follows the list):

❖ The fall season has the highest number of bookings, and in each season, the booking
count increased significantly from 2018 to 2019.
❖ Most bookings occurred during May, June, July, August, September, and October.
The trend increased from the beginning of the year until mid-year and then started
decreasing towards the end of the year.
❖ Clear weather attracted more bookings, which is expected.
❖ Thursdays, Fridays, Saturdays, and Sundays have more bookings compared to the
beginning of the week.
❖ On holidays, bookings are fewer, likely because people prefer to spend time at home
with their families.
❖ Bookings are almost equal on working days and non-working days.
❖ The number of bookings in 2019 was higher than in 2018, indicating good progress in
terms of business.
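
The plotting code itself is not part of this answer, but a minimal sketch of the boxplot
analysis might look like the following (the file name day.csv and the column names are
assumptions based on the standard bike-sharing dataset):

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Assumed: day.csv is the bike-sharing data with the usual column names.
bikes = pd.read_csv("day.csv")

categorical_cols = ["season", "mnth", "weathersit", "weekday",
                    "holiday", "workingday", "yr"]

plt.figure(figsize=(16, 10))
for i, col in enumerate(categorical_cols, start=1):
    plt.subplot(3, 3, i)
    sns.boxplot(x=col, y="cnt", data=bikes)  # bookings per category level
plt.tight_layout()
plt.show()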
2. Why is it important to use drop_first=True during dummy variable creation?
Answer:
Using drop_first=True is important because it drops the redundant first dummy
column, which removes the perfect multicollinearity (the "dummy variable trap")
that would otherwise exist among the dummy variables.

In pandas, the parameter is documented as ‘drop_first : bool, default False’, and it
specifies whether to create k-1 dummies out of k categorical levels by removing the
first level.

For instance, if we have a categorical column with three levels A, B, and C and we
create dummy variables for it, we do not need a dummy variable for the third level:
if an observation is not A and not B, it is implicitly C, so a third dummy column
would be redundant.
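
A minimal illustration with pandas (the column name and levels are made up):

import pandas as pd

df = pd.DataFrame({"season": ["spring", "summer", "winter", "summer"]})

# Without drop_first: k = 3 levels produce 3 dummy columns.
print(pd.get_dummies(df["season"]))

# With drop_first=True: only k - 1 = 2 dummies are created; 'spring' becomes
# the implicit baseline (a row with both dummies equal to 0).
print(pd.get_dummies(df["season"], drop_first=True))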

3. Looking at the pair-plot among the numerical variables, which one has the highest
correlation with the target variable?
Answer:
The ‘temp’ variable has the highest correlation with the target variable.
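
A sketch of how this can be verified (reusing the assumed bikes DataFrame from the
earlier sketch; the numeric column names are assumptions):

import pandas as pd
import seaborn as sns

bikes = pd.read_csv("day.csv")  # assumed file, as in the earlier sketch
numeric_cols = ["temp", "atemp", "hum", "windspeed", "cnt"]

sns.pairplot(bikes[numeric_cols])

# The correlation matrix confirms what the pair-plot shows visually.
print(bikes[numeric_cols].corr()["cnt"].sort_values(ascending=False))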

4. How did you validate the assumptions of Linear Regression after building the model
on the training set?
Answer:
I validated the linear regression model against the following five assumptions
(a sketch of the corresponding checks follows the list):
▪ Normality of Error Terms: The error terms should be normally distributed.
▪ Multicollinearity Check: There should be insignificant multicollinearity
among variables.
▪ Linear Relationship Validation: Linearity should be evident among
variables.
▪ Homoscedasticity: There should be no visible pattern in the residual values.
▪ Independence of Residuals: There should be no autocorrelation.
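
The validation code is not included in the original answer; a minimal sketch of these
checks (on made-up data standing in for the real training set) might look like this:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Made-up training data in place of the real train set.
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(100, 3)),
                       columns=["temp", "hum", "windspeed"])
y_train = 2 * X_train["temp"] + rng.normal(size=100)

model = sm.OLS(y_train, sm.add_constant(X_train)).fit()
residuals = model.resid

# 1. Normality of error terms: histogram should look roughly bell-shaped.
sns.histplot(residuals, kde=True)
plt.show()

# 2. Multicollinearity: VIF per predictor should be low (commonly < 5).
X_const = sm.add_constant(X_train)
for i, col in enumerate(X_const.columns[1:], start=1):
    print(col, variance_inflation_factor(X_const.values, i))

# 3 & 4. Linearity and homoscedasticity: residuals vs. fitted values should
# show no visible pattern or funnel shape.
plt.scatter(model.fittedvalues, residuals)
plt.axhline(0, color="red")
plt.show()

# 5. Independence of residuals: Durbin-Watson near 2 suggests no autocorrelation.
print(sm.stats.durbin_watson(residuals))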

5. Based on the final model, which are the top 3 features contributing significantly
towards explaining the demand of the shared bikes?
Answer:
The top three features significantly contributing to explaining the demand for
shared bikes are:
▪ Temperature (temp)
▪ Winter season (winter)
▪ September (september)
General Subjective Questions
1. Explain the linear regression algorithm in detail.
Answer:
Linear regression is a statistical model that analyses the linear relationship
between a dependent variable and a given set of independent variables. A linear
relationship implies that a change in an independent variable produces a constant,
straight-line change in the dependent variable.

Key Components

o Dependent Variable (Y): The variable being predicted.
o Independent Variable (X): The variable used to make predictions.
o Slope (m): Represents the effect of X on Y.
o Intercept (c): The constant value of Y when X is zero.

The relationship is represented mathematically by the equation:

Y = mX + c

Types of Linear Relationships

o Positive Linear Relationship: Both the independent and dependent variables
increase together.
o Negative Linear Relationship: As the independent variable increases, the
dependent variable decreases.
Types of Linear Regression
o Simple Linear Regression: Involves one independent variable.
o Multiple Linear Regression: Involves multiple independent variables.
Assumptions of Linear Regression
a. Multicollinearity: Assumes little or no multicollinearity, meaning
independent variables should not be highly correlated with each other.
b. Autocorrelation: Assumes little or no autocorrelation, meaning residual
errors should not be dependent on each other.
c. Linear Relationship: Assumes a linear relationship between response
and feature variables.
d. Normality of Error Terms: Error terms should be normally distributed.
e. Homoscedasticity: There should be no visible pattern in the residual
values.
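
As a quick, made-up illustration of fitting Y = mX + c with scikit-learn:

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data generated from y = 3x + 5 plus noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 1))
y = 3 * X.ravel() + 5 + rng.normal(scale=2, size=50)

model = LinearRegression().fit(X, y)
print("slope (m):", model.coef_[0])        # close to 3
print("intercept (c):", model.intercept_)  # close to 5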

2. Explain Anscombe’s quartet in detail.

Answer:
Anscombe's quartet is a collection of four datasets that have nearly identical simple
descriptive statistics but look very different when graphed. The datasets were
constructed by the statistician Francis Anscombe in 1973 to illustrate the importance
of graphing data before analyzing it, and to show how strongly outliers and the
distribution of the data can affect statistical measures. All four datasets share
(to two or three decimal places) the same mean of x, mean of y, variance, correlation
coefficient (about 0.816), and fitted regression line (approximately y = 3.00 + 0.50x),
yet their scatter plots reveal four very different structures: one roughly linear, one
clearly curved, one linear except for a single outlier, and one driven entirely by a
single high-leverage point.
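
As a quick illustration, seaborn ships the quartet as a built-in example dataset:

import seaborn as sns

anscombe = sns.load_dataset("anscombe")  # columns: dataset, x, y

# Nearly identical summary statistics for all four datasets...
print(anscombe.groupby("dataset")[["x", "y"]].agg(["mean", "std"]))
print(anscombe.groupby("dataset").apply(lambda g: g["x"].corr(g["y"])))

# ...but plotting reveals four very different shapes.
sns.lmplot(x="x", y="y", col="dataset", data=anscombe, col_wrap=2, ci=None)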

3. What is Pearson’s R?
Answer:
Pearson's R, also known as Pearson's correlation coefficient, is a statistical
measure that quantifies the strength and direction of a linear relationship
between two variables. It is denoted by r and ranges from -1 to 1.

Key Features of Pearson’s R

Range:
o r = 1: Perfect positive linear correlation.
o r = -1: Perfect negative linear correlation.
o r = 0: No linear correlation.
Interpretation:
o Positive Values: Indicates a positive relationship where, as one variable
increases, the other variable also increases.
o Negative Values: Indicates a negative relationship where, as one variable
increases, the other variable decreases.
o Magnitude: The closer the value of r is to 1 or -1, the stronger the linear
relationship between the two variables.
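
A minimal illustration with scipy (the data is made up):

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)  # strong positive linear relationship

r, p_value = pearsonr(x, y)
print(f"Pearson's r = {r:.3f}")  # close to +1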

4. What is scaling? Why is scaling performed? What is the difference between
normalized scaling and standardized scaling?
Answer:
Scaling is a strategy for bringing the independent features in a dataset onto a
comparable range. Raw features often vary greatly in magnitude, value range, and
units, and handling this is part of data pre-processing. Rescaling the variables to a
comparable scale is crucial: if the scales are not similar, some coefficients may come
out much larger or smaller than others when fitting a regression model, making them
hard to compare.
▪ Normalized Scaling translates data to a scale of 0 to 1. However, it is
sensitive to outliers: a few extreme values can compress the remaining
data into a narrow band. This method is typically employed when features
vary in magnitude and the data distribution is unknown. Min-Max scaling
is the typical approach for such cases, using the formula below:

X_scaled = (X - X_min) / (X_max - X_min)

where:

• X is the original value.
• X_min is the minimum value of the feature.
• X_max is the maximum value of the feature.
• X_scaled is the scaled value in the range [0, 1].

▪ Standardized scaling translates data to a standard normal distribution
(mean 0, standard deviation 1) using the formula below. This method is
typically utilized when the feature distribution is normal/Gaussian, there
are no extreme outliers, and the transformed data does not need to lie
within specified boundaries.

X_standardized = (X - μ) / σ

where:

• X is the original value.
• μ is the mean of the feature.
• σ is the standard deviation of the feature.
• X_standardized is the standardized value.
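
A minimal sketch using scikit-learn's implementations of both approaches (the feature
values are made up):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0], [20.0]])  # a single made-up feature

# Normalized (Min-Max) scaling: every value lands in [0, 1].
print(MinMaxScaler().fit_transform(X).ravel())

# Standardized scaling: the result has mean 0 and standard deviation 1.
print(StandardScaler().fit_transform(X).ravel())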

5. You might have observed that sometimes the value of VIF is infinite. Why does this
happen?
Answer:
The Variance Inflation Factor (VIF) measures the collinearity between predictor
variables in a multiple regression. For each predictor, VIF equals 1/(1 - R²), where
R² comes from regressing that predictor on all of the other predictors; equivalently,
it is the factor by which the variance of that predictor's coefficient is inflated
relative to fitting it alone. A VIF of infinity indicates a perfect linear relationship
between a predictor and the other independent variables: the auxiliary regression then
has an R² of 1, so 1/(1 - R²) is infinite. In other words, an infinite VIF value
indicates that the corresponding variable can be expressed exactly as a linear
combination of the other variables. To address this, one of the variables causing the
perfect multicollinearity needs to be removed from the dataset.
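
A minimal sketch of how an infinite VIF shows up in practice, using made-up data in
which one column is an exact linear combination of two others:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
df["x3"] = 2 * df["x1"] + 3 * df["x2"]  # perfect linear combination

X = sm.add_constant(df)
for i, col in enumerate(X.columns[1:], start=1):  # skip the constant
    print(col, variance_inflation_factor(X.values, i))
# Each auxiliary regression has R-squared = 1, so 1/(1 - R²) prints as inf
# (or an astronomically large number, depending on floating-point rounding).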

6. What is a Q-Q plot? Explain the use and importance of a Q-Q plot in linear
regression.
Answer:
A Q-Q plot, or Quantile-Quantile plot, is a graphical tool used to compare
the distribution of a dataset to a theoretical distribution, often the normal
distribution. The plot displays the quantiles of the sample data against the
quantiles of the theoretical distribution. If the data follows the theoretical
distribution, the points on the Q-Q plot will approximately lie on a straight line.

Use and Importance of a Q-Q Plot in Linear Regression:

Assessing Normality of Residuals:
▪ In linear regression, one key assumption is that the residuals (the
differences between observed and predicted values) are normally
distributed. A Q-Q plot can help check this assumption. If the
residuals are normally distributed, the points on the Q-Q plot will
fall along a straight line.
Detecting Deviations from Normality:
▪ Deviations from the straight line in a Q-Q plot can indicate
departures from normality. For example, systematic deviations,
such as an S-shaped curve, may indicate skewness, while
deviations at the ends of the plot (tails) can indicate the presence
of outliers or heavy tails.
Identifying Potential Problems:
▪ By visualizing the distribution of residuals, a Q-Q plot helps
identify potential problems with the regression model, such as
non-linearity, heteroscedasticity (non-constant variance), or the
presence of outliers. These issues can affect the validity of the
model's predictions and the reliability of statistical tests.
Model Diagnostics and Improvement:
▪ Analysing the Q-Q plot can guide model diagnostics and improvements.
If the residuals are not normally distributed, it may suggest the need
for transforming variables, adding polynomial terms, or using a
different modeling approach.

In summary, a Q-Q plot is a valuable diagnostic tool in linear regression that
helps assess the normality of residuals, detect deviations from normality, identify
potential problems, and guide improvements to the regression model.
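
A minimal sketch of drawing a Q-Q plot of residuals with statsmodels (the residuals
here are made up; in practice one would pass model.resid from a fitted model):

import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
residuals = rng.normal(size=200)  # made-up residuals

# Points hugging the 45-degree reference line indicate approximately
# normally distributed residuals.
sm.qqplot(residuals, line="45")
plt.show()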
