Linear Regression Assignment Questions and Answer

SUBJECTIVE QUESTIONS

Uploaded by

lakshna673

Assignment-based Subjective Questions

1. From your analysis of the categorical variables from the dataset, what could you
infer about their effect on the dependent variable?

Answer: I have analyzed the categorical columns using boxplots, and here are the
key insights from the visualization:

❖ The fall season has the highest number of bookings, and in each season, the booking
count increased significantly from 2018 to 2019.
❖ Most bookings occurred during May, June, July, August, September, and October.
The trend increased from the beginning of the year until mid-year and then started
decreasing towards the end of the year.
❖ Clear weather attracted more bookings, which is expected.
❖ Thursdays, Fridays, Saturdays, and Sundays have more bookings compared to the
beginning of the week.
❖ On holidays, bookings are fewer, likely because people prefer to spend time at home
with their families.
❖ Bookings are almost equal on working days and non-working days.
❖ The number of bookings in 2019 was higher than in 2018, indicating good progress in
terms of business.
2. Why is it important to use drop_first=True during dummy variable creation?
Answer:
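Using drop_first=True is important because it drops one of the dummy columns
created for each categorical variable. The dropped column is redundant (it can be
derived from the others), so removing it avoids perfect multicollinearity among the
dummy variables, often called the dummy variable trap.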

The syntax for this is ‘drop_first: bool, default False,’ which specifies whether to
create ‘k-1' dummies out of ‘k’ categorical levels by removing the first level.

For instance, if a categorical column has three values (A, B, and C), two dummy
variables are enough. With drop_first=True the first level, A, is dropped: if a row is
neither B nor C, it is implicitly A, so a third dummy variable is not needed to
identify A.
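
The effect of drop_first can be seen on a toy column. This is a minimal sketch using pandas.get_dummies; the column name "season" and its values are hypothetical and are not taken from the assignment dataset.

```python
import pandas as pd

# Hypothetical toy data: three categorical levels, chosen only for illustration.
df = pd.DataFrame({"season": ["fall", "spring", "summer", "fall"]})

# k dummies for k levels: the columns always sum to 1 in every row,
# so any one of them is a linear combination of the others.
full = pd.get_dummies(df["season"], prefix="season")

# k-1 dummies: the first level (alphabetically "fall") is dropped and
# becomes the implicit baseline.
reduced = pd.get_dummies(df["season"], prefix="season", drop_first=True)

print(list(full.columns))
print(list(reduced.columns))
```

A row where every remaining dummy is 0 belongs to the dropped baseline level.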

3. Looking at the pair-plot among the numerical variables, which one has the highest
correlation with the target variable?
Answer:
The ‘temp’ variable has the highest correlation with the target variable.

4. How did you validate the assumptions of Linear Regression after building the model
on the training set?
Answer:
I have validated the Linear Regression Model based on the following five
assumptions:
▪ Normality of Error Terms: The error terms should be normally distributed.
▪ Multicollinearity Check: There should be insignificant multicollinearity
among variables.
▪ Linear Relationship Validation: Linearity should be evident among
variables.
▪ Homoscedasticity: The residuals should have constant variance, with no
visible pattern in the residual values.
▪ Independence of Residuals: There should be no autocorrelation.
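
As one illustration of these checks, independence of residuals is often summarized with the Durbin-Watson statistic. The sketch below is an assumption about how such a check could look, not the method used in the assignment; the residual lists are made up for the example.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of residuals.

    Values near 2 suggest little first-order autocorrelation; values
    toward 0 suggest positive and toward 4 negative autocorrelation.
    """
    # Sum of squared differences between consecutive residuals.
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    # Sum of squared residuals.
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals: one series flips sign every step, one drifts steadily.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
trending = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]

dw_alternating = durbin_watson(alternating)  # well above 2
dw_trending = durbin_watson(trending)        # well below 2
print(dw_alternating, dw_trending)
```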

5. Based on the final model, which are the top 3 features contributing significantly
towards explaining the demand of the shared bikes?
Answer:
The top three features significantly contributing to explaining the demand for
shared bikes are:
▪ Temperature (temp)
▪ Winter season (winter)
▪ September (september)
General Subjective Questions
1. Explain the linear regression algorithm in detail.
Answer:
Linear regression is a statistical model that analyses the linear relationship
between a dependent variable and a given set of independent variables. A linear
relationship implies that changes in the independent variables result in
proportional changes in the dependent variable.

Key Components

o Dependent Variable (Y): The variable being predicted.

o Independent Variable (X): The variable used to make predictions.
o Slope (m): Represents the effect of X on Y.
o Intercept (c): The constant value of Y when X is zero.

The relationship is represented mathematically by the equation:

Y = mX + c
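
To make the roles of m and c concrete, here is a minimal sketch of fitting a simple linear regression by ordinary least squares in plain Python. The function name fit_line and the sample points are hypothetical, chosen so that the points lie exactly on Y = 2X + 1.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept c for Y = mX + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


# Hypothetical data lying exactly on Y = 2X + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c)
```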

Types of Linear Relationships

o Positive Linear Relationship: Both the independent and dependent
variables increase together.
o Negative Linear Relationship: The independent variable increases while
the dependent variable decreases.
Types of Linear Regression
o Simple Linear Regression: Involves one independent variable.
o Multiple Linear Regression: Involves multiple independent variables.
Assumptions of Linear Regression
a. Multicollinearity: Assumes little or no multicollinearity, meaning
independent variables should not be highly correlated with each other.
b. Autocorrelation: Assumes little or no autocorrelation, meaning residual
errors should not be dependent on each other.
c. Linear Relationship: Assumes a linear relationship between response
and feature variables.
d. Normality of Error Terms: Error terms should be normally distributed.
e. Homoscedasticity: The residuals should have constant variance, with no
visible pattern in the residual values.

2. Explain the Anscombe’s quartet in detail.

Answer:
Anscombe's quartet is a collection of four datasets that have nearly identical simple
descriptive statistics but appear very different when graphed. These datasets were
created by the statistician Francis Anscombe in 1973 to illustrate the importance of
graphing data before analyzing it and the impact of outliers and the distribution of
data on statistical measures.
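
The point can be checked numerically. The sketch below uses the four published Anscombe datasets and computes their means, the sample variance of x, and Pearson's r in plain Python; the helper name summarize is hypothetical.

```python
from math import sqrt

# Anscombe's four datasets (datasets I to III share the same x values).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Means, sample variance of x, and Pearson's r for one dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "r": sxy / sqrt(sxx * syy),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, s in summaries.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

The summaries agree closely across the four datasets even though their scatter plots look nothing alike, which is why plotting comes first.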

3. What is Pearson’s R?
Answer:
Pearson's R, also known as Pearson's correlation coefficient, is a statistical
measure that quantifies the strength and direction of a linear relationship
between two variables. It is denoted by r and ranges from -1 to 1.

Key Features of Pearson’s R

Range:
o r = 1: Perfect positive linear correlation.
o r = −1: Perfect negative linear correlation.
o r = 0: No linear correlation.
Interpretation:
o Positive Values: Indicates a positive relationship where, as one variable
increases, the other variable also increases.
o Negative Values: Indicates a negative relationship where, as one variable
increases, the other variable decreases.
o Magnitude: The closer the value of r is to 1 or -1, the stronger the linear
relationship between the two variables.
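
A minimal sketch of the computation in plain Python follows. The function name pearson_r and the sample lists are hypothetical; the third example shows that r = 0 rules out only a linear relationship, since y there is an exact function of x.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])          # y rises with x
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])          # y falls as x rises
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])    # y = x squared
print(r_positive, r_negative, r_curved)
```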

4. What is scaling? Why is scaling performed? What is the difference between
normalized scaling and standardized scaling?
Answer:
Scaling is a pre-processing step that brings the independent features in the data
onto a comparable range. Raw features often differ widely in magnitude, range, and
units, so rescaling variables is crucial for achieving a comparable scale. If scales
are not similar, certain coefficients may be much larger or smaller than others when
fitting a regression model, which makes them hard to compare.
▪ Normalized Scaling maps data to a scale of 0 to 1. Because the range is
set by the minimum and maximum values, outliers can compress the
remaining values into a narrow band. This method is typically
employed when features vary in magnitude and the data distribution is
unknown. Min-Max scaling is a typical approach for such cases, using the
formula below:

Xscaled = (X − Xmin) / (Xmax − Xmin)

where:

• X is the original value.

• Xmin is the minimum value of the feature.
• Xmax is the maximum value of the feature.
• Xscaled is the scaled value in the range [0, 1].

▪ Standardized scaling rescales data to have a mean of 0 and a standard
deviation of 1 using the formula below. It does not change the shape of the
distribution. This method is typically utilized when the feature
distribution is approximately normal/Gaussian, there are no extreme
outliers, and the rescaled data does not need specified boundaries.

Xstandardized =(X−μ)/ σ

where:

• X is the original value.

• μ is the mean of the feature.
• σ is the standard deviation of the feature.
• Xstandardized is the standardized value.
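
Both formulas can be sketched in a few lines of plain Python. The function names and sample values are hypothetical, and the use of the population standard deviation (dividing by n) is an assumption made for this example.

```python
from math import sqrt


def min_max_scale(values):
    """Normalized scaling: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardized scaling: subtract the mean, divide by the standard deviation."""
    n = len(values)
    mu = sum(values) / n
    # Population standard deviation (divides by n); an assumption of this sketch.
    sigma = sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sigma for v in values]


# Hypothetical feature values.
feature = [10.0, 20.0, 30.0, 40.0, 50.0]
normalized = min_max_scale(feature)
standardized = standardize(feature)
print(normalized)
print(standardized)
```

The normalized values are bounded by 0 and 1, while the standardized values are centered on 0 and have no fixed bounds.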

5. You might have observed that sometimes the value of VIF is infinite. Why does this
happen?
Answer:

The Variance Inflation Factor (VIF) measures the collinearity between
predictor variables in multiple regression. For a given predictor, it is the ratio
of the variance of its coefficient in the full model to the variance of that
coefficient when the predictor is fitted alone. Equivalently, VIF = 1/(1-R²),
where R² is obtained by regressing that predictor on all the other predictors.
A VIF of infinity indicates a perfect linear relationship among independent
variables, leading to an R² value of 1, which results in 1/(1-R²) being infinite. To
address this, one of the variables causing perfect multicollinearity needs to be
removed from the dataset. In summary, an infinite VIF value indicates that the
corresponding variable can be exactly expressed as a linear combination of other
variables.
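
The infinite case can be reproduced with a small sketch. It assumes only two predictors, in which case the R² from regressing one on the other equals their squared correlation; the function name vif_two_predictors and the data are hypothetical.

```python
import math


def vif_two_predictors(x1, x2):
    """VIF of x1 when x2 is the only other predictor.

    With a single other predictor, R squared from regressing x1 on x2
    equals the squared Pearson correlation between them.
    """
    n = len(x1)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    s12 = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    s11 = sum((a - mean1) ** 2 for a in x1)
    s22 = sum((b - mean2) ** 2 for b in x2)
    r_squared = (s12 * s12) / (s11 * s22)
    if r_squared >= 1.0:
        # Perfect collinearity: 1 / (1 - R squared) has a zero denominator.
        return math.inf
    return 1.0 / (1.0 - r_squared)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_duplicate = [2.0, 4.0, 6.0, 8.0, 10.0]   # exactly 2 * x1
x2_unrelated = [1.0, -2.0, 0.0, 2.0, -1.0]  # uncorrelated with x1

vif_duplicate = vif_two_predictors(x1, x2_duplicate)
vif_unrelated = vif_two_predictors(x1, x2_unrelated)
print(vif_duplicate, vif_unrelated)
```

With more than two predictors, R² would come from a multiple regression of each predictor on all the others; the two-variable shortcut here is only for illustration.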

6. What is a Q-Q plot? Explain the use and importance of a Q-Q plot in linear
regression.
Answer:
A Q-Q plot, or Quantile-Quantile plot, is a graphical tool used to compare
the distribution of a dataset to a theoretical distribution, often the normal
distribution. The plot displays the quantiles of the sample data against the
quantiles of the theoretical distribution. If the data follows the theoretical
distribution, the points on the Q-Q plot will approximately lie on a straight line.

Use and Importance of a Q-Q Plot in Linear Regression:

Assessing Normality of Residuals:

▪ In linear regression, one key assumption is that the residuals (the
differences between observed and predicted values) are normally
distributed. A Q-Q plot can help check this assumption. If the
residuals are normally distributed, the points on the Q-Q plot will
fall along a straight line.
Detecting Deviations from Normality:
▪ Deviations from the straight line in a Q-Q plot can indicate
departures from normality. For example, a consistently curved
(bowed) pattern may indicate skewness, while an S-shaped pattern
or deviations at the ends of the plot (tails) can indicate heavy or
light tails or the presence of outliers.
Identifying Potential Problems:
▪ By visualizing the distribution of residuals, a Q-Q plot helps
identify potential problems with the regression model, such as
outliers or heavy-tailed errors. It can also hint at non-linearity or
heteroscedasticity (non-constant variance), although these are
usually confirmed with a plot of residuals against fitted values.
These issues can affect the validity of the model's predictions and
the reliability of statistical tests.
Model Diagnostics and Improvement:
▪ Analysing the Q-Q plot can guide model diagnostics and
improvements. If the residuals are not normally distributed, it may
suggest the need for transforming variables, adding polynomial
terms, or using a different modeling approach.

In summary, a Q-Q plot is a valuable diagnostic tool in linear regression that
helps assess the normality of residuals, detect deviations from normality, identify
potential problems, and guide improvements to the regression model.
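
The points of a normal Q-Q plot can be computed without a plotting library. This is a minimal sketch using the standard library's NormalDist; the plotting positions (i - 0.5)/n are one common convention assumed here, and the residual values are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(residuals):
    """Return (theoretical quantile, sample quantile) pairs for a normal Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        # Plotting position for the i-th smallest residual (assumed convention).
        p = (i - 0.5) / n
        points.append((standard_normal.inv_cdf(p), value))
    return points


# Hypothetical residuals from a fitted model.
residuals = [0.3, -1.2, 0.8, -0.4, 0.1, 1.5, -0.7]
points = normal_qq_points(residuals)
for theoretical, sample in points:
    print(round(theoretical, 3), sample)
```

Plotting the first element of each pair on the horizontal axis against the second on the vertical axis gives the Q-Q plot; points close to a straight line are consistent with normally distributed residuals.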
