Notes On Linear Regression - 2

This document discusses the key steps and assumptions in regression model building and diagnostics. It describes collecting and preprocessing data, dividing it into training and validation sets, defining the relationship model, estimating parameters, and performing diagnostics. Diagnostics include hypothesis tests of coefficients, checking for homoscedasticity, identifying outliers, and using measures such as R-squared, the F-statistic, leverage values, and the t-distribution. The goal is to select the best-fitting regression model and identify any issues.

Steps in Regression Model Building

 Collect/Extract Data
 Pre-process the Data
 Divide the Data into Training and Validation Data Sets
 Define the Functional Form of Relationship
 Estimate the Regression Parameters
 Perform Regression Model Diagnostics
 Model Deployment
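
The sketch below illustrates steps 3-5 of the list above on a toy data set, using pandas, scikit-learn, and statsmodels; the column names x and y and the numbers are made up for illustration.

import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8],
                   "y": [2.1, 4.3, 6.2, 8.4, 10.1, 12.3, 13.9, 16.2]})

# Step 3: divide the data into training and validation sets.
train, valid = train_test_split(df, test_size=0.25, random_state=42)

# Step 4: define the functional form, here Y = b0 + b1*X.
X_train = sm.add_constant(train[["x"]])

# Step 5: estimate the regression parameters by ordinary least squares.
model = sm.OLS(train["y"], X_train).fit()
print(model.params)   # b0 (const) and b1 (x)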

The Assumptions in Regression Models

 The regression model is linear in the regression parameters.
 The expected value of the residuals is zero.
 The residuals follow a normal distribution. The assumption of normally distributed errors is not necessary for estimating the regression parameters; however, it is essential for testing hypotheses, such as whether there is a statistically significant relationship between the outcome variable and the features.
 The variance of the residuals is constant for all values of X. Constant residual variance across different values of X is called homoscedasticity; non-constant variance of residuals is called heteroscedasticity.
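
A minimal sketch of checking these assumptions, continuing from the model fitted in the earlier sketch: the residuals should average to zero, look normal, and show no pattern against the fitted values.

import scipy.stats as stats
import matplotlib.pyplot as plt

resid = model.resid
print("Mean of residuals:", resid.mean())   # ~0 by construction for OLS with an intercept

# Normality of residuals (needed for hypothesis tests, not for estimation):
# the Shapiro-Wilk test; a small p-value suggests non-normal residuals.
print("Shapiro-Wilk:", stats.shapiro(resid))

# Homoscedasticity by eye: residuals vs fitted values should show no fanning out.
plt.scatter(model.fittedvalues, resid)
plt.axhline(0, color="grey")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()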

THE MODEL DIAGNOSTICS

Hypothesis Test for the Regression Coefficient

The regression coefficient (b1) captures the existence of a linear relationship between the response variable and the explanatory variable. If b1 = 0, we can conclude that there is no statistically significant linear relationship between the two variables.

The null and alternative hypotheses for the SLR model can be stated as follows:

H0 : There is no relationship between X and Y

HA: There is a relationship between X and Y

b1 = 0 would imply that there is no linear relationship between the response variable Y and the
explanatory variable X. Thus, the null and alternative hypotheses can be restated as follows:

H0 : b1 = 0

HA: b1 ≠ 0

If the p-value is less than 0.05 (or another appropriate significance level), we reject the null hypothesis and conclude that there is significant evidence of a linear relationship between X and Y. (Recall that the p-value gets smaller as the test statistic computed from the data moves further from zero, the value predicted by the null hypothesis.)
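
Continuing from the fitted model above, statsmodels reports this t-test for b1 = 0 directly; the sketch below also rebuilds the t-statistic as the estimate divided by its standard error.

print(model.summary())            # coefficient table includes t and P>|t|
print(model.pvalues["x"])         # p-value for H0: b1 = 0

t_stat = model.params["x"] / model.bse["x"]   # estimate / standard error
print(t_stat, model.tvalues["x"])             # the two agree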

What is Homoscedasticity?

Homoscedasticity refers to a condition in which the variance of the residual, or error term, in a regression model is constant: the error term does not vary much as the value of the predictor variable changes. Put another way, the spread of the data points is roughly the same across all data points.

This consistency makes the data easier to model and work with through regression; a lack of homoscedasticity, however, may suggest that the regression model needs additional predictor variables to explain the variation in the dependent variable.

What is Heteroscedasticity?

Heteroscedasticity occurs when the standard deviation of a predicted variable, monitored over different values of an independent variable or over prior time periods, is non-constant.

With heteroscedasticity, the tell-tale sign on visual inspection of the residual errors is that they tend to fan out: the errors increase as the X or Y variable increases in magnitude.
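
Beyond visual inspection, a common formal check is the Breusch-Pagan test, sketched below for the model fitted earlier; a small p-value is evidence of heteroscedasticity.

from statsmodels.stats.diagnostic import het_breuschpagan

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, model.model.exog)
print("Breusch-Pagan p-value:", lm_pvalue)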

What is the Coefficient of Determination (R-squared)?

The primary objective of regression is to explain the variation in Y using the knowledge of X. The coefficient of determination (R-squared, or R2) measures the proportion of variation in the response variable Y explained by the model (b0 + b1X).


Coefficient of determination (R2 ) has the following properties:

 The value of R2 lies between 0 and 1.
 A higher value of R2 implies a better fit, but one should be aware of spurious regression; no minimum threshold is imposed on R2.
 Mathematically, in simple linear regression the square of the correlation coefficient equals the coefficient of determination.

Calculation of R-Squared

R-Squared = SSR/SST

SSR: the sum of squares due to regression (explained sum of squares)

SST: the total sum of squares

Since SST = SSR + SSE, where SSE is the residual (error) sum of squares, R-squared can equivalently be written as 1 − SSE/SST.
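
A self-contained sketch of this decomposition with NumPy on made-up numbers; it also confirms that, in simple regression, R-squared equals the squared correlation between X and Y.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.3, 7.9, 10.2])

b1, b0 = np.polyfit(x, y, 1)            # least-squares fit of y = b0 + b1*x
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)   # regression (explained) sum of squares
sse = np.sum((y - y_hat) ** 2)          # residual (error) sum of squares

print(ssr / sst)                        # R-squared
print(1 - sse / sst)                    # same value, since SST = SSR + SSE for OLS
print(np.corrcoef(x, y)[0, 1] ** 2)     # squared correlation, equal in simple regression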


Outlier Analysis:

The following distance measures are useful in identifying the influential observations:

 Z-Score
 Cook’s Distance
 Leverage Values

Z-Score

Z-score is the standardized distance of an observation from its mean value. For the predicted value
of the dependent variable Y, the Z-score is given by

Z = (Ypred − Ymean) / Std(Y)
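
A self-contained sketch of flagging observations by Z-score on made-up numbers (the cutoff of 2 is illustrative; 2.5 or 3 are also common):

import numpy as np

y = np.array([2.1, 4.3, 6.2, 8.4, 10.1, 55.0])   # the last value looks suspect
z = (y - y.mean()) / y.std()
print(z)
print(np.where(np.abs(z) > 2)[0])                # index 5 is flagged here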

Cook’s Distance

Cook’s distance measures how much the predicted values of the dependent variable change, across all observations in the sample, when a particular observation is excluded from the sample used to estimate the regression parameters.
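
Continuing from the model fitted earlier, statsmodels computes this leave-one-out measure for every observation at once:

from statsmodels.stats.outliers_influence import OLSInfluence

influence = OLSInfluence(model)
cooks_d, _ = influence.cooks_distance   # one distance per observation
print(cooks_d)                          # unusually large values mark influential points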

Leverage Value

Leverage value of an observation measures the influence of that observation on the overall fit of the
regression function.

A leverage value of more than 2k/n or 3k/n, where k is the number of estimated parameters and n the number of observations, marks the observation as highly influential.
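
Continuing from the same influence object, leverage values are the diagonal of the hat matrix; the sketch below applies the 2k/n rule of thumb, taking k as the number of estimated parameters.

import numpy as np

h = influence.hat_matrix_diag           # leverage of each observation
k = model.df_model + 1                  # parameters, including the intercept
n = int(model.nobs)
print(np.where(h > 2 * k / n)[0])       # observations exceeding the 2k/n cutoff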

F-Statistic

Using the Analysis of Variance (ANOVA), we can test whether the overall model is statistically
significant.
The null and alternative hypotheses for the F-test are given by

H0 : There is no statistically significant relationship between Y and any of the explanatory variables
(i.e., all regression coefficients are zero).

H1 : Not all regression coefficients are zero.

Equivalently, in symbols:

H0 : All regression coefficients are equal to zero.

HA: Not all regression coefficients are equal to zero.

The F-statistic is given by

F = [SSR/k] / [SSE/(n − k − 1)] = MSR/MSE

where k is the number of explanatory variables and n is the number of observations, so n − k − 1 is the residual degrees of freedom.
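
Continuing from the fitted model, statsmodels reports the overall F-test directly, and the ratio can be rebuilt from the sums of squares (note that statsmodels calls SSE "ssr", the residual sum of squares, and SSR "ess", the explained sum of squares):

print(model.fvalue, model.f_pvalue)     # F-statistic and its p-value

msr = model.ess / model.df_model        # MSR = SSR / k
mse = model.ssr / model.df_resid        # MSE = SSE / (n - k - 1)
print(msr / mse)                        # same F-statistic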

T-Distribution

The t-distribution, also known as Student’s t-distribution, is a way of describing data that follow a bell curve when plotted on a graph, with the greatest number of observations close to the mean and fewer observations in the tails.

It is used in place of the normal distribution for smaller sample sizes, where the variance in the data is unknown.

The t-distribution is used when data are approximately normally distributed, which means the data
follow a bell shape but the population variance is unknown. The variance in a t-distribution is
estimated based on the degrees of freedom of the data set (total number of observations minus 1).

It is a more conservative form of the standard normal distribution, also known as the z-distribution: it gives a lower probability to the center and a higher probability to the tails than the standard normal distribution.
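
A self-contained SciPy sketch of that heavier tail: the probability of exceeding 2 is larger under the t-distribution than under the standard normal, and shrinks toward the normal value as the degrees of freedom grow.

from scipy import stats

for df in (2, 10, 30, 1000):
    print(df, stats.t.sf(2.0, df))      # P(T > 2) under t with df degrees of freedom
print("norm", stats.norm.sf(2.0))       # P(Z > 2) under the standard normal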
