
ICS422 Applied Predictive Analytics [3-0-0-3]

Linear Regression: Residual Analysis

Class 15
Presented by
Dr. Selvi C
Assistant Professor
IIIT Kottayam
Simple Linear Regression
• Simple linear regression is really a comparison of two models:
• one in which the independent variable does not exist at all,
• and one that uses the best-fit regression line.
• With only one variable, the best prediction for new values is the mean of the dependent variable.
• The difference between the best-fit line and an observed value is called the residual (or error).
• The residuals are squared and added together to generate the sum of squared residuals/errors, SSE.
• Simple linear regression finds the best-fitting line through the data, the one that minimizes the SSE.
Simple Linear Regression

REGRESSION EQUATION WITH
ESTIMATES
• If we actually knew the population parameters, β0 and β1, we could use the simple linear regression equation:
E(y) = β0 + β1x
• In reality we almost never have the population parameters, so we estimate them from sample data. With sample data the equation changes slightly.
• ŷ, pronounced "y-hat", is the point estimator of E(y):
ŷ = b0 + b1x
• ŷ is the mean value of y for a given value of x.
Least squares criterion
• yi = observed value of the dependent variable (tip amount)
• ŷi = estimated (predicted) value of the dependent variable (predicted tip amount)
• The goal is to minimize the sum of the squared differences between the observed values yi and the predicted values ŷi provided by the regression line, i.e., the sum of the squared residuals:
min Σ(yi − ŷi)²



Parameters
b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)², b0 = ȳ − b1x̄

For the denominator, Σ(xi − x̄)²:
1. For each data point,
2. take the x-value and subtract the mean of x,
3. square Step 2,
4. add up all of the squares.

For the numerator, Σ(xi − x̄)(yi − ȳ):
1. For each data point,
2. take the x-value and subtract the mean of x,
3. take the y-value and subtract the mean of y,
4. multiply Step 2 and Step 3,
5. add up all of the products.
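The two step-lists above can be sketched in code. The data below is a reconstruction chosen to be consistent with the slide's summary numbers (Σ(xi − x̄)² = 4206, b1 ≈ 0.1462); the lecture's actual table may differ.

```python
# Sketch of the two step-lists: denominator = sum of squared x-deviations,
# numerator = sum of cross-products of deviations.
# Data reconstructed to match the slide's summary numbers; may differ
# from the lecture's actual table.
x = [34, 108, 64, 88, 99, 51]   # bill amounts ($)
y = [5, 17, 11, 8, 14, 5]       # tip amounts ($)

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

sxx = sum((xi - x_bar) ** 2 for xi in x)                        # left column, steps 1-4
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # right column, steps 1-5

b1 = sxy / sxx             # slope
b0 = y_bar - b1 * x_bar    # intercept
print(sxx, round(b1, 4), round(b0, 4))  # 4206.0 0.1462 -0.8203
```

Note that full precision gives b0 ≈ −0.8203; the slide's −0.8188 comes from rounding b1 to 0.1462 before computing the intercept.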



Example



• For every $1 increase in the bill amount (x), we would expect the tip amount to increase by $0.1462, or about 15 cents.
• If the bill amount (x) is zero, then the expected/predicted tip amount is −$0.8188, or negative 82 cents! Does this make sense? No. The intercept may or may not make sense in the "real world."
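The fitted equation ŷ = −0.8188 + 0.1462x can be used to predict a tip; the $50 bill below is an illustrative input, not from the slides.

```python
# Prediction with the slide's fitted equation y_hat = -0.8188 + 0.1462 x.
# The $50 bill is an illustrative input, not from the slides.
b0, b1 = -0.8188, 0.1462

def predict_tip(bill):
    """Predicted tip (dollars) for a given bill amount (dollars)."""
    return b0 + b1 * bill

print(round(predict_tip(50), 2))  # 6.49
```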



RESIDUAL ANALYSIS

• Residual (n): a quantity remaining after other things have been subtracted or allowed for.
• Difference between the observed value of the
dependent variable (tip amount) and what is predicted
by the regression model
• So if the model predicts a tip of $10 for a given meal,
but the observed tip is $12, then the residual amount is
12 - 10 = 2
• Residual = yi − ŷi = observed tip − predicted tip
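The $12-vs-$10 example above, as a one-liner:

```python
# Residual = observed - predicted, as in the example above.
observed_tip = 12.0   # what the diner actually tipped
predicted_tip = 10.0  # what the model predicted
residual = observed_tip - predicted_tip
print(residual)  # 2.0
```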



Goodness of fit
• Only part of the variance in the dependent variable will be explained by the values of the independent variable:
• R² = SSR / SST
• The variance left unexplained is due to model error: SSE / SST
• Think "How far off" or "How good" the model accounts
for the variance in the dependent variable
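A minimal sketch of the R² = SSR/SST computation. The data is reconstructed to match the slide's reported R² ≈ 0.7493 and may differ from the lecture's actual table.

```python
# R^2 = SSR / SST from first principles. Data reconstructed to match the
# slide's summary statistics; the lecture's actual table may differ.
x = [34, 108, 64, 88, 99, 51]   # bill amounts ($)
y = [5, 17, 11, 8, 14, 5]       # tip amounts ($)
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained (error)
ssr = sst - sse                                        # explained by the model

r_squared = ssr / sst
print(round(r_squared, 4))  # ~0.7494 (slide reports 0.7493)
```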



Model Assumption

• Residuals offer the best information about the error term, ε
• The expected value of the error term is zero; E (ε) = 0
• For all values of the independent variable x, the
variance of the error term ε is the same
• The values of the error term ε are independent of each
other
• The error term ε follows a normal distribution



Assumption
For the results of a linear regression model to be valid and reliable, we
need to check that the following four assumptions are met:
1. Linear relationship: There exists a linear relationship between the
independent variable, x, and the dependent variable, y.
2. Independence: The residuals are independent. In particular, there is
no correlation between consecutive residuals in time series data.
3. Homoscedasticity: The residuals have constant variance at every
level of x.
4. Normality: The residuals of the model are normally distributed.

If one or more of these assumptions are violated, then the results of our
linear regression may be unreliable or even misleading.
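Two of these assumptions can be checked numerically, as a rough sketch: the zero-mean error term, and independence via the Durbin-Watson statistic (which is not covered in the slides; it is the standard test for assumption 2). The residual values below are illustrative approximations; in practice, residual-vs-fitted and Q-Q plots are the usual tools.

```python
# Rough numeric checks for the zero-mean error term and assumption 2
# (independence). Residuals below are illustrative/approximate.
residuals = [0.85, 2.03, 2.46, -4.05, 0.34, -1.64]

# The mean residual should be ~0 for a least-squares fit with an intercept.
mean_resid = sum(residuals) / len(residuals)

# Durbin-Watson statistic: values near 2 suggest no first-order
# autocorrelation between consecutive residuals.
num = sum((residuals[i] - residuals[i - 1]) ** 2
          for i in range(1, len(residuals)))
dw = num / sum(e ** 2 for e in residuals)

print(round(mean_resid, 3), round(dw, 2))  # mean near 0, DW near 2
```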
Best case residual distribution
• Evenly distributed left to right, up to down, all over the graph

• Problem case: residuals are not evenly distributed

Points observed

• What happens if the residual analysis reveals heteroscedasticity?
• Rebuild the model with different independent variable(s)
• Perform transformations on non-linear data
• Fit a non-linear regression model... but don't OVERFIT
• Are there statistical tests for residuals?





R2 INTERPRETATION
• Coefficient of determination = r² = 0.7493, or 74.93%
• We can conclude that 74.93% of the total sum of
squares can be explained by using the estimated
regression equation to predict the tip amount.
• The remainder is error.



Comparison of R-squared to the
Standard Error of the Regression (S)
• The standard error of the regression provides the absolute
measure of the typical distance that the data points fall from
the regression line. S is in the units of the dependent variable.
• R-squared provides the relative measure of the percentage of
the dependent variable variance that the model explains. R-
squared can range from 0 to 100%.

Sum of Squared Error

• A measure of the variability of the observations about the regression line:
SSE = Σ(yi − ŷi)²



Mean Squared Error
• MSE is an estimate of the variance of the error, ε.
• In other words, how spread out the data points are from the regression line. MSE is SSE divided by its degrees of freedom, n − 2, because we are estimating two parameters: the slope and the intercept.
MSE = s² = SSE / (n − 2)
• Why divide by n − 2 and not just n? Remember, we are using sample data; it is also why we write s² rather than σ².
• This is why MSE is not simply the average of the squared residuals.
• If we were using population data, we would divide by N, and it would simply be the average of the squared residuals.
Standard error of the Estimate
• The standard error of the estimate, s (or just "standard error"), is the estimated standard deviation of the error term, ε. Now we are un-squared!
• It is the average distance an observation falls from the
regression line in units of the dependent variable.
• Since the MSE is s², the standard error is just the square root of MSE:
• s = √MSE = √(SSE / (n − 2))
• s = √7.5187 = 2.742
• So the average distance of the data points from the fitted line
is about $2.74.
• You can think of s as a measure of how well the regression line fits the data.
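The SSE → MSE → s chain, using the slide's numbers. SSE is back-computed from the reported MSE of 7.5187, and n = 6 is an assumption inferred from the critical t of 2.776 (which has 4 degrees of freedom).

```python
import math

# SSE -> MSE -> s using the slide's numbers. SSE is back-computed from
# the reported MSE (7.5187) with df = n - 2 = 4; n = 6 is inferred.
sse = 30.0748            # sum of squared errors (= 4 * 7.5187)
n = 6
mse = sse / (n - 2)      # two parameters estimated: slope and intercept
s = math.sqrt(mse)       # standard error of the estimate
print(round(mse, 4), round(s, 3))  # 7.5187 2.742
```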
Statistically significant
• How much variance in the dependent variable is
explained by the model / independent variable?
• For this we look at the value of R² or adjusted R²
• Does a statistically significant linear relationship exist
between the independent and dependent variables?
• Is the overall F-test or t-test (in simple regression these are
actually the same thing) significant?
• Can we reject the null hypothesis that the slope β1 of the regression line is ZERO?
• Does the confidence interval for the slope b1 contain zero?



Estimators Everywhere
Linear regression contains many estimators
• the slope of the regression line
• the intercept of the regression line on the y-axis
• Centroid: the point that is the intersection of the mean
of each variable (x, y)
• The mean value of ŷ* for any value of x* (confidence
interval)
• The individual value of ŷ* for any value of x* (prediction
interval)
• And many others about variance, etc.


Degree of freedom
• What are degrees of freedom in statistics? Degrees of
freedom are the number of independent values that a
statistical analysis can estimate.
• Calculating the degrees of freedom is often the sample size minus the number of parameters you're estimating (here, n − 2 for the slope and intercept)

Confidence Interval
• 95% confidence that the actual mean for the population
falls within this interval
t-value calculation
• b1 ± tα/2 · sb1
where sb1 is the standard deviation of the slope, tα/2 · sb1 is the margin of error, and b1 is the point estimator for the slope

Standard Deviation of the slope
• sb1 = s / √(Σ(xi − x̄)²)
• sb1 = 2.742 / √4206
• sb1 = 0.04228



Confidence Interval for slope
• b1 ± tα/2 · sb1
• 0.1462 ± 2.776 × 0.04228
• 0.1462 ± 0.1174
• (0.02885, 0.2636)
We are 95% confident that the interval (0.02885, 0.2636) contains the true slope of the regression line
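The interval computation, using the slide's numbers. Small differences from the slide's lower bound of 0.02885 are expected because 0.1462 and 0.04228 are already rounded.

```python
import math

# Confidence interval for the slope, from the slide's numbers.
b1 = 0.1462          # point estimate of the slope
s = 2.742            # standard error of the estimate
sxx = 4206           # sum of squared x-deviations
t_crit = 2.776       # t(0.025, df = 4) from a t-table

s_b1 = s / math.sqrt(sxx)   # standard deviation of the slope
margin = t_crit * s_b1      # margin of error
ci = (round(b1 - margin, 4), round(b1 + margin, 4))
print(ci)  # approximately (0.0288, 0.2636)
```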



Does the interval contain zero?
• (0.02885,0.2636)

• Hypotheses: H0: β1 = 0 versus Ha: β1 ≠ 0

• Can we reject the null hypothesis that the slope is zero?

• The null hypothesis is that the slope of the regression line is zero, and therefore that no significant relationship exists between the two variables.



Test statistics
• t = b1 / sb1 = 0.1462 / 0.04228 ≈ 3.4584
• Compare t to tα/2 = 2.776 (df = n − 2 = 4)
• 3.4584 > 2.776 is significant, so reject the null hypothesis
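The test statistic from the slide's rounded inputs (which give ≈ 3.458 rather than the slide's 3.4584, computed from unrounded values):

```python
# t test for H0: beta1 = 0, using the slide's rounded inputs.
b1 = 0.1462        # estimated slope
s_b1 = 0.04228     # standard deviation of the slope
t_crit = 2.776     # critical t at alpha = 0.05, df = 4

t_stat = b1 / s_b1
reject_h0 = abs(t_stat) > t_crit
print(round(t_stat, 3), reject_h0)  # ~3.458 True -> reject H0
```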



Summary
• Does the confidence interval for the slope contain the value zero?
• Is the test statistic t greater than the critical value of t at the chosen significance level and correct degrees of freedom?



Any
Queries?
Thank you
