
Simple Linear Regression

The document outlines the concepts and skills related to simple linear regression, including its statistical model, assumptions, and diagnostics. It emphasizes the importance of understanding variability, model fit, and the influence of data points on regression analysis. Additionally, it provides a structured learning plan for students, detailing lectures, workshops, and assessments over a two-week period.


Simple linear regression as a statistical model

Pete Brennan
[email protected]
School of Physiology, Pharmacology & Neuroscience
Concepts and Skills (PHPH30005/30007/M0011)

outcome = model + error

"In general, a model is a representation of a person or a system which provides some information about it." (Wikipedia)
C & S statistics week 1

Week 1    | Lecture/Workshop                                               | Online consolidation           | SPSS tutorials
Monday    | L1 Statistical models                                          |                                | Intro to statistics
Tuesday   | L2 Multiple linear regression; L3 ANOVA and planned contrasts  | Simple linear regression (A)   | SPSS basics workshop
Wednesday |                                                                | Regression diagnostics (A)     |
Thursday  | C1 – questions and answers                                     | Multiple linear regression (A) |
Friday    | SPSS workshop 1 – multiple linear regression                   | Planned contrasts (A)          |
C & S statistics week 2

Week 2    | Lecture/Workshop                            | Online consolidation           | SPSS tutorials
Monday    | L4 Factorial ANOVA; L5 Nonlinear regression |                                |
Tuesday   | L6 Categorical data analysis                | Factorial ANOVA (A)            | Robust analysis
Wednesday |                                             | Repeated measures ANOVA (A)    |
Thursday  | C2 – questions and answers                  | Nonlinear regression (A)       |
Friday    | SPSS workshop 2 – factorial ANOVA           | Categorical data analysis (A)  |
C & S statistics assessment

Week 4, Friday: C3 – questions and answers (lecture/exam consolidation)
Week 5, Thursday: Formative MCQ statistics exam
January exam week 1, date to be confirmed: Summative MCQ statistics exam

(The Advanced Concepts and Skills exam for Study in Industry students is different from the BSc MCQ exam:
- Advanced statistics formative exam submitted in week 5
- Advanced statistics timed assessment in January exam week 1)
Learning objectives
- Explain how variability can be accounted for by the addition of random error to a statistical model.
- Explain how simple linear regression fits a straight-line model to data by minimising the sum of squares of the residuals (SSR).
- Explain how the total variability in data can be split into variability explained by the model and residual variability.
- Calculate R² from SST and SSM and interpret it.
- Describe the assumptions underlying simple linear regression.
- Explain the difference between the distance and the leverage of a data point.
- Explain how diagnostic statistics can be used to assess the influence of an individual data point on a model.
- Explain what approaches you can take if you discover influential data points in your model.
Regression

Regression is not the same as correlation!
- Correlation tests whether there is an association between two variables.
- Regression tests how well the data fit a theoretical model.

Regression is used to:
- Quantify the effect of a predictor variable on a dependent variable
- Predict values of a dependent variable based on the values of one or more predictor variables
- Adjust for effects of confounding variables
What is a statistical model?

A mathematical relationship that represents the most important features and relationships in data.

If the world were perfectly predictable:

outcome = model

But the world is inherently variable, so usually:

outcome = model + error

The error (not the outcome variable!) is assumed to be normally distributed.
The mean as a statistical model

[Figure: data points plotted about the sample mean ȳ, with residuals shown as vertical distances from each point to the mean line.]

The mean is a model of the underlying population mean. Residuals are the differences between each data point and the sample mean. The sum of squares (SS) is used as a measure of the variability in the data. The mean is the statistical model that minimises the sum of squares of the residuals (SSR).

y_i = (Σy_i)/n + error_i
       (model)   (residual variability)
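The claim that the mean is the single-value model minimising the sum of squared residuals can be checked numerically; a minimal sketch with made-up data (not from the lecture):

```python
import numpy as np

# Hypothetical sample (not from the lecture)
data = np.array([4.2, 5.1, 3.8, 6.0, 4.9])

def sum_of_squares(model_value):
    """Sum of squared residuals when a single value models every data point."""
    return np.sum((data - model_value) ** 2)

mean = data.mean()
# The SS at the mean is lower than at any other candidate model value
for candidate in [mean - 1.0, mean - 0.1, mean + 0.1, mean + 1.0]:
    assert sum_of_squares(mean) < sum_of_squares(candidate)
print(f"mean = {mean:.2f}, SS at mean = {sum_of_squares(mean):.3f}")
```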
Simple linear regression

[Figure: scatter of dependent variable y against independent variable x, with the fitted line of intercept b0 and slope b1; residuals shown as vertical distances from each point to the line.]

Simple linear regression models the linear relationship between a continuous dependent variable y and a single independent (predictor) variable x. The slope coefficient (b1) gives the amount by which y changes for each unit change in x. The best model minimises SSR and so explains most of the variability in the data.

y_i = b0 + b1·x_i + error_i
       (model)      (residual variability)
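The intercept and slope that minimise SSR have closed-form least-squares solutions; a minimal numpy sketch, with hypothetical dose–response data (not from the lecture):

```python
import numpy as np

# Hypothetical dose–response data (not from the lecture)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # independent (predictor) variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # dependent variable

# Closed-form least-squares estimates minimising SSR:
#   b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²,   b0 = ȳ − b1·x̄
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

fitted = b0 + b1 * x
residuals = y - fitted
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # → b0 = 0.140, b1 = 1.960
```

With an intercept in the model, the residuals always sum to zero, which is a quick sanity check on the fit.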
Sources of variability

outcome = model + error

SST = SSM + SSR

- SST: total variability of the data
- SSM: variability explained by the model
- SSR: residual, unexplained variability (error)
The coefficient of determination R²

[Figure: scatter of dependent variable y against independent variable x, showing the fitted model line and the horizontal null-hypothesis line. The SS of differences of the data from the null hypothesis (blue dotted lines) gives SST; the SS of differences of the model from the null hypothesis (green dotted lines) gives SSM.]

For the null hypothesis H0, the model slope b1 = 0. R² is the proportion of the total variability in y that can be explained by its relationship with x:

R² = SSM/SST = (variability explained by model)/(total variability of data)
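The partition SST = SSM + SSR, and R² computed from it, can be verified directly once a line has been fitted; a minimal numpy sketch with hypothetical data (not from the lecture):

```python
import numpy as np

# Hypothetical data and least-squares fit (not from the lecture)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)       # total variability (data vs null model ȳ)
SSM = np.sum((fitted - y.mean()) ** 2)  # variability explained by the model
SSR = np.sum((y - fitted) ** 2)         # residual (unexplained) variability
r_squared = SSM / SST

assert np.isclose(SST, SSM + SSR)  # the partition of variability holds
print(f"R² = {r_squared:.3f}")
```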
Testing model fit and slope

[Figure: scatter of dependent variable y against independent variable x, showing the fitted model line and the horizontal null-hypothesis line.]

Testing model fit
The significance of R² (the fit of the model) is tested by calculating an F ratio and an associated p value:

F = MS_model / MS_residual (error)

The higher the F ratio, the lower the p value and the more significant the fit of the model to the data.

Testing model slope (the effect of the predictor)
Is the model slope significantly different from the null-hypothesis slope of 0?

t = (b_model − b_null)/SE(b_model) = b_model/SE(b_model)

The p value is obtained from the t distribution with N − 2 degrees of freedom.
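Both test statistics can be computed directly from the sums of squares, and with a single predictor t² equals F; a minimal numpy sketch with hypothetical data (not from the lecture):

```python
import numpy as np

# Hypothetical data and least-squares fit (not from the lecture)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x

SSM = np.sum((fitted - y.mean()) ** 2)
SSR = np.sum((y - fitted) ** 2)

# F ratio: MS_model / MS_residual, with 1 and n − 2 degrees of freedom
MS_model = SSM / 1
MS_residual = SSR / (n - 2)
F = MS_model / MS_residual

# t statistic for the slope: b1 / SE(b1), with n − 2 degrees of freedom
se_b1 = np.sqrt(MS_residual / Sxx)
t = b1 / se_b1

assert np.isclose(t ** 2, F)  # with one predictor, t² = F
print(f"F = {F:.1f}, t = {t:.2f}")
```

The p values themselves would come from the F and t distributions with the stated degrees of freedom (e.g. via scipy.stats), which this sketch leaves out.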
Assumptions of simple linear regression
1. x and y are asymmetrical: x (independent) predicts y (dependent).
2. The independent variable (x) is measured or controlled without significant error.
3. The independent variable (x) is not entangled with the dependent variable (y).
4. There is a linear relationship between x and y.
5. The data points are independent.

[Figure: two drug-dose scatter plots. Independent data points are okay (√); related data points are not (X) – pseudoreplication.]
Assumptions of simple linear regression (continued)
- Normality of residuals: the scatter of y values about the fitted line is random and approximately normally distributed.
- Homoscedasticity: the variance of the residuals is reasonably similar across all x values.

[Figure: drug-dose scatter with fitted line; the sd of the residual distribution is equal along the line = homoscedasticity.]
Simple linear regression – how robust is the model?

Outliers can be identified from the x,y plot or from a standardised residual plot.

[Figure: scatter plot with an outlying (red) point and a typical (blue) point; red residual > blue residual, and red residual squared >> blue residual squared.]
Distance and leverage

[Figure: three scatter plots. A: F(1, 17) = 1.23, p = 0.283, R² = 0.068, b1 = 0.120. B: F(1, 17) = 5.54, p = 0.031, R² = 0.246, b1 = 0.275. C: F(1, 16) = 8.36, p = 0.011, R² = 0.343, b1 = 0.271.]

Both the size of the residual (distance) and its leverage (the distance from the mean of the predictor) affect the influence of a data point on the model fit and the regression coefficients. The robustness of the model can be checked by reanalysing the data with the data point removed (a robustness test).
Model diagnostics

Is the regression model stable, or is it biased by a few cases?
- A standardised residual > ±3 is certainly worth checking as an outlier.
- If a lot more than 5% of standardised residuals are > ±2, it may indicate that the model is a poor fit.
- A standardised DFFit > ±1 means a substantial influence on the fit of the model (F ratio and R²).
- A standardised DFBeta > ±1 means a substantial influence on the size of the respective slope coefficient for the model (b_slope).
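For simple linear regression, leverage, standardised residuals, and Cook's distance can be computed by hand from the standard textbook formulas; a minimal numpy sketch with hypothetical data (not from the lecture), including one point far from the x mean:

```python
import numpy as np

# Hypothetical data with one high-leverage point at x = 10 (not from the lecture)
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 19.5])
n, p = len(x), 2  # p = number of model parameters (intercept + slope)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)
s2 = np.sum(residuals ** 2) / (n - p)  # residual mean square

# Leverage: how far x_i lies from the mean of the predictor
h = 1 / n + (x - x.mean()) ** 2 / Sxx

# Internally standardised residuals (distance)
std_resid = residuals / np.sqrt(s2 * (1 - h))

# Cook's distance combines distance and leverage into one influence measure
cooks_d = std_resid ** 2 / p * h / (1 - h)

assert np.isclose(h.sum(), p)  # leverages always sum to the number of parameters
print(np.round(h, 3))
```

In practice a package would report these directly (e.g. SPSS's casewise diagnostics, or statsmodels' OLSInfluence in Python); the formulas here are only meant to show what those numbers measure.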
Effect on diagnostic statistics

[Figure: scatter plots A, B, and C from the previous slide, each with the red point highlighted. A: F(1, 17) = 1.23, p = 0.283, R² = 0.068, b1 = 0.120. B: F(1, 17) = 5.54, p = 0.031, R² = 0.246, b1 = 0.275. C: F(1, 16) = 8.36, p = 0.011, R² = 0.343, b1 = 0.271.]

Diagnostic statistic  | Value for red point in A | Value for red point in B
Standardised residual | -2.366                   | -2.509
Cook's distance       | 0.643                    | 0.185
Leverage value        | 0.197                    | 0.0001
Standardised DFFit    | -1.25                    | -0.756
Standardised DFBeta   | -1.11                    | -0.035

Influential points may not have high standardised residuals!

[Figure: scatter of eight numbered data points with R² = 0.649, illustrating an influential point without a large standardised residual.]
What to do about it?

Outlier:
- Check for typing/measurement error and correct it.
- Check the underlying distribution of standardised residuals.
- If there is no obvious source of error and the distribution of residuals looks okay, some disciplines would recommend removing outliers on the basis of pre-specified criteria, such as standardised residual > ±3. But the occasional outlier is expected!

Influential data points:
- Run the analysis with and without the data point(s) and report how this affects the model parameters and your conclusions (robustness test).
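The robustness test can be sketched as a small refit-and-compare step; hypothetical data, and the helper name `fit_line` is my own:

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1·x; returns (b0, b1)."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

# Hypothetical data (not from the lecture); suppose the last point is suspect
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b0_all, b1_all = fit_line(x, y)
b0_wo, b1_wo = fit_line(x[:-1], y[:-1])  # refit without the suspect point

# Report both fits and how much the coefficients change
print(f"with point:    b0 = {b0_all:.3f}, b1 = {b1_all:.3f}")
print(f"without point: b0 = {b0_wo:.3f}, b1 = {b1_wo:.3f}")
```

If the coefficients and conclusions barely change, the model is robust to that point; if they change substantially, report both analyses rather than silently dropping the point.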
