Multiple Linear Regression 2021

1) Multiple linear regression analyzes the relationship between a dependent variable (e.g. PEFR) and multiple independent variables (e.g. age, weight, height). 2) Simple linear regression analysis found age, weight, and height were each significant factors for PEFR when considered individually. 3) The next steps will include variable selection using multiple linear regression and checking assumptions of the final regression model.

MULTIPLE LINEAR REGRESSION
Professor Dr. Syed Hatim Noor
Dr. Wan Arfah Nadiah Wan Abdul Jamil
Universiti Sultan Zainal Abidin
Introduction
• Multiple linear regression is the estimation of the linear relationship between a dependent variable and more than one independent variable (covariate)
• Applied in exploratory and explanatory studies
• The outcome is a numerical variable
• If the independent variables are all numerical, the analysis is called Multiple Linear Regression
• If the independent variables are a combination of numerical and categorical variables, or are categorical only, the analysis is called General Linear Regression
Syed Hatim Noor 2
Multiple Linear Regression Model
• $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$
• $Y$ is the outcome
• $\beta_0$ is the intercept
• $\beta_1, \dots, \beta_n$ are the regression coefficients for the individual independent variables
• $X_1, \dots, X_n$ are the individual independent variables
• Our interest is in each regression coefficient, its 95% confidence interval and the corresponding p-value
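To make these quantities concrete, here is a minimal Python sketch (the lecture itself uses SPSS) that estimates $\beta_0, \dots, \beta_n$ by ordinary least squares with NumPy and reports each coefficient's standard error, 95% confidence interval and p-value. The function name `fit_mlr` and the synthetic data are illustrative assumptions, and the normal approximation to the t distribution is only adequate for large samples.

```python
import math
from statistics import NormalDist

import numpy as np


def fit_mlr(X, y, level=0.95):
    """Ordinary least squares for Y = b0 + b1*X1 + ... + bn*Xn.

    Returns (b, se, ci_lower, ci_upper, p) as NumPy arrays, intercept first.
    CIs and p-values use a normal approximation to the t distribution,
    which is reasonable only for large residual degrees of freedom.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])   # column of 1s for b0
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    df = len(y) - design.shape[1]                    # n - number of parameters
    sigma2 = resid @ resid / df                      # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    z = NormalDist().inv_cdf(0.5 + level / 2)        # about 1.96 for 95%
    t = b / se
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])  # two-sided
    return b, se, b - z * se, b + z * se, p


# Hypothetical data: true model Y = 10 + 2*X1 - 1*X2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = 10 + 2 * X[:, 0] - 1 * X[:, 1] + rng.normal(size=2000)
b, se, lower, upper, p = fit_mlr(X, y)
print(np.round(b, 2), np.round(lower, 2), np.round(upper, 2))
```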


Example
• Relationship between (age, weight, height) and PEFR (Peak Expiratory Flow Rate)
• Looking at the relationship of each independent variable with PEFR on its own (univariable analysis) may not be biologically sound
• Need to look at the relationship with the outcome PEFR when all variables are considered at the same time (multivariable analysis)
• We can also look into the presence of confounding and interactions
Steps in Multiple Linear Regression
(1) Data exploration & cleaning
(2) Univariable analysis (Simple Linear Regression)
(3) Variable selection (Multiple Linear Regression) (preliminary main effect model)
(4) Checking multicollinearity & interaction (preliminary final model)
(5) Checking assumptions (LINE-I-I) (final model)
(6) Interpretation, conclusion & presentation


Step 1: Data exploration and cleaning

Descriptives: age (N = 1,369)
Mean                                 65.26 (Std. Error .216)
95% Confidence Interval for Mean     Lower Bound 64.84, Upper Bound 65.68
5% Trimmed Mean                      64.76
Median                               64.00
Variance                             63.737
Std. Deviation                       7.984
Minimum                              45
Maximum                              95
Range                                50
Interquartile Range                  11
Skewness                             .824 (Std. Error .066)
Kurtosis                             .337 (Std. Error .132)

[Figure: histogram of age (Mean = 65.26, Std. Dev. = 7.984, N = 1,369)]
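Several entries in the descriptives table follow directly from others. The Python sketch below (an illustration, not SPSS output) recomputes the standard error and 95% confidence interval of the mean from the reported mean, standard deviation and N, and the standard errors of skewness and kurtosis from N alone, using the usual formulas. It assumes a normal critical value, which is practically identical to the t value for N = 1,369; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def mean_ci_from_summary(mean, sd, n, level=0.95):
    """Standard error and confidence interval for a mean from summary statistics."""
    se = sd / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation to t
    return se, mean - z * se, mean + z * se


def se_skewness(n):
    """Usual standard error of the sample skewness."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Usual standard error of the sample (excess) kurtosis."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


se, lower, upper = mean_ci_from_summary(65.26, 7.984, 1369)
print(round(se, 3), round(lower, 2), round(upper, 2))           # 0.216 64.84 65.68
print(round(se_skewness(1369), 3), round(se_kurtosis(1369), 3))  # 0.066 0.132
```

The printed values reproduce the Std. Error, confidence bounds and skewness/kurtosis standard errors in the table above.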


Step 2: Simple Linear Regression
• Refer to the simple linear regression lecture
• Do simple linear regression (SLR) analyses for age and PEFR, weight and PEFR, and height and PEFR


Regression equation
$Y = a + bX$

a, b = regression coefficients
a = the intercept of the regression line (the value of Y when X = 0)
b = the slope of the line (the amount of change in Y for a one-unit change in X)
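The intercept a and slope b are obtained by least squares. A minimal pure-Python sketch of the standard formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$ follows; the function name and the toy data are hypothetical.

```python
def simple_linear_regression(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # change in Y for a one-unit change in X
    a = y_bar - b * x_bar    # value of Y when X = 0
    return a, b


# Hypothetical toy data lying exactly on Y = 1 + 2X
a, b = simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```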


[Figure: scatter plot with fitted regression line, b = 0.65]


Regression equation

• $Y = a + bX$
• Weight = -39.67 + (0.65*Height)
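As a worked illustration, assuming a hypothetical height value of 160, the fitted equation predicts the following weight; each additional unit of height adds b = 0.65 to the prediction.

```latex
\text{Weight} = -39.67 + 0.65 \times 160 = -39.67 + 104 = 64.33
```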


SLR (age and PEFR)

Coefficients (dependent variable: PEFR)
Model  Variable     B         Std. Error   Beta    t        Sig.   95% CI for B
1      (Constant)   537.876   26.068               20.634   .000   486.739 to 589.013
       age          -3.289    .396         -.219   -8.295   .000   -4.067 to -2.511

[Figure: scatter plot of PEFR against age with fitted line, R Sq Linear = 0.048]

• Age is a significant factor for PEFR by SLR analysis
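The columns of a coefficient table are linked: t = B / Std. Error, the 95% CI is roughly B ± 1.96 × Std. Error for large samples, and in SLR the standardized coefficient (Beta) equals Pearson's r, so Beta² is R². The sketch below checks the age row; the function name is an illustrative assumption and the normal critical value stands in for the t value SPSS uses.

```python
from statistics import NormalDist


def check_coefficient(b, se, level=0.95):
    """Recompute t = B / SE and an approximate CI from a coefficient table row."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation to t (large df)
    return b / se, b - z * se, b + z * se


# age row of the SLR table: B = -3.289, SE = .396
t, lower, upper = check_coefficient(-3.289, 0.396)
print(round(t, 2), round(lower, 3), round(upper, 3))

# In SLR the standardized coefficient (Beta) equals r, so Beta squared is R squared.
print(round((-0.219) ** 2, 3))  # 0.048
```

The first line prints values close to the reported -8.295, -4.067 and -2.511; the small differences come from using the rounded B and SE and a normal rather than t critical value.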
SLR (weight and PEFR)

Coefficients (dependent variable: PEFR)
Model  Variable     B         Std. Error   Beta    t        Sig.   95% CI for B
1      (Constant)   98.270    18.746               5.242    .000   61.495 to 135.044
       weight       4.190     .344         .313    12.166   .000   3.514 to 4.865

[Figure: scatter plot of PEFR against weight with fitted line, R Sq Linear = 0.098]

• Weight is a significant factor for PEFR by SLR analysis
SLR (height and PEFR)

Coefficients (dependent variable: PEFR)
Model  Variable     B          Std. Error   Beta    t        Sig.   95% CI for B
1      (Constant)   -223.854   46.830               -4.780   .000   -315.720 to -131.988
       height       3.566      .305         .302    11.708   .000   2.969 to 4.164

[Figure: scatter plot of PEFR against height with fitted line, R Sq Linear = 0.091]

• Height is a significant factor for PEFR by SLR analysis
Table (1) Associated factors of PEFR amongst patients admitted to HUSM (n=1369) by simple linear regression
Variables      b (95% CI)             p-value
Age (years)    -3.29 (-4.07, -2.51)   <0.001
Weight (kg)    4.19 (3.51, 4.87)      <0.001
Height (cm)    3.57 (2.97, 4.16)      <0.001


Step 3: Variable Selection (Multiple Linear Regression)
• Selection is done by an automatic procedure
• A number of methods are available: forward, backward and stepwise procedures
• All methods should be run, and the model in which all variables are significant (called the largest model) is selected as the preliminary main effect model
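The forward method can be sketched as below. This is a simplified illustration, not SPSS's implementation: at each step it adds the candidate with the largest |t| (equivalently the smallest p-value, because for a single added variable the partial F statistic equals t²) and stops when that p-value is not below an assumed entry threshold of 0.05. p-values use a large-sample normal approximation; the helper names and synthetic data are hypothetical.

```python
import math

import numpy as np


def coef_t_values(X, y):
    """OLS with an intercept; return the t statistics of the slope coefficients."""
    design = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return (b / se)[1:]


def forward_selection(X, y, names, alpha=0.05):
    """Add variables one at a time while the best candidate has p < alpha."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        # |t| of each candidate when added to the variables already selected
        t_abs = {j: abs(coef_t_values(X[:, selected + [j]], y)[-1]) for j in remaining}
        best = max(t_abs, key=t_abs.get)                # largest |t| = smallest p
        p = math.erfc(t_abs[best] / math.sqrt(2))       # two-sided, normal approx.
        if p >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return [names[j] for j in selected]


# Hypothetical data: x0 and x2 affect y, x1 does not
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 5 + 1.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=300)
chosen = forward_selection(X, y, ["x0", "x1", "x2"])
print(chosen)
```

Backward elimination would start from all variables and drop the least significant one at a time; stepwise combines both moves.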




Forward method
Coefficients (dependent variable: PEFR)
Model  Variable     B          Std. Error   Beta    t        Sig.   95% CI for B
1      (Constant)   98.270     18.746               5.242    .000   61.495 to 135.044
       weight       4.190      .344         .313    12.166   .000   3.514 to 4.865
2      (Constant)   -242.718   45.631               -5.319   .000   -332.232 to -153.204
       weight       3.155      .360         .235    8.774    .000   2.449 to 3.860
       height       2.585      .317         .219    8.159    .000   1.964 to 3.207
3      (Constant)   -63.934    53.764               -1.189   .235   -169.402 to 41.534
       weight       2.692      .363         .201    7.417    .000   1.980 to 3.404
       height       2.571      .313         .218    8.221    .000   1.958 to 3.185
       age          -2.326     .382         -.155   -6.090   .000   -3.076 to -1.577


Backward method
Coefficients (dependent variable: PEFR)
Model  Variable     B          Std. Error   Beta    t        Sig.   95% CI for B
1      (Constant)   -63.934    53.764               -1.189   .235   -169.402 to 41.534
       age          -2.326     .382         -.155   -6.090   .000   -3.076 to -1.577
       weight       2.692      .363         .201    7.417    .000   1.980 to 3.404
       height       2.571      .313         .218    8.221    .000   1.958 to 3.185

• In the backward method, all variables were significant, so there was only one model and no variable was excluded from it


Stepwise method
Coefficients (dependent variable: PEFR)
Model  Variable     B          Std. Error   Beta    t        Sig.   95% CI for B
1      (Constant)   98.270     18.746               5.242    .000   61.495 to 135.044
       weight       4.190      .344         .313    12.166   .000   3.514 to 4.865
2      (Constant)   -242.718   45.631               -5.319   .000   -332.232 to -153.204
       weight       3.155      .360         .235    8.774    .000   2.449 to 3.860
       height       2.585      .317         .219    8.159    .000   1.964 to 3.207
3      (Constant)   -63.934    53.764               -1.189   .235   -169.402 to 41.534
       weight       2.692      .363         .201    7.417    .000   1.980 to 3.404
       height       2.571      .313         .218    8.221    .000   1.958 to 3.185
       age          -2.326     .382         -.155   -6.090   .000   -3.076 to -1.577

• All three methods give the same final model (weight, height and age)
• $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$
• PEFR = -63.93 + (2.69*Weight) + (2.57*Height) - (2.33*Age)


Step 4: Checking multicollinearity
• Need to check whether the independent variables in the preliminary main effect model are correlated or not
• If so, there is a high chance of getting inaccurate p-values and wide 95% confidence intervals for the regression coefficients
• Important variables may have been removed
• In SPSS, use the "Enter" option with "Collinearity diagnostics" to determine the Variance Inflation Factor (VIF)
• If the VIF is more than 10, there is multicollinearity amongst the independent variables


Step 4: Checking multicollinearity

• Collinearity diagnostics option (under Statistics)


Step 4: Checking multicollinearity
Coefficients (dependent variable: PEFR)
Model  Variable     B         Std. Error   Beta    t        Sig.   95% CI for B           Tolerance   VIF
1      (Constant)   -63.934   53.764               -1.189   .235   -169.402 to 41.534
       age-IC       -2.326    .382         -.155   -6.090   .000   -3.076 to -1.577       .949        1.054
       weight       2.692     .363         .201    7.417    .000   1.980 to 3.404         .837        1.195
       height       2.571     .313         .218    8.221    .000   1.958 to 3.185         .875        1.142

• The VIF for every independent variable is less than 10, so there is no multicollinearity problem in this model.
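The VIF of an independent variable is $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing that variable on all the other independent variables, and Tolerance is $1/\text{VIF}$ (for example 1/.949 ≈ 1.054 in the table). A minimal NumPy sketch, with a hypothetical function name and synthetic data in which one column is nearly a copy of another:

```python
import numpy as np


def vif(X):
    """Variance Inflation Factor of each column: VIF_j = 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        # regress column j on an intercept plus all the other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        centered = X[:, j] - X[:, j].mean()
        r2 = 1 - (resid @ resid) / (centered @ centered)
        out.append(1 / (1 - r2))
    return np.array(out)


rng = np.random.default_rng(4)
a = rng.normal(size=1000)
b = rng.normal(size=1000)
c = a + 0.1 * rng.normal(size=1000)   # almost collinear with a
v = vif(np.column_stack([a, b, c]))
print(np.round(v, 2), np.round(1 / v, 3))   # VIF and Tolerance
```

Here a and c get VIFs far above 10, while b stays close to 1.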


Step 4: Checking Interactions
• The interaction term needs to be computed
• In SPSS, use Transform then Compute (multiply the two variables for a two-way interaction)
• The interaction needs to be biologically meaningful
• The interaction term is added to the model as an independent variable (check only one term at a time)
• If an interaction term is statistically significant, the model should contain the main effect terms together with the significant interaction term
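Outside SPSS, the same check amounts to adding the product of two variables as an extra column and testing its coefficient. A minimal NumPy sketch under that assumption (hypothetical function name, synthetic data with a true interaction of 0.5):

```python
import numpy as np


def fit_with_interaction(x1, x2, y):
    """Fit Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2); return coefficients and SEs."""
    x1, x2, y = (np.asarray(v, dtype=float) for v in (x1, x2, y))
    product = x1 * x2   # the computed interaction variable (like age_wt)
    design = np.column_stack([np.ones(len(y)), x1, x2, product])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return b, se


rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
y = 1 + x1 + x2 + 0.5 * x1 * x2 + rng.normal(size=500)
b, se = fit_with_interaction(x1, x2, y)
print(round(b[3], 2), round(b[3] / se[3], 1))   # interaction coefficient and its t
```

A |t| above about 1.96 for the product term corresponds to p < 0.05 in large samples.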
Step 4: Checking interactions

Coefficients (dependent variable: PEFR)
Model  Variable     B          Std. Error   Beta    t        Sig.
1      (Constant)   -237.204   155.635              -1.524   .128
       age-IC       .270       2.221        .018    .121     .903
       weight       5.962      2.780        .445    2.144    .032
       height       2.586      .313         .219    8.264    .000
       age_wt       -.050      .042         -.267   -1.186   .236


Step 4: Checking interactions

Coefficients (dependent variable: PEFR), age × height interaction
Model  Variable     B          Std. Error   Beta    t        Sig.
1      (Constant)   -92.221    371.671              -.248    .804
       age-IC       -1.894     5.630        -.126   -.336    .737
       weight       2.693      .364         .201    7.408    .000
       height       2.756      2.418        .233    1.140    .255
       age_ht       -.003      .037         -.032   -.077    .939

Coefficients (dependent variable: PEFR), weight × height interaction
Model  Variable     B          Std. Error   Beta    t        Sig.
1      (Constant)   251.041    293.834              .854     .393
       age-IC       -2.348     .382         -.156   -6.138   .000
       weight       -3.002     5.235        -.224   -.573    .566
       height       .498       1.927        .042    .258     .796
       wt_ht        .038       .034         .514    1.090    .276

• No interaction was detected amongst the independent variables (age_wt p = .236, age_ht p = .939, wt_ht p = .276)


Step 5: Checking assumptions of the model (LINE-I-I)
• Linearity (L) (overall linearity): the relationship between the Xs and Y is linear over the range of values studied
• Independence (I): the sample is independent
• Normality of the response variable (N): the response variable, Y, has a normal distribution at each value of the explanatory variables, Xs
• Homoscedasticity (equal variances) (E): the distribution of Y has equal variances (or standard deviations) at each value of the Xs; that is, the standard deviation of Y is the same no matter what the value of the Xs
• Fit of independent numerical variables (I) (independent numerical variable linearity): there is no peculiar relationship between the independent numerical variables and the outcome variable; the independent numerical variables have a linear relationship with the outcome
• Interactions between independent variables (I): all possible two-way interactions among or between the independent variables are checked


Step 5: Checking assumptions of the model (LINE-I-I)
• Linearity (L) (overall linearity): scatter plot of residuals against predicted values (XP-YR)
• Independence (I): settled during the design stage
• Normality of the response variable (N): (1) histogram of residuals with an overlaid normal curve; (2) box-and-whisker plot of residuals
• Homoscedasticity (equal variances) (E): scatter plot of residuals against predicted values (XP-YR)
• Fit of independent numerical variables (I): scatter plot of residuals against each independent numerical variable (XI-YR)
• Interactions between independent variables (I): all possible two-way interactions among or between the independent variables

Step 5: Predicted values and residuals

• $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$
• For a patient with weight 48 kg, height 146 cm and age 55 years, whose observed PEFR is 390:
• PEFR (predicted) = -63.934 + (2.692*48) + (2.571*146) - (2.326*55) = 312.72
• Residual = PEFR (observed) - PEFR (predicted)
• Residual = 390 - 312.72 = 77.28
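The same arithmetic as a small Python function, using the rounded coefficients from the final model output; the function name is illustrative only.

```python
def predict_pefr(weight, height, age):
    """Predicted PEFR from the final model coefficients (rounded to 3 d.p.)."""
    return -63.934 + 2.692 * weight + 2.571 * height - 2.326 * age


predicted = predict_pefr(weight=48, height=146, age=55)
residual = 390 - predicted        # observed minus predicted
print(round(predicted, 2), round(residual, 2))  # 312.72 77.28
```

Software that keeps the unrounded coefficients may report a residual that differs slightly in the second decimal place.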
Residuals
• The difference between the observed value of the dependent variable and the value predicted by the regression line

[Figure: example residual of 77.26 for one observation]




Step 5: Checking assumptions of linearity & equal variance

[Figure: scatter plot of unstandardized residuals against unstandardized predicted values]




Step 5: Checking assumptions of linearity & equal variance
• For linearity: if the fitted observations show a peculiar shape of concavity or convexity, the assumption is not met
• There is no peculiar feature in the plot of residuals against predicted values, so the linearity assumption is fulfilled


Step 5: Checking assumptions of linearity & equal variance
• For equal variance: if the fitted observations show a peculiar shape of divergence, convergence or a fan shape, the assumption is not met
• There is no peculiar feature in the plot of residuals against predicted values, so the equal variance assumption is fulfilled


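Linearity and equal variance

[Figure: example residual patterns. Observations along the line: linearity and equal variance fulfilled. Concavity or convexity: linearity not fulfilled. Divergent or convergent spread: equal variance not fulfilled.]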
Step 5: Checking assumption of normality

[Figure: histogram of unstandardized residuals with overlaid normal curve (Mean = -1.4599433E-14, Std. Dev. = 109.77281718, N = 1,369)]
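The predicted values and residuals used in the XP-YR and XI-YR plots, and in this histogram, can be computed as below. Because least squares with an intercept forces the residuals to sum to zero, a residual mean such as -1.46E-14 is zero up to floating-point rounding. A minimal NumPy sketch with a hypothetical function name and synthetic data:

```python
import numpy as np


def fitted_and_residuals(X, y):
    """OLS with an intercept; return predicted values and residuals."""
    design = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ b
    return predicted, y - predicted


rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = 4 + X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=200)
predicted, residuals = fitted_and_residuals(X, y)
print(residuals.mean(), residuals.std(ddof=1))  # mean is zero up to rounding error
```

Plotting `residuals` against `predicted` (or against one column of `X`) gives the scatter plots used in Step 5.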


Step 5: Checking assumption of normality

[Figure: box-and-whisker plot of unstandardized residuals; outlying cases labelled 149, 212, 260, 1,007, 875 and 918]


Step 5: Checking linearity of independent numerical variables

[Figure: scatter plots of unstandardized residuals against age and against weight]


Step 5: Checking linearity of independent numerical variables
• There are no peculiar relationships between the independent numerical variables and the residuals
• The forms of the independent numerical variables are appropriate
• They have a linear relationship with the outcome

[Figure: scatter plot of unstandardized residuals against height]
A Checklist for reporting Multiple Linear Regression
• The relationship of interest or the purpose of the analysis (text)
• Sample size
• Multiple linear regression equation
• Regression coefficients, their 95% confidence intervals and the actual p-value of each independent variable in the final model
• Coefficient of determination (R²)
• Confirm that the assumptions are fulfilled (text and footnote of the table)
• Inform about interactions and any multicollinearity problem
• Report how any outlying data are treated (e.g. transformation of data if the normality assumption is not met) (text)
• Name the statistical package used in the analysis (text)


Step 6: Interpretation, Conclusion and Presentation
Table (1) Associated factors of PEFR amongst patients admitted to HUSM (n=1369)
Variables      Simple Linear Regression            Multiple Linear Regression
               bᵃ (95% CI)            P-value      bᵇ (95% CI)            P-value
Weight (kg)    4.19 (3.51, 4.87)      <0.001       2.69 (1.98, 3.40)      <0.001
Height (cm)    3.57 (2.97, 4.16)      <0.001       2.57 (1.96, 3.19)      <0.001
Age (years)    -3.29 (-4.07, -2.51)   <0.001       -2.33 (-3.08, -1.58)   <0.001

ᵃ Crude regression coefficient
ᵇ Adjusted regression coefficient
Forward multiple linear regression method applied. Model assumptions are fulfilled. No multicollinearity detected. There were no interactions amongst independent variables.
Coefficient of determination (R²) = 0.162
Final model equation: PEFR = -63.93 + (2.69*Weight) + (2.57*Height) - (2.33*Age)
Interpretation of results
• Results from the multiple linear regression analysis show a significant positive linear relationship between weight and PEFR. Those who are 1 kg heavier have a PEFR 2.69 units higher, after adjusting for height and age (adjusted b = 2.69; 95% CI 1.98, 3.40; p < 0.001)
• There is a significant positive linear relationship between height and PEFR. Those who are 1 cm taller have a PEFR 2.57 units higher, after adjusting for weight and age (adjusted b = 2.57; 95% CI 1.96, 3.19; p < 0.001)
• There is a significant negative linear relationship between age and PEFR. Those who are 1 year older have a PEFR 2.33 units lower, after adjusting for weight and height (adjusted b = -2.33; 95% CI -3.08, -1.58; p < 0.001)
• Sixteen percent (16.2%) of the variation in PEFR is explained by weight, height and age according to the multiple linear regression model (R² = 0.162)
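The 16.2% figure is the coefficient of determination, $R^2 = 1 - SS_{residual}/SS_{total}$. A minimal NumPy sketch of the calculation from observed and predicted values, assuming the predictions come from a least-squares model with an intercept (where R² is the proportion of variance explained); the function name and toy values are illustrative.

```python
import numpy as np


def r_squared(y, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y = np.asarray(y, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((y - predicted) ** 2)      # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)       # total variation around the mean
    return 1 - ss_res / ss_tot


print(r_squared([1, 2, 3, 4], [1, 2, 3, 4]))   # perfect prediction: 1.0
print(r_squared([1, 2, 3, 4], [2.5] * 4))      # predicting the mean only: 0.0
```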


Thank You
