AS lecture 07 ( Multiple Linear Regression)

Uploaded by

amiraziz.uet
Copyright
© © All Rights Reserved
MULTIPLE LINEAR REGRESSION
Lecture # 08

Dr. Imran Khalil


Contents
• Multiple Linear Regression (MLR)
• Two Main Objectives
• Assumptions of MLR
• How to Perform an MLR
• Coefficient of Determination ($R^2$)
• Coefficient of Correlation ($r$)
• Standard Deviation/Standard Error
• Practice Problems
Multiple Linear Regression
• Multiple linear regression is used to estimate the relationship
between two or more independent variables and one dependent
variable. You can use multiple linear regression when you want to
know:
• How strong the relationship is between two or more independent
variables and one dependent variable (e.g. how rainfall, temperature, and
amount of fertilizer added affect crop growth).

Two Main Objectives
• Establish if there is a relationship between two variables.
• More specifically, establish if there is a statistically significant
relationship between two variables.
• Examples: Income and spending, wage and gender, student height and
exam scores.
• Forecast new observations.
• Can we use what we know about the relationship to forecast unobserved
values?
• Examples: What will our sales be over the next quarter? What will be the
effect of advertising on sales?
How to Perform a Multiple Linear Regression
Multiple Regression Formula
$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + e$
• $y$ is the predicted value of the dependent variable.
• $\beta_0$ is the y-intercept (the value of $y$ when all independent variables are set to 0).
• $\beta_1 x_1$ is the regression coefficient (slope) $\beta_1$ of the first independent
variable, multiplied by its value $x_1$.
• … the same holds for however many independent variables you are
testing.
• $\beta_n x_n$ is the regression coefficient of the last independent variable, multiplied by its value.
• $e$ is the error of the estimate (model error), or how much variation
there is in our estimate of $y$.
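The formula above can be read as a small function: one intercept plus one coefficient per independent variable. The following is a minimal sketch; the function name `predict` and the coefficient values are hypothetical and chosen only to illustrate the arithmetic, and the error term $e$ is left out because it is unknown for a new observation.

```python
def predict(intercept, coefficients, values):
    """Evaluate y = b0 + b1*x1 + ... + bn*xn (error term omitted)."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical coefficients, not taken from any data set in this lecture.
y_hat = predict(2.0, [0.5, 3.0], [4.0, 1.0])
print(y_hat)  # 2.0 + 0.5*4.0 + 3.0*1.0
```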
Regression Analysis: Example
• Suppose that a manager wants to determine the relationship
between the firm's advertising and quality-control expenditures and
its sales revenue. The manager wants to test the hypothesis that
higher advertising and quality-control expenditures lead to higher
sales for the firm, and, furthermore, she wants to estimate the
strength of the relationship (i.e., how much sales increase for each
dollar increase in advertising expenditures).
• The manager collects data on advertising expenditures, quality-control
expenditures, and sales revenue for the firm over the past 10 years.
Advertising and Quality Expenditures and Sales Revenues of the Firm in Each of 10 Years

Year   Sales ($y$)   Advertising Expense ($X_1$)   Quality Control ($X_2$)
1      44            10                            3
2      40            9                             4
3      42            11                            3
4      46            12                            3
5      48            11                            4
6      52            12                            5
7      54            13                            6
8      58            13                            7
9      56            14                            7
10     60            15                            8

[Figure: scatter diagram of the data]
Regression Analysis
Regression Line (Line of Best Fit):
Draw, by visual inspection, the positively sloped straight line that "best" fits
between the data points (so that the data points are about equally distant
on either side of the line).
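Multiple Regression Formula
$\hat{y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
• $\beta_0 = \bar{y} - \beta_1 \bar{X}_1 - \beta_2 \bar{X}_2$
• $\beta_1 = \dfrac{(\sum x_2^2)(\sum x_1 y) - (\sum x_1 x_2)(\sum x_2 y)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}$
• $\beta_2 = \dfrac{(\sum x_1^2)(\sum x_2 y) - (\sum x_1 x_2)(\sum x_1 y)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}$
• $\bar{y} = \dfrac{\sum y}{n}$
• $\bar{X}_1 = \dfrac{\sum X_1}{n}$
• $\bar{X}_2 = \dfrac{\sum X_2}{n}$
Regression Sum Calculation
(lowercase $x_1$, $x_2$ denote deviations from the means, $x_i = X_i - \bar{X}_i$; uppercase $X_1$, $X_2$ denote the raw values)
• $\sum x_1^2 = \sum X_1^2 - \dfrac{(\sum X_1)^2}{n}$
• $\sum x_2^2 = \sum X_2^2 - \dfrac{(\sum X_2)^2}{n}$
• $\sum x_1 y = \sum X_1 y - \dfrac{(\sum X_1)(\sum y)}{n}$
• $\sum x_2 y = \sum X_2 y - \dfrac{(\sum X_2)(\sum y)}{n}$
• $\sum x_1 x_2 = \sum X_1 X_2 - \dfrac{(\sum X_1)(\sum X_2)}{n}$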
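The formulas on this slide can be collected into one routine. The sketch below is only an illustration under stated assumptions: the function name `fit_two_predictors` and the small data set are hypothetical (the data are generated exactly from $y = 1 + 2X_1 + 3X_2$, so the routine should recover those three numbers), and the intercept is computed as $\bar{y} - \beta_1 \bar{X}_1 - \beta_2 \bar{X}_2$.

```python
def fit_two_predictors(y, x1, x2):
    """Return (b0, b1, b2) for y-hat = b0 + b1*X1 + b2*X2 via the sum formulas."""
    n = len(y)
    sum_y, sum_x1, sum_x2 = sum(y), sum(x1), sum(x2)

    # Deviation sums obtained from raw sums (the "regression sum calculation").
    s11 = sum(a * a for a in x1) - sum_x1 ** 2 / n
    s22 = sum(a * a for a in x2) - sum_x2 ** 2 / n
    s1y = sum(a * b for a, b in zip(x1, y)) - sum_x1 * sum_y / n
    s2y = sum(a * b for a, b in zip(x2, y)) - sum_x2 * sum_y / n
    s12 = sum(a * b for a, b in zip(x1, x2)) - sum_x1 * sum_x2 / n

    denominator = s11 * s22 - s12 ** 2
    if denominator == 0:
        raise ValueError("X1 and X2 are perfectly collinear")

    b1 = (s22 * s1y - s12 * s2y) / denominator
    b2 = (s11 * s2y - s12 * s1y) / denominator
    b0 = sum_y / n - b1 * sum_x1 / n - b2 * sum_x2 / n
    return b0, b1, b2


# Hypothetical data built from y = 1 + 2*X1 + 3*X2 with no noise.
x1_demo = [1, 2, 3, 4]
x2_demo = [2, 1, 4, 3]
y_demo = [1 + 2 * a + 3 * b for a, b in zip(x1_demo, x2_demo)]

b0, b1, b2 = fit_two_predictors(y_demo, x1_demo, x2_demo)
print(b0, b1, b2)
```

Because the demo data contain no noise, any difference from 1, 2 and 3 in the printed values would point to a mistake in the formulas.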
Advertising and Quality Expenditures and Sales Revenues of
the Firm in Each of 10 Years
($y$ = sales, $X_1$ = advertising expense, $X_2$ = quality control)

Year     $y$    $X_1$   $X_2$   $X_1^2$   $X_2^2$   $X_1 y$   $X_2 y$   $X_1 X_2$   $y^2$
1        44     10      3       100       9         440       132       30          1936
2        40     9       4       81        16        360       160       36          1600
3        42     11      3       121       9         462       126       33          1764
4        46     12      3       144       9         552       138       36          2116
5        48     11      4       121       16        528       192       44          2304
6        52     12      5       144       25        624       260       60          2704
7        54     13      6       169       36        702       324       78          2916
8        58     13      7       169       49        754       406       91          3364
9        56     14      7       196       49        784       392       98          3136
10       60     15      8       225       64        900       480       120         3600

$\sum$   500    120     50      1470      282       6106      2610      626         25440
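The column totals in the table can be reproduced directly from the three raw columns. This is a minimal sketch; the variable names and the helper `sum_of_products` are assumptions made for illustration, while the data are the ten yearly observations from the table.

```python
sales = [44, 40, 42, 46, 48, 52, 54, 58, 56, 60]         # y
advertising = [10, 9, 11, 12, 11, 12, 13, 13, 14, 15]    # X1
quality = [3, 4, 3, 3, 4, 5, 6, 7, 7, 8]                 # X2


def sum_of_products(a, b):
    """Sum of element-wise products, e.g. the total of the X1*y column."""
    return sum(p * q for p, q in zip(a, b))


totals = {
    "y": sum(sales),
    "X1": sum(advertising),
    "X2": sum(quality),
    "X1^2": sum_of_products(advertising, advertising),
    "X2^2": sum_of_products(quality, quality),
    "X1*y": sum_of_products(advertising, sales),
    "X2*y": sum_of_products(quality, sales),
    "X1*X2": sum_of_products(advertising, quality),
    "y^2": sum_of_products(sales, sales),
}

for column, total in totals.items():
    print(f"{column:>6}: {total}")
```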
With the totals $\sum y = 500$, $\sum X_1 = 120$, $\sum X_2 = 50$ and $n = 10$:
• $\bar{y} = \dfrac{\sum y}{n} = \dfrac{500}{10} = 50$
• $\bar{X}_1 = \dfrac{\sum X_1}{n} = \dfrac{120}{10} = 12$
• $\bar{X}_2 = \dfrac{\sum X_2}{n} = \dfrac{50}{10} = 5$
Regression Sum Calculation
• $\sum x_1^2 = \sum X_1^2 - \dfrac{(\sum X_1)^2}{n} = 1470 - \dfrac{(120)^2}{10} = 30$
• $\sum x_2^2 = \sum X_2^2 - \dfrac{(\sum X_2)^2}{n} = 282 - \dfrac{(50)^2}{10} = 32$
• $\sum x_1 y = \sum X_1 y - \dfrac{(\sum X_1)(\sum y)}{n} = 6106 - \dfrac{(120)(500)}{10} = 106$
• $\sum x_2 y = \sum X_2 y - \dfrac{(\sum X_2)(\sum y)}{n} = 2610 - \dfrac{(50)(500)}{10} = 110$
• $\sum x_1 x_2 = \sum X_1 X_2 - \dfrac{(\sum X_1)(\sum X_2)}{n} = 626 - \dfrac{(120)(50)}{10} = 26$
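Each regression sum is a sum of products of deviations from the mean, and the shortcut with raw totals is an algebraically equivalent way to compute it. The sketch below checks that equivalence on the lecture data; the helper names are hypothetical.

```python
sales = [44, 40, 42, 46, 48, 52, 54, 58, 56, 60]         # y
advertising = [10, 9, 11, 12, 11, 12, 13, 13, 14, 15]    # X1
quality = [3, 4, 3, 3, 4, 5, 6, 7, 7, 8]                 # X2


def mean(values):
    return sum(values) / len(values)


def deviation_sum(a, b):
    """Definitional form: sum of (a_i - mean(a)) * (b_i - mean(b))."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))


def shortcut_sum(a, b):
    """Shortcut form: sum(a_i * b_i) - sum(a) * sum(b) / n."""
    return sum(p * q for p, q in zip(a, b)) - sum(a) * sum(b) / len(a)


pairs = {
    "x1^2": (advertising, advertising),
    "x2^2": (quality, quality),
    "x1*y": (advertising, sales),
    "x2*y": (quality, sales),
    "x1*x2": (advertising, quality),
}

regression_sums = {}
for name, (a, b) in pairs.items():
    regression_sums[name] = shortcut_sum(a, b)
    print(name, deviation_sum(a, b), regression_sums[name])
```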
Using the regression sums $\sum x_1^2 = 30$, $\sum x_2^2 = 32$, $\sum x_1 y = 106$, $\sum x_2 y = 110$ and $\sum x_1 x_2 = 26$:
• $\beta_1 = \dfrac{(\sum x_2^2)(\sum x_1 y) - (\sum x_1 x_2)(\sum x_2 y)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} = \dfrac{(32)(106) - (26)(110)}{(30)(32) - (26)^2} = 1.873$
• $\beta_2 = \dfrac{(\sum x_1^2)(\sum x_2 y) - (\sum x_1 x_2)(\sum x_1 y)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} = \dfrac{(30)(110) - (26)(106)}{(30)(32) - (26)^2} = 1.915$
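The two slope formulas are the solution of the pair of normal equations $\sum x_1^2\,\beta_1 + \sum x_1 x_2\,\beta_2 = \sum x_1 y$ and $\sum x_1 x_2\,\beta_1 + \sum x_2^2\,\beta_2 = \sum x_2 y$. As a rough check, the sketch below evaluates the formulas with exact fractions and substitutes the results back into both equations; the variable names are assumptions.

```python
from fractions import Fraction

# Regression sums from the worked example.
s11, s22, s1y, s2y, s12 = 30, 32, 106, 110, 26

denominator = s11 * s22 - s12 ** 2                 # 960 - 676 = 284
beta1 = Fraction(s22 * s1y - s12 * s2y, denominator)
beta2 = Fraction(s11 * s2y - s12 * s1y, denominator)

print(beta1, round(float(beta1), 3))
print(beta2, round(float(beta2), 3))

# Substituting back: both normal equations should hold exactly.
first_equation_holds = s11 * beta1 + s12 * beta2 == s1y
second_equation_holds = s12 * beta1 + s22 * beta2 == s2y
print(first_equation_holds, second_equation_holds)
```

Exact fractions are used here only so that the back-substitution check is not blurred by rounding; ordinary floating-point arithmetic gives the same three-decimal values.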
With $\bar{y} = 50$, $\bar{X}_1 = 12$, $\bar{X}_2 = 5$, $\beta_1 = 1.873$ and $\beta_2 = 1.915$:
• $\beta_0 = \bar{y} - \beta_1 \bar{X}_1 - \beta_2 \bar{X}_2$
$\beta_0 = 50 - 1.873(12) - 1.915(5)$
$\beta_0 = 17.949$
With $\beta_0 = 17.949$, $\beta_1 = 1.873$ and $\beta_2 = 1.915$, the fitted regression equation is:
$\hat{y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
$\hat{y} = 17.949 + 1.873 X_1 + 1.915 X_2$
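Once the equation is fitted, it can be used for the second objective of the lecture, forecasting. The sketch below evaluates the fitted equation for year 1 of the data and for one new observation; the function name `predict_sales` and the new input values (16 and 9) are hypothetical, and they lie slightly outside the observed range, so the forecast should be read with caution.

```python
# Rounded coefficients of the fitted equation from the worked example.
B0, B1, B2 = 17.949, 1.873, 1.915


def predict_sales(advertising, quality):
    """Evaluate y-hat = 17.949 + 1.873*X1 + 1.915*X2."""
    return B0 + B1 * advertising + B2 * quality


fitted_year_1 = predict_sales(10, 3)   # year 1 of the data: X1 = 10, X2 = 3
forecast = predict_sales(16, 9)        # hypothetical new observation

print(round(fitted_year_1, 3))
print(round(forecast, 3))
```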
$H_0$ and $H_A$ Hypotheses
• $H_0$: There is no relationship between $X_1$ (advertising) and $Y$ (sales).
• $H_0$: There is no relationship between $X_2$ (quality control) and $Y$ (sales).
• $H_A$: There is a significant relationship between $X_1$ (advertising) and $Y$ (sales).
• $H_A$: There is a significant relationship between $X_2$ (quality control) and $Y$ (sales).
$t$-statistic or $t$ ratio
• The $t$-test in linear regression helps you make a statistical decision
about whether or not to reject the null hypothesis related to the
impact of an individual predictor variable on the dependent variable.
$t = \dfrac{b}{S_b}$
where $b$ is the estimated regression coefficient and $S_b$ is its standard error.
• The higher this calculated $t$ ratio is, the more confident we are that there is a
significant relationship between $X$ (advertising) and $Y$ (sales).
Tests of Significance
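• To test the hypothesis that $b$ is statistically significant (i.e. that
advertising positively affects sales), we first need to calculate
the standard error (deviation) of $b$:
$S_b = \sqrt{\dfrac{\sum (Y_t - \hat{Y}_t)^2}{(n - k) \sum (X_t - \bar{X})^2}}$
$S_b = \sqrt{\dfrac{\sum e_t^2}{(n - k) \sum (X_t - \bar{X})^2}}$
where $e_t = Y_t - \hat{Y}_t$ is the residual for observation $t$.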
Advertising and Sales Revenues of the Firm in Each of 10 Years
Fitted values: $\hat{Y} = 17.949 + 1.873 X_1 + 1.915 X_2$; residuals: $e_t = Y - \hat{Y}$

Year     Sales ($Y$)   Advertising ($X_1$)   $X_1 - \bar{X}_1$   $(X_1 - \bar{X}_1)^2$   $\hat{Y}$   $e_t$     $e_t^2$
1        44            10                    -2                  4                       42.424      1.576     2.483776
2        40            9                     -3                  9                       42.466      -2.466    6.081156
3        42            11                    -1                  1                       44.297      -2.297    5.276209
4        46            12                    0                   0                       46.170      -0.170    0.028900
5        48            11                    -1                  1                       46.212      1.788     3.196944
6        52            12                    0                   0                       50.000      2.000     4.000000
7        54            13                    1                   1                       53.788      0.212     0.044944
8        58            13                    1                   1                       55.703      2.297     5.276209
9        56            14                    2                   4                       57.576      -1.576    2.483776
10       60            15                    3                   9                       61.364      -1.364    1.860496

$\sum$   500           120                                       30                                            30.73241
Quality Control and Sales Revenues of the Firm in Each of 10 Years
Fitted values: $\hat{Y} = 17.949 + 1.873 X_1 + 1.915 X_2$; residuals: $e_t = Y - \hat{Y}$; $\bar{X}_2 = 5$

Year     Sales ($Y$)   Quality Control ($X_2$)   $X_2 - \bar{X}_2$   $(X_2 - \bar{X}_2)^2$   $\hat{Y}$   $e_t$     $e_t^2$
1        44            3                         -2                  4                       42.424      1.576     2.483776
2        40            4                         -1                  1                       42.466      -2.466    6.081156
3        42            3                         -2                  4                       44.297      -2.297    5.276209
4        46            3                         -2                  4                       46.170      -0.170    0.028900
5        48            4                         -1                  1                       46.212      1.788     3.196944
6        52            5                         0                   0                       50.000      2.000     4.000000
7        54            6                         1                   1                       53.788      0.212     0.044944
8        58            7                         2                   4                       55.703      2.297     5.276209
9        56            7                         2                   4                       57.576      -1.576    2.483776
10       60            8                         3                   9                       61.364      -1.364    1.860496

$\sum$   500           50                                            32                                            30.73241
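The fitted values, residuals and the residual sum of squares shared by the two tables above can be recomputed in a few lines. This is a minimal sketch using the rounded coefficients from the worked example; the variable names are assumptions.

```python
sales = [44, 40, 42, 46, 48, 52, 54, 58, 56, 60]         # Y
advertising = [10, 9, 11, 12, 11, 12, 13, 13, 14, 15]    # X1
quality = [3, 4, 3, 3, 4, 5, 6, 7, 7, 8]                 # X2

B0, B1, B2 = 17.949, 1.873, 1.915   # rounded coefficients from the text

fitted = [B0 + B1 * a + B2 * q for a, q in zip(advertising, quality)]
residuals = [y - f for y, f in zip(sales, fitted)]
sse = sum(e * e for e in residuals)  # sum of squared residuals

for year, (f, e) in enumerate(zip(fitted, residuals), start=1):
    print(f"{year:>2}  {f:7.3f}  {e:7.3f}  {e * e:9.6f}")
print("sum of squared residuals:", round(sse, 5))
```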
Tests of Significance
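• To test the hypothesis that $b$ is statistically significant (i.e. that
advertising positively affects sales), we first need to calculate
the standard error (deviation) of $b$:
$S_b = \sqrt{\dfrac{\sum (Y_t - \hat{Y}_t)^2}{(n - k) \sum (X_t - \bar{X})^2}}$
For advertising ($X_1$), with $\sum e_t^2 = 30.732$, $n = 10$, $k = 2$ and $\sum (X_1 - \bar{X}_1)^2 = 30$:
$S_b = \sqrt{\dfrac{30.732}{(10 - 2)(30)}} = 0.357$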
$t$-statistic or $t$ ratio
$t = \dfrac{1.873}{0.357} = 5.246$
• We compare the calculated $t$ ratio to the critical value of the $t$
distribution with $n - k = 10 - 2 = 8$ degrees of freedom ($df$) at the 5% level of
significance.
$t$-statistic or $t$ ratio
• The critical value is $t = 2.306$ for a two-tailed $t$ test.
• Our calculated value $t_c = 5.246$ exceeds the tabular value of
$t = 2.306$ for the 5% level of significance with 8 $df$:
$t_c > t$
$5.246 > 2.306$
• We therefore reject the null hypothesis that there is no relationship between
$X$ (advertising) and $Y$ (sales), and
• we accept the alternative hypothesis that there is a significant relationship
between $X$ and $Y$.
• It means that we are 95% confident that such a relationship exists.
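Tests of Significance
• To test the hypothesis that $b$ is statistically significant (i.e. that
quality control positively affects sales), we again first calculate
the standard error (deviation) of $b$:
$S_b = \sqrt{\dfrac{\sum (Y_t - \hat{Y}_t)^2}{(n - k) \sum (X_t - \bar{X})^2}}$
For quality control ($X_2$), with $\sum e_t^2 = 30.732$, $n = 10$, $k = 2$ and $\sum (X_2 - \bar{X}_2)^2 = 32$:
$S_b = \sqrt{\dfrac{30.732}{(10 - 2)(32)}} = 0.3464$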
$t$-statistic or $t$ ratio
$t = \dfrac{1.915}{0.3464} = 5.528$
• We compare the calculated $t$ ratio to the critical value of the $t$
distribution with $n - k = 10 - 2 = 8$ degrees of freedom ($df$) at the 5% level of
significance.
$t$-statistic or $t$ ratio
• The critical value is $t = 2.306$ for a two-tailed $t$ test.
• Our calculated value $t_c = 5.528$ exceeds the tabular value of
$t = 2.306$ for the 5% level of significance with 8 $df$:
$t_c > t$
$5.528 > 2.306$
• We therefore reject the null hypothesis that there is no relationship between
$X$ (quality control) and $Y$ (sales), and
• we accept the alternative hypothesis that there is a significant relationship
between $X$ and $Y$.
• It means that we are 95% confident that such a relationship exists.
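Both significance tests follow the same three steps: compute $S_b$, divide the coefficient by it, and compare the ratio with the critical value. The sketch below applies the lecture's formula for $S_b$ to both predictors; the function name `t_ratio` is hypothetical, and the critical value 2.306 is simply the tabulated figure quoted in the text, not something the code derives. Because $S_b$ is kept unrounded here, the ratios come out near 5.23 and 5.53 rather than matching the hand-computed values to the third decimal.

```python
import math

T_CRITICAL = 2.306  # two-tailed, 5% level, 8 df (value quoted in the text)


def t_ratio(coefficient, sse, n, k, sum_sq_dev):
    """Return (S_b, t) with S_b = sqrt(SSE / ((n - k) * sum of squared deviations))."""
    s_b = math.sqrt(sse / ((n - k) * sum_sq_dev))
    return s_b, coefficient / s_b


# Advertising (X1): coefficient 1.873, sum of squared deviations 30.
s_b1, t1 = t_ratio(1.873, 30.732, 10, 2, 30)
# Quality control (X2): coefficient 1.915, sum of squared deviations 32.
s_b2, t2 = t_ratio(1.915, 30.732, 10, 2, 32)

for label, s_b, t in (("advertising", s_b1, t1), ("quality control", s_b2, t2)):
    decision = "reject H0" if abs(t) > T_CRITICAL else "do not reject H0"
    print(f"{label}: S_b = {s_b:.4f}, t = {t:.3f}, {decision}")
```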
Coefficient of determination ($R^2$)
$R^2$ measures how much of the variation in the firm's sales is explained by the variation
in its advertising expenditures and quality control.
$R^2 = \dfrac{\beta_0 \sum y + \beta_1 \sum X_1 y + \beta_2 \sum X_2 y - \frac{(\sum y)^2}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}}$
$R^2 = \dfrac{17.949(500) + 1.873(6106) + 1.915(2610) - \frac{(500)^2}{10}}{25440 - \frac{(500)^2}{10}} = \dfrac{409.188}{440}$
$R^2 = 0.930$
A value of $R^2 = 0.930$ indicates that 93.0% of the variability in $y$ is explained by its linear
relationship with the independent variables $X_1$, $X_2$, and only 7% of the variation is due to other
factors, which are not part of this model.
Coefficient of correlation ($r$)
$r = \sqrt{R^2}$
This is simply a measure of the degree of association or
covariation that exists between variables $X_1$, $X_2$ and $Y$. For our
advertising-sales example,
$r = \sqrt{R^2} = \sqrt{0.930} = 0.964$
A value this close to 1 indicates a very strong linear association between the predictors $X_1$, $X_2$ and $Y$.
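The coefficient of determination can be evaluated from the column totals alone, and cross-checked against the equivalent form $R^2 = 1 - \sum e_t^2 / \left(\sum y^2 - (\sum y)^2 / n\right)$. The sketch below does both with the totals from the worked example ($\sum X_2 y = 2610$); the function name is hypothetical, and the two results differ slightly in the fourth decimal because the coefficients are rounded.

```python
import math


def r_squared_from_sums(b0, b1, b2, sum_y, sum_x1y, sum_x2y, sum_y2, n):
    """R^2 = (b0*Sy + b1*Sx1y + b2*Sx2y - Sy^2/n) / (Sy2 - Sy^2/n)."""
    correction = sum_y ** 2 / n
    explained = b0 * sum_y + b1 * sum_x1y + b2 * sum_x2y - correction
    total = sum_y2 - correction
    return explained / total


r2 = r_squared_from_sums(17.949, 1.873, 1.915, 500, 6106, 2610, 25440, 10)

# Cross-check: 1 - (sum of squared residuals) / (total sum of squares).
r2_check = 1 - 30.732 / (25440 - 500 ** 2 / 10)

r = math.sqrt(r2)  # coefficient of correlation

print(round(r2, 3), round(r2_check, 3), round(r, 3))
```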
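Problem: $R^2$
Compute the coefficient of determination.
$n = 5$, $\sum y = 89$, $\sum y^2 = 1885$, $\sum X_1 y = 619$, $\sum X_2 y = 1007$
$\beta_0 = -1.33$, $\beta_1 = 0.38$, $\beta_2 = 1.62$
$R^2 = \dfrac{\beta_0 \sum y + \beta_1 \sum X_1 y + \beta_2 \sum X_2 y - \frac{(\sum y)^2}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}}$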
Standard Deviation of Regression or
Standard Error of Estimate
• Not all the observed values of ($y$, $X_1$, $X_2$) fall on the regression
line; they scatter away from it.
• The standard error of estimate is the standard deviation of the
multiple regression.
• It measures the dispersion of the $y$ values about the population
multiple regression equation.
• For a multiple regression with two independent variables
$X_1$ and $X_2$, it is denoted by $\sigma_{y12}$. Here 1 and 2 indicate
$X_1$ and $X_2$.
• The sample standard error of estimate is denoted by $S_{y12}$.
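Standard Deviation of Regression or
Standard Error of Estimate
$\sigma_{y12} = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 3}}$
$\sigma_{y12} = \sqrt{\dfrac{\sum y^2 - \beta_0 \sum y - \beta_1 \sum X_1 y - \beta_2 \sum X_2 y}{n - 3}}$
$\sigma_{y12} = \sqrt{\dfrac{25440 - 17.949(500) - 1.873(6106) - 1.915(2610)}{10 - 3}}$
$\sigma_{y12} \approx 2.10$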
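The computational form of the standard error of estimate needs only the column totals and the three coefficients. The following is a minimal sketch with a hypothetical function name; the divisor $n - 3$ reflects the three estimated coefficients of the two-predictor model.

```python
import math


def standard_error_of_estimate(b0, b1, b2, sum_y, sum_x1y, sum_x2y, sum_y2, n):
    """sqrt((Sy2 - b0*Sy - b1*Sx1y - b2*Sx2y) / (n - 3)) for two predictors."""
    unexplained = sum_y2 - b0 * sum_y - b1 * sum_x1y - b2 * sum_x2y
    return math.sqrt(unexplained / (n - 3))


# Totals and rounded coefficients from the worked example.
se = standard_error_of_estimate(17.949, 1.873, 1.915, 500, 6106, 2610, 25440, 10)
print(round(se, 3))
```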
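Practice Problem:
Compute the standard error of estimate.
$n = 5$, $\sum y = 89$, $\sum y^2 = 1885$, $\sum X_1 y = 619$, $\sum X_2 y = 1007$
$\beta_0 = -1.33$, $\beta_1 = 0.38$, $\beta_2 = 1.62$
$\sigma_{y12} = \sqrt{\dfrac{\sum y^2 - \beta_0 \sum y - \beta_1 \sum X_1 y - \beta_2 \sum X_2 y}{n - 3}}$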
Practice Problem

Year   Sales ($y$)   Advertising Expense ($X_1$)   Quality Control ($X_2$)
1      30            10                            15
2      22            5                             8
3      16            10                            12
4      7             3                             7
5      14            2                             10
Practice Problem:
Solve the multiple regression problem using the following data:
$n = 5$
$\sum y = 89$, $\sum X_1 = 30$, $\sum X_2 = 52$
$\sum X_1 y = 619$, $\sum X_2 y = 1007$, $\sum X_1 X_2 = 351$
$\sum X_1^2 = 238$, $\sum X_2^2 = 582$
Acknowledgment
• [Peter Andrew Bruce] Practical Statistics for Data Scientists
• [David Forsyth] Probability and Statistics for Computer Science
• [Michael Baron] Probability and Statistics for Computer Scientists