
Business Statistics:

A First Course
Fifth Edition

Chapter 14

Multiple Regression
Learning Objectives

In this chapter, you learn:


 How to develop a multiple regression model
 How to interpret the regression coefficients
 How to determine which independent variables to
include in the regression model
 How to determine which independent variables are more
important in predicting a dependent variable
 How to use categorical variables in a regression model
The Multiple Regression
Model

Idea: Examine the linear relationship between


1 dependent (Y) & 2 or more independent variables (Xi)

Multiple Regression Model with k Independent Variables:

Yi = β0 + β1X1i + β2X2i + … + βkXki + εi

(β0 = Y-intercept; β1, …, βk = population slopes; εi = random error)
Multiple Regression Equation

The coefficients of the multiple regression model are


estimated using sample data

Multiple regression equation with k independent variables:

Ŷi = b0 + b1X1i + b2X2i + … + bkXki

(Ŷi = estimated (predicted) value of Y; b0 = estimated intercept; b1, …, bk = estimated slope coefficients)
Example:
2 Independent Variables
 A distributor of frozen dessert pies wants to
evaluate factors thought to influence demand
 Dependent variable: Pie sales (units per week)
 Independent variables: Price (in $) and Advertising (in $100s)

 Data are collected for 15 weeks


Pie Sales Example

Week   Pie Sales   Price ($)   Advertising ($100s)
1      350         5.50        3.3
2      460         7.50        3.3
3      350         8.00        3.0
4      430         8.00        4.5
5      350         6.80        3.0
6      380         7.50        4.0
7      430         4.50        3.0
8      470         6.40        3.7
9      450         7.00        3.5
10     490         5.00        4.0
11     340         7.20        3.5
12     300         7.90        3.2
13     440         5.90        4.0
14     450         5.00        3.5
15     300         7.00        2.7

Multiple regression equation:
Sales = b0 + b1 (Price) + b2 (Advertising)
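The coefficients shown in the Excel output can also be reproduced by hand. A minimal sketch in plain Python (the data are the 15 weeks from the table above; `solve3` is an illustrative helper, not part of any library), solving the least-squares normal equations directly:

```python
# Pie sales data: 15 weeks from the table above
sales  = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price  = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
advert = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]

def solve3(A, y):
    """Solve a 3x3 linear system A b = y by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))  # pivot row
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            M[r] = [a - f * b for a, b in zip(M[r], M[i])]
    b = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):  # back substitution
        b[i] = (M[i][3] - sum(M[i][j] * b[j] for j in range(i + 1, 3))) / M[i][i]
    return b

# Design matrix columns: intercept, price, advertising
X = [[1.0, p, a] for p, a in zip(price, advert)]
# Least-squares normal equations: (X'X) b = X'y
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * s for row, s in zip(X, sales)) for i in range(3)]
b0, b1, b2 = solve3(XtX, Xty)
print(round(b0, 3), round(b1, 3), round(b2, 3))  # ≈ 306.526 -24.975 74.131
```

In practice the normal equations are solved by statistical software, as the next slides show; this sketch only makes the underlying computation concrete.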
Estimating a Multiple Linear
Regression Equation
 Computer software is generally used to generate
the coefficients and measures of goodness of fit
for multiple regression

 Excel:
 Data / Data Analysis / Regression
Excel Multiple Regression Output
Regression Statistics
Multiple R 0.72213

R Square 0.52148
Adjusted R Square 0.44172
Standard Error 47.46341    Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
Observations 15

ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888
The Multiple Regression Equation

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)


where
Sales is in number of pies per week
Price is in $
Advertising is in $100’s.
b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price

b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising
Pie Sales Correlation Matrix
Pie Sales Price Advertising
Pie Sales 1
Price -0.44327 1
Advertising 0.55632 0.03044 1

 Price vs. Sales : r = -0.44327


 There is a negative association between
price and sales
 Advertising vs. Sales : r = 0.55632
 There is a positive association between
advertising and sales
Scatter Diagrams

[Scatter plot: Sales vs. Price]
[Scatter plot: Sales vs. Advertising]
Using The Equation to Make
Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62

Predicted sales is 428.62 pies.

Note that Advertising is in $100s, so $350 means that X2 = 3.5.
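The prediction above is a one-line computation once the fitted coefficients are in hand (values taken from the Excel output on the earlier slide):

```python
# Fitted equation from the regression output above
b0, b1, b2 = 306.526, -24.975, 74.131

price = 5.50        # selling price in $
advertising = 3.5   # $350 of advertising, expressed in $100s

predicted_sales = b0 + b1 * price + b2 * advertising
print(round(predicted_sales, 2))  # 428.62
```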
Multiple Coefficient of
Determination (R2)
 Reports the proportion of total variation in y
explained by all x variables taken together

R² = SSR / SST

where SSR = regression sum of squares and SST = total sum of squares
Multiple Coefficient of
Determination
(continued)
Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

R² = SSR / SST = 29460.0 / 56493.3 = 0.52148

52.1% of the variation in pie sales is explained by the variation in price and advertising
ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888
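A quick check of the R² computation, using the sums of squares from the ANOVA table above:

```python
# Sums of squares from the ANOVA table
SSR = 29460.027   # regression sum of squares
SST = 56493.333   # total sum of squares

r_squared = SSR / SST
print(round(r_squared, 5))  # 0.52148
```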
Adjusted R2
 R2 never decreases when a new x variable is
added to the model
 This can be a disadvantage when comparing models
Adjusted r2
(continued)
 Shows the proportion of variation in Y explained
by all X variables adjusted for the number of X
variables used and sample size
 2  n  1 
r 2
adj  1  (1  r ) 
  n  k  1 
(where n = sample size, k = number of independent variables
r² =sample r- square)

In the example:
n=15, k=3, r²= 0.52148
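The adjusted r² formula can be verified directly; note that k = 2 here, since the model has two independent variables (price and advertising):

```python
n = 15   # sample size
k = 2    # independent variables: price and advertising

r_sq = 29460.027 / 56493.333   # r² = SSR/SST from the ANOVA table
r_sq_adj = 1 - (1 - r_sq) * ((n - 1) / (n - k - 1))
print(round(r_sq_adj, 5))  # 0.44172
```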
Adjusted r2 in Excel
Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

r²adj = 0.44172

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables
ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888
Is the Model Significant?
 F Test for Overall Significance of the Model
 Shows if there is a linear relationship between all
of the X variables considered together and Y
 Use F-test statistic
 Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent
variable affects Y)
F Test for Overall Significance
 Test statistic:

F_STAT = MSR / MSE = (SSR / k) / (SSE / (n - k - 1))

where F_STAT has numerator d.f. = k and denominator d.f. = (n - k - 1)
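The F statistic can be reproduced from the ANOVA quantities in the output above:

```python
# ANOVA quantities from the regression output
SSR, SSE = 29460.027, 27033.306
n, k = 15, 2

MSR = SSR / k            # mean square regression, ≈ 14730.013
MSE = SSE / (n - k - 1)  # mean square error, ≈ 2252.776
F_stat = MSR / MSE
print(round(F_stat, 4))  # 6.5386
```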
F Test for Overall Significance
(continued)

H0: β1 = β2 = 0
H1: β1 and β2 not both zero

α = .05
df1 = 2   df2 = 12

Critical Value: F0.05 = 3.885

Test Statistic:
F_STAT = MSR / MSE = 6.5386

Decision: Since the F_STAT test statistic is in the rejection region (p-value < .05), reject H0

Conclusion: There is evidence that at least one independent variable affects Y
Are Individual Variables
Significant?
 Use t tests of individual variable slopes
 Shows if there is a linear relationship between
the variable Xj and Y holding constant the effects
of other X variables
 Hypotheses:
 H0: βj = 0 (no linear relationship)
 H1: βj ≠ 0 (linear relationship does exist
between Xj and Y)
Are Individual Variables
Significant?
(continued)

H0: βj = 0 (no linear relationship)


H1: βj ≠ 0 (linear relationship does exist
between Xj and Y)

Test Statistic:

t_STAT = (bj - 0) / Sbj      (df = n - k - 1)
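A one-line check of this statistic for the Price slope, using the coefficient and standard error from the Excel output on the next slide:

```python
# Coefficient and standard error for Price from the regression output
b_price = -24.97509
se_price = 10.83213

t_stat = (b_price - 0) / se_price
print(round(t_stat, 4))  # ≈ -2.3056
```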
Are Individual Variables
Significant? Excel Output
(continued)
Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

t Stat for Price is tSTAT = -2.306, with p-value .0398
t Stat for Advertising is tSTAT = 2.855, with p-value .0145

ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888
Inferences about the Slope:
t Test Example
H0: βj = 0
H1: βj ≠ 0

d.f. = 15 - 2 - 1 = 12
α = .05
tα/2 = 2.1788

From the Excel and Minitab output:
For Price, tSTAT = -2.306, with p-value .0398
For Advertising, tSTAT = 2.855, with p-value .0145

The test statistic for each variable falls in the rejection region (p-values < .05)

Decision: Reject H0 for each variable
Conclusion: There is evidence that both Price and Advertising affect pie sales at α = .05
Confidence Interval Estimate
for the Slope
Confidence interval for the population slope βj:

bj ± tα/2 Sbj      where t has (n - k - 1) d.f.

             Coefficients   Standard Error
Intercept    306.52619      114.25389
Price        -24.97509      10.83213
Advertising  74.13096       25.96732

Here, t has (15 - 2 - 1) = 12 d.f.

Example: Form a 95% confidence interval for the effect of changes in


price (X1) on pie sales:
-24.975 ± (2.1788)(10.832)
So the interval is (-48.576 , -1.374)
(This interval does not contain zero, so price has a significant effect on sales)
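The interval endpoints can be verified directly (values from the slides above):

```python
# 95% CI for the Price slope
b_price = -24.975
se_price = 10.832
t_crit = 2.1788     # tα/2 for α = .05, d.f. = 12

lower = b_price - t_crit * se_price
upper = b_price + t_crit * se_price
print(round(lower, 3), round(upper, 3))  # -48.576 -1.374
```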
Confidence Interval Estimate
for the Slope
(continued)
Confidence interval for the population slope βj

Coefficients Standard Error … Lower 95% Upper 95%


Intercept 306.52619 114.25389 … 57.58835 555.46404
Price -24.97509 10.83213 … -48.57626 -1.37392
Advertising 74.13096 25.96732 … 17.55303 130.70888

 Example: Excel output also reports these interval endpoints:

Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each $1 increase in the selling price, holding the effect of advertising constant
Chapter Summary
 Understood the multiple regression concept
 Developed the multiple regression model
 Tested the significance of the multiple regression model
 Discussed adjusted r2
 Tested individual regression coefficients
