Chapter 13 Part 1
Regression Analysis
▪ Regression analysis is used to:
▪ Predict the value of a dependent variable based on the
value of at least one independent variable
▪ Explain the impact of changes in an independent
variable on the dependent variable
Dependent variable: the variable we wish to predict
or explain
Independent variable: the variable used to explain
the dependent variable
Simple Linear Regression Model

  Yi = β0 + β1 Xi + εi

where Yi is the dependent variable, Xi is the independent variable, β0 is the population Y intercept, β1 is the population slope coefficient, and εi is the random error term. The term β0 + β1 Xi is the linear component; εi is the random error component.

[Diagram: the population regression line gives the predicted value of Y for Xi; the random error εi is the vertical distance from an observed Yi to the line]
Simple Linear Regression Equation (Prediction Line)

The simple linear regression equation provides an estimate of the population regression line:

  Ŷi = b0 + b1 Xi

where Ŷi is the estimated (or predicted) Y value for observation i, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and Xi is the value of X for observation i.

The least squares method finds the b0 and b1 that minimize the sum of squared differences between the observed values Yi and the predicted values Ŷi:

  min Σ(Yi − Ŷi)² = min Σ(Yi − (b0 + b1 Xi))²
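For example, the least squares computation can be sketched in a few lines of Python (the data values below are made up purely for illustration):

    # Minimal sketch of the least squares fit; x and y are made-up illustration data.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]

    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # b1 = Σ(Xi - Xbar)(Yi - Ybar) / Σ(Xi - Xbar)^2   and   b0 = Ybar - b1 * Xbar
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar

    y_hat = [b0 + b1 * xi for xi in x]                     # prediction line: Yhat_i = b0 + b1 * Xi
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # the sum of squared errors being minimized
    print(b0, b1, sse)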
Excel Output

Regression Statistics
  Multiple R          0.76211
  R Square            0.58082
  Adjusted R Square   0.52842
  Standard Error      41.33032
  Observations        10

The regression equation is:

  house price = 98.24833 + 0.10977 (square feet)

ANOVA
              df   SS          MS          F        Significance F
  Regression   1   18934.9348  18934.9348  11.0848  0.01039
  Residual     8   13665.5652  1708.1957
  Total        9   32600.5000

              Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
  Intercept      98.24833   58.03348        1.69296  0.12892  -35.57720  232.07386
  Square Feet     0.10977    0.03297        3.32938  0.01039    0.03374    0.18580
Graphical Presentation

[Scatter plot of house price ($1000s) vs. square feet, with the fitted regression line; intercept b0 ≈ 98.25, slope b1 = 0.10977]

Predicting with the regression equation:

  house price = 98.25 + 0.1098(2000) = 317.85

The predicted price for a house with 2000 square feet is 317.85 ($1000s) = $317,850.
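As a quick check, the same prediction in Python, using the rounded coefficients from the slide:

    # Predicted price for a 2000-square-foot house, from the rounded coefficients.
    b0, b1 = 98.25, 0.1098        # intercept and slope (price in $1000s, size in square feet)
    price = b0 + b1 * 2000        # point prediction from the regression line
    print(round(price, 2))        # 317.85  ->  $317,850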
Measures of Variation

  SST = Σ(Yi − Ȳ)²      total sum of squares
  SSR = Σ(Ŷi − Ȳ)²      regression sum of squares
  SSE = Σ(Yi − Ŷi)²     error sum of squares

where:
  Ȳ  = average value of the dependent variable
  Yi = observed values of the dependent variable
  Ŷi = predicted value of Y for the given Xi value

[Graph: at a given Xi, the total deviation Yi − Ȳ splits into the explained part Ŷi − Ȳ and the unexplained part Yi − Ŷi]
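These three quantities satisfy SST = SSR + SSE, which can be verified directly from the ANOVA values in the Excel output, for example:

    # The sums of squares from the Excel ANOVA table satisfy SST = SSR + SSE.
    SSR = 18934.9348        # regression sum of squares
    SSE = 13665.5652        # error (residual) sum of squares
    SST = 32600.5000        # total sum of squares
    print(round(SSR + SSE, 4))   # 32600.5, matching the total sum of squares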
Coefficient of Determination, r²

▪ The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
▪ The coefficient of determination is also called r-squared and is denoted as r²

  r² = SSR / SST = regression sum of squares / total sum of squares

  note: 0 ≤ r² ≤ 1
Excel Output

  r² = SSR / SST = 18934.9348 / 32600.5000 = 0.58082

58.08% of the variation in house prices is explained by variation in square feet. (Excel reports this value as R Square = 0.58082 in the Regression Statistics.)
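As a quick check, the same value also follows from SSR/SST and from squaring the Multiple R reported by Excel:

    # r-squared from the ANOVA sums of squares, and from the reported Multiple R.
    SSR, SST = 18934.9348, 32600.5000
    print(round(SSR / SST, 5))        # 0.58082  (R Square)
    print(round(0.76211 ** 2, 5))     # 0.58081  (Multiple R squared; same up to rounding)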
Standard Error of Estimate

  SYX = √( SSE / (n − 2) ) = √( Σ(Yi − Ŷi)² / (n − 2) )

where SSE is the error sum of squares and n is the sample size.
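For the house price data, SYX can be verified from the ANOVA output, for example:

    # Standard error of the estimate from the ANOVA output.
    import math
    SSE, n = 13665.5652, 10
    s_yx = math.sqrt(SSE / (n - 2))
    print(round(s_yx, 5))   # 41.33032, reported as "Standard Error" in the Regression Statistics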
Assumptions of Regression
Use the acronym LINE:
▪ Linearity
▪ The underlying relationship between X and Y is linear
▪ Independence of Errors
▪ Error values are statistically independent
▪ Normality of Error
▪ Error values (ε) are normally distributed for any given value of X
▪ Equal Variance
▪ The probability distribution of the errors has constant variance
Measuring Autocorrelation: The Durbin-Watson Statistic

[Time-series plot of residuals: the residuals follow a cyclical pattern rather than a random one; cyclical patterns are a sign of positive autocorrelation]

▪ The Durbin-Watson statistic D (calculated below) is used to test for positive autocorrelation; a value of D less than 2 may signal positive autocorrelation.
▪ Decision rule, using the critical values dL and dU (with 0 < dL < dU < 2): reject H0 (conclude positive autocorrelation) if D < dL, do not reject H0 if D > dU, and the test is inconclusive if dL ≤ D ≤ dU.
Testing for Positive Autocorrelation (continued)

[Scatter plot of sales over time with fitted trend line y = 30.65 + 4.7038x, R² = 0.8976]

▪ Is there autocorrelation?
Testing for Positive Autocorrelation (continued)

[Same sales-over-time plot: y = 30.65 + 4.7038x, R² = 0.8976]

The Durbin-Watson statistic is calculated from the residuals ei and the sum of squared residuals:

  D = Σ(i=2 to n) (ei − ei−1)² / Σ(i=1 to n) ei² = 3296.18 / 3279.98 = 1.00494
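A minimal Python sketch of this calculation (the residual values below are made up for illustration, not the ones behind the 1.00494 figure):

    # Durbin-Watson statistic from a list of residuals (made-up illustration values).
    residuals = [2.5, 3.1, 1.8, -0.4, -2.2, -3.0, -1.1, 0.9, 2.6, 1.7]

    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    D = num / den      # D = sum(ei - ei-1)^2 / sum(ei^2); values well below 2 suggest positive autocorrelation
    print(round(D, 5))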
The standard error of the estimate, SYX = √( SSE / (n − 2) ), is used in computing the standard error of the slope, Sb1.
Excel Output

Regression Statistics
  Multiple R          0.76211
  R Square            0.58082
  Adjusted R Square   0.52842
  Standard Error      41.33032
  Observations        10

  Sb1 = 0.03297   (the standard error of the slope, from the Square Feet row of the coefficients table)
The t statistic for the slope (testing H0: β1 = 0):

  t = (b1 − β1) / Sb1 = (0.10977 − 0) / 0.03297 = 3.32938       d.f. = n − 2 = 8
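As a quick check of the t statistic and its two-tailed p-value (assuming SciPy is available for the t distribution):

    # t test for the slope: t = (b1 - 0) / Sb1, two-tailed p-value with n - 2 d.f.
    from scipy import stats
    b1, s_b1, n = 0.10977, 0.03297, 10
    t_stat = (b1 - 0) / s_b1
    p_value = 2 * stats.t.sf(abs(t_stat), n - 2)
    print(round(t_stat, 4))    # 3.3294 (Excel reports 3.32938 from unrounded inputs)
    print(round(p_value, 4))   # about 0.0104, matching the P-value 0.01039 in the output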
F Test for Significance

  F = MSR / MSE     where MSR = SSR / k and MSE = SSE / (n − k − 1)

F follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom, where k is the number of independent variables in the regression model.
Excel Output

  F = MSR / MSE = 18934.9348 / 1708.1957 = 11.0848

With 1 and 8 degrees of freedom (Significance F = 0.01039 in the ANOVA output).
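As a quick check of the F statistic from the ANOVA values:

    # F statistic for overall significance: F = MSR / MSE.
    MSR, MSE = 18934.9348, 1708.1957
    print(round(MSR / MSE, 4))   # 11.0848, with 1 and 8 degrees of freedom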
F Test for Significance (continued)

  H0: β1 = 0          α = 0.05          Critical value: F0.05 = 5.32 (d.f. = 1, 8)

Decision: Reject H0 at α = 0.05, since F = 11.08 falls in the rejection region beyond F0.05 = 5.32.

Conclusion: There is sufficient evidence that house size affects selling price.
Confidence Interval Estimate for the Slope

  b1 ± t(n−2) Sb1          d.f. = n − 2
Excel Printout for House Prices:

              Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
  Intercept      98.24833   58.03348        1.69296  0.12892  -35.57720  232.07386
  Square Feet     0.10977    0.03297        3.32938  0.01039    0.03374    0.18580
Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.70 and $185.80 per square foot of house size.
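The interval endpoints can be reproduced from b1, Sb1, and the t critical value (assuming SciPy is available for the critical value):

    # 95% confidence interval for the slope: b1 +/- t(n-2, 0.975) * Sb1.
    from scipy import stats
    b1, s_b1, n = 0.10977, 0.03297, 10
    t_crit = stats.t.ppf(0.975, n - 2)                  # about 2.3060 for d.f. = 8
    lower, upper = b1 - t_crit * s_b1, b1 + t_crit * s_b1
    print(f"{lower:.5f} {upper:.5f}")                   # 0.03374 0.18580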
Test Solution: t Test for the Correlation Coefficient

  t = (r − ρ) / √( (1 − r²) / (n − 2) ) = (.762 − 0) / √( (1 − .762²) / (10 − 2) ) = 3.329

  d.f. = 10 − 2 = 8       α/2 = .025       critical values: ±tα/2 = ±2.3060

Decision: Reject H0, since t = 3.329 > 2.3060.

Conclusion: There is evidence of a linear association at the 5% level of significance.
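As a quick check, using r = 0.76211 (the Multiple R from the Excel output):

    # t test for the correlation coefficient: t = r / sqrt((1 - r^2) / (n - 2)).
    import math
    r, n = 0.76211, 10
    t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))
    print(round(t_stat, 3))   # 3.329, beyond the critical value 2.3060, so reject H0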