Introduction To Linear Regression and Correlation Analysis
Scatter Plot Examples
[Figure: scatter plots of y versus x]
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 14-3
Scatter Plot Examples
(continued)
[Figure: two sets of scatter plots of y versus x, labeled "Strong relationships" and "Weak relationships"]
Scatter Plot Examples
(continued)
[Figure: scatter plots of y versus x labeled "No relationship"]
Correlation Coefficient
(continued)
[Figure: scatter plots of y versus x illustrating r = -1, r = -.6, r = 0, r = +.3, and r = +1]
Calculating the Correlation Coefficient
Sample correlation coefficient:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$$
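A minimal Python sketch of the definitional formula above, working directly from deviations about the means. The function name `pearson_r` and the toy data are illustrative assumptions, not taken from the slides.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient via the definitional (deviation) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squared deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: a perfect positive and a perfect negative linear pattern
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

Because r uses only deviations from the means, it is unchanged if x or y is shifted or rescaled by a positive constant.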
or, in the algebraically equivalent computational form:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
Tree example (n = 8):
$$r = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886$$
[Figure: Correlation between Tree Height and Trunk Diameter; scatter plot of Tree Height, y (0 to 70) versus Trunk Diameter, x (0 to 14)]
r = 0.886 → relatively strong positive linear association between x and y
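The computational form needs only six summary quantities, so the tree example can be checked directly from the sums plugged into the formula above. This is a sketch; the function name `pearson_r_from_sums` is a hypothetical helper.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Sample correlation coefficient via the computational (raw-sums) formula."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Summary sums for the tree example (n = 8), as used in the calculation above
r = pearson_r_from_sums(n=8, sum_x=73, sum_y=321, sum_xy=3142, sum_x2=713, sum_y2=14111)
print(round(r, 3))  # 0.886
```

The raw-sums form is convenient by hand, but with very large values it can lose floating-point precision compared with the deviation form.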
Example: Tree Height and Trunk Diameter
Is there evidence of a linear relationship between tree height and trunk diameter at the .05 level of significance?
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.886}{\sqrt{\dfrac{1 - .886^2}{8 - 2}}} = 4.68$$
Example: Test Solution
With r = .886 and n = 8, the test statistic is t = 4.68.
Decision: Reject H0
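A short sketch of the test statistic and decision rule for the tree example. The critical value is an assumption not shown on the slides: t.025 with n - 2 = 6 degrees of freedom, taken from a standard t table (2.447).

```python
import math


def correlation_t(r, n):
    """t statistic for testing whether the population correlation is zero (n - 2 d.f.)."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


r, n = 0.886, 8
t = correlation_t(r, n)
# Assumed two-tailed critical value t_.025 with 6 d.f., from a standard t table
t_crit = 2.447
print(round(t, 2))  # 4.68
print("Reject H0" if abs(t) > t_crit else "Do not reject H0")
```

Since 4.68 exceeds 2.447, the sketch reaches the same decision as the slide.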
Population linear regression model:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
[Figure: population regression line with Intercept = β0 and Slope = β1; at xi, the random error εi for this x value separates the observed value of y for xi from the predicted value of y for xi]
Estimated Regression Model
The sample regression line provides an estimate of
the population regression line
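$$\hat{y}_i = b_0 + b_1 x_i$$
where ŷi is the estimated (predicted) value of y, b0 and b1 are the estimates of the intercept and slope, and xi is the value of the independent variable.
The least squares estimates b0 and b1 are the values that minimize the sum of squared residuals:
$$\sum e^2 = \sum (y - \hat{y})^2 = \sum \left(y - (b_0 + b_1 x)\right)^2$$
The slope is
$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
or, using the algebraic equivalent,
$$b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}$$
and the intercept is
$$b_0 = \bar{y} - b_1 \bar{x}$$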
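A minimal sketch of the least squares formulas above, computing the slope both from deviations and from the raw-sums algebraic equivalent so the two can be compared. The helper names and toy data are hypothetical.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for y-hat = b0 + b1 * x using deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x-bar, y-bar)
    return b0, b1


def least_squares_raw_sums(xs, ys):
    """Same estimates via the algebraic equivalent that uses raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Hypothetical toy data
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
print(least_squares(xs, ys))
print(least_squares_raw_sums(xs, ys))
```

For this toy data both versions give b1 = 1.97 and b0 = 1.09, up to floating-point rounding.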
ANOVA
              df   SS           MS           F         Significance F
Regression     1   18934.9348   18934.9348   11.0848   0.01039
Residual       8   13665.5652    1708.1957
Total          9   32600.5000
[Figure: house price ($1,000s) versus Square Feet (0 to 3,000) with the fitted regression line; Intercept = 98.248, Slope = 0.10977]
Estimated regression equation: house price = 98.248 + 0.10977 (square feet)
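A sketch that refits the house price model and reproduces the ANOVA sums of squares. Assumption: the ten data pairs are not listed on the slides; they are recovered from the residual output at the end of this chapter (price = predicted value + residual, and square feet back-solved from the fitted line and rounded). The helper name `fit_and_anova` is hypothetical.

```python
# Data recovered from the chapter's residual output (assumption, see lead-in)
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # $1,000s


def fit_and_anova(xs, ys):
    """Least squares fit plus the regression, error, and total sums of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)              # total variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained (residual)
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # explained by regression
    return b0, b1, ssr, sse, sst


b0, b1, ssr, sse, sst = fit_and_anova(square_feet, price)
print(f"intercept = {b0:.3f}, slope = {b1:.5f}")  # intercept = 98.248, slope = 0.10977
print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, SST = {sst:.4f}")
```

The printed sums match the ANOVA table (18934.9348, 13665.5652, 32600.5000), and SSR + SSE = SST.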
Coefficient of Determination, R²
The coefficient of determination is the proportion of the total variation in the dependent variable that is explained by variation in the independent variable:
$$R^2 = \frac{SSR}{SST}, \qquad 0 \le R^2 \le 1$$
Coefficient of Determination, R²
(continued)
$$R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares explained by regression}}{\text{total sum of squares}}$$
In simple linear regression,
$$R^2 = r^2$$
where:
R² = Coefficient of determination
r = Simple correlation coefficient
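A quick check of R² = SSR/SST and R² = r² using the sums of squares from the house price ANOVA table. Taking the square root gives only the magnitude of r; its sign follows the sign of the slope, which is positive here.

```python
import math

ssr, sst = 18934.9348, 32600.5000  # from the ANOVA table
r_squared = ssr / sst
r = math.sqrt(r_squared)  # magnitude of r; sign taken from the (positive) slope
print(round(r_squared, 5), round(r, 5))  # 0.58082 0.76211
```

These values agree with "R Square" and "Multiple R" in the Excel regression statistics for the same model.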
Examples of Approximate R² Values
[Figure: scatter plots of y versus x with R² = 1 and with 0 < R² < 1]
Examples of Approximate R² Values
(continued)
[Figure: scatter plot of y versus x with R² = 0]
R² = 0: no linear relationship between x and y
Test statistic for the F test:
$$F = \frac{SSR/1}{SSE/(n - 2)}$$
with D1 = 1 and D2 = n - 2 degrees of freedom.
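The F statistic for the house price model, recomputed from the ANOVA sums of squares (n = 10 observations). This is a sketch of the arithmetic only; the p-value ("Significance F") is taken from the Excel output rather than computed.

```python
ssr, sse, n = 18934.9348, 13665.5652, 10
msr = ssr / 1        # mean square for regression (D1 = 1)
mse = sse / (n - 2)  # mean square error (D2 = n - 2 = 8)
f_stat = msr / mse
print(round(f_stat, 4))  # 11.0848
```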
Standard error of the estimate:
$$s_\varepsilon = \sqrt{\frac{SSE}{n - 2}}$$
where
SSE = Sum of squares error
n = Sample size
Standard error of the least squares slope:
$$s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}}$$
where:
sb1 = Estimate of the standard error of the least squares slope
sε = √(SSE / (n - 2)) = Sample standard error of the estimate
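A sketch of both standard errors for the house price model. SSE comes from the ANOVA table; the square-feet values are an assumption, recovered from the chapter's residual output (back-solved from the fitted line and rounded), since the slides do not list them.

```python
import math

# Square feet recovered from the residual output (assumption, see lead-in)
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
sse, n = 13665.5652, len(square_feet)

s_e = math.sqrt(sse / (n - 2))  # standard error of the estimate
# Sum of squared deviations of x, using the raw-sums form from the formula above
sxx = sum(x * x for x in square_feet) - sum(square_feet) ** 2 / n
s_b1 = s_e / math.sqrt(sxx)  # standard error of the slope
print(round(s_e, 5), round(s_b1, 5))  # 41.33032 0.03297
```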
Excel Output
Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032   (sε = 41.33032)
Observations         10
Standard error of the slope: sb1 = 0.03297
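Test for the slope: d.f. = 10 - 2 = 8, with α/2 = .025 in each tail
[Figure: t distribution with "Reject H0" regions below -tα/2 and above tα/2 and the "Do not reject H0" region between them]
Decision: Reject H0
Conclusion: There is sufficient evidence that the slope β1 is not zero.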
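A sketch of the slope t test using the slope (0.10977) and its standard error (0.03297) reported for the house price model. The critical value is an assumption not shown on the slides: t.025 with 8 degrees of freedom from a standard t table (2.3060).

```python
b1, s_b1 = 0.10977, 0.03297  # slope and its standard error for the house price model
t_stat = b1 / s_b1           # t statistic for H0: beta1 = 0
t_crit = 2.3060              # assumed t_.025 with 8 d.f., from a standard t table
print(round(t_stat, 3))      # 3.329
print("Reject H0" if abs(t_stat) > t_crit else "Do not reject H0")
```

In simple regression, t² equals the ANOVA F statistic; here t² ≈ 11.08, consistent with F = 11.0848.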
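Confidence interval for the mean of y, given xp:
$$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}$$
Prediction interval for an individual y, given xp:
$$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}$$
[Figure: fitted line ŷ = b0 + b1x with the confidence interval for the mean of y and the wider prediction interval for an individual y, both given xp; x̄ and xp are marked on the x axis]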
Example: House Prices
$$\hat{y} = 98.25 + 0.1098(2000) = 317.85$$
The predicted price for a house with 2,000 square feet is 317.85 ($1,000s) = $317,850.
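Estimation of Mean Values: Example
Confidence Interval Estimate for E(y) | xp
Find the 95% confidence interval for the average price of 2,000 square-foot houses.
Predicted price ŷ = 317.85 ($1,000s)
$$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} = 317.85 \pm 37.12$$
That is, from 280.73 to 354.97 ($1,000s).
Prediction Interval Estimate for y | xp (an individual 2,000 square-foot house)
$$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} = 317.85 \pm 102.28$$
That is, from 215.57 to 420.13 ($1,000s).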
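A sketch reproducing both interval half-widths at xp = 2000. Assumptions: the square-feet values are recovered from the chapter's residual output (not listed on the slides), and t.025 with 8 degrees of freedom (2.3060) is taken from a standard t table. ŷ uses the rounded coefficients shown on the slide.

```python
import math

# Square feet recovered from the residual output (assumption, see lead-in)
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
n = len(square_feet)
x_bar = sum(square_feet) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)

x_p = 2000
y_hat = 98.25 + 0.1098 * x_p  # rounded coefficients, as on the slide
s_e = 41.33032                # standard error of the estimate (Excel output)
t_crit = 2.3060               # assumed t_.025 with n - 2 = 8 d.f.

core = 1 / n + (x_p - x_bar) ** 2 / sxx
ci_half = t_crit * s_e * math.sqrt(core)      # mean of y given x_p
pi_half = t_crit * s_e * math.sqrt(1 + core)  # individual y given x_p
print(f"{y_hat:.2f} ± {ci_half:.2f}")  # 317.85 ± 37.12
print(f"{y_hat:.2f} ± {pi_half:.2f}")  # 317.85 ± 102.28
```

The extra 1 under the square root accounts for the scatter of an individual house around the mean line, which is why the prediction interval is much wider.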
In Excel, use PHStat | regression | simple linear regression …, check the “confidence and prediction interval for X=” box, and enter the desired x-value and confidence level.
Residual Analysis
Examine for constant variance for all levels of x
Evaluate normal distribution assumption
[Figure: residual analysis for linearity; plots of y versus x and of residuals versus x for a "Not Linear" case and a "Linear" case]
Residual Analysis for Constant Variance
[Figure: plots of y versus x and of residuals versus x for assessing constant variance]
RESIDUAL OUTPUT
Observation   Predicted House Price   Residuals
 1            251.92316                -6.923162
 2            273.87671                38.12329
 3            284.85348                -5.853484
 4            304.06284                 3.937162
 5            218.99284               -19.99284
 6            268.38832               -49.38832
 7            356.20251                48.79749
 8            367.17929               -43.17929
 9            254.6674                 64.33264
10            284.85348               -29.85348
[Figure: House Price Model Residual Plot; Residuals (-60 to 80) versus Square Feet (0 to 3,000)]
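A sketch that regenerates the residual output from the fitted model. Assumption: the data pairs are recovered from this same table (price = predicted value + residual, square feet back-solved from the fitted line and rounded), so this is a consistency check rather than independent data.

```python
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # $1,000s

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
sxx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * x for x in square_feet]
residuals = [y - p for y, p in zip(price, predicted)]  # residual = observed - predicted
for i, (x, p, e) in enumerate(zip(square_feet, predicted, residuals), start=1):
    print(f"{i:2d}  {x:5d}  {p:10.5f}  {e:10.6f}")
# Least squares residuals sum to zero (up to floating-point error)
print("sum of residuals:", sum(residuals))
```

Plotting `residuals` against `square_feet` gives the residual plot above; a patternless band around zero supports the linearity and constant-variance assumptions.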