Chapter 11
Simple Regression
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 11-1
Chapter Goals
After completing this chapter, you should be
able to:
Explain the simple linear regression model
Obtain and interpret the simple linear regression
equation for a set of data
Describe R2 as a measure of explanatory power of the
regression model
Understand the assumptions behind regression
analysis
Explain measures of variation and determine whether
the independent variable is significant
Calculate and interpret confidence intervals for the
regression coefficients
Use a regression equation for prediction
Form forecast intervals around an estimated Y value
for a given X
Use graphical analysis to recognize potential problems
in regression analysis
Explain the correlation coefficient and perform a
hypothesis test for zero population correlation
11.1
Overview of Linear Models
Y = β0 + β1X
11.2
Linear Regression Model
The relationship between X and Y is
described by a linear function
Changes in Y are assumed to be caused by
changes in X
Linear regression population equation model
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
where $\beta_0 + \beta_1 x_i$ is the linear component and $\varepsilon_i$ is the random error component
Simple Linear Regression Model (continued)
[Figure: the population regression line $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, with Y plotted against X. The line has intercept $\beta_0$ and slope $\beta_1$. For a given $X_i$, the random error $\varepsilon_i$ is the vertical distance between the observed value of Y and the predicted value of Y on the line.]
Simple Linear Regression
Equation
The simple linear regression equation provides an
estimate of the population regression line
$$\hat{y}_i = b_0 + b_1 x_i$$
where $\hat{y}_i$ is the estimated (or predicted) y value for observation i, $b_0$ is the estimate of the regression intercept, $b_1$ is the estimate of the regression slope, and $x_i$ is the value of x for observation i
The residual for observation i is
$$e_i = (y_i - \hat{y}_i) = y_i - (b_0 + b_1 x_i)$$
11.3
Least Squares Estimators
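$b_0$ and $b_1$ are chosen to minimize the sum of squared residuals:
$$\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
The least squares slope estimator is
$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\mathrm{Cov}(x, y)}{s_x^2} = r_{xy}\,\frac{s_y}{s_x}$$
and the intercept estimator is
$$b_0 = \bar{y} - b_1 \bar{x}$$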
Linear Regression Model
Assumptions
The true relationship form is linear (Y is a linear function
of X, plus random error)
The error terms, εi, are independent of the x values
The error terms are random variables with mean 0 and
constant variance, σ²
(the constant variance property is called homoscedasticity)
$$E[\varepsilon_i] = 0 \quad \text{and} \quad E[\varepsilon_i^2] = \sigma^2 \quad \text{for } i = 1, \ldots, n$$
The random error terms, εi, are not correlated with one
another, so that
$$E[\varepsilon_i \varepsilon_j] = 0 \quad \text{for all } i \neq j$$
Interpretation of the
Slope and the Intercept
Simple Linear Regression
Example
A real estate agent wishes to examine the
relationship between the selling price of a home
and its size (measured in square feet)
Sample Data for
House Price Model
House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
Graphical Presentation
[Figure: scatter plot of the house price sample; horizontal axis: Square Feet]
Regression Using Excel
Excel will be used to generate the coefficients and
measures of goodness of fit for regression
Data / Data Analysis / Regression
Excel Output
Regression Statistics
Multiple R            0.76211
R Square              0.58082
Adjusted R Square     0.52842
Standard Error        41.33032
Observations          10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)
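The coefficients in the regression equation can be reproduced from the sample data. The following is a minimal Python sketch, assuming the least squares formulas $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$; the function and variable names are illustrative and not part of the original material.

```python
# House price sample from the slides: Y in $1000s, X in square feet.
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def least_squares(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(sqft, prices)
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
```

The printed values should agree with the intercept and slope in the regression equation to the reported precision.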
Graphical Presentation
House price model: scatter plot and
regression line
[Figure: House Price ($1000s) plotted against Square Feet with the fitted regression line; slope = 0.10977, intercept = 98.248]
Interpretation of the
Intercept, b0
Interpretation of the
Slope Coefficient, b1
11.4
Measures of Variation
Total sum of squares: $SST = \sum (y_i - \bar{y})^2$
Regression sum of squares: $SSR = \sum (\hat{y}_i - \bar{y})^2$
Error sum of squares: $SSE = \sum (y_i - \hat{y}_i)^2$
[Figure: for an observation at $x_i$, the deviation of $y_i$ from $\bar{y}$ (SST) splits into the deviation of $\hat{y}_i$ from $\bar{y}$ (SSR) and the deviation of $y_i$ from $\hat{y}_i$ (SSE)]
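The three sums of squares can be computed directly from the house price sample. This is a rough sketch in Python; the variable names are illustrative, and the final check relies on the standard identity SST = SSR + SSE for a least squares line with an intercept, which is assumed here.

```python
# House price sample from the slides: Y in $1000s, X in square feet.
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n

# Least squares fit.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in sqft]

sst = sum((y - y_bar) ** 2 for y in prices)               # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)               # explained variation
sse = sum((y - f) ** 2 for y, f in zip(prices, fitted))   # unexplained variation

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```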
Coefficient of Determination, R2
The coefficient of determination is the portion
of the total variation in the dependent variable
that is explained by variation in the
independent variable
The coefficient of determination is also called
R-squared and is denoted as R2
$$R^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$
note: $0 \le R^2 \le 1$
Examples of Approximate
r2 Values
[Figures: scatter plots of Y against X illustrating each case]
r² = 1
0 < r² < 1
r² = 0: no linear relationship between X and Y
Excel Output
$$R^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$$
58.08% of the variation in house prices is explained by
variation in square feet
Correlation and R2
$$R^2 = r_{xy}^2$$
In simple regression, the coefficient of determination equals the square of the sample correlation between x and y
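The equality of $R^2$ and $r_{xy}^2$ can be checked numerically on the house price sample. The sketch below is illustrative only: it computes the sample correlation from the sample covariance and standard deviations, computes $R^2$ as SSR/SST, and compares the two. All names are assumptions made for the example.

```python
import math

# House price sample from the slides: Y in $1000s, X in square feet.
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n

# Sample covariance and sample standard deviations (divisor n - 1).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices)) / (n - 1)
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in sqft) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in prices) / (n - 1))
r = s_xy / (s_x * s_y)

# R^2 from the fitted regression, as SSR / SST.
b1 = s_xy / s_x ** 2
b0 = y_bar - b1 * x_bar
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in sqft)
sst = sum((y - y_bar) ** 2 for y in prices)
r_squared = ssr / sst

print(f"r = {r:.5f}, r^2 = {r ** 2:.5f}, R^2 = {r_squared:.5f}")
```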
Estimation of Model
Error Variance
An estimator for the variance of the population model
error is
$$\hat{\sigma}^2 = s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{SSE}{n - 2}$$
The divisor is n − 2 rather than n − 1 because the simple regression
model uses two estimated parameters, b0 and b1, instead of one
Excel Output
$s_e = 41.33032$ (reported as Standard Error under Regression Statistics)
Comparing Standard Errors
se is a measure of the variation of observed y
values from the regression line
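[Figure: two scatter plots of Y against X, one with small $s_e$ and one with large $s_e$]
The variance of the regression slope coefficient $b_1$ is estimated by
$$s_{b_1}^2 = \frac{s_e^2}{\sum (x_i - \bar{x})^2} = \frac{s_e^2}{(n - 1)\, s_x^2}$$
where:
$s_{b_1}$ = estimate of the standard error of the least squares slope
$s_e = \sqrt{SSE / (n - 2)}$ = standard error of the estimate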
Excel Output
$s_{b_1} = 0.03297$ (reported as the Standard Error of the Square Feet coefficient)
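Both standard errors can be recomputed from the sample. The following Python sketch is one possible way to do it, assuming $s_e = \sqrt{SSE/(n-2)}$ and $s_{b_1} = s_e / \sqrt{\sum (x_i - \bar{x})^2}$; the variable names are illustrative.

```python
import math

# House price sample from the slides: Y in $1000s, X in square feet.
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n
s_xx = sum((x - x_bar) ** 2 for x in sqft)

# Least squares fit and residual sum of squares.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, prices))

s_e = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s_e / math.sqrt(s_xx)     # standard error of the slope

print(f"s_e = {s_e:.5f}, s_b1 = {s_b1:.5f}")
```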
Comparing Standard Errors of
the Slope
$s_{b_1}$ is a measure of the variation in the slope of regression
lines from different possible samples
Inference about the Slope:
t Test
t test for a population slope
Is there a linear relationship between X and Y?
Null and alternative hypotheses
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)
Test statistic
$$t = \frac{b_1 - \beta_1}{s_{b_1}}, \qquad \text{d.f.} = n - 2$$
where:
$b_1$ = regression slope coefficient
$\beta_1$ = hypothesized slope
$s_{b_1}$ = standard error of the slope
Inferences about the Slope:
t Test Example
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
               Coefficients    Standard Error    t Stat     P-value
Intercept      98.24833        58.03348          1.69296    0.12892
Square Feet    0.10977         0.03297           3.32938    0.01039
Here $b_1 = 0.10977$ and $s_{b_1} = 0.03297$, so
$$t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$$
Inferences about the Slope:
t Test Example
(continued)
Test Statistic: t = 3.329
d.f. = 10 − 2 = 8
$t_{8,.025} = 2.3060$
Decision rule: reject H0 if $t < -t_{n-2,\alpha/2} = -2.3060$ or $t > t_{n-2,\alpha/2} = 2.3060$ (α/2 = .025 in each tail); otherwise do not reject H0
Decision: Reject H0, since 3.329 > 2.3060
Conclusion: There is sufficient evidence that square footage affects house price
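The test can be expressed in a few lines of Python. This sketch takes the slope, its standard error, and the critical value $t_{8,.025} = 2.3060$ as given from the slides rather than computing them; the function name `slope_t_test` is hypothetical.

```python
def slope_t_test(b1, s_b1, t_critical, beta1_null=0.0):
    """Two-sided t test of H0: beta1 = beta1_null.

    Returns the t statistic and whether H0 is rejected.
    """
    t_stat = (b1 - beta1_null) / s_b1
    reject = abs(t_stat) > t_critical
    return t_stat, reject


# Values reported in the Excel output; critical value for 8 d.f., alpha = .05.
t_stat, reject_h0 = slope_t_test(b1=0.10977, s_b1=0.03297, t_critical=2.3060)
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Because the inputs are rounded to five decimals, the computed statistic can differ from the Excel value in the last digit.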
Inferences about the Slope:
t Test Example
(continued)
P-value = 0.01039 (from the Square Feet row of the Excel output)
Because the P-value is less than α = .05, H0: β1 = 0 is rejected in favor of H1: β1 ≠ 0
Confidence Interval Estimate
for the Slope
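As an illustration, a 95% interval for the slope can be sketched in Python, assuming the usual form $b_1 \pm t_{n-2,\alpha/2}\, s_{b_1}$. The slope, standard error, and critical value below are taken from the Excel output and t test example; the variable names are illustrative.

```python
# Values reported for the house price model.
b1 = 0.10977          # estimated slope
s_b1 = 0.03297        # standard error of the slope
t_critical = 2.3060   # t value for 8 d.f. and alpha/2 = .025

# Assumed interval form: b1 +/- t * s_b1.
margin = t_critical * s_b1
lower = b1 - margin
upper = b1 + margin

print(f"95% confidence interval for the slope: ({lower:.5f}, {upper:.5f})")
```

Under these assumptions the interval excludes 0, which is consistent with rejecting H0: β1 = 0 in the t test.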
F-Test for Significance
F Test statistic:
$$F = \frac{MSR}{MSE}$$
where
$$MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n - k - 1}$$
where F follows an F distribution with k numerator and (n − k − 1)
denominator degrees of freedom (k = 1 in simple regression)
Excel Output
$$F = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848$$
With 1 and 8 degrees of freedom
P-value for the F-Test: Significance F = 0.01039
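The F statistic follows directly from the ANOVA entries. The Python sketch below recomputes it from SSR and SSE and, as an additional check that relies on the standard result that F equals the squared slope t statistic when there is one independent variable, compares it with the reported t Stat. Names are illustrative.

```python
# ANOVA entries reported for the house price model.
ssr = 18934.9348
sse = 13665.5652
n = 10   # observations
k = 1    # independent variables

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

# Slope t statistic from the coefficient table, squared for comparison.
t_stat = 3.32938
print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```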
Predictions Using
Regression Analysis
Predict the price for a house
with 2000 square feet:
house price = 98.25 + 0.1098 (2000)
            = 317.85
The predicted price for a house with 2000
square feet is 317.85 ($1,000s) = $317,850
Relevant Data Range
When using a regression model for prediction,
only predict within the relevant range of data
[Figure: House Price ($1000s) plotted against Square Feet with the fitted regression line]
Risky to try to extrapolate far beyond the range of observed X's
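One way to enforce the relevant range in code is to refuse predictions outside the observed X values. The sketch below uses the rounded coefficients from the prediction example and takes the bounds 1100 and 2450 from the sample data table; the function name `predict_price` and the choice to raise an error (rather than warn) are assumptions made for illustration.

```python
# Rounded coefficients used in the prediction example.
B0 = 98.25
B1 = 0.1098

# Smallest and largest square footage in the sample.
X_MIN = 1100
X_MAX = 2450


def predict_price(square_feet):
    """Predicted house price in $1000s, restricted to the observed X range."""
    if not X_MIN <= square_feet <= X_MAX:
        raise ValueError(
            f"{square_feet} sq. ft. is outside the observed range "
            f"[{X_MIN}, {X_MAX}]; extrapolation is risky"
        )
    return B0 + B1 * square_feet


print(f"Predicted price for 2000 sq. ft.: {predict_price(2000):.2f} ($1000s)")
```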
Estimating Mean Values and
Predicting Individual Values
Goal: Form intervals around y to express
uncertainty about the value of y for a given xi
Confidence interval for the expected value of y, given xi
Prediction interval for a single observed y, given xi
[Figure: the fitted line ŷ = b0 + b1xi plotted with Y against X, showing both intervals at xi]
Confidence Interval for
the Average Y, Given X
Confidence interval estimate for the
expected value of y given a particular xi
Confidence interval for $E(Y_{n+1} \mid X_{n+1})$:
$$\hat{y}_{n+1} \pm t_{n-2,\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
Prediction Interval for
an Individual Y, Given X
Prediction interval estimate for an actual
observed value of y given a particular xi
$$\hat{y}_{n+1} \pm t_{n-2,\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
Estimation of Mean Values:
Example
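Confidence interval estimate for $E(Y_{n+1} \mid X_{n+1})$ at $x_{n+1} = 2000$ square feet:
$$\hat{y}_{n+1} \pm t_{n-2,\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 317.85 \pm 37.12$$
Prediction interval estimate for an individual $y_{n+1}$ at $x_{n+1} = 2000$ square feet:
$$\hat{y}_{n+1} \pm t_{n-2,\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 317.85 \pm 102.28$$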
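The two margins can be reproduced from the sample. In this Python sketch, $s_e = 41.33032$ and the critical value $t_{8,.025} = 2.3060$ are taken as given from the slides, while $\bar{x}$ and $\sum (x_i - \bar{x})^2$ are computed from the square footage data; the variable names are illustrative.

```python
import math

# Square footage sample from the slides.
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar = sum(sqft) / n
s_xx = sum((x - x_bar) ** 2 for x in sqft)

s_e = 41.33032        # standard error of the estimate (Excel output)
t_critical = 2.3060   # t value for n - 2 = 8 d.f. and alpha/2 = .025
x_new = 2000          # house size for which the intervals are formed

# Shared term: 1/n + (x_new - x_bar)^2 / sum((x_i - x_bar)^2).
leverage = 1 / n + (x_new - x_bar) ** 2 / s_xx

ci_margin = t_critical * s_e * math.sqrt(leverage)       # mean of Y
pi_margin = t_critical * s_e * math.sqrt(1 + leverage)   # individual Y

print(f"Confidence interval margin: {ci_margin:.2f}")
print(f"Prediction interval margin: {pi_margin:.2f}")
```

The prediction interval is wider because it adds the variability of an individual observation around the regression line to the uncertainty in the estimated line itself.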
Correlation Analysis
The population correlation coefficient is
denoted ρ (the Greek letter rho)
The sample correlation coefficient is
$$r = \frac{s_{xy}}{s_x s_y}$$
where
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
Hypothesis Test for Correlation
To test the null hypothesis of zero population correlation, H0: ρ = 0, the test statistic is
$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$$
with n − 2 degrees of freedom
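Applied to the house price model, the statistic can be sketched in Python as below. The sample correlation r = 0.76211 is the Multiple R value from the Excel output and 2.3060 is the critical value for 8 degrees of freedom used in the slope test; the function name `correlation_t_stat` is hypothetical.

```python
import math


def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.76211           # sample correlation (Multiple R in the Excel output)
n = 10                # observations
t_critical = 2.3060   # t value for 8 d.f. and alpha/2 = .025

t_stat = correlation_t_stat(r, n)
reject_h0 = abs(t_stat) > t_critical
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

In simple regression this statistic matches the t statistic for the slope, so both tests lead to the same decision.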
Decision Rules
Hypothesis Test for Correlation
[Figure: rejection regions of the t distribution, with area α/2 in each tail]
11.9
Graphical Analysis
The linear regression model is based on
minimizing the sum of squared errors
If outliers exist, their potentially large squared
errors may have a strong influence on the fitted
regression line
Be sure to examine your data graphically for
outliers and extreme points
Decide, based on your model and logic, whether
the extreme points should remain or be removed
Chapter Summary
Introduced the linear regression model
Reviewed correlation and the assumptions of
linear regression
Discussed estimating the simple linear
regression coefficients
Described measures of variation
Described inference about the slope
Addressed estimation of mean values and
prediction of individual values