Chapter 13 Part 1

Introduction to Regression Analysis
▪ Regression analysis is used to:
▪ Predict the value of a dependent variable based on the
value of at least one independent variable
▪ Explain the impact of changes in an independent
variable on the dependent variable
Dependent variable: the variable we wish to predict
or explain
Independent variable: the variable used to explain
the dependent variable
Simple Linear Regression Model

▪ Only one independent variable, X


▪ Relationship between X and Y is
described by a linear function
▪ Changes in Y are assumed to be caused
by changes in X
Simple Linear Regression Model

Yi = β0 + β1Xi + εi

▪ Yi = dependent variable
▪ Xi = independent variable
▪ β0 = population Y intercept
▪ β1 = population slope coefficient
▪ εi = random error term

Linear component: β0 + β1Xi
Random error component: εi

Simple Linear Regression Model
(continued)

(Figure: graph of the population regression line Yi = β0 + β1Xi + εi, showing the Y intercept β0, the slope β1, and, for a given Xi, the observed value of Y, the predicted value of Y on the line, and the random error εi between them.)
Simple Linear Regression Equation (Prediction Line)

The simple linear regression equation provides an estimate of the population regression line:

Ŷi = a0 + b1Xi

▪ Ŷi = estimated (or predicted) Y value for observation i
▪ a0 = estimate of the regression intercept
▪ b1 = estimate of the regression slope
▪ Xi = value of X for observation i

The individual random error terms ei have a mean of zero

Least Squares Method

▪ a0 and b1 are obtained by finding the values of a0 and b1 that minimize the sum of the squared differences between Yi and Ŷi:

min Σ(Yi − Ŷi)² = min Σ(Yi − (a0 + b1Xi))²

Finding the Least Squares Equation

▪ The coefficients a0 and b1, and other regression results in this chapter, will be found using Excel or Minitab

▪ Formulas are shown in the text for those who are interested
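For readers who want to check the Excel or Minitab results by hand, the coefficients follow directly from the minimization above. A minimal Python sketch (the function name is illustrative):

```python
import numpy as np

def least_squares(x, y):
    """Return (a0, b1) minimizing the sum of (y - (a0 + b1*x))**2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a0 = y.mean() - b1 * x.mean()  # the fitted line passes through (x_bar, y_bar)
    return a0, b1
```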
Interpretation of the Slope and the Intercept

▪ a0 is the estimated average value of Y when the value of X is zero

▪ b1 is the estimated change in the average value of Y as a result of a one-unit change in X
Sample Data for House Price Model

House Price in $1000s (Y) | Square Feet (X)
245 | 1400
312 | 1600
279 | 1700
308 | 1875
199 | 1100
219 | 1550
405 | 2350
324 | 2450
319 | 1425
255 | 1700


Regression Using Excel

▪ Tools / Data Analysis / Regression

Excel Output

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

ANOVA
             df   SS          MS          F        Significance F
Regression   1    18934.9348  18934.9348  11.0848  0.01039
Residual     8    13665.5652  1708.1957
Total        9    32600.5000

              Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept     98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet   0.10977       0.03297         3.32938  0.01039  0.03374    0.18580
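The same output can be reproduced outside Excel. A sketch using Python's statsmodels package (assuming it is installed), with the ten observations from the house price data:

```python
import statsmodels.api as sm

square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # in $1000s

X = sm.add_constant(square_feet)   # prepend the intercept column
model = sm.OLS(price, X).fit()
print(model.params)                # approx. [98.24833, 0.10977]
print(model.rsquared)              # approx. 0.58082
print(model.summary())             # coefficient table and ANOVA, as above
```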

Graphical Presentation

▪ House price model: scatter plot and regression line

(Figure: scatter plot of house price ($1000s, 0 to 450) versus square feet (0 to 3000), with the fitted regression line; intercept = 98.248, slope = 0.10977.)

house price = 98.24833 + 0.10977 (square feet)


Interpretation of the Intercept, a0

house price = 98.24833 + 0.10977 (square feet)

▪ a0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values)

▪ Here, no houses had 0 square feet, so a0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet
Interpretation of the Slope Coefficient, b1

house price = 98.24833 + 0.10977 (square feet)

▪ b1 measures the estimated change in the average value of Y as a result of a one-unit change in X

▪ Here, b1 = 0.10977 tells us that the average value of a house increases by 0.10977($1000) = $109.77 for each additional square foot of size
Predictions Using Regression Analysis

Predict the price for a house with 2000 square feet:

house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098(2000)
            = 317.85

The predicted price for a house with 2000 square feet is 317.85($1000s) = $317,850
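As a quick check of the arithmetic, the same prediction in code, using the rounded coefficients from the slide:

```python
a0, b1 = 98.25, 0.1098      # rounded estimates from the fitted model
predicted = a0 + b1 * 2000  # price in $1000s for a 2000 sq. ft. house
print(predicted)            # 317.85, i.e. $317,850
```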
Measures of Variation

▪ Total variation is made up of two parts:

SST = SSR + SSE
Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares

SST = Σ(Yi − Ȳ)²    SSR = Σ(Ŷi − Ȳ)²    SSE = Σ(Yi − Ŷi)²

where:
Ȳ  = average value of the dependent variable
Yi = observed values of the dependent variable
Ŷi = predicted value of Y for the given Xi value
Measures of Variation
(continued)

▪ SST = total sum of squares
▪ Measures the variation of the Yi values around their mean Ȳ
▪ SSR = regression sum of squares
▪ Explained variation attributable to the relationship between X and Y
▪ SSE = error sum of squares
▪ Variation attributable to factors other than the relationship between X and Y
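All three sums of squares can be computed directly from the observed and fitted values; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def sums_of_squares(y, y_hat):
    """Decompose total variation: SST = SSR + SSE."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    sst = np.sum((y - y.mean()) ** 2)      # total variation
    ssr = np.sum((y_hat - y.mean()) ** 2)  # explained by the regression
    sse = np.sum((y - y_hat) ** 2)         # unexplained (error)
    return sst, ssr, sse
```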
Measures of Variation
(continued)

(Figure: scatter plot showing, at a point (Xi, Yi), the deviations around the fitted line: SST = Σ(Yi − Ȳ)², SSR = Σ(Ŷi − Ȳ)², SSE = Σ(Yi − Ŷi)².)
Coefficient of Determination, r²

▪ The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
▪ The coefficient of determination is also called r-squared and is denoted as r²

r² = SSR / SST = regression sum of squares / total sum of squares

note: 0 ≤ r² ≤ 1
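Continuing the hypothetical sketches above (square_feet, price, and sums_of_squares were defined earlier), r² is just the ratio SSR/SST:

```python
a0, b1 = 98.24833, 0.10977
y_hat = [a0 + b1 * x for x in square_feet]     # fitted values
sst, ssr, sse = sums_of_squares(price, y_hat)  # from the earlier sketch
print(ssr / sst)                               # approx. 0.58082
```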
Excel Output

r² = SSR / SST = 18934.9348 / 32600.5000 = 0.58082

58.08% of the variation in house prices is explained by variation in square feet

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df   SS          MS          F        Significance F
Regression   1    18934.9348  18934.9348  11.0848  0.01039
Residual     8    13665.5652  1708.1957
Total        9    32600.5000

              Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept     98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet   0.10977       0.03297         3.32938  0.01039  0.03374    0.18580

Standard Error of Estimate

▪ The standard deviation of the variation of observations around the regression line is estimated by

SYX = √(SSE / (n − 2)) = √(Σ(Yi − Ŷi)² / (n − 2))

where:
SSE = error sum of squares
n = sample size
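A short sketch of the same computation (the function name is illustrative):

```python
import numpy as np

def standard_error_of_estimate(y, y_hat):
    """S_YX = sqrt(SSE / (n - 2))."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    sse = np.sum((y - y_hat) ** 2)
    return np.sqrt(sse / (len(y) - 2))  # approx. 41.33 for the house data
```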
Excel Output

SYX = 41.33032

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df   SS          MS          F        Significance F
Regression   1    18934.9348  18934.9348  11.0848  0.01039
Residual     8    13665.5652  1708.1957
Total        9    32600.5000

              Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept     98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet   0.10977       0.03297         3.32938  0.01039  0.03374    0.18580

Assumptions of Regression
Use the acronym LINE:
▪ Linearity
▪ The underlying relationship between X and Y is linear

▪ Independence of Errors
▪ Error values are statistically independent

▪ Normality of Error
▪ Error values (ε) are normally distributed for any given value of X

▪ Equal Variance (Homoscedasticity)


▪ The probability distribution of the errors has constant variance

Measuring Autocorrelation: The Durbin-Watson Statistic

▪ Used when data are collected over time to detect if autocorrelation is present
▪ Autocorrelation exists if residuals in one time period are related to residuals in another period
Autocorrelation

▪ Autocorrelation is correlation of the errors (residuals) over time

(Figure: time (t) residual plot; the residuals show a cyclic pattern, not a random one. Cyclical patterns are a sign of positive autocorrelation.)

▪ Violates the regression assumption that residuals are random and independent
The Durbin-Watson Statistic

▪ The Durbin-Watson statistic is used to test for autocorrelation

H0: residuals are not correlated
H1: positive autocorrelation is present

D = Σ(ei − ei−1)² / Σei²
(numerator summed over i = 2 to n, denominator over i = 1 to n)

▪ The possible range is 0 ≤ D ≤ 4
▪ D should be close to 2 if H0 is true
▪ D less than 2 may signal positive autocorrelation, D greater than 2 may signal negative autocorrelation
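The statistic is easy to compute once the residuals are in hand; a minimal sketch:

```python
import numpy as np

def durbin_watson(residuals):
    """D = sum((e_i - e_{i-1})^2, i=2..n) / sum(e_i^2, i=1..n)."""
    e = np.asarray(residuals, float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```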

Testing for Positive Autocorrelation

H0: positive autocorrelation does not exist
H1: positive autocorrelation is present

▪ Calculate the Durbin-Watson test statistic D
(the Durbin-Watson statistic can be found using Excel or Minitab)

▪ Find the values dL and dU from the Durbin-Watson table (for sample size n and number of independent variables k)

Decision rule: reject H0 if D < dL; do not reject H0 if D > dU; the test is inconclusive if dL ≤ D ≤ dU

0 --- Reject H0 --- dL --- Inconclusive --- dU --- Do not reject H0 --- 2
Testing for Positive Autocorrelation
(continued)

▪ Suppose we have the following time series data:

(Figure: sales plotted against time for 25 periods, with fitted trend line y = 30.65 + 4.7038x, R² = 0.8976.)

▪ Is there autocorrelation?
Testing for Positive Autocorrelation
(continued)

▪ Example with n = 25:

Excel/PHStat output:
Durbin-Watson Calculations
Sum of Squared Difference of Residuals   3296.18
Sum of Squared Residuals                 3279.98
Durbin-Watson Statistic                  1.00494

D = Σ(ei − ei−1)² / Σei² = 3296.18 / 3279.98 = 1.00494

Testing for Positive Autocorrelation
(continued)

▪ Here, n = 25 and there is k = 1 independent variable
▪ Using the Durbin-Watson table, dL = 1.29 and dU = 1.45
▪ D = 1.00494 < dL = 1.29, so reject H0 and conclude that significant positive autocorrelation exists
▪ Therefore the linear model is not the appropriate model to forecast sales

Decision: reject H0 since D = 1.00494 < dL

0 --- Reject H0 --- dL = 1.29 --- Inconclusive --- dU = 1.45 --- Do not reject H0 --- 2
Inferences About the Slope

▪ The standard error of the regression slope coefficient (b1) is estimated by

Sb1 = SYX / √SSX = SYX / √(Σ(Xi − X̄)²)

where:
Sb1 = estimate of the standard error of the least squares slope
SYX = √(SSE / (n − 2)) = standard error of the estimate
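A sketch of the computation (the function name is illustrative):

```python
import numpy as np

def slope_standard_error(x, y, y_hat):
    """S_b1 = S_YX / sqrt(sum((x_i - x_bar)^2))."""
    x, y, y_hat = (np.asarray(a, float) for a in (x, y, y_hat))
    s_yx = np.sqrt(np.sum((y - y_hat) ** 2) / (len(y) - 2))  # std. error of estimate
    ssx = np.sum((x - x.mean()) ** 2)
    return s_yx / np.sqrt(ssx)  # approx. 0.03297 for the house data
```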

Excel Output

Sb1 = 0.03297

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df   SS          MS          F        Significance F
Regression   1    18934.9348  18934.9348  11.0848  0.01039
Residual     8    13665.5652  1708.1957
Total        9    32600.5000

              Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept     98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet   0.10977       0.03297         3.32938  0.01039  0.03374    0.18580

Inference about the Slope: t Test

▪ t test for a population slope
▪ Is there a linear relationship between X and Y?
▪ Null and alternative hypotheses:
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)
▪ Test statistic:

t = (b1 − β1) / Sb1,   d.f. = n − 2

where:
b1 = regression slope coefficient
β1 = hypothesized slope
Sb1 = standard error of the slope
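A sketch of the test, using scipy for the two-tail p-value and the slope estimate and standard error from the house price model:

```python
from scipy import stats

b1, s_b1, n = 0.10977, 0.03297, 10
t_stat = (b1 - 0) / s_b1                         # hypothesized slope is 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-tail p-value
print(t_stat, p_value)                           # approx. 3.329, 0.0104
```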

Inference about the Slope: t Test
(continued)

Simple linear regression equation: house price = 98.25 + 0.1098 (sq. ft.)

House Price in $1000s (Y) | Square Feet (X)
245 | 1400
312 | 1600
279 | 1700
308 | 1875
199 | 1100
219 | 1550
405 | 2350
324 | 2450
319 | 1425
255 | 1700

The slope of this model is 0.1098. Does square footage of the house affect its sales price?
Inferences about the Slope: t Test Example

H0: β1 = 0
H1: β1 ≠ 0

From Excel output:
              Coefficients  Standard Error  t Stat   P-value
Intercept     98.24833      58.03348        1.69296  0.12892
Square Feet   0.10977       0.03297         3.32938  0.01039

t = (b1 − β1) / Sb1 = (0.10977 − 0) / 0.03297 = 3.32938

Inferences about the Slope: t Test Example
(continued)

H0: β1 = 0
H1: β1 ≠ 0

Test statistic: t = 3.329, d.f. = 10 − 2 = 8

From Excel output:
              Coefficients  Standard Error  t Stat   P-value
Intercept     98.24833      58.03348        1.69296  0.12892
Square Feet   0.10977       0.03297         3.32938  0.01039

With α/2 = .025 in each tail, the critical values are ±2.3060; since t = 3.329 > 2.3060, reject H0

Decision: reject H0

Conclusion: there is sufficient evidence that square footage affects house price

Inferences about the Slope: t Test Example
(continued)

H0: β1 = 0
H1: β1 ≠ 0

From Excel output: P-value = 0.01039

              Coefficients  Standard Error  t Stat   P-value
Intercept     98.24833      58.03348        1.69296  0.12892
Square Feet   0.10977       0.03297         3.32938  0.01039

This is a two-tail test, so the p-value is P(t > 3.329) + P(t < −3.329) = 0.01039 (for 8 d.f.)

Decision: P-value < α, so reject H0

Conclusion: there is sufficient evidence that square footage affects house price
F Test for Significance

▪ F test statistic:

F = MSR / MSE

where:
MSR = SSR / k
MSE = SSE / (n − k − 1)

where F follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom
(k = the number of independent variables in the regression model)
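A sketch of the computation using the ANOVA sums of squares from the Excel output:

```python
from scipy import stats

ssr, sse, n, k = 18934.9348, 13665.5652, 10, 1
msr, mse = ssr / k, sse / (n - k - 1)
f_stat = msr / mse                          # approx. 11.0848
p_value = stats.f.sf(f_stat, k, n - k - 1)  # approx. 0.01039
print(f_stat, p_value)
```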

Excel Output

F = MSR / MSE = 18934.9348 / 1708.1957 = 11.0848, with 1 and 8 degrees of freedom

P-value for the F test: 0.01039

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df   SS          MS          F        Significance F
Regression   1    18934.9348  18934.9348  11.0848  0.01039
Residual     8    13665.5652  1708.1957
Total        9    32600.5000

              Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept     98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet   0.10977       0.03297         3.32938  0.01039  0.03374    0.18580

F Test for Significance
(continued)

H0: β1 = 0
H1: β1 ≠ 0
α = .05
df1 = 1, df2 = 8

Test statistic: F = MSR / MSE = 11.08

Critical value: F.05 = 5.32

Decision: since F = 11.08 > 5.32, reject H0 at α = 0.05

Conclusion: there is sufficient evidence that house size affects selling price

Confidence Interval Estimate for the Slope

Confidence interval estimate of the slope:

b1 ± t(n−2) Sb1,   d.f. = n − 2

Excel printout for house prices:

              Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept     98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet   0.10977       0.03297         3.32938  0.01039  0.03374    0.18580

At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)
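The interval can be reproduced from b1 and Sb1; a sketch:

```python
from scipy import stats

b1, s_b1, n = 0.10977, 0.03297, 10
t_crit = stats.t.ppf(0.975, df=n - 2)  # 2.3060 for 8 d.f., 95% confidence
lower, upper = b1 - t_crit * s_b1, b1 + t_crit * s_b1
print(lower, upper)                    # approx. (0.0337, 0.1858)
```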
Confidence Interval Estimate for the Slope
(continued)

              Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept     98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet   0.10977       0.03297         3.32938  0.01039  0.03374    0.18580

Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.70 and $185.80 per square foot of house size

This 95% confidence interval does not include 0.

Conclusion: there is a significant relationship between house price and square feet at the .05 level of significance

t Test for a Correlation Coefficient

▪ Hypotheses:
H0: ρ = 0 (no correlation between X and Y)
H1: ρ ≠ 0 (correlation exists)

▪ Test statistic (with n − 2 degrees of freedom):

t = (r − ρ) / √((1 − r²) / (n − 2))

where:
r = +√r² if b1 > 0
r = −√r² if b1 < 0
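A sketch of the test, with r = 0.762 taken from the house price output:

```python
import numpy as np
from scipy import stats

r, n = 0.762, 10
t_stat = (r - 0) / np.sqrt((1 - r**2) / (n - 2))  # hypothesized rho is 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-tail p-value
print(t_stat, p_value)                            # approx. 3.33, 0.0104
```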

Example: House Prices

Is there evidence of a linear relationship between square feet and house price at the .05 level of significance?

H0: ρ = 0 (no correlation)
H1: ρ ≠ 0 (correlation exists)
α = .05, d.f. = 10 − 2 = 8

t = (r − ρ) / √((1 − r²) / (n − 2)) = (.762 − 0) / √((1 − .762²) / (10 − 2)) = 3.329

Example: Test Solution

t = (r − ρ) / √((1 − r²) / (n − 2)) = (.762 − 0) / √((1 − .762²) / (10 − 2)) = 3.329

d.f. = 10 − 2 = 8

With α/2 = .025 in each tail, the critical values are ±2.3060; since t = 3.329 > 2.3060, reject H0

Decision: reject H0

Conclusion: there is evidence of a linear association at the 5% level of significance
