11 – Simple Regression
STATISTICS
Yi = β0 + β1Xi + εi
Yi is the dependent variable, Xi the independent variable, β0 the intercept, β1 the slope coefficient, and εi the random error term. β0 + β1Xi is the deterministic component; εi is the random error component.
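As an illustration (not from the slides), the following R sketch simulates data from this model; the parameter values and the error standard deviation are assumptions chosen only for demonstration.

```r
# Simulate y_i = beta0 + beta1 * x_i + eps_i (assumed parameter values, for illustration only)
set.seed(1)
beta0 <- -0.1                                   # assumed intercept
beta1 <- 0.7                                    # assumed slope
x     <- 1:5                                    # advertising levels ($100s)
eps   <- rnorm(length(x), mean = 0, sd = 0.6)   # random error component
y_det <- beta0 + beta1 * x                      # deterministic component
y     <- y_det + eps                            # observed response
cbind(x, deterministic = y_det, error = eps, y)
```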
EXAMPLE
You are a marketing analyst for Teddy Bears. You gather the
following data and want to find a simple relationship between
advertising and sales.
Advertising ($100s), x | Sales ($1000s), y
1 | 1
2 | 1
3 | 2
4 | 2
5 | 4
SCATTERGRAM: SALES VS. ADVERTISING
[Scatterplot of sales (y) against advertising (x) for the five observations.]
LEAST SQUARES ESTIMATORS
Prediction equation: ŷi = β̂0 + β̂1xi
Sample slope: β̂1 = SSxy / SSxx = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
Sample y-intercept: β̂0 = ȳ − β̂1x̄
where SSxy = Σ(xi − x̄)(yi − ȳ), SSxx = Σ(xi − x̄)², and SSyy = Σ(yi − ȳ)².
COMPUTATIONS – LEAST SQUARES LINE
xi (adv) | yi (sales) | (xi − 3)² | (xi − 3)(yi − 2)
1 | 1 | 4 | 2
2 | 1 | 1 | 1
3 | 2 | 0 | 0
4 | 2 | 1 | 0
5 | 4 | 4 | 4
Σxi = 15 (mean = 3) | Σyi = 10 (mean = 2) | SSxx = Σ(xi − 3)² = 10 | SSxy = Σ(xi − 3)(yi − 2) = 7
So β̂1 = SSxy / SSxx = 7/10 = 0.7 and β̂0 = ȳ − β̂1x̄ = 2 − 0.7(3) = −0.1, giving the least squares line ŷ = −0.1 + 0.7x.
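A minimal R sketch (an illustrative addition, not part of the slides) that reproduces these computations on the example data; the variable names x, y, and fit are assumptions.

```r
# Advertising ($100s) and sales ($1000s) from the teddy-bear example
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)

SSxx <- sum((x - mean(x))^2)                # 10
SSxy <- sum((x - mean(x)) * (y - mean(y)))  # 7
b1   <- SSxy / SSxx                         # sample slope, 0.7
b0   <- mean(y) - b1 * mean(x)              # sample y-intercept, -0.1
c(intercept = b0, slope = b1)

# lm() gives the same least squares estimates
fit <- lm(y ~ x)
coef(fit)
```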
COEFFICIENT INTERPRETATIONS
1. Slope (β̂1 = 0.7)
• Sales volume (y) is expected to increase by $700 for each $100 increase in advertising (x), over the sampled range of advertising expenditures from $100 to $500.
2. y-Intercept (β̂0 = −0.1)
• Since x = 0 is outside the range of the sampled values of x, the y-intercept has no meaningful interpretation.
MEASURES OF VARIATION
With n = 5:
Total sum of squares: SST = SSyy = Σ(yi − ȳ)² = 6
Error sum of squares: SSE = Σ(yi − ŷi)² = 1.1
Regression sum of squares: SSR = SST − SSE = 4.9
COEFFICIENT OF DETERMINATION, R²
Note: 0 ≤ R² ≤ 1
R² INTERPRETATION
R² = SSR/SST = 4.9/6 ≈ 0.82
About 82% of the sample variation in sales is explained by the linear relationship with advertising.
STANDARD ERROR OF THE REGRESSION MODEL
s² = SSE / (n − 2), where n − 2 is the degrees of freedom for error.
s² = SSE / (n − 2) = 1.1 / (5 − 2) = 0.36667
s = √0.36667 = 0.6055
We would expect most of the observed revenues to fall within 2s, or about $1,220, of the least squares line.
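A short R check (added, not from the slides) of R² and the standard error s on the same data; values should match the slides up to rounding.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
fit <- lm(y ~ x)

summary(fit)$r.squared   # coefficient of determination, about 0.82
sum(resid(fit)^2)        # SSE, about 1.1
sigma(fit)               # standard error of the regression s, about 0.61
```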
Yi = β0 + β1Xi + εi
MAKING INFERENCES ABOUT SLOPE
E(y) = β0 + β1x
H0: β1 = 0
Ha: β1 ≠ 0
If β1 = 0, then x has no influence on y.
If we reject H0, we say that x has a statistically significant effect on y.
To test the null, we need to know the sampling distribution of β̂1.
MAKING INFERENCES ABOUT THE SLOPE β1
Sampling distribution of β̂1 for large n:
β̂1 ~ N(β1, σβ̂1), where σβ̂1 = σ / √SSxx
Typically we approximate σβ̂1 by sβ̂1 = s / √SSxx.
TEST OF SLOPE COEFFICIENT – SOLUTION
H0: β1 = 0
Ha: β1 ≠ 0
α = .05
df = 5 − 2 = 3
Critical values: t = ±3.182 (rejection regions of .025 in each tail)
Test statistic: t = β̂1 / sβ̂1 = 0.70 / (0.6055 / √10) ≈ 3.66
Decision: Since 3.66 > 3.182, reject H0 at α = .05.
Conclusion: There is evidence of a relationship between advertising and sales.
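The same test in R (an illustrative addition); summary() reports the slope estimate, its standard error, the t statistic, and the two-sided p-value.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
fit <- lm(y ~ x)

summary(fit)$coefficients  # row "x": estimate 0.7, t value about 3.66, p-value about 0.035
qt(0.975, df = 3)          # two-sided critical value at alpha = .05, 3.182
```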
MAKING INFERENCES ABOUT THE SLOPE β1
Confidence interval for β1: β̂1 ± tα/2 sβ̂1
With α = .05: 0.70 ± 3.182(0.6055 / √10) = [0.090, 1.309]
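In R (again an added sketch), confint() returns the same interval up to rounding.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
fit <- lm(y ~ x)

confint(fit, "x", level = 0.95)  # 95% CI for the slope, about (0.09, 1.31)
```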
PREDICTION WITH REGRESSION MODELS
Types of predictions:
• Point estimates
• Interval estimates
What is predicted:
• Population mean response E(y) for a given x (a point on the population regression line)
• Individual response (yi) for a given x
WHAT IS PREDICTED
[Figure: the fitted line ŷ = β̂0 + β̂1x and the population line E(y) = β0 + β1x; at x = xp the same point estimate ŷ serves as the prediction of an individual y and as the estimate of the mean E(y).]
USING THE MODEL FOR ESTIMATION AND PREDICTION
100(1 − α)% confidence interval for the mean value of y at x = xp:
ŷ ± tα/2 · s · √(1/n + (xp − x̄)² / SSxx)
100(1 − α)% prediction interval for an individual new value of y at x = xp:
ŷ ± tα/2 · s · √(1 + 1/n + (xp − x̄)² / SSxx)
where tα/2 is based on (n − 2) degrees of freedom.
EXAMPLE
Find a 95% confidence interval for the mean monthly
sales when the store spends $400 on advertising.
EXAMPLE
Predict the monthly sales for next month if $400 is spent
on advertising. Use a 95% prediction interval.
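A hedged R sketch (added for illustration) covering both examples above on the example data; xp = 4 corresponds to $400 of advertising, and the interval endpoints in the comments are approximate.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
fit <- lm(y ~ x)
new <- data.frame(x = 4)  # xp = 4, i.e. $400 spent on advertising

predict(fit, new, interval = "confidence", level = 0.95)  # CI for mean sales, roughly (1.6, 3.8)
predict(fit, new, interval = "prediction", level = 0.95)  # PI for next month's sales, roughly (0.5, 4.9)
```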
CONFIDENCE INTERVALS V. PREDICTION INTERVALS
[Figure: the fitted line ŷ = β̂0 + β̂1xi with a confidence band for the mean of y and a wider prediction band for an individual y; both bands widen as x moves away from x̄.]
REGRESSION RESULTS IN R
ESTIMATOR: UNBIASEDNESS (ACCURACY) AND MINIMUM VARIANCE (PRECISION OR EFFICIENCY)
MODEL ASSUMPTIONS
So far we have only estimated the deterministic component. Now we turn our attention to the random error ε. We first need some modeling assumptions.
Assumption 1: E(ε | x) = E(ε) = 0
The mean of the probability distribution of ε is 0. This implies that the mean value of y for a given value of x is β0 + β1x:
y = β0 + β1x + ε
Since E(ε | x) = E(ε) = 0,
E(y | x) = β0 + β1x
Sometimes this is just written as E(y) = β0 + β1x.
MODEL ASSUMPTIONS
Assumption 2: Homoskedasticity
• The variance of the probability distribution of ε is constant for all settings of the independent variable x. For our straight-line model, this assumption means that the variance of ε is equal to a constant, say σ², for all values of x.
• When this assumption does not hold, we say we have a problem of heteroskedasticity.
MODEL ASSUMPTIONS
Assumption 3: Normality
The probability distribution of ε is normal.
Assumption 4: No Autocorrelation
The values of ε associated with any two observed values of y are independent; that is, the value of ε associated with one value of y has no effect on the values of ε associated with other y values.
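A small R sketch (an addition, not from the slides) of standard graphical checks of these assumptions on the example fit; with only five observations the plots are at best suggestive.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
fit <- lm(y ~ x)
res <- resid(fit)

plot(fitted(fit), res,         # residuals vs fitted values:
     xlab = "Fitted values",   # look for a non-zero trend or a fan shape
     ylab = "Residuals")       # (heteroskedasticity)
abline(h = 0, lty = 2)
qqnorm(res); qqline(res)       # rough check of the normality assumption
```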
DISC: 203 – PROBABILITY & STATISTICS
Practice Set – Simple Regression
EXERCISE 1
For five popular cars, the following information on engine size and mileage ratings was recorded:
Engine Size, x (cubic inches) | Mileage Rating, y
144 | 28
232 | 21
306 | 23
388 | 17
414 | 15
Test of the slope (H0: β1 = 0 vs Ha: β1 ≠ 0), α = .05:
df = 5 − 2 = 3
Critical values: t = ±3.182 (.025 in each tail)
Decision: Reject H0 at α = .05.
Conclusion: There is evidence of a relationship between engine size and mileage rating.
As the p-value = 0.0226 < 0.05, the effect of size on mileage is statistically significant.
Since size is statistically significant, we can interpret its value: if size increases by 10 cubic inches, the mileage rating decreases by 0.43 points on average.
We are 95% confident that an increase in size of 10 cubic inches decreases the mileage rating by between 0.11 and 0.74 points.
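A brief R sketch (added, not part of the practice set) that reproduces the Exercise 1 results; the slide reports effects per 10 cubic inches, so the per-cubic-inch output from R is scaled by 10 in the comments.

```r
size    <- c(144, 232, 306, 388, 414)  # engine size (cubic inches)
mileage <- c(28, 21, 23, 17, 15)       # mileage rating
fit1 <- lm(mileage ~ size)

summary(fit1)$coefficients    # slope about -0.043 per cubic inch (-0.43 per 10), p about 0.023
10 * confint(fit1)["size", ]  # 95% CI per 10 cubic inches, about (-0.74, -0.11)
```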
EXERCISE 2
A real estate broker has been collecting data on home sales so that he can investigate the relationship between the dependent variable, y = value of the purchased home, and the independent variable, x = annual family income of the buyer. Data from six recent sales were collected.
Test of the slope (H0: β1 = 0 vs Ha: β1 > 0), α = .01:
df = 6 − 2 = 4
Critical value: t = 3.747 (.01 in the upper tail)
Decision: Reject H0 at α = .01.
Conclusion: There is evidence of a positive relationship between family income and home value.
For a one-tail test, use half the reported (two-sided) p-value. Since 0.000805 < 0.01, we reject H0 at the 1% significance level.