Confidence Interval, Model Fitness and Prediction

This document discusses confidence intervals, model fitness, and prediction for linear regression models. It provides examples of calculating 95% and 90% confidence intervals for slope and intercept estimates. It also discusses evaluating models using the coefficient of determination and providing predictions using the linear model, including calculating prediction intervals.

Uploaded by

tinkit

3. Confidence interval, model fitness and prediction

3.1. Confidence interval


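• Based on the sampling distributions of b0 and b1 (t-distributions with n − 2 degrees of freedom), one can construct interval estimates
• A 100(1 – α)% confidence interval for the slope β1 (centered at the estimate b1)
$$ b_1 \pm t_{\alpha/2,\,n-2}\,\sqrt{\frac{s^2}{S_{XX}}} $$
• A 100(1 – α)% confidence interval for the intercept β0 (centered at the estimate b0)
$$ b_0 \pm t_{\alpha/2,\,n-2}\; s\,\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{XX}}} $$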

Example

Westwood company data


• 95% CI for b0 and b1
o t0.025,8 = 2.306
• A 95% CI for b1
o b1 = 2
o SE(b1) = 0.047
o CI = 2 ± 2.306 × 0.047 = (1.892 , 2.108)
o Excludes zero
• A 95% CI for b0
o b0 = 10
o SE(b0) = 2.503
o CI = 10 ± 2.306 × 2.503 = (4.228 , 15.772)
o Excludes zero
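The two interval formulas can be checked numerically. The following is a minimal sketch, not the notes' own code: the function name `coefficient_intervals` is hypothetical, the critical value is passed in from a t table rather than computed, and the summary values s² = 7.5, n = 10, x̄ = 50 and S_XX = 3400 are taken from the Westwood mean-response calculation in Section 3.3.

```python
import math

def coefficient_intervals(b0, b1, s2, n, x_bar, s_xx, t_crit):
    """Return (CI for intercept, CI for slope) from summary statistics.

    t_crit is the tabulated value t_{alpha/2, n-2}; it is supplied by the
    caller, so no statistics library is needed.
    """
    se_b1 = math.sqrt(s2 / s_xx)
    se_b0 = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / s_xx))
    ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
    ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return ci_b0, ci_b1

# Westwood summary values as quoted in these notes (assumed inputs)
ci_b0, ci_b1 = coefficient_intervals(
    b0=10, b1=2, s2=7.5, n=10, x_bar=50, s_xx=3400, t_crit=2.306
)
print("95% CI for intercept:", tuple(round(v, 3) for v in ci_b0))
print("95% CI for slope:    ", tuple(round(v, 3) for v in ci_b1))
```

Passing the critical value as an argument keeps the sketch free of dependencies; a statistics library could supply it instead.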

Example

CEO data
• 90% CI for b0 and b1
o t0.05,57 = 1.672
• 90% CI for b0
o b0 = 242.70
o SE(b0) = 168.76
o 242.70 ± 1.672 × 168.76 = (-39.47, 524.87)
o Includes zero
• 90% CI for b1
o b1 = 3.1327
o SE(b1) = 3.22
o 3.1327 ± 1.672 × 3.22 = (-2.26, 8.53)
o Includes zero
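Whether an interval contains zero matters because a 100(1 – α)% interval that includes zero corresponds to a two-sided t test at level α that does not reject a zero coefficient. A minimal sketch of that check for the CEO estimates follows; the helper names `t_interval` and `excludes_zero` are hypothetical, and the endpoints can differ from the quoted ones in the last digit because the standard errors above are rounded.

```python
def t_interval(estimate, se, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width

def excludes_zero(interval):
    """True when zero lies strictly outside the interval."""
    lower, upper = interval
    return lower > 0 or upper < 0

# CEO data: estimates, standard errors and t_{0.05,57} as quoted above
ceo_b0 = t_interval(242.70, 168.76, 1.672)
ceo_b1 = t_interval(3.1327, 3.22, 1.672)

for name, ci in (("b0", ceo_b0), ("b1", ceo_b1)):
    verdict = "excludes" if excludes_zero(ci) else "includes"
    print(f"90% CI for {name}: ({ci[0]:.2f}, {ci[1]:.2f}) {verdict} zero")
```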

3.2. Quality of fitted model
• Does the data fit the model adequately?
• Will the model predict the response well enough?

(a) Coefficient of determination


• Proportion of variation in the response data that is explained by the model
$$ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} $$
• Properties
o 0 ≤ R² ≤ 1
o R² = 1
▪ Perfect fit of the model
▪ 100% of the variation of Y can be explained by X
▪ SSR + 0 = SST or SSE = 0
▪ All the data points lie on the regression line
o R² = 0
▪ 0 + SSE = SST or SSE = SST
▪ There is no linear relationship between X and Y
▪ But it may still be possible to have a strong nonlinear relationship between them
o R² never decreases as regressors are added (→ risk of overfitting)

• R² and correlation
o For simple linear regression (with only one regressor), the coefficient of determination is
just equal to the square of the correlation coefficient between X and Y.
o Since
$$ SSR = b_1^2 S_{XX} \ \text{(exercise)}, \qquad b_1 = \frac{S_{XY}}{S_{XX}}, \qquad SST = S_{YY} $$
o Therefore
$$ R^2 = \frac{b_1^2 S_{XX}}{S_{YY}} = \frac{S_{XY}^2\, S_{XX}}{S_{XX}^2\, S_{YY}} = \left( \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} \right)^2 = r^2 $$
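The decomposition SSR + SSE = SST and the identity R² = r² for one regressor can be verified numerically. The sketch below uses a small made-up data set (hypothetical values, not from any of the examples in these notes) and a hypothetical helper named `fit_summary`.

```python
import math

def fit_summary(x, y):
    """Least-squares fit of y on x; returns sums of squares, R^2 and r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = s_xy / s_xx              # slope estimate
    b0 = y_bar - b1 * x_bar       # intercept estimate
    fitted = [b0 + b1 * xi for xi in x]

    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual
    sst = s_yy                                              # total
    r = s_xy / math.sqrt(s_xx * s_yy)                       # correlation
    return {"SSR": ssr, "SSE": sse, "SST": sst, "R2": ssr / sst, "r": r}

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
out = fit_summary(x, y)
print("R^2 =", out["R2"], " r^2 =", out["r"] ** 2)
```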

Example

Westwood company data


• Coefficient of determination
o R² = SSR / SST = 13600 / 13660 = 0.996
o Close to 1
o The linear regression model explains 99.6% of the variance of Y
o The model fits the data very well

Example

Shock data
• When all cases are considered
o R2 = 127.74 / 199.06 = 0.6417
o The linear regression model explains 64.17% of the variance of Y
o The model fits the data fairly well
• When obs. 3 and 14 are removed
o R2 = 101.35 / 125.33 = 0.8086
o The linear regression model explains 80.86% of the variance of Y
o The model fits the reduced data much better

Example

CEO data
• R2 = 45896/2820832 = 0.0163
o The linear regression model explains only 1.63% of the variance of Y
o The model does not fit the data well
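The R² values in these examples follow directly from the quoted sums of squares. A short sketch, assuming only the SSR and SST figures given above (the function name `r_squared` and the dictionary labels are hypothetical), also recovers SSE = SST − SSR for each data set:

```python
def r_squared(ssr, sst):
    """Coefficient of determination from regression and total sums of squares."""
    return ssr / sst

# (SSR, SST) pairs as quoted in the examples above
examples = {
    "Westwood": (13600, 13660),
    "Shock, all cases": (127.74, 199.06),
    "Shock, obs. 3 and 14 removed": (101.35, 125.33),
    "CEO": (45896, 2820832),
}

for name, (ssr, sst) in examples.items():
    sse = sst - ssr  # unexplained variation
    print(f"{name}: R^2 = {r_squared(ssr, sst):.4f}, SSE = {sse:.2f}")
```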

3.3. Prediction
(a) Mean response
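• Mean of Y given X = x0
$$ E(Y \mid X = x_0) = E(\beta_0 + \beta_1 X + \varepsilon \mid X = x_0) = \beta_0 + \beta_1 x_0 $$
• Estimated mean response
$$ \hat{y} = b_0 + b_1 x_0 $$
• Standard error for mean response given X = x0
$$
\begin{aligned}
\operatorname{Var}(\hat{y}) &= \operatorname{Var}(b_0 + b_1 x_0) \\
&= \operatorname{Var}\big(\bar{y} + b_1 (x_0 - \bar{x})\big) && \text{since } b_0 = \bar{y} - b_1 \bar{x} \\
&= \operatorname{Var}(\bar{y}) + \operatorname{Var}\big(b_1 (x_0 - \bar{x})\big) && \text{since } \operatorname{Cov}(\bar{y}, b_1) = 0 \\
&= \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}} \right) && \text{since } \operatorname{Var}(\bar{y}) = \frac{\sigma^2}{n},\ \operatorname{Var}(b_1) = \frac{\sigma^2}{S_{XX}}
\end{aligned}
$$
o Substituting s² for σ² gives the standard error of the estimated mean response
$$ s\{\hat{y}\} = s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}} $$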

• Confidence interval for mean response
o Under the condition of normal errors,
▪ ŷ is normal since it is a linear combination of the yi
▪ (n − 2)s²/σ² ~ χ² with n − 2 degrees of freedom, and s² is independent of ŷ
o A 100(1 – α)% confidence interval for mean response
$$ \hat{y} \pm t_{\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}} $$

Example

Westwood company data


• Predict the mean response at lot size = 55 and 70
o yˆ (55) = 10 + 2 × 55 = 120
o yˆ (70) = 10 + 2 × 70 = 150
• SE for mean response
o $$ SE(\hat{y}(55)) = \sqrt{7.5 \left( \frac{1}{10} + \frac{(55 - 50)^2}{3400} \right)} = 0.8973 $$
o $$ SE(\hat{y}(70)) = \sqrt{7.5 \left( \frac{1}{10} + \frac{(70 - 50)^2}{3400} \right)} = 1.2776 $$
• t0.025,8 = 2.306
• A 95% CI for mean response at lot size = 55
o 120 ± 2.306 × 0.8973 = (117.93,122.07 )
• A 95% CI for mean response at lot size = 70
o 150 ± 2.306 ×1.2776 = (147.05,152.95)
• Width of the CI increases with the distance from x0 to x̄
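[Figure: plot for the Westwood data showing the 95% confidence interval for the mean response; horizontal axis from 20 to 80, vertical axis from 40 to 180]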
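The mean-response calculation for the Westwood data can be reproduced from the summary statistics used above. This is a minimal sketch under those stated values (b0 = 10, b1 = 2, s² = 7.5, n = 10, x̄ = 50, S_XX = 3400, t = 2.306); the function name `mean_response_interval` is hypothetical.

```python
import math

def mean_response_interval(x0, b0, b1, s2, n, x_bar, s_xx, t_crit):
    """Return (y_hat, se, (lower, upper)) for the mean response at x0."""
    y_hat = b0 + b1 * x0
    se = math.sqrt(s2 * (1.0 / n + (x0 - x_bar) ** 2 / s_xx))
    return y_hat, se, (y_hat - t_crit * se, y_hat + t_crit * se)

# Westwood summary values as quoted in these notes (assumed inputs)
westwood = dict(b0=10, b1=2, s2=7.5, n=10, x_bar=50, s_xx=3400, t_crit=2.306)

results = {}
for x0 in (55, 70):
    results[x0] = mean_response_interval(x0, **westwood)
    y_hat, se, (lo, hi) = results[x0]
    print(f"x0 = {x0}: y_hat = {y_hat}, SE = {se:.4f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Because the standard error depends on x0 only through (x0 − x̄)², the interval is narrowest at x0 = x̄ and widens symmetrically on either side.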

Example

Shock data
• 90% CI for the mean responses at number of shocks = 0, 8 and 15
• t0.05,14 = 1.761
x0    ŷ(x0)    SE[ŷ(x0)]    90% CI
0     10.48    1.0776       (8.58, 12.38)
8     5.58     0.5676       (4.58, 6.58)
15    1.29     1.0776       (-0.61, 3.19)
• SE is symmetric about x̄
o SE[ŷ(0)] = SE[ŷ(15)], since x0 = 0 and x0 = 15 are equally far from x̄
• The confidence band does not include 90% of the observations. Why?

Example

CEO data
• 95% CI for the mean responses at x0 = 32, 55 and 74
• t0.025,57 = 2.002
x0    ŷ(x0)      SE[ŷ(x0)]    95% CI
32    342.949    69.287       (204.204, 481.694)
55    415.001    30.815       (353.294, 476.708)
74    474.523    77.944       (318.442, 630.603)
• The confidence intervals are very wide at both ends, away from the center of the data

(b) Individual response
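• Predicted individual value of Y given X = x0
$$ Y \mid X = x_0 \;=\; \beta_0 + \beta_1 x_0 + \varepsilon $$
• Estimated individual response
$$ \hat{y} = b_0 + b_1 x_0 $$
• Prediction interval for individual response
o Consider a single new observation at X = x0, denoted by y0 (= β0 + β1 x0 + ε), which is
independent of ŷ
o The expected value of ŷ equals the expected value of y0
$$
\begin{aligned}
E(y_0 - \hat{y}) &= E\big(\beta_0 + \beta_1 x_0 + \varepsilon - (b_0 + b_1 x_0)\big) \\
&= \beta_0 + \beta_1 x_0 - E(b_0) - E(b_1)\, x_0 = 0
\end{aligned}
$$
o Variance of the difference between the observation and the prediction given X = x0
$$
\begin{aligned}
\operatorname{Var}(y_0 - \hat{y}) &= \operatorname{Var}(y_0) + \operatorname{Var}(\hat{y}) \\
&= \sigma^2 + \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}} \right) \\
&= \sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}} \right)
\end{aligned}
$$
o Under the normal assumption, since y0 and ŷ are normally distributed,
$$ \frac{y_0 - \hat{y}}{\sigma \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{XX}}}} \sim N(0, 1) $$
o (n − 2)s²/σ² ~ χ² with n − 2 degrees of freedom, and s² is independent of y0 − ŷ
▪ Replace σ by s
$$ \frac{y_0 - \hat{y}}{s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{XX}}}} \sim t_{n-2} $$
▪ Prediction interval (the interval for an individual response)
$$ \hat{y} \pm t_{\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}} $$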

Example

Westwood company data


• Predicted individual response = predicted mean response = ŷ
• t0.025,8 = 2.306
• SE for individual prediction at X = 55 and 70
o $$ SE_{\text{pred}}(55) = \sqrt{7.5 \left( 1 + \frac{1}{10} + \frac{(55 - 50)^2}{3400} \right)} = 2.8819 $$
o $$ SE_{\text{pred}}(70) = \sqrt{7.5 \left( 1 + \frac{1}{10} + \frac{(70 - 50)^2}{3400} \right)} = 3.0220 $$
• A 95% CI for individual prediction at lot size = 55
o 120 ± 2.306 × 2.8819 = (113.35,126.65)

• A 95% CI for individual prediction at lot size = 70
o 150 ± 2.306 × 3.0220 = (143.03,156.97 )
• Prediction interval (blue lines) is wider than mean response interval (red lines)
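[Figure: plot for the Westwood data showing the prediction interval (blue lines) and the mean response interval (red lines); horizontal axis from 20 to 80, vertical axis from 40 to 180]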
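The prediction-interval calculation differs from the mean-response one only by the extra "1 +" under the square root. A minimal sketch under the Westwood summary values used above (b0 = 10, b1 = 2, s² = 7.5, n = 10, x̄ = 50, S_XX = 3400, t = 2.306), with the hypothetical function name `prediction_interval`:

```python
import math

def prediction_interval(x0, b0, b1, s2, n, x_bar, s_xx, t_crit):
    """Return (y_hat, se_pred, (lower, upper)) for one new observation at x0."""
    y_hat = b0 + b1 * x0
    # The leading 1.0 accounts for the variance of the new observation itself
    se_pred = math.sqrt(s2 * (1.0 + 1.0 / n + (x0 - x_bar) ** 2 / s_xx))
    return y_hat, se_pred, (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

# Westwood summary values as quoted in these notes (assumed inputs)
westwood = dict(b0=10, b1=2, s2=7.5, n=10, x_bar=50, s_xx=3400, t_crit=2.306)

pred = {}
for x0 in (55, 70):
    pred[x0] = prediction_interval(x0, **westwood)
    y_hat, se_pred, (lo, hi) = pred[x0]
    print(f"x0 = {x0}: SE = {se_pred:.4f}, 95% PI = ({lo:.2f}, {hi:.2f})")
```

Since the extra term contributes s² regardless of n, the prediction interval stays wide even for large samples, whereas the mean-response interval shrinks.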

Example

Shock data
• 90% prediction interval (PI) for the individual response at number of shocks = 0, 8 and 15
• t0.05,14 = 1.761

x0    ŷ(x0)    SE[y0 − ŷ(x0)]    90% PI
0     10.48    2.5011            (6.0793, 14.8898)
8     5.58     2.3273            (1.4819, 9.6802)
15    1.29     2.5011            (-3.1148, 5.6957)

Example

CEO data

• 95% CI

• Extend the prediction to X = 0


o The confidence interval for prediction which is far away from the mean of the data becomes
very wide
o The interval for the mean response (and likewise the interval for the individual prediction) includes 0
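One way to see why the interval at X = 0 includes zero: at x0 = 0 the estimated mean response is the intercept b0, and its standard error reduces to SE(b0). A worked sketch, assuming the CEO values b0 = 242.70, SE(b0) = 168.76 and t0.025,57 = 2.002 quoted earlier in these notes:

```latex
\begin{aligned}
\hat{y}(0) &= b_0 + b_1 \cdot 0 = b_0 = 242.70 \\
s\{\hat{y}(0)\} &= s\sqrt{\frac{1}{n} + \frac{(0 - \bar{x})^2}{S_{XX}}} = SE(b_0) = 168.76 \\
\hat{y}(0) \pm t_{0.025,\,57}\; s\{\hat{y}(0)\} &= 242.70 \pm 2.002 \times 168.76 = (-95.16,\ 580.56)
\end{aligned}
```

The prediction interval for an individual response at X = 0 has the same center and a larger standard error, so it contains zero as well.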
