Chapter 11_Simple linear regression and Correlation
Yearly revenue, y (in millions of dollars): 5.2, 8.9, 11.7, 16.8, ?
11.1 Empirical Models
Suppose that we plot these points and try to draw a
line through them that fits. There are several ways in
which this might be done (see the graphs below), and
each would give a different estimate of the company's
total revenue for 2016.
Based on the scatter diagram, it is probably reasonable
to assume that the mean of the random variable Y is
related to x by the following straight-line relationship:
$$E(Y \mid x) = \mu_{Y|x} = \beta_0 + \beta_1 x$$
where the slope $\beta_1$ and intercept $\beta_0$ of the line are called
regression coefficients.
The simple linear regression model is given by
$$Y = \beta_0 + \beta_1 x + \epsilon$$
where $\epsilon$ is the random error term.
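We think of the regression model as an empirical
model.
Suppose that the mean and variance of $\epsilon$ are 0
and $\sigma^2$, respectively. Then
$$E(Y \mid x) = E(\beta_0 + \beta_1 x + \epsilon) = \beta_0 + \beta_1 x + E(\epsilon) = \beta_0 + \beta_1 x$$
and
$$V(Y \mid x) = V(\beta_0 + \beta_1 x + \epsilon) = \sigma^2$$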
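The least squares estimators of $\beta_0$ and $\beta_1$, say $\hat\beta_0$ and $\hat\beta_1$,
minimize the sum of squared deviations $L = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
and therefore must satisfy
$$\left.\frac{\partial L}{\partial \beta_0}\right|_{\hat\beta_0,\hat\beta_1} = -2\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$
$$\left.\frac{\partial L}{\partial \beta_1}\right|_{\hat\beta_0,\hat\beta_1} = -2\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)\,x_i = 0$$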
11.2 Simple Linear Regression
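Simplifying these two equations yields
$$n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$\hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \qquad (*)$$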
Here $e_i = y_i - \hat{y}_i$ is called the residual, where
$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$ is the fitted value for the ith observation.
The residual describes the error in the fit of the model to the ith
observation $y_i$. Later in this chapter we will use the
residuals to provide information about the adequacy of the
fitted model.
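The normal equations (*) form a 2-by-2 linear system, so they can be solved directly. The sketch below does this in plain Python and then checks two properties the residuals inherit from the normal equations: they sum to zero, and so do the products $x_i e_i$. The function names `fit_line` and `residuals` and the four demo points are hypothetical, chosen only for illustration.

```python
def fit_line(x, y):
    """Solve the two least squares normal equations for (b0, b1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    # Normal equations:
    #   n * b0     + sum_x  * b1 = sum_y
    #   sum_x * b0 + sum_xx * b1 = sum_xy
    det = n * sum_xx - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = (n * sum_xy - sum_x * sum_y) / det
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


def residuals(x, y, b0, b1):
    """Residuals e_i = y_i - (b0 + b1 * x_i)."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical demo data (not from the chapter).
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [2.1, 3.9, 6.2, 7.8]

b0, b1 = fit_line(x_demo, y_demo)
e = residuals(x_demo, y_demo, b0, b1)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
print("sum of residuals      =", round(sum(e), 10))
print("sum of x_i * residual =", round(sum(xi * ei for xi, ei in zip(x_demo, e)), 10))
```

Both printed sums should be zero up to rounding, which is just the two normal equations restated in terms of the residuals.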
Example 2: Fit a simple linear regression model to the
oxygen purity data in the table, then test for significance
of regression using the fitted model.
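Solution:
The following quantities may be computed:
$$n = 20; \quad \sum_{i=1}^{20} x_i = 23.92; \quad \sum_{i=1}^{20} y_i = 1{,}843.21$$
$$\bar{x} = 1.196; \quad \bar{y} = 92.1605$$
$$\sum_{i=1}^{20} y_i^2 = 170{,}044.5321; \quad \sum_{i=1}^{20} x_i^2 = 29.2892$$
$$\sum_{i=1}^{20} x_i y_i = 2{,}214.6566$$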
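$$S_{xx} = \sum_{i=1}^{20} x_i^2 - \frac{\left(\sum_{i=1}^{20} x_i\right)^2}{20} = 29.2892 - \frac{(23.92)^2}{20} = 0.68088$$
$$S_{xy} = \sum_{i=1}^{20} x_i y_i - \frac{\left(\sum_{i=1}^{20} x_i\right)\left(\sum_{i=1}^{20} y_i\right)}{20} = 2{,}214.6566 - \frac{(23.92)(1{,}843.21)}{20} = 10.17744$$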
Therefore, the least squares estimates of the slope and
intercept are
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = 14.94748; \quad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 74.28331$$
The fitted simple linear regression model (with the
coefficients reported to three decimal places) is
$$\hat{y} = 74.283 + 14.947x$$
This model is plotted in Fig. 11.2, along with the sample data.
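The arithmetic of this solution can be reproduced from the summary statistics alone. The following is a minimal sketch in plain Python; the variable names and the helper `predict` are illustrative and not part of the original example.

```python
# Summary statistics quoted in the solution for the oxygen purity data.
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_xx, sum_xy = 29.2892, 2214.6566

# Corrected sums of squares and cross products.
s_xx = sum_xx - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least squares estimates of the slope and intercept.
beta1_hat = s_xy / s_xx
beta0_hat = sum_y / n - beta1_hat * (sum_x / n)


def predict(x):
    """Fitted value from the estimated regression line."""
    return beta0_hat + beta1_hat * x


print(f"S_xx = {s_xx:.5f}, S_xy = {s_xy:.5f}")
print(f"slope = {beta1_hat:.5f}, intercept = {beta0_hat:.5f}")
```

Calling `predict(1.0)` evaluates the fitted line at x = 1.0, which is inside the range suggested by the mean x value of 1.196.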
An unbiased estimator of $\sigma^2$ is
$$\hat\sigma^2 = \frac{SS_E}{n-2} \qquad (**)$$
• Intercept Properties
$$E(\hat\beta_0) = \beta_0; \quad V(\hat\beta_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]$$
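In simple linear regression the estimated standard errors of
the slope and intercept are
$$se(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{S_{xx}}} \quad \text{and} \quad se(\hat\beta_0) = \sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]}$$
respectively, where $\hat\sigma^2$ is computed from (**).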
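As a rough illustration, the two standard error formulas translate directly into code. The sketch below uses the oxygen purity quantities that appear in this chapter ($n = 20$, $\bar{x} = 1.196$, $S_{xx} = 0.68088$, and $\hat\sigma^2 = 1.18$ as quoted in the worked test of the slope); the function names are hypothetical.

```python
import math


def se_slope(sigma2_hat, s_xx):
    """Estimated standard error of the slope: sqrt(sigma2_hat / S_xx)."""
    return math.sqrt(sigma2_hat / s_xx)


def se_intercept(sigma2_hat, n, x_bar, s_xx):
    """Estimated standard error of the intercept:
    sqrt(sigma2_hat * (1/n + x_bar^2 / S_xx))."""
    return math.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / s_xx))


# Oxygen purity quantities quoted in the chapter.
n = 20
x_bar = 1.196
s_xx = 0.68088
sigma2_hat = 1.18

se_b1 = se_slope(sigma2_hat, s_xx)
se_b0 = se_intercept(sigma2_hat, n, x_bar, s_xx)
print(f"se(slope) = {se_b1:.4f}, se(intercept) = {se_b0:.4f}")
```

Note that the intercept's standard error grows with $\bar{x}^2$: the farther the data sit from x = 0, the less precisely the intercept is estimated.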
11.4 Hypothesis Tests in Simple Linear
Regression
11.4.1. Use of t-Tests:
An important special case of the hypotheses of
Equation *** is
$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$
These hypotheses relate to the significance of
regression.
Failure to reject $H_0$ is equivalent to concluding
that there is no linear relationship between x and Y.
Figure 1: The hypothesis $H_0: \beta_1 = 0$ is not rejected.
Figure 2: The hypothesis $H_0: \beta_1 = 0$ is rejected.
Example 4: Test for significance of regression using
the model for the oxygen purity data from the table:
a/ $\beta_1$ at $\alpha = 0.01$
b/ $\beta_0$ at $\alpha = 0.01$
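Solution:
a/ The hypotheses are
$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$
We have $\hat\beta_1 = 14.947$; $n = 20$; $S_{xx} = 0.68088$; $\hat\sigma^2 = 1.18$.
Test statistic:
$$t_0 = \frac{\hat\beta_1}{se(\hat\beta_1)} = \frac{\hat\beta_1}{\sqrt{\hat\sigma^2 / S_{xx}}} = \frac{14.947}{\sqrt{1.18 / 0.68088}} = 11.35$$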
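The test statistic can be checked numerically, and the same calculation extends to part b/. The sketch below is illustrative only. It assumes that part b/ tests $H_0: \beta_0 = 0$, takes $\hat\beta_0 = 74.283$ and $\bar{x} = 1.196$ from the fitted oxygen purity model, and uses the critical value $t_{0.005,18} = 2.878$ from a standard t table (that value is not given in the slides). The helper `t_statistic` is a hypothetical name.

```python
import math


def t_statistic(estimate, hypothesized, std_error):
    """t0 = (estimate - hypothesized value) / estimated standard error."""
    return (estimate - hypothesized) / std_error


# Quantities for the oxygen purity data quoted in the chapter.
n = 20
x_bar = 1.196
s_xx = 0.68088
sigma2_hat = 1.18
beta0_hat, beta1_hat = 74.283, 14.947

se_beta1 = math.sqrt(sigma2_hat / s_xx)
se_beta0 = math.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / s_xx))

t0_slope = t_statistic(beta1_hat, 0.0, se_beta1)          # part a/
t0_intercept = t_statistic(beta0_hat, 0.0, se_beta0)      # part b/ (assumed H0: beta0 = 0)

# Assumed table value t_{alpha/2, n-2} for alpha = 0.01 and 18 degrees of freedom.
T_CRIT = 2.878
reject_slope = abs(t0_slope) > T_CRIT
reject_intercept = abs(t0_intercept) > T_CRIT

print(f"slope:     t0 = {t0_slope:.2f}, reject H0: {reject_slope}")
print(f"intercept: t0 = {t0_intercept:.2f}, reject H0: {reject_intercept}")
```

Under these assumptions both statistics far exceed the critical value, so both null hypotheses would be rejected at the 0.01 level.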
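In the analysis of variance approach, the test statistic for significance of regression is
$$F_0 = \frac{SS_R / 1}{SS_E / (n-2)} = \frac{MS_R}{MS_E}$$
where
$$SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2; \quad SS_E = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 = SS_R + SS_E$$
Reject $H_0$ if $F_0 > F_{\alpha,1,n-2}$, where $F_{\alpha,1,n-2}$ is the upper $\alpha$
percentage point of the F distribution with 1 and $n-2$ degrees of freedom (see Appendix VI).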
ANOVA table
Source of Variation   Sum of Squares                Degrees of Freedom   Mean Square             F0
Regression            $SS_R = \hat\beta_1 S_{xy}$   1                    $MS_R = SS_R / 1$       $MS_R / MS_E$
Error                 $SS_E$                        $n-2$                $MS_E = SS_E / (n-2)$
Total                 $SS_T$                        $n-1$
Example 5: We will use the analysis of variance approach to test for
significance of regression using the oxygen purity data model from Example 2.
Recall that $SS_T = 173.38$, $\hat\beta_1 = 14.947$, $S_{xy} = 10.17744$, and $n = 20$.
The regression sum of squares is
$$SS_R = \hat\beta_1 S_{xy} = (14.947)(10.17744) \approx 152.12$$
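The remaining entries of the ANOVA table follow from $SS_T = SS_R + SS_E$. The sketch below carries the calculation through to $F_0$ using only the four quantities recalled in this example. The critical value $F_{0.01,1,18} = 8.29$ is an assumed value read from a standard F table (it is not stated in the slides), and the variable names are illustrative.

```python
# Quantities recalled in Example 5 (oxygen purity data).
ss_t = 173.38
beta1_hat = 14.947
s_xy = 10.17744
n = 20

# Partition of the total sum of squares.
ss_r = beta1_hat * s_xy        # regression sum of squares, 1 degree of freedom
ss_e = ss_t - ss_r             # error sum of squares, n - 2 degrees of freedom

# Mean squares and the F statistic.
ms_r = ss_r / 1
ms_e = ss_e / (n - 2)
f0 = ms_r / ms_e

# Assumed table value F_{0.01, 1, 18}.
F_CRIT = 8.29
reject_h0 = f0 > F_CRIT

print(f"SS_R = {ss_r:.2f}, SS_E = {ss_e:.2f}, MS_E = {ms_e:.2f}")
print(f"F0 = {f0:.1f}, reject H0: {reject_h0}")
```

Here `ms_e` reproduces the estimate $\hat\sigma^2 \approx 1.18$, and `f0` is (up to rounding) the square of the t statistic for the slope, as expected when the regression has one degree of freedom.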
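If X and Y are jointly normally distributed, the conditional distribution of Y given X = x is
$$f_{Y|x}(y) = \frac{1}{\sqrt{2\pi}\,\sigma_{Y|x}} \exp\left[-\frac{1}{2}\left(\frac{y - \beta_0 - \beta_1 x}{\sigma_{Y|x}}\right)^2\right]$$
where
$$\beta_0 = \mu_Y - \mu_X \rho \frac{\sigma_Y}{\sigma_X}; \quad \beta_1 = \frac{\sigma_Y}{\sigma_X}\rho$$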
11.8 Correlation
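It is possible to draw inferences about the correlation
coefficient $\rho$ in this model. The estimator of $\rho$ is the sample
correlation coefficient
$$R = \frac{\sum_{i=1}^{n} Y_i (X_i - \bar{X})}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2\right]^{1/2}} = \frac{S_{XY}}{(S_{XX}\, SS_T)^{1/2}}$$
Note that
$$\hat\beta_1 = \left(\frac{SS_T}{S_{XX}}\right)^{1/2} R$$
We may also write:
$$R^2 = \hat\beta_1^2 \frac{S_{XX}}{S_{YY}} = \frac{\hat\beta_1 S_{XY}}{SS_T} = \frac{SS_R}{SS_T}$$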
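A short numerical check can make these identities concrete. The sketch below computes R from raw data with the formula $R = S_{XY} / (S_{XX}\, SS_T)^{1/2}$ and confirms that $R^2 = \hat\beta_1 S_{XY} / SS_T$. The function name and the four demo points are hypothetical; the last lines apply the same formula to the oxygen purity summary values quoted in this chapter.

```python
import math


def correlation_parts(x, y):
    """Return (S_XX, S_XY, SS_T) computed from raw data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    return s_xx, s_xy, ss_t


# Hypothetical demo data (not from the chapter).
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [2.1, 3.9, 6.2, 7.8]

s_xx, s_xy, ss_t = correlation_parts(x_demo, y_demo)
r = s_xy / math.sqrt(s_xx * ss_t)
beta1_hat = s_xy / s_xx

# Identity: R^2 = beta1_hat * S_XY / SS_T (= SS_R / SS_T).
r_squared_alt = beta1_hat * s_xy / ss_t
print(f"R = {r:.4f}, R^2 = {r * r:.4f}, beta1*S_XY/SS_T = {r_squared_alt:.4f}")

# Same formula applied to the oxygen purity summary values.
r_oxygen = 10.17744 / math.sqrt(0.68088 * 173.38)
print(f"oxygen purity data: R = {r_oxygen:.4f}")
```

Because the slope and R share the numerator $S_{XY}$, they always have the same sign.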
Properties:
$-1 \le R \le 1$
$R > 0$: positive correlation
$R < 0$: negative correlation
$R = 0$: no correlation
Case 1:
It is often useful to test the hypotheses
$H_0: \rho = 0$ (there is no relationship)
$H_1: \rho \neq 0$ (there is a relationship)
Test statistic:
$$t_0 = \frac{R\sqrt{n-2}}{\sqrt{1 - R^2}}$$
Reject $H_0$ if $|t_0| > t_{\alpha/2,\,n-2}$
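To illustrate Case 1, the sketch below evaluates the statistic for the oxygen purity data, with R derived from the summary values quoted in this chapter ($S_{xy} = 10.17744$, $S_{xx} = 0.68088$, $SS_T = 173.38$, $n = 20$). The significance level $\alpha = 0.05$ and the critical value $t_{0.025,18} = 2.101$ (from a standard t table) are assumptions made for the illustration, and `corr_t_statistic` is a hypothetical name.

```python
import math


def corr_t_statistic(r, n):
    """t0 = R * sqrt(n - 2) / sqrt(1 - R^2) for testing H0: rho = 0."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Oxygen purity summary values quoted in the chapter.
s_xy, s_xx, ss_t, n = 10.17744, 0.68088, 173.38, 20
r = s_xy / math.sqrt(s_xx * ss_t)

t0 = corr_t_statistic(r, n)

# Assumed table value t_{0.025, 18} for a two-sided test at alpha = 0.05.
T_CRIT = 2.101
reject_h0 = abs(t0) > T_CRIT
print(f"R = {r:.4f}, t0 = {t0:.2f}, reject H0: {reject_h0}")
```

The value of `t0` agrees with the t statistic for the slope in the significance-of-regression test, which reflects the fact that testing $\rho = 0$ and testing $\beta_1 = 0$ are equivalent here.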
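Case 2:
It is often useful to test the hypotheses
$$H_0: \rho = \rho_0$$
$$H_1: \rho \neq \rho_0$$
Test statistic:
$$z_0 = (\operatorname{arctanh} R - \operatorname{arctanh} \rho_0)\sqrt{n-3}$$
where
$$\tanh u = \frac{e^{u} - e^{-u}}{e^{u} + e^{-u}}$$
Reject $H_0$ if $|z_0| > z_{\alpha/2}$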
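Case 2 needs only the inverse hyperbolic tangent and a normal quantile, both available in the Python standard library (`math.atanh` and `statistics.NormalDist`). The sketch below is a minimal illustration; the inputs r = 0.6, $\rho_0$ = 0.3, n = 28 and $\alpha$ = 0.05 are hypothetical values, not data from the chapter, and the function names are made up for this example.

```python
import math
from statistics import NormalDist


def fisher_z_statistic(r, rho0, n):
    """z0 = (arctanh r - arctanh rho0) * sqrt(n - 3)."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    return (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)


def z_critical(alpha):
    """Upper alpha/2 percentage point of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


# Hypothetical inputs for illustration only.
r, rho0, n, alpha = 0.6, 0.3, 28, 0.05

z0 = fisher_z_statistic(r, rho0, n)
z_crit = z_critical(alpha)
reject_h0 = abs(z0) > z_crit
print(f"z0 = {z0:.3f}, z_crit = {z_crit:.3f}, reject H0: {reject_h0}")
```

With these inputs the statistic falls just short of the critical value, so $H_0: \rho = 0.3$ would not be rejected at the 0.05 level.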
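The approximate $100(1-\alpha)\%$ confidence interval for $\rho$
is
$$\tanh\left(\operatorname{arctanh} r - \frac{z_{\alpha/2}}{\sqrt{n-3}}\right) \le \rho \le \tanh\left(\operatorname{arctanh} r + \frac{z_{\alpha/2}}{\sqrt{n-3}}\right)$$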
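The interval is built on the arctanh scale and then mapped back with tanh, which keeps both endpoints inside (-1, 1). A minimal sketch, assuming hypothetical inputs r = 0.6 and n = 28 (not data from the chapter) and using `statistics.NormalDist` for $z_{\alpha/2}$:

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, alpha=0.05):
    """Approximate 100(1 - alpha)% confidence interval for rho."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}
    half_width = z / math.sqrt(n - 3)             # half-width on the arctanh scale
    center = math.atanh(r)
    return math.tanh(center - half_width), math.tanh(center + half_width)


# Hypothetical inputs for illustration only.
lower, upper = correlation_ci(0.6, 28, alpha=0.05)
print(f"95% CI for rho: ({lower:.3f}, {upper:.3f})")
```

The resulting interval is not symmetric about r, because tanh compresses values more strongly near the ends of its range.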
Example: Use the given data to find the equation of the
regression line and the value of the linear correlation
coefficient r, and test the hypothesis that $\rho = 0$ using
$\alpha = 0.05$ and $\alpha = 0.1$.
a/
x:   2   4   5   6
y:   7  11  13  20
b/
Cost:     9   2   3   4   2   5   9  10
Number:  85  52  55  68  67  86  83  73