
LINEAR REGRESSION ANALYSIS

MODULE – II
Lecture - 5

Simple Linear Regression Analysis

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Joint confidence region for β_0 and β_1

A joint confidence region for β_0 and β_1 can also be found. Such a region provides 100(1 − α)% confidence that β_0 and β_1 are covered simultaneously. Consider the centered version of the linear regression model

    y_i = β_0* + β_1 (x_i − x̄) + ε_i

where β_0* = β_0 + β_1 x̄. The least squares estimators of β_0* and β_1 are b_0* = ȳ and b_1 = s_xy / s_xx, respectively.

Using the results

    E(b_0*) = β_0*,        E(b_1) = β_1,
    Var(b_0*) = σ²/n,      Var(b_1) = σ²/s_xx,

we have, when σ² is known,

    (b_0* − β_0*) / √(σ²/n) ~ N(0, 1)

and

    (b_1 − β_1) / √(σ²/s_xx) ~ N(0, 1).
Moreover, both statistics are independently distributed. Thus

    [ (b_0* − β_0*) / √(σ²/n) ]² ~ χ²_1

and

    [ (b_1 − β_1) / √(σ²/s_xx) ]² ~ χ²_1

are also independently distributed, because b_0* and b_1 are independently distributed. Consequently, their sum satisfies

    n (b_0* − β_0*)² / σ² + s_xx (b_1 − β_1)² / σ² ~ χ²_2.
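The χ²_2 claim can be checked by Monte Carlo simulation. The sketch below (Python with NumPy; the design points and the values of β_0*, β_1 and σ² are hypothetical choices for the check) repeatedly generates data from the centered model, forms the statistic n(b_0* − β_0*)²/σ² + s_xx(b_1 − β_1)²/σ², and compares its sample mean and variance with the χ²_2 values 2 and 4:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design and true parameters for the check.
x = np.linspace(0.0, 10.0, 20)
n = len(x)
xbar = x.mean()
sxx = np.sum((x - xbar) ** 2)
beta0_star, beta1, sigma2 = 5.0, 2.0, 1.5

# Simulate many data sets from the centered model and form the statistic.
reps = 20000
Y = beta0_star + beta1 * (x - xbar) + rng.normal(0.0, np.sqrt(sigma2), (reps, n))
b0_star = Y.mean(axis=1)                        # least squares estimator of beta0*
b1 = (Y - b0_star[:, None]) @ (x - xbar) / sxx  # least squares estimator of beta1
stat = (n * (b0_star - beta0_star) ** 2 / sigma2
        + sxx * (b1 - beta1) ** 2 / sigma2)

# A chi-square variable with 2 degrees of freedom has mean 2 and variance 4.
print(stat.mean(), stat.var())
```

With the seed fixed, the printed mean and variance should land close to 2 and 4, in line with the χ²_2 distribution.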

Since

    SS_res / σ² ~ χ²_{n−2}

and SS_res is distributed independently of b_0* and b_1, the ratio of the two chi-square statistics, each divided by its degrees of freedom, gives

    { [ n (b_0* − β_0*)² / σ² + s_xx (b_1 − β_1)² / σ² ] / 2 } / { [ SS_res / σ² ] / (n − 2) } ~ F_{2,n−2}.

Substituting b_0* = b_0 + b_1 x̄ and β_0* = β_0 + β_1 x̄, the left-hand side becomes

    ((n − 2)/2) · (Q_f / SS_res)

where

    Q_f = n (b_0 − β_0)² + 2 Σ_{i=1}^n x_i (b_0 − β_0)(b_1 − β_1) + Σ_{i=1}^n x_i² (b_1 − β_1)².

Since

    P[ ((n − 2)/2) · (Q_f / SS_res) ≤ F_{2,n−2;α} ] = 1 − α

holds for all values of β_0 and β_1, the 100(1 − α)% confidence region for β_0 and β_1 is

    ((n − 2)/2) · (Q_f / SS_res) ≤ F_{2,n−2;α}.

This confidence region is an ellipse in the (β_0, β_1) plane which contains the true β_0 and β_1 simultaneously with probability 1 − α.
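To make the region concrete, here is a minimal Python/NumPy sketch (the data values and function name are hypothetical) that evaluates the left-hand side ((n − 2)/2) · Q_f / SS_res for a candidate (β_0, β_1); the pair lies inside the 100(1 − α)% region exactly when this value does not exceed F_{2,n−2;α}, which can be taken from F tables:

```python
import numpy as np

def joint_region_statistic(x, y, beta0, beta1):
    """Evaluate ((n - 2)/2) * Q_f / SS_res for a candidate (beta0, beta1).

    The pair lies inside the 100(1 - alpha)% joint confidence region
    exactly when this value is at most F_{2, n-2; alpha}.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    sxy = np.sum((x - xbar) * (y - ybar))
    b1 = sxy / sxx                       # least squares slope
    b0 = ybar - b1 * xbar                # least squares intercept
    ss_res = np.sum((y - (b0 + b1 * x)) ** 2)
    qf = (n * (b0 - beta0) ** 2
          + 2 * np.sum(x) * (b0 - beta0) * (b1 - beta1)
          + np.sum(x ** 2) * (b1 - beta1) ** 2)
    return (n - 2) / 2 * qf / ss_res

# Toy data (hypothetical) for trying the function out.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
```

Note that the statistic is zero at the least squares estimates (b_0, b_1), the center of the ellipse, and increases as the candidate point moves away from them.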
Analysis of variance

The technique of analysis of variance is usually used for testing hypotheses involving more than one parameter, such as several population means or several slope parameters. It is most meaningful in the multiple regression model, where there is more than one slope parameter. The technique is discussed and illustrated here for the simple linear regression model to introduce the basic concepts that will be used in developing the analysis of variance for the multiple linear regression model, with more than one explanatory variable, in the next module.

A test statistic for testing H_0 : β_1 = 0 can also be formulated using the analysis of variance technique as follows.

On the basis of the identity y_i − ŷ_i = (y_i − ȳ) − (ŷ_i − ȳ), the sum of squared residuals is

    S(b) = Σ_{i=1}^n (y_i − ŷ_i)²
         = Σ_{i=1}^n (y_i − ȳ)² + Σ_{i=1}^n (ŷ_i − ȳ)² − 2 Σ_{i=1}^n (y_i − ȳ)(ŷ_i − ȳ).

Further, using ŷ_i − ȳ = b_1 (x_i − x̄) and b_1 = s_xy / s_xx, consider

    Σ_{i=1}^n (y_i − ȳ)(ŷ_i − ȳ) = b_1 Σ_{i=1}^n (y_i − ȳ)(x_i − x̄)
                                 = b_1² Σ_{i=1}^n (x_i − x̄)²
                                 = Σ_{i=1}^n (ŷ_i − ȳ)².

Thus we have

    Σ_{i=1}^n (y_i − ȳ)² = Σ_{i=1}^n (y_i − ŷ_i)² + Σ_{i=1}^n (ŷ_i − ȳ)².

The term Σ_{i=1}^n (y_i − ȳ)² is called the sum of squares about the mean, or the corrected sum of squares of y (SS_corrected), or the total sum of squares, denoted s_yy.
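The decomposition above can be verified numerically. A short Python/NumPy check on a small hypothetical data set:

```python
import numpy as np

# Hypothetical data for checking s_yy = SS_res + SS_reg.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
yhat = b0 + b1 * x

syy = np.sum((y - ybar) ** 2)        # total (corrected) sum of squares
ss_res = np.sum((y - yhat) ** 2)     # residual sum of squares
ss_reg = np.sum((yhat - ybar) ** 2)  # regression sum of squares

print(np.isclose(syy, ss_res + ss_reg))  # prints True
```

The cross term Σ(y_i − ȳ)(ŷ_i − ȳ) can be checked the same way: it equals SS_reg, which is exactly why the decomposition has no leftover term.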
The term Σ_{i=1}^n (y_i − ŷ_i)² describes the deviation of each observation from its predicted value, viz. the residual sum of squares

    SS_res = Σ_{i=1}^n (y_i − ŷ_i)²,

whereas the term Σ_{i=1}^n (ŷ_i − ȳ)² describes the variability explained by the regression,

    SS_reg = Σ_{i=1}^n (ŷ_i − ȳ)².

If all observations y_i lie on a straight line, then Σ_{i=1}^n (y_i − ŷ_i)² = 0 and thus SS_corrected = SS_reg.

Note that SS_reg is completely determined by b_1 and so has only one degree of freedom. The total sum of squares s_yy = Σ_{i=1}^n (y_i − ȳ)² has (n − 1) degrees of freedom due to the constraint Σ_{i=1}^n (y_i − ȳ) = 0, and SS_res has (n − 2) degrees of freedom since it depends on both b_0 and b_1.

When the errors are normally distributed, SS_res/σ² ~ χ²_{n−2}, under H_0 also SS_reg/σ² ~ χ²_1, and SS_reg and SS_res are independently distributed.

The mean square due to regression is

    MS_reg = SS_reg / 1

and the mean square due to residuals is

    MSE = SS_res / (n − 2).

The test statistic for testing H_0 : β_1 = 0 is

    F_0 = MS_reg / MSE.
If H_0 : β_1 = 0 is true, then MS_reg and MSE are independently distributed and thus F_0 ~ F_{1,n−2}.

The decision rule against H_1 : β_1 ≠ 0 is to reject H_0 at significance level α if

    F_0 > F_{1,n−2;1−α}.

The test procedure can be summarized in an analysis of variance table.

Analysis of variance for testing H_0 : β_1 = 0

    Source of variation    Sum of squares    Degrees of freedom    Mean square
    Regression             SS_reg            1                     MS_reg
    Residual               SS_res            n − 2                 MSE
    Total                  s_yy              n − 1
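The whole table can be computed in a few lines. A sketch in Python with NumPy (the function name and the returned layout are just one possible choice); the resulting F_0 is then compared with the appropriate F_{1,n−2} quantile from tables:

```python
import numpy as np

def anova_simple_regression(x, y):
    """ANOVA quantities for testing H0: beta1 = 0 in simple linear regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b0 = ybar - b1 * xbar
    yhat = b0 + b1 * x

    ss_res = np.sum((y - yhat) ** 2)     # residual sum of squares, n - 2 df
    ss_reg = np.sum((yhat - ybar) ** 2)  # regression sum of squares, 1 df
    ms_reg = ss_reg / 1                  # mean square due to regression
    mse = ss_res / (n - 2)               # mean square due to residuals

    return {"SS_reg": ss_reg, "SS_res": ss_res, "s_yy": ss_reg + ss_res,
            "df": (1, n - 2, n - 1), "MS_reg": ms_reg, "MSE": mse,
            "F0": ms_reg / mse}
```

Since SS_reg = b_1² s_xx, the statistic F_0 is also the square of the usual t statistic for the slope, which is a handy consistency check.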

Some other forms of SS_reg, SS_res and s_yy can be derived as follows. The sample correlation coefficient may be written as

    r_xy = s_xy / √(s_xx s_yy).

Moreover, we have

    b_1 = s_xy / s_xx = r_xy √(s_yy / s_xx).

The estimator of σ² in this case may be expressed as

    s² = (1/(n − 2)) Σ_{i=1}^n e_i² = SS_res / (n − 2).
Various alternative formulations of SS_res are in use as well:

    SS_res = Σ_{i=1}^n [y_i − (b_0 + b_1 x_i)]²
           = Σ_{i=1}^n [(y_i − ȳ) − b_1 (x_i − x̄)]²
           = s_yy + b_1² s_xx − 2 b_1 s_xy
           = s_yy − b_1² s_xx
           = s_yy − (s_xy)² / s_xx.

Using this result, we find that

    SS_corrected = s_yy

and

    SS_reg = s_yy − SS_res
           = (s_xy)² / s_xx
           = b_1² s_xx
           = b_1 s_xy.
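These identities are easy to confirm numerically. The following Python/NumPy sketch (hypothetical data) checks each alternative form of SS_res and SS_reg:

```python
import numpy as np

# Hypothetical data for checking the alternative sum-of-squares formulas.
x = np.array([0.5, 1.5, 2.0, 3.5, 4.0, 5.5])
y = np.array([1.0, 2.2, 2.4, 4.1, 4.6, 6.0])

xbar, ybar = x.mean(), y.mean()
sxx = np.sum((x - xbar) ** 2)
syy = np.sum((y - ybar) ** 2)
sxy = np.sum((x - xbar) * (y - ybar))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

ss_res = np.sum((y - (b0 + b1 * x)) ** 2)

# SS_res = s_yy - b1^2 s_xx = s_yy - s_xy^2 / s_xx
assert np.isclose(ss_res, syy - b1 ** 2 * sxx)
assert np.isclose(ss_res, syy - sxy ** 2 / sxx)

# SS_reg = s_yy - SS_res = s_xy^2 / s_xx = b1^2 s_xx = b1 s_xy
ss_reg = syy - ss_res
assert np.isclose(ss_reg, sxy ** 2 / sxx)
assert np.isclose(ss_reg, b1 * sxy)
```

All assertions pass because the formulas are algebraically equivalent; the checks only confirm them up to floating-point rounding.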
Goodness of fit of regression

A fitted model can be said to be good when its residuals are small. Since SS_res is based on the residuals, a measure of the quality of the fitted model can be based on SS_res. When an intercept term is present in the model, a measure of goodness of fit is given by

    R² = 1 − SS_res / s_yy
       = SS_reg / s_yy.

This is known as the coefficient of determination. It is based on the idea that the total variation in the y's, measured by s_yy, splits into a part explained by the regression (SS_reg) and an unexplained part contained in SS_res. The ratio SS_reg / s_yy describes the proportion of variability explained by the regression relative to the total variability of y, while SS_res / s_yy describes the proportion not accounted for by the regression.

It can be seen that

    R² = r²_xy,

where r_xy is the simple correlation coefficient between x and y. Clearly 0 ≤ R² ≤ 1, so a value of R² closer to one indicates a better fit and a value closer to zero indicates a poor fit.
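The identity R² = r²_xy can be confirmed directly; a minimal Python/NumPy check on hypothetical data:

```python
import numpy as np

# Hypothetical data for checking R^2 = r_xy^2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.0, 2.8, 4.1, 4.9, 6.2, 6.8, 8.1])

xbar, ybar = x.mean(), y.mean()
sxx = np.sum((x - xbar) ** 2)
syy = np.sum((y - ybar) ** 2)
sxy = np.sum((x - xbar) * (y - ybar))

b1 = sxy / sxx
b0 = ybar - b1 * xbar
ss_res = np.sum((y - (b0 + b1 * x)) ** 2)

r2 = 1.0 - ss_res / syy          # coefficient of determination
rxy = sxy / np.sqrt(sxx * syy)   # sample correlation coefficient

print(np.isclose(r2, rxy ** 2))  # prints True
```

Note that the identity R² = r²_xy relies on the model containing an intercept; without one, 1 − SS_res/s_yy need not equal the squared correlation.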
