
Bivariate Linear Regression Analysis

Suppose we know that two variables X and Y are correlated. The mathematical equation describing the relationship between the two correlated variables is called the regression equation. Of the two variables, one is called the dependent variable and the other the independent variable, and with the help of the regression equation we predict the value of the dependent variable from a given value of the independent variable.
There are two cases, as below.
Case I) Y is the dependent variable and X is the independent variable. Then the regression equation is written as Y = a + bX.
Using this equation we can predict or estimate the value of the dependent variable Y from a given value of the independent variable X.
It can be proved that for a given bivariate data set (X, Y) this equation reduces to
y − ȳ = byx(x − x̄), where byx is called the regression coefficient of Y on X.
Using this equation we cannot predict X given Y, as it contains byx, the regression coefficient of Y on X; for predicting X given Y we require the equation based on bxy. It can be proved that byx = Cov(x,y)/V(X).
Case II) X is the dependent variable and Y is the independent variable. Then the regression equation is written as X = a′ + b′Y, using which we predict or estimate the value of X given a value of Y.
It can be proved that for a given data set (X, Y) this equation reduces to
x − x̄ = bxy(y − ȳ), where bxy is called the regression coefficient of X on Y.
But we cannot estimate Y using this equation, as it does not contain byx. It can be proved that bxy = Cov(x,y)/V(Y).
Hence we require two regression equations: one for estimating Y and another for estimating X.
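A minimal sketch of both cases, assuming made-up sample data (the numbers are illustrative, not from the text):

```python
# A minimal sketch (made-up data): compute both regression coefficients
# and use each regression line only for the prediction it supports.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

cov_xy = np.mean(x * y) - x.mean() * y.mean()   # Cov(x,y) = ∑xy/n − x̄·ȳ
byx = cov_xy / np.var(x)                        # regression coefficient of Y on X
bxy = cov_xy / np.var(y)                        # regression coefficient of X on Y

# Case I, line of Y on X: y − ȳ = byx(x − x̄), predicts Y from X.
y_at_6 = y.mean() + byx * (6.0 - x.mean())
# Case II, line of X on Y: x − x̄ = bxy(y − ȳ), predicts X from Y.
x_at_12 = x.mean() + bxy * (12.0 - y.mean())
print(byx, bxy, y_at_6, x_at_12)
```

Note that np.var uses the population form ∑(x − x̄)²/n, matching the definition of V(X) used here.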

Slope of the line :- For a line of the form y = mx + c, the slope is 'm', the coefficient of x.

If the equation of the line is y − ȳ = byx(x − x̄), then the slope of the line is byx. Let m1 denote the slope of this line; hence m1 = byx.
But if the equation of the line is x − x̄ = bxy(y − ȳ), then first put the equation in the form y = mx + c, as below:
bxy(y − ȳ) = x − x̄
y − ȳ = (1/bxy)(x − x̄). Hence the slope of this line = coefficient of x = 1/bxy = m2.
Then the acute angle θ between the two regression lines is found as θ = tan⁻¹ |(m1 − m2)/(1 + m1·m2)|.

Concept of Regression Line


For given bivariate data of n pairs (Xi, Yi) we plot the scatter diagram. If a straight line can be drawn through the points of the scatter diagram so that it is as close to all the points as possible, such a line is called a regression line.
The plotted points are denoted (X, Y). If Y is the dependent variable, we consider Y the observed value, expect it to lie on the line Y′ = a + bX, and call Y′ the expected value of Y.
If a point is not on this line, we calculate the error or residual as Y − Y′. This error can be positive or negative, so the sum of the errors can be zero even when not all the points lie on the line and the individual errors are non-zero. Hence we consider the sum of squares of errors, denoted ∑ei², where ei = Yi − Yi′ = observed value of Y − expected value of Y = Y − a − bX.
The difference between the observed value (Y) and the expected value (Y′) of the dependent variable is called the error or residual. These errors cannot be predicted, so they are random in nature. Errors are also called residuals, as seen in the diagram of the line of regression of Y on X.

The equation of the regression line is derived by minimising this sum of squares of errors.
To derive the value of byx, i.e. to show that byx = Cov(x,y)/V(X):

Proof :- Consider a sample of n pairs of observations (Xi, Yi) on the variables X and Y, i = 1, 2, … n.
We take Y as the dependent variable, hence the equation of the line of regression of Y on X is Y = a + bX, where Y is the dependent variable and X is the independent variable; Y is the observed value of the variable and Y′ is the expected value.
Hence the error component ei = Y − Y′ = observed value of Y − expected value of Y.
The equation can be written as Yi = a + bXi + ei, i = 1, 2, … n, where the ei are the errors taking Y as the dependent variable.
Let D = ∑ei² = ∑(yi − a − bxi)². Obtaining the values of a and b such that D is minimum is called fitting the line Y = a + bX to the given bivariate data (Xi, Yi), i = 1, 2, … n.
Using the method of least squares we fit the straight line Y = a + bX to the given bivariate data. We apply the principle of minima to find the values of a and b for which D is minimum.
The conditions are:
1) the value of a must satisfy the two conditions ∂D/∂a = 0 and ∂²D/∂a² > 0;
2) the value of b must satisfy the two conditions ∂D/∂b = 0 and ∂²D/∂b² > 0.

To find a:
∂D/∂a = 0 => 2∑(yi − a − bxi)^(2−1)·(−1) = 0 => ∑(yi − a − bxi) = 0, for i = 1, 2, … n
So we get ∑Yi = ∑a + b∑Xi, i.e. ∑Y = na + b∑X ……(1), the 1st normal equation.


Now consider ∂²D/∂a² = ∂/∂a(∂D/∂a) = ∂/∂a(−2∑(yi − a − bxi)) = 2n > 0.
So the value of a obtained from equation (1) will minimise D.


Hence the first normal equation is ∑y = na + b∑x.
Similarly, we find the value of b minimising D using the method of least squares and get the 2nd normal equation ∑xy = a∑x + b∑x².
So consider ∂D/∂b = 0 => 2∑(yi − a − bxi)(−xi) = 0, for i = 1, 2, … n
=> ∑(yi − a − bxi)·xi = 0 => ∑(xi·yi − a·xi − b·xi²) = 0
=> ∑xi·yi − a∑xi − b∑xi² = 0 ……(2), the 2nd normal equation,
which can also be written as ∑XY = a∑X + b∑X² ……(2)


∂²D/∂b² = ∂/∂b(∂D/∂b) = ∂/∂b(2∑(yi − a − bxi)(−xi)) = (−2)∑(0 − 0 − xi²) = 2∑xi² > 0, as it is a sum of positive terms.
So the value of b obtained from equation (2) will minimise D.

The second normal equation is ∑xy = a∑x + b∑x².


Now we have two unknowns a and b, hence the two normal equations (1) and (2) below:
∑y = na + b∑x ……(1)
∑xy = a∑x + b∑x² ……(2)


We solve the normal equations simultaneously to get the values of a and b which minimise D.
Dividing both sides of equation (1) by n we get ∑y/n = a + b∑x/n => ȳ = a + bx̄
Hence a = ȳ − bx̄ ……(a)
Substituting this value of a in equation (2) we get
∑xy = (ȳ − bx̄)∑x + b∑x²
Dividing both sides of the equation by n we get
∑xy/n = ȳ(∑x/n) − bx̄(∑x/n) + b(∑x²/n)
∑xy/n = x̄ȳ − bx̄² + b(∑x²/n)
∑xy/n − x̄ȳ = b[(1/n)∑x² − x̄²]
b = (∑xy/n − x̄ȳ)/((1/n)∑x² − x̄²) = Cov(x,y)/V(X), hence the proof.
Since it is the regression of Y on X we call this b as byx; hence byx = Cov(x,y)/V(X) ……(b)

To get the equation of the regression line of Y on X, substitute the values of a and b into the equation
y = a + bx
We have a = ȳ − bx̄ and b = Cov(x,y)/V(X) = byx
y = (ȳ − bx̄) + byx·x …… using (a) and (b)
y − ȳ = −byx·x̄ + byx·x …… as b = byx
y − ȳ = byx(x − x̄) is the regression equation of Y on X, where Y is the dependent variable.
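As a check on this derivation, the sketch below (again with made-up data) solves the two normal equations directly and confirms that the resulting slope equals Cov(x,y)/V(x) and the intercept equals ȳ − b·x̄:

```python
# A minimal sketch (made-up data): solve the normal equations
#   ∑y  = na + b∑x
#   ∑xy = a∑x + b∑x²
# and check the closed forms b = Cov(x,y)/V(x), a = ȳ − b·x̄.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
n = len(x)

A = np.array([[n, x.sum()], [x.sum(), (x**2).sum()]])  # coefficient matrix
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(A, rhs)                         # solve for (a, b)

byx = (np.mean(x * y) - x.mean() * y.mean()) / np.var(x)
assert np.isclose(b, byx)
assert np.isclose(a, y.mean() - b * x.mean())
print(a, b)
```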

We study bivariate linear regression, or simple linear regression, meaning a regression equation for the simultaneous study of two variables X and Y of unit power, where we do not consider X², X³ as powers of X and likewise do not consider Y², Y³ as powers of Y.
Linear -> only a single power of X and Y is considered, i.e. terms such as X², X³, Y², Y³ are not considered.
Also, this line of regression is as close to all the points as possible. The regression equation establishes a linear mathematical relationship between the two variables, using which we estimate the values of the dependent variable from the values of the independent variable.

Coefficient of Determination :- In fitting the line Y = a + bX to data (Xi, Yi), i = 1, 2, … n, with Y as the dependent variable, we calculate the total variation over the observed values of Y as ∑(y − ȳ)².
Total variation between y and ȳ = ∑(y − ȳ)² = ∑(y − y′)² + ∑(y′ − ȳ)²
where y′ is the point on the line, i.e. the expected value of y (the dependent variable).
∑(y′ − ȳ)² is the variation in y which is explained by variation in the independent variable X; this is called the explained variation. ∑(y − y′)² is the variation which is not explained, so it is called the unexplained variation.
Total variation = Unexplained variation + Explained variation

Dividing by the total variation:
1 = Unexplained variation/Total variation + Explained variation/Total variation
1 = Unexplained variation/Total variation + r², where r² = Explained variation/Total variation is the coefficient of determination.
There are two extreme cases:
i) r² = 1. This happens if the unexplained variation is zero, so that 1 = 0 + r².
This means all the variation in Y is explained by the variation in X. It indicates that all the points are on the line, i.e. there is no error in Y. Hence the line Y = a + bX is said to be the best fit to the given data.
ii) r² = 0. Then the total variation in Y is unexplained variation. Hence the line fitted to the data is not the line of best fit, as the points are not on the line. It also means the error component is so large that it is not explained by the explanatory (independent) variable X.
Example :- If r² = 0.99, then 99% of the variation in Y, the dependent variable, is explained by the independent variable.
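The decomposition can be verified numerically. A minimal sketch, assuming made-up data, that fits the line of Y on X and computes r² as explained variation over total variation:

```python
# A minimal sketch (made-up data): verify
#   total variation = unexplained + explained, and r² = explained/total.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.9, 4.1, 4.8, 6.3, 7.1])

b = (np.mean(x * y) - x.mean() * y.mean()) / np.var(x)  # byx
a = y.mean() - b * x.mean()
y_hat = a + b * x                                       # expected values y′

total = np.sum((y - y.mean()) ** 2)
explained = np.sum((y_hat - y.mean()) ** 2)
unexplained = np.sum((y - y_hat) ** 2)

assert np.isclose(total, explained + unexplained)
print(explained / total)   # r²: share of variation in Y explained by X
```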

Properties of Regression Coefficients


1. A regression coefficient is an absolute measure: byx and bxy are absolute measures.

2. Def :- byx, the regression coefficient of Y on X, is defined as the rate of change in Y per unit increase in X. It is the slope of the regression line of Y on X.
bxy is the rate of change in X per unit increase in Y; 1/bxy is the slope of the line of regression of X on Y.
So a regression coefficient is the rate of change in the value of the dependent variable with respect to a unit increase in the value of the independent variable.
3. Regression coefficients are affected by a change of scale but are independent of a shift of origin.
Proof :- Consider n pairs (xi, yi), i = 1, 2, … n, of the variables (X, Y), a sample of size n drawn from a bivariate population. Define the variances of X and Y and the covariance between X and Y as follows.
Variance(x) = V(x) = ∑f(x − x̄)²/n, Variance(y) = V(y) = ∑f(y − ȳ)²/n, for frequency data with n = ∑f.
Variance(u) = V(u) = ∑f(u − ū)²/n, Variance(v) = V(v) = ∑f(v − v̄)²/n, for frequency data.
Regression coefficient of Y on X: byx = Cov(X,Y)/V(X)
Regression coefficient of X on Y: bxy = Cov(X,Y)/V(Y)
Covariance between X and Y: Cov(x,y) = ∑(x − x̄)(y − ȳ)/n; also Cov(u,v) = ∑(u − ū)(v − v̄)/n
For frequency data define ui = (xi − a)/c, c ≠ 0, i = 1, 2, … n,
so xi = a + c·ui, i.e. x = a + cu ……(1)
Then fi·xi = a·fi + c·fi·ui, so ∑fi·xi = a∑fi + c∑fi·ui, hence ∑fi·xi/∑fi = a∑fi/∑fi + c∑fi·ui/∑fi
x̄ = a + cū ……(2)
Similarly, for frequency data define vi = (yi − b)/d, d ≠ 0, i = 1, 2, … n,
so yi = b + d·vi, i.e. y = b + dv ……(1′)
Then fi·yi = b·fi + d·fi·vi, so ∑fi·yi = b∑fi + d∑fi·vi, hence ∑fi·yi/∑fi = b∑fi/∑fi + d∑fi·vi/∑fi
ȳ = b + dv̄ ……(2′)

So (x − x̄) = a + cu − (a + cū) => (x − x̄) = c(u − ū)
f(x − x̄)² = c²·f(u − ū)² => ∑f(x − x̄)²/n = c²·∑f(u − ū)²/n, n = ∑f
hence V(x) = c²·V(u) ……(1)
(y − ȳ) = b + dv − (b + dv̄) => (y − ȳ) = d(v − v̄)
f(y − ȳ)² = d²·f(v − v̄)² => ∑f(y − ȳ)²/n = d²·∑f(v − v̄)²/n
V(y) = d²·V(v) ……(2)
Now (x − x̄) = c(u − ū) and (y − ȳ) = d(v − v̄), so (x − x̄)(y − ȳ) = cd(u − ū)(v − v̄)
Cov(x,y) = ∑(x − x̄)(y − ȳ)/n = cd·∑(u − ū)(v − v̄)/n = cd·Cov(u,v)
byx = Cov(x,y)/V(x) = cd·Cov(u,v)/(c²·V(u)) = (d/c)·Cov(u,v)/V(u) = (d/c)·bvu
bxy = Cov(x,y)/V(y) = cd·Cov(u,v)/(d²·V(v)) = (c/d)·Cov(u,v)/V(v) = (c/d)·buv
So regression coefficients are not affected by a shift of origin but are affected by (depend on) a change of scale.
Hence the proof.
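Property 3 can be illustrated numerically. A minimal sketch, assuming made-up data and arbitrary constants a, b, c, d for the change of origin and scale:

```python
# A minimal sketch (made-up data): byx is unchanged by a shift of origin
# but scales as d/c under u = (x − a)/c, v = (y − b)/d.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10, 2, 200)
y = 3 * x + rng.normal(0, 1, 200)

def byx(x, y):
    # Regression coefficient of the second variable on the first: Cov/V.
    return (np.mean(x * y) - x.mean() * y.mean()) / np.var(x)

a, b, c, d = 5.0, 7.0, 2.0, 4.0
u, v = (x - a) / c, (y - b) / d

# Shift of origin only (c = d = 1): coefficient unchanged.
assert np.isclose(byx(x - a, y - b), byx(x, y))
# Change of scale as well: byx(x, y) = (d/c) · bvu, with bvu = Cov(u,v)/V(u).
assert np.isclose(byx(x, y), (d / c) * byx(u, v))
print(byx(x, y))
```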
4. All three measures have the same sign, positive or negative, since the sign depends on the covariance between X and Y.
• For bivariate data (X, Y), the algebraic sign of the regression coefficients and of the correlation coefficient between the two variables is the same, as it is determined by the sign of the covariance between X and Y. This means that if the regression coefficients (byx and bxy) are positive (negative), then the sign of the correlation coefficient (r) is also positive (negative).
So we can say that all three measures have a positive sign, or all three are negative.
• If the regression coefficients have positive (negative) values while the correlation coefficient has a negative (positive) value, the data is called inconsistent. Similarly, if one of the coefficients is positive (negative), the other should also be positive (negative); if not, the data is again called inconsistent.

5. Relationship between the correlation coefficient and the regression coefficients.
To prove: r² = byx × bxy and byx × bxy ≤ 1.
Proof :- Regression coefficient of Y on X: byx = Cov(X,Y)/σx²
Regression coefficient of X on Y: bxy = Cov(X,Y)/σy²
Karl Pearson's correlation coefficient: r = Cov(X,Y)/(σx·σy)
Consider byx·bxy = [Cov(X,Y)/σx²]·[Cov(X,Y)/σy²] = [Cov(X,Y)/(σx·σy)]² = r², and r² ≤ 1.
Hence byx·bxy ≤ 1.
Hence r = ±√(byx·bxy), hence the proof, where
r = +√(byx·bxy) if both byx and bxy are positive,
r = −√(byx·bxy) if both byx and bxy are negative.


Eg :- In each of the following cases, obtain the value of the correlation coefficient r and comment on it.
Ans :- We have correlation coefficient r = ±√(byx·bxy).
i) byx = 5/3, bxy = 3/5
r = +√((5/3) × (3/5)) = +1, taking the positive root as both byx and bxy are positive.
Comment :- X and Y are perfectly positively correlated, i.e. X and Y show perfect positive correlation.
ii) byx = 2/3, bxy = 0
r = +√((2/3) × 0) = 0
Comment :- The data is inconsistent: if one of the regression coefficients is zero, the other should also be zero, since a zero coefficient arises from Cov(x,y) = 0.
iii) byx = 3/4, bxy = 1/5
r = +√((3/4) × (1/5)) ≈ 0.39, taking the positive root as both byx and bxy are positive.
Comment :- X and Y show weak positive correlation.
iv) byx = 8/7, bxy = 5/4
r = +√((8/7) × (5/4)) ≈ 1.19 > 1, but the range of r is −1 ≤ r ≤ 1.
Hence the data is inconsistent.
v) byx = −3/4, bxy = 1/5. Here the data is inconsistent, as byx is negative but bxy is not: any negative sign comes from Cov(x,y), which would make both coefficients negative.
vi) byx = −3/4, bxy = −1/5
r = −√((3/4) × (1/5)) ≈ −0.39, taking the negative root as both byx and bxy are negative.
Comment :- This shows weak negative correlation between X and Y.
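These consistency rules can be collected into a small helper. The sketch below uses a hypothetical function name, check_r, and simply encodes the rules stated above:

```python
# A minimal sketch: check_r is a hypothetical helper encoding the rules above,
# returning None for inconsistent coefficient pairs and r otherwise.
import math

def check_r(byx, bxy):
    if (byx == 0) != (bxy == 0):
        return None          # only one coefficient zero: inconsistent, since Cov(x,y) = 0 must zero both
    if byx * bxy < 0:
        return None          # opposite signs: inconsistent
    if byx * bxy > 1:
        return None          # would give |r| > 1: inconsistent
    r = math.sqrt(byx * bxy)
    return r if byx >= 0 else -r   # r takes the common sign of byx and bxy

for byx, bxy in [(5/3, 3/5), (2/3, 0), (3/4, 1/5),
                 (8/7, 5/4), (-3/4, 1/5), (-3/4, -1/5)]:
    print(byx, bxy, check_r(byx, bxy))
```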


6. To prove that bxy = r·(σx/σy) and byx = r·(σy/σx).
Define the sd of x as σx = +√(∑f(x − x̄)²/n), n = ∑f,
the sd of y as σy = +√(∑f(y − ȳ)²/n),
and the correlation coefficient r = Cov(X,Y)/(σx·σy).
Proof :- Consider bxy = Cov(X,Y)/σy²
= [Cov(X,Y)/σy²]·(σx/σx) = [Cov(X,Y)/(σx·σy)]·(σx/σy)
bxy = r·(σx/σy)
Consider byx = Cov(X,Y)/σx²
= [Cov(X,Y)/σx²]·(σy/σy) = [Cov(X,Y)/(σx·σy)]·(σy/σx)
byx = r·(σy/σx), hence the proof.
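A quick numerical check of property 6 on made-up data:

```python
# A minimal sketch (made-up data): verify byx = r·σy/σx and bxy = r·σx/σy.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 3, 500)
y = 0.5 * x + rng.normal(0, 2, 500)

cov = np.mean(x * y) - x.mean() * y.mean()
sx, sy = x.std(), y.std()
r = cov / (sx * sy)

byx, bxy = cov / np.var(x), cov / np.var(y)
assert np.isclose(byx, r * sy / sx)
assert np.isclose(bxy, r * sx / sy)
print(r, byx, bxy)
```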
7. If one of the regression coefficients is greater than unity, the other must be less than unity: if byx > 1 then bxy < 1, and if bxy > 1 then byx < 1.
But the converse need not be true: if byx < 1 then bxy need not be > 1, and if bxy < 1 then byx need not be > 1.
Proof :- We have byx = Cov(X,Y)/σx² = regression coefficient of Y on X,
bxy = Cov(X,Y)/σy² = regression coefficient of X on Y,
and coefficient of correlation r = Cov(X,Y)/(σx·σy).

a) To prove that if bxy > 1 then byx < 1.
Given that bxy > 1, we get 1/bxy < 1 ……[1]
We know r² ≤ 1 and r² = byx·bxy
=> byx·bxy ≤ 1
=> byx ≤ 1/bxy …… dividing both sides by bxy ……[2]
=> byx ≤ 1/bxy < 1 …… from [1]
=> byx < 1 if bxy > 1.
b) To prove that if byx > 1 then bxy < 1.
If byx > 1, then 1/byx < 1 ……[1′]
Now r² = byx·bxy, hence byx·bxy ≤ 1 as r² ≤ 1.
bxy ≤ 1/byx …… dividing both sides by byx ……[2′]
Hence we get bxy ≤ 1/byx < 1, using [1′].
So if byx > 1 then bxy < 1.


8. The arithmetic mean of the regression coefficients is always greater than or equal to the correlation coefficient, i.e. ½(bxy + byx) ≥ r.
Proof: Consider (σx − σy)² ≥ 0
i.e. σx² − 2σx·σy + σy² ≥ 0
i.e. σx² + σy² ≥ 2σx·σy
=> (σx² + σy²)/(2σx·σy) ≥ 1
Multiplying both sides by r (for r ≥ 0 the direction of the inequality is preserved) we get
r·(σx² + σy²)/(2σx·σy) ≥ r
½[r·(σy/σx) + r·(σx/σy)] ≥ r
½[byx + bxy] ≥ r
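The inequality can also be checked numerically on positively correlated made-up data:

```python
# A minimal sketch (made-up data, r > 0): (byx + bxy)/2 ≥ r.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 300)
y = 2 * x + rng.normal(0, 1.5, 300)

cov = np.mean(x * y) - x.mean() * y.mean()
r = cov / (x.std() * y.std())
byx, bxy = cov / np.var(x), cov / np.var(y)

print((byx + bxy) / 2, r)   # arithmetic mean of the coefficients vs r
assert (byx + bxy) / 2 >= r
```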

Properties of Regression Lines :


1. If r = +1 or r = −1, there is perfect correlation between the two variables X and Y.
a) bxy = σx/σy and byx = σy/σx if r = +1.
So the first regression line x − x̄ = bxy(y − ȳ) becomes
(x − x̄) = (σx/σy)(y − ȳ)
σy(x − x̄) = σx(y − ȳ) ……(1)
And the other regression line (y − ȳ) = byx(x − x̄) becomes
(y − ȳ) = (σy/σx)(x − x̄)
σx(y − ȳ) = σy(x − x̄) ……(2)
b) bxy = −σx/σy and byx = −σy/σx if r = −1.
So the first regression line x − x̄ = bxy(y − ȳ) becomes
(x − x̄) = (−σx/σy)(y − ȳ)
σy(x − x̄) = −σx(y − ȳ) ……(1′)
And the other regression line (y − ȳ) = byx(x − x̄) becomes
(y − ȳ) = −(σy/σx)(x − x̄)
−σx(y − ȳ) = σy(x − x̄) ……(2′)
That means the lines are identical (the same line) when r = ±1.
2. If r = 0 then the regression equations become y = ȳ and x = x̄. So the angle between the lines is 90°.

The point (x̄, ȳ) satisfies both regression equations, so both regression lines pass through it; it is the point of intersection of the two lines.

3. The smaller the angle between the regression lines, the higher the value of the correlation (r) between the variables X and Y.
When r = ±1 the regression lines coincide, so for the maximum magnitude of r the angle between the lines is zero (they are the same line).
But if r = 0 (the minimum magnitude of r), the regression lines are perpendicular to each other, i.e. the angle between the lines is 90°.
4. To find the acute angle between the lines of regression:
Consider the slopes of the two regression lines. Let m1 and m2 denote the slopes; then the acute angle θ between them is found as θ = tan⁻¹ |(m1 − m2)/(1 + m1·m2)|.
If m1 = byx and m2 = 1/bxy, then
θ = tan⁻¹ |(byx − 1/bxy)/(1 + byx/bxy)| = tan⁻¹ |(byx·bxy − 1)/(bxy + byx)|
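A short sketch (made-up coefficients) evaluating this formula and confirming it against the two-slope form:

```python
# A minimal sketch: acute angle between the two regression lines,
# computed from made-up coefficients with byx·bxy ≤ 1.
import math

byx, bxy = 0.8, 0.5          # regression coefficients (so r² = 0.4)
m1, m2 = byx, 1 / bxy        # slopes of the two lines

theta_slopes = math.atan(abs((m1 - m2) / (1 + m1 * m2)))
theta_coeffs = math.atan(abs((byx * bxy - 1) / (bxy + byx)))
assert math.isclose(theta_slopes, theta_coeffs)
print(math.degrees(theta_slopes))   # acute angle in degrees
```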

**********
