Correlation Regression Theory
Introduction .
“If it is proved true that in a large number of instances two variables tend always to fluctuate in the same or in opposite
directions, we consider that the fact is established and that a relationship exists. This relationship is called correlation.”
(1) Univariate distribution : These are the distributions in which there is only one variable such as the heights of the
students of a class.
(2) Bivariate distribution : A distribution involving two discrete variables is called a bivariate distribution. For example,
the heights and the weights of the students of a class in a school.
(3) Bivariate frequency distribution : Let x and y be two variables. Suppose x takes the values $x_1, x_2, \ldots, x_n$ and
y takes the values $y_1, y_2, \ldots, y_n$; then we record our observations in the form of ordered pairs $(x_i, y_j)$, where
$1 \le i \le n$, $1 \le j \le n$. If a certain pair occurs $f_{ij}$ times, we say that its frequency is $f_{ij}$.
The function which assigns the frequencies $f_{ij}$ to the pairs $(x_i, y_j)$ is known as a bivariate frequency distribution.
Example: 1 The following table shows the frequency distribution of age (x) and weight (y) of a group of 60 individuals.
y (weight) \ x (age, yrs)   40 – 45   45 – 50   50 – 55   55 – 60   60 – 65
45 – 50                        2         5         8         3         0
50 – 55                        1         3         6        10         2
55 – 60                        0         2         5        12         1
Then find the marginal frequency distributions of x and y.
Solution: Marginal frequency distribution of x (column totals)
x   40 – 45   45 – 50   50 – 55   55 – 60   60 – 65
f      3        10        19        25         3
Marginal frequency distribution of y (row totals)
y   45 – 50   50 – 55   55 – 60
f     18        22        20
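The marginal totals can be checked mechanically. The following is a minimal Python sketch, assuming the joint table of Example 1 is stored row by row (rows are the y classes, columns are the x classes); the helper name `marginals` is illustrative and not part of the notes.

```python
# Joint frequency table of Example 1: one row per y class, one column per x class.
x_classes = ["40-45", "45-50", "50-55", "55-60", "60-65"]
y_classes = ["45-50", "50-55", "55-60"]
freq = [
    [2, 5, 8, 3, 0],
    [1, 3, 6, 10, 2],
    [0, 2, 5, 12, 1],
]


def marginals(table):
    """Return (column sums, row sums) of a rectangular frequency table."""
    col_sums = [sum(col) for col in zip(*table)]
    row_sums = [sum(row) for row in table]
    return col_sums, row_sums


fx, fy = marginals(freq)
print(dict(zip(x_classes, fx)))  # marginal distribution of x
print(dict(zip(y_classes, fy)))  # marginal distribution of y
```

Both marginal distributions must add up to the total number of individuals, which is a quick consistency check on the table.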
Covariance .
Let $(x_i, y_i)$; $i = 1, 2, \ldots, n$ be a bivariate distribution, where $x_1, x_2, \ldots, x_n$ are the values of variable x and
$y_1, y_2, \ldots, y_n$ those of y. Then the covariance Cov (x, y) between x and y is given by
$\mathrm{Cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ or $\mathrm{Cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i y_i - \bar{x}\,\bar{y})$,
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ are the means of x and y respectively.
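For instance, with $n = 5$, $\sum x_i y_i = 110$, $\sum x_i = 15$ and $\sum y_i = 40$:
$\mathrm{Cov}(x, y) = \frac{1}{5}(110) - \frac{15}{5} \cdot \frac{40}{5} = 22 - 3 \times 8 = -2$.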
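The two forms of the covariance formula give the same number, as the sketch below illustrates. It is written in Python under stated assumptions: the data are hypothetical, chosen only so that n = 5, Σx = 15, Σy = 40 and Σxy = 110, and the function names are illustrative.

```python
def covariance_definition(xs, ys):
    """Cov(x, y) as the mean product of deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


def covariance_shortcut(xs, ys):
    """Cov(x, y) as (1/n) * sum(x*y) minus the product of the means."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - (sum(xs) / n) * (sum(ys) / n)


# Hypothetical data (not from the notes): sum x = 15, sum y = 40, sum xy = 110.
xs = [1, 2, 3, 4, 5]
ys = [10, 9, 8, 7, 6]

cov_a = covariance_definition(xs, ys)
cov_b = covariance_shortcut(xs, ys)
print(cov_a, cov_b)
```

The shortcut form needs only the three running sums, which is why it is preferred for hand calculation.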
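(3) Modified formula : $r = \dfrac{\sum dx\,dy - \dfrac{\sum dx \cdot \sum dy}{n}}{\sqrt{\left\{\sum dx^2 - \dfrac{(\sum dx)^2}{n}\right\}\left\{\sum dy^2 - \dfrac{(\sum dy)^2}{n}\right\}}}$, where $dx = x - \bar{x}$; $dy = y - \bar{y}$.
Also $r_{xy} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} = \dfrac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{var}(x) \cdot \mathrm{var}(y)}}$.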
Example: 3 For the data
x: 4 7 8 3 4
y: 5 8 6 3 5
Karl Pearson's coefficient is
(a) $\frac{63}{\sqrt{94 \times 66}}$ (b) 63 (c) $\frac{63}{\sqrt{94}}$ (d) $\frac{63}{\sqrt{66}}$
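Solution: (a) Take A = 5, B = 5.
x_i   y_i   u_i = x_i – 5   v_i = y_i – 5   u_i^2   v_i^2   u_i v_i
 4     5        –1               0            1       0        0
 7     8         2               3            4       9        6
 8     6         3               1            9       1        3
 3     3        –2              –2            4       4        4
 4     5        –1               0            1       0        0
Total       Σu_i = 1         Σv_i = 2      Σu_i^2 = 19   Σv_i^2 = 14   Σu_i v_i = 13
$r(x, y) = \dfrac{\sum u_i v_i - \frac{1}{n}\sum u_i \sum v_i}{\sqrt{\sum u_i^2 - \frac{1}{n}\left(\sum u_i\right)^2}\,\sqrt{\sum v_i^2 - \frac{1}{n}\left(\sum v_i\right)^2}} = \dfrac{13 - \frac{1 \times 2}{5}}{\sqrt{19 - \frac{1^2}{5}}\,\sqrt{14 - \frac{2^2}{5}}} = \dfrac{63}{\sqrt{94 \times 66}}$.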
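The arithmetic of Example 3 can be reproduced with a short Python sketch of the assumed-mean formula. The function name `pearson_r_assumed_mean` and its parameters `a`, `b` (the assumed means A and B) are illustrative choices, not notation from the notes.

```python
import math


def pearson_r_assumed_mean(xs, ys, a=0.0, b=0.0):
    """Pearson's r from deviations u = x - a, v = y - b about assumed means."""
    n = len(xs)
    us = [x - a for x in xs]
    vs = [y - b for y in ys]
    sum_u, sum_v = sum(us), sum(vs)
    sum_uu = sum(u * u for u in us)
    sum_vv = sum(v * v for v in vs)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    numerator = sum_uv - sum_u * sum_v / n
    denominator = math.sqrt(sum_uu - sum_u ** 2 / n) * math.sqrt(sum_vv - sum_v ** 2 / n)
    return numerator / denominator


# Data of Example 3 with assumed means A = 5, B = 5.
xs = [4, 7, 8, 3, 4]
ys = [5, 8, 6, 3, 5]
r = pearson_r_assumed_mean(xs, ys, a=5, b=5)
r_origin = pearson_r_assumed_mean(xs, ys)  # a = b = 0 gives the same value
print(r, 63 / math.sqrt(94 * 66))
```

The result does not depend on the choice of A and B, so convenient round values near the data can be used.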
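Solution: (a) $\bar{x} = \frac{\sum x}{n} \Rightarrow 5 = \frac{50}{n} \Rightarrow n = 10$.
$\mathrm{Cov}(x, y) = \frac{\sum xy}{n} - \bar{x} \cdot \bar{y} = \frac{350}{10} - (5)(6) = 5$.
$r(x, y) = \frac{\mathrm{Cov}(x, y)}{\sigma_x \cdot \sigma_y} = \frac{5}{\sqrt{4} \cdot \sqrt{9}} = \frac{5}{6}$.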
Example: 8 A, B, C, D are non-zero constants, such that
(i) both A and C are negative. (ii) A and C are of opposite sign.
If the coefficient of correlation between X and Y is r, then that between $AX + B$ and $CY + D$ is
(a) $r$ (b) $-r$ (c) $\frac{A}{C}\,r$ (d) $-\frac{A}{C}\,r$
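Solution : (a, b) (i) Both A and C are negative.
Now $\mathrm{Cov}(AX + B, CY + D) = AC\,\mathrm{Cov}(X, Y)$,
$\sigma_{AX+B} = |A|\,\sigma_x$ and $\sigma_{CY+D} = |C|\,\sigma_y$.
Hence $\rho(AX + B, CY + D) = \dfrac{AC\,\mathrm{Cov}(X, Y)}{(|A|\,\sigma_x)(|C|\,\sigma_y)} = \dfrac{AC}{|AC|}\,\rho(X, Y) = \rho(X, Y) = r$, $(\because AC > 0)$.
(ii) $\rho(AX + B, CY + D) = \dfrac{AC}{|AC|}\,\rho(X, Y)$, $(\because AC < 0)$
$= \dfrac{AC}{-AC}\,\rho(X, Y) = -\rho(X, Y) = -r$.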
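The sign rule of Example 8 can be checked numerically. The sketch below is in Python; the data are those of Example 3 and the constants A, B, C, D are arbitrary illustrative values, not taken from the notes.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [4, 7, 8, 3, 4]
ys = [5, 8, 6, 3, 5]
r = pearson_r(xs, ys)


def transformed_r(A, B, C, D):
    """Correlation between A*X + B and C*Y + D."""
    return pearson_r([A * x + B for x in xs], [C * y + D for y in ys])


r_same_sign = transformed_r(-2, 1, -3, 4)      # A and C both negative: AC > 0
r_opposite_sign = transformed_r(-2, 1, 3, 4)   # A and C of opposite sign: AC < 0
print(r, r_same_sign, r_opposite_sign)
```

Only the sign of the product AC matters; the magnitudes of A and C and the shifts B and D leave the coefficient unchanged.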
Rank Correlation .
Let us suppose that a group of n individuals is arranged in order of merit or proficiency in possession of two characteristics
A and B.
The ranks in the two characteristics will, in general, be different.
For example, if we consider the relation between intelligence and beauty, it is not necessary that a beautiful individual is
intelligent also.
Rank Correlation : $\rho = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$, which is Spearman's formula for the rank correlation coefficient,
where $\sum d^2$ = sum of the squares of the differences of the two ranks and n is the number of pairs of observations.
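$r(x, y) = \dfrac{\sum u_i v_i - \frac{1}{n}\sum u_i \cdot \sum v_i}{\sqrt{\sum u_i^2 - \frac{1}{n}\left(\sum u_i\right)^2}\,\sqrt{\sum v_i^2 - \frac{1}{n}\left(\sum v_i\right)^2}}$, where $u_i = x_i - A$, $v_i = y_i - B$.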
Example: 9 The two numbers within each bracket denote the ranks of 10 students of a class in two subjects:
(1, 10), (2, 9), (3, 8), (4, 7), (5, 6), (6, 5), (7, 4), (8, 3), (9, 2), (10, 1). The rank correlation coefficient is
(a) 0 (b) – 1 (c) 1 (d) 0.5
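When each pair of ranks sums to $n + 1$ (as in the data above), $d_i = x_i - y_i = 2x_i - (n + 1)$, so
$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} \left[4x_i^2 + (n + 1)^2 - 4x_i(n + 1)\right]$
$= 4\sum_{i=1}^{n} x_i^2 + n(n + 1)^2 - 4(n + 1)\sum_{i=1}^{n} x_i$
$= 4 \cdot \dfrac{n(n + 1)(2n + 1)}{6} + n(n + 1)^2 - 4(n + 1) \cdot \dfrac{n(n + 1)}{2}$
$\therefore \sum_{i=1}^{n} d_i^2 = \dfrac{n(n^2 - 1)}{3}$.
$\therefore r = 1 - \dfrac{6\sum d_i^2}{n(n^2 - 1)}$, i.e., $r = 1 - \dfrac{6\,n(n^2 - 1)}{3\,n(n^2 - 1)} = -1$.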
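Spearman's formula is easy to evaluate directly. The sketch below, in Python, applies it to the ranks of Example 9 and also compares Σd² with the closed form n(n² − 1)/3 for completely reversed ranks; the function name `spearman_rho` is illustrative, and the formula as written assumes there are no tied ranks.

```python
def spearman_rho(rank_pairs):
    """Spearman's rank correlation 1 - 6*sum(d^2) / (n*(n^2 - 1)), no ties assumed."""
    n = len(rank_pairs)
    sum_d2 = sum((a - b) ** 2 for a, b in rank_pairs)
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Ranks of Example 9: the second ranking is the exact reverse of the first.
pairs = [(i, 11 - i) for i in range(1, 11)]
n = len(pairs)
sum_d2 = sum((a - b) ** 2 for a, b in pairs)
rho = spearman_rho(pairs)
print(sum_d2, n * (n * n - 1) // 3, rho)
```

Identical rankings give Σd² = 0 and hence a coefficient of +1, the opposite extreme.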
Regression
Linear Regression .
If a relation between two variates x and y exists, then the dots of the scatter diagram will more or less be concentrated
around a curve which is called the curve of regression. If this curve be a straight line, then it is known as line of regression and the
regression is called linear regression.
Line of regression: The line of regression is the straight line which in the least square sense gives the best fit to the given
frequency.
Equations of lines of Regression .
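(1) Regression line of y on x : If the value of x is known, then the value of y can be found from
$y - \bar{y} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_x^2}(x - \bar{x})$ or $y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}(x - \bar{x})$
(2) Regression line of x on y : It estimates x for the given value of y as
$x - \bar{x} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_y^2}(y - \bar{y})$ or $x - \bar{x} = r\,\dfrac{\sigma_x}{\sigma_y}(y - \bar{y})$
(3) Regression coefficient : (i) Regression coefficient of y on x is $b_{yx} = \dfrac{r\,\sigma_y}{\sigma_x} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_x^2}$
(ii) Regression coefficient of x on y is $b_{xy} = \dfrac{r\,\sigma_x}{\sigma_y} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_y^2}$.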
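As an illustration of the formulas above, the following Python sketch computes both regression coefficients and evaluates the two regression lines. The data are borrowed from Example 3 purely for illustration, and the function names are assumptions of this sketch rather than notation from the notes.

```python
def regression_summary(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy) using Cov/var with divisor n."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    return x_bar, y_bar, cov / var_x, cov / var_y


xs = [4, 7, 8, 3, 4]
ys = [5, 8, 6, 3, 5]
x_bar, y_bar, b_yx, b_xy = regression_summary(xs, ys)


def estimate_y(x):
    """Line of y on x: y - y_bar = b_yx * (x - x_bar)."""
    return y_bar + b_yx * (x - x_bar)


def estimate_x(y):
    """Line of x on y: x - x_bar = b_xy * (y - y_bar)."""
    return x_bar + b_xy * (y - y_bar)


print(b_yx, b_xy, b_yx * b_xy)  # the product equals r squared
print(estimate_y(6), estimate_x(6))
```

The two lines are different in general: one minimises errors in y, the other errors in x, and they agree only at the point of means.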
Angle between Two lines of Regression .
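The equations of the two lines of regression are $y - \bar{y} = b_{yx}(x - \bar{x})$ and $x - \bar{x} = b_{xy}(y - \bar{y})$.
We have, $m_1$ = slope of the line of regression of y on x = $b_{yx} = r \cdot \dfrac{\sigma_y}{\sigma_x}$,
$m_2$ = slope of the line of regression of x on y = $\dfrac{1}{b_{xy}} = \dfrac{\sigma_y}{r \cdot \sigma_x}$.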
PRATAP BHAWAN, BEHIND LEELA CINEMA, HAZRATGANJ, LUCKNOW.
PH.: 9953737836, 9838162263. e-mail. id: [email protected]. www.inpsclasses.com
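$\therefore \tan\theta = \pm\dfrac{m_2 - m_1}{1 + m_1 m_2} = \pm\dfrac{\dfrac{\sigma_y}{r\,\sigma_x} - \dfrac{r\,\sigma_y}{\sigma_x}}{1 + \dfrac{r\,\sigma_y}{\sigma_x} \cdot \dfrac{\sigma_y}{r\,\sigma_x}} = \pm\dfrac{(\sigma_y - r^2\sigma_y)\,\sigma_x}{r\,\sigma_x^2 + r\,\sigma_y^2} = \pm\dfrac{(1 - r^2)\,\sigma_x\sigma_y}{r(\sigma_x^2 + \sigma_y^2)}$.
Here the positive sign gives the acute angle θ when $r > 0$, because $r^2 \le 1$ and $\sigma_x$, $\sigma_y$ are positive (for $r < 0$ replace r by $|r|$).
$\therefore \tan\theta = \dfrac{1 - r^2}{r} \cdot \dfrac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$ .....(i)
Note : If $r = 0$, from (i) we conclude $\tan\theta = \infty$ or $\theta = \pi/2$, i.e., the two regression lines are at right angles.
If $r = \pm 1$, $\tan\theta = 0$, i.e., $\theta = 0$ (since θ is acute), i.e., the two regression lines coincide.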
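Formula (i) can be cross-checked against the slope formula it was derived from. The Python sketch below does this for hypothetical values of r, σx and σy; the function names are illustrative, and |r| is used so that the acute angle is also returned for negative r.

```python
import math


def tan_theta_formula(r, sigma_x, sigma_y):
    """tan(theta) from formula (i), with |r| so the angle is acute for any sign of r."""
    if r == 0:
        return math.inf  # the lines are perpendicular
    return (1 - r * r) / abs(r) * sigma_x * sigma_y / (sigma_x ** 2 + sigma_y ** 2)


def tan_theta_slopes(r, sigma_x, sigma_y):
    """tan(theta) from the slopes m1 = b_yx and m2 = 1/b_xy (requires r != 0)."""
    m1 = r * sigma_y / sigma_x
    m2 = sigma_y / (r * sigma_x)
    return abs((m2 - m1) / (1 + m1 * m2))


# Hypothetical values chosen only for illustration.
r, sigma_x, sigma_y = 0.5, 2.0, 3.0
t_formula = tan_theta_formula(r, sigma_x, sigma_y)
t_slopes = tan_theta_slopes(r, sigma_x, sigma_y)
print(t_formula, t_slopes)
```

As r approaches ±1 the value tends to 0 and the two lines close up on each other.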
Important points about Regression coefficients bxy and byx .
(1) $r = \sqrt{b_{yx} \cdot b_{xy}}$, i.e. the coefficient of correlation is the geometric mean of the coefficients of regression (taken with their common sign, see (9)).
(2) If $b_{yx} > 1$, then $b_{xy} < 1$, i.e. if one of the regression coefficients is greater than unity, the other will be less than unity.
(3) If the correlation between the variables is not perfect, then the regression lines intersect at $(\bar{x}, \bar{y})$.
(4) $b_{yx}$ is called the slope of the regression line of y on x and $\dfrac{1}{b_{xy}}$ is called the slope of the regression line of x on y.
(5) $b_{yx} + b_{xy} \ge 2\sqrt{b_{yx}\,b_{xy}}$ or $b_{yx} + b_{xy} \ge 2r$, i.e. the arithmetic mean of the regression coefficients is not less than the
correlation coefficient (for positive coefficients).
(6) Regression coefficients are independent of change of origin but not of scale.
(7) The product of the gradients of the lines of regression is given by $\dfrac{\sigma_y^2}{\sigma_x^2}$.
(8) If both the lines of regression coincide, then the correlation will be perfect linear.
(9) If both $b_{yx}$ and $b_{xy}$ are positive, then r will be positive, and if both $b_{yx}$ and $b_{xy}$ are negative, then r will be negative.
Important Tips
If $r = 0$, then $\tan\theta$ is not defined, i.e. $\theta = \dfrac{\pi}{2}$. Thus the regression lines are perpendicular.
If $r = +1$ or $-1$, then $\tan\theta = 0$, i.e. $\theta = 0$. Thus the regression lines are coincident.
If the regression lines are $y = ax + b$ and $x = cy + d$, then $\bar{x} = \dfrac{bc + d}{1 - ac}$ and $\bar{y} = \dfrac{ad + b}{1 - ac}$.
If $b_{yx}$, $b_{xy}$ and $r \ge 0$, then $\dfrac{1}{2}(b_{xy} + b_{yx}) \ge r$, and if $b_{xy}$, $b_{yx}$ and $r \le 0$, then $\dfrac{1}{2}(b_{xy} + b_{yx}) \le r$.
Correlation measures the relationship between variables while regression measures only the cause and effect of relationship between the variables.
If the line of regression of y on x makes an angle α with the +ive direction of the X-axis, then $\tan\alpha = b_{yx}$.
If the line of regression of x on y makes an angle β with the +ive direction of the X-axis, then $\cot\beta = b_{xy}$.
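The tip on recovering the means from two regression lines can be tried out with exact fractions. This is a minimal Python sketch; the coefficients a, b, c, d below are hypothetical values chosen for illustration, and the function name is an assumption of the sketch.

```python
from fractions import Fraction


def means_from_regression_lines(a, b, c, d):
    """Intersection of y = a*x + b (y on x) and x = c*y + d (x on y)."""
    denom = 1 - a * c
    if denom == 0:
        raise ValueError("a*c = 1: the lines are parallel or coincident")
    x_bar = (b * c + d) / denom
    y_bar = (a * d + b) / denom
    return x_bar, y_bar


# Hypothetical regression lines: y = x/2 + 1 and x = y/2 + 2.
a, b = Fraction(1, 2), Fraction(1)
c, d = Fraction(1, 2), Fraction(2)
x_bar, y_bar = means_from_regression_lines(a, b, c, d)
print(x_bar, y_bar)  # prints 10/3 8/3
```

Substituting the result back into both line equations confirms that the point of means lies on each line.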
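Example : 11 The two lines of regression are $2x - 7y + 6 = 0$ and $7x - 2y + 1 = 0$. The correlation coefficient between x and y is
(a) – 2/7 (b) 2/7 (c) 4/49 (d) None of these
Solution: (b) The two lines of regression are $2x - 7y + 6 = 0$ .....(i) and $7x - 2y + 1 = 0$ ......(ii)
If (i) is the regression equation of y on x, then (ii) is the regression equation of x on y.
We write these as $y = \frac{2}{7}x + \frac{6}{7}$ and $x = \frac{2}{7}y - \frac{1}{7}$.
$\therefore b_{yx} = \frac{2}{7}$, $b_{xy} = \frac{2}{7}$, so $r = \sqrt{b_{yx} \cdot b_{xy}} = \frac{2}{7}$ (positive, since both regression coefficients are positive).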
$b_{yx} = \dfrac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} = \dfrac{-35}{10.6} = -3.3$.
Example: 14 If two lines of regression are $8x - 10y + 66 = 0$ and $40x - 18y = 214$, then $(\bar{x}, \bar{y})$ is
(a) (17, 13) (b) (13, 17) (c) (– 17, 13) (d) (– 13, – 17)
Solution: (b) Since the lines of regression pass through $(\bar{x}, \bar{y})$, the means satisfy $8\bar{x} - 10\bar{y} + 66 = 0$ and $40\bar{x} - 18\bar{y} = 214$.
On solving the above equations, we get the required answer $\bar{x} = 13$, $\bar{y} = 17$.
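The "on solving" step of Example 14 can be spelled out with Cramer's rule. The Python sketch below uses exact fractions; the helper `solve_2x2` is an illustrative name, and the first equation is rewritten as 8x − 10y = −66 before solving.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the two lines are parallel or coincident")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


# 8x - 10y = -66 and 40x - 18y = 214 (the regression lines of Example 14).
x_bar, y_bar = solve_2x2(8, -10, -66, 40, -18, 214)
print(x_bar, y_bar)  # prints 13 17
```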
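Example: 15 The regression coefficient of y on x is $\frac{2}{3}$ and of x on y is $\frac{4}{3}$. If the acute angle between the regression lines is θ, then $\tan\theta$ =
(a) $\frac{1}{18}$ (b) $\frac{1}{9}$ (c) $\frac{2}{9}$ (d) None of these
Solution: (a) $b_{yx} = \frac{2}{3}$, $b_{xy} = \frac{4}{3}$. Therefore, $\tan\theta = \left|\dfrac{\dfrac{1}{b_{xy}} - b_{yx}}{1 + \dfrac{b_{yx}}{b_{xy}}}\right| = \left|\dfrac{\dfrac{3}{4} - \dfrac{2}{3}}{1 + \dfrac{2/3}{4/3}}\right| = \dfrac{1}{18}$.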
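The fraction arithmetic of Example 15 is easy to get wrong by hand, so here is a small Python sketch using exact fractions. The function name `tan_acute_angle` is illustrative.

```python
from fractions import Fraction


def tan_acute_angle(b_yx, b_xy):
    """tan of the acute angle between the regression lines, from their slopes."""
    m1 = b_yx        # slope of the line of y on x
    m2 = 1 / b_xy    # slope of the line of x on y
    return abs((m2 - m1) / (1 + m1 * m2))


tan_theta = tan_acute_angle(Fraction(2, 3), Fraction(4, 3))
print(tan_theta)  # prints 1/18
```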
Example: 16 If the lines of regression of y on x and x on y make angles $30^\circ$ and $60^\circ$ respectively with the positive direction of the X-axis, then the
correlation coefficient between x and y is
(a) $\frac{1}{\sqrt{2}}$ (b) $\frac{1}{2}$
(c) $\frac{1}{\sqrt{3}}$ (d) $\frac{1}{3}$
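Solution: (c) Slope of regression line of y on x = $b_{yx} = \tan 30^\circ = \dfrac{1}{\sqrt{3}}$.
Slope of regression line of x on y = $\dfrac{1}{b_{xy}} = \tan 60^\circ = \sqrt{3}$
$\Rightarrow b_{xy} = \dfrac{1}{\sqrt{3}}$. Hence, $r = \sqrt{b_{xy} \cdot b_{yx}} = \sqrt{\dfrac{1}{\sqrt{3}} \cdot \dfrac{1}{\sqrt{3}}} = \dfrac{1}{\sqrt{3}}$.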
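The same calculation works for any pair of inclinations. The Python sketch below takes the angles in degrees and returns r with the common sign of the two regression coefficients; the function name and the error handling are assumptions of this sketch.

```python
import math


def r_from_inclinations(alpha_deg, beta_deg):
    """r from the angles the lines of y on x (alpha) and x on y (beta) make with the X-axis."""
    b_yx = math.tan(math.radians(alpha_deg))      # tan(alpha) = b_yx
    b_xy = 1 / math.tan(math.radians(beta_deg))   # cot(beta) = b_xy
    if b_yx * b_xy < 0:
        raise ValueError("regression coefficients must have the same sign")
    r = math.sqrt(b_yx * b_xy)
    return r if b_yx >= 0 else -r


r = r_from_inclinations(30, 60)
print(r, 1 / math.sqrt(3))
```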
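Example: 17 If two random variables x and y are connected by the relationship $2x + y = 3$, then $r_{xy}$ =
(a) 1 (b) – 1 (c) – 2 (d) 3
Solution: (b) Since $2x + y = 3$,
$\therefore 2\bar{x} + \bar{y} = 3$; $\therefore y - \bar{y} = -2(x - \bar{x})$. So, $b_{yx} = -2$.
Also $x - \bar{x} = -\dfrac{1}{2}(y - \bar{y})$, $\therefore b_{xy} = -\dfrac{1}{2}$.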
$\therefore r_{xy}^2 = b_{yx} \cdot b_{xy} = (-2)\left(-\dfrac{1}{2}\right) = 1 \Rightarrow r_{xy} = -1$. ($\because$ both $b_{yx}$, $b_{xy}$ are –ive)
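A quick numerical check of Example 17: any data lying exactly on the line 2x + y = 3 should give r = −1. The Python sketch below uses hypothetical x values; only the linear relation comes from the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [0, 1, 2, 3, 4]            # hypothetical x values
ys = [3 - 2 * x for x in xs]    # y determined by 2x + y = 3
r = pearson_r(xs, ys)
print(r)
```

The slope of the relation is negative, so the perfect correlation is −1 rather than +1.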
Solution: (a) $S_y = \sigma_y\sqrt{1 - r^2} = \sigma_y\sqrt{1 - 1} = 0$.