Unit 6 Correlation Coefficient
Structure
6.1 Introduction
Objectives
6.2 Concept and Definition of Correlation
6.3 Types of Correlation
6.4 Scatter Diagram
6.5 Coefficient of Correlation
Assumptions for Correlation Coefficient
6.6 Properties of Correlation Coefficient
6.7 Short-cut Method for the Calculation of Correlation Coefficient
6.8 Correlation Coefficient in Case of Bivariate Frequency Distribution
6.9 Summary
6.10 Solutions / Answers
6.1 INTRODUCTION
In Block 1, you studied measures of central tendency, measures of dispersion,
moments, skewness and kurtosis, all of which analyse variables separately. In
many situations, however, we are interested in analysing two variables together
to study the relationship between them. In this unit, you will learn about
correlation, which studies the linear relationship between two or more
variables. You will learn to calculate the correlation coefficient in different
situations and to use its properties. Before starting this unit, you are advised
to revise the arithmetic mean and variance, which will help in understanding the
concept of correlation.
In Section 6.2, the concept of correlation is discussed with examples that
describe situations in which a correlation study is needed. Section 6.3
describes the types of correlation. Scatter diagrams, which give an idea of
whether correlation exists between two variables, are explained in Section 6.4.
The definition of the correlation coefficient and its calculation procedure are
discussed in Section 6.5. Section 6.6 describes the properties of the
correlation coefficient, with their proofs, whereas the short-cut method for
calculating the correlation coefficient is explained in Section 6.7. Section 6.8
explores the calculation of the correlation coefficient for a bivariate
frequency distribution. Throughout the unit, worked problems illustrate the
computation of the correlation coefficient in different situations and by
different methods.
Objectives
After reading this unit, you would be able to
describe the concept of correlation;
explore the types of correlation;
describe the scatter diagram;
interpret the correlation from a scatter diagram;
define correlation coefficient;
describe the properties of correlation coefficient; and
calculate the correlation coefficient.
Similarly, V(y), the variance of Y, is defined by

$$V(y) = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2$$

where n is the number of paired observations.
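Then, the correlation coefficient "r" may be defined as:

$$r = \mathrm{Corr}(x, y) = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}} \qquad \ldots (2)$$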
Karl Pearson’s correlation coefficient r is also called the product moment
correlation coefficient. The expression in equation (2) can be written in
several equivalent forms. Some of them are
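$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}} \qquad \ldots (3)$$

or

$$r = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}}{\sqrt{\left[\frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2\right]\left[\frac{\sum_{i=1}^{n} y_i^2}{n} - \bar{y}^2\right]}} \qquad \ldots (4)$$

or

$$r = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sqrt{\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right]\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right]}} \qquad \ldots (5)$$

or

$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}} \qquad \ldots (6)$$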
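That forms (3) to (6) are algebraically the same quantity can also be checked numerically. The following is a minimal Python sketch, not part of the original unit; the function names `r_form3` to `r_form6` and the five data pairs are illustrative assumptions chosen only for the check.

```python
import math

def r_form3(x, y):
    """Form (3): sums of deviations from the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def r_form4(x, y):
    """Form (4): mean of products minus product of means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum(a * b for a, b in zip(x, y)) / n - x_bar * y_bar
    var_x = sum(a * a for a in x) / n - x_bar ** 2
    var_y = sum(b * b for b in y) / n - y_bar ** 2
    return num / math.sqrt(var_x * var_y)

def r_form5(x, y):
    """Form (5): form (4) with numerator and denominator multiplied by n."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
    s_xx = sum(a * a for a in x) - n * x_bar ** 2
    s_yy = sum(b * b for b in y) - n * y_bar ** 2
    return num / math.sqrt(s_xx * s_yy)

def r_form6(x, y):
    """Form (6): raw sums only, no means required."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    num = n * sum(a * b for a, b in zip(x, y)) - sum_x * sum_y
    s_xx = n * sum(a * a for a in x) - sum_x ** 2
    s_yy = n * sum(b * b for b in y) - sum_y ** 2
    return num / math.sqrt(s_xx * s_yy)

# Illustrative data (an assumption, not taken from the unit).
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]
values = [f(x, y) for f in (r_form3, r_form4, r_form5, r_form6)]
print([round(v, 6) for v in values])
```

Form (6) is usually the most convenient for hand calculation with whole-number data, because it needs only the raw sums and avoids fractional means.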
$$-1 \le r \le 1$$

If r = +1, the correlation is perfect positive, and if r = −1, the correlation is
perfect negative.
Advertisement expenditure 30 44 45 43 34 44
Profit 56 55 60 64 62 63
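$$r = \mathrm{Corr}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}}$$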
Steps for calculation are as follows:
1. In columns 1 and 2, we take the values of variables X and Y respectively.
2. Find the sums of the variables X and Y, i.e.

$$\sum_{i=1}^{6} x_i = 240 \quad \text{and} \quad \sum_{i=1}^{6} y_i = 360$$

3. Find the means of X and Y, i.e.

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{240}{6} = 40 \quad \text{and} \quad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{\sum_{i=1}^{6} y_i}{6} = \frac{360}{6} = 60$$

4. In column 3, we take the deviation of each observation of X from the mean of
X, i.e. 30 − 40 = −10, 44 − 40 = 4 and so on; the other values of the column
are obtained in the same way.
5. Similarly, column 5 is prepared for variable Y, i.e.
56 − 60 = −4, 55 − 60 = −5
and so on.
6. Column 4 is the square of column 3 and column 6 is the square of column 5.
7. Column 7 is the product of column 3 and column 5.
8. The sum of each column is obtained and written at the end of the column.
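The steps above can be sketched in Python as follows. This is only an illustration of the tabular procedure using the advertisement and profit data of this example; the variable names (`dx`, `dx2`, and so on) are assumptions introduced here, one per table column.

```python
import math

x = [30, 44, 45, 43, 34, 44]   # column 1: advertisement expenditure
y = [56, 55, 60, 64, 62, 63]   # column 2: profit
n = len(x)

x_bar = sum(x) / n             # step 3: mean of X
y_bar = sum(y) / n             # step 3: mean of Y

dx = [xi - x_bar for xi in x]              # column 3: deviations of X
dx2 = [d ** 2 for d in dx]                 # column 4: squared deviations of X
dy = [yi - y_bar for yi in y]              # column 5: deviations of Y
dy2 = [d ** 2 for d in dy]                 # column 6: squared deviations of Y
dxdy = [a * b for a, b in zip(dx, dy)]     # column 7: products of deviations

# Step 8: column totals, then the deviation form of the formula.
r = sum(dxdy) / math.sqrt(sum(dx2) * sum(dy2))
print(x_bar, y_bar, sum(dx2), sum(dy2), sum(dxdy), round(r, 2))
```

The totals of columns 3 and 5 are always zero, which is a useful check on the arithmetic before the products are summed.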
Correlation for Bivariate Data
To find the correlation coefficient by the above formula, we require the values
of $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$, $\sum_{i=1}^{n} (x_i - \bar{x})^2$ and $\sum_{i=1}^{n} (y_i - \bar{y})^2$, which are obtained from the
following table:
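$x_i$    $y_i$    $(x_i - \bar{x})$    $(x_i - \bar{x})^2$    $(y_i - \bar{y})$    $(y_i - \bar{y})^2$    $(x_i - \bar{x})(y_i - \bar{y})$
(1)      (2)      (3)      (4)      (5)      (6)      (7)
30       56       −10      100      −4       16       40
44       55       4        16       −5       25       −20
45       60       5        25       0        0        0
43       64       3        9        4        16       12
34       62       −6       36       2        4        −12
44       63       4        16       3        9        12
Total    240      360      0        202      0        70       32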
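Taking the values of $\sum_{i=1}^{6} (x_i - \bar{x})(y_i - \bar{y})$, $\sum_{i=1}^{6} (x_i - \bar{x})^2$ and $\sum_{i=1}^{6} (y_i - \bar{y})^2$ from
the table and substituting in the above formula, we have the correlation
coefficient

$$r = \mathrm{Corr}(x, y) = \frac{\sum_{i=1}^{6} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{6} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{6} (y_i - \bar{y})^2\right]}}$$

$$r = \mathrm{Corr}(x, y) = \frac{32}{\sqrt{202 \times 70}} = \frac{32}{\sqrt{14140}} = \frac{32}{118.91} = 0.27$$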
Hence, the correlation coefficient between expenditure on advertisement and
profit is 0.27. This indicates that the correlation between expenditure on
advertisement and profit is positive, so as expenditure on advertisement
increases (or decreases), profit tends to increase (or decrease). Since r lies
between 0.25 and 0.5, it can be considered a weak positive correlation.
Example 2: Calculate Karl Pearson’s coefficient of correlation between price
and demand for the following data.
Price 17 18 19 20 22 24 26 28 30
Demand 40 38 35 30 28 25 22 21 20
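To apply formula (6), we require the values of $\sum x_i$, $\sum y_i$, $\sum x_i y_i$, $\sum_{i=1}^{n} x_i^2$ and
$\sum_{i=1}^{n} y_i^2$, which are obtained in the following table: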
$x$    $y$    $x^2$    $y^2$    $xy$
17     40     289      1600     680
18     38     324      1444     684
19     35     361      1225     665
20     30     400      900      600
22     28     484      784      616
24     25     576      625      600
26     22     676      484      572
28     21     784      441      588
30     20     900      400      600
$\sum x = 204$    $\sum y = 259$    $\sum x^2 = 4794$    $\sum y^2 = 7903$    $\sum xy = 5605$
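$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}$$

$$r = \mathrm{Corr}(x, y) = \frac{(9 \times 5605) - (204)(259)}{\sqrt{\left[(9 \times 4794) - (204 \times 204)\right]\left[(9 \times 7903) - (259 \times 259)\right]}}$$

$$r = \mathrm{Corr}(x, y) = \frac{50445 - 52836}{\sqrt{(43146 - 41616)(71127 - 67081)}}$$

$$r = \mathrm{Corr}(x, y) = \frac{-2391}{\sqrt{1530 \times 4046}}$$

$$r = \mathrm{Corr}(x, y) = \frac{-2391}{2488.0474}$$

$$r = \mathrm{Corr}(x, y) = -0.96$$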
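The raw-sums calculation for the price and demand data can be reproduced with a short Python sketch. It is offered only as a check on the arithmetic; the variable names are assumptions introduced for this illustration.

```python
import math

price = [17, 18, 19, 20, 22, 24, 26, 28, 30]    # x
demand = [40, 38, 35, 30, 28, 25, 22, 21, 20]   # y
n = len(price)

# Column totals of the calculation table.
sum_x = sum(price)
sum_y = sum(demand)
sum_x2 = sum(p * p for p in price)
sum_y2 = sum(d * d for d in demand)
sum_xy = sum(p * d for p, d in zip(price, demand))

# Raw-sums form of Karl Pearson's coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(sum_x, sum_y, sum_x2, sum_y2, sum_xy, numerator, round(r, 2))
```

The numerator is negative because demand falls as price rises, so the coefficient itself is negative and close to −1.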
Note: We can use $\sum x$ instead of $\sum_{i=1}^{n} x_i$. The second expression makes explicit
that the sum runs over i = 1 to n.
E5) Find the coefficient of correlation for the following ages of husband and
wife:
Husband’s age 23 27 28 29 30 31
Wife’s age 18 22 23 24 25 26
$\sum d_x = \sum (x - A_x)$ : Sum of deviations from the assumed mean $A_x$ in the X-series,
$\sum d_y = \sum (y - A_y)$ : Sum of deviations from the assumed mean $A_y$ in the Y-series,
$\sum d_x d_y = \sum (x - A_x)(y - A_y)$ : Sum of products of deviations from the assumed
means $A_x$ and $A_y$.
$$r = \frac{240}{\sqrt{344 \times 170}} = \frac{240}{241.8264} = 0.99$$
Thus, there is a very high correlation between x and y.
Now, let us solve an exercise.
E6) Find the correlation coefficient between the values of X and Y from the
following data by the short-cut method:
x 10 20 30 40 50
y 90 85 80 60 45
8. Multiply the respective $d_x$ and $d_y$ for each cell and put the product in
the upper left-hand corner of the cell.
9. Find $f_{xy} d_x d_y$ by multiplying $f_{xy}$ by $d_x d_y$, put the result in the lower
right-hand corner of each cell, and apply the following formula:

$$r = \frac{\sum f_{xy} d_x d_y - \dfrac{\left(\sum f_x d_x\right)\left(\sum f_y d_y\right)}{N}}{\sqrt{\left[\sum f_x d_x^2 - \dfrac{\left(\sum f_x d_x\right)^2}{N}\right]\left[\sum f_y d_y^2 - \dfrac{\left(\sum f_y d_y\right)^2}{N}\right]}}$$

where $N = \sum f_x = \sum f_y$.
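Each cell is written as [$d_x d_y$] $f_{xy}$ [$f_{xy} d_x d_y$]; a dash marks an empty cell.

CI (x)   MV   d_x    d_y = -2      d_y = -1       d_y = 0       d_y = +1       d_y = +2      f_x   f_x d_x   f_x d_x^2   f_xy d_x d_y
15-25    20   -2     [4] 6 [24]    [2] 3 [6]      -             -              -             9     -18       36          30
25-35    30   -1     [2] 3 [6]     [1] 16 [16]    [0] 10 [0]    -              -             29    -29       29          22
35-45    40   0      -             [0] 10 [0]     [0] 15 [0]    [0] 7 [0]      -             32    0         0           0
45-55    50   +1     -             -              [0] 7 [0]     [1] 10 [10]    [2] 4 [8]     21    21        21          18
55-65    60   +2     -             -              -             [2] 4 [8]      [4] 5 [20]    9     18        36          28

f_y                  9             29             32            21             9             N = 100
f_y d_y              -18           -29            0             21             18            $\sum f_y d_y = -8$
f_y d_y^2            36            29             0             21             36            $\sum f_y d_y^2 = 122$
f_xy d_x d_y         30            22             0             18             28            $\sum f_{xy} d_x d_y = 98$

Row totals: $\sum f_x d_x = -8$, $\sum f_x d_x^2 = 122$, $\sum f_{xy} d_x d_y = 98$.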
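$$r = \frac{98 - \dfrac{(-8)(-8)}{100}}{\sqrt{\left[122 - \dfrac{(-8)^2}{100}\right]\left[122 - \dfrac{(-8)^2}{100}\right]}}$$

$$r = \frac{98 - 0.64}{\sqrt{(122 - 0.64)(122 - 0.64)}} = \frac{97.36}{121.36} = 0.802$$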
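The same calculation can be sketched in Python directly from the cell frequencies. This is an illustrative sketch rather than the unit's own procedure: the names `f`, `d_x` and `d_y` are assumptions, and empty cells of the table are assumed to have frequency 0.

```python
import math

d_x = [-2, -1, 0, 1, 2]   # step deviations of the X classes (rows)
d_y = [-2, -1, 0, 1, 2]   # step deviations of the Y classes (columns)

# Cell frequencies f_xy; rows follow d_x, columns follow d_y.
# Empty cells in the table are assumed to be 0.
f = [
    [6, 3, 0, 0, 0],
    [3, 16, 10, 0, 0],
    [0, 10, 15, 7, 0],
    [0, 0, 7, 10, 4],
    [0, 0, 0, 4, 5],
]

f_x = [sum(row) for row in f]          # marginal frequencies of X
f_y = [sum(col) for col in zip(*f)]    # marginal frequencies of Y
N = sum(f_x)

s_fx_dx = sum(fx * d for fx, d in zip(f_x, d_x))
s_fy_dy = sum(fy * d for fy, d in zip(f_y, d_y))
s_fx_dx2 = sum(fx * d * d for fx, d in zip(f_x, d_x))
s_fy_dy2 = sum(fy * d * d for fy, d in zip(f_y, d_y))
s_f_dxdy = sum(
    f[i][j] * d_x[i] * d_y[j]
    for i in range(len(d_x))
    for j in range(len(d_y))
)

numerator = s_f_dxdy - s_fx_dx * s_fy_dy / N
denominator = math.sqrt(
    (s_fx_dx2 - s_fx_dx ** 2 / N) * (s_fy_dy2 - s_fy_dy ** 2 / N)
)
r = numerator / denominator
print(N, s_fx_dx, s_fy_dy, s_fx_dx2, s_fy_dy2, s_f_dxdy, round(r, 3))
```

Computing the marginal totals from the cells, instead of typing them in, gives an automatic check that the row and column frequencies both add up to N.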
6.9 SUMMARY
E4) The required calculations are shown in the following table:

$x$     $y$     $(x - \bar{x})$    $(x - \bar{x})^2$    $(y - \bar{y})$    $(y - \bar{y})^2$    $(x - \bar{x})(y - \bar{y})$
1       2       -2       4        -4       16       8
2       4       -1       1        -2       4        2
3       6       0        0        0        0        0
4       8       1        1        2        4        2
5       10      2        4        4        16       8
Total   15      30       0        10       0        40       20
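Here $\sum x = 15$, so $\bar{x} = \sum x / n = 15/5 = 3$,
and $\sum y = 30$, so $\bar{y} = \sum y / n = 30/5 = 6$.
From the calculation table, we observe that
$\sum (x - \bar{x})^2 = 10$, $\sum (y - \bar{y})^2 = 40$ and $\sum (x - \bar{x})(y - \bar{y}) = 20$.
Substituting these values in the formula

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}}$$

we get

$$r = \mathrm{Corr}(x, y) = \frac{20}{\sqrt{10 \times 40}} = \frac{20}{20} = 1$$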
Hence, there is perfect positive correlation between X and Y.
E5) Let us denote the husband’s age by X and the wife’s age by Y.

$x$    $y$    $x^2$    $y^2$    $xy$
23     18     529      324      414
27     22     729      484      594
28     23     784      529      644
29     24     841      576      696
30     25     900      625      750
31     26     961      676      806
$\sum x = 168$    $\sum y = 138$    $\sum x^2 = 4744$    $\sum y^2 = 3214$    $\sum xy = 3904$

Here,
$\sum x = 168$, $\sum y = 138$, $\sum x^2 = 4744$, $\sum y^2 = 3214$ and $\sum xy = 3904$.
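We use the formula

$$r = \mathrm{Corr}(x, y) = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r = \frac{(6 \times 3904) - (168)(138)}{\sqrt{\left[(6 \times 4744) - (168 \times 168)\right]\left[(6 \times 3214) - (138 \times 138)\right]}}$$

$$r = \frac{23424 - 23184}{\sqrt{(28464 - 28224)(19284 - 19044)}}$$

$$r = \frac{23424 - 23184}{\sqrt{240 \times 240}} = \frac{240}{240} = 1$$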
Hence there is perfect positive correlation between X and Y.
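A quick look at the data shows why r comes out as exactly 1: in every pair the wife's age is the husband's age minus 5, so all the points lie on one straight line with positive slope. The short Python sketch below checks this and recomputes r from the raw sums; the variable names are assumptions introduced for the illustration.

```python
import math

husband = [23, 27, 28, 29, 30, 31]   # x
wife = [18, 22, 23, 24, 25, 26]      # y
n = len(husband)

# If this set has a single element, every pair satisfies y = x - constant.
differences = {h - w for h, w in zip(husband, wife)}

sum_x, sum_y = sum(husband), sum(wife)
sum_xy = sum(h * w for h, w in zip(husband, wife))
sum_x2 = sum(h * h for h in husband)
sum_y2 = sum(w * w for w in wife)

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(differences, r)
```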
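E6) By the short-cut method, the correlation coefficient is obtained by

$$r = \mathrm{Corr}(x, y) = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{\left[n\sum d_x^2 - \left(\sum d_x\right)^2\right]\left[n\sum d_y^2 - \left(\sum d_y\right)^2\right]}}$$

$\sum d_x$, $\sum d_y$, $\sum d_x d_y$, $\sum d_x^2$ and $\sum d_y^2$ are obtained through the
following table.
Let $A_x$ = assumed mean of X = 30 and $A_y$ = assumed mean of Y = 70.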
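The short-cut calculation with these assumed means can be sketched in Python as below. The helper name `pearson_from_sums` is an assumption introduced for this illustration; the sketch also applies the same formula to the raw values to show that shifting the origin to the assumed means leaves r unchanged.

```python
import math

def pearson_from_sums(u, v):
    """Raw-sums form of r; works for raw values or for deviations."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(a * b for a, b in zip(u, v))
    su2 = sum(a * a for a in u)
    sv2 = sum(b * b for b in v)
    return (n * suv - su * sv) / math.sqrt((n * su2 - su ** 2) * (n * sv2 - sv ** 2))

x = [10, 20, 30, 40, 50]
y = [90, 85, 80, 60, 45]
A_x, A_y = 30, 70                  # assumed means given in the solution

d_x = [xi - A_x for xi in x]       # deviations from the assumed mean of X
d_y = [yi - A_y for yi in y]       # deviations from the assumed mean of Y

r_short_cut = pearson_from_sums(d_x, d_y)
r_direct = pearson_from_sums(x, y)
print(sum(d_x), sum(d_y), round(r_short_cut, 2), round(r_direct, 2))
```

Any convenient values may be chosen as assumed means; choosing them near the centre of the data simply keeps the deviations, and therefore the hand arithmetic, small.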