Practical No. 4 Data Analysis-Rank Correlation
Data Analysis: Rank Correlation
Aim: To understand the correlation between two different variables in a given set of data.
Introduction: The previous practical dealt with a single variable. This practical examines the
relationship between two variables.
Meaning: The strength and nature of the relationship between two variables is known as
correlation.
There are three types of relationship:
1) An increase in one variable leads to an increase in the other. (Positive Correlation)
e.g., a) When the population of an area increases, the demand for water also increases.
b) If the literacy rate increases, per capita GDP also increases.
c) If the distance from the market increases, the transportation cost also increases.
2) An increase in one variable leads to a decrease in the other. (Negative Correlation)
e.g., a) As the temperature increases, atmospheric pressure decreases.
b) As the speed of a train increases, travel time decreases.
3) A change in one variable does not change the other. (No Correlation)
e.g., a) An increase in education investment has no relationship with the number of clothes
each person wears.
b) The cost of the clothes students wear has no relationship with how clever they are.
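Mathematically (numerically), rank correlation is expressed as a value between -1.00 and +1.00:
• +1.00 shows perfect positive correlation; values between 0.00 and +1.00 show positive correlation
• -1.00 shows perfect negative correlation; values between 0.00 and -1.00 show negative correlation
• 0.00 (zero) shows zero correlation / no correlation
• In most cases the value lies somewhere between -1.00 and +1.00; the closer it is to either end, the stronger the correlation.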
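The scale above can be sketched as a small Python helper. This is only an illustration: the function name `describe_correlation` is hypothetical, and it reports the direction of the relationship only, since the text gives no cut-off values for "high" or "low" correlation.

```python
def describe_correlation(r):
    """Return a plain-language label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 1.0:
        return "perfect positive correlation"
    if r == -1.0:
        return "perfect negative correlation"
    if r == 0.0:
        return "no correlation"
    # Anything else is an imperfect relationship; only its sign is reported.
    return "positive correlation" if r > 0 else "negative correlation"


for value in (1.0, 0.95, 0.0, -0.4, -1.0):
    print(value, "->", describe_correlation(value))
```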
Various methods are used to calculate correlation. We are going to study
Spearman’s Rank Correlation Method. It is used when the data for the two variables are given
as ranks or preferences, or can be arranged in ranks.
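The formula for Spearman’s Rank Correlation is
$$R = 1 - \frac{6 \sum (R_1 - R_2)^2}{n(n^2 - 1)}$$
where R₁ and R₂ are the ranks of the same item on the two variables and n is the number of
pairs of observations.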
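As a minimal sketch, the formula can be written as a Python function that takes the two lists of ranks directly. The name `spearman_from_ranks` is illustrative, and the sketch assumes the ranks run from 1 to n with no tied ranks, as in the examples of this practical.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's R = 1 - 6 * sum((R1 - R2)^2) / (n * (n^2 - 1)).

    Assumes both lists hold ranks 1..n with no tied ranks.
    """
    if len(ranks_1) != len(ranks_2):
        raise ValueError("both rank lists must have the same length")
    n = len(ranks_1)
    if n < 2:
        raise ValueError("at least two pairs of ranks are needed")
    # Sum of the squared rank differences, one term per item.
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - (6 * sum_sq_diff) / (n * (n ** 2 - 1))


print(spearman_from_ranks([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # identical ranks
print(spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # reversed ranks
```

Identical rankings give the upper end of the scale and fully reversed rankings give the lower end, which is a quick sanity check on the formula.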
Q. Calculate Spearman’s Rank Correlation with the help of the following data:
Wards in a city    No. of people BPL    No. of people unemployed
A                  20                   40
B                  80                   120
C                  00                   60
D                  200                  240
E                  120                  160
F                  160                  180
G                  60                   80
H                  180                  200
I                  90                   90
J                  100                  100
Solution:
Step 1: Copy the data into a table and add a rank column next to each variable.
Step 2: Rank the values of each variable separately as 1, 2, 3, ... The highest value gets rank 1.
Step 3: Find the difference between the two ranks of each item. (R1 - R2)
Step 4: Square each difference. (R1 - R2)²
Step 5: Find the sum of all the squares. ∑(R1 - R2)²
Step 6: Now find the correlation with the following formula:
$$R = 1 - \frac{6 \sum (R_1 - R_2)^2}{n(n^2 - 1)}$$
Ward   No. of people   Rank (R1)   No. of people      Rank (R2)   (R1-R2)   (R1-R2)²
       BPL (X1)                    unemployed (Y1)
A      20              9           40                 10          -1        1
B      80              7           120                5           2         4
C      00              10          60                 9           1         1
D      200             1           240                1           0         0
E      120             4           160                4           0         0
F      160             3           180                3           0         0
G      60              8           80                 8           0         0
H      180             2           200                2           0         0
I      90              6           90                 7           -1        1
J      100             5           100                6           -1        1
                                                      ∑(R1-R2)² =           8
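The ranking table can be cross-checked with a short Python sketch. The helper `rank_descending` and the variable names are illustrative, the helper assumes there are no tied values, and the entry "00" for ward C is read as 0, which is an assumption about the source data.

```python
def rank_descending(values):
    """Rank values so that the highest value gets rank 1 (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


wards = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
bpl = [20, 80, 0, 200, 120, 160, 60, 180, 90, 100]  # ward C's "00" read as 0
unemployed = [40, 120, 60, 240, 160, 180, 80, 200, 90, 100]

r1 = rank_descending(bpl)           # Rank (R1)
r2 = rank_descending(unemployed)    # Rank (R2)
diffs = [a - b for a, b in zip(r1, r2)]   # (R1 - R2)
squares = [d ** 2 for d in diffs]         # (R1 - R2) squared

for row in zip(wards, bpl, r1, unemployed, r2, diffs, squares):
    print("{:<4}{:>5}{:>4}{:>6}{:>4}{:>4}{:>4}".format(*row))
print("Sum of squared differences:", sum(squares))
```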
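$$
\begin{aligned}
R &= 1 - \frac{6 \sum (R_1 - R_2)^2}{n(n^2 - 1)} \\
  &= 1 - \frac{6 \times 8}{10(10^2 - 1)} \\
  &= 1 - \frac{48}{10(100 - 1)} \\
  &= 1 - \frac{48}{10 \times 99} \\
  &= 1 - \frac{48}{990} \\
  &= \frac{990 - 48}{990} \\
  &= \frac{942}{990} \\
  &= 0.95
\end{aligned}
$$
R = 0.95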
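The arithmetic can be verified without rounding error using Python's standard `fractions` module. This is only a check on the numbers above; the variable names are illustrative.

```python
from fractions import Fraction

n = 10            # number of wards
sum_sq_diff = 8   # sum of (R1 - R2)^2 from the table

# 6 * 8 / (10 * 99) = 48/990, kept as an exact fraction.
ratio = Fraction(6 * sum_sq_diff, n * (n ** 2 - 1))
r_exact = 1 - ratio

print(r_exact)                     # exact value as a reduced fraction
print(round(float(r_exact), 2))    # rounded to two decimal places
```

The exact fraction is 942/990 in lowest terms, and rounding it to two decimal places gives the reported value of R.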
Thus, there is a high positive correlation between the BPL population and
unemployment in the 10 wards of the city. This means that as the BPL population
increases, unemployment also increases.
Example 2: The urban population and literacy rate of 10 areas are given below.
Calculate Spearman’s Rank Correlation and interpret your result.
Area   Urban population (%)   Literacy rate
1      60                     73
2      35                     29
3      15                     36
4      22                     14
5      18                     20
6      38                     48
7      47                     45
8      5                      12
9      12                     13
10     9                      10
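Ranking both columns (highest value gets rank 1) and squaring the rank differences gives
∑(R1 - R2)² = 18.
$$
\begin{aligned}
R &= 1 - \frac{6 \sum (R_1 - R_2)^2}{n(n^2 - 1)} \\
  &= 1 - \frac{6 \times 18}{10(10^2 - 1)} \\
  &= 1 - \frac{108}{10(100 - 1)} \\
  &= 1 - \frac{108}{10 \times 99} \\
  &= 1 - \frac{108}{990} \\
  &= \frac{990 - 108}{990} \\
  &= \frac{882}{990} \\
  &= 0.8909
\end{aligned}
$$
R = 0.89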
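The whole procedure for Example 2, from raw values to R, can be sketched in Python as a check on the sum of squared differences and the final value. The function `spearman_r` is an illustrative helper, not a library call; it assumes no tied values, and it takes the first data column as urban population and the second as literacy rate, as the question and result suggest.

```python
def spearman_r(x, y):
    """Return (sum of squared rank differences, Spearman's R).

    Ranks each list with the highest value as rank 1; assumes no ties.
    """
    def ranks(values):
        ordered = sorted(values, reverse=True)
        return [ordered.index(v) + 1 for v in values]

    n = len(x)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return sum_sq_diff, 1 - 6 * sum_sq_diff / (n * (n ** 2 - 1))


urban = [60, 35, 15, 22, 18, 38, 47, 5, 12, 9]
literacy = [73, 29, 36, 14, 20, 48, 45, 12, 13, 10]

total, r = spearman_r(urban, literacy)
print("Sum of squared differences:", total)
print("R =", round(r, 4))
```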
Result: In the 10 areas, the percentage of urban population and the literacy rate
have a positive correlation (R = 0.89). As the percentage of urban population increases,
the literacy rate also increases.