Session 4 Correlation analysis

This document discusses correlation analysis, which investigates the relationship between two continuous and quantitative variables using various methods such as Pearson's and Spearman's correlation coefficients. It explains the concept of covariance and the coefficient of correlation, detailing how these measures indicate the strength and direction of relationships between variables. Additionally, it provides guidance on using SPSS for correlation analysis and includes examples and interpretations of results.

Uploaded by

Neha Mittal
Copyright
© © All Rights Reserved

Correlation Analysis

In the previous chapter, we studied non-parametric tests. Let us now move forward
and study correlation analysis.
We know that income and expenditure variables are interrelated, implying that they increase
or decrease together. Similarly, price and demand are also interrelated variables, implying
that when the price of a product increases, its demand decreases. Thus, we can say that
income and expenditure and price and demand are correlated with each other.
The technique for investigating the relationship between two continuous and quantitative
variables is called correlation. A correlation coefficient is denoted by r and ranges between –1
and +1. For instance, a value of r = .08 indicates a positive association between the variables,
whereas a value of r = –.03 indicates a negative association.

The methods of correlation include Pearson’s correlation coefficient, Spearman’s
coefficient of correlation, Kendall’s tau correlation coefficient and partial correlation.
Apart from manual formulae, SPSS is used for finding the correlation between
variables. In this chapter, we will discuss the concept of correlation analysis in detail.
We will also explain the concept of covariance. At the end, we will study the types of
correlation analysis with the help of SPSS.
The various cases of relationship to be discussed are mentioned below:

Copy right: DR Ajay Kumar Chauhan, IMT Ghaziabad



• Covariance: Unstandardized measure of the relationship between two scale variables.
• Karl Pearson’s Correlation Coefficient: Standardized measure of relationship
  between two scale variables.
• Spearman Correlation Coefficient: Measure of the non-parametric correlation
  between two ordinal variables.
• Kendall Tau Correlation Coefficient: Measure of the non-parametric correlation
  between two ordinal variables with repeated ranks.
• Partial Correlation: Correlation between two scale variables after controlling
  other scale variables.
• Multiple Correlation: Correlation between actual value and predicted value of a
  variable (comes from some regression model).

10.1 Concept of Correlation Analysis


Correlation analysis studies the strength of the linear relationship between
variables. It measures the extent to which two variables vary together; it does not
by itself show that one variable causes changes in the other. Simple correlation
analysis refers to the tool that helps in finding the degree of relationship between
two variables. It is the most commonly used measure to describe a relationship. It
was given by Karl Pearson and is therefore known as Karl Pearson’s coefficient of
correlation. To calculate simple correlation, it is assumed that there is a linear
relationship between the two variables.
There are various tools, such as Pearson’s correlation coefficient, Spearman’s
coefficient of correlation and partial correlation, for studying correlation analysis.
The correlation coefficient ranges from –1 to +1. A perfect positive correlation
(r = +1) implies that a change in one variable is accompanied by a proportional
change in the same direction in the other variable. Similarly, a perfect negative
correlation (r = –1) implies a proportional change in the opposite direction. A value
of zero implies that there is no linear correlation between the two variables.
The equation used to calculate correlation is as follows:
$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n\,\sigma_x \sigma_y}$$
Or
$$r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{n\sum X_i^2 - \left(\sum X_i\right)^2}\;\sqrt{n\sum Y_i^2 - \left(\sum Y_i\right)^2}}$$
Where, $X_i$ = ith value of X variable
$\bar{X}$ = Mean of X variable
$Y_i$ = ith value of Y variable
$\bar{Y}$ = Mean of Y variable
n = Number of pairs of observations
$\sigma_x$ = Standard deviation of X (computed with divisor n in this formula)
$\sigma_y$ = Standard deviation of Y (computed with divisor n in this formula)
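The two forms of the equation above should give the same value. The following is a minimal Python sketch of that check, not part of the original chapter; the function names are illustrative, and the data are just the first five pairs of Table 10.2, so the resulting r is not the coefficient for the full data set.

```python
import math

def pearson_deviation_form(xs, ys):
    """r = sum((X - mean_X)(Y - mean_Y)) / (n * sd_X * sd_Y), SDs use divisor n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cross / (n * sd_x * sd_y)

def pearson_raw_sums_form(xs, ys):
    """r computed from raw sums, without first computing the means."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# First five (training score, performance) pairs from Table 10.2, for illustration only.
training = [6.24, 4.62, 7.62, 10, 3.86]
performance = [80.88, 62.82, 79.8, 120.99, 69.35]

r_deviation = pearson_deviation_form(training, performance)
r_raw = pearson_raw_sums_form(training, performance)
print(r_deviation, r_raw)  # the two values agree up to floating-point error
```

The raw-sums form is convenient for hand calculation, while the deviation form makes the link to covariance and standard deviation explicit.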
Figure 10.1 shows an example of correlation analysis in which association between
body weight and diastolic blood pressure in a group of healthy people is given:

 Figure 10.1: Correlation Analysis of Blood Pressure and Body Weight

10.2 Covariance
Covariance measures the direction and strength of the linear relation between two
random variables. In case of a single variable, variance is a measure of the average
squared distance of observations from the mean. The variance of a single scale
variable can be mathematically expressed as follows:
$$\text{Variance}\ (\sigma^2) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Where, $x_i$ = ith observation
$\bar{x}$ = Mean of x observations
n = Number of observations

In case of two scale variables, the relation between them can be estimated from the
way their observations deviate from their respective means. In other words, if an
observation of the first variable deviates from its mean in the same direction as
the corresponding observation of the second variable deviates from its mean, the
variables may have a positive relationship between them. The covariance between
the two variables X and Y can be mathematically expressed as follows:
$$\text{Covariance}\ (x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$
Where, N = Number of pairs of observations

A positive value of covariance between two variables indicates a positive
relationship between them. This means that if one variable deviates from its mean
in one direction, the other variable tends to deviate from its mean in the same
direction. A negative covariance between two variables indicates a negative
relationship between them.
The problem with measuring the relationship between two variables by using
covariance is that it is not a standardized measure of relationship. It means that
when the units of the variables change, the value of covariance also changes.
Hence, covariance is an unstandardized measure of relationship between two scale
variables. In order to solve the problem of dependence of covariance on the units
of variables, another, modified measure of relationship is required. This
can be obtained by dividing the covariance between two variables by the product of
the standard deviations of both the variables. This modified measure is known
as the coefficient of correlation, which is discussed in the next section.
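The dependence of covariance on units can be seen in a small Python sketch. This is an illustration only: the data are hypothetical (study time recorded in hours and then in minutes, against a test score), and the function names are not from the chapter.

```python
import math

def sample_covariance(xs, ys):
    """Sum of cross-deviations divided by (N - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    sd_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself = variance
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical data: the same study time expressed in two different units.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
minutes = [h * 60 for h in hours]
score = [52.0, 55.0, 61.0, 60.0, 68.0]

cov_hours = sample_covariance(hours, score)
cov_minutes = sample_covariance(minutes, score)
r_hours = correlation(hours, score)
r_minutes = correlation(minutes, score)

print(cov_hours, cov_minutes)  # covariance is 60 times larger when time is in minutes
print(r_hours, r_minutes)      # the correlation is unchanged by the change of units
```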

10.2.1 Coefficient of Correlation


The coefficient of correlation is a standardized measure of relationship between two
scale variables, and its value always lies in the range –1 to +1.
It can be mathematically expressed as follows:
$$\text{Coefficient of Correlation}\ (r) = \frac{\text{Covariance}\ (x, y)}{\sigma_x \sigma_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\,\sigma_x \sigma_y}$$
Where, $\sigma_x$ = Standard deviation of x
$\sigma_y$ = Standard deviation of y
r = Coefficient of correlation
A correlation coefficient of +1 indicates the presence of a perfect positive
correlation between the variables. Conversely, a correlation coefficient of –1
indicates the presence of a perfect negative correlation between the variables. A
correlation coefficient of 0 indicates that there is no linear relationship between the
variables. The relationship between two variables can be further divided into the
following four categories, shown in Table 10.1:
Table 10.1: Ranges of Correlation Coefficient
Correlation Coefficient (r)     Interpretation
Below ±0.15                     No Correlation
±0.15 to ±0.35                  Low Correlation
±0.35 to ±0.65                  Medium Correlation
±0.65 to ±1                     Strong Correlation
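The ranges in Table 10.1 can be applied mechanically, as in the following Python sketch. Two assumptions are made here that the table does not state: the first row is read as magnitudes below 0.15, and a value exactly on a boundary (0.15, 0.35 or 0.65) is placed in the higher category. The function name is illustrative.

```python
def correlation_strength(r):
    """Label a correlation coefficient using the ranges of Table 10.1.

    Assumption: boundary values belong to the higher category.
    """
    magnitude = abs(r)  # the sign gives the direction, not the strength
    if magnitude > 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if magnitude < 0.15:
        return "No Correlation"
    if magnitude < 0.35:
        return "Low Correlation"
    if magnitude < 0.65:
        return "Medium Correlation"
    return "Strong Correlation"

print(correlation_strength(0.776))   # Strong Correlation
print(correlation_strength(-0.555))  # Medium Correlation
```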

Using SPSS for Correlation Analysis


Table 10.2 shows the data that will be used in correlation analysis:
Table 10.2: Data for Practice

Training Score   Performance   Training Score   Performance   Training Score   Performance
6.24 80.88 5.38 76.26 5.43 62.09

4.62 62.82 3.83 56.07 10 105.02

7.62 79.8 10.07 92.09 11.55 124.3

10 120.99 5.27 116.63 9.22 112.65

3.86 69.35 10.79 80.62 6.84 91.65

5.45 72.55 4.55 109.6 6.9 74.77

8.52 91.64 3.05 56.82 3.07 56.49

8.43 98.91 7.83 109.03 10.07 117.77

4.62 57.54 4.69 85.15 3.83 68.51

7 58.8 10.81 117.69 10.07 92.78


6.1 73.96 5.38 67.89 7 90.38

9.24 120.98 9.24 118.21 3.07 83.44

3.07 40.25 8.59 100.7 4.74 96.02

6.14 94.84 3.93 69.13 10.07 116.21

4.62 57.59 3.12 86.55 6.03 94.17

3.86 65.22 4.62 80.75 9.98 113.13

5.5 88.9 5.36 64.43 3.07 62.8

9.24 105.96 10.07 108.92 9.33 111.73

6.1 80.19 7.65 87.63 7.55 118.79

3.05 53.27 10.07 94.45 5.38 87.66

4.62 65.8 5.38 79.47 6.14 73.32

9.24 111.65 7.78 75.63 7.69 87.86

6.03 90.24 4.64 84.48 11.55 129.21

3.07 80.68 10.74 116.32 7.76 113.07

7.83 88.06 4.64 76.3 9.27 116.08

7 79.23 10.74 118.81 3.86 77.23

3.07 57.43 3.07 66.85 3.07 53.82

8.59 111.83 7.65 93.19 11.55 129.64

6.79 81.9 6.08 103.76 6.84 96.49

6.03 108.15 9.31 111.29 5.27 104.48

SPSS Command
The SPSS command required is as follows:
Step 1: Click ‘Analyze’ ➔ ‘Correlate’ ➔ ‘Bivariate’

The same is shown in Figure 10.2:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 10.2: SPSS Command for Correlation Analysis (1)

Step 2: Transfer both the variables to the ‘Variables’ window, as shown in Figure
10.3:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 10.3: SPSS Command for Correlation Analysis (2)

Step 3: In ‘Bivariate Correlations: Options,’ select ‘Cross-product deviations and
covariances,’ then click ‘Continue,’ as shown in Figure 10.4:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 10.4: SPSS Command for Correlation Analysis (3)

Data Analysis and Interpretation


The output obtained from the above SPSS analysis is shown in Table 10.3. The
results indicate that Pearson’s coefficient of correlation is 0.776 and that its
p-value is reported as .000 (that is, below 0.001), which is less than the 5 percent
level of significance. Hence, the null hypothesis of no correlation between the
training score and performance is rejected at the 5 percent level. It can be
concluded from the results that there exists a strong, positive and significant
correlation between the training score and performance.
Table 10.3 shows the output for Pearson’s coefficient of correlation:
Table 10.3: Pearson’s Coefficient of Correlation Output

                                                       Training Score   Performance
Training Score   Pearson’s Correlation                 1                .776**
                 Sig. (2-tailed)                                        .000
                 Sum of Squares and Cross-products     569.504          3743.449
                 Covariance                            6.399            42.061
                 N                                     90               90
Performance      Pearson’s Correlation                 .776**           1
                 Sig. (2-tailed)                       .000
                 Sum of Squares and Cross-products     3743.449         40902.865
                 Covariance                            42.061           459.583
                 N                                     90               90

** Correlation is significant at 0.01 level (2-tailed)
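As a cross-check, the coefficient in Table 10.3 can be recomputed from the covariance and the two variances reported in the same table. The Python sketch below does this and also computes the usual t statistic for testing a zero correlation, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. That test statistic is a standard result and is not stated in the chapter; the variable names are illustrative.

```python
import math

# Values read from Table 10.3.
cov_xy = 42.061   # covariance of training score and performance
var_x = 6.399     # variance of training score (its covariance with itself)
var_y = 459.583   # variance of performance
n = 90            # number of pairs

# Coefficient of correlation = covariance / (sd_x * sd_y).
r = cov_xy / math.sqrt(var_x * var_y)
print(round(r, 3))  # 0.776

# Standard t statistic for the null hypothesis of zero correlation (assumption:
# this is the test behind the reported significance; it is not given in the text).
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(t_stat)
```

The t statistic comes out well above 10, far beyond the two-tailed 5 percent critical value of roughly 1.99 for 88 degrees of freedom, which is consistent with the very small p-value in the table.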

10.3 Non-parametric Correlation


There are two methods for studying correlation, namely, parametric and
non-parametric. A parametric correlation uses the means and deviations of the data
for estimating the relationship, whereas a non-parametric correlation uses only the
ordinal position (rank) of pairs of scores. One of the assumptions of parametric
correlation is that the variables must be scale variables and normally distributed.
Non-parametric correlation measures the relationship between ordinal variables.
There are two tests for non-parametric correlation, namely, Spearman’s correlation
coefficient and Kendall’s tau statistic. A detailed explanation of these two
non-parametric tests is given in the following sections.

10.3.1 Spearman’s Coefficient of Correlation


Spearman’s coefficient of correlation, given by Charles Spearman, is a measure of
non-parametric correlation between two ordinal variables. It can be used when the
data has violated parametric assumptions, such as normal distribution. This method
is also called rank correlation. It works on the ranking of the observed scores of the
variables. The observed scores are converted into ranks in such a way that the
highest score is given rank 1 and the subsequent scores are given ranks 2, 3, 4 and
so on.
Generally, this measure is used when the data is qualitative in nature but can be
ranked. The equation to calculate rank correlation is as follows:
$$\text{Rank correlation} = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
Where, $d_i$ = difference between the two ranks of the ith pair of observations
n = number of pairs of observations
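The rank correlation formula can be sketched in Python as follows. This is a minimal illustration with hypothetical scores; it assumes there are no tied scores (the function rejects ties, since tied ranks need extra handling), and the function names are not from the chapter.

```python
def descending_ranks(values):
    """Rank the scores so that the highest score gets rank 1 (no ties assumed)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rank_correlation(xs, ys):
    """Rank correlation = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(xs)
    if len(set(xs)) != n or len(set(ys)) != n:
        raise ValueError("this sketch assumes there are no tied scores")
    ranks_x = descending_ranks(xs)
    ranks_y = descending_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical scores of five candidates on two tests.
test_a = [10, 20, 30, 40, 50]
test_b = [12, 25, 20, 48, 60]

# Ranks: test_a -> [5, 4, 3, 2, 1], test_b -> [5, 3, 4, 2, 1], so sum(d^2) = 2.
rho = spearman_rank_correlation(test_a, test_b)
print(rho)  # 1 - 12/120 = 0.9
```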

10.3.2 Kendall’s Tau Correlation Coefficient


Kendall’s tau is another non-parametric measure of correlation between two ordinal
variables. It is used when the researcher has a small dataset with a large number of
tied ranks, that is, when many observed scores of the variables are the same and
therefore share the same rank. Spearman’s correlation coefficient is more popular
than Kendall’s tau, but it is suggested that Kendall’s tau is the better statistic
because more accurate inferences can be drawn from it than from Spearman’s
correlation coefficient.
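The chapter does not give a formula for Kendall’s tau. The Python sketch below uses the standard tau-b definition, which adjusts for tied ranks: pairs of observations are counted as concordant, discordant, tied on the first variable only, or tied on the second variable only. The data are hypothetical and the function name is illustrative.

```python
import math
from itertools import combinations

def kendall_tau_b(xs, ys):
    """tau-b = (C - D) / sqrt((C + D + Tx) * (C + D + Ty)).

    C, D = concordant and discordant pairs; Tx, Ty = pairs tied only on x or only on y.
    Raises ZeroDivisionError if every pair is tied on one of the variables.
    """
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx = x1 - x2
        dy = y1 - y2
        if dx == 0 and dy == 0:
            continue                # tied on both variables: not counted
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1         # both variables move in the same direction
        else:
            discordant += 1         # the variables move in opposite directions
    denominator = math.sqrt((concordant + discordant + ties_x_only)
                            * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denominator

# Hypothetical ordinal scores; the value 2 is repeated in the first variable.
score_1 = [1, 2, 2, 3]
score_2 = [1, 3, 2, 4]

# Of the 6 pairs, 5 are concordant and 1 is tied on score_1 only.
tau = kendall_tau_b(score_1, score_2)
print(tau)  # 5 / sqrt(6 * 5), about 0.913
```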

Using SPSS for Non-parametric Correlation


Table 10.4 shows the practice data in case of the Spearman’s coefficient of
correlation and Kendall’s tau correlation coefficient:
Table 10.4: Data for Practice

Score 1 Score 2 Score 1 Score 2


8 10 18 19
10 13 17 14
9 11 20 18
12 16 19 18
13 13 12 11
7 9 17 12
10 12 9 6
8 9 10 17
12 14 12 12
16 18 7 15
19 10 12 12
13 15 16 11
15 18

The SPSS command for Spearman’s coefficient of correlation and Kendall’s tau
correlation coefficient is as follows:
Step 1: Click ‘Analyze’ ➔ ‘Correlate’ ➔ ‘Bivariate’
The same is shown in Figure 10.5:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 10.5: SPSS Command for Non-Parametric Correlation Analysis (1)

Step 2: Select ‘Kendall’s tau-b’ and ‘Spearman’, deselect ‘Pearson’ and click ‘OK,’ as
shown in Figure 10.6:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)

 Figure 10.6: SPSS Command for Non-Parametric Correlation Analysis (2)

Data Analysis and Interpretation


The output obtained from the above SPSS analysis is shown in Table 10.5. The
results indicate that the p-values of Spearman’s coefficient of correlation (0.517)
and Kendall’s tau statistic (0.404) are 0.008 and 0.007, respectively, which are, in
both cases, less than the 5 percent level of significance. Hence, the null hypothesis
of no correlation between the scores is rejected at the 5 percent level. It can be
concluded from the result that there exists a significant positive correlation
between the scores of the candidates.
Table 10.5 shows the output obtained for non-parametric correlation:
Table 10.5: Non-parametric Correlation Statistics

                                                               Scores 1   Scores 2
Kendall’s tau            Scores 1   Correlation Coefficient    1.000      .404**
                                    Sig. (2-tailed)            .          .007
                                    N                          25         25
                         Scores 2   Correlation Coefficient    .404**     1.000
                                    Sig. (2-tailed)            .007       .
                                    N                          25         25
Spearman’s Coefficient   Scores 1   Correlation Coefficient    1.000      .517**
                                    Sig. (2-tailed)            .          .008
                                    N                          25         25
                         Scores 2   Correlation Coefficient    .517**     1.000
                                    Sig. (2-tailed)            .008       .
                                    N                          25         25

** Correlation is significant at 0.01 level (2-tailed)



10.4 Partial Correlation


In any bivariate correlation between two variables, the main problem is the
possibility of indirect correlation. This means that two variables may not in fact be
correlated directly but are found to be correlated because both are influenced by
a third, outside variable. When this third variable is controlled, the correlation
between them may become insignificant. Thus, we can say that the partial correlation
between two variables is the correlation between them after controlling for the
effect of other outside variables.
Using SPSS for Partial Correlation
Table 10.6 shows the data for practicing partial correlation analysis by using SPSS:
Table 10.6: Data for Practice

Money Supply   Interest Rate   GDP   Money Supply   Interest Rate   GDP

22175 11.13 334800 28715.66 10.08 368280

22841 11.17 336708 28996.33 11.45 376768

23461 11.8 340096 28479.33 12.45 381016

23427 14.18 341844 28669 10.77 385396

23811 14.38 342776 29018.66 10.52 390240

23612.33 12.98 342264 29398.66 9.67 391580

24543 10.72 340716 30203.66 9.03 396384

25638.66 14.53 347780 31059.33 9.02 405308

25316 17.13 354836 30745.33 11.03 405680

25501.33 18.57 359352 30477.66 8.73 408116

25382.33 21.02 356152 31563.66 8.47 409160

24753 16.62 353636 32800.66 8.4 409616

25094.33 15.35 349568 33958.33 7.25 416484

25253.66 16.05 345284 35795.66 8.3 422916

24936.66 14.32 343028 35878.66 9.3 429980



25553 10.88 340292 36336 8.7 436264

26755.33 9.62 346072 36480.33 8.62 440592

27412 9.32 353860 37108.66 9.13 446680

28403.33 9.33 359544 38423 10.05 450328

28402.33 9.55 362304 38480.66 10.83 453516

The SPSS command for practicing partial correlation analysis is as follows:


Step 1: Click ‘Analyze’ ➔ ‘Correlate’ ➔ ‘Partial’

The same is shown in Figure 10.7:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 10.7: SPSS Command for Partial Correlation Analysis (1)

Step 2: Transfer the two main variables to the ‘Variables’ window and the third
variable to the ‘Controlling for’ window. Then, click ‘OK,’ as shown in Figure 10.8:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 10.8: SPSS Command for Partial Correlation Analysis (2)

Data Analysis and Interpretation


The outputs of the Pearson correlation and partial correlation are shown in Tables
10.7 and 10.8, respectively. The results of the Pearson correlation indicate that
there exists a significant correlation between interest rate and GDP (r = –0.555)
without controlling for the third variable, money supply, but the partial correlation
between them is found to be insignificant.
Table 10.7 shows the output of the Pearson correlation:
Table 10.7: Pearson Correlation Output

                                        Money Supply   Interest Rate   GDP
Money supply    Pearson Correlation     1              -.598**         .976**
                Sig. (2-tailed)                        .000            .000
                N                       40             40              40
Interest rate   Pearson Correlation     -.598**        1               -.555**
                Sig. (2-tailed)         .000                           .000
                N                       40             40              40
GDP             Pearson Correlation     .976**         -.555**         1
                Sig. (2-tailed)         .000           .000
                N                       40             40              40

** Correlation is significant at 0.01 level (2-tailed)

Table 10.8 shows the output of the partial correlation obtained from SPSS:
Table 10.8: Partial Correlation Output

Control Variables                                          Interest Rate   GDP
Money supply        Interest rate   Correlation            1.000           .165
                                    Significance (2-tailed) .              .315
                                    df                     0               37
                    GDP             Correlation            .165            1.000
                                    Significance (2-tailed) .315           .
                                    df                     37              0

Table 10.8 shows that the partial correlation between interest rate and GDP is
0.165 and that its p-value (0.315) is more than the 5 percent level of significance.
Hence, the null hypothesis of no correlation cannot be rejected. This indicates that
after controlling for the third variable, money supply, the correlation between
interest rate and GDP becomes insignificant.
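The partial correlation in Table 10.8 can be approximately reproduced from the three pairwise correlations in Table 10.7. The Python sketch below uses the standard first-order partial correlation formula, which is not stated in the chapter; the function name is illustrative. Because the inputs are rounded to three decimals, the result matches the SPSS value only approximately.

```python
import math

def first_order_partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for a single variable z:

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Pairwise Pearson correlations read from Table 10.7.
r_interest_gdp = -0.555
r_interest_money = -0.598
r_gdp_money = 0.976

partial_r = first_order_partial_correlation(
    r_interest_gdp, r_interest_money, r_gdp_money
)
print(partial_r)  # close to the .165 reported in Table 10.8

# Degrees of freedom with one control variable: n - 3 (matches df = 37 in Table 10.8).
n = 40
df = n - 3
```

The sign change from –0.555 to a small positive value shows how strongly the original correlation was driven by the common relationship of both variables with money supply.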
