Chapter 7 Correlation
CHAPTER OVERVIEW
Introduction
What is correlation coefficient?
o Pearson product moment
Range of values
o Positive correlation
o Negative correlation
o Zero correlation
Calculation of the correlation coefficient
SPSS correlation coefficient
Correlation and causation
Summary
Key Terms

Chapter 1: Introduction
Chapter 2: Descriptive Statistics
Chapter 3: The Normal Distribution
Chapter 4: Hypothesis Testing
Chapter 5: T-test
Chapter 6: Oneway Analysis of Variance
Chapter 7: Correlation
Chapter 8: Chi-Square
This chapter introduces the concept of correlation and how it is used in analysing
educational data. The correlation coefficient is a useful statistical tool for showing the
relationship between two variables. The coefficient can range from -1.00 to +1.00,
though in the behavioural sciences there is seldom a perfect positive or negative
correlation between two variables. It should be emphasised, however, that correlation is
not causation. In other words, even if there is a high correlation between A and B, it
does not mean that A caused B.
Introduction
Researchers are often concerned with the way two variables relate to each
other for a given group of persons, such as students in a school or workers in a factory
or office. For example, do students who have higher scores in mathematics also have
higher scores in science? Is there a relationship between a person's self-esteem
and his or her personality? Is there a relationship between attitudes towards reading
and the number of books read? Is there a relationship between years of experience as
a teacher and attitudes towards teaching? These are some of the questions asked by
educational researchers. To answer them, you must make observations or collect data
on each variable for a group of persons.
b) Assumptions Testing
Correlational analysis has the following underlying assumptions (S. Coakes and L.
Steed, 2002, SPSS Analysis Without Anguish. Brisbane: John Wiley & Sons):
Related Pairs: the data must be collected from related pairs, i.e. if you obtain a
score on an X variable, there must be a score on the Y variable from the same
subject.
Scale of Measurement: the data should be interval or ratio in nature.
Normality: the scores for each variable should be normally distributed.
Linearity: the relationship between the two variables must be linear.
EXAMPLE:
Data was gathered for the following two variables from a sample of 12 students.
Student   IQ Score (X)   Science Score (Y)
1         120            31
2         112            25
3         110            19
4         120            24
5         103            17
6         126            28
7         113            18
8         114            20
9         106            16
10        108            15
11        128            27
12        109            19
The scatter diagram (see Figure 7.1) shows a moderate positive relationship
between IQ scores and Science scores. A diagram, however, does not give a
summarised measure of this relationship. A more precise, numerical descriptive
measure of the correlation between IQ scores and Science scores is needed;
it is discussed later in this chapter.
Figure 7.1: Scatter Diagram Showing the Relationship between IQ Scores (X axis) and
Science Scores (Y axis) for 12 Students
Note that rxy can never take on a value less than -1 or greater than +1. The
following three graphs show various values of rxy and the type of linear
relationship that exists between X and Y for each of them.
a) POSITIVE CORRELATION
[Figure 7.2: English Score (y axis) plotted against Attitudes (x axis), showing a positive slope]
See Figure 7.2. If Attitudes (x) and English Achievement (y) have a positive
relationship, then the slope (β1) will be a positive number. Lines with positive slopes
go from the bottom left toward the upper right; e.g. an increase from 1 to 2 on the x
axis is accompanied by an increase from 3 to 3.5 on the y axis.
b) NEGATIVE CORRELATION
[Figure: English Score (y axis) plotted against Attitudes (x axis), showing a negative slope]
If Attitudes (x) and English Achievement (y) have a negative relationship, then the
slope (β1) will be a negative number. Lines with negative slopes go from the upper
left to the lower right. The above graph has a slope of -0.5: an increase of 1 on the x
axis is associated with a decrease of 0.5 on the y axis, e.g. an increase from 1 to 2 on
the x axis is accompanied by a decrease from 5 to 4.5 on the y axis.
c) ZERO CORRELATION
[Figure 7.4: English Score (y axis) plotted against Attitudes (x axis), showing a zero slope]
If Attitudes (x) and English Achievement (y) have NO relationship, then the slope
(β1) will be ZERO (see Figure 7.4). In other words, there is NO SYSTEMATIC
RELATIONSHIP between X and Y. Some students with high Attitude scores have
low English scores, while some students with low Attitude scores have high
English scores.
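The three cases can be illustrated with a minimal Python sketch. The Attitude and English scores below are made-up values chosen to match the slopes described in the text (they are not data from the chapter), and the helper name slope_and_r is an illustrative choice. Because the first two sets of points lie exactly on a straight line, they give the extreme values of the coefficient, which are rarely seen in real data.

```python
from math import sqrt

def slope_and_r(x, y):
    """Return the least-squares slope and the Pearson r for paired scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products of deviations from the means
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_xx = sum((a - mean_x) ** 2 for a in x)
    ss_yy = sum((b - mean_y) ** 2 for b in y)
    return ss_xy / ss_xx, ss_xy / sqrt(ss_xx * ss_yy)

# Hypothetical Attitude scores (x) and three hypothetical sets of English scores (y)
attitude = [1, 2, 3, 4]
english = {
    "positive": [3.0, 3.5, 4.0, 4.5],   # rises by 0.5 for each step on x
    "negative": [5.0, 4.5, 4.0, 3.5],   # falls by 0.5 for each step on x
    "zero": [3.0, 4.0, 4.0, 3.0],       # no systematic rise or fall
}

results = {label: slope_and_r(attitude, y) for label, y in english.items()}
for label, (slope, r) in results.items():
    print(f"{label:8s} slope = {slope:+.2f}  r = {r:+.2f}")
```

The sign of the slope and the sign of r always agree, since both take their sign from the same sum of cross-products.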
The Pearson Correlation Coefficient (called the Pearson r) is a commonly used
formula for computing the correlation between two variables. It measures
the strength and direction of a linear relationship between variable X and variable Y.
The sample correlation coefficient is denoted by r. The formula for the sample
correlation coefficient is:
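$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$

where

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 22.50$$

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 47.25$$

$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 56.00$$

so that

$$r = \frac{22.50}{\sqrt{(47.25)(56.00)}} = 0.437$$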
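As a minimal sketch, the computational formulas for SSxy, SSxx and SSyy can be applied in Python to the IQ and Science scores of the 12 students listed earlier in the chapter. The function name pearson_r is an illustrative choice, not part of the chapter. The last lines simply recheck the arithmetic for the quoted values 22.50, 47.25 and 56.00.

```python
from math import sqrt

# IQ scores (X) and Science scores (Y) for the 12 students in the chapter's example
iq = [120, 112, 110, 120, 103, 126, 113, 114, 106, 108, 128, 109]
science = [31, 25, 19, 24, 17, 28, 18, 20, 16, 15, 27, 19]

def pearson_r(x, y):
    """Pearson r from the computational (raw-score) sums of squares."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(a * a for a in x) - sum_x ** 2 / n
    ss_yy = sum(b * b for b in y) - sum_y ** 2 / n
    return ss_xy / sqrt(ss_xx * ss_yy)

r = pearson_r(iq, science)
print(f"r for the 12 students = {r:.3f}")

# Recheck of the arithmetic using the summary values quoted in the text
r_worked = 22.50 / sqrt(47.25 * 56.00)
print(f"r from the quoted sums of squares = {r_worked:.3f}")
```

The raw-score form needs only the running totals Σx, Σy, Σxy, Σx² and Σy², which is why it is convenient for hand calculation.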
A study was conducted to determine the relationship between reading ability and
performance in science. A reading ability test and a science test were administered
to 200 lower secondary students. The Pearson product-moment correlation was used to
determine the significance of the relationship. The steps for using SPSS are shown
below:
SPSS Procedures:
SPSS Output:
[SPSS output: correlation table for Reading and Science]
To interpret the correlation coefficient, examine the coefficient and its associated
significance value (p). The output shows that the relationship between Reading and
Science scores is significant, with a correlation coefficient of r = 0.63 (p < .05).
Thus higher reading scores are associated with higher scores in science.
NULL HYPOTHESIS
The null hypothesis (H0) states that the correlation between X and Y in the
population is ρ = 0.0. What is the probability that the correlation obtained in the
sample came from a population where the parameter ρ = 0.0? The t-test for the
significance of a correlation coefficient is used to answer this question. Note that
the correlation between Reading and Science (r = 0.630) is significant at p < 0.05.
Hence, the null hypothesis is REJECTED, which supports the conclusion that the two
variables are positively related in the population.
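The chapter does not print the formula for this t-test. A minimal sketch follows, assuming the standard form t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The values of r and n come from the text. The critical value is an approximate figure from standard t tables and is an assumption, not something stated in the chapter.

```python
from math import sqrt

r = 0.63   # sample correlation between Reading and Science (from the text)
n = 200    # number of students tested (from the text)

# t statistic for testing H0: rho = 0, with n - 2 degrees of freedom
df = n - 2
t = r * sqrt(df) / sqrt(1 - r ** 2)

# Approximate two-tailed critical value of t at alpha = .05 for df = 198.
# This constant is an assumption taken from standard t tables.
t_critical = 1.97

reject_null = abs(t) > t_critical
print(f"t({df}) = {t:.2f}, reject H0: {reject_null}")
```

With a sample this large, the t statistic is far beyond the critical value, which agrees with the SPSS result of p < .05.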
Coefficient of Determination
SPSS Procedures:
SPSS Output
[Figure 7.4: Scatterplot of SCIENCE scores (y axis, 30 to 80) against READING scores (x axis, 20 to 80)]
As you can see from the scatterplot (Figure 7.4), there is a linear relationship between
Reading and Science scores. Given that the scores cluster uniformly around the
regression line, the assumption of homogeneity of variance has not been violated.
Causation and correlation are two concepts that have been wrongly interpreted
by some researchers. The presence of a correlation between two variables does not
necessarily mean there exists a causal link between them. Suppose, for instance, that
there is a correlation (0.60) between "teachers' salary" and "academic performance of
students".
Does this imply that a well-paid teaching staff "causes" better academic
performance of students? Would academic performance improve if we increased the
pay of teachers? It is dangerous to conclude causation just because there is a
correlation or relationship between two variables. The correlation by itself tells us
nothing about whether "teachers' salary" causes "achievement".
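One way to see why correlation alone cannot establish causation is a small simulation, sketched below under clearly artificial assumptions. A hypothetical third variable (labelled funding here purely for illustration) drives both salary and performance, and neither of the two affects the other. All numbers are simulated and none come from the chapter.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_xx = sum((a - mean_x) ** 2 for a in x)
    ss_yy = sum((b - mean_y) ** 2 for b in y)
    return ss_xy / sqrt(ss_xx * ss_yy)

rng = random.Random(7)   # fixed seed so the run is repeatable
n = 1000

# Hypothetical third variable that influences both measures
funding = [rng.gauss(0, 1) for _ in range(n)]
# Each measure is funding plus its own independent noise; neither causes the other
salary = [f + rng.gauss(0, 0.5) for f in funding]
performance = [f + rng.gauss(0, 0.5) for f in funding]

r_observed = pearson_r(salary, performance)

# Subtracting funding leaves only the independent noise in each variable
salary_rest = [s - f for s, f in zip(salary, funding)]
performance_rest = [p - f for p, f in zip(performance, funding)]
r_without_funding = pearson_r(salary_rest, performance_rest)

print(f"r(salary, performance)         = {r_observed:.2f}")
print(f"r after removing funding       = {r_without_funding:.2f}")
```

In this artificial setup the two variables are strongly correlated even though raising one would do nothing to the other. Once the shared influence is removed, the correlation largely disappears.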
LEARNING ACTIVITY
15 22
13 17
14 20
12 18
16 23
12 21
11 19
17 24
15 19
13 16
SUMMARY
The null hypothesis (H0) states that the correlation between X and Y in the
population is ρ = 0.0.
The presence of a correlation between two variables does not necessarily mean
there exists a causal link between them.
KEY WORDS:
Correlation
Correlation coefficient
Pearson product moment
Range of values
Positive correlation
Negative correlation
Zero correlation
Scatterplot
Causation
Coefficient of determination