Chapter 7 Correlation

This chapter discusses correlation and the correlation coefficient. Correlation analyzes the relationship between two variables. The correlation coefficient ranges from -1 to 1, indicating the strength and direction of the linear relationship between variables. A value of 0 means no relationship, while values closer to 1 or -1 indicate a strong relationship. The chapter explains Pearson's correlation coefficient, the assumptions of correlation analysis, and how to interpret the strength of relationships based on the size of the correlation coefficient. Examples are provided to illustrate positive, negative, and no correlation.

Uploaded by

Rizana Hamsadeen


Chapter 7: Correlation


Upon completion of this chapter, you should be able to:

Explain the concept of relationship between variables

Discuss the use of the statistical tests to determine correlation
Interpret SPSS outputs on correlation tests

CHAPTER OVERVIEW

• Introduction
• What is the correlation coefficient?
  o Pearson product moment
• Range of values
  o Positive correlation
  o Negative correlation
  o Zero correlation
• Calculation of the correlation coefficient
• SPSS correlation coefficient
• Correlation and causation
• Summary
• Key Terms

Chapter 1: Introduction
Chapter 2: Descriptive Statistics
Chapter 3: The Normal Distribution
Chapter 4: Hypothesis Testing
Chapter 5: T-test
Chapter 6: Oneway Analysis of Variance
Chapter 7: Correlation
Chapter 8: Chi-Square

This chapter introduces the concept of correlation and how it is used in analysing
educational data. The correlation coefficient is a useful statistical tool for showing the
relationship between two variables. Its value can range from -1.00 to +1.00, though in
the behavioural sciences there is seldom a perfect positive or negative correlation
between two variables. It should be emphasised, however, that correlation is not
causation: even if there is a high correlation between A and B, it does not mean that
A caused B.

Introduction

Researchers are often concerned with the way two variables relate to each
other for a given group of persons, such as students in a school or workers in a factory
or office. For example, do students who have higher scores in mathematics also have
higher scores in science? Is there a relationship between a person's self-esteem
and his or her personality? Is there a relationship between attitudes towards reading
and the number of books read? Is there a relationship between years of experience as
a teacher and attitudes towards teaching? These are some of the questions asked by
educational researchers. To answer these questions, you must make observations or
collect data on each variable for a group of persons.

What is the Correlation Coefficient?

The correlation coefficient, a concept from statistics, is a measure of how closely
two sets of values vary together. In forecasting, for example, it measures how well
trends in the predicted values follow trends in past actual values, i.e. how well the
predicted values from a forecast model "fit" the real-life data.

The correlation coefficient is a number between -1 and +1. If there is no linear
relationship between the predicted values and the actual values, the correlation
coefficient is 0 or very close to 0 (the predicted values are no better than random
numbers). As the strength of the relationship between the predicted values and actual
values increases, so does the size of the correlation coefficient. A perfect fit gives a
coefficient of +1.0 (or -1.0 for a perfect inverse relationship). Thus, when judging a
forecast, the higher the correlation coefficient the better.

a) Pearson Product Moment Correlation Coefficient

Pearson's product moment correlation coefficient, usually denoted by r, is one
example of a correlation coefficient. It is a measure of the linear association between
two variables that have been measured on interval or ratio scales, such as the
relationship between amount of education and income levels. If there is a relationship
between amount of education and income levels, the two variables co-vary.

b) Assumptions Testing

Correlational analysis has the following underlying assumptions (S. Coakes and L.
Steed, 2002, SPSS Analysis Without Anguish. Brisbane: John Wiley & Sons):

• Related Pairs: the data must be collected from related pairs; i.e. if you obtain a
  score on an X variable, there must be a score on the Y variable from the same
  subject.
• Scale of Measurement: the data should be interval or ratio in nature.
• Normality: the scores for each variable should be normally distributed.
• Linearity: the relationship between the two variables must be linear.
• Homogeneity of Variance: the variability in scores for one variable is
  roughly the same at all values of the other variable; i.e. it is concerned with
  how uniformly the scores cluster about the regression line.

c) Strength of the Correlation

The strength of a relationship is indicated by the size of the correlation
coefficient: the larger the correlation, the stronger the relationship. A strong
relationship exists where cases with one value of the X variable usually have a
particular value on the Y variable, while those with a different value of X have a
different value on Y.

For example, if people who exercise regularly nearly always have better health
than those who do not exercise, then exercise and health are strongly correlated.
If those who exercise regularly are just a little more likely to be healthy than the
non-exercisers, then the two variables are only weakly related.

How high does a correlation coefficient have to be to be called strong? How
small is a weak correlation? The answer to these questions varies with the variables
being studied. For example, if the literature shows that previous research found a
correlation of 0.51 between variable X and variable Y, but in your study
you obtained a correlation of 0.60, then you might conclude that the correlation
between variable X and Y is strong.

However, Cohen (1988) has provided some guidelines for judging the
strength of the relationship between two variables by providing descriptors for the
coefficients (see Table 7.1). Keep in mind that in education and psychology the
coefficients will rarely be "very strong" or "near perfect", since the variables
measured are constructs involving human characteristics which are subject to wide
variation.

Coefficient     Descriptor
0.01-0.09       Trivial
0.10-0.29       Low to Moderate
0.30-0.49       Moderate to Substantial
0.50-0.69       Substantial to Very Strong
0.70-0.89       Very Strong
> 0.90          Near Perfect

Table 7.1 General guidelines on the strength of the relationship between variables
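
As a rough illustration of how Table 7.1 might be applied, the sketch below maps a coefficient to its descriptor. The function name `describe_strength` and the list `BANDS` are hypothetical; treating values that fall between the printed ranges (for example 0.095, or exactly 0.90) as belonging to the band whose lower bound they reach is an assumption, as is the label used for values below 0.01.

```python
# Lower bounds and descriptors taken from Table 7.1.
# Assumption: a value belongs to the highest band whose lower bound it reaches.
BANDS = [
    (0.90, "Near Perfect"),
    (0.70, "Very Strong"),
    (0.50, "Substantial to Very Strong"),
    (0.30, "Moderate to Substantial"),
    (0.10, "Low to Moderate"),
    (0.01, "Trivial"),
]


def describe_strength(r: float) -> str:
    """Return the Table 7.1 descriptor for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # strength depends on the size of r, not on its sign
    for lower_bound, label in BANDS:
        if size >= lower_bound:
            return label
    return "No relationship"  # assumption: anything below 0.01 is treated as none


print(describe_strength(0.63))   # Substantial to Very Strong
print(describe_strength(-0.25))  # Low to Moderate
```

Because the descriptor depends only on the absolute value, a coefficient of -0.25 receives the same label as +0.25; the sign conveys direction, not strength.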

EXAMPLE:

Data was gathered for the following two variables from a sample of 12 students.

Student No.    IQ Score (X)    Science Test Score (Y)
     1             120                  31
     2             112                  25
     3             110                  19
     4             120                  24
     5             103                  17
     6             126                  28
     7             113                  18
     8             114                  20
     9             106                  16
    10             108                  15
    11             128                  27
    12             109                  19

• Each unit or student is represented by a point on the scatter diagram. A dot is
  placed for each student at the point of intersection of a straight line drawn
  through his IQ score perpendicular to the X axis and a line drawn through his
  Science score perpendicular to the Y axis. For example, a student who obtained
  an IQ score of 120 also obtained a Science score of 24. The intersection between
  these lines is represented by the dot 'A'.

• The scatter diagram (see Figure 7.1) shows a clear positive relationship between
  IQ scores and Science scores. However, we do not yet have a summarised measure
  of this relationship. There is a need for a more precise measure to describe the
  relationship between the two variables: a numerical descriptive measure of the
  correlation between IQ scores and Science scores, which will be discussed later.

Figure 7.1
Scatter Diagram Showing the Relationship between IQ Scores (X axis) and
Science Score (Y axis) for 12 Students
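
The numerical measure that the example calls for can be sketched in a few lines of Python. The snippet below is a minimal illustration, not part of the original chapter: the helper name `pearson_r` is hypothetical, and it uses the deviations-from-the-mean form of Pearson's r on the 12 pairs of scores listed in the example.

```python
from math import sqrt

# IQ scores (X) and science test scores (Y) for the 12 students in the example.
iq = [120, 112, 110, 120, 103, 126, 113, 114, 106, 108, 128, 109]
science = [31, 25, 19, 24, 17, 28, 18, 20, 16, 15, 27, 19]


def pearson_r(x, y):
    """Pearson r computed from deviations about the two means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_iq_science = pearson_r(iq, science)
print(f"r = {r_iq_science:.3f}")
```

A positive result confirms the direction seen in the scatter diagram, and its size can then be read against the descriptors in Table 7.1.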

Range of Values (rxy)

Note that rxy can never take on a value less than -1 or greater than +1. The
following are three graphs showing various values of rxy and the type of linear
relationship that exists between X and Y for the given values of rxy.

a) POSITIVE CORRELATION

Value of rxy = + 1.00 = Perfect & Direct Relationship

[Graph: English Score (y axis) plotted against Attitude Towards English (x axis)]

Figure 7.2 Perfect Correlation

See Figure 7.2. If Attitudes (x) and English Achievement (y) have a positive
relationship, then the slope (β1) will be a positive number. Lines with positive slopes
go from the bottom left toward the upper right; i.e. an increase from 1 to 2 on the x
axis is followed by an increase from 3 to 3.5 on the y axis.

b) NEGATIVE CORRELATION

Value of rxy = -1.00 = Perfect Inverse Relationship

[Graph: English Score (y axis) plotted against Attitude Towards English (x axis)]

Figure 7.3 Negative Correlation

If Attitudes (x) and English Achievement (y) have a negative relationship, then the
slope (β1) will be a negative number. Lines with negative slopes go from the upper
left to the lower right. The above graph has a slope of -0.5: an increase of 1 on the x
axis is associated with a decrease of 0.5 on the y axis; i.e. an increase from 1 to 2 on
the x axis is followed by a decrease from 5 to 4.5 on the y axis.

c) ZERO CORRELATION

Value of rxy = .00 = No Relationship

[Graph: English Score (y axis) plotted against Attitude Towards English (x axis)]

Figure 7.4 No Correlation

If Attitudes (x) and English Achievement (y) have NO relationship, then the slope
(β1) will be ZERO (see Figure 7.4). In other words, there is NO SYSTEMATIC
RELATIONSHIP between X and Y. Some students with high Attitude scores have
low English scores, while some students with low Attitude scores have high
English scores.
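
The three cases above can be reproduced with a small sketch. The data values are hypothetical, chosen only so that the rising and falling lines match the steps quoted in the text (3 to 3.5, and 5 to 4.5); the helper `slope_and_r` is likewise an illustrative name, returning the least-squares slope together with Pearson's r.

```python
from math import sqrt


def slope_and_r(x, y):
    """Return (least-squares slope, Pearson r) for paired scores x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sxx, sxy / sqrt(sxx * syy)


attitude = [1, 2, 3, 4]            # x axis: Attitude Towards English
rising = [3.0, 3.5, 4.0, 4.5]      # y goes up 0.5 for every step in x
falling = [5.0, 4.5, 4.0, 3.5]     # y goes down 0.5 for every step in x
unrelated = [3.0, 4.0, 4.0, 3.0]   # no systematic trend in y

for label, scores in [("positive", rising), ("negative", falling), ("zero", unrelated)]:
    slope, r = slope_and_r(attitude, scores)
    print(f"{label:8s} slope = {slope:+.2f}  r = {r:+.2f}")
```

Note that the slope and r always share the same sign, but they are different quantities: a perfectly straight line gives r of +1 or -1 whatever its steepness.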

Calculation of the Correlation Coefficient (r or rxy)

A researcher conducted a study to determine the relationship between verbal and
spatial ability. She was interested in finding out whether students who scored high on
verbal ability also scored high on spatial ability. She administered two 15-item tests
measuring verbal and spatial ability to a sample of 12 primary school students. The
results of the study are shown in the table below:

Student   Verbal Test (x)   Spatial Test (y)     x²      y²      xy
   1            13                 7            169      49      91
   2            10                 6            100      36      60
   3            12                 9            144      81     108
   4            14                10            196     100     140
   5            10                 7            100      49      70
   6            12                11            144     121     132
   7            13                12            169     144     156
   8             9                10             81     100      90
   9            14                13            196     169     182
  10            11                12            121     144     132
  11             8                 9             64      81      72
  12             9                 8             81      64      72

Σx = 135    Σy = 114    Σx² = 1565    Σy² = 1138    Σxy = 1305

a) Illustration of the Calculation of the Correlation Coefficient (r or rxy)
for the Data in the Table Above

The Pearson Correlation Coefficient (called the Pearson r) is the most commonly used
formula for computing the correlation between two variables. The formula measures
the strength and direction of a linear relationship between variable X and variable Y.
The sample correlation coefficient is denoted by r. The formula for the sample
correlation coefficient is:

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$

where, with n = 12,

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 1305 - \frac{(135)(114)}{12} = 22.50$$

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1565 - \frac{(135)^2}{12} = 46.25$$

$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 1138 - \frac{(114)^2}{12} = 55.00$$

Using the formula to obtain the correlation coefficient:

$$r = \frac{22.50}{\sqrt{(46.25)(55.00)}} = 0.446$$
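
The hand calculation can be checked with a short script. The sketch below is an illustration under the same computational formula (sums of squares and cross-products); it recomputes every column total directly from the raw verbal and spatial scores rather than from the tabulated squares, so it also serves as a check on the table's arithmetic. All variable names are illustrative.

```python
from math import sqrt

# Raw scores for the 12 students in the verbal/spatial table.
verbal = [13, 10, 12, 14, 10, 12, 13, 9, 14, 11, 8, 9]
spatial = [7, 6, 9, 10, 7, 11, 12, 10, 13, 12, 9, 8]
n = len(verbal)

# Column totals, recomputed from the raw scores.
sum_x = sum(verbal)
sum_y = sum(spatial)
sum_x2 = sum(x * x for x in verbal)
sum_y2 = sum(y * y for y in spatial)
sum_xy = sum(x * y for x, y in zip(verbal, spatial))

# Sums of squares and cross-products (computational formula).
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n

r = ss_xy / sqrt(ss_xx * ss_yy)

print(f"Σx={sum_x} Σy={sum_y} Σx²={sum_x2} Σy²={sum_y2} Σxy={sum_xy}")
print(f"SSxy={ss_xy:.2f} SSxx={ss_xx:.2f} SSyy={ss_yy:.2f} r={r:.3f}")
```

Working from the raw scores avoids carrying a squaring slip in any single row through to the final coefficient.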

To Obtain A Bivariate Pearson Product-Moment Correlation Using SPSS

A study was conducted to determine the relationship between reading ability and
performance in science. A reading ability test and a science test were administered to
200 lower secondary students. The Pearson product-moment correlation was used to
determine the significance of the relationship. The steps for using SPSS are shown
below:

SPSS Procedures:

1. Select the Analyze menu.
2. Click on Correlate and then Bivariate... to open the
   Bivariate Correlations dialogue box.
3. Select the variables you require (i.e. reading and science) and
   click on the arrow button to move the variables into the
   Variables: box.
4. Ensure that the Pearson correlation option has been
   selected.
5. In the Test of Significance box, select the One-tailed radio
   button.
6. Click on OK.

SPSS Output:

                                   Reading     Science
Reading    Pearson Correlation      1.000      0.630**
           Sig. (1-tailed)                      0.000
           N                          200         200
Science    Pearson Correlation     0.630**      1.000
           Sig. (1-tailed)          0.000
           N                          200         200

To interpret the correlation coefficient, you examine the coefficient and its associated
significance value (p). The output shows that the relationship between Reading and
Science scores is significant, with a correlation coefficient of r = 0.63 (p < .05).
Thus higher reading scores are associated with higher scores in science.

NULL HYPOTHESIS
The null hypothesis (H0) states that the correlation between X and Y in the
population is ρ = 0.0. What is the probability that the correlation obtained in the
sample came from a population where the parameter ρ = 0.0? The t-test for the
significance of a correlation coefficient is used to answer this question. Note that the
correlation between Reading and Science (r = 0.630) is significant at p < 0.05.
Hence, the null hypothesis is REJECTED, which supports the conclusion that the
two variables are positively related in the population.
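
The t-test mentioned above can be sketched as follows. This is an illustration rather than the SPSS procedure itself: it uses the standard statistic t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom, and the critical value 1.653 is an assumed approximation read from a standard t table (one-tailed, α = .05, df = 198). The function name `t_for_correlation` is hypothetical.

```python
from math import sqrt


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r, n = 0.630, 200            # values reported in the SPSS output
t_value = t_for_correlation(r, n)

# Assumption: approximate one-tailed critical t at alpha = .05 for df = 198,
# taken from a standard t table rather than computed here.
T_CRITICAL = 1.653
reject_null = t_value > T_CRITICAL

print(f"t({n - 2}) = {t_value:.2f}; reject H0: {reject_null}")
```

The computed t lies far above the critical value, which agrees with the significance level of 0.000 shown in the output.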

Coefficient of Determination

r = the correlation between X and Y = 0.630, and r² = the coefficient
of determination = (0.630)² = 0.3969.

Hence 39.7% of the variance in Y can be explained by X.

TO OBTAIN A SCATTERPLOT USING SPSS

SPSS Procedures:

1. Select the Graphs menu.
2. Click on Scatter... to open the Scatterplot dialogue box.
3. Ensure the Simple Scatterplot option is selected.
4. Click on the Define command pushbutton to open the Simple Scatterplot
   sub-dialogue box.
5. Select the first variable (i.e. science) and click on the arrow button to move the
   variable into the Y Axis: box.
6. Select the second variable (i.e. reading) and click on the arrow button to move the
   variable into the X Axis: box.
7. Click on OK.

SPSS Output

[Scatterplot: SCIENCE (y axis) plotted against READING (x axis)]

Figure 7.5 Scatterplot

As you can see from the scatterplot (Figure 7.5), there is a linear relationship between
reading and science scores. Given that the scores cluster uniformly around the
regression line, the assumption of homogeneity of variance has not been violated.

Causation And Correlation

Causation and correlation are two concepts that have been wrongly interpreted
by some researchers. The presence of a correlation between two variables does not
necessarily mean there exists a causal link between them. Say, for instance, that
there is a correlation (0.60) between "teachers' salary" and "academic performance of
students".

Does this imply that a well-paid teaching staff "causes" better academic
performance of students? Would academic performance improve if we
increased the pay of teachers? It is dangerous to conclude causation just because there
is a correlation or relationship between two variables. The correlation tells us nothing
by itself about whether "teachers' salary" causes "achievement".

Significance Of The Correlation Coefficient

We introduced Pearson correlation as a measure of the strength of a
relationship between two variables. But any relationship should be assessed for its
significance as well as its strength. The significance of the relationship is expressed in
probability levels: p (e.g., significant at p = .05). This tells how unlikely it is that a
given correlation coefficient, r, would occur if there were no relationship in the
population. It assumes that you have a sample of cases from a population. The question
is whether your observed statistic for the sample is likely to be observed given some
assumption about the corresponding population parameter. If your observed statistic
does not exactly match the population parameter, perhaps the difference is due to
sampling error.

To be useful, a correlation coefficient needs to be accompanied by a test of
statistical significance. It is also important for you to know about the sample size.
Generally, a strong correlation in a small sample may be statistically
non-significant, while a much weaker correlation in a large sample may be statistically
significant. For example, in a large sample, even low correlations (as low as 0.06) can
be statistically significant. Correlations of a similar size that are statistically
significant with large samples are not significant with smaller samples, because with
smaller samples the likelihood of sampling error is higher.
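
The dependence on sample size can be made concrete with a small sketch. Rearranging the t statistic for a correlation, t = r√(n − 2) / √(1 − r²), gives the smallest |r| that reaches significance: r = t / √(t² + n − 2). The function name `critical_r`, the sample sizes, and the two-tailed critical t values (2.228 for df = 10, 1.962 for df = 1098, both at α = .05 and read from standard tables) are assumptions made for illustration.

```python
from math import sqrt


def critical_r(t_critical: float, n: int) -> float:
    """Smallest |r| that is significant, given the critical t for df = n - 2."""
    df = n - 2
    return t_critical / sqrt(t_critical ** 2 + df)


# Assumption: two-tailed critical t values at alpha = .05 from standard tables.
small_sample = critical_r(2.228, 12)     # df = 10
large_sample = critical_r(1.962, 1100)   # df = 1098

print(f"n = 12:   |r| must be at least {small_sample:.3f}")
print(f"n = 1100: |r| must be at least {large_sample:.3f}")
```

With only 12 cases a correlation has to be substantial before it is significant, whereas with about 1,100 cases a correlation of 0.06 already passes the threshold.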

LEARNING ACTIVITY

A researcher conducted a study which aimed to
determine the relationship between self-efficacy and
academic performance in geography. A 20-item
self-efficacy scale and a 25-item geography test were
administered to a group of 12 students.
The following are the results of the study:

Self-Efficacy Scale    Geography Test
        15                   22
        13                   17
        14                   20
        12                   18
        16                   23
        12                   21
        11                   19
        17                   24
        15                   19
        13                   16

a) What is the correlation coefficient?

b) What is the mean for the self-efficacy scale and the mean for the
geography test?
c) Comment on the scatter plot.

SUMMARY

• The correlation coefficient, a concept from statistics, is a measure of the
  strength and direction of the relationship between two variables; its value
  ranges from -1.00 to +1.00.

• Pearson's product moment correlation coefficient, usually denoted by r, is a
  measure of the linear association between two variables.

• The null hypothesis (H0) states that the correlation between X and Y in the
  population is ρ = 0.0.

• The presence of a correlation between two variables does not necessarily mean
  there exists a causal link between them.

• The strength of a relationship is indicated by the size of the correlation
  coefficient: the larger the correlation, the stronger the relationship.

• The scatterplot is a graphical representation in which each case is plotted at
  the intersection of its value on the x-axis and its value on the y-axis.

• The coefficient of determination is the proportion of variance in Y that can be
  explained by X.

KEY WORDS:

• Correlation
• Correlation coefficient
• Pearson product moment
• Range of values
• Positive correlation
• Negative correlation
• Zero correlation
• Scatterplot
• Causation
• Coefficient of determination
