attachment 1

The document provides a comprehensive overview of correlation analysis, including types of correlation, scatter diagrams, covariance, correlation coefficients, probable error, and Spearman's rank correlation. It explains the mathematical formulas used to calculate these metrics and includes examples and exercises for practical application. Key points emphasize the interpretation of correlation values and the significance of the results.

Uploaded by

petersarikaz
Copyright
© © All Rights Reserved

CORRELATION ANALYSIS

Content
• Type of correlation
• Scatter diagram
• The covariance
• The correlation co-efficient
• Probable Error
• Spearman’s rank correlation

Formulae

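1. Covariance

$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})$$

2. Correlation co-efficient

a) $$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

b) $$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

c) With deviations $d_x = X - A$ and $d_y = Y - B$ from assumed values $A$ and $B$:

$$r = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{\left[n\sum d_x^2 - (\sum d_x)^2\right]\left[n\sum d_y^2 - (\sum d_y)^2\right]}}$$

d) When deviations are taken from actual mean ($x = X - \bar{X}$, $y = Y - \bar{Y}$):

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

3. Probable error

$$P.E.(r) = 0.6745\,\frac{1 - r^2}{\sqrt{n}}$$

4. Spearman’s rank correlation

a) $$\rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$

b) With repeated ranks:

$$\rho = 1 - \frac{6\left[\sum D^2 + \sum \frac{m(m^2 - 1)}{12}\right]}{n(n^2 - 1)}$$
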
Points to remember

1. If the value of $r$ is positive, there is a positive correlation between the variables; if it is negative, there is a negative correlation.
2. If $r = 0$, there is no correlation.
3. The value of $r$ is not affected by a change of scale or origin.
4. The range of $r$ is -1 to +1.

TYPES OF CORRELATION

Let $X$ and $Y$ be the two variables under study. Three types of relationship can exist between
these two variables: (i) they are directly related, i.e. when one increases (decreases) the other
increases (decreases); (ii) they are inversely related, i.e. when one increases (decreases) the other
decreases (increases); (iii) there is no relation. Statistically, these three situations are described as
(i) positively correlated, (ii) negatively correlated and (iii) no correlation.
SCATTER DIAGRAM
It is the simplest diagrammatic representation of bivariate data. Let $X$ and $Y$ be the two
variables under study, with $n$ observations each. If we plot the points $(x_i, y_i)$, $i = 1, \dots, n$, on a graph sheet,
the resulting diagram gives us a rough idea of the correlation between the two variables $X$ and $Y$.

[Figure 1: scatter diagram, Y-values plotted against X]

Fig. 1 shows a positive correlation between the variables $X$ and $Y$.

[Figure 2: scatter diagram, Y-values plotted against X]

Fig. 2 shows a negative correlation between the variables $X$ and $Y$.

[Figure 3: scatter diagram, Y-values plotted against X]

In Fig. 3 we are not able to recognize a pattern, i.e., there is no correlation between $X$ and $Y$.

Thus a scatter diagram helps us to get an idea of the correlation between the two variables ($X$ and $Y$).

THE COVARIANCE

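The covariance between $X$ and $Y$ is

$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})$$

Covariance states only the nature (direction) of the relationship between the variables. To know
how strongly the variables are associated with each other, we use the correlation co-efficient.
The correlation co-efficient is a measure of the degree or extent of linear relationship between two variables
$X$ and $Y$. The correlation co-efficient was defined by Karl Pearson in 1890. The correlation co-efficient for
$X$ and $Y$ is given by:

$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

Short Cut Method

Let $d_x = X - A$ and $d_y = Y - B$, where $A$ and $B$ are assumed values; then

$$r = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{\left[n\sum d_x^2 - (\sum d_x)^2\right]\left[n\sum d_y^2 - (\sum d_y)^2\right]}}$$

If the deviations are taken from the actual means of $X$ and $Y$, i.e. $x = X - \bar{X}$ and $y = Y - \bar{Y}$, then

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

The range of the correlation co-efficient is $[-1, 1]$, i.e. $-1 \le r \le 1$.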


Example 1

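You are given the following data relating to aptitude scores and productivity index.
Aptitude score (X)     :  9  18  18  20  20  23
Productivity index (Y) : 33  23  33  42  29  32
Find the co-efficient of correlation between aptitude scores and productivity index.

Solution

X      Y      x = X − 18   y = Y − 32   xy    x²    y²
9      33     -9           1            -9    81    1
18     23     0            -9           0     0     81
18     33     0            1            0     0     1
20     42     2            10           20    4     100
20     29     2            -3           -6    4     9
23     32     5            0            0     25    0
Total  108    192          0            0     5     114   192

Here $n = 6$, $\bar{X} = 108/6 = 18$ and $\bar{Y} = 192/6 = 32$, so $\sum xy = 5$, $\sum x^2 = 114$ and $\sum y^2 = 192$.

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{5}{\sqrt{114 \times 192}} \approx 0.034$$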
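
The arithmetic of this example can be cross-checked with a short script. This is only a sketch: it assumes the deviation-from-actual-mean form $r = \sum xy / \sqrt{\sum x^2 \sum y^2}$, and the function name `pearson_r` is chosen here for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r using deviations from the actual means (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # x = X - mean of X
    dy = [y - mean_y for y in ys]          # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)

aptitude = [9, 18, 18, 20, 20, 23]
productivity = [33, 23, 33, 42, 29, 32]
r = pearson_r(aptitude, productivity)
print(round(r, 4))
```

A value this close to zero suggests almost no linear relationship between aptitude score and productivity index in this sample.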

Example 2

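Find the correlation co-efficient for the following data

X : 4 6 8 10 12
Y : 3 6 9 12 15

Solution

Since $r$ is not affected by a change of origin and scale, take $u = (X - 8)/2$ and $v = (Y - 9)/3$.

X      Y      u     v     uv    u²    v²
4      3      -2    -2    4     4     4
6      6      -1    -1    1     1     1
8      9      0     0     0     0     0
10     12     1     1     1     1     1
12     15     2     2     4     4     4

Total  40     45    0     0     10    10    10

Here

$$r = \frac{\sum uv}{\sqrt{\sum u^2 \sum v^2}} = \frac{10}{\sqrt{10 \times 10}} = 1$$

which shows that there is a perfect positive linear correlation between $X$ and $Y$.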
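
The claim that the correlation co-efficient is unaffected by a change of origin and scale can be illustrated on this example's data. The sketch below assumes the raw-sum form of Pearson's formula; the coded variables `U = (X − 8)/2` and `V = (Y − 9)/3` and the helper name `pearson_raw` are illustrative choices, not taken from the source.

```python
from math import sqrt

def pearson_raw(xs, ys):
    """Pearson r computed from raw sums, without computing the means first."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

X = [4, 6, 8, 10, 12]
Y = [3, 6, 9, 12, 15]
U = [(x - 8) / 2 for x in X]   # shift the origin to 8, rescale by 2
V = [(y - 9) / 3 for y in Y]   # shift the origin to 9, rescale by 3

r_raw = pearson_raw(X, Y)
r_coded = pearson_raw(U, V)
print(r_raw, r_coded)
```

Both calls give the same value, which is why the coded columns can be used in hand calculation.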

PROBABLE ERROR AND ITS SIGNIFICANCE

The probable error of the correlation co-efficient is a measure of the reliability of $r$. It is given by the formula

$$P.E.(r) = 0.6745\,\frac{1 - r^2}{\sqrt{n}}$$

If $|r| < P.E.(r)$, the correlation co-efficient is definitely not significant. If $|r| > 6\,P.E.(r)$, the correlation co-efficient is significant.
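
As a rough illustration of the rule, the sketch below assumes the usual textbook form $P.E.(r) = 0.6745(1 - r^2)/\sqrt{n}$ with the thresholds "less than P.E." and "more than 6 P.E.". The function names and the "inconclusive" label for the in-between case are assumptions made here; the input pair $r = 0.9$, $n = 25$ is hypothetical.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of r for n pairs (assumed textbook form)."""
    return 0.6745 * (1 - r * r) / sqrt(n)

def significance(r, n):
    """Classify r against its probable error (labels are illustrative)."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"

# r of about 0.0338 with n = 6 pairs, as in the aptitude/productivity data
print(probable_error(0.0338, 6), significance(0.0338, 6))
# hypothetical strong correlation from a larger sample
print(probable_error(0.9, 25), significance(0.9, 25))
```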

Example 3

If and is significant.

EXERCISES
1. Find Karl Pearson’s co-efficient of correlation of the following data.
Age of husband : 20 22 23 25 25 28 29 30 30 34
Age of wife : 18 20 22 24 21 26 26 25 27 29

2. Calculate Karl Pearson’s co-efficient of correlation between percentage of pass and failure from the
following data.
No. of students : 800 600 900 700 500 400
No. passed : 480 300 450 560 450 300

3. Calculate Karl Pearson’s co-efficient of correlation for the following data.
Price (in shs) : 21 22 23 24 25 26 27 28 29 30
Demand (in 000 units) : 18 19 19 16 17 16 16 15 13 11

4. Calculate Karl Pearson’s co-efficient of correlation

X : 1 2 3 4 5 6 7
Y : 3 5 6 8 10 11 13

5. Calculate co-efficient of correlation using Karl Pearson’s method.


Advertisement (‘000’ Shs) : 39 65 62 90 82 75 25 98 36 78
Sales (in ‘000’ Shs) : 47 53 58 86 62 68 60 91 51 84

6. Given below are the monthly incomes and their net saving of a sample of 10 supervisory staff
belonging to a firm. Calculate the correlation co-efficient.
Employee No.: 1 2 3 4 5 6 7 8 9 10
Monthly income (x): 780 360 980 250 750 82 900 620 650 390
Net saving (y): 84 51 91 60 68 62 86 58 53 47

7. From the following data calculate the correlation co-efficient.

8. From the following details find the value of


and .

9. From the following data, compute the co-efficient of correlation between $X$ and $Y$.
                                        X series   Y series
No. of items:                           15         15
Arithmetic mean:                        25         18
Sum of squares of deviation from mean:  136        138

10. Find the correlation co-efficient.

11. Find the probable error.

(i)
(ii)
Comment on the value of .

12. How do you interpret $r$ when it is $1$, $-1$ or $0$?

(a) When $r = 1$, $Y$ is directly proportional to $X$: as $X$ increases, $Y$ also increases. There is a
perfect positive linear association between $X$ and $Y$.
(b) When $r = -1$, $Y$ is inversely related to $X$: as $X$ increases (decreases), $Y$
decreases (increases). This indicates a perfect negative association.
(c) When $r = 0$, there is no linear association between the two variables.
13. Calculate the correlation between age and playing habits of students.
Age (years): 15 16 17 18 19 20
No. of students: 800 600 900 700 500 400
Regular players: 480 300 450 560 450 300

14. A company wanted to assess the impact of R & D expenditure ($X$) on annual profit ($Y$). The following
table presents data for the past 8 years.
X : 9 7 5 10 4 5 3 2
Y : 45 42 41 60 30 34 25 20
Compute the correlation co-efficient. Comment on the value.

15. From the following data, compute the co-efficient of correlation between $X$ and $Y$.
                                        X series   Y series
No. of items:                           15         15
Arithmetic mean:                        25         18
Sum of squares of deviation from mean:  136        138

16. Find the correlation co-efficient.


(iii)
(iv)
17. Find the probable error.
(ii)
(iii)
Comment on the value of .

SPEARMAN’S RANK CORRELATION.

In the correlation co-efficient we used the actual values to measure the degree of relationship; here we use
the ranks of the given data. Rank correlation is useful when we study the relationship between qualitative
characteristics like beauty, intelligence etc. Ranks are obtained by arranging the data in order of
merit. Spearman’s rank correlation formula is given by

$$\rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$

where $D$ is the difference between the ranks of the two variables and $n$ is the number of pairs.

Example 1

Calculate rank correlation from the following.


Marks in statistics : 15 20 25 18 40 60 80
Marks in Accountancy : 40 30 50 36 20 10 60

Solution

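Statistics   Accountancy   Rank R1   Rank R2   D = R1 − R2   D²
15           40            7         3         4             16
20           30            5         5         0             0
25           50            4         2         2             4
18           36            6         4         2             4
40           20            3         6         -3            9
60           10            2         7         -5            25
80           60            1         1         0             0

Total                                                        58

$$\rho = 1 - \frac{6 \times 58}{7(7^2 - 1)} = 1 - \frac{348}{336} \approx -0.036$$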
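
The ranking and the sum of squared rank differences in this example can be reproduced with a few lines of code. This is a minimal sketch that assumes rank 1 goes to the highest mark and that no value is repeated; `ranks_desc` and `spearman_rho` are names introduced here for illustration.

```python
def ranks_desc(values):
    """Rank 1 for the largest value; assumes there are no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Return (rho, sum of D^2) using 1 - 6*sum(D^2) / (n*(n^2 - 1))."""
    n = len(xs)
    rx, ry = ranks_desc(xs), ranks_desc(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2

statistics_marks = [15, 20, 25, 18, 40, 60, 80]
accountancy_marks = [40, 30, 50, 36, 20, 10, 60]
rho, sum_d2 = spearman_rho(statistics_marks, accountancy_marks)
print(sum_d2, round(rho, 3))
```

The slightly negative value indicates practically no agreement between the two rankings.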

REPEATED RANKS

If more than one item in a series has the same value, a common rank is given to the
repeated items. This common rank is the average of the ranks which these items would have received if
they were slightly different from each other, and the next item gets the rank following the ranks already used.
As a result there is a small adjustment in the rank correlation formula.

The adjustment factor $\frac{m(m^2 - 1)}{12}$ (where $m$ is the number of times an item is repeated) is added to $\sum D^2$ for each repeated
value:

$$\rho = 1 - \frac{6\left[\sum D^2 + \sum \frac{m(m^2 - 1)}{12}\right]}{n(n^2 - 1)}$$

Example 2
Calculate rank correlation from the following.
Marks in statistics : 15 20 28 12 40 60 20 80
Marks in Accountancy : 40 30 50 30 20 10 30 60

Solution

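Statistics (X)   Accountancy (Y)   Rank R1   Rank R2   D = R1 − R2   D²
15               40                7         3         4             16
20               30                5.5       5         0.5           0.25
28               50                4         2         2             4
12               30                8         5         3             9
40               20                3         7         -4            16
60               10                2         8         -6            36
20               30                5.5       5         0.5           0.25
80               60                1         1         0             0

Total                                                                81.5

In the X series, 20 is repeated 2 times; the actual ranks would be 5 and 6, so
the average is 5.5, and the mark 15 gets the seventh position.
Here $m = 2$, so $\frac{m(m^2 - 1)}{12} = \frac{2(4 - 1)}{12} = 0.5$.

In the Y series, 30 is repeated 3 times; it shares the 4th, 5th and 6th positions, so 30 gets the average rank 5.
Here $m = 3$, so $\frac{m(m^2 - 1)}{12} = \frac{3(9 - 1)}{12} = 2$.

$$\rho = 1 - \frac{6(81.5 + 0.5 + 2)}{8(8^2 - 1)} = 1 - \frac{504}{504} = 0$$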
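
The handling of repeated ranks can be sketched in code as follows. The sketch assumes the usual correction, in which $m(m^2 - 1)/12$ is added to $\sum D^2$ for every value repeated $m$ times; the helper names are introduced here for illustration only.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest; tied values share the average of their positions."""
    order = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1           # first position occupied by v
        count = order.count(v)               # number of times v is repeated
        ranks[v] = first + (count - 1) / 2   # mean of first .. first+count-1
    return [ranks[v] for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_rho_ties(xs, ys):
    """Return (rho, sum of D^2) with the repeated-rank adjustment applied."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n * n - 1)), sum_d2

statistics_marks = [15, 20, 28, 12, 40, 60, 20, 80]
accountancy_marks = [40, 30, 50, 30, 20, 10, 30, 60]
rho, sum_d2 = spearman_rho_ties(statistics_marks, accountancy_marks)
print(sum_d2, rho)
```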

EXERCISES
1. Ten competitors in a beauty contest are ranked by three judges in the following order.
First judge: 1 5 4 8 9 6 10 7 3 2
Second judge: 4 8 7 6 5 9 10 3 2 1
Third judge: 6 7 8 1 5 10 9 2 3 4

2. Find rank correlation co-efficient for the following data and give your comments.
Marks in accounts : 85 56 89 58 59 67 74 78
Marks in maths : 38 69 56 58 63 78 87 77

3. Find the rank correlation co-efficient and comment


Marks in mathematics : 60 70 50 40 80 90 85 96
Marks in statistics : 64 58 72 44 86 80 95 81

4. Find rank correlation co-efficient for the following data and give your comments.
Marks in statistics : 84 56 89 58 59 67 74 78
Marks in law : 38 69 56 58 63 78 87 77

5. A sample of five fathers and their eldest sons gave the following data about their weight in kgs.
Father : 65 60 67 63 74
Son : 49 45 57 40 60
Obtain the rank correlation co-efficient and comment on the results.

6. Calculate the rank correlation.

X : 65 66 67 68 69 70 71 72
Y : 67 68 69 72 78 80 82 85
7. Height of fathers and sons are given in inches.
Height of father : 65 66 67 68 69 71 73
Height of son : 67 68 64 68 72 70 69

8. Compute the rank correlation between I.Q’s and marks scored in examination.
Personal: A B C D E F
I.Q: 100 110 140 160 120 130
Exam mark: 70 80 81 78 72 75

9. Ten competitors in a beauty contest are ranked by three judges in the following order.

Judge I: 1 6 5 10 3 2 4 9 7 8
Judge II: 3 5 8 4 7 10 2 1 6 9
Judge III: 6 4 9 8 1 2 3 10 5 7
Use the rank correlation co-efficient to determine which pair of judges has the nearest approach to
common tastes in beauty.

10. The table below shows the respective I.Q’s of 7 father and their eldest sons.
Father I.Q : 90 98 103 104 105 110 114
Sons I.Q : 102 95 107 114 100 98 113
Calculate the rank correlation for the above statistics.

11. Answer the following


1. and . Find
2. and. Find
3. Find
4. and Find

REGRESSION ANALYSIS INTRODUCTION

In the previous chapter we discussed the nature of the relationship between the variables (i.e.
correlation) and the extent to which they are correlated. Correlation analysis gives only an idea of
the association between the variables; we would also like to know the functional relationship
between the two variables. Regression methods are meant to
determine these functional relationships.
REGRESSION
Regression analysis is a mathematical measure of the average relationship between two or more variables
in terms of the original units of the data.
In regression analysis there are two types of variable, namely the dependent and the independent variable. A
dependent variable is the one whose value is to be predicted, and an independent variable is the one which
influences the value of the dependent variable.
LINE OF REGRESSION
In a scatter diagram (discussed in the previous topic), we find that the points cluster around some curve, called
the curve of regression. If the curve is a straight line, it is called the line of regression.
“The line of regression is the line which gives the best estimate of the value of one variable for any specific
value of the other variable”.
Let us consider the relationship between two variables $X$ and $Y$. Here there are two lines of regression: (i) $Y$
on $X$ and (ii) $X$ on $Y$.
(i) Line of regression $Y$ on $X$: here $Y$ is the dependent variable and $X$ is the independent variable, and
this line gives the best estimate of the value of $Y$ for any specified value of $X$. The regression
equation for $Y$ on $X$ is given by

$$Y = a + bX$$

(ii) Line of regression $X$ on $Y$: this line is used to estimate the value of $X$ for any specified value
of $Y$. The regression equation for $X$ on $Y$ is given by

$$X = a + bY$$

In the above equations $a$ and $b$ are constants (with different values for each line) and are obtained by the least squares principle.

LEAST SQUARES METHOD


Legendre’s principle of least squares is “To minimize the sum of squares of the deviations of the actual
values (of $Y$) from its estimated values as given by the line of best fit.”

The regression equation of $Y$ on $X$ is $Y = a + bX$.
Then, according to the least squares principle, we have to minimize

$$S = \sum \big(Y - (a + bX)\big)^2$$

where $Y$ is the actual value and $a + bX$ is the estimated value.
$S$ is minimized by partially differentiating it with respect to $a$ and $b$ and equating the derivatives to
zero. In this way we get two equations:

$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$

These equations are known as the normal equations. Solving these equations
simultaneously, we can obtain the values of the constants $a$ and $b$.

Regression equation of $X$ on $Y$:
The regression equation of $X$ on $Y$ is $X = a + bY$.
Here we have to minimize

$$S = \sum \big(X - (a + bY)\big)^2$$

Similarly, here we get

$$\sum X = na + b\sum Y$$
$$\sum XY = a\sum Y + b\sum Y^2$$

as the normal equations, and the values of $a$ and $b$ can be obtained from them.
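
For readers who want the intermediate step, the normal equations for the line of $Y$ on $X$ can be derived as sketched below. The derivation assumes the line is written $Y = a + bX$ and that $S$ denotes the sum of squared deviations; the symbol $S$ is introduced here for convenience.

$$S = \sum (Y - a - bX)^2$$

$$\frac{\partial S}{\partial a} = -2\sum (Y - a - bX) = 0 \;\Rightarrow\; \sum Y = na + b\sum X$$

$$\frac{\partial S}{\partial b} = -2\sum X(Y - a - bX) = 0 \;\Rightarrow\; \sum XY = a\sum X + b\sum X^2$$

Solving the two equations simultaneously gives

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}, \qquad a = \bar{Y} - b\bar{X}$$

Interchanging the roles of $X$ and $Y$ gives the corresponding equations for the line of $X$ on $Y$.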

Example
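Find the regression equation of $Y$ on $X$ for the following data.
X : 2 4 6 8
Y : 5 15 20 25
Solution

X      Y      X²    Y²     XY
2      5      4     25     10
4      15     16    225    60
6      20     36    400    120
8      25     64    625    200
Total  20     65    120    1275   390

The normal equations for $Y$ on $X$ are

65 = 4a + 20b      (1)
390 = 20a + 120b   (2)
(1) × 5: 325 = 20a + 100b   (3)
(3) - (2): -65 = -20b, so b = 3.25

Substituting the value of $b$ in (1) above we get 65 = 4a + 65, so a = 0.
Hence the regression equation of $Y$ on $X$ is $Y = 3.25X$.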
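
The elimination above can be cross-checked numerically. The sketch below assumes the line is written $Y = a + bX$ and solves the two normal equations in closed form; `fit_y_on_x` is a name chosen here for illustration.

```python
def fit_y_on_x(xs, ys):
    """Solve the normal equations for Y = a + b*X and return (a, b)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:  sy = n*a + b*sx   and   sxy = a*sx + b*sxx
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

a, b = fit_y_on_x([2, 4, 6, 8], [5, 15, 20, 25])
print(a, b)
```

With these data the intercept is zero, so the fitted line passes through the origin.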

REGRESSION CO-EFFICIENT
In a regression equation $Y = a + bX$, $a$ is the intercept which the line cuts on the $Y$ axis and $b$ is the slope of the
line, called the regression co-efficient. The regression co-efficient is a measure of the change in the
dependent variable corresponding to a unit change in the independent variable.

The regression co-efficient of $Y$ on $X$ is denoted by $b_{yx}$ and that of $X$ on $Y$ is denoted by $b_{xy}$.


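RELATIONSHIP BETWEEN REGRESSION CO-EFFICIENT ‘b’ AND CORRELATION CO-EFFICIENT ‘r’

By definition we know that

$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

By solving the normal equations we get

$$b = \frac{\mathrm{Cov}(X, Y)}{\sigma_X^2}$$

Multiplying and dividing by $\sigma_Y$ we get

$$b_{yx} = r\,\frac{\sigma_Y}{\sigma_X}$$ for $Y$ on $X$

Similarly for $X$ on $Y$
we get

$$b_{xy} = r\,\frac{\sigma_X}{\sigma_Y}$$ for $X$ on $Y$

Now the two regression equations are
$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$ for $Y$ on $X$
$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$ for $X$ on $Y$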
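
When only the means, standard deviations and the correlation co-efficient are known, both regression lines follow directly from these relations. The sketch below assumes $b_{yx} = r\sigma_Y/\sigma_X$ and $b_{xy} = r\sigma_X/\sigma_Y$; the function name and the input values (means 10 and 15, standard deviations 2 and 4, $r = 0.5$) are hypothetical and serve only as an illustration.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((a_y, b_yx), (a_x, b_xy)) for Y = a_y + b_yx*X and X = a_x + b_xy*Y."""
    b_yx = r * sd_y / sd_x            # slope of Y on X
    b_xy = r * sd_x / sd_y            # slope of X on Y
    a_y = mean_y - b_yx * mean_x      # both lines pass through the point of means
    a_x = mean_x - b_xy * mean_y
    return (a_y, b_yx), (a_x, b_xy)

# Hypothetical summary statistics, not taken from the text
(a_y, b_yx), (a_x, b_xy) = regression_lines(10, 15, 2, 4, 0.5)
print(f"Y = {a_y} + {b_yx} X")
print(f"X = {a_x} + {b_xy} Y")
```

The product of the two slopes equals $r^2$, which gives a quick consistency check on hand calculations.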

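Computing of regression co-efficient

The regression co-efficient for $Y$ on $X$ is given by

$$b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$

Let $d_x = X - A$ and $d_y = Y - B$

be the deviations of $X$ and $Y$ from assumed values $A$ and $B$; then

$$b_{yx} = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{n\sum d_x^2 - (\sum d_x)^2}$$

If the deviations are taken from the actual means ($x = X - \bar{X}$, $y = Y - \bar{Y}$) then

$$b_{yx} = \frac{\sum xy}{\sum x^2}$$

Similarly for $X$ on $Y$ we have

$$b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2}$$

Let $d_x = X - A$ and $d_y = Y - B$

be the deviations of $X$ and $Y$ from assumed values $A$ and $B$; then

$$b_{xy} = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{n\sum d_y^2 - (\sum d_y)^2}$$

If the deviations are taken from the actual means then

$$b_{xy} = \frac{\sum xy}{\sum y^2}$$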

Example 1
You are given the following

        X     Y
Mean    10    15
S.D     2     4

Find the two regression equations

Solution
Regression equation of $Y$ on $X$

Regression equation of $X$ on $Y$

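Example
Find the regression equations for the following
X : 2 4 6 8
Y : 10 20 25 30

Solution

Take $x = X - 5$ (deviation from the actual mean of $X$) and $y = Y - 15$ (deviation from an assumed value).

X      Y      x = X − 5   y = Y − 15   x²    y²     xy
2      10     -3          -5           9     25     15
4      20     -1          5            1     25     -5
6      25     1           10           1     100    10
8      30     3           15           9     225    45

Total  20     85          0            25    20     375    65

Here $n = 4$, $\bar{X} = 20/4 = 5$ and $\bar{Y} = 85/4 = 21.25$.

Regression equation of $Y$ on $X$.

$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{4 \times 65 - 0 \times 25}{4 \times 20 - 0^2} = \frac{260}{80} = 3.25$$

$Y - 21.25 = 3.25(X - 5)$, i.e. $Y = 3.25X + 5$.

Regression equation of $X$ on $Y$.

$$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \frac{4 \times 65 - 0 \times 25}{4 \times 375 - 25^2} = \frac{260}{875} \approx 0.297$$

$X - 5 = 0.297(Y - 21.25)$, i.e. $X \approx 0.297Y - 1.31$.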

PROPERTIES OF REGRESSION CO-EFFICIENT

1. The range of the regression co-efficient is $-\infty$ to $+\infty$.


2. The correlation co-efficient between two variables $X$ and $Y$ is given by $r = \pm\sqrt{b_{yx}\, b_{xy}}$, taking the sign of the regression co-efficients.

3.
The same property holds for the regression co-efficient of $X$ on $Y$.

4. If one of the regression co-efficients is greater than unity, the other one is less than unity,
i.e. if $b_{yx} > 1$ then $b_{xy} < 1$.
5. If the variables $X$ and $Y$ are independent, the regression co-efficients are zero.
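
Two of these properties can be checked numerically on the data X = 2, 4, 6, 8 and Y = 10, 20, 25, 30 used in the regression example. The sketch assumes deviations from the actual means and the standard relation $r = \pm\sqrt{b_{yx} b_{xy}}$; the helper name `slopes_and_r` is illustrative.

```python
from math import copysign, sqrt

def slopes_and_r(xs, ys):
    """Return (b_yx, b_xy, r) computed from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / syy, sxy / sqrt(sxx * syy)

b_yx, b_xy, r = slopes_and_r([2, 4, 6, 8], [10, 20, 25, 30])
# r is the geometric mean of the two slopes, carrying their common sign
r_from_slopes = copysign(sqrt(b_yx * b_xy), b_yx)
print(b_yx, round(b_xy, 4), round(r, 4), round(r_from_slopes, 4))
```

Here one slope exceeds unity and the other falls below it, in line with property 4.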

DISTINCTION BETWEEN CORRELATION AND REGRESSION ANALYSIS
Having discussed these two analyses separately, let us now distinguish between them.

1. Correlation: It studies the relationship between two variables.
   Regression: It estimates the functional relationship between the variables.
2. Correlation: Correlation need not imply a cause and effect relationship between the variables under study.
   Regression: It studies the cause and effect relationship between the variables.
3. Correlation: Correlation analysis is confined only to the study of linear relationships.
   Regression: Regression analysis deals with both linear and non-linear relationships of the variables under study.
4. Correlation: It measures the degree of covariance.
   Regression: It studies the nature of covariance.
5. Correlation: The relationship may be purely due to chance and may not have practical relevance.
   Regression: There is a perfect relationship and it has practical relevance.
6. Correlation: The correlation co-efficient between $X$ and $Y$ is symmetric, i.e. $r_{xy} = r_{yx}$.
   Regression: The regression co-efficient is not symmetric, i.e. $b_{xy} \neq b_{yx}$.
7. Correlation: The correlation co-efficient is a relative measure and its range is -1 to 1.
   Regression: The regression co-efficient is an absolute measure and its range is $-\infty$ to $+\infty$.

Exercise

1. Obtain the regression of $Y$ on $X$ and $X$ on $Y$ from the following table and estimate the blood pressure
when the age is 50.

Age (X) Blood pressure (Y)

56 147
42 125
72 160
36 118
63 149
47 128
55 150
49 145
38 115
42 140
68 152
60 155

2. A company wanted to assess the impact of R & D expenditure ($X$) on annual profit ($Y$). The following
table presents data for the past 8 years.

X Y
9 45
7 42
5 41
10 60
4 30
5 34
3 25
2 20

Find the regression equation of $Y$ on $X$. Estimate the profit for an R & D expenditure of Shs 8 thousand.

3. Calculate the two lines of regression for the following data.


X : 1 2 3 4 5
Y : 3 6 9 12 15

4. You are given the following information about advertisement and sales.
Adv. Exp. Sales
(Shs million) (Shs million)
Mean 20 120
S.D 5 25
Correlation co-efficient = 0.8
1. Calculate the two regression equations.
2. Find the likely sales when advertisement expenditure is Shs 25 million

5. Given No. of pairs = 12

        X        Y
A. M    74.50    125.50
S.D     13.07    15.85
Summation of products of corresponding deviations from respective means = 2176. Calculate the
co-efficient of correlation between $X$ and $Y$ and find the regression equations.

6. From the following details estimate the value of A when B =55.


Mean of A = 39.5, mean of B = 47.5
Std. Dev. of A = 10.8, Std. Dev. of B = 16.8
Co-efficient of correlation between A and B =0.42.
7. You are given the following data.

        X     Y
Mean    20    10
S.D     3     4

Estimate the value of when

8. Find the two regression equations.

        X     Y
Mean    25    22
S.D     4     5

9. If and , find .

10. From the data below, find the two regression equation.

25 43

28 46

35 49

32 41

31 36

36 32

29 31

38 30

34 33

32 39

Compute the co-efficient of correlation between $X$ and $Y$.

11. Calculate the regression equation for the following data and also compute Karl Pearson’s coefficient
of correlation:
No. of students : 800 600 900 700 500 400
No. of Passed : 480 300 450 560 450 310

12. To study the effect of rain on yield of wheat the following results were obtained.
Mean S.D
Yield : 800 12
Rainfall : 50 2

Estimate the yield when the rainfall is 80 inches.

13. Find the two regression equations, regression co-efficient of correlation from the following figures.
