attachment 1
Content
• Type of correlation
• Scatter diagram
• The covariance
• The correlation co-efficient
• Probable Error
• Spearman’s rank correlation
Formulae
1. Covariance: $\mathrm{Cov}(X, Y) = \frac{1}{n} \sum (X - \bar{X})(Y - \bar{Y})$
2. Correlation co-efficient
a)
b)
c)
d) When deviations are taken from actual mean: $r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$, where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.
3. Probable error
a) $P.E.(r) = 0.6745 \, \frac{1 - r^2}{\sqrt{n}}$
b)
Point to remember
1. If the value of $r$ is positive, there is a positive correlation between the variables; if it is negative,
there is a negative correlation.
2. If $r = 0$, there is no correlation.
3. The value of $r$ is not affected by a change of scale or origin.
4. The range of $r$ is -1 to +1.
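Points 3 and 4 can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function name `pearson_r` and the sample data are illustrative assumptions, and the scale changes used are positive (a negative scale factor would flip the sign of $r$).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation co-efficient using deviations from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Illustrative data (assumed, not taken from the notes)
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

r_original = pearson_r(xs, ys)

# Change of origin and (positive) scale on both variables
us = [(x - 3) / 2 for x in xs]
vs = [5 * y + 10 for y in ys]
r_transformed = pearson_r(us, vs)

# Both values agree up to rounding and lie between -1 and +1
print(r_original, r_transformed)
```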
TYPES OF CORRELATION
Let $X$ and $Y$ be the two variables under study. Three types of relationship can exist between
these two variables: (i) they move in the same direction, i.e. when one increases (decreases) the other
increases (decreases); (ii) they move in opposite directions, i.e. when one increases (decreases) the other
decreases (increases); (iii) there is no relation. Statistically, these three situations are described as (i)
positively correlated, (ii) negatively correlated and (iii) uncorrelated.
SCATTER DIAGRAM
It is the simplest diagrammatic representation of bivariate data. Let $X$ and $Y$ be the two
variables under study with $n$ observations each. If we plot the points $(x_i, y_i)$, $i = 1, \dots, n$, on a graph sheet,
the resulting diagram gives us a rough idea of the correlation between the two variables $X$ and $Y$.
Figure 1 (scatter diagram of the Y-values)
Fig 1 shows a positive correlation between the variables $X$ and $Y$.
Figure 2 (scatter diagram of the Y-values)
Fig 2 shows a negative correlation between the variables $X$ and $Y$.
Figure 3 (scatter diagram of the Y-values)
In Fig 3 we are not able to recognize a pattern, i.e., there is no correlation between $X$ and $Y$.
Thus a scatter diagram helps us to get an idea of the correlation between the two variables ($X$ and $Y$).
THE COVARIANCE
Covariance states only the nature of the relationship between the variables. To know
how strongly these variables are associated with each other, we use the correlation co-efficient.
The correlation co-efficient is a measure of the degree or extent of linear relationship between two variables
$X$ and $Y$. The correlation co-efficient was defined by Karl Pearson in 1890. The correlation co-efficient for
$X$ and $Y$ is given by:
$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
Let then
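If the deviations are taken from the actual means of $X$ and $Y$, i.e. $x = X - \bar{X}$ and $y = Y - \bar{Y}$, then
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$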
You are given the following data relating to aptitude scores and productivity index.
Aptitude score (X) : 9 18 18 20 20 23
Productivity index (Y) : 33 23 33 42 29 32
Find the co-efficient of correlation between aptitude scores and productivity index.
Solution
$X$ $Y$ $x = X - 18$ $y = Y - 32$ $xy$ $x^2$ $y^2$
9 33 -9 1 -9 81 1
18 23 0 -9 0 0 81
18 33 0 1 0 0 1
20 42 2 10 20 4 100
20 29 2 -3 -6 4 9
23 32 5 0 0 25 0
Total 108 192 0 0 5 114 192
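Here $n = 6$, $\bar{X} = 108/6 = 18$ and $\bar{Y} = 192/6 = 32$. With deviations taken from these means,
$\sum xy = 5$, $\sum x^2 = 114$ and $\sum y^2 = 192$, so
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{5}{\sqrt{114 \times 192}} \approx 0.034$$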
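The column totals and the resulting co-efficient for the aptitude and productivity data can be reproduced with a short Python sketch. The variable names are illustrative; the formula used is the deviations-from-actual-mean form.

```python
import math

X = [9, 18, 18, 20, 20, 23]    # aptitude scores
Y = [33, 23, 33, 42, 29, 32]   # productivity index

n = len(X)
mean_x = sum(X) / n            # 18.0
mean_y = sum(Y) / n            # 32.0

# Deviations from the actual means
x = [v - mean_x for v in X]
y = [v - mean_y for v in Y]

sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(sum_xy, sum_x2, sum_y2, round(r, 4))
```

The value of $r$ is close to zero, so these six observations show almost no linear relationship between the two variables.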
Example 2
Solution
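$X$ $Y$ $u = (X - 8)/2$ $v = (Y - 9)/3$ $uv$ $u^2$ $v^2$
4 3 -2 -2 4 4 4
6 6 -1 -1 1 1 1
8 9 0 0 0 0 0
10 12 1 1 1 1 1
12 15 2 2 4 4 4
Total 40 45 0 0 10 10 10
Here
$$r = \frac{\sum uv}{\sqrt{\sum u^2 \sum v^2}} = \frac{10}{\sqrt{10 \times 10}} = 1$$
which shows that there is a perfect positive linear correlation between $X$ and $Y$.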
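PROBABLE ERROR
The probable error of the correlation co-efficient is a measure of the reliability of $r$. It is given by the formula
$$P.E.(r) = 0.6745 \, \frac{1 - r^2}{\sqrt{n}}$$
where $n$ is the number of pairs of observations.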
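As a hedged illustration, the probable error $0.6745 (1 - r^2)/\sqrt{n}$ can be computed as below. The function name, the values $r = 0.6$ and $n = 64$, and the "six times the probable error" significance rule are assumptions made for this sketch; the rule is a common textbook rule of thumb and is not stated in the surviving text.

```python
import math

def probable_error(r, n):
    """Probable error of r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)

# Illustrative values (assumed, not taken from the notes)
r, n = 0.6, 64
pe = probable_error(r, n)

# Assumed rule of thumb: r is treated as significant when it
# exceeds six times its probable error.
is_significant = abs(r) > 6 * pe

print(round(pe, 4), is_significant)
```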
Example 3
If and is significant.
EXERCISES
1. Find Karl Pearson’s co-efficient of correlation of the following data.
Age of husband : 20 22 23 25 25 28 29 30 30 34
Age of wife : 18 20 22 24 21 26 26 25 27 29
2. Calculate Karl Pearson’s co-efficient of correlation between percentage of pass and failure from the
following data.
No. of students : 800 600 900 700 500 400
No. passed : 480 300 450 560 450 300
3. Calculate Karl Pearson’s co-efficient of correlation for the following data.
Price (in shs) : 21 22 23 24 25 26 27 28 29 30
Demand (in 000 units) : 18 19 19 16 17 16 16 15 13 11
6. Given below are the monthly incomes and their net saving of a sample of 10 supervisory staff
belonging to a firm. Calculate the correlation co-efficient.
Employee No.: 1 2 3 4 5 6 7 8 9 10
Monthly income (x): 780 360 980 250 750 82 900 620 650 390
Net saving (y): 84 51 91 60 68 62 86 58 53 47
9. From the following data, compute the co-efficient of correlation between $X$ and $Y$.
$X$ series $Y$ series
No. of items: 15 15
Arithmetic mean: 25 18
Sum of squares of deviation from mean: 136 138
(i)
(ii)
Comment on the value of $r$.
14. A company wanted to assess the impact of R&D expenditure ($X$) on annual profit ($Y$). The following
table presents data for the past 8 years.
R&D expenditure ($X$): 9 7 5 10 4 5 3 2
Annual profit ($Y$): 45 42 41 60 30 34 25 20
Compute the correlation co-efficient. Comment on the value.
15. From the following data, compute the co-efficient of correlation between $X$ and $Y$.
$X$ series $Y$ series
No. of items: 15 15
Arithmetic mean: 25 18
Sum of squares of deviation from mean: 136 138
SPEARMAN’S RANK CORRELATION
In the correlation co-efficient we used the actual values to measure the degree of relationship; here we use
the ranks of the given data. Rank correlation is useful when we study the relationship between qualitative
characteristics like beauty, intelligence etc. Ranks are obtained by arranging the data in order of
merit. Spearman’s rank correlation formula is given by
$$\rho = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$
where $D$ is the difference between the two ranks of an item and $n$ is the number of items.
Example 1
Solution
$X$ $Y$ $R_1$ $R_2$ $D = R_1 - R_2$ $D^2$
15 40 7 3 4 16
20 30 5 5 0 0
25 50 4 2 2 4
18 36 6 4 2 4
40 20 3 6 -3 9
60 10 2 7 -5 25
80 60 1 1 0 0
Total 58
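The ranks and the total of $D^2$ in this table can be checked with a short Python sketch of Spearman's formula $\rho = 1 - 6\sum D^2 / (n(n^2 - 1))$. The helper `ranks_descending` is a hypothetical name introduced here; it assumes rank 1 goes to the largest value and that no value is repeated.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes no repeated values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

X = [15, 20, 25, 18, 40, 60, 80]
Y = [40, 30, 50, 36, 20, 10, 60]
n = len(X)

r1 = ranks_descending(X)   # ranks of X
r2 = ranks_descending(Y)   # ranks of Y

# Sum of squared rank differences
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))

rho = 1 - 6 * sum_d2 / (n * (n * n - 1))
print(r1, r2, sum_d2, round(rho, 4))
```

With $\sum D^2 = 58$ and $n = 7$ the co-efficient is slightly negative, i.e. the two rankings are practically unrelated.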
REPEATED RANKS
If more than one item in a series has the same value, a common rank is given to the
repeated items. This common rank is the average of the ranks which these items would have received if
they were slightly different from each other, and the next item gets the rank following the ranks already used up.
As a result there is a small adjustment in the rank correlation formula.
The adjustment factor $\frac{1}{12}(m^3 - m)$ is added to $\sum D^2$ (where $m$ is the number of times an item is repeated) for each repeated
value.
Example 2
Calculate rank correlation from the following.
Marks in statistics : 15 20 28 12 40 60 20 80
Marks in Accountancy : 40 30 50 30 20 10 30 60
Solution
$X$ $Y$ $R_1$ $R_2$ $D$ $D^2$
15 40 7 3 4 16
20 30 5.5 5 0.5 0.25
28 50 4 2 2 4
12 30 8 5 3 9
40 20 3 7 -4 16
60 10 2 8 -6 36
20 30 5.5 5 0.5 0.25
80 60 1 1 0 0
Total 81.5
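In the X series, 20 is repeated twice; it shares the 5th and 6th positions, so each 20 gets the average rank (5 + 6)/2 = 5.5.
In the Y series, 30 is repeated 3 times; it shares the 4th, 5th and 6th positions, so each 30 gets the average rank (4 + 5 + 6)/3 = 5.
Here $n = 8$, with $m = 2$ for the X series and $m = 3$ for the Y series, so
$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right]}{n(n^2 - 1)} = 1 - \frac{6(81.5 + 0.5 + 2)}{8 \times 63} = 0$$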
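The repeated-rank adjustment can be sketched in Python for the statistics and accountancy marks of this example. The helper names `average_ranks` and `tie_correction` are hypothetical; the sketch assumes rank 1 goes to the highest mark and adds $(m^3 - m)/12$ to $\sum D^2$ for every value repeated $m$ times in either series.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 goes to the largest value; repeated values share the average rank."""
    result = []
    for v in values:
        greater = sum(1 for w in values if w > v)
        ties = sum(1 for w in values if w == v)
        result.append(greater + (ties + 1) / 2)
    return result

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every value repeated m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

stats = [15, 20, 28, 12, 40, 60, 20, 80]
accounts = [40, 30, 50, 30, 20, 10, 30, 60]
n = len(stats)

r1 = average_ranks(stats)
r2 = average_ranks(accounts)

sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
adjusted = sum_d2 + tie_correction(stats) + tie_correction(accounts)

rho = 1 - 6 * adjusted / (n * (n * n - 1))
print(sum_d2, adjusted, rho)
```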
EXERCISES
1. Ten competitors in a beauty contest are ranked by three judges in the following order.
First judge: 1 5 4 8 9 6 10 7 3 2
Second judge: 4 8 7 6 5 9 10 3 2 1
Third judge: 6 7 8 1 5 10 9 2 3 4
2. Find rank correlation co-efficient for the following data and give your comments.
Marks in accounts : 85 56 89 58 59 67 74 78
Marks in maths : 38 69 56 58 63 78 87 77
4. Find rank correlation co-efficient for the following data and give your comments.
Marks in statistics : 84 56 89 58 59 67 74 78
Marks in law : 38 69 56 58 63 78 87 77
5. A sample of five fathers and their eldest sons gave the following data about their weight in kgs.
Father : 65 60 67 63 74
Son : 49 45 57 40 60
Obtain the rank correlation co-efficient and comment on the results.
8. Compute the rank correlation between I.Q’s and marks scored in examination.
Person: A B C D E F
I.Q: 100 110 140 160 120 130
Exam mark: 70 80 81 78 72 75
9. Ten competitors in a beauty contest are ranked by three judges in the following order.
Judge I: 1 6 5 10 3 2 4 9 7 8
Judge II: 3 5 8 4 7 10 2 1 6 9
Judge III: 6 4 9 8 1 2 3 10 5 7
Use the rank correlation co-efficient to determine which pair of judges has the nearest approach to
common tastes in beauty.
10. The table below shows the respective I.Q’s of 7 fathers and their eldest sons.
Father I.Q : 90 98 103 104 105 110 114
Sons I.Q : 102 95 107 114 100 98 113
Calculate the rank correlation for the above statistics.
In the previous chapter we discussed the nature of the relationship between variables (i.e.
correlation) and the extent to which they are correlated. Correlation analysis gives us an idea of
the association between the variables, but we would also like to know
the functional relationship between the two variables. Regression methods are meant to
determine these functional relationships.
REGRESSION
Regression analysis is a mathematical measure of the average relationship between two or more variables
in terms of the original units of the data.
In regression analysis there are two types of variables, namely the dependent and the independent variable. A
dependent variable is the one whose value is to be predicted, and an independent variable is the one which
influences the value of the dependent variable.
LINE OF REGRESSION
In a scatter diagram (discussed in the previous topic), we find that the points cluster around some curve, called
the curve of regression. If the curve is a straight line, it is called the line of regression.
“The line of regression is the line which gives the best estimate to the value of one variable for any specific
value of the other variable”.
Let us consider the relationship between two variables $X$ and $Y$. Here there are two lines of regression: (i) $Y$
on $X$ and (ii) $X$ on $Y$.
(i) Line of regression of $Y$ on $X$: Here $Y$ is the dependent variable and $X$ is the independent variable, and
this line gives the best estimate for the value of $Y$ for any specified value of $X$. The regression
equation for $Y$ on $X$ is given by
$$Y = a + bX$$
(ii) Line of regression of $X$ on $Y$: this line is used to estimate the value of $X$ for any specified value
of $Y$. The regression equation for $X$ on $Y$ is given by
$$X = a + bY$$
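In the above equations $a$ and $b$ are constants and are obtained by the least squares principle.
Regression equation of $Y$ on $X$:
The regression equation of $Y$ on $X$ is $Y = a + bX$.
Then, according to the least squares principle, we have to minimize
$$S = \sum (Y - a - bX)^2$$
where $Y$ is the actual value and $a + bX$ is the estimated value.
$S$ is minimized by partially differentiating it with respect to $a$ and $b$ and equating the derivatives to
zero. In this way we get two equations:
$$\sum Y = na + b \sum X$$
$$\sum XY = a \sum X + b \sum X^2$$
These equations are known as normal equations. Solving these equations
simultaneously, we can obtain the values of the constants $a$ and $b$.
Regression equation of $X$ on $Y$:
The regression equation of $X$ on $Y$ is $X = a + bY$.
Here we have to minimize
$$S = \sum (X - a - bY)^2$$
Similarly, here we get
$$\sum X = na + b \sum Y$$
$$\sum XY = a \sum Y + b \sum Y^2$$
as the normal equations, and the values of $a$ and $b$ can be obtained from them.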
Example
Find the regression equation of $Y$ on $X$ for the following data.
$X$ : 2 4 6 8
$Y$ : 5 15 20 25
Solution
$X$ $Y$ $X^2$ $Y^2$ $XY$
2 5 4 25 10
4 15 16 225 60
6 20 36 400 120
8 25 64 625 200
Total 20 65 120 1275 390
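The normal equations are
$65 = 4a + 20b$ (1)
$390 = 20a + 120b$ (2)
(1) × 5: $325 = 20a + 100b$ (3)
$390 = 20a + 120b$ (2)
(3) - (2): $-65 = -20b$, so $b = 3.25$
Substituting in (1): $65 = 4a + 65$, so $a = 0$
The regression equation of $Y$ on $X$ is therefore $Y = 3.25X$.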
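The elimination above can be cross-checked with a small Python sketch that solves the two normal equations $\sum Y = na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$ in closed form. The function name `fit_y_on_x` is an illustrative assumption.

```python
def fit_y_on_x(xs, ys):
    """Solve the normal equations
         sum(Y)  = n*a + b*sum(X)
         sum(XY) = a*sum(X) + b*sum(X^2)
    for the constants a (intercept) and b (slope)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

X = [2, 4, 6, 8]
Y = [5, 15, 20, 25]

a, b = fit_y_on_x(X, Y)
print(a, b)
```

For this data the sketch returns an intercept of 0 and a slope of 3.25, which agrees with the totals 20, 65, 120 and 390 in the table.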
REGRESSION CO-EFFICIENT
In the regression equation $Y = a + bX$, $a$ is the intercept which the line cuts on the $Y$ axis and $b$ is the slope of the
line; $b$ is called the regression co-efficient. The regression co-efficient is a measure of the change in the
dependent variable corresponding to a unit change in the independent variable.
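$$b_{yx} = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2}$$ for $Y$ on $X$
Similarly, for $X$ on $Y$,
$$b_{xy} = \frac{n \sum XY - \sum X \sum Y}{n \sum Y^2 - (\sum Y)^2}$$
We get
$$b_{yx} = r \frac{\sigma_Y}{\sigma_X}$$ for $Y$ on $X$
$$b_{xy} = r \frac{\sigma_X}{\sigma_Y}$$ for $X$ on $Y$
Now the two regression equations are
$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$ for $Y$ on $X$
$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$ for $X$ on $Y$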
Let
Let
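When only the means, standard deviations and the correlation co-efficient are known, the two regression lines can be written down directly. The sketch below assumes the usual forms $b_{yx} = r\sigma_Y/\sigma_X$ and $b_{xy} = r\sigma_X/\sigma_Y$, with each line passing through $(\bar{X}, \bar{Y})$. The function name and all numeric inputs are illustrative assumptions, not data from the notes.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Return both regression co-efficients and the two prediction functions."""
    b_yx = r * sd_y / sd_x          # slope of Y on X
    b_xy = r * sd_x / sd_y          # slope of X on Y

    def y_on_x(x):
        # Y - mean_y = b_yx * (X - mean_x)
        return mean_y + b_yx * (x - mean_x)

    def x_on_y(y):
        # X - mean_x = b_xy * (Y - mean_y)
        return mean_x + b_xy * (y - mean_y)

    return b_yx, b_xy, y_on_x, x_on_y

# Illustrative summary statistics (assumed)
b_yx, b_xy, y_on_x, x_on_y = regression_lines(
    mean_x=50, mean_y=100, sd_x=5, sd_y=10, r=0.6
)

print(b_yx, b_xy, y_on_x(60), x_on_y(120))
```

Note that the two lines are different unless $r = \pm 1$: the line of $Y$ on $X$ should be used to estimate $Y$, and the line of $X$ on $Y$ to estimate $X$.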
Example 1
You are given the following
Mean 10 15
S.D 2 4
Solution
Regression equation of $Y$ on $X$
Regression equation of $X$ on $Y$
Example
Find the regression equations for the following data.
$X$ : 2 4 6 8
$Y$ : 10 20 25 30
Solution
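Regression equation
$X$ $Y$ $x = X - 5$ $y = Y - 15$ $x^2$ $y^2$ $xy$
2 10 -3 -5 9 25 15
4 20 -1 5 1 25 -5
6 25 1 10 1 100 10
8 30 3 15 9 225 45
Total 20 85 0 25 20 375 65
Here $n = 4$, $\bar{X} = 20/4 = 5$ and $\bar{Y} = 85/4 = 21.25$.
Regression equation of $Y$ on $X$:
$$b_{yx} = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{4 \times 65 - 0 \times 25}{4 \times 20 - 0^2} = 3.25$$
so $Y - 21.25 = 3.25(X - 5)$, i.e. $Y = 3.25X + 5$.
Regression equation of $X$ on $Y$:
$$b_{xy} = \frac{n \sum xy - \sum x \sum y}{n \sum y^2 - (\sum y)^2} = \frac{4 \times 65 - 0 \times 25}{4 \times 375 - 25^2} = \frac{260}{875} \approx 0.297$$
so $X - 5 = 0.297(Y - 21.25)$, i.e. $X \approx 0.297Y - 1.314$.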
3.
The same property holds for the regression co-efficient of $X$ on $Y$.
4. If one of the regression co-efficients is greater than unity, the other one is less than unity,
i.e. if $b_{yx} > 1$ then $b_{xy} < 1$.
5. If the variables $X$ and $Y$ are independent, the regression co-efficients are zero.
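Property 4 follows from the well-known identity $b_{yx} \cdot b_{xy} = r^2 \le 1$, which is given here as background rather than quoted from the list above. A minimal Python sketch using the data $X$: 2, 4, 6, 8 and $Y$: 10, 20, 25, 30 from the preceding example illustrates it; the variable names are illustrative.

```python
import math

X = [2, 4, 6, 8]
Y = [10, 20, 25, 30]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n

# Sums of squares and products of deviations from the actual means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
sxx = sum((x - mean_x) ** 2 for x in X)
syy = sum((y - mean_y) ** 2 for y in Y)

b_yx = sxy / sxx                 # regression co-efficient of Y on X
b_xy = sxy / syy                 # regression co-efficient of X on Y
r = sxy / math.sqrt(sxx * syy)   # correlation co-efficient

# The product of the two regression co-efficients equals r squared,
# so both cannot exceed 1 in absolute value at the same time.
print(round(b_yx, 4), round(b_xy, 4), round(b_yx * b_xy, 4), round(r * r, 4))
```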
DISTINCTION BETWEEN CORRELATION AND REGRESSION ANALYSIS
Having discussed these two analyses separately, let us now distinguish between them.
Exercise
1. Obtain the regression of $Y$ on $X$ and $X$ on $Y$ from the following table and estimate the blood pressure
when age is 50.
Age ($X$) Blood pressure ($Y$)
56 147
42 125
72 160
36 118
63 149
47 128
55 150
49 145
38 115
42 140
68 152
60 155
2. A company wanted to assess the impact of R&D expenditure ($X$) on annual profit ($Y$). The following
table presents data for the past 8 years.
9 45
7 42
5 41
10 60
4 30
5 34
3 25
2 20
Find the regression equation of $Y$ on $X$. Estimate the profit for an R&D expenditure of Shs 8 thousand.
4. You are given the following information about advertisement and sales.
Adv. Exp. Sales
(Shs million) (Shs million)
Mean 20 120
S.D 5 25
Correlation co-efficient = 0.8
1. Calculate the two regression equations.
2. Find the likely sales when advertisement expenditure is Shs 25 million
A. M 74.50 125.50
S.D 13.07 15.85
Summation of products of corresponding deviations from respective means = 2176. Calculate the co-efficient
of correlation between $X$ and $Y$ and find the regression equations.
Mean 20 10
S.D 3 4
Mean 25 22
S.D 4 5
9. If and , find .
10. From the data below, find the two regression equation.
25 43
28 46
35 49
32 41
31 36
36 32
29 31
38 30
34 33
32 39
11. Calculate the regression equation for the following data and also compute Karl Pearson’s coefficient
of correlation:
No. of students : 800 600 900 700 500 400
No. of Passed : 480 300 450 560 450 310
12. To study the effect of rain on yield of wheat the following results were obtained.
Mean S.D
Yield : 800 12
Rainfall : 50 2
13. Find the two regression equations and the co-efficient of correlation from the following figures.