Corr. & Reg
Correlation
When the two variables are such that the change in the value of one variable is accompanied by a change in the value
of the other variable, they are said to be correlated.
For example : age & height of a child, demand & price of a commodity, rainfall & agricultural production etc. are
correlated variables. Correlation may be positive or negative :
Positive Correlation :- When high values of one variable are associated with high values of the other variable, they are
said to be directly or positively correlated.
i.e., if an increase/decrease in the value of one variable is accompanied by an increase/decrease in the value of the other variable,
the correlation between them is positive, e.g., age & height of a child.
Negative Correlation :- When high values of one variable tend to accompany low values of the other, they are
inversely or negatively correlated.
i.e., if an increase/decrease in the value of one variable is accompanied by a decrease/increase in the value of the other variable,
the correlation between them is negative, e.g., age & playing habit, vaccination & disease.
Properties of correlation coefficient (r) :-
(1) It is independent of change of origin & scale.
(2) It is a pure number, i.e., it has no unit of measurement.
(3) It lies from –1 to +1, i.e., –1 ≤ r ≤ +1.
-: Limits of correlation coefficient :-
Degree                              Direction
                                    Positive              Negative
i)   Perfect                        +1                    –1
ii)  Very High                      0·75 ≤ r < 1          –1 < r ≤ –0·75
iii) High                           0·50 ≤ r < 0·75       –0·75 < r ≤ –0·50
iv)  Low/moderate                   0·25 ≤ r < 0·50       –0·50 < r ≤ –0·25
v)   Very Low                       0 < r < 0·25          –0·25 < r < 0
vi)  Absent (i.e., No Correlation)  0                     0
-: Measures of Correlation :-
(I) Scatter Diagram Method
(II) Karl Pearson's Coefficient of Correlation (Covariance method)
(III) Concurrent Deviations Method
(IV) Rank Method ASHUTOSH GUPTA CLASSES Mob: 7983232057 / 9837121456
(V) Two – way frequency table (Bivariate correlation method)
(I) Scatter Diagram :- If we plot the values of 'X' & 'Y' on graph paper, the resulting diagram is known as a scatter
diagram. A scatter diagram gives a rough idea of the correlation between the two variables.
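[Figure: two scatter diagrams with axes X and Y and origin O, one showing perfect positive correlation (r = +1) and one showing perfect negative correlation (r = –1)]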
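(II) Karl Pearson's Coefficient of Correlation (Covariance method), or Product Moment correlation coefficient
i.e.
\[ r = \frac{\frac{1}{n}\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\frac{\sum (X-\bar{X})^{2}}{n}}\;\sqrt{\frac{\sum (Y-\bar{Y})^{2}}{n}}} \]
or,
\[ r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^{2}}\;\sqrt{\sum (Y-\bar{Y})^{2}}} \] ........................... (ii)
(By using 'Actual Means')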
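As a rough illustration of the actual-means formula, the sketch below computes r as the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the five data pairs are illustrative assumptions, not taken from these notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the actual means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # X - X-bar
    dy = [y - mean_y for y in ys]          # Y - Y-bar
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Illustrative (made-up) data, chosen so the arithmetic is easy to follow.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
r = pearson_r(X, Y)
print(round(r, 4))
```

For this data the means are 3 and 4, the sum of products of deviations is 6, and the sums of squared deviations are 10 and 6, so r = 6 / √60, roughly 0·77.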
Where, \( d_X = \frac{X - a}{h} \) & \( d_Y = \frac{Y - b}{k} \) ;
a, b, h & k are constants
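Because r is independent of change of origin and scale, computing it from the step deviations dX and dY should give the same value as computing it from X and Y, provided h and k have the same sign. The sketch below checks this numerically on the X and Y values of problem (9)(i) in these notes; the helper names and the particular constants a, b, h, k are assumptions chosen for illustration.

```python
from math import sqrt, isclose

def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def step_deviations(values, origin, scale):
    """d = (value - origin) / scale, i.e. dX = (X - a) / h."""
    return [(v - origin) / scale for v in values]

X = [2, 4, 5, 6, 8, 11]
Y = [18, 12, 10, 8, 7, 5]

r_xy = pearson_r(X, Y)

# Assumed constants: a = 5, h = 1 for X; b = 10, k = 2 for Y.
dX = step_deviations(X, origin=5, scale=1)
dY = step_deviations(Y, origin=10, scale=2)
r_dd = pearson_r(dX, dY)

# If exactly one scale constant is negative, only the sign of r changes.
dY_neg = step_deviations(Y, origin=10, scale=-2)
r_flipped = pearson_r(dX, dY_neg)

print(round(r_xy, 4), round(r_dd, 4), round(r_flipped, 4))
```

The first two values agree, while the third has the same magnitude with the opposite sign, which is the point behind questions that ask for the correlation between expressions such as 2x – 4 and 3 – 2y.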
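(III) Concurrent Deviations Method
\[ r_c = \pm\sqrt{\pm\,\frac{2c - n}{n}} \]
where, 'n' ................. total no. of signs (in the CX.CY column), i.e., no. of observations – 1
'c' ................. no. of positive signs (in the CX.CY column)
The sign of (2c – n) is taken both inside and outside the square root, so that the quantity under the root is never negative.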
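The concurrent deviations calculation can be sketched as follows: mark the direction of change (+ or –) from each value to the next in both series, multiply the signs pairwise, count the positive products c, and apply r_c = ±√(±(2c – n)/n). The function names are illustrative. Treating a pair with no change in either series as non-concurrent is an assumption of this sketch; textbooks differ on that convention. The data are the supply and demand figures from the 1991 – Nov question in these notes.

```python
from math import sqrt

def sign(value):
    """Return +1, -1 or 0 according to the sign of value."""
    return (value > 0) - (value < 0)

def concurrent_deviation_r(xs, ys):
    """Coefficient of concurrent deviations for two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    cx = [sign(b - a) for a, b in zip(xs, xs[1:])]   # direction of change in X
    cy = [sign(b - a) for a, b in zip(ys, ys[1:])]   # direction of change in Y
    n = len(cx)                                      # number of observations - 1
    c = sum(1 for a, b in zip(cx, cy) if a * b > 0)  # positive signs in CX.CY
    inner = (2 * c - n) / n
    # Same sign inside and outside the root keeps the radicand non-negative.
    return sqrt(inner) if inner >= 0 else -sqrt(-inner)

supply = [120, 110, 120, 119, 140, 125, 127, 119, 140, 160]
demand = [240, 250, 260, 266, 232, 245, 255, 267, 268, 239]
r_c = concurrent_deviation_r(supply, demand)
print(round(r_c, 3))
```

Here n = 9 and c = 3, so (2c – n)/n = –1/3 and r_c = –√(1/3), roughly –0·58.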
Coefficient of determination
'r²' is called the coefficient of determination. It gives an idea about the explained and unexplained variation :
(100 r²)% .............. explained variation ; 100(1 – r²)% ......... unexplained variation
Remark :- '(1 – r²)' is called the coefficient of non-determination.
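A small sketch of the arithmetic, assuming a hypothetical helper `variation_split` that returns the two percentages for a given r:

```python
def variation_split(r):
    """Return (explained %, unexplained %) for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    r_squared = r * r                      # coefficient of determination
    return 100 * r_squared, 100 * (1 - r_squared)

explained, unexplained = variation_split(0.95)
print(f"explained: {explained:.2f}%  unexplained: {unexplained:.2f}%")
```

With r = 0·95, r² = 0·9025, so about 90·25% of the variation is explained and about 9·75% remains unexplained.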
(1) Find the coefficient of correlation between the sales and expenses of the following 10 firms :
Firms 1 2 3 4 5 6 7 8 9 10
Sales (in '000 Rs.) 50 50 55 60 65 65 65 60 60 50
Expenses (in '000 Rs.) 11 13 14 16 16 15 15 14 13 13
(2) The following results are obtained between two series from their respective means. Compute the coefficient of
correlation :
                                                        X – series    Y – series
(i)   No. of items                                          7             7
(ii)  Arithmetic Mean                                       4             8
(iii) Sum of squares of deviations from A.M.               28            76
(iv)  Summation of products of deviations of 'X' & 'Y'
      series from their respective means                          46
(3) Calculate 'r', if given that n = 10 ; ΣX = 100 ; ΣY = 150 ; Σ(X – 10)² = 180 ; Σ(Y – 15)² = 215 ;
Σ(X – 10)(Y – 15) = 60.
(4) Quotation of index number of equity share prices of a certain joint stock company and of preference shares are
given below :
Years 1991 1992 1993 1994 1995 1996 1997
Equity 97·5 99·4 98·6 96·2 95·1 98·4 97·1
Preference shares 75·1 75·9 77·1 78·2 79·0 74·8 76·2
Use the method of rank correlation to determine the relationship between equity share and preference share prices.
(5) Compute the rank correlation coefficient from the following data :
A 115 109 112 87 98 98 120 100 98 118
B 75 73 85 70 76 65 82 73 68 80
(6) Ten students were ranked on the basis of two attributes, beauty 'X' and intelligence 'Y'. The coefficient of rank
correlation between 'X' and 'Y' was found to be 0·5. It was later discovered that the difference in ranks in the two
attributes obtained by one of the students was wrongly taken as 3 instead of 7. Find the correct rank correlation
coefficient.
(7) Compute the rank correlation coefficient from the following data :
A 45 56 39 54 45 40 56 60 30 36
B 40 36 30 44 36 32 45 42 20 36
(8) Find the rank correlation coefficient between sales and expenses of 10 firms :
Sales 50 56 54 60 67 63 60 62 68 69
Expenses 21 23 24 27 32 34 28 30 33 32
(9) (i) Compute the correlation coefficient between the corresponding values of X & Y in the following table:-
X 2 4 5 6 8 11
Y 18 12 10 8 7 5
(ii) Multiply each X value in the table by 2 and add 6. Multiply each value of Y in the table by 3 & subtract 15.
Find the correlation coefficient between the two new sets of values. Explain why you do or do not obtain the same result
as in (i).
(15) Calculate the correlation coefficient between the X & Y series from the following data :
(a) \( \sum_{i=1}^{12} (X_i - \bar{X})^2 = 360 \) ; \( \sum_{i=1}^{12} (Y_i - \bar{Y})^2 = 250 \) and \( \sum_{i=1}^{12} (X_i - \bar{X})(Y_i - \bar{Y}) = 225 \).
(b) n = 100, \( \sum (X_i - \bar{X})^2 = 169 \) ; \( \sum (Y_i - \bar{Y})^2 = 64 \) ; \( \sum (X_i - \bar{X})(Y_i - \bar{Y}) = 101 \)
(16) In a contest, two judges ranked 8 candidates A,B,C,D,E,F,G & H in order of their performance, as shown in the
following table. Find the rank correlation coefficient.
Candidates A B C D E F G H
First Judge : 5 2 8 1 4 6 3 7
Second Judge : 4 5 7 3 2 8 1 6
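When the ranks are given directly and there are no ties, as in problem (16), the rank method is usually worked with Spearman's formula r = 1 – 6Σd² / (n(n² – 1)), where d is the difference between the two ranks of a candidate. That formula is the standard one and is assumed here, since it is not written out in these notes; the function name is illustrative.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's rank correlation for two sets of untied ranks."""
    if len(ranks_1) != len(ranks_2) or len(ranks_1) < 2:
        raise ValueError("need two rank lists of equal length with at least 2 items")
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))  # sum of d squared
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

first_judge = [5, 2, 8, 1, 4, 6, 3, 7]
second_judge = [4, 5, 7, 3, 2, 8, 1, 6]
r_s = spearman_from_ranks(first_judge, second_judge)
print(round(r_s, 4))
```

For these ranks Σd² = 28 and n = 8, so r = 1 – 168/504, roughly 0·67. Data with tied values, such as problems (5) and (7), need the tie correction, which this sketch does not handle.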
1984 – Nov The data relating to import price (Y) and import quantity (X) in respect of a given commodity are as under :
Year 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984
Import Price (Y) 2 3 6 5 4 3 5 7 8 7
Quantity Imported(X) 6 5 4 5 7 10 9 7 8 9
(a) Calculate Karl Pearson's coefficient of correlation between X and Y and comment on it.
(b) Find the percentage of variation in import prices that is explained by the variation in the quantity imported.
1987 – Nov 11(a) Coefficient of correlation (r) between two variables X and Y is 0·95. What percent variation in X
(the dependent variable) remains unexplained by the variation in Y (the independent variable) ?
1988 – Nov 8(b) Calculate Spearman's coefficient of correlation between marks assigned to ten students by judges X
and Y in a certain competitive test as shown below-
S.No. 1 2 3 4 5 6 7 8 9 10
Mark by X Judge 52 53 42 60 45 41 37 38 25 27
Mark by Y judge 65 68 43 38 77 48 35 30 25 50
1991 – Nov 8(d) Calculate the coefficient of correlation, using the method of concurrent deviations, between supply
and demand given the following data. Also comment on your result.
Year: 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
Supply (000 tons) 120 110 120 119 140 125 127 119 140 160
Demand (000 tons) 240 250 260 266 232 245 255 267 268 239
1993 – June 9(b) The coefficient of correlation between two variables X and Y is 0·8 and their covariance is 20. If
the variance of the X series is 16, find the standard deviation of the Y series.
1994 – June 9(b) Calculate the coefficient of correlation from the following data by the method of rank differences:-
X 10 4 2 5 8 5 6 9
Y 10 6 2 5 8 4 5 9
1994 – Nov 11(b) Calculate Karl Pearson's coefficient of correlation for the data given below:
Independent Variable X 3 7 5 4 6 8 2 7
Dependent Variable Y 7 12 8 8 10 13 5 10
1995 – Nov 9(b) Calculate coefficient of correlation, using the method of concurrent deviation, between supply &
demand of an item for a ten year period as given below:-
Year 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
Supply 125 160 164 174 155 170 165 162 172 175
Demand 115 125 192 190 165 174 124 127 152 169
2001 – May 14 (b)(iii) Given that the correlation between 'x' & 'y' is 0·5, what is the correlation between 2x – 4
and 3 – 2y ?
Regression
By regression we mean average relationship between two variables and this relationship is used to estimate the most
likely values of one variable for specified values of the other variable. One of the variables is called independent
variable and the other is called dependent variable.
-: Regression Lines :-
If there is some correlation between the variables, the points of the scatter diagram are concentrated around a straight
line. This line is called regression line.
A line is drawn through these points so that it passes through the maximum number of points, and the distances
from the line of the points through which it does not pass are minimum.
-: Y on X Line :-
If we minimize the distances parallel to the axis of Y, we get the regression line of 'Y on X'.
Form : Y = a + bX
[Figure: the 'Y on X' line, with Y (dependent variable) on the vertical axis, X (independent variable) on the horizontal axis and origin O]
-: X on Y Line :-
If we minimize the distances parallel to the axis of X, we get the regression line of 'X on Y'.
Form : X = a + bY
[Figure: the 'X on Y' line, with X as the dependent variable and Y as the independent variable; axes X and Y, origin O]
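Equation of the 'Y on X' line :
\[ (Y - \bar{Y}) = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) \] ............. *
or
\[ (Y - \bar{Y}) = b_{yx}\,(X - \bar{X}) \] ............. * *
Here \( b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \) is called the regression coefficient of 'Y on X'.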
Note :- This line gives the best estimated value of 'Y' for a given value of 'X', i.e., we can use this equation when 'X'
is given & 'Y' is to be determined.
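The 'Y on X' line can be built directly from the two means, the two standard deviations and r, using b_yx = r·σy/σx and rewriting (Y – Ȳ) = b_yx (X – X̄) in the form Y = a + bX. The sketch below does this for the figures of the 1993 – Dec question in these notes (means 40 and 45, standard deviations 10 and 9, r = 0·50); the function name is an illustrative assumption.

```python
def regression_y_on_x(mean_x, mean_y, sd_x, sd_y, r):
    """Return (a, b) for the 'Y on X' line Y = a + bX."""
    if sd_x <= 0:
        raise ValueError("standard deviation of X must be positive")
    b_yx = r * sd_y / sd_x          # regression coefficient of Y on X
    a = mean_y - b_yx * mean_x      # intercept, from (Y - Ybar) = b_yx (X - Xbar)
    return a, b_yx

a, b_yx = regression_y_on_x(mean_x=40, mean_y=45, sd_x=10, sd_y=9, r=0.5)
y_estimate = a + b_yx * 48          # estimated Y when X = 48
print(round(a, 2), round(b_yx, 2), round(y_estimate, 2))
```

Here b_yx = 0·45, so the line is Y = 27 + 0·45X and the estimate of Y at X = 48 is 48·6. The 'X on Y' line follows in the same way with the roles of X and Y interchanged, using b_xy = r·σx/σy.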
[Figure: the 'Y on X' line and the 'X on Y' line intersecting at the point (X̄, Ȳ)]
(2) The angle between the regression lines depends on the value of 'r'. As the value of 'r' increases from 0 to 1, the
angle between the two regression lines diminishes from 90° to 0°.
(4) If r = ±1, both lines become identical.
[Figure: the two coincident regression lines passing through (X̄, Ȳ), (i) when r = +1 and (ii) when r = –1]
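A standard property of the two regression lines is that both pass through the point of means (X̄, Ȳ), so the means can be recovered by solving the two regression equations simultaneously. The sketch below does this with Cramer's rule for lines written as aX + bY = c, using the pair 3X + 2Y = 26 and 6X + Y = 31 that appears in the practice problems; the function name and the tuple layout are assumptions made for illustration.

```python
def regression_means(line_1, line_2):
    """Intersection of two lines given as (a, b, c), meaning aX + bY = c."""
    a1, b1, c1 = line_1
    a2, b2, c2 = line_2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel or identical; no unique intersection")
    mean_x = (c1 * b2 - c2 * b1) / det
    mean_y = (a1 * c2 - a2 * c1) / det
    return mean_x, mean_y

mean_x, mean_y = regression_means((3, 2, 26), (6, 1, 31))
print(mean_x, mean_y)
```

For this pair the intersection is X̄ = 4 and Ȳ = 7.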
(2) The coefficient of correlation & the two regression coefficients (i.e., r, byx & bxy) all have the same sign.
(3) Regression coefficients are independent of change of origin but not of scale.
(4) If one regression coefficient is numerically greater than one, then the other regression coefficient must be numerically less than one, since their product (r²) cannot exceed one.
(5) Since the A.M. of two positive numbers is always greater than or equal to their G.M. (A.M. ≥ G.M.), and the G.M. of
bYX & bXY is numerically equal to r, the A.M. of bYX & bXY is numerically equal to or greater than the coefficient of correlation.
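These properties can be checked numerically. Since b_yx = r·σy/σx and b_xy = r·σx/σy, their product is r², so r is the square root of the product with the common sign of the two coefficients. The sketch below, with an illustrative function name, applies this to the values b_xy = –1/6 and b_yx = –3/2 from the 1993 – June question in these notes.

```python
from math import sqrt, copysign

def r_from_regression_coefficients(b_yx, b_xy):
    """Correlation coefficient from the two regression coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("product of regression coefficients cannot exceed 1")
    # r carries the common sign of b_yx and b_xy.
    return copysign(sqrt(product), b_yx)

b_yx, b_xy = -3 / 2, -1 / 6
r = r_from_regression_coefficients(b_yx, b_xy)
arithmetic_mean = (b_yx + b_xy) / 2
print(round(r, 4), round(arithmetic_mean, 4))
```

Here the product is 0·25, so r = –0·5, and the arithmetic mean of the coefficients (about –0·83) is numerically greater than r, as property (5) requires.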
Practical Problems
(1) You are given variance of X = 9. The regression equations are 8x – 10y + 66 = 0 and 40x – 18y = 214,
find (i) Average values of 'X' & 'Y' (ii) Correlation coefficient between the two variables
(iii) Standard Deviation of Y
(2) The correlation coefficient between supply (y) & price (x) of a commodity is 0·60. If
σx = 1·50 ; σy = 2·00 ; X̄ = 10 & Ȳ = 20 ; find the equations of the regression lines.
(3) The lines of regression of y on x & x on y are respectively y = x + 5 & 16x = 9y – 94. Find the variance of x, if
the variance of y is 16. Also find the covariance of x & y.
(4) For a bivariate data, X̄ = 20 ; Ȳ = 45 ; byx = 4 & bxy = 1/9 ; find (a) r (b) σx if σy = 12 (c) write down the two
regression lines.
(5) A student obtained the two regression lines as :- 2x – 5y = 7 & 3x + 2y = 8. Do you agree with him ?
(6) Find X̄, Ȳ & r for 2x – 3y = 0 & 4y – 5x = 8.
(7) The two regression lines are given as : y = 0·5x + 25 & x = 0·4y + 22. Calculate (a) X̄ & Ȳ ; (b) the most
likely value of x for y = 8.
(8) The two regression lines are given as : 3X +2Y = 26 & 6X + Y = 31,
Calculate: (a) r (b) 𝜎y, if variance of X = 25.
(9) The equation of two regression lines in a correlation analysis are as follows : 3x + 2y = 26 & 6x + y = 31.
A student obtained the mean values X̄ = 7 ; Ȳ = 4 and the value of r = 0·5. Do you agree with him ?
(10) Given, Variance of X = 25 ; Regression equations are 5x – y = 22 & 64x – 45y = 24. Find (a) X̄ & Ȳ ;
(b) σy (c) r.
1983 – Nov 8(d) Regression equations of two variables X and Y are as follows:
3X + 2Y – 26 = 0 6X + Y – 31 = 0
Find – (i) the mean of X, (ii) the regression coefficient of X on Y, (iii) the coefficient of correlation between X and Y,
(iv) the most probable value of Y when X = 5.
1987 – Nov 8(b) Given is the following information :-
                        X (Rs)    Y (Rs)
Arithmetic average        6         8
Standard deviation        5        40/3
Coefficient of correlation between X and Y = 8/15. Find (i) the regression coefficient of Y on X
(ii) the regression equation of X on Y ; (iii) the most likely value of Y when X = 100 Rupees.
1993 – June 14(a) (iii) If the regression coefficient of the X on Y is – 1/6 and that of Y on X is –3/2, what is the value of
the correlation coefficient between X and Y.
1993 – Dec 9(b) Compute the two regression equations on the basis of the following information:-
X Y
Mean 40 45
Standard deviation 10 9
Karl Pearson's correlation coefficient between X and Y = 0·50. Also estimate the value of Y when X = 48 using the
appropriate regression equation.