Corr. & Reg

Uploaded by

nbnoble1814
Copyright
© © All Rights Reserved
Correlation
When two variables are such that a change in the value of one is accompanied by a change in the value of the other, they are said to be correlated.
For example : age & height of a child, demand & price of a commodity, rainfall & agricultural production etc. are correlated variables. Correlation may be positive or negative :
Positive Correlation :- When high values of one variable are associated with high values of the other variable, they are said to be directly or positively correlated.
i.e. if an increase/decrease in the value of one variable is accompanied by an increase/decrease in the value of the other variable, the correlation between them is positive. e.g. age & height of a child.
Negative Correlation :- When high values of one variable tend to accompany low values of the other, they are inversely or negatively correlated.
i.e., if an increase/decrease in the value of one variable is accompanied by a decrease/increase in the value of the other variable, the correlation between them is negative. e.g., age & playing habit, vaccination & disease.
Properties of correlation coefficient (r) :-
(1) It is independent of change of origin & scale.
(2) It is a pure number, i.e., it has no unit of measurement.
(3) It lies from –1 to +1 ; (–1 ≤ r ≤ +1)
-: Limits of correlation coefficient :-
Degree                               Direction : Positive    Direction : Negative
i)   Perfect                         r = +1                  r = –1
ii)  Very High                       0·75 ≤ r < 1            –1 < r ≤ –0·75
iii) High                            0·50 ≤ r < 0·75         –0·75 < r ≤ –0·50
iv)  Low/moderate                    0·25 ≤ r < 0·50         –0·50 < r ≤ –0·25
v)   Very Low                        0 < r < 0·25            –0·25 < r < 0
vi)  Absent (i.e., No Correlation)   r = 0                   r = 0
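The limits above can be read as a simple lookup. The following is a minimal Python sketch, not part of the original notes; the function name `describe_correlation` is a hypothetical helper chosen only for illustration.

```python
def describe_correlation(r: float) -> str:
    """Return the degree and direction of correlation for a coefficient r,
    following the limits table in these notes."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "absent"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        degree = "perfect"
    elif size >= 0.75:
        degree = "very high"
    elif size >= 0.50:
        degree = "high"
    elif size >= 0.25:
        degree = "low/moderate"
    else:
        degree = "very low"
    return f"{degree} {direction}"


print(describe_correlation(0.8))    # very high positive
print(describe_correlation(-0.5))   # high negative
```

Values outside the range –1 to +1 raise an error, since no correlation coefficient can take them.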

-: Measures of Correlation :-
(I) Scatter Diagram Method
(II) Karl Pearson’s Coefficient of Correlation (Co – Variance method)
(III) Concurrent Deviations Method
(IV) Rank Method
(V) Two – way frequency table (Bivariate correlation method)
ASHUTOSH GUPTA CLASSES Mob: 7983232057 / 9837121456
(I) Scatter Diagram :- If we plot the values of ‘X’ & ‘Y’ on graph paper, the resulting diagram is known as a scatter diagram. A scatter diagram gives a rough idea of the correlation between the two variables.

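[Figure: scatter diagrams (X on the horizontal axis, Y on the vertical axis) showing perfect positive correlation (r = +1), perfect negative correlation (r = –1), high degree positive, high degree negative, low degree positive, low degree negative, and no correlation (r = 0).]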

(II) Karl Pearson’s Coefficient of Correlation (Co – Variance method) or, Product Moment correlation coefficient

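As per ‘Karl Pearson’, the coefficient of correlation is

$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y} \qquad \text{(i)}$$

i.e.

$$r = \frac{\frac{1}{n} \sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\frac{\sum (X - \bar{X})^2}{n}} \; \sqrt{\frac{\sum (Y - \bar{Y})^2}{n}}}$$

or,

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2} \; \sqrt{\sum (Y - \bar{Y})^2}} \qquad \text{(ii)}$$

(By using ‘Actual Means’)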

Direct Method Formula :-

$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\sum X^2 - \frac{(\sum X)^2}{n}} \; \sqrt{\sum Y^2 - \frac{(\sum Y)^2}{n}}} \qquad \text{(iii)}$$

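Short Cut Formula :-

$$r = \frac{\sum d_X d_Y - \frac{(\sum d_X)(\sum d_Y)}{n}}{\sqrt{\sum d_X^2 - \frac{(\sum d_X)^2}{n}} \; \sqrt{\sum d_Y^2 - \frac{(\sum d_Y)^2}{n}}} \qquad \text{(iv)}$$

Where, $d_X = \frac{X - a}{h}$ & $d_Y = \frac{Y - b}{k}$ ; a, b, h & k are constants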
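Formulas (ii), (iii) and (iv) should all give the same value of r. The sketch below, which is an illustration and not part of the original notes, checks this in plain Python on a small made-up data set. The function names and the sample values are assumptions, and the short-cut version assumes h and k are both positive.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Formula (ii): deviations taken from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_dev_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dev_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dev_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_dev_xy / sqrt(sum_dev_x2 * sum_dev_y2)


def pearson_r_direct(xs, ys):
    """Formula (iii): direct method using raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt(sum_x2 - sum_x ** 2 / n) * sqrt(sum_y2 - sum_y ** 2 / n)
    return numerator / denominator


def pearson_r_shortcut(xs, ys, a, b, h, k):
    """Formula (iv): short cut method with dX = (X - a)/h and dY = (Y - b)/k.
    Assumes h and k are positive."""
    d_x = [(x - a) / h for x in xs]
    d_y = [(y - b) / k for y in ys]
    return pearson_r_direct(d_x, d_y)


# Made-up illustrative data, not taken from the exercises.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 4))                        # 0.7746
print(round(pearson_r_direct(xs, ys), 4))                 # 0.7746
print(round(pearson_r_shortcut(xs, ys, 3, 4, 1, 2), 4))   # 0.7746
```

The agreement of the third value illustrates property (1): r does not change when the origin (a, b) and the scale (h, k) are changed.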

(III) Concurrent Deviations Method


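In this method, only the direction of change is found. For each term the change is considered with reference to the previous value.

$$r_c = \pm \sqrt{\pm \frac{2c - n}{n}}$$

where, ‘n’ ................. total no. of signs in the CX·CY column (i.e. no. of observations – 1)
‘c’ ................. no. of positive signs in the CX·CY column
The sign of (2c – n) is used both inside and outside the square root : if (2c – n) is negative, the minus sign is taken in both places, so the quantity under the root is positive and r_c is negative.

Note :- (1) Here ‘r_c’ is also termed the ‘coefficient of concurrent deviations’.
(2) Compare each value with the just preceding term : (i) if it is bigger, mark (+) ; (ii) if it is smaller, mark (–).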
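The procedure above can be sketched in Python as follows. This is an illustrative sketch only: the function name and the sample figures are made up, and treating an unchanged value as "not a concurrent deviation" is an assumption, since the notes do not say how to handle it.

```python
from math import sqrt


def concurrent_deviation_r(xs, ys):
    """Coefficient of concurrent deviations r_c = +/- sqrt(+/- (2c - n)/n)."""

    def signs(values):
        # +1 if a value is bigger than the preceding one, -1 if smaller,
        # 0 if unchanged (assumption: the notes do not cover this case).
        return [(cur > prev) - (cur < prev)
                for prev, cur in zip(values, values[1:])]

    products = [sx * sy for sx, sy in zip(signs(xs), signs(ys))]
    n = len(products)                        # no. of observations - 1
    c = sum(1 for p in products if p > 0)    # no. of positive signs
    ratio = (2 * c - n) / n
    # Use the sign of (2c - n) inside and outside the root.
    return sqrt(ratio) if ratio >= 0 else -sqrt(-ratio)


# Made-up illustrative data, not taken from the exercises.
supply = [10, 12, 11, 15, 16, 14]
demand = [5, 7, 6, 8, 7, 6]
print(round(concurrent_deviation_r(supply, demand), 4))   # 0.7746
```

Here the sign products are +, +, +, –, +, so c = 4 and n = 5, giving r_c = √(3/5).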
(IV) Rank Method
Sometimes we have to find the correlation between variables which cannot be measured quantitatively (such as beauty, intelligence, honesty etc.), but ranks can be assigned to individuals with respect to the two variables in order of their merit. If the ranks assigned to the individuals range from ‘1’ to ‘n’, the rank correlation coefficient between the two series of ranks can be found by using EDWARD SPEARMAN’S formula :

$$r = 1 - \frac{6 \sum d^2}{n^3 - n}$$

‘n’ .................... no. of pairs
‘d’ .................... difference between the ranks

TIED RANKS :- If some items are repeated, then common ranks (i.e. the average of the ranks) are given to the repeated items, and the factor $\frac{m^3 - m}{12}$ is added to $\sum d^2$ for each repeated value. Here ‘m’ is the no. of times an item has been repeated.
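As a rough illustration of the rank method with the tied-rank correction, here is a minimal Python sketch. It is not part of the original notes; the helper names and the sample data are made up, and ranking the largest value as ‘1’ is an assumption (ranking the smallest as ‘1’ in both series gives the same coefficient).

```python
from collections import Counter


def average_ranks(values):
    """Rank values with the largest as rank 1; repeated values share the
    average of the ranks they would otherwise occupy."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def spearman_r(xs, ys):
    """r = 1 - 6 * (sum of d^2 + tie corrections) / (n^3 - n)."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    # Add (m^3 - m)/12 for every value repeated m times, in either series.
    correction = sum((m ** 3 - m) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values()
                     if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


# Made-up illustrative data, not taken from the exercises.
print(round(spearman_r([10, 20, 30, 40, 50], [12, 25, 22, 48, 60]), 4))  # 0.9
print(round(spearman_r([1, 2, 2, 3], [1, 2, 3, 4]), 4))                  # 0.9
```

In the second call the value 2 appears twice in the first series, so both items get rank 2·5 and the correction (2³ – 2)/12 = 0·5 is added to Σd².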

Coefficient of determination
‘r²’ is called the coefficient of determination. It gives an idea about the explained and unexplained variation :
(100 r²)% .............. explained variation ; 100(1 – r²)% ......... unexplained variation
Remark :- ‘(1 – r²)’ is called the coefficient of non-determination.
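As a short worked illustration (the value r = 0·8 is made up for this example and does not come from the exercises):

```latex
\begin{align*}
r &= 0.8 \\
r^2 &= 0.64 \\
100\,r^2\,\% &= 64\% \quad \text{(explained variation)} \\
100\,(1 - r^2)\,\% &= 36\% \quad \text{(unexplained variation)}
\end{align*}
```

So a fairly high correlation of 0·8 still leaves more than a third of the variation unexplained.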

(1) Find the coefficient of correlation between the sales and expenses of the following ‘10’ firms :
Firms                     1   2   3   4   5   6   7   8   9   10
Sales (in ’000 Rs.)       50  50  55  60  65  65  65  60  60  50
Expenses (in ’000 Rs.)    11  13  14  16  16  15  15  14  13  13
(2) The following results are obtained between two series from their respective means. Compute the coefficient of correlation :
                                                  X – series    Y – series
(i)   No. of items                                    7             7
(ii)  Arithmetic Mean                                 4             8
(iii) Sum of squares of deviations from A.M.         28            76
(iv)  Summation of products of deviations of ‘X’ & ‘Y’ series from their respective means : 46
(3) Calculate ‘r’, if given that n = 10 ; ΣX = 100 ; ΣY = 150 ; Σ(X–10)² = 180 ; Σ(Y–15)² = 215 ; Σ(X–10)(Y–15) = 60.
(4) Quotation of index number of equity share prices of a certain joint stock company and of preference shares are
given below :
Years 1991 1992 1993 1994 1995 1996 1997
Equity 97·5 99·4 98·6 96·2 95·1 98·4 97·1
Preference shares 75·1 75·9 77·1 78·2 79·0 74·8 76·2
Use the method of rank correlation to determine the relationship between equity share and preference share prices.
(5) Compute the rank correlation coefficient from the following data :
A 115 109 112 87 98 98 120 100 98 118
B 75 73 85 70 76 65 82 73 68 80
(6) Ten students were ranked on the basis of two attributes, beauty ‘X’ and intelligence ‘Y’. The coefficient of rank correlation between ‘X’ and ‘Y’ was found to be 0·5. It was later discovered that the difference in ranks in the two attributes obtained by one of the students was wrongly taken as ‘3’ instead of ‘7’. Find the correct rank correlation coefficient.
(7) Compute the rank correlation coefficient from the following data :
A 45 56 39 54 45 40 56 60 30 36
B 40 36 30 44 36 32 45 42 20 36
(8) Find the rank correlation coefficient between sales and expenses of ‘10’ firms :
Sales 50 56 54 60 67 63 60 62 68 69
Expenses 21 23 24 27 32 34 28 30 33 32
(9) (i) Compute the correlation coefficient between the corresponding values of X & Y in the following table:-
X 2 4 5 6 8 11
Y 18 12 10 8 7 5
(ii) Multiply each X value in the table by 2 and add 6. Multiply each value of Y in the table by 3 & subtract 15.
Find correlation coefficient between two new set of values. Explain why you do or do not obtain the same result
as in (i).

(10) Find the coefficient of correlation from the following data :-


X 300 350 400 450 500 550 600 650 700
Y 800 900 1000 1100 1200 1300 1400 1500 1600
(11) Following are the deviations from the respective assumed means, i.e. mean no. of labourers & mean no. of bales consumed. Find the coefficient of correlation from the following data :
Labourers (in 000) –16 0 1 7 –37 0 11 19 16 1
Bales consumed (in lakhs) 0 0 2 –2 0 4 4 7 6 5
(12) Karl Pearson’s coeff. of correlation between two variables X & Y is 0·28, their covariance is +7·6. If the variance of X is 9, find the S.D. of Y-series.
(13) The coeff. of rank correlation between the marks in Statistics & Mathematics obtained by a certain group of students is 2/3 and the sum of the squares of the differences in ranks is 55. Find the number of students in the group.
(14) Karl Pearson’s coeff. of correlation between two variables X & Y is 0·52, their covariance is +7·8. If the variance of X is 16, find the S.D. of Y-series.

(15) Calculate the correlation coefficient between X & Y series from the following data :
(a) $\sum_{i=1}^{12} (X_i - \bar{X})^2 = 360$ ; $\sum_{i=1}^{12} (Y_i - \bar{Y})^2 = 250$ and $\sum_{i=1}^{12} (X_i - \bar{X})(Y_i - \bar{Y}) = 225$.
(b) n = 100 ; $\sum (X_i - \bar{X})^2 = 169$ ; $\sum (Y_i - \bar{Y})^2 = 64$ ; $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 101$.
(16) In a contest, two judges ranked 8 candidates A,B,C,D,E,F,G & H in order of their performance, as shown in the
following table. Find the rank correlation coefficient.
Candidates A B C D E F G H
First Judge : 5 2 8 1 4 6 3 7
Second Judge : 4 5 7 3 2 8 1 6
1984–Nov The data relating to import price (Y) and import quantity (X) in respect of a given commodity are as under :
Year 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984
Import Price (Y) 2 3 6 5 4 3 5 7 8 7
Quantity Imported (X) 6 5 4 5 7 10 9 7 8 9
(a) Calculate Karl Pearson’s coefficient of correlation between X and Y and comment on it.
(b) Find the percentage of variation in import prices that is explained by the variation in the quantity imported.
1987 – Nov 11(a) Coefficient of correlation (r) between two variables X and Y is 0·95. What percent variation in X
(the dependent variable) remains unexplained by the variation in Y (the independent variable) ?
1988 – Nov 8(b) Calculate Spearman’s coefficient of correlation between marks assigned to ten students by judges X and Y in a certain competitive test as shown below :
S.No. 1 2 3 4 5 6 7 8 9 10
Mark by X Judge 52 53 42 60 45 41 37 38 25 27
Mark by Y judge 65 68 43 38 77 48 35 30 25 50

1991 – Nov 8(d) Calculate the coefficient of correlation, using the method of concurrent deviations, between supply
and demand given the following data. Also comment on your result.
Year: 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
Supply (000 tons) 120 110 120 119 140 125 127 119 140 160
Demand (000 tons) 240 250 260 266 232 245 255 267 268 239
1993 – June 9(b) The coefficient of correlation between two variables X and Y is 0·8 and their covariance is 20. If the variance of X series is 16, find the standard deviation of Y series.
1994 – June 9(b) Calculate the coefficient of correlation from the following data by the method of rank differences:-
X 10 4 2 5 8 5 6 9
Y 10 6 2 5 8 4 5 9

1994 – Nov 11(b) Calculate Karl Pearson’s coefficient of correlation for the data given below:
Independent Variable X 3 7 5 4 6 8 2 7
Dependent Variable Y 7 12 8 8 10 13 5 10

1995 – Nov 9(b) Calculate coefficient of correlation, using the method of concurrent deviation, between supply &
demand of an item for a ten year period as given below:-
Year 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
Supply 125 160 164 174 155 170 165 162 172 175
Demand 115 125 192 190 165 174 124 127 152 169

1996 – May 14(b) A result such as r² = 1 – (650/500) is not possible. Why ?


1996 – Nov 9(b) Calculate rank correlation coefficient between the two series X and Y, given below:
X : 70 65 71 62 58 69 78 64
Y : 91 76 65 83 90 64 55 48
Also comment on the result.
1997 – Nov 9(b) Calculate the coefficient of concurrent deviation for the following data:-
Supply 65 40 35 75 63 80 35 20 80 60 50
Demand 60 55 50 56 30 70 40 35 80 75 80
1997 – Nov 14(b)(iii) If the correlation coefficient is 0·7, then what is the coefficient of determination ?
1998 – May 11(b) Calculate the rank correlation coefficient from the data given below:-
X: 75 88 95 70 60 80 81 50
Y: 120 134 150 115 110 140 142 100

2001 – May 14 (b)(iii) Given that the correlation between ‘x’ & ‘y’ is 0·5. What is the correlation between 2x – 4 and 3 – 2y ?

CORRELATION AND REGRESSION


1. ‘r’ will be positive when:
(a) X increases, Y decreases (b) X decreases, Y increases (c) Both X and Y are increasing (d) None of these.
2. If two variables have the linear relationship X + Y = 100, the correlation will be:
(a) –1 (b) +1 (c) +0.80 (d) 0.20.
3. If X – Y = 50, the correlation between X and Y will be: (a) – 1 (b) +1 (c) 0.50 (d) None of these.
4. The coefficient of correlation: (a) has no limit (b) is more than 1 (c) is less than –1 (d) None of these.
5. If the sum of the product of deviations of X and Y series from their means is zero, the correlation coefficient will be:
(a) 0 (b) 1 (c) –1 (d) None.

Regression
By regression we mean the average relationship between two variables, and this relationship is used to estimate the most likely values of one variable for specified values of the other variable. One of the variables is called the independent variable and the other is called the dependent variable.
-: Regression Lines :-
If there is some correlation between the variables, the points of the scatter diagram are concentrated around a straight line. This line is called the regression line.
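A line is drawn through these points which passes through the maximum number of points, and the distances from the line of the points through which it does not pass should be minimum.

Note :- The line of regression has approximately an equal number of points on both sides.
[Figure: scatter diagram with the regression line drawn through the points; X on the horizontal axis, Y on the vertical axis.]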

-: Y on X Line :-
If we minimize the distances parallel to the Y-axis, we get the regression line of ‘Y on X’.
Form : Y = a + bX
Y → dependent variable
X → independent variable
[Figure: the ‘Y on X’ line, with X on the horizontal axis and Y on the vertical axis.]

-: X on Y Line :-
If we minimize the distances parallel to the X-axis, we get the regression line of ‘X on Y’.
Form : X = a + bY
X → dependent variable
Y → independent variable
[Figure: the ‘X on Y’ line, with X on the horizontal axis and Y on the vertical axis.]

-: Equation of Regression Line of ‘Y on X’ :-

$$(Y - \bar{Y}) = r \, \frac{\sigma_y}{\sigma_x} \, (X - \bar{X}) \qquad (*)$$

or

$$(Y - \bar{Y}) = b_{yx} \, (X - \bar{X}) \qquad (**)$$

Here $b_{yx} \left( = r \, \frac{\sigma_y}{\sigma_x} \right)$ is called the regression coefficient of ‘Y on X’.
Note :- This line gives the best estimated value of ‘Y’ for a given value of ‘X’, i.e., we can use this equation when ‘X’ is given & ‘Y’ is to be determined.

-: Equation of Regression Line of ‘X on Y’ :-

$$(X - \bar{X}) = r \, \frac{\sigma_x}{\sigma_y} \, (Y - \bar{Y}) \qquad (*)$$

or,

$$(X - \bar{X}) = b_{xy} \, (Y - \bar{Y}) \qquad (**)$$

Here, $b_{xy} \left( = r \, \frac{\sigma_x}{\sigma_y} \right)$ is called the regression coefficient of ‘X on Y’.
Note :- This line gives the best estimated value of ‘X’ for a given value of ‘Y’, i.e., we can use this equation when ‘Y’ is given & ‘X’ is to be determined.
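The two regression equations can be built directly from the means, the standard deviations and r. The following Python sketch is only an illustration under made-up figures; the function name `regression_lines` and all the numbers are hypothetical and do not come from the exercises.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Return b_yx, b_xy and two estimating functions built from
    (Y - mean_y) = b_yx (X - mean_x) and (X - mean_x) = b_xy (Y - mean_y)."""
    b_yx = r * sd_y / sd_x   # regression coefficient of Y on X
    b_xy = r * sd_x / sd_y   # regression coefficient of X on Y

    def y_on_x(x):
        # Use when X is given and Y is to be determined.
        return mean_y + b_yx * (x - mean_x)

    def x_on_y(y):
        # Use when Y is given and X is to be determined.
        return mean_x + b_xy * (y - mean_y)

    return b_yx, b_xy, y_on_x, x_on_y


# Made-up summary figures for illustration.
b_yx, b_xy, y_on_x, x_on_y = regression_lines(
    mean_x=30, mean_y=50, sd_x=4, sd_y=8, r=0.75)
print(b_yx, b_xy)    # 1.5 0.375
print(y_on_x(34))    # 56.0
print(x_on_y(58))    # 33.0
```

Note that each line is used only in its own direction: the ‘Y on X’ line estimates Y from X, and the ‘X on Y’ line estimates X from Y.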
-: Properties of Regression Lines :-
Y on X : $(Y - \bar{Y}) = r \, \frac{\sigma_y}{\sigma_x} \, (X - \bar{X})$
X on Y : $(X - \bar{X}) = r \, \frac{\sigma_x}{\sigma_y} \, (Y - \bar{Y})$
(1) Both regression lines pass through $(\bar{X}, \bar{Y})$. Thus $(\bar{X}, \bar{Y})$ is the point of intersection of the two regression lines.
[Figure: the ‘Y on X’ line and the ‘X on Y’ line intersecting at $(\bar{X}, \bar{Y})$.]
(2) The angle between the regression lines depends on the value of ‘r’. As the value of ‘r’ increases from ‘0’ to ‘1’, the angle between the two regression lines diminishes from 90° to 0°.

(3) If “r = 0” (i.e., no correlation)
Y on X : $Y - \bar{Y} = 0$ or, $Y = \bar{Y}$
X on Y : $X - \bar{X} = 0$ or, $X = \bar{X}$
Both lines will make an angle of 90° with each other.
[Figure: the horizontal line $Y = \bar{Y}$ and the vertical line $X = \bar{X}$ meeting at right angles at $(\bar{X}, \bar{Y})$.]
(4) If “r = ± 1”, both lines become identical.
[Figure: the coincident regression lines (i) when r = +1 and (ii) when r = –1.]

-: Properties of regression Coefficients :-

$$b_{yx} = r \, \frac{\sigma_y}{\sigma_x} \qquad b_{xy} = r \, \frac{\sigma_x}{\sigma_y}$$

(1) $b_{yx} \cdot b_{xy} = r \, \frac{\sigma_y}{\sigma_x} \cdot r \, \frac{\sigma_x}{\sigma_y} = r^2$
i.e., $r = \pm \sqrt{b_{yx} \cdot b_{xy}}$ ⇒ the coefficient of correlation is the G.M. of the two regression coefficients.

(2) The coefficient of correlation & the two regression coefficients (i.e., r, byx & bxy) all have the same sign.
(3) Regression coefficients are independent of change of origin but not of scale.
(4) If one regression coefficient is greater than one in magnitude, then the other regression coefficient must be less than one in magnitude, since their product r² cannot exceed 1.
(5) Since the A.M. of two positive numbers is always greater than or equal to their G.M. (A.M. ≥ G.M.), the A.M. of bYX & bXY is equal to or greater than the coefficient of correlation (in magnitude, when both coefficients are negative).
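Properties (1), (2) and (4), together with the fact that both lines pass through the means, allow the means and r to be recovered from a pair of regression equations. The sketch below is one hedged way to do this in Python; the function name, the input format (p, q, s) for the line p·x + q·y = s, and the example equations are all assumptions made for illustration, and every coefficient is assumed to be non-zero.

```python
from math import sqrt


def analyse_regression_pair(line1, line2):
    """Each line is a tuple (p, q, s) meaning p*x + q*y = s.
    Returns (mean_x, mean_y, b_yx, b_xy, r)."""
    (p1, q1, s1), (p2, q2, s2) = line1, line2

    # The lines intersect at the means: solve the pair by Cramer's rule.
    det = p1 * q2 - p2 * q1
    mean_x = (s1 * q2 - s2 * q1) / det
    mean_y = (p1 * s2 - p2 * s1) / det

    # Try each line as 'Y on X'; the valid choice has 0 <= b_yx * b_xy <= 1.
    for y_line, x_line in ((line1, line2), (line2, line1)):
        b_yx = -y_line[0] / y_line[1]   # slope of y in terms of x
        b_xy = -x_line[1] / x_line[0]   # slope of x in terms of y
        product = b_yx * b_xy
        if 0 <= product <= 1:
            r = sqrt(product)
            if b_yx < 0:                # r takes the sign of the coefficients
                r = -r
            return mean_x, mean_y, b_yx, b_xy, r
    raise ValueError("these cannot be a valid pair of regression lines")


# Made-up equations: 4x - 5y = -10 and 20x - 9y = 110.
print(analyse_regression_pair((4, -5, -10), (20, -9, 110)))
```

For these made-up equations the means come out as (10, 10), with byx = 0·8, bxy = 0·45 and r = 0·6 up to floating-point rounding. If neither assignment gives a product between 0 and 1 (for example, when the two coefficients have opposite signs), the equations cannot both be regression lines.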
Practical Problems
(1) You are given variance of X = 9. The regression equations are 8x – 10y + 66 = 0 and 40x – 18y = 214. Find (i) the average values of ‘X’ & ‘Y’ (ii) the correlation coefficient between the two variables (iii) the standard deviation of Y.
(2) The correlation coefficient between supply (y) & price (x) of a commodity is 0·60. If σx = 1·50 ; σy = 2·00 ; $\bar{X}$ = 10 & $\bar{Y}$ = 20, find the equations of the regression lines.
(3) The lines of regression of y on x & x on y are respectively y = x + 5 & 16x = 9y – 94. Find the variance of x, if the variance of y is 16. Also find the covariance of x & y.
(4) For a bivariate data $\bar{X}$ = 20 ; $\bar{Y}$ = 45 ; byx = 4 & bxy = 1/9, find (a) r (b) σx if σy = 12 (c) write down the two regression lines.
(5) A student obtained the two regression lines as 2x – 5y = 7 & 3x + 2y = 8. Do you agree with him ?
(6) Find $\bar{X}$, $\bar{Y}$ & r for 2x – 3y = 0 & 4y – 5x = 8.
(7) The two regression lines are given as y = 0·5x + 25 & x = 0·4y + 22. Calculate (a) $\bar{X}$ & $\bar{Y}$ ; (b) the most likely value of x for y = 8.
(8) The two regression lines are given as 3X + 2Y = 26 & 6X + Y = 31. Calculate (a) r (b) σy, if variance of X = 25.
(9) The equations of two regression lines in a correlation analysis are as follows : 3x + 2y = 26 & 6x + y = 31. A student obtained the mean values $\bar{X}$ = 7 ; $\bar{Y}$ = 4 and the value of r = 0·5. Do you agree with him ?
(10) Given, variance of X = 25 ; the regression equations are 5x – y = 22 & 64x – 45y = 24. Find (a) $\bar{X}$ & $\bar{Y}$ ; (b) σy (c) r.
1983 – Nov 8(d) Regression equations of two variables X and Y are as follows:
3X + 2Y – 26 = 0 6X + Y – 31 = 0
Find – (i) the mean of X, (ii) the regression coefficient of X on Y, (iii) the coefficient of correlation between X and Y,
(iv) the most probable value of Y when X = 5.
1987 – Nov 8(b) Given is the following information :-
                        X (Rs)    Y (Rs)
Arithmetic average        6         8
Standard deviation        5        40/3
Coefficient of correlation between X and Y = 8/15. Find (i) the regression coefficient of Y on X ; (ii) the regression equation of X on Y ; (iii) the most likely value of Y when X = 100 Rupees.

1993 – June 14(a) (iii) If the regression coefficient of the X on Y is – 1/6 and that of Y on X is –3/2, what is the value of
the correlation coefficient between X and Y.
1993 – Dec 9(b) Compute the two regression equations on the basis of the following information:-
X Y
Mean 40 45
Standard deviation 10 9
Karl Pearson’s correlation coefficient between X and Y = 0·50. Also estimate the value of Y when X = 48 using the appropriate regression equation.
