Correlation and Regression: by Tushar Bhatt
By
Tushar Bhatt
[M.Sc(Maths), M.Phil(Maths),
M.Phil(Stat.),M.A(Edu.),P.G.D.C.A]
Assistant Professor in Mathematics,
Atmiya University,
Rajkot
Meaning of Correlation
The prefix "co" means "together"; correlation is therefore a mutual relation
between two variables (such as X and Y).
1. Type – 1 correlation
2. Type – 2 correlation
3. Type – 3 correlation
Type – 1 correlation
Type – 2 correlation
1. Simple correlation
2. Multiple correlation
Simple Correlation
In a simple correlation problem, only two variables are studied.
Multiple Correlation
In a multiple correlation problem, three or more variables are studied.
Partial Correlation
In a multiple correlation problem, when the relation between two
variables is studied while the other variables are held constant,
it is known as partial correlation.
Total Correlation
Total correlation is based on all the relevant variables,
which is normally not feasible.
Type – 3 correlation
1. Linear correlation
2. Non-linear correlation
Linear Correlation
A correlation is said to be linear when the amount of
change in one variable tends to bear a constant ratio to
the amount of change in the other.
The graph of the variables having a linear relationship
will form a straight line.
For example:
X 1 2 3 4 5
Y 5 7 9 11 13
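The "constant ratio" condition can be checked directly on the table above. The following is a minimal sketch, assuming the five (X, Y) pairs shown; the variable names `ratios` and `is_linear` are illustrative only.

```python
# Data from the example table above.
x = [1, 2, 3, 4, 5]
y = [5, 7, 9, 11, 13]

# Ratio of successive changes: (change in Y) / (change in X).
ratios = [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]

# The relationship is linear when every ratio is the same.
is_linear = len(set(ratios)) == 1

print(ratios)     # [2.0, 2.0, 2.0, 2.0]
print(is_linear)  # True
```

Here every unit change in X changes Y by 2, so the plotted points fall on a straight line.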
Karl Pearson's coefficient of correlation can be computed by two methods:
1. Direct method
2. Short-cut method
Definition : Covariance
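Karl Pearson's coefficient (r) of correlation method
Direct method (Frequency is not given)
Case -1: If $\bar{X}$ and $\bar{Y}$ are integers then
$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_x \, \sigma_y}$$
where
$$\operatorname{cov}(X, Y) = \frac{\sum (x_i - \bar{X})(y_i - \bar{Y})}{n}, \quad n = \text{number of observations}$$
$$\sigma_x = \sqrt{\frac{\sum (x_i - \bar{X})^2}{n}} = \text{standard deviation of } X$$
$$\sigma_y = \sqrt{\frac{\sum (y_i - \bar{Y})^2}{n}} = \text{standard deviation of } Y$$
$\bar{X}$ = mean of the x series
$\bar{Y}$ = mean of the y series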
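Karl Pearson's coefficient (r) of correlation method
Short-cut Method (Frequency is not given)
Case -2: If either $\bar{X}$ or $\bar{Y}$ is not an integer, then
$$r = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{n}} \; \sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{n}}}$$
where $d_x = X - A$ and $d_y = Y - B$ are the deviations from the assumed
means $A$ and $B$ of the x series and the y series.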
Ex-1 :
X 1 2 3 4 5 6 7
Y 6 8 11 9 12 10 14
Ans : 0.85
Ex-2 :
Add. Cost 39 65 62 90 82 75 25 98 36 78
Sales     47 53 58 86 62 68 60 91 51 84
Ans : 0.7804
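The direct method can be sketched in a few lines of Python. This is an illustrative sketch rather than a prescribed implementation: the function name `pearson_direct` is an assumption, and the data are the seven (X, Y) pairs listed above.

```python
from math import sqrt

def pearson_direct(xs, ys):
    """Direct method: r = cov(X, Y) / (sigma_x * sigma_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Covariance and standard deviations, all with divisor n.
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sigma_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)

x = [1, 2, 3, 4, 5, 6, 7]
y = [6, 8, 11, 9, 12, 10, 14]
r = pearson_direct(x, y)
print(round(r, 4))  # 0.8457
```

For these data the means are 4 and 10, the sum of products of deviations is 29, and the sums of squared deviations are 28 and 42, so r = 29 / sqrt(28 × 42).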
Examples
Ex-3 : Find the correlation coefficient from the
following tabular data :
X 1 2 3 4 5 6 7 8 9 10
Y 46 42 38 34 30 26 22 18 14 10
Ans : -1 (the points lie exactly on the line Y = 50 - 4X)
• Ex-4: Calculate Pearson’s coefficient of correlation
from the following taking 100 and 50 as the assumed
average of x-series and y-series respectively:
X 104 111 104 114 118 117 105 108 106 100 104 105
Y 57 55 47 45 45 50 64 63 66 62 69 61
Ans : -0.67
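The short-cut computation for Ex-4 can be sketched as below, taking 100 and 50 as the assumed means as the exercise states. The function name `pearson_shortcut` and its parameter names are illustrative assumptions.

```python
from math import sqrt

def pearson_shortcut(xs, ys, a, b):
    """Short-cut method using deviations from assumed means a and b."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    sum_dx2 = sum(p * p for p in dx)
    sum_dy2 = sum(q * q for q in dy)
    num = sum_dxdy - sum_dx * sum_dy / n
    den = sqrt(sum_dx2 - sum_dx ** 2 / n) * sqrt(sum_dy2 - sum_dy ** 2 / n)
    return num / den

x = [104, 111, 104, 114, 118, 117, 105, 108, 106, 100, 104, 105]
y = [57, 55, 47, 45, 45, 50, 64, 63, 66, 62, 69, 61]
r = pearson_shortcut(x, y, a=100, b=50)
print(round(r, 2))  # -0.67
```

The intermediate sums are Σdx = 96, Σdy = 84, Σdx·dy = 312, Σdx² = 1128 and Σdy² = 1380, giving r = -360 / sqrt(360 × 792). The result does not depend on which assumed means are chosen.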
Coefficient (r) of correlation for Bivariate Grouped data method
In the case of a bivariate grouped frequency distribution, the
coefficient of correlation is given by
$$r = \frac{\sum fuv - \dfrac{\sum fu \sum fv}{n}}{\sqrt{\sum fu^2 - \dfrac{(\sum fu)^2}{n}} \; \sqrt{\sum fv^2 - \dfrac{(\sum fv)^2}{n}}}$$
where
$u = \dfrac{X - A}{c}$, $A$ is the assumed mean of the x series, $c$ is the length of a class interval,
$v = \dfrac{Y - B}{d}$, $B$ is the assumed mean of the y series, $d$ is the length of a class interval.
Examples
Ex-5 : Find the correlation coefficient between two variables (Profit
and Sales) whose grouped frequency distribution is given in the form
of a two-way frequency table :
                         Sales (in rupees)
Profit (Rs)   80-90   90-100   100-110   110-120   120-130   Total
50-55           1       3         7         5         2        18
55-60           2       4        10         7         4        27
60-65           1       5        12        10         7        35
65-70           -       3         8         6         3        20
Total           4      15        37        28        16       100
Ans : 0.0946
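The grouped-data formula can be applied to the Ex-5 table as in the sketch below. The function name `grouped_correlation` is illustrative, and the choices A = 105, c = 10 for sales and B = 57.5, d = 5 for profit are assumptions made here; any other assumed means give the same r.

```python
from math import sqrt

def grouped_correlation(freq, u, v):
    """freq[i][j]: frequency of row class i and column class j.
    u[j]: step deviation of column class j; v[i]: that of row class i."""
    n = sum(sum(row) for row in freq)
    cells = [(f, u[j], v[i])
             for i, row in enumerate(freq) for j, f in enumerate(row)]
    sum_fu = sum(f * a for f, a, _ in cells)
    sum_fv = sum(f * b for f, _, b in cells)
    sum_fu2 = sum(f * a * a for f, a, _ in cells)
    sum_fv2 = sum(f * b * b for f, _, b in cells)
    sum_fuv = sum(f * a * b for f, a, b in cells)
    num = sum_fuv - sum_fu * sum_fv / n
    den = sqrt(sum_fu2 - sum_fu ** 2 / n) * sqrt(sum_fv2 - sum_fv ** 2 / n)
    return num / den

# Rows: profit classes 50-55 ... 65-70; columns: sales classes 80-90 ... 120-130.
# A dash in the table is entered as 0.
freq = [
    [1, 3, 7, 5, 2],
    [2, 4, 10, 7, 4],
    [1, 5, 12, 10, 7],
    [0, 3, 8, 6, 3],
]
u = [-2, -1, 0, 1, 2]   # (mid-value of sales class - 105) / 10
v = [-1, 0, 1, 2]       # (mid-value of profit class - 57.5) / 5
r = grouped_correlation(freq, u, v)
print(round(r, 3))  # 0.095
```

With these step deviations Σfu = 37, Σfv = 57, Σfu² = 123, Σfv² = 133 and Σfuv = 31, which gives r of about 0.0945, in agreement with the stated answer to three decimal places.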
Examples
Ex-6 : Find the correlation coefficient between the
ages of husbands and the ages of wives given in the
form of a two-way frequency table :
                      Ages of husbands (in years)
Wives' ages (yr)   20-25   25-30   30-35   35-40   Total
15-20                20      10       3       2      35
20-25                 4      28       6       4      42
25-30                 -       5      11       -      16
30-35                 -       -       2       -       2
35-40                 -       -       -       -       0
Total                24      43      22       6      95
Ans : 0.61
Spearman's Rank Correlation Method
The methods discussed in the previous section depend on the
magnitude of the variables.
However, there are situations where the magnitude of a variable cannot
be measured; in such cases we use "Spearman's Rank correlation method".
For example, we cannot measure beauty or intelligence
quantitatively, but it is possible to rank individuals in order.
Charles Edward Spearman's formula for the rank correlation coefficient R is as
follows:
$$R = 1 - \frac{6 \sum d^2}{n^3 - n}$$
$n$ = number of individuals in each series
$d$ = the difference between the ranks of the two series
Examples
Ex-7 : Calculate the rank correlation coefficient if two
judges in a beauty contest ranked the entries as follows:
Judge X 1 2 3 4 5
Judge Y 5 4 3 2 1
Ans : -1
• Ex-8: Ten students got the following percentages of
marks in mathematics and statistics. Evaluate the rank
correlation between them.
Roll No.         1  2  3  4  5  6  7  8  9  10
Marks in Maths  78 36 98 25 75 82 90 62 65 39
Marks in Stat.  84 51 91 60 68 62 86 58 53 47
Ans : 0.8181
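Solution -7
$$R = 1 - \frac{6 \sum d^2}{n^3 - n} = 1 - \frac{6 \times 40}{125 - 5} = 1 - \frac{240}{120} = 1 - 2 = -1$$
Solution -8
$$R = 1 - \frac{6 \sum d^2}{n^3 - n} = 1 - \frac{6 \times 30}{1000 - 10} = 1 - \frac{180}{990} = 1 - 0.1818\ldots = 0.8181\ldots$$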
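Both solutions can be reproduced with a short sketch. The helper names `ranks_desc` and `spearman` are illustrative, and the ranking helper assumes there are no tied values, which holds for the data of Ex-8.

```python
def ranks_desc(values):
    """Rank 1 goes to the largest value; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(rank_x, rank_y):
    """R = 1 - 6 * sum(d^2) / (n^3 - n), d = difference of ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Ex-7: the judges' ranks are given directly.
r7 = spearman([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])

# Ex-8: marks are converted to ranks first.
maths = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
stats = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]
r8 = spearman(ranks_desc(maths), ranks_desc(stats))

print(r7)            # -1.0
print(round(r8, 4))  # 0.8182
```

For Ex-8 the rank differences are 1, 0, 0, 4, 1, -2, 0, 0, -2, -2, so Σd² = 30, as used in Solution -8.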
Scatter Diagram Method
In this method we first plot the observations in the XY-plane.
X - the independent variable, along the horizontal axis.
Y - the dependent variable, along the vertical axis.
[Figure: scatter diagrams showing perfect positive, strong positive, weak positive, perfect negative, strong negative and weak negative correlation]
Interpretation of correlation coefficient
Regression
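Equations of Regression Lines
Regression line of Y on X:
$$Y - \bar{Y} = b_{yx} (X - \bar{X})$$
where
1. $b_{yx} = \dfrac{\operatorname{cov}(X, Y)}{\sigma_x^2}$,
2. $\operatorname{cov}(X, Y) = \dfrac{\sum XY}{n} - \bar{X}\,\bar{Y}$,
3. $\sigma_x^2 = \dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2$,
4. $n$ = total number of observations,
5. $b_{yx}$ = regression coefficient of the regression line of Y on X.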
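Lines of Regression
Regression line of X on Y:
$$X - \bar{X} = b_{xy} (Y - \bar{Y})$$
where
1. $b_{xy} = \dfrac{\operatorname{cov}(X, Y)}{\sigma_y^2}$,
2. $\operatorname{cov}(X, Y) = \dfrac{\sum XY}{n} - \bar{X}\,\bar{Y}$,
3. $\sigma_y^2 = \dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2$,
4. $n$ = total number of observations,
5. $b_{xy}$ = regression coefficient of the regression line of X on Y.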
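The two regression coefficients can be computed from raw data as in the sketch below. The function name `regression_coefficients` is an illustrative assumption, and the data are the seven (X, Y) pairs used earlier in these notes for the correlation coefficient.

```python
def regression_coefficients(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy) using the formulas with divisor n."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    var_y = sum(y * y for y in ys) / n - y_bar ** 2
    return x_bar, y_bar, cov / var_x, cov / var_y

x = [1, 2, 3, 4, 5, 6, 7]
y = [6, 8, 11, 9, 12, 10, 14]
x_bar, y_bar, b_yx, b_xy = regression_coefficients(x, y)

# Y on X:  Y - y_bar = b_yx (X - x_bar)
# X on Y:  X - x_bar = b_xy (Y - y_bar)
print(x_bar, y_bar)                    # 4.0 10.0
print(round(b_yx, 4), round(b_xy, 4))  # 1.0357 0.6905
```

For these data the line of Y on X is Y - 10 = 1.0357 (X - 4) and the line of X on Y is X - 4 = 0.6905 (Y - 10); both pass through the point of means (4, 10).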
Regression Equations
The algebraic expressions of the regression lines are called
regression equations.
Since there are two regression lines, there are two
regression equations.
Using the previous method we obtain the regression
equation of Y on X as Y = a + bX and that of X on Y as
X = a + bY, where the constants a and b generally differ
between the two equations.
The values of "a" and "b" depend on the means, the
standard deviations and the coefficient of correlation
between the two variables.
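Regression equation Y on X
$$Y - \bar{Y} = r \, \frac{\sigma_y}{\sigma_x} \, (X - \bar{X})$$
where
1. $\sigma_x^2 = \dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2$,
2. $r$ = correlation coefficient between X and Y,
3. $\sigma_y^2 = \dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2$,
4. $n$ = total number of observations, or $\sum f$.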
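Regression equation X on Y
$$X - \bar{X} = r \, \frac{\sigma_x}{\sigma_y} \, (Y - \bar{Y})$$
where
1. $\sigma_x^2 = \dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2$,
2. $r$ = correlation coefficient between X and Y,
3. $\sigma_y^2 = \dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2$,
4. $n$ = total number of observations, or $\sum f$.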
Ex-3 From the following data calculate the two equations
of the lines of regression.
                     X      Y
Mean                 60     67.5
Standard Deviation   15     13.5
Where correlation coefficient r = 0.5.
Ans : Y = 0.45X + 40.5, X = 0.556Y + 22.47
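When only the means, standard deviations and r are given, as in Ex-3, both lines follow directly from the regression equations. The sketch below assumes illustrative function names `line_y_on_x` and `line_x_on_y`; each returns a (slope, intercept) pair.

```python
def line_y_on_x(x_bar, y_bar, sd_x, sd_y, r):
    """Y = intercept + slope * X, with slope = r * sd_y / sd_x."""
    slope = r * sd_y / sd_x
    return slope, y_bar - slope * x_bar

def line_x_on_y(x_bar, y_bar, sd_x, sd_y, r):
    """X = intercept + slope * Y, with slope = r * sd_x / sd_y."""
    slope = r * sd_x / sd_y
    return slope, x_bar - slope * y_bar

# Ex-3: means 60 and 67.5, standard deviations 15 and 13.5, r = 0.5.
b_yx, a_yx = line_y_on_x(60, 67.5, 15, 13.5, 0.5)
b_xy, a_xy = line_x_on_y(60, 67.5, 15, 13.5, 0.5)

print(round(b_yx, 3), round(a_yx, 2))  # 0.45 40.5
print(round(b_xy, 3), round(a_xy, 2))  # 0.556 22.5
```

The intercept of the X on Y line comes out as 22.5 when the slope 0.5556 is carried unrounded; rounding the slope to 0.556 before multiplying by 67.5 gives 22.47.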
Ex-4 From the following data calculate the two equations
of the lines of regression.
                     X       Y
Mean                 508.4   23.7
Standard Deviation   36.8    4.6
Where correlation coefficient r = 0.52.
Ans : Y = 0.065X - 9.35, X = 4.16Y + 409.81
Difference between correlation and Regression
1. Describing Relationships
Correlation describes the degree to which two variables are related.
Regression gives a method for finding the relationship between two
variables.
2. Making Predictions
Correlation merely describes how well two variables are related.
Analysing the correlation between two variables does not improve
the accuracy with which the value of the dependent variable could
be predicted for a given value of the independent variable.
Regression allows us to predict values of the dependent variable for
a given value of the independent variable more accurately.
3. Dependence Between Variables
In analysing correlation, it does not matter which variable is
independent and which is dependent.
In analysing regression, it is necessary to distinguish between the
dependent and the independent variable.
Assignment
Person                1   2   3   4   5   6   7   8   9  10
Cholesterol (X)     307 259 341 317 274 416 267 320 274 336
Diastolic B.P (Y)    80  75  90  74  75 110  70  85  88  78
Ex-2 Find the correlation coefficient between Intelligence Ratio (I.R) and
Emotional Ratio (E.R) from the following data. Ans. = 0.5963
Student 1 2 3 4 5 6 7 8 9 10
I.R(X) 105 104 102 101 100 99 98 96 93 92
Ex-3 Find the correlation coefficient from the following data. Ans. = -0.79
X 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000
Y 0.30 0.29 0.29 0.25 0.24 0.24 0.24 0.29 0.18 0.15
Ex-4 Find the correlation coefficient from the following data. Ans. = 0.958
X 1 2 3 4 5 6 7 8 9 10
Y 10 12 16 28 25 36 41 49 40 50
Ex-5 Find the correlation coefficient from the following data. Ans. = 0.949
X 78 89 97 69 59 79 68 61
Y 125 137 156 112 107 138 123 110
Ex-6 : Find the correlation coefficient between the
marks of a class test for the subjects maths and science
given in the form of a two-way frequency table :
                        Marks in maths
Marks in science   10-15   15-20   20-25   25-30   Total
40-50                0       1       1       1       3
50-60                3       3       0       1       7
60-70                3       3       3       1      10
70-80                1       0       1       1       3
80-90                0       3       3       1       7
Total                7      10       8       5      30
Ans : 0.1413
Ex-7 : Find the correlation coefficient between the
marks of the annual exam for the subjects Account and
Statistics given in the form of a two-way frequency
table :
                     Marks in account
Stat marks   60-65   65-70   70-75   75-80   Total
50-60          5       5       5       5      20
60-70          0       5       5      10      20
70-80          8      10       0      22      40
80-90          3       3       3       3      12
90-100         3       3       0       2       8
Total         19      26      13      42     100
Ans : 0.45
Q-2 Two judges in a beauty contest rank the 12
contestants as follows :
X 1 2 3 4 5 6 7 8 9 10 11 12
Y 12 9 6 10 3 5 4 7 8 2 11 1
What degree of agreement is there between the
judges?
Ans : -0.454
Q-3 Nine students secured the following
percentages of marks in mathematics and
chemistry
Roll No.        1  2  3  4  5  6  7  8  9
Marks in Maths 78 36 98 25 75 82 90 62 65
Marks in Chem. 84 51 91 60 68 62 86 58 53
Ans : 0.84
Q-9 Explain the term regression and state the difference between
correlation and regression.
Q-10 What are the regression coefficients? State their properties.
Q-11 Explain the terms Lines of regression and Regression equations.