
Correlation and Regression: by Tushar Bhatt

This document discusses correlation and different types of correlation. It begins by defining correlation as a statistical measure of the strength of association between two or more variables. Three types of correlation are discussed: 1. Type 1 correlation can be positive or negative. Positive correlation means the variables change in the same direction, while negative correlation means they change in opposite directions. 2. Type 2 correlation can be simple, involving two variables, or multiple, involving three or more variables. 3. Type 3 correlation can be linear, where a constant ratio exists between the changes in the variables, or non-linear. The document then discusses various methods to measure correlation, including Pearson's coefficient, Spearman's rank correlation, and scatter diagrams.

Uploaded by

chirag sabhaya
Copyright © All Rights Reserved

Correlation and Regression

By
Tushar Bhatt
[M.Sc. (Maths), M.Phil. (Maths), M.Phil. (Stat.), M.A. (Edu.), P.G.D.C.A.]
Assistant Professor in Mathematics,
Atmiya University,
Rajkot
• Meaning of Correlation
• The prefix "co-" means "together", so correlation (co-relation) is a relation between two variables (like X and Y).
• Correlation is a statistical method that is commonly used to compare two or more variables.
• For example: income and expenditure, price and demand, etc.
• Definition of Correlation
• Correlation is a statistical measure of the degree (strength) of association between two or more variables.
• Types of Correlation
• There are three types of correlation, as follows:
1. Type – 1 correlation
2. Type – 2 correlation
3. Type – 3 correlation
• Type – 1 correlation: positive correlation or negative correlation

• Positive Correlation
• The correlation is said to be positive if the values of the two variables change in the same direction.
• In other words, as X increases, Y increases; similarly, as X decreases, Y decreases.
• For example: water consumption and temperature.

• Negative Correlation
• The correlation is said to be negative if the values of the two variables change in opposite directions.
• In other words, as X increases, Y decreases; similarly, as X decreases, Y increases.
• For example: alcohol consumption and driving ability.
• Type – 2 correlation: simple correlation or multiple correlation
• Simple Correlation
• In a simple correlation problem, only two variables are studied.

• Multiple Correlation
• In a multiple correlation problem, three or more variables are studied.
• Partial Correlation
• In a multiple correlation problem, when only two variables are considered and the other variables are kept constant, it is known as partial correlation.

• Total Correlation
• Total correlation is based on all the relevant variables, which is normally not feasible.
• Type – 3 correlation: linear correlation or non-linear correlation
• Linear Correlation
• A correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other.
• The graph of variables having a linear relationship forms a straight line.
• For example:

X 1 2 3 4 5
Y 5 7 9 11 13

• Here Y = 3 + 2X: every increase of 1 in X increases Y by 2.
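The constant-ratio definition can be checked mechanically. A minimal Python sketch, assuming the data are two equal-length lists with distinct, increasing X values; the helper name `constant_ratio` is hypothetical:

```python
def constant_ratio(xs, ys, tol=1e-12):
    """Return (is_linear, ratio): is_linear is True when every change in Y,
    divided by the matching change in X, gives the same ratio."""
    ratios = [(y2 - y1) / (x2 - x1)
              for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])]
    is_linear = all(abs(r - ratios[0]) <= tol for r in ratios)
    return is_linear, ratios[0]


xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]
linear, slope = constant_ratio(xs, ys)
intercept = ys[0] - slope * xs[0]   # Y = intercept + slope * X
print(linear, slope, intercept)     # True 2.0 3.0
```

A non-linear series such as Y = X² (1, 4, 9, 16) gives changing ratios 3, 5, 7 and is reported as not linear.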


• Non – Linear Correlation
• The correlation is non-linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
• The methods to measure correlation
• There are four methods to measure correlation:
1. Karl Pearson's coefficient of correlation method
2. Coefficient of correlation for bivariate grouped data method
3. Spearman's rank correlation method
4. Scatter diagram method
• The methods to measure correlation
Karl Pearson's coefficient of correlation method: (a) Direct method, (b) Short-cut method
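• Definition: Covariance
• Karl Pearson's coefficient (r) of correlation method
Direct method (frequency is not given)
Case 1: If the means $\bar{X}$ and $\bar{Y}$ are integers, then

$$ r = \frac{\operatorname{cov}(X, Y)}{\sigma_x\,\sigma_y} $$

where

$$ \operatorname{cov}(X, Y) = \frac{\sum (x_i - \bar{X})(y_i - \bar{Y})}{n}, \qquad n = \text{number of observations} $$
$$ \sigma_x = \sqrt{\frac{\sum (x_i - \bar{X})^2}{n}} = \text{standard deviation of } X $$
$$ \sigma_y = \sqrt{\frac{\sum (y_i - \bar{Y})^2}{n}} = \text{standard deviation of } Y $$

$\bar{X}$ = mean of the x-series, $\bar{Y}$ = mean of the y-series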
• Karl Pearson's coefficient (r) of correlation method
Short-cut method (frequency is not given)
Case 2: If either $\bar{X}$ or $\bar{Y}$ is not an integer, then

$$ r = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{n}}{\sqrt{\sum dx^2 - \dfrac{(\sum dx)^2}{n}}\;\sqrt{\sum dy^2 - \dfrac{(\sum dy)^2}{n}}} $$

where dx = x − A, A is the assumed mean of the x-series, and dy = y − B, B is the assumed mean of the y-series.
• Examples
Ex-1 : Find the correlation coefficient from the following tabular data :

X 1 2 3 4 5 6 7
Y 6 8 11 9 12 10 14
Ans : 0.84

Ex-2 : Calculate Karl Pearson's coefficient of correlation between advertisement cost and sales as per the data given below:

Advertisement cost   39 65 62 90 82 75 25 98 36 78
Sales                47 53 58 86 62 68 60 91 51 84
Ans : 0.7807
• Examples
Ex-3 : Find the correlation coefficient from the following tabular data :
X 1 2 3 4 5 6 7 8 9 10
Y 46 42 38 34 30 26 22 18 14 10

Ans : -1 (every point lies exactly on the line Y = 50 - 4X, so the correlation is perfect and negative)
• Ex-4: Calculate Pearson’s coefficient of correlation
from the following taking 100 and 50 as the assumed
average of x-series and y-series respectively:
X 104 111 104 114 118 117 105 108 106 100 104 105
Y 57 55 47 45 45 50 64 63 66 62 69 61

Ans : -0.67
• Coefficient (r) of correlation for bivariate grouped data method
• In the case of a bivariate grouped frequency distribution, the coefficient of correlation is given by

$$ r = \frac{\sum f u v - \dfrac{(\sum f u)(\sum f v)}{n}}{\sqrt{\sum f u^2 - \dfrac{(\sum f u)^2}{n}}\;\sqrt{\sum f v^2 - \dfrac{(\sum f v)^2}{n}}} $$

where

$$ u = \frac{X - A}{c}, \quad A = \text{assumed mean of the } x\text{-series}, \; c = \text{length of a class interval} $$
$$ v = \frac{Y - B}{d}, \quad B = \text{assumed mean of the } y\text{-series}, \; d = \text{length of a class interval} $$

Here X and Y are the mid-values of the classes and n = Σf is the total frequency.
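A minimal Python sketch of the grouped-data formula, assuming the two-way table is given as a list of rows (y-classes) of cell frequencies, with "-" cells entered as 0; the function name `pearson_grouped` and its argument order are hypothetical. The data are Ex-5, with sales as X and profit as Y:

```python
import math


def pearson_grouped(freq, x_mids, y_mids, A, c, B, d):
    """r for a two-way frequency table.

    freq[i][j] is the frequency in row i (y-class) and column j (x-class);
    u = (X - A) / c and v = (Y - B) / d are step deviations of the mid-values.
    """
    u = [(x - A) / c for x in x_mids]
    v = [(y - B) / d for y in y_mids]
    n = s_fu = s_fv = s_fu2 = s_fv2 = s_fuv = 0
    for i, row in enumerate(freq):
        for j, f in enumerate(row):
            n += f
            s_fu += f * u[j]
            s_fv += f * v[i]
            s_fu2 += f * u[j] ** 2
            s_fv2 += f * v[i] ** 2
            s_fuv += f * u[j] * v[i]
    num = s_fuv - s_fu * s_fv / n
    den = math.sqrt(s_fu2 - s_fu ** 2 / n) * math.sqrt(s_fv2 - s_fv ** 2 / n)
    return num / den


# Ex-5: rows are profit classes 50-55 ... 65-70, columns are sales classes 80-90 ... 120-130
table = [
    [1, 3, 7, 5, 2],
    [2, 4, 10, 7, 4],
    [1, 5, 12, 10, 7],
    [0, 3, 8, 6, 3],
]
sales_mids = [85, 95, 105, 115, 125]
profit_mids = [52.5, 57.5, 62.5, 67.5]
print(round(pearson_grouped(table, sales_mids, profit_mids, 105, 10, 57.5, 5), 4))
```

This gives about 0.0945, which agrees with the slide's 0.0946 up to rounding of intermediate steps.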
• Examples
Ex-5 : Find the correlation coefficient between the grouped frequency distribution of two variables (Profit and Sales) given in the form of a two-way frequency table :

                          Sales (in rupees)
Profit (Rs)   80-90   90-100   100-110   110-120   120-130   Total
50-55           1       3         7         5         2        18
55-60           2       4        10         7         4        27
60-65           1       5        12        10         7        35
65-70           -       3         8         6         3        20
Total           4      15        37        28        16       100

Ans : 0.0946
• Examples
Ex-6 : Find the correlation coefficient between the ages of husbands and the ages of wives given in the form of a two-way frequency table :

                          Ages of husbands (in years)
Ages of wives (years)   20-25   25-30   30-35   35-40   Total
15-20                     20      10       3       2      35
20-25                      4      28       6       4      42
25-30                      -       5      11       -      16
30-35                      -       -       2       -       2
35-40                      -       -       -       -       0
Total                     24      43      22       6      95

Ans : 0.47 (approx.)
• Spearman's Rank Correlation Method
• The methods discussed in the previous section depend on the magnitudes of the variables.
• But there are situations where measuring the magnitude of a variable is not possible; then we use Spearman's rank correlation method.
• For example, we cannot measure beauty and intelligence quantitatively, but it is possible to rank individuals in order.
• Charles Edward Spearman's formula for the rank correlation coefficient R is as follows:

$$ R = 1 - \frac{6 \sum d^2}{n^3 - n} $$

where n = number of individuals in each series and d = the difference between the ranks of the two series.
• Examples
Ex-7 : Calculate the rank correlation coefficient if two judges in a beauty contest ranked the entries as follows:
Judge X 1 2 3 4 5
Judge Y 5 4 3 2 1

Ans : -1
• Ex-8: Ten students got the following percentages of marks in mathematics and statistics. Evaluate the rank correlation between them.

Roll No.          1   2   3   4   5   6   7   8   9  10
Marks in Maths   78  36  98  25  75  82  90  62  65  39
Marks in Stat.   84  51  91  60  68  62  86  58  53  47

Ans : 0.8181
Solution-7

$$ R = 1 - \frac{6 \sum d^2}{n^3 - n} = 1 - \frac{6 \times 40}{125 - 5} = 1 - \frac{240}{120} = 1 - 2 = -1 $$

Solution-8

$$ R = 1 - \frac{6 \sum d^2}{n^3 - n} = 1 - \frac{6 \times 30}{1000 - 10} = 1 - \frac{180}{990} = 1 - 0.1818 \approx 0.818 $$
• Scatter Diagram Method
• In this method, we first plot the observations in the XY-plane.
• X (independent variable) is taken along the horizontal axis.
• Y (dependent variable) is taken along the vertical axis.
[Figure: scatter diagrams classifying correlation as linear correlation (perfect positive, strong positive, weak positive, perfect negative, strong negative, weak negative), no correlation, and non-linear correlation]
Interpretation of correlation coefficient
Regression
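• Equations of Regression Lines

(a) Equation of regression line Y on X

$$ Y - \bar{Y} = b_{yx}\,(X - \bar{X}) $$

where
1. $b_{yx} = \dfrac{\operatorname{cov}(X, Y)}{\sigma_x^2}$
2. $\operatorname{cov}(X, Y) = \dfrac{\sum XY}{n} - \left(\dfrac{\sum X}{n}\right)\left(\dfrac{\sum Y}{n}\right)$
3. $\sigma_x^2 = \dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2$
4. $n$ = total number of observations
5. $b_{yx}$ = regression coefficient of the regression line Y on X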
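• Lines of Regression

(b) Equation of regression line X on Y

$$ X - \bar{X} = b_{xy}\,(Y - \bar{Y}) $$

where
1. $b_{xy} = \dfrac{\operatorname{cov}(X, Y)}{\sigma_y^2}$
2. $\operatorname{cov}(X, Y) = \dfrac{\sum XY}{n} - \left(\dfrac{\sum X}{n}\right)\left(\dfrac{\sum Y}{n}\right)$
3. $\sigma_y^2 = \dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2$
4. $n$ = total number of observations
5. $b_{xy}$ = regression coefficient of the regression line X on Y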
• Regression Equations
• The algebraic expressions of the regression lines are called regression equations.
• Since there are two regression lines, there are two regression equations.
• Using the previous method, we obtain the regression equation of Y on X in the form Y = a + bX, and that of X on Y in the form X = a + bY (in general, the constants a and b are different for the two lines).
• The values of a and b depend on the means, the standard deviations and the coefficient of correlation between the two variables.
• Regression equation Y on X

$$ Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) $$

where
1. $\sigma_x = \sqrt{\dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2}$
2. $r$ = correlation coefficient between X and Y
3. $\sigma_y = \sqrt{\dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2}$
4. $n$ = total number of observations (or $\sum f$)
• Regression equation X on Y

$$ X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}) $$

where
1. $\sigma_x = \sqrt{\dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2}$
2. $r$ = correlation coefficient between X and Y
3. $\sigma_y = \sqrt{\dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2}$
4. $n$ = total number of observations (or $\sum f$)
Ex-3 From the following data, calculate the two equations of the lines of regression.

                      X       Y
Mean                  60      67.5
Standard deviation    15      13.5

where the correlation coefficient r = 0.5.
Ans : Y = 0.45X + 40.5 (Y on X); X = 0.556Y + 22.47 (X on Y)

Ex-4 From the following data, calculate the two equations of the lines of regression.

                      X       Y
Mean                  508.4   23.7
Standard deviation    36.8    4.6

where the correlation coefficient r = 0.52.
Ans : Y = 0.065X - 9.35 (Y on X); X = 4.16Y + 409.81 (X on Y)
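Given only the means, standard deviations and r, both lines follow from b_yx = r·σy/σx and b_xy = r·σx/σy. A minimal Python sketch, assuming the lines are reported as Y = a + bX and X = a' + b'Y; the function name `regression_from_summary` is hypothetical. It reproduces Ex-3 and Ex-4:

```python
def regression_from_summary(x_bar, y_bar, sd_x, sd_y, r):
    """Return ((a_yx, b_yx), (a_xy, b_xy)) for Y = a_yx + b_yx*X and X = a_xy + b_xy*Y."""
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y
    a_yx = y_bar - b_yx * x_bar  # intercept of the Y-on-X line
    a_xy = x_bar - b_xy * y_bar  # intercept of the X-on-Y line
    return (a_yx, b_yx), (a_xy, b_xy)


for label, summary in [("Ex-3", (60, 67.5, 15, 13.5, 0.5)),
                       ("Ex-4", (508.4, 23.7, 36.8, 4.6, 0.52))]:
    (a1, b1), (a2, b2) = regression_from_summary(*summary)
    print(f"{label}: Y = {b1:.4f}X {a1:+.2f},  X = {b2:.4f}Y {a2:+.2f}")
```

For Ex-4 this gives Y ≈ 0.065X - 9.35 and X ≈ 4.16Y + 409.81. For Ex-3 the exact intercept of the X-on-Y line is 22.5; the value 22.47 arises from rounding b_xy to 0.556 before substituting.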
• Difference between Correlation and Regression
1. Describing relationships
• Correlation describes the degree to which two variables are related.
• Regression gives a method for finding the relationship between two variables.
2. Making predictions
• Correlation merely describes how well two variables are related. Analysing the correlation between two variables does not improve the accuracy with which the value of the dependent variable can be predicted for a given value of the independent variable.
• Regression allows us to predict the value of the dependent variable for a given value of the independent variable more accurately.
3. Dependence between variables
• In analysing correlation, it does not matter which variable is dependent and which is independent.
• In analysing regression, it is necessary to distinguish between the dependent and the independent variable.
Assignment

Q-1 Do as directed (solve Ex-1 to Ex-5 using Karl Pearson's method).

Ex-1 Find the correlation coefficient between the serum cholesterol level and the diastolic blood pressure of 10 randomly selected persons.

Person                1    2    3    4    5    6    7    8    9   10
Cholesterol (X)     307  259  341  317  274  416  267  320  274  336
Diastolic B.P. (Y)   80   75   90   74   75  110   70   85   88   78
Ans. = 0.809

Ex-2 Find the correlation coefficient between Intelligence Ratio (I.R.) and Emotional Ratio (E.R.) from the following data.

Student     1    2    3    4    5    6    7    8    9   10
I.R. (X)  105  104  102  101  100   99   98   96   93   92
E.R. (Y)  101  103  100   98   95   96  104   92   97   94
Ans. = 0.5963

Assignment

Ex-3 Find the correlation coefficient from the following data.
X  1100  1200  1300  1400  1500  1600  1700  1800  1900  2000
Y  0.30  0.29  0.29  0.25  0.24  0.24  0.24  0.29  0.18  0.15
Ans. = -0.79

Ex-4 Find the correlation coefficient from the following data.
X   1   2   3   4   5   6   7   8   9  10
Y  10  12  16  28  25  36  41  49  40  50
Ans. = 0.958

Ex-5 Find the correlation coefficient from the following data.
X   78   89   97   69   59   79   68   61
Y  125  137  156  112  107  138  123  110
Ans. = 0.949
Assignment
Ex-6 : Find the correlation coefficient between the marks of a class test in the subjects maths and science, given in the form of a two-way frequency table :

          10-15   15-20   20-25   25-30   Total
40-50       0       1       1       1       3
50-60       3       3       0       1       7
60-70       3       3       3       1      10
70-80       1       0       1       1       3
80-90       0       3       3       1       7
Total       7      10       8       5      30

Ans : 0.1413
Assignment
Ex-7 : Find the correlation coefficient between the marks of the annual exam in the subjects accounts and statistics, given in the form of a two-way frequency table :

                        Marks in accounts
Marks in stat.   60-65   65-70   70-75   75-80   Total
50-60              5       5       5       5      20
60-70              0       5       5      10      20
70-80              8      10       0      22      40
80-90              3       3       3       3      12
90-100             3       3       0       2       8
Total             19      26      13      42     100

Ans : 0.45
Assignment
Q-2 Two judges in a beauty contest rank the 12 contestants as follows :
X   1   2   3   4   5   6   7   8   9  10  11  12
Y  12   9   6  10   3   5   4   7   8   2  11   1
What degree of agreement is there between the judges?
Ans : -0.454

Q-3 Nine students secured the following percentages of marks in mathematics and chemistry.
Roll No.          1   2   3   4   5   6   7   8   9
Marks in Maths   78  36  98  25  75  82  90  62  65
Marks in Chem.   84  51  91  60  68  62  86  58  53

Find the rank correlation coefficient and comment on its value.
Ans : 0.83
Assignment
Q-4 What is correlation? How will you measure it?
Q-5 Define the coefficient of correlation. Explain how you will interpret the value of the coefficient of correlation.
Q-6 What is a scatter diagram? To what extent does it help in finding the correlation between two variables? Or: Explain the scatter diagram method.
Q-7 What is rank correlation?
Q-8 Explain the following terms with an example.
(i) Positive and negative correlation
(ii) Scatter diagram
(iii) Correlation coefficient
(iv) Total correlation
(v) Partial correlation

Q-9 Explain the term regression and state the difference between correlation and regression.
Q-10 What are regression coefficients? State their properties.
Q-11 Explain the terms lines of regression and regression equations.
Mathematicians are born not made
