Prob Stats Module 3

This document discusses correlation and regression analysis. It defines correlation as measuring the strength of the linear relationship between two variables. Key points include:
- Correlation can be measured graphically with a scatter plot or numerically with Pearson's correlation coefficient
- Regression finds the average relationship between variables using a regression line that best estimates the value of one variable based on the other
- Properties of correlation and regression include the correlation coefficient ranging from -1 to 1, regression lines passing through the mean points, and relationships between the correlation coefficient and regression slopes
- Examples are provided to calculate correlation coefficients and regression lines using sample data

Uploaded by

AMRIT RANJAN
Copyright
© All Rights Reserved

Correlation and Regression

S. Devi Yamini

Module - III 1 / 19
Correlation
Correlation measures the strength of the linear relationship between variables.

It can be assessed in two ways:
Graphical - Scatter plot
Numerical - Correlation coefficient (due to Karl Pearson), Rank correlation

Scatter Plot

[Figure: scatter plot]

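Correlation

Karl Pearson Correlation coefficient (Product moment coefficient)

\[ r_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma(X)\,\sigma(Y)} \]

\[ r_{XY} = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left(N \sum X^2 - \left[\sum X\right]^2\right)\left(N \sum Y^2 - \left[\sum Y\right]^2\right)}} \]
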
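Properties

1. $-1 \le r_{XY} \le 1$
   If $r_{XY} = -1$: perfect negative correlation
   If $r_{XY} = 1$: perfect positive correlation
   If $r_{XY} = 0$: uncorrelated (no linear relationship between X and Y)
2. The correlation coefficient is unchanged by a change of origin and scale:

\[ r_{XY} = r_{UV} = \frac{N \sum UV - \sum U \sum V}{\sqrt{\left(N \sum U^2 - \left[\sum U\right]^2\right)\left(N \sum V^2 - \left[\sum V\right]^2\right)}} \]

   where $U = \frac{X - a}{h}$ and $V = \frac{Y - b}{k}$, with $h, k > 0$.
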
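The change of origin and scale property can be checked numerically. The sketch below is only an illustration: it reuses the fathers' and sons' heights from Problem 1, the helper name `pearson_r` is hypothetical, and the shift and scale values `a`, `h`, `b`, `k` are arbitrary choices (with `h`, `k` positive).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient via the raw-sums formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Heights from Problem 1 of these slides.
xs = [65, 66, 67, 67, 68, 69, 70, 72]
ys = [67, 68, 65, 68, 72, 72, 69, 71]

# Arbitrary (assumed) change of origin and scale: U = (X - a)/h, V = (Y - b)/k.
a, h = 68, 2
b, k = 69, 5
us = [(x - a) / h for x in xs]
vs = [(y - b) / k for y in ys]

r_xy = pearson_r(xs, ys)
r_uv = pearson_r(us, vs)
print(f"r_XY = {r_xy:.4f}, r_UV = {r_uv:.4f}")
```

Shifting to values near the means keeps the sums small, which is why this substitution is convenient for hand calculation.
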
Regression

Regression is a mathematical measure of the average relationship between two or more variables.

Regression line
The line that gives the best estimate of the value of one variable for any specific value of the other variable.

Regression Line

[Figure: regression line]

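Lines of regression

Regression line of y on x
\[ y - \bar{y} = r_{xy} \frac{\sigma_y}{\sigma_x} (x - \bar{x}) \]

Regression line of x on y
\[ x - \bar{x} = r_{xy} \frac{\sigma_x}{\sigma_y} (y - \bar{y}) \]

Regression coefficients
\[ b_{yx} = r_{xy} \frac{\sigma_y}{\sigma_x}, \qquad b_{xy} = r_{xy} \frac{\sigma_x}{\sigma_y} \]
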
Properties
1. $r_{xy}^2 = b_{xy} \cdot b_{yx}$
2. $r_{xy}$, $b_{xy}$, $b_{yx}$ will have the same sign
3. Both the lines of regression pass through $(\bar{X}, \bar{Y})$
4. If there is a perfect correlation between two variables, then there is only one regression line.
5. If $r = 0$, then the two lines of regression are perpendicular. If $r = 1$ or $-1$, then the two lines coincide.
6. \[ b_{XY} = \frac{N \sum XY - \sum X \sum Y}{N \sum Y^2 - \left[\sum Y\right]^2} \]
7. \[ b_{YX} = \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - \left[\sum X\right]^2} \]

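Properties 1 and 2 follow directly from the definitions of the regression coefficients. A short derivation, offered as a sketch and assuming $\sigma_x, \sigma_y > 0$:

```latex
\[
b_{xy}\, b_{yx}
  = \left(r_{xy}\frac{\sigma_x}{\sigma_y}\right)
    \left(r_{xy}\frac{\sigma_y}{\sigma_x}\right)
  = r_{xy}^2 .
\]
% Since \sigma_x/\sigma_y > 0, each coefficient is r_{xy} times a positive
% number, so b_{xy}, b_{yx} and r_{xy} share the same sign, and
\[
r_{xy} = \pm\sqrt{b_{xy}\, b_{yx}},
\]
% with the sign taken to be the common sign of b_{xy} and b_{yx}.
```

In particular, the product of the two regression coefficients can never be negative or exceed 1.
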
Problems

1. Calculate the correlation coefficient for the following heights (in inches) of fathers (x) and their sons (y):

x 65 66 67 67 68 69 70 72
y 67 68 65 68 72 72 69 71

Obtain the lines of regression for the above data and find the estimate of x for y = 70.

Answer: $r_{xy} = 0.603$, $\bar{x} = 68$, $\bar{y} = 69$, $\sigma_x^2 = 4.5$, $\sigma_y^2 = 5.5$
Regression equation of x on y: $x = 0.5455y + 30.3636$
Regression equation of y on x: $y = 0.6667x + 23.6667$
Estimate of x for y = 70: $x \approx 68.55$

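The steps of Problem 1 can be sketched in a few lines of Python. This is one possible way to organise the calculation, using deviations from the means; the variable names are illustrative and not part of the slides.

```python
from math import sqrt

x = [65, 66, 67, 67, 68, 69, 70, 72]  # fathers' heights (inches)
y = [67, 68, 65, 68, 72, 72, 69, 71]  # sons' heights (inches)
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and of cross-products about the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)
b_yx = sxy / sxx  # slope of y on x, equal to r * sigma_y / sigma_x
b_xy = sxy / syy  # slope of x on y, equal to r * sigma_x / sigma_y

# Both lines pass through (x_bar, y_bar), which fixes the intercepts.
a_yx = y_bar - b_yx * x_bar
a_xy = x_bar - b_xy * y_bar

x_at_70 = a_xy + b_xy * 70  # estimate of x from the line of x on y

print(f"r = {r:.4f}")
print(f"x on y: x = {b_xy:.4f} y + {a_xy:.4f}")
print(f"y on x: y = {b_yx:.4f} x + {a_yx:.4f}")
print(f"estimate of x at y = 70: {x_at_70:.4f}")
```

Note that the estimate of x uses the line of x on y, not the line of y on x solved for x.
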
Try!!!!

2. Calculate the coefficient of correlation between X and Y by Karl Pearson's method:

X 25 30 28 29 32 24 36 28 27 21
Y 18 20 21 16 14 13 22 15 19 12

Also, obtain the regression equations.

Answer: $r_{XY} = 0.5955$ (positive correlation)

Problems
3. A computer, while calculating the correlation coefficient between x and y from 25 pairs of observations, obtained the following:
\[ n = 25,\quad \sum x = 125,\quad \sum x^2 = 650,\quad \sum y = 100,\quad \sum y^2 = 460,\quad \sum xy = 508. \]
It was later discovered that two pairs had been copied as (6, 14) and (8, 6) while the correct values were (8, 12) and (6, 8). Obtain the correct value of the correlation coefficient.
Answer: $r_{xy} = 0.667$

4. Can $y = 5 + 2.8x$ and $x = 3 - 0.5y$ be the estimated regression equations of y on x and x on y respectively?
Answer: No. The regression coefficients $b_{yx} = 2.8$ and $b_{xy} = -0.5$ have opposite signs, whereas both must have the same sign as $r_{xy}$.

5. Out of the two lines of regression below, which is the regression line of X on Y?
\[ X + 2Y - 5 = 0, \qquad 2X + 3Y - 8 = 0 \]
Also obtain (i) the value of the correlation coefficient, (ii) the mean values of X and Y, and (iii) $\sigma_Y$, if the variance of X is 12.
Answer: $2X + 3Y - 8 = 0$ is the regression line of X on Y.
$r_{XY} = -0.866$, $b_{XY} = -1.5$, $b_{YX} = -0.5$
$\bar{X} = 1$, $\bar{Y} = 2$, $\sigma_Y = 2$

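For Problem 3, the correction amounts to subtracting the contribution of the wrongly copied pairs from each sum and adding that of the correct pairs. A minimal sketch of that bookkeeping, with illustrative variable names:

```python
from math import sqrt

# Sums as originally (incorrectly) computed.
n = 25
sum_x, sum_x2 = 125, 650
sum_y, sum_y2 = 100, 460
sum_xy = 508

wrong_pairs = [(6, 14), (8, 6)]
right_pairs = [(8, 12), (6, 8)]

# Remove the wrong pairs, then add the correct ones.
for sign, pairs in ((-1, wrong_pairs), (+1, right_pairs)):
    for px, py in pairs:
        sum_x += sign * px
        sum_y += sign * py
        sum_x2 += sign * px * px
        sum_y2 += sign * py * py
        sum_xy += sign * px * py

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(sum_x, sum_x2, sum_y, sum_y2, sum_xy)
print(f"corrected r = {r:.4f}")
```

Here only the sums involving y change, because the two wrong x values (6 and 8) are the same as the two correct ones in a different order.
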
Partial correlation

A partial correlation measures the relationship between two variables while controlling for the influence of a third variable by holding it constant.
Zero-order correlation coefficients (no variable held constant): $r_{xy}$, $r_{xz}$, $r_{yz}$
First-order partial correlation coefficient:
\[ r_{xy.z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}} \]

Problems

1. The simple correlation coefficients between temperature ($X_1$), corn yield ($X_2$), and rainfall ($X_3$) are

$r_{12} = 0.59$, $r_{13} = 0.46$, and $r_{23} = 0.77$

Find the partial correlation coefficients $r_{12.3}$, $r_{23.1}$, and $r_{31.2}$.

2. If all the correlation coefficients of zero order in a set of p-variates are equal to $r$, show that every partial correlation of first order is $\frac{r}{1 + r}$.

3. The correlation between a general intelligence test and school achievement in a group of children from 6 to 15 years is 0.8. The correlation between the general intelligence test and age in the same group is 0.7 and the correlation between school achievement and age is 0.6. What is the correlation between general intelligence and school achievement in children of the same age?
(Hint: $X_1$ = General intelligence, $X_2$ = School achievement, $X_3$ = Age. Given $r_{12} = 0.8$, $r_{13} = 0.7$, $r_{23} = 0.6$. Calculate $r_{12.3}$.)

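The first-order partial correlation formula is easy to apply mechanically. The sketch below evaluates it for Problems 1 and 3 and spot-checks the identity of Problem 2 at one value of r; the function name `partial_r` and its argument order are assumptions made for this illustration.

```python
from math import sqrt

def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation r_ab.c (c held constant)."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Problem 1: temperature (1), corn yield (2), rainfall (3).
r12, r13, r23 = 0.59, 0.46, 0.77
r12_3 = partial_r(r12, r13, r23)  # variables 1 and 2, controlling for 3
r23_1 = partial_r(r23, r12, r13)  # variables 2 and 3, controlling for 1
r31_2 = partial_r(r13, r23, r12)  # variables 3 and 1, controlling for 2
print(f"r12.3 = {r12_3:.4f}, r23.1 = {r23_1:.4f}, r31.2 = {r31_2:.4f}")

# Problem 3: intelligence (1), achievement (2), age (3).
iq_ach_given_age = partial_r(0.8, 0.7, 0.6)
print(f"r12.3 for Problem 3 = {iq_ach_given_age:.4f}")

# Problem 2, numerical spot check only (not a proof): equal correlations r.
r = 0.5
equal_case = partial_r(r, r, r)
print(f"equal-r case: {equal_case:.4f} vs r/(1+r) = {r / (1 + r):.4f}")
```

The near-zero value of r31.2 suggests that, in these figures, the association between temperature and rainfall is almost entirely accounted for by corn yield.
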
Multiple correlation

We study the effects of all the independent variables simultaneously on a dependent variable. For example, to study the correlation between the yield of paddy ($X_1$) and the other independent variables, namely manure ($X_2$), humidity ($X_3$), type of seedlings ($X_4$), and rainfall ($X_5$), we use the multiple correlation coefficient, denoted by $R_{1.2345}$.
The multiple correlation coefficient of $X_1$ on $X_2$ and $X_3$ is denoted by $R_{1.23}$:
\[ R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{23}^2}} \]

Properties
$0 \le R_{1.23} \le 1$
$R_{1.23} \ge |r_{12}|$ and $R_{1.23} \ge |r_{13}|$

Problems

1. The simple correlation coefficients between temperature ($X_1$), corn yield ($X_2$), and rainfall ($X_3$) are

$r_{12} = 0.59$, $r_{13} = 0.46$, and $r_{23} = 0.77$

Find the multiple correlation coefficients $R_{1.23}$, $R_{2.31}$, and $R_{3.12}$.

2. The following zero-order correlation coefficients are given: $r_{12} = 0.98$, $r_{13} = 0.44$, $r_{23} = 0.54$. Calculate the multiple correlation coefficient treating the first variable as dependent and the second and third variables as independent.

3. If all the correlation coefficients of zero order in a set of p-variates are equal to $r$, show that every multiple correlation coefficient is
\[ R_{1.23} = R_{2.13} = R_{3.12} = \frac{r\sqrt{2}}{\sqrt{1 + r}} \]

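The multiple correlation formula can be evaluated in the same mechanical way. The following sketch works through Problems 1 and 2 and spot-checks the identity of Problem 3 at a single value of r; the helper name `multiple_R` and its argument order are assumptions for illustration.

```python
from math import sqrt

def multiple_R(r_ab, r_ac, r_bc):
    """Multiple correlation R_a.bc of variable a on variables b and c."""
    num = r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc
    return sqrt(num / (1 - r_bc ** 2))

# Problem 1: temperature (1), corn yield (2), rainfall (3).
r12, r13, r23 = 0.59, 0.46, 0.77
R1_23 = multiple_R(r12, r13, r23)  # 1 on 2 and 3
R2_31 = multiple_R(r12, r23, r13)  # 2 on 3 and 1
R3_12 = multiple_R(r13, r23, r12)  # 3 on 1 and 2
print(f"R1.23 = {R1_23:.4f}, R2.31 = {R2_31:.4f}, R3.12 = {R3_12:.4f}")

# Problem 2.
R_problem2 = multiple_R(0.98, 0.44, 0.54)
print(f"Problem 2: R1.23 = {R_problem2:.4f}")

# Problem 3, numerical spot check only (not a proof): equal correlations r.
r = 0.5
equal_case = multiple_R(r, r, r)
print(f"equal-r case: {equal_case:.4f} vs r*sqrt(2)/sqrt(1+r) = {r * sqrt(2) / sqrt(1 + r):.4f}")
```

In each case the first two arguments are the correlations of the dependent variable with the two predictors, and the third is the correlation between the predictors.
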
Multiple Regression

If X, Y, and Z are three variables, then the regression equation of X on Y and Z is
\[ X = aY + bZ + c \]

Problem
Find the multiple linear regression of $X_1$ on $X_2$ and $X_3$ from the data relating to three variables

X1  4  6  7  9 13 15
X2 15 12  8  6  4  3
X3 30 24 20 14 10  4

Answer: $X_1 = 0.3899X_2 - 0.6233X_3 + 16.4776$

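One way to obtain the coefficients in the problem above is to solve the least-squares normal equations written in terms of deviations from the means. The slides do not state which method was used, so the sketch below is an assumption about the approach; it solves the resulting 2 x 2 system by Cramer's rule.

```python
x1 = [4, 6, 7, 9, 13, 15]
x2 = [15, 12, 8, 6, 4, 3]
x3 = [30, 24, 20, 14, 10, 4]
n = len(x1)

m1, m2, m3 = (sum(v) / n for v in (x1, x2, x3))

def cross_dev(u, mu, v, mv):
    """Sum of products of deviations of u and v from their means."""
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

s22 = cross_dev(x2, m2, x2, m2)
s33 = cross_dev(x3, m3, x3, m3)
s23 = cross_dev(x2, m2, x3, m3)
s12 = cross_dev(x1, m1, x2, m2)
s13 = cross_dev(x1, m1, x3, m3)

# Normal equations in deviation form:
#   a*s22 + b*s23 = s12
#   a*s23 + b*s33 = s13
det = s22 * s33 - s23 ** 2
a = (s12 * s33 - s13 * s23) / det  # coefficient of X2
b = (s13 * s22 - s12 * s23) / det  # coefficient of X3
c = m1 - a * m2 - b * m3           # intercept: the plane passes through the means

print(f"X1 = {a:.4f} X2 + {b:.4f} X3 + {c:.4f}")
```

The positive coefficient on X2 despite its negative simple correlation with X1 reflects the strong correlation between X2 and X3 in this small data set.
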
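Multiple Regression
For multivariate data, the regression equation of X on Y and Z is
\[ \frac{\omega_{11}}{\sigma_1}(X - \bar{X}) + \frac{\omega_{12}}{\sigma_2}(Y - \bar{Y}) + \frac{\omega_{13}}{\sigma_3}(Z - \bar{Z}) = 0 \]
where
\[ \omega = \det\begin{pmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{pmatrix} \]
\[ \omega_{11} = \det\begin{pmatrix} 1 & r_{23} \\ r_{23} & 1 \end{pmatrix}, \qquad
   \omega_{12} = -\det\begin{pmatrix} r_{12} & r_{23} \\ r_{13} & 1 \end{pmatrix}, \qquad
   \omega_{13} = \det\begin{pmatrix} r_{12} & 1 \\ r_{13} & r_{23} \end{pmatrix} \]
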
Problems

1. Find the regression equation of X on Y and Z given the following results:

Variable   Mean   SD
X          35.8   4.2
Y          52.4   5.3
Z          48.8   6.1

$r_{12} = 0.6$, $r_{23} = 0.7$, $r_{31} = 0.8$

\[ \omega = \det\begin{pmatrix} 1 & 0.6 & 0.8 \\ 0.6 & 1 & 0.7 \\ 0.8 & 0.7 & 1 \end{pmatrix} \]
Answer: $X = 0.062Y + 0.513Z + 7.51$

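The cofactor form of the regression equation can be turned into an explicit equation for X by dividing through by $\omega_{11}/\sigma_1$. A minimal sketch for the problem above, with illustrative variable names:

```python
# Means, standard deviations and correlations from the problem statement.
mean_x, mean_y, mean_z = 35.8, 52.4, 48.8
sd_x, sd_y, sd_z = 4.2, 5.3, 6.1
r12, r23, r13 = 0.6, 0.7, 0.8  # r31 = r13

# Cofactors of the first row of the correlation determinant.
w11 = 1 - r23 ** 2           # det [[1, r23], [r23, 1]]
w12 = -(r12 - r13 * r23)     # -det [[r12, r23], [r13, 1]]
w13 = r12 * r23 - r13        # det [[r12, 1], [r13, r23]]

# Rearranging
#   (w11/sd_x)(X - mean_x) + (w12/sd_y)(Y - mean_y) + (w13/sd_z)(Z - mean_z) = 0
# into the form X = coef_y * Y + coef_z * Z + intercept.
coef_y = -(w12 / w11) * (sd_x / sd_y)
coef_z = -(w13 / w11) * (sd_x / sd_z)
intercept = mean_x - coef_y * mean_y - coef_z * mean_z

print(f"w11 = {w11:.2f}, w12 = {w12:.2f}, w13 = {w13:.2f}")
print(f"X = {coef_y:.3f} Y + {coef_z:.3f} Z + {intercept:.2f}")
```

Only the first-row cofactors are needed here; the full determinant $\omega$ is not required to write the regression equation.
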
