
Study Notes

Correlation

Introduction

 Bivariate Distribution: A distribution in which each unit of a series assumes two values.
 Multivariate Distribution: A distribution in which each unit of a series assumes more than
one value.
 Correlation is a statistical tool that studies the relationship between two variables.
 The presence of correlation between two variables X and Y simply means that when the
value of one variable is found to change in one direction, the value of the other variable is
found to change either in the same direction (i.e., positive change) or in the opposite
direction (i.e., negative change), but in a definite way.
 It is an analysis of the covariance between two variables.
 It helps to measure the magnitude and direction of relationship between two variables.
 Types of correlation
o Positive and negative correlation:
 When the variables move in the same direction, these variables are said to
be correlated positively and
 if they move in the opposite direction, they are said to be negatively
correlated (e.g., price and demand of a commodity, or sale of woollen
garments and day temperature)
o Linear and non-linear correlation:
 If a unit change in one variable produces a constant change in the other
variable over the entire range of values, the correlation between the
variables is linear
 If corresponding to a unit change in one variable, the other variable does not
change at a constant rate but at a fluctuating rate then the correlation is said
to be non-linear or curvilinear

Correlation and Causation

 Causation implies correlation, but the reverse is not true, i.e., correlation does not imply
causation. E.g., ice-cream sales and sunglasses sales could be positively correlated,
but there is no causation between them.
 Correlation analysis fails to reflect the cause-and-effect relationship between the
variables. It only tells the degree of association.
 In a bivariate distribution, if the variables have a cause-and-effect relationship, they are
bound to vary in sympathy with each other.
 Correlation only implies co-variation.
 Reasons of high degree of correlation
o Mutual dependence
o Both the variables being influenced by the same external factors
o Pure chance
 A high value of r is neither necessary nor sufficient for a causal relationship between
X and Y.


Not necessary, because r can be close to 0 even when X and Y have a causal relationship. This
is possible if the relationship between X and Y is non-linear, since r only measures straight-line
relationships, e.g., Y = X².

Not sufficient, because a high r may be due to spurious correlation:


 Chance correlation, e.g., increase in hippopotamus population and steel
production.
 X and Y may be affected by a third variable ("common response variable",
"confounding factor", "lurking variable") without being related to each other. E.g.,
ice-cream sales and sunglasses sales could be positively correlated, and a rise in
temperature may be the cause of such correlation.

Degrees of Correlation

Methods of studying correlation

1. Scatter diagram method


2. Karl Pearson’s coefficient of correlation (Covariance method)
3. Two-way frequency table (Bivariate correlation method)
4. Spearman Rank Order Correlation Method
5. Concurrent deviation method

Scatter Diagram Method

 A scatter diagram helps to form a visual or graphical idea of the nature of the association
between two variables.
 For example, if two variables X and Y are plotted along the X-axis and Y-axis respectively in
the x-y plane of a graph sheet, the resultant diagram of dots is known as a scatter diagram.
 The various possible situations are:

[Scatter diagrams omitted in source, illustrating: Perfect Positive Correlation, High Positive
Correlation, Low Positive Correlation, No Correlation, Non-linear/Curvilinear Correlation,
Low Negative Correlation, High Negative Correlation, Perfect Negative Correlation,
Positive Non-linear relation, Negative Non-linear relation]

Karl Pearson’s Coefficient of Correlation

 It is a mathematical method for measuring the intensity or magnitude of linear


relationship between two variables
 Suggested by Karl Pearson, a British biometrician and statistician
 Karl Pearson's correlation coefficient is also called the product moment correlation
coefficient.
 Karl Pearson's measure, known as the Pearsonian correlation coefficient between two
variable series X and Y, is denoted by 'r(X, Y)' or 'rxy' or 'r'
 It is a numerical measure of the linear relationship between two variables. When there is a
non-linear relation between X and Y, calculating Karl Pearson's coefficient of
correlation can be misleading.
 It can be defined as the ratio of the covariance between X and Y to the product of the
standard deviations of X and Y, i.e.,

r(X, Y) = Cov(X, Y) / (σX · σY)

Covariance between X and Y is defined as

Cov(X, Y) = Σ(Xi − X̄)(Yi − Ȳ) / N = ΣXY/N − X̄·Ȳ

where N = number of observations



 Alternatively, r is given as:

r = Σ(Xi − X̄)(Yi − Ȳ) / √[ Σ(Xi − X̄)² · Σ(Yi − Ȳ)² ]

or, in the computational (shortcut) form,

r = [ N·ΣXY − ΣX·ΣY ] / √[ N·ΣX² − (ΣX)² ] · √[ N·ΣY² − (ΣY)² ]

where N = number of observations.

 The sign of Cov(X, Y) gives the sign of r, as the standard deviations are always positive.

For example,

The correlation has to be determined between the rainfall and the yield of the vegetable
sown from the given data:
Rainfall (mm) 12 9 8 10 11 13 7
Yield (Kg) 14 8 6 9 11 12 3

Solution:

Rainfall (X)   Yield (Y)   XY     X²     Y²
12             14          168    144    196
9              8           72     81     64
8              6           48     64     36
10             9           90     100    81
11             11          121    121    121
13             12          156    169    144
7              3           21     49     9
Total: 70      63          676    728    651

r = (7 × 676 − 70 × 63) / √[(7 × 728 − 70²)(7 × 651 − 63²)]
  = 322 / √(196 × 588)
  ≈ 0.949, i.e., a high positive correlation between rainfall and plant yield.
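As a check, the computation above can be sketched in Python using the shortcut formula (the function and variable names here are illustrative, not from the source):

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's r via the shortcut (computational) formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    # r = [N·ΣXY − ΣX·ΣY] / √([N·ΣX² − (ΣX)²][N·ΣY² − (ΣY)²])
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

rainfall = [12, 9, 8, 10, 11, 13, 7]   # X values from the table above
yield_kg = [14, 8, 6, 9, 11, 12, 3]    # Y values from the table above
print(round(pearson_r(rainfall, yield_kg), 3))  # 0.949
```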

Properties of correlation coefficient

 The value of r is independent of the units in which X and Y are measured.


 A negative value of r indicates an inverse relation. A change in one variable is
associated with change in the other variable in the opposite direction.
 If r is positive the two variables move in the same direction.


 The value of r does not depend on which of the two variables under study is labelled X
and which is labelled Y, i.e.; it does not depend upon which variable is dependent /
independent {rxy = ryx}.
 Limit of correlation coefficient: The correlation coefficient value ranges between –1
and +1[ -1 ≤ r ≤1].

 r = 1 if and only if all ( X i , Y i) pairs lie on a straight line with positive slope and r = -1 if
and only if all ( X i , Y i ) pairs lie on a straight line with negative slope. In other words, all
the points in the scatter are collinear and the correlation is perfect.
 If r = 0 the two variables are uncorrelated. There is no linear relation between them.
However, other types of relation may be there.
 The correlation coefficient is independent of change of origin and scale, i.e., if X and Y
are transformed into new variables U [U = (X − a)/h] and V [V = (Y − b)/k] by changing the
origin and scale, then the correlation coefficient between X and Y is the same as the
correlation coefficient between U and V.
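A quick numerical sketch of this invariance property; the constants a = 10, h = 2, b = 9, k = 3 are arbitrary choices for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from the definition: Cov(X, Y) / (sd(X) · sd(Y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sdx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sdy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sdx * sdy)

X = [12, 9, 8, 10, 11, 13, 7]
Y = [14, 8, 6, 9, 11, 12, 3]
U = [(x - 10) / 2 for x in X]   # change of origin a = 10, scale h = 2
V = [(y - 9) / 3 for y in Y]    # change of origin b = 9, scale k = 3
print(abs(pearson_r(X, Y) - pearson_r(U, V)) < 1e-9)  # True: r is unchanged
```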


Corollary: If X and Y are random variables and a, b, c, d are any numbers provided only that a ≠
0, c ≠ 0, then r(aX + b, cY + d) = [ac / |a||c|] · r(X, Y). In other words, r is affected only by a
change of sign: if a and c have different signs, the sign of r changes.


 Two independent variables are uncorrelated, but the reverse is not true. A zero
coefficient of correlation only implies the absence of a "linear" relationship between them.


 If variables X and Y are connected by the linear equation aX + bY + c = 0, then the
correlation between X and Y is +1 if the signs of a and b are opposite and −1 if the
signs of a and b are alike.
 The square of the sample correlation coefficient is equal to the coefficient of
determination resulting from fitting the simple regression model.
 r measures only linear relationships, e.g., for Y = X² over a range of X symmetric about 0, r = 0.
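This last point can be verified directly: for Y = X² on a symmetric range of X values, the covariance, and hence r, is exactly 0 (a small sketch with invented data):

```python
# r measures only linear association: a perfect non-linear relation can give r = 0
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]          # Y = X²
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
print(cov)  # 0.0 — so r = 0 despite a perfect (non-linear) relationship
```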

Assumptions underlying Karl Pearson’s Correlation Coefficient

 Each variable is affected by a large number of independent contributory causes of such a
nature as to produce a normal distribution
 The variables X and Y under study are linearly related.
 The forces operating on each of the variable series are not independent of each other
but are related in a causal fashion.

Interpretation of r

Correlation Coefficient, r Relationship between variables


r=1 Perfect positive correlation
r>0 Positive correlation
r=0 No correlation
r<0 Negative correlation
r = -1 Perfect negative correlation

 The reliability of the significance of the value of the correlation coefficient depends upon a
number of factors.
 One way of testing the significance of r is by finding the probable error,
which, in addition to the value of r, takes into account the size of the sample.
 Another, more useful measure for interpreting the value of r is the coefficient of
determination. It shows that the closeness of the relationship between two
variables, as determined by the correlation coefficient r, is not proportional to r.

Testing the Significance of r

 A hypothesis test of the "significance of the correlation coefficient" is performed to


decide whether the linear relationship in the sample data is strong enough to represent
the relationship in the population.
 Probable Error: The probable error of the correlation coefficient is an amount which, if
added to and subtracted from the mean correlation coefficient, produces limits within
which the chances are even that a coefficient of correlation from a series selected at
random will fall.

P.E.(r) = 0.6745 × S.E.(r)

S.E.(r) = (1 − r²) / √n


 The reason for taking 0.6745 is that in a normal distribution 50% of the observations lie in
the range µ ± 0.6745σ, where µ is the mean and σ is the standard deviation
 Use of Probable Error
o To determine the limits within which the population correlation coefficient may be
expected to lie [Limits for population correlation coefficient are r ± P.E (r) ].
o To test if an observed value of the sample correlation coefficient indicates any
correlation in the population
 If |r| < P.E.(r), the correlation is not significant
 If |r| > 6 × P.E.(r), the correlation is definitely significant
 In other cases, the significance of r cannot be determined
 P.E. can be used only if the data is drawn from a normal population
 The sample must be drawn using random sampling
 For small sample sizes, P.E. may lead to fallacious conclusions. In that case, a rigorous
test of the significance of an observed sample correlation coefficient is provided
by Student's t-test
 Student's t-test: The test statistic is given by

t = r √(n − 2) / √(1 − r²)

 This t is distributed as Student's t distribution with (n − 2) degrees of freedom.

Note –
The symbol for the population correlation coefficient is ρ, the Greek letter "rho."
ρ = population correlation coefficient (unknown)
r = sample correlation coefficient (known; calculated from sample data)

For example,

The correlation coefficient between infant mortality rate and mother’s year of schooling is -0.12
based on a sample of 12 towns. Can we conclude that there is a negative correlation between
the two variables?

Solution:

X = infant mortality rate (deaths/1000 births), Y = mother's years of schooling

r = −0.12, n = 12

H0: ρ = 0

Ha: ρ < 0


Test Statistic

t = r √(n − 2) / √(1 − r²) = −0.12 × √(12 − 2) / √(1 − (−0.12)²) ≈ −0.382

At n − 2 = 12 − 2 = 10 degrees of freedom and a 5% level of significance, the critical t value is 1.812.
Since −0.382 is not less than −1.812, we cannot reject the null hypothesis; the test statistic is
insignificant. We cannot conclude that there is a negative correlation between the two variables.
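A minimal sketch of this test in Python, using the figures from the example; the critical value 1.812 is read from a t-table, not computed here:

```python
from math import sqrt

r, n = -0.12, 12
# t = r·√(n − 2) / √(1 − r²), distributed as Student's t with n − 2 d.f.
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)
print(round(t, 3))  # -0.382

# One-tailed test, Ha: rho < 0 — reject H0 only if t falls below -1.812
critical = -1.812   # 5% one-tailed critical value at 10 d.f. (from a t-table)
print(t < critical)  # False: we cannot reject H0
```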

Two-way frequency table

 For a fairly large bivariate distribution, the data may be summarized in form of a two-way
frequency table.
 For each variable the values are grouped into different classes.
 If there are m classes for X variable and n classes for Y variable then there will be m*n cells
in that two-way frequency table.
 The formula for calculating r for a bivariate frequency table is given by

r_xy = r_uv = [ N·Σfuv − (Σfu)(Σfv) ] / √[ N·Σfu² − (Σfu)² ] · √[ N·Σfv² − (Σfv)² ]

 where u = (x − a)/h and v = (y − b)/k, f denotes the cell frequency,
 and h and k are the widths of the x classes and y classes respectively, and a and b are constants.

Spearman’s Rank Order Correlation Method

 It was developed by the British psychologist C.E. Spearman.


 It is used when the variables under consideration are arranged in a serial order.
 Useful while dealing with qualitative characters.
 Non-parametric version of the Pearson product-moment correlation (Pearson correlation
coefficient).
 Spearman’s correlation is equivalent to calculating the Pearson correlation coefficient on the
ranked data.
 Measures strength and direction of monotonic relationship (in a monotonic relationship,
one variable increases, the other tends to either increase or decrease (not both) but not
necessarily at a constant rate) between two variables.


 One can run Spearman's rank correlation on a non-monotonic relationship to determine if
there is a monotonic component to the association.
 Spearman’s rank correlation coefficient can be used in some cases where there is a relation
whose direction is clear but which is nonlinear.
 Spearman’s correlation coefficient is not affected by extreme values. In this respect, it is
better than Karl Pearson’s correlation coefficient. Thus, if the data contains some extreme
values, Spearman’s correlation coefficient can be very useful.
 Assumption: Need two variables that are either ordinal, interval, ratio or continuous.
 It is a distribution free measure
 Its value lies between –1 and +1.
 Whether the order in which employees complete a test exercise is related to the number of
months they have been employed, or the correlation between a person's IQ and the number
of hours spent in front of the TV per week, are some example use cases.
 Example: To find out relationship between two variables, A say Intelligence and B say
Beauty, first we have to arrange a group of individuals in order of merit with respect to
proficiency in these two characteristics. Let X and Y denote the rank in the A and B
characteristics respectively. Considering no ties, the correlation between X and Y (known
as Spearman's rank correlation) is given by

r_s = 1 − [ 6 Σdᵢ² ] / [ n(n² − 1) ], where dᵢ = xᵢ − yᵢ

 where xᵢ is the rank of the ith individual in character A, yᵢ is the rank of the ith individual in
character B, and n is the number of pairs. (Both series are ranked separately; the largest value
gets the first rank, and so on.)
 If there is a tie, take the average of the ranks the tied values would otherwise have occupied
and use the following formula:

r_s = 1 − 6 [ Σdᵢ² + Σ(m³ − m)/12 ] / [ n(n² − 1) ]

where m is the number of items tied at a given rank (the correction term is summed over all
tied groups).


 The occurrence of ties causes no problem in the calculation of the Spearman correlation
coefficient when the Pearson formula is used directly with the ranks (each rank treated as a
paired score).


 The fundamental difference between the Pearson and Spearman correlation coefficients is
that the Pearson coefficient works with a linear relationship between the two variables
whereas the Spearman Coefficient works with monotonic relationships as well.
 One more difference is that Pearson works with raw data values of the variables whereas
Spearman works with rank-ordered variables.
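A short sketch illustrating both points: computing r_s by the rank-difference formula on tie-free data, and confirming it equals Pearson's r applied to the ranks. All data and function names here are invented for illustration:

```python
from math import sqrt

def ranks(values):
    """Rank 1 = largest value, as in the notes; ties get the average rank."""
    order = sorted(values, reverse=True)
    return [sum(i + 1 for i, v in enumerate(order) if v == x) / order.count(x)
            for x in values]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

A = [86, 71, 77, 68, 91]          # e.g. intelligence scores (invented)
B = [88, 65, 80, 70, 95]          # e.g. beauty ratings (invented)
rx, ry = ranks(A), ranks(B)
n = len(A)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rs = 1 - 6 * d2 / (n * (n * n - 1))      # rank-difference formula
print(rs)                                 # 0.9
print(abs(rs - pearson_r(rx, ry)) < 1e-9) # True: Pearson on ranks agrees
```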

Method of concurrent deviation

 A very casual method of determining correlation


 Used when precision is not required
 It is based on the principle that if the short-term fluctuations of two time series are
positively correlated, i.e., if their deviations are concurrent, their curves will move in the
same direction, indicating a positive relation between them
 Based on signs of deviations of the values of variables from its preceding value
o We put a + sign if value of variable is greater than preceding value
o We put a - sign if value of variable is less than preceding value
o We put a = sign if value of variable is same as preceding value
 The deviations are said to be concurrent if they have the same sign, i.e., both deviations are
positive, both are negative, or both are equal (=).
 The formula for calculating the correlation coefficient using this method is

r_c = ± √[ ± (2c − m) / m ]

 where c is the number of pairs of concurrent deviations and m is the number of pairs of
deviations; m is one less than the number of pairs of observations.
 The quantity inside the square root must be positive otherwise r will be imaginary which is
not possible.
 Thus, if (2c-m) is positive we take + sign in and outside the square root and if (2c-m) is
negative we take - sign in and outside the square root.
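The sign-counting procedure can be sketched as follows (the two series are toy data invented for illustration):

```python
from math import sqrt

def concurrent_deviation_r(x, y):
    """Concurrent-deviation coefficient r_c = ±√[±(2c − m)/m]."""
    # Sign of change from the preceding value: +1, -1, or 0 (no change)
    dx = [(b > a) - (b < a) for a, b in zip(x, x[1:])]
    dy = [(b > a) - (b < a) for a, b in zip(y, y[1:])]
    m = len(dx)                                   # one less than no. of pairs
    c = sum(1 for a, b in zip(dx, dy) if a == b)  # concurrent deviations
    inner = (2 * c - m) / m
    # Take the sign of (2c − m) both inside and outside the square root
    sign = 1 if inner >= 0 else -1
    return sign * sqrt(sign * inner)

x = [60, 62, 65, 63, 70, 72, 71]
y = [50, 53, 55, 54, 58, 61, 60]
print(concurrent_deviation_r(x, y))  # 1.0 — every deviation is concurrent
```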


Coefficient of determination

 It gives the percentage of variation in the dependent variable that is accounted for by the
independent variable.
o Example: If r² is 0.72, it implies that, on the basis of the sample, 72% of the variation
in one variable is accounted for by the variation in the other variable.
 It gives the ratio of the explained variance to the total variance.
 It is given by the square of the correlation coefficient.
 It is always non-negative and does not tell us about the direction of relationship (+ve or -ve)
between the two series.

Coefficient of determination = r² = Explained variance / Total variance

Coefficient of non-determination (K²): It is the ratio of unexplained variation to the total
variation

K² = 1 − r² = Unexplained variance / Total variance

Coefficient of Alienation (K): It is given by the square root of the coefficient of non-determination

K = ± √(1 − r²)
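For instance, with r ≈ 0.949 from the rainfall/yield example earlier, the three coefficients work out as follows (a small illustrative sketch):

```python
from math import sqrt

r = 0.949                      # from the rainfall/yield example
r2 = r ** 2                    # coefficient of determination
k2 = 1 - r2                    # coefficient of non-determination
k = sqrt(k2)                   # coefficient of alienation (taking the + root)
print(round(r2, 3), round(k2, 3), round(k, 3))
# about 90% of the variation in yield is accounted for by rainfall
```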

