Unit 4
Objectives
After going through this unit, you will be able to:
Illustrate correlation analysis
Explain common misconceptions about correlation
Describe correlation terminology
Structure
4.1 Introduction
4.2 Covariance and Correlation in Projects
4.3 Correlation Analysis using Scatter Plots
4.4 Karl Pearson’s Coefficient of Correlation
4.5 Spearman’s Rank Correlation Coefficient
4.6 Keywords
4.7 Summary
4.1 INTRODUCTION
In general, correlation exists when two variables have a linear relationship beyond what is expected
by chance alone. The most common measure of correlation is called the “Pearson Product-Moment
Correlation Coefficient”. It is computed from two quantities derived from the two variables: the covariance
between them, cov(x, y), and the standard deviation of each, σx and σy. This measure can
range from -1 to 1, inclusive. A value of -1 represents a “perfect negative correlation”, while a value
of 1 represents a “perfect positive correlation”. The closer a correlation measure is to these extremes,
the “stronger” the correlation between the two variables. A value of zero means that no correlation
is observed. It is important to note that a correlation measure of zero does not necessarily mean that
there is no relationship between the two variables, just that there is no linear relationship present in
the data that is being analyzed. It is also sometimes difficult to judge whether a correlation measure
is “high” or “low”.
There are certain situations where a correlation measure of 0.3, for example, may be considered
negligible. In other circumstances, such as in the social sciences, a 0.3 correlation measure may
suggest that further examination is needed. As with all data analysis, the context of the data must be
understood in order to evaluate any results.
4.2 COVARIANCE AND CORRELATION IN PROJECTS
It often arises in the course of executing projects that one or more random variables, or events, appear
to bear on the same project problem. For instance, fixed costs that accumulate period by period and
the overall project schedule duration are two random variables with obvious dependencies. Two
statistical terms come into play when two or more variables are in the same project space: covariance
and correlation.
Covariance
Covariance is a measure of how much one random variable depends on another. Typically, we
think in terms of “if X gets larger, does Y also get larger or does Y get smaller?” The covariance will
be negative for the latter and positive for the former. The value of the covariance is not particularly
meaningful since it will be large or small depending on whether X and Y are large or small.
Covariance is defined simply as:
Cov( X , Y ) = E( X * Y ) - E( X ) * E( Y )
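As a rough illustration of this definition, here is a minimal Python sketch that estimates Cov(X, Y) from paired sample data by computing E(X*Y) - E(X)*E(Y); the data values are hypothetical and chosen only to show the arithmetic.

# Sketch of the definition Cov(X, Y) = E(X*Y) - E(X)*E(Y),
# estimated from paired sample data (hypothetical values).
def covariance(x, y):
    n = len(x)
    mean_x = sum(x) / n                               # E(X)
    mean_y = sum(y) / n                               # E(Y)
    mean_xy = sum(a * b for a, b in zip(x, y)) / n    # E(X*Y)
    return mean_xy - mean_x * mean_y

# As x grows, y tends to grow, so the covariance comes out positive.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print(covariance(x, y))   # 1.6 for these hypothetical values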
Correlation
Covariance does not directly measure the strength of the “sensitivity” of X on Y; judging the
strength is the job of correlation. Sensitivity will tell us how much the cost changes if the schedule
is extended a month or compressed a month. In other words, sensitivity is always a ratio, also
called a density, as in this example: $cost change/month change. But if cost and time are random
variables, what does the ratio of any single outcome among all the possible outcomes forecast for
the future? Correlation is a statistical estimate of the effects of sensitivity, measured on a scale of
-1 to +1.
The Greek letter rho (ρ), used for populations of data, and “r”, used with samples of data, stand
for the correlation between two random variables: r(X, Y). The usual way of referring to “r” or
“ρ” is as the “correlation coefficient.” Their values can range from -1 to +1. A value of 0 means
no correlation, -1 means perfectly correlated but moving in opposite directions, and +1 means
perfectly correlated and moving in the same direction.
The correlation function is defined as the covariance normalized by the product of the standard
deviations:
r( X , Y ) = COV( X , Y )/( σ X * σ Y )
We can now rewrite the variance equation:
VAR( X + Y ) = VAR( X ) + VAR( Y ) + 2 * ρ * σ X * σ Y
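To make the normalisation concrete, the following self-contained Python sketch computes r(X, Y) = COV(X, Y)/(σX * σY) and checks the variance identity above numerically; the sample data are hypothetical and population (divide-by-n) formulas are assumed.

import math

# Population covariance; covariance(x, x) is the variance of x.
def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def correlation(x, y):
    sx = math.sqrt(covariance(x, x))   # sigma_X
    sy = math.sqrt(covariance(y, y))   # sigma_Y
    return covariance(x, y) / (sx * sy)

x = [1, 2, 3, 4, 5]            # hypothetical schedule durations
y = [10, 12, 11, 15, 14]       # hypothetical period costs
r = correlation(x, y)
print(r)                        # a value between -1 and +1

# Check: VAR(X + Y) = VAR(X) + VAR(Y) + 2 * r * sigma_X * sigma_Y
s = [a + b for a, b in zip(x, y)]
lhs = covariance(s, s)
rhs = covariance(x, x) + covariance(y, y) + 2 * r * math.sqrt(covariance(x, x)) * math.sqrt(covariance(y, y))
print(abs(lhs - rhs) < 1e-9)    # True: the identity holds numerically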
In statistics, correlation (often measured as a correlation coefficient) indicates the strength and
direction of a linear relationship between two random variables. In general statistical usage,
correlation or co-relation refers to the departure of two variables from independence. In this broad
sense there are several coefficients, measuring the degree of correlation, adapted to the nature of
data. A number of different coefficients are used for different situations. The best known is the
Pearson product-moment correlation coefficient, which is obtained by dividing the covariance of the
two variables by the product of their standard deviations. Despite its name, it was first introduced by
Francis Galton. Several authors have offered guidelines for the interpretation of a correlation
coefficient. Cohen (1988), for example, has suggested the following interpretations for correlations in
psychological research, in the table below.

Correlation   Negative            Positive
Small         -0.29 to -0.10      0.10 to 0.29
Medium        -0.49 to -0.30      0.30 to 0.49
Large         -1.00 to -0.50      0.50 to 1.00
As Cohen himself has observed, however, all such criteria are in some ways arbitrary and should not
be observed too strictly. This is because the interpretation of a correlation coefficient depends on the
context and purposes. A correlation of 0.9 may be very low if one is verifying a physical law using high-
quality instruments, but may be regarded as very high in the social sciences where there may be a
greater contribution from complicating factors.
Along this vein, it is important to remember that “large” and “small” should not be taken as synonyms
for “good” and “bad” in terms of determining that a correlation is of a certain size. For example, a
correlation of 1.0 or −1.0 indicates that the two variables analyzed are equivalent modulo scaling.
Scientifically, this more frequently indicates a trivial result than an earth-shattering one. For example,
consider discovering a correlation of 1.0 between how many feet tall a group of people are and the
number of inches from the bottom of their feet to the top of their heads.
4.3 CORRELATION ANALYSIS USING SCATTER PLOTS
A scatter plot is a graph that represents bivariate data as points on a two-dimensional Cartesian plane.
Suppose, for example, that the height h (in cm) and weight w (in kg) of nine Year 10 students are
observed and plotted, with height on the horizontal (x) axis and weight on the vertical (y) axis.
We observe that y increases as x increases, and the points do not lie on a straight line. We say that
a weak positive association exists between the variables x and y.
Consider the following scatter plot:
We observe that y decreases as x increases, and the points do not lie on a straight line. We say
that a weak negative association exists between the variables x and y.
Consider the following scatter plot:
It is clear from the scatter plot that as x increases, there is no apparent effect on y. In such a
case, we say that no association exists between the variables x and y.
Consider the following scatter plot:
If a data value does not fit the trend of the data, then it is said to be an outlier. In the above
scatter plot, it is easy to identify the outliers. There are two outliers in the set of data values.
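As an illustration of reading a scatter plot, the short sketch below plots hypothetical height and weight values for nine students (the original data table is not reproduced here) so that the direction of the association can be judged by eye; it assumes the matplotlib library is available.

import matplotlib.pyplot as plt

height = [152, 155, 158, 160, 163, 165, 168, 170, 173]   # hypothetical heights in cm
weight = [45, 50, 48, 52, 55, 54, 58, 60, 62]            # hypothetical weights in kg

plt.scatter(height, weight)                # each point is one student
plt.xlabel("Height h (cm)")
plt.ylabel("Weight w (kg)")
plt.title("Weight versus height: points trend upward, a positive association")
plt.show()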
4.4 KARL PEARSON'S COEFFICIENT OF CORRELATION
Sum of Squares
We introduced a notation earlier in the course called the sum of squares. This SS notation will make
these formulas much easier to work with.
Here is the defining formula for r. Don't worry about it; we won't be finding it this way:
r = [n Σxy − (Σx)(Σy)] / sqrt{ [n Σx² − (Σx)²] [n Σy² − (Σy)²] }
This formula can be simplified through some simple algebra and then some substitutions using the SS
notation discussed earlier. If you divide the numerator and denominator by n, you get quantities that
should look familiar; each of them has been seen before in the Sum of Squares notation section:
SS(x) = Σx² − (Σx)²/n,  SS(y) = Σy² − (Σy)²/n,  SS(xy) = Σxy − (Σx)(Σy)/n
So, the linear correlation coefficient can be written in terms of sums of squares:
r = SS(xy) / sqrt[ SS(x) · SS(y) ]
Application of Correlation in Hypothesis Testing
The claim we will be testing is “There is significant linear correlation”.
The Greek letter for r is rho (ρ), so the parameter used for linear correlation is rho.
H0: ρ = 0
H1: ρ ≠ 0
r has a t distribution with n − 2 degrees of freedom, and the test statistic is given by:
t = r / sqrt[ (1 − r²) / (n − 2) ]
Now, there are n-2 degrees of freedom this time. This is a difference from before. As an over-
simplification, you subtract one degree of freedom for each variable, and since there are 2
variables, the degrees of freedom are n-2.
This doesn't look like our usual test statistic of the form (observed − expected) / standard error, but
if it is rewritten as
t = (r − ρ) / sqrt[ (1 − r²) / (n − 2) ]
with ρ = 0 under the null hypothesis, the formula for the test statistic does look like the pattern
we're looking for.
Remember that hypothesis testing is always done under the assumption that the null hypothesis is
true.
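As a sketch of how this test might be carried out in practice, the snippet below computes r, the t statistic with n − 2 degrees of freedom, and a two-tailed p-value; it assumes the SciPy library is available, and the paired data values are hypothetical.

import math
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 7, 5, 8, 9]
n = len(x)

r, _ = stats.pearsonr(x, y)                    # sample linear correlation coefficient
t = r / math.sqrt((1 - r ** 2) / (n - 2))      # test statistic with n - 2 degrees of freedom
p_value = 2 * stats.t.sf(abs(t), df=n - 2)     # two-tailed p-value

print(r, t, p_value)
# Reject H0: rho = 0 at the 5% level when p_value < 0.05.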
4.5 SPEARMAN'S RANK CORRELATION COEFFICIENT
In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles
Spearman and often denoted by the Greek letter ρ (rho) or as rs, is a non-parametric measure of
correlation – that is, it assesses how well an arbitrary monotonic function could describe the
relationship between two variables, without making any other assumptions about the particular
nature of the relationship between the variables. Certain other measures of correlation are parametric
in the sense of being based on possible relationships of a parameterised form, such as a linear
relationship.
In principle, ρ is simply a special case of the Pearson product-moment coefficient in which two sets of
data Xi and Yi are converted to rankings xi and yi before calculating the coefficient. In practice,
however, a simpler procedure is normally used to calculate ρ. The raw scores are converted to ranks,
and the differences, di, between the ranks of each observation on the two variables are calculated.
Then ρ is given by:
ρ = 1 − [6 Σ di²] / [n (n² − 1)]
where
di = xi − yi = the difference between the ranks of corresponding values Xi and Yi, and
n = the number of values in each data set (same for both sets).
If tied ranks exist, the classic Pearson's correlation coefficient between ranks has to be used instead of
this formula.
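The following sketch applies the di formula directly, under the assumption that there are no tied ranks; the data values are hypothetical.

# Spearman's rho via rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), assuming no ties.
def ranks(values):
    # Rank 1 for the smallest value, rank n for the largest (no ties assumed).
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

x = [86, 97, 99, 100, 101, 103, 106, 110, 112, 113]   # hypothetical raw scores
y = [20, 28, 27, 50, 29, 7, 17, 6, 12, 2]             # hypothetical raw scores
print(spearman_rho(x, y))   # a value between -1 and +1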
4.7 SUMMARY
These sorts of studies involve comparing two variables (e.g., income and crime, smoking habits and
health) in order to see if there might be some connection and perhaps even a suggestion of cause. As
a cigarette smoking habit rises, do health problems also rise? As income decreases, does the frequency
of crime increase? As people grow older, do they become less or more tolerant of others?
Correlation is an extremely important analytical tool which enables us to begin to sort out claims about
important connections, which may or may not be true: the amount of smoking and the incidence of
lung cancer, HIV infection and the onset of AIDS, the age of a car and its value, television programming
of playoff games and attendance at lectures, poverty and crime, IQ tests and income levels,
intelligence and heredity, age and mechanical skills, and so on. People make claims about such matters
all the time. The principle of correlation enables us to investigate such claims in order to understand
whether they are true or not and, if true, just what the strength of that relationship might be.