Correlation Rev 1.0

Correlation is a statistical analysis that measures the strength and direction of association between two continuous variables. It ranges from +1 to -1, where +1 is total positive correlation, -1 is total negative correlation, and 0 is no correlation. There are three types of correlations: positive correlation where both variables increase or decrease together, negative correlation where one variable increases as the other decreases, and no correlation where the variables are independent of each other. Correlation alone does not prove causation. Common techniques to determine correlation are Pearson's r, Spearman's rho, and Kendall's tau. Correlation is useful for understanding relationships but has limitations and should not be used with categorical data.

Uploaded by

Ahmed M. Hashim
Copyright
© All Rights Reserved

Problem Solving and Analysis Tools

CORRELATION
QUALITY TOOLS
Correlation
Description of Correlation

Correlation is a statistical technique, and also an index, that expresses how strongly two
quantities (variables) vary together. It shows whether, and how strongly, pairs of variables
are related. Possible correlations range from +1 to –1. Correlation does not prove or
disprove any cause-and-effect (causal) relationship between the variables.

Some correlations are fairly obvious, but data may also contain unsuspected correlations.
One may also suspect that correlations exist without knowing which are the strongest. A
careful correlation analysis can lead to a greater understanding of the data.

Types
There are three types of correlations that are identified:

1. Positive correlation:
When an increase in one variable is accompanied by an increase in the other, and a
decrease in one by a decrease in the other. For example, the amount of money that a
person possesses might correlate positively with the number of cars he or she owns.

2. Negative correlation:
When an increase in one variable is accompanied by a decrease in the other, and vice
versa. For example, the level of education might correlate negatively with crime. This
means that where the education level is higher, crime tends to be lower.
Note that this doesn't mean that a lack of education causes crime. It could be, for
example, that both lack of education and crime have a common cause: poverty.

3. No correlation:
Two variables are uncorrelated when a change in one is not accompanied by any
consistent change in the other. For example, among millionaires, happiness is found to
be uncorrelated with money. This means that, within this group, having more money is
not associated with greater happiness.

Techniques for determining Correlation:

• Pearson Product Moment Correlation (or "r"), the most commonly used method,
assumes the two variables being considered are measured on continuous scales
(like the numbers 1, 2, 3, 4, 5, 6, 7, or height or weight).

• Spearman Rank Order Correlation (or "rho") and Kendall's Tau-b (or "tau")
correlations are used when the variables are measured as ranks (from highest to
lowest or lowest to highest).
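
The difference between the three techniques can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the sample data are invented for the example, Spearman's rho is computed here as Pearson's r applied to average ranks, and Kendall's Tau-b is computed by comparing every pair of observations.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance scaled by both spreads."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rho, computed as Pearson's r on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

def kendall_tau_b(xs, ys):
    """Kendall's Tau-b from concordant, discordant and tied pairs."""
    concordant = discordant = tied_x = tied_y = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: counted in neither term
            if dx == 0:
                tied_x += 1
            elif dy == 0:
                tied_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt((untied + tied_x) * (untied + tied_y))

# Hypothetical data: y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r = pearson_r(x, y)        # below 1, because the relationship is not linear
rho = spearman_rho(x, y)   # 1, because the rank orders agree perfectly
tau = kendall_tau_b(x, y)  # 1, because every pair is concordant
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}, Kendall tau-b = {tau:.3f}")
```

The rank-based measures depend only on the ordering of the values, which is why they suit data recorded as ranks.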

How to use Correlation


Like all statistical techniques, correlation is only appropriate for certain kinds of data.
Correlation works for quantifiable data in which numbers are meaningful, usually
quantities of some sort. It cannot be used for purely categorical data, such as gender, brands
purchased, or favorite color.

Rating Scales
Rating scales are a controversial middle case. The numbers in rating scales have meaning,
but that meaning isn't very precise. They are not like quantities. With a quantity (such as
dollars), the difference between 1 and 2 is exactly the same as between 2 and 3. With a
rating scale, that isn't really the case. You can be sure that your respondents think a rating
of 2 is between a rating of 1 and a rating of 3, but you cannot be sure they think it is exactly
halfway between. This is especially true if you labeled the mid-points of your scale (you
cannot assume "good" is exactly halfway between "excellent" and "fair").

Correlation coefficient:
The main result of a correlation study is called the correlation coefficient (r). It ranges
from -1.0 to +1.0. A value close to +1 indicates a strong positive correlation, while a value
close to -1 indicates a strong negative correlation. A value near zero shows that the variables
have little or no linear relationship.
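
As a rough sketch of how the coefficient behaves, the Python below computes Pearson's r directly from its definition (the sum of products of deviations divided by the square root of the product of the two sums of squared deviations). The function name and the three small data sets are assumptions made up for illustration; they do not come from any real study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves exactly with x
falling = [10, 8, 6, 4, 2]   # moves exactly against x
flat = [2, 1, 3, 1, 2]       # no linear trend in x

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_flat = pearson_r(x, flat)
print(f"rising: {r_rising:+.2f}  falling: {r_falling:+.2f}  flat: {r_flat:+.2f}")
```

The first two data sets lie exactly on a straight line, so they reach the extremes of +1 and -1; real data almost always falls somewhere in between.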

Graphical/Pictorial presentation of “Positive correlation coefficients” (r):

Graphical/Pictorial presentation of “Negative correlation coefficients” (r):


Graphical/Pictorial presentation of “No correlation coefficients”(r):

While correlation coefficients are normally reported as r = (a value between -1 and +1),
squaring them makes them easier to understand. The square of the coefficient (or r squared)
is equal to the proportion of the variation in one variable that is related to the variation in
the other. To read it as a percentage, multiply by 100 (in effect, ignore the decimal point).
An r of .5 means 25% of the variation is related (.5 squared = .25). An r of .7 means 49% of
the variation is related (.7 squared = .49).
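
The arithmetic above can be checked with a short Python sketch. The helper name `shared_variation_percent` is hypothetical; it simply squares r and expresses the result as a percentage. Note that a negative r gives the same percentage as a positive r of the same size, because squaring discards the sign.

```python
def shared_variation_percent(r):
    """Percentage of variation in one variable related to the other (r squared x 100)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return 100 * r * r

# The two values used in the text, plus a negative coefficient for comparison.
for r in (0.5, 0.7, -0.7):
    percent = shared_variation_percent(r)
    print(f"r = {r:+.1f}  r squared = {r * r:.2f}  related variation = {percent:.0f}%")
```
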
The following guidelines have been proposed for interpreting Pearson's correlation coefficient.

                             Coefficient, r
Strength of Association      Positive        Negative
Small                        0.1 to 0.3      -0.1 to -0.3
Medium                       0.3 to 0.5      -0.3 to -0.5
Large                        0.5 to 1.0      -0.5 to -1.0

Remember that these values are guidelines, and whether an association is strong or not will
also depend on what is being measured.
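
The guideline bands can be turned into a small lookup, sketched below in Python. Two choices here are assumptions rather than part of the guidelines: the boundary values .3 and .5 appear in two bands each, so this sketch assigns them to the higher band, and values below .1 in size, which the guidelines leave unnamed, are labeled "negligible". The function name is also hypothetical.

```python
def strength_of_association(r):
    """Label r using the small / medium / large guideline bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size >= 0.5:
        label = "large"
    elif size >= 0.3:      # assumption: a boundary value goes to the higher band
        label = "medium"
    elif size >= 0.1:
        label = "small"
    else:
        return "negligible"  # assumption: the guidelines do not name this range
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"

for r in (0.2, -0.4, 0.75, 0.05):
    print(f"r = {r:+.2f}: {strength_of_association(r)}")
```

As the text notes, such labels are only a starting point; the same coefficient may count as strong in one field and weak in another.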

When to use Correlation

• Correlation is used to find a linear relationship between two variables. It can be
used with a causal as well as an associative research hypothesis, but it can't be used
with an attributive research hypothesis (RH) because that kind of hypothesis is univariate.

• Correlation is used for testing in within-groups studies.

• Economic theory and business studies examine relationships between variables.
Correlation analysis helps in deriving precisely the degree and direction of such
relationships.

• Correlation reduces the uncertainty of predictions, so that these predictions are
more reliable and nearer to reality.

Tips on the use of Correlation

• The variables must be either interval or ratio measurements.
• The variables must be approximately normally distributed.
• There must be a linear relationship between the two variables.
• Outliers must be either kept to a minimum or removed entirely.
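
The tip about outliers can be illustrated with a short Python sketch. The data are invented for the example: six points that lie exactly on a straight line, followed by one extreme point added to the same series. The function name is an assumption as well.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from sums of deviations."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements lying exactly on the line y = 2x.
x_clean = [1, 2, 3, 4, 5, 6]
y_clean = [2, 4, 6, 8, 10, 12]

# The same series with a single outlying observation appended.
x_outlier = x_clean + [7]
y_outlier = y_clean + [0]

r_clean = pearson_r(x_clean, y_clean)
r_with_outlier = pearson_r(x_outlier, y_outlier)
print(f"without outlier: r = {r_clean:.2f}")
print(f"with one outlier: r = {r_with_outlier:.2f}")
```

In this constructed case a single point pulls r from 1.00 down to 0.25, which is why outliers should be examined before a coefficient is reported.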

Applications of Correlation
• Relationship between height and weight
• Relationship between amount of rainfall and wheat yield
• Relationship between price and demand of a commodity
• Relationship between dose of insulin and blood sugar level

Examples
Height and weight are related; taller people tend to be heavier than shorter people. The
relationship isn't perfect. People of the same height vary in weight, and you can easily think
of two people you know where the shorter one is heavier than the taller one. Nonetheless,
the average weight of people 5'5'' is less than the average weight of people 5'6'', and their
average weight is less than that of people 5'7'', etc. Correlation can tell you just how much
of the variation in people's weights is related to their heights.

An example of a curvilinear relationship is age and health care use. They are related, but the
relationship doesn't follow a straight line. Young children and older people both tend to
use much more health care than teenagers or young adults. Multiple regression can be
used to examine curvilinear relationships.
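
A small Python sketch shows why a straight-line measure misses a relationship of this shape. The ages and usage figures are invented and deliberately symmetric about age 35, which is an assumption made to keep the arithmetic simple, not a claim about real health care data. Correlating usage with the squared distance from that midpoint, in the spirit of adding a squared term to a regression, recovers the relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from sums of deviations."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical U-shaped data: usage is high for the young and the old.
age = [5, 15, 25, 35, 45, 55, 65]
usage = [10, 5, 2, 1, 2, 5, 10]

# Straight-line correlation between age and usage.
r_linear = pearson_r(age, usage)

# Correlation with a squared term (squared distance from the assumed midpoint).
age_squared_term = [(a - 35) ** 2 for a in age]
r_squared_term = pearson_r(age_squared_term, usage)

print(f"age vs usage: r = {r_linear:.2f}")
print(f"squared term vs usage: r = {r_squared_term:.2f}")
```

Here r for age against usage is zero even though usage is completely determined by age, so a coefficient near zero rules out only a linear relationship.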

Two scatter plots show the amount of sleep needed per day by age, with the correlation
indicated by an estimated line of best fit. It can be noticed that as one grows older, less
sleep is needed, but obviously a 40-year-old needs more than 2 hours of sleep per day. This
example shows that prediction from the fitted line may be carried out only up to a certain point.