Correlation Rev 1.0
CORRELATION
QUALITY TOOLS
Correlation
Description of Correlation
Although some correlations are fairly obvious, data may also contain unsuspected correlations.
One may also suspect that correlations exist without knowing which are the strongest. An
intelligent correlation analysis can lead to a greater understanding of the data.
Types
Three types of correlation are commonly identified:
1. Positive correlation:
When an increase in one variable is accompanied by an increase in the other, and a
decrease in one by a decrease in the other. For example, the amount of money that a
person possesses might correlate positively with the number of cars that person owns.
2. Negative correlation:
When an increase in one variable is accompanied by a decrease in the other and vice
versa. For example, the level of education might correlate negatively with crime. This
means that where the education level is higher, crime tends to be lower. Note that this
doesn't mean that a lack of education causes crime. It could be, for example, that both
lack of education and crime have a common cause: poverty.
3. No correlation:
Two variables are uncorrelated when a change in one is not accompanied by a consistent
change in the other and vice versa. For example, among millionaires, happiness is found
to be uncorrelated with money. This means that, in this group, more money does not tend
to go with more happiness.
Spearman rank-order correlation (or "rho") and Kendall's tau-b (or "tau") are used when
the variables are measured as ranks (from highest to lowest or lowest to highest).
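As an illustration of rank-based correlation, the sketch below computes Spearman's rho by
ranking each variable and then correlating the ranks. The function names (average_ranks,
spearman_rho) and the two judges' rankings are hypothetical and chosen only for this
example; Kendall's tau-b is not shown.

```python
import math

def average_ranks(values):
    """Rank values from lowest (1) to highest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: two judges each rank the same five items.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_a, judge_b)
print(f"Spearman rho = {rho:.2f}")
```

Because only the ordering matters, any relationship that always rises (or always falls)
gives a rho of +1 (or -1), even when it does not follow a straight line.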
Rating Scales
Rating scales are a controversial middle case. The numbers in rating scales have meaning,
but that meaning isn't very precise. They are not like quantities. With a quantity (such as
dollars), the difference between 1 and 2 is exactly the same as between 2 and 3. With a
rating scale, that isn't really the case. You can be sure that your respondents think a rating
of 2 is between a rating of 1 and a rating of 3, but you cannot be sure they think it is exactly
halfway between. This is especially true if you labeled the midpoints of your scale (you
cannot assume "good" is exactly halfway between "excellent" and "fair").
Correlation coefficient:
The main result of a correlation study is called the correlation coefficient (r). It ranges
from -1.0 to +1.0. A value close to +1 indicates a strong positive correlation, while a value
close to -1 indicates a strong negative correlation. A value near zero shows that the
variables are uncorrelated.
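The sketch below shows one way to compute Pearson's r directly from its definition and
reproduces the three cases described above. The function name pearson_r and the small
data sets are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (the co-variation).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # y rises as x rises
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises
r_none = pearson_r(x, [3, 1, 4, 1, 3])       # no linear trend in y

print(f"positive: {r_positive:+.2f}")
print(f"negative: {r_negative:+.2f}")
print(f"none:     {r_none:+.2f}")
```

The first two data sets lie exactly on a straight line, so r reaches its limits of +1 and
-1; real data would normally fall somewhere in between.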
While correlation coefficients are normally reported as r = (a value between -1 and +1),
squaring them makes them easier to understand. The square of the coefficient (or r squared)
is equal to the proportion of the variation in one variable that is related to the variation
in the other. To read it as a percentage, multiply the squared value by 100. An r of .5 means
25% of the variation is related (.5 squared = .25). An r of .7 means 49% of the variation is
related (.7 squared = .49).
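The arithmetic can be sketched in a few lines. The function name
percent_variation_explained is hypothetical; the values of r are the ones used in the text,
plus a negative value to show that the sign disappears on squaring.

```python
def percent_variation_explained(r):
    """Return r squared expressed as a percentage."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return (r ** 2) * 100

for r in (0.5, 0.7, -0.7):
    pct = percent_variation_explained(r)
    print(f"r = {r:+.1f} -> {pct:.0f}% of the variation is related")
```

Note that r squared says nothing about direction: an r of -.7 and an r of +.7 both
correspond to 49%.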
The following guidelines have been proposed for interpreting Pearson's correlation coefficient.
                           Coefficient, r
Strength of Association    Positive        Negative
Small                      0.1 to 0.3      -0.1 to -0.3
Medium                     0.3 to 0.5      -0.3 to -0.5
Large                      0.5 to 1.0      -0.5 to -1.0
Remember that these values are guidelines and whether an association is strong or not will
also depend on what is to be measured.
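These guidelines can be expressed as a small lookup, sketched below. Two choices here are
assumptions rather than part of the guidelines: a value that falls exactly on a boundary
(such as .3) is placed in the higher category, and values below .1 are labeled
"negligible". The function name strength_of_association is hypothetical.

```python
def strength_of_association(r):
    """Classify r as small, medium, or large using the guideline ranges above."""
    size = abs(r)  # the same ranges apply to positive and negative coefficients
    if size > 1.0:
        raise ValueError("r must lie between -1 and +1")
    if size >= 0.5:
        return "large"
    if size >= 0.3:   # assumption: boundary values go to the higher category
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumption: the guidelines give no label below 0.1

for r in (0.2, -0.4, 0.7):
    print(f"r = {r:+.1f}: {strength_of_association(r)}")
```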
Correlation analysis helps to quantify the degree and direction of relationships between
variables.
Applications of Correlation
Relationship between height and weight
Relationship between amount of rainfall and wheat yield
Relationship between price and demand of a commodity
Relationship between dose of insulin and blood sugar level
Examples
Height and weight are related; taller people tend to be heavier than shorter people. The
relationship isn't perfect. People of the same height vary in weight, and you can easily think
of two people you know where the shorter one is heavier than the taller one. Nonetheless,
the average weight of people 5'5'' is less than the average weight of people 5'6'', and their
average weight is less than that of people 5'7'', etc. Correlation can tell you just how much
of the variation in people's weights is related to their heights.
An example of a curvilinear relationship is age and health care. They are related, but the
relationship doesn't follow a straight line. Young children and older people both tend to
use much more health care than teenagers or young adults. Multiple regression can be
used to examine curvilinear relationships.
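The sketch below illustrates why a straight-line measure can miss a curvilinear
relationship. The data are hypothetical: a U-shaped curve (y equal to x squared) standing
in for a pattern that is high at both ends and low in the middle. One common way to handle
such a curve in multiple regression is to add a squared term as an extra predictor, which
is what the second calculation hints at.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical U-shaped data: y depends entirely on x, but not along a line.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_linear = pearson_r(x, y)                              # straight-line measure
r_squared_term = pearson_r([v ** 2 for v in x], y)      # using x squared instead

print(f"r between x and y:         {r_linear:+.2f}")
print(f"r between x squared and y: {r_squared_term:+.2f}")
```

The first coefficient is zero even though y is completely determined by x, which is why a
scatter plot should be examined before concluding that two variables are unrelated.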
Two scatter plots are given below showing the amount of sleep needed per day by age, with
the correlation indicated by an estimated line of best fit. It can be seen that as one grows
older, less sleep is needed, but obviously a 40-year-old needs more than 2 hours of sleep
per day. This example shows that prediction from a line of best fit is valid only up to a
certain point, not indefinitely.
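The danger of extending a line of best fit too far can be sketched as follows. The ages and
sleep hours are made-up illustrative values, not the data behind the scatter plots, and the
names fit_line and predict are hypothetical. The line is fitted by ordinary least squares.

```python
def fit_line(xs, ys):
    """Least-squares line of best fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up illustrative data: sleep needed per day (hours) at various ages (years).
ages = [1, 3, 5, 10, 15, 18]
sleep_hours = [14, 12, 11, 10, 9, 8]

slope, intercept = fit_line(ages, sleep_hours)

def predict(age):
    """Predicted hours of sleep from the fitted line."""
    return intercept + slope * age

print(f"age 10 (inside the observed range):  {predict(10):.1f} hours")
print(f"age 40 (far outside the range):      {predict(40):.1f} hours")
```

Within the observed ages the line gives plausible values, but at age 40 it predicts an
unrealistically small amount of sleep, because the downward trend does not continue at
the same rate into adulthood.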