Prof. Dr. Moustapha Ibrahim Salem Mansourms@alexu - Edu.eg 01005857099

This document discusses correlation and regression analysis. It defines correlation as measuring the strength and direction of the linear relationship between two variables, without implying causation. Regression is used to predict the value of one variable based on the other. Key points include:
- Correlation is measured on a scale from -1 to 1. Higher positive or negative values indicate a stronger linear relationship.
- Scatter plots visually show the relationship between variables.
- The correlation coefficient quantifies the correlation but does not prove causation. There may be alternative explanations like a third hidden variable.
- Correlation only applies to linear relationships. Curvilinear patterns could show no correlation despite a relationship.

Uploaded by

Ahmed Elsayed

Prof. Dr.

Moustapha Ibrahim Salem


[email protected]
01005857099
Correlation
Sir Francis Galton (cousin to Darwin)
• Development of behavioral statistics
• Development of the basis of correlation and regression
• Father of eugenics
• Science of fingerprints as unique
• Retrospective IQ of 200
• Drove himself mad just to prove you could do it
• Invented the pocket
• Much of statistics is concerned with relationships among variables and whether observed relationships are real or simply due to chance.
• In particular, the simplest case deals with the relationship between two variables.
• When analyzing two variables, one question becomes important as it determines the type of analysis that will be done.
• Is the purpose to explore the nature of the relationship, or is the purpose to use one variable to explain variation in another variable?
• For example, there is a difference between examining height and weight to see if there is a strong relationship, as opposed to using height to predict weight.
• Consequently, you need to distinguish between a correlational analysis, in which only the strength of the relationship will be described, and regression, where one variable will be used to predict the values of a second variable.
• The two variables are often called either a response variable or an explanatory variable.
• A response variable (also known as a dependent or Y variable) measures the outcome of a study.
• An explanatory variable (also known as an independent or X variable) is the variable that attempts to explain the observed outcomes.
Correlation
Do Variables Relate to One Another?
• Is worker’s pay related to performance? (Positive)
• Is exercise related to illness? (Negative)
• Is CO2 related to global warming? (Positive)
• Is TV viewing related to shoe size? (Zero)
Scatter Plots and Correlation
• A scatter plot (or scatter diagram) is used to show the relationship between two variables.
• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
• Only concerned with the strength of the relationship
• No causal effect is implied
Linear Correlation
Correlation Coefficient
• The population correlation coefficient ρ (rho) measures the strength and direction of the linear association between the variables.
• The sample correlation coefficient r is an estimate of ρ and is used to measure the strength and direction of the linear relationship in the sample observations.
Correlation Coefficient
• The formula for r is:
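As a point of reference, and assuming the standard Pearson product-moment definition (the notation below is an assumption, not necessarily the one used on the slide), r for n paired observations (x_i, y_i) with sample means x̄ and ȳ can be written as follows. The second line is the algebraically equivalent "computational" form built from raw sums.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{n\sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\bigl[n\sum x_i^2 - (\sum x_i)^2\bigr]\bigl[n\sum y_i^2 - (\sum y_i)^2\bigr]}}
```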
Correlation Coefficient
• A statistic that quantifies a relation between two variables
• Has no units
• Can be either positive or negative
• Falls between -1.00 and 1.00
• The magnitude of the number (its absolute value, not the sign) indicates the strength of the relation
• ρ and r are unaffected by linear transformations of the individual variables that have a positive slope, e.g. unit changes such as converting from imperial to metric units. Multiplying one variable by a negative constant flips the sign of the coefficient but leaves its magnitude unchanged.
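The properties listed above can be checked numerically. The following is a minimal sketch using NumPy's `np.corrcoef`; the height and weight values are made-up illustration data, not taken from the slides.

```python
import numpy as np

# Hypothetical heights (inches) and weights (pounds) for six people.
height_in = np.array([63.0, 65.0, 67.0, 69.0, 71.0, 73.0])
weight_lb = np.array([127.0, 140.0, 139.0, 155.0, 168.0, 170.0])

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r.
r_imperial = np.corrcoef(height_in, weight_lb)[0, 1]

# Unit conversion is a linear transformation with positive slope,
# so r should not change.
height_cm = height_in * 2.54
weight_kg = weight_lb * 0.45359237
r_metric = np.corrcoef(height_cm, weight_kg)[0, 1]

# Negating one variable flips the sign of r but keeps its magnitude.
r_flipped = np.corrcoef(-height_in, weight_lb)[0, 1]

print(f"r (imperial units) = {r_imperial:.4f}")
print(f"r (metric units)   = {r_metric:.4f}")
print(f"r (height negated) = {r_flipped:.4f}")
```

The first two printed values agree, and the third has the same magnitude with the opposite sign, which is why the strength of a relation is read from |r| rather than from the sign.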
Calculating a Correlation Coefficient
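A hand calculation of r can be sketched in a few lines of plain Python. This assumes the standard Pearson definition (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations); the function name `pearson_r` and the data are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient from deviation sums (Pearson definition)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: mean_x = 3, mean_y = 4,
# so sxy = 6, sxx = 10, syy = 6 and r = 6 / sqrt(60).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
r = pearson_r(x_values, y_values)
print(f"r = {r:.4f}")  # r = 0.7746
```

Working from deviations rather than raw sums keeps the intermediate numbers small, which is convenient both by hand and in floating-point arithmetic.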
Interpreting Correlations
Cautions
• Random sampling required.
• Sample correlation coefficients are only valid under simple random samples. If the data were collected in a haphazard fashion or if certain data points were oversampled, then the correlation coefficient may be severely biased.
• There are examples of high correlation but no practical use and low correlation but great practical use.
Cautions
• Correlation measures the ‘strength’ of a linear relationship; a curvilinear relationship may have a correlation of 0 even though there is still a strong relationship between the variables.
• Watch for outliers and high-leverage points.
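Both cautions can be illustrated with small made-up data sets. The sketch below assumes NumPy; the first part uses y = x² on values symmetric around zero, where y is completely determined by x yet r is 0, and the second part adds a single extreme point to an otherwise weakly negative pattern.

```python
import numpy as np

# Curvilinear case: y depends perfectly on x, but not linearly.
x = np.arange(-3, 4, dtype=float)   # -3, -2, ..., 3
y = x ** 2
r_curve = np.corrcoef(x, y)[0, 1]
print(f"r for y = x^2 on symmetric x: {r_curve:.4f}")

# Outlier / high-leverage case (hypothetical data).
x_base = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_base = np.array([3.0, 1.0, 2.0, 3.0, 1.0])
r_without = np.corrcoef(x_base, y_base)[0, 1]

# Add one point far from the rest in both x and y.
x_out = np.append(x_base, 20.0)
y_out = np.append(y_base, 20.0)
r_with = np.corrcoef(x_out, y_out)[0, 1]

print(f"r without the extreme point: {r_without:.3f}")
print(f"r with the extreme point:    {r_with:.3f}")
```

A scatter plot would reveal both situations at a glance, which is one reason to plot the data before interpreting r.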
Cautions
• Effects of lurking variables.
• For example, suppose there is a positive association between wages and years of experience among male nurses, and likewise among female nurses, but males are generally paid more than females.
• There is a positive correlation within each group, but the pooled data can show an overall negative correlation, for instance when the higher-paid group also has fewer years of experience.
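The nurse example can be reproduced with a toy data set. All numbers below are invented for illustration, and the sketch assumes the higher-paid group happens to have fewer years of experience, which is what makes the pooled correlation change sign.

```python
import numpy as np

# Hypothetical data: (years of experience, wage) for each group.
male_exp = np.array([1.0, 2.0, 3.0])
male_wage = np.array([50.0, 51.0, 52.0])      # higher pay, less experience
female_exp = np.array([8.0, 9.0, 10.0])
female_wage = np.array([40.0, 41.0, 42.0])    # lower pay, more experience

# Within each group, wage rises with experience.
r_male = np.corrcoef(male_exp, male_wage)[0, 1]
r_female = np.corrcoef(female_exp, female_wage)[0, 1]

# Pooling ignores the lurking variable (group membership).
pooled_exp = np.concatenate([male_exp, female_exp])
pooled_wage = np.concatenate([male_wage, female_wage])
r_pooled = np.corrcoef(pooled_exp, pooled_wage)[0, 1]

print(f"r within males:   {r_male:.3f}")
print(f"r within females: {r_female:.3f}")
print(f"r pooled:         {r_pooled:.3f}")
```

The within-group coefficients are positive while the pooled coefficient is negative, so the sign of the overall correlation reflects the difference between the groups rather than the effect of experience.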
Cautions
• Correlation does not imply causation.
• Treating correlation as causation is one of the most frequent mistakes people make. There is a set of principles of causal inference that need to be satisfied in order to infer cause and effect.
Correlation and Causation
• The fact that two variables are strongly correlated does not in itself imply a cause-and-effect relationship between the variables.
• If there is a significant correlation between two variables, you should consider the following possibilities:
Correlation and Causation
1. Is there a direct cause-and-effect relationship between the variables?
• Does x cause y?
2. Is there a reverse cause-and-effect relationship between the variables?
• Does y cause x?
3. Is it possible that the relationship between the variables can be caused by a third variable or by a combination of several other variables?
4. Is it possible that the relationship between two variables may be a coincidence?
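Possibility 3 can be illustrated with a small simulation. This is a sketch under stated assumptions: the variables x, y and z are synthetic, z is constructed to drive both x and y, and the first-order partial correlation formula is used to remove the linear effect of z. The seed, sample size and noise scale are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n = 5000

# Hypothetical third variable z that influences both x and y.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)   # x depends on z, not on y
y = z + 0.5 * rng.normal(size=n)   # y depends on z, not on x

r_xy = np.corrcoef(x, y)[0, 1]
r_xz = np.corrcoef(x, z)[0, 1]
r_yz = np.corrcoef(y, z)[0, 1]

# First-order partial correlation of x and y, controlling for z.
r_xy_given_z = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

print(f"r(x, y)             = {r_xy:.3f}")
print(f"r(x, y | z removed) = {r_xy_given_z:.3f}")
```

Here x and y are strongly correlated even though neither causes the other; once z is accounted for, the remaining correlation is close to zero.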
Establishing Cause and Effect
• It is generally agreed that most or all of the following must be considered before causation can be declared.
• Strength of the association. The stronger an observed association appears over a series of different studies, the less likely this association is spurious because of bias.
Establishing Cause and Effect
• Dose-response effect
• The value of the response variable changes in a meaningful way with the dose (or level) of the suspected causal agent.
• Lack of temporal ambiguity
• The hypothesized cause precedes the occurrence of the effect. The ability to establish this time pattern will depend upon the study design used.
Establishing Cause and Effect
• Consistency of the findings
• Most, or all, studies concerned with a given causal hypothesis produce similar findings. Of course, studies dealing with a given question may all have serious bias problems that can diminish the importance of observed associations.
• Specificity of the association
• The observed effect is associated with only the suspected cause (or few other causes that can be ruled out).
Establishing Cause and Effect
• Engineering or theoretical plausibility
• The hypothesized causal relationship is consistent with current engineering or theoretical knowledge. Note that the current state of knowledge may be insufficient to explain certain findings.
• Coherence of the evidence
• The findings do not seriously conflict with accepted facts about the outcome variable being studied.
End of Chapter 4
