Chapter - 5 - Correlation and Regression
$SP = \sum (X - M_X)(Y - M_Y)$
SP – Computational formula
• Definitional formula emphasizes SP as the sum
of products of two deviation scores
• Computational formula results in easier
calculations
• SP computational formula:
$SP = \sum XY - \frac{\sum X \sum Y}{n}$
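As a rough check, the definitional and computational formulas can be compared on a small data set. The scores below are made up for illustration and are not taken from the text.

```python
# Hypothetical paired scores (illustrative only, not from the text).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n

# Definitional formula: sum of products of deviation scores.
sp_definitional = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# Computational formula: sum(XY) - (sum(X) * sum(Y)) / n.
sp_computational = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n

print(sp_definitional, sp_computational)
```

Both expressions should agree up to floating-point rounding; the computational form avoids working with deviation scores when the means are not whole numbers.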
Pearson Correlation
Calculation
• Ratio comparing the covariability of X and Y
(numerator) with the variability of X and Y
separately (denominator)
$r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$
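A minimal sketch of this ratio in Python, assuming made-up scores. The function name `pearson_r` is a hypothetical helper, not something defined in the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson r as SP divided by the square root of SS_X * SS_Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical paired scores (illustrative only).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

r = pearson_r(X, Y)
print(round(r, 3))
```

For these scores SP = 28, SS_X = 64 and SS_Y = 16, so the ratio is 28 / 32.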
Figure 14.4
Example 14.3 Scatterplot
Pearson Correlation and
z-Scores
• Pearson correlation formula can be expressed
as a relationship of z-scores.
Sample: $r = \frac{\sum z_X z_Y}{n - 1}$
Population: $\rho = \frac{\sum z_X z_Y}{N}$
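The z-score form can be sketched as follows, assuming made-up scores. The sample version uses the sample standard deviation and divides by n - 1; the population version treats the same scores as a complete population purely for illustration.

```python
import statistics

# Hypothetical paired scores (illustrative only).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
n = len(X)

def z_scores(values, sd):
    """Convert raw scores to z-scores using the supplied standard deviation."""
    m = statistics.mean(values)
    return [(v - m) / sd for v in values]

# Sample version: sample standard deviations, sum of products over n - 1.
zx = z_scores(X, statistics.stdev(X))
zy = z_scores(Y, statistics.stdev(Y))
r_sample = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Population version: population standard deviations, sum of products over N.
zx_pop = z_scores(X, statistics.pstdev(X))
zy_pop = z_scores(Y, statistics.pstdev(Y))
rho = sum(a * b for a, b in zip(zx_pop, zy_pop)) / n

print(round(r_sample, 3), round(rho, 3))
```

Both versions give the same value because the choice of divisor cancels between the standard deviations and the final average.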
Learning Check
• A scatterplot shows a set of data points that fit
very loosely around a line that slopes down to
the right. Which of the following values would
be closest to the correlation for these data?
A • 0.75
B • 0.35
C • -0.75
D • -0.35
Learning Check - Answer
• Answer: D (-0.35). The line slopes down to the
right, so the correlation is negative; the points
fit the line very loosely, so the magnitude is
well below 1.
Learning Check
• Decide if each of the following statements
is True or False
$SP = 20 - \frac{(20)(20)}{10} = 20 - 40 = -20$
14.3 Using and Interpreting
the Pearson Correlation
• Correlations used for:
– Prediction
– Validity
– Reliability
– Theory verification
Interpreting Correlations
• Correlation describes a relationship but does
not demonstrate causation
• Establishing causation requires an experiment
in which one variable is manipulated and
others carefully controlled
• Example 14.4 (and Figure 14.5) demonstrates
the fallacy of attributing causation after
observing a correlation
Figure 14.5 Correlation:
Churches and Serious Crimes
Correlations and Restricted
Range of Scores
• Correlation coefficient value (size) will be
affected by the range of scores in the data
• Severely restricted range may provide a very
different correlation than would a broader
range of scores
• To be safe, never generalize a correlation
beyond the sample range of data
Figure 14.6 Restricted Score
Range Influences Correlation
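The effect of a restricted range can be illustrated with a small constructed data set. The scores below are hypothetical: Y follows X with an alternating offset of +3 or -3, and the restricted sample keeps only the middle of the X range.

```python
import math

def pearson_r(xs, ys):
    """Pearson r as SP divided by the square root of SS_X * SS_Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: X runs from 0 to 19, Y is X plus an alternating offset.
X = list(range(20))
Y = [x + (3 if i % 2 == 0 else -3) for i, x in enumerate(X)]

# Correlation over the full range of X.
r_full = pearson_r(X, Y)

# Correlation when X is restricted to the narrow band 8..11.
pairs = [(x, y) for x, y in zip(X, Y) if 8 <= x <= 11]
r_restricted = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])

print(round(r_full, 3), round(r_restricted, 3))
```

In this sketch the full-range correlation is strongly positive, while the restricted-range correlation is close to zero, even though both come from the same underlying relationship.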
Correlations and Outliers
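• Spearman correlation formula (D is the difference
between the X rank and the Y rank):
$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$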
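A minimal sketch of the rank-difference formula, assuming made-up scores with no tied values (the shortcut formula is exact only when there are no ties). The helper `ranks` is hypothetical.

```python
def ranks(values):
    """Rank scores from 1 (smallest) to n; assumes no tied values."""
    order = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [order[v] for v in values]

# Hypothetical paired scores with no ties (illustrative only).
X = [4, 2, 8, 10, 6]
Y = [3, 1, 9, 5, 7]
n = len(X)

rank_x = ranks(X)
rank_y = ranks(Y)

# D is the difference between the X rank and the Y rank for each individual.
sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))

r_s = 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))
print(round(r_s, 3))
```

Here the squared rank differences sum to 6, so r_s = 1 - 36/120. Applying the Pearson formula to the ranks themselves gives the same value.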
Point-Biserial Correlation
• Measures relationship between two variables
– One variable has only two values
(called a dichotomous or binomial variable)
• Effect size for independent samples t-test in
Chapter 10 can be measured by r²
– Point-biserial r² has same value as the r²
computed from t-statistic
– t-statistic tests significance of the mean difference
– r statistic measures the correlation size
Point-Biserial Correlation
• Applicable in the same situation as the
independent-measures t test in Chapter 10
– Code one group 0 and the other 1 (or any two
digits) as the Y score
– t-statistic evaluates the significance of mean
difference
– Point-Biserial r measures correlation magnitude
– r² quantifies effect size
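The link between the point-biserial correlation and the t test can be checked numerically. The sketch below assumes made-up scores for two groups, a pooled-variance independent-measures t statistic, and the relation r² = t² / (t² + df); none of the numbers come from the text.

```python
import math

def mean(values):
    return sum(values) / len(values)

def ss(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

# Hypothetical scores for two groups (illustrative only).
group_0 = [1, 3, 2, 4]
group_1 = [5, 7, 6, 8]
n0, n1 = len(group_0), len(group_1)

# Independent-measures t statistic with pooled variance.
df = n0 + n1 - 2
pooled_variance = (ss(group_0) + ss(group_1)) / df
standard_error = math.sqrt(pooled_variance / n0 + pooled_variance / n1)
t = (mean(group_1) - mean(group_0)) / standard_error
r2_from_t = t ** 2 / (t ** 2 + df)

# Point-biserial r: Pearson formula with group membership coded 0 or 1 as Y.
X = group_0 + group_1
Y = [0] * n0 + [1] * n1
sp = sum((x - mean(X)) * (y - mean(Y)) for x, y in zip(X, Y))
r = sp / math.sqrt(ss(X) * ss(Y))

print(round(r ** 2, 4), round(r2_from_t, 4))
```

The two printed values should match: the squared point-biserial r and the r² obtained from t describe the same effect size.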
Phi Coefficient
• Both variables (X and Y) are dichotomous
– Both variables are re-coded to values 0 and 1 (or
any two digits)
– The regular Pearson formula is used to calculate r
– r² (coefficient of determination) measures effect
size (proportion of variability in one score
predicted by the other)
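A short sketch of the phi coefficient, assuming two made-up dichotomous variables already coded 0 and 1. It applies the ordinary Pearson formula to the coded values.

```python
import math

# Hypothetical dichotomous variables coded 0/1 (illustrative only).
X = [0, 0, 0, 0, 1, 1, 1, 1]
Y = [0, 0, 0, 1, 0, 1, 1, 1]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n

# Regular Pearson formula applied to the 0/1 codes.
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)

phi = sp / math.sqrt(ss_x * ss_y)
r_squared = phi ** 2

print(phi, r_squared)
```

For these codes SP = 1 and SS_X = SS_Y = 2, so phi = 0.5 and r² = 0.25.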
Learning Check
• Participants were classified as “morning people”
or “evening people” then measured on a 50-point
conscientiousness scale. Which correlation
should be used to measure the relationship?
A • Pearson correlation
B • Spearman correlation
C • Point-biserial correlation
D • Phi-coefficient
Learning Check - Answer
• Answer: C (point-biserial correlation). One
variable is dichotomous (morning vs. evening
person) and the other is a numerical score.
Learning Check
• Decide if each of the following statements
is True or False
$SEoE = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$
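The standard error of estimate can be sketched directly from the residuals. The example assumes made-up scores and the usual least-squares values b = SP / SS_X and a = M_Y - b·M_X, which are not shown in this excerpt.

```python
import math

# Hypothetical paired scores (illustrative only).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
ss_x = sum((x - mean_x) ** 2 for x in X)

# Least-squares slope and intercept (assumed standard formulas).
b = sp / ss_x
a = mean_y - b * mean_x

# Residual sum of squares: squared distances between Y and predicted Y.
predicted = [b * x + a for x in X]
ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(Y, predicted))

# Standard error of estimate with df = n - 2.
seoe = math.sqrt(ss_residual / (n - 2))
print(round(ss_residual, 3), round(seoe, 3))
```

For these scores SS_residual = 3.75 and df = 3, so SEoE is the square root of 1.25.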
Figure 14.17 Regression Lines:
Perfectly Fit vs. Example 14.13
Relationship Between Correlation
and Standard Error of Estimate
• As r goes from 0 to 1, SEoE decreases to 0
• Predicted variability in Y scores:
$SS_{regression} = r^2 SS_Y$
• Unpredicted variability in Y scores:
$SS_{residual} = (1 - r^2) SS_Y$
• Standard Error of Estimate based on r:
$SEoE = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{(1 - r^2) SS_Y}{n - 2}}$
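The r-based formula needs only r, SS_Y and n, so it can be sketched without the raw scores. The function name `seoe_from_r` and the input values are hypothetical.

```python
import math

def seoe_from_r(r, ss_y, n):
    """Standard error of estimate from r, SS_Y and sample size n (df = n - 2)."""
    ss_residual = (1 - r ** 2) * ss_y
    return math.sqrt(ss_residual / (n - 2))

# Hypothetical summary values (illustrative only).
ss_y = 16
n = 5

for r in (0.0, 0.5, 0.875, 1.0):
    print(r, round(seoe_from_r(r, ss_y, n), 3))
```

The printed values shrink as r approaches 1 and reach 0 for a perfect correlation, where every point lies on the regression line.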
Testing Regression Significance
• Analysis of Regression
– Similar to Analysis of Variance
– Uses an F-ratio of two Mean Square values
– Each MS is a SS divided by its df
• H0: the slope of the regression line (b or beta)
is zero
Mean Squares and F-ratio
$MS_{regression} = \frac{SS_{regression}}{df_{regression}}$
$MS_{residual} = \frac{SS_{residual}}{df_{residual}}$
$F = \frac{MS_{regression}}{MS_{residual}}$
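A minimal sketch of the F-ratio, assuming a regression with one predictor so that df_regression = 1 and df_residual = n - 2. The summary values are hypothetical.

```python
# Hypothetical summary values (illustrative only).
r = 0.875
ss_y = 16
n = 5

# Partition SS_Y into predicted and unpredicted parts.
ss_regression = r ** 2 * ss_y
ss_residual = (1 - r ** 2) * ss_y

# Degrees of freedom, assuming a single predictor variable.
df_regression = 1
df_residual = n - 2

# Each MS is an SS divided by its df; F is the ratio of the two MS values.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

print(ms_regression, ms_residual, round(f_ratio, 2))
```

Here MS_regression = 12.25 and MS_residual = 1.25, giving F = 9.8 with df = 1, 3. Whether that value is significant depends on the critical F for the chosen alpha level.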
Figure 14.18 Partitioning SS
and df in Regression Analysis
Learning Check
• A linear regression has b = 3 and a = 4.
What is the “predicted Y” (Ŷ) for X = 7?
A • 14
B • 25
C • 31
D • Cannot be determined
Learning Check - Answer
• Answer: B (25).
Ŷ = bX + a = 3(7) + 4 = 25
Learning Check
• Decide if each of the following statements
is True or False