Chapter 10
Correlation and Regression
Copyright © 2015 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Outline
10-1 Scatter Plots and Correlation
10-2 Regression
10-3 Coefficient of Determination and Standard
Error of the Estimate
10-4 Multiple Regression (Optional)
Learning Objectives
1 Draw a scatter plot for a set of ordered pairs.
2 Compute the correlation coefficient.
3 Test the hypothesis H0: ρ = 0.
4 Compute the equation of the regression line.
5 Compute the coefficient of determination.
6 Compute the standard error of the estimate.
7 Find a prediction interval.
8 Be familiar with the concept of multiple regression.
Introduction
• In addition to hypothesis testing and
confidence intervals, inferential statistics
involves determining whether a
relationship between two or more
numerical or quantitative variables exists.
Bluman Chapter 10 4
Introduction
• Correlation is a statistical method used
to determine whether a linear relationship
between variables exists.
Introduction
• The purpose of this chapter is to answer
these questions statistically:
1. Are two or more variables related?
2. If so, what is the strength of the
relationship?
3. What type of relationship exists?
4. What kind of predictions can be
made from the relationship?
Introduction
1. Are two or more variables related?
2. If so, what is the strength of the
relationship?
To answer these two questions, statisticians use
the correlation coefficient, a numerical
measure to determine whether two or more
variables are related and to determine the
strength of the relationship between or among
the variables.
Introduction
3. What type of relationship exists?
There are two types of relationships: simple and
multiple.
In a simple relationship, there are two variables:
an independent variable (predictor variable) and
a dependent variable (response variable).
In a multiple relationship, there are two or more
independent variables that are used to predict
one dependent variable.
Introduction
4. What kind of predictions can be made
from the relationship?
Predictions are made daily in all areas. Examples
include weather forecasting, stock market
analyses, sales predictions, crop predictions,
gasoline price predictions, and sports predictions.
Some predictions are more accurate than others,
due to the strength of the relationship. That is, the
stronger the relationship is between variables, the
more accurate the prediction is.
10.1 Scatter Plots and Correlation
• A scatter plot is a graph of the ordered
pairs (x, y) of numbers consisting of the
independent variable x and the
dependent variable y.
Section 10-1
Example 10-1
Page #552
Example 10-1: Car Rental Companies
Construct a scatter plot for the data shown for car rental
companies in the United States for a recent year.
Example 10-1: Car Rental Companies
Positive Relationship
Section 10-1
Example 10-2
Page #552
Example 10-2: Absences/Final Grades
Construct a scatter plot for the data obtained in a study on
the number of absences and the final grades of seven
randomly selected students from a statistics class.
Negative Relationship
Section 10-1
Example 10-3
Page #553
Example 10-3: Age and Wealth
A researcher wishes to see if there is a relationship
between the ages of the wealthiest people in the world
and their net worth. The data shows a random sample of
10 persons selected from the Forbes list of the 400 richest
people for a recent year.
Example 10-3: Age and Wealth
No Relationship
Correlation
• The population correlation coefficient,
denoted by ρ (the Greek letter rho), is computed by using all possible
pairs of data values (x, y) taken from a
population.
• The linear correlation coefficient, denoted by
r, is computed from the sample data and
measures the strength and direction of a linear
relationship between two quantitative variables.
The one explained in this section is called the
Pearson product moment correlation
coefficient (PPMC).
Correlation
• The range of the correlation coefficient is from
−1 to +1.
• If there is a strong positive linear
relationship between the variables, the value
of r will be close to +1.
• If there is a strong negative linear
relationship between the variables, the value
of r will be close to −1.
Correlation Coefficient
The formula for the correlation coefficient is
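$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2]\,[n(\sum y^2) - (\sum y)^2]}}$$
where n is the number of data pairs.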
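The correlation coefficient formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `pearson_r` is a hypothetical helper, and the two data sets are the car rental and absences/final grades values tabulated in this chapter.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient for paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Car rental companies: cars (in 10,000s) and income (in billions)
cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]
income = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]
r_cars = pearson_r(cars, income)

# Absences and final grades for seven students
absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]
r_grades = pearson_r(absences, grades)

print(round(r_cars, 3), round(r_grades, 3))
```

The sign of the result gives the direction of the relationship and its magnitude gives the strength, so the two data sets land near opposite ends of the −1 to +1 range.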
Section 10-1
Example 10-4
Page #556
Example 10-4: Car Rental Companies
Compute the correlation coefficient for the data in
Example 10–1.
Company   Cars x (in 10,000s)   Income y (in billions)   xy       x²        y²
A         63.0                  7.0                      441.00   3969.00   49.00
B         29.0                  3.9                      113.10   841.00    15.21
C         20.8                  2.1                      43.68    432.64    4.41
D         19.1                  2.8                      53.48    364.81    7.84
E         13.4                  1.4                      18.76    179.56    1.96
F         8.5                   1.5                      12.75    72.25     2.25
Σx = 153.8   Σy = 18.7   Σxy = 682.77   Σx² = 5859.26   Σy² = 80.67
Example 10-4: Car Rental Companies
Compute the correlation coefficient for the data in
Example 10–1.
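Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26,
Σy² = 80.67, n = 6
$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2]\,[n(\sum y^2) - (\sum y)^2]}}$$
$$r = \frac{(6)(682.77) - (153.8)(18.7)}{\sqrt{[(6)(5859.26) - (153.8)^2]\,[(6)(80.67) - (18.7)^2]}}$$
r = 0.982 (strong positive relationship)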
Section 10-1
Example 10-5
Page #557
Example 10-5: Absences/Final Grades
Compute the correlation coefficient for the data in
Example 10–2.
Student   Number of absences x   Final grade y (pct.)   xy     x²    y²
A         6                      82                     492    36    6,724
B         2                      86                     172    4     7,396
C         15                     43                     645    225   1,849
D         9                      74                     666    81    5,476
E         12                     58                     696    144   3,364
F         5                      90                     450    25    8,100
G         8                      78                     624    64    6,084
Σx = 57   Σy = 511   Σxy = 3745   Σx² = 579   Σy² = 38,993
Example 10-5: Absences/Final Grades
Compute the correlation coefficient for the data in
Example 10–2.
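Σx = 57, Σy = 511, Σxy = 3745, Σx² = 579,
Σy² = 38,993, n = 7
$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2]\,[n(\sum y^2) - (\sum y)^2]}}$$
$$r = \frac{(7)(3745) - (57)(511)}{\sqrt{[(7)(579) - (57)^2]\,[(7)(38{,}993) - (511)^2]}}$$
r = −0.944 (strong negative relationship)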
Section 10-1
Example 10-6
Page #558
Example 10-6: Age and Wealth
Compute the value of the correlation coefficient for the
data given in Example 10–3 for the age and wealth of the
richest persons in the United States.
Formula for the t Test for the Correlation Coefficient
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
with degrees of freedom equal to n − 2.
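As a quick check of the t test formula for the correlation coefficient, the sketch below evaluates it for the car rental data (r = 0.982, n = 6). The function name `t_for_correlation` is a hypothetical helper introduced only for illustration.

```python
from math import sqrt

def t_for_correlation(r, n):
    """Test value t = r * sqrt((n - 2) / (1 - r^2)), with d.f. = n - 2."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# Car rental companies: r = 0.982 computed from n = 6 data pairs
t_value = t_for_correlation(0.982, 6)
print(round(t_value, 3))
```

The test value is then compared with the critical value for n − 2 degrees of freedom at the chosen α.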
Section 10-1
Example 10-7
Page #559
Example 10-7: Car Rental Companies
Test the significance of the correlation coefficient found in
Example 10–4. Use α = 0.05 and r = 0.982.
Example 10-7: Car Rental Companies
Step 3: Compute the test value.
$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = 0.982\sqrt{\frac{6 - 2}{1 - (0.982)^2}} = 10.398$$
Section 10-1
Example 10-8
Page #561
Example 10-8: Age and Wealth
Using Table I, test the significance at α = 0.01 of the
correlation coefficient r = 0.307, obtained in Example 10–6.
H0: ρ = 0 and H1: ρ ≠ 0
Possible Relationships Between
Variables
When the null hypothesis has been rejected for a specific
α value, any of the following five possibilities can exist.
1. There is a direct cause-and-effect relationship
between the variables. That is, x causes y.
2. There is a reverse cause-and-effect relationship
between the variables. That is, y causes x.
3. The relationship between the variables may be
caused by a third variable.
4. There may be a complexity of interrelationships
among many variables.
5. The relationship may be coincidental.
Possible Relationships Between
Variables
1. There is a direct cause-and-effect relationship
between the variables. That is, x causes y.
For example,
o water causes plants to grow
o poison causes death
o heat causes ice to melt
Possible Relationships Between
Variables
2. There is a reverse cause-and-effect relationship
between the variables. That is, y causes x.
For example,
o Suppose a researcher believes excessive coffee
consumption causes nervousness, but the
researcher fails to consider that the reverse
situation may occur. That is, it may be that an
extremely nervous person craves coffee to calm
his or her nerves.
Possible Relationships Between
Variables
3. The relationship between the variables may be
caused by a third variable.
For example,
o If a statistician correlated the number of deaths
due to drowning and the number of cans of soft
drink consumed daily during the summer, he or
she would probably find a significant relationship.
However, the soft drink is not necessarily
responsible for the deaths, since both variables
may be related to heat and humidity.
Possible Relationships Between
Variables
4. There may be a complexity of interrelationships
among many variables.
For example,
o A researcher may find a significant relationship
between students’ high school grades and college
grades. But there probably are many other
variables involved, such as IQ, hours of study,
influence of parents, motivation, age, and
instructors.
Possible Relationships Between
Variables
5. The relationship may be coincidental.
For example,
o A researcher may be able to find a significant
relationship between the increase in the number of
people who are exercising and the increase in the
number of people who are committing crimes. But
common sense dictates that any relationship
between these two values must be due to
coincidence.
10.2 Regression
• If the value of the correlation coefficient is
significant, the next step is to determine
the equation of the regression line,
which is the data’s line of best fit.
Regression
• Best fit means that the sum of the
squares of the vertical distances from
each point to the line is at a minimum.
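Regression Line y' = a + bx
$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$$
$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$
where
a = y' intercept
b = the slope of the line.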
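The intercept and slope formulas translate directly into code. The following is a minimal sketch working from the column sums; the function name `regression_line` is a hypothetical helper, and the sums are those of the car rental data (Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, n = 6).

```python
def regression_line(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the line y' = a + bx from the column sums."""
    denominator = n * sum_x2 - sum_x ** 2  # shared by both formulas
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

# Car rental companies
a, b = regression_line(6, 153.8, 18.7, 682.77, 5859.26)
print(round(a, 3), round(b, 3))
```

Both coefficients share the same denominator, so it is computed once and reused.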
Finding the Correlation Coefficient and the
Regression Line Equation
Step 2 Find the values of xy, x², and y². Place them
in the appropriate columns and sum each
column.
Step 3 Substitute in the formula to find the value of
r.
Section 10-2
Example 10-9
Page #568
Example 10-9: Car Rental Companies
Find the equation of the regression line for the data in
Example 10–4, and graph the line on the scatter plot.
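Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26,
Σy² = 80.67, n = 6
$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{(18.7)(5859.26) - (153.8)(682.77)}{(6)(5859.26) - (153.8)^2} = 0.396$$
$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{(6)(682.77) - (153.8)(18.7)}{(6)(5859.26) - (153.8)^2} = 0.106$$
$$y' = a + bx \quad\Rightarrow\quad y' = 0.396 + 0.106x$$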
Example 10-9: Car Rental Companies
Find two points to sketch the graph of the regression line.
Example 10-9: Car Rental Companies
Find the equation of the regression line for the data in
Example 10–4, and graph the line on the scatter plot.
y' = 0.396 + 0.106x
Points on the line: (40, 4.636) and (15, 1.986)
Assumptions for Valid Predictions
1. The sample is a random sample.
2. For any specific value of the independent variable
x, the value of the dependent variable y must be
normally distributed about the regression line.
Assumptions for Valid Predictions
3. The standard deviation of each of the
dependent variables must be the same for
each value of the independent variable.
Section 10-2
Example 10-11
Page #569
Example 10-11: Car Rental Companies
Use the equation of the regression line to predict the
income of a car rental agency that has 200,000
automobiles.
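A prediction of this kind can be sketched as below, assuming the regression line y' = 0.396 + 0.106x found for the car rental data, where x is measured in units of 10,000 cars and y' in billions of dollars. The function name `predict_income` is a hypothetical helper; the main point of the sketch is the unit conversion before substituting.

```python
def predict_income(number_of_cars):
    """Predicted income (in billions) from y' = 0.396 + 0.106x."""
    x = number_of_cars / 10_000  # the equation expects x in 10,000s
    return 0.396 + 0.106 * x

predicted = predict_income(200_000)  # 200,000 automobiles -> x = 20
print(round(predicted, 3))
```

Substituting 200,000 directly, without converting to units of 10,000, would give a meaningless result.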
Regression
• The magnitude of the change in one variable
when the other variable changes exactly 1 unit
is called a marginal change. The value of
slope b of the regression line equation
represents the marginal change.
• For valid predictions, the value of the
correlation coefficient must be significant.
• When r is not significantly different from 0, the
best predictor of y is the mean of the data
values of y.
Extrapolations (Future Predictions)
• Extrapolation, or making predictions beyond
the bounds of the data, must be interpreted
cautiously.
• Remember that when predictions are made,
they are based on present conditions or on the
premise that present trends will continue. This
assumption may or may not prove true in the
future.
10.3 Coefficient of Determination
and Standard Error of the Estimate
• The total variation Σ(y − ȳ)² is the
sum of the squares of the vertical distances
of each point from the mean of y.
Variation
• The variation obtained from the
relationship (i.e., from the predicted y'
values) is Σ(y' − ȳ)² and is called the
explained variation.
• Variation due to chance, found by
Σ(y − y')², is called the unexplained
variation.
Coefficient of Determination
• The coefficient of determination is the
ratio of the explained variation to the total
variation.
• The symbol for the coefficient of
determination is r².
$$r^2 = \frac{\text{explained variation}}{\text{total variation}}$$
• Another way to arrive at the value for r²
is to square the correlation coefficient.
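The split of the total variation into explained and unexplained parts can be checked numerically. The sketch below is illustrative only: it fits the least-squares line using the mean-deviation form of the slope (algebraically equivalent to the sum-based formulas), and it uses the copy machine ages and monthly costs from this chapter as sample data. The name `variation_breakdown` is a hypothetical helper.

```python
def variation_breakdown(xs, ys):
    """Return (total, explained, unexplained) variation for a least-squares fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept (mean-deviation form)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    predicted = [a + b * x for x in xs]
    total = sum((y - mean_y) ** 2 for y in ys)
    explained = sum((p - mean_y) ** 2 for p in predicted)
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return total, explained, unexplained

ages = [1, 2, 3, 4, 4, 6]            # copy machine age in years
costs = [62, 78, 70, 90, 93, 103]    # monthly maintenance cost
total, explained, unexplained = variation_breakdown(ages, costs)
r_squared = explained / total
print(round(r_squared, 4))
```

With unrounded coefficients the explained and unexplained parts add up to the total, and the ratio explained/total agrees with the square of r.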
Coefficient of Nondetermination
• The coefficient of nondetermination is
a measure of the unexplained variation.
• The formula for the coefficient of
nondetermination is 1.00 − r².
Standard Error of the Estimate
• A prediction interval is an interval estimate
of a predicted value of y when the
regression equation is used and a specific
value of x is given.
• The standard error of the estimate,
denoted by s_est, is the standard deviation of
the observed y values about the predicted
y' values. The formula for the standard error
of the estimate is:
$$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}}$$
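The standard error of the estimate can be computed directly from the residuals. The following is a minimal sketch, assuming the regression equation y' = 55.57 + 8.13x and the copy machine data used in this chapter; `standard_error_of_estimate` is a hypothetical helper name.

```python
from math import sqrt

def standard_error_of_estimate(xs, ys, a, b):
    """s_est = sqrt(sum((y - y')^2) / (n - 2)) for the line y' = a + bx."""
    residual_squares = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(residual_squares / (len(xs) - 2))

ages = [1, 2, 3, 4, 4, 6]            # copy machine age in years
costs = [62, 78, 70, 90, 93, 103]    # monthly maintenance cost
s_est = standard_error_of_estimate(ages, costs, 55.57, 8.13)
print(round(s_est, 3))
```

The divisor is n − 2 rather than n because two quantities, a and b, are estimated from the data.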
Section 10-3
Example 10-12
Page #587
Example 10-12: Copy Machine Costs
A researcher collects the following data and determines
that there is a significant relationship between the age of a
copy machine and its monthly maintenance cost. The
regression equation is y' = 55.57 + 8.13x. Find the
standard error of the estimate.
Example 10-12: Copy Machine Costs
Machine   Age x (years)   Monthly cost y   y'       y − y'   (y − y')²
A         1               62               63.70    −1.70    2.89
B         2               78               71.83    6.17     38.0689
C         3               70               79.96    −9.96    99.2016
D         4               90               88.09    1.91     3.6481
E         4               93               88.09    4.91     24.1081
F         6               103              104.35   −1.35    1.8225
Σ(y − y')² = 169.7392
y' = 55.57 + 8.13x
y' = 55.57 + 8.13(1) = 63.70
y' = 55.57 + 8.13(2) = 71.83
y' = 55.57 + 8.13(3) = 79.96
y' = 55.57 + 8.13(4) = 88.09
y' = 55.57 + 8.13(6) = 104.35
$$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}} = \sqrt{\frac{169.7392}{4}} = 6.514$$
Section 10-3
Example 10-13
Page #589
Example 10-13: Copy Machine Costs
Find the standard error of the estimate for the data for
Example 10–12 by using the formula below. The equation of
the regression line is y' = 55.57 + 8.13x.
$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$
Example 10-13: Copy Machine Costs
$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$
$$s_{est} = \sqrt{\frac{42{,}186 - (55.57)(496) - (8.13)(1778)}{4}} = 6.483$$
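The shortcut formula is sensitive to rounding in a and b, which is why this result (6.483) differs slightly from the 6.514 obtained from the residuals. The sketch below, with a hypothetical helper `s_est_shortcut`, evaluates the shortcut once with the rounded coefficients and once with the unrounded least-squares values computed from the same sums (Σx = 20 and Σx² = 82 are derived here from the machine ages).

```python
from math import sqrt

def s_est_shortcut(n, sum_y, sum_y2, sum_xy, a, b):
    """s_est = sqrt((sum(y^2) - a*sum(y) - b*sum(xy)) / (n - 2))."""
    return sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

n, sum_x, sum_y = 6, 20, 496
sum_xy, sum_x2, sum_y2 = 1778, 82, 42186

# Rounded coefficients, as used on the slide
s_rounded = s_est_shortcut(n, sum_y, sum_y2, sum_xy, 55.57, 8.13)

# Unrounded least-squares coefficients from the same sums
denominator = n * sum_x2 - sum_x ** 2
a_exact = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b_exact = (n * sum_xy - sum_x * sum_y) / denominator
s_exact = s_est_shortcut(n, sum_y, sum_y2, sum_xy, a_exact, b_exact)

print(round(s_rounded, 3), round(s_exact, 3))
```

Carrying more decimal places in a and b brings the shortcut result into agreement with the residual-based value.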
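Formula for the Prediction Interval
about a Value y'
$$y' - t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}}$$
with d.f. = n − 2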
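The prediction interval formula can be sketched as follows. Several inputs here are assumptions made for illustration: the critical value t = 2.776 is the standard t table entry for a 95% interval with d.f. = 4, Σx = 20 and Σx² = 82 are derived from the copy machine ages, and s_est = 6.483 and y' = 79.96 (the predicted cost at x = 3) come from the copy machine examples. The helper name `prediction_interval` is hypothetical.

```python
from math import sqrt

def prediction_interval(x, y_pred, s_est, t_crit, n, sum_x, sum_x2):
    """Return (lower, upper) limits of the prediction interval about y'."""
    x_bar = sum_x / n
    spread = sqrt(1 + 1 / n + n * (x - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2))
    margin = t_crit * s_est * spread
    return y_pred - margin, y_pred + margin

# Copy machine that is 3 years old
lower, upper = prediction_interval(
    x=3, y_pred=79.96, s_est=6.483, t_crit=2.776, n=6, sum_x=20, sum_x2=82
)
margin = (upper - lower) / 2
print(round(lower, 2), round(upper, 2))
```

Because this sketch carries full precision through the square root, its limits may differ in the first decimal place from a hand calculation that rounds intermediate values.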
Section 10-3
Example 10-14
Page #590
Example 10-14: Copy Machine Costs
For the data in Example 10–12, find the 95% prediction
interval for the monthly maintenance cost of a machine
that is 3 years old.
Example 10-14: Copy Machine Costs
Step 4: Substitute in the formula and solve.
79.96 − 19.43 < y < 79.96 + 19.43
60.53 < y < 99.39
10.4 Multiple Regression (Optional)
In multiple regression, there are several
independent variables and one dependent
variable, and the equation is
$$y' = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
where
x1, x2, …, xk = independent variables.
Multiple Correlation Coefficient
• In multiple regression, as in simple
regression, the strength of the
relationship between the independent
variables and the dependent variable is
measured by a correlation coefficient.
• This multiple correlation coefficient is
symbolized by R.
Assumptions for Multiple Regression
1. normality assumption—for any specific value of the
independent variable, the values of the y variable are
normally distributed.
2. equal-variance assumption—the variances (or
standard deviations) for the y variables are the same
for each value of the independent variable.
3. linearity assumption—there is a linear relationship
between the dependent variable and the independent
variables.
4. nonmulticollinearity assumption—the independent
variables are not correlated.
5. independence assumption—the values for the y
variables are independent.
Multiple Correlation Coefficient
The formula for R is
$$R = \sqrt{\frac{r_{yx_1}^2 + r_{yx_2}^2 - 2\, r_{yx_1} r_{yx_2} r_{x_1 x_2}}{1 - r_{x_1 x_2}^2}}$$
where
r_yx1 = correlation coefficient for y and x1
r_yx2 = correlation coefficient for y and x2
r_x1x2 = correlation coefficient for x1 and x2
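The formula for R with two independent variables can be sketched as below. The helper name `multiple_r` is hypothetical. Of the three inputs, only r_x1x2 = 0.371 appears in this chapter's State Board Scores example; the two correlations with y (0.845 and 0.791) are illustrative assumptions chosen for this sketch.

```python
from math import sqrt

def multiple_r(r_yx1, r_yx2, r_x1x2):
    """Multiple correlation coefficient R for two independent variables."""
    numerator = r_yx1 ** 2 + r_yx2 ** 2 - 2 * r_yx1 * r_yx2 * r_x1x2
    denominator = 1 - r_x1x2 ** 2
    return sqrt(numerator / denominator)

# r_yx1 and r_yx2 are assumed illustrative values; r_x1x2 = 0.371
R = multiple_r(0.845, 0.791, 0.371)
print(round(R, 3))
```

Note that R is always reported as a value between 0 and +1, since it is defined through a square root.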
Section 10-4
Example 10-15
Page #595
Example 10-15: State Board Scores
A nursing instructor wishes to see whether a student’s
grade point average and age are related to the student’s
score on the state board nursing examination. She
selects five students and obtains the following data.
Find the value of R.
Example 10-15: State Board Scores
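$$R = \sqrt{\frac{r_{yx_1}^2 + r_{yx_2}^2 - 2\, r_{yx_1} r_{yx_2} r_{x_1 x_2}}{1 - r_{x_1 x_2}^2}}$$
Substituting the three pairwise correlation coefficients, with 1 − (0.371)² in the denominator, gives
R = 0.989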
Hence, the correlation between a student’s grade point
average and age with the student’s score on the nursing
state board examination is 0.989. In this case, there is a
strong relationship among the variables; the value of R is
close to 1.00.
F Test for Significance of R
The formula for the F test is
$$F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$$
where
n = the number of data groups
k = the number of independent variables.
d.f.N. = n – k
d.f.D. = n – k – 1
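The F test value is a short computation once R² is known. The sketch below uses a hypothetical helper `f_for_multiple_r` and the State Board Scores values from this chapter (R² = 0.978, n = 5, k = 2); it computes only the test value, not the critical value.

```python
def f_for_multiple_r(r_squared, n, k):
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))

# State board scores: R^2 = 0.978, n = 5 students, k = 2 independent variables
F = f_for_multiple_r(0.978, 5, 2)
print(round(F, 2))
```

The result is then compared with the critical value from the F table at the chosen α.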
Section 10-4
Example 10-16
Page #596
Example 10-16: State Board Scores
Test the significance of the R obtained in Example 10–15
at α = 0.05.
$$F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} = \frac{0.978 / 2}{(1 - 0.978)/(5 - 2 - 1)} = 44.45$$
The critical value obtained from Table H with α = 0.05,
d.f.N. = 3, and d.f.D. = 2 is 19.16. Hence, the decision is
to reject the null hypothesis and conclude that there is a
significant relationship among the student’s GPA, age,
and score on the nursing state board examination.
Adjusted R²
The formula for the adjusted R² is
$$R_{adj}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$
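The adjusted R² formula can be checked with a few lines of code. This is a minimal sketch with a hypothetical helper `adjusted_r_squared`, evaluated for the State Board Scores values in this chapter (R = 0.989, n = 5, k = 2).

```python
def adjusted_r_squared(r, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1)."""
    return 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)

# State board scores: R = 0.989, n = 5 students, k = 2 independent variables
r2_adj = adjusted_r_squared(0.989, 5, 2)
print(round(r2_adj, 3))
```

The adjustment lowers R² more when the sample is small relative to the number of independent variables.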
Section 10-4
Example 10-17
Page #597
Example 10-17: State Board Scores
Calculate the adjusted R² for the data in Example 10–16.
The value for R is 0.989.
$$R_{adj}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$
$$R_{adj}^2 = 1 - \frac{(1 - 0.989^2)(5 - 1)}{5 - 2 - 1} = 0.956$$
In this case, when the number of data pairs and the
number of independent variables are accounted for, the
adjusted multiple coefficient of determination is 0.956.