Chapter 4 PowerPoint
4-1 Copyright © 2021, 2017, 2013 Pearson Education, Inc. All Rights Reserved
STATISTICS
INFORMED DECISIONS USING DATA
Chapter 4
Describing the
Relation between
Two Variables
4.1 Scatter Diagrams and
Correlation
Learning Objectives
1. Draw and interpret scatter diagrams
2. Describe the properties of the linear correlation
coefficient
3. Compute and interpret the linear correlation
coefficient
4. Determine whether a linear relation exists between
two variables
5. Explain the difference between correlation and
causation
The response variable is the variable
whose value can be explained by the value
of the explanatory or predictor variable.
Objective 1
• Draw and interpret scatter diagrams
A scatterplot (or scatter diagram) is a graph
that shows the relationship between two
quantitative variables measured on the same
individual.
• Each individual in the data set is represented
by a point in the scatterplot.
• The explanatory variable is plotted on the
horizontal axis, and the response variable is
plotted on the vertical axis.
Scatter diagrams show the type of relation that exists
between two variables. They can suggest a linear relation, a
nonlinear relation, or no relation between the variables.
The figure shown displays various scatter diagrams and
the type of relation implied.
For two linearly related variables, we say that there is a
positive linear correlation between x and y, when as the
x values increase, the corresponding y values also
increase.
For two variables that are linearly related, we say that
there is a negative linear correlation between x and y,
when as the x values increase, the corresponding y values
decrease.
Objective 2
• Describe the Properties of the Linear
Correlation Coefficient
The linear correlation coefficient or Pearson product
moment correlation coefficient is a measure of the
strength and direction of the linear relation between two
quantitative variables.
Notation:
• ρ (rho): population correlation coefficient
• r: sample correlation coefficient
Sample Linear Correlation Coefficient
Properties of the Linear Correlation Coefficient
1. The linear correlation coefficient is always between
–1 and 1, inclusive. That is, –1 ≤ r ≤ 1.
2. r measures the strength of a linear relationship. It is
not designed to measure the strength of a relationship
that is not linear. Values of r near 0 indicate that a linear
relationship is unlikely.
3. If all values of either variable are converted to a
different scale, the value of r does not change. In other
words, r is a unitless quantity.
4. r is very sensitive to outliers in the sense that a single
outlier could dramatically affect its value.
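Properties 1, 3, and 4 can be checked numerically. The sketch below is illustrative only: the data values are made up, and the helper name `pearson_r` is an assumption, not something from the slides. It uses the computational formula for r that appears in the by-hand example.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient from paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (n * sum_xy - sum_x * sum_y) / (
        math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    )

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

r_original = pearson_r(x, y)

# Property 3: converting x to another scale (for example, inches to
# centimeters) leaves r unchanged.
x_cm = [2.54 * v for v in x]
r_rescaled = pearson_r(x_cm, y)

# Property 4: a single outlier can change r dramatically.
r_with_outlier = pearson_r(x + [7], y + [0.0])

print(round(r_original, 4), round(r_rescaled, 4), round(r_with_outlier, 4))
```

With these made-up values, the rescaled r matches the original, while appending the single point (7, 0) pulls r far below its original value.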
Values of the Linear Correlation Coefficient
PRACTICE Identifying r values
Match each correlation coefficient below to the most
appropriate graph at right.
Objective 3
• Compute and Interpret the Linear Correlation
Coefficient
EXAMPLE Computing the Correlation Coefficient by Hand (1 of 4)
For the data shown in Table 2, compute the linear
correlation coefficient. A scatter diagram of the data is
shown in the figure. The dashed lines on the scatter
diagram represent the mean of x and the mean of y.
EXAMPLE Computing the Correlation Coefficient by Hand (2 of 4)
Recall the formula for the correlation coefficient:
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$
EXAMPLE Computing the Correlation Coefficient by Hand (3 of 4)
Step 1 Columns 1 and 2 of Table 3 contain the data from
Table 2.
Step 2 We determine the squares of each x and y value,
recording respectively in Columns 3 and 4 of Table 3.
Table 3:
Step 3 Calculate the product of each x value with paired y value
and record in column 5.
Step 4 Calculate the sum of each column and record in bottom
row.
Table 3:
EXAMPLE Computing the Correlation Coefficient by Hand (4 of 4)
Step 6 Substitute calculated values into formula for r.
$$ r = \frac{5(148) - (20)(50)}{\sqrt{5(104) - (20)^2}\,\sqrt{5(626) - (50)^2}} = \frac{-260}{\sqrt{120}\,\sqrt{630}} \approx -0.946 $$
The correlation coefficient suggests a strong negative association
between the two variables.
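The substitution can be checked with a few lines of Python. This is a minimal sketch that uses only the column sums quoted in the substitution step (n = 5, Σx = 20, Σy = 50, Σx² = 104, Σy² = 626, Σxy = 148). The variable names are illustrative.

```python
import math

# Column sums from Table 3 (n = 5 data pairs), as substituted on the slide.
n = 5
sum_x, sum_y = 20, 50
sum_x2, sum_y2 = 104, 626
sum_xy = 148

numerator = n * sum_xy - sum_x * sum_y  # 5(148) - (20)(50)
denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(f"r = {r:.3f}")  # prints r = -0.946
```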
EXAMPLE Determining the Linear Correlation Coefficient Using Technology (1 of 2)
Use StatCrunch to draw a scatter diagram of the data in
Table 2. Then confirm the type of correlation identified
in the previous example.
Objective 5
• Explain the Difference between Correlation
and Causation
If data used in a study are observational, we cannot
conclude the two correlated variables have a causal
relationship.
Objective 1
• Find the Least-Squares Regression Line and
Use the Line to Make Predictions
The difference between the observed value of y and
the predicted value of y is the error, or residual.
Least-Squares Regression Criterion
The least-squares regression line is the line
that minimizes the sum of the squared errors (or
residuals).
This regression line is the straight line that
“best” fits the scatterplot of the data.
Least-Squares Regression Line Example
Consider x=8
• Observed value: y=4
• Predicted value: ŷ = 1 + 8 = 9
• Residual: y − ŷ = 4 − 9 = −5
The Least-Squares Regression Line
The equation of the least-squares regression line is given by
ŷ = b1 x + b0
where b1 = r · (s_y / s_x) is the slope and b0 = ȳ − b1 x̄ is the y-intercept.
EXAMPLE Finding the Least-Squares Regression Line by Hand (1 of 2)
Find the least-squares regression line for
the data in Table 2, used earlier.
EXAMPLE Finding the Least-Squares Regression Line by Hand (2 of 2)
Substitute x̄ = 4, ȳ = 10, and b1 = −2.1676 into
Formula (3):
b0 = ȳ − b1 x̄ = 10 − (−2.1676)(4) = 18.6704
The least-squares regression line is ŷ = −2.1676x + 18.6704.
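The slope and intercept can be reproduced from the Table 3 column sums. The sketch below assumes the slope is computed as b1 = r · (s_y / s_x) with r rounded to −0.946, which reproduces the slide's value. It also shows the same calculation without intermediate rounding, which gives slightly different digits.

```python
import math

# Column sums from Table 3 (n = 5 data pairs).
n = 5
sum_x, sum_y = 20, 50
sum_x2, sum_y2 = 104, 626
sum_xy = 148

x_bar, y_bar = sum_x / n, sum_y / n                    # 4.0 and 10.0
s_x = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))   # sample std. dev. of x
s_y = math.sqrt((sum_y2 - sum_y ** 2 / n) / (n - 1))   # sample std. dev. of y

# Slide-style values: r is rounded to three decimals before use.
r_rounded = -0.946
b1_slide = r_rounded * s_y / s_x
b0_slide = y_bar - b1_slide * x_bar

# Same computation without intermediate rounding.
b1_exact = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0_exact = y_bar - b1_exact * x_bar

print(f"slide-style: b1 = {b1_slide:.4f}, b0 = {b0_slide:.4f}")
print(f"unrounded:   b1 = {b1_exact:.4f}, b0 = {b0_exact:.4f}")
```

The small gap between the two slopes comes only from rounding r early. Carrying more digits through the calculation avoids it.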
EXAMPLE Finding the Least-Squares Regression Line Using Technology (1 of 6)
The table below contains a simple random sample of lottery
games along with the jackpot amount (in millions of dollars)
and tickets sold (in millions) for each.
Use the jackpot data on the previous slide to answer the
questions below.
(a) Create a scatter plot of the data and determine whether
there seems to be a linear correlation between jackpot
amount and tickets sold.
(b) Find the least-squares regression line.
(c) Draw the least-squares regression line on the scatter
diagram of the data.
(d) Predict the number of lottery tickets sold when the
jackpot is $625 million.
(e) In reality, 90 million tickets were sold when the jackpot
was $625 million. Determine the residual for the
predicted value found in part (d).
EXAMPLE Finding the Least-Squares Regression Line Using Technology (2 of 6)
(a) The plot below was created in StatCrunch. Notice that the
data points follow a fairly linear pattern, so it is reasonable
to proceed by finding a regression line.
EXAMPLE Finding the Least-Squares Regression Line Using Technology (3 of 6)
(b) The figure shows the output obtained from StatCrunch.
The least-squares regression line (where x is the jackpot
amount and y is tickets sold) is:
ŷ = 0.174x − 10.872
EXAMPLE Finding the Least-Squares Regression Line Using Technology (4 of 6)
(c) [Figure: scatter diagram of the data with the least-squares regression line drawn]
EXAMPLE Finding the Least-Squares Regression Line Using Technology (5 of 6)
(d) Let x = 625 in the least-squares regression equation
ŷ = 0.174x − 10.872 to predict the number of lottery tickets
purchased, in millions:
ŷ = 0.174(625) − 10.872 ≈ 97.9
EXAMPLE Finding the Least-Squares Regression Line Using Technology (6 of 6)
(e) Residual = observed y − predicted y
Residual = y − ŷ = 90 − 97.9 = −7.9 million tickets
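Parts (d) and (e) amount to evaluating the fitted line and subtracting. A minimal sketch using the coefficients reported from StatCrunch is below. The function name `predict_tickets` is illustrative.

```python
def predict_tickets(jackpot_millions):
    """Predicted tickets sold (in millions) from the fitted line on the slide."""
    return 0.174 * jackpot_millions - 10.872

observed = 90                      # millions of tickets actually sold
predicted = predict_tickets(625)   # jackpot of $625 million
residual = observed - predicted    # observed minus predicted

print(f"predicted = {predicted:.1f}, residual = {residual:.1f}")
```

A negative residual means the observed value lies below the regression line, so fewer tickets were sold than the line predicts.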
Objective 3
• Compute the Sum of Squared Residuals
EXAMPLE Comparing the Sum of Squared Residuals (1 of 3)
Compute the sum of squared residuals for the lottery data used in
the previous example.
Jackpot (x)   Tickets (y)   ŷ = 0.174x − 10.872   y − ŷ    (y − ŷ)²
334           54            47.24                  6.76    45.64
127           16            11.23                  4.77    22.79
300           41            41.33                 -0.33     0.11
227           27            28.63                 -1.63     2.64
202           23            24.28                 -1.28     1.63
180           18            20.45                 -2.45     5.99
164           18            17.66                  0.34     0.11
145           16            14.36                  1.64     2.70
255           26            33.50                 -7.50    56.22
Σ(y − ŷ)² = 137.84
EXAMPLE Comparing the Sum of Squared Residuals (2 of 3)
The table on the previous slide contains the value of the explanatory
variable in Column 1. Column 2 contains the corresponding response
variable. Column 3 contains the predicted values from the regression line
equation ŷ = 0.174x − 10.872. Column 4 contains the residuals, y − ŷ.
Column 5 contains the squares of the residuals, and adding all such
values yields the sum of squared residuals.
EXAMPLE Comparing the Sum of Squared Residuals (3 of 3)
The sum of the squared residuals for the line in the lottery ticket
example is 137.84. Again, any line other than the least-squares
regression line will have a sum of squared residuals that is
greater than 137.84.
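The total can be reproduced and compared against other candidate lines. The sketch below copies the nine (jackpot, tickets) pairs from the table. The two alternative lines are hypothetical, chosen arbitrarily for comparison. Because the slide's coefficients are rounded, the exact least-squares line would have a marginally smaller sum than the one computed here.

```python
# (jackpot, tickets) pairs in millions, copied from the table in this example.
data = [(334, 54), (127, 16), (300, 41), (227, 27), (202, 23),
        (180, 18), (164, 18), (145, 16), (255, 26)]

def sum_squared_residuals(slope, intercept, pairs):
    """Sum of (y - y_hat)^2 for the line y_hat = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in pairs)

# Line reported on the slides (coefficients rounded to three decimals).
ssr_fitted = sum_squared_residuals(0.174, -10.872, data)

# Two hypothetical alternative lines, chosen arbitrarily for comparison.
ssr_steeper = sum_squared_residuals(0.20, -10.872, data)
ssr_shifted = sum_squared_residuals(0.174, -5.0, data)

print(f"fitted line: {ssr_fitted:.2f}")
print(f"steeper line: {ssr_steeper:.2f}, shifted line: {ssr_shifted:.2f}")
```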
4.3 Diagnostics on the Least-Squares Regression Line
Learning Objectives
1. Compute and interpret the coefficient of
determination
2. Perform residual analysis on a regression
model
3. Identify influential observations
Objective 1
• Compute and interpret the coefficient of
determination
The coefficient of determination, r²,
measures the proportion of total variation in
the response variable that is explained by the
least-squares regression line.
The coefficient of determination is a number
between 0 and 1, inclusive. That is, 0 ≤ r² ≤ 1.
If r² = 0, the line has no explanatory value.
If r² = 1, the line explains 100% of the
variation in the response variable.
To find the coefficient of determination, r², for the
least-squares regression model ŷ = b1 x + b0 (that is, a
single explanatory variable to the first degree), square
the linear correlation coefficient.
Caution!
Squaring the linear correlation coefficient to obtain
the coefficient of determination works only for the
least-squares linear regression model
ŷ = b1 x + b0
The method does not work in general.
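For a least-squares line with one explanatory variable, squaring r agrees with the proportion of variation explained, 1 − (unexplained variation)/(total variation). The sketch below checks this on the nine lottery pairs from the sum-of-squared-residuals table. It assumes those nine pairs are the full sample, and the variable names are illustrative.

```python
import math

# (jackpot, tickets) pairs in millions from the lottery example.
data = [(334, 54), (127, 16), (300, 41), (227, 27), (202, 23),
        (180, 18), (164, 18), (145, 16), (255, 26)]
xs = [x for x, _ in data]
ys = [y for _, y in data]
n = len(data)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)

r = s_xy / math.sqrt(s_xx * s_yy)  # linear correlation coefficient
b1 = s_xy / s_xx                   # least-squares slope
b0 = y_bar - b1 * x_bar            # least-squares intercept

# Unexplained variation: sum of squared residuals about the fitted line.
unexplained = sum((y - (b1 * x + b0)) ** 2 for x, y in data)

r_squared_from_r = r ** 2
r_squared_from_variation = 1 - unexplained / s_yy

print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
print(f"r^2 = {r_squared_from_r:.3f}, 1 - SSR/SST = {r_squared_from_variation:.3f}")
```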
EXAMPLE Determining the Coefficient of Determination
Residuals play an important role in determining
the adequacy of the linear model. In fact,
residuals can be used for the following purposes:
• To determine whether a linear model is
appropriate to describe the relation between
the predictor and response variables.
• To determine whether the variance of the
residuals is constant.
• To check for outliers.
To determine if a linear model is appropriate,
we also need to draw a residual plot, which is
a scatter diagram with the residuals on the
vertical axis and the explanatory variable on
the horizontal axis.
The residual plot in Figure (a) does not show any pattern, so a
linear model is appropriate. However, the residual plot in Figure
(b) shows a U-shaped pattern, which indicates that a linear model
is inappropriate.
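A U-shaped residual pattern is easy to reproduce. The sketch below fits a least-squares line to hypothetical curved data (y = x², made up for illustration and not taken from the slides) and lists the residuals in order of x. The helper name `least_squares_line` is an assumption.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar

# Hypothetical curved data, made up for illustration only.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

b1, b0 = least_squares_line(xs, ys)

# Residual-plot points: explanatory value on the horizontal axis,
# residual (observed minus predicted) on the vertical axis.
residuals = [y - (b1 * x + b0) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(f"x = {x}  residual = {e:5.1f}")
```

The residuals are positive at both ends and negative in the middle, which is the U shape that signals a linear model is inappropriate.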
More residual plot examples
EXAMPLE Is a Linear Model Appropriate?
(1 of 3)
The data in Table 8 were
collected by placing a
temperature probe in a portable
heater, removing the probe, and
recording the temperature (in
degrees Fahrenheit) over time
(in seconds). Determine
whether the relation between
the temperature of the probe and
time is linear.
EXAMPLE Is a Linear Model Appropriate?
(2 of 3)
Enter the data into statistical software or a graphing calculator
with advanced statistical features. Figure (a) shows a scatter
diagram of the data from StatCrunch. Temperature and time
appear to be linearly related with a negative slope.
EXAMPLE Is a Linear Model Appropriate?
(3 of 3)
Use statistical software to determine the least-squares regression
line and store the residuals. Figure (b) shows a plot of the
residuals versus the explanatory variable, time, using StatCrunch.
An outlier can be thought of as an observation whose
response variable is inconsistent with the overall
pattern of the data. To determine outliers, either:
EXAMPLE Identifying Outliers
Definitions:
• In a scatterplot, an outlier is a point lying far away from
the other data points.
• Paired sample data may include one or more influential
points, which are points that strongly affect the graph of
the regression line.
Influence is affected by two factors:
1. the relative vertical position of the observation
(residuals) and
2. the relative horizontal position of the observation
(leverage).
EXAMPLE Influential Points vs. Outliers
Influential points tend to be outliers (see left), but not all outliers
are influential points, since an outlier may or may not drastically change
the regression line.
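One way to see the difference is to refit the line with and without a suspect point. The sketch below uses hypothetical data (made up for illustration): five points lying exactly on y = 2x, plus either an outlier at the mean of x (large residual, low leverage) or a point far to the right (high leverage). The helper name `least_squares_line` is an assumption.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line through (x, y) points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar

# Hypothetical data, made up for illustration: points on the line y = 2x.
base = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]

slope_base, intercept_base = least_squares_line(base)

# Outlier with a large residual but low leverage (x equals the mean of x).
slope_low, intercept_low = least_squares_line(base + [(3, 12)])

# Point with high leverage (x far from the other x values).
slope_high, intercept_high = least_squares_line(base + [(15, 10)])

print(f"base:          slope = {slope_base:.3f}, intercept = {intercept_base:.3f}")
print(f"low leverage:  slope = {slope_low:.3f}, intercept = {intercept_low:.3f}")
print(f"high leverage: slope = {slope_high:.3f}, intercept = {intercept_high:.3f}")
```

In this made-up case the low-leverage outlier shifts the intercept but leaves the slope unchanged, while the high-leverage point changes the slope substantially, so it is influential.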
EXAMPLE Identifying Influential Observations
In-class example 2