Chapter 4 PowerPoint


Announcements

• Weighted mean example: watch Canvas video

4-1 Copyright © 2021, 2017, 2013 Pearson Education, Inc. All Rights Reserved
STATISTICS
INFORMED DECISIONS USING DATA

Chapter 4
Describing the
Relation between
Two Variables

4.1 Scatter Diagrams and
Correlation
Learning Objectives
1. Draw and interpret scatter diagrams
2. Describe the properties of the linear correlation
coefficient
3. Compute and interpret the linear correlation
coefficient
4. Determine whether a linear relation exists between
two variables
5. Explain the difference between correlation and
causation
The response variable is the variable
whose value can be explained by the value
of the explanatory or predictor variable.

Objective 1
• Draw and Interpret Scatter Diagrams

A scatterplot (or scatter diagram) is a graph
that shows the relationship between two
quantitative variables measured on the same
individual.
• Each individual in the data set is represented
by a point in the scatterplot.
• The explanatory variable is plotted on the
horizontal axis, and the response variable is
plotted on the vertical axis.

Scatter diagrams show the type of relation that exists
between two variables and can imply a linear relation, a
nonlinear relation, or no relation among variables.

Correlation: The distinct pattern of the plotted points
suggests that there is a correlation between overhead
widths and weights of seals.

No Correlation: The plotted points do not show a distinct
pattern, so it appears that there is no correlation between
heights of presidents and heights of their main opponents.

The figure shown displays various scatter diagrams and
the type of relation implied.

Notice the difference between Figure (a) and Figure (b).
The data follow a linear pattern that slants upward to the
right in Figure (a) and downward to the right in Figure
(b). Figures (c) and (d) show nonlinear relations. In
Figure (e), there is no relation between the explanatory
and response variables.

For two linearly related variables, we say that there is a
positive linear correlation between x and y when, as the
x values increase, the corresponding y values also
increase.

For two linearly related variables, we say that there is a
negative linear correlation between x and y when, as the
x values increase, the corresponding y values
decrease.

Objective 2
• Describe the Properties of the Linear
Correlation Coefficient

The linear correlation coefficient or Pearson product
moment correlation coefficient is a measure of the
strength and direction of the linear relation between two
quantitative variables.

Notation:
• ρ (rho): population correlation coefficient
• r: sample correlation coefficient

Sample Linear Correlation Coefficient

Notation for the Linear Correlation Coefficient
• n: number of pairs of sample data.
• ∑: denotes addition of the items indicated.
• ∑x: sum of all x values (explanatory variable).
• ∑x²: indicates that each x value should be squared and then those
squares added.
• (∑x)²: indicates that the x values should be added and the total
then squared. Avoid confusing ∑x² and (∑x)².
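
The difference between ∑x² and (∑x)² is easy to check numerically. A minimal Python sketch follows; the three x values are hypothetical, chosen only to illustrate the notation.

```python
# Hypothetical x values, chosen only to illustrate the notation.
x = [1, 2, 3]

sum_of_squares = sum(v ** 2 for v in x)  # ∑x²: square each value, then add
square_of_sum = sum(x) ** 2              # (∑x)²: add the values, then square

print(sum_of_squares)  # 14
print(square_of_sum)   # 36
```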

Properties of the Linear Correlation Coefficient
1. The linear correlation coefficient is always between
–1 and 1, inclusive. That is, –1 ≤ r ≤ 1.
2. r measures the strength of a linear relationship. It is
not designed to measure the strength of a relationship
that is not linear. Values of r near 0 indicate little or
no linear relation.
3. If all values of either variable are converted to a
different scale, the value of r does not change. In other
words, r is a unitless quantity.
4. r is very sensitive to outliers in the sense that a single
outlier could dramatically affect its value.
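
Properties 3 and 4 can be illustrated with a short Python sketch. The helper name `pearson_r` and the data are assumptions made for this illustration (they do not come from the slides); the helper uses the sums form of the formula for r.

```python
import math

def pearson_r(x, y):
    """Sample linear correlation coefficient computed from sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

r = pearson_r(x, y)
# Property 3: rescaling x (a change of units) leaves r unchanged.
r_rescaled = pearson_r([2.54 * v for v in x], y)
# Property 4: one added outlier, (7, 0), changes r dramatically.
r_outlier = pearson_r(x + [7], y + [0.0])

print(round(r, 3), round(r_rescaled, 3), round(r_outlier, 3))
```

With this particular data, r is close to 1 before the outlier is added and falls well below 0.5 afterward.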
Values of the Linear Correlation Coefficient

• r can range in value from -1 to 1.


• data with r close to 1 or -1 display strong correlation
• positive correlation indicates an upward trend in the data
• negative correlation indicates a downward trend in the data

PRACTICE Identifying r values

Match each correlation coefficient below to the most
appropriate graph at right.

Objective 3
• Compute and Interpret the Linear Correlation
Coefficient

EXAMPLE Computing the Correlation Coefficient by Hand (1 of 4)
For the data shown in Table 2, compute the linear
correlation coefficient. A scatter diagram of the data is
shown in the figure. The dashed lines on the scatter
diagram represent the mean of x and the mean of y.

EXAMPLE Computing the Correlation Coefficient by Hand (2 of 4)
Recall the formula for the correlation coefficient:

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$

We will begin by organizing our data, (x, y) pairs, in a
table and calculating each piece of the formula above.

EXAMPLE Computing the Correlation Coefficient by Hand (3 of 4)
Step 1 Columns 1 and 2 of Table 3 contain the data from
Table 2.
Step 2 We determine the squares of each x and y value,
recording respectively in Columns 3 and 4 of Table 3.

Table 3:

Step 3 Calculate the product of each x value with paired y value
and record in column 5.
Step 4 Calculate the sum of each column and record in bottom
row.

Table 3:

EXAMPLE Computing the Correlation Coefficient by Hand (4 of 4)
Step 6 Substitute calculated values into formula for r.

$$ r = \frac{5(148) - (20)(50)}{\sqrt{5(104) - (20)^2}\,\sqrt{5(626) - (50)^2}} = \frac{-260}{\sqrt{120}\,\sqrt{630}} = -0.946 $$

The correlation coefficient r = -0.946 suggests a strong
negative association between the two variables.
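
The substitution can be checked with a few lines of Python. This is a minimal sketch that uses only the column totals substituted in Step 6 (n = 5, ∑x = 20, ∑y = 50, ∑x² = 104, ∑y² = 626, ∑xy = 148); the variable names are chosen here for illustration.

```python
import math

# Column totals from Table 3, as substituted in Step 6.
n = 5
sum_x, sum_y = 20, 50
sum_x2, sum_y2 = 104, 626
sum_xy = 148

numerator = n * sum_xy - sum_x * sum_y            # 5(148) - (20)(50) = -260
denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)  # sqrt(120)
               * math.sqrt(n * sum_y2 - sum_y ** 2))  # sqrt(630)
r = numerator / denominator

print(round(r, 3))  # -0.946
```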

EXAMPLE Determining the Linear Correlation Coefficient Using Technology (1 of 2)
Use StatCrunch to draw a scatter diagram of the data in
Table 2. Then confirm the type of correlation identified
in the previous example.

StatCrunch: “graph” -> Scatter plot -> (select data)

Objective 5
• Explain the Difference between Correlation
and Causation

If data used in a study are observational, we cannot
conclude the two correlated variables have a causal
relationship.

A lurking variable is related to both the explanatory
variable and response variable.

Examples:

As air-conditioning bills increase, so does the crime rate. Does
this mean that folks should turn off their air conditioners so that
crime rates decrease? Certainly not! In this case, the lurking
variable is air temperature. As air temperatures rise, both
air-conditioning bills and crime rates rise.

Among all elementary school children, the relationship between
the number of cavities in a child’s teeth and the size of his or her
vocabulary is strong and positive. The lurking variable is age;
it is not reasonable to say that getting cavities increases a child’s
vocabulary.

Tuesday
Start here

4.2 Least-Squares Regression


Learning Objectives
1. Find the least-squares regression line and use
the line to make predictions
2. Interpret the slope and the y-intercept of the
least-squares regression line
3. Compute the sum of squared residuals

Objective 1
• Find the Least-Squares Regression Line and
Use the Line to Make Predictions

The difference between the observed value of y and
the predicted value of y is the error, or residual.

Least-Squares Regression Criterion
The least-squares regression line is the line
that minimizes the sum of the squared errors (or
residuals).
This regression line is the straight line that
“best” fits the scatterplot of the data.

Least-Squares Regression Line Example

Consider x = 8
• Observed value: y = 4
• Predicted value: ŷ = 1 + 8 = 9
• Residual: y − ŷ = 4 − 9 = −5

The Least-Squares Regression Line
The equation of the least-squares regression line is given by
ŷ = b1 x + b0
where
b1 = r · (sy / sx) is the slope [Formula (2)] and
b0 = ȳ − b1 x̄ is the y-intercept [Formula (3)].

EXAMPLE Finding the Least-Squares Regression Line by Hand (1 of 2)
Find the least-squares regression line for
the data in Table 2, used earlier.

In a previous example we found
r = –0.946, x̄ = 4, sx = 2.44949, ȳ = 10,
and sy = 5.612486.

Substitute r = –0.946, sx = 2.44949, and sy = 5.612486
into Formula (2):

$$ b_1 = r \cdot \frac{s_y}{s_x} = -0.946 \times \frac{5.612486}{2.44949} = -2.1676 $$

EXAMPLE Finding the Least-Squares Regression Line by Hand (2 of 2)
Substitute x̄ = 4, ȳ = 10, and b1 = –2.1676 into
Formula (3):
b0 = ȳ – b1 x̄ = 10 – (–2.1676)(4) = 18.6704

The least-squares regression line is
ŷ = –2.1676x + 18.6704
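
The two substitutions can be reproduced in Python. This is a minimal sketch using the summary values quoted in the example; the variable names are illustrative, and the slope is rounded to four decimals before the intercept is computed, as in the hand calculation.

```python
# Summary values quoted in the example.
r = -0.946
x_bar, s_x = 4, 2.44949
y_bar, s_y = 10, 5.612486

# Formula (2): slope, rounded to four decimals as in the hand calculation.
b1 = round(r * s_y / s_x, 4)
# Formula (3): y-intercept, using the rounded slope.
b0 = y_bar - b1 * x_bar

print(b1, round(b0, 4))  # -2.1676 18.6704
```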

EXAMPLE Finding the Least-Squares Regression Line Using Technology (1 of 6)
The table below contains a simple random sample of lottery
games along with the jackpot amount (in millions of dollars)
and tickets sold (in millions) for each.

Use the jackpot data on the previous slide to answer the
questions below.
(a) Create a scatter plot of the data and determine whether
there seems to be a linear correlation between jackpot
amount and tickets sold.
(b) Find the least-squares regression line.
(c) Draw the least-squares regression line on the scatter
diagram of the data.
(d) Predict the number of lottery tickets sold when the
jackpot is $625 million.
(e) In reality, 90 million tickets were sold when the jackpot
was $625 million. Determine the residual for the
predicted value found in part (d).
EXAMPLE Finding the Least-Squares Regression Line Using Technology (2 of 6)
(a) The plot below was created in StatCrunch. Notice that the
data points follow a fairly linear pattern, so it
is reasonable to proceed by finding a regression line.

EXAMPLE Finding the Least-Squares Regression Line Using Technology (3 of 6)
(b) The figure shows the output obtained from
StatCrunch. The least-squares regression line (where x:
jackpot amount and y: tickets sold) is:
ŷ = 0.174x − 10.872

StatCrunch: stat -> regression -> simple linear

EXAMPLE Finding the Least-Squares Regression Line Using Technology (4 of 6)
(c) [Figure: scatter diagram of the data with the least-squares regression line drawn on it.]

EXAMPLE Finding the Least-Squares Regression Line Using Technology (5 of 6)
(d) Let x = 625 in the least-squares regression equation
ŷ = 0.174x − 10.872 to predict the number of lottery
tickets sold (in millions).

ŷ = 0.174(625) − 10.872 = 97.9

We predict that when the jackpot amount is $625
million, about 97.9 million lottery tickets will be
sold.

EXAMPLE Finding the Least-Squares Regression Line Using Technology (6 of 6)
(e) Residual = observed y – predicted y
= y – ŷ
= 90 – 97.9
= –7.9 million tickets

Because the residual is negative (the observed value of
90 million tickets is less than the predicted value of 97.9
million tickets), the regression line overestimates the ticket sales
for this jackpot amount.
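
Parts (d) and (e) amount to one function evaluation and one subtraction. A minimal Python sketch follows; the function name `predict_tickets` is made up for this illustration, and the coefficients are the rounded ones reported in part (b).

```python
def predict_tickets(jackpot_millions):
    """Predicted tickets sold (millions), using the fitted line from part (b)."""
    return 0.174 * jackpot_millions - 10.872

predicted = predict_tickets(625)   # 97.878, reported as 97.9
observed = 90                      # millions of tickets actually sold
residual = observed - predicted    # observed y minus predicted y

print(round(predicted, 1), round(residual, 1))  # 97.9 -7.9
```

A negative residual means the point lies below the line, so the line overestimates.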

Objective 3
• Compute the Sum of Squared Residuals

EXAMPLE Computing the Sum of Squared Residuals (1 of 3)
Compute the sum of squared residuals for the lottery data
(as used in previous examples).

Jackpot (x)   Tickets (y)   ŷ = 0.174x − 10.872   y − ŷ    (y − ŷ)²
334           54            47.24                  6.76     45.64
127           16            11.23                  4.77     22.79
300           41            41.33                  -0.33    0.11
227           27            28.63                  -1.63    2.64
202           23            24.28                  -1.28    1.63
180           18            20.45                  -2.45    5.99
164           18            17.66                  0.34     0.11
145           16            14.36                  1.64     2.70
255           26            33.50                  -7.50    56.22

Σ(y − ŷ)² = 137.84

EXAMPLE Computing the Sum of Squared Residuals (2 of 3)
The table on the previous slide contains the value of the explanatory
variable in Column 1. Column 2 contains the corresponding value of the
response variable. Column 3 contains the values predicted by the
regression line equation ŷ = 0.174x − 10.872.

In Column 4, we compute the residual for each observation:
residual = observed y – predicted y = y – ŷ.

Column 5 contains the squares of the residuals, and adding all such
values yields the sum of squared residuals.

EXAMPLE Computing the Sum of Squared Residuals (3 of 3)
The sum of the squared residuals for the line in the lottery ticket
example is 137.84. Any line other than the least-squares
regression line will have a sum of squared residuals that is
greater than 137.84.
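
The column-by-column calculation can be sketched in Python. The data are the nine jackpot/tickets pairs from the table; the two alternative lines are hypothetical and appear only to show the sum of squared residuals growing when the line moves away from the fitted one.

```python
# Lottery data from the table: jackpot (millions of dollars), tickets (millions).
jackpot = [334, 127, 300, 227, 202, 180, 164, 145, 255]
tickets = [54, 16, 41, 27, 23, 18, 18, 16, 26]

def sum_squared_residuals(slope, intercept):
    """Sum of (observed y - predicted y)^2 for the line y-hat = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(jackpot, tickets))

ssr_fitted = sum_squared_residuals(0.174, -10.872)   # line from the example
ssr_steeper = sum_squared_residuals(0.200, -10.872)  # hypothetical alternative line
ssr_shifted = sum_squared_residuals(0.174, -9.000)   # hypothetical alternative line

print(round(ssr_fitted, 2))                                # 137.84
print(ssr_steeper > ssr_fitted, ssr_shifted > ssr_fitted)  # True True
```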

4.3 Diagnostics on the Least-
Squares Regression Line
Learning Objectives
1. Compute and interpret the coefficient of
determination
2. Perform residual analysis on a regression
model
3. Identify influential observations

Objective 1
• Compute and interpret the coefficient of
determination

The coefficient of determination, r²,
measures the proportion of total variation in
the response variable that is explained by the
least-squares regression line.
The coefficient of determination is a number
between 0 and 1, inclusive. That is, 0 ≤ r² ≤ 1.
If r² = 0, the line has no explanatory value.
If r² = 1, the line explains 100% of the
variation in the response variable.

To find the coefficient of determination, r², for the
least-squares regression model ŷ = b1 x + b0 (that is, a
single explanatory variable to the first degree), square
the linear correlation coefficient.

Caution!
Squaring the linear correlation coefficient to obtain
the coefficient of determination works only for the
least-squares linear regression model
ŷ = b1 x + b0
The method does not work in general.

EXAMPLE Determining the Coefficient of Determination

Determine and interpret the coefficient of determination,
r², for the lottery ticket and jackpot example from earlier.
The figure shows the results from StatCrunch (same
display as earlier).

Interpretation: 89.7% of the variation in tickets sold is
explained by the least-squares regression line, and
10.3% of the variation in ticket sales is explained by
other factors.
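
The 89.7% figure can be reproduced from the nine jackpot/tickets pairs. The Python sketch below squares r and, as a cross-check, computes 1 − (unexplained variation)/(total variation). That cross-check is a standard identity for simple linear regression rather than something stated on the slides, and the variable names are illustrative.

```python
import math

# Lottery data: jackpot (millions of dollars), tickets sold (millions).
jackpot = [334, 127, 300, 227, 202, 180, 164, 145, 255]
tickets = [54, 16, 41, 27, 23, 18, 18, 16, 26]
n = len(jackpot)

sum_x, sum_y = sum(jackpot), sum(tickets)
sum_x2 = sum(x * x for x in jackpot)
sum_y2 = sum(y * y for y in tickets)
sum_xy = sum(x * y for x, y in zip(jackpot, tickets))

# Linear correlation coefficient, then its square.
r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2))
r_squared = r ** 2

# Cross-check with the least-squares line fitted to the same data.
y_bar = sum_y / n
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = y_bar - b1 * sum_x / n
unexplained = sum((y - (b1 * x + b0)) ** 2 for x, y in zip(jackpot, tickets))
total = sum((y - y_bar) ** 2 for y in tickets)
explained_share = 1 - unexplained / total

print(round(r_squared, 3), round(explained_share, 3))  # 0.897 0.897
```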
Objective 2
• Perform Residual Analysis on a Regression
Model

Residuals play an important role in determining
the adequacy of the linear model. In fact,
residuals can be used for the following purposes:
• To determine whether a linear model is
appropriate to describe the relation between
the predictor and response variables.
• To determine whether the variance of the
residuals is constant.
• To check for outliers.

To determine if a linear model is appropriate,
we also need to draw a residual plot, which is
a scatter diagram with the residuals on the
vertical axis and the explanatory variable on
the horizontal axis.

If a plot of the residuals against the predictor
variable shows a discernable pattern, such
as a curve, then the response and predictor
variable may not be linearly related.

The residual plot in Figure (a) does not show any pattern, so a
linear model is appropriate. However, the residual plot in Figure
(b) shows a U-shaped pattern, which indicates that a linear model
is inappropriate.
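
A U-shaped residual pattern is easy to produce on purpose. In the Python sketch below, the data are hypothetical (y = x², invented for illustration), and the helper `fit_line` uses the deviation-from-mean form of the least-squares slope, which is algebraically equivalent to b1 = r · (sy / sx).

```python
def fit_line(x, y):
    """Least-squares slope and intercept (deviation-from-mean form)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b1, b0

# Hypothetical curved data (y = x squared), invented for illustration.
x = [0, 1, 2, 3, 4, 5, 6]
y = [v ** 2 for v in x]

b1, b0 = fit_line(x, y)
residuals = [round(yi - (b1 * xi + b0), 6) for xi, yi in zip(x, y)]

print(residuals)  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

The residuals are positive at both ends and negative in the middle, which is the kind of pattern that signals a linear model is inappropriate.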

More residual plot examples

EXAMPLE Is a Linear Model Appropriate? (1 of 3)
The data in Table 8 were
collected by placing a
temperature probe in a portable
heater, removing the probe, and
recording the temperature (in
degrees Fahrenheit) over time
(in seconds). Determine
whether the relation between
the temperature of the probe and
time is linear.

EXAMPLE Is a Linear Model Appropriate? (2 of 3)
Enter the data into statistical software or a graphing calculator
with advanced statistical features. Figure (a) shows a scatter
diagram of the data from StatCrunch. Temperature and time
appear to be linearly related with a negative slope.

The linear correlation coefficient between these two variables is
–0.997. Because |–0.997| = 0.997 > 0.754 (the critical value from
Table II with n = 7), the correlation coefficient suggests a negative
association exists between time and temperature.

EXAMPLE Is a Linear Model Appropriate? (3 of 3)
Use statistical software to determine the least-squares regression
line and store the residuals. Figure (b) shows a plot of the
residuals versus the explanatory variable, time, using StatCrunch.

The upside-down U-shaped pattern in the plot indicates that the
linear model is not appropriate.
The predicted values overestimate the temperature for early and late
times, whereas they underestimate the temperature for the middle
times.

An outlier can be thought of as an observation whose
response variable is inconsistent with the overall
pattern of the data. To determine outliers, either:

• construct a plot of residuals against the explanatory
variable (take note of significantly high residuals,
which indicate outliers)
• draw a boxplot of the residuals.
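
The boxplot check can be approximated numerically. The Python sketch below assumes the common 1.5 × IQR fence convention for boxplots, which the slides do not spell out, and applies it to the residuals from the lottery table. Quartile conventions differ between tools, so StatCrunch may place the fences slightly differently. The function name and the extra residual of 30.0 are hypothetical.

```python
import statistics

def boxplot_outliers(values):
    """Values outside the 1.5 * IQR fences (assumed boxplot convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

# Residuals from the lottery table (Column 4).
residuals = [6.76, 4.77, -0.33, -1.63, -1.28, -2.45, 0.34, 1.64, -7.50]

print(boxplot_outliers(residuals))           # []
print(boxplot_outliers(residuals + [30.0]))  # [30.0]
```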

EXAMPLE Identifying Outliers

Figure (a) shows a scatter diagram of a set of data.
The residual plot is shown in Figure (b) and the boxplot of
the residuals is in Figure (c). Do the data have any outliers?

We can see that the data do contain an outlier. We label
the outlier in the figures.
Objective 3
• Identify Influential Observations

Definitions:
• In a scatterplot, an outlier is a point lying far away from
the other data points.
• Paired sample data may include one or more influential
points, which are points that strongly affect the graph of
the regression line.

Influence is affected by two factors:
1. the relative vertical position of the observation
(residuals) and
2. the relative horizontal position of the observation
(leverage).

Leverage is a measure that depends on how much
the observation’s value of the explanatory variable
differs from the mean value of the explanatory
variable.

EXAMPLE Influential Points vs. Outliers

Influential points tend to be outliers (see left), but not all outliers
are influential points (they may or may not drastically change
the regression line).

EXAMPLE Identifying Influential Observations

Consider the nine pairs of jackpot/tickets data. The
scatterplot located to the left below shows the regression
line. If we include an additional pair of data, x = 980 and y
= 12, we get the regression line shown to the right below.
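
The effect of the added pair can be checked by fitting the line twice. This is a minimal Python sketch; the helper `fit_line` is written for this illustration and uses the deviation-from-mean form of the least-squares slope, which is equivalent to b1 = r · (sy / sx).

```python
def fit_line(x, y):
    """Least-squares slope and intercept (deviation-from-mean form)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    return b1, y_bar - b1 * x_bar

# The nine jackpot/tickets pairs used throughout the lottery example.
jackpot = [334, 127, 300, 227, 202, 180, 164, 145, 255]
tickets = [54, 16, 41, 27, 23, 18, 18, 16, 26]

slope_9, intercept_9 = fit_line(jackpot, tickets)
# Same data plus the additional pair x = 980, y = 12.
slope_10, intercept_10 = fit_line(jackpot + [980], tickets + [12])

print(round(slope_9, 3), round(intercept_9, 3))
print(round(slope_10, 3), round(intercept_10, 3))
```

With the nine original pairs the slope is about 0.174. Adding the single high-leverage pair pulls the slope to roughly zero, which is what makes that observation influential.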

In-class example 2

For the rest of today’s class, please work on In-class
Example #2 (posted under Week 3 module on Canvas).

You will practice:
• Calculating r and r-squared
• Finding the least-squares regression line
• Making predictions
