Two Quantitative Variables: Scatterplot, Correlation, and Linear Regression

This document discusses analyzing the relationship between two quantitative variables through scatterplots, correlation, and linear regression. It provides examples analyzing the relationship between presidential approval ratings and election margins, cricket chirp rates and temperature, life expectancy and fat consumption, and more. Key points made include that correlation measures the strength and direction of a linear relationship, but does not prove causation, and that outliers can influence correlation. Linear regression fits a line of best fit to predict a response variable based on an explanatory variable.

Uploaded by

brownka5


Example. When a US president runs for re-election, how strong is the relationship between
the president’s approval rating and the outcome of the election? The table below includes
all the presidential elections since 1940 in which an incumbent was running and shows the
presidential approval rating at the time of the election.

Year  Candidate      Approval  Margin  Result

1940  Roosevelt            62    10.0  Won
1948  Truman               50     4.5  Won
1956  Eisenhower           70    15.4  Won
1964  Johnson              67    22.6  Won
1972  Nixon                57    23.2  Won
1976  Ford                 48    -2.1  Lost
1980  Carter               31    -9.7  Lost
1984  Reagan               57    18.2  Won
1992  G. H. W. Bush        39    -5.5  Lost
1996  Clinton              55     8.5  Won
2004  G. W. Bush           49     2.4  Won

1. What was the highest approval rating for any of the losing presidents? What was the
lowest approval rating for any of the winning presidents? Make a conjecture about the
approval rating needed by a sitting president in order to win re-election.

2. Approval rating and margin of victory are both quantitative variables. Does there seem
to be an association between the two variables?
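
Question 1 can be checked directly from the table. The sketch below assumes the table has been typed in as a list of tuples; the variable names are illustrative and are not part of the original dataset.

```python
# (year, candidate, approval, margin, result), copied from the table above.
elections = [
    (1940, "Roosevelt", 62, 10.0, "Won"),
    (1948, "Truman", 50, 4.5, "Won"),
    (1956, "Eisenhower", 70, 15.4, "Won"),
    (1964, "Johnson", 67, 22.6, "Won"),
    (1972, "Nixon", 57, 23.2, "Won"),
    (1976, "Ford", 48, -2.1, "Lost"),
    (1980, "Carter", 31, -9.7, "Lost"),
    (1984, "Reagan", 57, 18.2, "Won"),
    (1992, "G. H. W. Bush", 39, -5.5, "Lost"),
    (1996, "Clinton", 55, 8.5, "Won"),
    (2004, "G. W. Bush", 49, 2.4, "Won"),
]

# Highest approval rating among incumbents who lost.
highest_losing = max(row[2] for row in elections if row[4] == "Lost")
# Lowest approval rating among incumbents who won.
lowest_winning = min(row[2] for row in elections if row[4] == "Won")

print("Highest approval among losing presidents:", highest_losing)
print("Lowest approval among winning presidents:", lowest_winning)
```

In this table the two groups do not overlap, which is one possible starting point for the conjecture asked for in Question 1.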
Scatterplot
A scatterplot is a graph of the relationship between two quantitative variables.
A scatterplot includes a pair of axes with appropriate numerical scales, one for each variable.
The paired data for each case are plotted as a point on the scatterplot. If there are explanatory
and response variables, we put the explanatory variable on the x-axis and the response
variable on the y-axis.

Example. Draw a scatterplot for the data on approval rating and margin of victory in the
table above.
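
A scatterplot is normally drawn by hand or with plotting software. As a rough stand-in that needs no plotting library, the sketch below places each (approval, margin) pair on a character grid, with the explanatory variable (approval) on the horizontal axis. The function name `text_scatter` and the grid size are assumptions made for this illustration.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a list of strings forming a rough character-grid scatterplot."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column or row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 of the grid is the top, so flip the vertical index.
        grid[height - 1 - row][col] = "*"
    return ["".join(line) for line in grid]

approval = [62, 50, 70, 67, 57, 48, 31, 57, 39, 55, 49]
margin = [10.0, 4.5, 15.4, 22.6, 23.2, -2.1, -9.7, 18.2, -5.5, 8.5, 2.4]

rows = text_scatter(approval, margin)
for line in rows:
    print("|" + line)          # vertical axis: margin of victory
print("+" + "-" * 40)          # horizontal axis: approval rating
```

Points that fall in the same grid cell overlap, so this is only a quick visual check, not a substitute for a properly scaled plot.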

Interpreting a Scatterplot
When looking at a scatterplot we often address the following questions:

• Do the points form a clear trend with a particular direction, are they more scattered
about a general trend, or is there no obvious pattern?

• If there is a trend, is it generally upward or generally downward as we look from left to
right? A general upward trend is called a positive association while a general downward
trend is called a negative association.

• If there is a trend, does it seem to follow a straight line, which we call a linear
association, or some other curve or pattern?

• Are there any outlier points that are clearly distinct from a general pattern in the data?

Example. Four scatterplots are shown in the figure below. For each pair of variables, discuss
the information contained in the scatterplot. If there appears to be a positive or negative
association, discuss what that means in the specific context.

Summarizing a Relationship between Two Quantitative Variables: Correlation
Just as the mean or median summarizes the center and the standard deviation or IQR
measures the spread of the distribution for a single quantitative variable, we need a numerical
statistic to measure the strength and direction of association between two quantitative
variables. One such statistic is the correlation.

Correlation
The correlation is a measure of the strength and direction of linear association between two
quantitative variables.

Notation for the Correlation

The correlation between two quantitative variables of a sample is denoted r.
The correlation between two quantitative variables of a population is denoted ρ.

The correlations for each of the pairs of variables that have been displayed in scatterplots
earlier in this section are displayed below.

Variable 1         Variable 2            Correlation

Margin of victory  Approval rating              0.86
Average mercury    Acidity                     -0.58
Average mercury    Alkalinity                  -0.59
Alkalinity         Acidity                      0.72
Average mercury    Standardized mercury         0.96

Properties of the Correlation

The sample correlation r has the following properties:

• Correlation is always between -1 and 1, inclusive: −1 ≤ r ≤ 1.

• The sign of r (positive or negative) indicates the direction of association.

• Values of r close to ±1 show a strong linear relationship, while values of r close to 0
show little or no linear relationship.

• The correlation r has no units and is independent of the scale of either variable.

• The correlation is symmetric: The correlation between variables x and y is the same
as the correlation between y and x.

The population correlation ρ also satisfies these properties.
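
The symmetry and scale properties can be checked numerically. The sketch below uses the approval and margin values from the election table and a small helper, `correlation`, written for this illustration; it uses a sums-of-products form of r that is algebraically the same as the usual definition. The rescaling (approval as a proportion, margin multiplied by 10 and shifted by 5) is an arbitrary choice with positive scale factors.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r computed from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

approval = [62, 50, 70, 67, 57, 48, 31, 57, 39, 55, 49]
margin = [10.0, 4.5, 15.4, 22.6, 23.2, -2.1, -9.7, 18.2, -5.5, 8.5, 2.4]

r = correlation(approval, margin)
# Symmetry: swapping the roles of the two variables.
r_swapped = correlation(margin, approval)
# Scale independence: changing units with positive scale factors.
r_rescaled = correlation([a / 100 for a in approval],
                         [10 * m + 5 for m in margin])

print(round(r, 2), round(r_swapped, 2), round(r_rescaled, 2))
```

All three values agree up to floating-point rounding. A negative scale factor on one variable would flip the sign of r but not its size.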

Example. Common folk wisdom claims that one can determine the temperature on a summer
evening by counting how fast the crickets are chirping. Is there really an association
between chirp rate and temperature? The data below were collected by E.A. Bessey and
C.A. Bessey, who measured chirp rates for crickets and temperature during the summer of
1898.

Temperature (°F)      54.5   59.5   63.5   67.5   72.0   78.5   83.8

Chirps (per minute)     81     97    103    123    150    182    195

1. Use the scatterplot to estimate the correlation between chirp rate and temperature.
Explain your reasoning.

2. Use technology to find the correlation and use correlation notation.

3. Are chirp rate and temperature associated?
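
For Question 2, "technology" can be any statistics package. As one possible sketch, the Python code below computes r for the seven cricket observations using a small helper written for this illustration (the name `correlation` is not from the original handout).

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r computed from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

temperature = [54.5, 59.5, 63.5, 67.5, 72.0, 78.5, 83.8]  # degrees Fahrenheit
chirps = [81, 97, 103, 123, 150, 182, 195]                # chirps per minute

r_crickets = correlation(temperature, chirps)
print("r =", round(r_crickets, 2))
```

Because these seven pairs are a sample, the result is written with the sample notation r rather than ρ.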

Example. The figure below shows the estimated average life expectancy (in years) for a
sample of 40 countries against the average amount of fat (measured in grams per capita per
day) in the food supply for each country. The scatterplot shows a clear positive association
(r = 0.70) between these two variables. The countries with short life expectancies all have
below-average fat consumption, while the countries consuming more than 100 grams of fat
on average all have life expectancies over 70 years. Does this mean that we should eat more
fat to live longer?

Correlation Caution #1
A strong positive or negative correlation does not (necessarily) imply a cause and effect
relationship between the two variables.

Example. Core body temperature for an individual person tends to fluctuate during the
day according to a regular circadian rhythm. Suppose that the body temperature of an
adult woman is recorded every hour of the day, starting at 6 am. The results are shown in
the figure below. Does there appear to be an association between the time of day and body
temperature? Estimate the correlation between the hour of the day and the woman’s body
temperature.

Correlation Caution #2
A correlation near zero does not (necessarily) mean that the two variables are not associated,
since the correlation measures only the strength of a linear relationship.
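
A small made-up example illustrates this caution. In the sketch below, y is completely determined by x through y = x², yet the correlation is zero because the relationship is curved rather than linear. The data and the helper name `correlation` are invented for this illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r computed from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y depends perfectly on x, but along a parabola.
x_values = [-3, -2, -1, 0, 1, 2, 3]
y_values = [x ** 2 for x in x_values]

r_curved = correlation(x_values, y_values)
print("r =", r_curved)
```

The positive products on the right half of the parabola cancel the negative products on the left half, so the sum in the numerator is zero.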

Example. The figure below shows the alcohol consumption (drinks per week) and average
daily caloric intake for 91 subjects who are at least 60 years old, from the data in
NutritionStudy. Notice the distinct outlier who claims to imbibe 203 drinks per week as part of
a 6662 calorie diet! This is almost certainly an incorrect observation. The second plot shows
these same data with the outlier removed. How do you think the correlation between calories
and alcohol consumption changes when the outlier is deleted?

Correlation Caution #3
Correlation can be heavily influenced by outliers. Always plot your data.
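
The effect of a single outlier can be seen with a tiny made-up dataset (these are not the NutritionStudy values). In the sketch below, five points with no linear association are joined by one extreme point, and the correlation is computed both ways. The helper name `correlation` is an assumption of this illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r computed from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data with no linear trend.
x_base = [1, 2, 3, 4, 5]
y_base = [3, 1, 2, 1, 3]

# The same data plus one extreme point at (20, 20).
x_outlier = x_base + [20]
y_outlier = y_base + [20]

r_without = correlation(x_base, y_base)
r_with = correlation(x_outlier, y_outlier)
print("without outlier:", round(r_without, 2))
print("with outlier:   ", round(r_with, 2))
```

One point is enough to move r from zero to a value that looks like a strong positive association, which is why plotting the data first matters.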

A Formula for Correlation

\[ r = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right) \]

This formula essentially involves converting all values for both variables to z-scores, which
puts the correlation on a fixed ±1 scale and makes it independent of the scale of measurement.
For a positive association, large values of x tend to occur with large values of y (both z-scores
are positive) and small values (with negative z-scores) tend to occur together. In either case,
the products are positive, which leads to a positive sum. For a negative association, the
z-scores tend to have opposite signs (small x with large y and vice versa) so the products
tend to be negative.
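
The z-score description can be followed step by step in code. The sketch below applies it to the approval and margin data from the election example, using `mean` and `stdev` (the sample standard deviation) from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, stdev

approval = [62, 50, 70, 67, 57, 48, 31, 57, 39, 55, 49]
margin = [10.0, 4.5, 15.4, 22.6, 23.2, -2.1, -9.7, 18.2, -5.5, 8.5, 2.4]

n = len(approval)
x_bar, s_x = mean(approval), stdev(approval)
y_bar, s_y = mean(margin), stdev(margin)

# Convert every value to a z-score.
z_x = [(x - x_bar) / s_x for x in approval]
z_y = [(y - y_bar) / s_y for y in margin]

# Sum the products of paired z-scores and divide by n - 1.
r_zscore = sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)
print("r =", round(r_zscore, 2))
```

The result agrees with the correlation of 0.86 listed for margin of victory and approval rating in the table of correlations.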

The Regression Line
The process of fitting a line to a set of data is called linear regression and the line of best
fit is called the regression line. The regression line provides a model of a linear association
between two variables, and we can use the regression line on a scatterplot to give a predicted
value of the response variable, based on a given value of the explanatory variable.

Example. Use the regression line in the figure below to estimate the predicted tip amount
on a $60 bill.

Explanatory and Response Variables
The regression line to predict y from x is NOT the same as the regression line to predict x
from y. Be sure to always pay attention to which is the explanatory variable and which is
the response variable.
A regression line is always in the form

$\widehat{\text{Response}} = a + b \cdot \text{Explanatory}$

The equation of the regression line is often called a prediction equation because we can use
it to make predictions. We substitute the value of the explanatory variable into the prediction
equation to calculate the predicted response.
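
To see that the two regression lines really differ, the sketch below fits both directions on the election data (approval and margin) with a small least squares helper. The helper name `least_squares` is an assumption of this illustration; it uses the standard least squares formulas b = Sxy / Sxx and a = ȳ − b·x̄.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

approval = [62, 50, 70, 67, 57, 48, 31, 57, 39, 55, 49]
margin = [10.0, 4.5, 15.4, 22.6, 23.2, -2.1, -9.7, 18.2, -5.5, 8.5, 2.4]

# Predict margin from approval (approval is explanatory).
a_yx, b_yx = least_squares(approval, margin)
# Predict approval from margin (margin is explanatory).
a_xy, b_xy = least_squares(margin, approval)

print("Margin from approval: slope", round(b_yx, 3))
print("Approval from margin: slope", round(b_xy, 3))
print("Slope if we just inverted the first line:", round(1 / b_yx, 3))
```

If the two lines were the same line, the second slope would equal 1 divided by the first. It does not, because each fit minimizes errors in a different variable.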

Example. Three different bill amounts from the RestaurantTips dataset are given. In
each case, use the regression line $\widehat{\text{Tip}} = -0.292 + 0.182 \cdot \text{Bill}$ to predict the tip.

1. A bill of $59.33

2. A bill of $9.52

3. A bill of $23.70
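
The substitutions can be carried out as in the sketch below, which plugs each bill into the prediction equation given above. The function name `predict_tip` is illustrative.

```python
def predict_tip(bill):
    """Predicted tip (dollars) from the regression line Tip-hat = -0.292 + 0.182 * Bill."""
    return -0.292 + 0.182 * bill

bills = [59.33, 9.52, 23.70]
predicted_tips = [predict_tip(bill) for bill in bills]

for bill, tip in zip(bills, predicted_tips):
    print(f"Bill ${bill:.2f} -> predicted tip ${tip:.2f}")
```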

Residuals
The residual at a data value is the difference between the observed and predicted values of
the response variable:

Residual = Observed − Predicted = y − ŷ

On a scatterplot, the residual represents the vertical deviation from the line to a data point.
Points above the line will have positive residuals and points below the line will have negative
residuals. If the predicted values closely match the observed data values, the residuals will be small.

Example. In the previous example, we found the predicted tip amount for three different
bills in the RestaurantTips dataset. The actual tips left by each of these customers are
shown below. Use the information to calculate the residuals for each of these sample points.

1. The tip left on a bill of $59.33 was $10.00

2. The tip left on a bill of $9.52 was $1.00

3. The tip left on a bill of $23.70 was $10.00
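
Each residual is the observed tip minus the predicted tip. The sketch below repeats the prediction step so that it runs on its own; the names `predict_tip` and `observations` are illustrative.

```python
def predict_tip(bill):
    """Predicted tip (dollars) from the regression line Tip-hat = -0.292 + 0.182 * Bill."""
    return -0.292 + 0.182 * bill

# (bill, observed tip) pairs from the three cases above.
observations = [(59.33, 10.00), (9.52, 1.00), (23.70, 10.00)]

# Residual = Observed - Predicted.
residuals = [tip - predict_tip(bill) for bill, tip in observations]

for (bill, tip), residual in zip(observations, residuals):
    print(f"Bill ${bill:.2f}: observed ${tip:.2f}, residual {residual:+.2f}")
```

The first two customers tipped slightly less than the line predicts (negative residuals), while the third tipped far more (a large positive residual).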

Example. The data from ElectionMargin are given in the table of approval ratings and
margins of victory in the first example of this section.

1. The regression line for these 11 data points is

$\widehat{\text{Margin}} = -36.5 + 0.836 \cdot \text{Approval}$

Calculate the predicted values and the residuals for all the data points.

2. Which residual is the largest? For this largest residual, is the observed margin higher
or lower than the margin predicted by the regression line? To which president and year
does this residual correspond?
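
One way to organize the calculation for both questions is sketched below. It uses the regression line Margin-hat = −36.5 + 0.836(Approval) stated above and the election data from the first example; "largest" is taken here to mean largest in absolute value, which is an interpretation rather than something the handout specifies.

```python
# (year, candidate, approval, margin) from the election table.
elections = [
    (1940, "Roosevelt", 62, 10.0),
    (1948, "Truman", 50, 4.5),
    (1956, "Eisenhower", 70, 15.4),
    (1964, "Johnson", 67, 22.6),
    (1972, "Nixon", 57, 23.2),
    (1976, "Ford", 48, -2.1),
    (1980, "Carter", 31, -9.7),
    (1984, "Reagan", 57, 18.2),
    (1992, "G. H. W. Bush", 39, -5.5),
    (1996, "Clinton", 55, 8.5),
    (2004, "G. W. Bush", 49, 2.4),
]

results = []
for year, candidate, approval, margin in elections:
    predicted = -36.5 + 0.836 * approval   # predicted margin
    residual = margin - predicted          # observed minus predicted
    results.append((year, candidate, predicted, residual))
    print(f"{year} {candidate:14s} predicted {predicted:6.2f} residual {residual:+6.2f}")

# The data point whose residual is largest in absolute value.
largest = max(results, key=lambda item: abs(item[3]))
print("Largest residual:", largest[0], largest[1], round(largest[3], 2))
```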

Least Squares Line

The least squares line, also called the line of best fit, is the line which minimizes the
sum of the squared residuals, $\sum (y - \hat{y})^2$.
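
The definition can be tried out on the election data. The sketch below computes the least squares slope and intercept with the standard formulas b = Sxy / Sxx and a = ȳ − b·x̄ (these formulas are not stated in this handout), then compares the sum of squared residuals with that of two slightly different lines. The function names are illustrative.

```python
approval = [62, 50, 70, 67, 57, 48, 31, 57, 39, 55, 49]
margin = [10.0, 4.5, 15.4, 22.6, 23.2, -2.1, -9.7, 18.2, -5.5, 8.5, 2.4]

def sum_squared_residuals(a, b, xs, ys):
    """Sum of (y - y-hat)^2 for the line y-hat = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

n = len(approval)
mean_x = sum(approval) / n
mean_y = sum(margin) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(approval, margin))
sxx = sum((x - mean_x) ** 2 for x in approval)

slope = sxy / sxx                      # least squares slope b
intercept = mean_y - slope * mean_x    # least squares intercept a

best = sum_squared_residuals(intercept, slope, approval, margin)
steeper = sum_squared_residuals(intercept, slope + 0.05, approval, margin)
shifted = sum_squared_residuals(intercept + 1.0, slope, approval, margin)

print("a =", round(intercept, 1), " b =", round(slope, 3))
print("SSE best:", round(best, 1), " steeper:", round(steeper, 1),
      " shifted:", round(shifted, 1))
```

The fitted values round to the line −36.5 + 0.836(Approval) used in the ElectionMargin example, and nudging either coefficient makes the sum of squared residuals larger.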

Interpreting the Slope and Intercept of the Regression Line

For the regression line ŷ = a + bx,

• The slope b represents the predicted change in the response variable y given a one unit
increase in the explanatory variable x.

• The intercept a represents the predicted value of the response variable y if the
explanatory variable x is zero. The interpretation may be nonsensical since it is often not
reasonable for the explanatory variable to be zero.

Example. For the RestaurantTips regression line $\widehat{\text{Tip}} = -0.292 + 0.182 \cdot \text{Bill}$, interpret
the slope and the intercept in context.

Example. In an earlier example, we considered some scatterplots from the dataset
FloridaLakes showing relationships between acidity, alkalinity, and fish mercury levels in n = 53
Florida lakes. We wish to predict a quantity that is difficult to measure (mercury level of
fish) using a value that is more easily obtained from a water sample (acidity). We saw that
there appears to be a negative linear association between these two variables, so a regression
line is appropriate.

1. Use technology to find the regression line to predict Mercury from pH, and plot it on
a scatterplot of the data.

2. Interpret the slope of the regression line in the context of Florida lakes.

Regression Caution #1
Avoid trying to apply a regression line to predict values far from those that were used to
create it.

Example. In the previous example, we used the acidity (pH) of Florida lakes to predict
mercury levels in fish. Suppose that, instead of mercury, we use acidity to predict the calcium
concentration (mg/l) in Florida lakes. The figure below shows a scatterplot of these data
with the regression line $\widehat{\text{Calcium}} = -51.4 + 11.17 \cdot \text{pH}$ for the 53 lakes in our sample. Give an
interpretation for the slope in this situation. Does the intercept make sense? Comment on
how well the linear prediction equation describes the relationship between these two variables.
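
Evaluating the prediction equation at a few pH values shows what can go wrong when a line is pushed beyond the data. The pH values in the sketch below are chosen only for illustration and are not taken from the FloridaLakes data; the function name `predict_calcium` is likewise an assumption.

```python
def predict_calcium(ph):
    """Predicted calcium (mg/l) from the line Calcium-hat = -51.4 + 11.17 * pH."""
    return -51.4 + 11.17 * ph

# Illustrative pH values; 0 corresponds to the intercept.
for ph in [0.0, 4.0, 7.0]:
    print(f"pH {ph:.1f} -> predicted calcium {predict_calcium(ph):.2f} mg/l")

# pH at which the line crosses zero predicted calcium.
zero_crossing = 51.4 / 11.17
print("Predictions are negative below pH", round(zero_crossing, 2))
```

A concentration cannot be negative, so the intercept and any prediction below the zero crossing are not meaningful, whatever the quality of the fit elsewhere.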

Regression Caution #2
Always plot the data. Although the regression line can be calculated for any set of paired
quantitative variables, it is only appropriate to use a regression line when there is a linear
trend in the data.

Regression Caution #3
Outliers can have a strong influence on the regression line, just as we saw for correlation. In
particular, data points for which the explanatory value is an outlier are often called influential
points because they exert an overly strong effect on the regression line.
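
A made-up example shows how strong this effect can be. In the sketch below, five points lie exactly on a line with slope 2; adding one point whose explanatory value is far from the rest changes the sign of the fitted slope. The data and the helper name `least_squares` are invented for this illustration, and the helper uses the standard least squares formulas.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data lying exactly on y = 2x.
x_line = [1, 2, 3, 4, 5]
y_line = [2, 4, 6, 8, 10]

# Add one point whose x value is an outlier.
x_influenced = x_line + [20]
y_influenced = y_line + [0]

_, slope_without = least_squares(x_line, y_line)
_, slope_with = least_squares(x_influenced, y_influenced)

print("slope without the extra point:", round(slope_without, 2))
print("slope with the extra point:   ", round(slope_with, 2))
```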
