Thomas Watson - Scatterplot practice

Uploaded by

thomas.watson455


Name: _________________

AP Statistics Handout: Lesson 3.3

Topics: residuals & residual plots

Lesson 3.3 Guided Notes

Residuals

Residuals: The error (vertical distance) between a linear model's PREDICTION and the ACTUAL data point.

Residual = actual − predicted
Residual = y − ŷ

A linear regression was performed on students' attendance and their test scores, resulting in the linear equation ŷ = −7.69 + 0.57x, where ŷ = predicted raw score and x = percent of school days attended.

a) A new student comes to the school. If his attendance rate is 80%, what is his predicted test score? Show your work below and by drawing on the graph.

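Since x is measured in percent, x = 80: ŷ = −7.69 + 0.57(80) = 37.91, so the predicted score is about 38.

b) The student gets 44 questions correct on his exam. Find, draw, and interpret our model's prediction error (residual) for this student.

Residual = 44 − 37.91 = 6.09. The student scored about 6 points higher than the model predicted, so the model underpredicted his score.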
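The prediction and residual for this student can be checked with a short Python sketch. The coefficients come from the handout's equation; the function names predict_score and residual are made up here for illustration.

```python
# Coefficients from the handout's equation: y-hat = -7.69 + 0.57x
INTERCEPT = -7.69
SLOPE = 0.57


def predict_score(attendance_percent):
    """Predicted raw test score for a given percent of school days attended."""
    return INTERCEPT + SLOPE * attendance_percent


def residual(actual, predicted):
    """Residual = actual - predicted."""
    return actual - predicted


# Attendance is entered as 80 (percent), not 0.80.
predicted = predict_score(80)
# The student actually answered 44 questions correctly.
error = residual(44, predicted)

print(f"predicted score: {predicted:.2f}")
print(f"residual: {error:.2f}")
```

A positive residual means the actual value sits above the line, so the model underpredicted.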

Residual Plots

[Figure: scatterplot (y-values vs. x-values) and its residual plot (residual values vs. x-values)]

Does this linear model provide a good fit for this data? Justify why or why not.

Yes, because the residual plot shows no pattern; the residuals are scattered all over the place.

[Figure: scatterplot (y-values vs. x-values) and its residual plot (residual values vs. x-values)]

Does this linear model provide a good fit for this data? Justify why or why not.

No. The residual plot shows a curved pattern that resembles an upside-down quadratic, so a linear model is not a good fit.
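To see why a curved residual plot signals a poor linear fit, here is a minimal Python sketch. The data are invented for illustration (an exact upside-down quadratic), and the helper names fit_line and residuals are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y-hat = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys):
    """Residual (actual - predicted) for every data point."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Invented data that follow an upside-down quadratic, not a line.
xs = [0, 1, 2, 3, 4, 5, 6]
curved_ys = [-(x - 3) ** 2 for x in xs]

curved_residuals = residuals(xs, curved_ys)
for x, r in zip(xs, curved_residuals):
    print(f"x = {x}: residual = {r:+.1f}")
```

The residuals run negative, then positive, then negative again. That leftover curve is the pattern a residual plot would reveal.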

Dataset – University of California Admissions

Student     High School GPA   SAT Score   Composite (HSGPA & SAT)   Family Income   First Year College GPA
Student 1   3.12              1240        64.8%                     $45,696         1.85
Student 2   3.24              1460        70.9%                     $115,754        2.84
Student 3   3.66              1670        80.5%                     $48,209         3.27
Student 4   3.43              1860        81.6%                     $63,582         2.61
Student 5   3.35              1890        81.2%                     $33,641         3.41

Note: This is simulated data that closely matches key summary statistics from UC testing task force report's latest years (2015 & 2016). Full data file, including citations and summary, is available on the lesson page: skewthescript.org/3-3.

• In 2020, one of the world's largest school systems, the University of California, released a study¹ of standardized tests and admission in their schools. The report gave the public a rare look at admissions statistics, which are normally kept private.

• Today, we'll analyze a simulated version of their full dataset, with n = 1,000 students. Simulation is required, since the raw data is still private. However, the simulated data matches the key summary statistics from the report.

Lesson 3.3 (Day 1) Discussion

[Figure: scatterplot and its residual plot]

The University of California report found a positive correlation between family income and SAT scores.
For this reason, the report expressed concern that using the SAT in admissions could disadvantage the
kids who couldn’t afford tutors / test prep.

Discussion Question: To level the playing field, many colleges “consider student backgrounds” when
evaluating their SAT scores.

a) How could a college use this data to consider income when evaluating SAT scores? Hint: Think
about how they could use the residual plot.

They can see which students the model over- or underpredicts and give out free ACT/SAT prep books, because a book costs about $200 and covers only about 3 tests, which sounds like capitalizing on the market.

b) Is the method you proposed in part (a) “fair” to both wealthy and low-income students?
Explain your thinking.
If the books are free and distributed equally to schools, then there should be an increase in scores.

1. “Report of the UC Academic Council Standardized Testing Task Force,” University of California Academic Senate (2020): https://senate.universityofcalifornia.edu/_files/committees/sttf/sttf-report.pdf

Lesson 3.3 (Day 1) Practice

1) An avid bird watcher counted the number of geese on a pond each morning and wondered if the
temperature could be used to predict the number of geese she would see. A linear regression was
performed, resulting in the least-squares regression line ŷ = −9.89 + 0.25x, where x = temperature in
degrees Fahrenheit and y = the number of geese on the pond.

a) When it is 55℉ in the morning, how many geese are predicted to be on the pond?

ŷ = −9.89 + 0.25(55) = 3.86, or about 4 geese.

b) On a morning that was 55℉, there were actually 4 geese on the pond. Calculate and interpret the residual for this day, and then draw the residual on the scatterplot.

Residual = 4 − 3.86 = 0.14. The model underpredicted the number of geese by 0.14.

c) On a morning that was 61℉, there were also 4 geese on the pond. Calculate and interpret the
model’s residual for this day, and then draw the residual on the scatterplot.

ŷ = −9.89 + 0.25(61) = 5.36, so the residual is 4 − 5.36 = −1.36. The residual is negative, meaning the model overpredicted the number of geese by 1.36.
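The two geese residuals can be verified with a short Python sketch. It uses the coefficients given in the problem; the function name predicted_geese and the dictionary geese_residuals are illustrative only.

```python
# Coefficients from the problem: y-hat = -9.89 + 0.25x
INTERCEPT = -9.89
SLOPE = 0.25


def predicted_geese(temp_f):
    """Predicted number of geese at a given morning temperature (deg F)."""
    return INTERCEPT + SLOPE * temp_f


# (temperature in deg F, actual number of geese) for parts (b) and (c)
observations = [(55, 4), (61, 4)]

geese_residuals = {}
for temp_f, actual in observations:
    predicted = predicted_geese(temp_f)
    geese_residuals[temp_f] = actual - predicted
    # Positive residual: actual is above the line, so the model underpredicted.
    direction = "underpredicted" if geese_residuals[temp_f] > 0 else "overpredicted"
    print(f"{temp_f} F: predicted {predicted:.2f}, "
          f"residual {geese_residuals[temp_f]:+.2f} ({direction})")
```

The same count of 4 geese gives a positive residual at 55℉ and a negative one at 61℉, because the line predicts more geese on the warmer morning.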

d) The residual plot for the number of geese vs. temperature data set is given below. Does the linear
model provide a good fit for the data? Why or why not?

Yes, because the residual plot doesn't show a pattern, so a linear model seems appropriate.

2) The size of a home or apartment is often described using square footage of the floor plan. Suppose
we want to predict the monthly rent for an apartment from its square footage. The least squares
regression line for x = square footage and y = monthly rent was calculated to be ŷ = −995.89 + 2.33x
for a sample of small 1-2 bedroom apartments in a city. Using the residual plot below, is this linear
model a good fit for the data? Why or why not?

No. The residual plot shows a pattern, which indicates a non-linear relationship, so the linear model is not a good fit.

3) Linear regression was performed on two quantitative variables resulting in the LSRL of ŷ = −4.28 + 7.77x and the residual plot shown below.

Which do you think would be more reliable using the LSRL: a prediction using x = 1.5 or a prediction
using x = 8.5? Justify your answer.

A prediction using x = 1.5, since the residuals near x = 1.5 have slightly better scatter and show less of a pattern than those near x = 8.5 on the least-squares regression line.

AP Statistics Handout: Lesson 3.3 (Day 2)

Topics: standard deviation of residuals (s), coefficient of determination (r²), outliers

Lesson 3.3 Guided Notes

After testing centers closed in 2020 due to the Covid-19 pandemic, many colleges dropped their SAT /
ACT testing requirements. Now, many colleges continue to list these tests as optional for applicants. But,
slowly, more and more colleges have started requiring these tests again. Let’s explore why.

Dataset – University of California Admissions

Student     High School GPA   SAT Score   Composite (HSGPA & SAT)   Family Income   First Year College GPA
Student 1   3.12              1240        64.8%                     $45,696         1.85
Student 2   3.24              1460        70.9%                     $115,754        2.84
Student 3   3.66              1670        80.5%                     $48,209         3.27
Student 4   3.43              1860        81.6%                     $63,582         2.61
Student 5   3.35              1890        81.2%                     $33,641         3.41

Note: This is simulated data that closely matches key summary statistics from UC testing task force report's latest years (2015 & 2016). Full data file, including citations and summary, is available on the lesson page: skewthescript.org/3-3.

• In 2020, one of the world's largest school systems, the University of California, released a study¹ of standardized tests and admission in their schools. The report gave the public a rare look at admissions statistics, which are normally kept private.

• Today, we'll analyze a simulated version of their full dataset, with n = 1,000 students. Simulation is required, since the raw data is still private. However, the simulated data matches the key summary statistics from the report.

Colleges use high school GPA and SAT/ACT scores to predict how well applicants will perform in college
classes. Above are the relationships between high school GPA, SAT score, and college GPA from our
dataset. The x-axis on the rightmost graph represents the “composite” - the percent of points earned for
high school GPA (out of 4.0) and SAT score (out of 2400), where GPA & SAT are evenly weighted.
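The composite described above can be reproduced with a small Python sketch. It assumes "evenly weighted" means a simple average of the two percentages; the function name composite_percent is hypothetical, and the five rows are the sample students from the dataset table.

```python
MAX_GPA = 4.0    # high school GPA is out of 4.0
MAX_SAT = 2400   # SAT score is out of 2400


def composite_percent(hs_gpa, sat):
    """Average of the percent of GPA points and percent of SAT points earned."""
    return 100 * (hs_gpa / MAX_GPA + sat / MAX_SAT) / 2


# (high school GPA, SAT score, composite % listed in the table)
students = [
    (3.12, 1240, 64.8),
    (3.24, 1460, 70.9),
    (3.66, 1670, 80.5),
    (3.43, 1860, 81.6),
    (3.35, 1890, 81.2),
]

computed = [composite_percent(gpa, sat) for gpa, sat, _ in students]
for (gpa, sat, listed), value in zip(students, computed):
    print(f"GPA {gpa}, SAT {sat}: computed {value:.2f}%, table {listed}%")
```

Under that assumption the computed values agree with the table to within rounding.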

a) Use the first LSRL model to predict the first year college GPA of a student who has a 3.5 GPA in high
school. You can visually approximate using the graph (show your thinking by drawing on the graph).

b) Which of the explanatory variables above (high school GPA, SAT score, or composite of GPA/SAT)
looks like the best predictor of college students’ GPAs? Explain your thinking.

The composite looks like the best predictor, because its data points seem closest to the line.

1. “Report of the UC Academic Council Standardized Testing Task Force,” University of California Academic Senate (2020): https://senate.universityofcalifornia.edu/_files/committees/sttf/sttf-report.pdf

Standard Deviation of the Residuals (s)

Standard deviation of the residuals (s): _________________ error between data points and their LSRL.

s: The typical residual length

Stem for interpreting s: When using the LSRL with [explanatory variable] to predict [response variable], we will typically be off by about [value of s], with units of the response variable (y).

Stronger correlation → s: _________________
Weaker correlation → s: _________________

1) The standard deviation of the residuals for the LSRL between attendance and test scores is s = 1.99. Interpret this value.
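As a rough sketch of where a value like s comes from, the Python below fits a least-squares line and computes s = sqrt(sum of squared residuals / (n − 2)), the usual formula for the standard deviation of the residuals. The attendance and score values are invented for illustration and are not the handout's data; fit_line and residual_sd are hypothetical helper names.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope for y-hat = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_sd(xs, ys):
    """Standard deviation of the residuals: sqrt(SSE / (n - 2))."""
    intercept, slope = fit_line(xs, ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


# Invented attendance (%) and test scores, for illustration only.
attendance = [60, 70, 75, 80, 85, 90, 95, 100]
scores = [27, 31, 36, 38, 40, 45, 46, 50]

s = residual_sd(attendance, scores)
print(f"s = {s:.2f} points: the typical prediction error for these made-up data")
```

Because s carries the units of the response variable, it reads directly as "typically off by about s points."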

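[Figure: three scatterplots of first year college GPA vs. high school GPA (s = 0.563), SAT score (s = 0.531), and composite of GPA/SAT (s = 0.518), with some residuals drawn]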

Above, the three predictors of college GPA (high school GPA, SAT scores, and a composite of both) are
displayed, along with some residuals. The standard deviation of the residuals is also shown.

2) Which of the predictors above (high school GPA, SAT score, or composite of GPA/SAT) is the best
predictor of college GPA? Why might this explain why some colleges are requiring the SAT again?

The coefficient of determination (r²)

Graphic inspired by mathisfun.com

r:    1    0.91    0.48    0    −0.48    −0.91    −1
r²:   1    _____   _____   _____   _____   _____   _____

r² close to 0 → _ correlation | r² close to 1 → _ correlation

r² = 1.00 = 100%: The linear model ______________ explains the data's pattern.

r² = 0.72 = 72%: The linear model explains _________ of the data's pattern, but not all of it. There is some error.

Stem for interpreting r²: [r²]% of the variation in [response variable] can be explained by the linear relationship with [explanatory variable].
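Since r² is simply the square of the correlation r, the values for the graphic can be worked out with a few lines of Python. This is only an arithmetic sketch; the list names r_values and r_squared are illustrative.

```python
# Correlation values shown in the graphic.
r_values = [1.0, 0.91, 0.48, 0.0, -0.48, -0.91, -1.0]

# Squaring removes the sign, so r and -r give the same r-squared.
r_squared = [round(r ** 2, 2) for r in r_values]

for r, r2 in zip(r_values, r_squared):
    print(f"r = {r:+.2f}  ->  r^2 = {r2:.2f} ({r2:.0%})")
```

Note that r = 0.48 and r = −0.48 both give the same r², which is why r² describes the strength of a linear relationship but not its direction.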

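[Figure: three scatterplots of first year college GPA vs. high school GPA (r² = 0.13), SAT score (r² = 0.22), and composite of GPA/SAT (r² = 0.26)]

3) Interpret the r² value for the relationship between high school GPA and college GPA.

4) Does including SAT scores substantially improve the strength of predictions? Justify your answer.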
The effect of outliers

[Figure: models A, B, and C, each with one circled data value]

5) For each of the above models (A, B, C), is the circled data value an outlier? Explain.

6) Are these measures (r, r², and s) resistant to outliers? How can you tell?

No, because their values change greatly when the outlier is included.

        No Outlier    With Outlier
r       0.79          0.61
r²      0.62          0.37
s       1.31          1.85
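The lack of resistance can be demonstrated with a small Python sketch that computes r, r², and s with and without one unusual point. All of the data below are invented for illustration and are not the handout's values; regression_summary is a hypothetical helper.

```python
import math


def regression_summary(xs, ys):
    """Return (r, r squared, s) for the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    return r, r ** 2, s


# Invented, nearly linear data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

without = regression_summary(xs, ys)
# Add one invented point far below the trend at a large x-value.
with_outlier = regression_summary(xs + [9], ys + [2.0])

for label, (r, r2, s) in [("No outlier", without), ("With outlier", with_outlier)]:
    print(f"{label}: r = {r:.2f} | r^2 = {r2:.2f} | s = {s:.2f}")
```

A single added point lowers r and r² and raises s, which is the same direction of change shown in the table above.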

Lesson 3.3 (Day 2) Discussion

The UC report found that high school GPA used to be almost as strong as the SAT in predicting college GPA. See the graph at left for an older (2007) cohort of applicants.

Discussion: Give one possible reason that high school GPA became a weaker predictor of college GPA over time. Hint: Think about the shape of the data along the x-axis.

The changes in the grading scale.

Note: This is simulated data, matching summary statistics from the University of California report.

Lesson 3.3 (Day 2) Practice

1) Two scatterplots and their corresponding least-squares regression lines are shown. One graph (A or B)
has 𝑟 = 0.955 and the other has 𝑟 = 0.749. One graph (A or B) has 𝑠 = 5.251 and the other has 𝑠 =
10.336. Match each graph with its respective 𝑟 and s values. Justify your answers.

Graph A: r = 0.749 and s = 10.336, because the data points are farther off the line.

Graph B: r = 0.955 and s = 5.251, because the data are nearly perfectly linear and aren't very far off the line.

2) An avid bird watcher counted the number of geese on a pond each morning and wondered if the
temperature could be used to predict the number of geese she would see. A linear regression was
performed using x = temperature (℉) and y = number of geese.

a) The resulting value of 𝑟 was 0.85. Interpret this value in context.

The correlation r = 0.85 indicates a strong, positive linear relationship between morning temperature and the number of geese on the pond.

b) The resulting value of 𝑠 was 0.82. Interpret this value in context.

When using the LSRL with temperature to predict the number of geese, we will typically be off by about 0.82 geese.
