Linear Regression - Kevin
Recall
[Figure: scatter plot of Ticket Sales (y) against # of followers on the film's official Instagram account (x), showing a +ve linear relationship.]
We cannot conclude there is CAUSATION (cause and EFFECT) from this relationship; to show causation we must conduct an EXPERIMENT.
Regression line (a.k.a. line of best fit): predicts the change in $y$ (Ticket Sales) when $x$ (# of followers on the film's official Instagram account) increases or decreases.
Types of variables
Qualitative variable (a.k.a. Categorical variable) vs Quantitative variable

Qualitative (categorical), e.g.:
- cities in a country
- gender (Male / Female)
- level of education
- marital status
- breed of a dog
- eye colour

Quantitative, e.g.:
- weight (kg)
- GPA (points)
- # of students in class
- time (s)
- speed (mph)
- age of an individual (years)
Scatter Plot
[Figure: scatter plot of GPA (y) against time spent studying (x, hr), with the line of best fit (REGRESSION LINE) drawn through the points.]
$\hat{y} = b_0 + b_1 x$, where $\hat{y}$ = predicted GPA, $b_0$ = y-intercept, $b_1$ = slope.
$b_1$ (slope) $> 0$: +ve linear relationship, e.g. $x$ = study time (hr).
$b_1$ (slope) $< 0$: -ve linear relationship, e.g. $x$ = time spent on Instagram (hr).
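Regression Line
$\hat{y} = b_0 + b_1 x$
- $\hat{y}$: predicted value of $y$ for any value of $x$
- $b_0$: y-intercept
- $b_1$: slope
$b_1 = r \dfrac{s_y}{s_x}$, where $r$ = correlation coefficient, $s_y$ = standard deviation of $y$, $s_x$ = standard deviation of $x$.
$b_0 = \bar{y} - b_1 \bar{x}$, where $\bar{y}$ = mean value of $y$ and $\bar{x}$ = mean value of $x$.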
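As a minimal sketch of the slope and intercept formulas, the helper below turns summary statistics into a regression line. The function names `regression_line` and `predict`, and the sample numbers, are illustrative assumptions and do not come from these notes.

```python
def regression_line(r, s_x, s_y, x_bar, y_bar):
    """Return (b0, b1) for y_hat = b0 + b1 * x from summary statistics."""
    b1 = r * s_y / s_x        # slope: b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar   # intercept: b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x):
    """Predicted value of y for a given x."""
    return b0 + b1 * x


# Hypothetical summary statistics, chosen only to keep the arithmetic easy.
b0, b1 = regression_line(r=0.5, s_x=2.0, s_y=4.0, x_bar=10.0, y_bar=20.0)
print(b0, b1)                 # intercept, slope
print(predict(b0, b1, 12.0))  # prediction at x = 12
```

Because $b_0 = \bar{y} - b_1\bar{x}$, the fitted line always passes through the point $(\bar{x}, \bar{y})$.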
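Ex 1
A researcher wants to predict a student's GPA from the amount of time they study each week.
Explanatory variable ($x$): study time (hr). Response variable ($y$): GPA.

Recall:
$\hat{y} = b_0 + b_1 x$, $b_1 = r \dfrac{s_y}{s_x}$, $b_0 = \bar{y} - b_1 \bar{x}$
$s_x = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$, $r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$

Student | Study time $x_i$ | GPA $y_i$ | $x_i - \bar{x}$ | $y_i - \bar{y}$ | product
A       | 1                | 2.0       | -4              | -1.0            | 4
B       | 2                | 1.5       | -3              | -1.5            | 4.5
C       | 3                | 2.5       | -2              | -0.5            | 1
D       | 5                | 3.5       | 0               | 0.5             | 0
E       | 6                | 3.0       | 1               | 0               | 0
F       | 8                | 4.0       | 3               | 1.0             | 3
G       | 10               | 4.5       | 5               | 1.5             | 7.5
Sum of products = 20

$n = 7$, $\bar{x} = 5$, $\bar{y} = 3$, $s_x = 3.27$, $s_y = 1.08$
$r = \dfrac{20}{(6)(3.27)(1.08)} \approx 0.944$: a +ve, strong linear relationship.
$r^2 \approx 0.89$
Without intermediate rounding, $b_1 = 20/64 = 0.3125$ and $b_0 = 3 - 0.3125(5) = 1.4375$ (the rounded values 0.311 and 1.445 are used in the interpretation).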
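The correlation calculation can be checked with a short sketch. It assumes the seven (study time, GPA) pairs are (1, 2.0), (2, 1.5), (3, 2.5), (5, 3.5), (6, 3.0), (8, 4.0), (10, 4.5), as read from the notes; the helper names are illustrative.

```python
from math import sqrt

study_time = [1, 2, 3, 5, 6, 8, 10]        # x_i, hours per week (as read from the notes)
gpa = [2.0, 1.5, 2.5, 3.5, 3.0, 4.0, 4.5]  # y_i


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = sum of products of deviations / ((n - 1) * s_x * s_y)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sum_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sum_products / ((len(xs) - 1) * sample_sd(xs) * sample_sd(ys))


r = correlation(study_time, gpa)
print(round(sample_sd(study_time), 2), round(sample_sd(gpa), 2))
print(round(r, 3), round(r * r, 3))
```

Carrying full precision gives a value of $r$ very slightly different from the hand calculation, which uses the rounded $s_x$ and $s_y$.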
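Regression line: $\widehat{\text{GPA}} = 1.445 + 0.311\,(\text{study time})$
Slope: as study time increases by 1 hr, we predict a student's GPA to increase by 0.311.
y-intercept: if a student doesn't study at all, the predicted (minimum) GPA is 1.445. Here the y-int has a meaningful interpretation.
Q: What is the predicted GPA of a student who studies for 6.5 hrs a week?
$\hat{y} = 1.445 + 0.311(6.5) \approx 3.47$
$R^2$: about 89% of the variation in GPA is accounted for by its regression on study time.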
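A minimal least-squares sketch for this example, assuming the same seven (study time, GPA) pairs read from the notes. It computes the slope as the sum of products over the sum of squared x-deviations, which is algebraically the same as $r\,s_y/s_x$. The unrounded results (0.3125 and 1.4375) differ slightly from the rounded 0.311 and 1.445 used in the notes, but both give a prediction of about 3.47 at 6.5 hours.

```python
study_time = [1, 2, 3, 5, 6, 8, 10]        # x_i, hours per week (as read from the notes)
gpa = [2.0, 1.5, 2.5, 3.5, 3.0, 4.0, 4.5]  # y_i

n = len(study_time)
x_bar = sum(study_time) / n
y_bar = sum(gpa) / n

# Sum of products of deviations, and sum of squared x-deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_time, gpa))
s_xx = sum((x - x_bar) ** 2 for x in study_time)

b1 = s_xy / s_xx         # slope, equal to r * s_y / s_x
b0 = y_bar - b1 * x_bar  # intercept


def predict_gpa(hours):
    """Predicted GPA for a given weekly study time."""
    return b0 + b1 * hours


print(b1, b0)
print(round(predict_gpa(6.5), 2))
```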
Ex 2
$b_1 = -0.897$
Negative slope: as CentralPressure falls, MaxWindSpeed increases by approx. 0.897 knots for every 1 millibar drop in central pressure.
$b_0 = 955.27$: Not meaningful. A Central Pressure of 0 mb is a vacuum, which is not possible on Earth.
$r$ vs $r^2$
$r$ is the correlation coefficient; $r^2$ is $r$ squared.
The closer $r^2$ is to 1, the better the prediction.
Consider
[Figure: three scatter plots, each with a regression line, marking an actual value and its predicted value.]
- $r^2 = 0.90$: actual & predicted values are close.
- $r^2 = 0.07$: actual & predicted values are far away.
- $r^2 = 1$: actual & predicted values are the same.
Residual: how far off the predicted value is from the actual value.
Residual $= y_i - \hat{y}$ = (actual value of $y$) - (predicted value of $y$)
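Consider
[Figure: Scatter Plot of GPA (y) against Study Time (x) with the regression line. The vertical gap between each actual value and the line is its residual: a point above the line (actual 3.5, predicted 3.0) has residual 0.5, and a point below the line has residual -0.56, i.e. negative.]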
Note: data points have various spread around the line.
$s_e = \sqrt{\dfrac{\sum e^2}{n - 2}}$
Standard deviation of the residuals
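To make the residual standard deviation concrete, here is a sketch that fits the line to the Ex 1 study-time data (assumed to be the seven pairs read from the notes) and then applies $s_e = \sqrt{\sum e^2/(n-2)}$. Variable names are illustrative.

```python
from math import sqrt

study_time = [1, 2, 3, 5, 6, 8, 10]        # x_i (as read from the notes)
gpa = [2.0, 1.5, 2.5, 3.5, 3.0, 4.0, 4.5]  # y_i

n = len(study_time)
x_bar = sum(study_time) / n
y_bar = sum(gpa) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(study_time, gpa))
      / sum((x - x_bar) ** 2 for x in study_time))
b0 = y_bar - b1 * x_bar

# Residual = actual value - predicted value, one per data point.
residuals = [y - (b0 + b1 * x) for x, y in zip(study_time, gpa)]

# Standard deviation of the residuals: divide by n - 2, not n - 1.
s_e = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print([round(e, 4) for e in residuals])
print(round(s_e, 4))
```

The residuals of a least-squares line sum to zero, which is a quick sanity check on the fit.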
Histogram of residuals
[Figure: histogram of Residuals.]
Any residual beyond $2s$ is unusual.
Ex 2 cont'd
$\widehat{\text{MWS}} = 955.27 - 0.897(920) = 130.03$ knots
Residual $= y - \hat{y} = 110 - 130.03 = -20.03$ knots
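The prediction and residual above can be reproduced with a few lines. The coefficients come from the notes; the function names are illustrative.

```python
B0 = 955.27   # intercept (knots)
B1 = -0.897   # slope (knots per millibar of central pressure)


def predict_max_wind_speed(central_pressure_mb):
    """Predicted MaxWindSpeed (knots) from CentralPressure (mb)."""
    return B0 + B1 * central_pressure_mb


def residual(actual, predicted):
    """Residual = actual value - predicted value."""
    return actual - predicted


predicted = predict_max_wind_speed(920)
print(round(predicted, 2))                 # 130.03
print(round(residual(110, predicted), 2))  # -20.03
```

A negative residual means the model over-predicted: the actual wind speed was lower than the line suggests.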
94.454 × 1000 = 94,454
The price increases by 94,454 dollars as the house size increases by 1 thousand square feet.
Units of $b_1$: thousands of dollars per thousand square feet (just as the GPA slope is in GPA points per hr of study time).
$\widehat{\text{Price}} = -3.117 + 94.454\,\text{Size}$
Intercept: Not meaningful. Even if it were positive, no one would buy a house with 0 ft².
Residual SD: $s = 53.79$ thousand dollars, i.e. 53,790 dollars.
$R^2 = 0.595$: 59.5% of the variation in house price is accounted for by the regression on house size.
Correlation coefficient: Positive ($r > 0$), because the slope (94.454) is positive.
$R^2$ would not change; $r$ (correlation coefficient) is unchanged.
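$b_1 = r \dfrac{s_y}{s_x}$ ($r$ = correlation coefficient; $s_y$, $s_x$ = standard deviations)
$b_0 = \bar{y} - b_1 \bar{x}$ ($\bar{y}$ = mean value of $y$, $\bar{x}$ = mean value of $x$)
The standard deviations and means are affected by a change of units; $r$ is not.
Unit conversion of ft² to m² involves $x$: $s_x$ will change, so $b_1$ (slope) also changes.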
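A small sketch of the unit-conversion point: rescaling $x$ changes the slope but leaves $r$ untouched. The house sizes and prices below are hypothetical values invented for illustration, not data from the notes, and `fit` is an illustrative helper.

```python
from math import sqrt

FT2_TO_M2 = 0.09290304  # 1 ft^2 in m^2 (exact, since 1 ft = 0.3048 m)


def fit(xs, ys):
    """Return (r, b0, b1) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    return s_xy / sqrt(s_xx * s_yy), y_bar - b1 * x_bar, b1


size_ft2 = [1000.0, 1500.0, 2000.0, 2500.0]  # hypothetical sizes
price_k = [100.0, 160.0, 190.0, 260.0]       # hypothetical prices (thousands of dollars)
size_m2 = [s * FT2_TO_M2 for s in size_ft2]

r_ft, b0_ft, b1_ft = fit(size_ft2, price_k)
r_m, b0_m, b1_m = fit(size_m2, price_k)

print(r_ft, r_m)    # same correlation in either unit
print(b1_ft, b1_m)  # slope per ft^2 vs slope per m^2
```

Only the slope picks up the conversion factor; the intercept is also unchanged here because converting ft² to m² is a pure rescaling with no shift.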
We shouldn't be surprised by any residual smaller than 2 SD.
The residual SD is 53,790 dollars. A residual of 100,000 dollars is less than 2 × 53,790 = 107,580 dollars.
We do not have the data to find a z-score, so use the residual SD as the SD: 1 SD = 53,790 dollars.
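The 2-SD check can be written as a one-line rule. The helper name `is_unusual` and its `threshold_sds` parameter are illustrative assumptions.

```python
def is_unusual(residual, residual_sd, threshold_sds=2):
    """Flag a residual whose size exceeds threshold_sds residual SDs."""
    return abs(residual) > threshold_sds * residual_sd


residual_sd = 53_790  # dollars, from the notes

print(2 * residual_sd)                   # 107580
print(is_unusual(100_000, residual_sd))  # False
```

Using `abs` treats large negative residuals (over-predictions) the same way as large positive ones.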
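Response variable: Price (thousands of dollars, 1K)
Explanatory variable: Size (ft²)
Units of the slope: thousands of dollars (1K) per ft²
Slope is +ve: larger homes should cost more.
$R^2 = 0.714$: 71.4% of the variation in price is accounted for by the regression on size.
a) Yes, no pattern.
b) No, there's a curved pattern.
c) Maybe not; the spread is changing.
Price: $R^2 = 71.4\%$, so $r = \sqrt{0.714} = 0.845$.
$r > 0$ (+ve) because larger homes cost more.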
$\hat{z}_{\text{price}} = r\, z_{\text{size}}$, with $r = 0.845$:
$z_{\text{size}}$ | $\hat{z}_{\text{price}}$
1 | 0.845
2 | 1.690
3 | 2.535
- Price should be 0.845 SD below the mean in price.
- Price should be 1.690 SD above the mean in price.
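In standardized form the regression line is just "multiply the z-score by $r$". The sketch below reproduces the 0.845, 1.690 and 2.535 values; `predicted_z_price` is an illustrative name.

```python
R = 0.845  # correlation between size and price, from the notes


def predicted_z_price(z_size, r=R):
    """Predicted z-score of price for a home z_size SDs from the mean size."""
    return r * z_size


for z_size in (-1, 1, 2, 3):
    print(z_size, round(predicted_z_price(z_size), 3))
```

Since $|r| < 1$, the predicted price is always fewer SDs from its mean than the size is from its mean.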
0.061 × 1000 = 61.00 dollars
Price increases by about 61.00 dollars for every additional sq ft.
b) Predicted price: 230,820 dollars
c
a) No. Your score is better than about 95% of people, assuming scores follow the Normal model.
$(100 - 68)/2 = 16$
b) Yes. His score is better than only 16% of people.
a) Do you think a linear model is appropriate here? Explain.
Probably. The residuals show some initially low points, but there's no clear curvature.
b) Explain the meaning of $R^2$ in this context.
92.4% of the variation in nicotine is accounted for by its regression on tar content.
$R^2 = 92.4\% = 0.924$
$r = \sqrt{R^2} = \sqrt{0.924} = 0.96$
Nicotine: 2 SD × 0.96 = 1.92 SD
a) $\widehat{\text{Nicotine}} = 0.154030 + 0.065052\,\text{Tar}$
b) $\widehat{\text{Nicotine}} = 0.154030 + 0.065052(4) = 0.414238 \approx 0.414$ mg
c) Nicotine content increases by 0.065 mg per additional milligram of tar.
Units: $b_0$ is in mg of nicotine; $b_1$ is in mg of nicotine per mg of tar.
d) We'd expect a cigarette with no tar to have 0.154 mg of nicotine (the intercept, 0.154030).
e) Residual = actual - predicted. Tar = 7 mg:
$\hat{y} = 0.154030 + 0.065052(7) = 0.609394$
$-0.5 = y_i - 0.609394$
$y_i = -0.5 + 0.609394 = 0.109394$ mg
$\approx 0.109$ mg (to 3 s.f.), i.e. the actual nicotine content.
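The last part runs the residual formula backwards: given a residual and a prediction, recover the actual value. A minimal sketch using the coefficients from the notes (function names are illustrative):

```python
B0 = 0.154030  # intercept (mg of nicotine)
B1 = 0.065052  # slope (mg of nicotine per mg of tar)


def predict_nicotine(tar_mg):
    """Predicted nicotine content (mg) for a given tar content (mg)."""
    return B0 + B1 * tar_mg


def actual_from_residual(residual, predicted):
    """Residual = actual - predicted, so actual = residual + predicted."""
    return residual + predicted


predicted = predict_nicotine(7)
actual = actual_from_residual(-0.5, predicted)
print(round(predicted, 6))
print(round(actual, 6))
```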
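HCI
a) It's appropriate. The relationship is straight enough, with a few outliers.
b) $\hat{y} = b_0 + b_1 x$, with $b_1 = r\dfrac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1\bar{x}$
$b_1 = 0.65 \times \dfrac{116.55}{7072.47} \approx 0.0107$
$b_0 = 338.2 - 0.0107(46234) = -156.5038 \approx -156.50$
$\widehat{\text{HCI}} = -156.50 + 0.0107\,\text{MFI}$
$\widehat{\text{HCI}} = -156.50 + 0.0107(44993) = 324.9251 \approx 324.93$
$\hat{z}_{\text{HCI}} = 0.65\, z_{\text{MFI}}$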
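The same calculation as a sketch that follows the rounding used in the notes (slope to 4 d.p., intercept to 2 d.p.). It assumes 338.2 and 116.55 are the mean and SD of the response, and 46234 and 7072.47 are the mean and SD of the explanatory variable, as the hand calculation suggests.

```python
r = 0.65
s_y, s_x = 116.55, 7072.47  # SDs of response and explanatory variable (assumed roles)
y_bar, x_bar = 338.2, 46234  # means of response and explanatory variable (assumed roles)

b1 = round(r * s_y / s_x, 4)  # slope, rounded as in the notes
b0 = y_bar - b1 * x_bar       # intercept from the rounded slope
b0_rounded = round(b0, 2)

# Prediction at x = 44993 using the rounded coefficients.
prediction = b0_rounded + b1 * 44993

print(b1, round(b0, 4), round(prediction, 2))
```

Rounding the slope before computing the intercept shifts the later digits, which is worth remembering when comparing against an answer key.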
a) $\hat{y} = b_0 + b_1 x$, with $b_1 = r\dfrac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1\bar{x}$
$r = 0.037$, $b_1 = 1.1026$
$b_0 = 572.52 - 1.1026(29.67) = 539.805$
$\widehat{\text{TYP}} = 539.805 + 1.1026\,\text{Age}$
b) Yes. Both variables are quantitative.
c) $\widehat{\text{TYP}} = 539.805 + 1.1026(50) = 594.935 \approx 594.94$
d) $R^2 = 0.037^2 = 1.369 \times 10^{-3} = 0.001369$
$0.001369 \times 100 = 0.1369\% \approx 0.14\%$
About 0.14% of the variability in TYP is accounted for by the regression model on age.
e) No. The plot is nearly flat. The model explains almost none of the variation in TYP ($R^2 \approx 0.0014$).
a
Fairly straight, positive, and moderately strong. Possibly some
d
g both
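$b_1 = r\dfrac{s_y}{s_x} = 0.685 \times \dfrac{96.1}{99.5} = 0.66159 \approx 0.662$
$b_0 = \bar{y} - b_1\bar{x} = 612.2 - 0.66159(596.3) \approx 217.7$
$\widehat{\text{Math}} = 217.7 + 0.662\,\text{Verbal}$
e) Every point of verbal score adds 0.662 points to the predicted math score.
f) $\widehat{\text{Math}} = 217.7 + 0.662(500) = 548.7$ points
g) Residual $= y - \hat{y}$: $\widehat{\text{Math}} = 217.7 + 0.662(800) = 747.3$, so residual $= 800 - 747.3 = 52.7$ points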
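A sketch of this Math-on-Verbal fit from summary statistics. It assumes $r = 0.685$, SDs of 96.1 (Math) and 99.5 (Verbal), and means of 612.2 (Math) and 596.3 (Verbal), as used in the hand calculation; the predictions use the rounded line from the notes.

```python
r = 0.685
s_math, s_verbal = 96.1, 99.5        # standard deviations (assumed roles)
mean_math, mean_verbal = 612.2, 596.3

# Math is the response (y), Verbal the explanatory variable (x).
b1 = r * s_math / s_verbal
b0 = mean_math - b1 * mean_verbal


def predict_math(verbal, intercept=217.7, slope=0.662):
    """Predicted Math score using the rounded coefficients from the notes."""
    return intercept + slope * verbal


print(round(b1, 5), round(b0, 1))
print(round(predict_math(500), 1))        # predicted Math for Verbal = 500
print(round(800 - predict_math(800), 1))  # residual for actual Math = 800 at Verbal = 800
```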
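a) Swapping Math and Verbal: $r = 0.685$ is unchanged.
b) $\widehat{\text{Verbal}} = b_0 + b_1\,\text{Math}$
$b_1 = r\dfrac{s_y}{s_x} = 0.685 \times \dfrac{99.5}{96.1} = 0.70923 \approx 0.709$
$b_0 = \bar{y} - b_1\bar{x} = 596.3 - 0.70923(612.2) \approx 162.1$
$\widehat{\text{Verbal}} = 162.1 + 0.709\,\text{Math}$
Residual $= y_i - \hat{y} > 0$: the actual verbal score is HIGHER than the predicted verbal score.
$\widehat{\text{Verbal}} = 162.1 + 0.709(500) = 516.6$ pts
$\widehat{\text{Math}} = 217.7 + 0.662(516.6) = 559.689 \approx 559.7$ pts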
f) Regression to the mean: someone whose math score is below average is predicted to have a verbal score below average, but not as far below (in SDs). So if we use a predicted verbal score to predict the math score, the result is different from the actual math score.
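Regression to the mean shows up clearly when the two fitted lines are chained. The sketch below uses the rounded coefficients from the notes (162.1, 0.709 for Verbal on Math; 217.7, 0.662 for Math on Verbal); the function names are illustrative.

```python
def predict_verbal(math_score):
    """Predicted Verbal score from Math (Verbal-on-Math line)."""
    return 162.1 + 0.709 * math_score


def predict_math(verbal_score):
    """Predicted Math score from Verbal (Math-on-Verbal line)."""
    return 217.7 + 0.662 * verbal_score


math_score = 500                           # a below-average Math score
verbal_hat = predict_verbal(math_score)    # predicted Verbal
math_hat = predict_math(verbal_hat)        # Math predicted back from that Verbal

print(round(verbal_hat, 1), round(math_hat, 1))
print(math_hat != math_score)  # the round trip does not return the original score
```

The two lines are not inverses of each other: each prediction is pulled toward its own mean by a factor of $r$, so going there and back moves the score closer to the Math average.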