
4/3/18

Regression:
Predicting House Prices
STAT/CSE 416: Intro to Machine Learning
Emily Fox
University of Washington
April 3, 2018
©2018 Emily Fox STAT/CSE 416: Intro to Machine Learning

[ML pipeline: Training Data → Feature extraction → x → ML model → ŷ;
 Quality metric compares ŷ to y; ML algorithm outputs f̂]

Inputs vs. features

"Hi ML expert, here is a data table to analyze"


[ML pipeline: Training Data → Feature extraction → h(x) → ML model → ŷ;
 Quality metric compares ŷ to y; ML algorithm outputs ŵ]

Generic linear regression model


Model:
yi = w0 h0(xi) + w1 h1(xi) + … + wD hD(xi) + εi
   = Σ_{j=0}^{D} wj hj(xi) + εi

feature 1   = h0(x) … e.g., 1 (constant)
feature 2   = h1(x) … e.g., x[1] = sq. ft.
feature 3   = h2(x) … e.g., x[2] = #bath
              or, log(x[7]) x x[2] = log(#bed) x #bath
…
feature D+1 = hD(x) … some other function of x[1],…, x[d]
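As a minimal sketch (not part of the slides), evaluating such a model in Python might look like this; the feature functions and coefficients below are made up for illustration:

```python
# Hypothetical feature functions h_j(x); x is a dict of raw inputs.
features = [
    lambda x: 1.0,          # h0: constant feature
    lambda x: x["sqft"],    # h1: square feet
    lambda x: x["bath"],    # h2: number of bathrooms
]

def predict(w, x):
    """Compute sum_j w_j * h_j(x) for one input x."""
    return sum(wj * hj(x) for wj, hj in zip(w, features))

w = [10000.0, 150.0, 5000.0]   # made-up coefficients
print(predict(w, {"sqft": 2000, "bath": 2}))   # 10000 + 150*2000 + 5000*2 = 320000.0
```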

Simple linear regression model


Fit a line through the data:
yi = w0 + w1 xi + εi
f(x) = w0 + w1 x, with parameters w0, w1 of the model
[plot: price ($) vs. square feet (sq.ft.)]

Simple linear regression


Model:
yi = w0 + w1 xi + εi

Input: x (sq. ft.)   Output: y (price)

feature 1 = 1 (constant)   parameter 1 = w0
feature 2 = x              parameter 2 = w1
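A sketch (not from the slides) of fitting w0 and w1 by the closed-form least-squares solution, on made-up data:

```python
# Closed-form least-squares fit of y = w0 + w1*x (made-up data, prices in $1000s).
xs = [1000.0, 1500.0, 2000.0, 2500.0]
ys = [200.0, 290.0, 410.0, 500.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
# Slope: covariance of x and y over variance of x.
w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
w0 = y_bar - w1 * x_bar   # intercept passes through the means
print(w0, w1)             # -7.0, 0.204
```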


Even higher order polynomial


f(x) = w0 + w1 x + w2 x² + … + wp x^p
[plot: price ($) vs. square feet (sq.ft.)]



Polynomial regression
Model:
yi = w0 + w1 xi + w2 xi² + … + wp xi^p + εi

Input: x (sq. ft.)   Output: y (price)

feature 1   = 1 (constant)   parameter 1   = w0
feature 2   = x              parameter 2   = w1
feature 3   = x²             parameter 3   = w2
…                            …
feature p+1 = x^p            parameter p+1 = wp
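Polynomial regression is still linear regression on the powers of x. A sketch using NumPy's `polyfit` on made-up data with a known quadratic trend:

```python
import numpy as np

# Polynomial regression as least squares on powers of x (made-up data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x**2 + 1.0                     # exactly quadratic, so the fit recovers it

w = np.polyfit(x, y, deg=2)        # coefficients, highest power first
print(np.round(w, 6))              # recovers [1, 0, 1] for x^2 + 0*x + 1
```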

Capturing trends and seasonality


Model:
yi = w0 + w1 ti + w2 sin(2πti / 12) + w3 cos(2πti / 12) + εi
w1 ti: linear trend
Seasonal component: sin/cos with period 12 (resets annually)

Input: t (month)   Output: y

feature 1 = 1 (constant)
feature 2 = t
feature 3 = sin(2πt / 12)
feature 4 = cos(2πt / 12)
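This model is again ordinary linear regression, just on a design matrix built from the trend and seasonal features. A sketch on a made-up monthly series with no noise, so the fit recovers the true coefficients:

```python
import numpy as np

# Trend + seasonality via least squares on [1, t, sin(2*pi*t/12), cos(2*pi*t/12)].
t = np.arange(48, dtype=float)                        # 4 years of monthly data
y = 100 + 2.0 * t + 10 * np.sin(2 * np.pi * t / 12)   # made-up series

H = np.column_stack([np.ones_like(t), t,
                     np.sin(2 * np.pi * t / 12),
                     np.cos(2 * np.pi * t / 12)])
w, *_ = np.linalg.lstsq(H, y, rcond=None)
print(np.round(w, 6))    # recovers [100, 2, 10, 0]
```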

Adding more inputs


f(x) = w0 + w1 sq.ft. + w2 #bath
[3D plot: price ($) vs. square feet x[1] and # bathrooms x[2]]

Example of linear regression with multiple inputs


Model:
yi = w0 + w1 xi[1] + w2 xi[2] + εi

Input: x = (x[1], x[2])   Output: y (price)

feature 1 = 1 (constant)   parameter 1 = w0
feature 2 = x[1]           parameter 2 = w1
feature 3 = x[2]           parameter 3 = w2


Generic linear regression model


Model:
yi = w0 h0(xi) + w1 h1(xi) + … + wD hD(xi) + εi
   = Σ_{j=0}^{D} wj hj(xi) + εi

feature 1   = h0(x) … e.g., 1 (constant)
feature 2   = h1(x) … e.g., x[1] = sq. ft.
feature 3   = h2(x) … e.g., x[2] = #bath
              or, log(x[7]) x x[2] = log(#bed) x #bath
…
feature D+1 = hD(x) … some other function of x[1],…, x[d]

[ML pipeline, as before: Training Data → Feature extraction → h(x) → ML model → ŷ;
 Quality metric compares ŷ to y; ML algorithm outputs ŵ]

RSS for multiple regression

RSS(w) = Σi (yi − fw(xi))²
[3D plot: price ($) vs. square feet x[1] and # bathrooms x[2]]



[ML pipeline, as before: Training Data → Feature extraction → h(x) → ML model → ŷ;
 Quality metric compares ŷ to y; ML algorithm outputs ŵ]

Gradient descent
Algorithm:

while not converged:
    w(t+1) ← w(t) − η ∇RSS(w(t))
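A sketch (not from the slides) of this update for linear regression, with a made-up design matrix and a fixed step size η; the gradient of RSS(w) = ‖y − Hw‖² is −2Hᵀ(y − Hw):

```python
import numpy as np

# Gradient descent on RSS for y = Hw (made-up data, fixed step size eta).
H = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # [1, x] features
y = np.array([3.0, 5.0, 7.0])                        # exactly y = 1 + 2x

w = np.zeros(2)
eta = 0.01
for _ in range(20000):
    grad = -2 * H.T @ (y - H @ w)    # gradient of RSS(w)
    w = w - eta * grad               # descent step
print(np.round(w, 4))                # approaches [1, 2]
```

With a quadratic objective like RSS, a small enough η guarantees convergence to the least-squares solution.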


[ML pipeline, as before: Training Data → Feature extraction → h(x) → ML model → ŷ;
 Quality metric compares ŷ to y; ML algorithm outputs ŵ]

Compact notation
f(xi) = w0 h0(xi) + w1 h1(xi) + … + wD hD(xi) = Σ_{j=0}^{D} wj hj(xi)


Interpreting the fitted function


Interpreting the coefficients – simple linear regression

ŷ = ŵ0 + ŵ1 x
ŵ1 = predicted change in $ per 1 sq. ft.
[plot: price ($) vs. square feet (sq.ft.)]



Interpreting the coefficients – two linear features

ŷ = ŵ0 + ŵ1 x[1] + ŵ2 x[2]   (fix x[2] to interpret ŵ1)
[3D plot: price ($) vs. square feet x[1] and # bathrooms x[2]]

Interpreting the coefficients – two linear features

ŷ = ŵ0 + ŵ1 x[1] + ŵ2 x[2]   (fix x[1])
ŵ2 = predicted change in $ per 1 bathroom, for fixed # sq.ft.
[plot: price ($) vs. # bathrooms x[2]]

Interpreting the coefficients – multiple linear features

ŷ = ŵ0 + ŵ1 x[1] + … + ŵj x[j] + … + ŵd x[d]
(fix all features other than x[j] to interpret ŵj)
[3D plot: price ($) vs. square feet x[1] and # bathrooms x[2]]

Interpreting the coefficients – polynomial regression

ŷ = ŵ0 + ŵ1 x + … + ŵj x^j + … + ŵp x^p
Can't hold the other features fixed: they are all functions of the same x!
[plot: price ($) vs. square feet (sq.ft.)]

BONUS: Influence of high leverage points


SWITCH TO IPYNB


BONUS: Asymmetric errors


Symmetric cost functions


Residual sum of squares (RSS):
RSS(w0, w1) = Σi (yi − [w0 + w1 xi])²

Assumes the cost of over-estimating the sales price is the same as under-estimating it.
[plot: price ($) vs. square feet (sq.ft.)]

Asymmetric cost functions


What if listing the house too high carries a bigger cost?
Too high → no offers ($ = 0)
Too low → offers for lower $
An asymmetric cost leads to a different solution.
[plot: price ($) vs. square feet (sq.ft.)]
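One way such an asymmetric cost could be sketched (the slides don't give a specific formula; the weighting below is a made-up illustration):

```python
# Sketch of an asymmetric loss: overpredicting (listing too high) costs more.
def asymmetric_loss(y, y_hat, over_weight=2.0):
    err = y - y_hat
    if err >= 0:                       # underprediction: listed too low
        return err ** 2
    return over_weight * err ** 2      # overprediction penalized more heavily

print(asymmetric_loss(300.0, 280.0))   # under by 20 -> 400.0
print(asymmetric_loss(300.0, 320.0))   # over by 20  -> 800.0
```

Minimizing this loss instead of RSS would shift the fitted line downward, since high predictions are penalized more.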

Summary for regression


What you can do now…


• Describe the input (features) and output (real-valued predictions) of a
regression model
• Calculate a goodness-of-fit metric (e.g., RSS)
• Understand how gradient descent is used to estimate model parameters
by minimizing RSS
• Exploit the estimated model to form predictions
• Describe a regression model using multiple features
• Interpret coefficients in a regression model with multiple features
• Describe other applications where regression is useful


Assessing Performance
STAT/CSE 416: Intro to Machine Learning
Emily Fox
University of Washington
April 3, 2018

Make predictions, get $, right??


Model + algorithm → fitted function f̂
Predictions → decisions → outcome


Or, how much am I losing?


Example: lost $ due to inaccurate listing price
- Too low → low offers
- Too high → few lookers + no/low offers

How much am I losing compared to perfection?

Perfect predictions: Loss = 0


My predictions: Loss = ???


Measuring loss
Loss function: cost of using ŵ at x when y is the true value

L(y, fŵ(x))
  y = actual value
  fŵ(x) = predicted value ŷ

Examples: (assuming loss for underpredicting = overpredicting)


Absolute error: L(y, fŵ(x)) = |y − fŵ(x)|
Squared error:  L(y, fŵ(x)) = (y − fŵ(x))²
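The two example losses are one-liners; a sketch in Python:

```python
# The two example loss functions from the slide.
def absolute_error(y, y_hat):
    return abs(y - y_hat)

def squared_error(y, y_hat):
    return (y - y_hat) ** 2

print(absolute_error(10.0, 7.0))   # 3.0
print(squared_error(10.0, 7.0))    # 9.0
```

Squared error punishes large mistakes much more heavily than absolute error, which is why it is more sensitive to outliers.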

Fit data with a line or … ?


[plot: price ($) vs. square feet (sq.ft.)]
"Dude, it's not a linear relationship!"

What about a quadratic function?


[plot: price ($) vs. square feet (sq.ft.)]
"Dude, it's not a linear relationship!"

Even higher order polynomial


[plot: price ($) vs. square feet (sq.ft.)]
"I can minimize your RSS"

Do you believe this fit?


[plot: price ($) vs. square feet (sq.ft.)]
"My house isn't worth so little"



Do you believe this fit?


[plot: price ($) vs. square feet (sq.ft.)]
Minimizes RSS, but bad predictions



"Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." – George Box, 1987


Assessing the loss


Assessing the loss


Part 1: Training error


Define training data


[plot: training data, price ($) vs. square feet (sq.ft.)]





Example:
Fit quadratic to minimize RSS
[plot: price ($) vs. square feet (sq.ft.)]
ŵ minimizes RSS of training data

Compute training error


1. Define a loss function L(y,fŵ(x))
- E.g., squared error, absolute error,…

2. Training error
   = avg. loss on houses in training set
   = (1/N) Σ_{i=1}^{N} L(yi, fŵ(xi))
   (ŵ fit using training data)
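Training error is just the average loss over the training set; a minimal sketch with made-up values:

```python
# Training error = average loss over the training set (a sketch).
def training_error(ys, y_hats, loss):
    return sum(loss(y, y_hat) for y, y_hat in zip(ys, y_hats)) / len(ys)

squared = lambda y, y_hat: (y - y_hat) ** 2
print(training_error([3.0, 5.0, 7.0], [2.0, 5.0, 9.0], squared))  # (1+0+4)/3
```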


Example:
Use squared error loss (y-fŵ(x))2
Training error(ŵ) = 1/N ×
  [($train 1 − fŵ(sq.ft.train 1))²
 + ($train 2 − fŵ(sq.ft.train 2))²
 + ($train 3 − fŵ(sq.ft.train 3))²
 + … include all training houses]
[plot: price ($) vs. square feet (sq.ft.)]

Example:
Use squared error loss (y-fŵ(x))2
Training error(ŵ) = (1/N) Σ_{i=1}^{N} (yi − fŵ(xi))²

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (yi − fŵ(xi))² )

[plot: price ($) vs. square feet (sq.ft.)]
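RMSE is the square root of the mean squared error, which puts the error back in the units of y (e.g., dollars). A sketch with made-up values:

```python
import math

# Root-mean-squared error (a sketch).
def rmse(ys, y_hats):
    mse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats)) / len(ys)
    return math.sqrt(mse)

print(rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))   # sqrt(5/3) ~ 1.291
```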



Training error vs. model complexity

[plot: training error decreases as model complexity increases]


Training error vs. model complexity

[plots: training error vs. model complexity; low-complexity fit (left) vs. high-complexity fit (right)]

Is training error a good measure of predictive performance?

How do we expect to perform on a new house?
[plot: price ($) vs. square feet (sq.ft.)]


Is training error a good measure of predictive performance?

Is there something particularly bad about having xt sq.ft.??
[plot: price ($) vs. square feet (sq.ft.), with xt marked]

Is training error a good measure of predictive performance?

Issue: training error is overly optimistic… ŵ was fit to the training data.

Small training error ≠> good predictions, unless the training data includes everything you might ever see.
[plot: price ($) vs. square feet (sq.ft.), with xt marked]


Assessing the loss


Part 2: Generalization (true) error


Generalization error

Really want an estimate of loss over all possible (house, $) pairs.

Lots of houses in the neighborhood, but not in the dataset.


Distribution over houses


In our neighborhood, houses of what # sq.ft. (x) are we likely to see?

[distribution over square feet (sq.ft.)]

Distribution over sales prices


For houses with a given # sq.ft. (x), what house prices $ are we likely to see?

[distribution over price ($), for fixed # sq.ft.]


Generalization error definition


Really want an estimate of loss over all possible (x, y) pairs.

Formally:
generalization error = Ex,y[L(y, fŵ(x))]
  = average over all possible (x, y) pairs, weighted by how likely each is
  (ŵ fit using training data)
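The expectation Ex,y[L(y, fŵ(x))] can't be computed in practice because the true distribution is unknown, but if we *pretend* to know it, it can be approximated by Monte Carlo sampling. A sketch with a made-up population and a made-up fitted predictor:

```python
import random

# Monte Carlo estimate of generalization error E_{x,y}[L(y, f(x))]
# under an assumed (made-up) data distribution.
random.seed(0)

def f_hat(x):            # some fitted predictor (made up)
    return 2.0 * x

def sample_pair():       # assumed population: y = 2x + Gaussian noise
    x = random.uniform(0.0, 10.0)
    y = 2.0 * x + random.gauss(0.0, 1.0)
    return x, y

n = 100_000
est = sum((y - f_hat(x)) ** 2 for x, y in (sample_pair() for _ in range(n))) / n
print(round(est, 2))     # close to the noise variance, 1.0
```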


Generalization error vs. model complexity

[plot: generalization error first decreases, then increases, as model complexity grows]



Generalization error vs. model complexity

Can't compute generalization error!
[plots: error vs. model complexity; low-complexity fit (left) vs. high-complexity fit (right)]
