Machine Learning Course - CS-433

Underfitting and Overfitting

Sept 24, 2024

Martin Jaggi
Last updated on: September 24, 2024
credits to Mohammad Emtiyaz Khan & Rüdiger Urbanke
Motivation
Models can be too limited or they can be too rich. In the first case we cannot find a function in our model family that is a good fit for the data; we then say that we underfit. In the second case we have such a rich model family that we do not just fit the underlying function but in fact fit the noise in the data as well; we then talk about an overfit. Both of these phenomena are undesirable. This discussion is made more difficult by the fact that all we have is data, so we do not know a priori what part is the underlying signal and what part is noise.

Underfitting with Linear Models


It is easy to see that linear models might underfit. Consider
a scalar case as shown in the figure below.

[Figure: data and fit for M = 0; horizontal axis x, vertical axis t.]

The solid curve is the underlying function and the circles are the actual data. E.g., we assume that there is a scalar function g(x), but we do not observe g(x_n) directly, only a noisy version of it, y_n = g(x_n) + Z_n, where Z_n is the noise. The noise might be due, for example, to some measurement inaccuracies. The y_n are shown as blue circles. If our model family consists of only linear functions of the scalar input x, i.e., H = {f_w(x) = wx}, where w is a scalar constant (the slope of the function), then it is clear that we cannot match the given function accurately, regardless of how many samples we get and how small the noise is. We therefore will underfit.
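
For concreteness, here is a minimal Python/NumPy sketch of this situation; the true function g(x) = sin(2πx), the noise level 0.1, and the sample sizes are assumptions made only for this illustration. It fits the best slope w by least squares and shows that the training error stays large no matter how many samples we draw.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    # assumed underlying function, chosen to match the figures
    return np.sin(2 * np.pi * x)

for n in (10, 1000, 100000):
    x = rng.uniform(0, 1, size=n)
    y = g(x) + 0.1 * rng.standard_normal(n)   # noisy observations y_n = g(x_n) + Z_n
    w = (x @ y) / (x @ x)                      # least-squares slope for f_w(x) = w * x
    mse = np.mean((w * x - y) ** 2)
    print(f"N = {n:6d}: w = {w:+.3f}, training MSE = {mse:.3f}")
```

The mean squared error does not shrink as N grows: the model family {wx} simply cannot represent the underlying function.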

Extended/Augmented Feature Vectors

From the above example it might seem that linear models are too simple to ever overfit. But in fact, linear models are highly prone to overfitting, much more so than complicated models like neural nets.
Since linear models are inherently not very rich, the following is a standard “trick” to make them more powerful. In order to increase the representational power of linear models we typically “augment” the input. E.g., if the input (feature) is one-dimensional we might add a polynomial basis (of arbitrary degree M),

ϕ(x_n) := [1, x_n, x_n^2, x_n^3, . . . , x_n^M]

so that we end up with an extended feature vector.


We then fit a linear model to this extended feature vector ϕ(x_n):

y_n ≈ w_0 + w_1 x_n + w_2 x_n^2 + . . . + w_M x_n^M =: ϕ(x_n)^T w.
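
As an illustration, a minimal NumPy sketch of this augmentation followed by an ordinary least-squares fit might look as follows; the degree M, the sample size, the true function, and the noise level are assumptions chosen only for the example.

```python
import numpy as np

def poly_features(x, M):
    # phi(x_n) = [1, x_n, x_n^2, ..., x_n^M] for every sample x_n
    return np.vander(x, M + 1, increasing=True)

rng = np.random.default_rng(1)
N, M = 10, 3                                   # assumed sample size and degree
x = rng.uniform(0, 1, size=N)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)

Phi = poly_features(x, M)                      # shape (N, M + 1)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)    # least-squares weights w_0, ..., w_M
y_hat = Phi @ w                                # predictions phi(x_n)^T w
print("weights:", np.round(w, 3))
```

Note that the model is still linear in the weights w; only the features are nonlinear in x, which is why ordinary least squares still applies.
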
Overfitting with Linear Models
In the following four figures, circles are data points, the green
line represents the “true function”, and the red line is the
model. The parameter M is the maximum degree in the
polynomial basis.

[Figure: four panels showing the data points (circles), the true function (green), and the fitted model (red) for M = 0, M = 1, M = 3, and M = 9; horizontal axis x, vertical axis t.]

For M = 0 (the model is a constant) the model is underfitting, and the same is true for M = 1. For M = 3 the model fits the data fairly well and is not yet so rich as to also fit the small “wiggles” caused by the noise. But for M = 9 we have such a rich model that it can fit every single data point, and we see severe overfitting taking place.

What can we do to avoid overfitting? If you increase the amount of data (increase N, but keep M fixed), overfitting might be reduced. This is shown in the following two figures, where we again consider the same model complexity M = 9 but have extra data (N = 15 or even N = 100).

[Figure: two panels showing the M = 9 fit with N = 15 and N = 100 data points; horizontal axis x, vertical axis t.]
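
The following small sketch illustrates this effect numerically; the true function sin(2πx), the noise level, and the sample sizes are again assumptions made for the example. It fits the same degree-9 model to data sets of different sizes and measures the error of the fit against the noiseless function on a dense grid.

```python
import numpy as np

rng = np.random.default_rng(2)
g = lambda x: np.sin(2 * np.pi * x)            # assumed underlying function
x_grid = np.linspace(0, 1, 500)                # dense grid to measure the true error
M = 9                                          # model complexity is kept fixed

for N in (10, 15, 100):
    x = rng.uniform(0, 1, size=N)
    y = g(x) + 0.1 * rng.standard_normal(N)
    Phi = np.vander(x, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    fit = np.vander(x_grid, M + 1, increasing=True) @ w
    err = np.mean((fit - g(x_grid)) ** 2)      # error w.r.t. the noiseless function
    print(f"N = {N:4d}: error on grid = {err:.3f}")
```

With N close to M + 1 the fit passes through every point and the error away from the data can be huge, while with N = 100 the same degree-9 model stays close to the underlying function.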

A Word About Notation


If it is important to distinguish the original input x from
the augmented input then we will use ϕ(x) to denote this
augmented input vector. But we can consider this augmen-
tation as part of the pre-processing, and then we might sim-
ply write x to denote the input. This will save us a lot of
notation.

Additional Materials
Read about overfitting in the paper by Pedro Domingos (Sections 3 and 5
of “A few useful things to know about machine learning”).
