Machine Learning and Pattern Recognition: Bayesian Complexity Control

Bayesian model choice

Fully Bayesian procedures can’t suffer from “overfitting” exactly, because parameters aren’t
fitted: Bayesian statistics only involves integrating or summing over uncertain parameters, not
optimizing. The predictions can depend heavily on the model, and choice of prior however.
Previously we chose models — and parameters that controlled the complexity of models — by
cross-validation. The Bayesian framework offers an alternative, using marginal likelihoods.

1 The problems we have instead of overfitting


In the Bayesian framework, we should use any knowledge we have. So if we knew (for some
reason) that some points were observations of a 9th order polynomial, we would ideally use
that model regardless of how much data we had. For example, given only 5 data points, we
can’t know where the underlying function is. However, predictions use a weighted average
over all plausible fits: our predictive distribution would be broad and uncertain, and centred
around a sensible regularized interpolant (much like a conventionally fitted, regularized function).
That’s not to say we have no problems when using Bayesian methods.
If the model is too simple, then the posterior distribution over its weights becomes sharply
peaked around the least bad fit. This is because the likelihood is low for every setting of the
weights, but it is very much lower for weights far from the least bad fit, because those make
the observations even more extremely unlikely. The posterior is normalized, so the region
around the least bad fit, despite its low likelihood, is pushed up to high posterior density.
In the following plot, we have a case where the model is too simple:

[Figure: 12 straight lines sampled from the posterior, plotted over clearly curved data on roughly x ∈ [−3, 3]; the lines lie almost exactly on top of each other.]

Here, we assumed a line model. But the given observations clearly follow a bent curve,
so the line model is too simple. While it looks like there is just one line, we’ve drawn 12
lines from the posterior almost on top of each other, indicating that the posterior is sharply
peaked.
A sharply peaked posterior represents a belief with low uncertainty. As a result, we can be
very confident about properties of a model, even if running some checks (such as looking at
residuals) would show that the model is obviously in strong disagreement with the data.
Strongly correlated residuals are one indication that this is happening.
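As a concrete illustration, here is a minimal sketch along these lines (not the code behind the plot above; the quadratic data, settings, and variable names are our own choices). It fits a straight-line model to curved data, draws posterior samples, and checks the residuals:

import numpy as np

rng = np.random.default_rng(0)

# Curved data that a straight line cannot explain:
N = 50
x = np.linspace(-3, 3, N)
y = x**2 + 0.1 * rng.standard_normal(N)

# Bayesian linear regression, f(x) = w0 + w1*x, with prior w ~ N(0, sigma_w^2 I):
sigma_w, sigma_y = 10.0, 0.1
Phi = np.stack([np.ones(N), x], axis=1)
V = np.linalg.inv(Phi.T @ Phi / sigma_y**2 + np.eye(2) / sigma_w**2)  # posterior covariance
m = V @ Phi.T @ y / sigma_y**2                                        # posterior mean

# 12 weight vectors sampled from the posterior are nearly identical:
samples = rng.multivariate_normal(m, V, size=12)
print(samples.std(axis=0))                  # tiny spread: sharply peaked posterior

# The residuals of the posterior mean fit are strongly structured:
residuals = y - Phi @ m
print(np.corrcoef(residuals, x**2)[0, 1])   # near 1: model misspecified

The posterior is tight even though every line explains the data badly, because lines near the least bad one explain it very much less badly.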
We can also have problems when we use a model that’s too complicated. As an extreme
example for illustration, we could imagine fitting a function on x ∈ [0, 1] with a million
RBFs spaced evenly over that range with bandwidths ∼ 10⁻⁶. We can closely represent any
reasonable function with this representation. However, given (say) 20 observations, most of
the basis functions will be many bandwidths away from all of the observations. Thus, the
posterior distribution over most of the coefficients will be similar to the prior (check: can you
see why?). Except at locations that are nearly on top of the observed data, our predictions
will be nearly the same as under the prior. We will learn slowly with this model.
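A scaled-down numerical check of this claim, as a sketch (our own code, with 1,000 rather than a million RBFs; the RBF convention and all settings here are illustrative choices):

import numpy as np

rng = np.random.default_rng(1)

# 20 observations on [0, 1], and many very narrow RBFs:
N, K, h = 20, 1000, 1e-3            # bandwidth h is tiny compared to the data spacing
x = rng.uniform(0, 1, N)
y = np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(N)
centres = np.linspace(0, 1, K)
Phi = np.exp(-(x[:, None] - centres[None, :])**2 / h**2)   # N x K design matrix

sigma_w, sigma_y = 1.0, 0.05
V = np.linalg.inv(Phi.T @ Phi / sigma_y**2 + np.eye(K) / sigma_w**2)  # posterior covariance

# For coefficients whose basis functions are far from all the data, the
# posterior variance stays at the prior variance sigma_w^2: we learned nothing.
posterior_var = np.diag(V)
print(np.mean(np.isclose(posterior_var, sigma_w**2)))   # large fraction of the coefficients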
[The website version of this note has a question here.]

2 Simple cases of probabilistic model choice


We have already used a simple form of probabilistic model comparison in Bayes classifiers.
Given two fixed models p(x | y = 1) and p(x | y = 0), we could evaluate a feature vector under
each and use Bayes’ rule to express our beliefs, P(y | x), about which model the features
came from.
With Gaussian class models, there are different ways that a model can win a comparison.
If a model has a tight distribution, then it will usually be the most probable model when
observations are close to its mean, even if those observations could also have come from a
broad distribution centred nearby. On the other hand, broad distributions become the most
probable model for extreme feature vectors or outliers. Finally, while the prior probabilities
of the class models have some effect, the likelihoods of the models often dominate.
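For instance, here is a small sketch of this effect (the means, variances, and test points below are made up for illustration):

import numpy as np
from scipy.stats import norm

# Two class-conditional models for a scalar feature, with equal prior probabilities:
narrow = norm(loc=0.0, scale=0.5)   # tight distribution
broad = norm(loc=0.5, scale=5.0)    # broad distribution centred nearby

for x in [0.1, 8.0]:                # a typical value, then an outlier
    like_narrow, like_broad = narrow.pdf(x), broad.pdf(x)
    post_narrow = like_narrow / (like_narrow + like_broad)   # Bayes' rule, equal priors
    print(f"x = {x}: P(narrow | x) = {post_narrow:.3f}")

# x = 0.1 favours the narrow model; the outlier x = 8.0 favours the broad one.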
In the card game video, we talked about a nice example of comparing broad and narrow
models: dice with different numbers of sides. If we told you that we chose a dice at random
from a million-sided dice and a 10-sided dice and got a 5, you’d be pretty sure we’d rolled
the 10-sided dice. That’s partly because of priors: you don’t think we could possibly own
a million-sided dice. But even if you thought that million-sided dice were just as common,
you’d have the same view. For example, we could implement this game on a computer and
show you the Python code:
from random import random
from math import floor

sides = [10, 1e6]                    # a 10-sided dice and a million-sided dice
dice = int(0.5 < random())           # pick one of the two dice with probability 1/2 each
outcome = floor(random() * sides[dice]) + 1   # roll it: uniform on 1, ..., sides[dice]
print(outcome)
Every time you see this code output a number between 1 and 10, you'll assume that it came
from dice=0, the 10-sided dice. While it's rare to get a small outcome like 5 with a million-sided dice, it's no
rarer than any other outcome, such as 63,823. However, the small outcomes are more easily
generated under the alternative narrow model, so it wins the comparison.
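We can make that intuition quantitative with Bayes' rule. A small worked check, assuming equal prior probability for the two dice:

# Posterior over the dice after seeing outcome 5, with equal priors:
likelihoods = [1 / 10, 1 / 1e6]   # P(outcome = 5 | 10-sided), P(outcome = 5 | million-sided)
posterior = [lk / sum(likelihoods) for lk in likelihoods]
print(posterior)                  # ~[0.99999, 0.00001]: almost certainly the 10-sided dice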
[The website version of this note has a question here.]

3 Application to regression models


Why do we usually favour a simple fit over the model with a million narrow basis functions
described above? It could be because of priors: we might favour simple models. But actually
the world is complicated, and our prior beliefs are that many functions have many degrees
of freedom. What would happen if we gave half of our prior mass to the model with a
million narrow basis functions? It would still usually lose a model comparison.
A regression model has to distribute its prior mass over all of the possible regression surfaces
that it can represent, or over all of its parameters. If a model can represent many different
regression surfaces, only some of these will match the data. The probability that a model M
assigns to some observations in a training set is:
p(y | X, M) = ∫ p(y, w | X, M) dw = ∫ p(y | X, w, M) p(w | M) dw,        (1)

where the parameters w are assumed unknown. Narrowly focussed models, where the
mass of the prior distribution p(w | M) is concentrated on simple curves, will assign higher
density to the outputs y observed in many natural datasets than the million narrow basis
function model. The narrow basis function model can model smooth functions, but can
also fit highly oscillating data — it’s a broader model that can explain outliers, but that will
usually lose to simpler models for well-behaved data.
The probability of the data under the model, given above, is the model’s marginal likelihood,
and can be used to score different models, instead of a cross-validation score. For Gaussian
models, and some other models (usually with conjugate priors, as discussed in the previous
note), we can compute the integral. Later in the course we also discuss how to approximate
marginal likelihoods, where we can’t solve the integral in closed form.
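For instance, for linear regression with prior w ~ N(0, σw² I) and Gaussian noise of variance σy², integral (1) comes out as p(y | X, M) = N(y; 0, σw² ΦΦᵀ + σy² I), where Φ is the design matrix. A minimal sketch of a model comparison using this result (the feature sets and all settings are our own illustrative choices):

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
x = np.linspace(-2, 2, 30)
y = 1.5 * x - 0.5 + 0.3 * rng.standard_normal(30)    # genuinely linear data

def log_marginal_likelihood(Phi, y, sigma_w=1.0, sigma_y=0.3):
    # log p(y | X, M) = log N(y; 0, sigma_w^2 Phi Phi^T + sigma_y^2 I)
    K = sigma_w**2 * Phi @ Phi.T + sigma_y**2 * np.eye(len(y))
    return multivariate_normal(np.zeros(len(y)), K).logpdf(y)

Phi_line = np.stack([np.ones_like(x), x], axis=1)        # simple model
Phi_poly = np.stack([x**k for k in range(10)], axis=1)   # 9th-order polynomial model

print(log_marginal_likelihood(Phi_line, y))   # the simpler model should win here
print(log_marginal_likelihood(Phi_poly, y))

The broader polynomial model spreads its prior mass over many more possible datasets, so it assigns lower density to this well-behaved one.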
[The website version of this note has a question here.]

4 Application to hyperparameters
The main challenge with model-selection is often setting real-valued parameters like the
noise level, the typical spread of the weights (their prior standard deviation), and the widths
of some radial basis functions. These values are harder to cross-validate than simple discrete
choices, and if we have too many of these parameters, we can’t cross-validate them all.
Incidentally, in the million narrow RBFs example, the main problem wasn’t that there
were a million RBFs, it was that they were narrow. Linear regression will make reasonable
predictions with many RBFs and only a few datapoints if the bandwidth parameter is broad
and we regularize. So we usually don’t worry about picking a precise number of basis
functions.1
The full Bayesian approach to prediction was described in the previous note. We integrate
over all parameters we don’t know. In a fully Bayesian approach, that integral would
include noise levels, the standard deviation of the weights in the prior, and the widths
of basis functions, because the best setting of each of these values is unknown. However,
computing integrals over all of these quantities can be difficult (we will return to methods
for approximating such difficult integrals later in the course).
A simpler approach is to fit a few parameters by maximizing their marginal likelihood:
the likelihood with the remaining parameters integrated out. For example, in a linear regression
model with prior
p(w | σw) = N(w; 0, σw² I),        (2)
and likelihood
p(y | x, w, σy) = N(y; wᵀx, σy²),        (3)
we can fit the hyperparameters σw and σy , parameters which specify the model, to maximize
their marginal likelihood:
p(y | X, σw, σy) = ∫ p(y, w | X, σw, σy) dw = ∫ p(y | X, w, σy) p(w | σw) dw.        (4)

No held-out validation set is required.


Fitting a small number of parameters (σw and σy ) to the marginal likelihood is less prone
to overfitting than fitting everything (σw , σy , and w) to the likelihood p(y | X, w, σw , σy ).
However, overfitting is still possible.
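A minimal sketch of fitting σw and σy by grid search over the closed-form marginal likelihood (the data and grid values are made up for illustration; footnote 2 at the end of the note suggests this approach):

import numpy as np
from itertools import product
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
X = np.linspace(-2, 2, 40)[:, None]
y = 0.8 * X[:, 0] + 0.1 * rng.standard_normal(40)   # true noise level is 0.1

def log_ml(X, y, sigma_w, sigma_y):
    # log p(y | X, sigma_w, sigma_y) = log N(y; 0, sigma_w^2 X X^T + sigma_y^2 I)
    K = sigma_w**2 * X @ X.T + sigma_y**2 * np.eye(len(y))
    return multivariate_normal(np.zeros(len(y)), K).logpdf(y)

grid = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0]
best = max(product(grid, grid), key=lambda sw_sy: log_ml(X, y, *sw_sy))
print(best)   # (sigma_w, sigma_y) with the highest marginal likelihood; no validation set needed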

5 Check your understanding


Can you work out how to optimize the marginal likelihood p(y | X, σw , σy ) for a linear
regression model? Look back at the initial note on Bayesian regression for results that could
be useful. In that note we were assuming that the hyperparameters σw and σy were known
and fixed. In the notation of that note, the marginal likelihood was simply p(y | X ), because
we didn’t bother to condition every expression on the hyperparameters. More guidance in
footnote 2 below.

6 Further Reading
Bishop Section 3.4 and Murphy Section 7.6.4 are on Bayesian model selection. For keen
students, earlier sections of Murphy give mathematical detail for a Bayesian treatment of the
noise variance.
For keen students: Chapter 28 of MacKay’s book has a lengthier discussion of Bayesian
model comparison. Some time ago, Iain wrote a note discussing one of the figures in that
chapter.
For very keen students: It can be difficult to put sensible priors on models with many
parameters. In these situations it can sometimes be better to start out with a model class
that we know is too simple, and only swap to a complex model when we have a lot of data.
Bayesian model comparison can fail to tell us the best time to switch to a more complex
model. The paper Catching up faster by switching sooner (van Erven et al., 2012) has a nice
language modelling example, and Iain’s thoughts are in the discussion of the paper.
For very keen students, Gelman et al.’s Bayesian Data Analysis book is a good starting point
for reading about model checking and criticism. All models are wrong, but we want to
improve parts of a model that are most strongly in disagreement with the data.

1. When we cover Gaussian processes, we will have an infinite number of basis functions and still be able to make
sensible predictions!
2. Bayes’ rule tells us that p(w | D) = p(w) p(y | w, X)/p(y | X). The Bayesian regression note identified all of
the distributions in this equation except p(y | X), so we can simply rearrange it to write p(y | X) as a ratio of
three Gaussian densities. The identity is true for any w, so we can use any w (e.g., w = 0, or w = w_N)
and we will get the same answer. We could optimize the hyperparameters by grid search.
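A numerical sanity check of this identity, as a sketch (made-up data; w_N and V_N are the posterior mean and covariance from the Bayesian regression note):

import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(4)
sigma_w, sigma_y = 1.0, 0.2
X = rng.standard_normal((15, 3))
y = X @ np.array([0.5, -1.0, 0.3]) + sigma_y * rng.standard_normal(15)

# Posterior from the Bayesian regression note:
V_N = np.linalg.inv(X.T @ X / sigma_y**2 + np.eye(3) / sigma_w**2)
w_N = V_N @ X.T @ y / sigma_y**2

# log p(y | X) = log p(w) + log p(y | w, X) - log p(w | D), evaluated at w = 0:
w = np.zeros(3)
log_ml = (mvn(np.zeros(3), sigma_w**2 * np.eye(3)).logpdf(w)       # log p(w)
          + mvn(X @ w, sigma_y**2 * np.eye(15)).logpdf(y)          # log p(y | w, X)
          - mvn(w_N, V_N).logpdf(w))                               # log p(w | D)

# Direct computation for comparison:
direct = mvn(np.zeros(15), sigma_w**2 * X @ X.T + sigma_y**2 * np.eye(15)).logpdf(y)
print(np.allclose(log_ml, direct))   # True: both give the same marginal likelihood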

MLPR:w5a Iain Murray and Arno Onken, http://www.inf.ed.ac.uk/teaching/courses/mlpr/2020/