Approximate Inference: Sargur Srihari (srihari@cedar.buffalo.edu)

This document discusses approximate inference methods for probabilistic models in machine learning. It introduces the need for approximate inference when exact posterior distributions and expectations are intractable due to high dimensionality or complex integrals and sums. It describes two main types of approximations: stochastic approximations using Markov chain Monte Carlo methods, and deterministic approximations using variational inference. Variational inference finds a tractable distribution that approximates the true posterior by maximizing an evidence lower bound derived using the Kullback-Leibler divergence. The document provides examples of variational inference for univariate Gaussians and for Gaussian mixture models.

Machine Learning

Srihari

Approximate Inference
Sargur Srihari, srihari@cedar.buffalo.edu

Central Tasks of Probabilistic Models


1. Evaluation of posterior distribution p(Z|X)
latent variables Z, observed data X
In classification: Z = class, X = features. In regression: Z = w (parameter vector), X = t (target vector)

2. Evaluation of expectations with respect to p(Z|X)


In EM: evaluate the expectation of the complete-data log-likelihood with respect to the posterior distribution of the latent variables

Need for Approximate Inference


It is often infeasible to evaluate posterior distributions, or expectations with respect to them
Reasons: high dimensionality of the latent space; complex and intractable expectations

For continuous variables


The required integrations have no closed-form solutions
The dimensionality of the space and the complexity of the integrand prohibit numerical integration

For discrete variables


Summation in marginalization: exponential number of states

Types of Approximations
1. Stochastic
Markov chain Monte Carlo
Have allowed the use of Bayesian methods across many domains

Computationally demanding
Given enough computation, they can generate exact results

2. Deterministic
Variational Inference (or Variational Bayes): based on analytical approximations to the posterior
e.g., a particular factorization, or a specific parametric form such as a Gaussian

Scale well to large applications
Can never generate exact results



Variational Inference
Based on Calculus of Variations
Invented by Euler
Standard Calculus concerns derivatives of functions
Function takes variable as input and returns value of function

Functional is a mapping with function as input


Returns value of functional as output

An example of a functional is the entropy H[p] = -∫ p(x) ln p(x) dx
The quantity maximized in variational inference is a functional
Functional derivative: how does the value of the functional change in response to small changes in the input function

Leonhard Euler, Swiss mathematician (1707-1783)
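To make the idea of a functional concrete, here is a minimal Python sketch (added here, not part of the slides) that evaluates the entropy functional H[p] = -∫ p(x) ln p(x) dx: the input is a function (a Gaussian density with an assumed σ = 1.5) and the output is a single number, checked against the known closed form for a Gaussian.

```python
# A minimal sketch: evaluating the entropy functional numerically for a Gaussian
# and comparing with the closed form 0.5 * ln(2 * pi * e * sigma^2).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sigma = 1.5
p = norm(loc=0.0, scale=sigma).pdf

# The functional maps the whole function p(x) to one number H[p].
H_numeric, _ = quad(lambda x: -p(x) * np.log(p(x)), -12 * sigma, 12 * sigma)
H_closed = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
print(H_numeric, H_closed)  # the two values agree closely
```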

Function versus Functional


Function takes a value of a variable as input and returns the function value as output
Derivative describes how output varies as we make infinitesimal changes in input value

Functional takes a function as input and returns the functional value as output
Derivative of functional describes how value of functional changes with infinitesimal changes in input function

Recall that Gaussian processes dealt with distributions over functions. Now we deal with how to find a function that maximizes a functional, e.g., entropy

Inference Problem
Observed Variables
X = {x_1, ..., x_N}: N i.i.d. data points

Latent Variables and Parameters


Z = {z_1, ..., z_N}

Joint distribution p(X,Z) is specified


E.g., full set of tables or pdfs given

Goal is to find approximation for posterior distribution p(Z|X)


Which is given by p(Z|X) = p(X,Z)/p(X)
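As a minimal illustration (not from the original deck), the sketch below assumes a tiny made-up joint table p(X, Z) over binary variables and computes the posterior via p(Z|X) = p(X,Z)/p(X). With many latent variables such tables grow exponentially, which is exactly the intractability discussed earlier.

```python
# A minimal sketch: exact posterior from a fully specified joint table.
import numpy as np

# joint[z, x] = p(Z = z, X = x) for binary Z and X (values made up for illustration)
joint = np.array([[0.30, 0.10],
                  [0.20, 0.40]])

x_observed = 1
p_x = joint[:, x_observed].sum()            # p(X = x) by marginalizing over Z
posterior = joint[:, x_observed] / p_x      # p(Z | X = x)
print(posterior)                            # sums to 1 over the states of Z
```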

Approach of Variational Methods


Find a tractable distribution q to approximate p. Use two ideas:
1. KL Divergence
Which is zero when p =q
KL(p || q) = -∫ p(x) ln{ q(x)/p(x) } dx

(Figure: ln p(X) decomposed into the lower bound L(q) and KL(q||p))

2. For any choice of q(Z)

ln p(X) can be decomposed into two terms


The first is a functional L(q) with arguments q(Z) and X; the second is KL(q||p), where p is the posterior p(Z|X)

We wish to maximize L(q)


Which has the effect of minimizing KL
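A small example (added here, not from the slides): computing KL(p||q) for two univariate Gaussians, once by numerical integration of the definition above and once from the standard closed form, and confirming it vanishes when p = q. The particular means and variances are arbitrary.

```python
# A minimal sketch: KL divergence between two univariate Gaussians,
# numerically and in closed form; it is zero exactly when p = q.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def kl_numeric(p, q, lo=-20.0, hi=20.0):
    val, _ = quad(lambda x: p.pdf(x) * (p.logpdf(x) - q.logpdf(x)), lo, hi)
    return val

def kl_gauss(m1, s1, m2, s2):
    # closed form for KL( N(m1, s1^2) || N(m2, s2^2) )
    return np.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5

p = norm(0.0, 1.0)
q = norm(1.0, 2.0)
print(kl_numeric(p, q), kl_gauss(0.0, 1.0, 1.0, 2.0))  # agree, and > 0
print(kl_numeric(p, p))                                 # ~0 when p = q
```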

Decomposition of Log Marginal Probability


ln p(X) = L(q) + KL(q || p)
where the functional L(q) = ∫ q(Z) ln{ p(X,Z)/q(Z) } dZ
and the Kullback-Leibler divergence KL(q || p) = -∫ q(Z) ln{ p(Z|X)/q(Z) } dZ
Also applicable to discrete distributions, by replacing integrations with summations

Some Observations:

L(q) is a lower bound on ln p(X)
Maximizing the lower bound L(q) with respect to the distribution q(Z) is equivalent to minimizing the KL divergence
When the KL divergence vanishes, q(Z) equals the posterior p(Z|X)

Plan:
We seek the distribution q(Z) for which L(q) is largest
Since the true posterior is intractable, we consider a restricted family of distributions q(Z)
We then seek the member of this family for which the KL divergence is minimized
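The decomposition can be checked numerically. The sketch below (an illustration added here, not part of the slides) uses a small made-up discrete joint table, so the integrals become sums, and verifies that ln p(X) = L(q) + KL(q||p) for an arbitrary q(Z).

```python
# A minimal sketch: verifying ln p(X) = L(q) + KL(q || p(Z|X)) on a discrete model.
import numpy as np

# joint[z, x] = p(Z = z, X = x) over 3 latent states and 2 observed states
joint = np.array([[0.10, 0.15],
                  [0.25, 0.05],
                  [0.20, 0.25]])

x = 0                                   # the observed value
p_x = joint[:, x].sum()                 # evidence p(X = x)
post = joint[:, x] / p_x                # exact posterior p(Z | X = x)

q = np.array([0.5, 0.2, 0.3])           # any valid distribution over Z

L = np.sum(q * np.log(joint[:, x] / q))  # lower bound  L(q) = sum_Z q ln(p(X,Z)/q)
KL = np.sum(q * np.log(q / post))        # KL(q || p(Z|X))

print(np.log(p_x), L + KL)              # identical: ln p(X) = L(q) + KL(q||p)
print(L <= np.log(p_x))                 # True: L(q) is a lower bound
```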

Variational Approximation Example


Use a parametric distribution q(Z|ω) governed by a set of parameters ω. The lower bound L(q) is then a function of ω, and standard nonlinear optimization can be used to determine the optimal values of the parameters.

(Figure: a non-Gaussian distribution approximated by a Laplace approximation and by a variational Gaussian optimized with respect to its mean and variance; the right panel shows the negative logarithms of the distributions.)
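A minimal sketch of this idea (not from the slides): the variational family is q(z | m, s) = N(z | m, s²), the lower bound is evaluated with Gauss-Hermite quadrature, and a standard nonlinear optimizer adjusts (m, s). The unnormalized target used here, a Gaussian multiplied by a logistic sigmoid, is an assumption chosen only for illustration.

```python
# A minimal sketch: maximize the lower bound L(q) over the parameters of a
# Gaussian variational distribution by nonlinear optimization.
import numpy as np
from scipy.optimize import minimize

def log_p_tilde(z):
    # ln of an unnormalized target: Gaussian times sigmoid, computed stably
    return -0.5 * z**2 - np.logaddexp(0.0, -(20.0 * z + 4.0))

nodes, weights = np.polynomial.hermite.hermgauss(60)   # Gauss-Hermite rule

def neg_lower_bound(params):
    m, log_s = params
    s = np.exp(log_s)
    z = m + np.sqrt(2.0) * s * nodes                    # change of variables
    e_log_p = np.sum(weights * log_p_tilde(z)) / np.sqrt(np.pi)  # E_q[ln p~(z)]
    entropy = 0.5 * np.log(2.0 * np.pi * np.e * s**2)   # H[q] for a Gaussian
    return -(e_log_p + entropy)                         # maximize L(q)

result = minimize(neg_lower_bound, x0=np.array([0.0, 0.0]))
m_opt, s_opt = result.x[0], np.exp(result.x[1])
print(m_opt, s_opt)   # mean and std of the optimized variational Gaussian
```

Because the target is unnormalized, the maximized bound differs from ln p(X) only by a constant, so the optimal (m, s) are unaffected.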


Factorized Approximations
Restricting the family of distributions: partition the elements of Z into M disjoint groups
q(Z) = ∏_{i=1}^{M} q_i(Z_i)

Among all distributions q(Z) having this form, we seek the one for which the lower bound L(q) is largest
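For reference, the standard mean-field result that this maximization leads to (stated in the variational inference literature, e.g. Bishop's PRML Section 10.1.1; it is not shown on the slide itself) is that the optimal j-th factor satisfies:

```latex
\ln q_j^{*}(\mathbf{Z}_j) \;=\; \mathbb{E}_{i \neq j}\!\left[\, \ln p(\mathbf{X}, \mathbf{Z}) \,\right] \;+\; \text{const}
```

That is, the log of the optimal factor is the expectation of the log joint taken with respect to all the other factors, which gives rise to the iterative update schemes used in the examples later in the deck.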

Two alternative forms of KL Divergence


Green: contours of a correlated Gaussian distribution p(z) at 1, 2 and 3 standard deviations
Red: contours of q(z) over the same variables, given by the product of two independent univariate Gaussians

Minimization based on KL Divergence KL(q||p)

Minimization based on Reverse KL Divergence KL(p||q)
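A numerical companion to this figure (added here, not from the slides): for a strongly correlated 2-D Gaussian p and a factorized Gaussian q with the same mean, the sketch compares the two known minimizers, marginal-variance matching for KL(p||q) and precision-diagonal matching for KL(q||p), and evaluates both divergences in closed form. The covariance values are assumptions for illustration.

```python
# A minimal sketch: the two KL directions give different factorized Gaussians.
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    # KL( N(mu0, S0) || N(mu1, S1) ) for multivariate Gaussians
    d = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff
                  - d + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

mu = np.zeros(2)
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])                              # strongly correlated p

q_forward = np.diag(np.diag(Sigma))                         # minimizes KL(p||q): marginal variances
q_reverse = np.diag(1.0 / np.diag(np.linalg.inv(Sigma)))    # minimizes KL(q||p): diagonal of the precision

print(np.diag(q_forward), np.diag(q_reverse))               # reverse-KL variances are much smaller

# Each approximation is better under the criterion it minimizes:
print(kl_gauss(mu, Sigma, mu, q_forward), kl_gauss(mu, Sigma, mu, q_reverse))  # KL(p||q)
print(kl_gauss(mu, q_forward, mu, Sigma), kl_gauss(mu, q_reverse, mu, Sigma))  # KL(q||p)
```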



Two alternative forms of KL divergence


Approximating a multimodal distribution by a unimodal one (panels a, b, c)
Blue contours: the multimodal distribution p(Z)
Red contours in (a): a single Gaussian q(Z) that best approximates p(Z)
in (b): a single Gaussian q(Z) that best approximates p(Z) in the sense of minimizing KL(p||q)
in (c): a different local minimum of the KL divergence

Alpha Family of Divergences


The two forms of divergence are members of the alpha family of divergences
D_α(p || q) = (4 / (1 - α²)) ( 1 - ∫ p(x)^{(1+α)/2} q(x)^{(1-α)/2} dx )
where -∞ < α < ∞
KL(p||q) corresponds to the limit α → 1
KL(q||p) corresponds to the limit α → -1
For all α, D_α(p||q) ≥ 0, with equality iff p(x) = q(x)
When α = 0 we get a symmetric divergence that is linearly related to the Hellinger distance (a valid distance measure)
D_H(p || q) = ∫ ( p(x)^{1/2} - q(x)^{1/2} )² dx
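A small numerical check of these statements (added here, not from the slides) for two made-up discrete distributions: the alpha-divergence approaches KL(p||q) as α → 1 and KL(q||p) as α → -1, and at α = 0 it equals twice the squared Hellinger distance defined above.

```python
# A minimal sketch: the alpha family of divergences for discrete distributions.
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

def alpha_div(p, q, alpha):
    return (4.0 / (1.0 - alpha**2)) * (1.0 - np.sum(p**((1 + alpha) / 2) * q**((1 - alpha) / 2)))

kl_pq = np.sum(p * np.log(p / q))
kl_qp = np.sum(q * np.log(q / p))
hellinger = np.sum((np.sqrt(p) - np.sqrt(q))**2)

print(alpha_div(p, q, 0.999), kl_pq)     # alpha near +1 approximates KL(p||q)
print(alpha_div(p, q, -0.999), kl_qp)    # alpha near -1 approximates KL(q||p)
print(alpha_div(p, q, 0.0), 2 * hellinger)  # D_0 = 2 * D_H (linear relation)
```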


Variational Inference of Univariate Gaussian


Variational inference of the mean and precision of a Gaussian
Green: contours of the true posterior
The iterative scheme converges to the red contours
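A minimal sketch of such an iterative scheme (added here, not the original figure's code): a factorized q(μ, τ) = q(μ) q(τ) for a Gaussian with unknown mean μ and precision τ, alternating the standard Gaussian and Gamma updates. The synthetic data and the prior hyperparameters (μ0, λ0, a0, b0) are assumptions chosen for illustration.

```python
# A minimal sketch: mean-field variational inference for a univariate Gaussian
# with a conjugate Gaussian-Gamma prior, q(mu, tau) = q(mu) q(tau).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.5, size=100)     # synthetic data
N, x_bar = len(X), X.mean()

# prior: p(mu | tau) = N(mu | mu0, (lambda0 * tau)^-1),  p(tau) = Gamma(tau | a0, b0)
mu0, lambda0, a0, b0 = 0.0, 1.0, 1e-3, 1e-3

E_tau = 1.0                                      # initial guess for E[tau]
for _ in range(50):
    # q(mu) = N(mu | mu_N, 1 / lambda_N)
    mu_N = (lambda0 * mu0 + N * x_bar) / (lambda0 + N)
    lambda_N = (lambda0 + N) * E_tau
    E_mu, E_mu2 = mu_N, mu_N**2 + 1.0 / lambda_N

    # q(tau) = Gamma(tau | a_N, b_N)
    a_N = a0 + (N + 1) / 2.0
    b_N = b0 + 0.5 * (np.sum(X**2) - 2 * E_mu * np.sum(X) + N * E_mu2
                      + lambda0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    E_tau = a_N / b_N                            # feeds back into q(mu)

print(mu_N, 1.0 / np.sqrt(E_tau))                # close to the sample mean and std
```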


Variational Mixture of Gaussians


Demonstrates how the Bayesian treatment elegantly resolves maximum-likelihood issues
Conditional distribution of Z given the mixing coefficients π:
p(Z | π) = ∏_{n=1}^{N} ∏_{k=1}^{K} π_k^{z_nk}
Conditional distribution of the observed data:
p(X | Z, μ, Λ) = ∏_{n=1}^{N} ∏_{k=1}^{K} N(x_n | μ_k, Λ_k^{-1})^{z_nk}
Priors over the parameters:
Dirichlet distribution over π:
p(π) = Dir(π | α_0) = C(α_0) ∏_{k=1}^{K} π_k^{α_0 - 1}
Gaussian-Wishart over the means and precisions:
p(μ, Λ) = p(μ | Λ) p(Λ) = ∏_{k=1}^{K} N(μ_k | m_0, (β_0 Λ_k)^{-1}) W(Λ_k | W_0, ν_0)
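In practice a variational Gaussian mixture with priors of this form can be fitted with scikit-learn's BayesianGaussianMixture. The sketch below (added here, not from the slides; the two-cluster synthetic data set is an assumption) deliberately uses too many components so that the Dirichlet prior prunes the surplus ones, as the final slide describes.

```python
# A minimal sketch: variational Bayesian mixture of Gaussians via scikit-learn.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, size=(200, 2)),
               rng.normal([4, 4], 0.8, size=(200, 2))])

vb_gmm = BayesianGaussianMixture(
    n_components=6,                                   # deliberately too many
    weight_concentration_prior_type="dirichlet_distribution",
    weight_concentration_prior=1e-3,                  # small alpha_0 prunes components
    max_iter=500,
    random_state=0,
).fit(X)

# Most mixing coefficients collapse toward zero; only ~2 remain significant.
print(np.round(vb_gmm.weights_, 3))
print(vb_gmm.means_[vb_gmm.weights_ > 0.05])
```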


Variational Distribution
Joint distribution of all of the random variables:
p(X, Z, π, μ, Λ) = p(X | Z, μ, Λ) p(Z | π) p(π) p(μ | Λ) p(Λ)
(Figure: directed acyclic graph representing the mixture model, with nodes for the mixing coefficients π, the latent variables Z, the observed data X, and the means μ and precisions Λ.)


Variational Bayesian Mixture


The model is fitted with K = 6 components; after convergence there are effectively only two components
The density of red ink shows the mixing coefficient of each component

