Approximate Inference
Sargur Srihari
srihari@cedar.buffalo.edu
Machine Learning
Srihari
Types of Approximations
1. Stochastic
Markov chain Monte Carlo (MCMC)
Has enabled the use of Bayesian methods across many domains
Computationally demanding
Can generate exact results, given unlimited computational resources
2. Deterministic
Variational Inference (or Variational Bayes)
Based on analytical approximations to the posterior,
e.g., a particular factorization or a specific parametric form such as a Gaussian
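A minimal sketch of the stochastic style on a toy target. The example below (my own illustration, not from the slides) runs random-walk Metropolis on the 1-d density p(z) ∝ exp(−z²/2); a deterministic approximation would instead posit an analytical form for q and fit it, which for this Gaussian target is exact.

```python
# Illustrative only: stochastic approximation (MCMC) on a toy 1-d target.
import math
import random

def metropolis(log_p, n_samples=20000, step=1.0, seed=0):
    """Random-walk Metropolis: exact in the infinite-sample limit,
    but computationally demanding."""
    rng = random.Random(seed)
    z, samples = 0.0, []
    for _ in range(n_samples):
        z_new = z + rng.gauss(0.0, step)
        # Accept with probability min(1, p(z_new)/p(z))
        if math.log(rng.random() + 1e-300) < log_p(z_new) - log_p(z):
            z = z_new
        samples.append(z)
    return samples

log_p = lambda z: -0.5 * z * z          # unnormalized log density of N(0, 1)
samples = metropolis(log_p)
mcmc_mean = sum(samples) / len(samples)
mcmc_var = sum((s - mcmc_mean) ** 2 for s in samples) / len(samples)
# The sample mean and variance approach the exact values 0 and 1 as
# the number of samples grows.
```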
Variational Inference
Based on Calculus of Variations
Invented by Euler
Standard calculus concerns derivatives of functions:
a function takes a variable as input and returns the value of the function
A functional takes a function as input and returns a value; in variational inference the quantity maximized is a functional
Example of a functional: the entropy H[p] = −∫ p(x) ln p(x) dx
Functional derivative: how the functional changes in response to small changes in the input function
A functional takes a function as input and returns the value of the functional as output
The derivative of a functional describes how the value of the functional changes with infinitesimal changes in the input function
Recall that Gaussian Processes dealt with distributions over functions
Now we deal with how to find a function that maximizes a functional, e.g., entropy
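A small numerical sketch of a functional (illustration, not from the slides): the entropy H[p] = −∫ p(x) ln p(x) dx maps a whole density p to a single number. The code evaluates it for two densities with equal variance and shows the maximum-entropy property of the Gaussian.

```python
# The entropy functional, approximated by the midpoint rule.
import math

def entropy(p, lo, hi, n=100000):
    """Approximate H[p] = -∫ p(z) ln p(z) dz numerically on [lo, hi]."""
    dz = (hi - lo) / n
    h = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * dz
        pz = p(z)
        if pz > 0:
            h -= pz * math.log(pz) * dz
    return h

gauss = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # variance 1
a = math.sqrt(3.0)                                                 # uniform with variance 1
uniform = lambda z: 1.0 / (2 * a) if -a <= z <= a else 0.0

h_gauss = entropy(gauss, -8, 8)    # analytic value: 0.5*ln(2*pi*e) ≈ 1.4189
h_unif = entropy(uniform, -8, 8)   # analytic value: ln(2*sqrt(3)) ≈ 1.2425
# Among all densities with a fixed variance, the Gaussian maximizes entropy,
# so h_gauss > h_unif.
```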
Inference Problem
Observed variables: X = {x_1, ..., x_N}, N i.i.d. data points
Kullback-Leibler Divergence
Some Observations:
The lower bound on ln p(X) is
L(q) = ∫ q(Z) ln { p(X, Z) / q(Z) } dZ,  so that  ln p(X) = L(q) + KL(q || p)
Maximizing the lower bound L(q) with respect to the distribution q(Z) is equivalent to minimizing the KL divergence
When the KL divergence vanishes, q(Z) equals the posterior p(Z|X)
Plan:
We seek the distribution q(Z) for which L(q) is largest
Since the true posterior is intractable, we consider a restricted family for q(Z)
We seek the member of this family for which the KL divergence is minimized
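The decomposition ln p(X) = L(q) + KL(q || p(Z|X)) can be checked directly on a discrete toy model (the numbers below are hypothetical, chosen only for illustration):

```python
# Verify ln p(X) = L(q) + KL(q || p(Z|X)) for a discrete latent variable.
import math

# Joint p(X = x, Z = z) for a fixed observation x and Z ∈ {0, 1, 2}
p_joint = [0.1, 0.25, 0.15]                  # sums to p(x) = 0.5
p_x = sum(p_joint)
p_post = [p / p_x for p in p_joint]          # true posterior p(Z|X)

q = [0.2, 0.5, 0.3]                          # an arbitrary variational distribution

L_q = sum(qz * math.log(pj / qz) for qz, pj in zip(q, p_joint))   # lower bound L(q)
kl = sum(qz * math.log(qz / pz) for qz, pz in zip(q, p_post))     # KL(q || p)

# Since KL ≥ 0, L(q) ≤ ln p(X), with equality exactly when q equals the posterior.
```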
Negative Logarithms
Factorized Approximations
Restricting the family of distributions: partition the elements of Z into M disjoint groups
q(Z) = ∏_{i=1}^{M} q_i(Z_i)
Among all distributions q(Z) having this form, we seek the one for which the lower bound L(q) is largest
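A sketch of the factorized approximation on the standard 2-d Gaussian example: take q(z) = q_1(z_1) q_2(z_2) for a correlated target p(z) = N(z | μ, Λ⁻¹), where each optimal factor is Gaussian with precision Λ_jj and the means are found by coordinate ascent on L(q). The numbers are illustrative only.

```python
# Mean-field approximation of a correlated 2-d Gaussian (toy numbers).
mu = [1.0, 2.0]
Lam = [[2.0, 0.8],
       [0.8, 1.0]]            # precision matrix (symmetric positive definite)

# Coordinate-ascent updates for the factor means; each update is the
# closed-form maximizer of the lower bound with the other factor fixed.
m = [0.0, 0.0]
for _ in range(100):
    m[0] = mu[0] - (Lam[0][1] / Lam[0][0]) * (m[1] - mu[1])
    m[1] = mu[1] - (Lam[1][0] / Lam[1][1]) * (m[0] - mu[0])

var_q1 = 1.0 / Lam[0][0]      # mean-field variance of z_1
# True marginal variance of z_1 is (Lam^-1)[0][0], which is larger:
# the factorized q captures the mean exactly but underestimates the variance.
det = Lam[0][0] * Lam[1][1] - Lam[0][1] * Lam[1][0]
var_true = Lam[1][1] / det
```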
Figure: Blue contours show a multimodal distribution p(Z); red contours show the single Gaussian q(Z) that best approximates p(Z). In (a), q minimizes KL(p||q); in (b), q minimizes KL(q||p); in (c), as in (b) but showing a different local minimum of the KL divergence.
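The figure's contrast can be reproduced numerically (a sketch with toy numbers of my choosing): fit a single Gaussian q to a bimodal p under each direction of the KL divergence. KL(q||p) is "mode-seeking" because q is heavily penalized wherever p is near zero; KL(p||q) is "mode-covering" because q must place mass wherever p does.

```python
# Mode-seeking vs mode-covering, checked by numeric integration.
import math

def N(z, m, s):
    return math.exp(-0.5 * ((z - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

p = lambda z: 0.5 * N(z, -2, 0.5) + 0.5 * N(z, 2, 0.5)   # bimodal target

def kl(f, g, lo=-10, hi=10, n=20000):
    """Numeric KL(f||g) = ∫ f ln(f/g) dz by the midpoint rule."""
    dz = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * dz
        fz, gz = f(z), g(z)
        if fz > 1e-300:
            total += fz * math.log(fz / max(gz, 1e-300)) * dz
    return total

q_mode = lambda z: N(z, 2, 0.5)      # hugs one mode
q_cover = lambda z: N(z, 0, 2.06)    # moment-matched: mean 0, std ≈ sqrt(4.25)

kl_mode_p = kl(q_mode, p)            # small: q is zero where p is zero
kl_cover_p = kl(q_cover, p)          # large: q puts mass between the modes
kl_p_mode = kl(p, q_mode)            # large: q misses one of p's modes
kl_p_cover = kl(p, q_cover)          # small: q covers both modes
```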
The alpha family of divergences:
D_α(p||q) = (4 / (1 − α²)) ( 1 − ∫ p(x)^{(1+α)/2} q(x)^{(1−α)/2} dx ),  where −∞ < α < ∞
KL(p||q) corresponds to the limit α → 1
KL(q||p) corresponds to the limit α → −1
For all α, D_α(p||q) ≥ 0, with equality iff p(x) = q(x)
When α = 0 we get a symmetric divergence that is linearly related to the Hellinger distance (a valid distance measure):
D_H(p||q) = ∫ ( p(x)^{1/2} − q(x)^{1/2} )² dx
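These properties can be checked numerically for a pair of toy Gaussians (an illustration of mine, not from the slides); in particular, at α = 0 the divergence equals exactly twice the Hellinger distance as defined above.

```python
# Numeric check of alpha-divergence properties for two toy Gaussians.
import math

def N(z, m, s):
    return math.exp(-0.5 * ((z - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

p = lambda z: N(z, 0.0, 1.0)
q = lambda z: N(z, 1.0, 1.5)

def integrate(f, lo=-15, hi=15, n=30000):
    dz = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dz) for i in range(n)) * dz

def D_alpha(p, q, a):
    """D_alpha(p||q) = 4/(1-a^2) * (1 - ∫ p^((1+a)/2) q^((1-a)/2) dz), a ≠ ±1."""
    integral = integrate(lambda z: p(z) ** ((1 + a) / 2) * q(z) ** ((1 - a) / 2))
    return 4.0 / (1.0 - a * a) * (1.0 - integral)

d0 = D_alpha(p, q, 0.0)                                            # symmetric case
hellinger = integrate(lambda z: (math.sqrt(p(z)) - math.sqrt(q(z))) ** 2)
# Expanding the square shows D_0(p||q) = 2 * D_H(p||q), since ∫p = ∫q = 1.
```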
Bayesian Mixture of Gaussians
Likelihood: p(X | Z, μ, Λ) = ∏_{n=1}^{N} ∏_{k=1}^{K} N(x_n | μ_k, Λ_k^{-1})^{z_nk}
Prior over mixing coefficients: p(π) = Dir(π | α_0) = C(α_0) ∏_{k=1}^{K} π_k^{α_0 − 1}
Prior over means and precisions (Gaussian-Wishart):
p(μ, Λ) = p(μ | Λ) p(Λ) = ∏_{k=1}^{K} N(μ_k | m_0, (β_0 Λ_k)^{-1}) W(Λ_k | W_0, ν_0)
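A sketch of ancestral sampling from these conjugate priors in one dimension, where the Wishart prior on the precision reduces to a Gamma prior (the hyperparameter values α_0, m_0, β_0, a_0, b_0 below are illustrative assumptions, not values from the slides):

```python
# Sample (pi, {mu_k, lambda_k}) from the 1-d analogue of the conjugate prior.
import math
import random

rng = random.Random(0)
K = 3
alpha0, m0, beta0, a0, b0 = 1.0, 0.0, 1.0, 2.0, 1.0   # illustrative hyperparameters

# pi ~ Dir(alpha0, ..., alpha0): normalize K independent Gamma(alpha0, 1) draws
g = [rng.gammavariate(alpha0, 1.0) for _ in range(K)]
pi = [x / sum(g) for x in g]

components = []
for _ in range(K):
    lam = rng.gammavariate(a0, 1.0 / b0)               # precision ~ Gamma(a0, b0)
    mu = rng.gauss(m0, 1.0 / math.sqrt(beta0 * lam))   # mu | lam ~ N(m0, (beta0*lam)^-1)
    components.append((mu, lam))
# pi sums to 1 and every sampled precision is positive.
```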
Variational Distribution