Variational Inference
Presenter: Shuai Zhang, CSE, UNSW
Content
1. Brief Introduction
2. Core Idea of VI; Optimization
3. Example: Bayesian Mixture of Gaussians
What is Variational Inference?
Variational Bayesian methods are a family of techniques for
approximating intractable integrals arising in Bayesian inference
and machine learning [Wiki].
It is widely used to approximate posterior densities for Bayesian
models as an alternative strategy to Markov chain Monte Carlo (MCMC)
sampling; it tends to be faster and easier to scale to large data.
It has been applied to problems such as document analysis,
computational neuroscience and computer vision.
Core Idea
Consider a general problem of Bayesian inference. Let the latent
variables in our problem be z and the observed data be x.
Inference in a Bayesian model amounts to conditioning on the data
and computing the posterior p(z | x).
Approximate Inference
The inference problem is to compute the conditional density of the
latent variables given the observations:
p(z | x) = p(z, x) / p(x).
The denominator is the marginal distribution of the data (the evidence),
obtained by marginalizing all the latent variables out of the joint
distribution p(x, z):
p(x) = ∫ p(z, x) dz.
For many models, this evidence integral is unavailable in closed
form or requires exponential time to compute. The evidence is
what we need to compute the conditional from the joint; this is
why inference in such models is hard.
MCMC
In MCMC, we first construct an ergodic Markov chain on z whose
stationary distribution is the posterior p(z | x).
Then, we sample from the chain to collect samples from the
stationary distribution.
Finally, we approximate the posterior with an empirical estimate
constructed from the collected samples.
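The original slides do not include code; as a minimal illustration of this recipe, the sketch below runs a random-walk Metropolis sampler (one simple MCMC algorithm) on a made-up conjugate Gaussian model and forms an empirical estimate of the posterior mean from the collected samples.

```python
import numpy as np

# Toy model (assumed for illustration): z ~ N(0, 1), x_i | z ~ N(z, 1).
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=50)          # observed data

def log_joint(z):
    # log p(z) + log p(x | z), up to constants that cancel in the acceptance ratio
    return -0.5 * z**2 - 0.5 * np.sum((x - z) ** 2)

# Random-walk Metropolis: the chain's stationary distribution is p(z | x).
z, samples = 0.0, []
for _ in range(5000):
    prop = z + rng.normal(0.0, 0.5)        # symmetric proposal
    if np.log(rng.uniform()) < log_joint(prop) - log_joint(z):
        z = prop                           # accept the move
    samples.append(z)

# Empirical estimate of the posterior mean from the collected samples.
print(np.mean(samples[1000:]))             # discard burn-in
```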
VI vs. MCMC
• Computational cost: MCMC is more computationally intensive; VI is less intensive.
• Guarantees: MCMC produces asymptotically exact samples from the target distribution; VI offers no such guarantee.
• Speed: MCMC is slower; VI is faster, especially for large data sets and complex distributions.
• Use case: MCMC is best for precise inference; VI is useful for exploring many scenarios quickly or for large data sets.
Core Idea
Rather than use sampling, the main idea behind variational
inference is to use optimization.
We restrict ourselves to a family of approximate distributions D over
the latent variables, and then try to find the member of that family
that minimizes the Kullback-Leibler (KL) divergence to the exact
posterior. This turns inference into an optimization problem:
q*(z) = argmin_{q(z) ∈ D} KL( q(z) || p(z | x) ).
The goal is to approximate p(z | x) with the resulting q(z).
Core Idea
This objective is not directly computable, because the KL divergence
involves the evidence log p(x), which is exactly the quantity we cannot compute:
KL( q(z) || p(z | x) ) = E_q[ log q(z) ] - E_q[ log p(z, x) ] + log p(x).
Because we cannot compute the KL itself, we optimize an alternative
objective that equals the negative KL up to the added constant log p(x).
We know this decomposition from our discussion of EM.
Core Idea
Thus, we have the objective function, the evidence lower bound (ELBO):
ELBO(q) = E_q[ log p(z, x) ] - E_q[ log q(z) ].
Maximizing the ELBO is equivalent to minimizing the KL divergence.
Intuition: we can rewrite the ELBO as the expected log likelihood of
the data minus the KL divergence between q(z) and the prior p(z):
ELBO(q) = E_q[ log p(x | z) ] - KL( q(z) || p(z) ).
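As an illustrative sketch (not from the slides), the ELBO for a Gaussian q(z) can be estimated by Monte Carlo; the toy model below (standard normal prior on z, unit-variance Gaussian likelihood) is assumed purely for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=50)            # observed data for the toy model

# Toy model (assumed): z ~ N(0, 1), x_i | z ~ N(z, 1); constants dropped.
def log_joint(z):
    return -0.5 * z**2 - 0.5 * np.sum((x - z) ** 2)

# Variational family: q(z) = N(m, s^2); ELBO(q) = E_q[log p(z, x)] - E_q[log q(z)].
def elbo(m, s, num_samples=1000):
    z = m + s * rng.normal(size=num_samples)                 # samples from q
    log_q = -0.5 * np.log(2 * np.pi * s**2) - 0.5 * ((z - m) / s) ** 2
    return np.mean([log_joint(zi) for zi in z]) - np.mean(log_q)

# The q closer to the true posterior (mean near the data mean, small variance)
# scores a higher ELBO than a poorly matched q.
print(elbo(0.0, 1.0), elbo(2.0, 0.2))
```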
Mean field approximation
Now that we have specified the variational objective function, the
ELBO, we need to specify the variational family of distributions
from which we pick the approximate variational distribution.
A common choice is the mean-field variational family. Here, the
latent variables are mutually independent and each is governed by a
distinct factor in the variational distribution:
q(z) = ∏_{j=1}^{m} q_j(z_j).
Coordinate ascent mean-field VI
Having specified our objective function and the variational family
of distributions from which to pick the approximation, we can now optimize.
Coordinate ascent variational inference (CAVI) maximizes the ELBO by
iteratively optimizing each factor of the mean-field variational
distribution while holding the others fixed. It does not, however,
guarantee finding the global optimum of the (generally non-convex) ELBO.
Coordinate ascent mean-field VI
Given that we fix all the other variational factors q_l(z_l) (l ≠ j),
the optimal q_j(z_j) is proportional to the exponentiated expected log
of the complete conditional:
q_j*(z_j) ∝ exp( E_{-j}[ log p(z_j | z_{-j}, x) ] ).
This is in turn proportional to the exponentiated expected log of the
joint, exp( E_{-j}[ log p(z_j, z_{-j}, x) ] ), because the complete
conditional differs from the joint only by a factor that does not depend
on z_j, and the expectation is taken with respect to the other factors,
which the mean-field family assumes to be mutually independent.
Coordinate ascent mean-field VI
Below, we rewrite the first term using iterated expectation, and for
the second term we retain only the factor that depends on z_j:
ELBO(q_j) = E_j[ E_{-j}[ log p(z_j, z_{-j}, x) ] ] - E_j[ log q_j(z_j) ] + const.
Writing A = E_{-j}[ log p(z_j, z_{-j}, x) ], the right-hand side is
(up to an additive constant) the negative KL divergence between q_j and
the distribution proportional to exp(A). Maximizing this expression is
therefore the same as minimizing that KL divergence.
This occurs when q_j(z_j) ∝ exp(A).
Coordinate ascent mean-field VI
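The original slide shows the CAVI algorithm as a figure. As a hedged sketch of that loop, the coordinate-ascent structure can be written generically in Python, with the per-factor update functions and the ELBO evaluator assumed to be supplied by the model.

```python
def cavi(update_fns, init_factors, elbo_fn, max_iter=100, tol=1e-6):
    """Generic coordinate ascent variational inference loop (illustrative sketch).

    update_fns  : dict mapping factor name -> function(factors) that returns the
                  optimal factor while all other factors are held fixed
    init_factors: dict of initial variational factors (e.g. parameter arrays)
    elbo_fn     : function(factors) returning the current ELBO, used only to
                  monitor convergence (the ELBO is non-decreasing under exact updates)
    """
    factors = dict(init_factors)
    prev_elbo = float("-inf")
    for _ in range(max_iter):
        for name, update in update_fns.items():
            factors[name] = update(factors)      # q_j <- argmax ELBO, others fixed
        cur_elbo = elbo_fn(factors)
        if cur_elbo - prev_elbo < tol:           # converged to a local optimum
            break
        prev_elbo = cur_elbo
    return factors
```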
Bayesian Mixture of Gaussians
The full hierarchical model (following reference [4], with K components,
unit observation variance, and prior variance σ²) is:
μ_k ~ N(0, σ²), k = 1, ..., K
c_i ~ Categorical(1/K, ..., 1/K), i = 1, ..., n
x_i | c_i, μ ~ N(c_i^T μ, 1), i = 1, ..., n.
The joint distribution factorizes as
p(μ, c, x) = p(μ) ∏_{i=1}^{n} p(c_i) p(x_i | c_i, μ).
Bayesian Mixture of Gaussians
The mean-field variational family contains approximate posterior
densities of the form
q(μ, c) = ∏_{k=1}^{K} q(μ_k; m_k, s_k²) ∏_{i=1}^{n} q(c_i; φ_i),
where q(μ_k; m_k, s_k²) is a Gaussian factor on the k-th mixture
component and q(c_i; φ_i) is a categorical factor on the i-th cluster assignment.
Bayesian Mixture of Gaussians
We first derive the ELBO as a function of the variational factors
(m, s², φ), which CAVI then maximizes.
Bayesian Mixture of Gaussians
Next, we derive the CAVI updates for the variational factors: the
assignment probabilities φ_i and the Gaussian parameters (m_k, s_k²).
A sketch of the resulting algorithm follows.
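As a concrete sketch of these updates (following the closed-form updates in reference [4], assuming unit observation variance; the one-dimensional data generation below is made up for illustration), a minimal CAVI implementation for the Bayesian mixture of Gaussians might look like this.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from a 3-component 1-D mixture (assumed for illustration).
true_means = np.array([-4.0, 0.0, 5.0])
n, K, sigma2 = 300, 3, 10.0                       # sigma2: prior variance of mu_k
x = rng.normal(true_means[rng.integers(K, size=n)], 1.0)

# Variational parameters: phi (n x K) for q(c_i), (m, s2) for q(mu_k).
phi = rng.dirichlet(np.ones(K), size=n)
m = rng.normal(0.0, 1.0, size=K)
s2 = np.ones(K)

def elbo():
    # ELBO up to additive constants; used only to monitor convergence.
    t1 = -np.sum(m**2 + s2) / (2 * sigma2)                    # E[log p(mu)]
    t2 = np.sum(phi * (np.outer(x, m) - 0.5 * (m**2 + s2)))   # E[log p(x | c, mu)]
    t3 = 0.5 * np.sum(np.log(s2))                             # entropy of q(mu)
    t4 = -np.sum(phi * np.log(phi + 1e-12))                   # entropy of q(c)
    return t1 + t2 + t3 + t4

prev = -np.inf
for _ in range(200):
    # Update q(c_i): phi_{ik} proportional to exp(E[mu_k] x_i - E[mu_k^2] / 2).
    logits = np.outer(x, m) - 0.5 * (m**2 + s2)
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    phi = np.exp(logits)
    phi /= phi.sum(axis=1, keepdims=True)

    # Update q(mu_k): Gaussian with closed-form mean and variance.
    nk = phi.sum(axis=0)
    m = (phi * x[:, None]).sum(axis=0) / (1.0 / sigma2 + nk)
    s2 = 1.0 / (1.0 / sigma2 + nk)

    cur = elbo()
    if cur - prev < 1e-8:                                      # ELBO has converged
        break
    prev = cur

print("estimated component means:", np.sort(m))
```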
References
1. https://am207.github.io/2017/wiki/VI.html
2. https://www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf
3. https://www.cs.cmu.edu/~epxing/Class/10708-17/notes-17/10708-scribe-lecture13.pdf
4. https://arxiv.org/pdf/1601.00670.pdf
Week Report
• Last week
• Metric Factorization model
• Learning Group
• This week
• Submit the ICDE paper