then, by continuity, any local maximum of L(q, θ) will also be a local maximum of
ln p(X|θ).
Consider the case of N independent data points x1 , . . . , xN with corresponding
latent variables z1 , . . . , zN . The joint distribution p(X, Z|θ) factorizes over the data
points, and this structure can be exploited in an incremental form of EM in which
at each EM cycle only one data point is processed at a time. In the E step, instead
of recomputing the responsibilities for all of the data points, we just re-evaluate the
responsibilities for one data point. It might appear that the subsequent M step would
require computation involving the responsibilities for all of the data points. How-
ever, if the mixture components are members of the exponential family, then the
responsibilities enter only through simple sufficient statistics, and these can be up-
dated efficiently. Consider, for instance, the case of a Gaussian mixture, and suppose
we perform an update for data point m in which the corresponding old and new
values of the responsibilities are denoted γ^old(z_mk) and γ^new(z_mk). In the M step,
the required sufficient statistics can be updated incrementally. For instance, for the
means the sufficient statistics are defined by (9.17) and (9.18), from which we obtain (Exercise 9.26)
$$\boldsymbol{\mu}_k^{\text{new}} = \boldsymbol{\mu}_k^{\text{old}} + \left(\frac{\gamma^{\text{new}}(z_{mk}) - \gamma^{\text{old}}(z_{mk})}{N_k^{\text{new}}}\right)\left(\mathbf{x}_m - \boldsymbol{\mu}_k^{\text{old}}\right) \tag{9.78}$$
together with
$$N_k^{\text{new}} = N_k^{\text{old}} + \gamma^{\text{new}}(z_{mk}) - \gamma^{\text{old}}(z_{mk}). \tag{9.79}$$
The corresponding results for the covariances and the mixing coefficients are analo-
gous.
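As an illustration of how lightweight these updates are, the following Python sketch applies (9.78) and (9.79) to a single component after the responsibility of one data point has been re-evaluated; the function and variable names are illustrative, not taken from the text.

import numpy as np

def incremental_mean_update(mu_k, N_k, x_m, gamma_old, gamma_new):
    # Effective number of points assigned to component k, as in (9.79)
    N_k_new = N_k + gamma_new - gamma_old
    # Incremental update of the component mean, as in (9.78)
    mu_k_new = mu_k + ((gamma_new - gamma_old) / N_k_new) * (x_m - mu_k)
    return mu_k_new, N_k_new

# Example usage (illustrative values)
mu_k, N_k = np.array([0.5, -1.0]), 12.0
x_m = np.array([1.0, 0.0])
mu_k, N_k = incremental_mean_update(mu_k, N_k, x_m, gamma_old=0.2, gamma_new=0.7)

Note that the previous responsibility of the data point being updated must be remembered, so that its old contribution can be subtracted from the sufficient statistics.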
Thus both the E step and the M step take fixed time that is independent of the
total number of data points. Because the parameters are revised after each data point,
rather than waiting until after the whole data set is processed, this incremental ver-
sion can converge faster than the batch version. Each E or M step in this incremental
algorithm increases the value of L(q, θ) and, as we have shown above, if the
algorithm converges to a local (or global) maximum of L(q, θ), this will correspond
to a local (or global) maximum of the log likelihood function ln p(X|θ).
Exercises
9.1 ( ) www Consider the K-means algorithm discussed in Section 9.1. Show that as
a consequence of there being a finite number of possible assignments for the set of
discrete indicator variables rnk , and that for each such assignment there is a unique
optimum for the {µk }, the K-means algorithm must converge after a finite number
of iterations.
9.2 ( ) Apply the Robbins-Monro sequential estimation procedure described in Sec-
tion 2.3.5 to the problem of finding the roots of the regression function given by
the derivatives of J in (9.1) with respect to µk . Show that this leads to a stochastic
K-means algorithm in which, for each data point xn , the nearest prototype µk is
updated using (9.5).
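For orientation (the details are the subject of the exercise), the sequential update (9.5) has the form µ_k^new = µ_k^old + η_n(x_n − µ_k^old); a minimal Python sketch of the resulting stochastic K-means pass might look as follows, where the decaying step size eta0/(t+1) is an illustrative choice of the Robbins-Monro coefficients.

import numpy as np

def sequential_kmeans_pass(X, mu, eta0=0.5):
    # One pass through the data: for each point, move the nearest
    # prototype towards it using an update of the form (9.5).
    for t, x in enumerate(X):
        k = int(np.argmin(np.sum((mu - x) ** 2, axis=1)))  # nearest prototype
        eta = eta0 / (t + 1)        # illustrative Robbins-Monro step size
        mu[k] += eta * (x - mu[k])  # stochastic update of prototype k
    return mu

# Example usage: two prototypes in two dimensions
X = np.random.randn(100, 2)
mu = np.array([[1.0, 0.0], [-1.0, 0.0]])
mu = sequential_kmeans_pass(X, mu)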
9.3 ( ) www Consider a Gaussian mixture model in which the marginal distribution
p(z) for the latent variable is given by (9.10), and the conditional distribution p(x|z)
for the observed variable is given by (9.11). Show that the marginal distribution
p(x), obtained by summing p(z)p(x|z) over all possible values of z, is a Gaussian
mixture of the form (9.7).
9.4 ( ) Suppose we wish to use the EM algorithm to maximize the posterior distri-
bution over parameters p(θ|X) for a model containing latent variables, where X is
the observed data set. Show that the E step remains the same as in the maximum
likelihood case, whereas in the M step the quantity to be maximized is given by
Q(θ, θ old ) + ln p(θ) where Q(θ, θ old ) is defined by (9.30).
9.5 ( ) Consider the directed graph for a Gaussian mixture model shown in Figure 9.6.
By making use of the d-separation criterion discussed in Section 8.2, show that the
posterior distribution of the latent variables factorizes with respect to the different
data points so that
$$p(\mathbf{Z}|\mathbf{X}, \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\pi}) = \prod_{n=1}^{N} p(\mathbf{z}_n|\mathbf{x}_n, \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\pi}). \tag{9.80}$$
9.6 ( ) Consider a special case of a Gaussian mixture model in which the covari-
ance matrices Σk of the components are all constrained to have a common value
Σ. Derive the EM equations for maximizing the likelihood function under such a
model.
9.7 ( ) www Verify that maximization of the complete-data log likelihood (9.36) for
a Gaussian mixture model leads to the result that the means and covariances of each
component are fitted independently to the corresponding group of data points, and
the mixing coefficients are given by the fractions of points in each group.
9.8 ( ) www Show that if we maximize (9.40) with respect to µk while keeping the
responsibilities γ(znk ) fixed, we obtain the closed form solution given by (9.17).
9.9 ( ) Show that if we maximize (9.40) with respect to Σk and πk while keeping the
responsibilities γ(znk ) fixed, we obtain the closed form solutions given by (9.19)
and (9.22).
9.10 ( ) Consider a density model given by a mixture distribution
$$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k\, p(\mathbf{x}|k) \tag{9.81}$$
and suppose that we partition the vector x into two parts so that x = (xa , xb ).
Show that the conditional density p(xb |xa ) is itself a mixture distribution and find
expressions for the mixing coefficients and for the component densities.
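As a sketch of the structure to aim for, applying the product rule to the mixture (9.81) suggests the form
$$p(\mathbf{x}_b|\mathbf{x}_a) = \sum_{k=1}^{K} \lambda_k\, p(\mathbf{x}_b|\mathbf{x}_a, k), \qquad \lambda_k = \frac{\pi_k\, p(\mathbf{x}_a|k)}{\sum_{l=1}^{K} \pi_l\, p(\mathbf{x}_a|l)},$$
so the new mixing coefficients λ_k are the posterior probabilities of the components given x_a.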
9.12 ( ) Consider a mixture distribution of the form
$$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k\, p(\mathbf{x}|k) \tag{9.82}$$
where the elements of x could be discrete or continuous or a combination of these. Denote the mean and covariance of p(x|k) by µ_k and Σ_k, respectively. Show that the mean and covariance of the mixture distribution are given by (9.49) and (9.50).
9.13 ( ) Using the re-estimation equations for the EM algorithm, show that a mixture of Bernoulli distributions, with its parameters set to values corresponding to a maximum of the likelihood function, has the property that
$$\mathbb{E}[\mathbf{x}] = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n \equiv \bar{\mathbf{x}}. \tag{9.83}$$
Hence show that if the parameters of this model are initialized such that all compo-
nents have the same mean µk = µ for k = 1, . . . , K, then the EM algorithm will
converge after one iteration, for any choice of the initial mixing coefficients, and that
this solution has the property µk = x̄. Note that this represents a degenerate case of
the mixture model in which all of the components are identical, and in practice we
try to avoid such solutions by using an appropriate initialization.
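A sketch of the key observation: if all components share the same mean, the E step gives γ(z_nk) = π_k, independent of n, so the M step yields
$$\boldsymbol{\mu}_k^{\text{new}} = \frac{\sum_{n=1}^{N} \pi_k\, \mathbf{x}_n}{N \pi_k} = \bar{\mathbf{x}} \quad \text{for all } k,$$
while the mixing coefficients are unchanged, after which no further iteration alters the parameters.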
9.14 ( ) Consider the joint distribution of latent and observed variables for the Bernoulli
distribution obtained by forming the product of p(x|z, µ) given by (9.52) and p(z|π)
given by (9.53). Show that if we marginalize this joint distribution with respect to z,
then we obtain (9.47).
9.15 ( ) www Show that if we maximize the expected complete-data log likelihood
function (9.55) for a mixture of Bernoulli distributions with respect to µk , we obtain
the M step equation (9.59).
9.16 ( ) Show that if we maximize the expected complete-data log likelihood function
(9.55) for a mixture of Bernoulli distributions with respect to the mixing coefficients
πk , using a Lagrange multiplier to enforce the summation constraint, we obtain the
M step equation (9.60).
9.17 ( ) www Show that as a consequence of the constraint 0 ⩽ p(xn|µk) ⩽ 1 for
the discrete variable xn , the incomplete-data log likelihood function for a mixture
of Bernoulli distributions is bounded above, and hence that there are no singularities
for which the likelihood goes to infinity.
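The essence of the bound: since the mixing coefficients sum to one and each component density satisfies 0 ⩽ p(x_n|µ_k) ⩽ 1, every factor in the likelihood obeys
$$p(\mathbf{x}_n) = \sum_{k=1}^{K} \pi_k\, p(\mathbf{x}_n|\boldsymbol{\mu}_k) \leqslant \sum_{k=1}^{K} \pi_k = 1,$$
so that the incomplete-data log likelihood is bounded above by zero.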
where
$$p(\mathbf{x}|\boldsymbol{\mu}_k) = \prod_{i=1}^{D} \prod_{j=1}^{M} \mu_{kij}^{x_{ij}}. \tag{9.85}$$
9.25 ( ) www Show that the lower bound L(q, θ) given by (9.71), with q(Z) =
p(Z|X, θ^(old)), has the same gradient with respect to θ as the log likelihood function
ln p(X|θ) at the point θ = θ^(old).
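One possible route, sketched here: with q(Z) = p(Z|X, θ^(old)), the decomposition of the log likelihood gives ln p(X|θ) = L(q, θ) + KL(q‖p(Z|X, θ)). The KL term is nonnegative and equals zero at θ = θ^(old), so (assuming differentiability) its gradient vanishes there, and hence
$$\nabla_{\boldsymbol{\theta}} \mathcal{L}(q, \boldsymbol{\theta})\big|_{\boldsymbol{\theta}=\boldsymbol{\theta}^{(\text{old})}} = \nabla_{\boldsymbol{\theta}} \ln p(\mathbf{X}|\boldsymbol{\theta})\big|_{\boldsymbol{\theta}=\boldsymbol{\theta}^{(\text{old})}}.$$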
9.26 ( ) www Consider the incremental form of the EM algorithm for a mixture of
Gaussians, in which the responsibilities are recomputed only for a specific data point
xm . Starting from the M-step formulae (9.17) and (9.18), derive the results (9.78)
and (9.79) for updating the component means.
9.27 ( ) Derive M-step formulae for updating the covariance matrices and mixing
coefficients in a Gaussian mixture model when the responsibilities are updated in-
crementally, analogous to the result (9.78) for updating the means.