
CSE291D Lecture 7

Mixture models (revisited)

1
Announcements
• Project proposals are due today!!

• Please submit your proposal to me by email


– It’s due today, but I’ll give you a small extension: until noon tomorrow

• I am happy to discuss your project plans

2
Log-sum-exp trick
• When computing probabilities, you will probably quickly hit
numerical underflow issues (e.g. in your homework!)

• Solution: work with unnormalized probabilities in log space.

• Normalize only when you need to, in a numerically stable way.
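The identity behind the trick, stated here for completeness (a standard identity, not copied from the slide): for log-space values $a_1, \ldots, a_K$ and $m = \max_k a_k$,

$$\log \sum_{k=1}^{K} e^{a_k} = m + \log \sum_{k=1}^{K} e^{a_k - m}$$

Subtracting the max makes the largest exponent exactly 0, so the sum can neither overflow nor underflow to zero.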

3
Log-sum-exp

• Log-sum-exp in 2 dimensions:

• Monotonic, convex
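The plotted function is presumably the two-dimensional case (an assumption based on the slide title):

$$\mathrm{lse}(x_1, x_2) = \log\left(e^{x_1} + e^{x_2}\right)$$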

Figure from https://inst.eecs.berkeley.edu/~ee127a/book/login/def_lse_fcn.html

4


Log-sum-exp trick

MATLAB code:
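The code image did not survive extraction. A minimal sketch of the usual implementation (the function name logsumexp and the vector-argument signature are assumptions, not the slide's):

function s = logsumexp(x)
% LOGSUMEXP  Numerically stable log(sum(exp(x))) for a vector x.
% Subtract the max so the largest exponentiated term is exp(0) = 1,
% which avoids both overflow and underflow to zero.
m = max(x);
s = m + log(sum(exp(x - m)));
end

For example, logsumexp([-1000, -1001]) returns about -999.69, whereas the naive log(sum(exp(x))) would underflow to -Inf.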

5
Latent variable models

[Graphical model: parameters Φ and latent variables Z generate observed data X; plate over data points]

Dimensionality(X) >> dimensionality(Z)


Z is a bottleneck, which finds a compressed, low-dimensional representation of X.

6
Mixture models

[Graphical model: parameters Φ and discrete latent variables Z (cluster assignments) generate observed data X; plate over data points]

7
Learning outcomes
By the end of the lesson, you should be able to:

• Train mixture models in a variety of ways:
  – EM
  – Gibbs sampling
  – Collapsed Gibbs sampling

• Apply mixture models to data analysis tasks

8
Mixture models

Convex combination of distributions

Marginalizing over a latent variable z_i
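Both phrases describe the same standard mixture density, reconstructed here with mixture weights $\pi_k$ and component parameters $\theta_k$:

$$p(x_i \mid \pi, \theta) = \sum_{k=1}^{K} \pi_k \, p(x_i \mid \theta_k) = \sum_{k=1}^{K} p(z_i = k \mid \pi)\, p(x_i \mid z_i = k, \theta)$$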

11
Mixture of Gaussians
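The density on this slide is presumably the standard Gaussian mixture:

$$p(x_i \mid \pi, \mu, \Sigma) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$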

12
Uses of mixture models: Clustering

[Figure: clustered data points with cluster centers]

13
Uses of mixture models:
Density estimation

14
Mixture models
for classification (e.g. naïve Bayes)

Cluster assignments Y correspond to observed class labels.

[Graphical model: parameters Φ and observed labels Y generate data X; plate over data points]

15
Semi-supervised classification
[Two graphical models side by side: observed labels Y → X, and latent assignments Z → X, each with parameters Φ and a plate over data points]

Naïve Bayes with missing labels? Mixture models with some observed labels?

16
Mixtures of experts

[Scatter plot: regression output Y against input feature X]

17
Mixtures of experts

[Scatter plot: regression output Y against input feature X, with input-dependent cluster assignments given by the gating function]

18
Mixtures of experts

Gating function:

Predicted y:
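The formulas did not survive extraction; standard forms for a mixture of linear experts (the softmax gating and linear experts are assumptions about the specific parameterization):

Gating function: $g_k(x) = p(z = k \mid x) = \dfrac{\exp(v_k^\top x)}{\sum_{j=1}^{K} \exp(v_j^\top x)}$

Predicted y: $\mathbb{E}[y \mid x] = \sum_{k=1}^{K} g_k(x)\, w_k^\top x$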

19
Topic models

Mimno, D. (2012). Computational historiography: Data mining in a century of classics journals. ACM Journal on Computing and Cultural Heritage, Vol. 5, No. 1, Article 3.

20
Computing the MLE
• Suppose the complete data log-likelihood is in the
exponential family

• Is this convex in θ?

• What about the log-likelihood when z is unobserved (the observed data LL)?

21
Actually, D is strictly speaking the right answer: the complete data log-likelihood is concave.

Of course, the negative complete data log-likelihood is convex, so the difference doesn’t matter in practice.

[Answer choices: convex / non-convex]

22
Computing the MLE
• Suppose the complete data log-likelihood is in the exponential family
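A reconstruction of the missing formula (assuming natural parameters $\theta_k$, sufficient statistics $\phi(x)$, and log-partition function $A$, with the base-measure term omitted):

$$\log p(x, z \mid \pi, \theta) = \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \left[ \log \pi_k + \theta_k^\top \phi(x_i) - A(\theta_k) \right]$$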

Here $A(\theta_k)$ is a log-sum-exp (convex) and $\theta_k^\top \phi(x_i)$ is linear, so each bracketed term, and hence the whole complete data log-likelihood, is concave in the parameters.

23
Computing the MLE
• Observed data LL:
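A reconstruction under the same assumptions:

$$\log p(x \mid \pi, \theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \exp\left( \log \pi_k + \theta_k^\top \phi(x_i) - A(\theta_k) \right)$$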

Now there are two log-sum-exps: the outer sum over clusters (convex in its argument) and the convex $A(\theta_k)$ entering with a minus sign. The result is a difference of convex functions (D.C.), which is non-convex and has local optima.

24
Mixtures of exponential families
• 1-of-k notation:
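The notation itself (missing from the extraction) is the standard one-hot encoding:

$$z_i \in \{0,1\}^K, \qquad \sum_{k=1}^{K} z_{ik} = 1,$$

so $z_{ik} = 1$ exactly when point $i$ is assigned to cluster $k$.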

25
Mixtures of exponential families
• The complete data log-likelihood is in the exponential
family:

• The non-convexity result applies

• Sufficient statistics are counts of cluster assignments, and counts of component sufficient statistics per cluster

26
EM for exponential family mixtures
• E-step: Compute lower bound, expected complete data log-likelihood:
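A reconstruction of the bound, writing $r_{ik} = \mathbb{E}[z_{ik}]$ for the responsibilities:

$$Q(\pi, \theta) = \sum_{i=1}^{N} \sum_{k=1}^{K} r_{ik} \left[ \log \pi_k + \theta_k^\top \phi(x_i) - A(\theta_k) \right]$$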

The bound is linear in the responsibilities $r_{ik}$ and concave in $(\pi, \theta)$ for the same reason as the complete data log-likelihood, so its negative is convex and the M-step is a tractable convex problem.

27
EM for exponential family mixtures

• E-step responsibilities:
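The standard update, reconstructed:

$$r_{ik} = p(z_i = k \mid x_i, \pi, \theta) = \frac{\pi_k \, p(x_i \mid \theta_k)}{\sum_{j=1}^{K} \pi_j \, p(x_i \mid \theta_j)}$$

The denominator is exactly where the log-sum-exp trick from earlier applies when working in log space.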

28
M-step

• Lagrange multiplier, take derivative, set to 0

• Compute the MLE for each component, with expected sufficient statistics plugged in for the sufficient statistics

• It’s as if fractional data points were assigned to each cluster, weighted by their responsibilities; the resulting updates are sketched below
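A sketch of the standard updates (a reconstruction, not copied from the slide):

$$\pi_k = \frac{1}{N} \sum_{i=1}^{N} r_{ik}, \qquad \bar{\phi}_k = \frac{\sum_{i} r_{ik} \, \phi(x_i)}{\sum_{i} r_{ik}},$$

where each component's MLE is obtained by matching its mean parameter $\nabla A(\theta_k)$ to the weighted average $\bar{\phi}_k$.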
29
Gibbs sampling
• We can use MCMC to infer the full posterior over latent variables and parameters, instead of just a point estimate

• Gibbs updates for each z_i, π, and θ_k in turn

• This is in your homework!

30
Collapsed Gibbs sampling
• Marginalize out the parameters
• Perform Gibbs sampling on just the z’s
• Recover parameter estimates based on z at the end
of the algorithm

• Rao-Blackwell Theorem:
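The theorem statement did not survive extraction; the relevant fact, stated informally: conditioning an estimator $f(X)$ on another variable can only reduce variance,

$$\mathrm{Var}\big[\mathbb{E}[f(X) \mid Y]\big] \le \mathrm{Var}[f(X)],$$

which is the usual justification for why collapsed (Rao-Blackwellized) samplers yield lower-variance estimates.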

31
Collapsed Gibbs sampling

[Graphical models: before collapsing vs. after collapsing]

32
Marginalize out mixture parameters
• Assume a Dirichlet prior on mixture weights

• Polya urn model! (Same as posterior predictive for Dirichlet-multinomial)
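Reconstructed, with Dirichlet parameters $\alpha_k$ and cluster counts $N_k$ over the first $i - 1$ draws:

$$p(z_i = k \mid z_{1:i-1}, \alpha) = \frac{N_k + \alpha_k}{i - 1 + \sum_{j} \alpha_j}$$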

33
Collapsed conditional probabilities

• Probability of drawing the last ball from the urn, given all the others
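Reconstructed, with $N_k^{-i}$ counting the points other than $i$ assigned to cluster $k$:

$$p(z_i = k \mid z_{-i}, \alpha) = \frac{N_k^{-i} + \alpha_k}{N - 1 + \sum_{j} \alpha_j}$$

By exchangeability, any z_i can be treated as "the last ball".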

34
Collapsed Gibbs sampler
for mixture model
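The algorithm box did not survive extraction; a sketch of the standard collapsed update, combining the urn term with the component posterior predictive:

$$p(z_i = k \mid z_{-i}, x) \propto \left( N_k^{-i} + \alpha_k \right) \, p\big(x_i \mid \{x_j : z_j = k,\ j \ne i\}\big)$$

Each sweep removes point $i$ from its cluster's statistics, samples $z_i$ from this conditional, and adds the point back before moving to the next point.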

35
Performance of collapsed sampler

37
Mixing advantages of
collapsed sampler
• Stochasticity: updating z updates the counts
immediately, so the information is propagated
sooner

• Removes the dependency between the old parameters and z
– When you update a z variable, theta is “out of date”
and “wants” to keep the z’s in their old location. So
there is a battle between old parameters and the
data, which slows down mixing

38
Think-pair-share: Triage

• You are a data scientist working for a hospital. They need help designing an automatic triage system which clusters patients according to several levels of the urgency of care needed.

• Design a mixture modeling approach for automatically grouping clusters into triage categories, so that patients can get the right level of attention from the hospital staff.

39
