
Utilities: Preferences you don't want to compromise while achieving the goal
 Safe driving
 Comfortable ride
 Reaching on time
Problem taken from Russell & Norvig, Artificial Intelligence: A Modern Approach, 4th edition.
Full joint distribution

P(J) = Σ_M Σ_A Σ_B Σ_E P(B) P(E) P(A | B, E) P(J | A) P(M | A)

 Summation is always over the hidden variables (here E, B, M and A).
 Each summation moves inwards, past every factor that does not mention the summed variable.
 P(J | A) is independent of P(M | A), so the summation over M moves in to apply only to P(M | A).
 Likewise, the summation over B is independent of P(J | A) and P(M | A), so it moves in.
 f1 is a function of A and B, as the summation over E removes the variable E (it is summed out): f1(A, B) = Σ_E P(E) P(A | B, E). Similarly f2 and f3 sum out B and M: f2(A) = Σ_B P(B) f1(A, B), f3(A) = Σ_M P(M | A).
Final step: P(J) = Σ_A P(J | A) f2(A) f3(A) = f4(J)
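A minimal runnable sketch of this elimination, assuming the burglary-alarm network and CPT numbers from Russell & Norvig (the slide itself does not list them):

import math

# Burglary-alarm network CPTs (Russell & Norvig). Computes P(J=true)
# by pushing the summations over E, B and M inward, as on the slide.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
# P(A=true | B, E)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J_given_A = {True: 0.90, False: 0.05}   # P(J=true | A)
P_M_given_A = {True: 0.70, False: 0.01}   # P(M=true | A)

def f1(a, b):
    # f1(A, B) = sum_E P(E) * P(A | B, E): summing out E removes E
    return sum(P_E[e] * (P_A[(b, e)] if a else 1 - P_A[(b, e)])
               for e in (True, False))

def f2(a):
    # f2(A) = sum_B P(B) * f1(A, B): summing out B
    return sum(P_B[b] * f1(a, b) for b in (True, False))

def f3(a):
    # f3(A) = sum_M P(M | A); equals 1, kept explicit to mirror the derivation
    return P_M_given_A[a] + (1 - P_M_given_A[a])

# Final step: f4(J=true) = sum_A P(J=true | A) * f2(A) * f3(A)
p_j = sum(P_J_given_A[a] * f2(a) * f3(a) for a in (True, False))
print(f"P(J=true) = {p_j:.4f}")   # ~0.0521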
Inference in Bayesian Networks

 Graphical independence representation yields efficient inference schemes
 We generally want to compute
 Marginal probability: P(Z)
 Posterior probability given evidence: P(Z | E)
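For reference (standard definitions, not specific to these slides), both quantities come from the full joint by summing out the hidden variables H:

P(Z \mid e) = \frac{P(Z, e)}{P(e)} = \frac{\sum_H P(Z, e, H)}{\sum_{Z, H} P(Z, e, H)}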
Gaussian Mixture Model

 Helps in the estimation of complicated data.
 In estimation we should come up with a model which will best explain the data.
 That means our aim is to see what could be a good generative story for the given data set.
 These values are not just 0's and 1's; they are real numbers.
 So we can apply a Gaussian distribution and apply an estimation technique.
 Then apply the principle of maximum likelihood to get the best Gaussian (see the sketch after this list).
 The maximum-likelihood mean for a Gaussian is the sample mean.
 The maximum-likelihood variance is the sample variance, so the fitted Gaussian is N(sample mean, sample variance).
 The extreme points lie in a low-density region.
 The probability of data falling in these extreme regions is considered low.
 The model will try to get the data under the curve and find the best Gaussian.
 So it won't cover the extreme ends, as they are the lowest-density area.
 So this is not the model which best describes the data, as many of the points fall in its low-density region.
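A minimal sketch of this single-Gaussian maximum-likelihood fit (the multimodal data below is assumed for illustration; the slides only show a plot):

import numpy as np

def fit_gaussian_ml(x):
    # ML estimates for a single Gaussian: sample mean and (biased) sample variance
    mu = x.mean()
    var = ((x - mu) ** 2).mean()   # divides by N, the ML estimator
    return mu, var

# Hypothetical three-cluster data: one Gaussian fits it poorly, since many
# points end up in the fitted curve's low-density tails.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-4, 1, 300),
                    rng.normal(0, 1, 300),
                    rng.normal(5, 1, 300)])
mu, var = fit_gaussian_ml(x)
print(f"mu = {mu:.2f}, sigma^2 = {var:.2f}")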
What kind of PDF would be more accurate for this data?

 This is a better model for this dataset, but it is definitely not a Gaussian distribution: a Gaussian has only one peak, while this has 3 peaks.
 So we can conclude that this type of data set requires a different model to fit it.
 We can also see that the data exhibits some kind of clustering.
 A new generative model which is a mixture of Gaussians will better explain the data: a mechanism which defines the data in a probabilistic way.
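For reference, the standard mixture-of-Gaussians density (stated here explicitly; the slide describes it only in words) is a weighted sum of k component Gaussians:

p(x) = \sum_{l=1}^{k} \pi_l \, \mathcal{N}(x \mid \mu_l, \sigma_l^2), \qquad \pi_l \ge 0, \quad \sum_{l=1}^{k} \pi_l = 1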
Way to model

 Step 1: Pick which mixture a data point comes from.
 Step 2: Generate the data point from that mixture.
 But this is a probabilistic approach, and we haven't built that into these steps yet, so let us take the probabilistic view and revise the steps.
 Consider the model as a die having 3 sides (representing, say, 3 clusters).
Cont…
 Generate a mixture component among {1, 2, …, k}; in our example it is 3 clusters.
 Zi ∈ {1, 2, …, k}; e.g., if the 3-sided die (3 clusters) comes up 3, then Zi = 3.
 i.e., the i-th data point comes from the third mixture component.
 Here the value of Zi is determined in a probabilistic manner by throwing the die.
 So, to be precise, Zi is the mixture indicator.

P(Zi = l) = π_l {l is any number between 1 and k indicating the mixture number}

Step 1: Zi ~ Categorical(π_1, …, π_k), for all i

 Every mixture has its own mean and variance.
 Say Zi is 5: that means we go to the fifth mixture, which has its own mean and variance inside it, and sample the data point from it.

Step 2: Generate xi ~ N(μ_Zi, σ²_Zi)

 To generate a single data point we have to go through these steps.
 Roll the die, select the mixture, and then sample a data point from that mixture's Gaussian.
 Repeat the two steps for the next data point, and so on…
 These draws are i.i.d. (independent and identically distributed).

 Step 1: Zi ~ Categorical(π_1, …, π_k)
 Step 2: xi | Zi = l ~ N(μ_l, σ_l²)
 Zi is the latent or unobserved variable.
 The Gaussian Mixture Model is a latent variable model because the output we assume depends not only on some parameters that you want to estimate (π, μ, σ²) but also on an unobserved variable.
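A minimal runnable sketch of this two-step generative story (the weights, means and variances below are assumed for illustration; the slides give no numbers):

import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.3, 0.4, 0.3])      # mixing weights, must sum to 1
mu = np.array([-4.0, 0.0, 5.0])     # per-component means
sigma = np.array([1.0, 1.0, 1.0])   # per-component standard deviations

def sample_gmm(n):
    z = rng.choice(len(pi), size=n, p=pi)   # Step 1: roll the die -> Zi
    x = rng.normal(mu[z], sigma[z])         # Step 2: xi ~ N(mu_Zi, sigma_Zi^2)
    return z, x

# i.i.d. draws: the two steps are simply repeated for each data point
z, x = sample_gmm(1000)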
What do we need to pause and think about?

 As an estimation procedure, how many parameters do we require to define the model completely?
 Parameters: [μ_1, …, μ_k], [σ_1², …, σ_k²], (π_1, …, π_k)
 k means and k variances give 2k; the mixing weights must sum to 1, so only k − 1 of them are free.
 Total: 2k + (k − 1)
 = 3k − 1
 For example, with k = 3 clusters: 3 means + 3 variances + 2 free weights = 8 parameters.
 The likelihood function depends on the various parameters μ, σ and the probability π.
 The first step is to select which mixture the point comes from.
 The selection of the mixture depends on π.
 Here the big Π is a product, as all the draws are i.i.d.
 A data point will not always be close to the mean, as that depends on the probability of the mixture being picked; if that probability itself is very low, the data point may end up being picked up by another cluster.
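Written out (the standard i.i.d. GMM likelihood, matching the slide's Π and Σ), L(π, μ, σ) = Π_i Σ_l π_l N(x_i | μ_l, σ_l²). A minimal sketch of its log form, reusing the assumed parameters from the sampling sketch above:

import numpy as np

def gmm_log_likelihood(x, pi, mu, sigma):
    # log L = sum_i log sum_l pi_l * N(x_i; mu_l, sigma_l^2)
    x = np.asarray(x)[:, None]                      # shape (n, 1) for broadcasting
    dens = (np.exp(-0.5 * ((x - mu) / sigma) ** 2)  # per-component densities, (n, k)
            / (sigma * np.sqrt(2 * np.pi)))
    return np.sum(np.log(dens @ pi))                # mix with weights, then log

# e.g. with x, pi, mu, sigma from the sampling sketch:
# print(gmm_log_likelihood(x, pi, mu, sigma))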
Introduction to GMM

• Not possible to solve it analytically.
• It is not a closed-form equation: you cannot take the derivative with respect to μ or with respect to σ, set it to zero, and get the maximum-likelihood solution, because the log-likelihood contains a sum over components inside the log.
• Some complex gradient method can be used to optimize it.
• We need to solve this with an alternate method.
Convex and concave functions

 If the linear interpolation (the chord between any two points) always gives a value higher than or equal to the function itself, then such a function is called a convex function; if the chord always lies below the function, it is concave.
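Formally (the standard definition, added for completeness), f is convex if

f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2) \quad \text{for all } x_1, x_2 \text{ and } \lambda \in [0, 1],

and concave if the inequality is reversed.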
