Maximum-Likelihood & Bayesian Parameter Estimation: Srihari: CSE 555

This document discusses and compares maximum-likelihood and Bayesian parameter estimation. It describes maximum-likelihood estimation, which finds the parameter values that maximize the probability of observing the training samples, and works through the Gaussian case: the ML estimate of the mean µ is simply the sample mean, and the ML estimate of the variance σ² is the sample variance with 1/n normalization, which is biased.


Maximum-Likelihood & Bayesian Parameter Estimation

Srihari: CSE 555


Maximum Likelihood Versus Bayesian Parameter Estimation

An optimal classifier can be designed if the priors P(ωi) and the class-conditional densities p(x | ωi) are known. In practice they must be obtained from training samples, assuming known forms for the pdfs; e.g., p(x | ωi) ~ N(µi, Σi) has two parameters, the mean µi and the covariance Σi.

• Estimation techniques
  - Maximum-Likelihood (ML): find the parameter values that maximize the probability of the observations
  - Bayesian estimation: parameters are random variables with a known prior distribution that is sharpened by the observations
• The results are nearly identical, but the approaches are conceptually different
Maximum Likelihood Parameter Estimation

• Parameters are fixed but unknown
• The best parameters are those that maximize the probability of obtaining the samples observed
  - Good convergence properties as the sample size increases
  - Often simpler than alternative techniques
• The general principle: assume c classes with

  p(x | ωj) ~ N(µj, Σj)
  p(x | ωj) ≡ p(x | ωj, θj), where θj = (µj, Σj)
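As a concrete illustration of this per-class setup, here is a minimal Python sketch that estimates θj = (µj, Σj) for each class from labeled data. `fit_class_gaussians` is an illustrative name, not something from these notes; it simply applies the ML formulas derived later.

```python
import numpy as np

def fit_class_gaussians(X, y):
    """Estimate theta_j = (mu_j, Sigma_j) for each class omega_j
    from the samples labeled with that class."""
    params = {}
    for j in np.unique(y):
        Xj = X[y == j]                               # samples drawn from class omega_j
        mu_j = Xj.mean(axis=0)                       # ML estimate: the sample mean
        Sigma_j = np.cov(Xj, rowvar=False, ddof=0)   # ML (1/n, biased) covariance
        params[j] = (mu_j, Sigma_j)
    return params
```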
Maximum Likelihood Estimation

• Use the n training samples from a class to estimate θ
• If D contains n independently drawn samples x1, x2, …, xn, then

  p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta)

  p(D | θ) is called the likelihood of θ with respect to the set of samples, and l(θ) is the log-likelihood of θ.

• The ML estimate θ̂ of θ is, by definition, the value that maximizes p(D | θ): "it is the value of θ that best agrees with the actually observed training samples."
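To make the definition concrete, the sketch below evaluates the log-likelihood l(θ) = Σk ln p(xk | θ) of a candidate θ = (µ, σ) for a 1-D Gaussian; the sample values are made up for illustration.

```python
import numpy as np
from scipy.stats import norm

# l(theta) = sum_k ln p(x_k | theta); summing logs avoids the underflow
# that multiplying many small densities p(x_k | theta) would cause.
def log_likelihood(D, mu, sigma):
    return np.sum(norm.logpdf(D, loc=mu, scale=sigma))

D = np.array([1.2, 0.7, 2.1, 1.5, 0.9])
print(log_likelihood(D, mu=0.0, sigma=1.0))       # some candidate theta
print(log_likelihood(D, mu=D.mean(), sigma=1.0))  # larger: the sample mean agrees better
```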
One-Dimensional Example

[Figure: top panel shows four of the infinitely many candidate source distributions; middle panel shows the likelihood as a function of the mean, peaking at the sample mean (the peak would be sharper with many samples); bottom panel shows the log-likelihood function, which also peaks at the mean.]
Maximizing the Log-Likelihood Function

• Let θ = (θ1, θ2, …, θp)^t and let ∇θ be the gradient operator

  \nabla_\theta = \left[ \frac{\partial}{\partial \theta_1}, \frac{\partial}{\partial \theta_2}, \ldots, \frac{\partial}{\partial \theta_p} \right]^t

• We define l(θ) as the log-likelihood function

  l(\theta) = \ln p(D \mid \theta)

• Determine the θ that maximizes the log-likelihood:

  \hat{\theta} = \arg\max_\theta \, l(\theta)

• A set of necessary conditions for an optimum is

  \nabla_\theta l = \sum_{k=1}^{n} \nabla_\theta \ln p(x_k \mid \theta) = 0
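Before deriving closed forms, the condition can be checked numerically: the sketch below maximizes l(θ) for a 1-D Gaussian with a generic optimizer. This illustrates the principle only, not a method from the notes; the log-σ parameterization is an assumption made to keep σ positive during the search.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

D = np.array([1.2, 0.7, 2.1, 1.5, 0.9])

# Negative log-likelihood for theta = (mu, ln sigma); minimizing -l(theta)
# is equivalent to maximizing l(theta).
def neg_log_likelihood(theta):
    mu, log_sigma = theta
    return -np.sum(norm.logpdf(D, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)  # matches the closed-form ML estimates derived next
```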
MLE: The Gaussian Case with Unknown µ

• Suppose p(xk | µ) ~ N(µ, Σ). Then

  \ln p(x_k \mid \mu) = -\frac{1}{2} \ln\left[(2\pi)^d \, |\Sigma|\right] - \frac{1}{2} (x_k - \mu)^t \Sigma^{-1} (x_k - \mu)

  and

  \nabla_\mu \ln p(x_k \mid \mu) = \Sigma^{-1} (x_k - \mu)

• With θ = µ, the ML estimate for µ must therefore satisfy

  \sum_{k=1}^{n} \Sigma^{-1} (x_k - \hat{\mu}) = 0

• Multiplying by Σ:

  \hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k

  Just the sample mean.
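A quick numerical confirmation on simulated data (the distribution parameters below are arbitrary choices for the demo): the sample mean makes the necessary condition Σk Σ⁻¹(xk − µ̂) = 0 hold up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
X = rng.multivariate_normal(mean=[1.0, -2.0], cov=Sigma, size=500)

mu_hat = X.mean(axis=0)  # ML estimate: the sample mean
# Necessary condition: sum_k Sigma^{-1} (x_k - mu_hat) = 0
residual = np.linalg.inv(Sigma) @ (X - mu_hat).sum(axis=0)
print(mu_hat)     # close to the true mean [1, -2]
print(residual)   # ~0 up to floating-point error
```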
MLE: The Gaussian Case with Unknown µ and σ²

• Let θ = (θ1, θ2) = (µ, σ²). For a single sample,

  \ln p(x_k \mid \theta) = -\frac{1}{2} \ln(2\pi\theta_2) - \frac{1}{2\theta_2} (x_k - \theta_1)^2

• Setting the gradient to zero,

  \nabla_\theta l = \begin{pmatrix} \frac{\partial}{\partial\theta_1} \ln p(x_k \mid \theta) \\ \frac{\partial}{\partial\theta_2} \ln p(x_k \mid \theta) \end{pmatrix} = 0

  gives, for each sample,

  \frac{1}{\theta_2} (x_k - \theta_1) = 0, \qquad -\frac{1}{2\theta_2} + \frac{(x_k - \theta_1)^2}{2\theta_2^2} = 0

• Summing over all n samples:

  \sum_{k=1}^{n} \frac{1}{\hat{\theta}_2} (x_k - \hat{\theta}_1) = 0 \quad (1)

  -\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2} + \sum_{k=1}^{n} \frac{(x_k - \hat{\theta}_1)^2}{\hat{\theta}_2^2} = 0 \quad (2)

• Solving (1) and (2) yields

  \hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})^2
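In code, these closed forms are one line each; the sketch below computes them for a small made-up sample (note that NumPy's default variance uses the same 1/n normalization as the ML estimate).

```python
import numpy as np

x = np.array([1.2, 0.7, 2.1, 1.5, 0.9])

mu_hat = x.mean()                        # mu_hat = (1/n) * sum_k x_k
sigma2_hat = np.mean((x - mu_hat) ** 2)  # sigma2_hat = (1/n) * sum_k (x_k - mu_hat)^2

print(mu_hat, sigma2_hat)
print(np.var(x))  # identical: np.var defaults to the 1/n (ML) normalization
```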
MLE Bias

• The ML estimate for σ² is biased:

  E\left[ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right] = \frac{n-1}{n} \sigma^2 \neq \sigma^2

• An elementary unbiased estimator for Σ is the sample covariance matrix:

  C = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \hat{\mu}) (x_k - \hat{\mu})^t
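The bias is easy to see empirically; the simulation below (the sample size and σ² are arbitrary choices for the demo) averages the 1/n and 1/(n−1) variance estimates over many trials.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials, sigma2 = 5, 100_000, 4.0

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, n))
ml_var = samples.var(axis=1, ddof=0)        # 1/n normalization: the ML estimate
unbiased_var = samples.var(axis=1, ddof=1)  # 1/(n-1) normalization

print(ml_var.mean())        # ~ (n-1)/n * sigma2 = 3.2: biased low
print(unbiased_var.mean())  # ~ sigma2 = 4.0: unbiased
```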
