Notes on Beta and Dirichlet Distributions

Beta Distribution

Definition

$$p(\theta; a, b) = \mathrm{Beta}(\theta; a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}$$

Normalization

$$\int p(\theta; a, b)\,d\theta = 1$$

$$\int \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}\,d\theta = 1$$

$$\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \int \theta^{a-1}(1-\theta)^{b-1}\,d\theta = 1$$

$$\int \theta^{a-1}(1-\theta)^{b-1}\,d\theta = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$$
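A quick numeric sanity check of the last identity (a minimal sketch; the values a = 2, b = 3 are arbitrary, and it assumes SciPy is available):

```python
from math import gamma
from scipy.integrate import quad

a, b = 2.0, 3.0  # arbitrary example hyper-parameters

# Left-hand side: integrate theta^(a-1) * (1 - theta)^(b-1) over [0, 1]
lhs, _ = quad(lambda t: t ** (a - 1) * (1 - t) ** (b - 1), 0.0, 1.0)

# Right-hand side: Gamma(a) Gamma(b) / Gamma(a + b)
rhs = gamma(a) * gamma(b) / gamma(a + b)

print(lhs, rhs)  # both are 1/12 = 0.0833... here
```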

Mean

$$
\begin{aligned}
E[\theta] &= \int \theta \times \mathrm{Beta}(\theta; a, b)\,d\theta \\
&= \int \theta \times \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}\,d\theta \\
&= \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \int \theta^{a}(1-\theta)^{b-1}\,d\theta \\
&= \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \times \frac{\Gamma(a+1)\Gamma(b)}{\Gamma(a+b+1)} \\
&= \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \times \frac{a\,\Gamma(a)\Gamma(b)}{(a+b)\,\Gamma(a+b)} \\
&= \frac{a}{a+b}
\end{aligned}
$$

Mode

$$
\begin{aligned}
\ell(\theta) = \log p(\theta; a, b) &= \log \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1} \\
&= \log \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} + (a-1)\log\theta + (b-1)\log(1-\theta)
\end{aligned}
$$

We calculate the derivative with respect to θ and set it to zero:

$$\frac{d\ell(\theta)}{d\theta} = \frac{a-1}{\theta} - \frac{b-1}{1-\theta} = 0$$

$$\theta_{\max} = \frac{a-1}{a+b-2}$$

Variance

$$
\begin{aligned}
\mathrm{var}(\theta) &= E[(\theta - E(\theta))^2] \\
&= E[\theta^2 - 2\theta E(\theta) + E(\theta)^2] \\
&= E[\theta^2] - E[\theta]^2
\end{aligned}
$$

$$
\begin{aligned}
E[\theta^2] &= \int \theta^2 \times \mathrm{Beta}(\theta; a, b)\,d\theta \\
&= \int \theta^2 \times \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}\,d\theta \\
&= \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \int \theta^{a+1}(1-\theta)^{b-1}\,d\theta \\
&= \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \times \frac{\Gamma(a+2)\Gamma(b)}{\Gamma(a+b+2)} \\
&= \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \times \frac{a(a+1)\,\Gamma(a)\Gamma(b)}{(a+b)(a+b+1)\,\Gamma(a+b)} \\
&= \frac{a(a+1)}{(a+b)(a+b+1)}
\end{aligned}
$$

$$
\begin{aligned}
\mathrm{var}(\theta) &= \frac{a(a+1)}{(a+b)(a+b+1)} - \left(\frac{a}{a+b}\right)^2 \\
&= \frac{ab}{(a+b)^2(a+b+1)}
\end{aligned}
$$
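A minimal numeric check of the three closed forms above against samples (assumes NumPy; a = 2, b = 5 is an arbitrary example):

```python
import numpy as np

a, b = 2.0, 5.0                       # arbitrary example hyper-parameters
rng = np.random.default_rng(0)
samples = rng.beta(a, b, size=1_000_000)

# Mean and variance: Monte Carlo estimates vs. a/(a+b) and ab/((a+b)^2 (a+b+1))
print(samples.mean(), a / (a + b))
print(samples.var(), a * b / ((a + b) ** 2 * (a + b + 1)))

# Mode: maximize the (unnormalized) density on a grid vs. (a-1)/(a+b-2)
grid = np.linspace(1e-4, 1 - 1e-4, 10_000)
density = grid ** (a - 1) * (1 - grid) ** (b - 1)
print(grid[density.argmax()], (a - 1) / (a + b - 2))
```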

Plots

Figure (not shown here): plots of the Beta distribution Beta(θ; a, b) as a function of θ for various values of the hyper-parameters a and b.
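A small matplotlib sketch that produces this kind of plot; the (a, b) pairs are illustrative choices, not necessarily the ones in the original figure:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

theta = np.linspace(0.001, 0.999, 500)
for a, b in [(0.5, 0.5), (1, 1), (2, 3), (8, 4)]:   # illustrative hyper-parameters
    plt.plot(theta, beta.pdf(theta, a, b), label=f"a={a}, b={b}")
plt.xlabel("theta")
plt.ylabel("Beta(theta; a, b)")
plt.legend()
plt.show()
```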

The beta-Bernoulli model

Prior

θ ∼ Beta(θ; a, b)

Likelihood

$$D = (X_1, \cdots, X_i, \cdots, X_N)$$

$$X_i \sim \mathrm{Bernoulli}(\theta), \qquad p(X_i = 1) = \theta, \qquad p(X_i = 0) = 1 - \theta$$

$$p(D \mid \theta) = \theta^{N_1}(1-\theta)^{N_0}, \qquad N_1 + N_0 = N$$

where $N_1$ is the number of observations with $X_i = 1$ and $N_0$ is the number with $X_i = 0$.

Posterior

$$p(\theta \mid D) = \frac{p(D, \theta)}{p(D)} = \frac{p(D \mid \theta)\,p(\theta)}{\int p(D \mid \theta)\,p(\theta)\,d\theta} \propto p(D \mid \theta)\,p(\theta)$$

$$p(\theta \mid D) \propto \theta^{N_1}(1-\theta)^{N_0} \times \theta^{a-1}(1-\theta)^{b-1} = \theta^{N_1+a-1}(1-\theta)^{N_0+b-1}$$

$$\int \theta^{N_1+a-1}(1-\theta)^{N_0+b-1}\,d\theta = \frac{\Gamma(a+N_1)\,\Gamma(b+N_0)}{\Gamma(a+b+N_0+N_1)}$$

There is a normalizing constant $C$ such that

$$\int p(\theta \mid D)\,d\theta = 1 = \int C \times \theta^{N_1+a-1}(1-\theta)^{N_0+b-1}\,d\theta$$

So $C$ is

$$C = \frac{\Gamma(a+b+N_0+N_1)}{\Gamma(a+N_1)\,\Gamma(b+N_0)}$$

And we can get

$$p(\theta \mid D) = \frac{\Gamma(a+b+N_0+N_1)}{\Gamma(a+N_1)\,\Gamma(b+N_0)}\,\theta^{N_1+a-1}(1-\theta)^{N_0+b-1}$$
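A minimal sketch of this conjugate update in code (NumPy; the prior Beta(2, 2) and the observations are made-up examples):

```python
import numpy as np

a, b = 2.0, 2.0                             # made-up prior hyper-parameters
data = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # made-up Bernoulli observations

N1 = int(data.sum())      # number of ones
N0 = len(data) - N1       # number of zeros

# Conjugacy: the posterior is again a Beta distribution
a_post, b_post = a + N1, b + N0

posterior_mean = a_post / (a_post + b_post)
posterior_mode = (a_post - 1) / (a_post + b_post - 2)
print(a_post, b_post, posterior_mean, posterior_mode)
```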

Because the posterior distribution $p(\theta \mid D)$ is $\mathrm{Beta}(\theta; N_1 + a, N_0 + b)$, we can easily calculate the corresponding mean, mode and variance from the previous section on the Beta distribution. For example, the posterior mean is

$$E[\theta \mid D] = \int \theta \times p(\theta \mid D)\,d\theta = \frac{a+N_1}{a+N_1+b+N_0}$$

If we set $M = a + b$ and we know $N = N_0 + N_1$, then we can get

$$
\begin{aligned}
E[\theta \mid D] &= \frac{\frac{a}{M} \times M + \frac{N_1}{N} \times N}{M+N} \\
&= \frac{M}{M+N} \times \frac{a}{M} + \frac{N}{M+N} \times \frac{N_1}{N} \\
&= \lambda \times \frac{a}{M} + (1-\lambda) \times \frac{N_1}{N}
\end{aligned}
$$

where $\lambda = \frac{M}{M+N}$ and $1-\lambda = \frac{N}{M+N}$. We know $\frac{a}{M}$ is the prior mean of $\theta$, and $\frac{N_1}{N}$ is the MLE of $\theta$.

This shows that the posterior mean is a convex combination of the prior mean and the MLE, which captures the notion that the posterior is a compromise between what we previously believed and what the data is telling us. The weaker the prior, the smaller λ is, and hence the closer the posterior mean is to the MLE. One can show similarly that the posterior mode is a convex combination of the prior mode and the MLE, and that it too converges to the MLE.
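A small numeric illustration of this shrinkage (made-up prior and counts):

```python
# Posterior mean as a convex combination of the prior mean and the MLE
a, b = 2.0, 2.0          # made-up prior: prior mean a/M = 0.5
N1, N0 = 7, 3            # made-up data: MLE N1/N = 0.7

M, N = a + b, N1 + N0
lam = M / (M + N)

prior_mean = a / M
mle = N1 / N
posterior_mean = lam * prior_mean + (1 - lam) * mle

# The same number computed directly from the Beta(a + N1, b + N0) posterior
print(posterior_mean, (a + N1) / (a + b + N1 + N0))   # both 0.642857...
```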

Posterior predictive distribution

So far, we have been focusing on inference of the unknown parameter(s). Let us now turn our attention to prediction of future observable data. Consider predicting the probability of heads in a single future trial under a $\mathrm{Beta}(a+N_1, b+N_0)$ posterior.

We have

$$
\begin{aligned}
p(X_{new} = 1 \mid D) &= \int_0^1 p(X_{new} = 1, \theta \mid D)\,d\theta \\
&= \int_0^1 p(X_{new} = 1 \mid \theta, D) \times p(\theta \mid D)\,d\theta \\
&= \int_0^1 p(X_{new} = 1 \mid \theta) \times p(\theta \mid D)\,d\theta \\
&= \int_0^1 \theta \times \mathrm{Beta}(\theta; a+N_1, b+N_0)\,d\theta \\
&= E(\theta \mid D) = \frac{a+N_1}{a+b+N}
\end{aligned}
$$

Thus we see that the mean of the posterior predictive distribution is equivalent (in this case) to plugging in the posterior mean parameters: $p(X_{new} = 1 \mid D) = \mathrm{Bernoulli}(X_{new} \mid E(\theta \mid D))$.
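A short sketch contrasting the closed-form predictive with a Monte Carlo average over posterior draws (NumPy; same made-up numbers as above):

```python
import numpy as np

a, b = 2.0, 2.0          # made-up prior
N1, N0 = 7, 3            # made-up counts
rng = np.random.default_rng(0)

# Closed form: p(X_new = 1 | D) = (a + N1) / (a + b + N)
closed_form = (a + N1) / (a + b + N1 + N0)

# Monte Carlo: average the Bernoulli success probability over posterior draws of theta
theta_draws = rng.beta(a + N1, b + N0, size=1_000_000)
monte_carlo = theta_draws.mean()

print(closed_form, monte_carlo)   # both ~0.6429
```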

Dirichlet Distribution

Definition

$$\alpha = (\alpha_1, \cdots, \alpha_k, \cdots, \alpha_K), \qquad \theta = (\theta_1, \cdots, \theta_k, \cdots, \theta_K)$$

$$p(\theta; \alpha) = \mathrm{Dir}(\theta; \alpha) = \frac{\Gamma(\sum_{k=1}^{K} \alpha_k)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$$

where $\theta_k \geq 0$ and $\sum_{k=1}^{K} \theta_k = 1$.

Normalization

$$\int \mathrm{Dir}(\theta; \alpha)\,d\theta = 1$$

$$\int \frac{\Gamma(\sum_{k=1}^{K} \alpha_k)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}\,d\theta = 1$$

$$\frac{\Gamma(\sum_{k=1}^{K} \alpha_k)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \int \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}\,d\theta = 1$$

$$\int \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}\,d\theta = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma(\sum_{k=1}^{K} \alpha_k)}$$
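As a consistency check, setting $K = 2$ with $\theta_1 = \theta$, $\theta_2 = 1 - \theta$, $\alpha_1 = a$, $\alpha_2 = b$ recovers the Beta distribution,

$$\mathrm{Dir}(\theta, 1-\theta;\, a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1} = \mathrm{Beta}(\theta; a, b)$$

so the results below specialize to the Beta results above.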

Mean

$$
\begin{aligned}
E[\theta_k] &= \int \theta_k \times \mathrm{Dir}(\theta; \alpha)\,d\theta \\
&= \int \theta_k \times \frac{\Gamma(\sum_{i=1}^{K} \alpha_i)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \prod_{i=1}^{K} \theta_i^{\alpha_i - 1}\,d\theta \\
&= \frac{\Gamma(\sum_{i=1}^{K} \alpha_i)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \int \theta_k^{(\alpha_k-1)+1} \prod_{i\neq k} \theta_i^{\alpha_i - 1}\,d\theta \\
&= \frac{\Gamma(\sum_{i=1}^{K} \alpha_i)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \int \theta_k^{(\alpha_k+1)-1} \prod_{i\neq k} \theta_i^{\alpha_i - 1}\,d\theta
\end{aligned}
$$

Because $\Gamma(\alpha_k + 1) = \Gamma(\alpha_k) \times \alpha_k$, we can get

$$
\begin{aligned}
\int \theta_k^{(\alpha_k+1)-1} \prod_{i\neq k} \theta_i^{\alpha_i - 1}\,d\theta &= \frac{\Gamma(\alpha_k+1) \prod_{i\neq k} \Gamma(\alpha_i)}{\Gamma(1 + \sum_{i=1}^{K} \alpha_i)} \\
&= \frac{\alpha_k\,\Gamma(\alpha_k) \times \prod_{i\neq k} \Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^{K} \alpha_i) \times \sum_{i=1}^{K} \alpha_i} \\
&= \frac{\alpha_k \times \prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^{K} \alpha_i) \times \sum_{i=1}^{K} \alpha_i}
\end{aligned}
$$

We can get

$$
\begin{aligned}
E[\theta_k] &= \frac{\Gamma(\sum_{i=1}^{K} \alpha_i)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \times \frac{\alpha_k \times \prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^{K} \alpha_i) \times \sum_{i=1}^{K} \alpha_i} \\
&= \frac{\alpha_k}{\sum_{i=1}^{K} \alpha_i}
\end{aligned}
$$

Mode

$$
\begin{aligned}
\ell(\theta) = \log p(\theta; \alpha) &= \log \frac{\Gamma(\sum_{k=1}^{K} \alpha_k)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1} \\
&= \log \frac{\Gamma(\sum_{k=1}^{K} \alpha_k)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} + \sum_{k=1}^{K} (\alpha_k - 1)\log\theta_k
\end{aligned}
$$

Because $\sum_{k=1}^{K} \theta_k = 1$, we add a Lagrange multiplier $\lambda$ for this constraint and maximize

$$\ell(\theta) = \log \frac{\Gamma(\sum_{k=1}^{K} \alpha_k)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} + \sum_{k=1}^{K} (\alpha_k - 1)\log\theta_k + \lambda\left(1 - \sum_{k=1}^{K} \theta_k\right)$$

We calculate the partial derivative with respect to $\theta_k$:

$$\frac{\partial \ell(\theta)}{\partial \theta_k} = \frac{\alpha_k - 1}{\theta_k} - \lambda = 0$$

We can get

$$\theta_k = \frac{\alpha_k - 1}{\lambda}$$

Again, because $\sum_{k=1}^{K} \theta_k = 1$, we can get

$$\sum_{k=1}^{K} \frac{\alpha_k - 1}{\lambda} = 1$$

So

$$\lambda = \sum_{k=1}^{K} (\alpha_k - 1)$$

Finally, we can get

$$\theta_{k(\max)} = \frac{\alpha_k - 1}{\lambda} = \frac{\alpha_k - 1}{\sum_{k=1}^{K} (\alpha_k - 1)} = \frac{\alpha_k - 1}{\left(\sum_{k=1}^{K} \alpha_k\right) - K}$$
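A quick numeric check of the mode formula for K = 3, using a brute-force grid search over the simplex (α = (3, 4, 5) is an arbitrary example):

```python
import numpy as np

alpha = np.array([3.0, 4.0, 5.0])   # arbitrary example parameters
K = len(alpha)

# Brute-force maximization of the (unnormalized) log-density on the 2-simplex,
# parameterized by (theta1, theta2) with theta3 = 1 - theta1 - theta2.
grid = np.linspace(1e-3, 1 - 1e-3, 300)
best_lp, best_theta = -np.inf, None
for t1 in grid:
    for t2 in grid:
        t3 = 1.0 - t1 - t2
        if t3 <= 0:
            continue
        theta = np.array([t1, t2, t3])
        lp = np.sum((alpha - 1) * np.log(theta))   # normalizer dropped: constant in theta
        if lp > best_lp:
            best_lp, best_theta = lp, theta

closed_form = (alpha - 1) / (alpha.sum() - K)
print(best_theta)     # approximately [0.222, 0.333, 0.444]
print(closed_form)    # [0.2222..., 0.3333..., 0.4444...]
```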

Variance

$$\mathrm{var}(\theta_k) = E[\theta_k^2] - E[\theta_k]^2$$

First, we can get

$$
\begin{aligned}
E[\theta_k^2] &= \int \theta_k^2 \times \mathrm{Dir}(\theta; \alpha)\,d\theta \\
&= \int \theta_k^2 \times \frac{\Gamma(\sum_{i=1}^{K} \alpha_i)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \prod_{i=1}^{K} \theta_i^{\alpha_i - 1}\,d\theta \\
&= \frac{\Gamma(\sum_{i=1}^{K} \alpha_i)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \int \theta_k^{(\alpha_k-1)+2} \prod_{i\neq k} \theta_i^{\alpha_i - 1}\,d\theta \\
&= \frac{\Gamma(\sum_{i=1}^{K} \alpha_i)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \int \theta_k^{(\alpha_k+2)-1} \prod_{i\neq k} \theta_i^{\alpha_i - 1}\,d\theta
\end{aligned}
$$

Because $\Gamma(\alpha_k + 2) = \Gamma(\alpha_k) \times \alpha_k \times (\alpha_k + 1)$, we can get

$$
\begin{aligned}
\int \theta_k^{(\alpha_k+2)-1} \prod_{i\neq k} \theta_i^{\alpha_i - 1}\,d\theta &= \frac{\Gamma(\alpha_k+2) \prod_{i\neq k} \Gamma(\alpha_i)}{\Gamma(2 + \sum_{i=1}^{K} \alpha_i)} \\
&= \frac{\alpha_k(\alpha_k+1)\,\Gamma(\alpha_k) \times \prod_{i\neq k} \Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^{K} \alpha_i) \times \sum_{i=1}^{K} \alpha_i \times (1 + \sum_{i=1}^{K} \alpha_i)} \\
&= \frac{\alpha_k(\alpha_k+1) \times \prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^{K} \alpha_i) \times \sum_{i=1}^{K} \alpha_i \times (1 + \sum_{i=1}^{K} \alpha_i)}
\end{aligned}
$$

We have

$$
\begin{aligned}
E[\theta_k^2] &= \frac{\Gamma(\sum_{i=1}^{K} \alpha_i)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \times \frac{\alpha_k(\alpha_k+1) \times \prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^{K} \alpha_i) \times \sum_{i=1}^{K} \alpha_i \times (1 + \sum_{i=1}^{K} \alpha_i)} \\
&= \frac{\alpha_k(\alpha_k+1)}{\sum_{i=1}^{K} \alpha_i \times (1 + \sum_{i=1}^{K} \alpha_i)}
\end{aligned}
$$

Finally, we get

$$
\begin{aligned}
\mathrm{var}(\theta_k) &= \frac{\alpha_k(\alpha_k+1)}{\sum_{i=1}^{K} \alpha_i \times (1 + \sum_{i=1}^{K} \alpha_i)} - \left(\frac{\alpha_k}{\sum_{i=1}^{K} \alpha_i}\right)^2 \\
&= \frac{\alpha_k\left(\left(\sum_{i=1}^{K} \alpha_i\right) - \alpha_k\right)}{\left(\sum_{i=1}^{K} \alpha_i\right)^2 \times \left(1 + \sum_{i=1}^{K} \alpha_i\right)}
\end{aligned}
$$
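A minimal sampling check of the mean and variance formulas (NumPy; α = (3, 4, 5) is an arbitrary example):

```python
import numpy as np

alpha = np.array([3.0, 4.0, 5.0])    # arbitrary example parameters
rng = np.random.default_rng(0)
samples = rng.dirichlet(alpha, size=1_000_000)   # shape (1_000_000, 3)

a0 = alpha.sum()
mean_closed = alpha / a0
var_closed = alpha * (a0 - alpha) / (a0 ** 2 * (a0 + 1))

print(samples.mean(axis=0), mean_closed)
print(samples.var(axis=0), var_closed)
```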
Figure (not shown here): (a) The Dirichlet distribution when K = 3 defines a distribution over the simplex, which can be represented by a triangular surface. Points on this surface satisfy $0 \leq \theta_k \leq 1$ and $\sum_{k=1}^{3} \theta_k = 1$. (b) Plot of the Dirichlet density when α = (2, 2, 2). (c) α = (20, 2, 2). (d) α = (0.1, 0.1, 0.1).

Further figures (not shown here) plot the Dirichlet distribution with K = 10 and symmetric parameters, with all αk equal to 1, 10, 100, 0.1, 0.01, and 0.001 respectively.
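A small sketch that generates this kind of figure, drawing one sample from a symmetric Dirichlet with K = 10 for a few values of αk (matplotlib; the particular αk values are an illustrative subset):

```python
import numpy as np
import matplotlib.pyplot as plt

K = 10
rng = np.random.default_rng(0)

fig, axes = plt.subplots(1, 4, figsize=(14, 3), sharey=True)
for ax, a in zip(axes, [100.0, 1.0, 0.1, 0.01]):   # illustrative symmetric parameters
    theta = rng.dirichlet(np.full(K, a))           # one draw from Dir(a, ..., a)
    ax.bar(range(1, K + 1), theta)
    ax.set_title(f"alpha_k = {a}")
    ax.set_xlabel("k")
axes[0].set_ylabel("theta_k")
plt.tight_layout()
plt.show()
```

Large αk gives near-uniform draws, while small αk gives sparse draws concentrated on a few components, which is what the sequence of figures illustrates.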

The Dirichlet-multinoulli model

Prior

θ ∼ Dirichlet(θ; α)

Likelihood

$$D = (X_1, \cdots, X_i, \cdots, X_N), \qquad X_i \sim \mathrm{Multinoulli}(\theta)$$

$$p(D \mid \theta) = \prod_{k=1}^{K} \theta_k^{N_k}, \qquad \sum_{k=1}^{K} N_k = N$$

where $N_k$ is the number of observations with $X_i = k$.

Posterior

$$p(\theta \mid D) = \frac{p(D, \theta)}{p(D)} = \frac{p(D \mid \theta)\,p(\theta)}{\int p(D \mid \theta)\,p(\theta)\,d\theta} \propto p(D \mid \theta)\,p(\theta)$$

$$p(\theta \mid D) \propto \prod_{k=1}^{K} \theta_k^{N_k} \times \prod_{k=1}^{K} \theta_k^{\alpha_k - 1} = \prod_{k=1}^{K} \theta_k^{N_k + \alpha_k - 1}$$

$$\int \prod_{k=1}^{K} \theta_k^{N_k + \alpha_k - 1}\,d\theta = \frac{\prod_{k=1}^{K} \Gamma(N_k + \alpha_k)}{\Gamma(\sum_{k=1}^{K} (N_k + \alpha_k))}$$

There is a normalizing constant $C$ such that

$$\int p(\theta \mid D)\,d\theta = 1 = \int C \times \prod_{k=1}^{K} \theta_k^{N_k + \alpha_k - 1}\,d\theta$$

So $C$ is

$$C = \frac{\Gamma(\sum_{k=1}^{K} (N_k + \alpha_k))}{\prod_{k=1}^{K} \Gamma(N_k + \alpha_k)}$$

And we can get

$$p(\theta \mid D) = \frac{\Gamma(\sum_{k=1}^{K} (N_k + \alpha_k))}{\prod_{k=1}^{K} \Gamma(N_k + \alpha_k)} \prod_{k=1}^{K} \theta_k^{N_k + \alpha_k - 1}$$
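A minimal sketch of this conjugate update (NumPy; the categories, observations, and symmetric Dirichlet(1, ..., 1) prior are made-up examples):

```python
import numpy as np

K = 4
alpha = np.ones(K)                          # made-up symmetric prior
data = np.array([0, 2, 2, 1, 2, 0, 2])      # made-up observations, values in {0, ..., K-1}

counts = np.bincount(data, minlength=K)     # N_k for each category
alpha_post = alpha + counts                 # conjugacy: posterior is Dir(alpha + counts)

posterior_mean = alpha_post / alpha_post.sum()
print(alpha_post, posterior_mean)
```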

Because the posterior distribution $p(\theta \mid D)$ is $\mathrm{Dirichlet}(\theta; N_1 + \alpha_1, \cdots, N_K + \alpha_K)$, we can easily calculate the corresponding mean, mode and variance from the previous section on the Dirichlet distribution. For example, the posterior mean is

$$E[\theta_k \mid D] = \frac{\alpha_k + N_k}{\sum_{i=1}^{K} (\alpha_i + N_i)}$$

If we set $M = \sum_{k=1}^{K} \alpha_k$ and we know $N = \sum_{k=1}^{K} N_k$, then we can get

$$
\begin{aligned}
E[\theta_k \mid D] &= \frac{\frac{\alpha_k}{M} \times M + \frac{N_k}{N} \times N}{M+N} \\
&= \frac{M}{M+N} \times \frac{\alpha_k}{M} + \frac{N}{M+N} \times \frac{N_k}{N} \\
&= \lambda \times \frac{\alpha_k}{M} + (1-\lambda) \times \frac{N_k}{N}
\end{aligned}
$$

where $\lambda = \frac{M}{M+N}$ and $1-\lambda = \frac{N}{M+N}$. We know $\frac{\alpha_k}{M}$ is the prior mean of $\theta_k$, and $\frac{N_k}{N}$ is the MLE of $\theta_k$.

Posterior predictive distribution

The posterior predictive distribution for a single multinoulli trial is given by the following expression:

$$
\begin{aligned}
p(X_{new} = j \mid D) &= \int p(X_{new} = j, \theta \mid D)\,d\theta \\
&= \int p(X_{new} = j \mid \theta, D) \times p(\theta \mid D)\,d\theta \\
&= \int p(X_{new} = j \mid \theta) \times p(\theta \mid D)\,d\theta \\
&= \int \theta_j \times \mathrm{Dir}(\theta; N_1 + \alpha_1, \cdots, N_K + \alpha_K)\,d\theta \\
&= E(\theta_j \mid D) = \frac{\alpha_j + N_j}{\sum_{k=1}^{K} (\alpha_k + N_k)}
\end{aligned}
$$

The above expression avoids the zero-count problem. In fact, this form of Bayesian smoothing is even more important in the multinomial case than in the binary case, since the likelihood of data sparsity increases once we start partitioning the data into many categories.
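A short sketch of the zero-count point: with this Bayesian (add-α) smoothing, a category never observed in D still gets nonzero predictive probability, whereas the MLE assigns it zero (made-up counts; α = 1 corresponds to add-one smoothing):

```python
import numpy as np

alpha = np.ones(4)                 # made-up symmetric Dirichlet prior (add-one smoothing)
counts = np.array([5, 3, 2, 0])    # made-up counts; the last category was never observed

mle = counts / counts.sum()                               # assigns probability 0 to the last category
predictive = (alpha + counts) / (alpha + counts).sum()    # strictly positive everywhere

print(mle)         # [0.5   0.3   0.2   0.  ]
print(predictive)  # [0.428... 0.285... 0.214... 0.071...]
```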

References

David M. Blei, Probabilistic Topic Models, tutorial at KDD 2011 (https://www.cs.princeton.edu/~blei/kdd-tutorial.pdf)

Christopher M. Bishop, Pattern Recognition and Machine Learning, 2006

Kevin Murphy, Machine Learning: A Probabilistic Perspective, 2012
