
Machine Learning II

Part I: An overview of Bayesian Nonparametrics


Predictive approach to BNP

Igor Prünster
Bocconi University

1 / 25
Exchangeability & Prediction

Exchangeability & Prediction

de Finetti’s representation theorem

A sequence of X–valued observations (Xn)n≥1 is exchangeable if and only if

\[
P[X_1 \in A_1, \ldots, X_n \in A_n] = \int_{\mathcal{P}} \prod_{i=1}^{n} P(A_i)\, Q(dP)
\]

for any n ≥ 1, where 𝒫 is the space of probability measures on X.

Prediction within de Finetti’s framework:

▶ Singular (or one–step) prediction
\[
\underbrace{P[X_{n+1} \in A \mid X^{(n)}]}_{\text{predictive distribution}}
= \int_{\mathcal{P}} P(A)\, \underbrace{Q(dP \mid X^{(n)})}_{\text{posterior distribution}}
\]
where throughout we set X^(n) := (X1 , . . . , Xn).
▶ m–step prediction
\[
P[X_{n+1} \in A_1, \ldots, X_{n+m} \in A_m \mid X^{(n)}]
= \int_{\mathcal{P}} \prod_{i=1}^{m} P(A_i)\, Q(dP \mid X^{(n)}).
\]

2 / 25
Exchangeability & Prediction

Two key properties of the Dirichlet process

Recall the following key properties of the Dirichlet process:


▶ For the number of distinct values (species, agents, clusters, components,
  etc.) Kn generated by a sample X^(n) we have
\[
P[K_n = k] = \frac{\theta^k}{(\theta)_n}\, |s(n, k)|, \qquad
E[K_n] = \sum_{i=1}^{n} \frac{\theta}{\theta + i - 1},
\]
  with (θ)n := θ(θ + 1) · · · (θ + n − 1) and |s(n, k)| the signless Stirling
  number of the first kind.
▶ The predictive distributions are of the form
\[
P[X_{n+1} \in \cdot \mid X^{(n)}]
= \underbrace{\frac{\theta}{\theta + n}}_{P[X_{n+1} = \text{“new”} \mid X^{(n)}]} \underbrace{P^*(\cdot)}_{\text{prior guess}}
+ \underbrace{\frac{n}{\theta + n}}_{P[X_{n+1} = \text{“old”} \mid X^{(n)}]} \underbrace{\frac{1}{n} \sum_{i=1}^{K_n} N_i\, \delta_{X_i^*}(\cdot)}_{\text{empirical measure}}
\]
  (a simulation sketch of this predictive scheme follows below).
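The predictive above is the well-known Blackwell–MacQueen urn scheme. As a quick illustration (a minimal Python sketch, not part of the original handout; parameter values are arbitrary), one can simulate the urn and check the formula for E[Kn] by Monte Carlo:

    import random

    def simulate_Kn(theta: float, n: int) -> int:
        """Number of distinct values K_n in a DP(theta, P*) sample of size n."""
        counts = []          # counts[j] = N_j, frequency of the j-th distinct value
        for i in range(n):   # i = current sample size before drawing X_{i+1}
            if random.random() < theta / (theta + i):
                counts.append(1)   # "new" value, drawn from the diffuse prior guess P*
            else:
                # "old" value: pick an existing one with probability N_j / i
                r, acc = random.uniform(0, i), 0.0
                for j, Nj in enumerate(counts):
                    acc += Nj
                    if r <= acc:
                        counts[j] += 1
                        break
        return len(counts)

    theta, n, reps = 1.0, 500, 2000
    mc = sum(simulate_Kn(theta, n) for _ in range(reps)) / reps
    exact = sum(theta / (theta + i - 1) for i in range(1, n + 1))
    print(f"Monte Carlo E[K_n] ≈ {mc:.2f}, exact formula = {exact:.2f}")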

3 / 25
Exchangeability & Prediction

m–step prediction for the Dirichlet process

Given a basic sample X (n) , prediction of various features of an additional


sample X (m) := (Xn+1 , . . . , Xn+m ) is of interest. In particular,

Km(n) := Km+n − Kn

is the number of new species to be recorded in X (m) given X (n) .


In the DP case [Favaro, P & Walker, 2011] one has
\[
P\big[K_m^{(n)} = j \mid X^{(n)}\big] = \frac{\theta^j\, (\theta)_n}{(\theta)_{n+m}}\, |s(m, j, n)|, \qquad
E\big[K_m^{(n)} \mid X^{(n)}\big] = \sum_{i=1}^{m} \frac{\theta}{\theta + n + i - 1},
\]
with |s(n, k, r)| denoting the non–central version of the signless Stirling number.


=⇒ The only sample information affecting the distribution of Km^(n) | X^(n) is the
size n; it depends neither on Kn nor on N1 , . . . , NKn !
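As a small illustration (not from the handout; the numbers are arbitrary), the expected number of new species under the DP is a one-line computation and already hints at the logarithmic behaviour discussed on the next slide:

    def dp_expected_new(theta: float, n: int, m: int) -> float:
        """E[K_m^(n) | X^(n)] = sum_{i=1}^m theta/(theta + n + i - 1) for the DP."""
        return sum(theta / (theta + n + i - 1) for i in range(1, m + 1))

    # Depends only on n (and theta), not on K_n or the frequencies:
    print(dp_expected_new(theta=1.0, n=100, m=1000))    # ≈ 2.39
    print(dp_expected_new(theta=1.0, n=100, m=10000))   # ≈ 4.61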

4 / 25
Exchangeability & Prediction

Since P[Xn+1 = “new” | X^(n)] = θ/(θ + n) for any sample size n, clearly Kn will
diverge as the sample size n → ∞. But what is its growth rate?
▶ In the unconditional case, by Korwar and Hollander (1973), one has
\[
\frac{K_n}{\log n} \;\xrightarrow{\text{a.s.}}\; \theta \qquad (n \to \infty).
\]
▶ In the conditional case, given a fixed sample X^(n), one has
\[
\frac{K_m^{(n)}}{\log m} \,\Big|\, X^{(n)} \;\xrightarrow{\text{a.s.}}\; \theta \qquad (m \to \infty).
\]

Remark: Ideally one would like a model with a flexible growth rate which
depends on the model parameters (e.g. n^σ with parameter σ ∈ (0, 1)). Instead,
the DP displays a logarithmic growth and such behaviour is unaffected by the
sample X^(n) =⇒ highly restrictive and inappropriate in applications [e.g.
linguistics (Teh, 2006), certain bipartite graphs (Caron, 2012), network models
(Caron & Fox, 2017) and species sampling].
Can such a model really be considered nonparametric? What is responsible
for such a restrictive behaviour?
It all boils down to P[Xn+1 = “new” | X^(n)] = θ/(θ + n)!
5 / 25
Gibbs–type priors & the Pitman–Yor process

Probability of discovering a new species


As seen, a key quantity in the analysis of discrete nonparametric priors is the
probability of discovering a new species

P[Xn+1 = “new” | X (n) ]. (∗)

Fundamental Characterization:
Based on (∗), discrete nonparametric priors P̃ can be classified into 3 categories [De Blasi et al., 2015]:
(a) P[Xn+1 = “new” | X (n) ] = f (n, model parameters)
⇐⇒ depends on n but not on Kn and Nn = (N1 , . . . , NKn )
⇐⇒ Dirichlet process;
(b) P[Xn+1 = “new” | X (n) ] = f (n, Kn , model parameters)
⇐⇒ depends on n and Kn but not on Nn = (N1 , . . . , NKn )
⇐⇒ Gibbs–type priors [Gnedin & Pitman, 2006];
(c) P[Xn+1 = “new” | X (n) ] = f (n, Kn , Nn , model parameters)
⇐⇒ depends on all information conveyed by the sample i.e. n, Kn and
Nn = (N1 , . . . , NKn )
⇐⇒ serious tractability issues.
=⇒ Let’s go for the intermediate case which exhibits a richer predictive
structure than the DP!
6 / 25
Gibbs–type priors & the Pitman–Yor process

Gibbs–type priors

Q is a Gibbs-type prior of order σ ∈ (−∞, 1) if and only if it gives rise to


predictive distributions of the form

\[
P[X_{n+1} \in \cdot \mid X^{(n)}]
= \underbrace{\frac{V_{n+1, K_n+1}}{V_{n, K_n}}}_{P[X_{n+1} = \text{“new”} \mid X^{(n)}]} \underbrace{P^*(\cdot)}_{\text{prior guess}}
+ \underbrace{\left(1 - \frac{V_{n+1, K_n+1}}{V_{n, K_n}}\right)}_{P[X_{n+1} = \text{“old”} \mid X^{(n)}]} \underbrace{\frac{\sum_{i=1}^{K_n} (N_i - \sigma)\, \delta_{X_i^*}(\cdot)}{n - \sigma K_n}}_{\text{weighted empirical measure}}
\]

where P ∗ is diffuse and {Vn,j : n ≥ 1, 1 ≤ j ≤ n} is a set of weights which


satisfy the recursion

Vn,j = (n − jσ)Vn+1,j + Vn+1,j+1 . (♢)

=⇒ A Gibbs–type prior is completely characterized by the choice of P ∗, of σ < 1 and
of a set of weights Vn,j .
=⇒ Crucially, P[Xn+1 = “new” | X^(n)] now depends on both the sample size n
and the number of distinct values in the sample Kn .

7 / 25
Gibbs–type priors & the Pitman–Yor process

The Pitman-Yor process & the NGG process


By defining, for 0 ≤ σ < 1 and θ > −σ, or σ < 0 and θ = r|σ| with r ∈ N,
\[
V_{n,k} = \frac{\prod_{i=1}^{k-1} (\theta + i\sigma)}{(\theta + 1)_{n-1}}
\]
one obtains the two parameter Poisson–Dirichlet process aka Pitman–Yor (PY)
process [Pitman & Yor, 1997], which yields
\[
P\big[X_{n+1} \in \cdot \mid X^{(n)}\big] = \frac{\theta + \sigma K_n}{\theta + n}\, P^*(\cdot)
+ \frac{n - \sigma K_n}{\theta + n}\, \frac{\sum_{i=1}^{K_n} (N_i - \sigma)\, \delta_{X_i^*}(\cdot)}{n - \sigma K_n}.
\]
=⇒ if σ = 0, the PY process reduces to the Dirichlet process and (θ + σKn)/(θ + n) to θ/(θ + n).
The normalized generalized gamma process (NGG) arises with
\[
V_{n,k} = \frac{e^{\beta}\, \sigma^{k-1}}{\Gamma(n)} \sum_{i=0}^{n-1} \binom{n-1}{i} (-1)^i\, \beta^{i/\sigma}\, \Gamma\!\Big(k - \frac{i}{\sigma};\, \beta\Big),
\]
where β > 0, σ ∈ (0, 1) and Γ(x; a) denotes the incomplete gamma function. If
σ = 1/2 it reduces to the normalized inverse Gaussian process (N–IG) [Lijoi,
Mena & P, 2005 & 2007b].
=⇒ In the following we restrict attention to the PY process but most results
carry over (with minor modifications) to general Gibbs–type priors
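Before moving on, a small Python sketch (illustrative, not part of the handout; function names and parameter values are my own) checks numerically that the PY weights above satisfy the Gibbs recursion (♢) and reproduce P[Xn+1 = “new” | X^(n)] = (θ + σKn)/(θ + n):

    from math import prod

    def rising(x: float, m: int) -> float:
        """Rising factorial (x)_m = x (x+1) ... (x+m-1)."""
        return prod(x + i for i in range(m))

    def V(n: int, k: int, theta: float, sigma: float) -> float:
        """PY weights V_{n,k} = prod_{i=1}^{k-1}(theta + i*sigma) / (theta + 1)_{n-1}."""
        return prod(theta + i * sigma for i in range(1, k)) / rising(theta + 1, n - 1)

    theta, sigma = 1.0, 0.4
    for n, j in [(5, 2), (10, 4), (20, 7)]:
        lhs = V(n, j, theta, sigma)
        rhs = (n - j * sigma) * V(n + 1, j, theta, sigma) + V(n + 1, j + 1, theta, sigma)
        p_new = V(n + 1, j + 1, theta, sigma) / V(n, j, theta, sigma)
        print(f"n={n}, j={j}: recursion gap {abs(lhs - rhs):.1e}, "
              f"P[new] {p_new:.4f} vs (theta+j*sigma)/(theta+n) {(theta + j * sigma) / (theta + n):.4f}")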
8 / 25
Gibbs–type priors & the Pitman–Yor process

A closer look at the predictive structure


The Gibbs structure allows one to view the predictive distribution as the result of two steps:
(1) Xn+1 is a new species with probability
\[
\frac{\theta + \sigma K_n}{\theta + n}
\]
    or an “old” one, i.e. one of {X1∗, . . . , XKn∗}, with probability (n − σKn)/(θ + n).
    =⇒ depends on n and Kn but not on the frequencies Nn = (N1 , . . . , NKn )
    =⇒ P[Xn+1 = “new” | X^(n)] is monotonically increasing or decreasing in
    Kn according to whether σ > 0 or σ < 0, respectively.
(2) (i) Given Xn+1 is new, it is sampled independently from P ∗.
    (ii) Given Xn+1 is a tie, it coincides with Xi∗ with probability
\[
\frac{N_i - \sigma}{n - \sigma K_n}.
\]
    =⇒ A reinforcement mechanism driven by σ takes place among the
    “old” values. For instance, if N1 = 2 and N2 = 1 then
\[
\frac{P[X_{n+1} = X_1^* \mid X^{(n)}]}{P[X_{n+1} = X_2^* \mid X^{(n)}]} = \frac{2 - \sigma}{1 - \sigma}
\begin{cases} < 2 & \text{if } \sigma < 0 \\ = 2 & \text{if } \sigma = 0 \\ > 2 & \text{if } \sigma > 0 \end{cases}
\qquad \text{e.g.} \qquad
\begin{cases} 1.5 & \text{if } \sigma = -1 \\ 2 & \text{if } \sigma = 0 \\ 3 & \text{if } \sigma = 0.5 \end{cases}
\]
    (a code sketch of this two–step scheme follows below).
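A minimal Python sketch of this two–step sampling scheme (illustrative, not from the handout; the function name py_next and the parameter values are my own):

    import random

    def py_next(counts: list[int], theta: float, sigma: float) -> int:
        """One step of the PY predictive: return the index 0..K_n-1 of the tied
        species, or K_n if X_{n+1} is a new species (to be drawn from P*)."""
        n, k = sum(counts), len(counts)
        if random.random() < (theta + sigma * k) / (theta + n):
            return k                         # step (1): new species
        r, acc = random.uniform(0, n - sigma * k), 0.0
        for i, Ni in enumerate(counts):      # step (2)(ii): tie with X_i* w.p. (N_i - sigma)/(n - sigma*k)
            acc += Ni - sigma
            if r <= acc:
                return i
        return k - 1                         # numerical safeguard

    # Reinforcement driven by sigma: with frequencies N = (2, 1), the odds of a tie
    # with X_1* versus X_2* are (2 - sigma)/(1 - sigma), i.e. 1.5, 2, 3 for sigma = -1, 0, 0.5.
    for s in (-1.0, 0.0, 0.5):
        print(s, (2 - s) / (1 - s))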

9 / 25
Gibbs–type priors & the Pitman–Yor process

The number of clusters generated by the PY process


The (prior) distribution of the number of clusters [Pitman, 2006] is given by
\[
P(K_n = j) = \frac{\prod_{i=1}^{j-1} (\theta + i\sigma)}{(\theta + 1)_{n-1}\, \sigma^{j}}\, C(n, j; \sigma)
\]
with $C(n, j; \sigma) = \frac{1}{j!} \sum_{i=0}^{j} (-1)^i \binom{j}{i} (-i\sigma)_n$ a generalized factorial coefficient.

In general, the dependence of the distribution of Kn on the prior parameters is:


▶ σ controls the “ flatness ” (or variability) of the (prior) distribution of Kn .
▶ θ controls the location of the (prior) distribution of Kn
Recall that Km^(n) is the number of new species to be recorded in the additional
sample of size m given X^(n) featuring Kn = j distinct values. One has that
\[
P\big(K_m^{(n)} = k \mid X^{(n)}\big)
= \frac{(\theta + 1)_{n-1} \prod_{i=j}^{j+k-1} (\theta + i\sigma)}{(\theta + 1)_{n+m-1}\, \sigma^{k}}\, C(m, k; \sigma, -n + j\sigma),
\]
with C(m, k; σ, γ) a non–central generalized factorial coefficient, and also that the expected number of new species is
\[
E\big[K_m^{(n)} \mid X^{(n)}\big] = \Big(j + \frac{\theta}{\sigma}\Big) \left(\frac{(\theta + n + \sigma)_m}{(\theta + n)_m} - 1\right).
\]
See Lijoi, Mena & P (2007a) and Favaro, Lijoi, Mena & P (2009).
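A small Python sketch (illustrative, not from the handout; parameter values are arbitrary) of the generalized factorial coefficients and of the resulting prior on Kn; the probabilities should sum to one:

    from math import comb, factorial, prod

    def rising(x: float, m: int) -> float:
        """Rising factorial (x)_m."""
        return prod(x + i for i in range(m))

    def gen_fact_coef(n: int, k: int, sigma: float) -> float:
        """C(n, k; sigma) = (1/k!) * sum_i (-1)^i binom(k, i) (-i*sigma)_n."""
        return sum((-1) ** i * comb(k, i) * rising(-i * sigma, n)
                   for i in range(k + 1)) / factorial(k)

    def prior_Kn(n: int, theta: float, sigma: float) -> list[float]:
        """P(K_n = j), j = 1, ..., n, under the PY(theta, sigma) process."""
        base = rising(theta + 1, n - 1)
        return [prod(theta + i * sigma for i in range(1, j))
                / (base * sigma ** j) * gen_fact_coef(n, j, sigma)
                for j in range(1, n + 1)]

    p = prior_Kn(n=20, theta=1.0, sigma=0.4)
    print(sum(p))                                       # ≈ 1.0
    print(max(range(1, 21), key=lambda j: p[j - 1]))    # mode of the prior on K_20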
10 / 25
Gibbs–type priors & the Pitman–Yor process

Prior distribution of the number of clusters as σ varies

[Figure: prior distributions of the number of clusters Kn, with the probability on the vertical axis and the number of clusters and σ on the horizontal axes.]

Prior distributions on the number of clusters corresponding to the PY
process with n = 50, θ = 1 and σ = 0.2, 0.3, . . . , 0.8.

11 / 25
Gibbs–type priors & the Pitman–Yor process

Asymptotics for the number of clusters in the PY case


▶ If σ < 0 and θ = r|σ|, then Kn → r a.s. as n → ∞. Also, conditionally on
  X^(n) featuring Kn = j distinct values, Km^(n) | X^(n) → r − j a.s. as m → ∞.
▶ If σ ∈ (0, 1), one has
\[
\frac{K_n}{n^{\sigma}} \;\xrightarrow{\text{a.s.}}\; Y_{\theta/\sigma} \qquad (n \to \infty),
\]
  where Yq , with q ≥ 0, is a suitably generalized Mittag–Leffler random
  variable [Pitman, 2006].
▶ Conditional on X^(n) featuring Kn = j distinct values, as the size of the
  additional sample m diverges,
\[
\frac{K_m^{(n)}}{m^{\sigma}} \,\Big|\, X^{(n)} \;\xrightarrow{\text{a.s.}}\; Z_{n,j} \qquad (m \to \infty),
\]
  where Zn,j is equal in distribution to B_{j+θ/σ, n/σ−j} · Y_{(θ+n)/σ}, with Ba,b a beta
  random variable independent of Yq [Favaro, Lijoi, Mena & P, 2009].
=⇒ In the DP case Kn grows logarithmically: this restriction has been
overcome by allowing a richer predictive structure, and the growth rate now depends
on the model parameter σ (a simulation sketch follows below).
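A self-contained Python sketch of the n^σ growth (illustrative, not from the handout; it condenses the urn sampler sketched earlier and uses arbitrary parameter values): along a single simulated sample path, Kn/n^σ should stabilize rather than keep growing.

    import random

    theta, sigma = 1.0, 0.5
    counts = []                                   # frequencies N_1, ..., N_{K_n}
    for n in range(20000):                        # n = current sample size before the draw
        k = len(counts)
        if random.random() < (theta + sigma * k) / (theta + n):
            counts.append(1)                      # new species
        else:
            r, acc = random.uniform(0, n - sigma * k), 0.0
            for i, Ni in enumerate(counts):       # tie with X_i* w.p. (N_i - sigma)/(n - sigma*k)
                acc += Ni - sigma
                if r <= acc:
                    counts[i] += 1
                    break
            else:
                counts[-1] += 1                   # numerical safeguard
        if n + 1 in (1000, 5000, 20000):
            print(n + 1, len(counts) / (n + 1) ** sigma)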
12 / 25
Species sampling

Data structure in species sampling problems

▶ X^(n) = basic sample of draws from a population containing different
  species (plants, genes, animals, ...). Information:
  ⋄ sample size n and number of distinct species in the sample Kn ;
  ⋄ a collection of frequencies N = (N1 , . . . , NKn ) s.t. $\sum_{i=1}^{K_n} N_i = n$;
  ⋄ the labels (names) Xi∗ ’s of the distinct species, for i = 1, . . . , Kn .

▶ The information provided by N can also be coded by M := (M1 , . . . , Mn ), where
  Mi = number of species in the sample X^(n) having frequency i.
  Note that $\sum_{i=1}^{n} M_i = K_n$ and $\sum_{i=1}^{n} i\, M_i = n$.

▶ Example: Consider a basic sample such that
  ⋄ n = 10 with j = 4 and frequencies (n1 , n2 , n3 , n4 ) = (2, 5, 2, 1);
  ⋄ equivalently, we can code this information as
    (m1 , m2 , . . . , m10 ) = (1, 2, 0, 0, 1, 0, . . . , 0),
  meaning that one species appears once, two appear twice and one appears five
  times (see the code sketch below).
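A tiny Python sketch (illustrative) of this recoding, applied to the example above:

    def frequencies_to_M(counts: list[int]) -> list[int]:
        """Recode species frequencies N = (N_1, ..., N_{K_n}) into M = (M_1, ..., M_n),
        where M_i is the number of species observed exactly i times."""
        n = sum(counts)
        M = [0] * n
        for Ni in counts:
            M[Ni - 1] += 1
        return M

    M = frequencies_to_M([2, 5, 2, 1])   # the example above: n = 10, j = 4
    print(M)                             # [1, 2, 0, 0, 1, 0, 0, 0, 0, 0]
    assert sum(M) == 4 and sum((i + 1) * m for i, m in enumerate(M)) == 10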

13 / 25
Species sampling

One–step Prediction
▶ Discovery probability estimation: Given a basic sample X (n) , estimate the
probability of discovering at the (n+1)–th sampling step either a new
species or an “ old ” species with frequency r .
▶ Turing estimator [Good, 1953; Mao & Lindsay, 2002]: the estimated probability of
  discovering at the (n+1)–th step a new species is
\[
\frac{m_1}{n}
\]
  and that of a species with frequency r in X^(n) is
\[
(r + 1)\, \frac{m_{r+1}}{n}
\]
  (both are sketched in code at the end of this slide).
▶ Problem: mr+1 is used to estimate the probability of discovering a species
  with frequency r =⇒ counterintuitive! It should be based on mr .
  E.g. if m5 = 10 and m6 = 0, the estimated probability of detecting a species
  with frequency 5 would be 0 =⇒ the problem is bypassed by the use of “smoothing
  functions” but, in I.J. Good’s words, it seems like an “adhockery”!
  Origin of the problem? In a frequentist nonparametric setup there is no
  natural quantity to use for estimating the probability of discovering a new
  species and so m1 is used. Hence, the discovery probability of species with
  frequency 1 uses m2 , and so on.
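A short Python sketch of the Turing estimators (illustrative, not from the handout; the m–vector is the n = 10 example of the previous slide):

    def turing_new(M: list[int]) -> float:
        """Turing estimate of P[observation n+1 is a new species] = m_1 / n."""
        n = sum((i + 1) * m for i, m in enumerate(M))
        return M[0] / n

    def turing_freq(M: list[int], r: int) -> float:
        """Turing estimate of P[observation n+1 is a species seen r times] = (r+1) m_{r+1} / n."""
        n = sum((i + 1) * m for i, m in enumerate(M))
        return (r + 1) * (M[r] if r < len(M) else 0) / n

    M = [1, 2, 0, 0, 1, 0, 0, 0, 0, 0]        # n = 10, with m_1 = 1, m_2 = 2, m_5 = 1
    print(turing_new(M))                      # 0.1
    print(turing_freq(M, 2))                  # 0.0, although two species have frequency 2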
14 / 25
Species sampling

BNP approach to discovery probab. [Lijoi, Mena & P, 2007a]

A key advantage of the Bayesian nonparametric approach is that the
predictive structure includes a positive probability of discovering a new
species, value, cluster, agent etc.
Assume the data (Xn )n≥1 are exchangeable with a PY de Finetti measure.
BNP analog of the Turing estimator: given a basic sample X^(n) featuring Kn = j
distinct species with m1 , . . . , mn s.t. $\sum_{i=1}^{n} i\, m_i = n$:
▶ the probability of discovering a new species is
\[
P[X_{n+1} = \text{“new”} \mid X^{(n)}] = \frac{\theta + \sigma j}{\theta + n};
\]
▶ the probability of detecting a species with frequency r in X^(n) is
\[
P[X_{n+1} = \text{“species with frequency } r\text{”} \mid X^{(n)}] = m_r\, \frac{r - \sigma}{\theta + n}.
\]

=⇒ Probability of sampling a species with frequency r depends, in agreement


with intuition, on mr and also on Kn = j.
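The two estimators above in code (an illustrative sketch, not from the handout; the example sample and parameter values are my own):

    def py_prob_new(n: int, j: int, theta: float, sigma: float) -> float:
        """P[X_{n+1} = 'new' | X^(n)] = (theta + sigma * j) / (theta + n)."""
        return (theta + sigma * j) / (theta + n)

    def py_prob_freq(n: int, m_r: int, r: int, theta: float, sigma: float) -> float:
        """P[X_{n+1} = 'species with frequency r' | X^(n)] = m_r * (r - sigma) / (theta + n)."""
        return m_r * (r - sigma) / (theta + n)

    # n = 10 sample of the earlier example (j = 4, m_2 = 2), with theta = 1, sigma = 0.4:
    print(py_prob_new(n=10, j=4, theta=1.0, sigma=0.4))           # ≈ 0.236
    print(py_prob_freq(n=10, m_r=2, r=2, theta=1.0, sigma=0.4))   # ≈ 0.291, based on m_2 (not m_3)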

15 / 25
Species sampling

More discovery probability estimation problems


Conditionally on a basic sample X (n) , estimate the probability of discovering at
the (n+m+1)–th step either a new species or an “ old ” species with frequency
r without observing the additional sample X (m) := (Xn+1 , . . . , Xn+m ).
Remark. From such an estimator one immediately obtains:
▶ the discovery probability for rare species i.e. the probability of discovering
a species which is either new or has frequency at most τ at the
(n+m+1)–th step =⇒ rare species estimation
▶ an optimal additional sample size: sampling is stopped once the
probability of sampling new or rare species is below a certain threshold
▶ the sample coverage, i.e. the proportion of species in the population
detected with a sample of size n + m.
Frequentist estimators:
▶ Good–Toulmin estimator [Good & Toulmin, 1956; Mao, 2004]: estimator
for the probability of discovering a new species at (n+m+1)–th step.
=⇒ unstable if the size of the additional unobserved sample m is larger
than n (estimated probability becomes either < 0 or > 1).
▶ No frequentist nonparametric estimator for the probability of discovering a
species with frequency r at (n+m+1)–th sampling step is available.
16 / 25
Species sampling

▶ BNP analog of the Good–Toulmin estimator [Favaro, Lijoi, Mena & P,
  2009]: estimator for the probability of discovering a new species at the
  (n+m+1)–th step (here k = Kn denotes the number of distinct species in X^(n))
\[
P[X_{n+m+1} = \text{“new”} \mid X^{(n)}] = \frac{\theta + k\sigma}{\theta + n}\, \frac{(\theta + n + \sigma)_m}{(\theta + n + 1)_m}.
\]

▶ BNP estimator for the probability of discovering a species with frequency
  r at the (n+m+1)–th sampling step [Favaro, Lijoi and P, 2012]:
\[
P[X_{n+m+1} = \text{“species with frequency } r\text{”} \mid X^{(n)}]
= \sum_{i=1}^{r} m_i\, (i - \sigma)_{r+1-i} \binom{m}{r - i} \frac{(\theta + n - i + \sigma)_{m-r+i}}{(\theta + n)_{m+1}}
+ (1 - \sigma)_r \binom{m}{r} \frac{(\theta + k\sigma)(\theta + n + \sigma)_{m-r}}{(\theta + n)_{m+1}}.
\]
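A sketch of the first of these two estimators (illustrative, not from the handout); the ratio of rising factorials is computed factor by factor to avoid overflow, and the plugged-in values of n and k = Kn are hypothetical (the handout does not report Kn for the EST data):

    from math import prod

    def py_new_at_step(n: int, k: int, m: int, theta: float, sigma: float) -> float:
        """BNP estimate of P[X_{n+m+1} = 'new' | X^(n)]:
        (theta + k*sigma)/(theta + n) * (theta + n + sigma)_m / (theta + n + 1)_m."""
        ratio = prod((theta + n + sigma + i) / (theta + n + 1 + i) for i in range(m))
        return (theta + k * sigma) / (theta + n) * ratio

    # With m = 0 this reduces to the one-step probability of the previous slides.
    theta, sigma, n, k = 1.0, 0.5, 950, 500          # hypothetical values
    for m in (0, 500, 1000, 2000):
        print(m, py_new_at_step(n, k, m, theta, sigma))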

17 / 25
Species sampling

Discovery probability at the (n + m + 1)–th sampling step.


[Figure: probability of discovering a new species (vertical axis) against the size of the additional sample (horizontal axis), with PY, Good–Toulmin (GT) and DP estimators for the anaerobic and aerobic libraries.]

EST data from Naegleria gruberi aerobic and anaerobic cDNA libraries
with basic sample n ≈ 950: Good–Toulmin (GT), DP process and PY
process estimators of the probability of discovering a new gene at the
(n + m + 1)–th sampling step for m = 1, . . . , 2000.
18 / 25
Species sampling

Expected number of new genes in an additional sample of size m.

[Figure: expected number of new genes (vertical axis) against the size of the additional sample (horizontal axis), with PY, Good–Toulmin (GT) and DP estimators for the aerobic and anaerobic libraries.]

EST data from Naegleria gruberi aerobic and anaerobic cDNA libraries
with basic sample n ≈ 950: Good–Toulmin (GT), DP process and PY
process estimators of the number of new genes to be observed in an
additional sample of size m = 1, . . . , 2000.
19 / 25
Species sampling

Some remarks on BNP models for species sampling problems

▶ BNP estimators based on Gibbs–type priors are available for the aforementioned
  and other quantities of interest in species sampling problems.
▶ BNP models correspond to large probabilistic models in which all objects
  of potential interest are modeled jointly and coherently, thus leading to
  intuitive predictive structures
  =⇒ this avoids ad–hoc procedures and incoherencies sometimes connected
  with frequentist nonparametric procedures.
▶ Gibbs–type priors with σ > 0 (recall that they assume an infinite number
  of species) are ideally suited for populations with a large unknown number
  of species =⇒ the typical case in Genomics.
▶ In Ecology the “∞” assumption is often too strong =⇒ Gibbs–type priors with
  σ < 0 (surprising heuristic by–product: by combining Gibbs-type priors
  with σ > 0 and σ < 0 it is possible to identify situations in which frequentist
  estimators work).

20 / 25
Frequentist Consistency

Frequentist Posterior Consistency

“What if” or frequentist approach to consistency [Diaconis and Freedman,
1986]: What happens if the data are not exchangeable but i.i.d. from a “true”
P0 ? Does the posterior Q( · | X^(n)) accumulate around P0 as the sample size
increases?
Definition. Q is weakly consistent at P0 if, for every Aε ,
\[
Q(A_\varepsilon \mid X^{(n)}) \;\xrightarrow{n \to \infty}\; 1 \qquad \text{a.s.–} P_0^{\infty},
\]
where Aε is a weak neighbourhood of P0 and P0^∞ denotes the infinite product
measure.
The typical proof strategy for discrete nonparametric priors consists in showing
▶ E[P̃ | X^(n)] → P0 a.s.–P0^∞ as n → ∞ (the key step consists in showing that
  P[Xn+1 = “new” | X^(n)] → 0 a.s.–P0^∞ as n → ∞);
▶ Var[P̃ | X^(n)] → 0 a.s.–P0^∞ as n → ∞, by finding a suitable bound on the variance.

21 / 25
Frequentist Consistency

Consistency of the PY process


Denote by κn the number of distinct values generated by the “ true ” P0 in a
sample of size n. We distinguish two cases of P0 :
▶ P0 is discrete (with finite or infinite support points) =⇒ κn/n → 0 as n → ∞:
  The predictive distribution given a sample generated by P0 is then
\[
P[X_{n+1} \in \cdot \mid X^{(n)}]
= \underbrace{\frac{\theta + \sigma\kappa_n}{\theta + n}}_{\to\, 0} P^*(\cdot)
+ \underbrace{\frac{n - \sigma\kappa_n}{\theta + n}}_{\to\, 1} \sum_{i=1}^{\kappa_n} \underbrace{\frac{N_i - \sigma}{n - \sigma\kappa_n}}_{\sim\, N_i/n} \delta_{X_i^*}(\cdot)
\;\to\; P_0(\cdot)
\]
▶ P0 is diffuse (i.e. P0 ({x}) = 0 ∀x ∈ X) =⇒ κn = n and Ni = 1, ∀i.
\[
P[X_{n+1} \in \cdot \mid X^{(n)}]
= \underbrace{\frac{\theta + \sigma n}{\theta + n}}_{\to\, \sigma} P^*(\cdot)
+ \underbrace{\frac{n - \sigma n}{\theta + n}}_{\to\, 1 - \sigma} \sum_{i=1}^{n} \underbrace{\frac{1 - \sigma}{n - \sigma n}}_{=\, 1/n} \delta_{X_i^*}(\cdot)
\;\to\; \sigma P^*(\cdot) + (1 - \sigma) P_0(\cdot)
\]

22 / 25
Frequentist Consistency

Moreover, in both cases we have Var[P̃ | X^(n)] → 0 a.s.–P0^∞ as n → ∞, and hence:
▶ The PY process is consistent for discrete data generating P0 (e.g. species
sampling problems);
▶ The PY process is inconsistent for diffuse data generating P0 .
Similar results hold for general Gibbs–type priors [De Blasi, Lijoi and P, 2013].
Should one worry about the inconsistency result in the case of diffuse P0 ? NO!
Discrete nonparametric priors are designed to model discrete distributions and
should not be used to model data from continuous distributions. In fact, as the
sample size n diverges:
▶ P0 generates (Xn )n≥1 containing no ties with probability 1;
▶ a discrete de Finetti measure generates (Xn )n≥1 containing no ties with
probability 0
=⇒ model and data generating mechanism are incompatible!
=⇒ There are no good or bad priors, but rather models which are
suitable/compatible/designed for a certain phenomenon and not for others!

23 / 25
Some References

Some References

• Antoniak (1974). Mixtures of Dirichlet processes with applications to Bayesian


nonparametric problems. Ann. Statist. 2, 1152–1174.
• De Blasi, Favaro, Lijoi, Mena, Prünster & Ruggiero (2015). Are Gibbs-type priors
the most natural generalization of the Dirichlet process? IEEE TPAMI 37, 212-229.
• De Blasi, Lijoi, & Prünster (2013). An asymptotic analysis of a class of discrete
nonparametric priors. Statist. Sinica 23, 1299–1322.
• Caron (2012). Bayesian nonparametric models for bipartite graphs. NIPS’2012.
• Caron & Fox (2017), Sparse graphs using exchangeable random measures. J. R.
Stat. Soc. B 79, 1295-1366.
• Diaconis & Freedman (1986). On the consistency of Bayes estimates. Ann. Statist.
14, 1-26.
• Favaro, Lijoi, Mena & Prünster (2009). Bayesian nonparametric inference for species
variety with a two parameter Poisson-Dirichlet process prior. J. Roy. Statist. Soc. Ser.
B 71, 993–1008.
• Favaro, Lijoi & Prünster (2012). A new estimator of the discovery probability.
Biometrics 68, 1188-96.
• Favaro, Prünster & Walker (2011). On a class of random probability measures with
general predictive structure. Scand. J. Statist. 38, 359–376.
• Ferguson (1973). A Bayesian analysis of some nonparametric problems. Ann.
Statist. 1, 209-30.
• Gnedin & Pitman (2006). Exchangeable Gibbs partitions and Stirling triangles. J.
Math. Sci. (N.Y.) 138, 5674-85.

24 / 25
Some References

• Good & Toulmin (1956). The number of new species, and the increase in
population coverage, when a sample is increased. Biometrika 43, 45-63.
• Good (1953). The population frequencies of species and the estimation of
population parameters. Biometrika 40, 237-64.
• Korwar & Hollander (1973). Contributions to the theory of Dirichlet processes.
Ann. Probab. 1, 705-11.
• Lijoi, Mena & Prünster (2007a). Bayesian nonparametric estimation of the
probability of discovering new species. Biometrika 94, 769–786.
• Lijoi, Mena & Prünster (2007b). Controlling the reinforcement in Bayesian
nonparametric mixture models. J. Roy. Statist. Soc. Ser. B 69, 715–740.
• Lo (1991). A characterization of the Dirichlet process. Statist. Probab. Lett. 12,
185–187.
• Mao (2004). Prediction of the conditional probability of discovering a new class. J.
Am. Statist. Assoc. 99, 1108-18.
• Mao & Lindsay (2002). A Poisson model for the coverage problem with a genomic
application. Biometrika 89, 669-81.
• Pitman & Yor (1997). The two-parameter Poisson-Dirichlet distribution derived
from a stable subordinator. Ann. Probab. 25, 855–900.
• Pitman (2006). Combinatorial Stochastic Processes. Lecture Notes in Math.,
vol.1875, Springer, Berlin.
• Regazzini (1978). Intorno ad alcune questioni relative alla definizione del premio
secondo la teoria della credibilità. Giorn. Istit. Ital. Attuari, 41, 77–89.
• Teh (2006). A Hierarchical Bayesian Language Model based on Pitman-Yor
Processes. Coling/ACL 2006, 985-92.

25 / 25
