
Machine Learning II

Part I: An overview of Bayesian Nonparametrics


Predictive approach to BNP

Igor Prünster
Bocconi University

1 / 25
Exchangeability & Prediction

Exchangeability & Prediction

de Finetti’s representation theorem

A sequence of X–valued observations (Xn)n≥1 is exchangeable if and only if

\[
P[X_1 \in A_1, \ldots, X_n \in A_n] = \int_{\mathcal{P}} \prod_{i=1}^{n} P(A_i)\, Q(dP)
\]

for any n ≥ 1, where 𝒫 is the space of probability measures on X.

Prediction within de Finetti’s framework:

▶ Singular (or one–step) prediction
\[
\underbrace{P[X_{n+1} \in A \mid X^{(n)}]}_{\text{predictive distribution}}
= \int_{\mathcal{P}} P(A)\, \underbrace{Q(dP \mid X^{(n)})}_{\text{posterior distribution}}
\]
where throughout we set X^(n) := (X1 , . . . , Xn).
▶ m–step prediction
\[
P[X_{n+1} \in A_1, \ldots, X_{n+m} \in A_m \mid X^{(n)}]
= \int_{\mathcal{P}} \prod_{i=1}^{m} P(A_i)\, Q(dP \mid X^{(n)}).
\]

2 / 25
Exchangeability & Prediction

Two key properties of the Dirichlet process

Recall the following key properties of the Dirichlet process:


▶ For the number of distinct values (species, agents, clusters, components,
  etc.) Kn generated by a sample X^(n) we have
\[
P[K_n = k] = \frac{\theta^k}{(\theta)_n}\, |s(n, k)|, \qquad
E[K_n] = \sum_{i=1}^{n} \frac{\theta}{\theta + i - 1},
\]
  with (θ)n := θ(θ + 1) · · · (θ + n − 1) and |s(n, k)| the signless Stirling
  number of the first kind.
▶ The predictive distributions are of the form
\[
P[X_{n+1} \in \cdot \mid X^{(n)}]
= \underbrace{\frac{\theta}{\theta + n}}_{P[X_{n+1} = \text{“new”} \mid X^{(n)}]} \underbrace{P^*(\cdot)}_{\text{prior guess}}
+ \underbrace{\frac{n}{\theta + n}}_{P[X_{n+1} = \text{“old”} \mid X^{(n)}]} \underbrace{\frac{1}{n} \sum_{i=1}^{K_n} N_i\, \delta_{X_i^*}(\cdot)}_{\text{empirical measure}}
\]
  (a simulation sketch of this predictive scheme follows below).
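The predictive above is the well-known Blackwell–MacQueen urn scheme. As a quick illustration (a minimal Python sketch, not part of the original handout; parameter values are arbitrary), one can simulate the urn and check the formula for E[Kn] by Monte Carlo:

    import random

    def simulate_Kn(theta: float, n: int) -> int:
        """Number of distinct values K_n in a DP(theta, P*) sample of size n."""
        counts = []          # counts[j] = N_j, frequency of the j-th distinct value
        for i in range(n):   # i = current sample size before drawing X_{i+1}
            if random.random() < theta / (theta + i):
                counts.append(1)   # "new" value, drawn from the diffuse prior guess P*
            else:
                # "old" value: pick an existing one with probability N_j / i
                r, acc = random.uniform(0, i), 0.0
                for j, Nj in enumerate(counts):
                    acc += Nj
                    if r <= acc:
                        counts[j] += 1
                        break
        return len(counts)

    theta, n, reps = 1.0, 500, 2000
    mc = sum(simulate_Kn(theta, n) for _ in range(reps)) / reps
    exact = sum(theta / (theta + i - 1) for i in range(1, n + 1))
    print(f"Monte Carlo E[K_n] ≈ {mc:.2f}, exact formula = {exact:.2f}")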

3 / 25
Exchangeability & Prediction

m–step prediction for the Dirichlet process

Given a basic sample X (n) , prediction of various features of an additional


sample X (m) := (Xn+1 , . . . , Xn+m ) is of interest. In particular,

Km(n) := Km+n − Kn

is the number of new species to be recorded in X (m) given X (n) .


In the DP case [Favaro, P & Walker, 2011] one has
\[
P\big[K_m^{(n)} = j \mid X^{(n)}\big] = \frac{\theta^j\, (\theta)_n}{(\theta)_{n+m}}\, |s(m, j, n)|, \qquad
E\big[K_m^{(n)} \mid X^{(n)}\big] = \sum_{i=1}^{m} \frac{\theta}{\theta + n + i - 1},
\]
with |s(n, k, r)| denoting the non–central version of the signless Stirling number.


=⇒ The only sample information affecting the distribution of Km^(n) | X^(n) is the
size n; it depends neither on Kn nor on N1 , . . . , NKn !
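As a small illustration (not from the handout; the numbers are arbitrary), the expected number of new species under the DP is a one-line computation and already hints at the logarithmic behaviour discussed on the next slide:

    def dp_expected_new(theta: float, n: int, m: int) -> float:
        """E[K_m^(n) | X^(n)] = sum_{i=1}^m theta/(theta + n + i - 1) for the DP."""
        return sum(theta / (theta + n + i - 1) for i in range(1, m + 1))

    # Depends only on n (and theta), not on K_n or the frequencies:
    print(dp_expected_new(theta=1.0, n=100, m=1000))    # ≈ 2.39
    print(dp_expected_new(theta=1.0, n=100, m=10000))   # ≈ 4.61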

4 / 25
Exchangeability & Prediction

Since P[Xn+1 = “new” | X^(n)] = θ/(θ + n) for any sample size n, clearly Kn will
diverge as the sample size n → ∞. But what is its growth rate?
▶ In the unconditional case, by Korwar and Hollander (1973), one has
\[
\frac{K_n}{\log n} \;\xrightarrow{\text{a.s.}}\; \theta \qquad (n \to \infty).
\]
▶ In the conditional case, given a fixed sample X^(n), one has
\[
\frac{K_m^{(n)}}{\log m} \,\Big|\, X^{(n)} \;\xrightarrow{\text{a.s.}}\; \theta \qquad (m \to \infty).
\]

Remark: Ideally one would like a model with a flexible growth rate which
depends on the model parameters (e.g. n^σ with parameter σ ∈ (0, 1)). Instead,
the DP displays a logarithmic growth and such behaviour is unaffected by the
sample X^(n) =⇒ highly restrictive and inappropriate in applications [e.g.
linguistics (Teh, 2006), certain bipartite graphs (Caron, 2012), network models
(Caron & Fox, 2017) and species sampling].
Can such a model really be considered nonparametric? What is responsible
for such a restrictive behaviour?
It all boils down to P[Xn+1 = “new” | X^(n)] = θ/(θ + n)!
5 / 25
Gibbs–type priors & the Pitman–Yor process

Probability of discovering a new species


As seen, a key quantity in the analysis of discrete nonparametric priors is the
probability of discovering a new species

P[Xn+1 = “new” | X (n) ]. (∗)

Fundamental Characterization:
Based on (∗), discrete nonparametric priors P̃ can be classified into 3 categories [De Blasi et al., 2015]:
(a) P[Xn+1 = “new” | X (n) ] = f (n, model parameters)
⇐⇒ depends on n but not on Kn and Nn = (N1 , . . . , NKn )
⇐⇒ Dirichlet process;
(b) P[Xn+1 = “new” | X (n) ] = f (n, Kn , model parameters)
⇐⇒ depends on n and Kn but not on Nn = (N1 , . . . , NKn )
⇐⇒ Gibbs–type priors [Gnedin & Pitman, 2006];
(c) P[Xn+1 = “new” | X (n) ] = f (n, Kn , Nn , model parameters)
⇐⇒ depends on all information conveyed by the sample i.e. n, Kn and
Nn = (N1 , . . . , NKn )
⇐⇒ serious tractability issues.
=⇒ Let’s go for the intermediate case which exhibits a richer predictive
structure than the DP!
6 / 25
Gibbs–type priors & the Pitman–Yor process

Gibbs–type priors

Q is a Gibbs-type prior of order σ ∈ (−∞, 1) if and only if it gives rise to


predictive distributions of the form

\[
P[X_{n+1} \in \cdot \mid X^{(n)}]
= \underbrace{\frac{V_{n+1, K_n+1}}{V_{n, K_n}}}_{P[X_{n+1} = \text{“new”} \mid X^{(n)}]} \underbrace{P^*(\cdot)}_{\text{prior guess}}
+ \underbrace{\left(1 - \frac{V_{n+1, K_n+1}}{V_{n, K_n}}\right)}_{P[X_{n+1} = \text{“old”} \mid X^{(n)}]} \underbrace{\frac{\sum_{i=1}^{K_n} (N_i - \sigma)\, \delta_{X_i^*}(\cdot)}{n - \sigma K_n}}_{\text{weighted empirical measure}}
\]

where P ∗ is diffuse and {Vn,j : n ≥ 1, 1 ≤ j ≤ n} is a set of weights which


satisfy the recursion

Vn,j = (n − jσ)Vn+1,j + Vn+1,j+1 . (♢)

=⇒ A Gibbs–type prior is completely characterized by the choice of P ∗, of σ < 1 and
of a set of weights Vn,j .
=⇒ Crucially, P[Xn+1 = “new” | X^(n)] now depends on both the sample size n
and the number of distinct values in the sample Kn .

7 / 25
Gibbs–type priors & the Pitman–Yor process

The Pitman-Yor process & the NGG process


By defining, for 0 ≤ σ < 1 and θ > −σ, or σ < 0 and θ = r|σ| with r ∈ N,
\[
V_{n,k} = \frac{\prod_{i=1}^{k-1} (\theta + i\sigma)}{(\theta + 1)_{n-1}}
\]
one obtains the two parameter Poisson–Dirichlet process aka Pitman–Yor (PY)
process [Pitman & Yor, 1997], which yields
\[
P\big[X_{n+1} \in \cdot \mid X^{(n)}\big] = \frac{\theta + \sigma K_n}{\theta + n}\, P^*(\cdot)
+ \frac{n - \sigma K_n}{\theta + n}\, \frac{\sum_{i=1}^{K_n} (N_i - \sigma)\, \delta_{X_i^*}(\cdot)}{n - \sigma K_n}.
\]
=⇒ if σ = 0, the PY process reduces to the Dirichlet process and (θ + σKn)/(θ + n) to θ/(θ + n).
The normalized generalized gamma process (NGG) arises with
\[
V_{n,k} = \frac{e^{\beta}\, \sigma^{k-1}}{\Gamma(n)} \sum_{i=0}^{n-1} \binom{n-1}{i} (-1)^i\, \beta^{i/\sigma}\, \Gamma\!\Big(k - \frac{i}{\sigma};\, \beta\Big),
\]
where β > 0, σ ∈ (0, 1) and Γ(x; a) denotes the incomplete gamma function. If
σ = 1/2 it reduces to the normalized inverse Gaussian process (N–IG) [Lijoi,
Mena & P, 2005 & 2007b].
=⇒ In the following we restrict attention to the PY process but most results
carry over (with minor modifications) to general Gibbs–type priors
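Before moving on, a small Python sketch (illustrative, not part of the handout; function names and parameter values are my own) checks numerically that the PY weights above satisfy the Gibbs recursion (♢) and reproduce P[Xn+1 = “new” | X^(n)] = (θ + σKn)/(θ + n):

    from math import prod

    def rising(x: float, m: int) -> float:
        """Rising factorial (x)_m = x (x+1) ... (x+m-1)."""
        return prod(x + i for i in range(m))

    def V(n: int, k: int, theta: float, sigma: float) -> float:
        """PY weights V_{n,k} = prod_{i=1}^{k-1}(theta + i*sigma) / (theta + 1)_{n-1}."""
        return prod(theta + i * sigma for i in range(1, k)) / rising(theta + 1, n - 1)

    theta, sigma = 1.0, 0.4
    for n, j in [(5, 2), (10, 4), (20, 7)]:
        lhs = V(n, j, theta, sigma)
        rhs = (n - j * sigma) * V(n + 1, j, theta, sigma) + V(n + 1, j + 1, theta, sigma)
        p_new = V(n + 1, j + 1, theta, sigma) / V(n, j, theta, sigma)
        print(f"n={n}, j={j}: recursion gap {abs(lhs - rhs):.1e}, "
              f"P[new] {p_new:.4f} vs (theta+j*sigma)/(theta+n) {(theta + j * sigma) / (theta + n):.4f}")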
8 / 25
Gibbs–type priors & the Pitman–Yor process

A closer look at the predictive structure


The Gibbs structure allows one to view the predictive distribution as the result of two steps:
(1) Xn+1 is a new species with probability
\[
\frac{\theta + \sigma K_n}{\theta + n}
\]
    or an “old” one, i.e. one of {X1∗, . . . , XKn∗}, with probability (n − σKn)/(θ + n).
    =⇒ depends on n and Kn but not on the frequencies Nn = (N1 , . . . , NKn )
    =⇒ P[Xn+1 = “new” | X^(n)] is monotonically increasing or decreasing in
    Kn according to whether σ > 0 or σ < 0, respectively.
(2) (i) Given Xn+1 is new, it is sampled independently from P ∗.
    (ii) Given Xn+1 is a tie, it coincides with Xi∗ with probability
\[
\frac{N_i - \sigma}{n - \sigma K_n}.
\]
    =⇒ A reinforcement mechanism driven by σ takes place among the
    “old” values. For instance, if N1 = 2 and N2 = 1 then
\[
\frac{P[X_{n+1} = X_1^* \mid X^{(n)}]}{P[X_{n+1} = X_2^* \mid X^{(n)}]} = \frac{2 - \sigma}{1 - \sigma}
\begin{cases} < 2 & \text{if } \sigma < 0 \\ = 2 & \text{if } \sigma = 0 \\ > 2 & \text{if } \sigma > 0 \end{cases}
\qquad \text{e.g.} \qquad
\begin{cases} 1.5 & \text{if } \sigma = -1 \\ 2 & \text{if } \sigma = 0 \\ 3 & \text{if } \sigma = 0.5 \end{cases}
\]
    (a code sketch of this two–step scheme follows below).
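A minimal Python sketch of this two–step sampling scheme (illustrative, not from the handout; the function name py_next and the parameter values are my own):

    import random

    def py_next(counts: list[int], theta: float, sigma: float) -> int:
        """One step of the PY predictive: return the index 0..K_n-1 of the tied
        species, or K_n if X_{n+1} is a new species (to be drawn from P*)."""
        n, k = sum(counts), len(counts)
        if random.random() < (theta + sigma * k) / (theta + n):
            return k                         # step (1): new species
        r, acc = random.uniform(0, n - sigma * k), 0.0
        for i, Ni in enumerate(counts):      # step (2)(ii): tie with X_i* w.p. (N_i - sigma)/(n - sigma*k)
            acc += Ni - sigma
            if r <= acc:
                return i
        return k - 1                         # numerical safeguard

    # Reinforcement driven by sigma: with frequencies N = (2, 1), the odds of a tie
    # with X_1* versus X_2* are (2 - sigma)/(1 - sigma), i.e. 1.5, 2, 3 for sigma = -1, 0, 0.5.
    for s in (-1.0, 0.0, 0.5):
        print(s, (2 - s) / (1 - s))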

9 / 25
Gibbs–type priors & the Pitman–Yor process

The number of clusters generated by the PY process


The (prior) distribution of the number of clusters [Pitman, 2006] is given by
\[
P(K_n = j) = \frac{\prod_{i=1}^{j-1} (\theta + i\sigma)}{(\theta + 1)_{n-1}\, \sigma^{j}}\, C(n, j; \sigma)
\]
with $C(n, j; \sigma) = \frac{1}{j!} \sum_{i=0}^{j} (-1)^i \binom{j}{i} (-i\sigma)_n$ a generalized factorial coefficient.

In general, the dependence of the distribution of Kn on the prior parameters is:


▶ σ controls the “ flatness ” (or variability) of the (prior) distribution of Kn .
▶ θ controls the location of the (prior) distribution of Kn
Recall that Km^(n) is the number of new species to be recorded in the additional
sample of size m given X^(n) featuring Kn = j distinct values. One has that
\[
P\big(K_m^{(n)} = k \mid X^{(n)}\big)
= \frac{(\theta + 1)_{n-1} \prod_{i=j}^{j+k-1} (\theta + i\sigma)}{(\theta + 1)_{n+m-1}\, \sigma^{k}}\, C(m, k; \sigma, -n + j\sigma),
\]
with C(m, k; σ, γ) a non–central generalized factorial coefficient, and also that the expected number of new species is
\[
E\big[K_m^{(n)} \mid X^{(n)}\big] = \Big(j + \frac{\theta}{\sigma}\Big) \left(\frac{(\theta + n + \sigma)_m}{(\theta + n)_m} - 1\right).
\]
See Lijoi, Mena & P (2007a) and Favaro, Lijoi, Mena & P (2009).
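A small Python sketch (illustrative, not from the handout; parameter values are arbitrary) of the generalized factorial coefficients and of the resulting prior on Kn; the probabilities should sum to one:

    from math import comb, factorial, prod

    def rising(x: float, m: int) -> float:
        """Rising factorial (x)_m."""
        return prod(x + i for i in range(m))

    def gen_fact_coef(n: int, k: int, sigma: float) -> float:
        """C(n, k; sigma) = (1/k!) * sum_i (-1)^i binom(k, i) (-i*sigma)_n."""
        return sum((-1) ** i * comb(k, i) * rising(-i * sigma, n)
                   for i in range(k + 1)) / factorial(k)

    def prior_Kn(n: int, theta: float, sigma: float) -> list[float]:
        """P(K_n = j), j = 1, ..., n, under the PY(theta, sigma) process."""
        base = rising(theta + 1, n - 1)
        return [prod(theta + i * sigma for i in range(1, j))
                / (base * sigma ** j) * gen_fact_coef(n, j, sigma)
                for j in range(1, n + 1)]

    p = prior_Kn(n=20, theta=1.0, sigma=0.4)
    print(sum(p))                                       # ≈ 1.0
    print(max(range(1, 21), key=lambda j: p[j - 1]))    # mode of the prior on K_20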
10 / 25
Gibbs–type priors & the Pitman–Yor process

Prior distribution of the number of clusters as σ varies

[Figure: prior distributions of the number of clusters Kn, with the probability on the vertical axis and the number of clusters and σ on the horizontal axes.]

Prior distributions on the number of clusters corresponding to the PY
process with n = 50, θ = 1 and σ = 0.2, 0.3, . . . , 0.8.

11 / 25
Gibbs–type priors & the Pitman–Yor process

Asymptotics for the number of clusters in the PY case


▶ If σ < 0 and θ = r|σ|, then Kn → r a.s. as n → ∞. Also, conditionally on
  X^(n) featuring Kn = j distinct values, Km^(n) | X^(n) → r − j a.s. as m → ∞.
▶ If σ ∈ (0, 1), one has
\[
\frac{K_n}{n^{\sigma}} \;\xrightarrow{\text{a.s.}}\; Y_{\theta/\sigma} \qquad (n \to \infty),
\]
  where Yq , with q ≥ 0, is a suitably generalized Mittag–Leffler random
  variable [Pitman, 2006].
▶ Conditional on X^(n) featuring Kn = j distinct values, as the size of the
  additional sample m diverges,
\[
\frac{K_m^{(n)}}{m^{\sigma}} \,\Big|\, X^{(n)} \;\xrightarrow{\text{a.s.}}\; Z_{n,j} \qquad (m \to \infty),
\]
  where Zn,j is equal in distribution to B_{j+θ/σ, n/σ−j} · Y_{(θ+n)/σ}, with Ba,b a beta
  random variable independent of Yq [Favaro, Lijoi, Mena & P, 2009].
=⇒ In the DP case Kn grows logarithmically: this restriction has been
overcome by allowing a richer predictive structure, and the growth rate now depends
on the model parameter σ (a simulation sketch follows below).
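A self-contained Python sketch of the n^σ growth (illustrative, not from the handout; it condenses the urn sampler sketched earlier and uses arbitrary parameter values): along a single simulated sample path, Kn/n^σ should stabilize rather than keep growing.

    import random

    theta, sigma = 1.0, 0.5
    counts = []                                   # frequencies N_1, ..., N_{K_n}
    for n in range(20000):                        # n = current sample size before the draw
        k = len(counts)
        if random.random() < (theta + sigma * k) / (theta + n):
            counts.append(1)                      # new species
        else:
            r, acc = random.uniform(0, n - sigma * k), 0.0
            for i, Ni in enumerate(counts):       # tie with X_i* w.p. (N_i - sigma)/(n - sigma*k)
                acc += Ni - sigma
                if r <= acc:
                    counts[i] += 1
                    break
            else:
                counts[-1] += 1                   # numerical safeguard
        if n + 1 in (1000, 5000, 20000):
            print(n + 1, len(counts) / (n + 1) ** sigma)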
12 / 25
Species sampling

Data structure in species sampling problems

▶ X^(n) = basic sample of draws from a population containing different
  species (plants, genes, animals, ...). Information:
  ⋄ sample size n and number of distinct species in the sample Kn ;
  ⋄ a collection of frequencies N = (N1 , . . . , NKn ) s.t. $\sum_{i=1}^{K_n} N_i = n$;
  ⋄ the labels (names) Xi∗ ’s of the distinct species, for i = 1, . . . , Kn .

▶ The information provided by N can also be coded by M := (M1 , . . . , Mn ), where
  Mi = number of species in the sample X^(n) having frequency i.
  Note that $\sum_{i=1}^{n} M_i = K_n$ and $\sum_{i=1}^{n} i\, M_i = n$.

▶ Example: Consider a basic sample such that
  ⋄ n = 10 with j = 4 and frequencies (n1 , n2 , n3 , n4 ) = (2, 5, 2, 1);
  ⋄ equivalently, we can code this information as
    (m1 , m2 , . . . , m10 ) = (1, 2, 0, 0, 1, 0, . . . , 0),
  meaning that one species appears once, two appear twice and one appears five
  times (see the code sketch below).
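A tiny Python sketch (illustrative) of this recoding, applied to the example above:

    def frequencies_to_M(counts: list[int]) -> list[int]:
        """Recode species frequencies N = (N_1, ..., N_{K_n}) into M = (M_1, ..., M_n),
        where M_i is the number of species observed exactly i times."""
        n = sum(counts)
        M = [0] * n
        for Ni in counts:
            M[Ni - 1] += 1
        return M

    M = frequencies_to_M([2, 5, 2, 1])   # the example above: n = 10, j = 4
    print(M)                             # [1, 2, 0, 0, 1, 0, 0, 0, 0, 0]
    assert sum(M) == 4 and sum((i + 1) * m for i, m in enumerate(M)) == 10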

13 / 25
Species sampling

One–step Prediction
▶ Discovery probability estimation: Given a basic sample X (n) , estimate the
probability of discovering at the (n+1)–th sampling step either a new
species or an “ old ” species with frequency r .
▶ Turing estimator [Good, 1953; Mao & Lindsay, 2002]: the estimated probability of
  discovering at the (n+1)–th step a new species is
\[
\frac{m_1}{n}
\]
  and that of a species with frequency r in X^(n) is
\[
(r + 1)\, \frac{m_{r+1}}{n}
\]
  (both are sketched in code at the end of this slide).
▶ Problem: mr+1 is used to estimate the probability of discovering a species
  with frequency r =⇒ counterintuitive! It should be based on mr .
  E.g. if m5 = 10 and m6 = 0, the estimated probability of detecting a species
  with frequency 5 would be 0 =⇒ the problem is bypassed by the use of “smoothing
  functions” but, in I.J. Good’s words, it seems like an “adhockery”!
  Origin of the problem? In a frequentist nonparametric setup there is no
  natural quantity to use for estimating the probability of discovering a new
  species and so m1 is used. Hence, the discovery probability of species with
  frequency 1 uses m2 , and so on.
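A short Python sketch of the Turing estimators (illustrative, not from the handout; the m–vector is the n = 10 example of the previous slide):

    def turing_new(M: list[int]) -> float:
        """Turing estimate of P[observation n+1 is a new species] = m_1 / n."""
        n = sum((i + 1) * m for i, m in enumerate(M))
        return M[0] / n

    def turing_freq(M: list[int], r: int) -> float:
        """Turing estimate of P[observation n+1 is a species seen r times] = (r+1) m_{r+1} / n."""
        n = sum((i + 1) * m for i, m in enumerate(M))
        return (r + 1) * (M[r] if r < len(M) else 0) / n

    M = [1, 2, 0, 0, 1, 0, 0, 0, 0, 0]        # n = 10, with m_1 = 1, m_2 = 2, m_5 = 1
    print(turing_new(M))                      # 0.1
    print(turing_freq(M, 2))                  # 0.0, although two species have frequency 2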
14 / 25
Species sampling

BNP approach to discovery probab. [Lijoi, Mena & P, 2007a]

A key advantage of the Bayesian nonparametric approach is that the
predictive structure includes a positive probability of discovering a new
species, value, cluster, agent etc.
Assume the data (Xn )n≥1 are exchangeable with a PY de Finetti measure.
BNP analog of the Turing estimator: given a basic sample X^(n) featuring Kn = j
distinct species with m1 , . . . , mn s.t. $\sum_{i=1}^{n} i\, m_i = n$:
▶ the probability of discovering a new species is
\[
P[X_{n+1} = \text{“new”} \mid X^{(n)}] = \frac{\theta + \sigma j}{\theta + n};
\]
▶ the probability of detecting a species with frequency r in X^(n) is
\[
P[X_{n+1} = \text{“species with frequency } r\text{”} \mid X^{(n)}] = m_r\, \frac{r - \sigma}{\theta + n}.
\]

=⇒ Probability of sampling a species with frequency r depends, in agreement


with intuition, on mr and also on Kn = j.
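The two estimators above in code (an illustrative sketch, not from the handout; the example sample and parameter values are my own):

    def py_prob_new(n: int, j: int, theta: float, sigma: float) -> float:
        """P[X_{n+1} = 'new' | X^(n)] = (theta + sigma * j) / (theta + n)."""
        return (theta + sigma * j) / (theta + n)

    def py_prob_freq(n: int, m_r: int, r: int, theta: float, sigma: float) -> float:
        """P[X_{n+1} = 'species with frequency r' | X^(n)] = m_r * (r - sigma) / (theta + n)."""
        return m_r * (r - sigma) / (theta + n)

    # n = 10 sample of the earlier example (j = 4, m_2 = 2), with theta = 1, sigma = 0.4:
    print(py_prob_new(n=10, j=4, theta=1.0, sigma=0.4))           # ≈ 0.236
    print(py_prob_freq(n=10, m_r=2, r=2, theta=1.0, sigma=0.4))   # ≈ 0.291, based on m_2 (not m_3)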

15 / 25
Species sampling

More discovery probability estimation problems


Conditionally on a basic sample X (n) , estimate the probability of discovering at
the (n+m+1)–th step either a new species or an “ old ” species with frequency
r without observing the additional sample X (m) := (Xn+1 , . . . , Xn+m ).
Remark. From such an estimator one immediately obtains:
▶ the discovery probability for rare species i.e. the probability of discovering
a species which is either new or has frequency at most τ at the
(n+m+1)–th step =⇒ rare species estimation
▶ an optimal additional sample size: sampling is stopped once the
probability of sampling new or rare species is below a certain threshold
▶ the sample coverage, i.e. the proportion of species in the population
detected with a sample of size n + m.
Frequentist estimators:
▶ Good–Toulmin estimator [Good & Toulmin, 1956; Mao, 2004]: estimator
for the probability of discovering a new species at (n+m+1)–th step.
=⇒ unstable if the size of the additional unobserved sample m is larger
than n (estimated probability becomes either < 0 or > 1).
▶ No frequentist nonparametric estimator for the probability of discovering a
species with frequency r at (n+m+1)–th sampling step is available.
16 / 25
Species sampling

▶ BNP analog of the Good–Toulmin estimator [Favaro, Lijoi, Mena & P,
  2009]: estimator for the probability of discovering a new species at the
  (n+m+1)–th step (here k = Kn denotes the number of distinct species in X^(n))
\[
P[X_{n+m+1} = \text{“new”} \mid X^{(n)}] = \frac{\theta + k\sigma}{\theta + n}\, \frac{(\theta + n + \sigma)_m}{(\theta + n + 1)_m}.
\]

▶ BNP estimator for the probability of discovering a species with frequency
  r at the (n+m+1)–th sampling step [Favaro, Lijoi and P, 2012]:
\[
P[X_{n+m+1} = \text{“species with frequency } r\text{”} \mid X^{(n)}]
= \sum_{i=1}^{r} m_i\, (i - \sigma)_{r+1-i} \binom{m}{r - i} \frac{(\theta + n - i + \sigma)_{m-r+i}}{(\theta + n)_{m+1}}
+ (1 - \sigma)_r \binom{m}{r} \frac{(\theta + k\sigma)(\theta + n + \sigma)_{m-r}}{(\theta + n)_{m+1}}.
\]
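A sketch of the first of these two estimators (illustrative, not from the handout); the ratio of rising factorials is computed factor by factor to avoid overflow, and the plugged-in values of n and k = Kn are hypothetical (the handout does not report Kn for the EST data):

    from math import prod

    def py_new_at_step(n: int, k: int, m: int, theta: float, sigma: float) -> float:
        """BNP estimate of P[X_{n+m+1} = 'new' | X^(n)]:
        (theta + k*sigma)/(theta + n) * (theta + n + sigma)_m / (theta + n + 1)_m."""
        ratio = prod((theta + n + sigma + i) / (theta + n + 1 + i) for i in range(m))
        return (theta + k * sigma) / (theta + n) * ratio

    # With m = 0 this reduces to the one-step probability of the previous slides.
    theta, sigma, n, k = 1.0, 0.5, 950, 500          # hypothetical values
    for m in (0, 500, 1000, 2000):
        print(m, py_new_at_step(n, k, m, theta, sigma))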

17 / 25
Species sampling

Discovery probability at the (n + m + 1)–th sampling step.


[Figure: probability of discovering a new species (vertical axis) against the size of the additional sample (horizontal axis), with PY, Good–Toulmin (GT) and DP estimators for the anaerobic and aerobic libraries.]

EST data from Naegleria gruberi aerobic and anaerobic cDNA libraries
with basic sample n ≈ 950: Good–Toulmin (GT), DP process and PY
process estimators of the probability of discovering a new gene at the
(n + m + 1)–th sampling step for m = 1, . . . , 2000.
18 / 25
Species sampling

Expected number of new genes in an additional sample of size m.

[Figure: expected number of new genes (vertical axis) against the size of the additional sample (horizontal axis), with PY, Good–Toulmin (GT) and DP estimators for the aerobic and anaerobic libraries.]

EST data from Naegleria gruberi aerobic and anaerobic cDNA libraries
with basic sample n ≈ 950: Good–Toulmin (GT), DP process and PY
process estimators of the number of new genes to be observed in an
additional sample of size m = 1, . . . , 2000.
19 / 25
Species sampling

Some remarks on BNP models for species sampling problems

▶ BNP estimators based on Gibbs–type priors are available for the aforementioned
  and other quantities of interest in species sampling problems.
▶ BNP models correspond to large probabilistic models in which all objects
  of potential interest are modeled jointly and coherently, thus leading to
  intuitive predictive structures
  =⇒ this avoids ad–hoc procedures and incoherencies sometimes connected
  with frequentist nonparametric procedures.
▶ Gibbs–type priors with σ > 0 (recall that they assume an infinite number
  of species) are ideally suited for populations with a large unknown number
  of species =⇒ the typical case in Genomics.
▶ In Ecology the “∞” assumption is often too strong =⇒ Gibbs–type priors with
  σ < 0 (surprising heuristic by–product: by combining Gibbs-type priors
  with σ > 0 and σ < 0 it is possible to identify situations in which frequentist
  estimators work).

20 / 25
Frequentist Consistency

Frequentist Posterior Consistency

“What if” or frequentist approach to consistency [Diaconis and Freedman,
1986]: What happens if the data are not exchangeable but i.i.d. from a “true”
P0 ? Does the posterior Q( · | X^(n)) accumulate around P0 as the sample size
increases?
Definition. Q is weakly consistent at P0 if, for every Aε ,
\[
Q(A_\varepsilon \mid X^{(n)}) \;\xrightarrow{n \to \infty}\; 1 \qquad \text{a.s.–} P_0^{\infty},
\]
where Aε is a weak neighbourhood of P0 and P0^∞ denotes the infinite product
measure.
The typical proof strategy for discrete nonparametric priors consists in showing
▶ E[P̃ | X^(n)] → P0 a.s.–P0^∞ as n → ∞ (the key step consists in showing that
  P[Xn+1 = “new” | X^(n)] → 0 a.s.–P0^∞ as n → ∞);
▶ Var[P̃ | X^(n)] → 0 a.s.–P0^∞ as n → ∞, by finding a suitable bound on the variance.

21 / 25
Frequentist Consistency

Consistency of the PY process


Denote by κn the number of distinct values generated by the “ true ” P0 in a
sample of size n. We distinguish two cases of P0 :
▶ P0 is discrete (with finite or infinite support points) =⇒ κn/n → 0 as n → ∞:
  The predictive distribution given a sample generated by P0 is then
\[
P[X_{n+1} \in \cdot \mid X^{(n)}]
= \underbrace{\frac{\theta + \sigma\kappa_n}{\theta + n}}_{\to\, 0} P^*(\cdot)
+ \underbrace{\frac{n - \sigma\kappa_n}{\theta + n}}_{\to\, 1} \sum_{i=1}^{\kappa_n} \underbrace{\frac{N_i - \sigma}{n - \sigma\kappa_n}}_{\sim\, N_i/n} \delta_{X_i^*}(\cdot)
\;\to\; P_0(\cdot)
\]
▶ P0 is diffuse (i.e. P0 ({x}) = 0 ∀x ∈ X) =⇒ κn = n and Ni = 1, ∀i.
\[
P[X_{n+1} \in \cdot \mid X^{(n)}]
= \underbrace{\frac{\theta + \sigma n}{\theta + n}}_{\to\, \sigma} P^*(\cdot)
+ \underbrace{\frac{n - \sigma n}{\theta + n}}_{\to\, 1 - \sigma} \sum_{i=1}^{n} \underbrace{\frac{1 - \sigma}{n - \sigma n}}_{=\, 1/n} \delta_{X_i^*}(\cdot)
\;\to\; \sigma P^*(\cdot) + (1 - \sigma) P_0(\cdot)
\]

22 / 25
Frequentist Consistency

Moreover, in both cases we have Var[P̃ | X^(n)] → 0 a.s.–P0^∞ as n → ∞, and hence:
▶ The PY process is consistent for discrete data generating P0 (e.g. species
sampling problems);
▶ The PY process is inconsistent for diffuse data generating P0 .
Similar results hold for general Gibbs–type priors [De Blasi, Lijoi and P, 2013].
Should one worry about the inconsistency result in the case of diffuse P0 ? NO!
Discrete nonparametric priors are designed to model discrete distributions and
should not be used to model data from continuous distributions. In fact, as the
sample size n diverges:
▶ P0 generates (Xn )n≥1 containing no ties with probability 1;
▶ a discrete de Finetti measure generates (Xn )n≥1 containing no ties with
probability 0
=⇒ model and data generating mechanism are incompatible!
=⇒ There are no good or bad priors, but rather models which are
suitable/compatible/designed for a certain phenomenon and not for others!

23 / 25
Some References

Some References

• Antoniak (1974). Mixtures of Dirichlet processes with applications to Bayesian


nonparametric problems. Ann. Statist. 2, 1152–1174.
• De Blasi, Favaro, Lijoi, Mena, Prünster & Ruggiero (2015). Are Gibbs-type priors
the most natural generalization of the Dirichlet process? IEEE TPAMI 37, 212-229.
• De Blasi, Lijoi, & Prünster (2013). An asymptotic analysis of a class of discrete
nonparametric priors. Statist. Sinica 23, 1299–1322.
• Caron (2012). Bayesian nonparametric models for bipartite graphs. NIPS’2012.
• Caron & Fox (2017), Sparse graphs using exchangeable random measures. J. R.
Stat. Soc. B 79, 1295-1366.
• Diaconis & Freedman (1986). On the consistency of Bayes estimates. Ann. Statist.
14, 1-26.
• Favaro, Lijoi, Mena & Prünster (2009). Bayesian nonparametric inference for species
variety with a two parameter Poisson-Dirichlet process prior. J. Roy. Statist. Soc. Ser.
B 71, 993–1008.
• Favaro, Lijoi & Prünster (2012). A new estimator of the discovery probability.
Biometrics 68, 1188-96.
• Favaro, Prünster & Walker (2011). On a class of random probability measures with
general predictive structure. Scand. J. Statist. 38, 359–376.
• Ferguson (1973). A Bayesian analysis of some nonparametric problems. Ann.
Statist. 1, 209-30.
• Gnedin & Pitman (2006). Exchangeable Gibbs partitions and Stirling triangles. J.
Math. Sci. (N.Y.) 138, 5674-85.

24 / 25
Some References

• Good & Toulmin (1956). The number of new species, and the increase in
population coverage, when a sample is increased. Biometrika 43, 45-63.
• Good (1953). The population frequencies of species and the estimation of
population parameters. Biometrika 40, 237-64.
• Korwar & Hollander (1973). Contributions to the theory of Dirichlet processes.
Ann. Probab. 1, 705-11.
• Lijoi, Mena & Prünster (2007a). Bayesian nonparametric estimation of the
probability of discovering new species. Biometrika 94, 769–786.
• Lijoi, Mena & Prünster (2007b). Controlling the reinforcement in Bayesian
nonparametric mixture models. J. Roy. Statist. Soc. Ser. B 69, 715–740.
• Lo (1991). A characterization of the Dirichlet process. Statist. Probab. Lett. 12,
185–187.
• Mao (2004). Prediction of the conditional probability of discovering a new class. J.
Am. Statist. Assoc. 99, 1108-18.
• Mao & Lindsay (2002). A Poisson model for the coverage problem with a genomic
application. Biometrika 89, 669-81.
• Pitman & Yor (1997). The two-parameter Poisson-Dirichlet distribution derived
from a stable subordinator. Ann. Probab. 25, 855–900.
• Pitman (2006). Combinatorial Stochastic Processes. Lecture Notes in Math.,
vol.1875, Springer, Berlin.
• Regazzini (1978). Intorno ad alcune questioni relative alla definizione del premio
secondo la teoria della credibilità. Giorn. Istit. Ital. Attuari, 41, 77–89.
• Teh (2006). A Hierarchical Bayesian Language Model based on Pitman-Yor
Processes. Coling/ACL 2006, 985-92.

25 / 25
