Predicting Extinction or Explosion in a Galton-Watson Branching Process with Power Series Offspring Distribution¹

Peter Guttorp² and Michael D. Perlman³
University of Washington

Abstract
Extinction is certain in a Galton-Watson (GW) branching process if the
offspring mean μ ≤ 1, whereas explosion is possible but not certain if μ > 1.
Discriminating between these two possibilities is a well-studied
hypothesis-testing problem. However, deciding whether extinction or explosion will
occur for the current realization of the process is a prediction problem. This can
be formulated as a different testing problem by considering the conditional
distributions of the process given extinction and explosion respectively. For
power series offspring distributions, fixed-sample and sequential parametric
tests are presented for the prediction problem and illustrated with data on
the spread of epidemics and the populations of endangered species.

1
Key words and phrases: Galton-Watson branching process; extinction; explosion; sub-
critical; supercritical; stochastic ordering; prediction; hypothesis testing; least favorable
distribution; sequential probability ratio test; epidemic; endangered species.
2
[email protected]. Research supported in part by National Science Founda-
tion Grant DMS-1106862.
3
[email protected]. Research supported in part by U.S. Department of
Defense Grant H98230-10-C-0263/0000 P0004.

Preprint submitted to Elsevier April 15, 2015

© 2015. This manuscript version is made available under the Elsevier user license
http://www.elsevier.com/open-access/userlicense/1.0/
1. Introduction: the 2012 pertussis outbreak in Washington State
In 2011 the weekly numbers of new pertussis (whooping cough) cases in
Washington State remained fairly constant, but in 2012 the numbers in-
creased rapidly (Figure 1, CDC (2012)). Faced with the possibility of a
pandemic, the governor declared a state-wide health emergency in Week 14
and an inoculation/quarantine program was begun.

Figure 1: Weekly counts of new pertussis cases in Washington state.

The spread of an epidemic, at least in its initial stages, can be modeled as
a classical Galton-Watson (GW) branching process, cf. §2. The question of
predicting extinction or explosion is commonly formulated as that of testing
μ ≤ 1 (subcriticality/criticality) vs. μ > 1 (supercriticality), where μ denotes
the mean number of infected offspring per individual case – cf. Becker (1974),
Heyde (1979), Scott (1987).4 Guttorp and Perlman (2015) use a decision-
theoretic analysis to show, however, that this problem is more complex than
previous literature suggests and that the basis of a standard test procedure
is somewhat dubious.
Fortunately, this testing problem usually is not the one of actual interest,
because a supercritical process still may terminate with positive probability.

4
Basawa and Scott (1976) and Sweeting (1978) treat a related testing problem for the
supercritical case.

Of more interest is the problem of predicting whether the current realization
of a non-terminated process will terminate or explode.
In §5-6 this prediction problem is formulated as a different hypothesis-
testing problem based on the conditional distributions of the process given
eventual extinction and explosion respectively. Unlike the original testing
problem, this prediction problem often has relatively simple solutions in the
fixed-sample (§5) and sequential sample (§6) cases, the latter based on the
classical Wald sequential probability ratio test (SPRT), see §6. Using this
procedure, explosion might have been predicted for the 2012 pertussis out-
break as early as Week 3; see Example 7.2.
Like the authors noted above who treated the original testing problem,
we assume a parametric model for the offspring distribution, a power series
offspring distribution (psod); see §3. The conditional distributions of a GW
process given (eventual) extinction or explosion are given in §2, then special-
ized in §3 to the psod case. If the psod satisfies two total positivity conditions,
these conditional distributions possess the stochastic monotonicity properties
needed to justify our fixed-n and sequential prediction methods; see §4. Ya-
glom’s (1947) well-known exponential approximation for the distribution of
the population size is extended and sharpened in §5.3 and §5.4.
2. Conditional processes derived from a GW branching process
The Galton-Watson branching process is a discrete-time Markov chain that
describes the growth or decline of a population that reproduces by simple
branching, or splitting. Applications include nuclear chain reactions, epi-
demics, and the population size of endangered species. The classic reference
is Harris (1963, Ch. I); also see Karlin (1966), Feller (1968), Athreya and
Ney (1972), Jagers (1975), Taylor and Karlin (1984), Guttorp (1991).
For each n = 0, 1, 2, ... let $X_n$ denote the population size at generation n;
assume that $X_0 = x_0 \ge 1$ is known. At generation n = 0 the i-th individual
is replaced by a random number $\xi_i^{(1)} \overset{d}{=} \xi$ of first-generation offspring,
where the offspring random variable (rv) $\xi \equiv \xi_p$ has probability distribution
$p \equiv (p_0, p_1, p_2, \dots)$ on {0, 1, 2, ...}. The i-th individual in generation
n − 1 similarly is replaced by a random number $\xi_i^{(n)} \overset{d}{=} \xi$ of n-th generation
offspring independently of its siblings. Thus the population size in the n-th
generation satisfies

$$X_n = \xi_1^{(n)} + \cdots + \xi_{X_{n-1}}^{(n)}, \quad n \ge 1, \tag{1}$$

where $\xi_1^{(n)}, \dots, \xi_{X_{n-1}}^{(n)}$ are iid rvs, each $\overset{d}{=} \xi$. We assume that each $p_k < 1$ so
the process is not deterministic, and that $p_0 > 0$ so extinction is possible.
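The recursion (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `poisson_sample` and `simulate_gw`, the Poisson offspring law with mean 1.2, the seed, and the starting size 3 are all hypothetical choices, not taken from the paper.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_gw(x0, n_generations, offspring, rng):
    """Return [X_0, ..., X_n] generated by recursion (1); state 0 is absorbing."""
    path = [x0]
    for _ in range(n_generations):
        current = path[-1]
        # X_n is the sum of X_{n-1} iid offspring counts (an empty sum is 0).
        path.append(sum(offspring(rng) for _ in range(current)))
    return path


rng = random.Random(2015)  # arbitrary seed, for reproducibility only
path = simulate_gw(3, 10, lambda r: poisson_sample(r, 1.2), rng)
print(path)
```

Passing the offspring sampler as a function keeps the recursion itself independent of the offspring distribution p.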
Denote the probability generating function (pgf) of the offspring distribution by

$$\varphi(s) \equiv \varphi_p(s) = E_p(s^{\xi}) = \sum_{k=0}^{\infty} p_k s^k, \quad s \ge 0, \tag{2}$$

and let $1 \le \rho \equiv \rho_p \le \infty$ be its radius of convergence. Note that φ(1) = 1.


Because φ(s) is convex and p1 < 1, the equation

φ(s) = s (3)

has either one finite root or two distinct finite roots in (0, ρ], one of which
must be 1. If (3) has one finite root in (0, ρ] denote it by u ≡ up ; if (3) has
two distinct finite roots in (0, ρ] denote them by u ≡ up and v ≡ vp , where
0 < u < v ≤ ρ.
If $x_0 = 1$, the pgf of $X_n$ is the n-th functional iterate of φ, denoted by
$\varphi_n$. For $x_0 \ge 1$ the pgf of $X_n$ is $\varphi_n^{x_0} \equiv (\varphi_n)^{x_0}$. Either extinction ($X_n = 0$ for
some n ≥ 1) or explosion ($X_n \to \infty$) must occur; their probabilities are $u^{x_0}$
and $1 - u^{x_0}$ respectively.
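Since $\varphi_n(0) = \Pr[X_n = 0]$ when $x_0 = 1$, the root u can be approximated numerically by iterating the pgf from s = 0. The sketch below assumes a binary-splitting pgf $(1 + \theta^2 s^2)/(1 + \theta^2)$ purely as a test case, and the helper names are hypothetical. Convergence is slow near criticality, so the tolerance and iteration cap are illustrative only.

```python
def extinction_probability(pgf, tol=1e-12, max_iter=100_000):
    """Approximate the smallest root u of pgf(s) = s as the limit of phi_n(0)."""
    s = 0.0
    for _ in range(max_iter):
        s_next = pgf(s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


def binary_splitting_pgf(theta):
    """pgf of an offspring law on {0, 2} with p_2 / p_0 = theta^2 (assumed test case)."""
    return lambda s: (1.0 + theta**2 * s**2) / (1.0 + theta**2)


u = extinction_probability(binary_splitting_pgf(2.0))  # closed form: 1/theta^2
x0 = 3
print(u, u**x0, 1.0 - u**x0)  # u, Pr[extinction], Pr[explosion] for X_0 = x0
```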
Denote the mean of the offspring distribution by $\mu \equiv \mu_p = E(\xi)$; then
$\mu = \varphi'(1)$. The GW process $X \equiv X_p$ and its pgf $\varphi \equiv \varphi_p$ are called subcritical
(resp., critical, supercritical) if μ < 1 (μ = 1, μ > 1); see Figure 2. In the
subcritical case, u = 1 and v may or may not exist, see §2. In the critical
case, u = 1 and v does not exist. In the supercritical case 0 < u < v = 1, so
both extinction and explosion occur with positive probability.
For a subcritical GW process, if v exists then p, X, and φ are called
extendable; in this case 1 = u < v ≤ ρ (see Figure 2). If ρ = 1 then v > 1
cannot exist so φ is not extendable, while if ρ = ∞ then φ is extendable since
it grows at a quadratic rate or faster hence eventually crosses the 45° line a
second time beyond 1 = u. If 1 < ρ < ∞ then φ is extendable iff φ(ρ) ≥ ρ.
Definition 2.1. For a supercritical GW process X, define the conditional
processes

$$\dot{X} \equiv X \mid \text{extinction}, \tag{4}$$

$$\ddot{X} \equiv X \mid \text{explosion}. \tag{5}$$

If X is subcritical or critical, define $\dot{X} = X$. □

[Figure 2: two panels, "Supercritical" and "Subcritical, extendable", each plotting φ(s) against s on axes running from 0.0 to 2.5, with the roots u and v marked.]

Figure 2: The duality between supercritical and extendable subcritical pgfs.

Proposition 2.1. The set of supercritical GW processes conditional on extinction
coincides with the set of subcritical extendable GW processes.
Proof. If X is supercritical it is well known⁵ that $\dot{X}$ is a subcritical GW
process with offspring pgf

$$\dot{\varphi}(s) = \frac{\varphi(us)}{u} \tag{6}$$

and offspring mean $\dot{\mu} = \varphi'(u) < 1$. Furthermore $\dot{\varphi}$ is extendable with second
root $\dot{v} = 1/u$.
Now suppose that X is subcritical and extendable. Define

$$\tilde{\varphi}(s) = \frac{\varphi(vs)}{v}. \tag{7}$$

It is straightforward to verify that $\tilde{\varphi}$ is a supercritical offspring pgf with
offspring mean $\tilde{\mu} = \varphi'(v) > 1$ and extinction probability $\tilde{u} = 1/v$. Denote
the corresponding supercritical GW process by $\tilde{X}$. Then

$$\dot{\tilde{\varphi}}(s) = \frac{\varphi(\tilde{u} v s)}{\tilde{u} v} = \varphi(s). \tag{8}$$

5
Waugh (1958, p.248), Athreya, Ney (1972, §I.12, Theorem 3), Guttorp (1991, p.101) .

Furthermore, if X is supercritical then

$$\tilde{\dot{\varphi}}(s) = \frac{\varphi(u \dot{v} s)}{u \dot{v}} = \varphi(s). \tag{9}$$

This establishes the asserted result. □
Successive conditioning on $X_1, \dots, X_{n-1}$ in (1) shows that the joint probability
mass function (pmf) $f \equiv f_p$ of $\mathbf{X}_n \equiv (X_1, \dots, X_n)$ is given by

$$f(\mathbf{x}_n) \equiv \Pr_p[\mathbf{X}_n = \mathbf{x}_n] = \prod_{i=1}^{n} h_p(x_{i-1}, x_i) \equiv h_p(\mathbf{x}_n) \tag{10}$$

(e.g. Jagers (1975, eqn. (2.1.2))), where

$$h_p(k, l) = \sum_{r_1 + \cdots + r_k = l} p_{r_1} \cdots p_{r_k}. \tag{11}$$

Note that $h_p(k, l)$ is the coefficient of $s^l$ in the power series $[\varphi_p(s)]^k$.
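Because $h_p(k, l)$ is a coefficient of the k-th power of the pgf, it can be computed by repeated polynomial multiplication (k-fold convolution of p). The following is a minimal sketch under that reading of (10) and (11). The function names are hypothetical, and p is passed as a list truncated at the largest index needed.

```python
def h_row(p, k, l_max):
    """Return [h_p(k, 0), ..., h_p(k, l_max)]: coefficients of phi_p(s)^k up to s^l_max."""
    row = [1.0] + [0.0] * l_max  # phi^0 = 1, so h_p(0, 0) = 1
    for _ in range(k):
        new = [0.0] * (l_max + 1)
        for i, r_i in enumerate(row):
            if r_i == 0.0:
                continue
            for j, p_j in enumerate(p):
                if i + j > l_max:
                    break  # higher powers are not needed
                new[i + j] += r_i * p_j
        row = new
    return row


def joint_pmf(p, x0, xs):
    """f(x_n) = prod_i h_p(x_{i-1}, x_i), as in (10)."""
    prob, prev = 1.0, x0
    for x in xs:
        prob *= h_row(p, prev, x)[x]
        prev = x
    return prob


# Offspring law on {0, 2} with equal weights (an assumed toy example).
print(joint_pmf([0.5, 0.0, 0.5], 1, [2, 4]))
```

Truncating at `l_max` is exact for the coefficients returned, since higher powers of s cannot contribute to lower ones.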
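From Bayes' formula, the pmf of $\dot{\mathbf{X}}_n \equiv (\dot{X}_1, \dots, \dot{X}_n)$ is given by

$$\dot{f}(\mathbf{x}_n) \equiv \dot{f}_p(\mathbf{x}_n) = \Pr_p[\mathbf{X}_n = \mathbf{x}_n \mid \text{extinction}] = \frac{\Pr[\text{extinction} \mid \mathbf{X}_n = \mathbf{x}_n]\, \Pr[\mathbf{X}_n = \mathbf{x}_n]}{\Pr[\text{extinction}]} = u^{x_n - x_0} f(\mathbf{x}_n). \tag{12}$$

Similarly the pmf of $\ddot{\mathbf{X}}_n \equiv (\ddot{X}_1, \dots, \ddot{X}_n)$ is given by

$$\ddot{f}(\mathbf{x}_n) \equiv \ddot{f}_p(\mathbf{x}_n) = \left( \frac{1 - u^{x_n}}{1 - u^{x_0}} \right) f(\mathbf{x}_n), \quad \mathbf{x}_n > 0, \tag{13}$$

where $\mathbf{x}_n > 0$ means that $x_1 > 0, \dots, x_n > 0$. From (12) and (13), $\dot{X}$ and $\ddot{X}$
are Markovian with transition probabilities

$$\dot{f}(x_n \mid x_{n-1}) = u^{x_n - x_{n-1}}\, h_p(x_{n-1}, x_n), \tag{14}$$

$$\ddot{f}(x_n \mid x_{n-1}) = \left( \frac{1 - u^{x_n}}{1 - u^{x_{n-1}}} \right) h_p(x_{n-1}, x_n), \quad x_{n-1}, x_n > 0, \tag{15}$$

respectively. However, $\ddot{X}$ is not a GW process because some individuals may
die without offspring even though explosion occurs.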

3. The GW process with power series offspring distribution
Following Becker (1974) we now specialize this discussion to a parametric
model for the offspring distribution $p \equiv (p_0, p_1, \dots)$. The power series offspring
distribution (psod) $p_\theta \equiv (p_{\theta;0}, p_{\theta;1}, \dots)$ is given by

$$p_{\theta;k} = \frac{a_k \theta^k}{A(\theta)}, \quad k = 0, 1, \dots, \quad 0 < \theta < \psi, \tag{16}$$

where $(a_0, a_1, \dots) \equiv a$ are nonnegative constants, θ is the unknown parameter,
$A(\theta) = \sum_{k=0}^{\infty} a_k \theta^k$, and 0 < ψ ≤ ∞ is the radius of convergence of A(·). We
assume that $a_0 > 0$ so extinction is possible, and that $a_k > 0$ for at least one
k ≥ 2 so growth is possible; without loss of generality we may take $a_0 = 1$.
For simplicity of exposition we limit attention to the case where A(ψ−) = ∞;
this includes the familiar Poisson, binomial, geometric, negative binomial,
binary splitting, and logarithmic series distributions.
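Denote $X_{p_\theta}, \xi_{p_\theta}, f_{p_\theta}, \varphi_{p_\theta}, \rho_{p_\theta}, u_{p_\theta}, v_{p_\theta}, \mu_{p_\theta}$ by $X_\theta, \xi_\theta, f_\theta, \varphi_\theta, \rho_\theta, u_\theta, v_\theta, \mu_\theta$
respectively. By (2) and (16), $\varphi_\theta$ has radius of convergence $\rho_\theta = \psi/\theta$ and

$$\varphi_\theta(s) = \frac{A(\theta s)}{A(\theta)} = \frac{B(\theta s)}{B(\theta)}\, s, \quad 0 < s < \rho_\theta, \tag{17}$$

where B(θ) = A(θ)/θ (see Figure 3). Here B(θ) is a strictly convex positive
function on (0, ψ) with B(0+) = B(ψ−) = ∞, so B(·) has a unique minimum
at some τ ∈ (0, ψ) with B′(τ) = 0; B(θ) is strictly decreasing for θ < τ and
strictly increasing for θ > τ.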
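It follows from (17) that for θ ∈ (0, ψ),

$$E_\theta(\xi) \equiv \mu_\theta = \frac{\theta A'(\theta)}{A(\theta)}, \tag{18}$$

$$\frac{\mu_\theta - 1}{\theta} = \frac{B'(\theta)}{B(\theta)} = \frac{d \log B(\theta)}{d\theta}, \tag{19}$$

$$\mathrm{Var}_\theta(\xi) \equiv \sigma_\theta^2 = \theta\, \frac{d\mu_\theta}{d\theta}. \tag{20}$$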

By (19), μτ = 1 so Xτ is critical. By (20), μθ is strictly increasing in θ, hence
the subcritical and supercritical parameter spaces are both open intervals:

{θ | μθ < 1} = (0, τ ) (subcritical), (21)


{θ | μθ > 1} = (τ, ψ) (supercritical). (22)

[Figure 3: plot of B(θ) against θ; horizontal-axis labels as extracted: θ, θu_θ, τ, θ, θv_θ, ψ ≤ ∞.]

Figure 3: The function B(θ) = A(θ)/θ.

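If θ ∈ (τ, ψ) then from (3) and (17), $u_\theta$ is the unique solution to

$$B(\theta u_\theta) = B(\theta), \quad 0 < \theta u_\theta < \tau. \tag{23}$$

If θ ∈ (0, τ) then $v_\theta$ is the unique solution to

$$B(\theta) = B(\theta v_\theta), \quad \tau < \theta v_\theta < \psi. \tag{24}$$

Thus each subcritical $X_\theta$ is extendable. It follows from the uniqueness of the
solutions of (23) and (24) that

$$v_{\theta u_\theta} = u_\theta^{-1} \quad \text{for } \theta \in (\tau, \psi), \tag{25}$$

$$u_{\theta v_\theta} = v_\theta^{-1} \quad \text{for } \theta \in (0, \tau). \tag{26}$$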
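Equations (23) and (24) suggest a simple numerical recipe: find the point on the other side of τ where B takes the same value. The sketch below does this by bisection for the Poisson case, where $B(\theta) = e^{\theta}/\theta$ and τ = 1, working with log B for numerical stability. The function names, the bracketing strategy, and the lower bracket 1e-300 (adequate only for moderate θ) are assumptions made for illustration.

```python
import math

TAU = 1.0  # minimizer of B for the Poisson psod


def log_B(theta):
    """log B(theta) for the Poisson psod, where B(theta) = exp(theta) / theta."""
    return theta - math.log(theta)


def bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def u_of(theta):
    """Extinction probability u_theta for theta > tau, solving (23)."""
    target = log_B(theta)
    return bisect(lambda y: log_B(y) - target, 1e-300, TAU) / theta


def v_of(theta):
    """Second root v_theta for theta < tau, solving (24)."""
    target = log_B(theta)
    hi = TAU
    while log_B(hi) <= target:  # expand until the root is bracketed
        hi *= 2.0
    return bisect(lambda y: log_B(y) - target, TAU, hi) / theta


theta = 2.0
u = u_of(theta)
print(u, v_of(theta * u), 1.0 / u)  # the last two agree, illustrating (25)
```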
Proposition 3.1. (i) For θ ∈ (τ, ψ), θuθ strictly decreases from τ to 0; uθ
strictly decreases from 1 to 0.
(ii) For θ ∈ (0, τ ), θvθ strictly decreases from ψ to τ ; vθ strictly decreases
from ∞ to 1.
Proof. (i) It follows from (23) and (19) that for θ ∈ (τ, ψ),

$$d_\theta \equiv \frac{d(\theta u_\theta)}{d\theta} = \frac{\mu_\theta - 1}{\mu_{\theta u_\theta} - 1}\, u_\theta. \tag{27}$$

Thus dθ < 0 because θuθ < τ < θ, so θuθ is strictly decreasing, a fortiori
uθ is strictly decreasing. As θ ↓ τ , B(θ) ↓ B(τ ), its unique minimum, hence
θuθ ↑ τ by (23), so uθ ↑ 1. As θ ↑ ψ, B(θ) ↑ ∞, hence θuθ ↓ 0 by (23), so
uθ ↓ 0.
(ii) It follows from (24) and (19) that for θ ∈ (0, τ),

$$\frac{d(\theta v_\theta)}{d\theta} = \frac{\mu_\theta - 1}{\mu_{\theta v_\theta} - 1}\, v_\theta, \tag{28}$$

which is < 0 because τ < θvθ < ψ. The remaining results are verified as in
(i). Alternatively, (25) and (26) can be applied to obtain vτ − and v0+ . 
Proposition 3.1, (25), and (26) establish analytically a 1-1 relation be-
tween the subcritical (0, τ ) and supercritical (τ, ψ) parameter spaces. The
corresponding probabilistic relation between the subcritical and supercritical
processes themselves is now presented.
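Proposition 3.2. The set of supercritical processes $\{\dot{X}_\theta \mid \theta \in (\tau, \psi)\}$ conditional
on extinction coincides with the set of subcritical processes $\{X_\theta \mid \theta \in (0, \tau)\}$.
Specifically,

$$\dot{X}_\theta \overset{d}{=} X_{\theta u_\theta}, \quad \theta \in (\tau, \psi), \tag{29}$$

$$X_\theta \overset{d}{=} \dot{X}_{\theta v_\theta}, \quad \theta \in (0, \tau). \tag{30}$$

(Note too that $\dot{X}_\tau \overset{d}{=} X_\tau$.)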
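Proof. Suppose first that $X_\theta$ is supercritical, i.e., θ ∈ (τ, ψ). From (6), $\dot{X}_\theta$
is a subcritical GW process with offspring pgf in the same psod family (16):

$$\dot{\varphi}_\theta(s) = \frac{A(\theta u_\theta s)}{A(\theta)\, u_\theta} = \frac{A(\theta u_\theta s)}{A(\theta u_\theta)} = \varphi_{\theta u_\theta}(s); \tag{31}$$

cf. Becker (1974, p.394). Since $\theta u_\theta < \tau$, $X_{\theta u_\theta}$ is subcritical.
Suppose next that $X_\theta$ is subcritical, i.e., θ ∈ (0, τ). A similar argument
using (7) shows that $\tilde{X}_\theta$ is a supercritical GW process with offspring pgf

$$\tilde{\varphi}_\theta(s) = \varphi_{\theta v_\theta}(s). \tag{32}$$

Now apply (8) to obtain $\varphi_\theta = \dot{\varphi}_{\theta v_\theta}$; since $\theta v_\theta > \tau$, $X_{\theta v_\theta}$ is supercritical. □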

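Example 3.1: the Poisson(θ) psod. Here $p_{\theta;k} = e^{-\theta}\theta^k/k!$, 0 < θ < ∞, so
$a_k = 1/k!$, $A(\theta) = e^{\theta}$, ψ = ∞, A(ψ−) = ∞, $B(\theta) = e^{\theta}/\theta$, and $\varphi_\theta(s) = e^{\theta(s-1)}$.
Then $\mu_\theta = \sigma_\theta^2 = \theta$, τ = 1, and $u_\theta$ and $v_\theta$ satisfy the equation

$$e^{\theta(s-1)} = s \tag{33}$$

by (23) and (24). This cannot be solved explicitly, but necessarily

$$\begin{cases} u_\theta = 1,\ v_\theta > 1 & \text{if } \theta < 1 \text{ (subcritical)}; \\ u_\theta < 1,\ v_\theta = 1 & \text{if } \theta > 1 \text{ (supercritical)}. \end{cases} \tag{34}$$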

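Example 3.2: the negative binomial NB(r, θ) and geometric GM(θ) psods.
For fixed r > 0, the NB(r, θ) psod has $p_{\theta;k} = \frac{\Gamma(r+k)}{\Gamma(r)\,k!}(1-\theta)^r\theta^k$,
0 < θ < 1. Here $a_k = \frac{\Gamma(r+k)}{\Gamma(r)\,k!}$, $A(\theta) = \frac{1}{(1-\theta)^r}$, ψ = 1, $B(\theta) = \frac{1}{\theta(1-\theta)^r}$, and
$\varphi_\theta(s) = \frac{(1-\theta)^r}{(1-\theta s)^r}$. Also $\mu_\theta = \frac{r\theta}{1-\theta}$, $\sigma_\theta^2 = \frac{r\theta}{(1-\theta)^2}$, $\tau = \frac{1}{1+r}$, and $u_\theta$ and $v_\theta$ satisfy
the equation

$$(1-\theta)^r = (1-\theta s)^r s. \tag{35}$$

This can be solved explicitly for the GM(θ) ≡ NB(1, θ) psod where r = 1
and τ = 1/2:

$$\begin{cases} u_\theta = 1,\ v_\theta = \frac{1-\theta}{\theta} & \text{if } \theta < \tfrac{1}{2} \text{ (subcritical)}; \\ u_\theta = \frac{1-\theta}{\theta},\ v_\theta = 1 & \text{if } \theta > \tfrac{1}{2} \text{ (supercritical)}. \end{cases} \tag{36}$$

Here the relations (25) and (26) can be verified directly. □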
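The direct verification mentioned for the geometric case can also be done numerically. The sketch below encodes the closed forms (36) and checks (25) at one arbitrarily chosen parameter value; the function names and the value θ = 0.7 are illustrative assumptions.

```python
def geometric_roots(theta):
    """(u_theta, v_theta) for the GM(theta) psod from (36); requires theta != 1/2."""
    other = (1.0 - theta) / theta
    return (1.0, other) if theta < 0.5 else (other, 1.0)


def geometric_pgf(theta, s):
    """phi_theta(s) = (1 - theta) / (1 - theta * s), the r = 1 case of Example 3.2."""
    return (1.0 - theta) / (1.0 - theta * s)


theta = 0.7                      # supercritical, since tau = 1/2
u, _ = geometric_roots(theta)
_, v_dual = geometric_roots(theta * u)   # theta * u = 1 - theta = 0.3 is subcritical
print(u, geometric_pgf(theta, u))        # u is a fixed point of the pgf
print(v_dual, 1.0 / u)                   # relation (25): v at theta*u equals 1/u
```

In this family $\theta u_\theta = 1 - \theta$, so the dual of GM(θ) is simply GM(1 − θ).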


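Example 3.3: binary splitting. Take $a_0 = a_2 = 1$ and $a_k = 0$ for k ≠ 0, 2.
Thus $A(\theta) = 1 + \theta^2$ for 0 < θ < ∞ ≡ ψ, so $p_{\theta;0} = \frac{1}{1+\theta^2}$, $p_{\theta;2} = \frac{\theta^2}{1+\theta^2}$, and
$p_{\theta;k} = 0$ for k ≠ 0, 2. Here $B(\theta) = \theta^{-1} + \theta$, $\varphi_\theta(s) = \frac{1+\theta^2 s^2}{1+\theta^2}$, $\mu_\theta = \frac{2\theta^2}{1+\theta^2}$,
$\sigma_\theta^2 = \frac{4\theta^2}{(1+\theta^2)^2}$, τ = 1, and

$$\begin{cases} u_\theta = 1,\ v_\theta = \frac{1}{\theta^2} & \text{if } \theta < 1 \text{ (subcritical)}; \\ u_\theta = \frac{1}{\theta^2},\ v_\theta = 1 & \text{if } \theta > 1 \text{ (supercritical)}. \end{cases} \tag{37}$$

Again the relations (25) and (26) can be verified directly. □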


4. Stochastic orderings for a psod GW process
Let W, Z, $\mathbf{W}_n \equiv (W_1, \dots, W_n)$, $\mathbf{Z}_n \equiv (Z_1, \dots, Z_n)$, and $\mathbf{W} \equiv (W_1, \dots)$, $\mathbf{Z} \equiv (Z_1, \dots)$
be nonnegative-integer-valued random variables, random vectors,
and discrete-time stochastic processes, respectively. We say W is stochastically
smaller than Z, written W ≺ Z, if E[g(W)] ≤ E[g(Z)] for all increasing
bounded nonnegative functions g on the nonnegative integers $\mathbb{Z}_0$ with strict
inequality for at least one g. It is straightforward to show that if U, V, W, Z
are independent, then

$$U \prec V \text{ and } W \prec Z \implies U + W \prec V + Z. \tag{38}$$

Similarly, we write $\mathbf{W}_n \prec \mathbf{Z}_n$ if

$$E[g(\mathbf{W}_n)] \le E[g(\mathbf{Z}_n)] \tag{39}$$

for all increasing bounded nonnegative functions g on $\mathbb{Z}_0^n$ with strict
inequality for at least one g. Finally, we write $\mathbf{W} \prec \mathbf{Z}$ if $\mathbf{W}_n \prec \mathbf{Z}_n$ for all
n = 1, 2, .... The next lemma follows directly from (1) and (38).
Lemma 4.1. Let X and X′ be GW processes with offspring rv's ξ and ξ′
respectively. If ξ ≺ ξ′ then X ≺ X′. □
Stochastic orderings satisfied by a GW process Xθ with psod (16) and by
the conditional processes Ẋθ and Ẍθ are now developed. These will be useful
for the testing and prediction problems treated below.
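From (10), (11), and (16), the pmf of $(\mathbf{X}_\theta)_n$ is

$$f_\theta(\mathbf{x}_n) = \frac{\theta^{y_n - x_0}}{(A(\theta))^{y_{n-1}}}\, h_a(\mathbf{x}_n), \quad \mathbf{x}_n \in R_{a,n}, \tag{40}$$

where $y_n = x_0 + x_1 + \cdots + x_n$ and $R_{a,n} = \{\mathbf{x}_n \mid h_a(\mathbf{x}_n) > 0\}$. Then (12),
(13), and (40) give the following:

$$\text{for } \theta > 0: \quad \dot{f}_\theta(\mathbf{x}_n) = u_\theta^{x_n - x_0}\, \frac{\theta^{y_n - x_0}}{(A(\theta))^{y_{n-1}}}\, h_a(\mathbf{x}_n), \tag{41}$$

$$\text{for } \theta > \tau: \quad \ddot{f}_\theta(\mathbf{x}_n) = \left( \frac{1 - u_\theta^{x_n}}{1 - u_\theta^{x_0}} \right) \frac{\theta^{y_n - x_0}}{(A(\theta))^{y_{n-1}}}\, h_a(\mathbf{x}_n), \quad \mathbf{x}_n > 0. \tag{42}$$

The transition probabilities are obtained from (40)-(42) (recall (14)-(15)):

$$f_\theta(x_n \mid x_{n-1}) = \frac{\theta^{x_n}}{(A(\theta))^{x_{n-1}}}\, h_a(x_{n-1}, x_n), \tag{43}$$

$$\dot{f}_\theta(x_n \mid x_{n-1}) = u_\theta^{x_n - x_{n-1}}\, \frac{\theta^{x_n}}{(A(\theta))^{x_{n-1}}}\, h_a(x_{n-1}, x_n), \tag{44}$$

$$\ddot{f}_\theta(x_n \mid x_{n-1}) = \left( \frac{1 - u_\theta^{x_n}}{1 - u_\theta^{x_{n-1}}} \right) \frac{\theta^{x_n}}{(A(\theta))^{x_{n-1}}}\, h_a(x_{n-1}, x_n), \quad x_{n-1}, x_n > 0. \tag{45}$$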

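The definitions of $\ddot{f}_\theta(\cdot)$ and $\ddot{f}_\theta(\cdot \mid \cdot)$ can be extended to the critical case
θ = τ:

\begin{align}
\ddot{f}_\tau(\mathbf{x}_n) &= \lim_{\theta \downarrow \tau} \ddot{f}_\theta(\mathbf{x}_n) \tag{46} \\
&= \frac{x_n}{x_0}\, \frac{\tau^{y_n - x_0}}{(A(\tau))^{y_{n-1}}}\, h_a(\mathbf{x}_n), \quad \mathbf{x}_n > 0; \tag{47} \\
\ddot{f}_\tau(x_n \mid x_{n-1}) &= \frac{x_n}{x_{n-1}}\, \frac{\tau^{x_n}}{(A(\tau))^{x_{n-1}}}\, h_a(x_{n-1}, x_n), \quad x_{n-1}, x_n > 0. \tag{48}
\end{align}

Denote the resulting Markov process by $\ddot{X}_\tau$.⁶ By (46) and Scheffé's Theorem,

$$\ddot{X}_\theta \xrightarrow{L^1} \ddot{X}_\tau \quad \text{as } \theta \downarrow \tau. \tag{49}$$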

Proposition 4.1. (i) $X_\theta$ is stochastically increasing for θ ∈ (0, ψ), that is,
θ < θ′ ⇒ $X_\theta \prec X_{\theta'}$.
(ii) $\dot{X}_\theta$ is stochastically decreasing for θ ∈ [τ, ψ), that is, θ < θ′ ⇒ $\dot{X}_\theta \succ \dot{X}_{\theta'}$.
Proof. (i) follows from Lemma 4.1 since θ < θ′ ⇒ $\xi_\theta \prec \xi_{\theta'}$ by the strict
monotone likelihood ratio (MLR) property of the psod family.⁷ (ii) follows
from (i), (29), and Proposition 3.1(i). □
The verifications of the stochastic orderings of Xθ and Ẋθ are straightfor-
ward because these are GW processes. However, Ẍθ is not a GW process so
its stochastic ordering properties if any are not apparent. Although it might
appear that Ẍθ should inherit the stochastic increasing property of Xθ , upon
closer examination this is not obvious. Conditional on ultimate explosion, as
θ increases above the critical value τ those trajectories of Xθ with relatively
small initial values might have increasing likelihood of survival, hence for
fixed n, (Ẍθ )n might tend to decrease stochastically, not increase.
In Proposition 4.2(iii) it will be shown, however, that Ẍθ is indeed stochas-
tically increasing for θ ≥ τ provided that two additional conditions are im-
posed, namely TP2a and/or TP2b (see below) based on total positivity of
order 2 (TP2). Also, it is shown in Proposition 4.2(i) that under TP2a alone,
the conditional random vector (Xθ )n | Xθ,n > 0 is stochastically increasing
for θ ≤ τ .

6
This is not to be interpreted as Xτ | explosion, which is vacuous.
7
Lehmann and Romano (2005) Lemma 3.4.2 and Problem 3.39; Karlin (1968) Propo-
sition 3.1 and the discussion following Proposition 3.3, both in Chapter 1.

Karlin (1968) is the primary reference for total positivity. The TP2 property
is equivalent to MLR, cf. Lehmann and Romano (2005, Problem 50).
The following results for the TP2 and FKG properties appear in Kemperman
(1977) and Perlman and Olkin (1980).
Definition 4.1. Let $f(\mathbf{x})$ be a nonnegative function defined on a measurable
rectangle $R = \prod_{i=1}^{n} R_i \subseteq \mathbb{R}^n$. Then f satisfies the FKG condition on R if

$$f(\mathbf{x}_n)\, f(\mathbf{y}_n) \le f(\mathbf{x}_n \wedge \mathbf{y}_n)\, f(\mathbf{x}_n \vee \mathbf{y}_n) \quad \forall\, \mathbf{x}_n, \mathbf{y}_n \in R, \tag{50}$$

where $\mathbf{x}_n \wedge \mathbf{y}_n = (x_1 \wedge y_1, \dots, x_n \wedge y_n)$ and $\mathbf{x}_n \vee \mathbf{y}_n = (x_1 \vee y_1, \dots, x_n \vee y_n)$;
we say that f is FKG on R. TP2 is defined as FKG for n = 2. □
Some properties of TP2 and FKG: If $h(x_i, x_j)$ is TP2 on $R_i \times R_j$ in
a single pair $(x_i, x_j)$ then $f(\mathbf{x}_n) \equiv h(x_i, x_j)$ is FKG on R. If $f_1, \dots, f_m$ are
FKG on R then so is $\prod_{i=1}^{m} f_i$. If $f(\mathbf{x}_n) = h_i(x_i)$ for a single i then f is
trivially FKG on R, so $f(\mathbf{x}_n) = \prod_{i=1}^{n} h_i(x_i)$ is also trivially FKG. If f is
FKG on $R^* \equiv \prod_i R_i^*$ and if, for each i = 1, ..., n, $\beta_i : R_i \to R_i^*$ is increasing
in $x_i$, then $f(\beta_1(x_1), \dots, \beta_n(x_n))$ is FKG on $R \equiv \prod_i R_i$.
Lemma 4.2. (The FKG Inequality). Let Z be a random vector with an
FKG pdf f on R w.r.to a product measure ν and let g, h be component-wise
increasing nonnegative functions on R ∩ {f > 0}. Then
E[g(Z)h(Z)] ≥ E[g(Z)]E[h(Z)]. (51)
Strict inequality holds in (51) if g is nonconstant w.r.to f (Pr[g(Z) = c] < 1
for all constants c) and h is strictly increasing.
Proof. Perlman and Olkin (1980, Propositions 2.4, 2.6, and Remark 2.5.) 

Condition TP2a: $h_a(x, y)$ is TP2 in (x, y) for x, y = 1, 2, .... (Note that
$h_a(x, y)$ is the coefficient of $\theta^y$ in the power series $[A(\theta)]^x$.)
Condition TP2b: $(1 - u_\theta^x)\theta^x$ is TP2 in (x, θ) for x = 1, 2, ... and τ < θ < ψ.
A sufficient condition for TP2a to hold is that {ak |k = 0, 1, . . . } is a one-
sided Polya frequency sequence of order 2 (PF2); cf. Karlin (1968, (ii) on
pp.142-3, also Ch.8).
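Let $(\mathbf{X}_\theta)_n^+$ denote the conditional random vector $(\mathbf{X}_\theta)_n \mid X_{\theta,n} > 0$. For
notational convenience the subscript θ sometimes will be omitted. The conditional
pmf of $(\mathbf{X}_\theta)_n^+ \equiv \mathbf{X}_n^+$ is given by

$$f_\theta^+(\mathbf{x}_n) = \Pr_\theta[\mathbf{X}_n = \mathbf{x}_n \mid X_n > 0] = b_{\theta,n}\, f_\theta(\mathbf{x}_n), \quad \mathbf{x}_n > 0, \tag{52}$$

where $b_{\theta,n}^{-1} = \Pr_\theta[X_n > 0]$. Note that $X_n > 0 \Rightarrow \mathbf{X}_n > 0$. Clearly $\mathbf{X}_n \prec \mathbf{X}_n^+$
and $\dot{\mathbf{X}}_n \prec \dot{\mathbf{X}}_n^+$ for all θ > 0, while $\ddot{\mathbf{X}}_n^+ \equiv \ddot{\mathbf{X}}_n$ for θ ≥ τ.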

Proposition 4.2. (i) If TP2a holds then for each n ≥ 1, $(\mathbf{X}_\theta)_n^+$ is stochastically
increasing for θ ∈ (0, τ]. Therefore, by Propositions 3.1 and 3.2,
$(\dot{\mathbf{X}}_\theta)_n^+ \overset{d}{=} (\mathbf{X}_{\theta u_\theta})_n^+$ is stochastically decreasing for θ ∈ [τ, ψ).
(ii) If TP2a holds then for each n ≥ 1, $(\dot{\mathbf{X}}_\tau)_n \prec (\dot{\mathbf{X}}_\tau)_n^+ \prec (\ddot{\mathbf{X}}_\tau)_n^+ \equiv (\ddot{\mathbf{X}}_\tau)_n$.
(iii) If TP2a and TP2b hold, $\ddot{X}_\theta$ is stochastically increasing for θ ∈ [τ, ψ).
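Proof. (i) We will show that $E_\theta[g(\mathbf{X}_n^+)]$ is strictly increasing in θ ∈ (0, τ]
for any increasing bounded nonconstant g ≥ 0 on $\mathbb{Z}_+^n$, where $\mathbb{Z}_+$ denotes
the positive integers. The FKG inequality will yield the required result as
follows: for $0 < \theta_1 < \theta_2 \le \tau$,

\begin{align*}
E_{\theta_2}[g(\mathbf{X}_n^+)] &= E_{\theta_1}\!\left[ g(\mathbf{X}_n^+)\, \frac{f_{\theta_2}^+(\mathbf{X}_n^+)}{f_{\theta_1}^+(\mathbf{X}_n^+)} \right] \\
&> E_{\theta_1}[g(\mathbf{X}_n^+)]\; E_{\theta_1}\!\left[ \frac{f_{\theta_2}^+(\mathbf{X}_n^+)}{f_{\theta_1}^+(\mathbf{X}_n^+)} \right] \\
&= E_{\theta_1}[g(\mathbf{X}_n^+)].
\end{align*}

To apply the FKG inequality (51) with strict inequality it must be shown
that (a) $f_{\theta_1}^+(\mathbf{x}_n)$ is FKG on $\mathbb{Z}_+^n$; and (b) the ratio $r(\mathbf{x}_n) \equiv f_{\theta_2}^+(\mathbf{x}_n) / f_{\theta_1}^+(\mathbf{x}_n)$ is strictly
increasing on $\mathbb{Z}_+^n \cap \{f_{\theta_1}^+(\mathbf{x}_n) > 0\} = \mathbb{Z}_+^n \cap \{h_a(\mathbf{x}_n) > 0\}$. First, for all θ > 0 and
$\mathbf{x}_n > 0$, it follows from (40) and (52) that

$$f_\theta^+(\mathbf{x}_n) = \frac{b_{\theta,n}\, \theta^{x_1 + \cdots + x_n}}{(A(\theta))^{x_0 + \cdots + x_{n-1}}} \prod_{i=1}^{n} h_a(x_{i-1}, x_i), \quad \mathbf{x}_n > 0. \tag{53}$$

By TP2a each factor $h_a(x_{i-1}, x_i)$ in (53) is TP2, hence their product is FKG,
thus so is $f_\theta^+(\mathbf{x}_n)$; this gives (a). Next, $0 < \theta_1 < \theta_2 \le \tau \Rightarrow B(\theta_1) > B(\theta_2)$, so

$$r(\mathbf{x}_n) \equiv \frac{b_{\theta_2,n}}{b_{\theta_1,n}} \left( \frac{A(\theta_1)}{A(\theta_2)} \right)^{x_0} \left( \frac{B(\theta_1)}{B(\theta_2)} \right)^{x_1 + \cdots + x_{n-1}} \left( \frac{\theta_2}{\theta_1} \right)^{x_n} \tag{54}$$

is strictly increasing in $x_1, \dots, x_{n-1}, x_n$, which establishes (b).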


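(ii) The first inequality is immediate. For the second, apply the FKG
inequality as follows:

\begin{align}
E_\tau[g(\ddot{\mathbf{X}}_n)] &= E_\tau\!\left[ g(\dot{\mathbf{X}}_n^+)\, \frac{\ddot{f}_\tau(\dot{\mathbf{X}}_n^+)}{\dot{f}_\tau^+(\dot{\mathbf{X}}_n^+)} \right] \notag \\
&\ge E_\tau[g(\dot{\mathbf{X}}_n^+)]\; E_\tau\!\left[ \frac{\ddot{f}_\tau(\dot{\mathbf{X}}_n^+)}{\dot{f}_\tau^+(\dot{\mathbf{X}}_n^+)} \right] \tag{55} \\
&= E_\tau[g(\dot{\mathbf{X}}_n^+)]. \notag
\end{align}

As in (i), FKG is applicable in (55) because (a) $\dot{f}_\tau^+(\mathbf{x}_n) \equiv f_\tau^+(\mathbf{x}_n)$ is FKG on
$\mathbb{Z}_+^n$ (by (53) with θ = τ); and (b) the ratio

$$r(\mathbf{x}_n) \equiv \frac{\ddot{f}_\tau(\mathbf{x}_n)}{\dot{f}_\tau^+(\mathbf{x}_n)} = \frac{x_n}{b_{\tau,n}\, x_0} \tag{56}$$

(obtained from (47) and (52) with θ = τ) is increasing on $\mathbb{Z}_+^n \cap \{h_a(\mathbf{x}_n) > 0\}$.
To show that $E_\tau[g(\ddot{\mathbf{X}}_n)] > E_\tau[g(\dot{\mathbf{X}}_n^+)]$ for at least one increasing g, take
$g(\mathbf{x}_n) = 1_{\{2,3,\dots\}}(x_1)$. Because $\ddot{X}_{\tau,1} \ge 1$ and $\dot{X}_{\tau,1}^+ \ge 1$, it follows from (47)
and (52) that

\begin{align*}
1 - E_\tau[g(\ddot{\mathbf{X}}_n)] &= \Pr_\tau[\ddot{X}_1 = 1] \\
&= \frac{\tau}{x_0\, (A(\tau))^{x_0}}\, h_a(x_0, 1); \\
1 - E_\tau[g(\dot{\mathbf{X}}_n^+)] &= \Pr_\tau[\dot{X}_1^+ = 1] \\
&= \frac{\dot{b}_{\tau,1}\, \tau}{(A(\tau))^{x_0}}\, h_a(x_0, 1) \\
&= \frac{\tau}{\{1 - \Pr_\tau[\dot{X}_1 = 0]\}\, (A(\tau))^{x_0}}\, h_a(x_0, 1) \\
&= \frac{\tau}{\left[ 1 - \left( \frac{a_0}{A(\tau)} \right)^{x_0} \right] (A(\tau))^{x_0}}\, h_a(x_0, 1).
\end{align*}

Because $a_0 > 0$ and $x_0 \ge 1$, we conclude that $E_\tau[g(\ddot{\mathbf{X}}_n)] > E_\tau[g(\dot{\mathbf{X}}_n^+)]$.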
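(iii) Since $B(\theta_1) < B(\theta_2)$ when $\tau \le \theta_1 < \theta_2$, the FKG inequality is not
applicable here (recall (54)). Instead we use induction on n to show that

$$E_\theta[g(\ddot{\mathbf{X}}_n)] \equiv \sum_{\mathbf{x}_n > 0} g(\mathbf{x}_n)\, \ddot{f}_\theta(\mathbf{x}_n) \tag{57}$$

is increasing for θ ∈ [τ, ψ).
For n = 1, (42) gives

$$\ddot{f}_\theta(x_1) = \left( \frac{1 - u_\theta^{x_1}}{1 - u_\theta^{x_0}} \right) \frac{\theta^{x_1}}{(A(\theta))^{x_0}}\, h_a(x_0, x_1), \quad x_1 > 0,$$

which is TP2 in $(\theta, x_1)$ by TP2b, so

$$E_\theta[g(\ddot{X}_1)] \equiv \sum_{x_1 > 0} g(x_1)\, \ddot{f}_\theta(x_1) \tag{58}$$

is increasing for θ ∈ (τ, ψ) by the monotonicity-preserving property of a TP2
≡ MLR kernel (Karlin (1968, Ch.1, Proposition 3.1)). For n ≥ 2,

\begin{align}
E_\theta[g(\ddot{\mathbf{X}}_n)] &= E_\theta\big[ E_\theta[g(\ddot{\mathbf{X}}_{n-1}, \ddot{X}_n) \mid \ddot{\mathbf{X}}_{n-1}] \big] \tag{59} \\
&= E_\theta\Big[ \sum_{x_n > 0} g(\ddot{\mathbf{X}}_{n-1}, x_n)\, \ddot{f}_\theta(x_n \mid \ddot{X}_{n-1}) \Big] \tag{60} \\
&\equiv E_\theta[g_\theta^*(\ddot{\mathbf{X}}_{n-1})]. \tag{61}
\end{align}

From TP2a, TP2b, and (45), the transition probability $\ddot{f}_\theta(x_n \mid x_{n-1})$ of the
Markov process $\ddot{X}_\theta$ is TP2 in $(x_n, \theta)$ and in $(x_n, x_{n-1})$, so the monotonicity-preserving
property implies that $g_\theta^*(\ddot{\mathbf{X}}_{n-1})$ is increasing in θ and in $\ddot{\mathbf{X}}_{n-1}$.
Thus by (60)-(61) and the induction hypothesis, $E_\theta[g(\ddot{\mathbf{X}}_n)]$ is increasing
in θ for θ ∈ (τ, ψ). Lastly, these results extend to [τ, ψ) by (49) and continuity.
To show that $E_\theta[g(\ddot{\mathbf{X}}_n)]$ is strictly increasing in θ for at least one increasing
g, take $g(\mathbf{x}_n) = 1_{\{2,3,\dots\}}(x_1)$. Because $\ddot{X}_1 \ge 1$, it follows from (42) that

\begin{align}
1 - E_\theta[g(\ddot{\mathbf{X}}_n)] &= \Pr_\theta[\ddot{X}_1 = 1] \notag \\
&= \left( \frac{1 - u_\theta}{1 - u_\theta^{x_0}} \right) \frac{\theta}{(A(\theta))^{x_0}}\, h_a(x_0, 1) \notag \\
&= \left[ \frac{(1 - u_\theta)\,\theta}{(1 - u_\theta^{x_0})\,\theta^{x_0}} \right] \left[ \frac{1}{B(\theta)} \right]^{x_0} h_a(x_0, 1). \tag{62}
\end{align}

Because $x_0 \ge 1$ the first factor in (62) is decreasing in θ by TP2b, while
the second factor is strictly decreasing because B(θ) is strictly increasing for
θ ∈ [τ, ψ). □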
Lemma 4.3. Let Xθ be a GW branching process with psod offspring distri-
bution (16). Each of the following two conditions is equivalent to Condition
TP2b: for θ ∈ (τ, ψ),
$$\frac{\mu_\theta - 1}{1 - \mu_{\theta u_\theta}} \le \frac{1}{u_\theta}; \tag{63}$$

$$B'(\theta u_\theta) + B'(\theta) \le 0. \tag{64}$$

Proof. Let $\delta_\theta = du_\theta/d\theta$. Then $(1 - u_\theta^x)\theta^x$ is TP2 iff for x = 1, 2, ..., the
ratio $\frac{(1 - u_\theta^{x+1})\,\theta^{x+1}}{(1 - u_\theta^{x})\,\theta^{x}}$ is increasing in θ for θ ∈ (τ, ψ), equivalently, iff

$$\frac{d}{d\theta} \log \frac{(1 - u_\theta^{x+1})\,\theta}{(1 - u_\theta^{x})} \equiv \frac{-(x+1)\, u_\theta^{x}\, \delta_\theta}{1 - u_\theta^{x+1}} + \frac{x\, u_\theta^{x-1}\, \delta_\theta}{1 - u_\theta^{x}} + \frac{1}{\theta} \ge 0. \tag{65}$$

After some algebra, we find that this is equivalent to the inequality

$$[(1 - u^x) - x u^x (1 - u)] + d_\theta\, u^{x-1} [x(1 - u) - u(1 - u^x)] \ge 0, \tag{66}$$

where we use the relation $d_\theta = \theta\delta_\theta + u_\theta$ and abbreviate $u_\theta$ by u. Because both
terms in square brackets are positive and $d_\theta < 0$, this is in turn equivalent
to the inequality

$$-d_\theta \le \frac{(1 - u^x) - x u^x (1 - u)}{u^{x-1} [x(1 - u) - u(1 - u^x)]} \equiv \Delta(u, x). \tag{67}$$

But Δ(u, 1) = 1 and Δ(u, x) ≥ 1 for x ≥ 2:

\begin{align*}
\Delta(u, x) - 1 &= \frac{(1 - u^x)(1 + u^x) - u^{x-1} x (1 - u)(1 + u)}{u^{x-1} [x(1 - u) - u(1 - u^x)]} \\
&= \frac{(1 - u^{2x}) - u^{x-1} x (1 - u^2)}{u^{x-1} [x(1 - u) - u(1 - u^x)]} \\
&= \frac{(1 - u^2) \left[ (1 + u^2 + \cdots + u^{2(x-1)}) - u^{2(x-1)/2}\, x \right]}{u^{x-1} [x(1 - u) - u(1 - u^x)]} \\
&\ge 0
\end{align*}

because $u^{2x}$ is convex in x. Thus TP2b is equivalent to the simple relation

$$-d_\theta \le \Delta(u, 1) \equiv 1, \tag{68}$$

which, by (27), is equivalent to (63). Lastly, differentiate (23) with respect
to θ to establish the equivalence of (68) and (64). □
Example 4.1 (= 3.1 continued). For the Poisson(θ) psod, the coefficient
of $\theta^y$ in the power series $[A(\theta)]^x = e^{x\theta}$ is $h_a(x, y) = x^y/y!$, which is TP2 in
(x, y) so TP2a is satisfied. Furthermore $\mu_\theta = \theta$, τ = 1, and from (33),

$$-\frac{\log u_\theta}{1 - u_\theta} = \theta \tag{69}$$

for θ ≥ 1, so (63) is equivalent to the inequality

$$-2u \log u \le 1 - u^2, \tag{70}$$

where $u = u_\theta$. This inequality holds for all u ∈ [0, 1], hence TP2b is also
satisfied. Thus by Proposition 4.2, $(\dot{\mathbf{X}}_\theta)_n^+$ is stochastically decreasing and
$(\ddot{\mathbf{X}}_\theta)_n$ is stochastically increasing for θ ≥ 1, while $(\dot{\mathbf{X}}_\tau)_n^+ \prec (\ddot{\mathbf{X}}_\tau)_n$. □
Example 4.2 (= 3.2 continued). For the negative binomial(r, θ) psod,
the coefficient of $\theta^y$ in the power series $[A(\theta)]^x = 1/(1 - \theta)^{rx}$ is

$$h_a(x, y) = \frac{\Gamma(rx + y)}{\Gamma(rx)\, y!} = \frac{(rx + y - 1) \cdots (rx)}{y!}, \tag{71}$$

which is TP2 in (x, y), so NB(r, ·) satisfies TP2a for all r > 0. Next, $\mu_\theta = \frac{\theta}{t(1 - \theta)}$
and $\tau = \frac{t}{1 + t}$, where $t = \frac{1}{r}$. Set $u = u_\theta$ and apply (35) to obtain

$$\frac{1 - \theta}{1 - \theta u} = u^t, \tag{72}$$

$$\frac{1 - u^t}{1 - u^{t+1}} = \theta. \tag{73}$$

After some algebra it is seen that (63) is equivalent to each of the inequalities

$$\frac{1 - u^t}{1 - u^{t+1}} \le \tau\, \frac{1 + u^{t-1}}{1 + u^t}, \tag{74}$$

$$\frac{v^t - v^{-t}}{t} \le v - v^{-1}, \tag{75}$$

where $v = u^{-1} \ge 1$. Because $h(t) \equiv v^t - v^{-t}$ is convex in t and h(0) = 0, (75)
holds iff t ≤ 1. Thus the NB(r, ·) psod family satisfies TP2b iff r ≥ 1. This
includes the geometric psod family (r = t = 1) where equality holds in (75).
Thus by Proposition 4.2, if r ≥ 1 then $(\dot{\mathbf{X}}_\theta)_n^+$ is stochastically decreasing
and $(\ddot{\mathbf{X}}_\theta)_n$ is stochastically increasing for τ ≤ θ < 1, while $(\dot{\mathbf{X}}_\tau)_n^+ \prec (\ddot{\mathbf{X}}_\tau)_n$,
where τ = 1/(1 + r). □
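The claim that (75) holds exactly when t ≤ 1 can be spot-checked on a grid. The following sketch evaluates the gap between the two sides of (75) for a few values of t = 1/r; the grid of v values, the tolerance, and the function name are arbitrary illustrative choices, and a grid check is of course not a proof.

```python
def tp2b_gap(v, t):
    """Right side minus left side of (75); nonnegative when (75) holds."""
    return (v - 1.0 / v) - (v**t - v**(-t)) / t


vs = [1.0 + 0.25 * k for k in range(1, 40)]  # grid of v = 1/u values in (1, 10.75]
holds = {
    t: all(tp2b_gap(v, t) >= -1e-12 for v in vs)  # small slack for rounding at t = 1
    for t in (0.25, 0.5, 1.0, 2.0, 4.0)
}
print(holds)  # t <= 1 (r >= 1) passes on this grid; t > 1 (r < 1) fails
```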
Example 4.3 (= 3.3 continued). For the binary splitting GW process,
the coefficient of $\theta^y$ in the power series $[A(\theta)]^x = (1 + \theta^2)^x$ is

$$h_a(x, y) = \begin{cases} \binom{x}{y/2} & \text{for } y = 0, 2, \dots, 2x, \\ 0 & \text{otherwise}, \end{cases} \tag{76}$$

which is TP2 in (x, y), so TP2a is satisfied. Furthermore $B(\theta) = \theta + \theta^{-1}$
and $u_\theta = \theta^{-2}$ for θ ≥ τ = 1, so (64) is equivalent to the valid inequality
$2 - \theta^2 - \theta^{-2} \le 0$, hence TP2b is satisfied. Thus by Proposition 4.2, $(\dot{\mathbf{X}}_\theta)_n^+$
is stochastically decreasing and $(\ddot{\mathbf{X}}_\theta)_n$ is stochastically increasing for θ ≥ 1,
while $(\dot{\mathbf{X}}_\tau)_n^+ \prec (\ddot{\mathbf{X}}_\tau)_n$. □
Remark 4.1. The maximum likelihood estimate (MLE) $\hat{\theta}$ is derived by
differentiating (40), then applying (18) to obtain the relation

$$\hat{\mu} \equiv \mu_{\hat{\theta}} = \frac{Y_n - x_0}{Y_{n-1}}, \tag{77}$$

from which $\hat{\theta}$ can be obtained. Here $\hat{\mu}$ denotes the MLE of the mean $\mu_\theta$. □
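As an illustration of (77), the sketch below computes $\hat{\mu}$ from a short path and then inverts the mean function numerically to get $\hat{\theta}$. It assumes $Y_n = x_0 + x_1 + \cdots + x_n$ (the observed value of $y_n$ in (40)), uses the geometric psod with mean θ/(1 − θ) as the example family, and runs on made-up generation sizes; all function names are hypothetical.

```python
def mle_offspring_mean(x0, xs):
    """mu_hat = (Y_n - x0) / Y_{n-1} from (77), with Y_n = x0 + x_1 + ... + x_n."""
    y_n = x0 + sum(xs)
    y_prev = y_n - xs[-1]  # Y_{n-1} excludes the last generation
    return (y_n - x0) / y_prev


def solve_increasing(fn, target, lo, hi, iters=200):
    """Bisection for fn(theta) = target, assuming fn is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nb_mean(r):
    """Offspring mean r * theta / (1 - theta) of the NB(r, theta) psod."""
    return lambda theta: r * theta / (1.0 - theta)


x0, xs = 2, [3, 5, 4, 7]  # hypothetical generation sizes, not data from the paper
mu_hat = mle_offspring_mean(x0, xs)
theta_hat = solve_increasing(nb_mean(1.0), mu_hat, 0.0, 1.0 - 1e-12)
print(mu_hat, theta_hat)  # for the geometric case theta_hat = mu_hat / (1 + mu_hat)
```

The bisection relies on $\mu_\theta$ being strictly increasing in θ, which follows from (20).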

5. Predicting extinction or explosion: the fixed sample size case


Based on observed data $\mathbf{x}_n \equiv (x_1, \dots, x_n)$ from a non-terminated psod GW
process $X \equiv X_\theta$ with initial size $x_0$ and fixed n, predict whether extinction
or explosion will occur for the current realization of the process.
By the Markov property for X,

$$\Pr_\theta[\text{extinction} \mid \mathbf{X}_n = \mathbf{x}_n] = u_\theta^{x_n} = 1 - \Pr_\theta[\text{explosion} \mid \mathbf{X}_n = \mathbf{x}_n]. \tag{78}$$

The MLE $\hat{u}$ of $u_\theta$ is given by $\hat{u} = u_{\hat{\theta}}$ where $\hat{\theta}$ is obtained from (77), so the
estimated extinction probability is

$$\Pr_{\hat{\theta}}[\text{extinction} \mid \mathbf{X}_n = \mathbf{x}_n] = \hat{u}^{x_n} \begin{cases} = 1 & \text{if } x_n \le x_0, \\ < 1 & \text{if } x_n > x_0. \end{cases} \tag{79}$$

The value of $\hat{u}^{x_n}$ can be used to predict extinction or explosion.
However, this procedure may reach unwarranted conclusions. For example,
if $x_n = x_0 - 1$ then extinction will be predicted with certainty even though
the population has declined only slightly. Whereas (79) is based solely on
the value of $x_n$, the prediction procedures derived in this section compare
$x_n$ to n in order to predict whether the observed process is on a trajectory
toward extinction or toward explosion.
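The behavior described in (79) is easy to reproduce. The sketch below assumes a Poisson psod, so that $\hat{\theta} = \hat{\mu}$, and two made-up paths that differ only in the last generation; the function names and data are hypothetical.

```python
import math


def poisson_extinction_probability(theta, tol=1e-14, max_iter=1_000_000):
    """u_theta for the Poisson psod: 1 if theta <= 1, else the limit of phi_n(0)."""
    if theta <= 1.0:
        return 1.0
    s = 0.0
    for _ in range(max_iter):
        s_next = math.exp(theta * (s - 1.0))
        if abs(s_next - s) < tol:
            break
        s = s_next
    return s_next


def plug_in_extinction(x0, xs):
    """Return (mu_hat, u_hat ** x_n), the plug-in estimate in (79)."""
    mu_hat = sum(xs) / (x0 + sum(xs[:-1]))  # (77); theta_hat = mu_hat for Poisson
    u_hat = poisson_extinction_probability(mu_hat)
    return mu_hat, u_hat ** xs[-1]


print(plug_in_extinction(10, [12, 11, 9]))   # x_n < x_0: estimated probability is 1
print(plug_in_extinction(10, [12, 11, 11]))  # x_n > x_0: estimated probability < 1
```

A change of two individuals in the last generation moves the estimate from certainty to a value well below 1, which is the sensitivity the text warns about.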
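5.1. Prediction as a testing problem. We reformulate the prediction
problem as a hypothesis-testing problem to which Neyman-Pearson theory
can be applied. As in Proposition 4.2, let $\mathbf{X}_n^+ \equiv (\mathbf{X}_\theta)_n^+$ denote the conditional
random vector $\mathbf{X}_n \mid X_n > 0$. Then conditional on non-termination at time
n, extinction will occur iff either

$$H_{\le}^+ : \mathbf{X}_n^+ \overset{d}{=} (\mathbf{X}_\theta)_n^+,\ \theta \le \tau \quad \text{or} \quad \dot{H}_{\ge}^+ : \mathbf{X}_n^+ \overset{d}{=} (\dot{\mathbf{X}}_\theta)_n^+,\ \theta \ge \tau$$

holds, while explosion will occur iff $\ddot{H}_{>}^+ : \mathbf{X}_n^+ \overset{d}{=} (\ddot{\mathbf{X}}_\theta)_n^+,\ \theta > \tau$ holds. (Recall
that $(\ddot{\mathbf{X}}_\theta)^+ = \ddot{\mathbf{X}}_\theta$.) However, $H_{\le}^+ = \dot{H}_{\ge}^+$ by Proposition 3.2 while by (49) the
$L^1$-closure of $\ddot{H}_{>}^+$ is

$$\ddot{H}_{\ge}^+ : \mathbf{X}_n^+ \overset{d}{=} (\ddot{\mathbf{X}}_\theta)_n^+,\ \theta \ge \tau, \tag{80}$$

so the prediction problem can be formulated as the following testing problem:
Based on observing $\mathbf{X}_n^+ = \mathbf{x}_n > 0$, test

$$\dot{H}_{\ge}^+ \text{ (eventual extinction)} \quad \text{vs.} \quad \ddot{H}_{\ge}^+ \text{ (eventual explosion)}. \tag{81}$$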

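Either $\dot{H}_{\ge}^+$ or $\ddot{H}_{\ge}^+$ may be taken to be the null hypothesis. Note that
θ ≥ τ under both $\dot{H}_{\ge}^+$ and $\ddot{H}_{\ge}^+$. The conditional pmfs of $\mathbf{X}_n^+$ under $\dot{H}_{\ge}^+$ and
$\ddot{H}_{\ge}^+$ are

$$\dot{f}_\theta^+(\mathbf{x}_n) \equiv \Pr_\theta[\dot{\mathbf{X}}_n^+ = \mathbf{x}_n] = \dot{b}_{\theta,n}\, \dot{f}_\theta(\mathbf{x}_n), \quad \mathbf{x}_n > 0, \tag{82}$$

and $\ddot{f}_\theta^+(\mathbf{x}_n) \equiv \ddot{f}_\theta(\mathbf{x}_n)$, respectively, where $\ddot{f}_\theta$ is given by (42) and (47) and

$$\dot{b}_{\theta,n}^{-1} = \Pr_\theta[\dot{X}_n > 0] = \Pr_{\theta u_\theta}[X_n > 0] = b_{\theta u_\theta, n}^{-1}. \tag{83}$$

A version of the generalized LR criterion for (81) is

$$\lambda^+(\mathbf{x}_n) \equiv \frac{\sup_{\tau \le \theta < \psi} \ddot{f}_\theta^+(\mathbf{x}_n)}{\sup_{\tau \le \theta < \psi} \dot{f}_\theta^+(\mathbf{x}_n)}, \tag{84}$$

but the numerator and denominator may be difficult to evaluate.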


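5.2. The least favorable distributions for fixed sample size. When
the psod satisfies Conditions TP2a and TP2b the testing problem (81) has a
convenient solution. Proposition 4.2 implies that $\dot{H}_{\ge}^+$ and $\ddot{H}_{\ge}^+$ are separated
families and that $(\dot{f}_\tau^+, \ddot{f}_\tau^+)$ is a pair of least favorable distributions for (81).
By Theorem 3.8.1 of Lehmann and Romano (2005), a test of the form

$$\begin{cases} \text{accept } \dot{H}_{\ge}^+ \text{ (predict extinction)} & \text{if } \lambda_\tau^+(\mathbf{X}_n^+) \le d, \\ \text{accept } \ddot{H}_{\ge}^+ \text{ (predict explosion)} & \text{if } \lambda_\tau^+(\mathbf{X}_n^+) > d, \end{cases} \tag{85}$$

is the UMP test of its size for (81), where d is a nonnegative constant and,
from (47) and (82),

$$\lambda_\tau^+(\mathbf{x}_n) \equiv \frac{\ddot{f}_\tau^+(\mathbf{x}_n)}{\dot{f}_\tau^+(\mathbf{x}_n)} = \frac{x_n}{x_0\, \dot{b}_{\tau,n}} = \frac{x_n}{x_0\, b_{\tau,n}}, \quad \mathbf{x}_n > 0. \tag{86}$$

Because $\lambda_\tau^+(\mathbf{x}_n)$ is strictly increasing in $x_n$, the test (85) has the form

$$\begin{cases} \text{accept } \dot{H}_{\ge}^+ \text{ (predict extinction)} & \text{if } X_n^+ \le c, \\ \text{accept } \ddot{H}_{\ge}^+ \text{ (predict explosion)} & \text{if } X_n^+ \ge c + 1, \end{cases} \tag{87}$$

where c is a nonnegative integer.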


5.3. Exponential-type approximations for $\dot X_n^+$ when $\theta = \tau$.
Suppose first that $\dot H_\ge^+$ is taken to be the null hypothesis. If
$X_n^+ = x_n > 0$ is observed, the p-value $\Pr_\tau[\dot X_n^+ \ge x_n]$
for test (87) is determined by the distribution of $\dot X_n^+$ under
$\dot f_\tau^+$. For large $n$ the mean and variance of $\dot X_n^+$ can be
approximated via (93) as follows:

\[ E_\tau(\dot X_n^+) = x_0\, b_{\tau,n} \approx \frac{n\sigma_\tau^2}{2}, \tag{88} \]

\[ \mathrm{Var}_\tau(\dot X_n^+) = x_0\, b_{\tau,n}\,[\,n\sigma_\tau^2 - x_0 (b_{\tau,n} - 1)\,]
   \approx \frac{n\sigma_\tau^2}{2}\left(\frac{n\sigma_\tau^2}{2} + x_0\right). \tag{89} \]

Unfortunately the conditional rv $\dot X_n^+$ ($\equiv \dot X_n \mid X_n > 0$)
is not the sum of $x_0$ i.i.d. copies each with initial size 1: conditional
on $X_n > 0$, some of the initial $x_0$ family lines may have terminated by
time $n$. Therefore a normal approximation is not available for
$\dot X_n^+$, even when $x_0$ is large.
Fortunately, however, when $n$ is large Yaglom's classical exponential
approximation can be applied. For a critical GW process (not necessarily
psod) with offspring variance $\sigma^2 < \infty$, Yaglom (1947) showed that
if $x_0 = 1$ then

\[ \lim_{n\to\infty} \Pr[X_n^+ \ge nz] = e^{-2z/\sigma^2} \tag{90} \]

for any $z > 0$. This result appears under progressively weaker moment
conditions in Harris (1963, §I.10), Kesten, Ney, and Spitzer (1966, p.582),
Athreya and Ney (1972, §9), and Jagers (1975, Theorem 2.4.2), but only
for the case $x_0 = 1$. When $x_0 \ge 2$ it might be expected that the
limiting exponential (EXP) distribution in (90) should be replaced by the
distribution of the sum of $x_0$ independent exponential rvs, i.e., a gamma
distribution, but (90) continues to hold without change, cf. (92). However,
we will also present a more accurate gamma (GAM) approximation (94) that
does depend on $x_0$.
Let $G_r$ denote a gamma rv with shape parameter $r > 0$ and scale
parameter 1 and let $G_r(z)$ denote its cdf, that is,

\[ G_r(z) = \frac{1}{\Gamma(r)} \int_0^z t^{r-1} e^{-t}\, dt. \tag{91} \]

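Proposition 5.1. (i) Let $\{X_n\}$ be a critical GW process with offspring
pgf $\phi$, offspring variance $\sigma^2 < \infty$, and initial size
$x_0 \ge 1$. For any $z > 0$,

\[ \lim_{n\to\infty} \Pr[X_n^+ \ge nz] = e^{-2z/\sigma^2}, \tag{92} \]

\[ \lim_{n\to\infty} n \Pr[X_n > 0] = 2x_0/\sigma^2. \tag{93} \]

(ii) Let $\bar G_r(z) = 1 - G_r(z)$. For $x_0 \ge 1$ and large $n$,

\[ \Pr[X_n^+ \ge nz] \approx
   \frac{1}{1 - \left(1 - \frac{2}{n\sigma^2}\right)^{x_0}}
   \sum_{r=1}^{x_0} \binom{x_0}{r}\, \bar G_r\!\left(\frac{2z}{\sigma^2}\right)
   \left(\frac{2}{n\sigma^2}\right)^{r} \left(1 - \frac{2}{n\sigma^2}\right)^{x_0 - r}. \tag{94} \]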

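Proof. (i) The existing results for the case $x_0 = 1$ are based on the
following fact, cf. Jagers (1975, Lemma 2.4.1):

\[ \lim_{n\to\infty} \frac{1}{n}\left[\frac{1}{1 - \phi_n(s)} - \frac{1}{1 - s}\right]
   = \frac{\sigma^2}{2} \quad \text{uniformly for } 0 \le s < 1. \tag{95} \]

Set $s = 0$ to obtain

\[ \lim_{n\to\infty} n(1 - \phi_n(0)) = \frac{2}{\sigma^2}. \tag{96} \]

For $x_0 \ge 2$, $X_n$ has pgf $\phi_n^{x_0}$ so by (96),

\begin{align}
n \Pr[X_n > 0] &= n(1 - \phi_n^{x_0}(0)) \tag{97} \\
 &= n(1 - \phi_n(0))\,[1 + \phi_n(0) + \cdots + \phi_n^{x_0 - 1}(0)] \tag{98} \\
 &\to \frac{2x_0}{\sigma^2} \tag{99}
\end{align}

because $\phi_n(0) \uparrow 1$; this confirms (93). Furthermore, the Laplace
transform of $X_n^+/n$ is, for $t \ge 0$,

\[ \begin{aligned}
E[e^{-tX_n^+/n}]
 &= \frac{\phi_n^{x_0}(e^{-t/n}) - \phi_n^{x_0}(0)}{1 - \phi_n^{x_0}(0)} \\
 &= 1 - \frac{1 - \phi_n^{x_0}(e^{-t/n})}{1 - \phi_n^{x_0}(0)} \\
 &= 1 - \frac{n(1 - \phi_n(e^{-t/n}))\,[1 + \phi_n(e^{-t/n}) + \cdots + \phi_n^{x_0 - 1}(e^{-t/n})]}
             {n(1 - \phi_n(0))\,[1 + \phi_n(0) + \cdots + \phi_n^{x_0 - 1}(0)]} \\
 &\to 1 - \frac{\sigma^2}{2x_0} \cdot \frac{x_0}{\frac{1}{t} + \frac{\sigma^2}{2}}
   = \frac{1}{1 + \frac{t\sigma^2}{2}} \quad \text{as } n \to \infty
\end{aligned} \]

by (95) and the inequalities $\phi_n(0) < \phi_n(e^{-t/n}) < 1$. This is the
Laplace transform of the exponential distribution in (92), confirming that
result.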
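(ii) Represent $X_n \overset{d}{=} U_1 + \cdots + U_{x_0}$, where the $U_i$
are i.i.d. copies of $X_n$ but each with initial size $x_0 = 1$. Then

\[ \begin{aligned}
\Pr[X_n > 0]\, \Pr[X_n^+ \ge nz]
 &\equiv \Pr[U_1 + \cdots + U_{x_0} > 0]\, \Pr[(U_1 + \cdots + U_{x_0})^+ \ge nz] \\
 &= \Pr[U_1 + \cdots + U_{x_0} \ge nz] \\
 &= \sum_{\omega \in 2^{x_0} \setminus \emptyset}
    \Pr\Big[\sum_{i \in \omega} U_i \ge nz,\ U_i > 0 \text{ for } i \in \omega,\ U_i = 0 \text{ for } i \notin \omega\Big] \\
 &= \sum_{r=1}^{x_0} \binom{x_0}{r}
    \Pr[U_1 + \cdots + U_r \ge nz,\ U_1 > 0, \ldots, U_r > 0,\ U_{r+1} = \cdots = U_{x_0} = 0] \\
 &= \sum_{r=1}^{x_0} \binom{x_0}{r}
    \Pr[U_1^+ + \cdots + U_r^+ \ge nz]\, \Pr[U_1 > 0]^r\, \Pr[U_{x_0} = 0]^{x_0 - r} \\
 &\approx \sum_{r=1}^{x_0} \binom{x_0}{r}\, \bar G_r\!\left(\frac{2z}{\sigma^2}\right)
    \left(\frac{2}{n\sigma^2}\right)^{r} \left(1 - \frac{2}{n\sigma^2}\right)^{x_0 - r}
\end{aligned} \]

for large $n$, by (92) and by (93) with $x_0 = 1$. Furthermore, by (97) and
(96),

\[ \Pr[X_n > 0] = 1 - \phi_n^{x_0}(0) \approx 1 - \left(1 - \frac{2}{n\sigma^2}\right)^{x_0}, \tag{100} \]

which yields (94). □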
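From (92) and (94) and a continuity correction we obtain exponential-type
approximations for the p-value of the test (87) when $\dot H_\ge^+$ and
$\ddot H_\ge^+$ are taken to be the null and alternative hypothesis,
respectively:

\[ \Pr_\tau[\dot X_n^+ \ge x_n] \approx e^{-2(x_n - 1)/(n\sigma_\tau^2)}
   \equiv \dot\pi^{\mathrm{EXP}}(x_n; n), \tag{101} \]

\[ \Pr_\tau[\dot X_n^+ \ge x_n] \approx
   \frac{1}{1 - \left(1 - \frac{2}{n\sigma_\tau^2}\right)^{x_0}}
   \sum_{r=1}^{x_0} \binom{x_0}{r}
   \left(\frac{2}{n\sigma_\tau^2}\right)^{r} \left(1 - \frac{2}{n\sigma_\tau^2}\right)^{x_0 - r}
   \bar G_r\!\left(\frac{2(x_n - 1)}{n\sigma_\tau^2}\right)
   \equiv \dot\pi^{\mathrm{GAM}}(x_n; n, x_0). \tag{102} \]

The EXP and GAM approximations coincide when $x_0 = 1$. The approximate
p-value $\dot\pi^{\mathrm{EXP}}(x_n; n)$ does not depend on the value of
$x_0$; it conveys significance for $\ddot H_\ge^+$ (explosion) iff
$x_n \gg n\sigma_\tau^2$, but convergence to the exact p-value is slow,
see Remark 5.1. The $\dot\pi^{\mathrm{GAM}}(x_n; n, x_0)$ approximation is
noticeably better.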
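The two approximations can be evaluated with a few lines of code. The following is a minimal sketch, not the authors' implementation: the function names `gamma_sf`, `p_exp` and `p_gam` are hypothetical, the critical offspring variance is passed in as `sigma2`, and the continuity-corrected argument $2(x_n - 1)/(n\sigma_\tau^2)$ is taken as printed in (101) and (102). For integer shape $r$ the gamma tail has the closed form $\bar G_r(z) = e^{-z}\sum_{k<r} z^k/k!$, so only the standard library is needed.

```python
import math


def gamma_sf(r: int, z: float) -> float:
    """Upper tail 1 - G_r(z) of a gamma rv with integer shape r and scale 1."""
    if z <= 0:
        return 1.0
    term, total = 1.0, 0.0
    for k in range(r):          # sum_{k=0}^{r-1} z^k / k!
        total += term
        term *= z / (k + 1)
    return math.exp(-z) * total


def p_exp(x_n: int, n: int, sigma2: float) -> float:
    """EXP approximation (101) to Pr[X_n^+ >= x_n] in the critical case."""
    return math.exp(-2.0 * (x_n - 1) / (n * sigma2))


def p_gam(x_n: int, n: int, x0: int, sigma2: float) -> float:
    """GAM approximation (102): binomial mixture of gamma tails."""
    p = 2.0 / (n * sigma2)      # approximate survival probability of one line
    if not 0.0 < p < 1.0:
        raise ValueError("need n * sigma2 > 2")
    z = 2.0 * (x_n - 1) / (n * sigma2)
    mix = sum(
        math.comb(x0, r) * p**r * (1.0 - p) ** (x0 - r) * gamma_sf(r, z)
        for r in range(1, x0 + 1)
    )
    return mix / (1.0 - (1.0 - p) ** x0)


if __name__ == "__main__":
    # Geometric psod, critical case: sigma2 = 2 (first row of Tables 1 and 2).
    print(round(p_exp(10, 5, 2.0), 3))
    print(round(p_gam(10, 5, 2, 2.0), 3))
    print(round(p_gam(10, 5, 8, 2.0), 3))
```

With `sigma2 = 2` (the geometric psod at criticality) this sketch agrees with the approximate p-values listed in the first row of Tables 1 and 2 to three decimals.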
Remark 5.1. The accuracy of the EXP and GAM approximations can be
assessed for the geometric psod (cf. Example 3.2). Here the pgf of
$X_n \equiv \dot X_n$ in the critical case can be obtained explicitly⁸ and
expanded in a power series, from which the exact distribution of $X_n$ can
be recovered. By (82) and (83) this yields the exact distribution of
$X_n^+ \equiv \dot X_n^+$ in the critical case. The exact p-values and their
approximations $\dot\pi^{\mathrm{EXP}}$ and $\dot\pi^{\mathrm{GAM}}$ are
shown in Tables 1 and 2, from which the superiority of the GAM
approximation is apparent. □
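The exact p-values in this remark can be reproduced numerically. The sketch below rests on one assumption that is not spelled out in the text: for the critical geometric offspring pmf $(1/2)^{k+1}$ and $x_0 = 1$, the standard explicit pgf gives $\Pr[X_n = 0] = n/(n+1)$ and $\Pr[X_n = k] = n^{k-1}/(n+1)^{k+1}$ for $k \ge 1$. For $x_0 \ge 2$ the pmf is obtained by convolving $x_0$ independent copies, truncated at $x_n - 1$. The function names are hypothetical.

```python
def single_line_pmf(n: int, kmax: int) -> list:
    """Pr[X_n = k], k = 0..kmax, for x0 = 1 and offspring pmf (1/2)^(k+1)."""
    rho = n / (n + 1.0)
    return [rho] + [rho ** (k - 1) / (n + 1.0) ** 2 for k in range(1, kmax + 1)]


def convolve_truncated(a: list, b: list) -> list:
    """Convolution of two pmfs, keeping only indices 0..len(a)-1."""
    out = [0.0] * len(a)
    for i, ai in enumerate(a):
        for j in range(len(a) - i):
            out[i + j] += ai * b[j]
    return out


def exact_p_value(x_n: int, n: int, x0: int) -> float:
    """Exact Pr[X_n >= x_n | X_n > 0] for the critical geometric psod."""
    single = single_line_pmf(n, x_n - 1)
    pmf = single
    for _ in range(x0 - 1):
        pmf = convolve_truncated(pmf, single)
    p_zero = pmf[0]                 # Pr[X_n = 0]
    below = sum(pmf[1:])            # Pr[1 <= X_n <= x_n - 1]
    return 1.0 - below / (1.0 - p_zero)


if __name__ == "__main__":
    print(round(exact_p_value(10, 5, 1), 3))
    print(round(exact_p_value(10, 5, 2), 3))
```

Truncating the convolution at $x_n - 1$ is exact here, because probabilities of values below $x_n$ never involve larger values of the summands.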
5.4. Exponential-type approximations for Ẍn+ when θ = τ . Suppose
next that Ḧ≥+ is the null hypothesis. The p-value Prτ [Ẍn+ ≤ xn ] for test
(87) is determined by the distribution of Ẍn+ ≡ Ẍn under f¨τ+ ≡ f¨τ . Again
a normal approximation is not available for large x0 because Ẍn is not the
sum of x0 i.i.d. copies each with initial size 1: conditional on explosion, some
of the initial x0 family lines nonetheless may become extinct. However, an
exponential-type approximation is available for large n, based on the follow-
ing representation for the process Ẍτ ≡ {Ẍτ,n |n ≥ 1}. We shall abbreviate
Ẍτ,n to Ẍn .
Proposition 5.2. Define $Z_n = \ddot X_n - 1$, $n = 1, 2, \ldots$, and
$z_0 = x_0 - 1$. When $\theta = \tau$, $Z \equiv \{Z_n \mid n \ge 1\}$ is a
critical GW process with immigration (GWI). Specifically,

\[ Z_n \mid Z_{n-1} = \xi_1^{(n)} + \cdots + \xi_{Z_{n-1}}^{(n)} + W_n, \tag{103} \]

8 cf. eqn. (8.32) in Taylor and Karlin (1998) for the case x0 = 1.
              x0 = 1             x0 = 2
  n    xn     EXP    Exact       GAM    Exact
  5    10    0.165   0.194      0.198   0.226
  5    20    0.022   0.031      0.032   0.042
  5    30    0.003   0.005      0.005   0.008
 10    20    0.150   0.164      0.165   0.178
 10    40    0.020   0.024      0.024   0.029
 10    60    0.003   0.004      0.004   0.005
 15    20    0.282   0.293      0.294   0.305
 15    40    0.074   0.081      0.081   0.087
 15    60    0.020   0.022      0.022   0.025
100   150    0.225   0.227      0.227   0.229

Table 1: Exact p-values $\Pr_\tau[\dot X_n^+ \ge x_n]$ and their
approximations $\dot\pi^{\mathrm{EXP}}$ (column EXP) and
$\dot\pi^{\mathrm{GAM}}$ (column GAM) for the geometric psod when
x0 = 1 and x0 = 2.

                         x0 = 8             x0 = 14
  n    xn     EXP       GAM    Exact       GAM    Exact
  5    10    0.165     0.420   0.428      0.631   0.618
  5    20    0.022     0.129   0.141      0.286   0.288
  5    30    0.003     0.035   0.042      0.108   0.115
 10    20    0.150     0.262   0.273      0.369   0.375
 10    40    0.020     0.058   0.064      0.108   0.115
 10    60    0.003     0.012   0.014      0.029   0.032
 15    20    0.282     0.369   0.378      0.444   0.450
 15    40    0.074     0.126   0.133      0.178   0.184
 15    60    0.020     0.042   0.046      0.069   0.073
100   150    0.225     0.237   0.239      0.248   0.249

Table 2: Exact p-values $\Pr_\tau[\dot X_n^+ \ge x_n]$ and their
approximations $\dot\pi^{\mathrm{EXP}}$ (column EXP, which does not depend
on x0) and $\dot\pi^{\mathrm{GAM}}$ (column GAM) for the geometric psod
when x0 = 8 and x0 = 14.

where $\xi_1^{(n)}, \ldots, \xi_{Z_{n-1}}^{(n)}, W_n$ are independent rvs,
$\xi_j^{(n)} \overset{d}{=} \xi_\tau$, and $W_n$ is a nonnegative
integer-valued rv with pgf $\phi_\tau'(s)$. (This is a pgf since
$\phi_\tau'(1) = \mu_\tau = 1$.)
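Proof. From (47),

\[ \ddot f_\tau(\mathbf{x}_n)
   = \prod_{i=1}^{n} \frac{x_i}{x_{i-1}} \cdot \frac{\tau^{x_i}}{(A(\tau))^{x_{i-1}}}\, h_a(x_{i-1}, x_i)
   \equiv \prod_{i=1}^{n} g_\tau(x_i \mid x_{i-1}), \tag{104} \]

so $\ddot{\mathbf{X}}_\tau$ is a Markov chain with transition probability
$g_\tau(x_i \mid x_{i-1})$. The conditional pgf corresponding to
$g_\tau(x_i \mid x_{i-1})$ is

\begin{align}
E_\tau(s^{\ddot X_i} \mid \ddot X_{i-1} = x_{i-1})
 &= \frac{1}{x_{i-1}} \sum_{x_i} x_i\, s^{x_i}\, \frac{\tau^{x_i}}{(A(\tau))^{x_{i-1}}}\, h_a(x_{i-1}, x_i) \notag \\
 &= \frac{s}{x_{i-1}} \frac{d}{ds} \sum_{x_i} s^{x_i}\, \frac{\tau^{x_i}}{(A(\tau))^{x_{i-1}}}\, h_a(x_{i-1}, x_i) \notag \\
 &= \frac{s}{x_{i-1}} \frac{d}{ds} \big[(\phi_\tau(s))^{x_{i-1}}\big] \notag \\
 &= s\, (\phi_\tau(s))^{x_{i-1} - 1}\, \phi_\tau'(s). \tag{105}
\end{align}

The third equality holds since
$\frac{\tau^{x_i}}{(A(\tau))^{x_{i-1}}}\, h_a(x_{i-1}, x_i)$ is the pmf of
$\xi_1^{(i)} + \cdots + \xi_{x_{i-1}}^{(i)}$. Thus (105) implies that

\[ \ddot X_i \mid \ddot X_{i-1} = 1 + \xi_1^{(i)} + \cdots + \xi_{\ddot X_{i-1} - 1}^{(i)} + W_i, \tag{106} \]

where the $\xi_j^{(i)}$'s and $W_i$ are mutually independent rvs, the
$\xi_j^{(i)}$'s have common pgf $\phi_\tau$, and $W_i$ is the nonnegative
integer-valued rv with pgf $\phi_\tau'(s)$. Now set $i = n$ in (106) to
obtain (103). □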
By the theorem of Seneta (1970),
$2Z_n/(n\sigma_\tau^2) \overset{d}{\to} G_2$ (cf. (91)) if $z_0 = 1$
($x_0 = 2$). Since $\ddot X_n = Z_n + 1$, we obtain the following
approximation when $x_0 = 2$:

\[ \Pr_\tau[\ddot X_n \le x_n] \approx G_2\!\left(\frac{2x_n}{n\sigma_\tau^2}\right)
   \equiv \ddot\pi^{G2}(x_n; n) \quad \text{for large } n. \tag{107} \]

We now show that if $n$ is sufficiently large, $\ddot\pi^{G2}(x_n; n)$
remains a valid approximation for $\Pr_\tau[\ddot X_n \le x_n]$ for all
$x_0 \ge 2$. In the process we derive a sharper approximation
$\ddot\pi^{G23}(x_n; n, x_0) \le \ddot\pi^{G2}(x_n; n)$ that depends on
$x_0$ as well as $n$. The case $x_0 = 1$ is treated separately.
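Proposition 5.3. As in Proposition 5.2 let $Z_n = \ddot X_n - 1$,
$z_0 = x_0 - 1$, $\theta = \tau$.
(i) Assume that $x_0 \ge 2$, so $z_0 \ge 1$. Then if $n$ is large and
$z > 0$,

\[ \Pr_\tau[Z_n \le nz] \approx
   \left(1 - \frac{2(z_0 - 1)}{n\sigma_\tau^2}\right) G_2\!\left(\frac{2z}{\sigma_\tau^2}\right)
   + \frac{2(z_0 - 1)}{n\sigma_\tau^2}\, G_3\!\left(\frac{2z}{\sigma_\tau^2}\right), \tag{108} \]

so

\[ \Pr_\tau[\ddot X_n \le x_n] \approx
   \left(1 - \frac{2(x_0 - 2)}{n\sigma_\tau^2}\right) G_2\!\left(\frac{2x_n}{n\sigma_\tau^2}\right)
   + \frac{2(x_0 - 2)}{n\sigma_\tau^2}\, G_3\!\left(\frac{2x_n}{n\sigma_\tau^2}\right)
   \equiv \ddot\pi^{G23}(x_n; n, x_0). \tag{109} \]

This reduces to (107) if $x_0 = 2$ or $n\sigma_\tau^2 \gg 2(x_0 - 2)$.

(ii) Assume that $x_0 = 1$, so $z_0 = 0$, and define

\[ K = \min\{k \mid \ddot X_k \ge 2\} = \min\{k \mid Z_k \ge 1\}. \tag{110} \]

Then if $n - K$ is large,

\[ \Pr_\tau[Z_n \le (n - K)z \mid K, Z_K] \approx
   \left(1 - \frac{2(Z_K - 1)}{(n - K)\sigma_\tau^2}\right) G_2\!\left(\frac{2z}{\sigma_\tau^2}\right)
   + \frac{2(Z_K - 1)}{(n - K)\sigma_\tau^2}\, G_3\!\left(\frac{2z}{\sigma_\tau^2}\right), \tag{111} \]

so the conditional p-value given $K$ and $\ddot X_K$ can be approximated as
follows:

\[ \Pr_\tau[\ddot X_n \le x_n \mid K, \ddot X_K] \approx
   \left(1 - \frac{2(\ddot X_K - 2)}{(n - K)\sigma_\tau^2}\right) G_2\!\left(\frac{2x_n}{(n - K)\sigma_\tau^2}\right)
   + \frac{2(\ddot X_K - 2)}{(n - K)\sigma_\tau^2}\, G_3\!\left(\frac{2x_n}{(n - K)\sigma_\tau^2}\right)
   \equiv \ddot\pi^{G23}(x_n; n - K, \ddot X_K). \tag{112} \]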

Proof. (i) First assume that $x_0 \ge 3$, so $z_0 \ge 2$. Rewrite (103) as
follows. For $n = 1$,

\[ Z_1 \mid z_0 = (\xi_1^{(1)} + W_1) + (\bar\xi_1^{(1)} + \cdots + \bar\xi_{z_0 - 1}^{(1)})
   \equiv U_1 + V_1, \tag{113} \]

where the $\xi$'s and $\bar\xi$'s are i.i.d. copies of $\xi_\tau$. For
$n \ge 2$,

\[ Z_n \mid Z_{n-1} = (\xi_1^{(n)} + \cdots + \xi_{U_{n-1}}^{(n)} + W_n)
   + (\bar\xi_1^{(n)} + \cdots + \bar\xi_{V_{n-1}}^{(n)}) \equiv U_n + V_n. \tag{114} \]

(If $V_{n-1} = 0$, $V_n = 0$.) Then $\{U_n\}$ is a critical GWI process with
immigration rvs $\{W_n\}$ and initial size $u_0 = 1$, $\{V_n\}$ is a
critical GW process with initial size $v_0 = z_0 - 1 = x_0 - 2 \ge 0$, and
$\{U_n\}$ is independent of $\{V_n\}$. Therefore

\[ \Pr_\tau[2Z_n \le n\sigma_\tau^2 z]
   = \Pr_\tau[2U_n \le n\sigma_\tau^2 z]\, \Pr_\tau[V_n = 0]
   + \Pr_\tau[2(U_n + V_n^+) \le n\sigma_\tau^2 z]\, \Pr_\tau[V_n > 0]. \tag{115} \]

Since $u_0 = 1$, Seneta's result applies to give
$2U_n/(n\sigma_\tau^2) \overset{d}{\to} G_2$, while by (92)
$2V_n^+/(n\sigma_\tau^2) \overset{d}{\to} G_1$. Because $U_n$ and $V_n$ are
independent,

\[ \Pr_\tau[2(U_n + V_n^+) \le n\sigma_\tau^2 z] \to G_3(z) \quad \text{as } n \to \infty. \tag{116} \]

Furthermore, $n \Pr_\tau[V_n > 0] \to 2(z_0 - 1)/\sigma_\tau^2$ by (93), so
(115) yields (108), which, applying the continuity correction, yields (109)
since $Z_n = \ddot X_n - 1$.
If $x_0 = 2$ so $z_0 = 1$, then all $V_n = 0$ and (108) reduces to Seneta's
result for $\{U_n\}$.
(ii) The case $x_0 = 1$ differs because when $z_0 = 0$ the first nonzero
value for the GWI process $\{Z_n\}$ is $Z_K = W_K$ and occurs when $n = K$.
By conditioning on $K$ and $Z_K$ or $\ddot X_K$, however, (111) and (112)
follow directly from (108) and (109) by replacing $z_0$ by $Z_K$, $x_0$ by
$\ddot X_K$, and $n$ by $n - K$. □
Like $\dot\pi^{\mathrm{EXP}}(x_n; n)$, the approximate p-value
$\ddot\pi^{G2}(x_n; n)$ does not depend on $x_0$ ($\ge 2$); it conveys
significance for $\dot H_\ge^+$ (eventual extinction) iff
$x_n \ll n\sigma_\tau^2$. We expect that $\ddot\pi^{G2}$, like
$\dot\pi^{\mathrm{EXP}}$, will converge only slowly to the exact p-value,
but that $\ddot\pi^{G23}$ will perform noticeably better. Note that
$\ddot\pi^{G23}$ requires $n\sigma_\tau^2 > 2(x_0 - 2)$; otherwise the
weight assigned to $G_2$ in (109) is negative.
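As an illustration, the approximations (107) and (109) can be coded directly. This is a hedged sketch with hypothetical function names (`gamma_cdf`, `p_g2`, `p_g23`); the critical offspring variance is supplied as `sigma2`, and the gamma cdf uses the closed form available for integer shape. The guard in `p_g23` enforces the applicability condition $n\sigma_\tau^2 > 2(x_0 - 2)$ stated above.

```python
import math


def gamma_cdf(r: int, z: float) -> float:
    """G_r(z) for integer shape r and scale 1: 1 - exp(-z) * sum_{k<r} z^k/k!."""
    if z <= 0:
        return 0.0
    term, total = 1.0, 0.0
    for k in range(r):
        total += term
        term *= z / (k + 1)
    return 1.0 - math.exp(-z) * total


def p_g2(x_n: int, n: int, sigma2: float) -> float:
    """G2 approximation (107) to Pr[X_n <= x_n] given explosion, theta = tau."""
    return gamma_cdf(2, 2.0 * x_n / (n * sigma2))


def p_g23(x_n: int, n: int, x0: int, sigma2: float) -> float:
    """G23 approximation (109); defined only for x0 >= 2 and n*sigma2 > 2(x0-2)."""
    if x0 < 2:
        raise ValueError("x0 = 1 is handled by conditioning on K, see (112)")
    w = 2.0 * (x0 - 2) / (n * sigma2)       # weight placed on G_3
    if w >= 1.0:
        raise ValueError("requires n * sigma2 > 2 * (x0 - 2)")
    z = 2.0 * x_n / (n * sigma2)
    return (1.0 - w) * gamma_cdf(2, z) + w * gamma_cdf(3, z)


if __name__ == "__main__":
    # Geometric psod at criticality: sigma2 = 2.
    print(round(p_g2(2, 5, 2.0), 3))
    print(round(p_g23(4, 10, 8, 2.0), 3))
```

For the condor data of Example 7.3 ($x_0 = 38$, $n = 8$, $\sigma_\tau^2 = 2$) the call `p_g23(19, 8, 38, 2.0)` raises `ValueError`, which mirrors the inapplicability noted there.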
Remark 5.2. The accuracy of the G2 and G23 approximations can be
assessed for the geometric psod. The pmf of $\ddot X_n$ in the critical case
can be obtained from (47) as follows: for $x_n > 0$,

\[ \Pr_\tau[\ddot X_n = x_n] = \frac{x_n}{x_0}\, \Pr_\tau[X_n = x_n], \tag{117} \]

and $\Pr_\tau[X_n = x_n]$ can be obtained explicitly as in Remark 5.1. Exact
p-values and the G2 and G23 approximations are shown in Table 3, from which
the superiority of G23 is apparent. □
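The size-biasing identity (117) makes the exact p-value easy to compute once the unconditional pmf is available. The sketch below assumes, as in the standard explicit solution for the critical geometric offspring pmf $(1/2)^{k+1}$, that a single line has $\Pr[X_n = 0] = n/(n+1)$ and $\Pr[X_n = k] = n^{k-1}/(n+1)^{k+1}$ for $k \ge 1$; the function names are hypothetical.

```python
def pmf_critical_geometric(n: int, x0: int, kmax: int) -> list:
    """Pr[X_n = k], k = 0..kmax, critical geometric psod with x0 ancestors."""
    rho = n / (n + 1.0)
    single = [rho] + [rho ** (k - 1) / (n + 1.0) ** 2 for k in range(1, kmax + 1)]
    pmf = single[:]
    for _ in range(x0 - 1):                 # convolve x0 independent lines
        nxt = [0.0] * (kmax + 1)
        for i, a in enumerate(pmf):
            for j in range(kmax + 1 - i):
                nxt[i + j] += a * single[j]
        pmf = nxt
    return pmf


def exact_explosion_p_value(x_n: int, n: int, x0: int) -> float:
    """Exact Pr[X_n <= x_n] under explosion at theta = tau, via (117)."""
    pmf = pmf_critical_geometric(n, x0, x_n)
    return sum(k * pmf[k] for k in range(1, x_n + 1)) / x0


if __name__ == "__main__":
    print(exact_explosion_p_value(2, 5, 2))
    print(round(exact_explosion_p_value(4, 10, 8), 3))
```

Only values $k \le x_n$ are needed, so truncating each convolution at `kmax = x_n` loses nothing.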
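Remark 5.3. Moments of $\ddot X_n$ can be obtained from (106) by recursion,
e.g.,

\[ E_\tau(\ddot X_n) = x_0 + n\sigma_\tau^2, \tag{118} \]

\[ \mathrm{Var}_\tau(\ddot X_n) = n\left[\omega_\tau + \frac{n - 3}{2}\,\sigma_\tau^4
   + (x_0 - 3)\sigma_\tau^2 - 1\right], \tag{119} \]

where $\omega_\tau = E(\xi_\tau^3)$. □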
              x0 = 2             x0 = 8             x0 = 14
  n   xn     G2    Exact       G23    Exact       G23    Exact
  5    2   0.062   0.062        --    0.022        --    0.008
  5    4   0.191   0.169        --    0.069        --    0.028
  5    6   0.337   0.291        --    0.134        --    0.060
 10    4   0.062   0.063      0.029   0.038        --    0.022
 10    8   0.191   0.180      0.105   0.115        --    0.073
 10   12   0.337   0.313      0.207   0.213        --    0.143
 20    8   0.062   0.063      0.045   0.048      0.029   0.037
 20   12   0.122   0.120      0.092   0.094      0.063   0.074
 20   16   0.191   0.186      0.148   0.148      0.105   0.118
 50   16   0.041   0.042      0.037   0.038      0.033   0.034
 50   24   0.084   0.084      0.076   0.076      0.067   0.069
 50   32   0.135   0.134      0.122   0.122      0.109   0.111

Table 3: Exact p-values $\Pr_\tau[\ddot X_n^+ \le x_n]$ and their
approximations $\ddot\pi^{G2}$ (column G2) and $\ddot\pi^{G23}$ (column G23)
for the geometric psod when x0 = 2, 8, 14. Entries marked -- are not
tabulated.

6. Predicting extinction or explosion: sequential sampling


6.1. Sequential probability ratio tests (SPRT). The SPRT (Barnard
(1946), Wald (1947), Ghosh (1970), Stuart and Ord (1991)) is well suited for
the following sequential version of the prediction problem:
Based on sequential data x ≡ (x0 , x1 , x2 , . . . ) from a psod GW process X ≡
Xθ with initial size x0 , predict whether extinction or explosion will occur for
the current realization of the process.
Unlike Section 5, non-termination need not be assumed. This prediction
problem can be formulated as the following testing problem:
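Based on observing $\mathbf{X}$ sequentially, test

\[ \begin{aligned}
&\dot H_\ge :\ \mathbf{X} \overset{d}{=} \dot{\mathbf{X}}_\theta,\ \theta \ge \tau \quad \text{(eventual extinction)} \\
\text{vs. } &\ddot H_\ge :\ \mathbf{X} \overset{d}{=} \ddot{\mathbf{X}}_\theta,\ \theta \ge \tau \quad \text{(eventual explosion).}
\end{aligned} \tag{120} \]

For fixed $\theta \ge \tau$, the SPRT for testing $\dot f_\theta$ vs.
$\ddot f_\theta$ has the following form:

The SPRT($\theta$; B, A): fix $0 < B < 1 < A < \infty$. For $n = 1, 2, \ldots$,

\[ \begin{cases}
\text{stop and accept } \dot H_\ge \text{ (predict extinction)} & \text{if } \lambda_\theta(\mathbf{x}_n) \le B, \\
\text{stop and accept } \ddot H_\ge \text{ (predict explosion)} & \text{if } \lambda_\theta(\mathbf{x}_n) \ge A, \\
\text{continue sampling} & \text{if } B < \lambda_\theta(\mathbf{x}_n) < A,
\end{cases} \]

where $\lambda_\theta(\mathbf{x}_n) = \ddot f_\theta(\mathbf{x}_n)/\dot f_\theta(\mathbf{x}_n)$.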


The stopping time for the SPRT(θ; B, A) is a random variable N (θ; B, A).
Because Prθ [Xn → 0 or ∞] = 1 for all θ ≥ τ , N (θ; B, A) is finite with
probability 1. As B decreases and A increases, Eθ [N (θ; B, A)] increases
under both Ḣ≥ and Ḧ≥ , but the error probabilities αθ ≡ αθ (θ; B, A) and
βθ ≡ βθ (θ; B, A) need not both decrease (cf. Wald (1947, p.45)). Here,

αθ (θ; B, A) ≡ Prθ [SPRT(θ; B, A) accepts Ḧ≥ | Ḣ≥ ]


βθ (θ; B, A) ≡ Prθ [SPRT(θ; B, A) accepts Ḣ≥ | Ḧ≥ ]

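Wald (1947, §3.2) derived the following upper bounds: for any
$\theta \ge \tau$,

\[ \alpha_\theta(\theta; B, A) \le \frac{1 - \beta_\theta(\theta; B, A)}{A} \le \frac{1}{A}, \tag{121} \]

\[ \beta_\theta(\theta; B, A) \le (1 - \alpha_\theta(\theta; B, A))\,B \le B. \tag{122} \]

Thus if $\alpha$ and $\beta$ are prespecified, we may choose $B = \beta$ and
$A = 1/\alpha$ to guarantee that SPRT($\theta$; $\beta$, $1/\alpha$)
satisfies the error bounds

\[ \alpha_\theta(\theta; \beta, \tfrac{1}{\alpha}) \le \alpha
   \quad \text{and} \quad
   \beta_\theta(\theta; \beta, \tfrac{1}{\alpha}) \le \beta. \tag{123} \]

Wald also derived the following approximations: if $\alpha + \beta < 1$ then
SPRT($\theta$; $\frac{\beta}{1-\alpha}$, $\frac{1-\beta}{\alpha}$) more
nearly attains the specified error probabilities $\alpha$ and $\beta$ than
does SPRT($\theta$; $\beta$, $\frac{1}{\alpha}$), i.e.,

\[ \alpha_\theta\!\left(\theta; \tfrac{\beta}{1-\alpha}, \tfrac{1-\beta}{\alpha}\right) \approx \alpha,
   \qquad
   \beta_\theta\!\left(\theta; \tfrac{\beta}{1-\alpha}, \tfrac{1-\beta}{\alpha}\right) \approx \beta, \tag{124} \]

\[ \alpha_\theta\!\left(\theta; \tfrac{\beta}{1-\alpha}, \tfrac{1-\beta}{\alpha}\right)
   + \beta_\theta\!\left(\theta; \tfrac{\beta}{1-\alpha}, \tfrac{1-\beta}{\alpha}\right)
   \le \alpha + \beta. \tag{125} \]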
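The two threshold choices described above are simple enough to wrap in a helper. The following sketch uses a hypothetical function name and a hypothetical `sharpened` flag to switch between the guaranteed bounds (123) and Wald's approximate thresholds (124).

```python
def wald_thresholds(alpha: float, beta: float, sharpened: bool = False) -> tuple:
    """Return (B, A) for an SPRT with target error probabilities alpha, beta.

    sharpened=False: B = beta, A = 1/alpha, which guarantees (123).
    sharpened=True:  B = beta/(1-alpha), A = (1-beta)/alpha, which attains the
                     targets approximately, as in (124), when alpha + beta < 1.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie in (0, 1)")
    if sharpened:
        if alpha + beta >= 1.0:
            raise ValueError("sharpened thresholds require alpha + beta < 1")
        return beta / (1.0 - alpha), (1.0 - beta) / alpha
    return beta, 1.0 / alpha


if __name__ == "__main__":
    print(wald_thresholds(0.05, 0.05))
    print(wald_thresholds(0.05, 0.05, sharpened=True))
```

The sharpened pair always satisfies $B' > B$ and $A' < A$, so it stops no later than the guaranteed pair on any given trajectory.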

6.2. The least favorable distribution for sequential sampling. Because
$\theta$ is unknown, the SPRT($\theta$; ·, ·) cannot be applied directly
(but see Section 6.3). When the psod satisfies TP2a and TP2b, however, like
(81) the testing problem (120) has a convenient solution, namely the
SPRT($\tau$; ·, ·).
Propositions 4.1 and 4.2 imply that $\dot H_\ge$ and $\ddot H_\ge$ are
separated families and that $(\dot f_\tau, \ddot f_\tau)$ is a pair of least
favorable distributions for (120). Furthermore, by Propositions 4.1 and 4.2,
$\alpha_\theta$ and $\beta_\theta$ both decrease as $\theta$ increases.
Therefore SPRT($\tau$; $\beta$, $\frac{1}{\alpha}$) (respectively,
SPRT($\tau$; $\frac{\beta}{1-\alpha}$, $\frac{1-\beta}{\alpha}$)) is an
optimal test for $\dot f_\tau$ vs. $\ddot f_\tau$ for which
$\alpha_\theta \le \alpha$ and $\beta_\theta \le \beta$ (resp.,
approximately) for all $\theta \ge \tau$. Specifically, by (41) and (47),

\[ \lambda_\tau(\mathbf{x}_n) \equiv \frac{\ddot f_\tau(\mathbf{x}_n)}{\dot f_\tau(\mathbf{x}_n)}
   = \frac{x_n}{x_0}, \tag{126} \]

so the SPRT($\tau$; B, A) assumes the simple form

\[ \begin{cases}
\text{stop and accept } \dot H_\ge \text{ (predict extinction)} & \text{if } x_n \le x_0 B, \\
\text{stop and accept } \ddot H_\ge \text{ (predict explosion)} & \text{if } x_n \ge x_0 A, \\
\text{continue sampling} & \text{if } x_0 B < x_n < x_0 A.
\end{cases} \]

Note that this is a universal prediction procedure, that is, it is valid for
any psod; in particular it does not depend on $\sigma_\tau^2$. As a
consequence, however, it is somewhat conservative.
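Because the universal SPRT($\tau$; B, A) only compares $x_n$ with $x_0 B$ and $x_0 A$, it can be run on an observed trajectory in a few lines. The sketch below is illustrative only; the function name and the returned labels are hypothetical. The usage lines apply it to the counts of Table 4 (smallpox) and Table 6 (pertussis).

```python
def universal_sprt(counts, B: float, A: float) -> tuple:
    """Run SPRT(tau; B, A) on counts = [x0, x1, x2, ...].

    Returns (decision, n): decision is "extinction", "explosion", or
    "continue" if neither boundary is reached within the observed data;
    n is the generation at which the rule stopped (or the last one seen).
    """
    x0 = counts[0]
    for n, x in enumerate(counts[1:], start=1):
        if x <= x0 * B:
            return ("extinction", n)
        if x >= x0 * A:
            return ("explosion", n)
    return ("continue", len(counts) - 1)


if __name__ == "__main__":
    smallpox = [1, 5, 3, 12, 24]                                   # Table 4
    pertussis = [1, 7, 22, 38, 50, 65, 78, 61, 74, 96, 102, 98]    # Table 6
    print(universal_sprt(smallpox, 0.05, 20))
    print(universal_sprt(pertussis, 0.05, 20))
    print(universal_sprt(pertussis, 0.01, 100))
```

Generation $n = 2$ for the pertussis data corresponds to Week 3 in Table 6, and $n = 10$ to Week 11.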
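6.3. A less conservative sequential prediction procedure. If $\theta$
were known ($\theta > \tau$), the SPRT($\theta$; $\beta$, $\frac{1}{\alpha}$)
(respectively, SPRT($\theta$; $\frac{\beta}{1-\alpha}$,
$\frac{1-\beta}{\alpha}$)) would provide an optimal test for
$\dot f_\theta$ vs. $\ddot f_\theta$ for which $\alpha_\theta \le \alpha$
and $\beta_\theta \le \beta$ (resp., $\alpha_\theta \approx \alpha$ and
$\beta_\theta \approx \beta$). From (12) and (13),

\[ \lambda_\theta(\mathbf{x}_n) \equiv \frac{\ddot f_\theta(\mathbf{x}_n)}{\dot f_\theta(\mathbf{x}_n)}
   = \frac{u_\theta^{-x_n} - 1}{u_\theta^{-x_0} - 1}. \tag{127} \]

Because $\lambda_\theta(\mathbf{x}_n)$ is strictly increasing in $x_n$, the
SPRT($\theta$; B, A) is given by

\[ \begin{cases}
\text{stop and accept } \dot H_\theta \text{ (predict extinction)} & \text{if } X_n \le x_0\, \ell(u_\theta^{x_0}, B), \\
\text{stop and accept } \ddot H_\theta \text{ (predict explosion)} & \text{if } X_n \ge x_0\, \ell(u_\theta^{x_0}, A), \\
\text{continue sampling} & \text{if } x_0\, \ell(u_\theta^{x_0}, B) < X_n < x_0\, \ell(u_\theta^{x_0}, A),
\end{cases} \]

where

\[ \ell(u, \eta) = \frac{\log\!\big(\frac{1-u}{u}\,\eta + 1\big)}{\log(\frac{1}{u})},
   \qquad 0 < u < 1,\ 0 \le \eta < \infty. \tag{128} \]

For fixed $u$, $\ell(u, \eta)$ increases strictly and continuously from 0 to
$\infty$ as $\eta$ ranges from 0 to $\infty$; also $\ell(u, 0) = 0$ and
$\ell(u, 1) = 1$. Define $\ell(1, \eta) = \ell(1-, \eta) = \eta$.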
Lemma 6.1. If $0 < u < 1$ and $0 < \eta < 1$ (resp., $1 < \eta < \infty$),
then $\ell(u, \eta)$ is strictly decreasing (resp., strictly increasing) in
$u$ and

\[ \eta < \ell(u, \eta) < 1 \qquad (\text{resp., } 1 < \ell(u, \eta) < \eta). \tag{129} \]

Proof. Set $v = \frac{1-u}{u}$, so that $0 < v < \infty$ and

\[ \ell(u, \eta) = \frac{\log(v\eta + 1)}{\log(v + 1)} \equiv \bar\ell(v, \eta). \]

For $1 < \eta < \infty$, to show that $\ell(u, \eta)$ is strictly increasing
for $0 < u < 1$, it suffices to show that $\bar\ell(v, \eta)$ is strictly
decreasing for $0 < v < \infty$, that is,
$\partial \bar\ell(v, \eta)/\partial v < 0$. This is equivalent to showing
that

\[ \frac{\eta \log(v + 1)}{\eta v + 1} - \frac{\log(\eta v + 1)}{v + 1} < 0, \]

equivalently, that

\[ \Delta(v, \eta) \equiv (v + 1)\log(v + 1) - \left(v + \frac{1}{\eta}\right)\log(\eta v + 1) < 0. \tag{130} \]

But $\Delta(v, 1) = 0$ and

\[ \frac{\partial \Delta(v, \eta)}{\partial \eta}
   = -\frac{(v + \frac{1}{\eta})\,v}{\eta v + 1} + \frac{\log(\eta v + 1)}{\eta^2}
   = -\frac{1}{\eta^2}\big[\eta v - \log(\eta v + 1)\big] < 0, \]

hence (130) holds. Then by L'Hospital's rule,

\[ \eta = \bar\ell(0+, \eta) > \bar\ell(v, \eta) > \bar\ell(\infty-, \eta) = 1, \]

which yields the desired inequalities for $\ell(u, \eta)$. The results for
$0 < \eta < 1$ are established in similar fashion. □
From Lemma 6.1, the lower (resp., upper) stopping boundary for the
SPRT($\theta$; B, A) strictly increases (resp., strictly decreases) as
$\theta$ increases on $[\tau, \psi)$, hence the continuation region shrinks
and $N(\theta; B, A)$ decreases. Thus

\[ x_0 B < x_0\, \ell(u_\theta^{x_0}, B) < x_0 < x_0\, \ell(u_\theta^{x_0}, A) < x_0 A. \tag{131} \]

This difference can be substantial (see Table 5) and implies that

\[ E_\theta[N(\theta; B, A)] < E_\theta[N(\tau; B, A)] \quad \text{for all } \theta \ge \tau. \tag{132} \]

Thus if one is willing to assume a fairly unrestrictive upper bound
$u_\theta \le \bar u < 1$ for the extinction probability $u_\theta$ (e.g.,
$\bar u = 0.90$, 0.95, or 0.99), corresponding to an unrestrictive lower
bound $\theta \ge \underline\theta \equiv \theta_{\bar u} > \tau$ for
$\theta$ itself, then by using the
SPRT($\underline\theta$; $\beta$, $\frac{1}{\alpha}$) or
SPRT($\underline\theta$; $\frac{\beta}{1-\alpha}$, $\frac{1-\beta}{\alpha}$),
by Proposition 4.1(ii) one would control the first error probability for
problem (120), i.e., $\alpha_\theta \le \alpha$ for all
$\theta \ge \underline\theta$, while substantially reducing the expected
stopping time. If TP2a and TP2b hold, then by Proposition 4.2(iii) the
second error probability also would be controlled, i.e.,
$\beta_\theta \le \beta$ for all $\theta \ge \underline\theta$.
Remark 6.1. For the Poisson($\theta$) psod, it follows from (33) that

\[ \theta_{\bar u} = -\frac{\log(\bar u)}{1 - \bar u}, \tag{133} \]

so $\theta_{.90} = 1.0536$, $\theta_{.95} = 1.0259$, $\theta_{.99} = 1.0050$,
which lower bounds are close to the critical value $\tau = 1$. For the
negative binomial($r$, $\theta$) psod, (35) yields

\[ \theta_{\bar u} = \frac{1 - \bar u^{1/r}}{1 - \bar u^{(r+1)/r}}, \tag{134} \]

which reduces to $\theta_{\bar u} = \frac{1}{1 + \bar u}$ for the
geometric($\theta$) psod when $r = 1$. Here again this lower bound will be
close to the critical value $\tau = \frac{1}{1+r}$ if $\bar u$ is close
to 1. In these cases, therefore, the assumption that $u_\theta \le \bar u$
is not very restrictive for $\bar u = .90, .95, .99$. □
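The lower bounds (133) and (134) are easily tabulated. A minimal sketch, with hypothetical function names:

```python
import math


def theta_lower_poisson(u_bar: float) -> float:
    """Lower bound (133) on theta implied by u_theta <= u_bar, Poisson psod."""
    return -math.log(u_bar) / (1.0 - u_bar)


def theta_lower_negbin(u_bar: float, r: float) -> float:
    """Lower bound (134) for the negative binomial(r, theta) psod."""
    return (1.0 - u_bar ** (1.0 / r)) / (1.0 - u_bar ** ((r + 1.0) / r))


if __name__ == "__main__":
    for u_bar in (0.90, 0.95, 0.99):
        print(u_bar,
              round(theta_lower_poisson(u_bar), 4),
              round(theta_lower_negbin(u_bar, 1.0), 4))
```

For $r = 1$ the second function returns $1/(1 + \bar u)$, the geometric case noted in the remark.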
Remark 6.2. Unlike the fixed-n prediction procedures derived in §5 (cf.
Remarks 5.1 and 5.2), the SPRTs compare $x_n$ to $x_0$ rather than to $n$ in
order to predict whether the observed process is on a trajectory toward
extinction or toward explosion. Note that if $x_0$ is small, the SPRTs are
useful for predicting explosion but not for predicting extinction. For
example,

\[ x_0\, \ell(u_\theta^{x_0}, \beta) < 1 \iff x_0 < \ell(u_\theta, \tfrac{1}{\beta}), \tag{135} \]

so if $x_0 < \ell(u_\theta, \frac{1}{\beta})$ then the
SPRT($\theta$; $\beta$, $\frac{1}{\alpha}$) reduces to the
SPRT($\theta$; 0, $\frac{1}{\alpha}$):

\[ \begin{cases}
\text{stop and accept } \dot H_\theta \text{ (predict extinction)} & \text{if } X_n = 0, \\
\text{stop and accept } \ddot H_\theta \text{ (predict explosion)} & \text{if } X_n \ge x_0\, \ell(u_\theta^{x_0}, \tfrac{1}{\alpha}), \\
\text{continue sampling} & \text{if } 1 \le X_n < x_0\, \ell(u_\theta^{x_0}, \tfrac{1}{\alpha}),
\end{cases} \]

hence will predict extinction only if extinction actually occurs. If
$x_0 < \frac{1}{\beta}$, the SPRT($\tau$; $\beta$, $\frac{1}{\alpha}$) also
reduces to SPRT($\tau$; 0, $\frac{1}{\alpha}$), hence behaves similarly. □
Remark 6.3. Note that the SPRT($\theta$; B, A) depends on $\theta$ only
through the value of the extinction probability $u_\theta$, not on the
specific offspring distribution, whether a psod or not. Therefore, hereafter
we shall use the notation SPRT($u_\theta$; B, A), or simply SPRT($u$; B, A).
In particular, the universal SPRT($\tau$; B, A) in §6.2 is now designated as
SPRT(1; B, A). □

7. Examples
Four examples are presented to illustrate the fixed-n (§5) and sequential (§6)
procedures for predicting extinction or explosion from the current realization
of a GW process. Because the Poisson, negative binomial, and geometric
psods are assumed, conditions TP2a and TP2b are satisfied, so these predic-
tion procedures possess the properties asserted in §5.2, 6.2, and 6.3.
Example 7.1: Smallpox in São Paulo, Brazil. An outbreak of variola
minor in São Paulo occurred in 1956 (see Table 4). This outbreak was caused
by a single infectious individual and lasted four generations before the schools
closed; see Becker (1972), Guttorp (1991, p.59). Becker (1977) and Heyde
(1979) modeled these data by a GW process; also see Guttorp (1991, p.58).
Like Heyde we assume a Poisson(θ) psod; here μθ = θ, τ = 1, στ² = 1.

 n     0    1    2    3    4
 xn    1    5    3   12   24
Table 4: Occurrences of variola minor in São Paulo, Brazil, 1956.

These data suggest a trajectory toward explosion. To assess the strength
of this prediction, first consider the fixed-n testing problem (81) with
$\dot H_\ge^+$ (eventual extinction) taken to be the null hypothesis and
$\ddot H_\ge^+$ (explosion) the alternative. Here $x_0 = 1$, $n = 4$,
$x_n = 24$ so the exponential approximation (101) for the p-value of the
fixed-n prediction procedure (87) is

\[ \dot\pi^{\mathrm{EXP}}(24; 4) = e^{-47/4} \approx 7.89 \times 10^{-6}, \tag{136} \]

which strongly supports the prediction of explosion. Because $n = 4$ is not
large, this approximation is not entirely reliable. (Because $x_0 = 1$, the
EXP and GAM approximations coincide.)

                α = β = 0.05          α = β = 0.01
 x0    ū       lower     upper       lower     upper
  1   1.00      0.05     20           0.01     100
      0.99      0.05     18.3         0.01     69.5
      0.95      0.05     14.0         0.01     35.8
      0.90      0.05     11.1         0.01     23.7
 14   1.00      0.70     280          0.14     1400
      0.99      0.75     138.5        0.15     276.5
      0.95      0.99     60.3         0.20     90.9
      0.90      1.48     40.1         0.31     55.3
 38   1.00      1.90     760          0.38     3800
      0.99      2.29     232.1        0.46     384.2
      0.95      5.13     93.6         1.14     124.8
      0.90     12.39     66.3         4.09     81.5

Table 5: Stopping boundaries for SPRT(ū; β, 1/α), α = β = 0.05 and 0.01.
The lower boundary is $x_0\,\ell(\bar u^{x_0}, \beta)$ and the upper boundary
is $x_0\,\ell(\bar u^{x_0}, 1/\alpha)$.

Next we apply the sequential testing approach. Here x0 = 1, so the
stopping boundaries for the sequential prediction procedure SPRT(ū; β, 1/α)
(cf. Remark 6.3) appear in the first tier of Table 5 for α = β = .05, .01
and ū = 1.0, .99, .95, 0.90. As ū decreases, SPRT(ū; β, 1/α) becomes less
conservative, stopping more quickly. For example, SPRT(ū = 1.0; .05, 1/.05)
stops and predicts explosion when xn ≥ 20, which here occurs when n = 4,
while SPRT(ū = .90; .05, 1/.05) stops and predicts explosion when
xn ≥ 11.1, which occurs when n = 3.
The SPRT(ū = .90; .05, 1/.05) requires the assumption that uθ ≤ ū = .90,
equivalently θ ≥ θū = 1.0536, see Remark 6.1. The reliability of this
assumption can be assessed in two ways. First, yn−1 = 21 and yn = 45 so
θ̂ = μ̂ = (45 − 1)/21 ≈ 2.095 from (77), which is substantially larger than
1.0536. Second, an estimate of uθ could be obtained from the nonparametric
MLE p̂ of the offspring distribution pθ (cf. Guttorp (1991, Proposition 3.4),
also Stigler (1971)), but this would require knowledge of the family histories
of each infected individual, which is unavailable. Here, however, p̂ can be
obtained from the EM algorithm because n is small, cf. Guttorp (1991, pp.
119-120). For n = 3, p̂ puts masses (0.239, 0.428, 0.206, 0.127) on 0, 1, 5,
6, and from (3) the estimated extinction probability for this distribution is
0.424. For n = 4 the estimated distribution puts masses (0.332, 0.147, 0.219,
0.302) on 0, 1, 2, and 5, yielding an estimated extinction probability 0.447.
Both estimates fall well below the assumed upper bound ū = 0.9. □
Because the outbreak terminated at the 7th generation, fixed-n prediction
methods (§5) are not relevant. Instead, because x0 = 1 the stopping
boundaries of SPRT(ū; β, 1/α) for ū = 1.0, 0.99, 0.95, 0.90 and
α = β = .05, .01 again appear in the first tier of Table 5. Because
1 ≤ xn ≤ 7 for n = 1, . . . , 6 in this example, none of these SPRTs would
stop sampling until the extinction observed at n = 7. Note that
θ̂ = μ̂ = (30 − 1)/30 = 0.967 < 1 by (77) (yn−1 = yn = 30). □
Example 7.2: Pertussis in Washington State, 2012. The weekly number of
new cases of pertussis remained fairly constant in 2011 (Figure 1) but
increased dramatically at the beginning of 2012 (Table 6), suggesting
possible explosion. Here $x_0 = 1$, $n = 11$, $x_n = 98$, $y_{n-1} = 594$,
and $y_n = 692$. The MLE $\hat\mu = 691/594 = 1.1633$ and
$\tilde\sigma^2 = 8.0342$, where

\[ \tilde\sigma^2 \equiv \frac{1}{n} \sum_{\nu=1}^{n} x_{\nu-1}
   \left(\frac{x_\nu}{x_{\nu-1}} - \hat\mu\right)^{2} \tag{137} \]

is Dion's (1975) estimate of the offspring variance, cf. Guttorp (1991,
p.109). Because $\hat\mu \ll \tilde\sigma^2$, the Poisson distribution does
not fit these data. Instead, since $\hat\mu$ and $\tilde\sigma^2$ agree with
the mean and variance of the negative binomial NB($\hat r$, $\hat\theta$)
psod with $\hat r = 0.1970$ and $\hat\theta = 0.8552$ (cf. Example 3.2), we
shall assume the model NB($r = 0.1970$, $\theta$) with $\theta$ unknown
($0 < \theta < 1$), so $\tau = (1 + r)^{-1} = 0.8354$ and
$\sigma_\tau^2 = 6.0761$.

Week 1 2 3 4 5 6 7 8 9 10 11 12
n 0 1 2 3 4 5 6 7 8 9 10 11
xn 1 7 22 38 50 65 78 61 74 96 102 98
Table 6: Weekly occurrences of pertussis in Washington State, 2012.
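The summary statistics quoted for these data can be reproduced from the counts in Table 6. The sketch below uses hypothetical function names. It takes the MLE of the offspring mean to be $\hat\mu = (y_n - x_0)/y_{n-1}$, which matches the quoted ratio 691/594, computes Dion's estimate as in (137), and then matches moments under the assumption that NB($r$, $\theta$) has mean $r\theta/(1-\theta)$ and variance $r\theta/(1-\theta)^2$; that parametrization is an assumption made here for illustration.

```python
def offspring_mean_mle(counts) -> float:
    """mu_hat = (x_1 + ... + x_n) / (x_0 + ... + x_{n-1})."""
    return sum(counts[1:]) / sum(counts[:-1])


def dion_variance(counts) -> float:
    """Dion's estimate (137); every x_{nu-1} must be positive."""
    mu = offspring_mean_mle(counts)
    n = len(counts) - 1
    total = 0.0
    for prev, cur in zip(counts[:-1], counts[1:]):
        if prev <= 0:
            raise ValueError("x_{nu-1} must be positive for every term")
        total += prev * (cur / prev - mu) ** 2
    return total / n


def negbin_moment_match(mu: float, sigma2: float) -> tuple:
    """(r, theta) with r*theta/(1-theta) = mu and r*theta/(1-theta)^2 = sigma2.

    Assumed parametrization, chosen for illustration; needs sigma2 > mu.
    """
    if sigma2 <= mu:
        raise ValueError("negative binomial requires variance > mean")
    theta = 1.0 - mu / sigma2
    r = mu * (1.0 - theta) / theta
    return r, theta


if __name__ == "__main__":
    pertussis = [1, 7, 22, 38, 50, 65, 78, 61, 74, 96, 102, 98]  # Table 6
    mu_hat = offspring_mean_mle(pertussis)
    s2 = dion_variance(pertussis)
    r_hat, theta_hat = negbin_moment_match(mu_hat, s2)
    print(round(mu_hat, 4), round(s2, 4), round(r_hat, 4), round(theta_hat, 4))
```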

To assess the evidence for a prediction of explosion, first consider the
fixed-n testing problem (81) with $\dot H_\ge^+$ (eventual extinction) taken
to be the null hypothesis and $\ddot H_\ge^+$ (eventual explosion) the
alternative. From (101) the exponential approximation to the p-value of the
fixed-n procedure (87) is

\[ \dot\pi^{\mathrm{EXP}}(98; 11) = e^{-195/(11 \times 6.0761)} \approx 0.054, \tag{138} \]

which moderately supports the prediction of explosion.
By contrast, from Tables 5 and 6 the conservative SPRT(1.0; β, 1/α) with
α = β = 0.05 would have stopped and predicted explosion by Week 3! With
α = β = 0.01 this SPRT would not have stopped until Week 11, but the
SPRT(0.90; 0.01, 1/0.01) would have predicted explosion by Week 4.
In fact a state of health emergency was declared after Week 14 and an
inoculation program begun. The number of new cases⁹ continued to increase
to a peak of 254 in Week 20, then declined to 23 new cases in Week 52. Had
these sequential prediction procedures been applied, this program could have
begun much earlier, possibly greatly reducing the total number of cases.
We note that the prediction for 2012 (explosion) is the opposite of that
which our methods would obtain for 2011 (extinction), even though the
increase in μ̂ from 2011 to 2012 is small, namely 1.0455 vs. 1.1633. □
Example 7.3: California condors. Wilbur (1978) gives the annual pop-
ulation counts of the threatened California condor from 1968 through 1976
(see Table 7). Here x0 = 38, n = 8, xn = 19, yn−1 = 183, and yn = 202;
the MLE μ̂ = 164/183 = 0.8962 and σ̃ 2 = 2.2755. Because σ̃ 2 is not greatly
different from the estimated variance σθ̂2 = θ̂/(1 − θ̂)2 = 1.6992 under the
geometric GM(θ̂) distribution with θ̂ = μ̂/(1 + μ̂) = .4726 (see Example 3.2),
we will assume the GM(θ) psod model (0 < θ < 1) to illustrate its ease of
application. For this model A(θ) = 1/(1 − θ), τ = 1/2, and στ2 = 2.

Year 1968 1969 1970 1971 1972 1973 1974 1975 1976
n 0 1 2 3 4 5 6 7 8
Count xn 38 26 27 18 25 19 19 11 19
Table 7: Annual counts of California condors 1968-1976.

The data in Table 7 suggest a declining population, hence possible
extinction. To evaluate this prediction, first consider the fixed-n testing
problem (81) with $\ddot H_\ge^+$ (eventual explosion) as the null
hypothesis and $\dot H_\ge^+$ (eventual extinction) the alternative. Here
$x_0 = 38$ and $n = 8$ so $n\sigma_\tau^2 < 2(x_0 - 2)$, hence the
approximations $\ddot\pi^{G2}$ in (107) and $\ddot\pi^{G23}$ in (109) for
the fixed-n prediction procedures (87) are inapplicable (cf. Proposition
5.3(i)).

9 The weekly data shown have since been revised. We have used the unrevised
data because public health decisions were based on them.

By contrast, the sequential prediction procedure SPRT(0.9; 0.05, 1/0.05)
would have stopped in 1975 and predicted extinction. (Compare the data in
Table 7 to the stopping boundaries in the last row of Table 5.)
In fact, by the mid 1980's all remaining wild condors were captured and
moved to zoos, where a breeding program was begun, followed by relocation
back to the wild. By 2011 the total wild population had grown to 191, in
addition to 178 remaining in captivity. □
Example 7.4: North American whooping cranes. Miller et al. (1974)
give the annual counts of migrating whooping cranes, an endangered species,
arriving in Texas from 1938 (n = 0) through 1972 (n = 34); see Figure 4 and
Guttorp (1991, p.190). Here x0 = 14, n = 34, xn = 51, yn−1 = 1072, and
yn = 1123; the MLE μ̂ = 1109/1072 = 1.0345. Since μ̂ does not differ greatly
from Dion's estimate σ̃² = 0.84, the Poisson(θ) psod model is assumed.

[Figure 4: North American whooping crane population counts 1938-1972.
Panel title: Whooping cranes at Aransas NWR; vertical axis: number of
cranes; horizontal axis: year.]

The counts in Figure 4 show an increasing trend, suggesting explosion. To
evaluate this prediction, first consider the fixed-n testing problem (81)
with $\dot H_\ge^+$ (eventual extinction) as the null hypothesis and
$\ddot H_\ge^+$ (eventual explosion) as the alternative. Here
$\sigma_\tau^2 = 1$, so the EXP and GAM approximations (101) and (102) for
the p-values of the fixed-n prediction procedure (87) are

\[ \dot\pi^{\mathrm{EXP}}(51; 34) = e^{-101/34} \approx 0.051, \tag{139} \]

\[ \dot\pi^{\mathrm{GAM}}(51; 34, 14) = \frac{17}{14}\left(\frac{16}{17}\right)^{14}
   \sum_{r=1}^{14} \binom{14}{r} \left(\frac{1}{16}\right)^{r} \bar G_r\!\left(\frac{101}{34}\right)
   \approx 0.086, \tag{140} \]

respectively, with $\dot\pi^{\mathrm{GAM}}$ expected to be more accurate.
This provides modest support for a prediction of explosion.
By contrast, the sequential prediction procedure SPRT(0.9; 0.05, 1/0.05)
would have stopped in 1964 (n = 26, xn = 42 > 40.1) and predicted explosion,
while SPRT(0.9; 0.01, 1/0.01) would have stopped in 1969 (n = 31,
xn = 56 > 55.3) and predicted explosion. (The values 40.1 and 55.3 appear in
the second tier of Table 5.) □
Acknowledgement: We are grateful to Brayan Ortiz for his assistance with
numerical computations.

References

Athreya, K. B. and Ney, P. E. (1972). Branching Processes. Springer-Verlag,
Berlin.
Barnard, G. A. (1946). Sequential tests in industrial statistics (with discus-
sion). J. Royal Statist. Soc. Supplement 8, 1-26.
Becker, N. (1972). Vaccination programmes for rare infectious diseases.
Biometrika 59 443-453.
Becker, N. (1974). On parametric estimation for mortal branching processes.
Biometrika 61 393-399.
Becker, N. (1977). Estimation for discrete time branching processes with
application to epidemics. Biometrics 33 515-522.
Centers for Disease Control and Prevention (2012). Pertussis epidemic -
Washington, 2012; Morbidity and Mortality Weekly Report. Online at:
http://www.cdc.gov/mmwr/preview/mmwrhtml/mm6128a1.htm
Dion, J.-P. (1975). Estimation of the variance of a branching process. Ann.
Statist. 3 1183-1187.
Feller, W. (1968). An Introduction to Probability Theory and its Applications,
Third Edition. Wiley, NY.

Ghosh, B. K. (1970). Sequential Tests of Statistical Hypotheses. Addison-
Wesley, Reading, MA.
Guttorp, P. (1991). Statistical Inference for Branching Processes. Wiley, NY.
Guttorp, P. and Perlman, M. D. (2015). Testing subcriticality vs. supercriti-
cality in a Galton-Watson branching process with power series offspring
distribution. In preparation.
Harris, T. E. (1963). The Theory of Branching Processes. Springer-Verlag,
Berlin.
Heyde, C. C. (1979). On assessing the potential severity of an outbreak of a
rare infectious disease: a Bayesian approach. Australian J. Statist. 21
282-292.
Jagers, P. (1975). Branching Processes with Biological Applications. Wiley, NY.
Karlin, S. (1966). A First Course in Stochastic Processes. Academic Press, NY.
Karlin, S. (1968). Total Positivity. Stanford University Press, Stanford, CA.
Kemperman, J. H. B. (1977). On the FKG-inequality for measures on a
partially ordered space. Indag. Math. 39 313-331.
Kesten, H., Ney, P., and Spitzer, F. (1966). The Galton-Watson process
with mean one and finite variance. Teor. Veroyatnost. i Primenen. 11
579-611.
Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses,
Third Edition. Springer, NY.
Miller, R. S., Botkin, D. B., and Mendelssohn, R. (1974). The whooping
crane (Grus americana) population of North America. Biol. Cons. 6
106-111.
Perlman, M. D. and Olkin, I. (1980). Unbiasedness of invariant tests for
MANOVA and other multivariate problems. Ann. Statist. 8 1326-1341.
Scott, D. (1987). On posterior asymptotic normality and asymptotic
normality of estimators for the Galton-Watson process. J. Royal Statist.
Soc. Series B 49 209-214.
Seneta, E. (1970). An explicit-limit theorem for the critical Galton-Watson
process with immigration. J. Royal Statist. Soc. Series B 32 149-152.
Stigler, S. M. (1971). The estimation of the probability of extinction and
other parameters associated with branching processes. Biometrika 58
499-508.
Stuart, A. and Ord, J. K. (1991). Kendall’s Advanced Theory of Statistics,
Vol. 2. Oxford University Press, NY.
Taylor, H. and Karlin, S. (1998). An Introduction to Stochastic Modeling,
Third Edition. Academic Press, Orlando.
Waugh, W. A. (1958). Conditioned Markov processes. Biometrika 45 241-249.
Wilbur, S. R. (1978). The California Condor, 1966-76: A Look at its Past
and Future. North American Fauna No. 72. 136 pp. U.S. Dept. of the
Interior, Fish and Wildlife Service, Washington, D.C. Online at:
http://library.sandiegozoo.org/journal list/sWilbur CAcondor.pdf.