Conc Graph
November 1, 2010
Abstract
Let Y be a nonnegative random variable with mean µ and let Y^s, defined on the same space as Y,
have the Y-size biased distribution, that is, the distribution characterized by
E[Y f(Y)] = µ E f(Y^s)   for all functions f for which these expectations exist.
The size bias coupling of Y to Y^s can be used to obtain the following concentration of measure result
when Y counts the number of isolated vertices in an Erdős–Rényi random graph model on n vertices with
edge probability p. With σ^2 denoting the variance of Y,
P((Y − µ)/σ ≥ t) ≤ inf_{θ≥0} exp(−θt + H(θ))   where   H(θ) = (µ/(2σ^2)) ∫_0^θ s γ_s ds
with
γ_s = 2e^{2s}(1 + p e^s/(1 − p))^n + (1 − p)^{−n} + 1.
Left tail inequalities may be obtained in a similar fashion. When np → c for some constant c ∈ (0, ∞)
as n → ∞, the bound is of the order at most e^{−kt} for some positive constant k.
The problem of approximating the distribution of the number of vertices v with degree d(v) = d for some
fixed d was considered in [2], and a smooth function bound to the multivariate normal for a vector whose
components count the number of vertices of some fixed degrees was given in [4].
∗ Department of Mathematics, University of Southern California, Los Angeles, CA 90089, USA, [email protected] and [email protected]
2000 Mathematics Subject Classification: Primary 60E15; Secondary 60C05, 60D05.
Keywords: Large deviations, size biased couplings, Stein’s method.
In this paper, we study the number of isolated vertices Y of K, that is, the number of vertices having no
incident edges, given by
Y = ∑_{v∈V} 1(d(v) = 0).   (2)
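To make the quantity under study concrete, the following Python sketch samples K as an Erdős–Rényi graph and evaluates Y of (2) by counting vertices of degree zero; the sampler and the parameters used are our own illustration, not part of the paper.

```python
import random

def count_isolated(n, p, rng=None):
    """Sample an Erdos-Renyi graph G(n, p) and return Y, the number of
    isolated vertices, i.e. the number of vertices v with d(v) = 0 as in (2)."""
    rng = rng or random.Random()
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each edge present independently with prob. p
                degree[u] += 1
                degree[v] += 1
    return sum(1 for d in degree if d == 0)
```

Averaging count_isolated over many samples approximates µ = EY = n(1 − p)^{n−1}, since a vertex is isolated exactly when its n − 1 potential edges are all absent.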
In the same paper, Kolmogorov distance bounds to the normal were obtained and asymptotic normality
shown when
O’Connell [6] showed that an asymptotic large deviation principle holds for Y . Raič [7] obtained nonuni-
form large deviation bounds in some generality, for random variables W with E(W ) = 0 and Var(W ) = 1,
of the form
P(W ≥ t)/(1 − Φ(t)) ≤ e^{t^3 β(t)/6} (1 + Q(t)β(t))   for all t ≥ 0   (4)
where Φ(t) denotes the distribution function of a standard normal variate and Q(t) is a quadratic in t.
Although in general the expression for β(t) is not simple, when W equals Y properly standardized and
np → c as n → ∞, (4) holds for all n sufficiently large with
β(t) = (C_1/√n) exp(C_2 t/√n + C_3 (e^{C_4 t/√n} − 1))
for some constants C_1, C_2, C_3 and C_4. For t of order n^{1/2}, for instance, the function β(t) will be small as
n → ∞, allowing an approximation of the deviation probability P(W ≥ t) by the normal, to within some
factors. Theorem 1.1 below, by contrast, provides a non-asymptotic, explicit bound, that is, it does not
require any relation between n and p and is satisfied for all n. Moreover, as (8) and (6) show, this bound is
of order e^{−at^2} over some range of t, and of worst case order e^{−bt}, for the right tail, and e^{−ct^2} for the left tail,
where a, b and c are explicit, with the bounds holding for all t ∈ R.
Theorem 1.1. For n ∈ {1, 2, . . .} and p ∈ (0, 1) let K denote the random graph on n vertices where each
edge is present with probability p, independently of all other edges, and let Y denote the number of isolated
vertices in K. Then for all t > 0,
P((Y − µ)/σ ≥ t) ≤ inf_{θ≥0} exp(−θt + H(θ))   where   H(θ) = (µ/(2σ^2)) ∫_0^θ s γ_s ds,   (5)
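Since H(θ) has no closed form, the bound (5) can be evaluated numerically. The Python sketch below is our illustration, not part of the paper: µ and σ^2 are taken as given inputs (in the test a placeholder value stands in for σ^2), the grid and integration step are arbitrary choices, and only the minimal exponent is exponentiated so large values of H cause no overflow.

```python
import math

def gamma_s(s, n, p):
    """gamma_s = 2 e^{2s} (1 + p e^s/(1 - p))^n + (1 - p)^{-n} + 1."""
    return (2 * math.exp(2 * s) * (1 + p * math.exp(s) / (1 - p)) ** n
            + (1 - p) ** (-n) + 1)

def H(theta, n, p, mu, sigma2, steps=100):
    """H(theta) = (mu/(2 sigma^2)) * integral_0^theta s*gamma_s ds, trapezoid rule."""
    if theta == 0:
        return 0.0
    h = theta / steps
    f = [i * h * gamma_s(i * h, n, p) for i in range(steps + 1)]
    return mu / (2 * sigma2) * h * (sum(f) - 0.5 * (f[0] + f[-1]))

def right_tail_bound(t, n, p, mu, sigma2, theta_max=2.0, grid=200):
    """Approximate inf over a theta grid of exp(-theta*t + H(theta)), as in (5)."""
    exponents = (-k * theta_max / grid * t + H(k * theta_max / grid, n, p, mu, sigma2)
                 for k in range(grid + 1))
    return math.exp(min(exponents))
```

The θ = 0 grid point guarantees the bound is at most 1, and increasing t can only decrease it.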
Recall that for a nonnegative random variable Y with finite, nonzero mean µ, the size bias distribution
of Y is given by the law of a variable Y^s satisfying
E[Y f(Y)] = µ E f(Y^s)   (7)
for all f for which the expectations above exist. The main tool used in proving Theorem 1.1 is size bias
coupling, that is, constructing Y and Y^s, having the Y-size biased distribution, on the same space. In [3],
coupling, that is, constructing Y and Y s , having the Y -size biased distribution, on the same space. In [3],
size bias couplings were used to prove concentration of measure inequalities when |Y^s − Y| can be almost
surely bounded by a constant. Here, where Y is the number of isolated vertices of K, we consider a coupling
of Y to Y^s, having the Y-size bias distribution, where the boundedness condition is violated. Unlike the main
theorem in [3] which can be applied to a wide variety of situations where the bounded coupling assumption
is satisfied, it seems that cases where the coupling is unbounded, such as the one we consider here, need
application specific treatment, and cannot be handled by one single general result.
Remark 1.1. Useful bounds for the minimization in (5) may be obtained by restricting to θ ∈ [0, θ0 ] for
some θ0 . In this case, as γs is an increasing function of s, we have
H(θ) ≤ (µ/(4σ^2)) γ_{θ0} θ^2   for θ ∈ [0, θ0].
The quadratic −θt + µγ_{θ0}θ^2/(4σ^2) in θ is minimized at θ = 2tσ^2/(µγ_{θ0}). When this value falls in [0, θ0] we
obtain the first bound in (8), while otherwise setting θ = θ0 yields the second:
P((Y − µ)/σ ≥ t) ≤ exp(−t^2 σ^2/(µγ_{θ0}))   for t ∈ [0, θ0 µγ_{θ0}/(2σ^2)],   (8)
P((Y − µ)/σ ≥ t) ≤ exp(−θ0 t + θ0^2 µγ_{θ0}/(4σ^2))   for t ∈ (θ0 µγ_{θ0}/(2σ^2), ∞).
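Remark 1.1 can be turned into a small numerical routine. In the Python sketch below (our illustration; µ and σ^2 are supplied as inputs, with a placeholder standing in for σ^2 in the test, and θ0 is an arbitrary cutoff) the two branches of (8) are evaluated and, as the minimization of the quadratic predicts, they agree at the crossover point t = θ0µγ_{θ0}/(2σ^2).

```python
import math

def gamma_s(s, n, p):
    """gamma_s = 2 e^{2s} (1 + p e^s/(1 - p))^n + (1 - p)^{-n} + 1."""
    return (2 * math.exp(2 * s) * (1 + p * math.exp(s) / (1 - p)) ** n
            + (1 - p) ** (-n) + 1)

def remark_bound(t, theta0, n, p, mu, sigma2):
    """Two-regime right tail bound (8) from Remark 1.1."""
    g = gamma_s(theta0, n, p)      # gamma is increasing, so gamma_{theta0} bounds it
    split = theta0 * mu * g / (2 * sigma2)
    if t <= split:                 # optimal theta = 2 t sigma^2/(mu gamma) lies in [0, theta0]
        return math.exp(-t * t * sigma2 / (mu * g))
    return math.exp(-theta0 * t + theta0 ** 2 * mu * g / (4 * sigma2))
```
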
σ^2/µ → 1 + ce^{−c} − e^{−c},   β + 1 → e^c + 1   and   γ_s → 2e^{2s + ce^s} + e^c + 1   as n → ∞.
Since lim_{n→∞} γ_s and lim_{n→∞} µ/σ^2 exist, the right tail decays at the rate at most e^{−θ0 t}. Also, the left tail
bound (6) in this asymptotic behaves at fixed t as
lim_{n→∞} exp(−(t^2/2) σ^2/(µ(β + 1))) = exp(−(t^2/2)(1 + ce^{−c} − e^{−c})/(e^c + 1)).
The paper is organized as follows. In Section 2 we review results leading to the construction of size
biased couplings for sums of possibly dependent variables, and then in Section 3 apply this construction to
the number Y of isolated vertices, a sum of indicator variables; this construction first appeared in [4]. The
proof of Theorem 1.1 is also given in Section 3.
Definition 2.1. Let A be an arbitrary index set and let {X_α : α ∈ A} be a collection of nonnegative random
variables with finite, nonzero expectations EX_α = µ_α and joint distribution dF(x). For β ∈ A, we say that
X^β = {X_α^β : α ∈ A} has the X size bias distribution in coordinate β if X^β has joint distribution
dF^β(x) = x_β dF(x)/µ_β.   (9)
Just as (9) is related to (7), the random vector X^β has the X size bias distribution in coordinate β if and
only if
E[X_β f(X)] = µ_β E[f(X^β)]   for all functions f for which these expectations exist.
Letting f(X) = g(X_β) for some function g one recovers (7), showing that the β-th coordinate of X^β, that is,
X_β^β, has the X_β size bias distribution.
The factorization
P(X ∈ dx) = P(X ∈ dx | X_β = x_β) P(X_β ∈ dx_β)
of the joint distribution of X suggests a way to construct X. First generate X_β, a variable with distribution
P(X_β ∈ dx_β). If X_β = x, then generate the remaining variates {X_α, α ≠ β} with distribution P(X ∈
dx | X_β = x). Now, by the factorization of dF(x), we have
dF^β(x) = x_β dF(x)/µ_β = P(X ∈ dx | X_β = x_β) x_β P(X_β ∈ dx_β)/µ_β = P(X ∈ dx | X_β = x_β) P(X_β^β ∈ dx_β).   (10)
Hence, to generate X^β with distribution dF^β, first generate a variable X_β^β with the X_β size bias distribution,
then, when X_β^β = x, generate the remaining variables according to their original conditional distribution
given that the β-th coordinate takes on the value x.
Definition 2.1 and the following special case of a proposition from Section 2 of [4] will be applied in the
subsequent constructions; the reader is referred there for the simple proof.
Proposition 2.1. Let A be an arbitrary index set, and let X = {X_α, α ∈ A} be a collection of nonnegative
random variables with finite means. Let Y = ∑_{β∈A} X_β and assume µ_A = EY is finite and positive. Let
X^β have the X-size biased distribution in coordinate β as in Definition 2.1. Let I be a random index taking
values in A with distribution
P(I = β) = µ_β/µ_A,   β ∈ A.
Then if X^I has the mixture distribution ∑_{β∈A} P(I = β)L(X^β), the variable Y^s = ∑_{α∈A} X_α^I has the Y-size
biased distribution as in (7).
In our examples we use Proposition 2.1 and the random index I, and (10), to obtain Y^s by first generating
X_I^I with the size bias distribution of X_I, then, if I = α and X_α^α = x, generating {X_β^α : β ∈ A\{α}} according
to the (original) conditional distribution P(X_β, β ≠ α | X_α = x).
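For a sum of independent indicators the recipe above is especially simple: conditioning on X_α = 1 leaves the other coordinates untouched, so Y^s is obtained by drawing I proportional to the means and setting the chosen coordinate to one. The Python sketch below, our illustrative example with arbitrary success probabilities, returns one coupled draw (Y, Y^s).

```python
import random

def size_bias_pair(probs, rng):
    """One coupled draw (Y, Y^s) for Y a sum of independent Bernoulli(p_i),
    following Proposition 2.1: pick index I with P(I = i) proportional to
    the mean p_i, then set X_I = 1; by independence no other coordinate
    needs adjusting."""
    x = [1 if rng.random() < p else 0 for p in probs]
    i = rng.choices(range(len(probs)), weights=probs)[0]
    xs = list(x)
    xs[i] = 1
    return sum(x), sum(xs)
```

By the characterization (7) with f(y) = y, the sample mean of Y^s should approach E[Y^2]/EY; note also that this coupling is monotone, Y^s ≥ Y.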
Hence, choose one of the n vertices of K uniformly. If the chosen vertex, say V, is already isolated, we do
nothing and set K^s = K, as the remaining variables already have the conditional distribution given that V
is isolated. Otherwise obtain K^s by deleting all the edges incident to V. By Proposition 2.1, the variable
Y^s counting the number of isolated vertices of K^s has the Y-size biased distribution.
To derive the needed properties of this coupling, let N(v) be the set of neighbors of v ∈ V, and T be the
collection of isolated vertices of K, that is, with d(v), the degree of v, given in (1),
T = {v ∈ V : d(v) = 0}.
Note that Y = |T |. Since all edges incident to the chosen V are removed in order to form K s , any neighbor
of V which had degree one thus becomes isolated, and V also becomes isolated if it was not so earlier. As
all other vertices are otherwise unaffected, as far as their being isolated or not, we have
Y^s − Y = d_1(V) + 1(d(V) ≠ 0)   where   d_1(V) = ∑_{w∈N(V)} 1(d(w) = 1).   (12)
So in particular the coupling is monotone, that is, Y^s ≥ Y. Since d_1(V) ≤ d(V), (12) yields
Y^s − Y ≤ d(V) + 1.   (13)
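The coupling and the identity (12) are easy to simulate. In the Python sketch below (our illustration; graph size and edge probability are arbitrary), one draw of (Y, Y^s) is produced by sampling K, choosing V uniformly, and deleting the edges incident to V; the increment is checked against (12) and (13) on every draw.

```python
import random

def coupled_isolated(n, p, rng):
    """One realization of the size bias coupling for isolated vertices:
    sample K, pick a uniform vertex V, delete all edges incident to V."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    y = sum(1 for nbrs in adj if not nbrs)       # Y = |T|
    V = rng.randrange(n)
    dV = len(adj[V])
    # neighbors of V of degree one become isolated; V does too if not already
    d1 = sum(1 for w in adj[V] if len(adj[w]) == 1)
    for w in list(adj[V]):                       # delete the edges incident to V
        adj[w].discard(V)
    adj[V].clear()
    ys = sum(1 for nbrs in adj if not nbrs)
    assert ys - y == d1 + (1 if dV != 0 else 0)  # identity (12)
    assert ys - y <= dV + 1                      # bound (13)
    return y, ys
```

As with any size bias coupling, the sample mean of Y^s should approach E[Y^2]/EY, and monotonicity Y^s ≥ Y holds on every draw.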
L(d(V) 1(V ∉ T) | T) = P(V ∉ T) L(Bin(n − 1 − Y, p) | Bin(n − 1 − Y, p) > 0) + P(V ∈ T) δ_0,   (16)
where δ_0 is point mass at zero. Using (16) and the mass function of the conditioned binomial
P(d(V) = k | T, V ∉ T) = (n − 1 − Y choose k) p^k (1 − p)^{n−1−Y−k} / (1 − (1 − p)^{n−1−Y})   for 1 ≤ k ≤ n − 1 − Y,
and P(d(V) = k | T, V ∉ T) = 0 otherwise,
it can be easily verified that the conditional moment generating function of d(V) and its first derivative are
bounded by
E(e^{θd(V)} 1(V ∉ T) | T) ≤ [(p e^θ + 1 − p)^{n−1−Y} − (1 − p)^{n−1−Y}] / [1 − (1 − p)^{n−1−Y}]   and
E(d(V) e^{θd(V)} 1(V ∉ T) | T) ≤ (n − 1 − Y)(p e^θ + 1 − p)^{n−2−Y} p e^θ / [1 − (1 − p)^{n−1−Y}].
By the mean value theorem applied to the function f(x) = x^{n−1−Y}, for some ξ ∈ (1 − p, 1) we have
Hence, recalling θ ≥ 0,
Similarly applying the mean value theorem to f(x) = (x + 1 − p)^{n−1−Y}, for some ξ ∈ (0, p e^θ) we have
as in (17).
Next, to handle the second to last term in (15) consider
E(d(V) 1(V ∉ T) | T) ≤ (n − 1 − Y)p / (1 − (1 − p)^{n−1−Y}) ≤ (n − 1 − Y)p / ((n − 1 − Y) p (1 − p)^n) = β   where   β = (1 − p)^{−n}.   (19)
E(e^{θY^s} − e^{θY}) ≤ (θγ_θ/2) E(e^{θY})   for all θ ≥ 0.
Letting m(θ) = E(e^{θY}) thus yields
m′(θ) = E(Y e^{θY}) = µ E(e^{θY^s}) ≤ µ(1 + θγ_θ/2) m(θ).   (21)
Setting
M(θ) = E(exp(θ(Y − µ)/σ)) = e^{−θµ/σ} m(θ/σ),
differentiating and using (21), we obtain
M′(θ) = (1/σ) e^{−θµ/σ} m′(θ/σ) − (µ/σ) e^{−θµ/σ} m(θ/σ)
      ≤ (µ/σ) e^{−θµ/σ} (1 + θγ_θ/(2σ)) m(θ/σ) − (µ/σ) e^{−θµ/σ} m(θ/σ)
      = (µθγ_θ/(2σ^2)) e^{−θµ/σ} m(θ/σ) = (µθγ_θ/(2σ^2)) M(θ).   (22)
Since M(0) = 1, (22) yields upon integration of M′(s)/M(s) over [0, θ],
log(M(θ)) ≤ H(θ)   so that   M(θ) ≤ exp(H(θ))   where   H(θ) = (µ/(2σ^2)) ∫_0^θ s γ_s ds.
Hence for t ≥ 0,
P((Y − µ)/σ ≥ t) ≤ P(exp(θ(Y − µ)/σ) ≥ e^{θt}) ≤ e^{−θt} M(θ) ≤ exp(−θt + H(θ)).
As the inequality holds for all θ ≥ 0, we may take the infimum over θ ≥ 0, proving (5).
For the left tail bound let θ < 0. Since Y^s ≥ Y and θ < 0, using (11) and (13) we obtain
E(e^{θY} − e^{θY^s}) ≤ (|θ|/2) E((e^{θY^s} + e^{θY})(Y^s − Y))
                  ≤ |θ| E(e^{θY}(Y^s − Y))
                  = |θ| E(e^{θY} E(Y^s − Y | T))
                  ≤ |θ| E(e^{θY} E((d(V) + 1) 1(V ∉ T) | T)),
and therefore
m′(θ) = µ E(e^{θY^s}) ≥ µ (1 + (β + 1)θ) m(θ).
Integrating m′(s)/m(s) over [θ, 0] and recalling M(θ) = e^{−θµ/σ} m(θ/σ), we obtain
log(M(θ)) ≤ µ(β + 1)θ^2/(2σ^2).   (23)
The inequality in (23) implies that for all t > 0 and θ < 0,
P((Y − µ)/σ ≤ −t) ≤ exp(θt + µ(β + 1)θ^2/(2σ^2)).
Taking θ = −tσ^2/(µ(β + 1)) we obtain (6).
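The choice θ = −tσ^2/(µ(β + 1)) can be checked numerically. The Python sketch below (our illustration; the numerical values of µ, β and σ^2 in the test are arbitrary placeholders) evaluates the exponent θt + µ(β + 1)θ^2/(2σ^2) from (23), confirms that the stated θ minimizes it, and that the resulting bound equals exp(−t^2σ^2/(2µ(β + 1))), the Gaussian-type left tail rate appearing in the limit computation above.

```python
import math

def left_exponent(theta, t, mu, beta, sigma2):
    """Exponent theta*t + mu*(beta + 1)*theta^2/(2 sigma^2) from (23)."""
    return theta * t + mu * (beta + 1) * theta ** 2 / (2 * sigma2)

def left_tail_bound(t, mu, beta, sigma2):
    """Bound on P((Y - mu)/sigma <= -t) at the minimizing theta."""
    theta = -t * sigma2 / (mu * (beta + 1))
    return math.exp(left_exponent(theta, t, mu, beta, sigma2))
```
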
References
[1] Baldi, P., Rinott, Y. and Stein, C. (1989). A normal approximation for the number of local maxima
of a random function on a graph, Probability, Statistics and Mathematics, Papers in Honor of Samuel
Karlin, T. W. Anderson, K. B. Athreya and D. L. Iglehart eds., Academic Press, 59-81.
[2] Barbour, A. D., Karoński, M. and Ruciński, A. (1989). A central limit theorem for decomposable
random variables with applications to random graphs, J. Combinatorial Theory B, 47, 125-145.
[3] Ghosh, S. and Goldstein, L. (2009). Concentration of measures via size biased couplings, Probab.
Th. Rel. Fields, to appear.
[4] Goldstein, L. and Rinott, Y. (1996). Multivariate normal approximations by Stein’s method and
size bias couplings, Journal of Applied Probability, 33, 1-17.
[5] Kordecki, W. (1990). Normal approximation and isolated vertices in random graphs, Random Graphs
’87, Karoński, M., Jaworski, J. and Ruciński, A. eds., John Wiley & Sons Ltd., 1990, 131-139.
[6] O’Connell, N. (1998). Some large deviation results for sparse random graphs, Probab. Th. Rel. Fields,
110, 277-285.
[7] Raič, M. (2007). CLT related large deviation bounds based on Stein’s method, Adv. Appl. Prob., 39,
731-752.