
Random cliques in random graphs revisited

Robert Morris∗† and Oliver Riordan‡§


arXiv:2504.00964v1 [math.CO] 1 Apr 2025

April 2, 2025

Abstract
We study the distribution of the set of copies of some given graph
H in the random graph G(n, p), focusing on the case when H = Kr.
Our main results capture the 'leading term' in the difference between this
distribution and the 'independent hypergraph model', where (in the case
H = Kr) each copy is present independently with probability $\pi = p^{\binom{r}{2}}$.
As a concrete application, we derive a new upper bound on the number
of Kr-factors in G(n, p) above the threshold for such factors to appear.
We will prove our main results in a much more general setting, so that
they also apply to random hypergraphs, and also (for example) to the
case when p is constant and $r = r(n) \sim 2\log_{1/p}(n)$.

1 Introduction
Our aim in this paper is to study the distribution of the copies of some given
graph H in a binomial random graph G(n, p). Intuitively, if we do not expect
to see any pairs of copies of H sharing edges, then not only will the number
of copies of H be asymptotically Poisson, but the random set of copies will
be close to a random subset of all possible copies of H in which each appears
independently with probability $\pi = p^{e(H)}$. As a warm-up for our main result,
we formalise this in Section 3. We then consider the case in which intersecting
pairs are likely to appear, but larger ‘clusters’ are unlikely, or there are not
too many of them. Here we prove a result capturing the effect of intersecting
pairs on the distribution of the set of copies, with an error term involving the
expectation of larger clusters. Finally, we will give an application of this result,
bounding the number of Kr-factors in G(n, p) for r ≥ 3 constant.
Throughout, we consider the random set of copies of H, not just the number
of such copies. This paper is in some sense inspired by [22], where the second
author studied copies of Kr in G(n, p) via a coupling argument, but the technical
approach we take here will be very different, arguing not via coupling, but by
directly estimating the probability of each 'plausible' outcome.

∗ IMPA, Estrada Dona Castorina 110, Jardim Botânico, Rio de Janeiro 22460-320, Brazil. Email: [email protected]
† RM was partially supported by FAPERJ (Proc. E-26/200.977/2021) and by CNPq (Procs 303681/2020-9 and 407970/2023-1).
‡ Mathematical Institute, University of Oxford, Radcliffe Observatory Quarter, Woodstock Road, Oxford, OX2 6GG, UK. Email: [email protected]
§ For the purpose of open access, the author has applied a CC BY public copyright licence to any author accepted manuscript arising from this submission.
We will state and prove our main results in a more general setting (so that
they also apply, for example, to hypergraphs), which makes them a little hard
to absorb quickly (see Theorems 4.1 and 5.1). We will therefore begin by giving
two relatively simple consequences for the case of copies of Kr in G(n, p), first
for cliques of constant size, and then for larger cliques when p is constant.

1.1 Constant-size cliques in G(n, p)


For any graph G and r ∈ N, let us write Hr (G) for the r-uniform hypergraph
on V (G) that encodes the r-cliques in G, that is,

\[ S \in E(\mathcal{H}_r(G)) \;\Longleftrightarrow\; G[S] \cong K_r. \]
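
To make this concrete, here is a minimal Python sketch (our illustration, not from the paper; all names are ours) that constructs the hyperedges of Hr(G) by brute force.

```python
# A minimal sketch (ours): build the r-uniform hypergraph H_r(G) whose
# hyperedges are the vertex sets of the r-cliques of a graph G.
from itertools import combinations

def clique_hypergraph(vertices, edges, r):
    """Return the hyperedges of H_r(G): all r-sets S with G[S] isomorphic to K_r."""
    edge_set = {frozenset(e) for e in edges}
    return [frozenset(S) for S in combinations(vertices, r)
            if all(frozenset(pr) in edge_set for pr in combinations(S, 2))]

# Example: K_4 minus the edge {2, 3} contains exactly two triangles.
V = [0, 1, 2, 3]
E = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
print(clique_hypergraph(V, E, 3))  # -> [frozenset({0, 1, 2}), frozenset({0, 1, 3})]
```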

Our first theorem bounds the probability that Hr(G(n, p)) is equal to a fixed
hypergraph H (for 'most' choices of H) as long as $\mu_r \le n^{1+\varepsilon}$ for some small
constant ε > 0, where
\[ \mu_r = \binom{n}{r} p^{\binom{r}{2}} \]
denotes the expected number of edges of Hr (G(n, p)). In order to state it, let
\[ \Lambda(n,r) = \frac{1}{2} \sum_{s=2}^{r-1} \binom{n}{r} \binom{r}{s} \binom{n-r}{r-s} p^{2\binom{r}{2}} \Bigl( p^{-\binom{s}{2}} - 1 \Bigr) \tag{1.1} \]
and observe that Λ(n, r) is the difference between the expected number of pairs
of edge-overlapping copies of Kr in G(n, p), and the expected number of pairs
of hyperedges in the random r-uniform hypergraph Hr(n, π) that intersect in
at least two vertices, where $\pi = p^{\binom{r}{2}}$. Moreover, given an r-uniform hypergraph
H, let us write G(H) for the graph whose edges are all pairs of vertices of H
that are contained in some hyperedge of H, and define
\[ t(H) = \binom{r}{2}\, e(H) - e\bigl( G(H) \bigr). \tag{1.2} \]
Observe that t(H) is equal to the number of 'repeated' pairs of vertices of H,
where a pair that is contained in k hyperedges is counted k − 1 times.
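
The quantities µr, Λ(n, r) and t(H) are all directly computable; the following Python sketch (our own helper functions, included for illustration) evaluates (1.1) and (1.2) numerically for small parameters.

```python
# Illustrative helpers (ours): the expected number mu_r of copies of K_r,
# the correction term Lambda(n, r) from (1.1), and t(H) from (1.2).
from math import comb
from itertools import combinations

def mu_r(n, r, p):
    return comb(n, r) * p ** comb(r, 2)

def Lambda(n, r, p):
    # (1.1): excess of expected edge-overlapping pairs of K_r-copies in G(n, p)
    # over pairs of hyperedges of H_r(n, pi) sharing at least two vertices
    return sum(comb(n, r) * comb(r, s) * comb(n - r, r - s)
               * p ** (2 * comb(r, 2)) * (p ** (-comb(s, 2)) - 1)
               for s in range(2, r)) / 2

def t(hyperedges, r):
    # (1.2): repeated pairs of vertices, a pair in k hyperedges counted k - 1 times
    pairs = {frozenset(pr) for E in hyperedges for pr in combinations(E, 2)}
    return comb(r, 2) * len(hyperedges) - len(pairs)

print(mu_r(100, 4, 0.1), Lambda(100, 4, 0.1))
print(t([{1, 2, 3}, {2, 3, 4}], 3))  # two triangles sharing {2, 3} -> t = 1
```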
We are now ready to state the first application of our main theorems.

Theorem 1.1. For each r ≥ 5, there exists γ > 0 such that if $p \le n^{-2/r+\gamma}$
and µr → ∞, then the following holds. There exists a collection G of r-uniform
hypergraphs, with P(Hr(G(n, p)) ∈ G) → 1 as n → ∞, such that¹
\[
\mathbb{P}\bigl( \mathcal{H}_r(G(n,p)) = H \bigr) = \pi^{e(H)} (1-\pi)^{\binom{n}{r}-e(H)} p^{-t(H)} e^{-\Lambda(n,r) + O(\omega\mu_{r+1} + n^{-\gamma})}
\]
for every H ∈ G.

¹ Here, and throughout the paper, ω = ω(n) is a function with ω(n) → ∞ as n → ∞ arbitrarily slowly. The rate at which P(Hr(G(n, p)) ∈ G) → 1 in Theorem 1.1 depends on ω.
Roughly speaking, Theorem 1.1 says that the copies of Kr in G(n, p) arise
essentially independently, except for a bias towards edge-intersecting pairs of
copies, with the weighting corresponding to how much more likely such pairs
are in G(n, p) than in Hr(n, π). Indeed, note that the random hypergraph
Hr(n, π) is equal to H with probability exactly $\pi^{e(H)} (1-\pi)^{\binom{n}{r}-e(H)}$, and the
term $p^{-t(H)}$ compensates for the difference between $\binom{r}{2} e(H)$ and the number
of edges of G(H). The term $e^{-\Lambda(n,r)}$ is perhaps more surprising; to explain it,
let Z denote the number of pairs of edges of Hr(n, π) that intersect in exactly
two vertices, and note that the distribution of Z is roughly Poisson with mean
λ = E[Z]. Moreover, if Z were exactly Poisson, then a standard calculation
would give
\[ \mathbb{E}\bigl[ p^{-Z} \bigr] = \exp\bigl( \mathbb{E}[Z]/p - \mathbb{E}[Z] \bigr) = e^{\lambda' - \lambda}, \]
where λ′ is the expected number of pairs of copies of Kr in G(n, p) that overlap
in a single edge. A similar argument applies for other intersection sizes, so the
formula in the theorem corresponds (roughly) to the distribution of Hr(n, π)
reweighted by $p^{-t(H)}$, with $e^{-\Lambda(n,r)}$ as normalizing factor. Of course this is not
anything like a proof of the result, just an argument that it is plausible. The
content of the theorem is that one can, in fact, use this relatively simple modification
of the binomial random hypergraph Hr(n, π) as a substitute for the (much
more complicated) distribution of Hr(G(n, p)), even when this distribution is
not close to that of Hr(n, π). In particular, note that if $p = o(n^{-2/r})$ then
µr+1 = o(1), and therefore Theorem 1.1 gives an asymptotic formula for the
probability of the event {Hr(G(n, p)) = H}, and thus an essentially complete
description of the distribution of Hr(G(n, p)), up to total variation distance
o(1).
Theorem 1.1 will be finally proved in Section 6, as a consequence of two
more technical results, Theorem 4.1, which will give the upper bound, and
Theorem 5.1, which extends this to a two-sided bound. We remark that we will
also prove a similar result when r ∈ {3, 4}, but in these cases the error terms
are somewhat more complicated, so we defer the statement to Theorem 6.2.

1.2 The number of Kr -factors in G(n, p)


While describing the distribution of the copies of Kr in G(n, p) seems to us to
be a natural aim in its own right, one might wonder whether Theorem 1.1 has
any concrete applications. We will show that the answer is yes, considering the
number of Kr -factors in G(n, p) above the threshold for such factors to appear.
This threshold was determined up to a constant factor in a famous paper of
Johansson, Kahn and Vu [11], and it was recently shown that moreover
\[ q_r(n) = \bigl( (r-1)!\, \log n \bigr)^{1/\binom{r}{2}}\, n^{-2/r} \tag{1.3} \]

is the sharp threshold for G(n, p) to contain a Kr -factor; more precisely, this
follows from the work of Kahn [13] on perfect matchings in random hypergraphs,
together with the couplings of the second author [22] and Heckel [6].
There has been significant interest in recent years in the problem of bound-
ing the number of Kr -factors in dense graphs, and related spanning structures

such as perfect matchings and Hamilton cycles in dense hypergraphs, see for
example [1, 3, 12, 15, 16, 17, 18, 21]. Perhaps surprisingly, this question turns
out to be closely related to finding Kr -factors in G(n, p), and several of these re-
cent results have been deduced from generalisations of the Johansson–Kahn–Vu
Theorem. In particular, Allen, Böttcher, Corsten, Davies, Jenssen, P. Morris,
Roberts and Skokan [1] determined the threshold for the existence of a triangle-
factor in a random subgraph of a graph G with minimum degree at least 2n/3,
and used this result to deduce a bound on the number of triangle-factors in
G that is sharp up to a factor of roughly $(\log n)^{n/3}$. These results were then
strengthened and generalised in [16, 21] using the recent breakthroughs on the
Kahn–Kalai conjecture by Frankston, Kahn, Narayanan and Park [4] and Park
and Pham [20], together with the couplings from [6, 22]. For further background
on clique factors in G(n, p), see Section 7.
Given r ≥ 3 and a graph G, let us write Fr(G) for the number of Kr-factors
in G, let Hr(n, m) denote the uniformly random r-uniform hypergraph with n
vertices and m edges, and observe that if n is a multiple of r, then the expected
number of perfect matchings in Hr(n, m) is exactly
\[ \Sigma(n,m) := \frac{n!}{r!^{\,n/r}\,(n/r)!} \cdot \frac{(m)_{n/r}}{(N)_{n/r}}, \]
where $N = \binom{n}{r}$ and $(x)_k$ denotes the falling factorial x(x − 1) · · · (x − k + 1).
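
Since the factorials in Σ(n, m) overflow quickly, a numerical evaluation is easiest on a log scale; the following sketch (our own code, with arbitrary example parameters) computes log Σ(n, m) via log-Gamma.

```python
# A sketch (ours) evaluating log Sigma(n, m) with log-Gamma, since the
# factorials and falling factorials overflow for interesting n.
from math import lgamma, comb

def log_falling(x, k):
    # log of the falling factorial (x)_k = x (x - 1) ... (x - k + 1)
    return lgamma(x + 1) - lgamma(x - k + 1)

def log_Sigma(n, r, m):
    k = n // r                       # hyperedges in a perfect matching
    N = comb(n, r)
    # n! / (r!^{n/r} (n/r)!) counts the partitions of [n] into r-sets
    log_partitions = lgamma(n + 1) - k * lgamma(r + 1) - lgamma(k + 1)
    return log_partitions + log_falling(m, k) - log_falling(N, k)

print(log_Sigma(12, 3, 100))  # example values of (n, r, m), chosen arbitrarily
```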
We will prove the following theorem.

Theorem 1.2. Let r ≥ 3 and ε > 0 be constants, and suppose that p = p(n)
satisfies $p \ge (1+\varepsilon) q_r(n)$ and $p = n^{-2/r+o(1)}$. If n ∈ N and r | n, then
\[ F_r(G(n,p)) = e^{o(n/\log n)}\, \Sigma(n,m) \tag{1.4} \]
with high probability, where $m = \binom{n}{r} p^{\binom{r}{2}}$ is the expected number of copies of
Kr in G(n, p), rounded to an integer.

The lower bound in (1.4) follows from the method of [13] via the couplings
from [6, 22] (see Corollaries 7.2 and 7.3); we will use Theorem 1.1 to prove (a
stronger form of) the upper bound (see Theorem 7.4). It is perhaps helpful to
note that the expectation of Fr (G(n, p)) is larger than Σ(n, m) by a factor of
$e^{\Theta(n/\log n)}$ when p = Θ(qr(n)) (see (7.1) and (7.2)). In particular, this means
that, by Theorem 1.2, the number of Kr -factors in G(n, p) is typically much
smaller than its expectation when p is close to the threshold.
To put the error term in (1.4) into context, we make the following conjecture.

Conjecture 1.3. Let r ≥ 3 be constant and let $p \le n^{-2/r+o(1)}$. Then
\[ F_r(G(n,p)) = e^{-\Theta(n^3/m^2)}\, \Sigma(n,m) \cdot \mathbb{1}\bigl[ F_r(G(n,p)) > 0 \bigr] \tag{1.5} \]
with high probability, where $m = \binom{n}{r} p^{\binom{r}{2}}$.

Note that when p = Θ(qr(n)), the correction term in the conjecture is of the
form $\exp\bigl( -\Theta\bigl( n/(\log n)^2 \bigr) \bigr)$; this term comes from the variation in the number
of intersections between copies of Kr in G(n, p). We expect the conjecture to
hold up to roughly $m = n^{4/3}$, at which point the variation in the number of
copies of Kr has a larger effect.
We also expect a corresponding hitting time statement to hold. From the
results of Kahn [14] and Heckel, Kaufmann, Müller and Pasch [7], we know
that with high probability the random graph process (adding edges randomly
one by one) contains a Kr -factor as soon as every vertex is in a copy of Kr ,
and we conjecture that at this point the number of such Kr -factors is given
by (1.5). We also make the corresponding conjectures for Shamir’s problem
(perfect matchings in a random hypergraph); in that setting, a lower bound
with a (larger) exp(o(n)) error term is given in [14, Theorem 1.4].

1.3 Almost maximal cliques for constant p


Our main theorems can also be applied to much larger cliques in denser random
graphs; in fact, our original motivation for the results proved in this paper was
to study the chromatic number of G(n, 1/2), where one needs to control the
distribution of the independent sets of size close to α(G) (see, e.g., [5, 8, 9]).
With that setting in mind, let us assume in this subsection that p = p(n) is
bounded away from zero and one, and that p and $r = r(n) \sim 2\log_{1/p}(n)$ are
chosen so that
\[ \mu_r = \Theta(n^{1+\theta}), \tag{1.6} \]
where θ is a constant² and µr (as before) denotes the expected number of
copies of Kr in G(n, p). For technical reasons, we will need to assume that
−1 < θ < 1/2; we are most interested in the case θ > 0. Under this assumption
Hr(G(n, p)) is unlikely to contain any pairs of copies of Kr intersecting in s
vertices for any 3 ≤ s ≤ r − 2, and will typically contain $\Theta^*(n^{\theta})$ pairs that
intersect in exactly r − 1 vertices.³ As a consequence, in this context we expect
to be able to replace the correction term Λ(n, r) in Theorem 1.1 by the simpler
expression
\[ \Lambda'(n,r) = \frac{1}{2} \binom{n}{r} \binom{r}{2} \binom{n-r}{r-2}\, p^{2\binom{r}{2}} \bigl( p^{-1} - 1 \bigr), \tag{1.7} \]
which is just the term corresponding to s = 2 in (1.1), and indeed we obtain
the following variant of Theorem 1.1 for constant p.

Theorem 1.4. Let ε > 0 be constant, let −1 + ε ≤ θ = θ(n) ≤ 1/2 − ε, and
suppose that p = p(n) ∈ (ε, 1 − ε) and $r = r(n) \sim 2\log_{1/p}(n)$ are chosen so
that $\mu_r = \Theta(n^{1+\theta})$. Then there exists a set G of r-uniform hypergraphs with
P(Hr(G(n, p)) ∈ G) → 1 as n → ∞, such that
\[
\mathbb{P}\bigl( \mathcal{H}_r(G(n,p)) = H \bigr) = \pi^{e(H)} (1-\pi)^{\binom{n}{r}-e(H)} p^{-t(H)} e^{-\Lambda'(n,r) + O^*(n^{\theta}) + n^{-\Omega(1)}}
\]
for every H ∈ G.
² More precisely, we will only consider those n ∈ N for which (1.6) holds, a set that we assume to be infinite. Let us also note that we don't actually need θ to be constant, but can allow it to vary in a compact interval.
³ Here, and throughout the paper, we write f = O*(g) to mean that there is a constant C such that $f(n) = O\bigl( g(n) (\log n)^C \bigr)$, and f = Θ*(g) if f = O*(g) and g = O*(f).
We will prove the upper bound in Section 6.2 for a simple and natural explicit
family of graphs G (see Definition 6.11 and Theorem 6.12), and the lower bound
(which seems less important for applications) in Section 6.3 for a different (and
somewhat less natural) family G (see Definition 6.22 and Theorem 6.23).

1.4 Relationship to previous work


In [22], the second author studied, as here, the hypergraph Hr(G(n, p)) formed
by the copies of Kr in G(n, p), for r ≥ 4 and $p \le n^{-2/r+\gamma}$ for some γ = γ(r) > 0.
The main result there (extended to r = 3 by Annika Heckel in [6]) is that for
some $\pi' \sim \pi = p^{\binom{r}{2}}$, there is a coupling of Hr(G(n, p)) and Hr(n, π′) such that
\[ \mathcal{H}_r(n, \pi') \subset \mathcal{H}_r(G(n,p)) \]
with high probability; as noted in [22], there is no similar coupling in the other
direction, precisely due to the fact that Hr(G(n, p)) typically contains many more
pairs of intersecting hyperedges than Hr(n, π) does.
To compare the two results: the coupling result of [22] is useful for showing
that certain structures exist in G(n, p) (see, e.g., [2, 7, 12, 16, 21]), and it is
also relatively easy to apply, since it gives a direct comparison to a simple
distribution. However, it is only ‘one-way’, and loses something in that π ′ is not
equal to π. In contrast, Theorem 1.1 is two-way, and in a sense sharper, giving
fairly precise bounds on the actual probability of all outcomes within some
‘typical’ set. On the other hand, while in some sense it answers the distribution
question for Hr (G(n, p)) almost completely, it is not at first clear how to apply it;
we will illustrate one method in Section 7 (see in particular Lemma 7.6), which
we expect to have further applications. The earlier coupling result definitely
does not imply Theorem 1.1; the reverse implication could perhaps hold, but
seems hard to prove – it is often difficult to go from probabilities of individual
outcomes to coupling results. In summary, we believe that the two results are
complementary.
Mousset, Noever, Panagiotou and Samotij [19] proved results that are loosely
related to those given here. They consider, for example, the probability that
G(n, p) contains no copies of some given graph F (their actual context is much
more general, as ours will be). For F = Kr , this is asking for the probability
that Hr (G(n, p)) has no edges. They manage to give a very precise answer
to this question in many cases. Here, in contrast, we consider not one possible
outcome, but the entire set of ‘typical’ outcomes. This complicates the situation
considerably, and correspondingly our results are less precise. We will briefly
discuss the connection between the proof techniques used here and in [19] once
we have outlined our method (see Remark 2.1).
The rest of this paper is organized as follows: in Section 2 we describe
the abstract set-up for our general results, and the basic method of proof. In
Section 3, we prove the simple ‘warm-up’ result alluded to earlier. Then in
Section 4 we prove the main part of our main result, Theorem 4.1, giving an
upper bound on the probability of suitable outcomes. In Section 5 we prove a
corresponding lower bound with additional conditions. In Section 6 we consider

the specific case of copies of Kr , in particular deducing Theorems 1.1 and 1.4.
Finally, in Section 7 we turn to applications, proving Theorem 1.2 via a general
lemma that may be useful for applying Theorem 1.1 in other contexts.

2 Set-up and basic approach


Almost all of our arguments can be phrased in a rather abstract setting, which
we will do. For simplicity, as in the previous section, the reader may well wish
to consider the case of copies of a given graph H = Hn in G(n, p), and more
specifically the case H = Kr , where (for asymptotics) r and p are functions
of n; we call this the ‘standard setting’, and will often illustrate definitions by
saying what they correspond to in this case.
Set-up: Let H be an s-uniform hypergraph4 with vertex set X and edge set
E(H) = {E1 , . . . , EN }. Write Xp for the random subset of X obtained by
selecting each element independently with probability p, let Ai be the event
that Ei ⊂ Xp, and set $\pi = p^s$ and µ = πN. We will assume that 1 − π = Ω(1),
and moreover that H is symmetric, in the sense that for each i and j there is
an automorphism of H that maps Ei to Ej .
Throughout we consider the random set
\[ I = \{\, i \in [N] : A_i \text{ holds} \,\}. \]

Our aim is to compare the distribution of I to that of the random subset [N]π
of [N]. To do so, given a set Y ⊂ [N] (which we think of as corresponding to
'yes' outcomes), define
\[ R(Y) = \bigcup_{j \in Y} E_j \]
and for each j ∉ Y, let A′j = A′j(Y) denote the event that
\[ E_j \setminus R(Y) \subset X_p \setminus R(Y). \]

(In [22], the set R(Y) was thought of as the set of 'revealed' elements.) Note
that I = Y if and only if R(Y) ⊂ Xp and A′j(Y)^c holds for every j ∉ Y, that is,
\[ \{ I = Y \} = \{ R(Y) \subset X_p \} \cap \bigcap_{j \notin Y} A'_j(Y)^c. \]
We may of course think of the A′j as events in the probability space corresponding
to choosing a random subset of X \ R(Y). Then
\[ \mathbb{P}(I = Y) = p^{|R(Y)|} \cdot \mathbb{P}\Bigl( \bigcap_{j \notin Y} A'_j(Y)^c \Bigr). \]

We bound the probability of the intersection by choosing an order ≺ on the
index set Y^c = [N] \ Y, and setting
\[ \pi_j = \mathbb{P}\Bigl( A'_j(Y) \Bigm| \bigcap_{i \prec j} A'_i(Y)^c \Bigr), \]
so that
\[ \mathbb{P}(I = Y) = p^{|R(Y)|} \prod_{j \in Y^c} (1 - \pi_j). \tag{2.1} \]

⁴ In our standard setting X = E(Kn) and s = e(H), and the edges of H correspond to the edge-sets of the copies of H in Kn.

Note that our order ≺ can (and will) depend on Y .
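
Before continuing, a small simulation may help fix ideas. The following Monte Carlo sketch (our own code; all names are ours) instantiates the standard setting with H = K3: it samples Xp = E(G(n, p)), computes the random index set I, and checks that the empirical mean of |I| is close to µ = πN.

```python
# A Monte Carlo sketch (ours) of the standard setting with H = K_3:
# X = E(K_n), and A_i holds when the edge set E_i of the i-th triangle
# lies inside the random edge set X_p.
import random
from itertools import combinations
from math import comb

def sample_I(n, r, p, rng):
    Xp = {frozenset(e) for e in combinations(range(n), 2) if rng.random() < p}
    return [S for S in combinations(range(n), r)
            if all(frozenset(e) in Xp for e in combinations(S, 2))]

rng = random.Random(0)
n, r, p, trials = 12, 3, 0.2, 2000
mean = sum(len(sample_I(n, r, p, rng)) for _ in range(trials)) / trials
print(mean, comb(n, r) * p ** comb(r, 2))  # empirical E|I| vs mu = pi * N
```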


When evaluating conditional probabilities such as πj, we use a fairly standard
approach, similar to the Warnke version [23] of the proof of the Janson
inequalities [10]. Let U and {Ui : i ∈ S} be up-sets. Partition S into S0 ∪ S1 so
that for all i ∈ S0 the events U and Ui are independent.⁵ For t ∈ {0, 1}, let
\[ D_t = \bigcap_{i \in S_t} U_i^c, \]
so D0 and D1 are down-sets, and U is independent of D0. The key formula is
\[ \mathbb{P}\Bigl( U \Bigm| \bigcap_{i \in S} U_i^c \Bigr) = \mathbb{P}(U \mid D_0 \cap D_1) = \frac{\mathbb{P}(U \cap D_1 \mid D_0)}{\mathbb{P}(D_1 \mid D_0)} = \frac{\mathbb{P}(U) - \varepsilon}{1 - \delta}, \tag{2.2} \]
where
\[ \varepsilon = \mathbb{P}\Bigl( U \cap \bigcup_{i \in S_1} U_i \Bigm| D_0 \Bigr) \qquad\text{and}\qquad \delta = \mathbb{P}\Bigl( \bigcup_{i \in S_1} U_i \Bigm| D_0 \Bigr). \]
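
The formula (2.2) is exact whenever U is independent of D0; as a sanity check, the following sketch (our own toy example, not from the paper) verifies it by enumerating a small product space, with U1 playing the role of the events in S1 and U2 that of S0.

```python
# A brute-force check (ours) of (2.2) on four independent Bernoulli(p) bits.
from itertools import product

p = 0.3
def prob(event):
    return sum(p ** sum(x) * (1 - p) ** (4 - sum(x))
               for x in product([0, 1], repeat=4) if event(x))

U  = lambda x: x[0] and x[1]   # up-set U, depending on bits 0 and 1
U1 = lambda x: x[1] and x[2]   # shares bit 1 with U, so it goes in S_1
U2 = lambda x: x[3] == 1       # independent of U, so it goes in S_0
D0 = lambda x: not U2(x)       # down-set D_0; U is independent of D_0
D1 = lambda x: not U1(x)       # down-set D_1

lhs = prob(lambda x: U(x) and D0(x) and D1(x)) / prob(lambda x: D0(x) and D1(x))
eps   = prob(lambda x: U(x) and U1(x) and D0(x)) / prob(D0)
delta = prob(lambda x: U1(x) and D0(x)) / prob(D0)
print(lhs, (prob(U) - eps) / (1 - delta))  # the two sides of (2.2) agree
```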

Sometimes we use the simpler bound
\[ \mathbb{P}(U) \ge \mathbb{P}(U \mid D_0 \cap D_1) \ge \mathbb{P}(U) - \varepsilon \ge \mathbb{P}(U) - \sum_{i \in S_1} \mathbb{P}(U \cap U_i), \tag{2.3} \]
where the last inequality holds by Harris' Lemma and the union bound. We
mostly (but not always) apply these bounds directly to the events A′i(Y). That
is, when bounding πj for a fixed j ∈ [N] \ Y, we let U = A′j(Y) and {Ui : i ∈
S} = {A′i(Y) : i ≺ j}. In this setting, we will find it convenient to define i ∼ j
to mean that i ≠ j and Ei ∩ Ej ≠ ∅,⁶ and to write
\[ i \to j \ \text{ if } \ i \prec j \text{ and } i \sim j, \qquad\text{and}\qquad i \not\to j \ \text{ if } \ i \prec j \text{ and } i \not\sim j.^{7} \]

In particular, with this notation the quantities ε and δ above become
\[ \varepsilon_j = \mathbb{P}\Bigl( \bigcup_{i \to j} A'_i(Y) \cap A'_j(Y) \Bigm| D_0(j) \Bigr) \qquad\text{and}\qquad \delta_j = \mathbb{P}\Bigl( \bigcup_{i \to j} A'_i(Y) \Bigm| D_0(j) \Bigr), \]
where
\[ D_0(j) = \bigcap_{i \not\to j} A'_i(Y)^c, \tag{2.4} \]
and it follows from (2.2) that
\[ \pi_j = \frac{\mathbb{P}(A'_j(Y)) - \varepsilon_j}{1 - \delta_j}. \tag{2.5} \]
⁵ Of course we can just take S₀ to be precisely these i, but it may be convenient to shrink S₀ and enlarge S₁. The argument only requires independence of certain events, never dependence.
⁶ We could define i ∼ j if Ei \ R(Y) and Ej \ R(Y) intersect, but this doesn't seem to change things much, and we would then need separate notation elsewhere in the proof when considering whether Ei and Ej intersect for j ∈ Y.
⁷ Here we are abusing notation slightly, since ̸→ is not the negation of →.
Moreover, if we define
\[ \hat\varepsilon_j = \sum_{i \to j} \mathbb{P}\bigl( A'_i(Y) \cap A'_j(Y) \bigr) \qquad\text{and}\qquad \hat\delta_j = \sum_{i \to j} \mathbb{P}\bigl( A'_i(Y) \bigr), \tag{2.6} \]
then, by the union bound and Harris' Lemma, we have εj ≤ ε̂j and δj ≤ δ̂j. In
this setting, (2.3) gives
\[ \mathbb{P}\bigl( A'_j(Y) \bigr) \ \ge\ \pi_j \ \ge\ \mathbb{P}\bigl( A'_j(Y) \bigr) - \varepsilon_j \ \ge\ \mathbb{P}\bigl( A'_j(Y) \bigr) - \hat\varepsilon_j. \tag{2.7} \]
Note also that
\[ \hat\varepsilon_j \ge \pi \hat\delta_j, \tag{2.8} \]
by Harris's Lemma and since $\mathbb{P}(A'_j(Y)) \ge \pi$.
Remark 2.1. Using (2.5), bounds on εj and δj translate to bounds on πj.
In turn, we can improve on the simple upper bounds εj ≤ ε̂j and δj ≤ δ̂j by
applying inclusion–exclusion to deal with the union, and (2.2) to deal with
conditioning. This process can be repeated to higher and higher order, eventually
producing bounds where the error terms involve ‘clusters’ of events A′i (Y ) of
arbitrarily large size. (Here a cluster is a set of events inducing a connected
subgraph of the dependency digraph.) Remarkably, Mousset, Noever, Pana-
giotou and Samotij [19] managed to systematize something like this, giving an
expansion for the probability that no member of a (suitable) family of events
holds involving ‘cumulants’ up to any given order, and bounds on the error.
In principle we could try to apply their result to our collection of events
A′j (Y ). However this does not seem feasible: the collection is rather hard to
handle, with its properties depending on the set Y . Moreover, in our context,
any event A′j (Y ) that is affected by some Ai with i ∈ Y is already effectively
a cluster of size 2. Our plan here is essentially to expand the probability that
none of the A′j (Y ) holds to a formula including contributions from clusters of
the original events Ai of size 2, with error terms involving clusters of size 3.
This involves treating different A′j (Y ) differently. It is this grouping and the
calculation that constitute the work here; the way we apply (2.5) is relatively
simple, only going one step further using (2.3) to bound εj , and only in some
cases. Our method and results therefore seem almost orthogonal to those of [19].

3 The warm up
In this section we will warm ourselves up for the main theorems by proving a
weaker result, which will allow us to present some of the key ideas without too
many distracting technical complications. To state this result, assume that we
are in the setting described in the previous section, recall that µ = πN denotes
the expected number of events Ai that hold, and set
\[ \Delta_2 = \frac{1}{2} \sum_{i \sim j} \mathbb{P}(A_i \cap A_j), \tag{3.1} \]
where the sum runs over all ordered pairs of distinct indices in [N] with i ∼ j.
Thus, in the standard setting where the sets Ei correspond to copies of a graph
H, ∆2 is exactly the expected number of pairs of (edge-)overlapping copies.
Recalling that H is s-uniform, for any Y ⊂ [N], let
\[ t(Y) = s|Y| - |R(Y)| \tag{3.2} \]
be the number of repeated elements (edges in the standard setting) in the union
$\bigcup_{i \in Y} E_i$, with an element (edge) appearing in k sets Ei counted k − 1 times.
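
In the standard setting with H = Kr, the sum (3.1) has a closed form, since two copies sharing s vertices overlap in $\binom{s}{2}$ edges; the following sketch (our own specialisation, with our own function name) evaluates it.

```python
# A sketch (ours): Delta_2 from (3.1) specialised to copies of K_r in G(n, p),
# where copies sharing s vertices overlap in C(s, 2) edges, so that
# P(A_i and A_j) = p^(2 C(r,2) - C(s,2)).
from math import comb

def Delta2_cliques(n, r, p):
    return sum(comb(n, r) * comb(r, s) * comb(n - r, r - s)
               * p ** (2 * comb(r, 2) - comb(s, 2))
               for s in range(2, r)) / 2

print(Delta2_cliques(100, 4, 0.05))  # example parameters, chosen arbitrarily
```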
We will prove the following weaker version of our main theorem.

Theorem 3.1. For every Y ⊂ [N] with |Y| = O(µ), we have
\[ \mathbb{P}(I = Y) \le \pi^{|Y|} (1-\pi)^{N-|Y|} p^{-t(Y)} e^{O(\Delta_2)}. \tag{3.3} \]
Moreover, if ∆2 → 0 then there is a set T ⊂ P([N]) of 'typical' outcomes such
that P(I ∈ T) → 1, and
\[ \mathbb{P}(I = Y) \sim \pi^{|Y|} (1-\pi)^{N-|Y|} \tag{3.4} \]
for each Y ∈ T. Equivalently, the random sets I and [N]π can be coupled to
agree with probability 1 − o(1).

In the case of copies of some graph H in G(n, p), if we do not expect to


see any pairs of copies intersecting even in single vertices, then a result such
as Theorem 3.1 is essentially trivial: the number X of copies has essentially a
Poisson distribution with mean µ, and given that X = k, all outcomes consisting
of k vertex-disjoint copies are equally likely by symmetry. At first it might
appear that this argument extends easily to the case ∆2 = o(1), but this is not
obvious: it is true that for any two sets Y1 and Y2 of k edge-disjoint copies, the
events Y1 ⊂ I and Y2 ⊂ I have exactly the same probability $\pi^k$. But it does
not follow that I = Y1 and I = Y2 have the same probability: not all sets of k
edge-disjoint copies are isomorphic, and in general the conditional probability
that no other copies are present will be different for Y1 and Y2 .
Before starting the proof of Theorem 3.1, let us explain one key idea. Given
Y ⊂ [N], i.e., a potential outcome for the random variable I, set
\[ L_2(Y) = \sum_{j \in Y} \sum_{i \sim j} \mathbb{P}(A_i \mid A_j), \tag{3.5} \]
which is related to the extent to which the events that actually hold push up
the probability of other events. Recalling that j ∈ I if and only if Aj holds,
observe that
\[ L_2(I) = \sum_{j \in [N]} \mathbb{1}[A_j] \sum_{i \sim j} \mathbb{P}(A_i \mid A_j). \]
Since H is symmetric (by assumption), the second sum in both formulae takes
the same value (c, say) for every j ∈ [N]. Hence L2(Y) = c|Y| is proportional
to |Y|. However, we also have
\[ \mathbb{E}\bigl[ L_2(I) \bigr] = \sum_{j \in [N]} \mathbb{P}(A_j) \sum_{i \sim j} \mathbb{P}(A_i \mid A_j) = \sum_{i \sim j} \mathbb{P}(A_i \cap A_j) = 2\Delta_2 \tag{3.6} \]
and E[|I|] = µ, and therefore
\[ L_2(Y) = \frac{2|Y|}{\mu} \cdot \Delta_2. \tag{3.7} \]
We will use variants of this argument several times, sometimes with extra
inequalities, such as the inequality P(Ai)P(Aj) ≤ P(Ai ∩ Aj). In general, we call
a sum such as L2 (Y ) a conditioned cluster expectation: note that it is not the
conditional expectation of the number of ‘clusters’ (defined later), but is a sum
of terms each of which can be thought of as a certain conditional expectation of
a contribution to the number of clusters. We call the function L2 because (i) the
clusters that are relevant here have size two, i.e., involve two of the events Ai ,
and (ii) the dependence on Y via indicator functions is linear.

Proof of Theorem 3.1. We start with the first statement, (3.3). Throughout the
proof we fix the 'target' outcome Y ⊂ [N] with |Y| = O(µ), and set m = |Y|.
Observe first that, by the definition (3.2) of t(Y), we have
\[ p^{|R(Y)|} = p^{s|Y| - t(Y)} = \pi^m p^{-t(Y)}. \]
In order to prove (3.3), it will therefore suffice to show (see (2.1)) that
\[ \prod_{j \in Y^c} (1 - \pi_j) \le (1-\pi)^{N-m} e^{O(\Delta_2)}. \]

Let us say that j ∈ Y^c is neutral if Ej is disjoint from the set R(Y), i.e., if
there is no i ∈ Y with i ∼ j. Note that A′j(Y) = Aj if and only if j is neutral.
A key part of our strategy is to choose the ordering ≺ on the indices j ∈ Y^c so
that neutral ones come first.
Recall from (2.6) the definition of ε̂j, and that εj ≤ ε̂j, and observe that if
j is neutral and i → j, then i is also neutral, since i ≺ j and by our choice of
ordering. It follows that $\hat\varepsilon_j = \sum_{i \to j} \mathbb{P}(A_i \cap A_j)$, and hence
\[ \sum_{j\ \text{neutral}} \varepsilon_j \ \le\ \sum_{j\ \text{neutral}} \hat\varepsilon_j \ \le\ \Delta_2. \tag{3.8} \]

Hence, writing N0 for the number of neutral j, it follows from (2.7) that
\[ \prod_{j\ \text{neutral}} (1 - \pi_j) \ \le\ \prod_{j\ \text{neutral}} (1 - \pi + \hat\varepsilon_j) \ \le\ (1-\pi)^{N_0} e^{O(\Delta_2)}, \]
since P(A′j(Y)) = P(Aj) = π if j is neutral, and 1 − π is bounded away from
zero, by assumption. For non-neutral terms we simply use πj ≥ 0, to obtain
\[ \prod_{j \in Y^c} (1 - \pi_j) \ \le\ \prod_{j\ \text{neutral}} (1 - \pi_j) \ \le\ (1-\pi)^{N_0} e^{O(\Delta_2)}. \tag{3.9} \]

Now, by (3.5) and Harris’ lemma we have


XX  XX
L2 (Y ) = P Ai | Aj > P(Ai ) > π(N − m − N0 ), (3.10)
j∈Y i∼j j∈Y i∼j

11
where in the last step we used that fact that if i ∈ Y c is not neutral, then there
exists some j ∈ Y with i ∼ j, so these N − m − N0 terms each contribute at
least P(Ai ) = π. Finally, recall from (3.7) that L2 (Y ) = 2∆2 |Y |/µ = O(∆2 ),
since |Y | = O(µ), and hence

\[ (1-\pi)^{N-m-N_0} = e^{O(\Delta_2)}, \tag{3.11} \]
where we again used our assumption that 1 − π is bounded away from zero.
Combining this with (3.9), it follows that
\[ \prod_{j \in Y^c} (1 - \pi_j) \le (1-\pi)^{N-m} e^{O(\Delta_2)}, \]
as required.
In order to deduce the second part of the theorem, we need to show that if
∆2 = o(1), then with high probability |I| = O(µ) and t(I) = 0. The first of
these follows⁸ from Markov's inequality, since E[|I|] = µ, and the second holds
because
\[ \mathbb{P}\bigl( t(I) > 0 \bigr) \ \le\ \sum_{i \sim j} \mathbb{P}(A_i \cap A_j) = 2\Delta_2. \]
Since
\[ \mathbb{P}(I = Y) \ \le\ \bigl( 1 + o(1) \bigr)\, \mathbb{P}\bigl( [N]_\pi = Y \bigr) \]
for every such Y ⊂ [N], by (3.3), it follows that (3.4) holds for every Y in some
family T such that I ∈ T with high probability. This in turn implies that the
total variation distance between the two distributions is o(1), as claimed.

We remark that under mild additional assumptions we can give an explicit


lower bound on the probabilities of suitable ‘good’ outcomes, essentially match-
ing the upper bound in (3.3). But, surprisingly, doing so turns out to be a little
awkward; we will give such a bound in a more complicated context later.

4 Including the effect of 2-clusters


In this section we will prove our main technical theorem, which we will use (in
Sections 6 and 7) to prove the upper bounds in Theorems 1.1, 1.2 and 1.4. In
Section 5, we will adapt the method used in this section to prove a two-way
bound in a slightly more restricted setting. Throughout the next two sections,
we will assume that we are in the general setting introduced in Section 2.
In order to state the main theorem of this section, we will need a few more
definitions. First, we say that a set S ⊂ [N ] of indices forms a cluster if the
relation i ∼ j induces a connected graph on S. The size of the cluster is |S|, and
a k-cluster is a cluster of size k. The cluster S is present in Y if S ⊂ Y . With
this terminology, ∆2 is simply the expected number of 2-clusters present in I.
More generally we write Wk(Y) for the number of k-clusters that are present in
Y, and define
\[ \Delta_k = \mathbb{E}\bigl[ W_k(I) \bigr]. \tag{4.1} \]
⁸ More precisely, Markov tells us that |I| = o(µ/∆2) with high probability, which suffices to prove (3.11), and hence (3.3), up to a factor of 1 + o(1).
Theorem 3.1 studied the distribution of edges in the hypergraph H[Xp ] (in the
standard setting, the distribution of copies of some graph H in G(n, p)) in the
case in which we either do not expect 2-clusters, or do, but are willing to tolerate
an error in our probability of the form eO(∆2 ) . Our aim in this section is to go
one step further, taking full account of the effect of 2-clusters, and giving error
terms that depend on the numbers of 3-clusters and 4-clusters in Y .
In order to state the most general version of the theorem we will prove, we
will need to define some further types of conditioned cluster expectation. Each
of them will have the key property that we can bound their expectation in terms
of ∆k for some k ∈ N. First, define
\[ Q_2(Y) = \frac{1}{2} \sum_{i \sim j} \mathbb{1}\bigl[ i, j \in Y \bigr], \tag{4.2} \]
which is just the number of intersecting pairs of edges of H[Xp] when I = Y.
Note that Q2(I) = W2(I), and hence E[Q2(I)] = ∆2. Next, let
\[ Q_3(Y) = \sum_{\text{3-clusters } \{i,j,k\}} \mathbb{1}\bigl[ i, j \in Y \bigr] \cdot \mathbb{P}(A_k \mid A_i), \tag{4.3} \]
and
\[ Q_4(Y) = \sum_{\substack{i \sim i' \sim j' \sim j \\ i' \not\sim j,\ i \not\sim j'}} \mathbb{1}\bigl[ i, j \in Y \bigr] \cdot \mathbb{P}\bigl( A_{i'} \cap A_{j'} \mid A_i \cap A_j \bigr), \tag{4.4} \]
where the sum is over 4-tuples of distinct indices with the given intersection
structure. We will show below that E[Q3(I)] = O(∆3) and E[Q4(I)] = O(∆4).
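
The cluster counts Wk(Y) are straightforward to compute for a concrete outcome Y; the following sketch (ours, with toy edge sets Ei) counts the k-clusters present in Y by testing connectivity of the relation i ∼ j on each k-subset.

```python
# A sketch (ours): count the k-clusters present in Y, where i ~ j means that
# the sets E_i and E_j intersect, by testing connectivity of each k-subset.
from itertools import combinations

def clusters_present(Y, E, k):
    count = 0
    for S in combinations(Y, k):
        adj = {i: [j for j in S if j != i and E[i] & E[j]] for i in S}
        seen, stack = {S[0]}, [S[0]]          # BFS/DFS over the '~' relation
        while stack:
            for j in adj[stack.pop()]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        count += (len(seen) == k)
    return count

E = {0: {1, 2}, 1: {2, 3}, 2: {4, 5}, 3: {5, 6}}  # toy edge sets E_i
print(clusters_present([0, 1, 2, 3], E, 2))  # -> 2: the 2-clusters {0,1} and {2,3}
```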
We will also need two further simple definitions in order to state the theorem.
First, we will need the following slight modification of ∆2:
\[ \Delta_2^0 = \frac{1}{2} \sum_{i \sim j} \mathbb{P}(A_i) \cdot \mathbb{P}(A_j). \tag{4.5} \]
Note that this corresponds to the expected number of pairs of intersecting copies
if there were no dependence between copies. In other words, ∆2^0 is just the
expected number of pairs with i ∼ j in the random set [N]π. We will also need
to make a (very weak) assumption on the quantity
\[ \varphi = \max_{i \ne j} \mathbb{P}(A_i \mid A_j). \tag{4.6} \]
We are now ready to state our main upper bound on P(I = Y ).

Theorem 4.1. Suppose that φ∆2 = O(∆3). Then, for every Y ⊂ [N] with
|Y| = O(µ), we have
\[
\mathbb{P}(I = Y) \le \pi^{|Y|} (1-\pi)^{N-|Y|} p^{-t(Y)} \exp\Bigl( -\Bigl( \frac{2|Y|}{\mu} - 1 \Bigr) \bigl( \Delta_2 - \Delta_2^0 \bigr) + \eta(Y) \Bigr),
\]
where
\[ \eta(Y) = O\bigl( \Delta_3 + \varphi \cdot Q_2(Y) + Q_3(Y) + Q_4(Y) + \pi^2 N \bigr). \]
Since the statement of Theorem 4.1 is rather complicated, a quick sanity
check is perhaps called for: note that taking Y = ∅, we obtain
\[ \mathbb{P}(I = \varnothing) \le (1-\pi)^N \exp\bigl( \Delta_2 - \Delta_2^0 + O(\Delta_3 + \pi^2 N) \bigr), \]
by (3.2), (4.2), (4.3) and (4.4). Since the cumulants (defined in [19, Section 1.3])
satisfy κ1 = µ and κ2 = ∆2 − ∆2^0, this does indeed match (in this special case)
the result of Mousset, Noever, Panagiotou and Samotij (see [19, Theorem 11]),
although the error terms are different.

Remark 4.2. It will be easy to show that the bound φ∆2 = O(∆3) in Theorem 4.1
holds in all of our applications. Indeed, it suffices that for any Ei
and Ej with a given intersection size, there is another Ek intersecting Ej in the
same number of elements. Then for each term P(Ai ∩ Aj) contributing to ∆2,
by symmetry we find some k ≠ i with P(Ak | Aj) = φ, and obtain a contribution
of P(Ai ∩ Aj ∩ Ak) ≥ P(Ai ∩ Aj)P(Ak | Aj) to the sum defining ∆3.

To make sense of the error term in Theorem 4.1, let us first prove the bounds
claimed above on the expectation of Qi (I).
 
Lemma 4.3. $\mathbb{E}[Q_i(I)] = O(\Delta_i)$ for each i ∈ {2, 3, 4}.

Proof. As noted above, it follows immediately from (4.2) that Q2(I) = W2(I),
so E[Q2(I)] = ∆2. Next, observe that
\[ \mathbb{E}[Q_3(I)] = \sum_{\text{3-clusters } \{i,j,k\}} \mathbb{P}(A_i \cap A_j) \cdot \mathbb{P}(A_k \mid A_i) \ \le\ \sum_{\text{3-clusters } \{i,j,k\}} \mathbb{P}(A_i \cap A_j) \cdot \mathbb{P}(A_k \mid A_i \cap A_j) = O(\Delta_3), \tag{4.7} \]
by Harris' Lemma and the definition (4.1) of ∆3. Finally, note that
\[ \mathbb{E}[Q_4(I)] = \sum_{\substack{i \sim i' \sim j' \sim j \\ i' \not\sim j,\ i \not\sim j'}} \mathbb{P}\bigl( A_i \cap A_j \cap A_{i'} \cap A_{j'} \bigr) = O(\Delta_4), \tag{4.8} \]
as required, since each term in the sum corresponds to a 4-cluster.

Using Lemma 4.3, we obtain the following corollary of Theorem 4.1 with
more usable bounds. Recall that ω = ω(n) → ∞ as n → ∞ arbitrarily slowly.

Corollary 4.4. Suppose that φ∆2 = O(∆3), and that also µ ≥ ω², ∆2 = O(µ)
and ∆4 = O(∆3). Then there exists a collection G of 'good' outcomes Y ⊂ [N],
with P(I ∈ G) → 1 as n → ∞, such that
\[
\mathbb{P}(I = Y) \le \pi^{|Y|} (1-\pi)^{N-|Y|} p^{-t(Y)} e^{-\Delta_2 + \Delta_2^0 + O(\omega\Delta_2\mu^{-1/2} + \omega\Delta_3 + \pi^2 N)} \tag{4.9}
\]
for every Y ∈ G. Moreover, we can choose G to be the set of Y ⊂ [N] such that
\[ \bigl|\, |Y| - \mu \,\bigr| \le \omega\sqrt{\mu} \qquad\text{and}\qquad Q_i(Y) \le \omega \cdot \mathbb{E}\bigl[ Q_i(I) \bigr] \tag{4.10} \]
for each i ∈ {2, 3, 4}.
Proof. Note first that the random set I satisfies the conditions (4.10) with
high probability by Chebyshev's inequality and Markov's inequality respectively,
since |I| has expectation µ and variance at most
\[ \mu + \sum_{i \sim j} \mathbb{P}(A_i \cap A_j) = \mu + 2\Delta_2 = O(\mu). \tag{4.11} \]
Now, let G be the set of Y ⊂ [N] satisfying (4.10), and observe that |Y| = O(µ)
for every Y ∈ G, since µ ≥ ω². Observe also that
\[ \Bigl( \frac{2|Y|}{\mu} - 1 \Bigr) \bigl( \Delta_2 - \Delta_2^0 \bigr) = \Delta_2 - \Delta_2^0 + O\Bigl( \frac{\omega\Delta_2}{\sqrt{\mu}} \Bigr) \]
for every Y ∈ G, since ∆2^0 ≤ ∆2, by Harris' Lemma. To deduce (4.9) from
Theorem 4.1, observe that
\[ \varphi \cdot Q_2(Y) + Q_3(Y) + Q_4(Y) = O(\omega\Delta_3) \tag{4.12} \]
for every Y ∈ G, by Lemma 4.3 and (4.10), and hence that
\[ -\Bigl( \frac{2|Y|}{\mu} - 1 \Bigr) \bigl( \Delta_2 - \Delta_2^0 \bigr) + \eta(Y) = -\Delta_2 + \Delta_2^0 + O\Bigl( \frac{\omega\Delta_2}{\sqrt{\mu}} + \omega\Delta_3 + \pi^2 N \Bigr) \]
for every Y ∈ G. The bound (4.9) therefore follows from Theorem 4.1.

Before diving into the details of the proof of Theorem 4.1, let us give a brief
sketch of the argument. Roughly speaking, our aim is to separate the effect of
2-clusters, which contribute to the main correction term, from that of larger
clusters, which we will throw into the error term. To do so, it will be useful
to partition the set Y^c as follows: we say that an index j ∉ Y is neutral if
there is no i ∈ Y with i ∼ j, that j is simple if there is exactly one i ∈ Y with
i ∼ j, and that j is complex otherwise. In evaluating the probability that none
of the events A′j(Y) holds as a product of conditional probabilities as in (2.1),
we define an order ≺ on the indices j ∈ Y^c so that neutral ones come first, then
simple ones, then complex ones. We will show that
\[ \prod_{j\ \text{neutral}} (1 - \pi_j) \ \le\ \exp\Bigl( -\mu + \Bigl( \frac{2|Y|}{\mu} - 1 \Bigr) \Delta_2^0 + \Delta_2 + O\bigl( \Delta_3 + \pi^2 N \bigr) \Bigr) \tag{4.13} \]
and
\[ \prod_{j\ \text{simple}} (1 - \pi_j) = \exp\Bigl( -\frac{2|Y|}{\mu} \cdot \Delta_2 + O\bigl( \Delta_3 + \varphi \cdot Q_2(Y) + Q_3(Y) + Q_4(Y) \bigr) \Bigr). \]
Together with the trivial bound πj ≥ 0 for every complex j, by (2.1) this will
turn out to be sufficient to prove the theorem.
The following basically trivial lemma will be used in the proof below.

Lemma 4.5. Let D be a down-set, and define d(A) = P(A) − P(A | D) for any
event A. Then, if U1, . . . , Uk are up-sets, we have
\[ 0 \le d(U_1 \cup \cdots \cup U_k) \le d(U_1) + \cdots + d(U_k). \]

Proof. The first inequality is just Harris' Lemma. For the second, starting with
the identity P(U1 ∪ U2) = P(U1) + P(U2) − P(U1 ∩ U2) and subtracting the same
identity for probabilities conditioned on D, we obtain
\[ d(U_1 \cup U_2) = d(U_1) + d(U_2) - d(U_1 \cap U_2) \le d(U_1) + d(U_2), \]
since U1 ∩ U2 is an up-set. This proves the case k = 2. The general case now
follows easily by induction on k.

The rest of this section is devoted to the proof of Theorem 4.1.

Proof of Theorem 4.1. Fix a possible outcome Y ⊂ [N ] with |Y | = O(µ), and


partition Y c into neutral, simple and complex terms as defined above. Fix an
ordering ≺ of the elements of Y c such that neutral terms come first, then simple
ones, then complex ones. Since the set Y will be fixed throughout the proof, we
will write A′j for A′j (Y ). Our first main aim is to prove (4.13), which bounds
the contribution of the neutral terms to (2.1).
Neutral terms: For neutral terms we use (2.5), which implies that

\[ \pi_j \ \ge\ (\pi - \varepsilon_j)(1 + \delta_j) = \pi - \varepsilon_j + \pi\delta_j - \varepsilon_j\delta_j, \tag{4.14} \]
since if j is neutral then P(A′j) = P(Aj) = π. We will consider the sum of each of
these terms one by one.
Claim 4.6.
\[ \sum_{j\ \text{neutral}} \pi \ \ge\ \mu - \frac{2|Y|}{\mu} \cdot \Delta_2^0 + O(\pi^2 N). \]

Proof of Claim 4.6. Recall that if j ∈ [N] is not neutral, then either j ∈ Y or
there exists i ∈ Y with i ∼ j. Since H is symmetric and P(Aj) = π for every
j ∈ [N], it follows that
\[ \sum_{j \in [N]\ \text{not neutral}} \pi \ \le\ \pi|Y| + \sum_{i \in Y} \sum_{j \sim i} \mathbb{P}(A_j) = c|Y| \tag{4.15} \]
for some c > 0. Since E[|I|] = µ and
\[ \mathbb{E}\Bigl[ \sum_{i \in I} \sum_{j \sim i} \mathbb{P}(A_j) \Bigr] = \sum_{i \sim j} \mathbb{E}\bigl[ \mathbb{1}[A_i] \bigr] \cdot \mathbb{P}(A_j) = \sum_{i \sim j} \mathbb{P}(A_i) \cdot \mathbb{P}(A_j) = 2\Delta_2^0, \]
it follows (as in the proof of (3.7)) that c = π + 2∆2^0/µ, and hence
\[ \sum_{j\ \text{neutral}} \pi \ \ge\ \pi N - \Bigl( \pi + \frac{2\Delta_2^0}{\mu} \Bigr) |Y| = \mu - \frac{2|Y|}{\mu} \cdot \Delta_2^0 + O(\pi^2 N), \]
as claimed, since |Y| = O(µ) and µ = πN.

We now turn to the sum of the εj . In the present argument we only need
an upper bound, which is simply ∆2 as shown in (3.8). However, we will need
a lower bound later, and the same method applies (in a less intuitive way) to
the sum over πδj , for which we need a lower bound, so we give the details.

Claim 4.7.
\[ \sum_{j\ \text{neutral}} \varepsilon_j = \Delta_2 + O(\Delta_3). \]

Proof of Claim 4.7. Recall (see Section 2) the definition of εj, and that we write
i → j if both i ∼ j and i ≺ j. Noting that P(A′j) = P(Aj) if j ∈ Y^c is neutral,
observe that
\[ \sum_{j\ \text{neutral}} \varepsilon_j = B_0 - B_1 - B_2, \tag{4.16} \]
where
\[ B_0 = \sum_{j\ \text{neutral}} \sum_{i \to j} \mathbb{P}(A_i \cap A_j) \]
is the main term,
\[ B_1 = \sum_{j\ \text{neutral}} \mathbb{P}\Bigl( \bigcup_{i \to j} A_i \cap A_j \Bigr) - \sum_{j\ \text{neutral}} \mathbb{P}\Bigl( \bigcup_{i \to j} A_i \cap A_j \Bigm| D_0(j) \Bigr) \]
is the effect of conditioning on the down-sets D0(j) (see (2.4)), and
\[ B_2 = \sum_{j\ \text{neutral}} \sum_{i \to j} \mathbb{P}(A_i \cap A_j) - \sum_{j\ \text{neutral}} \mathbb{P}\Bigl( \bigcup_{i \to j} A_i \cap A_j \Bigr) \]
is the slack in the union bound. We will show that
\[ B_0 = \Delta_2 + O(\Delta_3), \tag{4.17} \]
and that B1 and B2 are both O(∆3).


To prove (4.17), note first that B0 is the sum of P(Ai ∩ Aj) over unordered
pairs {i, j} such that i ∼ j and i and j are both neutral (by our choice of ≺),
while ∆2 is the same sum without the restriction to neutral pairs. It follows
that
\[ \Delta_2 - \sum_{k \in Y} \sum_{i \sim j \sim k} \mathbb{P}(A_i \cap A_j) \ \le\ B_0 \ \le\ \Delta_2, \]
since if i and j are not both neutral, then either one of them (wlog i) is in Y
(note that we allow k = i in the second sum above), or there exists k ∈ Y with
(wlog) j ∼ k. By symmetry, the double sum is (as in the previous claim) equal
to c|Y| for some c > 0, and
\[ \mathbb{E}\Bigl[ \sum_{k \in I} \sum_{i \sim j \sim k} \mathbb{P}(A_i \cap A_j) \Bigr] \ \le\ \sum_{i \sim j \sim k} \mathbb{P}(A_k) \cdot \mathbb{P}(A_i \cap A_j) = O(\Delta_3), \]
since the terms with k = i contribute at most π∆2 ≤ φ∆2 = O(∆3), those with
k ≠ i correspond to 3-clusters, and P(Ak) · P(Ai ∩ Aj) ≤ P(Ai ∩ Aj ∩ Ak), by
Harris' Lemma. Since E[|I|] = µ, it follows that c = O(∆3/µ), and hence
\[ \sum_{k \in Y} \sum_{i \sim j \sim k} \mathbb{P}(A_i \cap A_j) = O\Bigl( \frac{\Delta_3 |Y|}{\mu} \Bigr) = O(\Delta_3), \tag{4.18} \]
since |Y| = O(µ). We therefore have B0 = ∆2 + O(∆3), as claimed.
To bound B1, recall that D0(j) is a down-set for each j, and therefore, by
applying Lemma 4.5 to the family of up-sets {Ai ∩ Aj : i → j} for each neutral
j, we have
\[ B_1 \ \le\ \sum_{j\ \text{neutral}} \sum_{i \to j} \Bigl( \mathbb{P}(A_i \cap A_j) - \mathbb{P}\bigl( A_i \cap A_j \mid D_0(j) \bigr) \Bigr). \tag{4.19} \]
To bound the right-hand side, we now apply (2.3) for each pair i → j such that
j is neutral, with U = Ai ∩ Aj, S = {ℓ : ℓ ̸→ j}, Uℓ = Aℓ for each ℓ ∈ S, and
the partition S = S0 ∪ S1 given by setting
\[ S_1 = \{ \ell \in S : \ell \sim i \}. \]
Note that this is a legal partition, since ℓ ∈ S0 implies ℓ ≁ j and ℓ ≁ i, so the
events Ai ∩ Aj and Aℓ are independent, as required. Recalling that D0(j) is the
intersection of the events {A_ℓ^c : ℓ ̸→ j}, it follows from (2.3) and (4.19) that⁹
\[ 0 \ \le\ B_1 \ \le\ \sum_{j\ \text{neutral}} \sum_{i \to j} \sum_{\substack{\ell \not\to j \\ \ell \sim i}} \mathbb{P}\bigl( A_i \cap A_j \cap A_\ell \bigr). \tag{4.20} \]
Each triple (i, j, ℓ) with i → j, ℓ ̸→ j and ℓ ∼ i corresponds to a 3-cluster,
since ℓ ∼ i ∼ j, and ℓ ̸→ j implies that j ≠ ℓ, so it follows from (4.20) that
B1 = O(∆3), as claimed.
Finally, to bound B2 (the loss in the union bound), simply observe that
\[ 0 \ \le\ B_2 \ \le\ \sum_{j\ \text{neutral}} \sum_{i \to j} \sum_{\substack{\ell \to j \\ \ell \prec i}} \mathbb{P}\bigl( A_i \cap A_j \cap A_\ell \bigr) = O(\Delta_3), \]
by inclusion–exclusion, and since each triple (i, j, ℓ) in the sum corresponds to
a 3-cluster, since i ∼ j ∼ ℓ and ℓ ≠ i. Combining this with (4.16) and (4.17)
and our bound on B1, the claim follows.

⁹ Indeed, by (2.3) we have $\mathbb{P}(U) - \mathbb{P}(U \mid D_0 \cap D_1) \le \sum_{\ell \in S_1} \mathbb{P}(U \cap U_\ell)$, which is equivalent to the claimed inequality $\mathbb{P}(A_i \cap A_j) - \mathbb{P}(A_i \cap A_j \mid D_0(j)) \le \sum_{\ell \in S_1} \mathbb{P}(A_i \cap A_j \cap A_\ell)$.

We can bound the sum of πδj over neutral j in the same way, using the fact
that π · P(Ai) ≤ P(Ai ∩ Aj). Since the proof of the following claim is almost
identical to that of Claim 4.7, we will be somewhat briefer with the details.

Claim 4.8.
\[ \pi \sum_{j\ \text{neutral}} \delta_j = \Delta_2^0 + O(\Delta_3). \]

Proof of Claim 4.8. Recalling the definition of δj, and that P(A′j) = P(Aj) = π
if j ∈ Y^c is neutral, observe that
\[ \pi \sum_{j\ \text{neutral}} \delta_j = C_0 - C_1 - C_2, \tag{4.21} \]
where
\[ C_0 = \sum_{j\ \text{neutral}} \sum_{i \to j} \pi^2 \]
is the main term,
\[ C_1 = \pi \sum_{j\ \text{neutral}} \mathbb{P}\Bigl( \bigcup_{i \to j} A_i \Bigr) - \pi \sum_{j\ \text{neutral}} \mathbb{P}\Bigl( \bigcup_{i \to j} A_i \Bigm| D_0(j) \Bigr) \]
is the effect of conditioning on the down-sets D0(j), and
\[ C_2 = \pi \sum_{j\ \text{neutral}} \sum_{i \to j} \mathbb{P}(A_i) - \pi \sum_{j\ \text{neutral}} \mathbb{P}\Bigl( \bigcup_{i \to j} A_i \Bigr) \]
is the slack in the union bound. We will show that
\[ C_0 = \Delta_2^0 + O(\Delta_3), \tag{4.22} \]
and that C1 and C2 are both O(∆3). To prove (4.22), note first that if we
dropped the neutrality condition we would have exactly ∆2^0. To show that
∆2^0 − C0 = O(∆3), we simply repeat the proof of (4.17). Indeed, we have
\[ \Delta_2^0 - \sum_{k \in Y} \sum_{i \sim j \sim k} \pi^2 \ \le\ C_0 \ \le\ \Delta_2^0, \]
and since P(Ai ∩ Aj) ≥ π², by Harris' Lemma, it follows from (4.18) that the
double sum is O(∆3), giving ∆2^0 − C0 = O(∆3), as claimed.
We can also bound C1 and C2 as we did for B1 and B2. Indeed, applying
Lemma 4.5 to the family of up-sets {Ai : i → j} for each neutral j, we have
\[ C_1 \ \le\ \pi \sum_{j\ \text{neutral}} \sum_{i \to j} \Bigl( \mathbb{P}(A_i) - \mathbb{P}\bigl( A_i \mid D_0(j) \bigr) \Bigr). \]
Moreover, applying (2.3) for each pair i → j such that j is neutral with U = Ai,
and with the set S, the up-sets Uℓ, and the partition S = S0 ∪ S1 as before, we
obtain
\[ 0 \ \le\ C_1 \ \le\ \pi \sum_{j\ \text{neutral}} \sum_{i \to j} \sum_{\substack{\ell \not\to j \\ \ell \sim i}} \mathbb{P}\bigl( A_i \cap A_\ell \bigr). \]
Since P(Aj | Ai ∩ Aℓ) ≥ π, we have π · P(Ai ∩ Aℓ) ≤ P(Ai ∩ Aj ∩ Aℓ), and so
the bound C1 = O(∆3) follows as before. Finally, observe that
\[ 0 \ \le\ C_2 \ \le\ \pi \sum_{j\ \text{neutral}} \sum_{i \to j} \sum_{\substack{\ell \to j \\ \ell \prec i}} \mathbb{P}\bigl( A_i \cap A_\ell \bigr) = O(\Delta_3), \]
by inclusion–exclusion, and since π · P(Ai ∩ Aℓ) ≤ P(Ai ∩ Aj ∩ Aℓ), by Harris'
Lemma, and each triple (i, j, ℓ) in the sum corresponds to a 3-cluster. Combining
this with (4.21) and (4.22) and our bound on C1, the claim follows.

Finally, we need to bound the sum of εjδj.

Claim 4.9.
\[ \sum_{j\ \text{neutral}} \varepsilon_j \delta_j \ \le\ \sum_{j\ \text{neutral}} \hat\varepsilon_j \hat\delta_j = O(\Delta_3). \]
Proof of Claim 4.9. Recall from (2.6) the definitions of ε̂j and δ̂j, and that
εj ≤ ε̂j and δj ≤ δ̂j. To bound the second sum, recall that if j is neutral and
i → j then P(A′i) = P(Ai) and P(A′i ∩ A′j) = P(Ai ∩ Aj), and therefore
\[ \hat\varepsilon_j \hat\delta_j = \sum_{i \to j} \sum_{k \to j} \mathbb{P}(A_i \cap A_j) \cdot \mathbb{P}(A_k) \ \le\ \sum_{i \sim j} \sum_{k \sim j} \mathbb{P}(A_i \cap A_j) \cdot \mathbb{P}(A_k) \]
for each neutral j. Summing over neutral j now gives
\[ \sum_{j\ \text{neutral}} \hat\varepsilon_j \hat\delta_j \ \le\ \sum_{j} \sum_{i \sim j} \sum_{k \sim j} \mathbb{P}(A_i \cap A_j) \cdot \mathbb{P}(A_k) = O\bigl( \pi\Delta_2 + \Delta_3 \bigr) = O(\Delta_3), \]
since the terms with k = i contribute O(π∆2) = O(∆3), and the terms with
k ≠ i contribute at most O(∆3), since P(Ai ∩ Aj) · P(Ak) ≤ P(Ai ∩ Aj ∩ Ak)
and each triple corresponds to a 3-cluster.

Putting the pieces together, from (4.14) and Claims 4.6–4.9, we have
\[ \sum_{j\ \text{neutral}} \pi_j \ \ge\ \mu - \Bigl( \frac{2|Y|}{\mu} - 1 \Bigr) \Delta_2^0 - \Delta_2 + O(\Delta_3) + O(\pi^2 N), \]
and therefore
\[ \prod_{j\ \text{neutral}} (1 - \pi_j) \ \le\ \exp\Bigl( -\mu + \Bigl( \frac{2|Y|}{\mu} - 1 \Bigr) \Delta_2^0 + \Delta_2 + O\bigl( \Delta_3 + \pi^2 N \bigr) \Bigr), \tag{4.23} \]
as claimed in (4.13). It remains to deal with the simple and complex j.


Simple terms: For simple j we will be able to use (2.7) rather than (2.5), since
a simple j already corresponds (in some sense) to a cluster of size 2, so we only
need to evaluate $\sum_{j\ \text{simple}} \pi_j$ to 'first order'. Our main task will be to prove the
following claim.

Claim 4.10.
\[ \sum_{j\ \text{simple}} \pi_j = \frac{2|Y|}{\mu} \cdot \Delta_2 + O\bigl( \Delta_3 + \varphi \cdot Q_2(Y) + Q_3(Y) + Q_4(Y) \bigr). \]

Proof of Claim 4.10. Observe first that, by (2.6) and (2.7), we have
\[ \Bigl| \sum_{j\ \text{simple}} \pi_j - \sum_{j\ \text{simple}} \mathbb{P}(A'_j) \Bigr| \ \le\ \sum_{j\ \text{simple}} \hat\varepsilon_j = \sum_{j\ \text{simple}} \sum_{i \to j} \mathbb{P}\bigl( A'_i \cap A'_j \bigr). \tag{4.24} \]
Observe that if j is simple then P(A′j) = P(Aj | Ai) for the unique i ∈ Y such
that i ∼ j. Thus, by (3.5) and (3.7), we have
\[ \sum_{j\ \text{simple}} \mathbb{P}(A'_j) \ \le\ \sum_{i \in Y} \sum_{j \sim i} \mathbb{P}(A_j \mid A_i) = L_2(Y) = \frac{2|Y|}{\mu} \cdot \Delta_2. \]
To prove a lower bound, note that a pair (i, j) with i ∈ Y and j ∼ i contributes
to $L_2(Y) - \sum_{j\ \text{simple}} \mathbb{P}(A'_j)$ if and only if at least one of the following happens:
j ∈ Y, or there is some k ∈ Y \ {i, j} such that k ∼ j. Hence
\[ 0 \ \le\ L_2(Y) - \sum_{j\ \text{simple}} \mathbb{P}(A'_j) \ \le\ \sum_{\substack{i,j \in Y \\ i \sim j}} \mathbb{P}(A_j \mid A_i) + \sum_{\substack{i,k \in Y \\ i \ne k}} \sum_{i \sim j \sim k} \mathbb{P}(A_j \mid A_i). \tag{4.25} \]
Now, observe that
\[ \sum_{\substack{i,j \in Y \\ i \sim j}} \mathbb{P}(A_j \mid A_i) = O\bigl( \varphi \cdot Q_2(Y) \bigr), \]
by (4.2) and (4.6), and that
\[ \sum_{\substack{i,k \in Y \\ i \ne k}} \sum_{i \sim j \sim k} \mathbb{P}(A_j \mid A_i) \ \le\ \sum_{\text{3-clusters } \{i,j,k\}} \mathbb{1}\bigl[ i, k \in Y \bigr] \cdot \mathbb{P}(A_j \mid A_i) = Q_3(Y), \]
by (4.3). Recalling (3.7), it follows that
\[ \sum_{j\ \text{simple}} \mathbb{P}(A'_j) = \frac{2|Y|}{\mu} \cdot \Delta_2 + O\bigl( \varphi \cdot Q_2(Y) + Q_3(Y) \bigr). \tag{4.26} \]

To bound the right-hand side of (4.24), recall that if j is simple and i → j,
then i is either neutral or simple, by our choice of the ordering ≺. We first
consider terms with i neutral. In such terms A′i = Ai and P(A′i ∩ A′j) =
P(Ai ∩ Aj | Ak), where k is the unique element of Y with j ∼ k. The total
contribution of such terms is at most
\[ \sum_{k \in Y} \sum_{\substack{i \sim j \sim k \\ i \ne k}} \mathbb{P}\bigl( A_i \cap A_j \mid A_k \bigr) = c|Y| \]
for some c > 0, by symmetry. Since E[|I|] = µ and
\[ \mathbb{E}\Bigl[ \sum_{k \in I} \sum_{\substack{i \sim j \sim k \\ i \ne k}} \mathbb{P}\bigl( A_i \cap A_j \mid A_k \bigr) \Bigr] = \sum_{\substack{i \sim j \sim k \\ i \ne k}} \mathbb{P}\bigl( A_i \cap A_j \cap A_k \bigr) = O(\Delta_3), \]
it follows that c = O(∆3/µ), and hence
\[ \sum_{k \in Y} \sum_{\substack{i \sim j \sim k \\ i \ne k}} \mathbb{P}\bigl( A_i \cap A_j \mid A_k \bigr) = O\Bigl( \frac{\Delta_3 |Y|}{\mu} \Bigr) = O(\Delta_3), \tag{4.27} \]
since |Y| = O(µ). On the other hand, if i and j are both simple, then there
is a unique ℓ ∈ Y such that i ∼ ℓ. Here there are two cases. If k = ℓ then
P(A′i ∩ A′j) = P(Ai ∩ Aj | Ak), and since i ∼ j ∼ k ∼ i and k ∈ Y, the
contribution is at most
\[ \sum_{k \in Y} \sum_{\substack{i \sim j \sim k \\ i \ne k}} \mathbb{P}\bigl( A_i \cap A_j \mid A_k \bigr) = O(\Delta_3), \]
by (4.27), while if k ≠ ℓ, then it is at most
\[ \sum_{\substack{k,\ell \in Y \\ k \ne \ell}} \ \sum_{\substack{i,j \in Y^c \\ \ell \sim i \sim j \sim k \\ i \not\sim k,\ j \not\sim \ell}} \mathbb{P}\bigl( A_i \cap A_j \mid A_k \cap A_\ell \bigr) \ \le\ Q_4(Y), \]
by (4.4). Hence
\[ \sum_{j\ \text{simple}} \sum_{i \to j} \mathbb{P}\bigl( A'_i \cap A'_j \bigr) = O\bigl( \Delta_3 + Q_4(Y) \bigr). \]
Combining this with (4.24) and (4.26), we obtain
\[ \sum_{j\ \text{simple}} \pi_j = \frac{2|Y|}{\mu} \cdot \Delta_2 + O\bigl( \Delta_3 + \varphi \cdot Q_2(Y) + Q_3(Y) + Q_4(Y) \bigr), \tag{4.28} \]
as claimed.

It follows from Claim 4.10 that
\[ \prod_{j\ \text{simple}} (1 - \pi_j) \ \le\ \exp\Bigl( -\frac{2|Y|}{\mu} \cdot \Delta_2 + O\bigl( \Delta_3 + \varphi \cdot Q_2(Y) + Q_3(Y) + Q_4(Y) \bigr) \Bigr), \]
and combining this with (4.23), we obtain
\[ \prod_{j\ \text{neutral or simple}} (1 - \pi_j) \ \le\ \exp\Bigl( -\mu - \Bigl( \frac{2|Y|}{\mu} - 1 \Bigr) \bigl( \Delta_2 - \Delta_2^0 \bigr) + \eta(Y) \Bigr), \]
where
\[ \eta(Y) = O\bigl( \Delta_3 + \varphi \cdot Q_2(Y) + Q_3(Y) + Q_4(Y) + \pi^2 N \bigr). \]
Finally, using the trivial bound $\prod_{j\ \text{complex}} (1 - \pi_j) \le 1$, and noting that
\[ (1-\pi)^{N-|Y|} = \exp\bigl( -\pi N + \pi|Y| + O(\pi^2 N) \bigr) = \exp\bigl( -\mu + O(\pi^2 N) \bigr), \]
and that $p^{|R(Y)|} = \pi^{|Y|} p^{-t(Y)}$, the bound in Theorem 4.1 follows from (2.1),
completing the proof of the theorem.

5 The lower bound


In this section we prove a lower bound on the probability P(I = Y ) that a
certain subset of our events Ei hold and the rest do not (corresponding, in the
standard setting, to a specific set of copies of H being the ones that are present
in G(n, p)). This bound will roughly match the upper bound in Theorem 4.1, up
to the error term. Here there is a surprising annoyance, caused by the ‘complex’
terms j. Recall that j ∈ Y c is complex if there are two or more elements i ∈ Y
such that i ∼ j. Fixing Y ⊂ [N ], and defining A′j = A′j (Y ) as before, let
\[ C(Y) = \sum_{j\ \text{complex}} \mathbb{P}(A'_j) = \sum_{j\ \text{complex}} p^{|E_j \setminus R(Y)|}. \tag{5.1} \]

Then C(I) is a random variable, and its expectation counts certain clusters.
Unfortunately, we do not in general have good control on their size. Roughly
speaking, we expect the main contribution to come from cases where there are
exactly two elements i ∈ Y with i ∼ j; note that the expectation of the sum of
such terms is O(∆3 ). However, we can’t simply ignore additional intersections
between Ej and sets Ei with i ∈ Y , since they may increase the conditional
probability of A′j .
We first state a result with an error term involving C(Y ). Then we shall
discuss methods of bounding C(Y ). We need a further very mild assumption,
namely that Y is possible, meaning that P(I = Y ) > 0. Note that this is
equivalent to there not existing any j ∈ Y c with Ej ⊂ R(Y ).

Theorem 5.1. Suppose that 1 − p = Ω(1), ∆2^0 ≤ µ/4, and φ∆2 = O(∆3).
Then, for every possible Y ⊂ [N] with |Y| = O(µ), we have
\[
\mathbb{P}(I = Y) = \pi^{|Y|} (1-\pi)^{N-|Y|} p^{-t(Y)} \exp\Bigl( -\Bigl( \frac{2|Y|}{\mu} - 1 \Bigr) \bigl( \Delta_2 - \Delta_2^0 \bigr) + \eta'(Y) \Bigr),
\]
where
\[ \eta'(Y) = O\bigl( \Delta_3 + \varphi \cdot Q_2(Y) + Q_3(Y) + Q_4(Y) + \pi^2 N + C(Y) \bigr). \]

Note that the only differences from Theorem 4.1 are the additional (mild)
assumptions that Y is possible, 1 − p = Ω(1) and ∆2^0 ≤ µ/4, the two-way rather
than one-way bound, and the appearance of C(Y) in the error term η′(Y). We
remark that a corresponding two-way version of Corollary 4.4 also follows.
To prove Theorem 5.1, we simply go through the proof of Theorem 4.1,
noting that almost all of the bounds proved there are two-way, and showing
how to give lower bounds in the remaining places.

Proof of Theorem 5.1. The first place in the proof of Theorem 4.1 where we
gave a one-way bound was (4.14). We claim that, using (2.5), we can replace
this by
\[ \pi_j = \frac{\pi - \varepsilon_j}{1 - \delta_j} = (\pi - \varepsilon_j)(1 + \delta_j) + O(\pi\delta_j^2). \]
This will follow if we can show that δj ≤ 1/2 for each neutral j ∈ Y^c. To see
why this is true, note that if j is neutral, then
\[ \delta_j \ \le\ \hat\delta_j = \sum_{i \to j} \mathbb{P}(A'_i) = \sum_{i \to j} \pi \ \le\ \sum_{i \sim j} \pi = \frac{2\Delta_2^0}{\pi N} = \frac{2\Delta_2^0}{\mu} \ \le\ \frac{1}{2}, \]
where the first two steps follow from (2.6) and Harris' Lemma, the third holds
because i is neutral for every i ≺ j, and hence P(A′i) = P(Ai) = π, the fifth
holds by (4.5) and since H is symmetric, and the last holds by assumption.
It follows that the loss in (4.14) is O(πδj²) for each neutral j, and by (2.8)
and Claim 4.9, the sum of these terms is at most
\[ \sum_{j\ \text{neutral}} \pi\delta_j^2 \ \le\ \sum_{j\ \text{neutral}} \pi\hat\delta_j^2 \ \le\ \sum_{j\ \text{neutral}} \hat\varepsilon_j\hat\delta_j = O(\Delta_3). \]

The only place in the proof of Claim 4.6 in which we gave a one-way bound
was in (4.15); the loss here is due to terms j that we have overcounted because
either j ∈ Y and there is some i ∈ Y with i ∼ j, or j ∈ Y^c and there is more
than one i ∈ Y with i ∼ j. The loss is therefore at most
\[ \sum_{\substack{i,j \in Y \\ i \sim j}} \pi + \sum_{\substack{i,k \in Y \\ i \ne k}} \sum_{i \sim j \sim k} \pi = O\bigl( \varphi \cdot Q_2(Y) + Q_3(Y) \bigr), \tag{5.2} \]
by (4.2) and (4.3), and this is already included in the error term η(Y). Since
Claims 4.7–4.10 all give two-way bounds, the only other loss for the neutral and
simple terms is in the inequality $1 - \pi_j \le e^{-\pi_j}$, which we used twice, in (4.23)
and in the final calculation. In both cases, we can replace it by the two-way
bound
\[ \log(1 - \pi_j) = -\pi_j + O(\pi_j^2), \]
which holds because πj ≤ π for every neutral j, and $\pi_j \le \varphi = \max_{i \ne j} \mathbb{P}(A_j \mid A_i)$
for every simple j, and 1 − φ ≥ 1 − p = Ω(1), by assumption. The total loss in
the inequality is therefore at most
\[ O\bigl( \pi\mu + \varphi\Delta_2 + \varphi\,\eta(Y) \bigr) = O\bigl( \pi^2 N + \Delta_3 + \eta(Y) \bigr) = O(\eta(Y)), \]
since |Y| = O(µ) and φ∆2 = O(∆3), and also φ ≤ 1.

What remains is to deal with complex πj, where we previously just used the
trivial inequality πj ≥ 0. Here we simply note that from the first inequality in
(2.7) (which is simply Harris' Lemma), we have
\[ \sum_{j\ \text{complex}} \pi_j \ \le\ \sum_{j\ \text{complex}} \mathbb{P}(A'_j) = C(Y), \]
by (5.1). Now, since we only consider possible sets Y, we have Ej ⊄ R(Y) for
every j ∈ Y^c, and therefore $\pi_j \le \mathbb{P}(A'_j) = p^{|E_j \setminus R(Y)|} \le p$, which is bounded
away from 1, by assumption. It follows that log(1 − πj) = O(πj), so
\[ \prod_{j\ \text{complex}} (1 - \pi_j) = e^{O(C(Y))}, \]
completing the proof.

This result is of course rather a cop-out: we have expressed the final error
term simply as what comes out of the proof. We now show that it is indeed
useful, by giving two approaches to bounding C(Y ). The first is most useful for
the case of subgraphs of constant size. Recall that H is s-uniform.

Lemma 5.2. Suppose that s = O(1). Then
\[ \mathbb{E}\bigl[ C(I) \bigr] = O\bigl( \Delta_3 + \cdots + \Delta_{s+1} \bigr). \]

The key to the proof is the following definition.

Definition 5.3. A star-cluster S is a pair (j, T) where j ∈ [N] and T ⊂ [N]
with |T| ≥ 2 satisfy
\[ E_j \not\subset R(T), \]
and if |T| ≥ 3, then
\[ E_i \cap E_j \not\subset R(T \setminus \{i\}) \]
for every i ∈ T.

In words, the centre of the star Ej is not contained in the union of the leaves
{Ei : i ∈ T}, and the set of leaves is minimal given Ej ∩ R(T), meaning that if
|T| > 2, then each leaf intersects Ej in some element that is not contained in
any of the other leaves. Note that the definition implies that |T| ≤ |Ej|.
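
Definition 5.3 is purely combinatorial and easy to test directly; the following sketch (our own code, with toy edge sets) checks whether a given pair (j, T) is a star-cluster.

```python
# A sketch (ours) of Definition 5.3: test whether (j, T) is a star-cluster,
# where R(T) is the union of the leaf sets E_i for i in T.
def R(T, E):
    out = set()
    for i in T:
        out |= E[i]
    return out

def is_star_cluster(j, T, E):
    if len(T) < 2 or E[j] <= R(T, E):       # the centre must not be covered
        return False
    if len(T) >= 3:                          # minimality of the leaf set
        if any(E[i] & E[j] <= R(T - {i}, E) for i in T):
            return False
    return True

E = {0: {1, 2, 3}, 1: {1, 4}, 2: {2, 5}, 3: {6, 7}}
print(is_star_cluster(0, {1, 2}, E))     # True: leaves meet E_0 but do not cover it
print(is_star_cluster(0, {1, 2, 3}, E))  # False: E_3 does not meet E_0, so the
                                         # leaf set is not minimal
```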

We will say that the star-cluster S = (j, T) is pre-present in Y if T ⊂ Y,
and present in Y if also j ∈ Y. Define
\[ \pi_0(S) = p^{|R(T)|}, \qquad \pi_1(S) = p^{|E_j \cup R(T)|} \qquad\text{and}\qquad \pi_c(S) = p^{|E_j \setminus R(T)|}, \]
so π0(S) is the probability that S is pre-present in I, π1(S) is the probability
that S is present in I, and πc(S) is the probability that S is present in I,
conditioned on the event that S is pre-present in I. Let us also define
\[ \widehat{C}(Y) = \sum_{S} \mathbb{1}\bigl[ S \text{ is pre-present in } Y \bigr] \cdot \pi_c(S), \tag{5.3} \]
and make the following simple but key observation.

Observation 5.4. $C(Y) \le \widehat{C}(Y)$ for every possible Y ⊂ [N].

Proof. Fix Y ⊂ [N], let j be a complex term, and let T be a minimal subset of
Y such that Ej ∩ R(Y) ⊂ R(T) and |T| ≥ 2. Then S = (j, T) is a star-cluster,
since Ej ⊄ R(Y) (because Y is possible and j ∈ Y^c), and if |T| ≥ 3, then
Ei ∩ Ej ⊄ R(T \ {i}) for every i ∈ T, by the minimality of T. Moreover, S is
pre-present in Y, since T ⊂ Y. Since P(A′j) = πc(S), it follows that
\[ C(Y) = \sum_{j\ \text{complex}} \mathbb{P}(A'_j) \ \le\ \sum_{S} \mathbb{1}\bigl[ S \text{ is pre-present in } Y \bigr] \cdot \pi_c(S) = \widehat{C}(Y), \]
as claimed.
 
We can now easily deduce the claimed bound on $\mathbb{E}[C(I)]$.

Proof of Lemma 5.2. Observe first that
\[ \mathbb{E}\bigl[ \widehat{C}(I) \bigr] = \sum_{S} \pi_0(S)\, \pi_c(S) = \sum_{S} \pi_1(S), \]
and that each S corresponds to a 3-cluster if s = 2, and a cluster of size between
3 and s + 1 if s ≥ 3. By Observation 5.4, and since s = O(1), it follows that
\[ \mathbb{E}\bigl[ C(I) \bigr] \ \le\ \mathbb{E}\bigl[ \widehat{C}(I) \bigr] = \sum_{S} \pi_1(S) = O\bigl( \Delta_3 + \cdots + \Delta_{s+1} \bigr), \]
as claimed.

At this point we only outline our second method for bounding $C(Y)$, since it
is probably best illustrated by the concrete example in Section 6.2. The idea is
to consider a set $F$ of ‘forbidden configurations’, which we will take to be certain
types of cluster that are not likely to appear. Then we may define a ‘legal star-cluster’
$(j, T)$ to be a star-cluster whose leaves do not form any configuration in
$F$. In place of Observation 5.4, we obtain
$$C(Y) \le \widehat{C}_F(Y) := \sum_{S\ \mathrm{legal}} \mathbf{1}\big[S \text{ is pre-present in } Y\big]\cdot \pi_c(S)$$
for every $F$-free possible set $Y \subset [N]$. If $I$ is with high probability $F$-free and
$\mathbb{E}\big[\widehat{C}_F(I)\big]$ is small, then we obtain good control on $C(I)$ with high probability.

6 Cliques in G(n, p)
In this section we give two applications of our main results to cliques in G(n, p),
which in particular imply Theorems 1.1 and 1.4. Throughout this section, as in
Subsections 1.1 and 1.3, we consider copies of Kr in G(n, p), where

$$3 \le r = r(n) \in \mathbb{N} \qquad \text{and} \qquad p = p(n) \in (0, 1-\varepsilon)$$
for some constant $\varepsilon > 0$. We will apply Theorems 4.1 and 5.1 to the $\binom{r}{2}$-uniform
hypergraph $\mathcal{H}$ with vertex set $X = E(K_n)$ and $N = \binom{n}{r}$ edges $E_1, \ldots, E_N$
corresponding to the edge sets of copies of $K_r$. Observe that $X_p \sim E(G(n,p))$,
$1 - \pi \ge 1 - p > \varepsilon$, where $\pi = p^{\binom{r}{2}}$, and that $\mathcal{H}$ is symmetric (via permutations of
the vertices of $K_n$), so $\mathcal{H}$ satisfies the assumptions from the start of Section 2.

In addition to the $\binom{r}{2}$-uniform hypergraph $\mathcal{H}$, we will consider the (random)
$r$-uniform hypergraph $H_r(G(n,p))$ that encodes the vertex sets of the copies of
$K_r$ in $G(n,p)$.¹⁰ We will write $H$ for a possible outcome of $H_r(G(n,p))$ (so $H$
corresponds to the set $Y \subset [N]$), and emphasize that $H$ is now an $r$-uniform
hypergraph with vertex set $V(K_n)$. For concreteness, let us write
$$Y(H) = \big\{ i \in [N] : V_i \in E(H) \big\}, \tag{6.1}$$
where $V_i$ is the vertex set of the copy of $K_r$ corresponding to $E_i \in E(\mathcal{H})$, and
note that $Y(H)$ is possible if and only if $H = H_r(G)$ for some graph $G \subset K_n$.
Let us also define
$$C(H) = C(Y(H)) \qquad \text{and} \qquad Q_i(H) = Q_i(Y(H)) \tag{6.2}$$
for each $i \in \{2,3,4\}$, where $C(Y)$ and $Q_i(Y)$ were defined in (4.2), (4.3), (4.4)
and (5.1). Note that $t(H)$, defined in (1.2), is equal to $t(Y(H))$ (see (3.2)).
Finally, recall that ω = ω(n) → ∞ as n → ∞ arbitrarily slowly.

6.1 The case r constant


In this subsection we will prove Theorem 1.1, and also (as promised in the
introduction) a similar result when $r \in \{3,4\}$. Before stating it, let us first
define the explicit family of ‘good’ hypergraphs $\mathcal{G}$ for which we will prove the
claimed two-way bound. Recall that $\mu_r = \binom{n}{r} p^{\binom{r}{2}}$.

Definition 6.1. An $r$-uniform hypergraph $H$ with vertex set $V(K_n)$ is good if
$H = H_r(G)$ for some graph $G$,
$$\big| e(H) - \mu_r \big| \le \omega\sqrt{\mu_r} \qquad \text{and} \qquad Q_i(H) \le \omega\cdot\mathbb{E}\big[ Q_i\big( H_r(G(n,p)) \big) \big]$$
for each $i \in \{2,3,4\}$, and moreover $C(H) \le \omega\cdot\mathbb{E}\big[ C\big( H_r(G(n,p)) \big) \big]$.

The following theorem immediately implies Theorem 1.1.


¹⁰In the particular case of copies of $K_r$, it is more natural to record the set of vertices as a
hyperedge, rather than the set of edges.
Theorem 6.2. For each $r \ge 3$, there exists $\gamma > 0$ such that if $p \le n^{-2/r+\gamma}$
and $\mu_r \to \infty$, then
$$\mathbb{P}\big( H_r(G(n,p)) = H \big) = \pi^{e(H)} (1-\pi)^{\binom{n}{r} - e(H)}\, p^{-t(H)}\, e^{-\Delta_2 + \Delta_2^0 + O(\omega\xi + n^{-\gamma})} \tag{6.3}$$
for every good $H$, where
$$\xi = \begin{cases} \;n^5 p^7 + \sqrt{n^5 p^7} & \text{if } r = 3,\\[2pt] \;n^3 p^6 + n^8 p^{16} & \text{if } r = 4,\\[2pt] \;\mu_{r+1} & \text{if } r \ge 5. \end{cases} \tag{6.4}$$
Moreover, $H_r(G(n,p))$ is good with high probability.


To avoid repetition, let us fix $r \ge 3$ and $p = p(n) \le n^{-2/r+\gamma}$ throughout this
subsection, where $\gamma > 0$ is sufficiently small, and let $\mathcal{G}$ be the collection of good
$r$-uniform hypergraphs with vertex set $V(K_n)$. Let us begin with the easiest
task: showing that $H_r(G(n,p))$ is good with high probability.
Lemma 6.3. $H_r(G(n,p))$ is good with high probability.

Proof. The bounds on $C(H)$ and $Q_i(H)$ hold with high probability by Markov's
inequality, so it will suffice to prove the bound on $e(H)$. To do so, we will first
show that $\Delta_2 = O(\mu_r)$. Indeed, we have
$$\frac{\Delta_2}{\mu_r} = \sum_{t=2}^{r-1} O\Big( n^{r-t}\, p^{\binom{r}{2} - \binom{t}{2}} \Big) = O\Big( n^{r-2} p^{\binom{r}{2}-1} + n p^{r-1} \Big) = O\big( n^{2/r - 1 + O(\gamma)} \big), \tag{6.5}$$
where the second step holds by convexity, and the third by our bound on $p$,
which implies that $n^{r-2} p^{\binom{r}{2}-1} \le n^{2/r-1+O(\gamma)}$ and $n p^{r-1} \le n^{2/r-1+O(\gamma)}$. Since
$r \ge 3$, it follows from (6.5) that if $\gamma$ is sufficiently small, then $\Delta_2 = O(\mu_r)$, as
claimed. Now, recalling that $e(H_r(G(n,p)))$ has expectation $\mu_r$ and variance at
most $\mu_r + 2\Delta_2$, by (4.11), the lemma follows by Chebyshev's inequality.

We will deduce Theorem 6.2 from Theorem 5.1; our only tasks will be to
verify that the conditions of Theorem 5.1 are satisfied, and that the bound
given by the theorem implies (6.3) for all $H \in \mathcal{G}$. To check that the conditions
of Theorem 5.1 are satisfied, note first that
$$\varphi \le p = o(1) \qquad \text{and} \qquad \Delta_2^0 = O\Big( \frac{\mu_r^2}{n^2} \Big) = o(\mu_r), \tag{6.6}$$
since $\mu_r \le n^r p^{\binom{r}{2}} \le n^{1+O(\gamma)} = o(n^2)$ if $\gamma > 0$ is sufficiently small. Observe also
that $\varphi\Delta_2 = O(\Delta_3)$ holds by Remark 4.2. We may therefore apply Theorem 5.1
to obtain the following bound.
Lemma 6.4. For every $H \in \mathcal{G}$, we have
$$\mathbb{P}\big( H_r(G(n,p)) = H \big) = \pi^{e(H)} (1-\pi)^{\binom{n}{r}-e(H)}\, p^{-t(H)}\, e^{-\left(\frac{2e(H)}{\mu_r} - 1\right)\left(\Delta_2 - \Delta_2^0\right) + \eta'(H)},$$
where
$$\eta'(H) = O\Big( \Delta_3 + \varphi\cdot Q_2(H) + Q_3(H) + Q_4(H) + \pi^2 N + C(H) \Big).$$
Proof. We have already verified the general conditions of Theorem 5.1. To check
that the conclusion holds for the hypergraph $H \in \mathcal{G}$, observe first that the set
$Y(H)$ is possible, since $H = H_r(G)$ for some graph $G$. Moreover, if we choose
$\omega = \omega(n)$ so that $\mu_r \ge \omega^2$, then $H \in \mathcal{G}$ implies that $|Y(H)| = e(H) = O(\mu_r)$.
The bound in Theorem 5.1 therefore holds for $Y(H)$, as claimed.

Our next aim is to bound the error term $\eta'(H)$ for all $H \in \mathcal{G}$. We start by
comparing $\Delta_k$ for different $k$.

Lemma 6.5. If $2 \le k = O(1)$, then $\Delta_k = O(\Delta_{k-1})$, where $\Delta_1 = \mu_r$.

Proof. A $k$-cluster can be thought of (in at least one way) as arising from a
$(k-1)$-cluster $C$ by adding a new copy $E_i$ of $K_r$ sharing some number $t \ge 2$ of
vertices with $C$. Given $C$, there are at most $O(1)$ choices for the shared vertices,
and at most $n^{r-t}$ choices for the new vertices in $C$. The conditional probability
(given that $C$ is present) that $E_i$ is also present is then at most $p^{\binom{r}{2} - \binom{t}{2}}$, since
$E_i$ can share at most $\binom{t}{2}$ edges with $C$. It follows that
$$\frac{\Delta_k}{\Delta_{k-1}} = \sum_{t=2}^{r} O\Big( n^{r-t}\, p^{\binom{r}{2} - \binom{t}{2}} \Big) = O\Big( n^{r-2} p^{\binom{r}{2}-1} + 1 \Big) = O(1),$$
as claimed, where the second step holds by convexity, and the final step holds
because $n^{r-2} p^{\binom{r}{2}-1} \le n^{2/r-1+O(\gamma)} \le 1$ if $\gamma$ is sufficiently small.

We can now bound most of the terms in $\eta'(H)$.

Lemma 6.6. For every $H \in \mathcal{G}$, we have
$$\varphi\cdot Q_2(H) + Q_3(H) + Q_4(H) = O(\omega\Delta_3).$$

Proof. By Lemma 4.3 and Definition 6.1, we have
$$Q_i(H) \le \omega\cdot\mathbb{E}\big[ Q_i\big( H_r(G(n,p)) \big) \big] = O(\omega\Delta_i)$$
for each $i \in \{2,3,4\}$. Since (as noted above) $\varphi\Delta_2 = O(\Delta_3)$ and $\Delta_4 = O(\Delta_3)$,
by Lemma 6.5, the claimed bound follows.

In order to obtain the required bound on $\eta'(H)$, it will now suffice to show
that $\Delta_3 = O(\xi)$, where $\xi$ was defined in (6.4), since by Lemmas 5.2 and 6.5 this
will imply the same bound for the expectation of $C(H_r(G(n,p)))$.

In the case $r \ge 5$, we will actually obtain the following stronger and more
general result, with essentially no extra effort. Recall that $r \in \mathbb{N}$ is fixed and
$p \le n^{-2/r+\gamma}$ for some small $\gamma > 0$.

Lemma 6.7. If $r \ge 5$ and $C$ is a $k$-cluster for some $3 \le k = O(1)$, then the
expected number of copies of $C$ present in $G(n,p)$ is $O(\mu_{r+1})$ if $C$ has exactly
$r+1$ vertices, and is $n^{-\Omega(1)}$ otherwise.

Proof. We list the hyperedges/copies of $K_r$ in $C$ in an order $E_1, \ldots, E_k$ such that
each after the first shares at least one edge of $K_n$ with the union of the previous
ones. For each $i \in [k]$, let $C_i$ be the $i$-cluster formed by the edges $E_1, \ldots, E_i$, and
let $N_i$ denote the expected number of labelled copies of $C_i$ present in $G(n,p)$.
Thus $N_1 = \mu_r$, and
$$N_2 = O\Big( \mu_r \cdot n^{r-t}\, p^{\binom{r}{2} - \binom{t}{2}} \Big),$$
where $t$ is the number of vertices of $K_n$ that $E_1$ and $E_2$ share. Similarly, if $E_i$
shares $2 \le t \le r-1$ vertices of $K_n$ with $E_1 \cup \cdots \cup E_{i-1}$, then from (6.5) we
have
$$\frac{N_i}{N_{i-1}} = O\Big( n^{r-t}\, p^{\binom{r}{2} - \binom{t}{2}} \Big) = O\big( n^{2/r-1+O(\gamma)} \big),$$
while if $E_i$ shares $r$ vertices with $E_1 \cup \cdots \cup E_{i-1}$ then $N_i = O(N_{i-1})$, and if
moreover $E_i$ contains $a$ new edges, then $N_i = O(p^a N_{i-1})$.

Let us say that step $i$ is ordinary if $E_i$ contains a vertex not in $E_1 \cup \cdots \cup E_{i-1}$.
Note that step 2 is always ordinary; if there are at least two ordinary steps, then
$$N_k = O\big( \mu_r\cdot n^{4/r - 2 + O(\gamma)} \big) = O\big( n^{-1/5 + O(\gamma)} \big) = n^{-\Omega(1)},$$
since $\mu_r \le n^r p^{\binom{r}{2}} \le n^{1+O(\gamma)}$ and recalling that $r \ge 5$ and $\gamma$ is sufficiently small.

This leaves the case where $i = 2$ is the only ordinary step. Let $|E_1 \cup E_2| = 2r - t$,
and suppose first that $t = r-1$. Then there are $r+1$ vertices in total and
(since $k \ge 3$ and the $E_i$ are distinct) $C$ must contain all of the edges between
these vertices, so in this case $N_k = O(\mu_{r+1})$. On the other hand, if $t \le r-2$,
then $E_1 \cup E_2$ does not contain a copy of $K_r^-$, so $E_3$ must contain at least two
edges not in $E_1 \cup E_2$, and hence
$$N_k = O(N_3) = O(p^2 N_2) = O\big( p^2 \cdot \mu_r \cdot n^{2/r-1+O(\gamma)} \big) = n^{-\Omega(1)},$$
since $p \le n^{-2/r+\gamma}$, $\mu_r \le n^{1+O(\gamma)}$ and $\gamma$ is sufficiently small.

Before continuing, let's quickly note the following immediate corollary of
Lemma 6.7.

Corollary 6.8. If $r \ge 5$ and $p \le n^{-2/r+\gamma}$, then with high probability every
$k$-cluster of copies of $K_r$ in $G(n,p)$ with $k \ge 3$ forms a copy of $K_{r+1}$.

Proof. By Lemma 6.7 and Markov's inequality, with high probability $G(n,p)$
does not contain any $k$-cluster with more than $r+1$ vertices. Since every $k$-cluster
with $r+1$ vertices forms a copy of $K_{r+1}$, the corollary follows.

We can now prove the claimed bound on $\Delta_3$.

Lemma 6.9. $\Delta_3 = O(\xi)$.

Proof. If $r = 3$, then any 3-cluster is either a copy of $K_4$ or has exactly 5 vertices
and 7 edges, so in this case
$$\Delta_3 = O\big( n^4 p^6 + n^5 p^7 \big) = O(\xi),$$
as claimed, since $\mu_r \to \infty$ implies that $pn \to \infty$. If $r = 4$, then one can easily
check that any 3-cluster is either a copy of $K_5$, or has 6 vertices and at least 12
edges, or 7 vertices and at least 14 edges, or 8 vertices and at least 16 edges. It
follows that
$$\Delta_3 = O\big( n^5 p^{10} + n^6 p^{12} + n^7 p^{14} + n^8 p^{16} \big) = O(\xi),$$
where the final step follows by considering the cases $p^2 n \le 1$ and $p^2 n \ge 1$
separately. We may therefore assume that $r \ge 5$, in which case our task is to
show that $\Delta_3 = O(\mu_{r+1})$. We may also assume that $p = n^{-2/r+\gamma}$, since every
3-cluster has at least $\binom{r+1}{2}$ edges, so $\Delta_3/\mu_{r+1}$ is increasing in $p$. It follows that
$\mu_{r+1} \ge 1$, and therefore, by Lemma 6.7, that $\Delta_3 = O(\mu_{r+1})$, as required.

We are now ready to deduce Theorem 6.2 from Theorem 5.1.

Proof of Theorem 6.2. Let $\mathcal{G}$ be the set of good $r$-uniform hypergraphs from
Definition 6.1, and observe that $H_r(G(n,p)) \in \mathcal{G}$ with high probability, by
Lemma 6.3. Next, by Lemma 6.4, for every $H \in \mathcal{G}$ we have
$$\mathbb{P}\big( H_r(G(n,p)) = H \big) = \pi^{e(H)} (1-\pi)^{\binom{n}{r}-e(H)}\, p^{-t(H)}\, e^{-\left(\frac{2e(H)}{\mu_r}-1\right)\left(\Delta_2 - \Delta_2^0\right) + \eta'(H)}.$$
Moreover, we have
$$\eta'(H) = O\Big( \omega\cdot\Delta_3 + \pi^2 N + \omega\cdot\mathbb{E}\big[ C(H_r(G(n,p))) \big] \Big) \tag{6.7}$$
for every $H \in \mathcal{G}$, by Definition 6.1 and Lemma 6.6. Next, it follows from
Lemmas 5.2, 6.5 and 6.9 that
$$\Delta_3 + \mathbb{E}\big[ C(H_r(G(n,p))) \big] = O(\Delta_3) = O(\xi), \tag{6.8}$$
and it is easy to check that
$$\pi^2 N \le n^r p^{2\binom{r}{2}} < n^{r+1} p^{\binom{r+1}{2}} = O(\mu_{r+1}) = O(\xi) \tag{6.9}$$
for all $r \ge 3$, and hence $\eta'(H) = O(\omega\xi)$.


Finally, observe that
$$\Big( \frac{2e(H)}{\mu_r} - 1 \Big)\big( \Delta_2 - \Delta_2^0 \big) = \Delta_2 - \Delta_2^0 + O\Big( \frac{\omega\Delta_2}{\sqrt{\mu_r}} \Big) \tag{6.10}$$
for every $H \in \mathcal{G}$, since $e(H) \in \mu_r \pm \omega\sqrt{\mu_r}$, by Definition 6.1, and $\Delta_2 \ge \Delta_2^0$. It
therefore only remains to show that
$$\frac{\Delta_2}{\sqrt{\mu_r}} = O(\xi) + n^{-\Omega(1)}. \tag{6.11}$$
When $r = 3$ this holds since $\Delta_2 = O(n^4 p^5)$, $\mu_3 = \Theta(n^3 p^3)$ and $\xi \ge \sqrt{n^5 p^7}$, and
when $r = 4$ since $\Delta_2 = O(n^5 p^9 + n^6 p^{11})$, $\mu_4 = \Theta(n^4 p^6)$ and $\xi \ge n^3 p^6 + n^4 p^8$.
Finally, when $r \ge 5$, it holds because, by (6.5) and since $\mu_r \le n^{1+O(\gamma)}$, we have
$$\frac{\Delta_2}{\sqrt{\mu_r}} = O\big( n^{2/r-1+O(\gamma)} \sqrt{\mu_r} \big) = O\big( n^{2/r-1/2+O(\gamma)} \big) = n^{-\Omega(1)}$$
for $\gamma$ sufficiently small. Combining (6.11) with (6.7), (6.8), (6.9) and (6.10), we
obtain the claimed bound, completing the proof of Theorem 6.2.

To finish the subsection, let us quickly observe that Theorem 6.2 implies
Theorem 1.1.

Proof of Theorem 1.1. To deduce Theorem 1.1 from Theorem 6.2, simply note
that
$$\Lambda(n,r) = \Delta_2 - \Delta_2^0,$$
by (1.1), (3.1) and (4.5), and recall that $\xi = \mu_{r+1}$ when $r \ge 5$.

Remark 6.10. The use of Markov's inequality to bound the random variables
$Q_i\big(H_r(G(n,p))\big)$ and $C\big(H_r(G(n,p))\big)$ by $\omega$ times their expectations is rather
inefficient; in many cases it should be possible to show that these quantities are
concentrated around their means.

6.2 The case p constant


In this subsection we will prove the upper bound in Theorem 1.4. Recall from
the statement of the theorem that $r = r(n) \sim 2\log_{1/p}(n)$ for some $p$ bounded
away from 0 and 1, and $\mu_r = \Theta(n^{1+\theta})$ for some $-1+\varepsilon \le \theta = \theta(n) \le 1/2 - \varepsilon$.

Given a hypergraph $H$, let us write $t_k(H)$ for the number of (unordered)
pairs of hyperedges of $H$ sharing exactly $k$ vertices, and define
$$\nu_k = \mathbb{E}\big[ t_k\big( H_r(G(n,p)) \big) \big] = \frac{1}{2}\binom{n}{r}\binom{r}{k}\binom{n-r}{r-k}\, p^{2\binom{r}{2} - \binom{k}{2}} \tag{6.12}$$
and
$$\nu_k^0 = \mathbb{E}\big[ t_k\big( H_r(n,\pi) \big) \big] = p^{\binom{k}{2}}\, \nu_k. \tag{6.13}$$
Observe that $\Lambda'(n,r) = \nu_2 - \nu_2^0$, where $\Lambda'(n,r)$ was defined in (1.7).
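For concreteness, here is a short numerical sketch (ours) of (6.12) and (6.13); the parameter values are illustrative only, chosen so that $\mu_r$ is around $n^{1.2}$, and for larger inputs one would work in log-space.

```python
from math import comb

def nu(n, r, p, k):
    # (6.12): expected number of unordered pairs of copies of K_r in
    # G(n, p) sharing exactly k vertices.
    return 0.5 * comb(n, r) * comb(r, k) * comb(n - r, r - k) \
               * p ** (2 * comb(r, 2) - comb(k, 2))

def nu0(n, r, p, k):
    # (6.13): the analogous expectation in the independent model H_r(n, pi).
    return p ** comb(k, 2) * nu(n, r, p, k)

# Lambda'(n, r) = nu_2 - nu_2^0 is the leading correction term:
n, r, p = 1000, 14, 0.5
print(nu(n, r, p, 2) - nu0(n, r, p, 2))
```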
We will prove that Theorem 1.4 holds with $\mathcal{G}$ equal to the following family of
$r$-uniform hypergraphs. The purpose of the constants $C$ and $\delta$ in the definition
is to allow us to obtain the conclusion of the theorem for as large a family of
hypergraphs as possible. We will show below (see Lemma 6.18) that $H_r(G(n,p))$
has the required properties with high probability for any $C > 0$ and $\delta \in (0, 1/4)$.

Definition 6.11. Let $C > 0$ and $\delta \in (0, 1/4)$ be constants. We say that an
$r$-uniform hypergraph $H$ with $n$ vertices is plausible if the following hold:
$$t_k(H) \le (\log n)^C \nu_k \tag{6.14}$$
for $k \in \{0, 1, 2, r-1\}$,
$$t_k(H) = 0 \tag{6.15}$$
for $3 \le k \le r-2$, and
$$|e(H) - \mu_r| \le n^{1-\delta}. \tag{6.16}$$

The following theorem implies the upper bound in Theorem 1.4. The error
terms in the theorem depend on the constants $C$ and $\delta$.

Theorem 6.12. Let $\varepsilon > 0$ be constant, let $-1+\varepsilon \le \theta = \theta(n) \le 1/2 - \varepsilon$, and
suppose that $p = p(n) \in (\varepsilon, 1-\varepsilon)$ and $r = r(n) \sim 2\log_{1/p}(n)$ are chosen so
that $\mu_r = \Theta(n^{1+\theta})$. Then
$$\mathbb{P}\big( H_r(G(n,p)) = H \big) \le \pi^{e(H)} (1-\pi)^{\binom{n}{r}-e(H)}\, p^{-t(H)}\, e^{-\nu_2 + \nu_2^0 + O^*(n^\theta) + n^{-\Omega(1)}}$$
for every plausible $H$. Moreover, $H_r(G(n,p))$ is plausible with high probability.
We will deduce Theorem 6.12 from Theorem 4.1. In order to do so, observe
first that the condition $\varphi\Delta_2 = O(\Delta_3)$ in the theorem is satisfied, by Remark 4.2.
We may therefore apply Theorem 4.1 to the $\binom{r}{2}$-uniform hypergraph encoding
copies of $K_r$ in $E(K_n)$.

Let us assume, until the end of the proof of Theorem 6.12, that $\varepsilon > 0$ is
fixed, and that $p \in (\varepsilon, 1-\varepsilon)$, $r \in \mathbb{N}$ and $\theta \in \mathbb{R}$ satisfy the assumptions of the
theorem. Recall from (6.1) and (6.2) the definitions of $Y(H)$ and $Q_i(H)$. In
order to deduce the claimed bound, we need to bound the error term $\eta(Y(H))$
for each plausible hypergraph $H$. We will prove the following bounds.

Lemma 6.13. If $H$ is plausible, then
$$Q_i(H) = O^*(\Delta_i) \tag{6.17}$$
for each $i \in \{2,3,4\}$. Moreover,
$$\Delta_3 = O^*(n^\theta) + n^{-\Omega(1)} \qquad \text{and} \qquad \Delta_4 = O^*(n^\theta) + n^{-\Omega(1)}. \tag{6.18}$$


 
Recall from Lemma 4.3 that $\mathbb{E}\big[ Q_i\big( H_r(G(n,p)) \big) \big] = O(\Delta_i)$ for each $i \in \{2,3,4\}$,
and from Definition 6.11 that $t_k(H) \le (\log n)^C \nu_k$ for all plausible $H$
and all $0 \le k \le r-1$. To prove (6.17), it will therefore suffice to prove the
following lemma.

Lemma 6.14. If $t_k(H) \le \lambda\cdot\nu_k$ for all $0 \le k \le r-1$, then
$$Q_i(H) \le \lambda\cdot\mathbb{E}\big[ Q_i\big( H_r(G(n,p)) \big) \big]$$
for each $i \in \{2,3,4\}$.

Proof. The key observation is that, for fixed $n$ and $r$, there exist non-negative
constants $c_i(k)$ such that
$$Q_i(H) = \sum_{k=0}^{r-1} c_i(k)\cdot t_k(H)$$
for every $r$-uniform hypergraph $H$ on $n$ vertices. To see this, observe that by
symmetry, in the sums defining $Q_i(H)$ (see (4.2), (4.3) and (4.4)) all pairs of
$r$-cliques with a given intersection size $s$ contribute the same when we look
at the terms where the indicator functions correspond to these $r$-cliques, and
moreover this contribution does not depend on $H$. Since $t_k(H) \le \lambda\cdot\nu_k$ for all
$0 \le k \le r-1$, the claimed bound follows from the definition (6.12) of $\nu_k$.

Remark 6.15. Note that Lemma 6.14 holds for any $n$, $r$ and $p$, and as a
consequence it can also be used when verifying that a hypergraph is good, in
the sense of Definition 6.1.

In order to prove (6.18), we will consider separately those clusters that are
contained in sets of size $r+1$, and those that are not. For each $k \ge 3$, let us
say that a $k$-cluster is normal if it has at least $r+2$ vertices in total.

Lemma 6.16. If $k \in \{3,4\}$, then the expected number of normal $k$-clusters is
$$O^*\big( n^{3\theta - 1} \big) + n^{-\Omega(1)}.$$

In the proof of Lemma 6.16, and also later in the section, it will be convenient
to define
$$\hat\theta = \max\{\theta, 0\}.$$
We will also need the following simple observation.

Observation 6.17. If $\mu_r = n^{O(1)}$, then $p^r = \Theta^*(n^{-2})$.

Proof. Observe that
$$\mu_r = \binom{n}{r}\, p^{\binom{r}{2}} = n^{O(1)}\cdot\Big( \frac{n}{r}\cdot p^{r/2} \Big)^{r},$$
and hence $p^{r/2} = \Theta(r/n) = \Theta^*(n^{-1})$, as claimed.
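As a quick numerical sanity check (ours, with illustrative parameters): choosing $r$ so that $\mu_r$ is at least $n$, the quantity $p^r n^2$ indeed varies only slowly with $n$, in line with Observation 6.17.

```python
from math import comb

p = 0.5
for n in (10**3, 10**4, 10**5):
    # increase r while mu_{r+1} >= n; on exit mu_r >= n > mu_{r+1},
    # so that mu_r = n^{O(1)}
    r = 2
    while comb(n, r + 1) * p ** comb(r + 1, 2) >= n:
        r += 1
    print(n, r, p**r * n**2)  # stays within a polylogarithmic range
```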

The proof of the lemma is somewhat similar to that of Lemma 6.7.

Proof of Lemma 6.16. Observe first that the number of isomorphism classes of
$k$-clusters is $O^*(1)$, so it will suffice to prove the claimed bound for a fixed $k$-cluster
$C$. Let $E_1, \ldots, E_k$ be the cliques in $C$, ordered so that each after the first
shares an edge of $K_n$ with the union of the earlier ones. The complication in
this setting is that, since $r$ is no longer constant, there are many ways to choose
the subset of the vertices of the previous cliques to re-use in $E_i$. Indeed, a crude
bound on $m_i$, the number of choices in step $i \le 4$, is $2^{3r} = n^{O(1)}$, where the
implicit constant depends only on $\varepsilon$, since $r \sim 2\log_{1/p}(n)$ and $p \in (\varepsilon, 1-\varepsilon)$.

To deal with this issue, let us say that two cliques have a strange intersection
if they share between $\lambda$ and $r - \lambda$ vertices, where $\lambda$ is a suitable (sufficiently
large) constant. As before, let $N_i$ denote the expected number of labelled copies
of $C_i$ (the $i$-cluster formed by the cliques $E_1, \ldots, E_i$) present in $G(n,p)$. If two
cliques in $C$ have a strange intersection of size $t$, then we may list them first,
and deduce that
$$N_2 \le \mu_r\cdot\binom{n}{r-t}\, p^{\binom{r}{2} - \binom{t}{2}} \le n^{-\lambda/2},$$
since $p^r = \Theta^*(n^{-2})$, by Observation 6.17. Now, observe that
$$N_i = O^*\big( m_i N_{i-1} \big),$$
as in the proof of Lemma 6.7, since if $E_i$ shares $2 \le t \le r-1$ vertices of $K_n$
with $E_1 \cup \cdots \cup E_{i-1}$, then
$$\frac{N_i}{m_i N_{i-1}} \le \binom{n}{r-t}\, p^{\binom{r}{2} - \binom{t}{2}} = O\big( \max\big\{ \mu_r r^2/n^2,\; p^r n \big\} \big) = O^*\big( n^{\hat\theta - 1} \big), \tag{6.19}$$
by Observation 6.17, while if $E_i$ shares $r$ vertices with $E_1 \cup \cdots \cup E_{i-1}$ then
$N_i \le m_i N_{i-1}$. Using our crude bound on $m_i$, it follows that for clusters containing
a strange intersection we have
$$N_k \le n^{-\lambda/2 + O(1)} \le n^{-\lambda/4}$$
if we choose $\lambda$ sufficiently large (depending on $\varepsilon$).

We may therefore assume that $C$ contains no strange intersections. But now
when choosing vertices to re-use, from each earlier clique we must pick either at
most $\lambda$ vertices, or all but at most $\lambda$ vertices. Thus $m_i = O\big( (\log n)^\lambda \big)$ for each $i$,
and hence the total number of choices for the shared vertices is $O^*(1)$. In other
words, we are now in a situation where we can ignore these choice factors.

We can now complete the proof by repeating the argument from Lemma 6.7.
Indeed, step 2 is always ordinary (meaning that $E_i$ contains a vertex not in
$E_1 \cup \cdots \cup E_{i-1}$), and if there are at least two ordinary steps, then
$$N_k \le O^*\big( \mu_r\cdot n^{2\hat\theta - 2} \big) = O^*\big( n^{\theta + 2\hat\theta - 1} \big) \le O^*\big( n^{3\theta - 1} \big) + n^{-\Omega(1)},$$
by (6.19) and since $\mu_r = \Theta(n^{1+\theta})$. On the other hand, if $i = 2$ is the only
ordinary step, then $N_k = O^*(N_2)$. If $E_1$ and $E_2$ share $3 \le t \le r-2$ vertices,
then
$$N_2 \le \mu_r\binom{r}{t}\binom{n}{r-t}\, p^{\binom{r}{2} - \binom{t}{2}} = O^*\big( \max\big\{ \mu_r^2/n^3,\; \mu_r p^{2r} n^2 \big\} \big) \le n^{-\Omega(1)},$$
since $\mu_r = O(n^{3/2-\varepsilon})$, and so we are done. If they share only 2 vertices, then
$E_3$ must have at least $r-3$ edges outside $E_1 \cup E_2$, so
$$N_3 = O^*\big( p^{r-3} N_2 \big) = O^*\big( n^{-2}\cdot\mu_r^2/n^2 \big) = n^{-\Omega(1)},$$
by Observation 6.17, as required. Finally, if they share $r-1$ vertices then, since
no other steps are ordinary, our cluster is not normal.

We can now easily deduce the claimed bounds on $Q_i(H)$ and $\Delta_i$.

Proof of Lemma 6.13. Let $H$ be plausible, and recall from Definition 6.11 that
$t_k(H) = O^*(\nu_k)$ for all $0 \le k \le r-1$. By Lemmas 4.3 and 6.14, it follows that
$$Q_i(H) = O^*\Big( \mathbb{E}\big[ Q_i\big( H_r(G(n,p)) \big) \big] \Big) = O^*(\Delta_i)$$
for each $i \in \{2,3,4\}$, as claimed.

To bound $\Delta_k$ for $k \in \{3,4\}$, note first that, by Lemma 6.16 and since
$\theta \le 1/2-\varepsilon$, the expected number of normal $k$-clusters is at most $O^*(n^\theta) + n^{-\Omega(1)}$.
Moreover, every non-normal $k$-cluster forms a copy of $K_{r+1}$ in $G(n,p)$. Since
the expected number of copies of $K_{r+1}$ in $G(n,p)$ is
$$\mu_{r+1} = \binom{n}{r+1}\, p^{\binom{r+1}{2}} \le p^r n\cdot\mu_r = O^*(n^\theta), \tag{6.20}$$
by Observation 6.17, and each copy of $K_{r+1}$ contains $O^*(1)$ different $k$-clusters,
it follows that
$$\Delta_k \le O^*(n^\theta) + n^{-\Omega(1)},$$
as required.

Finally, we need to show that Hr (G(n, p)) is plausible with high probability.

Lemma 6.18. $H_r(G(n,p))$ is plausible with high probability.

Proof. We start by bounding $\nu_k$, defined in (6.12), first noting that
$$\nu_3 = O\big( \mu_r^2\cdot(r^2/n)^3 \big) = O^*\big( n^{2\theta - 1} \big) \qquad \text{and} \qquad \nu_{r-2} = O\big( \mu_r\cdot r^2 n^2 p^{2r} \big) = O^*\big( n^{\theta - 1} \big),$$
since $\mu_r = \Theta(n^{1+\theta})$ and $p^r = O^*(n^{-2})$, by Observation 6.17. By convexity,
and since $\theta \le 1/2 - \varepsilon$, it follows that $\nu_k = n^{-\Omega(1)}$ for all $3 \le k \le r-2$.
Since $\nu_k = \mathbb{E}\big[ t_k\big( H_r(G(n,p)) \big) \big]$, the bounds (6.14) and (6.15) thus both hold for
$H_r(G(n,p))$ with high probability by Markov's inequality. To show that (6.16)
also holds with high probability, it will suffice to show that $\Delta_2 = O(\mu_r)$, since
$e(H)$ has expectation $\mu_r = O(n^{3/2})$ and variance at most $\mu_r + 2\Delta_2$, by (4.11).
To see this, note that
$$\nu_2 = O\big( \mu_r^2\cdot(r^2/n)^2 \big) = O^*\big( n^{2\theta} \big) \qquad \text{and} \qquad \nu_{r-1} = O\big( \mu_r\cdot r n p^r \big) = O^*\big( n^{\theta} \big),$$
so
$$\Delta_2 = \sum_{k=2}^{r-1} \nu_k = \nu_2 + O^*\big( n^\theta + n^{2\theta - 1} \big) = O^*\big( n^{2\theta} + n^\theta \big), \tag{6.21}$$
and recall that $\mu_r = \Theta(n^{1+\theta})$ and $\theta \le 1/2$. By Chebyshev's inequality, it follows
that (6.16) holds with high probability, as required.

We are now ready to prove Theorem 6.12.

Proof of Theorem 6.12. Let $H$ be a plausible $r$-uniform hypergraph, and note
that $e(H) = O(\mu_r)$, by (6.16) and since $\mu_r = \Theta(n^{1+\theta})$ and $\theta \ge -1+\varepsilon$. Since
(as noted above) $\varphi\Delta_2 = O(\Delta_3)$, we may apply Theorem 4.1, and deduce that
$$\mathbb{P}\big( H_r(G(n,p)) = H \big) \le \pi^{e(H)} (1-\pi)^{N-e(H)}\, p^{-t(H)}\, e^{-\left(\frac{2e(H)}{\mu_r}-1\right)\left(\Delta_2 - \Delta_2^0\right) + \eta(H)},$$
where $N = \binom{n}{r}$ and
$$\eta(H) = O\Big( \Delta_3 + \varphi\cdot Q_2(H) + Q_3(H) + Q_4(H) + \pi^2 N \Big).$$
Moreover, by Lemma 6.13, and since $\pi^2 N = n^{-\Omega(1)}$, we have
$$\eta(H) = O^*\big( \Delta_3 + \Delta_4 \big) + n^{-\Omega(1)} = O^*(n^\theta) + n^{-\Omega(1)},$$
and by (4.5), (6.13) and (6.21) we have
$$\Delta_2 = \nu_2 + O^*(n^\theta) + n^{-\Omega(1)} \qquad \text{and} \qquad \Delta_2^0 = \sum_{k=2}^{r-1} \nu_k^0 = \nu_2^0 + n^{-\Omega(1)}.$$
By (6.16), and observing that $n^{1-\delta}\cdot\Delta_2/\mu_r = O^*(n^\theta) + n^{-\Omega(1)}$ for every $\delta > 0$,
since $\Delta_2 = O^*(n^{2\theta} + n^\theta)$, it follows that
$$\Big( \frac{2e(H)}{\mu_r} - 1 \Big)\big( \Delta_2 - \Delta_2^0 \big) = \nu_2 - \nu_2^0 + O^*(n^\theta) + n^{-\Omega(1)}, \tag{6.22}$$
and hence we obtain the claimed bound. Finally, recall from Lemma 6.18 that
$H_r(G(n,p))$ is plausible with high probability.
6.3 Lower bound for p constant
In this subsection we will complete the proof of Theorem 1.4. To do so we will
use Theorem 5.1, as in Section 6.1, but here we will need to work harder to
bound the term $C(H)$ (the ‘complex’ error term), since if $k = \binom{r}{2}$ (say) then
$\Delta_k$ is very large (to see this, consider sets of $k$ copies of $K_r$ all within a single
copy of $K_{2r}$), so the bound given by Lemma 5.2 is too weak. However, such
dense configurations simply do not appear in $H_r(G(n,p))$, and we will therefore
be able to carry out the strategy outlined at the end of Section 5.

Our first task is to define the family $\mathcal{G}$ of hypergraphs for which we will
prove the claimed lower bound; doing so will require a little preparation. The
first step is to describe the set of configurations of copies of $K_r$ that typically
appear in $G(n,p)$ when $\mu_r \le n^{3/2-\varepsilon}$.

Definition 6.19. A cluster is legal if it is either contained in a copy of $K_{r+1}$,
consists of a pair of copies of $K_r$ sharing exactly 2 vertices, or is a 3-cluster
formed by cliques whose vertex sets $S_1, S_2, S_3$ satisfy
$$|S_1 \cap S_2| = |S_1 \cap S_3| = 2 \qquad \text{and} \qquad \big( S_2 \cap S_3 \big)\setminus S_1 = \emptyset.$$
We say that a graph $G$, or the corresponding $r$-uniform hypergraph $H$, is legal
if it contains only legal clusters.
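In code, the 3-cluster case of this definition can be tested as follows (a sketch of ours; $S_1, S_2, S_3$ are the vertex sets, as Python sets):

```python
def legal_3cluster(r, S1, S2, S3):
    # A 3-cluster is legal if its cliques fit inside r + 1 vertices, or if
    # (for some relabelling) one set meets each of the other two in exactly
    # two vertices while those two are disjoint outside it (Definition 6.19).
    if len(S1 | S2 | S3) <= r + 1:
        return True
    sets = (S1, S2, S3)
    for i in range(3):
        a, b, c = sets[i], sets[(i + 1) % 3], sets[(i + 2) % 3]
        if len(a & b) == 2 and len(a & c) == 2 and not (b & c) - a:
            return True
    return False
```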

We will show below that, if p and r are chosen as in Theorem 1.4, then
G(n, p) is legal with high probability (see Lemma 6.25).
Next, recall from Definition 5.3 that a star-cluster is a collection of at least
three copies of Kr , one of which (the ‘centre’) is not contained in the union
of others (the ‘leaves’), and if there are at least three leaves, then each leaf
intersects the centre in some edge that is not contained in any of the other
leaves. Here we will need the following slightly more restrictive notion.

Definition 6.20. We say that a star-cluster S is legal if the graph formed by


the union of its leaves is legal.

As in Section 5, we say that the star-cluster S is pre-present in H if all of


its leaves correspond to edges of H, and present if the centre also corresponds
to an edge of H. Now, define
$$\widehat{C}_L(H) = \sum_{S\ \mathrm{legal}} \mathbf{1}\big[S \text{ is pre-present in } H\big]\cdot\pi_c(S), \tag{6.23}$$
where $\pi_c(S)$ denotes the probability that $S$ is present in $H_r(G(n,p))$, conditioned
on the event that $S$ is pre-present in $H_r(G(n,p))$. The following variant
of Observation 5.4 follows from the same proof.

Observation 6.21. $C(H) \le \widehat{C}_L(H)$ for every possible and legal hypergraph $H$.

Proof. To deduce the claimed inequality from the proof of Observation 5.4, note
that if H is legal, then any star-cluster that is pre-present in H is also legal.

We can now define the family of hypergraphs G for which we will prove the
lower bound in Theorem 1.4.

Definition 6.22. Let $C > 0$ be an arbitrary constant. An $r$-uniform hypergraph
$H$ with $n$ vertices is reasonable if $H$ is plausible, $H$ is legal, and
$$\widehat{C}_L(H) \le (\log n)^C\, \mathbb{E}\big[ \widehat{C}_L\big( H_r(G(n,p)) \big) \big].$$

Theorem 1.4 is an immediate consequence of the following theorem.

Theorem 6.23. Let $\varepsilon > 0$ be constant, let $-1+\varepsilon \le \theta = \theta(n) \le 1/2 - \varepsilon$, and
suppose that $p = p(n) \in (\varepsilon, 1-\varepsilon)$ and $r = r(n) \sim 2\log_{1/p}(n)$ are chosen so
that $\mu_r = \Theta(n^{1+\theta})$. Then
$$\mathbb{P}\big( H_r(G(n,p)) = H \big) = \pi^{e(H)} (1-\pi)^{\binom{n}{r}-e(H)}\, p^{-t(H)}\, e^{-\nu_2 + \nu_2^0 + O^*(n^\theta) + n^{-\Omega(1)}}$$
for every reasonable $H$, and $H_r(G(n,p))$ is reasonable with high probability.

We will deduce Theorem 6.23 from Theorem 5.1. Let us assume that $\varepsilon > 0$
is fixed, and that $p \in (\varepsilon, 1-\varepsilon)$, $r \in \mathbb{N}$ and $\theta \in \mathbb{R}$ satisfy the assumptions of the
theorem, and recall from the previous section that the conditions of Theorem 4.1
are satisfied. Note also that $1 - p > \varepsilon$ by assumption, and that
$$\Delta_2^0 = O^*\Big( \frac{\mu_r^2}{n^2} \Big) = o(\mu_r), \tag{6.24}$$
since $\mu_r = O(n^{3/2})$. We may therefore apply Theorem 5.1 to the $\binom{r}{2}$-uniform
hypergraph encoding copies of $K_r$ in $E(K_n)$. Since, by Definition 6.22, every
reasonable hypergraph $H$ is plausible, it follows from the proof of Theorem 6.12
(in particular from Lemma 6.13 and (6.22)) that the claimed bound holds up
to a factor of $\exp\big( O(C(H)) \big)$. By Observation 6.21 and Definition 6.22, it will
therefore suffice to prove the following lemma. Recall that $\hat\theta = \max\{\theta, 0\}$.

Lemma 6.24.
$$\mathbb{E}\big[ \widehat{C}_L\big( H_r(G(n,p)) \big) \big] = O^*\big( n^\theta + n^{3\hat\theta - 1} \big).$$

We say that a star-cluster $S$ is special if it consists of exactly three cliques,
all of which are contained in a copy of $K_{r+1}$, and otherwise we say that $S$ is
ordinary. The first term in Lemma 6.24 comes from special legal star-clusters,
and the second from ordinary ones. Note that in an ordinary legal star-cluster,
no two of the leaves can be contained in a copy of $K_{r+1}$ that also contains the
centre of $S$, since otherwise these two leaves would already cover all but one
edge of the centre, so by the definition of a star-cluster there can be no further
leaves in $S$, and hence $S$ is special.

Proof of Lemma 6.24. Observe first that
$$\mathbb{E}\big[ \widehat{C}_L\big( H_r(G(n,p)) \big) \big] = \sum_{S\ \mathrm{legal}} \mathbb{P}\big( S \text{ is present in } G(n,p) \big).$$
The contribution to the sum from special legal star-clusters is
$$O^*(\mu_{r+1}) = O^*(n^\theta),$$
by (6.20), and since the union of the cliques in $S$ is a copy of $K_{r+1}$, and there
are $O^*(1)$ ways to choose the centre and the leaves inside a given copy of $K_{r+1}$.
From now on we consider only ordinary legal star-clusters.
As usual, we will partition the sum according to the isomorphism type of
the ordinary legal star-cluster $S$. For a fixed type, we partition the leaves of $S$
into groups: any set of leaves contained in a copy of $K_{r+1}$ forms a group, and
every other leaf forms a group of size 1. Note that each group consists of at
most three cliques, since no leaf can be contained in the union of two or more
other leaves, by the minimality of $S$ (see Definition 5.3). Observe also that any
group shares at most two edges with the union of all the other groups, since
the non-singleton groups are edge-disjoint from the others, and each singleton
group shares an edge (and no more) with at most two other singleton groups.

We now bound the expectation in the usual way, starting by choosing the
centre of $S$, and then adding leaves one by one. Each time we add a singleton
group, it forms a 2-cluster with $S$, and conditioning on the other leaves being
present only increases the probability by a factor of at most $p^{-2} = O(1)$, so we
multiply the expectation by a factor of
$$O\Big( \frac{\Delta_2}{\mu_r} \Big) = O\big( n^{\hat\theta - 1} \big),$$

where the final inequality follows from (6.21). Adding a non-singleton group is
slightly more complicated; we claim that each time we add such a group, we
multiply the expectation by a factor of
$$O^*\Big( \frac{\Delta_2}{\mu_r}\cdot n p^r \Big) = O^*\big( n^{\hat\theta - 2} \big), \tag{6.25}$$
where the final inequality holds by Observation 6.17. To see this, let $U$ be the
vertex set of a non-singleton group, and note that (as we observed before the
proof) $V(S)$ is not contained in $U$, since $S$ is ordinary. It follows that at most
$r-1$ vertices of $U$ are shared with the centre of $S$, and hence $U$ has at least
two vertices outside $V(S)$. Let $u \in U\setminus V(S)$, and let $T$ be the copy of $K_r$ with
vertex set $U\setminus\{u\}$. Note that $T$ is not necessarily a leaf of $S$; however, if $S$
is present in $G(n,p)$ then at least $\binom{r}{2} - 1$ edges of $T$ are contained in $G(n,p)$.
Moreover, if $S$ is present in $G(n,p)$ then at least $r-1$ edges between $u$ and
$V(T)$ are contained in $G(n,p)$, and none of these edges are contained in $S \cup T$,
or in any of the cliques in different groups. Since there are only $O^*(1)$ choices
for the leaves of $S$ given $U$, we obtain (6.25).
Finally, note that since $S$ has at least two leaves, we either have at least two
singleton groups, or at least one non-singleton group. It follows that
$$\sum_{\substack{S\ \mathrm{legal}\\ \text{and ordinary}}} \mathbb{P}\big( S \text{ is present in } G(n,p) \big) = O^*\big( \mu_r\cdot n^{2\hat\theta - 2} \big) = O^*\big( n^{3\hat\theta - 1} \big), \tag{6.26}$$
as required.

It only remains to show that Hr (G(n, p)) is reasonable with high probability.
We have already shown that Hr (G(n, p)) is plausible with high probability,

and the final condition of reasonableness in Definition 6.22 holds by Markov’s
inequality. We therefore only need to prove the following lemma.

Lemma 6.25. $G(n,p)$ is legal with high probability.

Proof. Recall first from the proof of Lemma 6.18 that
$$\nu_3 + \cdots + \nu_{r-2} = o(1),$$
so with high probability the only pairs of copies of $K_r$ in $G(n,p)$ that share at
least one edge share either 2 or $r-1$ vertices. Let $C$ be a cluster, and suppose
that $C$ contains two copies of $K_r$ sharing $r-1$ vertices, say $E_1$ and $E_2$. Either
$C$ is contained in a copy of $K_{r+1}$, or it contains a third copy of $K_r$ that shares
between 2 and $r-1$ vertices with $E_1 \cup E_2$. The expected number of these latter
configurations is at most
$$O^*\Big( \binom{n}{r+1}\, p^{\binom{r+1}{2}}\cdot\frac{\Delta_2}{\mu_r} \Big) = O^*\Big( \frac{\Delta_2}{n} \Big) = o(1),$$
by Observation 6.17 and (6.21), so with high probability no such clusters exist.

It remains to consider clusters $C$ in which every pair of copies of $K_r$ share
at most two vertices. If $C$ has size 2 then it is legal, and otherwise there exist
three cliques in $C$ whose vertex sets $S_1, S_2, S_3$ satisfy
$$|S_1 \cap S_2| = |S_1 \cap S_3| = 2.$$
The expected number of such configurations with $\big( S_2 \cap S_3 \big)\setminus S_1 \neq \emptyset$ is
$$O^*\Big( \frac{\mu_r^3}{n^5} \Big) = O^*\big( n^{3\theta - 2} \big) = o(1),$$
and the expected number of 4-clusters with pairwise intersections of size at
most 2 is
$$O^*\Big( \frac{\mu_r^4}{n^6} \Big) = O^*\big( n^{4\theta - 2} \big) = o(1),$$
so with high probability these also do not occur in $G(n,p)$, as required.

Finally, let's put the pieces together and prove Theorem 6.23.

Proof of Theorem 6.23. The proof of Theorem 6.12, together with the assumption
that $1 - p > \varepsilon$ and (6.24), which allows us to apply Theorem 5.1 in place
of Theorem 4.1, and the fact that every reasonable $H$ is plausible, implies that
the claimed bound holds up to a factor of $\exp\big( O(C(H)) \big)$.

To bound this extra error term, we use Observation 6.21 and Lemma 6.24.
Indeed, since $H$ is possible and legal, we have $C(H) \le \widehat{C}_L(H)$, and hence
$$C(H) \le \widehat{C}_L(H) = O^*\Big( \mathbb{E}\big[ \widehat{C}_L\big( H_r(G(n,p)) \big) \big] \Big) = O^*(n^\theta) + n^{-\Omega(1)}$$
by Definition 6.22 and Lemma 6.24, so we obtain the claimed two-way bound
for every reasonable $H$. Finally, observe that $H_r(G(n,p))$ is reasonable with
high probability by Lemmas 6.18 and 6.25, and Markov's inequality.

7 Clique factors in G(n, p)
One of the main motivations for comparing the hypergraph $H_r(G(n,p))$ with
the binomial random hypergraph $H_r(n,\pi)$ is to study $K_r$-factors in $G(n,p)$, by
relating them to matchings in $H_r(n,\pi)$, or in the model $H_r(n,m)$ where the
number of hyperedges is fixed. The main focus of research has been on the
threshold for such a factor/matching to exist, for $r \ge 3$ constant. While the
original (extremely influential) paper of Johansson, Kahn and Vu [11] established
the (coarse) threshold simultaneously in both models, the more recent
extremely precise results of Kahn [13, 14] were only proved directly for the
hypergraph matching question (‘Shamir's problem’); corresponding results for
$K_r$-factors are deduced using the coupling of [22] (for the sharp threshold for
$r \ge 4$) and the extension due to Heckel [6] for $r = 3$. Remarkably, despite the
apparent loss in the coupling, Heckel, Kaufmann, Müller and Pasch [7] managed
to transfer even Kahn's hitting time result [14] to the $K_r$-factor setting.
For both Shamir’s problem and the Kr -factor problem, the threshold for
existence is now extremely well understood. Here we turn to a question that
has received less attention: once hypergraph matchings/clique factors exist,
how many are there? For simplicity, we will focus on the case where p = p(n)
or m = m(n) is a (possibly small) constant factor above the threshold.
Our main aim in this subsection is to prove Theorem 1.2; however, we will
begin by discussing matchings in random hypergraphs. Let $M(H)$ denote the
number of perfect matchings in an $r$-uniform hypergraph $H$, and recall from
Section 1.2 that we write
$$\Sigma(n,m) = \frac{n!}{r!^{n/r}\,(n/r)!}\cdot\frac{(m)_{n/r}}{(N)_{n/r}}$$
for the expected number of such matchings in the random hypergraph $H_r(n,m)$
if $r \mid n$, where as usual $N = \binom{n}{r}$ is the number of possible hyperedges and
$(m)_k = m(m-1)\cdots(m-k+1)$. Let us also write
$$\Sigma(n,\pi) = \frac{n!}{r!^{n/r}\,(n/r)!}\,\pi^{n/r}$$
for the corresponding expectation in the binomial random hypergraph $H_r(n,\pi)$,
and let
$$m_r(n) = \frac{n\log n}{r} \qquad \text{and} \qquad \pi_r(n) = \frac{m_r(n)}{N}$$
be the thresholds for a perfect matching to exist in the two models when $r \in \mathbb{N}$
is constant (note also that $\pi_r(n) \sim q_r(n)^{\binom{r}{2}}$, where $q_r(n)$ was defined in (1.3)).
Observe that if $\pi = m/N$, then
$$\frac{\Sigma(n,\pi)}{\Sigma(n,m)} = \frac{m^{n/r}}{N^{n/r}}\cdot\frac{(N)_{n/r}}{(m)_{n/r}} = \exp\Big( \frac{n^2}{2r^2 m} + O(n^3/m^2) + O(n/m) \Big), \tag{7.1}$$
which is $\exp\big( \Theta(n/\log n) \big)$ when $m = \Theta(m_r(n))$ and $r = O(1)$.
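The following log-space sketch (ours; function names are not from the paper) evaluates $\Sigma(n,m)$ and $\Sigma(n,\pi)$ and compares the log of their ratio with the main term $n^2/(2r^2m)$ of (7.1).

```python
from math import comb, lgamma, log

def log_falling(a, k):
    # log of the falling factorial (a)_k = a (a-1) ... (a - k + 1)
    return lgamma(a + 1) - lgamma(a - k + 1)

def log_sigma_m(n, r, m):
    # log Sigma(n, m), assuming r divides n
    N, k = comb(n, r), n // r
    return (lgamma(n + 1) - k * lgamma(r + 1) - lgamma(k + 1)
            + log_falling(m, k) - log_falling(N, k))

def log_sigma_pi(n, r, pi):
    # log Sigma(n, pi)
    k = n // r
    return lgamma(n + 1) - k * lgamma(r + 1) - lgamma(k + 1) + k * log(pi)

n, r = 3000, 3
m = round(1.1 * n * log(n) / r)        # just above the threshold m_r(n)
ratio = log_sigma_pi(n, r, m / comb(n, r)) - log_sigma_m(n, r, m)
print(ratio, n**2 / (2 * r**2 * m))    # close, as predicted by (7.1)
```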


Kahn [13, Theorem 1.2] gives a counting result for Shamir's problem, which
can be strengthened to the following; see Section 7.2.

Theorem 7.1. Let $r \ge 3$ and $\varepsilon > 0$ be fixed. If $m \ge (1+\varepsilon)m_r(n)$, then
$$M\big( H_r(n,m) \big) = e^{o(n/\log n)}\, \Sigma(n,m)$$
with high probability.

Only the lower bound here is interesting; the upper bound is just Markov's
inequality. An immediate corollary is the corresponding result for $H_r(n,\pi)$.

Corollary 7.2. Let $r \ge 3$ and $\varepsilon > 0$ be fixed. If $\pi \ge (1+\varepsilon)\pi_r(n)$, then
$$M\big( H_r(n,\pi) \big) = e^{o(n/\log n)}\, \Sigma(n,m)$$
with high probability, where $m = \big\lfloor \binom{n}{r}\pi \big\rfloor$.

Proof. A standard argument: if we take $m^+ = (1 + n^{-1/2})m$ and $m^- = (1 - n^{-1/2})m$,
then $H_r(n,\pi)$ with high probability has between $m^-$ and $m^+$ edges,
and conditional on having $m$ edges, it has the distribution of $H_r(n,m)$. The
result now follows from Theorem 7.1, since the dependence of $\Sigma(n,m)$ on $m$
scales (roughly) as $m^{O(n)}$, so $\Sigma(n,m^+)$ and $\Sigma(n,m^-)$ are both within a factor
of $\exp\big( O(n^{1/2}) \big)$ of $\Sigma(n,m)$.

As we see here, translating between $H_r(n,m)$ and $H_r(n,\pi)$ is essentially
trivial in either direction, since the property of containing a perfect matching
is monotone. Note, however, that the statement for $H_r(n,\pi)$ involves the expectation
in the model $H_r(n,m)$. By (7.1), the expected number of perfect
matchings in $H_r(n,\pi)$ is larger by a factor of order $\exp\big( \Theta(n^2/m) \big)$, and this
difference cannot be absorbed into the error term when $m = \Theta(n\log n)$.
This was for counting matchings in hypergraphs. What can we say about the
number of $K_r$-factors in $G(n,m)$ or $G(n,p)$? As outlined above, the coupling
results of [22] and Heckel [6] give the following corollary of Theorem 7.1. Here
we need some upper bound on $p$ for the coupling to work well; presumably it is
not needed for the result to be true. Recall that we write $F_r(G)$ for the number
of $K_r$-factors in a graph $G$.

Corollary 7.3. Let $\varepsilon > 0$ and $r \ge 3$ be fixed. There exists $\gamma = \gamma(r) > 0$ such
that if $p = p(n)$ satisfies $(1+\varepsilon)q_r(n) \le p \le n^{-2/r+\gamma}$, then
$$F_r\big( G(n,p) \big) \ge e^{o(n/\log n)}\, \Sigma(n,m)$$
with high probability, where $m = \big\lfloor \binom{n}{r}\, p^{\binom{r}{2}} \big\rfloor$.
2 .

Proof. This is essentially immediate from Corollary 7.2 and [22, Theorem 1] for
$r \ge 4$, or [6, Theorem 2] for $r = 3$: the latter results show that there is a coupling
of $G(n,p)$ and $H_r(n,\pi')$ with $\pi' \sim p^{\binom{r}{2}}$ such that with high probability the set
of copies of $K_r$ in the former contains all hyperedges of the latter. As noted in
both papers (see Remark 4 in [22]), we may in fact take $\pi' = (1 - n^{-\delta})\pi$ for
some constant $\delta > 0$. Then the corresponding $m$ and $m'$ agree within a factor
of $1 - n^{-\delta}$, so $\Sigma(n,m') = \exp\big( O(n^{1-\delta}) \big)\cdot\Sigma(n,m)$.

The discussion above suggests an obvious question: can we bound the number
of $K_r$-factors in $G(n,p)$ from above? Of course, the expectation gives an
upper bound. However, writing $\pi = p^{\binom{r}{2}}$, we have
$$\mathbb{E}\big[ F_r\big( G(n,p) \big) \big] = \Sigma(n,\pi), \tag{7.2}$$
which by (7.1) is significantly larger than $\Sigma(n,m)$ for $m = \lfloor\mu_r\rfloor$, and so does
not match the lower bound in Corollary 7.3. One might expect that switching
to $G(n,m)$ would help, but this is not the case. In the hypergraph setting, the
expectation in $H_r(n,\pi)$ is ‘too large’ (larger than the typical value) because
when $H_r(n,\pi)$ has more hyperedges than typical, it has a lot more matchings.
However, the expectation in $G(n,p)$ is not too large due to extra edges, but
rather due to extra copies of $K_r$.

This suggests a strategy: condition on the total number of copies of $K_r$
in $G(n,p)$. This thinking (in the different context of studying colourings of
$G(n,1/2)$, which we will return to in future work) is in fact what motivated the
present paper; using Theorem 6.2 we are able to prove the following result.

Theorem 7.4. Let $r \ge 3$ and $\varepsilon > 0$ be constants. There exists $\gamma = \gamma(r) > 0$
such that if $p \le n^{-2/r+\gamma}$, then
$$F_r\big( G(n,p) \big) \le e^{O^*(\sqrt n)}\, \Sigma(n,m)$$
with high probability, where $m = \big\lfloor \binom{n}{r}\, p^{\binom{r}{2}} \big\rfloor$.

Together with Corollary 7.3, this result establishes Theorem 1.2, illustrating
how our present results here complement the couplings of [22] and [6]. Note that
the result above is (of course) only interesting when $p$ is at least (or is close to)
the threshold $q_r(n)$, since otherwise $F_r(G(n,p)) = 0$ with high probability.

7.1 A general lemma


The main ingredient in the proof of Theorem 7.4 is a general lemma showing
how to calculate using the probability distribution of $H_r(G(n,p))$ given by
Theorem 1.1. In order to state it, we will need one more definition.

Throughout this subsection, let us fix $r \ge 3$ and $p = p(n) \le n^{-2/r+\gamma}$, where
$\gamma > 0$ is sufficiently small. Recall from Definition 6.1 the definition of a good
hypergraph, and let $W_3(H)$ denote the number of 3-clusters in $H$, so
$$\Delta_3 = \mathbb{E}\big[ W_3\big( H_r(G(n,p)) \big) \big],$$
cf. (4.1). Recall also that $\omega = \omega(n) \to \infty$ as $n \to \infty$ arbitrarily slowly.

Definition 7.5. An $r$-uniform hypergraph $H$ is well behaved if it is good and
$$W_3(H) \le \omega\Delta_3.$$

Recall from Lemma 6.3 that $H_r(G(n,p))$ is good with high probability, and
observe that therefore, by Markov's inequality, $H_r(G(n,p))$ is well behaved with
high probability. Let $\mathcal{W}$ denote the event that $H_r(G(n,p))$ is well behaved, and
for each $m \in \mathbb{N}$ with $|m - \mu_r| \le \omega\sqrt{\mu_r}$, consider the event
$$\mathcal{W}_m = \big\{ H_r(G(n,p)) \text{ is well behaved and } e\big( H_r(G(n,p)) \big) = m \big\}.$$
Recall also that $\xi$ was defined in (6.4).


The following technical lemma is the main result of this subsection.

Lemma 7.6. For each $r \ge 3$, there exists $\gamma > 0$ such that if $p \le n^{-2/r+\gamma}$ and
$\mu_r \to \infty$, then the following holds. Let $E_1, \ldots, E_k$ be distinct copies of $K_r$ in
$K_n$, and let $F = E_1 \cup \cdots \cup E_k$. For any $m \in \mathbb{N}$ with $|m - \mu_r| \le \omega\sqrt{\mu_r}$, we have
$$\frac{\mathbb{P}\big( \mathcal{W}_m \cap \{F \subset G(n,p)\} \big)}{\mathbb{P}\big( \mathrm{Bin}(N,\pi) = m \big)} \le \frac{(m)_k}{(N)_k}\cdot p^{-t(F)}\cdot e^{-\binom{k}{2}\binom{m}{2}^{-1}\left(\Delta_2 - \Delta_2^0\right) + O^*(\xi) + n^{-\Omega(1)}},$$
where $N = \binom{n}{r}$ and
$$t(F) = k\binom{r}{2} - e(F) \tag{7.3}$$
is the number of repeated graph edges in $E_1, \ldots, E_k$.

We remark that the error terms in the lemma depend on $r$, $\gamma$ and $\omega$, but not
on $k$. To help motivate the bound in the lemma, notice that
$$\frac{(m)_k}{(N)_k}\cdot\mathbb{P}\big( \mathrm{Bin}(N,\pi) = m \big)$$
is the probability that $H_r(n,\pi)$ has exactly $m$ edges, including $E_1, \ldots, E_k$.

Recall from Section 6.2 that if H is a hypergraph, then we write ts (H) for
the number of pairs of hyperedges of H sharing exactly s vertices, and from (1.2)
that t(H) is the analogue of (7.3) for hypergraphs. In the proof of Lemma 7.6,
we will need the following bounds on t(H) in terms of t2 (H), . . . , tr−1 (H).

Lemma 7.7. For any $r$-uniform hypergraph $H$ we have
$$0 \;\le\; \sum_{s=2}^{r-1}\binom{s}{2}\, t_s(H) \,-\, t(H) \;\le\; \binom{r-1}{2}\, W_3(H).$$

Proof. Both $\sum_{s=2}^{r-1}\binom{s}{2} t_s(H)$ and $t(H)$ count graph edges that appear in at least
two hyperedges of $H$, with certain multiplicities. Edges that appear in exactly
$k$ hyperedges have multiplicities $\binom{k}{2}$ and $k-1$, which are equal when $k = 2$, and
for $k \ge 3$ the former multiplicity is larger, so this proves the lower bound.

For the upper bound, note that each edge in $k$ hyperedges contributes zero
to the central term if $k = 2$, and at most $\binom{k}{2} - k + 1 \le \binom{k}{3}$ if $k \ge 3$. Moreover,
each such edge is contained in the common intersection of at least $\binom{k}{3}$ different
3-clusters. The central term is therefore at most the sum over all 3-clusters
$C$ of the number of graph edges contained in the common intersection of the
hyperedges in $C$, which is at most the claimed upper bound.
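Since this double counting is easy to get wrong, here is a brute-force sketch (ours, with small illustrative parameters) that checks both inequalities of Lemma 7.7 on random $r$-uniform hypergraphs; a triple of hyperedges is counted in $W_3$ when its ‘share at least two vertices’ graph is connected.

```python
from itertools import combinations
from math import comb
import random

def check(edges, r):
    # t(H): a graph edge lying in j hyperedges contributes j - 1.
    cover = {}
    for e in edges:
        for pair in combinations(sorted(e), 2):
            cover[pair] = cover.get(pair, 0) + 1
    t = sum(j - 1 for j in cover.values())
    # sum over pairs of hyperedges of binom(s, 2), s = intersection size
    pair_sum = sum(comb(len(a & b), 2) for a, b in combinations(edges, 2))
    # W3: triples whose 'share >= 2 vertices' graph is connected
    w3 = sum(1 for a, b, c in combinations(edges, 3)
             if (len(a & b) >= 2) + (len(a & c) >= 2) + (len(b & c) >= 2) >= 2)
    assert 0 <= pair_sum - t <= comb(r - 1, 2) * w3

random.seed(0)
n, r, m = 12, 4, 8
population = [frozenset(e) for e in combinations(range(n), r)]
for _ in range(200):
    check(random.sample(population, m), r)
```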

We will also need a similar bound for the following variant of $t_s(H)$.

Definition 7.8. Given an $r$-uniform hypergraph $H$ and $2 \le s \le r-1$, we write
$t_s^-(H)$ for the number of unordered pairs $\{e, f\}$ of hyperedges of $H$ such that

(i) $e$ and $f$ share exactly $s$ vertices, and

(ii) neither $e$ nor $f$ intersects any other hyperedge of $H$ in two or more vertices.

In other words, $t_s^-(H)$ counts the number of ‘isolated’ 2-clusters with intersection
size $s$. In particular, note that $t_s^-(H) \le t_s(H)$.

Lemma 7.9. For any $r$-uniform hypergraph $H$ we have
$$0 \;\le\; \sum_{s=2}^{r-1}\binom{s}{2}\, t_s(H) \,-\, \sum_{s=2}^{r-1}\binom{s}{2}\, t_s^-(H) \;\le\; 3\binom{r-1}{2}\, W_3(H).$$

Proof. The lower bound is trivial, since $t_s^-(H) \le t_s(H)$ by definition. Now,
observe that $\sum_{s=2}^{r-1}\binom{s}{2} t_s(H)$ counts all pairs $(e, C)$, where $C$ is a 2-cluster and $e$
is a graph edge in the intersection of the two hyperedges of $C$. If the pair $(e, C)$
does not contribute to the sum with $t_s^-$, then there must be some hyperedge
of $H$ that forms a 3-cluster with $C$. On the other hand, for each 3-cluster in
$H$ there are at most three choices for the 2-cluster $C$, and then at most $\binom{r-1}{2}$
choices for the edge $e$ in the intersection of these two hyperedges.

We are now ready to prove our technical lemma.

Proof of Lemma 7.6. Let $\mathcal{F}_m$ be the set of $r$-uniform hypergraphs on $[n]$ that
contain the $k$ hyperedges corresponding to $E_1, \ldots, E_k$ and satisfy the conditions
of $\mathcal{W}_m$, and observe that
$$\mathbb{P}\big( \mathcal{W}_m \cap \{F \subset G(n,p)\} \big) = \sum_{H\in\mathcal{F}_m} \mathbb{P}\big( H_r(G(n,p)) = H \big). \tag{7.4}$$
Every $H \in \mathcal{F}_m$ is good by Definition 7.5, so by Theorem 6.2, the right-hand
side of (7.4) is equal to
$$\pi^m (1-\pi)^{N-m} \sum_{H\in\mathcal{F}_m} p^{-t(H)} \exp\big( -\Delta_2 + \Delta_2^0 + O^*(\xi) + n^{-\Omega(1)} \big). \tag{7.5}$$
Note that $p^{-t(H)}$ is the only term that depends on $H$, so our aim is to bound
$$\sum_{H\in\mathcal{F}_m} p^{-t(H)}.$$
Roughly speaking, the idea is to build a random hypergraph $H$ starting with
the $k$ given edges and adding $m-k$ new random edges one-by-one, considering
how the $p^{-t(H)}$ term evolves as we add edges, which we can conveniently do
working in terms of expectations.

Let $h_1, \ldots, h_k$ be the hyperedges corresponding to $E_1, \ldots, E_k$, and let the
remaining edges $h_{k+1}, \ldots, h_m$ be chosen independently and uniformly at random
from all $N = \binom{n}{r}$ possible hyperedges. For $k \le i \le m$ let $\tilde H_i$ be the random
hypergraph with edge-set $\{h_1, \ldots, h_i\}$. Since $H \in \mathcal{F}_m$ implies that $H$ has $m$
distinct hyperedges, there are exactly $(m-k)!$ outcomes leading to $\tilde H_m = H$
for any given $H \in \mathcal{F}_m$. Thus, in this probability space,
$$\mathbb{P}\big( \tilde H_m = H \big) = \frac{(m-k)!}{N^{m-k}},$$
and hence
$$\sum_{H\in\mathcal{F}_m} p^{-t(H)} = \frac{N^{m-k}}{(m-k)!} \sum_{H\in\mathcal{F}_m} p^{-t(H)}\,\mathbb{P}\big( \tilde H_m = H \big) = \frac{N^{m-k}}{(m-k)!}\cdot\mathbb{E}\Big[ p^{-t(\tilde H_m)}\,\mathbf{1}\big[ \tilde H_m \in \mathcal{F}_m \big] \Big], \tag{7.6}$$
where the expectation is over the random choice of $h_{k+1}, \ldots, h_m$.
where the expectation is over the random choice of hk+1 , . . . , hm .
To bound the expectation, we will first replace $t(\tilde H_m)$ by a simpler quantity
which is bounded from above and below by the sums in Lemmas 7.7 and 7.9.
To do so, for each $2 \le i \le m$ and $2 \le s \le r-1$ let us write $A_{i,s}$ for the event
that $h_i$ shares exactly $s$ vertices with one previous $h_j$, and does not meet any
other $h_{j'}$ with $j' < i$ in two or more vertices. Now, for each $2 \le \ell \le m$, define
$$Z_\ell = \sum_{i=2}^{\ell}\sum_{s=2}^{r-1}\binom{s}{2}\,\mathbf{1}\big[ A_{i,s} \big],$$
and observe that
$$\sum_{s=2}^{r-1}\binom{s}{2}\, t_s^-(\tilde H_\ell) \;\le\; Z_\ell \;\le\; \sum_{s=2}^{r-1}\binom{s}{2}\, t_s(\tilde H_\ell),$$
since $Z_\ell$ counts the graph edges in a subset of the intersections between pairs
of hyperedges of $\tilde H_\ell$ (this proves the upper bound), and this subset includes all
isolated 2-clusters (this proves the lower bound). By Lemmas 7.7 and 7.9, it
follows that
$$t(\tilde H_\ell) = Z_\ell + O\big( W_3(\tilde H_\ell) \big).$$
Applying this for $\ell = k$ and $\ell = m$, and noting that $W_3(\tilde H_k) \le W_3(\tilde H_m)$, we
obtain
$$t(\tilde H_m) = t(\tilde H_k) + Z_m - Z_k + O\big( W_3(\tilde H_m) \big).$$
Now, by Definition 7.5 and Lemma 6.9, if $H$ is well behaved, then
$$W_3(H) \le \omega\Delta_3 = O(\omega\xi) = O^*(\xi).$$
Thus, on the event $\tilde H_m \in \mathcal{F}_m$, we have
$$t(\tilde H_m) = t(\tilde H_k) + Z_m - Z_k + O^*(\xi).$$
Recalling (7.6), this implies that
$$\sum_{H\in\mathcal{F}_m} p^{-t(H)} \le e^{O^*(\xi)}\cdot\frac{N^{m-k}}{(m-k)!}\cdot p^{-t(F)}\cdot\mathbb{E}\big[ p^{-Z_m + Z_k} \big], \tag{7.7}$$
since $\log(1/p) = O^*(1)$ and $t(\tilde H_k) = t(F)$, by definition.
 
To complete the proof, we will inductively bound $\mathbb{E}\big[ p^{-Z_\ell + Z_k} \big]$. To be precise,
we claim that for each $k < \ell \le m$, we have
$$\mathbb{E}\big[ p^{-Z_\ell + Z_k} \mid \tilde H_{\ell-1} \big] \le \big( 1 + c(\ell) \big)\cdot p^{-Z_{\ell-1} + Z_k}, \tag{7.8}$$
where
$$c(\ell) = (\ell - 1)\sum_{s=2}^{r-1}\binom{r}{s}\binom{n-r}{r-s}\binom{n}{r}^{-1}\Big( p^{-\binom{s}{2}} - 1 \Big).$$
To show this, let us write
$$q(s) := \binom{r}{s}\binom{n-r}{r-s}\binom{n}{r}^{-1}$$
for the probability that a random $r$-set shares exactly $s$ vertices with a given
$r$-set, and observe that
$$\mathbb{P}\big( A_{\ell,s} \mid \tilde H_{\ell-1} \big) \le (\ell - 1)\, q(s)$$
for each $2 \le s \le r-1$, by the union bound over edges of $\tilde H_{\ell-1}$. It follows that
$$\mathbb{P}\Big( Z_\ell - Z_{\ell-1} = \binom{s}{2} \;\Big|\; \tilde H_{\ell-1} \Big) \le (\ell - 1)\, q(s)$$
for each $2 \le s \le r-1$, and otherwise $Z_\ell = Z_{\ell-1}$. This proves (7.8), and it
follows that
$$\mathbb{E}\big[ p^{-Z_m + Z_k} \big] \le \prod_{\ell=k+1}^{m} \big( 1 + c(\ell) \big) \le \exp\bigg( \sum_{\ell=k+1}^{m} c(\ell) \bigg). \tag{7.9}$$

It only remains to bound the sum in (7.9). To do so, observe first that
$$\sum_{\ell=k+1}^{m} c(\ell) = \bigg( \binom{m}{2} - \binom{k}{2} \bigg) \sum_{s=2}^{r-1} q(s)\Big( p^{-\binom{s}{2}} - 1 \Big), \tag{7.10}$$
since $\sum_{\ell=k+1}^{m} (\ell - 1) = \binom{m}{2} - \binom{k}{2}$. Note also that
$$\nu_s^0 = \frac{q(s)}{2}\binom{n}{r}^2 p^{2\binom{r}{2}} = q(s)\cdot\frac{\mu_r^2}{2},$$
by (6.12) and (6.13), and the definition of $q(s)$, and hence
$$q(s)\cdot\binom{m}{2} = \Big( 1 + O^*\big( 1/\sqrt{\mu_r} \big) \Big)\cdot\nu_s^0,$$
since $|m - \mu_r| \le \omega\sqrt{\mu_r}$. Since $\nu_s = p^{-\binom{s}{2}}\nu_s^0$, it follows from (7.10) that
$$\sum_{\ell=k+1}^{m} c(\ell) = \Big( 1 + O^*\big( 1/\sqrt{\mu_r} \big) \Big)\bigg( 1 - \binom{k}{2}\binom{m}{2}^{-1} \bigg) \sum_{s=2}^{r-1}\big( \nu_s - \nu_s^0 \big) = \bigg( 1 - \binom{k}{2}\binom{m}{2}^{-1} \bigg)\big( \Delta_2 - \Delta_2^0 \big) + O^*\Big( \frac{\Delta_2}{\sqrt{\mu_r}} \Big), \tag{7.11}$$
since $\Delta_2 = \sum_{s=2}^{r-1}\nu_s$ and $\Delta_2^0 = \sum_{s=2}^{r-1}\nu_s^0$.
We are finally ready to put the pieces together. First, by (7.4) and (7.5),
$$\mathbb{P}\big( \mathcal{W}_m \cap \{F \subset G(n,p)\} \big) \le \pi^m (1-\pi)^{N-m}\, e^{-\Delta_2 + \Delta_2^0 + O^*(\xi) + n^{-\Omega(1)}} \sum_{H\in\mathcal{F}_m} p^{-t(H)}.$$
Next, by (7.7), (7.9) and (7.11), we have
$$\sum_{H\in\mathcal{F}_m} p^{-t(H)} \le \frac{N^{m-k}}{(m-k)!}\cdot p^{-t(F)}\cdot e^{\left( 1 - \binom{k}{2}\binom{m}{2}^{-1} \right)\left( \Delta_2 - \Delta_2^0 \right) + O^*(\xi) + O^*\left( \Delta_2/\sqrt{\mu_r} \right)}.$$
Noting that
$$\binom{N}{m}^{-1}\frac{N^{m-k}}{(m-k)!} = e^{O(m^2/N)}\,\frac{(m)_k}{(N)_k} = e^{O(\xi)}\,\frac{(m)_k}{(N)_k},$$
since $k \le m$ and $m^2/N = O\big( n^{2-r+O(\gamma)} \big) = O(\xi)$, and recalling from (6.11) that
$$\frac{\Delta_2}{\sqrt{\mu_r}} = O(\xi) + n^{-\Omega(1)},$$
it follows that
$$\frac{\mathbb{P}\big( \mathcal{W}_m \cap \{F \subset G(n,p)\} \big)}{\mathbb{P}\big( \mathrm{Bin}(N,\pi) = m \big)} \le \frac{(m)_k}{(N)_k}\cdot p^{-t(F)}\cdot e^{-\binom{k}{2}\binom{m}{2}^{-1}\left( \Delta_2 - \Delta_2^0 \right) + O^*(\xi) + n^{-\Omega(1)}},$$
as required.

It is straightforward to deduce Theorem 7.4 from Lemma 7.6.

Proof of Theorem 7.4. We may assume that $p \ge q_r(n)/2$, since otherwise we
would have $F_r(G(n,p)) = 0$ with high probability. In particular, it follows from
this assumption that $p \ge n^{-2/r}$, and thus $\mu_r \to \infty$ and $\xi = \Omega(1)$. Applying
Lemma 7.6 to each of the $K_r$-factors in $K_n$, and setting $k = n/r$, we obtain
$$\frac{\mathbb{E}\big[ F_r(G(n,p))\,\mathbf{1}[\mathcal{W}_m] \big]}{\mathbb{P}\big( \mathrm{Bin}(N,\pi) = m \big)} \le F_r(K_n)\cdot\frac{(m)_k}{(N)_k}\cdot e^{-\binom{k}{2}\binom{m}{2}^{-1}\left( \Delta_2 - \Delta_2^0 \right) + O^*(\xi)},$$
for each $m$ with $|m - \mu_r| \le \omega\sqrt{\mu_r}$, since if $F$ is a $K_r$-factor then $t(F) = 0$.
Now, since $\Delta_2 \ge \Delta_2^0$, and noting that
$$\Sigma(n,m) = F_r(K_n)\cdot\frac{(m)_{n/r}}{(N)_{n/r}},$$
it follows that
$$\frac{\mathbb{E}\big[ F_r(G(n,p))\,\mathbf{1}[\mathcal{W}_m] \big]}{\mathbb{P}\big( \mathrm{Bin}(N,\pi) = m \big)} \le e^{O^*(\xi)}\, \Sigma(n,m).$$
Summing over all $m$ in the range, it follows that
$$\mathbb{E}\big[ F_r(G(n,p))\,\mathbf{1}[\mathcal{W}] \big] \le e^{O^*(\xi)}\, \Sigma(n,m^+),$$
where $\mathcal{W}$ is the event that $H_r(G(n,p))$ is well behaved (which in particular
implies that $e\big( H_r(G(n,p)) \big) = m$ for some $m$ with $|m - \mu_r| \le \omega\sqrt{\mu_r}$), and
$m^+ = m + \omega\sqrt{m}$. Since
$$\frac{\Sigma(n,m^+)}{\Sigma(n,m)} = \Big( \frac{m^+}{m} \Big)^{O(n)} = \exp\Big( O^*\Big( \frac{n}{\sqrt m} \Big) \Big) = e^{O^*(\sqrt n)}$$
and $\xi \le n^{1/3 + O(\gamma)}$, it follows that
$$\mathbb{E}\big[ F_r(G(n,p))\,\mathbf{1}[\mathcal{W}] \big] \le e^{O^*(\sqrt n)}\, \Sigma(n,m).$$
Since $\mathcal{W}$ holds with high probability, by Markov's inequality the right-hand
side above is an upper bound on $F_r(G(n,p))$ that holds with high probability,
completing the proof.

Replacing the trivial bound $\Delta_2 \ge \Delta_2^0$ with a simple calculation gives a
stronger result for fixed $m$. To be precise, one can easily check that
$$\Delta_2 = \big( 1 + o(1) \big)\,\frac{\mu_r^2}{p}\binom{r}{2}^2\binom{n}{2}^{-1}$$
and $\Delta_2^0 \sim p\Delta_2 = o(\Delta_2)$, and hence, for $k = n/r$ and $m \sim \mu_r$, we have
$$\binom{k}{2}\binom{m}{2}^{-1}\big( \Delta_2 - \Delta_2^0 \big) \sim \frac{(r-1)^2}{2p}.$$
It therefore follows from the proof above that
$$\frac{\mathbb{E}\big[ F_r(G(n,p))\,\mathbf{1}[\mathcal{W}_m] \big]}{\mathbb{P}\big( \mathrm{Bin}(N,\pi) = m \big)} \le \exp\bigg( -\big( 1 + o(1) \big)\frac{(r-1)^2}{2p} + O^*(\xi) \bigg)\cdot\Sigma(n,m)$$
for every $m \in \mathbb{N}$ with $|m - \mu_r| \le \omega\sqrt{\mu_r}$. This bound illustrates an interesting
phenomenon: in $G(n,p)$, given that there are exactly $m$ copies of $K_r$, the conditional
expected number of $K_r$-factors is noticeably (around $\exp\big( -\Theta(p^{-1}) \big)$)
smaller than the expected number of matchings in $H_r(n,m)$. Intuitively, the
reason for this is that $H_r(G(n,p))$ prefers intersecting pairs of copies of $K_r$ over
disjoint ones, so given that there are $m$ copies of $K_r$, we expect more intersecting
pairs than in $H_r(n,m)$, and so fewer factors.

7.2 The lower bound


In this section we outline the proof of Theorem 7.1, which as discussed earlier
is a slight modification of Kahn's proof of Theorem 1.2 in [13].

Proof of Theorem 7.1. As in [13], start with the complete $r$-uniform hypergraph
and remove hyperedges one by one uniformly at random, letting $\Phi_t$ be the
number of matchings after $t$ steps, so the random variable we are interested in
is $\Phi_{N-m}$, where $N = \binom{n}{r}$. Let $\xi_t$ be the random fraction of matchings removed
in step $t$, and $\gamma_t$ its expectation, which is simply $(n/r)/(N-t+1)$, since each
of the $n/r$ edges in some particular matching has probability $1/(N-t+1)$ of
being chosen for removal in step $t$. Note that
$$\Phi_{N-m} = \Phi_0 \prod_{t=1}^{N-m} (1 - \xi_t), \tag{7.12}$$
and, considering the probability that any particular matching survives,
$$\mathbb{E}\big[ \Phi_{N-m} \big] = \Phi_0 \prod_{t=1}^{N-m} (1 - \gamma_t).$$
One key point (of many!) in [13] is that $\alpha_t = \xi_t - \gamma_t$ is a martingale difference
sequence, and furthermore that (off some bad event, on which he freezes
the martingale) $\xi_t = O(\gamma_t)$ (see (16) in [13]), so $\alpha_t = O(\gamma_t)$. It follows that
$\sum_{t=1}^{N-m}\alpha_t = O(\sqrt S)$ with high probability, where $S = \sum_{t=1}^{N-m}\gamma_t^2 = O(n/\log n)$.
To evaluate the product in (7.12), one considers the log. Since $\xi_t = O(\gamma_t) = o(1)$,
we have $\log(1 - \xi_t) = -\xi_t + O(\gamma_t^2)$, and this is where the final error in [13] comes
from: it is written as $o(n)$, but is in fact $O(S) = O(n/\log n)$.

We can improve on this by considering the variance of the $\xi_t$. Although not
essential, it is cleaner to consider the product
$$\frac{\Phi_{N-m}}{\mathbb{E}[\Phi_{N-m}]} = \prod_{t=1}^{N-m}\frac{1-\xi_t}{1-\gamma_t} = \prod_{t=1}^{N-m}\Big( 1 - \frac{\xi_t - \gamma_t}{1-\gamma_t} \Big).$$
Let $\alpha'_t = (\xi_t - \gamma_t)/(1-\gamma_t) = \alpha_t/(1-\gamma_t)$. This is again a martingale difference
sequence, and each term is $O(\gamma_t)$. So we again have that $\sum_{t=1}^{N-m}\alpha'_t = O(\sqrt S)$
with high probability. Moreover, since $\alpha'_t = o(1)$ for all $t \in [N-m]$, we have
$$\log\frac{\Phi_{N-m}}{\mathbb{E}[\Phi_{N-m}]} = -\sum_{t=1}^{N-m}\alpha'_t - \frac{1+o(1)}{2}\sum_{t=1}^{N-m}(\alpha'_t)^2.$$

Let $\mathcal{F}_t$ denote the $\sigma$-algebra corresponding to the information revealed after $t$
steps, and define
$$V_t = \mathbb{E}\big[ (\alpha'_t)^2 \mid \mathcal{F}_{t-1} \big] = (1-\gamma_t)^{-2}\,\mathrm{Var}\big( \xi_t \mid \mathcal{F}_{t-1} \big).$$
Since $\alpha'_t = O(\gamma_t)$ we have that $(\alpha'_t)^2 - V_t$ is a martingale difference sequence
with each term of size $O(\gamma_t^2) = o(\gamma_t)$, and it follows that the sum of these terms
is with high probability $o(\sqrt S)$, so it just remains to bound $\sum_{t=1}^{N-m} V_t$. If we
only use the bound $V_t = O(\gamma_t^2)$ effectively used in [13], we are still left with an
$O(n/\log n)$ error term. However, although it is not stated there, [13] contains
all the ingredients to prove a stronger bound. The key is that, in addition to the
deterministic bound $\xi_t = O(\gamma_t)$, the property $\mathcal{E}$ defined just below (62) implies
that (off the same bad event) $\xi_t \sim \gamma_t$ with conditional probability $1 - o(1)$.
Together, these two bounds imply that $\mathrm{Var}\big( \xi_t \mid \mathcal{F}_{t-1} \big) = o(\gamma_t^2)$, so $V_t = o(\gamma_t^2)$.
Hence we have $\sum_{t=1}^{N-m} V_t = o(S)$, giving the claimed improvement.

Note that, by the same argument, a stronger result would follow from any
improvement of this ‘approximate’ flatness of the distribution of ξt given Ft−1 .

Acknowledgements
The research described in this paper was partly carried out during several visits
of the second author to IMPA. The authors are grateful to IMPA for providing
us with a wonderful working environment.

References
[1] P. Allen, J. Böttcher, J. Corsten, E. Davies, M. Jenssen, P. Morris,
B. Roberts and J. Skokan, A robust Corrádi–Hajnal theorem, Random
Struct. Alg., 65 (2024), 61–130.

[2] S. Antoniuk, N. Kamčev and C. Reiher, Clique factors in randomly perturbed
graphs: the transition points, arXiv:2410.11003, 2024.

[3] A. Ferber, L. Hardiman and A. Mond, Counting Hamilton cycles in Dirac


hypergraphs, Combinatorica, 43 (2023), 665–680.

[4] K. Frankston, J. Kahn, B. Narayanan and J. Park, Thresholds versus fractional
expectation-thresholds, Ann. Math., 194 (2021), 475–495.

[5] A. Heckel, Non-concentration of the chromatic number of a random graph,


J. Amer. Math. Soc., 34 (2021), 245–260.

[6] A. Heckel, Random triangles in random graphs, Random Struct. Alg., 59


(2021), 616–621.

[7] A. Heckel, M. Kaufmann, N. Müller and M. Pasch, The hitting time of clique
factors, Random Struct. Alg., 65 (2024), 275–312.

[8] A. Heckel and K. Panagiotou, Colouring random graphs: Tame colourings,


arXiv:2306.07253v2, 2023.

[9] A. Heckel and O. Riordan, How does the chromatic number of a random
graph vary?, J. London Math. Soc. 108 (2023), 1769–1815.

[10] S. Janson, Poisson approximation for large deviations, Random Struct. Alg.,
1 (1990), 221–229.

[11] A. Johansson, J. Kahn and V. Vu, Factors in random graphs, Random


Struct. Alg., 33 (2008), 1–28.

[12] F. Joos, R. Lang and N. Sanhueza-Matamala, Robust Hamiltonicity,
arXiv:2312.15262, 2023.

[13] J. Kahn, Asymptotics for Shamir’s problem, Adv. Math., 422 (2023),
109019.

[14] J. Kahn, Hitting times for Shamir’s problem, Trans. Amer. Math. Soc.,
375 (2022), 627–668.

[15] D.Y. Kang, T. Kelly, D. Kühn, D. Osthus and V. Pfenninger, Perfect


matchings in random sparsifications of Dirac hypergraphs, Combinatorica
(2024), 1–34.

[16] T. Kelly, A. Müyesser and A. Pokrovskiy, Optimal spread for spanning


subgraphs of Dirac hypergraphs, J. Combin. Theory, Ser. B, 169 (2024),
507–541.

[17] M. Kwan, R. Safavi and Y. Wang, Counting perfect matchings in Dirac
hypergraphs, arXiv:2408.09589, 2024.

[18] R. Montgomery and M. Pavez-Signé, Counting spanning subgraphs in dense


hypergraphs, Combin. Probab. Computing, 33 (2024), 729–741.

[19] F. Mousset, A. Noever, K. Panagiotou and W. Samotij, On the probability


of nonexistence in binomial subsets, Ann. Probab. 48 (2020), 493–525.

[20] J. Park and H.T. Pham, A proof of the Kahn–Kalai conjecture, J. Amer.
Math. Soc., 37 (2024), 235–243.

[21] H.T. Pham, A. Sah, M. Sawhney and M. Simkin, A toolkit for robust
thresholds, arXiv:2210.03064, 2022.

[22] O. Riordan, Random cliques in random graphs and sharp thresholds for
F -factors, Random Struct. Alg., 61 (2022), 619–637.

[23] O. Riordan and L. Warnke, The Janson inequalities for general up-sets,
Random Struct. Alg., 46 (2015), 391–395.
