RANDOM GRAPHS AND COMPLEX NETWORKS
Volume 2
PREFACE
Targets. In this book, which is Volume 2 of a sequence of two books, we study local limits,
connected components, and small-world properties of random graph models for complex
networks. Volume 1 describes the preliminaries of random graphs as models for real-world
networks, as investigated since 1999. These networks turned out to be rather different from
classical random graph models, for example in the number of connections that the elements
make. As a result, a wealth of new models was invented to capture these properties. Volume
1 studies these models as well as their degree structure. Volume 2 summarizes the insights
developed in this exciting period related to the local, connectivity, and small-world structure
of the proposed random graph models. While Volume 1 is intended to be used for a master-level course, where students have limited prior knowledge of special topics in probability,
Volume 2 describes the more involved notions that have been the focus of attention of the
research community in the past two decades.
Volume 2 is intended to be used for a PhD level course, a reading seminar, or for re-
searchers wishing to obtain a consistent and extended overview of the results and method-
ologies developed in this scientific area. Volume 1 includes many of the preliminaries, such
as the convergence of random variables, probabilistic bounds, coupling, martingales, and
branching processes, and we frequently rely on these results.
The sequence of Volumes 1 and 2 aims to be self-contained. In Volume 2, we briefly repeat
some of the preliminaries on random graphs, including an introduction to the key models and
their degree distributions, as discussed in detail in Volume 1. In Volume 2, we aim to give
detailed and complete proofs. When we do not give proofs, we provide heuristics, as well
as extensive pointers to the literature. We further discuss several more recent random graph
models that aim to more realistically model real-world networks, as they incorporate their
directed nature, their community structure, and/or their spatial embedding.
Developments. The field of random graphs was pioneered in 1959–1960 by Erdős and Rényi
(1959; 1960; 1961a; 1961b), in the context of the probabilistic method. The initial work by
Erdős and Rényi incited a great amount of follow-up in the field, initially mainly in the
combinatorics community. See the standard references on the subject by Bollobás (2001)
and Janson, Łuczak, and Ruciński (2000) for the state of the art. Erdős and Rényi (1960)
gives a rather complete picture of the various phase transitions that occur in the Erdős–Rényi
random graph. This initial work did not aim to model real-world networks realistically.
In the period after 1999, owing to the fact that data sets of large real-world networks be-
came abundantly available, their structure has attracted enormous attention in mathematics
as well as in various applied domains. This is exemplified by the fact that one of the first
articles in the field, by Barabási and Albert (1999), has attracted over 40,000 citations. One
of the main conclusions from this overwhelming body of work is that many real-world net-
works share two fundamental properties. The first is that they are highly inhomogeneous, in
the sense that different vertices play rather different roles in the networks. This property is
exemplified by the degree structure of the real-world networks obeying power laws: these
networks are scale-free. This scale-free nature of real-world networks has prompted the
community to come up with many novel random graph models that, unlike the Erdős–Rényi
random graph, do have power-law degree sequences. This was the key focus in Volume 1.
Content. In this book, we pick up on the trail left in Volume 1, where we now focus on the
connectivity structure between vertices. Connectivity can be summarized in two key aspects
of real-world networks: the facts that they are highly connected, as exemplified by the fact
that they tend to have one giant component containing a large proportion of the vertices (if
not all of them), and that they are small world, in that most pairs of vertices are separated by
short paths. We discuss the available methods for these proofs, including path-counting tech-
niques, branching-process approximations, exchangeable random variables, and de Finetti’s
theorems. We pay particular attention to a recent technique, called local convergence, that
makes the statement that random graphs “locally look like trees” precise.
This book consists of four parts. In Part I, consisting of Chapters 1 and 2, we start in
Chapter 1 by repeating some definitions from Volume 1, including the random graph mod-
els studied in the present book, which are inhomogeneous random graphs, configuration
models, and preferential attachment models. We also discuss general topics that are impor-
tant in random graph theory, such as power-law distributions and their properties. In Chapter
2, we continue by discussing local convergence, an extremely powerful technique that plays
a central role in the theory of random graphs and in this book. In Part II, consisting of Chap-
ters 3–5, we discuss local limits and large connected components in random graph models.
In Chapter 3, we further extend the definition of the generalized random graph to general
inhomogeneous random graphs. In Chapter 4, we discuss the local limit and large connected
components in the configuration model, and in Chapter 5, we discuss the local structure in,
and connectivity of, preferential attachment models. In Part III, consisting of Chapters 6–8,
we study the small-world nature of random graphs, starting with inhomogeneous random
graphs, continuing with the configuration model, and ending with the preferential attach-
ment model. In Part IV, consisting of Chapter 9, we study related random graph models and
their structure.
Along the way, we give many exercises that should help the reader to obtain a deeper
understanding of the material by working on the solutions. These exercises appear in the last
section of each of the chapters, and, when applicable, we refer to them at the appropriate
place in the text. We also provide extensive notes in the penultimate section of each chapter,
where we discuss the links to the literature and some extensions.
Literature. We have tried to give as many references to the literature as possible. However,
the number of papers on random graphs has exploded. In MathSciNet (see www.ams.org/
mathscinet), there were, on December 21, 2006, a total of 1,428 papers that contain
the phrase “random graphs” in the review text; on September 29, 2008, this number had
increased to 1,614, to 2,346 on April 9, 2013; to 2,986 on April 21, 2016; and to 12,038 on
October 5, 2020. These are merely the papers on the topic in the mathematics community.
What is special about random graph theory is that it is extremely multidisciplinary, and many
papers using random graphs are currently written in economics, biology, theoretical physics,
and computer science. For example, in Scopus (see www.scopus.com/scopus/home.
url), again on December 21, 2006, there were 5,403 papers that contain the phrase “random
graph” in the title, abstract or keywords; on September 29, 2008, this had increased to 7,928;
to 13,987 on April 9, 2013; to 19,841 on April 21, 2016; and to 30,251 on October 5, 2020. It
can be expected that these numbers will continue to increase, rendering it utterly impossible
to review all the literature.
In June 2014, we decided to split the preliminary version of this book up into two books.
This has several reasons and advantages, particularly since Volume 2 is more tuned towards
a research audience, while Volume 1 is aimed at an audience of master students of varying
backgrounds. The pdf-versions of both Volumes 1 and 2 can be obtained from
www.win.tue.nl/~rhofstad/NotesRGCN.html.
For errata for this book and Volume 1, or possible outlines for courses based on them, readers
are encouraged to look at this website or e-mail me. Also, for a more playful approach to
networks for a broad audience, including articles, videos, and demos of many of the models
treated in this book, we refer all readers to the NetworkPages at www.networkspages.nl. The NetworkPages provide an interactive website developed by and for all those who
are interested in networks. Finally, we have relied on various real-world networks data sets
provided by the KONECT project; see http://konect.cc as well as Kunegis (2013)
for more details.
Thanks. This book, as well as Volume 1, would not have been possible without the help
and encouragement of many people. I particularly thank Gerard Hooghiemstra for encour-
aging me to write it, and for using it at Delft University of Technology almost simultane-
ously while I was using it at Eindhoven University of Technology in the spring of 2006 and
again in the fall of 2008. I thank Gerard for many useful comments, solutions to exercises,
and suggestions for improvements of the presentation throughout the book. Together with
Piet Van Mieghem, we entered the world of random graphs in 2001, and I have tremen-
dously enjoyed exploring this field together with them, as well as with Henri van den Esker,
Dmitri Znamenski, Mia Deijfen, Shankar Bhamidi, Johan van Leeuwaarden, Júlia Komjáthy,
Nelly Litvak and many others.
I thank Christian Borgs, Jennifer Chayes, Gordon Slade, and Joel Spencer for joint work
on random graphs that are like the Erdős–Rényi random graph but do have geometry. Spe-
cial thanks go to Gordon Slade, who introduced me to the exciting world of percolation,
which is closely linked to the world of random graphs (see the classic text on percolation
by Grimmett (1999)). It is striking to see two communities working on two such closely
related topics with different methods and even different terminology, and it has taken a
long time to build bridges between the two subjects. I am very happy that these bridges
are now rapidly appearing, and the level of communication between different communities
has increased significantly. I hope that this book helps to further enhance this communica-
tion. Frank den Hollander deserves a special mention. Frank, you have been important as a
driving force throughout my career, and I am very happy now to be working with you on
fascinating random graph problems!
Further, I thank
Marie Albenque, Yeganeh Alimohammadi, Rangel Baldasso, Gianmarco Bet,
Shankar Bhamidi, Finbar Bogerd, Marko Boon, Christian Borgs, Hao Can,
Francesco Caravenna, Rui Castro, Kota Chisaki, Deen Colenbrander, Nicolas Curien,
Umberto De Ambroggio, Mia Deijfen, Michel Dekking, Serte Donderwinkel,
Dylan Dronnier, Henri van den Esker, Lorenzo Federico, Federica Finazzi, Allison Fisher,
Lucas Gerin, Cristian Giardinà, Claudia Giberti, Jesse Goodman, Rowel Gündlach,
Rajat Hazra, Markus Heydenreich, Frank den Hollander, Yusuke Ide, Simon Irons,
Emmanuel Jacob, Svante Janson, Guido Janssen, Lancelot James, Martin van Jole,
Joost Jorritsma, Willemien Kets, Heejune Kim, Bas Kleijn, Júlia Komjáthy, Norio Konno,
Dima Krioukov, John Lapeyre, Lasse Leskelä, Nelly Litvak, Neeladri Maitra,
Abbas Mehrabian, Marta Milewska, Steven Miltenburg, Mislav Mišković, Christian Mönch,
Peter Mörters, Mirko Moscatelli, Jan Nagel, Sidharthan Nair, Alex Olssen,
Mariana Olvera-Cravioto, Helena Peña, Manish Pandey, Rounak Ray, Nathan Ross,
Christoph Schumacher, Matteo Sfragara, Karoly Simon, Lars Smolders, Clara Stegehuis,
Dominik Tomecki, Nicola Turchi, Viktória Vadon, Thomas Vallier, Irène Ayuso Ventura,
Xiaotin Yu, Haodong Zhu and Bert Zwart
for remarks and ideas that have improved the content and presentation of these books sub-
stantially. Wouter Kager read the February 2007 version of this book in its entirety, giving
many ideas for improvements in the arguments and the methodology. Artëm Sapozhnikov,
Maren Eckhoff, and Gerard Hooghiemstra read and commented on the October 2011 ver-
sion. Haodong Zhu read the December 2023 version completely, and corrected several typos.
Particular thanks go to Dennis Timmers, Eefje van den Dungen, Joop van de Pol,
Rowel Gündlach and Lourens Touwen, who, as my student assistants, have been a great
help in the development of this pair of books, in making figures, providing solutions to some
of the exercises, checking proofs, and keeping the references up to date. Maren Eckhoff
also provided many solutions to the exercises, for which I am grateful! Sándor Kolumbán,
Robert Fitzner, and Lourens Touwen helped me to turn all pictures of real-world networks
as well as simulations of network models into a unified style, a feat that is beyond my
LaTeX skills. A big thanks for that! Also my thanks for suggestions and help with figures to
Marko Boon, Alessandro Garavaglia, Dimitri Krioukov, Vincent Kusters, Clara Stegehuis,
Piet Van Mieghem, and Yana Volkovich. A special thanks to my running mates Jan and
Ruud, whose continuing support has been extremely helpful for me.
Support. This work would not have been possible without the generous support of the
Netherlands Organization for Scientific Research (NWO) through VIDI grant 639.032.304,
VICI grant 639.033.806, and the Gravitation NETWORKS grant 024.002.003.
POSSIBLE COURSE OUTLINES
[Diagram: possible course outlines, showing routes through the material. The legible boxes are: Random Graphs and Complex Networks Introduction [V1, Chapter 1]; Probabilistic Methods [V1, Chapter 2]; Branching Processes [V1, Chapter 3]; Phase Transition [V1, Chapter 4]; Erdős–Rényi Random Graph Revisited [V1, Chapter 5]; Preferential Attachment Model Introduction [V1, Chapter 8]; Connectivity [V2, Chapter 5]; Small world [V2, Chapter 8]; Related Models [V2, Chapter 9]; with further boxes for the Erdős–Rényi random graph, inhomogeneous random graphs, and the configuration model.]
Here is some more explanation, as well as a possible itinerary of a master- or PhD-level course on random graphs based on Volume 2. For a course outline based on Volume 1, we refer to [V1, Preface]; for alternative routes through the material, we refer to the book's website at www.win.tue.nl/~rhofstad/NotesRGCN.html:
▷ Start with the introduction to real-world networks in [V2, Chapter 1], which forms the
inspiration for what follows. For readers wishing for a more substantial introduction, do visit
Volume 1 for an extensive introduction to the models discussed here.
▷ Continue with [V2, Chapter 2] on the local convergence of (random and non-random)
graphs, as this is a crucial tool in the book and has developed into a key methodology in the
field.
The material in this book is rather substantial, and probably too much to be treated in one
course. Thus, we give two alternative approaches to teaching coherent parts of this book:
▷ You can either take one of the models and discuss the different chapters in Volume 2 that focus on it. [V2, Chapters 3 and 6] discuss inhomogeneous random graphs, [V2, Chapters 4 and 7] discuss configuration models, while [V2, Chapters 5 and 8] focus on preferential attachment models.
▷ The alternative is that you take one of the topics and work through it in detail. [V2,
Part II] discusses the local limits and largest connected components or phase transition in
our random graph models, while [V2, Part III] treats their small-world nature.
If you have further questions and/or suggestions about course outlines, feel free to contact me. Refer to www.win.tue.nl/~rhofstad/NotesRGCN.html for further suggestions on how to lecture from Volume 2.
Part I
Preliminaries
CHAPTER 1
INTRODUCTION AND PRELIMINARIES
Abstract
In this chapter, we draw motivation from real-world networks and formulate
random graph models for them. We focus on some of the models that have re-
ceived the most attention in the literature, namely, Erdős–Rényi random graphs,
inhomogeneous random graphs, configuration models, and preferential attach-
ment models. We follow van der Hofstad (2017), which we refer to as [V1], both
for motivation and for the introduction to the random graph models involved.
… exercises in Section 1.7. We give few references to the literature within this chapter, but defer a
discussion of the history of the various models to the extensive notes in Section 1.6.
1.1 Motivation: Real-World Networks

In the past two decades, an enormous research effort has been devoted to modeling various real-world phenomena using networks. Networks arise in various applications,
ranging from the connections between friends in friendship networks to the connectivity of
neurons in the brain, to the relations between companies and countries in economics, and the
hyperlinks between webpages in the World-Wide Web. The advent of the computer era has
made many network data sets available. Around 1999–2000, various groups started to inves-
tigate network data from an empirical perspective. [V1, Chapter 1] gives many examples of
real-world networks and the empirical findings from them. Here we give some basics.
where, for a set A, we write |A| for its size. Exercise 1.1 asks you to prove (1.1.3).
The average degree in a network is equal to
\[
\frac{1}{|V(G)|} \sum_{v \in V(G)} d_v^{(G)} = \frac{2|E(G)|}{|V(G)|}. \tag{1.1.4}
\]
[Figure 1.1: average degrees of the networks in the KONECT data base (vertical axis: average degree, log scale); compare Figure 1.2.]
\[
n_k \approx c_n k^{-\tau}, \tag{1.1.6}
\]
and thus
\[
\log n_k \approx \log c_n - \tau \log k, \tag{1.1.7}
\]
Figure 1.2 Maximal degrees in the 727 networks of size larger than 10,000 from
the KONECT data base. Linear regression gives $\log d_{\max} = 0.742 + 0.519 \log n$.
so that the plot of $\log k \mapsto \log n_k$ is close to a straight line. This is the reason why degree sequences in networks are often depicted in a log–log fashion, rather than in the more customary form of $k \mapsto n_k$. Here, and in the remainder of this section, we write $\approx$ to denote an uncontrolled approximation. The power-law exponent $\tau$ can be estimated by the absolute value of the slope of the line in the log–log plot. Naturally, we must have that
\[
\sum_k n_k = |V(G_n)| < \infty, \tag{1.1.8}
\]
laws with exponents $\tau$ satisfying $\tau \in (2,3)$, so that random variables with such degrees have infinite variance. Since maximal degrees of networks of size $n$ can be expected to grow as $n^{1/(\tau-1)}$ (see Exercise 1.2 for an illuminating example), Figure 1.2 suggests that, on average, $1/(\tau-1) \approx 0.519$, so that, again on average, $\tau \approx 2.93$, which is in line with such predictions.
For the Internet, log–log plots of degree sequences first appeared in a paper by the Falout-
sos brothers (1999) (see Figure 1.3(b) for the degree sequence in the Autonomous Systems
graph, where the degree distribution looks relatively smooth because it is binned). Here,
the power-law exponent is estimated as τ ≈ 2.15–2.20. Figure 1.3(a) displays the degree
distribution in the Internet Movie Data base (IMDb), in which the vertices are actors and
two actors are connected when they have acted together in a movie. Figure 1.4 displays the
degree sequences of both the in-degrees and the out-degrees in various World-Wide Web data bases.
Figure 1.3 (a) Log–log plot of the degree sequence in the 2007 Internet Movie Data base. (b) Log–log plot of the probability mass function of the Autonomous Systems degree sequence in April 2014, from Krioukov et al. (2012) (data courtesy of Dmitri Krioukov). This degree distribution looks smoother than others (see, e.g., Figures 1.3(a) and 1.4), due to binning of the data.
Figure 1.4 The probability mass function of the in- and out-degree sequences in
the Berkeley-Stanford and Google competition graph data sets of the World Wide
Web in Leskovec et al. (2009). (a) In-degree; (b) out-degree.
Table 1.1 For comparison, fits of scale-free and alternative distributions to real-world networks
taken from (Broido and Clauset, 2019, Table 1). Listed are the percentage of network data sets that
favor the power-law model MPL , the alternative model MAlt , or neither, under a likelihood-ratio test,
along with the form of the alternative distribution indicated by the alternative density $x \mapsto f(x)$.
wrote a blog post containing detailed criticism of the methods and results in Broido and
Clauset (2019), see also Voitalov et al. (2019). Holme (2019) summarized the status of the
arguments in 2019, reaching an almost philosophical conclusion:
Still, it often feels like the topic of scale-free networks transcends science – debating them
probably has some dimension of collective soul searching as our field slowly gravitates
toward data science, away from complexity science.
So, what did the discussion focus on? Here is a list of questions:
What are power-law data? An important question in the discussion on power-law degree
distributions is how to interpret the approximation sign in (1.1.9). Most approaches start
by assuming that the data are realizations of independent and identically distributed (iid)
random variables. This can only be an assumption, as degree distributions are mostly
graphical (meaning that they can arise as degree sequences of graphs without self-loops
and multiple edges), which introduces dependencies between them (if only because the
sum of the degrees needs to be even). However, without this assumption, virtually any
analysis becomes impossible, so let us assume this as well.
Under the above assumption, one needs to infer the degree distribution from the sample
of degrees obtained from a real-world network. We denote the asymptotic degree distri-
bution by pk , i.e., the proportion of vertices of degree k in the infinite-graph limit. Under
this assumption, $p_k^{(G_n)}$ in (1.1.9) is the empirical probability mass function corresponding to the true underlying degree distribution $(p_k)_{k \ge 0}$. The question is thus what probability
mass functions (pk )k≥0 correspond to a power law.
Broido and Clauset (2019) interpreted the power-law assumption as
\[
p_k = c\,k^{-\tau} \qquad \text{for all } k \ge k_{\min}, \tag{1.1.10}
\]
and $p_k$ arbitrary for $k \in [k_{\min} - 1]$; here $c > 0$ is chosen appropriately. The inclusion of the $k_{\min}$ parameter is based on the observation that small values of $k$ generally do not satisfy the pure power law (see also Clauset et al. (2009), where (1.1.10) first appeared).
Barabási (2018) instead argued from the perspective of generative models (such as the
preferential attachment models described in Section 1.3.5, as well as in Chapters 5 and
8):
In other words, by 2001 it was pretty clear that there is no one-size-fits-all formula for
the degree distribution for networks driven by the scale-free mechanism. A pure power
law only emerges in simple idealised models, driven by only growth and preferential
attachment, and free of any additional effects.
Bear in mind that this dynamical approach is very different from that of Broido and
Clauset (2019), as the degrees in generative models can hardly be expected to be real-
izations of an iid sample! Barabási (2018) instead advocated a theory that predicts power
laws with exponential truncation for many settings, meaning that
\[
p_k = c\,k^{-\tau} \mathrm{e}^{-Ak} \qquad \text{for all } k \ge d_{\min}, \tag{1.1.11}
\]
where $d_{\min}$ denotes the minimal degree in the graph and $c, A > 0$ are appropriate con-
stants, but the theory also allows for “additional effects,” such as vertex fitnesses that
describe intrinsic differences in how likely it is to connect to vertices, and that may be
realistic in some real-world networks.
Voitalov et al. (2019) took a static approach related to that of Broido and Clauset
(2019), but instead assumed more general power laws of the form
\[
1 - F(x) = \sum_{k > x} p_k = x^{-(\tau - 1)} L(x) \qquad \text{for all } x \ge 1, \tag{1.1.12}
\]
where $x \mapsto L(x)$ is a so-called slowly varying function, meaning a function that does not
change the power-law exponent, in that it grows or decays more slowly than any power
at infinity. See [V1, Definition 1.5], or Definition 1.19 below, for a precise definition. In
particular, distributions that satisfy (1.1.10) also satisfy (1.1.12), but not necessarily the
other way around.
The advantage of working with (1.1.12) is that this definition is quite general, yet a
large body of work within the extreme-value statistics community becomes available.
These results, as summarized in Voitalov et al. (2019), allow for the “most accurate”
ways of estimating the power-law exponent τ , which brings us to the next question.
How to estimate the power-law exponent? Since Broido and Clauset (2019) interpreted
the power-law assumption as in (1.1.10), estimating the model parameters then boiled
down to estimating kmin and τ . For this, Broido and Clauset (2019) relied on the first pa-
per on estimating power-law exponents in the area of networks, by Clauset et al. (2009),
who proposed the power-law-fit method (PLFit). This method chooses the best possible $k_{\min}$ on the basis of the difference between the empirical degree distribution for values above $k_{\min}$ and the power-law distribution function based on (1.1.10), with an appropriately estimated value $\hat\tau$ of $\tau$, as proposed by Hill (1975), for realizations above $k_{\min}$. The estimator $\hat\tau_{\mathrm{PLFit}}$ is then the estimator of $\tau$ corresponding to the optimal $k_{\min}$.
The PLFit method was recently proved to be a consistent method by Bhattacharya et al. (2020), which means that the estimator will, in the limit, converge in probability to the correct value $\tau$, even under the weaker assumption in (1.1.12). Of course, the question remains whether $\hat\tau_{\mathrm{PLFit}}$ is a good estimator, for example in the sense that the rate of convergence of $\hat\tau_{\mathrm{PLFit}}$ to $\tau$ is optimal. The results and simulations in Drees et al. (2020) suggest that, even in the case of a pure power law as in (1.1.10) with $k_{\min} = 1$, $\hat\tau_{\mathrm{PLFit}}$ is outperformed by more classical estimators (such as the maximum likelihood estimator for $\tau$). Voitalov et al. (2019) rely on the estimators proposed in the extreme-value lit-
erature; see e.g. Danielsson et al. (2001); Draisma et al. (1999); Hall and Welsh (1984)
for such methods and Resnick (2007); Beirlant et al. (2006) for extensive overviews of
extreme-value statistics.
The dynamical approach by Barabási (2018) instead focusses on estimating the parameters in the proposed dynamical models, a highly interesting topic that is beyond the scope of this book.
How to perform tests? When confronted with a model, or with two competing models
such as in Table 1.1, a statistician would often like to compare the fit of these models
to the data, so as to be able to choose between them. When both models are parametric,
meaning that they involve a finite number of parameters, like the models in Table 1.1,
this can be done using a so-called likelihood-ratio test. For this, one computes the likeli-
hood of the data (basically the probability that the model in question gives rise to exactly
what was found in the data) for each of the models, and then takes the ratio of the two
likelihoods. In the settings in Table 1.1, this means that the likelihood of the data for the
power-law model is divided by that for the alternative model. When this exceeds a certain
threshold, the test does not reject the possibility that the data comes from a power law,
otherwise it rejects the null hypothesis of a power-law degree distribution. This is done for
each of the networks in the data base, and Table 1.1 indicates the percentages for which
each of the models is deemed the most likely.
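Schematically, such a likelihood-ratio comparison can be carried out as follows (a sketch of ours with synthetic data, using continuous densities in place of degree distributions for simplicity):

```python
import math
import random

# Schematic sketch: fit a Pareto (power-law) model and a shifted-exponential
# model on x >= 1 by maximum likelihood, and compare their log-likelihoods.
def log_likelihood_ratio(xs):
    n = len(xs)
    sum_log = sum(math.log(x) for x in xs)
    alpha = n / sum_log                    # Pareto MLE: f(x) = a x^(-a-1)
    ll_pareto = n * math.log(alpha) - (alpha + 1) * sum_log
    sum_shift = sum(x - 1.0 for x in xs)
    lam = n / sum_shift                    # MLE: f(x) = lam e^(-lam (x-1))
    ll_exp = n * math.log(lam) - lam * sum_shift
    return ll_pareto - ll_exp              # positive favors the power law

random.seed(2)
pareto_sample = [random.random() ** (-1.0 / 1.5) for _ in range(10_000)]
exp_sample = [1.0 + random.expovariate(1.0) for _ in range(10_000)]
print(log_likelihood_ratio(pareto_sample) > 0)   # True
print(log_likelihood_ratio(exp_sample) > 0)      # False
```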
Unfortunately, such likelihood ratio tests can be performed only when one compares
parametric settings. The setting in (1.1.12) is non-parametric, as it involves the unknown
slowly varying function $x \mapsto L(x)$, and thus, in that setting, no statistical test can be performed unless one makes parametric assumptions on the shape of $x \mapsto L(x)$ (by assuming, for example, that $L(x)$ is a power of $\log x$). Thus, the parametric choice in
(1.1.10) is crucial in that it allows for a testing procedure to be performed. Alternatively,
if one does not believe in the “pure” power-law form as in (1.1.10), then tests are no
longer feasible. What approach should one then follow? See Artico et al. (2020) for a
related testing procedure, in which the authors reached a rather different conclusion than
that of Broido and Clauset (2019).
How to partition networks? Broido and Clauset (2019) investigated a large body of net-
works, relying on a data base consisting of 927 real-world networks from the KONECT
project; see http://konect.cc as well as Kunegis (2013). We are also relying on this
data base for graphs showing network properties, such as average and maximal degrees,
etc. These networks vary in size, as well as in their properties (directed versus undirected,
static versus temporal, etc.). In their paper, Broido and Clauset (2019) report percentages
of networks having certain properties; see for example Table 1.1.
A substantial part of the discussion around Broido and Clauset (2019) focusses on
whether these percentages are representative. Take the example of a directed network,
which has several degree distributions, namely, in-degree, out-degree, and total degree
distributions (in the latter, the directions are simply ignored). This “diversity of degree
distributions” becomes even more pronounced when the network is temporal, meaning
that edges come and go as time progresses. When does one say that a temporal network
has a power-law degree distribution? When one of these degree distributions is classified
as power-law, when a certain percentage of them is, or when all of them are?
What is our approach in this book? We prefer to avoid the precise debate about whether
power laws in degree distributions are omnipresent or rare. We view power laws as a way
to model settings where there is a large amount of variability in the data, and where the
maximum values of the degrees are several orders of magnitude larger than the average
values (compare Figures 1.1 and 1.2). Power laws predict such differences in scale.
There is little debate about the fact that degree distributions in networks tend to be
highly inhomogeneous. Power laws are the model of choice to model such inhomo-
geneities, certainly in settings where empirical moments (for example, empirical vari-
ances) are very large. Further, inhomogeneities lead to interesting differences in structure
of the networks in question, which will be a focal point of this book. All the alternative
models in Table 1.1 have tails that are too thin for such differences to emerge. Thus, it
is natural to focus on models with power-law degrees to highlight the relation between
degree structure and network topology. Therefore, we often consider degree distributions
that are either exactly described by power laws or are bounded above or below by them.
The focus then resides in how the degree power-law exponent τ changes the network
topology.
Figure 1.5 (a) Number of Autonomous Systems traversed in hopcount data. (b)
Internet hopcount data (courtesy of Hongsuda Tangmunarunkit).
wise e-mail messages could not be delivered between pairs of vertices in distinct connected
components.
Graph distances between pairs of vertices tend to be quite small in most networks. For
example, in the Internet, IP packets cannot use more than a threshold of physical links, and if
distances in the Internet were larger than this threshold then the e-mail service would simply
break down. Thus, the Internet graph has evolved in such a way that typical distances are
relatively small, even though the Internet itself is rather large. As seen in Figure 1.5(a), the
number of Autonomous Systems (ASs) traversed by an e-mail data set, sometimes referred
to as the AS-count, is typically at most 7. In Figure 1.5(b), the proportion of routers traversed
by an e-mail message between two uniformly chosen routers, referred to as the hopcount, is
shown. It shows that the number of routers traversed is at most 27. Figure 1.6 shows typical
distances in the IMDb; the distances are quite small despite the fact that the network contains
more than one million vertices.
The small-world nature of real-world networks is highly significant. Indeed, in small
worlds, news can spread quickly as relatively few people are needed to spread it between two
typical individuals. This is quite helpful in the Internet, where e-mail messages hop along
the edges of the network. At the other end of the spectrum, it also implies that infectious
diseases can spread quite quickly, as just a few infections can carry the disease to a large part
of the population. This implies that diseases have a large potential of becoming pandemic,
as the corona pandemic has made painfully clear.
Let us continue this discussion by formally introducing graph distances, as displayed in
Figures 1.5 and 1.6. For a graph G = (V (G), E(G)) and a pair of vertices u, v ∈ V (G),
we let the graph distance distG (u, v) between u and v be equal to the minimal number of
edges in a path linking u and v . When u and v are not in the same connected component,
we set distG (u, v) = ∞. We are interested in settings where G has a high amount of
connectivity, so that many pairs of vertices are connected to one another by short paths. In
order to describe the typical distances between vertices, we draw $o_1$ and $o_2$ independently and uniformly at random (uar) from $V(G)$, and consider
\[
\mathrm{dist}_G(o_1, o_2). \tag{1.1.13}
\]
Figure 1.6 Typical distances in the Internet Movie Data base (IMDb) in 2003.
The quantity in (1.1.13) is a random variable even for deterministic graphs, owing to the
presence of the two uar-chosen vertices o1 , o2 ∈ V (G). Figures 1.5 and 1.6 display the
probability mass functions of this random variable for some real-world networks.
Often, we consider distG (o1 , o2 ) conditional on distG (o1 , o2 ) < ∞. This means that we
consider the typical number of edges between a uniformly chosen pair of connected vertices.
As a result, distG (o1 , o2 ) is sometimes referred to as the typical distance.
The nice property of distG (o1 , o2 ) is that its distribution tells us something about all
possible distances in the graph. An alternative and frequently used measure of distance in a
graph is the diameter of the graph $G$, defined as
\[
\mathrm{diam}(G) = \max_{u, v \in V(G)} \mathrm{dist}_G(u, v).
\]
However, the diameter has several disadvantages. First, in many instances, the diameter
is algorithmically more difficult to compute than the typical distances (since one has to
compute the distances between all pairs of vertices and maximize over them). Second, it
is a number instead of a distribution of a random variable, and therefore contains far less
information than the distribution distG (o1 , o2 ). Finally, the diameter is highly sensitive to
relatively small changes in the graph G under consideration. For example, adding a relatively
small string of connected vertices to a graph (each of the vertices in the string having degree
2) may drastically change the diameter, while it hardly influences the typical distances.
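The empirical distributions in Figures 1.5 and 1.6 can be approximated by sampling vertex pairs uar and running breadth-first search, as in the following sketch (ours; the adjacency-list representation and toy graph are assumptions):

```python
import random
from collections import deque

# Sketch: approximate the law of dist_G(o1, o2) by sampling uar vertex
# pairs and running breadth-first search on an adjacency list.
def graph_distance(adj, source, target):
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                if w == target:
                    return dist[w]
                queue.append(w)
    return float("inf")   # o1 and o2 lie in different components

def typical_distance_sample(adj, samples=1000):
    vertices = list(adj)
    draws = [graph_distance(adj, random.choice(vertices),
                            random.choice(vertices)) for _ in range(samples)]
    # Condition on the distance being finite, as for the typical distance.
    return [d for d in draws if d < float("inf")]

random.seed(3)
cycle = {v: [(v - 1) % 10, (v + 1) % 10] for v in range(10)}  # toy graph
print(sorted(set(typical_distance_sample(cycle, samples=100))))
```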
(a) their degree correlations, measuring the extent to which high-degree vertices tend to be
connected to high-degree vertices rather than to low-degree vertices (and vice versa);
(b) their clustering, measuring the extent to which pairs of neighbors of vertices are neighbors
themselves;
(c) their community structure, measuring the extent to which the network has more densely-
connected subgraphs;
(d) their spatial structure, where the spatial component either describes true vertex locations in real-world networks, or instead some latent geometry in them. The spatial structure is such that nearby vertices are more likely to be connected.
See, e.g., the book by Newman (2010) for an extensive discussion of such features, as
well as the algorithmic problems that arise from them. We also refer the reader to Chapter
9, where we discuss several related models that focus on these properties.
In this section we discuss how random graph sequences can be used to model real-world
networks. We start by discussing graph sequences.
Graph Sequences
Motivated by the previous section, in which empirical evidence was discussed showing that
many real-world networks are scale free and small world, we set about the question of how to
model them. Since many networks are quite large, mathematically, we model real-world net-
works by graph sequences (Gn )n≥1 , where Gn = (V (Gn ), E(Gn )) has size |V (Gn )| = n
and we take the limit n → ∞. Since most real-world networks are such that the average
degree remains bounded, we will focus on the sparse regime. In the sparse regime (recall
(1.1.2) and (1.1.3)), it is assumed that
\[
\limsup_{n \to \infty} \mathbb{E}[D_n] = \limsup_{n \to \infty} \frac{1}{|V(G_n)|} \sum_{v \in V(G_n)} d_v^{(G_n)} < \infty. \tag{1.2.1}
\]
Furthermore, we aim to study graphs that are asymptotically well behaved. For example,
we often either assume, or prove, that the typical degree distribution converges, i.e., there
exists a limiting degree random variable D such that
\[
D_n \xrightarrow{d} D, \tag{1.2.2}
\]
where $\xrightarrow{d}$ denotes weak convergence of random variables. Also, we assume that our graphs
are small worlds, which is often translated in the asymptotic sense that there exists a constant
K < ∞ such that
\[
\lim_{n \to \infty} \mathbb{P}(\mathrm{dist}_{G_n}(o_1, o_2) \le K \log n) = 1, \tag{1.2.3}
\]
where $n$ denotes the network size. Sometimes, we even discuss ultra-small worlds, for which
\[
\lim_{n \to \infty} \mathbb{P}(\mathrm{dist}_{G_n}(o_1, o_2) \le \varepsilon \log n) = 1 \tag{1.2.4}
\]
for every ε > 0. In what follows, we discuss random graph models that share these two
features.
1.3 Random Graph Models
We start with the most basic and simple random graph model, which has proved to be a
source of tremendous inspiration, both for its mathematical beauty, as well as for providing
a starting point for the analysis of random graphs.
The Erdős–Rényi random graph has vertex set $[n] = \{1, \dots, n\}$, and the edge $uv$ is occupied or present with probability $p$, and vacant or absent
otherwise, independently of all the other edges. Here we denote the edge between vertices
u, v ∈ [n] by uv . The parameter p is called the edge probability. The above random graph is
denoted by ERn (p). The model is named after Erdős and Rényi, since they made profound
contributions in the study of this model. Exercise 1.3 investigates the uniform nature of
$\mathrm{ER}_n(p)$ with $p = \tfrac12$. Alternatively speaking, $\mathrm{ER}_n(p)$ with $p = \tfrac12$ is the null model, where
we take no properties of the network into account except for the total number of edges. The
vertices in this model have expected degree (n − 1)/2, which is quite large. As a result, this
model is not sparse at all. Thus, we next make this model sparse by making p smaller.
Since each edge is occupied with probability p, we obtain that
\[
\mathbb{P}(D_n = k) = \binom{n-1}{k} p^k (1-p)^{n-1-k} = \mathbb{P}(\mathrm{Bin}(n-1, p) = k), \tag{1.3.1}
\]
where Bin(m, p) is a binomial random variable with m trials and success probability p.
Note that
\[
\mathbb{E}[D_n] = (n-1)p, \tag{1.3.2}
\]
so for this model to be sparse, we need that p becomes small with n. Thus, we take
\[
p = \frac{\lambda}{n}, \tag{1.3.3}
\]
and study the graph as λ is held fixed while n → ∞. In this regime, we know that
\[
D_n \xrightarrow{d} D, \tag{1.3.4}
\]
with D ∼ Poi(λ), where Poi(λ) is a Poisson random variable with mean λ. It turns out that
this result can be strengthened to the statement that the proportion of vertices with degree
k also converges to the probability mass function of a Poisson random variable (see [V1,
Section 5.4], and in particular [V1, Theorem 5.12]), i.e., for every k ≥ 0,
\[
P_k^{(n)} = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{d_v = k\}} \xrightarrow{\mathbb{P}} p_k \equiv \mathrm{e}^{-\lambda} \frac{\lambda^k}{k!}. \tag{1.3.5}
\]
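A small simulation sketch (ours, with arbitrary parameter choices) confirms (1.3.5) by comparing the empirical degree proportions of $\mathrm{ER}_n(\lambda/n)$ with the Poisson probability mass function:

```python
import math
import random
from collections import Counter

# Sketch: sample ER_n(lambda/n) and compare the empirical proportions
# P_k^(n) with the Poisson limit p_k = e^(-lambda) lambda^k / k!.
def erdos_renyi_degrees(n, lam):
    p = lam / n
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:   # edge uv is present independently
                degree[u] += 1
                degree[v] += 1
    return degree

random.seed(4)
n, lam = 2000, 2.0
counts = Counter(erdos_renyi_degrees(n, lam))
for k in range(6):
    p_k = math.exp(-lam) * lam ** k / math.factorial(k)
    print(k, round(counts[k] / n, 3), round(p_k, 3))
```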
We denote the resulting graph by GRGn (w). In many cases, the vertex weights actually
depend on $n$, and it would be more appropriate (but also more cumbersome) to write the weights as $w^{(n)} = (w_v^{(n)})_{v \in [n]}$. To keep the notation simple, we refrain from making the dependence on $n$ explicit. A special case of the generalized random graph occurs when we take $w_v \equiv \frac{n\lambda}{n - \lambda}$, in which case $p_{uv} = \lambda/n$ for all $u, v \in [n]$, so that we retrieve the Erdős–Rényi random graph $\mathrm{ER}_n(\lambda/n)$.
The generalized random graph GRGn (w) is close to many other inhomogeneous random
graph models, such as the random graph with prescribed expected degrees or Chung–Lu
model, denoted by CLn (w), where instead
\[
p_{uv} = p_{uv}^{(\mathrm{CL})} = \min(w_u w_v / \ell_n, 1). \tag{1.3.8}
\]
A further adaptation is the so-called Poissonian random graph or Norros–Reittu model, denoted by $\mathrm{NR}_n(w)$, for which
\[
p_{uv} = p_{uv}^{(\mathrm{NR})} = 1 - \mathrm{e}^{-w_u w_v / \ell_n}. \tag{1.3.9}
\]
Here $\ell_n = \sum_{v \in [n]} w_v$ denotes the total weight, and $F_n(x) = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{w_v \le x\}}$ denotes the empirical distribution function of the weights. We can interpret $F_n$ as the distribution of the weight of a uniformly chosen vertex in $[n]$ (see
Exercise 1.7). We denote the weight of a uniformly chosen vertex o in [n] by Wn = wo , so
that, by Exercise 1.7, Wn has distribution function Fn .
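All three rank-1 models can be generated with a single loop over vertex pairs, as in the following sketch (ours; the GRG connection probability $w_u w_v/(\ell_n + w_u w_v)$ is recalled from [V1], since its display falls outside the excerpt above):

```python
import math
import random

# Sketch generating GRG_n(w), CL_n(w), or NR_n(w) from a weight sequence.
def rank_one_graph(weights, kind="GRG"):
    n, ln = len(weights), sum(weights)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            x = weights[u] * weights[v] / ln
            if kind == "GRG":
                p = x / (1.0 + x)        # equals w_u w_v/(l_n + w_u w_v)
            elif kind == "CL":
                p = min(x, 1.0)          # Chung-Lu, (1.3.8)
            else:
                p = 1.0 - math.exp(-x)   # Norros-Reittu, (1.3.9)
            if random.random() < p:
                edges.add((u, v))
    return edges

random.seed(5)
print(len(rank_one_graph([3.0, 2.0, 2.0, 1.0, 1.0], kind="NR")))
```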
The degree distribution can converge only when the vertex weights are sufficiently regular.
We often assume that the vertex weights satisfy the following regularity conditions, which
turn out to imply convergence of the degree distribution in the generalized random graph:
Condition 1.1 (Regularity conditions for vertex weights) There exists a distribution func-
tion F such that, as n → ∞, the following conditions hold:
(a) Weak convergence of vertex weights. As n → ∞,
\[
W_n \xrightarrow{d} W, \tag{1.3.11}
\]
where $W_n$ and $W$ have distribution functions $F_n$ and $F$, respectively. Equivalently, for any $x$ for which $x \mapsto F(x)$ is continuous,
\[
\lim_{n \to \infty} F_n(x) = F(x). \tag{1.3.12}
\]
which is itself random. Therefore, in Condition 1.1 we require random variables to converge,
and there are several notions of convergence that may be used. The notion of convergence
that we assume is convergence in probability (see [V1, Section 6.2]). J
Let us now discuss some canonical examples of weight distributions that satisfy the Reg-
ularity Condition 1.1. A canonical choice is to take
\[
w_v = [1 - F]^{-1}(v/n), \tag{1.3.15}
\]
where $[1-F]^{-1}$ is the generalized inverse function of $1 - F$, defined, for $u \in (0,1)$, by (recall [V1, (6.2.14) and (6.2.15)])
\[
[1 - F]^{-1}(u) = \inf\{x \colon [1-F](x) \le u\}. \tag{1.3.16}
\]
For the choice (1.3.15), we can explicitly compute Fn as (see [V1, (6.2.17)])
\[
F_n(x) = \frac{1}{n} \big( \lfloor n F(x) \rfloor + 1 \big) \wedge 1, \tag{1.3.17}
\]
where x ∧ y denotes the minimum of x, y ∈ R. It is not hard to see that Condition 1.1(a)
holds for $(w_v)_{v \in [n]}$ as in (1.3.15), while Condition 1.1(b) holds when $\mathbb{E}[W] \in (0, \infty)$, and Condition 1.1(c) holds when $\mathbb{E}[W^2] < \infty$, as can be concluded from Exercise 1.9.
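As a concrete instance (our illustration, with a Pareto choice of $F$ as an assumption), the generalized inverse in (1.3.15)–(1.3.16) is explicit:

```python
# Sketch of the canonical choice (1.3.15) for a Pareto weight distribution
# F(x) = 1 - x^(-(tau - 1)) on [1, infinity): here the generalized inverse
# gives w_v = [1 - F]^(-1)(v/n) = (v/n)^(-1/(tau - 1)).
def canonical_weights(n, tau):
    return [(v / n) ** (-1.0 / (tau - 1.0)) for v in range(1, n + 1)]

weights = canonical_weights(10, 2.5)
print([round(w, 3) for w in weights])   # decreasing in v, with w_n = 1
```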
The degree of vertex $v$ in $\mathrm{GRG}_n(w)$ is given by
\[
d_v = \sum_{u \in [n]} \mathbb{1}_{\{uv \in E(\mathrm{GRG}_n(w))\}}. \tag{1.3.18}
\]
For k ≥ 0, we let
\[
P_k^{(n)} = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{d_v = k\}} \tag{1.3.19}
\]
denote the proportion of vertices with degree $k$ in $\mathrm{GRG}_n(w)$. We call $(P_k^{(n)})_{k \ge 0}$ the degree sequence of $\mathrm{GRG}_n(w)$. We denote the probability mass function of a mixed-Poisson distribution by $p_k$, i.e., for $k \ge 0$,
\[
p_k = \mathbb{E}\Big[ \mathrm{e}^{-W} \frac{W^k}{k!} \Big], \tag{1.3.20}
\]
where W is a random variable having distribution function F from Condition 1.1. The main
result concerning the vertex degrees is as follows:
Theorem 1.3 (Degree sequence of GRGn (w)) Assume that Conditions 1.1(a),(b) hold.
Then, for every ε > 0,
\[
\mathbb{P}\Big( \sum_{k=0}^{\infty} |P_k^{(n)} - p_k| \ge \varepsilon \Big) \to 0, \tag{1.3.21}
\]
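The mixed-Poisson limit in (1.3.20) is straightforward to approximate numerically, as in the following Monte Carlo sketch (ours, with a Pareto weight law as an illustrative assumption):

```python
import math
import random

# Sketch: Monte Carlo approximation of the mixed-Poisson probabilities
# p_k = E[e^(-W) W^k / k!] from (1.3.20), for a Pareto weight variable W.
def mixed_poisson_pmf(k, w_samples):
    return sum(math.exp(-w) * w ** k / math.factorial(k)
               for w in w_samples) / len(w_samples)

random.seed(6)
w_samples = [random.random() ** (-1.0 / 1.5) for _ in range(100_000)]
print([round(mixed_poisson_pmf(k, w_samples), 4) for k in range(5)])
```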
conditioned on {dv (X) = dv ∀v ∈ [n]}, is uniform over all graphs with degrees (dv )v∈[n] .
is even.
We wish to construct a simple graph such that $d = (d_v)_{v \in [n]}$ are the degrees of the $n$ vertices. Even when $\ell_n = \sum_{v \in [n]} d_v$ is even, however, this is not always possible. Therefore, instead, we construct a multi-graph. One way of obtaining such a multi-graph with the given degree sequence is to pair the half-edges attached to the different vertices in a uniform way.
Two half-edges together form an edge, thus creating the edges in the graph. Let us explain
this in more detail.
To construct the multi-graph where vertex v has degree dv for all v ∈ [n], we have n
separate vertices and, incident to vertex v , we have dv half-edges. Every half-edge needs
to be connected to another half-edge to form an edge, and by forming all edges we build
the graph. For this, the half-edges are numbered in an arbitrary order from 1 to $\ell_n$. We start by randomly connecting the first half-edge with one of the $\ell_n - 1$ remaining half-edges.
Once paired, two half-edges form a single edge of the multi-graph, and these half-edges are
removed from the list of half-edges that need to be paired. Hence, a half-edge can be seen
as the left or the right half of an edge. We continue the procedure of randomly choosing and
pairing the half-edges until all half-edges are connected, and we call the resulting graph the
configuration model with degree sequence d, abbreviated as CMn (d). The pairing of the
half-edges that induces the configuration model graph is sometimes called a configuration.
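In code, the pairing amounts to a single shuffle of the list of half-edges, as in this sketch (ours):

```python
import random
from collections import Counter

# Sketch of the uniform pairing described above: vertex v contributes d_v
# half-edges; after a uniform shuffle, consecutive half-edges form edges.
def configuration_model(d):
    assert sum(d) % 2 == 0, "the total degree l_n must be even"
    half_edges = [v for v, dv in enumerate(d) for _ in range(dv)]
    random.shuffle(half_edges)           # uniform matching of half-edges
    edges = Counter()
    for i in range(0, len(half_edges), 2):
        u, v = sorted((half_edges[i], half_edges[i + 1]))
        edges[(u, v)] += 1               # entries (v, v) are self-loops
    return edges                         # multi-graph as edge multiplicities

random.seed(7)
print(configuration_model([3, 2, 2, 1]))
```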
A careful reader may worry about the order in which the half-edges are being paired.
In fact, this ordering turns out to be irrelevant since the random pairing of half-edges is
completely exchangeable. It can even be done in a random fashion, which will be useful
when investigating neighborhoods in the configuration model. See e.g., [V1, Definition 7.5
and Lemma 7.6] for more details on this exchangeability.
Interestingly, one can rather explicitly compute the distribution of CMn (d). To do so,
note that CMn (d) is characterized by the random vector (Xuv )1≤u≤v≤n . Here Xuv is the
number of edges between vertex u and v , and Xvv is the number of self-loops incident to
vertex v , so that
\[
d_v = X_{vv} + \sum_{u \in [n]} X_{uv}. \tag{1.3.28}
\]
Note furthermore that Xvv appears twice in (1.3.28), which is natural, since a self-loop
consists of two half-edges. This does not conflict with the definition of dv for GRGn (w),
since $X_{uu} = 0$ and $X_{uv} \in \{0, 1\}$ for $\mathrm{GRG}_n(w)$.
In terms of this notation, and writing $G = (x_{uv})_{u,v \in [n]}$ to denote a multi-graph on $[n]$,
\[
\mathbb{P}(\mathrm{CM}_n(d) = G) = \frac{1}{(\ell_n - 1)!!} \, \frac{\prod_{v \in [n]} d_v!}{\prod_{v \in [n]} 2^{x_{vv}} \prod_{1 \le u \le v \le n} x_{uv}!}. \tag{1.3.29}
\]
See, e.g., [V1, Proposition 7.7] for this result. In particular, P(CMn (d) = G) is the same
for each simple G, where G is simple when xvv = 0 for every v ∈ [n] and xuv ∈ {0, 1}
for every 1 ≤ u < v ≤ n. Thus, the configuration model conditioned on simplicity is a
uniform random graph with the prescribed degree distribution. This is quite relevant, as it
gives a convenient way to obtain such a uniform graph, which is a highly non-trivial fact.
Remark 1.6 (What’s in a name continued?) The name configuration model was invented
by Bollobás (1980), who considered the matching of half-edges to be the configuration on
which the model is based. The model of study for Bollobás (1980) was the uniform simple
random regular graph, where all degrees are the same, as we discuss further below. Molloy
and Reed (1995, 1998) extended it to general degrees. As a result, it is sometimes also called
the Molloy–Reed model. With Xuv equal to the number of edges between vertices u and v ,
\[
\mathbb{E}[X_{uv}] = \frac{d_u d_v}{\ell_n - 1}, \tag{1.3.30}
\]
since each of the $d_v$ half-edges incident to vertex $v$ has probability $d_u/(\ell_n - 1)$ to be con-
nected to vertex u. Since (1.3.30) is close to the edge probability puv in rank-1 random
graphs (recall Remark 1.5), rank-1 random graphs are sometimes called soft configuration
models. The configuration-model degree constraint is instead viewed as a hard constraint.J
The uniform nature of the configuration model conditioned on simplicity partly explains
its popularity, and it has become one of the most highly studied random graph models. It also
implies that, conditioned on simplicity, the configuration model is the null model for a real-
world network where all the degrees are fixed. This allows one to distinguish the relevance
of the degree inhomogeneity from other features of the network, such as its community
structure, clustering, etc.
As for GRGn (w), we again impose regularity conditions on the degree sequence d. In
order to state these assumptions, we introduce some notation. We denote the degree of a
uniformly chosen vertex o in [n] by Dn = do . The random variable Dn has distribution
function Fn given by
\[
F_n(x) = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{d_v \le x\}}, \tag{1.3.31}
\]
which is the empirical distribution of the degrees. We assume that the vertex degrees satisfy
the following regularity conditions:
Condition 1.7 (Regularity conditions for vertex degrees)
(a) Weak convergence of vertex degrees. There exists a distribution function F such that,
as n → ∞,
\[
D_n \xrightarrow{d} D, \tag{1.3.32}
\]
where $D_n$ and $D$ have distribution functions $F_n$ and $F$, respectively. Equivalently, for any $x \in \mathbb{R}$,
\[
\lim_{n \to \infty} F_n(x) = F(x). \tag{1.3.33}
\]
and denote the related degree sequence in the erased configuration model $(P_k^{(\mathrm{er})})_{k \ge 1}$ by
\[
P_k^{(\mathrm{er})} = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{D_v^{(\mathrm{er})} = k\}}. \tag{1.3.39}
\]
From the notation it should be clear that $(p_k^{(n)})_{k \ge 1}$ is a deterministic sequence when $d = (d_v)_{v \in [n]}$ is deterministic, while $(P_k^{(\mathrm{er})})_{k \ge 1}$ is a random sequence, since the erased degrees $(D_v^{(\mathrm{er})})_{v \in [n]}$ form a random vector even when $d = (d_v)_{v \in [n]}$ is deterministic.
Now we are ready to state the main result concerning the degree sequence of the erased
configuration model:
Theorem 1.8 (Degree sequence of erased configuration model with fixed degrees) For
fixed degrees d satisfying Conditions 1.7(a),(b), the degree sequence of the erased config-
uration model $(P_k^{(\mathrm{er})})_{k \ge 1}$ converges in probability to $(p_k)_{k \ge 1}$. More precisely, for every $\varepsilon > 0$,
\[
\mathbb{P}\Big( \sum_{k=1}^{\infty} |P_k^{(\mathrm{er})} - p_k| \ge \varepsilon \Big) \to 0. \tag{1.3.40}
\]
where
\[
\nu = \frac{\mathbb{E}[D(D-1)]}{\mathbb{E}[D]} \tag{1.3.42}
\]
is the expected forward degree. This is a realistic option when $\mathbb{E}[D^2] < \infty$. Unfortunately, this is not an option when the asymptotic degrees obey an asymptotic power law with $\tau \in (2,3)$ (as, e.g., in (1.1.12)), since then $\mathbb{E}[D^2] = \infty$. Note that, by (1.3.29), $\mathrm{CM}_n(d)$
conditioned on simplicity is a uniform random graph with the prescribed degree sequence.
We denote this random graph by UGn (d). We return to the difficulty of generating simple
graphs with infinite-variance degrees in Section 1.3.4 below.
Proof See [V1, Theorem 7.19]. The weak convergence in Condition 1.7(a) follows from
Theorem 1.3.
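Numerically, the empirical counterpart of $\nu$ in (1.3.42) is immediate to compute from a degree sequence, as in this one-line sketch (ours):

```python
# Sketch: the empirical counterpart of nu = E[D(D-1)]/E[D] in (1.3.42),
# computed directly from a degree sequence d.
def empirical_nu(d):
    return sum(dv * (dv - 1) for dv in d) / sum(d)

print(empirical_nu([3, 2, 2, 1]))   # (6 + 2 + 2 + 0) / 8 = 1.25
```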
Remark 1.10 (Proving results for GRGn (w) through CMn (d)) Combined with Theorem
1.4, Theorem 1.9 allows us to prove many results for the generalized random graph by first
proving them for the configuration model under appropriate conditions on its degrees, and
then extending them to the generalized random graph by proving that its degrees satisfy the
assumptions made. In particular, any property that holds in probability for CMn (d) can be
extended to GRGn (w) in this way. See [V1, Sections 6.6 and 7.5] for more details. This
strategy is also frequently used in the present volume. J
(a) the degrees in $\mathrm{CM}_{n'}(d')$ are a truncated version of those in $\mathrm{CM}_n(d)$, i.e., $d'_v = (d_v \wedge b)$ for $v \in [n]$, and $d'_v = 1$ for $v \in [n'] \setminus [n]$;
(b) the total degree in $\mathrm{CM}_{n'}(d')$ is the same as that in $\mathrm{CM}_n(d)$, i.e., $\sum_{v \in [n']} d'_v = \sum_{v \in [n]} d_v$;
(c) for all $u, v \in [n]$, if $u$ and $v$ are connected in $\mathrm{CM}_{n'}(d')$, then so are $u$ and $v$ in $\mathrm{CM}_n(d)$, i.e., $\mathrm{dist}_{\mathrm{CM}_n(d)}(u, v) \le \mathrm{dist}_{\mathrm{CM}_{n'}(d')}(u, v)$ almost surely.
Remark 1.12 (Truncation of degrees in range) The construction that proves Theorem 1.11 is highly flexible, and also allows for a degree truncation that maintains restrictions on the minimal degree $d_{\min} = \min_{v \in [n]} d_v$. Indeed, fix $b \ge 2$. There exists a related configuration model $\mathrm{CM}_{n'}(d')$ satisfying (b) and (c) in Theorem 1.11, while (a) is replaced by $d'_v = d_v$ when $d_v < 2b$, by $d'_v = b$ when $d_v \ge 2b$ for $v \in [n]$, and by $b \le d'_v < 2b$ for $v \in [n'] \setminus [n]$, so that $d'_{\min} = \min_{v \in [n']} d'_v \ge d_{\min} \wedge b$. J
Proof The proof relies on an “explosion” or “fragmentation” of the vertices $[n]$ in $\mathrm{CM}_n(d)$. Label the half-edges from 1 to $\ell_n$. We go through the vertices $v \in [n]$ one by one. When $d_v \le b$, we do nothing. When $d_v > b$, we let $d'_v = b$ and keep the $b$ half-edges with the lowest labels. The remaining $d_v - b$ half-edges are exploded from vertex $v$, in that they are incident to vertices of degree 1 in $\mathrm{CM}_{n'}(d')$, and are given vertex labels above $n$. We give the exploded half-edges the remaining labels of the half-edges incident to $v$. Thus, the half-edges receive labels both in $\mathrm{CM}_n(d)$ as well as in $\mathrm{CM}_{n'}(d')$, and the labels of the half-edges incident to $v \in [n]$ in $\mathrm{CM}_{n'}(d')$ are a subset of those in $\mathrm{CM}_n(d)$. In total, we thus create an extra $n_+ = \sum_{v \in [n]} (d_v - b) \vee 0$ “exploded” vertices of degree 1, and $n' = n + n_+$, where $x \vee y$ denotes the maximum of $x, y \in \mathbb{R}$.

We then pair the half-edges randomly, in the same way in $\mathrm{CM}_n(d)$ as in $\mathrm{CM}_{n'}(d')$. This means that when the half-edge with label $x$ is paired with the half-edge with label $y$ in $\mathrm{CM}_n(d)$, then also the half-edge with label $x$ is paired with the half-edge with label $y$ in $\mathrm{CM}_{n'}(d')$, for all $x, y \in [\ell_n]$.
We now check parts (a)–(c). Obviously parts (a) and (b) follow from the construction. For part (c), we note that all exploded vertices in $[n'] \setminus [n]$ have degree 1. Further, for vertices $u, v \in [n]$, if there exists a path in $\mathrm{CM}_{n'}(d')$ connecting them then the intermediate vertices have degree at least 2, so that they cannot correspond to exploded vertices and must therefore in $\mathrm{CM}_{n'}(d')$ have labels in $[n]$. Thus, the same path of paired half-edges also exists in $\mathrm{CM}_n(d)$, so that $u$ and $v$ are also connected in $\mathrm{CM}_n(d)$.
We conclude by adapting the construction to prove the statement in Remark 1.12. We again go through the vertices $v \in [n]$ one by one. When $d_v < 2b$, we do nothing. When $d_v \ge 2b$, we let $d'_v = b$ and keep the $b$ half-edges with the lowest labels. The remaining $d_v - b$ half-edges are exploded from vertex $v$, in that they are incident to “exploded” vertices that all have degree $b$ in $\mathrm{CM}_{n'}(d')$, possibly except for one vertex that has degree in $[b, 2b)$, and are given vertex labels above $n$. This means that a vertex of degree $d_v \ge 2b$ is replaced by one vertex in $[n]$ and $\lfloor d_v/b \rfloor - 1$ vertices in $[n'] \setminus [n]$, of which all, possibly except for the last vertex, have degree $b$, and the degree of the last vertex equals $d_v - b(\lfloor d_v/b \rfloor - 1) \in [b, 2b)$. We again give the exploded half-edges the remaining labels of the half-edges incident to $v$. This identifies the desired construction for Remark 1.12. For part (c), we note that the half-edges incident to exploded vertices arise from the same vertex in $[n]$ as before explosion, so a path between vertices $u', v' \in [n']$ in $\mathrm{CM}_{n'}(d')$ implies that a path between the vertices $u, v \in [n]$ that correspond to $u', v'$ exists. This implies that part (c) holds.
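On the level of degree sequences, the explosion in the proof of Theorem 1.11 reads as follows (a sketch of ours, for the version with degree-1 exploded vertices):

```python
# Sketch of the explosion in the proof of Theorem 1.11: degrees above b
# are capped at b, and every removed half-edge becomes a new vertex of
# degree 1, so that the total degree is preserved.
def truncate_degrees(d, b):
    d_prime = [min(dv, b) for dv in d]
    n_plus = sum(max(dv - b, 0) for dv in d)   # number of exploded vertices
    d_prime += [1] * n_plus
    assert sum(d_prime) == sum(d)              # property (b) of Theorem 1.11
    return d_prime

print(truncate_degrees([5, 3, 1, 1], b=2))    # [2, 2, 1, 1] plus four 1s
```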
equals dv for all v ∈ [n]. We assume that such a simple graph exists, i.e., we assume that
d = (dv )v∈[n] is graphical.
In order to describe the dynamics of the switch chain, choose two edges {u, v} and {x, y}
uar from the edge set E(G), where G is the current simple graph. The possible switches of
these two edges are (1) {u, x} and {v, y}; (2) {v, x} and {u, y}; and (3) {u, v} and {x, y}
(so that no change is made). Choose each of these three options with probability equal to $\tfrac13$, and write the chosen edges as $e_1, e_2$. Accept the switch when the resulting graph with edge set $\{e_1, e_2\} \cup \big( E(G) \setminus \{\{u, v\}, \{x, y\}\} \big)$ is simple, and reject the switch otherwise (so that the
graph remains unchanged under the dynamics).
It is not hard to see that the resulting Markov chain is aperiodic and irreducible. Further,
the switch chain is doubly stochastic since it is reversible. As a result, its stationary dis-
tribution is the uniform random graph with prescribed degree sequence d, which we have
denoted by UGn (d), as required.
The above method works rather generally, and, in the limit of infinitely many switches,
produces a sample from UGn (d) for every graphical degree sequence, even when the de-
grees are large. As a result, this chain is the method of choice to produce a sample of
UGn (d) when the probability of simplicity of the configuration model vanishes. However,
it is unclear precisely how often one needs to switch in order for the Markov chain to be
sufficiently close to the uniform (and thus stationary) distribution. See the notes in Section
1.6 for a discussion of the history of the switch chain, as well as the available results about
its convergence.
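A single step of the switch chain can be sketched as follows (our illustration; the graph is stored as a set of frozenset edges so that degenerate proposals are easy to reject):

```python
import random

# Sketch of one switch-chain step on a simple graph; vertex degrees are
# preserved by every accepted move.
def switch_step(edges):
    e1, e2 = random.sample(list(edges), 2)
    (u, v), (x, y) = sorted(e1), sorted(e2)
    option = random.randrange(3)
    if option == 2:
        return edges                     # option (3): keep {u,v}, {x,y}
    if option == 0:
        new1, new2 = frozenset((u, x)), frozenset((v, y))
    else:
        new1, new2 = frozenset((v, x)), frozenset((u, y))
    candidate = (edges - {e1, e2}) | {new1, new2}
    # Reject self-loops (singleton frozensets) and multi-edges (collisions).
    if len(new1) < 2 or len(new2) < 2 or len(candidate) != len(edges):
        return edges
    return candidate

random.seed(8)
g = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
for _ in range(10):
    g = switch_step(g)
print(sorted(tuple(sorted(e)) for e in g))
```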
Remark 1.14 (Relation to $\mathrm{ECM}_n(d)$ and $\mathrm{GRG}_n(w)$) Theorem 1.13 shows that, when $d_u d_v \gg \ell_n$,
\[
1 - \mathbb{P}(\{u, v\} \in E(\mathrm{UG}_n(d))) = (1 + o(1)) \frac{\ell_n}{d_u d_v}. \tag{1.3.46}
\]
since $d_1 \le (c_F n)^{1/(\tau - 1)} + 1$. Since $\tau \in (2, 3)$, the above is $o(n)$.
Proof of Theorem 1.13. To compute the asymptotics of P({u, v} ∈ E(UGn (d)) | EU ),
we switch between two classes of graphs, S and S̄. Class S consists of graphs in which all
edges in {{u, v}} ∪ U are present, whereas S̄ consists of all graphs in which every {s, t} ∈ U is
present, but {u, v} is not. Recall that EU = {{s, t} ∈ E(UGn (d)) ∀ {s, t} ∈ U } denotes
the event that {s, t} is an edge for every {s, t} ∈ U . Then, since the law on simple graphs
is uniform (see also Exercise 1.18),
P({u, v} ∈ E(UGn (d)) | EU ) = |S|/(|S| + |S̄|) = 1/(1 + |S̄|/|S|), (1.3.50)
and we are left to compute the asymptotics of |S̄|/|S|.
For this, we define an operation called a forward switching that converts a graph G ∈
S into a graph G′ ∈ S̄ (see Figure 1.7). The reverse operation, converting G′ into G, is called a backward
switching.

Figure 1.7 Forward and backward switchings. The edge {u, v} is present on the
left, but not on the right.
Then we estimate |S̄|/|S| by counting the number of forward switchings that can
be applied to the graph G ∈ S, and the number of backward switchings that can be applied
to the graph G′ ∈ S̄. In our switching, we wish to have control on whether {u, v} is present
or not, so we tune it to take this restriction into account.
The forward switching on G ∈ S is defined by choosing two edges and specifying their
ends as {x, a} and {y, b}. We write these as directed edges (x, a) and (y, b), since the roles of
x and a (and of y and b) are different, as indicated in Figure 1.7. We assume that EU occurs.
The choice must satisfy the following constraints:
For (b′), there are O(1) choices for choosing {a, b} since |U | = O(1), and at most
(du − |Uu |)(dv − |Uv |) choices for x and y. Thus, the number of choices for case (b′) is
O((du − |Uu |)(dv − |Uv |)) = o((du − |Uu |)(dv − |Uv |)ℓn ).
For (c′), the case where a or b is equal to x or y corresponds to a 2-path starting from u or
v together with a single edge from u or v. Since o(ℓn ) bounds the number of 2-paths starting
from u or v, and du − |Uu | + dv − |Uv | bounds the number of ways to choose the single edge,
there are o(ℓn (dv − |Uv |)) + o(ℓn (du − |Uu |)) total choices. If a or b is equal to u or v, there
are (du − |Uu |)(dv − |Uv |) ways to choose x and y, and at most du + dv ways to choose
the last vertex as a neighbor of u or v. Thus, there are O((du − |Uu |)(dv − |Uv |)dmax ) =
o((du − |Uu |)(dv − |Uv |)ℓn ) total choices, since dmax = O(n1/(τ −1) ) = o(n) = o(ℓn ).
We conclude that the number of backward switchings that can be applied to any graph
G′ ∈ S̄ is (du − |Uu |)(dv − |Uv |)ℓn (1 + o(1)), so that

E[b(G′ )] = (du − |Uu |)(dv − |Uv |)ℓn (1 + o(1)). (1.3.54)
Conclusion
Combining (1.3.52), (1.3.53), and (1.3.54) results in
|S̄|/|S| = (1 + o(1)) ℓ2n /((du − |Uu |)(dv − |Uv |)ℓn ), (1.3.55)

and thus (1.3.50) yields

P({u, v} ∈ E(UGn (d)) | EU ) = 1/(1 + |S̄|/|S|)
= (1 + o(1)) (du − |Uu |)(dv − |Uv |)/(ℓn + (du − |Uu |)(dv − |Uv |)). (1.3.56)
Remark 1.16 (Uniform random graphs and configuration models) Owing to the close links
between uniform random graphs with prescribed degrees and configuration models, we treat
the two models together, in Chapters 4 and 7. J
This preferential attachment mechanism is called affine, since the attachment probabilities
in (1.3.57) depend in an affine way on the degrees of the random graph PA(1,δ) n (a).
The model with m > 1 is defined in terms of the model for m = 1 as follows. Fix δ ≥
−m. We start with PA(1,δ/m)mn (a), and denote the vertices in PA(1,δ/m)mn (a) by
v1(1) , . . . , vmn(1) . Then we identify or collapse the m vertices v1(1) , . . . , vm(1) in
PA(1,δ/m)mn (a) to become vertex v1(m) in PA(m,δ)n (a). In doing so, we let all the edges
that are incident to any of the vertices in v1(1) , . . . , vm(1) be incident to the new vertex
v1(m) in PA(m,δ)n (a). Then, we collapse the m vertices vm+1(1) , . . . , v2m(1) in
PA(1,δ/m)mn (a) to become vertex v2(m) in PA(m,δ)n (a), etc. More generally, we collapse
the m vertices v(j−1)m+1(1) , . . . , vjm(1) in PA(1,δ/m)mn (a) to become vertex vj(m) in
PA(m,δ)n (a). This defines the model for general m ≥ 1.
The resulting graph PA(m,δ) n (a) is a multi-graph with precisely n vertices and mn edges,
so that the total degree is equal to 2mn. The model with δ = 0 is sometimes called the
proportional model. The inclusion of the extra parameter δ > −m is relevant, though, as
we will see later. It can be useful to think of edges and vertices as carrying weights, where a
vertex has weight δ and an edge has weight 1. Then, the vertex vn+1(1) attaches its edges with
a probability proportional to the weight of the vertex plus the edges to which it is incident.
This, for example, explains why PA(1,δ/m)mn (a) needs to be used in the collapsing procedure,
rather than PA(1,δ)mn (a).
The preferential attachment model (PA(m,δ) n (a))n≥1 is increasing in time, in the sense
that vertices and edges, once they have appeared, remain there forever. Thus, the degrees
are monotonically increasing in time. Moreover, vertices with a high degree have a higher
chance of attracting further edges of later vertices. Therefore, the model is sometimes called
the rich-get-richer model. It is not hard to see that Di (n) −→a.s. ∞ for each fixed i ≥ 1,
as n → ∞ (see Exercise 1.20). As a result, one could also call the preferential attachment
model the old-get-richer model.
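The affine mechanism is easy to simulate. The following minimal sketch (our own illustration, not the book's code) grows the degree sequence for m = 1 and δ > −1; the starting graph and self-loop conventions differ between versions (a), (b), and (d) of the model, so this shows the mechanism rather than any one definitive version:

```python
import random

# Each new vertex attaches one edge to an old vertex chosen with probability
# proportional to (degree + delta); this is the essential affine rule.

def pa_degrees(n, delta, seed=None):
    rng = random.Random(seed)
    deg = [2]                                # start: one vertex with a self-loop
    for _ in range(n - 1):
        total = sum(d + delta for d in deg)  # positive, since delta > -1
        u = rng.random() * total
        acc, target = 0.0, len(deg) - 1
        for i, d in enumerate(deg):
            acc += d + delta
            if u <= acc:
                target = i
                break
        deg[target] += 1                     # the old vertex gains the edge...
        deg.append(1)                        # ...and the new vertex has degree 1
    return deg

# The earliest vertices tend to have the largest degrees: the old get richer.
print(sorted(pa_degrees(2000, delta=0.0, seed=1), reverse=True)[:5])
```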
The following theorem describes the evolution of the degree of fixed vertices:
Theorem 1.17 (Degrees of fixed vertices) Consider PA(m,δ)n (a) with m ≥ 1 and δ > −m.
Then, Di (n)/n1/(2+δ/m) converges almost surely to a random variable ξi as n → ∞.
for the (random) proportion of vertices with degree k at time n. For m ≥ 1 and δ > −m,
we define (pk )k≥0 by pk = 0 for k = 0, . . . , m − 1 and, for k ≥ m,
pk = (2 + δ/m) Γ(k + δ)Γ(m + 2 + δ + δ/m) / (Γ(m + δ)Γ(k + 3 + δ + δ/m)). (1.3.60)
It turns out that (pk )k≥0 is a probability mass function (see [V1, Section 8.4]). It arises as
the limiting degree distribution for PA(m,δ)n (a), as shown in the following theorem:
Theorem 1.18 (Degree sequence in preferential attachment model) Consider PA(m,δ)n (a)
with m ≥ 1 and δ > −m. There exists a constant C = C(m, δ) > 0 such that, as n → ∞,

P( maxk |Pk (n) − pk | ≥ C √(log n/n) ) = o(1). (1.3.61)
Proof See [V1, Theorem 8.3].
We next investigate the scale-free properties of (pk )k≥0 by investigating the asymptotics
of pk for k large. By (1.3.60) and Stirling’s formula, as k → ∞ we have
pk = cm,δ k −τ (1 + O(1/k)), (1.3.62)
where

τ = 3 + δ/m > 2, and cm,δ = (2 + δ/m) Γ(m + 2 + δ + δ/m)/Γ(m + δ). (1.3.63)
Therefore, by Theorem 1.18 and (1.3.62), the asymptotic degree sequence of PA(m,δ)n (a) is
close to a power law with exponent τ = 3 + δ/m. We note that any exponent τ > 2 can be
obtained by choosing δ > −m and m ≥ 1 appropriately.
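The asymptotics (1.3.62)–(1.3.63) can be checked numerically from (1.3.60). The following sketch (ours) evaluates pk stably via log-gamma functions; log pk / log k should approach −τ = −(3 + δ/m) as k grows:

```python
from math import lgamma, exp, log

# Evaluate p_k from (1.3.60) using log-gamma to avoid overflow.
def p_k(k, m, delta):
    a = log(2 + delta / m)
    a += lgamma(k + delta) + lgamma(m + 2 + delta + delta / m)
    a -= lgamma(m + delta) + lgamma(k + 3 + delta + delta / m)
    return exp(a)

m, delta = 2, 1.0
tau = 3 + delta / m
for k in [10, 100, 1000, 10000]:
    # log p_k / log k should approach -tau = -3.5 here
    print(k, log(p_k(k, m, delta)) / log(k))
```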
P( v(m)n+1,j+1 −→ v(m)i | PA(m,δ)n,j (d) ) = (Di (n, j) + δ)/(n(2m + δ)) for i ∈ [n]. (1.3.65)
Here, Di (n, j) is the degree of vertex v(m)i after the connection of the edges incident to the
first n vertices, as well as the first j edges incident to vertex v(m)n+1 , and PA(m,δ)n,j (d) is the
graph of the first n vertices, as well as the first j edges incident to vertex v(m)n+1 . The model
is by default connected, and at time n consists of n + 1 vertices and mn edges. For m = 1,
apart from the different starting graphs, models (b) and (d) are identical. Indeed, PA(1,δ)n (b)
for n = 2 consists of two vertices with two edges between them, while PA(1,δ)n (d) for n = 2
consists of two vertices with one edge between them.
Many other adaptations are possible and have been investigated in the literature, such
as settings where the m edges incident to v(m)n+1 are independently connected as in (1.3.65)
with j = 0. We refrain from discussing these. It is not hard to verify that Theorem 1.18
holds for all these adaptations, which explains why authors have often opted for the version
of the model that is most convenient for them. From the perspective of local convergence, it
turns out that (PA(m,δ)n (d))n≥1 is the most convenient, as we will see in Chapter 5. On the
other hand, Theorem 1.17 requires minor adaptations between the models, particularly since
the limiting random variables (ξi )i≥1 do depend on the precise model.
In particular, log(1/pk )/ log k → 1 + 1/γ when f (k)/k → γ ∈ (0, 1) (see Exercise 1.23).
Remarkably, when f (k) = γk + β, the power-law exponent of the degree distribution does
not depend on β. The restriction that f (k + 1) − f (k) < 1 is needed to prevent the degrees
from exploding. Further, log(1/pk ) ∼ k1−α /(γ(1 − α)) when f (k) ∼ γk α for some
α ∈ (0, 1) (see Exercise 1.24). Interestingly, there exists a persistent hub, i.e., a vertex
that has maximal degree for all but finitely many times, when Σk≥1 1/f (k)2 < ∞. When
Σk≥1 1/f (k)2 = ∞, this does not happen.
in the same universality class, or rather in different ones, and why. We will see that the
degree distribution decides the universality class for a wide range of models, as one might
possibly hope. This also explains why the degree distribution plays such a dominant role in
the investigation of random graphs. See Chapter 9 for more details.
In this book, we frequently deal with random variables having an (asymptotic) power-law
distribution. For such random variables, we often need to investigate truncated moments,
and we also often deal with their sized-biased distribution. In this section, we collect some
results concerning power-law random variables. We start by recalling the definition of a
power-law distribution:
Definition 1.19 (Power-law distributions) We say that X has a power-law distribution with
exponent τ when there exists a function x ↦ L(x) that is slowly varying at infinity such
that
1 − FX (x) = P(X > x) = L(x)x−(τ −1) . (1.4.1)
Here, we recall that a function x ↦ L(x) is slowly varying at infinity when, for every t > 0,

limx→∞ L(xt)/L(x) = 1. (1.4.2)

J
A crucial result about slowly varying functions is Potter’s Theorem, which we next recall:
Theorem 1.20 (Potter’s Theorem) Let x ↦ L(x) be slowly varying at infinity. For every
δ > 0, there exists a constant Cδ ≥ 1 such that, for all x ≥ 1,
x−δ /Cδ ≤ L(x) ≤ Cδ xδ . (1.4.3)
Theorem 1.20 implies that the tail of any general power-law distribution, as in Definition
1.19, can be bounded above and below by that of a pure power-law distribution (i.e., one
without a slowly varying function) with a slightly adapted power-law exponent. As a result,
we can often deal with pure power laws instead.
We continue by studying the relation between power-law tails of the empirical degree
distribution and bounds on the degrees themselves:
Lemma 1.21 (Tail and degree bounds) Let d = (dv )v∈[n] be a degree sequence, d(1) ≥
d(2) ≥ · · · ≥ d(n−1) ≥ d(n) its non-increasing ordered version, and

Fn (x) = (1/n) Σv∈[n] 1{dv ≤x} (1.4.4)

its empirical distribution function. Then

[1 − Fn ](x) ≤ cF x−(τ −1) ∀x ≥ 1 (1.4.5)

implies that

d(v) ≤ (cF n/v)1/(τ −1) + 1 ∀v ∈ [n], (1.4.6)

while

d(v) ≤ (cF n/v)1/(τ −1) ∀v ∈ [n] (1.4.7)

implies that

[1 − Fn ](x) ≤ cF x−(τ −1) ∀x ≥ 1. (1.4.8)
Proof Assume first that (1.4.5) holds. For every v ∈ [n], the number of vertices with
degree at least d(v) is at least v . By (1.4.5), for every v ∈ [n],
cF n(d(v) − 1)1−τ ≥ n[1 − Fn ](d(v) − 1) ≥ v. (1.4.9)
Thus, d(v) ≤ (cF n/v)1/(τ −1) + 1, as required.
Next, assume that (1.4.7) holds. Then
[1 − Fn ](x) = (1/n) Σv∈[n] 1{dv >x} = (1/n) Σv∈[n] 1{d(v) >x}
≤ (1/n) Σv∈[n] 1{(cF n/v)1/(τ −1) >x}
= (1/n) Σv∈[n] 1{v<ncF x−(τ −1) } ≤ cF x−(τ −1) , (1.4.10)
as required.
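Both implications of Lemma 1.21 can be illustrated on a concrete sequence. In the following sketch (our own toy example; cF and τ are chosen arbitrarily), the degree sequence is built so that (1.4.7) holds, and the bounds (1.4.6) and (1.4.8) are then verified directly:

```python
# Toy degree sequence with d_(v) = floor((c_F n / v)^{1/(tau-1)}).
n, tau, c_F = 10_000, 2.5, 1.0
d = sorted((int((c_F * n / v) ** (1 / (tau - 1))) for v in range(1, n + 1)),
           reverse=True)

# Degree bound as in (1.4.6): the v-th largest degree is at most
# (c_F n / v)^{1/(tau-1)} + 1.
assert all(d[v - 1] <= (c_F * n / v) ** (1 / (tau - 1)) + 1
           for v in range(1, n + 1))

# Tail bound (1.4.8): the empirical tail is at most c_F x^{-(tau-1)}.
for x in [1, 2, 5, 10, 50]:
    tail = sum(dv > x for dv in d) / n
    assert tail <= c_F * x ** (1 - tau) + 1e-12
```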
We next study truncated moments of random variables whose tail is bounded by that of a
power law:
Lemma 1.22 (Truncated moments) Let X be a non-negative random variable whose dis-
tribution function FX (x) = P(X ≤ x) satisfies, for every x ≥ 1,
1 − FX (x) ≤ CX x−(τ −1) . (1.4.11)
Then, for all a < τ − 1, there exists a constant CX (a) such that, for all ` ≥ 1,
E[X a 1{X>`} ] ≤ CX (a)`a−(τ −1) , (1.4.12)
while, for a > τ − 1 and all ` ≥ 1,
E[X a 1{X≤`} ] ≤ CX (a)`a−(τ −1) . (1.4.13)
Proof We note that, for any cumulative distribution function x ↦ FX (x) on the non-
negative reals, we have a partial integration identity, stating that, for every f : R → R,

∫_u^∞ f (x) FX (dx) = f (u)[1 − FX (u)] + ∫_u^∞ [f (x) − f (u)] FX (dx)
= f (u)[1 − FX (u)] + ∫_u^∞ ∫_u^x f ′(y) dy FX (dx)
= f (u)[1 − FX (u)] + ∫_u^∞ f ′(y) ∫_y^∞ FX (dx) dy
= f (u)[1 − FX (u)] + ∫_u^∞ f ′(y)[1 − FX (y)] dy, (1.4.14)

provided that either (a) y ↦ f ′(y)[1 − FX (y)] is absolutely integrable, or (b) x ↦ f (x) is
either non-decreasing or non-increasing, so that f ′(y)[1 − FX (y)] has a fixed sign. Here, the
interchange of the integration order is allowed by Fubini’s Theorem for non-negative func-
tions (Halmos, 1950, Section 36, Theorem B) when x ↦ f (x) is non-decreasing, and by
Fubini’s Theorem for absolutely integrable functions (Halmos, 1950, Section 36, Theorem
C) when y ↦ f ′(y)[1 − FX (y)] is absolutely integrable. Similarly, for f with f (0) = 0,

∫_0^u f (x) FX (dx) = ∫_0^u ∫_0^x f ′(y) dy FX (dx) = ∫_0^u f ′(y) ∫_y^u FX (dx) dy
= ∫_0^u f ′(y)[FX (u) − FX (y)] dy. (1.4.15)
Let us introduce some standard notation used throughout this book, and recall some prop-
erties of trees and Poisson processes.
Abbreviations
We write rhs for right-hand side, and lhs for left-hand side. Further, we abbreviate with
respect to by wrt.
Random variables
We write X =d Y to denote that X and Y have the same distribution. We write X ∼ Be(p)
when X has a Bernoulli distribution with success probability p, i.e., P(X = 1) = 1 −
when X has a Bernoulli distribution with success probability p, i.e., P(X = 1) = 1 −
P(X = 0) = p. We write X ∼ Bin(n, p) when the random variable X has a binomial
distribution with parameters n and p, and we write X ∼ Poi(λ) when X has a Poisson
distribution with parameter λ.
We write X ∼ Exp(λ) when X has an exponential distribution with mean 1/λ. We
write X ∼ Gam(r, λ) when X has a gamma distribution with scale parameter λ and shape
parameter r, for which the density, for x ≥ 0, is given by
fX (x) = λr xr−1 e−λx /Γ(r), (1.5.1)
where r, λ > 0, and we recall (1.3.58), while fX (x) = 0 for x < 0. The random variable
Gam(r, λ) has mean r/λ and variance r/λ2 . Finally, we write X ∼ Beta(α, β) when X
has a beta distribution with parameters α, β > 0, so that X has density, for x ∈ [0, 1],
fX (x) = xα−1 (1 − x)β−1 /B(α, β), (1.5.2)
where
B(α, β) = Γ(α)Γ(β)/Γ(α + β) (1.5.3)
is the Beta-function, while fX (x) = 0 for x ∉ [0, 1]. We sometimes abuse notation, and
write, e.g., P(Bin(n, p) = k) to denote P(X = k) when X ∼ Bin(n, p).
We call a sequence of random variables (Xi )i≥1 independent and identically distributed
(iid) when they are independent, and Xi has the same distribution as X1 for every i ≥ 1.
For a finite set X , we say that X ∈ X is drawn uniformly at random (uar) when X has the
uniform distribution on X .
Stochastic Domination
We recall that a random variable X is stochastically dominated by a random variable Y
when FX (x) = P(X ≤ x) ≥ FY (x) = P(Y ≤ x) for every x ∈ R. We write this as
X ⪯ Y . See [V1, Section 2.3] for more details on stochastic ordering.
|u| < |v| or |u| = |v| and u = ∅u1 · · · uk and v = ∅v1 · · · vk are such that (u1 , . . . , uk ) <
(v1 , . . . , vk ) in the lexicographic sense. J
We next explain the breadth-first exploration of t:
Definition 1.25 (Breadth-first exploration of a tree) For a tree t of size |V (t)| = t, we let
(ai )ti=1 be the elements of V (t), ordered according to the breadth-first ordering of t (recall
Definition 1.24). For i ≥ 1, let xi denote the number of children of vertex ai . Thus, if dv
denotes the degree of v ∈ V (t) in the tree t, we have x1 = da1 = d∅ and xi = dai − 1 for
i ≥ 2. The recursion
si = si−1 + xi − 1 for i ≥ 1, with s0 = 1, (1.5.7)
describes the evolution of the number of unexplored vertices in the breadth-first exploration.
Thus, for a finite tree t of size |V (t)| = t, we have si > 0 for i ∈ {0, . . . , t − 1}, while st = 0. J
The sequence (xi )ti=1 gives an alternative encoding of the tree t that is often convenient.
Indeed, by Exercise 1.28, the sequence (xi )ti=1 is in one-to-one correspondence with t.
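This encoding and its inverse are easy to make concrete. In the following sketch (ours), a tree whose vertices are labeled in breadth-first order is encoded into (xi ) and decoded back, and the recursion (1.5.7) is checked:

```python
from collections import deque

def encode(children, root=0):
    """x_i = number of children of the i-th vertex in breadth-first order."""
    xs = []
    queue = deque([root])
    while queue:
        v = queue.popleft()
        xs.append(len(children[v]))
        queue.extend(children[v])
    return xs

def decode(xs):
    """Rebuild the child lists, labeling vertices in breadth-first order."""
    children = [[] for _ in xs]
    nxt = 1                        # label of the next unseen vertex
    for i, x in enumerate(xs):
        children[i] = list(range(nxt, nxt + x))
        nxt += x
    return children

t = [[1, 2], [3, 4], [], [], []]   # a 5-vertex tree, labels in BFS order
xs = encode(t)                     # [2, 2, 0, 0, 0]
assert decode(xs) == t

# The recursion s_i = s_{i-1} + x_i - 1, s_0 = 1, hits 0 exactly at i = |V(t)|:
s = 1
for i, x in enumerate(xs, start=1):
    s += x - 1
    assert (s > 0) == (i < len(xs))
```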
Sections 1.1–1.3 are for the most part summaries of chapters in Volume 1, to which we refer for notes and
discussion, so we restrict ourselves here to the exceptions.
The literature on switch chains focusses on two key aspects: first, their rapid mixing (Erdős et al. (2022); Gao and Greenhill (2021),
and various related papers, for which we refer to Erdős et al. (2022)), and, second, counting the number
of simple graphs using switch chain arguments (as in Gao and Wormald (2016)), which is the approach
that we take in this section. Rapid mixing means that the mixing time of the switch chain is bounded by
an explicit power of the number of vertices (or number of edges, or both combined). The powers, however,
tend to be large, and thus “rapid mixing” may not be rapid enough to give good guarantees when one is
trying to sample a uniform random graph of the degree distribution of some real-world network. Theorem
1.13 is adapted from Gao et al. (2020), where it was used to compute the number of triangles in uniform
random graphs with power-law degree distributions having infinite variance. See also Janson (2020b) for a
relation between the configuration model and uniform random graphs using switchings.
Preferential attachment models were first introduced in the context of complex networks by Barabási and
Albert (1999). Bollobás et al. (2001) studied the model by Barabási and Albert (1999), and later many other
papers followed on this, and related, models. Barabási and Albert (1999) and Bollobás et al. (2001) focussed
on the proportional model, for which δ = 0. The affine model was proposed by Dorogovtsev et al. (2000).
All these works were pre-dated by Price (1965); Simon (1955); Yule (1925); see [V1, Chapter 8] for more
details on the literature. The Bernoulli preferential attachment model was introduced and investigated by
Dereich and Mörters (2009, 2011, 2013).
Exercise 1.9 (Domination weights) Let Wn have the distribution function Fn from (1.3.17). Show that
Wn is stochastically dominated by the random variable W having distribution function F . Here we recall
that Wn is stochastically dominated by W when P(Wn ≤ w) ≥ P(W ≤ w) for all w ∈ R.
Exercise 1.10 (Degree of uniformly chosen vertex in GRGn (w)) Prove that the asymptotic degree in
GRGn (w) satisfies (1.3.22) under the conditions of Theorem 1.3.
Exercise 1.11 (Power-law degrees in generalized random graphs) Prove that, under the conditions of
Theorem 1.3, the degree power-law tail in (1.3.24) for GRGn (w) follows from the weight power-law tail
in (1.3.23). Does the converse also hold?
Exercise 1.12 (Degree example) Let the degree sequence d = (dv )v∈[n] be given by
dv = 1 + (v mod 3). (1.7.3)
Show that Conditions 1.7(a)–(c) hold. What is the limiting degree variable D?
Exercise 1.13 (Poisson degree example) Let the degree sequence d = (dv )v∈[n] satisfy
nk /n → e−λ λk /k! (1.7.4)
and
Σk≥0 k nk /n → λ, Σk≥0 k2 nk /n → λ(λ + 1). (1.7.5)
Show that Conditions 1.7(a)–(c) hold. What is the limiting degree variable D?
Exercise 1.14 (Power-law degree example) Consider the random variable D having generating function,
for α ∈ (0, 1),

GD (s) = s + (1 − s)α+1 /(α + 1). (1.7.6)
What is the probability mass function of D?
Exercise 1.15 (Power-law degree example) Consider the random variable D having generating function
(1.7.6) with α ∈ (0, 1). Show that D has an asymptotic power-law distribution and compute its power-law
exponent.
Exercise 1.16 (Power-law degree example (cont.)) Consider the degree sequence d = (dv )v∈[n] with
dv = [1 − F ]−1 (v/n), where F is the distribution of a random variable D having generating function
(1.7.6) with α ∈ (0, 1). Show that Conditions 1.7(a) and (b) hold, but Condition 1.7(c) does not.
Exercise 1.17 (Number of erased edges) Assume that Conditions 1.7(a) and (b) hold. Show that Theorem
1.8 implies that the number of erased edges in ECMn (d) is oP (n).
Exercise 1.18 (Edge probability of uniform random graphs with prescribed degrees) Prove the formula
for the (conditional) edge probabilities in uniform random graphs with prescribed degrees in (1.3.50).
Exercise 1.19 (Edge probability of uniform random graphs with prescribed degrees (cont.)) Prove the
formula for the number of switches with and without a specific edge in uniform random graphs with pre-
scribed degrees in (1.3.51). Hint: Use an “out-is-in” argument: every switch out of a graph in S is a switch
into a graph in S̄, so counting the switches from the S side and from the S̄ side gives the same number.
Exercise 1.20 (Degrees grow to infinity almost surely) Consider the preferential attachment model
PA(m,δ)n (a). Fix m = 1 and i ≥ 1. Prove that Di (n) −→a.s. ∞ as n → ∞, by using Σns=i Is ⪯ Di (n),
where (In )n≥i is a sequence of independent Bernoulli random variables with P(In = 1) = (1 + δ)/(n(2 + δ) + 1 + δ).
What does this imply for m > 1?
Exercise 1.21 (Degrees of fixed vertices) Consider the preferential attachment model PA(m,δ)
n (a). Prove
Theorem 1.17 for m = 1 and δ > −1 using the martingale convergence theorem and the fact that
Mi (n) = (Di (n) + δ)/(1 + δ) · Πn−1s=i−1 ((2 + δ)s + 1 + δ)/((2 + δ)(s + 1)) (1.7.7)

is a martingale.
Abstract
In this chapter we discuss local convergence, which describes the intuitive no-
tion that a finite graph, seen from the perspective of a typical vertex, looks like
a certain limiting graph. Local convergence plays a profound role in random
graph theory.
We give general definitions of local convergence in several probabilistic senses.
We then show that local convergence in its various forms is equivalent to the
appropriate convergence of subgraph counts. We continue by discussing several
implications of local convergence, concerning local neighborhoods, clustering,
assortativity, and PageRank. We further investigate the relation between local
convergence and the size of the giant component, making the statement that the
giant is “almost local” precise.
The local convergence of finite graphs was first introduced by Benjamini and Schramm
(2001) and, in a different context, independently by Aldous and Steele (2004). It describes
the intuitive notion that a finite graph, seen from the perspective of a vertex that is chosen
uar from the vertex set, looks like a certain limiting graph. This is already useful in that
it makes precise the notion that a finite cube in Zd with large side length is locally much
like Zd itself. However, it plays an even more profound role in random graph theory. For
example, local convergence to some limiting tree, which often occurs in random graphs,
as we will see throughout this book, is referred to as locally tree-like behavior. Such trees
are often branching processes; see for example [V1, Section 4.1] where this is worked out
for the Erdős–Rényi random graph. Since trees are generally simpler objects than graphs,
this means that, to understand a random graph, it often suffices to understand a branching
process tree instead.
Local convergence is a central technique in random graph theory, since many properties
of random graphs are in fact determined by a local limit. For example, the asymptotic num-
ber of spanning trees, the partition function of the Ising model, and the spectral distribution
of the adjacency matrix of the graph all turn out to be computable in terms of the local limit.
We refer to Section 2.7 for an extensive discussion of the highly non-trivial consequences of
local convergence. Owing to its enormous power, local convergence has become an indis-
pensable tool in the random graph theory of sparse graphs. In this book we will see several
examples of quantities whose convergence and limit are determined by the local limit, in-
cluding clustering, the size of the giant in most cases, and the PageRank distribution of
sparse random graphs. In this chapter, we lay the general foundations of local convergence.
Local weak convergence is a notion of the weak convergence of finite rooted graphs. In
general, weak convergence is equivalent to the convergence of expectations of continuous
functions. For continuity, one needs a topology. Therefore, we start by discussing the topol-
ogy of rooted graphs that is at the center of local weak convergence. We start with some
definitions:
Definition 2.1 (Locally finite and rooted graphs) A rooted graph is a pair (G, o), where
G = (V (G), E(G)) is a graph with vertex set V (G), edge set E(G), and root vertex
o ∈ V (G). Further, a rooted or non-rooted graph is called locally finite when each of its
vertices has finite degree (though not necessarily uniformly bounded). J
In Definition 2.1, graphs can have finitely or infinitely many vertices, but we always have
graphs in mind that are locally finite. Also, in the definitions below, the graphs are deter-
ministic and we clearly indicate when we move to random graphs instead. We next define
neighborhoods as rooted subgraphs of a rooted graph, for which we recall that distG denotes
the graph distance in the graph G:
Definition 2.2 (Neighborhoods as rooted graphs) For a rooted graph (G, o), we let Br(G) (o)
denote the (rooted) subgraph of (G, o) of all vertices at graph distance at most r away from
o. Formally, this means that Br(G) (o) = (V (Br(G) (o)), E(Br(G) (o)), o), where

V (Br(G) (o)) = {u : distG (o, u) ≤ r}, (2.2.1)
E(Br(G) (o)) = {{u, v} ∈ E(G) : distG (o, u), distG (o, v) ≤ r}.

Also, let ∂Br(G) (o) denote the (unrooted) graph with vertex set V (∂Br(G) (o)) =
V (Br(G) (o)) \ V (Br−1(G) (o)) and edge set E(∂Br(G) (o)) = E(Br(G) (o)) \ E(Br−1(G) (o)). J
2.3), this is indeed a dense countable subset. We discuss the metric structure of the space of
rooted graphs in more detail in Appendix A.3. Exercises 2.4 and 2.5 study such aspects.
In this section, we discuss the local weak convergence of deterministic graphs (Gn , on ),
rooted at a uniform vertex on ∈ V (Gn ), whose size tends to infinity as n → ∞. This section
is organized as follows. In Section 2.3.1, we give the definitions of local weak convergence
of (possibly disconnected) finite graphs. In Section 2.3.2, we provide a convenient criterion
to prove local weak convergence and discuss tightness. In Section 2.3.3, we show that when
the limit has full support on some subset of rooted graphs, convergence can be restricted to
that set. In Section 2.3.4, we discuss two examples of graphs that converge locally weakly.
We close in Section 2.3.5 by discussing the local weak convergence of marked graphs, which
turns out to be useful in many applications of local weak convergence.
We next use these conventions to define the local weak convergence of finite graphs:
Definition 2.6 (Local weak convergence) Let Gn = (V (Gn ), E(Gn )) denote a finite
(possibly disconnected) graph. Let (Gn , on ) be the rooted graph obtained by letting on ∈
V (Gn ) be chosen uar, and restricting Gn to the connected component C (on ) of on in Gn .
We say that (Gn , on ) converges locally weakly to the connected rooted graph (G, o), which
is a (possibly random) element of G? having law µ, when, for every bounded and continuous
function h : G? → R,

E[h(Gn , on )] → Eµ [h(G, o)], (2.3.3)

where the expectation on the rhs of (2.3.3) is wrt (G, o) having law µ, while the expectation
on the lhs is wrt the random vertex on . We denote the above convergence by (Gn , on ) −→d
(G, o). J
Of course, by (2.3.2), the values h(Gn , on ) give you information only about C (on ),
which may be only a small portion of the graph when Gn is disconnected. However, since
we are sampling on ∈ V (Gn ) uar, actually we may “see” every connected component, so
in distribution we do observe the graph as a whole.
Since later we apply local weak convergence ideas to random graphs, we need to be
absolutely clear about with respect to what we take the expectation. Indeed, the expectation
in (2.3.3) is wrt the random root on ∈ V (Gn ), and is thus equal to
E[h(Gn , on )] = (1/|V (Gn )|) Σv∈V (Gn ) h(Gn , v). (2.3.4)
The notion of local weak convergence plays a central role in this book. It may be hard
to grasp, and it also may appear to be rather weak. In what follows, we discuss examples
of graphs that converge locally weakly. Further, in Section 2.5 we discuss examples of how
local weak convergence may be used to obtain interesting consequences for graphs, such as
their clustering and degree–degree dependencies, measured through the assortativity coeffi-
cient. We continue by discussing a convenient criterion for proving local weak convergence.
Theorem 2.7 (Criterion for local weak convergence) The sequence of finite rooted graphs
((Gn , on ))n≥1 converges locally weakly to (G, o) ∼ µ precisely when, for every rooted
graph H? ∈ G? and all integers r ≥ 0,

p(Gn ) (H? ) = (1/|V (Gn )|) Σv∈V (Gn ) 1{Br(Gn ) (v)≃H? } → µ(Br(G) (o) ≃ H? ), (2.3.5)

where Br(Gn ) (v) is the rooted r-neighborhood of v in Gn , and Br(G) (o) is the rooted r-
neighborhood of o in the limiting graph (G, o).
Proof This is a standard weak convergence argument. First, the local weak convergence
in Definition 2.6 implies that (2.3.5) holds, since we can take h(G, o) = 1{Br(G) (o)≃H? } ,
so that p(Gn ) (H? ) = E[h(Gn , on )], and h : G? → {0, 1} is bounded and continuous (see
Exercise 2.6). For the other direction, since µ is a probability measure on G? , the sequence
((Gn , on ))n≥1 is tight; see Theorem A.7 in Appendix A.2. By tightness, every subsequence
of ((Gn , on ))n≥1 has a further subsequence that converges in distribution. We work along
that subsequence, and note that the limiting law is that of (G, o), since the laws of Br(G) (o)
for all r ≥ 1 uniquely identify the law of (G, o) (see Proposition A.15 in Appendix A.3.5).
Since this is true for every subsequence, the local weak limit is (G, o).
Theorem 2.7 shows that the proportion of vertices in Gn whose neighborhoods look like
H? converges to a (possibly random) limit. See Exercise 2.8, where you are asked to con-
struct an example where the local weak limit of a sequence of deterministic graphs actually
is random. You are asked to prove local weak convergence for some examples in Exercises
2.9 and 2.10. Appendix A.3.6 discusses tightness in G? in more detail.
and all r ≥ 1.
We will apply Theorem 2.8 in particular when the limit is almost surely a tree. Then
Theorem 2.8 implies that we have to investigate only rooted graphs H? that are finite trees
of height at most r themselves.
Proof The set T? (r) is countable. Therefore, since µ((G, o) ∈ T? ) = 1, for every ε > 0
there exist an m = m(ε) and a subset T? (r, m) of size at most m such that µ(Br(G) (o) ∈
T? (r, m)) ≥ 1 − ε. Fix this set. Then we bound
Since ε > 0 is arbitrary, we conclude that P(Br(Gn ) (on ) ∉ T? (r)) → 0. In particular, this
and denote the resulting graph by (Gn , on ). We claim that (Gn , on ) −→d (Zd , o), which we
now prove. We rely on Theorem 2.7, which shows that we need to prove the convergence of
subgraph proportions.
Let µ be the point measure on (Zd , o), so that µ(Br(G) (o) ≃ Br(Zd ) (o)) = 1. Thus, by
Theorem 2.8, it remains to show that p(Gn ) (Br(Zd ) (o)) → 1 (recall (2.3.5)). For this, we note
that Br(Gn ) (on ) ≃ Br(Zd ) (o) unless on happens to lie within a distance strictly smaller than r
from one of the boundaries of [n]d . This means that one of the coordinates of on is either in
[r − 1] or in [n] \ [n − r + 1]. Since this occurs with vanishing probability, the claim
follows.
In the above case, we see that the local weak limit is deterministic, as one would have
expected. One can generalize the above to the local weak convergence of tori as well.
and edge set as follows. Let v = ∅v1 · · · vk and u = ∅u1 · · · u` be two vertices. We say
that u is the parent of v when ` = k − 1 and ui = vi for all i ∈ [k − 1]. Then we say
that two vertices u and v are neighbors when u is the parent of v or vice versa. We obtain a
graph with
|V (Td,n )| = 1 + d + · · · + d(d − 1)n−1 (2.3.10)
vertices.
Let on denote a vertex chosen uar from V (Td,n ). To study the local weak limit of
(Td,n , on ), we first consider the so-called canopy tree. For this, we take the graph Td,n ,
root it at any leaf, which we will call the root-leaf, and take the limit n → ∞. Denote
this graph by Tcd , which we consider to be an unrooted graph, but we keep the root-leaf for
reference purposes. This graph has a unique infinite path from the root-leaf. Let oℓ be the
ℓth vertex on this infinite path (the root-leaf being o0 ), and consider (Tcd , oℓ ). Define the
limiting measure µ by

µ((Tcd , oℓ )) ≡ µℓ = (d − 2)(d − 1)−(ℓ+1) , ℓ ≥ 0. (2.3.11)
Fix Gn = Td,n . We claim that (Gn , on ) ≡ (Td,n , on ) −→d (G, o) with law µ in (2.3.11).
We again rely on Theorem 2.7, which shows that we need to prove only the convergence of
the subgraph proportions.
By Theorem 2.8, it remains to show that p(Gn ) (Br(Tcd ) (oℓ )) → µℓ (recall (2.3.5)). When
n is larger than r (which we now assume), Br(Gn ) (on ) ≃ Br(Tcd ) (oℓ ) precisely when on has
distance ℓ from the closest leaf. There are

d(d − 1)k−1 (2.3.12)

vertices at distance k from the root, out of a total of |V (Td,n )| = d(d − 1)n (1 + o(1))/(d − 2).
Having distance ℓ to the closest leaf in Td,n is the same as having distance k = n − ℓ
from the root. Thus,

p(Gn ) (Br(Tcd ) (oℓ )) = d(d − 1)k−1 /|V (Td,n )| → (d − 2)(d − 1)−(ℓ+1) = µℓ , (2.3.13)

as required.
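The computation (2.3.12)–(2.3.13) can be checked numerically. The following sketch (ours) computes, for finite n, the exact proportion of vertices of Td,n at distance ℓ from the closest leaf and compares it with µℓ:

```python
# Fraction of vertices of T_{d,n} at distance l from the leaves vs. the canopy
# measure mu_l = (d-2)(d-1)^{-(l+1)} from (2.3.11).
d, n = 4, 20
sizes = [d * (d - 1) ** (k - 1) for k in range(1, n + 1)]  # vertices at depth k
total = 1 + sum(sizes)                                      # plus the root
for l in range(5):
    # depth k = n - l is exactly distance l from the closest leaf
    frac = sizes[n - l - 1] / total
    print(l, frac, (d - 2) * (d - 1) ** -(l + 1))
```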
We see that, for truncated regular trees, the local weak limit is random, where the ran-
domness originates from the choice of the random root of the graph. More precisely, it is
due to how far away the chosen root is from the leaves of the finite tree. Perhaps
surprisingly, the local limit of a truncated regular tree is not the infinite regular tree.
with φ : V (Br(G1 ) (o1 )) → V (Br(G2 ) (o2 )) running over all isomorphisms between Br(G1 ) (o1 )
and Br(G2 ) (o2 ) satisfying φ(o1 ) = o2 . J
When Ξ is a finite set, we can simply let dΞ (a, b) = 1{a≠b} , so that (2.3.14)–(2.3.15) state
that not only should the neighborhoods Br(G1 ) (o1 ) and Br(G2 ) (o2 ) be isomorphic, but also the
corresponding marks on the vertices and half-edges in Br(G1 ) (o1 ) and Br(G2 ) (o2 ) should all
be the same.
Definition 2.10 puts a metric structure on marked rooted graphs. With this metric topology
in hand, we can simply adapt all convergence statements to this setting. We refrain from
stating all these extensions explicitly. See Exercise 2.17 for an application of marked graphs
to directed graphs. For example, the marked rooted graph setting is a way to formalize the
setting of multi-graphs in Remark 2.4 (see Exercise 2.18).
Having discussed the notion of local weak convergence for deterministic graphs, we now
move on to random graphs. Here the situation becomes more delicate, as now we have
double randomness, both in the random root as well as the random graph. This gives rise to
surprising subtleties.
We next discuss the local convergence of random graphs. This section is organized as fol-
lows. In Section 2.4.1 we define what it means for a sequence of random graphs to converge
locally, as well as which different versions thereof exist. In Section 2.4.2 we then give a use-
ful criterion to verify the local convergence of random graphs. In Section 2.4.3 we prove the
completeness of the limit by showing that, when the limit is supported on a subset of rooted
graphs, then one needs only to verify the convergence for that subset. In many examples that
we encounter in this book, this subset is the collection of trees. We close with two examples:
that of random regular graphs in Section 2.4.4 and that of the Erdős–Rényi random graph
ERn (λ/n) in Section 2.4.5.
Definition 2.11 (Local convergence of random graphs) Let (Gn )n≥1 with Gn = (V (Gn ), E(Gn ))
denote a sequence of finite (possibly disconnected) random graphs. Then,
(a) Gn converges locally weakly to the random rooted graph (Ḡ, ō) ∼ µ̄ when

E[h(Gn , on )] → Eµ̄ [h(Ḡ, ō)] (2.4.1)

for every bounded and continuous function h : G? → R, where the expectation E on the
lhs of (2.4.1) is wrt the random vertex on and the random graph Gn . This is equivalent
to (Gn , on ) −→d (Ḡ, ō).
dom variable due to its dependence on the random graph Gn , converges in probability to
Eµ [h(G, o)], which is possibly also a random variable, in that µ might be a random proba-
bility distribution on G? . When, instead, we have local weak convergence, only expectations
wrt the random graph of the form E[h(Gn , on )] converge, and the limiting measure µ̄ is
deterministic.
Remark 2.12 (Local convergence in probability and rooted versus unrooted graphs) Usu-
ally, if we have a sequence of objects xn living in some space X , and xn converges to
x, then x also lives in X . In the above definitions of local convergence in probability and
almost surely, respectively, we take a graph sequence (Gn )n≥1 that converges locally in
probability and almost surely, respectively, to a rooted graph (G, o) ∼ µ. One might have
guessed that this is related to (Gn , on ) −→P (G, o) and (Gn , on ) −→a.s. (G, o), but in fact
it is quite different. Let us restrict attention to local convergence in probability. Indeed,
(Gn , on ) −→P (G, o) is a very strong and arguably not so useful statement. For one, it re-
quires that ((Gn , on ))n≥1 and (G, o) live on the same probability space, which is not often
evidently the case. Further, sampling on gives rise to a variability in (Gn , on ) that is hard to
capture by the limit (G, o). Indeed, when n varies, the root on in (Gn , on ) will have to be
redrawn every once in a while, and it seems difficult to do this in such a way that (Gn , on )
is consistently close to (G, o). J
Remark 2.13 (Random measure interpretation of local convergence in probability) The
following observations turn local convergence in probability into the convergence of ob-
jects living on the same space, namely, the space of probability measures on rooted graphs.
Denote the empirical neighborhood measure µn on G? by

µn (H? ) = (1/|V (Gn )|) Σv∈V (Gn ) 1{(Gn ,v)∈H? } , (2.4.4)
for every measurable subset H? of G? . Then, (Gn )n≥1 converges locally in probability to
the random rooted graph (G, o) ∼ µ when

µn (H? ) −→P µ(H? ) (2.4.5)

for every measurable subset H? of G? . This is equivalent to Definition 2.11(b), since, for
every bounded and continuous h : G? → R, and denoting the conditional expected value of
h(Gn , on ) when (Gn , on ) ∼ µn by Eµn [h(Gn , on ) | Gn ], we have

Eµn [h(Gn , on ) | Gn ] = (1/|V (Gn )|) Σv∈V (Gn ) h(Gn , v) = E[h(Gn , on ) | Gn ], (2.4.6)
Further, if (Gn )n≥1 converges almost surely to (G, o), then (Gn )n≥1 also converges locally
in probability to (G, o).
Proof Note that Eµ [h(G, o)] is a bounded random variable, and so is E[h(Gn , on ) | Gn ].
Therefore, by the Dominated Convergence Theorem [V1, Theorem A.1], the expectations
also converge. We conclude that

E[h(Gn , on )] = E[E[h(Gn , on ) | Gn ]] → E[Eµ [h(G, o)]], (2.4.9)

which proves that the claim holds, with the limit identified in (2.4.8). The relation between
local convergence almost surely and in probability follows from that for random variables
and Definition 2.11.
In most of our examples, the law µ of the local limit in probability is actually determin-
istic, in which case µ̄ = µ. However, there are some cases where this is not true. A simple
example arises as follows. For ERn (λ/n), the local limit in probability turns out to be a
Poi(λ) branching process (see Section 2.4.5). Therefore, when considering ERn (X/n),
where X is uniform on [0, 2], the local limit in probability will be a Poi(X) branching pro-
cess. Here, the expected numbers of offspring, conditional on the random variable X, are
random and related, as they are all equal to X. This is not the same as a mixed-Poisson branching process
with offspring distribution Poi(X), since, for the local limit in probability of ERn (X/n),
we draw X only once. We refer to Section 2.4.5 for more details on local convergence for
ERn (λ/n).
We have added the notion of local convergence in the almost sure sense, even though
for random graphs this notion is often not highly useful. Indeed, almost sure convergence
for random graphs can already be tricky, since for static models such as the Erdős–Rényi
random graph and the configuration model, there is no obvious relation between the graphs
of size n and those of size n + 1. This of course is different for the preferential attachment
model, which forms a (consistent) random graph process.
(c) Gn converges locally almost surely to (G, o) ∼ µ precisely when, for every rooted
graph H? ∈ G? and all integers r ≥ 0,

p(Gn ) (H? ) = (1/|V (Gn )|) Σv∈V (Gn ) 1{Br(Gn ) (v)≃H? } −→a.s. µ(Br(G) (o) ≃ H? ). (2.4.12)
Proof This follows from Theorem 2.7. Indeed, for part (a), it follows directly, as part (a)
deals with local weak convergence as in Theorem 2.7. For convergence almost surely as
in part (c), this also follows directly. For part (b), we need an extra argument. By the Sko-
rokhod Embedding Theorem, for each H? , there exists a probability space on which the
convergence in (2.4.12) occurs almost surely. The same holds for any finite subcollection of
H? ∈ G? , and, since the set of graphs H? that can occur as r-neighborhoods is countable, it
can even be extended to all such H? ∈ G? . Thus, the statement again follows from Theorem
2.7.
In what follows, we are mainly interested in local convergence in probability, since this is
the notion that is the most powerful and useful in the setting of random graphs.
Recall that T? ⊂ G? is a subset of the space of rooted graphs, and that T? (r) ⊆ T? is
the subset of T? of graphs for which the distance between any vertex and the root is at most
r. Then, we have the following result:
Theorem 2.16 (Local convergence and subsets) Let (Gn )n≥1 be a sequence of rooted
graphs. Let (Ḡ, ō) be a random variable on G? having law µ̄. Let T? ⊂ G? be a subset of
the space of rooted graphs. Assume that µ̄((Ḡ, ō) ∈ T? ) = 1. Then, (Gn , on ) −→d (Ḡ, ō)
1 − E[p(Gn ) (Br(Td ) (o))] → 0. Next, we show that E[p(Gn ) (Br(Td ) (o))2 ] → 1, which shows
that Var(p(Gn ) (Br(Td ) (o))) → 0, and thus p(Gn ) (Br(Td ) (o)) −→P 1. Now,
completes the proof for the configuration model. Since we have proved the convergence
in probability of the subgraph proportions, convergence in probability follows when we
condition on simplicity (recall [V1, Corollary 7.17]), and thus the proof also follows for
random regular graphs. We leave the details of this argument as Exercise 2.19.
where Gn = ERn (λ/n), and the law µ of (G, o) is that of a Poi(λ) branching process. We
see that, in this case, µ is deterministic, as it will be in most examples encountered in this
book. In (2.4.20), we may without loss of generality assume that t is a finite tree of depth at
most r, since otherwise both sides are zero.
the breadth-first exploration of the tree t in Definition 1.25, which is described in terms of
(xi )ti=1 as in (1.5.7) and the corresponding vertices (ai )ti=1 , where t = |V (t)| denotes the
number of vertices in t. Further, note that (G, o) is, by construction, an ordered tree, and
therefore Br(G) (o) inherits this ordering. We make this explicit by writing B̄r(G) (o) for the
ordered version of Br(G) (o). Therefore, we can write B̄r(G) (o) = t to indicate that the two
ordered trees B̄r(G) (o) and t agree. In terms of this notation, one can compute

µ(B̄r(G) (o) = t) = Πi∈[t] : dist(∅,ai )<r e−λ λxi /xi ! , (2.4.17)

where dist(∅, v) is the tree distance between v ∈ V (t) and the root ∅ ∈ V (t). We note that
B̄r(G) (o) = t says nothing about the degrees of the vertices that are at distance exactly r away
from the root ∅, which is why we restrict to vertices ai with dist(∅, ai ) < r in (2.4.17).
Further, µ(B̄r (o) = t′ ) = µ(B̄r (o) = t) for each ordered tree t′ that is isomorphic to the
tree t. This is because the root degrees and degree sequences of the non-root vertices are the
same for all trees that are isomorphic to t, and the right-hand side of (2.4.17) depends only
on the degree of the root and the degrees of all other non-root vertices (recall also Definition
1.25 and Exercise 1.28). Therefore,

µ(Br(G) (o) ≃ t) = #(t) Πi∈[t] : dist(∅,ai )<r e−λ λxi /xi ! , (2.4.18)

where #(t) is the number of ordered trees that are isomorphic to t. This identifies the right-
hand side of (2.4.20).
We note further that, by permuting the labels of all the children of any vertex in t, we
obtain a rooted tree that is isomorphic to t, and there are Πi∈[t] xi ! such permutations.
However, not all of them may lead to distinct ordered trees. In our analysis, the precise
value of #(t) will be irrelevant.
It is convenient to order also the vertices in Br(Gn ) (on ), where Gn = ERn (λ/n). This
can be achieved by ordering the forward children of a vertex in Br(Gn ) (on ) according to their
vertex labels. We denote the result by B̄r(Gn ) (on ), which is an ordered graph. Then, we can
again write B̄r(Gn ) (on ) = t to indicate that the two ordered graphs B̄r(Gn ) (on ) and t agree. This
implies that Br(Gn ) (on ) is a tree (so there are no cycles within depth r), and that its ordered
version is equal to the ordered tree t. Then, as in (2.4.18),

p(Gn ) (t) = (1/n) Σv∈[n] 1{Br(Gn ) (v)≃t} = #(t) (1/n) Σv∈[n] 1{B̄r(Gn ) (v)=t} . (2.4.19)
The second-moment method shows that Nn,r (t) is well concentrated around nµ(Br(G) (o) =
t). We start by investigating the first moment of Nn,r (t), which equals

E[Nn,r (t)] = Σv∈[n] P(B̄r(Gn ) (v) = t) = nP(B̄r(Gn ) (1) = t), (2.4.22)
where the latter step uses the fact that the distributions of the neighborhoods of all vertices
in ERn (λ/n) are the same.
We recall the breadth-first description of an ordered tree in Definitions 1.24 and 1.25
in Section 1.5. Let vi ∈ [n] denote the vertex label of the ith vertex that is explored in the
breadth-first exploration. Let Xi denote the number of forward neighbors of vi , except when
vi is at graph distance r from vertex 1, in which case we set Xi = 0 by convention. Further,
let Yi denote the number of edges leading to already found, but not yet explored, vertices.
Then, B̄r(Gn ) (1) = t occurs precisely when (Xi , Yi ) = (xi , 0) for all i ∈ [t]. Therefore,
P(B̄r(Gn ) (1) = t) = Πti=1 P((Xi , Yi ) = (xi , 0) | (X[i−1] , Y[i−1] ) = (x[i−1] , 0[i−1] )). (2.4.24)
Conditional on (X[i−1] , Y[i−1] ) = (x[i−1] , 0[i−1] ), for all i for which vi is at a distance at
most r − 1 from vertex 1, we have
since there are si−1 active vertices, and Yi counts the number of edges between vi and
any other vertex. Finally, Xi and Yi are conditionally independent given (X[i−1] , Y[i−1] ) =
(x[i−1] , 0[i−1] ), owing to the independence of the edges in ERn (λ/n). Note that the distance
of vi from vertex 1 is exactly equal to the distance of the corresponding vertex ai ∈ V (t) to
the root ∅ ∈ V (t). Therefore, since P(Bin(ni , λ/n) = xi ) → e−λ λxi /xi !,
P(B̄r(Gn ) (1) = t) = Πi∈[t] : dist(∅,ai )<r P(Bin(ni , λ/n) = xi ) × Πi∈[t] (1 − λ/n)si−1
→ Πi∈[t] : dist(∅,ai )<r e−λ λxi /xi ! = µ(B̄r(G) (o) = t), (2.4.27)
P(B̄r(Gn ) (1) = t) = (1/n) E[Nn,r (t)] → µ(B̄r(G) (o) = t). (2.4.28)
P(B̄r(Gn ) (2) = t, distGn (1, 2) > 2r | B̄r(Gn ) (1) = t) → µ(B̄r(G) (o) = t), (2.4.37)
as required.
In this section we discuss some consequences of local convergence that will either prove
to be useful in what follows or describe how network statistics are determined by the local
limit.
(b) Assume that Gn converges locally in probability to (G, o) ∼ µ on G? . Then, for every
m ≥ 1, with o(1)n , o(2)n two independent uniformly chosen vertices in V (Gn ),

(|∂Br(Gn ) (o(1)n )|, |∂Br(Gn ) (o(2)n )|)mr=1 −→d (|∂Br(G) (o(1) )|, |∂Br(G) (o(2) )|)mr=1 , (2.5.2)

P(Bm(Gn ) (o(1)n ) ≃ t1 , Bm(Gn ) (o(2)n ) ≃ t2 | Gn )
−→P µ(Bm(G) (o(1) ) ≃ t1 ) µ(Bm(G) (o(2) ) ≃ t2 ). (2.5.5)
Taking the expectation proves the claim (the reader is invited to provide the fine details of
this argument in Exercise 2.30).
In the above discussion, it is crucial to note that the limits in (2.5.2) correspond to two
independent copies of (G, o) having law µ, but with the same µ, where µ is a random
probability measure on G? . It is here that the possible randomness of µ manifests itself.
Recall also the example below Corollary 2.14.
We continue by showing that local convergence implies that the graph distance between
two uniform vertices tends to infinity:
Corollary 2.20 (Large distances) Let (Gn )n≥1 be a graph sequence whose sizes |V (Gn )|
tend to infinity. Let o(1)n , o(2)n be two vertices chosen independently and uar from V (Gn ).
Assume that (Gn , on ) −→d (Ḡ, ō) ∼ µ̄. Then

distGn (o(1)n , o(2)n ) −→P ∞. (2.5.6)
Proof It suffices to prove that, for every r ≥ 1,

P(distGn (o(1)n , o(2)n ) ≤ r) = o(1). (2.5.7)

For this, we use that o(2)n is chosen uar from V (Gn ) independently of o(1)n , so that

P(distGn (o(1)n , o(2)n ) ≤ r) = E[|Br(Gn ) (o(1)n )|/|V (Gn )|] = E[|Br(Gn ) (on )|/|V (Gn )|]. (2.5.8)
By Corollary 2.19(a), |Br(Gn ) (on )| is a tight random variable, so that

|Br(Gn ) (on )|/|V (Gn )| −→P 0. (2.5.9)
Further, |Br(Gn ) (on )|/|V (Gn )| ≤ 1 almost surely. Thus, by the Dominated Convergence
Theorem ([V1, Theorem A.1]), E[|Br(Gn ) (on )|/|V (Gn )|] = o(1) for every r ≥ 1, so that the
claim follows.
We close this section by showing that local convergence implies that the number of con-
nected components converges:
Corollary 2.21 (Number of connected components) Let (Gn )n≥1 be a sequence of graphs
whose sizes |V (Gn )| tend to infinity, and let Qn denote the number of connected components
in Gn .
(a) Assume that (Gn , on ) −→d (Ḡ, ō) ∼ µ̄. Then

E[Qn /|V (Gn )|] → Eµ̄ [1/|C (ō)|], (2.5.10)
where |C (ō)| is the size of the connected component of ō in Ḡ.
(b) Assume that Gn converges locally in probability to (G, o) ∼ µ. Then
where on ∈ V (Gn ) is chosen uar. Since h(G, o) = 1/|C (o)| is a bounded and continuous
function (where, by convention, h(G, o) = 0 when |C (o)| = ∞; see Exercise 2.22), the
claim follows.
For part (b), instead, we have

Qn /|V (Gn )| = E[1/|C (on )| | Gn ] −→P Eµ [1/|C (o)|], (2.5.14)

as required.
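Corollary 2.21 is easy to test empirically. The following sketch (ours) counts the connected components of ERn (λ/n) with a union–find structure, so that Qn /n can be compared with Eµ [1/|C (o)|] for the Poi(λ) branching-process limit discussed in Section 2.4.5:

```python
import random

def er_components(n, lam, rng):
    """Number of connected components of ER_n(lam/n)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    p = lam / n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                parent[find(u)] = find(v)
    return len({find(v) for v in range(n)})

rng = random.Random(0)
n, lam = 1000, 1.5
# Q_n / n; the infinite component of the limit contributes 0 to E[1/|C(o)|].
print(er_components(n, lam, rng) / n)
```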
Let

WGn = Σi,j,k∈V (Gn ) 1{ij,jk∈E(Gn )} = Σv∈V (Gn ) dv (dv − 1) (2.5.15)
denote twice the number of wedges in the graph Gn . The factor two arises because the
wedge ij, jk is the same as the wedge kj, ji, but it is counted twice in (2.5.15). We further
let
∆Gn = Σi,j,k∈V (Gn ) 1{ij,jk,ik∈E(Gn )} (2.5.16)
denote six times the number of triangles in Gn . The global clustering coefficient CCGn in
Gn is defined as
CCGn = ∆Gn /WGn . (2.5.17)
The global clustering coefficient measures the proportion of wedges for which the closing
edge is also present. As such, it can be thought of as the probability that two random friends
of a random individual are friends themselves.
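Both clustering coefficients are direct transcriptions of their definitions. The following sketch (ours) computes the global coefficient (2.5.17) and a local average in the spirit of (2.5.28) (with the convention that vertices of degree at most 1 contribute 0) from an adjacency-set representation:

```python
from itertools import combinations

def clustering_coefficients(adj):
    # W_{G_n} = sum_v d_v (d_v - 1): twice the number of wedges.
    wedges = sum(len(nb) * (len(nb) - 1) for nb in adj.values())
    # Delta_{G_n}(v): twice the number of triangles containing v.
    delta_v = {v: 2 * sum(1 for u, w in combinations(sorted(nb), 2) if w in adj[u])
               for v, nb in adj.items()}
    global_cc = sum(delta_v.values()) / wedges       # (2.5.17)
    local_cc = sum(delta_v[v] / (len(nb) * (len(nb) - 1))
                   for v, nb in adj.items() if len(nb) > 1) / len(adj)
    return global_cc, local_cc

# A triangle {0,1,2} with a pendant edge {2,3}: global 0.6, local 7/12.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_coefficients(adj))
```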
The following theorem describes the conditions for the clustering coefficient to converge.
In its statement, we recall that a sequence (Xn )n≥1 of random variables is uniformly inte-
grable when
limK→∞ lim supn→∞ E[|Xn | 1{|Xn |>K} ] = 0. (2.5.18)
Theorem 2.22 (Convergence of global clustering coefficient) Let (Gn )n≥1 be a sequence
of graphs whose sizes |V (Gn )| tend to infinity. Assume that Gn converges locally in prob-
ability to (G, o) ∼ µ. Further, assume that Dn = d(Gn )on is such that (Dn2 )n≥1 is uniformly
integrable, and that µ(do > 1) > 0. Then

CCGn −→P Eµ [∆G (o)] / Eµ [do (do − 1)], (2.5.19)

where ∆G (o) = Σu,v∈∂B1 (o) 1{{u,v}∈E(G)} denotes twice the number of triangles in G that
contain o as a vertex.
Proof We write

CCGn = E[∆Gn (on ) | Gn ] / E[d(Gn )on (d(Gn )on − 1) | Gn ], (2.5.20)
that the convergence of their expectations over on does not follow immediately from local
convergence in probability. It is here that we need to make use of the uniform integrability
of (Dn2 )n≥1 , where Dn = d(Gn )on . We make the split
E[d(Gn )on (d(Gn )on − 1) 1{(d(Gn )on )2 ≤K} | Gn ] −→P Eµ [do (do − 1) 1{d2o ≤K} ], (2.5.22)
since h(G, o) = do (do − 1) 1{d2o ≤K} is a bounded continuous function. Further, by the
uniform integrability of (Dn2 )n≥1 = ((d(Gn )on )2 )n≥1 , and with E denoting the expectation
wrt on as well as wrt the random graph, for every ε > 0 there exists an N = N (ε) sufficiently
large such that, uniformly in n ≥ N (ε),

P( E[d(Gn )on (d(Gn )on − 1) 1{(d(Gn )on )2 >K} | Gn ] ≥ ε )
≤ (1/ε) E[(d(Gn )on )2 1{(d(Gn )on )2 >K} ] ≤ ε. (2.5.24)
It follows that E[d(Gn )on (d(Gn )on − 1) | Gn ] −→P Eµ [do (do − 1)], as required. Since
µ(do > 1) > 0, the limit Eµ [do (do − 1)] is strictly positive.
The proof that E[∆Gn (on ) | Gn ] −→P Eµ [∆G (o)] is similar, where now we make the
split

E[∆Gn (on ) | Gn ] = E[∆Gn (on ) 1{(d(Gn )on )2 ≤K} | Gn ] + E[∆Gn (on ) 1{(d(Gn )on )2 >K} | Gn ].

The first term is the conditional expectation of a bounded
continuous functional, and therefore converges in probability. The second term, on the other
hand, satisfies

E[∆Gn (on ) 1{(d(Gn )on )2 >K} | Gn ] ≤ E[(d(Gn )on )2 1{(d(Gn )on )2 >K} | Gn ], (2.5.26)
Here, we can think of ∆Gn (v)/[dv (dv − 1)] as the proportion of edges present between
neighbors of v , and then (2.5.28) takes the average of this. The following theorem implies
its convergence without any further uniform integrability conditions, and thus justifies the
name local clustering coefficient:
Theorem 2.23 (Convergence of local clustering coefficient) Let (Gn )n≥1 be a sequence of
graphs whose sizes |V (Gn )| tend to infinity. Assume that Gn converges locally in probability
to (G, o) ∼ µ. Then

CCGn −→P Eµ [∆G (o)/(do (do − 1))]. (2.5.29)
Proof We now write

CCGn = E[∆Gn (on )/(d(Gn )on (d(Gn )on − 1)) | Gn ], (2.5.30)

and note that h(G, o) = ∆G (o)/[do (do − 1)] is a bounded continuous functional. Therefore,

E[∆Gn (on )/(d(Gn )on (d(Gn )on − 1)) | Gn ] −→P Eµ [∆G (o)/(do (do − 1))], (2.5.31)

as required.
There are more versions of clustering coefficients. Convergence of the so-called clustering
spectrum is discussed in the notes in Section 2.7.
Note that

pe(Gn ) (H? ) = P(Br(Gn ) (e) ≃ H? | Gn ), (2.5.33)
where e = (e, ē) is a uniformly chosen directed edge from E⃗(Gn ). Thus, pe(Gn ) (H? ) is the
edge-equivalent of p(Gn ) (H? ) in (2.3.5). We next study its asymptotics:
Theorem 2.24 (Convergence of neighborhoods of edges) Let (Gn )n≥1 be a sequence of
graphs whose sizes |V (Gn )| tend to infinity. Assume that Gn converges locally in proba-
bility to (G, o) ∼ µ. Assume further that (d(Gn )on )n≥1 is a uniformly integrable sequence of
random variables, and that µ(do ≥ 1) > 0. Then, for every H? ∈ G? ,

pe(Gn ) (H? ) −→P Eµ [do 1{Br(G) (o)≃H? } ] / Eµ [do ]. (2.5.34)
Proof We recall (2.5.32), and note that

(2/|V (Gn )|) |E(Gn )| = (1/|V (Gn )|) Σv∈V (Gn ) d(Gn )v = E[d(Gn )on | Gn ]. (2.5.35)
Therefore, since (d(Gn )on )n≥1 is uniformly integrable, local convergence in probability implies
that

(2/|V (Gn )|) |E(Gn )| −→P Eµ [do ]. (2.5.36)

Since µ(do ≥ 1) > 0, it follows that Eµ [do ] > 0.
Further, we rewrite

(1/|V (Gn )|) Σ(u,v)∈E⃗(Gn ) 1{Br(Gn ) (u)≃H? } = (1/|V (Gn )|) Σu∈V (Gn ) d(Gn )u 1{Br(Gn ) (u)≃H? }
= E[d(Gn )on 1{Br(Gn ) (on )≃H? } | Gn ], (2.5.37)

where on is a uniformly chosen vertex in V (Gn ). Again, since (d(Gn )on )n≥1 is uniformly
Therefore, by (2.5.32), taking the ratio of the terms in (2.5.36) and (2.5.38) proves the claim.
Thus, p(Gn )k,l is the probability that a random directed edge connects a vertex of degree k with
one of degree l. By convention, we define p(Gn )k,l = 0 when k = 0. The following theorem
proves that the degree–degree distribution converges when the graph converges locally in
probability:
Theorem 2.25 (Degree–degree convergence) Let (Gn )n≥1 be a sequence of graphs whose
sizes |V (Gn )| tend to infinity. Assume that Gn converges locally in probability to (G, o) ∼
µ. Assume further that (d(Gn )on )n≥1 is a uniformly integrable sequence of random variables,
and that µ(do ≥ 1) > 0. Then, for every k, l with k ≥ 1,

p(Gn )k,l −→P k µ(do = k, dV = l), (2.5.40)

= k E[1{d(Gn )on =k, d(Gn )V =l} | Gn ], (2.5.41)

Thus, by (2.5.39), taking the ratio of the terms in (2.5.36) and (2.5.42) proves the claim.
We finally discuss the consequences for the assortativity coefficient (recall [V1, Section
1.5]). We now write the degrees in Gn as (dv )v∈V (Gn ) to avoid notational clutter. Define the
assortativity coefficient as
where we recall that E⃗(Gn ) is the collection of directed edges, and we make the abbreviation
di = d(Gn )i for i ∈ V (Gn ). We can recognize ρGn in (2.5.43) as the empirical correlation
coefficient of the two-dimensional sequence of variables (de , dē )e∈E⃗(Gn ) . As a result, it is
the correlation between the coordinates of the two-dimensional random variable of which
(p(Gn )k,l )k,l≥1 is the joint probability mass function. We can rewrite the assortativity coefficient
ρGn more conveniently as
ρ_{G_n} = \frac{\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j - \big(\sum_{i\in V(G_n)} d_i^2\big)^2/|\vec{E}(G_n)|}{\sum_{i\in V(G_n)} d_i^3 - \big(\sum_{i\in V(G_n)} d_i^2\big)^2/|\vec{E}(G_n)|}.   (2.5.44)
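Formula (2.5.44) can be evaluated with three degree sums and one sum over directed edges. The sketch below is a direct transcription; it is undefined for regular graphs, where the denominator vanishes (consistent with the condition µ(d_o = r) < 1 in Theorem 2.26 below).

```python
def assortativity(adj):
    """Assortativity coefficient rho_{G_n} via (2.5.44); adj maps each
    vertex to its set of neighbors."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    m = sum(deg.values())  # |vec E(G_n)|: each edge counted in both directions
    s_edge = sum(deg[u] * deg[v] for u, nbrs in adj.items() for v in nbrs)
    s2 = sum(d ** 2 for d in deg.values())
    s3 = sum(d ** 3 for d in deg.values())
    return (s_edge - s2 ** 2 / m) / (s3 - s2 ** 2 / m)
```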
The following theorem gives conditions for the convergence of ρ_{G_n} when G_n converges locally in probability:

Theorem 2.26 (Assortativity convergence) Let (G_n)_{n≥1} be a sequence of graphs whose sizes |V(G_n)| tend to infinity. Assume that G_n converges locally in probability to (G, o) ∼ µ. Assume further that D_n = d_{o_n}^{(G_n)} is such that (D_n^3)_{n≥1} is uniformly integrable, and that µ(d_o = r) < 1 for every r ≥ 0. Then, with V a uniform neighbor of o,

ρ_{G_n} \xrightarrow{\mathbb{P}} \frac{\mathbb{E}_\mu[d_o^2 d_V] - \mathbb{E}_\mu[d_o^2]^2/\mathbb{E}_\mu[d_o]}{\mathbb{E}_\mu[d_o^3] - \mathbb{E}_\mu[d_o^2]^2/\mathbb{E}_\mu[d_o]}.   (2.5.45)
Proof We start with (2.5.44), and consider the various terms. We divide all the sums by n. Then, by local convergence in probability and the uniform integrability of ((d_{o_n}^{(G_n)})^3)_{n≥1}, which implies that (d_{o_n}^{(G_n)})_{n≥1} is also uniformly integrable,

\frac{1}{n}|\vec{E}(G_n)| = \mathbb{E}\big[d_{o_n}^{(G_n)} \mid G_n\big] \xrightarrow{\mathbb{P}} \mathbb{E}_\mu[d_o].   (2.5.46)

Again by local convergence and the uniform integrability of ((d_{o_n}^{(G_n)})^3)_{n≥1}, which implies that ((d_{o_n}^{(G_n)})^2)_{n≥1} is also uniformly integrable,

\frac{1}{|V(G_n)|}\sum_{i\in V(G_n)} d_i^2 = \mathbb{E}\big[(d_{o_n}^{(G_n)})^2 \mid G_n\big] \xrightarrow{\mathbb{P}} \mathbb{E}_\mu[d_o^2].   (2.5.47)

Further, again by local convergence in probability and the assumed uniform integrability of ((d_{o_n}^{(G_n)})^3)_{n≥1},

\frac{1}{|V(G_n)|}\sum_{i\in V(G_n)} d_i^3 = \mathbb{E}\big[(d_{o_n}^{(G_n)})^3 \mid G_n\big] \xrightarrow{\mathbb{P}} \mathbb{E}_\mu[d_o^3].   (2.5.48)

This identifies the limits of all but one of the sums appearing in (2.5.44). Details are left to the reader in Exercise 2.26. Further, \mathbb{E}_\mu[d_o^3] - \mathbb{E}_\mu[d_o^2]^2/\mathbb{E}_\mu[d_o] > 0 since µ(d_o = r) < 1 for every r ≥ 0 (see Exercise 2.27).
We finally consider the last term, involving the product of the degrees across edges, i.e.,

\frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j = \frac{1}{|V(G_n)|}\sum_{u\in V(G_n)} d_u^2 \Big(\frac{1}{d_u}\sum_{v\colon v\sim u} d_v\Big) = \mathbb{E}\big[d_{o_n}^2 d_V \mid G_n\big],   (2.5.49)

where V is a uniform neighbor of o_n. When the degrees are uniformly bounded, the functional h(G, o) = d_o^2\,\mathbb{E}[d_V \mid G] is bounded and continuous, so that it converges. However, the degrees are not necessarily bounded, so a truncation argument is needed.
We make the split

\frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j = \frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j 1_{\{d_i\le K, d_j\le K\}} + \frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j \big(1 - 1_{\{d_i\le K, d_j\le K\}}\big).   (2.5.50)

By local convergence in probability (or by Theorem 2.25), since the functional is now bounded and continuous,

\frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j 1_{\{d_i\le K, d_j\le K\}} \xrightarrow{\mathbb{P}} \mathbb{E}_\mu\big[d_o^2 d_V 1_{\{d_o\le K, d_V\le K\}}\big].   (2.5.52)
We are left with showing that the second contribution in (2.5.50) is small. We bound this contribution as follows:

\frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j \big(1_{\{d_i>K\}} + 1_{\{d_j>K\}}\big) = \frac{2}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j 1_{\{d_i>K\}},   (2.5.53)

which, by the Cauchy–Schwarz inequality, is at most

2\,\mathbb{E}\big[(d_{o_n}^{(G_n)})^3 1_{\{d_{o_n}^{(G_n)}>K\}} \mid G_n\big]^{1/2}\,\mathbb{E}\big[(d_{o_n}^{(G_n)})^3 \mid G_n\big]^{1/2}.   (2.5.54)
By the uniform integrability of ((d_{o_n}^{(G_n)})^3)_{n≥1}, there exist K = K(ε) and N = N(ε) such that, for all n ≥ N,

\mathbb{E}\big[(d_{o_n}^{(G_n)})^3 1_{\{d_{o_n}^{(G_n)}>K\}}\big] \le ε^4/4.   (2.5.55)

By the Markov inequality, it follows that

\mathbb{P}\Big(\mathbb{E}\big[(d_{o_n}^{(G_n)})^3 1_{\{d_{o_n}^{(G_n)}>K\}} \mid G_n\big] \ge \frac{ε^3}{4}\Big) \le \frac{4}{ε^3}\,\mathbb{E}\big[(d_{o_n}^{(G_n)})^3 1_{\{d_{o_n}^{(G_n)}>K\}}\big] \le ε.   (2.5.56)
As a result, with probability at least 1 − ε and for ε > 0 sufficiently small to accommodate the factor \mathbb{E}[(d_{o_n}^{(G_n)})^3 \mid G_n]^{1/2} (which is uniformly bounded by the uniform integrability of ((d_{o_n}^{(G_n)})^3)_{n≥1}),

\frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j \big(1_{\{d_i>K\}} + 1_{\{d_j>K\}}\big) \le ε^{3/2}\,\mathbb{E}\big[(d_{o_n}^{(G_n)})^3 \mid G_n\big]^{1/2} \le ε.   (2.5.57)

This completes the proof of Theorem 2.26.
2.6 Giant Component is Almost Local

We continue by investigating the size of the giant component when the graph converges locally. Here, we simplify the notation by assuming that G_n = (V(G_n), E(G_n)) is such that |V(G_n)| = n, and we recall that

|C_{\max}| = \max_{v\in V(G_n)} |C(v)|   (2.6.1)

denotes the maximal connected component size. While Corollary 2.21 shows that the number of connected components is well behaved in the local topology, the proportion of vertices in the giant is not so nicely behaved.
Let

Z_{\ge k} = \sum_{v\in V(G_n)} 1_{\{|C(v)|\ge k\}}   (2.6.3)

denote the number of vertices in connected components of size at least k. Assume that G_n converges locally in probability to (G, o). Then we conclude that, with ζ_{\ge k} = µ(|C(o)| ≥ k) (see Exercise 2.32),

\frac{Z_{\ge k}}{n} = \mathbb{E}\big[1_{\{|C(o_n)|\ge k\}} \mid G_n\big] \xrightarrow{\mathbb{P}} ζ_{\ge k}.   (2.6.4)
For every k ≥ 1,

\{|C_{\max}| \ge k\} = \{Z_{\ge k} \ge k\},   (2.6.5)

and |C_{\max}| \le Z_{\ge k} on those realizations where the event \{Z_{\ge k} \ge 1\} holds. Note that ζ = \lim_{k\to\infty} ζ_{\ge k} = µ(|C(o)| = ∞). We take k so large that ζ ≥ ζ_{\ge k} − ε/2. Then, for every such k, ε > 0, and all n large enough that n(ζ + ε) ≥ k,

\mathbb{P}\big(|C_{\max}| \ge n(ζ + ε)\big) \le \mathbb{P}\big(Z_{\ge k} \ge n(ζ + ε)\big) \le \mathbb{P}\big(Z_{\ge k} \ge n(ζ_{\ge k} + ε/2)\big) = o(1).   (2.6.6)
We conclude that while local convergence cannot determine the size of the largest connected component, it can prove an upper bound on |C_{\max}|. In this book, we often extend this to |C_{\max}|/n \xrightarrow{\mathbb{P}} ζ = µ(|C(o)| = ∞), but this is no longer a consequence of local convergence alone. In Exercise 2.31, you are asked to give an example where |C_{\max}|/n \xrightarrow{\mathbb{P}} η < ζ, even though G_n does converge locally in probability to (G, o) ∼ µ. Therefore, in general, more involved arguments must be used. The next theorem shows that one relatively simple condition suffices. In its statement, we write x ↮ y for the statement that C(x) and C(y) are disjoint:
Theorem 2.28 (The giant is almost local) Let G_n = (V(G_n), E(G_n)) denote a random graph of size |V(G_n)| = n. Assume that G_n converges locally in probability to (G, o) ∼ µ. Assume further that

\lim_{k\to\infty} \limsup_{n\to\infty} \frac{1}{n^2}\,\mathbb{E}\big[\#\{(x,y)\in V(G_n)\times V(G_n)\colon |C(x)|, |C(y)| \ge k,\ x ↮ y\}\big] = 0.   (2.6.7)

Then, if C_{\max} and C_{(2)} denote the largest and second largest connected components (with ties broken arbitrarily),

\frac{|C_{\max}|}{n} \xrightarrow{\mathbb{P}} ζ = µ(|C(o)| = ∞), \qquad \frac{|C_{(2)}|}{n} \xrightarrow{\mathbb{P}} 0.   (2.6.8)
Remark 2.29 (“Giant is almost local” proofs) Theorem 2.28 shows that the relatively mild
condition in (2.6.7) suffices for the giant to have the expected limit. In fact, it is necessary
and sufficient; see Exercise 2.34. It is most useful when we can easily show that vertices with
large clusters are likely to be connected, and it will be applied to the Erdős–Rényi random
graph below, to configuration models in Section 4.3, and to inhomogeneous random graphs
with finitely many types in Section 6.5.3. J
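The quantities in (2.6.8) are straightforward to examine numerically once the component sizes are known. A minimal sketch (the iterative depth-first search and the function name are our own choices):

```python
def component_sizes(adj):
    """Connected component sizes of a graph given as a dict of neighbor
    sets, in decreasing order; sizes[0]/n estimates |C_max|/n."""
    seen, sizes = set(), []
    for source in adj:
        if source in seen:
            continue
        seen.add(source)
        stack, size = [source], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```

For ER_n(λ/n) with λ > 1, the first entry divided by n approaches the survival probability ζ_λ, while the second entry divided by n vanishes, in line with Theorem 2.28.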
We now start with the proof of Theorem 2.28. Recall that ζ = µ(|C(o)| = ∞) might be a random variable when µ is a random probability measure on rooted graphs. We first note that, by Corollary 2.27, the statement follows on the event that ζ = µ(|C(o)| = ∞) = 0, so that it suffices to prove Theorem 2.28 on the event that ζ > 0. By conditioning on this event, we may assume that ζ > 0 almost surely.
We recall that the vector (|C_{(i)}|)_{i≥1} denotes the cluster sizes ordered in size, from large to small with ties broken arbitrarily, so that |C_{(1)}| = |C_{\max}|. The following lemma gives a useful estimate of the sum of squares of these ordered cluster sizes. In its statement, we write X_{n,k} = o_{k,\mathbb{P}}(1) when

\lim_{k\to\infty} \limsup_{n\to\infty} \mathbb{P}(|X_{n,k}| > ε) = 0.   (2.6.9)
Lemma 2.30 (Sum of squares of cluster sizes) Under the conditions of Theorem 2.28,

\frac{1}{n^2}\sum_{i\ge 1} |C_{(i)}|^2 1_{\{|C_{(i)}|\ge k\}} = ζ^2 + o_{k,\mathbb{P}}(1).   (2.6.10)

Proof We use that, by local convergence in probability and for any k ≥ 1 fixed (recall (2.6.4)),

\frac{1}{n} Z_{\ge k} = \frac{1}{n}\sum_{i\ge 1} |C_{(i)}| 1_{\{|C_{(i)}|\ge k\}} = ζ + o_{k,\mathbb{P}}(1),   (2.6.11)

by Exercise 2.23. Further,

ζ^2 + o_{k,\mathbb{P}}(1) = \frac{Z_{\ge k}^2}{n^2} = \frac{1}{n^2}\sum_{i\ge 1} |C_{(i)}|^2 1_{\{|C_{(i)}|\ge k\}} + o_{k,\mathbb{P}}(1).   (2.6.12)

Indeed,

\frac{1}{n^2}\sum_{i,j\ge 1,\ i\neq j} |C_{(i)}||C_{(j)}| 1_{\{|C_{(i)}|, |C_{(j)}|\ge k\}} = \frac{1}{n^2}\#\{(x,y)\in V(G_n)\times V(G_n)\colon |C(x)|, |C(y)| \ge k,\ x ↮ y\},   (2.6.13)

which, by the Markov inequality, and abbreviating (x,y) ∈ V(G_n) × V(G_n) to (x,y), satisfies

\lim_{k\to\infty}\limsup_{n\to\infty} \mathbb{P}\Big(\frac{1}{n^2}\#\{(x,y)\colon |C(x)|, |C(y)| \ge k,\ x ↮ y\} \ge ε\Big) \le \lim_{k\to\infty}\limsup_{n\to\infty} \frac{1}{ε n^2}\,\mathbb{E}\big[\#\{(x,y)\colon |C(x)|, |C(y)| \ge k,\ x ↮ y\}\big] = 0,   (2.6.14)

by (2.6.7). This proves the claim.
Theorem 2.31 (Degrees and edges in the giant) Under the conditions of Theorem 2.28, with v_ℓ(C_{\max}) denoting the number of vertices of degree ℓ in C_{\max},

\frac{v_ℓ(C_{\max})}{n} \xrightarrow{\mathbb{P}} µ(|C(o)| = ∞, d_o = ℓ),   (2.6.18)

and

\frac{|E(C_{\max})|}{n} \xrightarrow{\mathbb{P}} \frac{1}{2}\,\mathbb{E}_\mu\big[d_o 1_{\{|C(o)|=∞\}}\big].   (2.6.19)

Proof The proof follows that of Theorem 2.28. We now define, for k ≥ 1, A ⊆ \mathbb{N}, and with d_v the degree of v in G_n,

Z_{A,\ge k} = \sum_{v\in V(G_n)} 1_{\{|C(v)|\ge k,\ d_v\in A\}}.   (2.6.20)

Assume that G_n converges locally in probability to (G, o). Then we conclude that

\frac{Z_{A,\ge k}}{n} \xrightarrow{\mathbb{P}} µ(|C(o)| \ge k, d_o \in A).   (2.6.21)
Since |C_{\max}| ≥ k whp by Theorem 2.28, we thus obtain, for every A ⊆ \mathbb{N},

\frac{1}{n}\sum_{a\in A} v_a(C_{\max}) \le \frac{Z_{A,\ge k}}{n} \xrightarrow{\mathbb{P}} µ(|C(o)| \ge k, d_o \in A),   (2.6.22)
Therefore, along the subsequence (n_l)_{l≥1} that attains the lim inf in (2.6.25), with asymptotic probability κ > 0, and using (2.6.24),

\frac{|C_{\max}|}{n} = \frac{1}{n}\big[|C_{\max}| - v_ℓ(C_{\max})\big] + \frac{v_ℓ(C_{\max})}{n} \le \big[µ(|C(o)| = ∞, d_o \in \{ℓ\}^c) + ε/2\big] + \big[µ(|C(o)| = ∞, d_o = ℓ) - ε\big] \le µ(|C(o)| = ∞) - ε/2,   (2.6.26)

which contradicts Theorem 2.28. We conclude that (2.6.25) cannot hold, so that (2.6.18) follows.
For (2.6.19), we note that

|E(C_{\max})| = \frac{1}{2}\sum_{ℓ\ge 1} ℓ\, v_ℓ(C_{\max}).   (2.6.27)

We divide by n and split the sum over ℓ into small and large ℓ:

\frac{|E(C_{\max})|}{n} = \frac{1}{2n}\sum_{ℓ\in[K]} ℓ\, v_ℓ(C_{\max}) + \frac{1}{2n}\sum_{ℓ>K} ℓ\, v_ℓ(C_{\max}).   (2.6.28)
By uniform integrability,

\lim_{K\to\infty}\limsup_{n\to\infty} \mathbb{E}\big[d_{o_n}^{(G_n)} 1_{\{d_{o_n}^{(G_n)}>K\}}\big] = 0.   (2.6.31)
and

\frac{1}{n}\sum_{v\notin C_{\max}} 1_{\{B_r^{(G_n)}(v)\simeq H_\star\}} \xrightarrow{\mathbb{P}} µ(|C(o)| < ∞, B_r^{(G)}(o) \simeq H_\star).   (2.6.34)

Proof The convergence in (2.6.34) follows from that in (2.6.33) combined with the fact that, by assumption,

\frac{1}{n}\sum_{v\in V(G_n)} 1_{\{B_r^{(G_n)}(v)\simeq H_\star\}} \xrightarrow{\mathbb{P}} µ(B_r^{(G)}(o) \simeq H_\star).   (2.6.35)
The convergence in (2.6.33) can be proved as for Theorem 2.31, now using that, for every \mathcal{H}_\star ⊆ \mathcal{G}_\star,

\frac{1}{n} Z_{\mathcal{H}_\star,\ge k} \equiv \frac{1}{n}\sum_{v\in V(G_n)} 1_{\{|C(v)|\ge k,\ B_r^{(G_n)}(v)\in\mathcal{H}_\star\}} \xrightarrow{\mathbb{P}} µ\big(|C(o)| \ge k, B_r^{(G)}(o) \in \mathcal{H}_\star\big).

We then argue by contradiction again as in (2.6.25) and (2.6.26). We leave the details to the reader.
µ(|C(o)| \ge k, |\partial B_r^{(G)}(o)| < r) \to 0, \qquad µ(|C(o)| < k, |\partial B_r^{(G)}(o)| \ge r) \to 0.   (2.6.39)
Proof Denote

P_k = \#\{(x,y)\colon |C(x)|, |C(y)| \ge k,\ x ↮ y\},   (2.6.40)
P'_r = \#\{(x,y)\colon |\partial B_r^{(G_n)}(x)|, |\partial B_r^{(G_n)}(y)| \ge r,\ x ↮ y\}.   (2.6.41)

Then,

|P_k - P'_r| \le 2n\big(Z_{<r,\ge k} + Z_{\ge r,<k}\big),   (2.6.42)

where

Z_{<r,\ge k} = \sum_{v\in V(G_n)} 1_{\{|\partial B_r^{(G_n)}(v)|<r,\ |C(v)|\ge k\}},   (2.6.43)
Z_{\ge r,<k} = \sum_{v\in V(G_n)} 1_{\{|\partial B_r^{(G_n)}(v)|\ge r,\ |C(v)|<k\}}.   (2.6.44)
Our aim is to show that, for every ε > 0, we can find r = r_ε such that, for every b_0^{(1)}, b_0^{(2)} ≥ r and s_0^{(1)}, s_0^{(2)} fixed,

\limsup_{n\to\infty} \mathbb{P}_r\big(o_1 ↮ o_2\big) \le ε.   (2.6.56)
Under \mathbb{P}_r,

|\partial B_{r+1}(o_1) \setminus B_r(o_2)| \sim \mathrm{Bin}(n_1^{(1)}, p_1^{(1)}),   (2.6.57)

where

n_1^{(1)} = n - s_0^{(1)} - s_0^{(2)}, \qquad p_1^{(1)} = 1 - \Big(1 - \frac{λ}{n}\Big)^{b_0^{(1)}}.   (2.6.58)

Here, we note that the vertices in \partial B_r(o_2) play a different role from those in \partial B_r(o_1), as they can be in \partial B_{r+1}(o_1), but those in \partial B_r(o_1) cannot. This explains the slightly asymmetric form with respect to o_1 and o_2 in (2.6.58).
We are led to studying concentration properties of binomial random variables. For this, we rely on the following lemma:

Lemma 2.35 (Concentration of binomials) Let X ∼ Bin(m, p). Then, for every δ > 0,

\mathbb{P}\big(|X - \mathbb{E}[X]| \ge δ\,\mathbb{E}[X]\big) \le 2\exp\Big(-\frac{δ^2\,\mathbb{E}[X]}{2(1+δ/3)}\Big).   (2.6.59)
Proof This is a direct consequence of [V1, Theorem 2.21].
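The bound (2.6.59) is easy to check by simulation. A small Monte Carlo sketch, assuming nothing beyond the standard library (the default values of m, p, δ, and the number of trials are arbitrary choices):

```python
import math
import random

def check_binomial_concentration(m=1000, p=0.01, delta=0.5, trials=10000):
    """Compares the empirical probability that |X - E[X]| >= delta * E[X]
    for X ~ Bin(m, p) with the bound 2 exp(-delta^2 E[X] / (2(1 + delta/3)))."""
    mean = m * p
    hits = 0
    for _ in range(trials):
        x = sum(random.random() < p for _ in range(m))
        if abs(x - mean) >= delta * mean:
            hits += 1
    bound = 2 * math.exp(-delta ** 2 * mean / (2 * (1 + delta / 3)))
    return hits / trials, bound
```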
Lemma 2.35 ensures that whp the boundary size |\partial B_{r+1}(o_1)| is close to λ|\partial B_r(o_1)| for r and n large, so that the boundary grows by a factor λ > 1. Repeated applications lead to the statement that |\partial B_{r+k}(o_1)| ≈ λ^k |\partial B_r(o_1)|. Thus, in roughly \log_λ n steps, the boundary will have expanded to n^a vertices. However, in order to make this precise, we need that (1) the sum of complementary probabilities in Lemma 2.35 is still quite small uniformly in k and for r large; and (2) we have good control over the number of vertices in the boundaries, not just in terms of lower bounds, but also in terms of upper bounds, as that gives control over the number of vertices that have not yet been used. For the latter, we also need to deal with the δ-dependence in (2.6.59).
We prove (2.6.56) by first growing |\partial B_{r+k}(o_1)| for k ≥ 1 until it is very large (much larger than \sqrt{n} will suffice), and then, outside B_{r+k}(o_1) for the appropriate k, growing |\partial B_{r+k}(o_2)| for k ≥ 1 until it is also very large (now \sqrt{n} will suffice). Then, it is very likely that there is a direct edge between the resulting boundaries. We next provide the details.
Define b_k^{(1)} = b_0^{(1)}[λ(1-ε)]^k and \bar{b}_k^{(1)} = b_0^{(1)}[λ(1+ε)]^k, which serve as lower and upper bounds on |\partial B'_{r+k}(o_1)| that we will prove to hold whp, where we choose ε > 0 small enough that λ(1-ε) > 1. We let

s_{k-1}^{(1)} = s_0^{(1)} + \sum_{l=0}^{k-1} \bar{b}_l^{(1)}   (2.6.60)
denote the resulting upper bound on |B'_{r+k-1}(o_1)|. We fix a ∈ (\tfrac12, 1), and let

k \le k_n^\star = k_n^\star(ε) = \lceil a \log_{λ(1-ε)} n \rceil,   (2.6.61)
and note that there exists C > 1 such that

s_{k-1}^{(1)} \le s_0^{(1)} + \sum_{l=0}^{k-1} b_0^{(1)}[λ(1+ε)]^l \le s_0^{(1)} + \frac{b_0^{(1)}}{λ(1+ε)-1}[λ(1+ε)]^k \le C n^{a\log λ(1+ε)/\log λ(1-ε)},   (2.6.62)

uniformly in k ≤ k_n^\star. We choose a ∈ (\tfrac12, 1) so that a\log λ(1+ε)/\log λ(1-ε) ∈ (\tfrac12, 1).
Define the good event by

E_{r,[k]}^{(1)} = \bigcap_{l\in[k]} E_{r,l}^{(1)}, \qquad \text{where} \quad E_{r,k}^{(1)} = \{b_k^{(1)} \le |\partial B'_{r+k}(o_1)| \le \bar{b}_k^{(1)}\}.   (2.6.63)
We write

\mathbb{P}_r\big(E_{r,[k]}^{(1)}\big) = \prod_{l\in[k]} \mathbb{P}_r\big(E_{r,l}^{(1)} \mid E_{r,[l-1]}^{(1)}\big),   (2.6.64)

so that

\mathbb{P}_r\big(E_{r,[k]}^{(1)}\big) \ge 1 - \sum_{l\in[k]} \mathbb{P}_r\big((E_{r,l}^{(1)})^c \mid E_{r,[l-1]}^{(1)}\big).   (2.6.65)
With the above choices, we have, conditional on |\partial B'_{r+l-1}(o_1)| = b_{l-1} ∈ [b_{l-1}^{(1)}, \bar{b}_{l-1}^{(1)}] and |B'_{r+l-1}(o_1)| = s_{l-1} ≤ s_{l-1}^{(1)},

|\partial B'_{r+l}(o_1)| \sim \mathrm{Bin}(n_l^{(1)}, p_l^{(1)}),   (2.6.66)

where

n_l^{(1)} = n - s_{l-1} - s_0^{(2)}, \qquad p_l^{(1)} = 1 - \Big(1 - \frac{λ}{n}\Big)^{b_{l-1}}.   (2.6.67)
The fact that we grow \partial B'_{r+l-1}(o_1) outside of B_r(o_2) is reflected in the subtraction of s_0^{(2)} in n_l^{(1)}. We aim to apply Lemma 2.35 to |\partial B'_{r+l}(o_1)|, with δ = ε/2, for which it suffices to prove bounds on the (conditional) expectation n_l^{(1)} p_l^{(1)}. We use that

\frac{b_{l-1}λ}{n} - \frac{(b_{l-1}λ)^2}{2n^2} \le p_l^{(1)} \le \frac{b_{l-1}λ}{n}.   (2.6.68)
Therefore, with \mathbb{E}_r denoting expectation wrt \mathbb{P}_r,

\mathbb{E}_r\big[|\partial B'_{r+l}(o_1)| \mid E_{r,[l-1]}^{(1)}\big] = n_l^{(1)} p_l^{(1)} \le n\,\frac{b_{l-1}λ}{n} = λ b_{l-1},   (2.6.69)

which provides the upper bound on n_l^{(1)} p_l^{(1)}. For the lower bound, we use the lower bound in (2.6.68) to note that p_l^{(1)} \ge (1-ε/4)λ b_{l-1}/n for n sufficiently large, since we are on E_{r,[l-1]}^{(1)}. Further, n_l^{(1)} \ge (1-ε/4)n on E_{r,[l-1]}^{(1)}, uniformly in l ≤ k_n^\star. We conclude that, for n sufficiently large,

n_l^{(1)} p_l^{(1)} \ge (1-ε/4)^2\, n\,\frac{b_{l-1}λ}{n} \ge (1-ε/2)λ b_{l-1}.   (2.6.70)
Recall the definition of E_{r,l}^{(1)} in (2.6.63). As a result, \big||\partial B'_{r+l}(o_1)| - n_l^{(1)} p_l^{(1)}\big| \le (ε/2) n_l^{(1)} p_l^{(1)} implies that b_l^{(1)} \le |\partial B'_{r+l}(o_1)| \le \bar{b}_l^{(1)}. Thus, by Lemma 2.35 with δ = ε/2,

\mathbb{P}_r\big((E_{r,l}^{(1)})^c \mid E_{r,[l-1]}^{(1)}\big) \le \mathbb{P}_r\Big(\big||\partial B'_{r+l}(o_1)| - n_l^{(1)} p_l^{(1)}\big| \ge (ε/2) n_l^{(1)} p_l^{(1)} \,\Big|\, E_{r,[l-1]}^{(1)}\Big) \le 2\exp\Big(-\frac{ε^2(1-ε/2)λ b_{l-1}^{(1)}}{8(1+ε/6)}\Big) = 2\exp\big(-qλ b_{l-1}^{(1)}\big),   (2.6.71)

where q = ε^2(1-ε/2)/[8(1+ε/6)] > 0.
We conclude that, for n sufficiently large,

\mathbb{P}_r\big(E_{r,[k]}^{(1)}\big) \ge 1 - 2\sum_{l=1}^{k} e^{-qλ b_{l-1}^{(1)}},   (2.6.72)

which is our key estimate for the neighborhood growth in ER_n(λ/n).
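The growth of the successive boundaries can be observed directly on simulated graphs. A sketch of the breadth-first computation of |\partial B_r(o)| (the dict-of-neighbor-sets encoding and the function name are ours):

```python
def boundary_sizes(adj, o, rmax):
    """Sizes |partial B_r(o)| for r = 1, ..., rmax, from a breadth-first
    exploration of the graph adj started at o; in ER_n(lambda/n) these
    grow roughly like lambda^r while they remain small compared with n."""
    level, seen, sizes = {o}, {o}, []
    for _ in range(rmax):
        nxt = set()
        for v in level:
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    nxt.add(w)
        sizes.append(len(nxt))
        level = nxt
    return sizes
```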
Then, conditional on the above, as well as on |\partial B'_{r+k-1}(o_2)| = b_{k-1}^{(2)} ∈ [b_{k-1}^{(2)}, \bar{b}_{k-1}^{(2)}] and |B'_{r+k-1}(o_2)| = s_{k-1}^{(2)} ≤ s_{k-1}^{(2)}, we have

|\partial B'_{r+k}(o_2)| \sim \mathrm{Bin}(n_k^{(2)}, p_k^{(2)}),   (2.6.76)

where now

n_k^{(2)} = n - s_{k_n^\star}^{(1)} - s_{k-1}^{(2)}, \qquad p_k^{(2)} = 1 - \Big(1 - \frac{λ}{n}\Big)^{b_{k-1}^{(2)}}.   (2.6.77)
Let \mathbb{P}_{r,k_n^\star} denote the conditional probability given |\partial B_r(o_i)| = b_0^{(i)} with b_0^{(i)} ≥ r.
On E_{r_ε,[k_n^\star]},

|\partial B'_{r+k_n^\star}(o_1)| \ge b_{k_n^\star}^{(1)} = b_0^{(1)}[λ(1-ε)]^{k_n^\star} \ge r_ε n^a,   (2.6.82)

where we choose a such that a ∈ (\tfrac12, 1). An identical bound holds for |\partial B'_{r+k_n^\star}(o_2)|. Therefore, the total number of possible direct edges between \partial B'_{r_ε+k_n^\star}(o_1) and \partial B'_{r_ε+k_n^\star}(o_2) is at least (r_ε n^a)^2, which is much larger than n when a > \tfrac12. Each of these potential edges is present independently with probability λ/n. Therefore,

\mathbb{P}_r\big(\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2) > 2(k_n^\star + r_ε) + 1 \mid E_{r_ε,[k_n^\star]}\big) \le \Big(1 - \frac{λ}{n}\Big)^{(r_ε n^a)^2} = o(1).   (2.6.83)
We conclude that, for n sufficiently large,

\mathbb{P}_r\big(\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2) \le 2(k_n^\star + r_ε) + 1 \mid E_{r_ε,[k_n^\star]}\big) = 1 - o(1).   (2.6.84)
Theorem 2.36 (Small-world nature of the Erdős–Rényi random graph) Consider ER_n(λ/n) with λ > 1. Then, conditional on o_1 ←→ o_2,

\frac{\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2)}{\log n} \xrightarrow{\mathbb{P}} \frac{1}{\log λ}.   (2.6.86)

Proof The lower bound follows directly from (2.4.34), which implies that

\mathbb{P}\big(\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2) \le k\big) = \mathbb{E}\big[|B_k(o_1)|/n\big] \le \frac{1}{n}\sum_{l=0}^{k} λ^l = \frac{λ^{k+1}-1}{n(λ-1)}.   (2.6.87)

Applying this to k = \lceil (1-η)\log_λ n \rceil shows that, for any η > 0,

\mathbb{P}\big(\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2) \le (1-η)\log_λ n\big) \to 0.   (2.6.88)
For the upper bound, we start by noting that

\mathbb{P}\big(o_1 ←→ o_2 \mid ER_n(λ/n)\big) = \frac{1}{n^2}\sum_{i\ge 1} |C_{(i)}|^2 \xrightarrow{\mathbb{P}} ζ_λ^2,   (2.6.89)
When the second moment of the degrees stays uniformly bounded, graph distances grow logarithmically, as in Theorem 2.36. If, on the other hand, the second moment blows up with the graph size, then distances are smaller. In particular, these typical distances are often doubly logarithmic when the degrees obey a power law with exponent τ ∈ (2, 3), so that even a moment of order 2 − ε is infinite for some ε > 0. Anyone who has done some numerical work will realize that in practice there is little difference between \log\log n and a constant, even when n is quite large.
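Typical distances such as those in Theorem 2.36 can be estimated by breadth-first search between random pairs of vertices. A minimal sketch (pairs falling in different components are simply skipped; names and defaults are our own):

```python
import random
from collections import deque

def typical_distance(adj, samples=200, rng=random):
    """Average graph distance between uniformly chosen pairs of vertices,
    estimated by breadth-first search; compare with log n / log lambda."""
    verts, dists = list(adj), []
    for _ in range(samples):
        o1, o2 = rng.choice(verts), rng.choice(verts)
        dist, queue, seen = {o1: 0}, deque([o1]), {o1}
        while queue:
            v = queue.popleft()
            if v == o2:
                dists.append(dist[v])
                break
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    dist[w] = dist[v] + 1
                    queue.append(w)
    return sum(dists) / len(dists) if dists else float("inf")
```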
One of the main conclusions of the local convergence results in Part II is that the most
popular random graph models for inhomogeneous real-world networks are locally tree-like,
in that the majority of neighborhoods of vertices have no cycles. This is for example true
for the Erdős–Rényi random graph, see Theorem 2.18, since the local limit is a branching
process tree. In many real-world settings, however, this is not realistic. Certainly in social
networks, many triangles and even cliques of larger size exist. Therefore, in Part IV, con-
sisting of Chapter 9, we investigate some adaptations of the models discussed in Parts II and
III. These models may incorporate clustering or community structure; they may be directed or live in a geometric space. All these aspects have received tremendous attention in the
literature. Therefore, with Part IV in hand, the reader will be able to access the literature
more easily.
There is an extensive body of work studying dense graph limits, using the theory of graphons; see Lovász
(2012) and the references therein. Links have been built between this theory and the so-called “local–global”
limits of sparse graphs called graphings. Elek (2007) proved that local limits of bounded-degree graphs are
graphings; see also Hatami et al. (2014) for an extension. The related notion of graphops was defined in
Backhausz and Szegedy (2022).
where n_k is the number of vertices of degree k in G_n. It is not hard to adapt the proof of Theorem 2.23 to show that, under its assumptions, c_{G_n}(k) \xrightarrow{\mathbb{P}} c_G(k), where

c_G(k) = \mathbb{E}_\mu\Big[\frac{\Delta_G(o)}{k(k-1)} \,\Big|\, d_o = k\Big],   (2.7.2)

and the convergence holds for all k for which p_k^{(G)} = µ(d_o^{(G)} = k) > 0. See Exercise 2.37.
The convergence of the assortativity coefficient in Theorem 2.26 is restricted to degree distributions
that have uniformly integrable third moments. In general, an empirical correlation coefficient needs a finite
variance of the random variables to converge to the correlation coefficient. Nelly Litvak and the author (see
van der Hofstad and Litvak (2014) and Litvak and van der Hofstad (2013)) proved that when the random
variables do not have finite variance, such convergence (even for an iid sample) can be to a proper random
variable that has support containing a subinterval of [−1, 0] and a subinterval of [0, 1], giving problems in
interpretation.
For networks, ρGn in (2.5.44) is always well defined, and gives a value in [−1, 1]. However, also for
networks there is a problem with this definition. Indeed, van der Hofstad and Litvak (2014) and Litvak and
van der Hofstad (2013) proved that if a limiting value of ρGn exists for a sequence of networks and the
third moment of the degree of a random vertex is not uniformly integrable, then lim inf n→∞ ρGn ≥ 0, so
no asymptotically disassortative graph sequences exist for power-law networks with infinite third-moment
degrees. Naturally, other ways of classifying the degree–degree dependence can be proposed, such as the
correlation of their ranks. Here, a sequence of numbers x1 , . . . , xn has ranks r1 , . . . , rn when xi is the ri th
largest of x1 , . . . , xn . Ties tend to be broken by giving random ranks for the equal values. For practical
purposes, a scatter plot of the values might be the most useful way to gain insight into degree–degree
dependencies.
Several related graph properties or parameters have been investigated using local convergence. Lyons
(2005) showed that the exponential growth rate of the number of spanning trees of a finite connected graph
can be computed through the local limit. See also Salez (2013) for weighted spanning subgraphs, and
Gamarnik et al. (2006) for maximum-weight independent sets. Bhamidi et al. (2012) identified the limiting
spectral distribution of the graph adjacency matrix of finite random trees using local convergence, and Bordenave et al. (2011) proved the convergence of the spectral measure of sparse random graphs (see also
Bordenave and Lelarge (2010) and Bordenave et al. (2013) for related results). A property that is almost
local is the density of the densest subgraph in a random graph, as shown by Anantharam and Salez (2016)
and studied in more detail in Section 4.5.
and d_v^{(G)} is the degree of v in G.
Exercise 2.3 (Distance to rooted graph ball) Recall the definition of the ball B_r^{(G)}(o) around o in the graph G in (2.2.1). Show that d_{\mathcal{G}_\star}\big(B_r^{(G)}(o), (G, o)\big) \le 1/(r+1). When does equality hold?
Exercise 2.4 (Countable number of graphs with bounded radius) Fix r. Show that there is a countable
number of isomorphism classes of rooted graphs (G, o) with radius at most r. Here, we let the radius
rad(G, o) of a rooted graph (G, o) be equal to rad(G, o) = maxv∈V (G) distG (o, v) where distG denotes
the graph distance in G.
Exercise 2.5 (G? is separable) Use Exercise 2.4 above to show that the set of rooted graphs G? has a
countable dense set, and is thus separable. (See also Proposition A.12 in Appendix A.3.2.)
Exercise 2.6 (Continuity of local neighborhood functions) Fix H_\star ∈ \mathcal{G}_\star. Show that h: \mathcal{G}_\star \to \{0,1\} given by h(G, o) = 1_{\{B_r^{(G)}(o) \simeq H_\star\}} is continuous.
Exercise 2.7 (Bounded number of graphs with bounded radius and degrees) Show that there are only a
bounded number of isomorphism classes of rooted graphs (G, o) with radius at most r for which the degree
of every vertex is at most k.
Exercise 2.8 (Random local weak limit) Construct the simplest (in your opinion) possible example where
the local weak limit of a sequence of deterministic graphs is random.
Exercise 2.9 (Local weak limit of line and cycle) Let G_n be given by V(G_n) = [n], E(G_n) = \{\{i, i+1\}: i ∈ [n-1]\}, i.e., a line. Show that (G_n, o_n) converges to (\mathbb{Z}, 0). Show that the same is true for the cycle, for which E(G_n) = \{\{i, i+1\}: i ∈ [n-1]\} \cup \{\{1, n\}\}.

Exercise 2.10 (Local weak limit of finite tree) Let G_n be the tree of depth k, in which every vertex except the 3 \cdot 2^{k-1} leaves has degree 3. Here n = 3(2^k - 1). What is the local weak limit of G_n?
Exercise 2.11 (Uniform integrability and convergence of size-biased degrees) Show that when (d_{o_n}^{(G_n)})_{n≥1} forms a uniformly integrable sequence of random variables, there exists a subsequence along which D_n^\star, the size-biased version of D_n = d_{o_n}^{(G_n)}, converges in distribution.

Exercise 2.12 (Uniform integrability and degree regularity condition) For G_n = CM_n(d), show that Conditions 1.7(a),(b) imply that (d_{o_n}^{(G_n)})_{n≥1} is a uniformly integrable sequence of random variables.
Exercise 2.13 (Adding a small disjoint graph does not change local weak limit) Let Gn be a graph that
converges in the local weak sense. Let an ∈ N be such that an = o(n), and add a disjoint copy of an
arbitrary graph of size an to Gn . Denote the resulting graph by G0n . Show that G0n has the same local weak
limit as Gn .
Exercise 2.14 (Local weak convergence does not imply uniform integrability of the degree of a random vertex) In the setting of Exercise 2.13, add a complete graph of size a_n to G_n. Let a_n^2 \gg n. Show that the degree of a vertex chosen uar in G'_n is not uniformly integrable.
Exercise 2.15 (Local limit of random 2-regular graph) Show that the configuration model CMn (d) with
dv = 2 for all v ∈ [n] converges locally in probability to (Z, 0). Conclude that the same applies to the
random 2-regular graph.
Exercise 2.16 (Independent neighborhoods of different vertices) Let G_n converge locally in probability to (G, o). Let (o_n^{(1)}, o_n^{(2)}) be two independent uniformly chosen vertices in V(G_n). Show that (G_n, o_n^{(1)}) and (G_n, o_n^{(2)}) jointly converge to two conditionally independent copies of (G, o) given µ.
Exercise 2.17 (Directed graphs as marked graphs) There are several ways to describe directed graphs as
marked graphs. Give one.
Exercise 2.18 (Multi-graphs as marked graphs) Use the formalism of marked rooted graphs in Definition
2.10 to cast the setting of multi-graphs discussed in Remark 2.4 into this framework.
Exercise 2.19 (Uniform d-regular simple graph) Use Theorem 2.17 and (1.3.41) to show that the uniform
random d-regular graph (which is the same as the d-regular configuration model conditioned on simplicity)
also converges locally in probability to the infinite d-regular tree.
Exercise 2.20 (Local weak convergence and subsets) Recall the statement of Theorem 2.16. Prove that local weak convergence (G_n, o_n) \xrightarrow{d} (\bar{G}, \bar{o}) follows when (2.4.10) holds for all H_\star ∈ \mathcal{T}_\star(r) and all r ≥ 1.
Exercise 2.21 (Local convergence in probability and subsets) Recall the statement of Theorem 2.16. Prove
that Gn converges locally in probability to (G, o) when (2.4.11) holds for all H? ∈ T? (r) and all r ≥ 1.
Extend this to almost sure local convergence and (2.4.12).
Exercise 2.22 (Functional for number of connected components is continuous) Prove that h(G, o) =
1/|C (o)| is a bounded and continuous function, where, by convention, h(G, o) = 0 when |C (o)| = ∞.
Exercise 2.23 (Convergence of proportion of vertices in large clusters) Recall the notion of a random variable X_{n,k} being o_{k,\mathbb{P}}(1) in (2.6.9). Recall the definition of Z_{\ge k} in (2.6.3). Show that Z_{\ge k}/n = ζ + o_{k,\mathbb{P}}(1).
Exercise 2.24 (Convergence of sum of squares of cluster sizes) Show that, under the conditions of Theorem 2.28 and with ζ = µ(|C(o)| = ∞),

\frac{1}{n^2}\sum_{i\ge 1} |C_{(i)}|^2 \xrightarrow{\mathbb{P}} ζ^2.   (2.8.2)
Exercise 2.25 (Expected boundary of balls in Erdős–Rényi random graphs) Prove that \mathbb{E}\big[|\partial B_r^{(G_n)}(1)|\big] \le λ^r for G_n = ER_n(λ/n) and every r ≥ 0. This can be done, for example, by using induction and showing that, for every r ≥ 1,

\mathbb{E}\big[|\partial B_r^{(G_n)}(1)| \,\big|\, B_{r-1}^{(G_n)}(1)\big] \le λ|\partial B_{r-1}^{(G_n)}(1)|.   (2.8.3)
Exercise 2.26 (Uniform integrability and moment convergence) Assume that D_n = d_{o_n}^{(G_n)} is such that (D_n^3)_{n≥1} is uniformly integrable. Assume further that G_n converges locally in probability to (G, o). Prove that \mathbb{E}[(d_{o_n}^{(G_n)})^3 \mid G_n] \xrightarrow{\mathbb{P}} \mathbb{E}_\mu[d_o^3]. Conclude that (2.5.47) and (2.5.48) hold. Hint: You need to be very careful, as \mathbb{E}_\mu[d_o^3] may be a random variable when µ is a random measure.

Exercise 2.27 (Positivity of the denominator in (2.5.44)) Use the Cauchy–Schwarz inequality to show that \mathbb{E}_\mu[d_o^3] - \mathbb{E}_\mu[d_o^2]^2/\mathbb{E}_\mu[d_o] > 0 when µ(d_o = r) < 1 for every r ≥ 0.
Exercise 2.28 (Example of weak convergence where convergence in probability fails) Construct an ex-
ample where Gn converges locally weakly to (G, o), but not locally in probability.
Exercise 2.29 (Continuity of neighborhood functions) Fix m ≥ 1 and ℓ_1, ..., ℓ_m. Show that

h(G, o) = 1_{\{|\partial B_k^{(G)}(o)| = ℓ_k\ \forall k \le m\}}   (2.8.4)

is a bounded and continuous function.
Exercise 2.30 (Proof of (2.5.2)) Let Gn converge locally in probability to (G, o). Prove the joint conver-
gence in distribution of the neighborhood sizes in (2.5.2) using Exercise 2.16.
Exercise 2.31 (Example where the proportion in the giant is smaller than the survival probability) Construct an example where G_n converges locally in probability to (G, o) ∼ µ, while |C_{\max}|/n \xrightarrow{\mathbb{P}} η < ζ = µ(|C(o)| = ∞).
Exercise 2.32 (Convergence of the proportion of vertices in clusters of size at least k) Let G_n converge locally in probability to (G, o) as n → ∞. Show that Z_{\ge k} in (2.6.3) satisfies Z_{\ge k}/n \xrightarrow{\mathbb{P}} ζ_{\ge k} = µ(|C(o)| ≥ k) for every k ≥ 1.
Exercise 2.33 (Upper bound on |C_{\max}| using local convergence) Let G_n = (V(G_n), E(G_n)) denote a random graph of size |V(G_n)| = n. Assume that G_n converges locally in probability to (G, o) ∼ µ as n → ∞, and assume that the survival probability of the limiting graph (G, o) satisfies ζ = µ(|C(o)| = ∞) = 0. Show that |C_{\max}|/n \xrightarrow{\mathbb{P}} 0.
Exercise 2.34 (Sufficiency of (2.6.7) for almost locality of the giant) Let G_n = (V(G_n), E(G_n)) denote a random graph of size |V(G_n)| = n. Assume that G_n converges locally in probability to (G, o) ∼ µ and write ζ = µ(|C(o)| = ∞) for the survival probability of the limiting graph (G, o). Assume that

\limsup_{k\to\infty} \limsup_{n\to\infty} \frac{1}{n^2}\,\mathbb{E}\big[\#\{(x,y) ∈ V(G_n)\times V(G_n)\colon |C(x)|, |C(y)| \ge k,\ x ↮ y\}\big] > 0.   (2.8.5)

Show that |C_{\max}|/n does not converge in probability to ζ.
Exercise 2.35 (Lower bound on graph distances in Erdős–Rényi random graphs) Use Exercise 2.25 to show that, for every ε > 0,

\lim_{n\to\infty} \mathbb{P}\Big(\frac{\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2)}{\log_λ n} \le 1 - ε\Big) = 0.   (2.8.7)
Exercise 2.36 (Lower bound on graph distances in Erdős–Rényi random graphs) Use Exercise 2.25 to show that

\lim_{K\to\infty} \limsup_{n\to\infty} \mathbb{P}\big(\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2) \le \log_λ n - K\big) = 0.   (2.8.8)
Overview of Part II
In this Part, we study local limits and connected components in random graphs, and the
relation between them. In more detail, we investigate the connected components of uniform
vertices, thus also describing the local limits of these random graphs. Further, we study the
existence and structure of the largest connected component, sometimes also called the giant
component when it contains a positive (as opposed to zero) proportion of the vertices in the
graph.
In many random graphs, such a giant component exists when there are sufficiently many
connections, while the largest connected component is much smaller than the number of
vertices when there are few connections. Thus, these random graphs undergo a phase tran-
sition. We identify the size of the giant component, as well as its structure in terms of the
degrees of its vertices. We also investigate whether the graph is fully connected. General
inhomogeneous random graphs are studied in Chapter 3, and the configuration model, as well as the closely related uniform random graph with prescribed degrees, in Chapter 4. In the
last chapter of this part, Chapter 5, we study the connected components and local limits of
preferential attachment models.
Chapter 3

Connected Components in General Inhomogeneous Random Graphs
Abstract
In this chapter, we introduce the general setting of inhomogeneous random
graphs that are generalizations of the Erdős–Rényi and generalized random
graphs. In inhomogeneous random graphs the status of edges is independent,
with unequal edge-occupation probabilities. While these edge probabilities are
moderated by vertex weights in generalized random graphs, in the general set-
ting they are described in terms of a kernel.
The main results in this chapter concern the degree structure, the multi-type
branching process local limits, and the phase transition in these inhomogeneous
random graphs. We also discuss various examples, and indicate that they can
have rather different structures.
In this chapter we discuss general inhomogeneous random graphs, which are sparse random
graphs in which the edge statuses are independent. We investigate their local limits, as well
as their connectivity structure, and their giant component. This is inspired by the fact that
many real-world networks are highly connected, in the sense that their largest connected
component contains a large proportion of the total vertices of the graph. See Table 3.1 for
many examples and Figure 3.1 for the proportion of vertices in the maximal connected
components in the KONECT data base.
[Figure 3.1: Relative size of the largest connected component (LCC) in the networks of the KONECT data base.]

Table 3.1 The rows in this table correspond to the following real-world networks:
Protein–protein interactions in the blood of people with Alzheimer's disease.
Protein–protein interactions in the blood of people with multiple sclerosis.
IMDb collaboration network, where actors are connected when they have co-acted in a movie.
DBLP collaboration network, where scientists are connected when they have co-authored a paper.
Interactions between zebras, where zebras are connected when they have interacted during the observation phase.

Table 3.1 and Figure 3.1 raise the question of how one can view settings where giant
components exist. We know that there is a phase transition in the size of the giant component
in ERn (λ/n); recall [V1, Chapter 4]. A main topic in the present chapter is to investigate the
conditions for a giant component to be present in general inhomogeneous random graphs;
this occurs precisely when the local limit has a positive survival probability (recall Section
2.6). Therefore, we also investigate the local convergence of inhomogeneous random graphs
in this chapter.
We will study much more general models, where edges are present independently, than
in the generalized random graph in [V1, Chapter 6]; see also Section 1.3.2. There, vertices
have weights associated to them, and the edge-occupation probabilities are approximately
proportional to the product of the weights of the vertices that the edge connects. This means
that vertices with high weights have relatively large probabilities of connections to all other
vertices, a property that may not always be appropriate. Let us illustrate this by an example,
which is a continuation of [V1, Example 6.1]:
Example 3.1 (Population of two types: general setting) Suppose that we have a complex
network in which there are n1 vertices of type-1 and n2 of type-2. Type-1 individuals have
on average m1 neighbors, type-2 individuals m2 , where m1 6= m2 . Further, suppose that
the probability that a type-1 individual is a friend of a type-2 individual is quite different
from the probability that a type-1 individual is a friend of another type-1 individual.
In the generalized random graph model proposed in [V1, Example 6.3], the probability that a type-s individual is a friend of a type-r individual (where s, r ∈ [2]) equals m_s m_r/(ℓ_n + m_s m_r), where ℓ_n = n_1 m_1 + n_2 m_2. Approximating this probability by m_s m_r/ℓ_n, we see that the probability that a type-1 individual is a friend of a type-2 individual is highly related to the probability that a type-1 individual is a friend of a type-1 individual. Indeed, take two type-1 and two type-2 individuals. Then, the probability that the type-1 individuals are friends and the type-2 individuals are friends is almost the same as the probability that the first type-1 individual is a friend of the first type-2 individual, and that the second type-1 individual is a friend of the second type-2 individual. Thus, there is some, possibly unwanted and artificial, symmetry in the model.
How can one create instances where the edge probabilities between vertices of the same type are much larger, or alternatively much smaller, than they would be for the generalized random graph? In sexual networks, there are likely to be more edges between the different sexes than within them, while in highly polarized societies most connections are within the groups. In the two extremes, we either have a bipartite graph, where vertices are connected only to vertices of the other type, or a disjoint union of two Erdős–Rényi random graphs, consisting of the vertices of the two types and no edges between them. We aim to be able to obtain anything in between. In particular, the problem with the generalized random graph originates in the approximate product structure of the edge probabilities. In this chapter, we deviate from such a product structure. J
We assume that our individuals (vertices) have types which are in a certain type space S .
When there are individuals of just two types, as in Example 3.1, then it suffices to take
S = {1, 2}. However, the model allows for rather general sets of types of the individuals,
both finite as well as (countably or even uncountably) infinite type spaces. An example
of an uncountably infinite type space arises when the types are related to the ages of the
individuals in the population. Also, the setting of the generalized random graph with w_i satisfying (1.3.15) corresponds to the uncountable type-space setting when the distribution function F is that of a continuous random variable W.
how many individuals there are of a given type. This is described in terms of a measure µn ,
where, for A ⊆ S , µn (A) denotes the proportion of individuals having a type in A.
In our general model, instead of vertex weights, the edge probabilities are moderated by a
kernel κ : S 2 → [0, ∞). The probability that two vertices of types x1 and x2 are connected
is approximately κ(x1 , x2 )/n, and different edges are present independently. Since there
are many choices for κ, we arrive at a rather flexible model.
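A naive O(n^2) sampler makes this definition concrete: each pair \{u, v\} is connected independently with probability \min\{κ(x_u, x_v)/n, 1\}. The function name and the encoding of types as a list are our own; the two-type table below is a hypothetical kernel in the spirit of Example 3.1, with more edges within types than between them.

```python
import random

def sample_irg(types, kappa, seed=None):
    """Samples IRG_n(kappa): vertex u carries type types[u], and each pair
    {u, v} is joined independently with probability min(kappa(x_u, x_v)/n, 1).
    Returns the graph as a dict of neighbor sets."""
    rng = random.Random(seed)
    n = len(types)
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < min(kappa(types[u], types[v]) / n, 1.0):
                adj[u].add(v)
                adj[v].add(u)
    return adj

# Hypothetical two-type kernel: strong within-type, weak between-type edges.
table = {(1, 1): 6.0, (2, 2): 6.0, (1, 2): 1.0, (2, 1): 1.0}
g = sample_irg([1] * 500 + [2] * 500, lambda x, y: table[(x, y)], seed=42)
```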
(iii)

\frac{1}{n^2}\sum_{1\le u<v\le n} [κ(x_u, x_v) \wedge n] \to \frac{1}{2}\iint_{S^2} κ(x,y)\,µ(dx)\,µ(dy).   (3.2.3)

Similarly, a sequence (κ_n)_{n≥1} of kernels is called graphical with limit κ when, for µ-almost every y, z,

y_n \to y \quad\text{and}\quad z_n \to z \quad\text{imply that}\quad κ_n(y_n, z_n) \to κ(y, z),   (3.2.4)

where κ satisfies conditions (a) and (b) above, and

\frac{1}{n^2}\sum_{1\le u<v\le n} [κ_n(x_u, x_v) \wedge n] \to \frac{1}{2}\iint_{S^2} κ(x,y)\,µ(dx)\,µ(dy).   (3.2.5)
(b) A kernel κ is called reducible if there exists A ⊆ S with 0 < µ(A) < 1 such that κ = 0 a.e. on A × (S \setminus A); otherwise κ is irreducible. J
We now discuss the above definitions. Below, we will take p_{uv} = [κ_n(x_u, x_v) \wedge n]/n. Then the assumptions in (3.2.2), (3.2.3), (3.2.5) imply that the expected number of edges \mathbb{E}[|E(\mathrm{IRG}_n(κ_n))|] is proportional to n, and that the proportionality constant is precisely \frac{1}{2}\iint_{S^2} κ(x,y)\,µ(dx)\,µ(dy). Thus, in the terminology of [V1, Chapter 1], the model is sparse (recall Section 1.1.1). This sparsity allows us to approximate graphical kernels by bounded
ones in such a way that the number of removed edges is oP (n), a fact that will be crucially
used in what follows. Indeed, bounded graphical kernels can be well approximated by step
functions similarly to the way in which continuous functions on R can be well approximated
by step functions. In turn, such step functions on S × S correspond to random graphs with
vertices having only finitely many different types.
We extend the setting to n-dependent sequences (κn )n≥1 of kernels in (3.2.4), as in many
natural cases the kernels do depend on n. In particular, this allows us to deal with several
closely related and natural notions of the edge probabilities, all at the same time (see, e.g.,
(3.2.6) and (3.2.7) below), showing that identical results hold in each of these cases.
Roughly speaking, κ is reducible if the vertex set [n] of IRGn (κ) can be split into two
parts in such a way that the probability of an edge from one part to the other is zero, and κ
is irreducible otherwise. For reducible kernels, we could equally well have started with each
of these parts separately, explaining why the notion of irreducibility is quite natural.
In many cases, we take S = [0, 1], x_i = i/n, and µ the Lebesgue measure on [0, 1].
Then, clearly, (3.2.1) is satisfied. In fact, Janson (2009) shows that we can always restrict
to S = [0, 1] by suitably adapting the other choices of our model. However, for notational
purposes, it is more convenient to work with general S . For example, when S = {1} is just
a single type, the model reduces to the Erdős–Rényi random graph, and, in the setting where
S = [0, 1], this is slightly more cumbersome, as can be worked out in detail in Exercise 3.1.
this follows immediately from [V1, Theorem 6.18] (see Exercise 3.4). In the next section,
we discuss some examples of inhomogeneous random graphs.
100 Connected Components in General Inhomogeneous Random Graphs
Chung–Lu Model
For CL_n(w) with w = (w_v)_{v∈[n]}, where w_v = [1-F]^{-1}(v/n) as in (1.3.15), we take S = [0,1], x_v = v/n and, with ψ(x) = [1-F]^{-1}(x),

κ_n(x, y) = ψ(x)ψ(y)\,n/ℓ_n.   (3.2.9)

For CL_n(w) with w = (w_v)_{v∈[n]} satisfying Condition 1.1 in Section 1.3.2, instead, we take S = [0,1], x_v = v/n, and

κ_n(u/n, v/n) = w_u w_v/\mathbb{E}[W_n].   (3.2.10)

Exercises 3.5 and 3.6 study the Chung–Lu random graph in the present framework.
Thus, despite the inhomogeneity that is present, every vertex in the graph has (asymptot-
ically) the same number of expected offspring. Exercise 3.8 shows that the Erdős–Rényi
random graph, the homogeneous bipartite random graph, and the stochastic block model are
all homogeneous random graphs. In such settings, the level of inhomogeneity is limited.
Sum Kernels
We have already seen that product kernels are special, as they give rise to the Chung–Lu model or its close relatives, the generalized random graph and the Norros–Reittu model. For sum kernels, instead, we take κ(x,y) = ψ(x) + ψ(y), so that

p_{uv} = \min\{(ψ(u/n) + ψ(v/n))/n, 1\}.   (3.2.13)
We start by investigating the degrees of the vertices of IRG_n(κ_n). As we shall see, the degree of a vertex of a given type x is asymptotically Poisson with mean

λ(x) = \int_S κ(x, y)\,µ(dy)   (3.3.1)

that (possibly) depends on the type x ∈ S. This leads to a mixed-Poisson distribution for the degree D of a (uniformly chosen) random vertex of IRG_n(κ_n). We recall that N_k(n) denotes the number of vertices of IRG_n(κ_n) with degree k, i.e.,

N_k(n) = \sum_{v∈[n]} 1_{\{d_v = k\}},   (3.3.2)
Recall that, in the finite-type case, the edge probability between vertices of types s and
r is denoted by (κn (s, r) ∧ n)/n. Further, (3.2.4) implies that κn (s, r) → κ(s, r) for
every s, r ∈ [t], while (3.2.1) implies that the number ns of vertices of type s satisfies
µn (s) = ns /n → µ(s) for some probability distribution (µ(s))s∈[t] .
[Figure: The degree distribution, size-biased degree distribution, and random friend degree distribution in two real-world networks, shown as tail probabilities P(X > x) against the degrees on a log–log scale; panels (a) and (b).]
Assume that n ≥ \max κ. The random variables (d_{v,r})_{r∈[t]} are independent, and d_{v,r} ∼ \mathrm{Bin}(n_r - 1_{\{s=r\}}, κ_n(s,r)/n) \xrightarrow{d} \mathrm{Poi}(µ(r)κ(s,r)), where n_r is the number of vertices with type r and µ(r) = \lim_{n\to\infty} n_r/n is the limiting type distribution. Hence

d_v \xrightarrow{d} \mathrm{Poi}\Big(\sum_{r∈[t]} κ(s,r)µ(r)\Big) = \mathrm{Poi}(λ(s)),   (3.3.8)

where λ(s) = \int κ(s,r)\,µ(dr) = \sum_{r∈[t]} κ(s,r)µ(r). Consequently,

\mathbb{P}(d_v = k) \to \mathbb{P}(\mathrm{Poi}(λ(s)) = k) = \frac{λ(s)^k}{k!}\,e^{-λ(s)}.   (3.3.9)
Let N_{k,s}(n) be the number of vertices in IRG_n(κ_n) of type s with degree k. Then

\frac{1}{n}\mathbb{E}[N_{k,s}(n)] = \frac{1}{n}\,n_s\,\mathbb{P}(d_v = k) \to µ(s)\,\mathbb{P}(\mathrm{Poi}(λ(s)) = k).   (3.3.10)

It is easily checked that \mathrm{Var}(N_{k,s}(n)) = O(n) (see Exercise 3.12). Hence,

\frac{1}{n}N_{k,s}(n) \xrightarrow{\mathbb{P}} \mathbb{P}(\mathrm{Poi}(λ(s)) = k)\,µ(s),   (3.3.11)

and thus, summing over s ∈ [t],

\frac{1}{n}N_k(n) = \frac{1}{n}\sum_{s∈[t]} N_{k,s}(n) \xrightarrow{\mathbb{P}} \sum_{s∈[t]} \mathbb{P}(\mathrm{Poi}(λ(s)) = k)\,µ(s) = \mathbb{P}(D = k).   (3.3.12)
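For a finite type space, the limiting mixed-Poisson probability mass function in (3.3.12) is a weighted sum of Poisson probabilities with means λ(s) = Σ_r κ(s,r)µ(r). A small sketch (the function name and the list encodings are our own):

```python
import math

def mixed_poisson_pmf(k, mu, lam):
    """P(D = k) = sum_s mu[s] * P(Poi(lam[s]) = k), as in (3.3.12);
    mu[s] are the type frequencies and lam[s] the Poisson means lambda(s)."""
    return sum(
        m * math.exp(-l) * l ** k / math.factorial(k) for m, l in zip(mu, lam)
    )

# Example with two types: lam[s] = sum_r kappa[s][r] * mu[r].
kappa = [[6.0, 1.0], [1.0, 6.0]]
mu = [0.5, 0.5]
lam = [sum(kappa[s][r] * mu[r] for r in range(2)) for s in range(2)]
pmf = [mixed_poisson_pmf(k, mu, lam) for k in range(10)]
```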
In order to prove Theorem 3.4 in the general case, we approximate a sequence of graph-
ical kernels (κn ) by appropriate regular finite kernels, as we explain in detail in the next
subsection.
Our aim is to find finite-type approximations of κ_n that bound κ_n from above and below. It is here that the metric structure of S, as well as the convergence properties of κ_n and the a.e.-continuity of (x, y) \mapsto κ(x, y) in Definition 3.3, are crucially used:

Proposition 3.5 (Finite-type approximations of general kernels) If (κ_n)_{n≥1} is a graphical sequence of kernels on a vertex space (S, µ, (x_n)_{n≥1}) with limit κ, then there exist sequences (κ_m)_{m≥1}, and (\bar{κ}_m)_{m≥1} when (3.3.13) holds, of finite-type kernels on the same vertex space satisfying the following:
(a) if κ is irreducible, then so are κ_m and \bar{κ}_m for all large enough m;
(b) κ_m(x, y) \nearrow κ(x, y) for (µ×µ)-a.e. x, y ∈ S;
(c) \bar{κ}_m(x, y) \searrow κ(x, y) for (µ×µ)-a.e. x, y ∈ S.
Let us now give some details. We find these finite-type approximations by giving a partition P_m of S on which κ_n(x, y) is almost constant when x and y are inside cells of the partition. Fix m ≥ 1; this indicates the number of cells in the partition of S. Given a sequence of finite partitions P_m = \{A_{m1}, ..., A_{mM_m}\} of S and an x ∈ S, we define the function x \mapsto i_m(x) by requiring that

x ∈ A_{m, i_m(x)}.   (3.3.14)

Thus, i_m(x) indicates the cell in P_m containing x. For A ⊆ S, we write \mathrm{diam}(A) = \sup\{\mathrm{dist}(x, y): x, y ∈ A\}, where dist(·,·) denotes the distance on S. We obtain the following key approximation result:

Lemma 3.6 (Approximating partition) There exists a sequence of finite partitions P_m = \{A_{m1}, ..., A_{mM_m}\} of S such that:
(a) each A_{mi} is measurable and µ(∂A_{mi}) = 0;
(b) for each m, P_{m+1} refines P_m, i.e., each A_{mi} is a union \bigcup_{j∈J_{mi}} A_{m+1,j} for some set J_{mi};
(c) for almost every x ∈ S, \mathrm{diam}(A_{m,i_m(x)}) \to 0 as m \to \infty, where i_m(x) is defined by (3.3.14).
Proof This proof is a little technical. When S = (0,1] and µ is continuous, we can take P_m to be the dyadic partition into intervals of length 2^{-m}. If S = (0,1] and µ is arbitrary, then we can do almost the same: we only shift the endpoints of the intervals a little when necessary to avoid point masses of µ.
In general, we can proceed as follows. Let z_1, z_2, ... be a dense sequence of points in S. For any z_i, the balls B_d(z_i) = \{y ∈ S: \mathrm{dist}(y, z_i) \le d\}, for d > 0, have disjoint boundaries, and thus all except at most countably many of them are µ-continuity sets. Consequently, for every m ≥ 1, we may choose balls B_{mi} = B_{d_{mi}}(z_i) that are µ-continuity sets and have radii satisfying 1/m < d_{mi} < 2/m. Then \bigcup_i B_{mi} = S, and if we define B'_{mi} := B_{mi} \setminus \bigcup_{j<i} B_{mj}, we obtain for each m an infinite partition \{B'_{mi}\}_{i≥1} of S into µ-continuity sets, each with diameter at most 4/m. To get a finite partition, we choose q_m large enough to ensure that, with B'_{m0} := \bigcup_{i>q_m} B'_{mi}, we have µ(B'_{m0}) < 2^{-m}; then \{B'_{mi}\}_{i=0}^{q_m} is a partition of S for each m, with \mathrm{diam}(B'_{mi}) \le 4/m for i ≥ 1.
Finally, we let P_m consist of all intersections \bigcap_{l=1}^{m} B'_{l,i_l} with 0 ≤ i_l ≤ q_l; then conditions (a) and (b) are satisfied. Condition (c) follows from the Borel–Cantelli Lemma: as \sum_m µ(B'_{m0}) is finite, a.e. x is in finitely many of the sets B'_{m0}. For any such x, if m is large enough then x ∈ B'_{mi} for some i ≥ 1, so the cell of P_m containing x has diameter at most \mathrm{diam}(B'_{mi}) \le 4/m.
Now we are ready to complete the proof of Proposition 3.5.
Proof of Proposition 3.5. Recall from Definition 3.3 that a kernel κ is a symmetric measurable function on S × S that is a.e. continuous. Recall also that κ_n is a graphical sequence of kernels, so that it satisfies the convergence properties in (3.2.4). Fixing a sequence of partitions with the properties described in Lemma 3.6, we can define sequences of lower and upper approximations to κ by

κ_m(x, y) = \inf\{κ(x', y') : x' ∈ A_{m,i_m(x)}, y' ∈ A_{m,i_m(y)}\},   (3.3.15)
\bar{κ}_m(x, y) = \sup\{κ(x', y') : x' ∈ A_{m,i_m(x)}, y' ∈ A_{m,i_m(y)}\}.   (3.3.16)

We thus replace κ by its infimum or supremum on each A_{mi} × A_{mj}. As \bar{κ}_m might be +∞, we use it only for bounded κ_n as in (3.3.13). Obviously, κ_m and \bar{κ}_m are constant on A_{m,i} × A_{m,j} for every i, j, so that κ_m and \bar{κ}_m correspond to finite-type kernels (see Exercise 3.14).
By Lemma 3.6(b),

κ_m \le κ_{m+1} \quad\text{and}\quad \bar{κ}_m \ge \bar{κ}_{m+1}.   (3.3.17)

Furthermore, since κ is almost everywhere continuous, by Lemma 3.6(c),

κ_m(x, y) \to κ(x, y) \quad\text{and}\quad \bar{κ}_m(x, y) \to κ(x, y) \quad\text{for (µ×µ)-a.e. } (x, y) ∈ S^2.   (3.3.18)

If (κ_n) is a graphical sequence of kernels with limit κ, then we similarly define

κ_m(x, y) := \inf\{(κ \wedge κ_n)(x', y') : x' ∈ A_{m,i_m(x)}, y' ∈ A_{m,i_m(y)}, n \ge m\},   (3.3.19)
\bar{κ}_m(x, y) := \sup\{(κ \vee κ_n)(x', y') : x' ∈ A_{m,i_m(x)}, y' ∈ A_{m,i_m(y)}, n \ge m\}.   (3.3.20)

By (3.3.17), κ_m \le κ_{m+1}, and, by Lemma 3.6(c) and (3.2.4) in Definition 3.3(a),

κ_m(x, y) \nearrow κ(x, y) \quad\text{as } m \to \infty, \text{ for (µ×µ)-a.e. } (x, y) ∈ S^2.   (3.3.21)

This proves part (b) of Proposition 3.5. The proof of part (c) is similar. For the irreducibility in part (a), we may assume that κ is irreducible. In fact, κ_m may be reducible for some m. We omit the proof that κ_m can be adapted in such a way that the adapted version is irreducible.
Since κ_m \le κ, we can obviously construct our random graph in such a way that all edges in IRG_n(κ_m) are also present in IRG_n(κ_n), which we write as IRG_n(κ_m) ⊆ IRG_n(κ_n), and in what follows we will assume this. See also Exercise 3.15. Similarly, we shall assume that IRG_n(\bar{κ}_m) ⊇ IRG_n(κ_n) when κ_n is bounded as in (3.3.13). Moreover, when n ≥ m,

κ_n \ge κ_m,   (3.3.22)

and we may assume that IRG_n(κ_m) ⊆ IRG_n(κ_n). By the convergence of the sequence of kernels (κ_n), we further obtain that the number of edges also converges. Thus, in bounding κ_n, we do not create or destroy too many edges. This provides the starting point of our analysis, which we provide in the following subsection.
Since λ^{(m)}(x) \le λ(x), we can couple the limiting degrees in such a way that D^{(m)} \le D almost surely, and thus

\mathbb{P}(D \neq D^{(m)}) = \mathbb{P}(D - D^{(m)} \ge 1) \le \mathbb{E}[D - D^{(m)}] = \iint_{S^2} κ(x,y)\,µ(dx)\,µ(dy) - \iint_{S^2} κ_m(x,y)\,µ(dx)\,µ(dy) < ε.   (3.3.27)

Combining (3.3.25), (3.3.26), and (3.3.27), we see that |N_k(n)/n - \mathbb{P}(D = k)| < 4ε whp, as required.
Bounded Kernels
First of all, the above proof is exemplary of several proofs that we will use in this chapter as well as in Chapter 6. The current proof is particularly simple, as it makes use only of the lower bounding finite-type inhomogeneous random graph, while in many settings we also need the upper bound. This upper bound applies only to bounded kernels κ_n as in (3.3.13). As a result, we need to study the effect of bounding κ_n, for example by approximating it by κ_n(x,y) ∧ K for large enough K.
Corollary 3.7 (Power-law tails for the degree sequence) Let (κ_n) be a graphical sequence of kernels with limit κ. Suppose that

where the first limit is for k fixed and n → ∞, and the second for k → ∞.
Proof It suffices to show that \mathbb{P}(D > k) = c_W k^{-(τ-1)}(1 + o(1)); the remaining conclusions then follow from Theorem 3.4. For any ε > 0, as k → ∞,

It follows that \mathbb{P}(D > k) = \mathbb{P}(\mathrm{Poi}(W) > k) = c_W k^{-(τ-1)}(1 + o(1)) as k → ∞. Exercise 3.16 asks you to fill in the details of this argument.
Corollary 3.7 shows that the general inhomogeneous random graph does include natural
cases with power-law degree distributions. Recall that we have already observed in [V1,
Theorem 6.7] that this is the case for GRGn (w) when the weights sequence w is chosen
appropriately.
In order to study further properties of IRGn (κn ), we need to understand the neighborhood
structure of vertices. This will be crucially used in the next section, where we study the
local convergence properties of IRGn (κn ). For simplicity, let us restrict ourselves first to
the finite-types case. As we have seen, nice kernels can be arbitrarily well approximated by
finite-type kernels, so this should be a good start. Then, for a vertex of type s, the number
of neighbors of type r is close to Poisson-distributed with approximate mean κ(s, r)µ(r).
Even when we assume independence of the neighborhood structures of different vertices, we
still do not arrive at a classical branching process as discussed in [V1, Chapter 3]. Instead,
we can describe the neighborhood structure with a branching process in which we keep track
of the type of each vertex. For general κ and µ, we can even have a continuum of types. Such
branching processes are called multi-type branching processes. In this section, we discuss
some of the basics of these processes.
be the joint probability generating function of the offspring of an individual of type s ∈ [t]. We write

G(z) = (G^{(1)}(z), ..., G^{(t)}(z))   (3.4.3)
for the vector of generating functions. We now generalize [V1, Theorem 3.1] to the multi-type case.
Let ζ be the smallest solution in the lexicographic order on \mathbb{R}^t to

ζ = 1 - G(1 - ζ).   (3.4.4)

It turns out that ζ is the vector whose sth component equals the survival probability of (Z_k^{(s)})_{k≥0}. Define

G_k^{(s)}(z) = \mathbb{E}\Big[\prod_{r∈[t]} z_r^{Z_{k,r}^{(s)}}\Big],   (3.4.5)
there exists l such that (T_κ^l)_{s,r} > 0, where the matrix T_κ^l is the lth power of T_κ. We call a multi-type branching process positively regular if there exists l such that (T_κ^l)_{s,r} > 0 for all s, r ∈ [t]. J
The definition of irreducible multi-type branching processes in Definition 3.9 is closely related to that of irreducible random graph kernels in Definition 3.3. The name irreducibility can be understood since it implies that the Markov chain of the number of individuals of the various types is irreducible.
By the Perron–Frobenius theorem, in the positively regular case, the matrix T_κ has a unique largest eigenvalue equal to ‖T_κ‖ with non-negative left eigenvector x_κ, and the eigenvalue ‖T_κ‖ can be computed as

‖T_κ‖ = \sup_{x\colon ‖x‖ \le 1} ‖T_κ x‖, \qquad \text{where } ‖x‖ = \Big(\sum_{s∈[t]} x_s^2\Big)^{1/2}.   (3.4.6)
(a) The survival probability ζ is the largest solution to ζ = 1 - G(1 - ζ), and ζ > 0 precisely when ‖T_κ‖ > 1.
(b) Assume that ‖T_κ‖ > 1. Let x_κ be the unique positive left eigenvector of T_κ. Then, as k → ∞, the martingale M_k = x_κ Z_k^{(i)} ‖T_κ‖^{-k} converges almost surely to a non-negative limit on the event of survival precisely when \mathbb{E}[Z_1^{(s)} \log(Z_1^{(s)})] < ∞ for all s ∈ [t], where Z_1^{(s)} = ‖\mathbf{Z}_1^{(s)}‖_1 = \sum_{r∈[t]} Z_{1,r}^{(s)} is the total number of offspring of a type-s individual.
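For a finite type space with a symmetric, irreducible kernel, ‖T_κ‖ is the Perron–Frobenius eigenvalue of the t × t mean-offspring matrix with entries κ(s, r)µ(r), which can be approximated by power iteration. A minimal sketch under these assumptions (the function name and the list-of-lists encoding are ours):

```python
def operator_norm(kappa, mu, iterations=1000):
    """Largest eigenvalue of the mean-offspring matrix M[s][r] =
    kappa[s][r] * mu[r], via power iteration; the branching process is
    supercritical precisely when the returned value exceeds 1."""
    t = len(mu)
    x = [1.0] * t
    norm = 0.0
    for _ in range(iterations):
        y = [sum(kappa[s][r] * mu[r] * x[r] for r in range(t)) for s in range(t)]
        norm = max(abs(v) for v in y)  # eigenvalue estimate at this step
        x = [v / norm for v in y]
    return norm
```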
Again, it can be seen, in a way similar to that above, that ζ_κ > 0 if and only if ‖T_κ‖ > 1, where now the linear operator T_κ is defined, for f: S → \mathbb{R}, by

(T_κ f)(x) = \int_S κ(x, y) f(y)\,µ(dy),   (3.4.15)

for any (measurable) function f such that this integral is defined (finite or +∞) for a.e. x ∈ S.
Note that T_κ f is defined for every f ≥ 0, with 0 ≤ T_κ f ≤ ∞. If κ ∈ L^1(S × S), as we assume throughout, then T_κ f is also defined for every bounded f. In this case T_κ f ∈ L^1(S) and thus T_κ f is finite almost everywhere.
The consideration of multi-type branching processes with a possibly uncountable number of types requires some functional analysis. Similarly to the finite-type case in (3.4.6), we define

‖T_κ‖ = \sup\{‖T_κ f‖ : f \ge 0, ‖f‖ \le 1\} \le ∞.   (3.4.16)
Thus, also

‖T_κ‖ = ‖ψ‖^2_{L^2(µ)},   (3.4.22)

since ψ is the unique non-negative eigenfunction, and a basis of eigenfunctions can be found by taking a basis in the space orthogonal to ψ (each member of which will have eigenvalue 0). Thus, the rank-1 multi-type branching process is supercritical when ‖ψ‖^2_{L^2(µ)} > 1, critical when ‖ψ‖^2_{L^2(µ)} = 1, and subcritical when ‖ψ‖^2_{L^2(µ)} < 1.
The rank-1 case is rather special, and not only since we can explicitly compute the eigenvectors of the operator T_κ. It also turns out that the rank-1 multi-type case reduces to a single-type branching process with mixed-Poisson offspring distribution. For this, we recall the construction right below Lemma 3.11. We compute that

λ(x) = \int_S ψ(x)ψ(y)\,µ(dy) = ψ(x)\int_S ψ(y)\,µ(dy),   (3.4.23)

We conclude that every individual chooses its type independently of the type of its parent. This means that this multi-type branching process reduces to a single-type branching process with offspring distribution \mathrm{Poi}(W_λ), where

\mathbb{P}(W_λ ∈ A) = \frac{\int_A ψ(y)\,µ(dy)}{\int_S ψ(z)\,µ(dz)}.   (3.4.25)

This makes the rank-1 setting particularly appealing.
We now introduce some helpful notation along the lines of that in Section 1.5. We let
BP≤r denote the branching process up to and including generation r, where, for each in-
dividual v in the rth generation, we record its type as Q(v). It is convenient to think of
the branching-process tree, denoted as BP, as being labeled in the Ulam–Harris way (recall
Section 1.5), so that a vertex v in generation r has a label ∅a1 · · · ar , where ai ∈ N. When
applied to BP, we denote this process by (BP(t))t≥1 , where BP(t) consists of precisely
t + 1 vertices and their types (with BP(0) equal to the root ∅ and its type Q(∅)). We recall
Definitions 1.24 and 1.25 for details.
We can represent this by a sum of independent Poisson multi-type processes with intensities ∆κ_n(x, y), and can associate a label n with each individual that arises from ∆κ_n(x, y). Then the branching process BP^{(n)}_{\le r} is obtained by keeping all vertices with labels at most n, while BP_{\le r} is obtained by keeping all vertices. Consequently, BP^{(n)}_{\le r} \xrightarrow{d} BP_{\le r} follows since κ_n → κ. Further, 1 - ζ^{(n)}_{\ge k}(x) = ζ^{(n)}_{<k}(x) = \mathbb{P}(|BP^{(n)}_{\le k}| < k), which thus also converges.
and each of its offspring receives an independent type with distribution Q(x) given by

ρ(Q(x) ∈ A) = \frac{\int_A κ(x, y)\,µ(dy)}{\int_S κ(x, y)\,µ(dy)}.   (3.5.3)

The proof of Theorem 3.14 follows a familiar pattern: we first prove it for the finite-type case, and then use finite-type approximations to extend the proof to the infinite-type case. We use ρ for the law of the local limit rather than µ, as in Chapter 2, to avoid confusion with the limiting type measure µ appearing in the definition of IRG_n(κ_n).
denote the number of vertices whose ordered local neighborhood up to generation r, including their types, equals (t, q). Here, in B̄_r^{(Gn;Q)}(v), we record the types of the vertices in B̄_r^{(Gn)}(v). Theorem 2.15 implies that in order to prove Theorem 3.14, we need to show that

    N_{n,r}(t, q)/n →ᴾ ρ(B̄_r^{(G,Q)}(o) = (t, q)),    (3.5.5)

where (B̄_r^{(G,Q)}(o))_{r≥0} are the vertex-marked r-neighborhoods of the unimodular branching process (G, o) ∼ ρ described in Theorem 3.14, including the types of the tree vertices.
Recall that these neighborhoods are ordered trees. This implies the convergence in proba-
bility of marked rooted graphs, discussed in Section 2.3.5, and the usual local convergence
in probability of the (unmarked) neighborhood follows by summing over the types of the
vertices in B̄_r^{(Gn)}(v). Let

    N_{n,r}(t) = Σ_{v∈[n]} 1{B̄_r^{(Gn)}(v) = t}.    (3.5.6)
Then, indeed, since there is only a finite number of types, (3.5.5) also implies that N_{n,r}(t)/n →ᴾ ρ(t), where, with a slight abuse of notation, we write

    ρ(t) = Σ_q ρ(B̄_r^{(G,Q)}(o) = (t, q)) = ρ(B̄_r^{(G)}(o) = t)    (3.5.7)
for the probability that the branching process produces a certain marked tree. We can then
apply Theorem 2.15.
To prove (3.5.5), we follow the usual pattern of using a second-moment method. We first
prove that the first moment satisfies E[Nn,r (t, q)]/n → ρ(B̄r(G,Q) (o) = (t, q)), after which
we prove that Var(N_{n,r}(t, q)) = o(n²). Then, (3.5.5) follows by the Chebyshev inequality ([V1, Theorem 2.18]).
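Explicitly, the Chebyshev step here reads

    P(|N_{n,r}(t, q) − E[N_{n,r}(t, q)]| ≥ εn) ≤ Var(N_{n,r}(t, q))/(ε²n²) = o(1) for every ε > 0,

so that N_{n,r}(t, q)/n − E[N_{n,r}(t, q)]/n →ᴾ 0, and the first-moment convergence then identifies the limit in (3.5.5).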
since we first draw a Poisson λ(q(v)) number of children, and then assign a type q to each of
them with probability κ(q(v), q)µ(q)/λ(q(v)). This is true independently for all v ∈ V (t)
with |v| ≤ r − 1, so that
    ρ(B̄_r^{(G,Q)}(o) = (t, q)) = ∏_{v∈V(t): |v|≤r−1} e^{−λ(q(v))} (1/d_v!) ∏_{j=1}^{d_v} κ(q(v), q(vj))µ(q(vj)).    (3.5.10)
For a comparison with the graph exploration, it turns out to be convenient to rewrite this probability slightly. Let t_{≤r−1} = {v : |v| ≤ r − 1} denote the vertices in the first r − 1 generations of t and let |t_{≤r−1}| denote its size. We can order the elements of t_{≤r−1} in their lexicographic or Ulam–Harris ordering as (v_i)_{i=1}^{|t_{≤r−1}|} (recall Definition 1.24 in Section 1.5). Then we can write

    ρ(B̄_r^{(G,Q)}(o) = (t, q)) = ∏_{i=1}^{|t_{≤r−1}|} e^{−λ(q(v_i))} (1/d_{v_i}!) ∏_{j=1}^{d_{v_i}} κ(q(v_i), q(v_i j))µ(q(v_i j)).    (3.5.11)
Let us now turn to IRG_n(κ_n). Fix a vertex v ∈ [n] of type q(v). Recall that n_q denotes the number of type-q vertices. The probability of obtaining a sequence of d_v neighbors of (ordered) types (q(v1), . . . , q(vd_v)) equals

    (1/d_v!) ∏_{q∈S} (1 − κ_n(q(v), q)/n)^{n_q − m_q} ∏_{j=1}^{d_v} (κ_n(q(v), q(vj))/n) [n_{q(vj)} − m_{q(vj)}(j − 1)],    (3.5.12)

where m_q = #{i : q(vi) = q} is the number of type-q vertices in (q(v1), . . . , q(vd_v)) and m_q(j) = #{i ≤ j : q(vi) = q} is the number of type-q vertices in (q(v1), . . . , q(vj)).
Here, the first factor, 1/d_v!, arises since we are assigning an ordering on all vertices uar; the second factor, involving the product over q ∈ S, since all other edges (except for the specified ones) need to be absent; and the third factor, involving the product over j ∈ [d_v], specifies that the edges to vertices of the (ordered) sequence of types are present.
When n → ∞, κ_n(q(v), q) → κ(q(v), q) for every q ∈ S since n_q/n → µ(q), so that

    (1/d_v!) ∏_{q∈S} (1 − κ_n(q(v), q)/n)^{n_q − m_q} ∏_{j=1}^{d_v} κ_n(q(v), q(vj)) [n_{q(vj)} − m_{q(vj)}(j − 1)]/n
        → e^{−λ(q(v))} (1/d_v!) ∏_{j=1}^{d_v} κ(q(v), q(vj))µ(q(vj)),    (3.5.13)
as required. The above computation, however, ignores the depletion-of-points effect: fewer and fewer vertices remain available as the exploration proceeds.
To describe this, recall the lexicographic ordering of the elements in t_{≤r−1} as (v_i)_{i=1}^{|t_{≤r−1}|}, and, for a type q, let m_q(i) = #{j ∈ [i] : q(v_j) = q} denote the number of type-q individuals in (t, q) encountered up to and including the ith exploration. Then

    P(B̄_r^{(Gn;Q)}(o) = (t, q)) = ∏_{i=1}^{|t_{≤r−1}|} (1/d_{v_i}!) ∏_{q∈S} (1 − κ_n(q(v_i), q)/n)^{n_q − m_q(i−1)}
        × ∏_{j=1}^{d_{v_i}} (κ_n(q(v_i), q(v_i j))/n) [n_{q(v_i j)} − m_{q(v_i j)}(i + j − 1)].    (3.5.14)
As n → ∞, this converges to the rhs of (3.5.11), as required. This completes the proof of
(3.5.8), and thus the convergence of the first moment, which, in turn, implies local weak
convergence.
We now condition on B̄_r^{(Gn;Q)}(o_1) ≃ (t, q), and write
We already know that P(B̄_r^{(Gn;Q)}(o_1) = (t, q)) → ρ(B̄_r^{(G,Q)}(o) = (t, q)), so that also
In Exercise 3.24, you can prove that (3.5.19) does indeed hold.
We next investigate the conditional probability given B̄_r^{(Gn;Q)}(o_1) = (t, q) and o_2 ∉ B_{2r}^{(Gn)}(o_1), by noting that the probability that B̄_r^{(Gn;Q)}(o_2) = (t, q) is the same as the probability that B̄_r^{(Gn;Q)}(o_2) = (t, q) in IRG_{n′}(κ_n), which is obtained by removing the vertices in B_r^{(Gn;Q)}(o_1), as well as the edges from them, from IRG_n(κ_n). We conclude that the resulting random graph has n′ = n − |V(t)| vertices, and n′_q = n_q − m_q vertices of type q ∈ [t], where m_q is the number of type-q vertices in (t, q). Further, κ_{n′}(s, r) = κ_n(s, r)n′/n. The whole point is that κ_{n′}(s, r) → κ(s, r) and n′_q/n → µ(q) still hold. Therefore, we also have

    P(B̄_r^{(Gn;Q)}(o_2) = (t, q) | B̄_r^{(Gn;Q)}(o_1) = (t, q), o_2 ∉ B_{2r}^{(Gn)}(o_1)) → ρ(B̄_r^{(G,Q)}(o) = (t, q)).    (3.5.20)
and we have proved that E[N_{n,r}(t, q)²]/n² → ρ(B̄_r^{(G,Q)}(o) = (t, q))². From this, (3.5.15) follows directly since E[N_{n,r}(t, q)]/n → ρ(B̄_r^{(G,Q)}(o) = (t, q)). As a result, N_{n,r}(t, q)/n →ᴾ ρ(B̄_r^{(G,Q)}(o) = (t, q)), as required.
Lemma 3.15 completes the proof of Theorem 3.14 in the finite-type case.
holds whp. We let K denote the maximal degree in t. Let N^{(m)}_{n,r}(t, q) denote N_{n,r}(t, q) for the kernel κ_m (and keep N_{n,r}(t, q) as in (3.5.4) for the kernel κ_n).
If a vertex v is such that B_r^{(Gn;Q)}(v) ≃ (t, q) in IRG_n(κ_m), but not in IRG_n(κ_n), or vice versa, then one vertex in B_{r−1}^{(Gn;Q)}(v) needs to have a different degree in IRG_n(κ_m)
    + Σ_{u,v} 1{u ∈ B_{r−1}^{(Gn;Q)}(v), B̄_r^{(Gn;Q)}(v) = (t, q) in IRG_n(κ_n)} 1{D_u^{(m)} ≠ D_u}.    (3.5.22)
Recall that the maximal degree of any vertex in V(t) is K. Further, if B̄_r^{(Gn;Q)}(v) = (t, q) and u ∈ B_{r−1}^{(Gn;Q)}(v), then all the vertices on the path between u and v have degree at most K. Therefore,

    Σ_v 1{u ∈ B_{r−1}^{(Gn;Q)}(v), B̄_r^{(Gn;Q)}(v) = (t, q) in IRG_n(κ_m)} ≤ Σ_{ℓ≤r−1} K^ℓ ≤ (K^r − 1)/(K − 1),    (3.5.23)
and, in the same way,

    Σ_v 1{u ∈ B_{r−1}^{(Gn;Q)}(v), B̄_r^{(Gn;Q)}(v) = (t, q) in IRG_n(κ_n)} ≤ (K^r − 1)/(K − 1).    (3.5.24)
We thus conclude that whp

    |N^{(m)}_{n,r}(t, q) − N_{n,r}(t, q)| ≤ 2 (K^r − 1)/(K − 1) Σ_{u∈[n]} 1{D_u^{(m)} ≠ D_u}
        ≤ 2 (K^r − 1)/(K − 1) Σ_{u∈[n]} (D_u^{(m)} − D_u) ≤ 2 (K^r − 1)/(K − 1) ε′n.    (3.5.25)
kernels with finitely many types. Recall the rank-1 Poisson branching processes defined in Section 3.4.3, and recall that such branching processes were shown there to be equivalent to single-type mixed-Poisson branching processes.
We now make the connection between the thinned marked mixed-Poisson branching pro-
cess and neighborhood exploration precise:
Proposition 3.16 (Connected components as thinned marked branching processes) The connected component C(o) of a uniformly chosen vertex is equal in distribution to the set of vertices {M_v : v unthinned} ⊆ [n] and the edges between them inherited from the marked mixed-Poisson branching process (X_v, M_v)_v. Here, {M_v : v unthinned} consists of the marks of unthinned tree nodes encountered in the marked mixed-Poisson branching process up to the end of the exploration. Consequently, the sets of vertices at graph distance r from o have the same distribution as

    ({M_v : v unthinned, |v| = r})_{r≥0}.    (3.5.31)
This proves that the set of marks of the children of the root in the MMPBP has the same distribution as the set of neighbors of the chosen vertex in NR_n(w).
Next, we look at the number of new elements of C (o) neighboring a vertex found in
the exploration. Fix one such vertex, and let its tree label be v = ∅v1 . First, condition on
Mv = l, and assume that v is not thinned. Conditional on Mv = l, the number of children
of v in the MMPBP has distribution Poi(wl ). Each of these Poi(wl ) children receives an iid
mark. Let Xv,j denote the number of children of v that receive mark j .
By Lemma 3.11, (X_{v,j})_{j∈[n]} is again a vector of independent Poisson random variables with parameters w_l w_j/ℓ_n. Owing to the thinning, a mark appears within the offspring of individual v precisely when X_{v,j} ≥ 1, and these events are independent. In particular, for each j that has not appeared as the mark of an unthinned vertex, the probability that it occurs as the child of a vertex having mark l equals 1 − e^{−w_j w_l/ℓ_n} = p^{(NR)}_{lj}, as required.
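The following sketch (an illustration only; the weight sequence is a hypothetical choice) explores the cluster of a vertex in NR_n(w) by running the marked mixed-Poisson branching process and thinning every mark that has appeared before, exactly as in the construction above:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1000
    w = 1.0 + rng.pareto(2.5, size=n)      # hypothetical weight sequence with finite second moment
    ell_n = w.sum()

    def explore_cluster(root):
        # marks of unthinned individuals found so far form the cluster of the root
        seen = {int(root)}
        queue = [int(root)]
        while queue:
            v = queue.pop()
            num_children = rng.poisson(w[v])                       # Poi(w_v) children
            marks = rng.choice(n, size=num_children, p=w / ell_n)  # iid marks, P(M = j) = w_j / ell_n
            for m in marks:
                if m not in seen:      # thinning: only the first occurrence of a mark survives
                    seen.add(int(m))
                    queue.append(int(m))
        return seen

    print(len(explore_cluster(rng.integers(n))))

The surviving marks and the edges found along the way then have the distribution of C(o) in NR_n(w), in the sense of Proposition 3.16.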
to the finite-type case for convenience. Further, we let the edge probabilities of our random
graph be given by
    p_{uv} = p^{(NR)}_{uv} = 1 − e^{−κ_n(s_u, s_v)/n},    (3.5.33)
where su ∈ [t] is the type of vertex u ∈ [n] and S = [t] is the collection of types.
Let us introduce some notation. Recall that n_s denotes the number of vertices of type s ∈ [t], and write n_{≤s} = Σ_{r≤s} n_r. Define the intervals I_s = [n_{≤s}] \ [n_{≤s−1}] (where, by convention, I_0 is the empty set). We note that all vertices in the intervals I_s play the same role, and this is used crucially in the coupling that we present below.
We now describe the cluster exploration of a uniformly chosen vertex o ∈ [n], which has
type s with probability µn (s) = ns /n. To define the cluster of o, as well as the types of
the vertices in it, we define the mark distribution of a tree vertex of type r to be the random
variable M (r) with distribution
    P(M(r) = ℓ) = 1/n_r,  ℓ ∈ I_r.    (3.5.34)
Let (X_v, T_v, M_v)_v be a collection of random variables, where:
(a) the root ∅ has type s with probability µ_n(s) = n_s/n, and, given the type s of the root, the number of children X_∅ of the root has a mixed-Poisson distribution with random parameter λ_n(s) = Σ_{r∈[t]} κ_n(s, r)µ_n(r), where each child v with |v| = 1 of ∅ independently receives a type T_v, where T_v = r with probability κ_n(s, r)µ_n(r)/λ_n(s);
(b) given its type s, the number of children X_v of a tree vertex v has a mixed-Poisson distribution with parameter λ_n(s) = Σ_{r∈[t]} κ_n(s, r)µ_n(r), and child vj with j ≥ 1 of v receives a type T_{vj}, where T_{vj} = r with probability κ_n(s, r)µ_n(r)/λ_n(s);
(c) given that a tree vertex v has type r, it receives an independent mark M_v(r) with distribution in (3.5.34).
We call (Xv , Tv , Mv )v a marked multi-type Poisson branching process. Then, the follow-
ing extension of Proposition 3.16 holds:
Theorem 3.18 (Locally tree-like nature of GRG_n(w)) Assume that Conditions 1.1(a),(b) hold. Then GRG_n(w) converges locally in probability to the unimodular branching-process tree, with offspring distribution (p_k)_{k≥0} given by

    p_k = P(D = k) = E[e^{−W} W^k/k!].    (3.5.36)

This result also applies to NR_n(w) and CL_n(w) under the same conditions.
Theorem 3.18 follows directly from Theorem 3.14. However, we give an alternative proof
that relies on the locally tree-like nature of CMn (d) proved in Theorem 4.1 below and the
relation between GRGn (w) and CMn (d) discussed in Section 1.3 and Theorem 1.9. This
approach is interesting in itself, since it allows for general proofs for GRGn (w) by proving
the result first for CMn (d), and then merely extending it to GRGn (w). We frequently rely
on such a proof strategy for GRGn (w).
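As a sanity check on (3.5.36), one can compare the empirical degrees of a simulated GRG_n(w) with Monte Carlo estimates of p_k = E[e^{−W}W^k/k!]; the sketch below uses hypothetical uniform weights and is an illustration only:

    import math
    import numpy as np

    rng = np.random.default_rng(5)
    n = 2000
    w = rng.uniform(0.5, 2.5, size=n)      # hypothetical iid weights W
    ell_n = w.sum()

    # GRG_n(w): edge {u, v} present independently with probability w_u w_v / (ell_n + w_u w_v)
    pw = np.outer(w, w)
    upper = np.triu(rng.random((n, n)) < pw / (ell_n + pw), k=1)
    degrees = (upper | upper.T).sum(axis=1)

    for k in range(5):
        pk = np.mean(np.exp(-w) * w**k / math.factorial(k))  # Monte Carlo for E[e^{-W} W^k / k!]
        print(k, round(float((degrees == k).mean()), 4), round(float(pk), 4))

Already at this moderate size, the empirical degree frequencies and the mixed-Poisson probabilities agree to within sampling error.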
In this section we discuss the phase transition in IRGn (κn ). The main result shows that
there is a giant component when the associated multi-type branching process is supercritical
(recall Definition 3.12), while otherwise there is not:
Theorem 3.19 (Giant component of IRG) Let (κ_n) be a sequence of irreducible graphical kernels with limit κ, and let C_max and C_(2) denote the two largest connected components of IRG_n(κ_n) (breaking ties arbitrarily). Then,

    |C_max|/n →ᴾ ζ_κ,    (3.6.1)

and |C_(2)|/n →ᴾ 0. In all cases ζ_κ < 1, while ζ_κ > 0 precisely when ‖T_κ‖ > 1.
Theorem 3.19 is a generalization of the law of large numbers for the largest connected
component in [V1, Theorem 4.8] for ERn (λ/n) (see Exercise 3.31); recall also Theorem
2.34.
We do not give a complete proof of Theorem 3.19 in this chapter. The upper bound follows directly from the local convergence in Theorem 3.14, together with Corollary 2.27. For the lower bound, it suffices to prove this for kernels with finitely many types, by Proposition 3.5. This proof is deferred to Section 6.5.3 in Chapter 6. We close this section by discussing a few examples of Theorem 3.19.
branching process with mean offspring λ/2. This is not surprising, since the degree of each
vertex is Bin(n/2, λ/n), so the bipartite random graph of size n is, in terms of its local
structure, closely related to the Erdős–Rényi random graph of size n/2.
Finite-Type Case
The bipartite random graph can also be viewed as a random graph with two types of vertices
(i.e., the vertices [n/2] and [n] \ [n/2]). We now generalize our results to the finite-type
case, in which we have seen that κn is equivalent to a t × t matrix (κn (s, r))s,r∈[t] , where
t denotes the number of types. In this case, IRGn (κn ) has vertices of t different types (or
colors), say ns vertices of type s, where two vertices of type s and r are joined by an edge
with probability n−1 κn (s, r) ∧ 1. Exercises 3.29 and 3.30 investigate the phase transition in
the finite-type case.
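In the finite-type case, T_κ acts as the t × t matrix M(s, r) = κ(s, r)µ(r), and for a symmetric kernel ‖T_κ‖ is its Perron root, so supercriticality can be checked numerically. A sketch with a hypothetical two-type kernel (the numbers are illustrative, not from the text):

    import numpy as np

    # hypothetical symmetric two-type kernel and limiting type proportions
    kappa = np.array([[3.0, 0.5],
                      [0.5, 1.0]])
    mu = np.array([0.3, 0.7])

    M = kappa * mu[None, :]                  # M[s, r] = kappa(s, r) mu(r), the matrix form of T_kappa
    norm = max(abs(np.linalg.eigvals(M)))    # Perron root; equals ||T_kappa|| for symmetric kappa
    print(norm, "supercritical" if norm > 1 else "not supercritical")

For these values the Perron root is 1.05 > 1, so the corresponding IRG has a giant component by Theorem 3.19.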
In the case where E[W²] = ∞, on the other hand, we take f_ε(x) = cψ(x)1{x∈[ε,1]}, where c = c_ε is such that ‖f_ε‖ = 1. Then, ‖T_κ f_ε‖ → ∞, so that ‖T_κ‖ = ∞, and CL_n(w) is always supercritical in this regime.
Theorem 3.20 (Phase transition in generalized random graphs) Suppose that Conditions 1.1(a),(b) hold and consider the random graphs GRG_n(w), CL_n(w), or NR_n(w), letting n → ∞. Denote p_k = P(Poi(W) = k) as defined below (1.3.22). Let C_max and C_(2) be the largest and second largest components of GRG_n(w), CL_n(w), or NR_n(w).
(a) If ν = E[W²]/E[W] > 1, then there exist ξ ∈ (0, 1), ζ ∈ (0, 1) such that

    |C_max|/n →ᴾ ζ,
    v_k(C_max)/n →ᴾ p_k(1 − ξ^k), for every k ≥ 0,

while |C_(2)|/n →ᴾ 0 and |E(C_(2))|/n →ᴾ 0. Further, ½E[W](1 − ξ²) > ζ, so that the
The proof of Theorem 3.20, except for the proof of the linear complexity, is deferred to
Section 4.3.2 in Chapter 4, where a similar result is proved for the configuration model.
By the strong relation between the configuration model and the generalized random graph
(recall Theorem 1.9), this result can be seen to imply Theorem 3.20.
Let us discuss some implications of Theorem 3.20, focussing on the supercritical case
where ν = E[W 2 ]/E[W ] > 1. In this case, the parameter ξ is the extinction probability of
a branching process with offspring distribution p?k = P(Poi(W ? ) = k), where W ? is the
size-biased version of W . Thus,
    ξ = G_{Poi(W*)}(ξ) = E[e^{W*(ξ−1)}],    (3.6.5)

where G_{Poi(W*)}(s) = E[s^{Poi(W*)}] is the probability generating function of a mixed-Poisson random variable with mixing distribution W*.
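Numerically, ξ can be found by iterating (3.6.5) from any starting point in [0, 1); the Monte Carlo sketch below approximates W* by weighted resampling of W samples, with an exponential law for W as a hypothetical choice:

    import numpy as np

    rng = np.random.default_rng(3)
    w = rng.exponential(2.0, size=200_000)               # hypothetical W; here nu = E[W^2]/E[W] = 4 > 1
    w_star = rng.choice(w, size=200_000, p=w / w.sum())  # size-biased resampling approximates W*

    xi = 0.5
    for _ in range(200):                                  # iterate xi = E[exp(W*(xi - 1))]
        xi = float(np.exp(w_star * (xi - 1.0)).mean())

    zeta = 1.0 - float(np.exp(w * (xi - 1.0)).mean())     # zeta = 1 - G_D(xi), with G_D(s) = E[e^{W(s-1)}]
    print(xi, zeta)

Since the map s ↦ E[e^{W*(s−1)}] is increasing with fixed points ξ and 1, the iteration converges monotonically to ξ from any starting point in [0, 1).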
Further, since v_k(C_max)/n →ᴾ p_k(1 − ξ^k) and |C_max|/n →ᴾ ζ, it must hold that

    ζ = Σ_{k≥0} p_k(1 − ξ^k) = 1 − G_D(ξ),    (3.6.6)

where G_D(s) = E[s^D] is the probability generating function of D = Poi(W). We also note that |E(C_max)|/n →ᴾ η, and compute

    η = ½ Σ_{k≥0} k p_k (1 − ξ^k) = ½E[W] Σ_{k≥0} (k p_k/E[W]) (1 − ξ^k)
      = ½E[W] (1 − ξ G_{Poi(W*)}(ξ)) = ½E[W](1 − ξ²),    (3.6.7)

as required.
We now compare the limiting total number of edges with |C_max|. Recall the useful correlation inequality in [V1, Lemma 2.14], which states that E[f(X)g(X)] ≥ E[f(X)]E[g(X)] for any non-decreasing functions f and g and random variable X. Applying this to f(k) = k and g(k) = 1 − ξ^k, which are both increasing, leads to

    Σ_{k≥0} k p_k (1 − ξ^k) > (Σ_{k≥0} k p_k)(Σ_{k≥0} (1 − ξ^k) p_k) = E[W]ζ.    (3.6.8)

As a result, by (3.6.7),

    η = ½ Σ_{k≥0} k p_k (1 − ξ^k) > ½E[W]ζ.    (3.6.9)
Thus, the average degree η/ζ in the giant component is strictly larger than the average degree
in the entire graph E[W ]/2.
We finally show that η > ζ, so that the giant has linear complexity. By convexity of x ↦ x^{k−1} and the fact that ξ < 1, for k ≥ 1, we have

    Σ_{i=0}^{k−1} ξ^i ≤ k(1 + ξ^{k−1})/2,    (3.6.10)
The lhs of (3.6.12) equals ζ by (3.6.6). We next investigate the rhs of (3.6.12). Recall that

    Σ_k k p_k = E[W],    (3.6.13)

and, by (3.6.5),

    Σ_k (k p_k/E[W]) ξ^{k−1} = ξ.    (3.6.14)

Hence, the rhs of (3.6.12) is

    (1 − ξ)(E[W] + E[W]ξ)/2 = E[W](1 − ξ²)/2 = η,    (3.6.15)

which is the limit in probability of |E(C_max)|/n. Thus, ζ < η.
jargon, we are dealing with site percolation rather than with bond percolation. We will start
by relating the obtained graph to an inhomogeneous random graph.
Note that when we explore a connected component of a vertex after a random attack, the
vertex may not have been affected by the attack, which has probability p. After this, in the
exploration, we always inspect an edge between a vertex that is unaffected by the attack
and a vertex of which we do not yet know whether it has been attacked or not. As a result,
for random attacks, the probability that it is affected equals p independently of the past
randomness. Therefore, it is similar to the random graph where puv is replaced by p × puv .
For a branching process, this identification is exact, and we have that ζκ,p = pζpκ , where
ζκ,p denotes the survival probability of the unimodular multi-type marked branching-process
tree in Theorem 3.14, where additionally each individual in the tree is killed with probability
1 − p independently of all other randomness. For CLn (w), this equality is only asymptotic.
In the case where E[W 2 ] < ∞, so that ν < ∞, this means that there exists a critical value
pc = 1/ν , such that, if p > pc , the giant component persists in CLn (w), where vertices
are removed with probability 1 − p, while the giant component is destroyed for p ≤ pc .
Thus, when E[W²] < ∞, CL_n(w) is sensitive to random attacks. When E[W²] = ∞, on the other hand, ν = ∞ as well, so that the giant component persists for every p ∈ (0, 1], and the graph is called robust to random attacks. Here we must note that the size of the giant component does decrease, since ζ_{κ,p} < pζ_κ < ζ_κ!
To mimic a deliberate attack, we remove a proportion p of the vertices with highest weight. For convenience, we assume that w = (w_1, . . . , w_n) is non-increasing. Then, removing a proportion p of the vertices with highest weight means that w is replaced with w(p), given by w_v(p) = w_v 1{v>np}, and we denote the resulting edge probabilities by
In this case, the resulting graph on [n] \ [np] is again a Chung–Lu model for which ν is replaced with ν(p), given by
where U is uniform on [0, 1] and we recall that we have written ψ(u) = [1 − F]^{−1}(u). Now, for any distribution function F, E[[1 − F]^{−1}(U)² 1{U>p}] < ∞, so that, for p sufficiently close to 1, ν(p) < 1 (see Exercise 3.39). Thus, the CL_n(w) model is always sensitive to deliberate attacks.
Phase Transitions in Uniformly Grown Random Graphs and for Sum Kernels
Recall the definition of the uniformly grown random graph in (3.2.12). A vertex v is connected independently with all u ∈ [v − 1] with probability p_{uv} = λ/v. This leads to an inhomogeneous random graph with type space S = [0, 1] and limiting kernel κ(x, y) = λ/(x ∨ y). It is non-trivial to compute ‖T_κ‖, but remarkably this can be done, to yield ‖T_κ‖ = 4λ, so that a giant exists for all λ > 1/4. We do not give the proof of ‖T_κ‖ = 4λ here, and refer to Exercise 3.40 for details. Exercise 3.41 investigates when there is a giant for sum kernels, as in (3.2.13).
In this section, we discuss some related results for inhomogeneous random graphs. While
we give intuition about their proofs, we do not include them in full detail.
typical vertex is close to a branching process, so that it is whp bounded, and its expected
connected component size is close to 1/(1 − ν). Thus, the best way to obtain a large con-
nected component is to start with a vertex with high weight wi , and let all of its roughly wi
children be independent branching processes. Therefore, in expectation, each child is con-
nected to another 1/(1 − ν) different vertices, leading to a connected component size of
roughly wi /(1 − ν). This is clearly largest when wi = maxj∈[n] wj = wmax , leading to an
intuitive explanation of Theorem 3.22.
Theorems 3.21 and 3.22 raise the question of what the precise conditions are for |C_max| to be of order log n. Intuitively, if w_max ≫ log n then |C_max| = (w_max/(1 − ν))(1 + o_P(1)), whereas if w_max = Θ(log n) then |C_max| = Θ_P(log n) as well. In Turova (2011), it was proved that |C_max|/log n converges in probability to a finite constant when ν < 1 and the weights are iid with distribution function F with E[e^{αW}] < ∞ for some α > 0, i.e., exponential tails are sufficient.
the k parameter in (3.7.6) measures how many such exponential contributions arise before the graph remaining after the removal of all large components becomes such that its giant has whp size at most ⌈nu⌉.
Interestingly, applying Theorem 3.23 to u = 1 also provides the relation

    lim_{n→∞} (1/n) log P(ER_n(λ/n) connected) = log(1 − e^{−λ});    (3.7.9)

see also Exercise 3.42.
2006, Proposition 3.1), where the connections between NRn (w) and Poisson branching processes were
first exploited to prove the versions of Theorem 6.3 in Chapter 6.
F]^{−1}(v/n) as in (1.3.15). Assume that w_v² = o(ℓ_n). Show that the edge probabilities in CL_n(w̃) are
Further, show that CL_n(w̃) and CL_n(w) are asymptotically equivalent whenever (E[W_n] − E[W])² = o(1/n²).
Exercise 3.7 (Definitions 3.2 and 3.3 for the homogeneous bipartite graph) Prove that Definitions 3.2 and
3.3 hold for the homogeneous bipartite graph.
Exercise 3.8 (Examples of homogeneous random graphs) Show that the Erdős–Rényi random graph, the
homogeneous bipartite random graph, and the stochastic block model are all homogeneous random graphs.
Exercise 3.9 (Homogeneous bipartite graph) Prove that the homogeneous bipartite random graph is a
special case of the finite-type case.
Exercise 3.10 (Irreducibility for the finite-types case) Prove that, in the finite-type case, irreducibility
follows when there exists an m such that the mth power of the matrix (κ(s, r)µ(r))s,r∈[t] contains no
zeros.
Exercise 3.11 (Graphical limit in the finite-types case) Prove that, in the finite-type case, the convergence of µ_n in (3.2.1) holds precisely when, for every type s ∈ S,

    lim_{n→∞} n_s/n = µ(s).    (3.9.3)
Exercise 3.12 (Variance of number of vertices of degree k and type s) Let IRGn (κn ) be a finite-type in-
homogeneous random graph with graphical sequence of kernels κn . Let Nk,s (n) be the number of vertices
of degree k and type s. Show that Var(Nk,s (n)) = O(n).
Exercise 3.13 (Proportion of isolated vertices in inhomogeneous random graphs) Let IRGn (κn ) be an
inhomogeneous random graph with a graphical sequence of kernels κn that converges to κ. Show that the
proportion of isolated vertices converges to

    N_0(n)/n →ᴾ p_0 = ∫ e^{−λ(x)} µ(dx).    (3.9.4)

Conclude that p_0 > 0 when ∫ λ(x)µ(dx) < ∞.
Exercise 3.14 (Upper and lower bounding finite-type kernels) Prove that the kernels κm and κm in
(3.3.15) and (3.3.16) are of finite type.
Exercise 3.15 (Inclusion of graphs for larger κ) Let κ′ ≤ κ hold a.e. Show that we can couple IRG_n(κ′) and IRG_n(κ) in such a way that IRG_n(κ′) ⊆ IRG_n(κ).
Exercise 3.16 (Tails of Poisson variables) Use the stochastic domination of Poisson random variables
with different parameters, as well as the concentration properties of Poisson variables, to complete the
proof of (3.3.30), showing that the tail asymptotics of the weight distribution and that of the mixed-Poisson
random variable with that weight agree.
Exercise 3.17 (Power laws for sum kernels) Let κ(x, y) = ψ(x) + ψ(y) for a continuous function ψ : [0, 1] → [0, ∞), and let the reference measure µ be uniform on [0, 1]. Use Corollary 3.7 to identify when the degree distribution satisfies a power law. How is the tail behavior of D related to that of ψ?
Exercise 3.18 (Survival probability of individual with random type) Consider a multi-type branching process where the root has type s with probability µ(s) for all s ∈ [t]. Show that the survival probability ζ equals

    ζ = Σ_{s∈[t]} ζ^{(s)} µ(s).    (3.9.5)
Exercise 3.21 (Singularity of multi-type branching process) Prove that G(z) = Mz for some matrix M
precisely when each individual in the multi-type branching process has exactly one offspring almost surely.
Exercise 3.22 (Erdős–Rényi random graph) Prove that NRn (w) = ERn (λ/n) when w is constant with
wv = −n log (1 − λ/n) for all v ∈ [n].
Exercise 3.23 (Homogeneous Poisson multi-type branching processes) In analogy with the homogeneous random graph as defined in (3.2.11), we call a Poisson multi-type branching process homogeneous when the expected offspring of a tree vertex of type x equals λ(x) = λ for all x ∈ S. Consider a homogeneous Poisson multi-type branching process with parameter λ. Show that the function φ(x) = 1 is an eigenvector of T_κ with eigenvalue λ. Conclude that (Z_j/λ^j)_{j≥0} is a martingale, where (Z_j)_{j≥0} denotes the number of individuals in the jth generation, irrespective of the starting distribution.
Exercise 3.24 (Proof of no-overlap property in (3.5.19)) Prove that P(B̄_r^{(Gn;Q)}(o_1) = (t, q), o_2 ∈ B_{2r}^{(Gn;Q)}(o_1)) → 0, and conclude that (3.5.19) holds.
Exercise 3.25 (Unimodular mixed-Poisson branching process) Recall the definition of a unimodular
branching process in Definition 1.26. Prove that the mixed-Poisson branching process described in (3.5.29)
and (3.5.30) is indeed unimodular.
Exercise 3.26 (Branching process domination of Erdős–Rényi random graph) Show that Exercise 3.22 together with Proposition 3.16 imply that |C(o)| ⪯ T*, where T* is the total progeny of a Poisson branching process with mean −n log(1 − λ/n) offspring.
Exercise 3.27 (Local convergence of ERn (λ/n)) Use Theorem 3.18 to show that ERn (λ/n) converges
locally in probability to the Poisson branching process with parameter λ.
Exercise 3.28 (Coupling to a multi-type Poisson branching process) Prove the stochastic relation be-
tween multi-type Poisson branching processes and neighborhoods in Norros–Reittu inhomogeneous random
graphs in Proposition 3.17 by adapting the proof of Proposition 3.16.
Exercise 3.29 (Phase transition for t = 2) Let ζ_κ^{(1)} and ζ_κ^{(2)} denote the survival probabilities of an irreducible multi-type branching process with two types starting from vertices of types 1 and 2, respectively. Give necessary and sufficient conditions for ζ_κ^{(i)} > 0 to hold for i ∈ {1, 2}.
Exercise 3.30 (The size of small components in the finite-type case) Prove that, in the finite-types case,
when (κn ) converges to a limiting kernel κ, then supx,y,n κn (x, y) < ∞ holds, so that the results of
Theorem 3.21 apply in the sub- and supercritical cases.
Exercise 3.31 (Law of large numbers for |C_max| for ER_n(λ/n)) Prove that, for the Erdős–Rényi random graph, Theorem 3.19 implies that |C_max|/n →ᴾ ζ_λ, where ζ_λ is the survival probability of a Poisson branching process with mean-λ offspring.
Exercise 3.32 (Connectivity of uniformly chosen vertices) Suppose we draw two vertices independently and uar from [n] in IRG_n(κ_n). Prove that Theorem 3.20 implies that the probability that the vertices are connected converges to ζ².
Exercise 3.33 (The size of small components for CLn (w)) Use Theorem 3.21 to prove that, for CLn (w)
with weights given by (1.3.15) and 1 < ν < ∞, the second largest cluster has size |C(2) | = OP (log n)
when W has bounded support or is almost surely bounded below by ε > 0 with E[W ] < ∞. Further,
|Cmax | = OP (log n) when W has bounded support and ν < 1. Here W is a random variable with
distribution function F .
Exercise 3.34 (Average degree in two populations) Show that the average degree is close to p m_1 + (1 − p)m_2 in the setting of Example 3.1 with n_1 vertices of type 1 satisfying n_1/n → p.
Exercise 3.35 (Phase transition for two populations) Show that ζ > 0 precisely when [p m_1² + (1 − p)m_2²]/[p m_1 + (1 − p)m_2] > 1 in the setting of Example 3.1 with n_1 vertices of type 1 satisfying n_1/n → p.
Exercise 3.36 (Phase transition for two populations (cont.)) In the setting of Exercise 3.35, find an example
of p, m1 , m2 where the average degree is less than 1, yet there exists a giant component.
Exercise 3.37 (Degree sequence of giant component for rank 1) Consider GRG_n(w) as in Theorem 3.20. Show that the proportion of vertices of C_max having degree ℓ is close to p_ℓ(1 − ξ^ℓ)/ζ.
Exercise 3.38 (Degree sequence of complement of giant component) Consider GRG_n(w) as in Theorem 3.20. Show that when ξ < 1, the proportion of vertices outside the giant component C_max having degree ℓ is close to p_ℓ ξ^ℓ/(1 − ζ). Conclude that the degree sequence of the complement of the giant component never satisfies a power law. Can you give an intuitive explanation for this?
Exercise 3.39 (Finiteness of ν(p)) Prove that ν(p) in (3.6.17) satisfies ν(p) < ∞ for every p ∈ (0, 1].
Exercise 3.40 (Phase transition of uniformly grown random graphs) Recall the uniformly grown random graph in (3.2.12). Look up the proof that ‖T_κ‖ = 4λ in (Bollobás et al., 2007, Section 16.1).
Exercise 3.41 (Phase transition of sum kernels) Recall the inhomogeneous random graph with sum kernel
in (3.2.13). When does it have a giant?
Exercise 3.42 (Connectivity probability of sparse ER_n(λ/n)) Use Theorem 3.23 to prove that

    lim_{n→∞} (1/n) log P(ER_n(λ/n) connected) = log(1 − e^{−λ}),

as in (3.7.9).
CHAPTER 4
CONNECTED COMPONENTS IN CONFIGURATION MODELS
Abstract
In this chapter we investigate the local limit of the configuration model, identify
when it has a giant component, and find its size and degree structure. We give
two proofs, one based on a “the giant is almost local” argument, and the other
based on a continuous-time exploration of the connected components in the
configuration model. Further results include its connectivity transition.
In this chapter we study the connectivity structure of the configuration model. We focus
on the local connectivity, by investigating its local limit, as well as the global connectivity,
by identifying its giant component and connectivity transition. In inhomogeneous random
graphs there always is a positive proportion of vertices that are isolated (recall Exercise
3.13). In many real-world examples, we observe the presence of a giant component (recall
Table 3.1). In many of these examples the giant is almost the whole graph and sometimes,
by definition, it is the whole graph. For example, the Internet needs to be connected in such a
way as to allow e-mail messages to be sent between any pair of vertices. In many other real-
world examples, though, it is not at all obvious whether, or why, the network is connected.
See Figure 4.1 (which is the same as Figure 3.1), and observe that there are quite a few connected networks in the KONECT database.
Table 4.1 invites us to think about what makes networks (close to) fully connected. We
[Table 4.1: relative size of the largest connected component (LCC) in six real-world networks.]
Table 4.1 The rows in the above table represent the following six real-world networks:
In the California road network, vertices represent intersections or endpoints of roads.
In the Facebook network, vertices represent the users and the edges Facebook friendships.
Hyves was a Dutch social media platform. Vertices represent users, and edges friendships.
The arXiv astro-physics network represents authors of papers within the astro-physics section of
arXiv, where an edge between authors represents that they have co-authored a paper.
In the high-voltage power network in western USA, the vertices represent transformer substations
and generators, and the edges transmission cables.
In the jazz-musicians data set, vertices represent musicians and connections indicate past
collaborations.
investigate this question here in the context of the configuration model. The advantage of the
configuration model is its high flexibility in degree structure, so that all degrees can have at
least a certain minimal value. We will see that this can give rise to connected random graphs
that at the same time remain sparse, as is the case in many real-world networks.
We start by investigating the locally tree-like nature of the configuration model. Recall the
unimodular branching-process tree from Definition 1.26. Our main result is as follows:
Theorem 4.1 (Locally tree-like nature of the configuration model) Assume that Con-
ditions 1.7(a),(b) hold. Then CMn (d) converges locally in probability to the unimodu-
lar branching-process tree (G, o) ∼ µ with root offspring distribution (pk )k≥0 given by
pk = P(D = k).
Before starting the proof of Theorem 4.1, let us informally explain the above connection
between local neighborhoods and branching processes. We note that the asymptotic off-
spring distribution at the root is equal to (pk )k≥0 , where pk = P(D = k) is the asymptotic
degree distribution. Indeed, fix Gn = CMn (d). Then, the probability that a random vertex
has degree k is equal to

    p_k^{(Gn)} = P(D_n = k) = n_k/n,    (4.2.1)

where n_k denotes the number of vertices with degree k. By Condition 1.7(a), p_k^{(Gn)} converges to p_k = P(D = k), for every k ≥ 1. This explains the offspring of the root of our branching-process approximation.
[Figure 4.2: degree distribution, size-biased degree distribution, and random-friend degree distribution in the configuration model with n = 100,000 and (a) τ = 2.5; (b) τ = 3.5; log–log plots of P(X > x) against the degrees.]
The offspring distribution of individuals in the first and later generations is given by

    p*_k = (k + 1)p_{k+1}/E[D].    (4.2.2)
We now explain this heuristically, by examining the degree of the vertex to which the first
half-edge incident to the root is paired. By the uniform matching of half-edges, the probabil-
ity that a vertex of degree k is chosen is proportional to k . Ignoring the fact that the root and
one half-edge have already been chosen (which does have a minor effect on the number of
available or free half-edges), the degree of the vertex incident to the chosen half-edge equals
k with probability equal to k p_k^{(Gn)}/E[D_n] (recall (4.2.1)). See Figure 4.2 for an example of
the degree distribution in the configuration model, where we show the degree distribution
itself, the size-biased degree distribution, and the degree distribution of a random neighbor
of a uniform vertex, for two values of τ . As can be guessed from the local limit, the latter
two degree distributions are virtually indistinguishable.
However, one of the half-edges is used in connecting to the root, so that, for a vertex
incident to the root to have k offspring, it needs to connect its half-edge to a vertex of degree
k + 1. Therefore, the probability that the offspring, or “forward degree,” of any of the direct neighbors of the root is k equals

    p*_k^{(Gn)} = (k + 1)p^{(Gn)}_{k+1}/E[D_n].    (4.2.3)

Thus, (p*_k^{(Gn)})_{k≥0} can be interpreted as the forward degree distribution of vertices in the cluster exploration. When Conditions 1.7(a),(b) hold, we also have p*_k^{(Gn)} → p*_k, where (p*_k)_{k≥0} is defined in (4.2.2). As a result, we often refer to (p*_k)_{k≥0} as the asymptotic forward degree distribution.
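The size-biasing in (4.2.2) and (4.2.3) is easy to check by simulation; the sketch below (with a hypothetical degree sequence, ignoring pairing constraints exactly as in the heuristic above) records the forward degree of the owner of a uniform half-edge and compares it with (k + 1)p_{k+1}/E[D]:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 200_000
    d = rng.integers(1, 6, size=n)       # hypothetical degree sequence with degrees 1..5
    if d.sum() % 2:                      # total degree must be even to pair half-edges
        d[0] += 1

    # the vertex a uniform half-edge belongs to has the size-biased degree distribution
    half_edge_owner = np.repeat(np.arange(n), d)
    neighbor = rng.choice(half_edge_owner, size=100_000)
    forward = d[neighbor] - 1            # one half-edge is used by the connecting edge

    for k in range(6):
        empirical = float((forward == k).mean())
        theory = float((d == k + 1).mean()) * (k + 1) / float(d.mean())  # (k+1) p_{k+1} / E[D]
        print(k, round(empirical, 4), round(theory, 4))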
The above heuristic argues that any direct neighbor of the root has a number of for-
ward neighbors with asymptotic law (p?k )k≥0 . However, every time we pair two half-edges,
the number of free or available half-edges decreases by 2. Similarly to the depletion-of-
points effect in the exploration of connected components in the Erdős–Rényi random graph
ERn (λ/n), the configuration model CMn (d) suffers from a depletion-of-points-and-half-
edges effect. Thus, by iteratively connecting half-edges in a breadth-first way, the offspring
distribution changes along the way, which potentially gives trouble.
Luckily, the number of available half-edges is initially ℓ_n − 1, which is very large when Conditions 1.7(a),(b) hold, since then ℓ_n/n = E[D_n] → E[D] > 0. Thus, we can pair many half-edges before we start noticing that their number decreases. As a result, the degrees of different vertices in the exploration process are close to being iid, leading to a branching-process approximation of neighborhoods in the configuration model. In order to prove Theorem 4.1, we need to pair only a bounded number of edges, but our approximation extends significantly beyond this.
In order to start the proof of Theorem 4.1 based on (2.4.11), we introduce some notation.
First, we let B̄r(Gn ) (v) denote the ordered version of Br(Gn ) (v), obtained by ordering the half-
edges randomly and performing a breadth-first exploration from the smallest to the largest
labeled half-edge. We again write B̄r(Gn ) (v) = t to denote that this ordered neighborhood is
equal to the ordered tree t.
Fix a rooted ordered tree t with r generations, and let

    N_{n,r}(t) = Σ_{v∈[n]} 1{B̄_r^{(Gn)}(v) = t}    (4.2.4)
denote the number of vertices in Gn = CMn (d) whose ordered local neighborhood up to
generation r equals t. By Theorem 2.15, to prove Theorem 4.1, we need to show that

    N_{n,r}(t)/n →ᴾ µ(B̄_r^{(G)}(o) = t),    (4.2.5)

where (G, o) ∼ µ denotes the unimodular branching process with root offspring distribution (p_k)_{k≥1}. Here, we also rely on Theorem 2.8 to see that it suffices to prove (4.2.5) for trees, since the unimodular branching-process tree is a tree with probability 1.
To prove (4.2.5), and as we have done before, we use a second-moment method. We start by proving that the first moment satisfies E[N_{n,r}(t)]/n → µ(B̄_r^{(G)}(o) ≃ t), after which we prove that Var(N_{n,r}(t)) = o(n²). Then (4.2.5) follows from the Chebyshev inequality [V1, Theorem 2.18].
the root ∅ and its neighbors. We explore it in breadth-first order as in Definition 1.25 in
Section 1.5.
Clearly, by Conditions 1.7(a),(b), we have D_n →ᵈ D and D*_n →ᵈ D*, which implies that BP_n(t) →ᵈ BP(t) for every finite t, where BP(t) is the restriction of the unimodular branching process (G, o) with root offspring distribution (p_k)_{k≥1} to its first t individuals (see Exercise 4.1). Note that, for t a fixed rooted tree of at most r generations, B_r^{(G)}(o) ≃ t precisely when BP(t_r) ≃ t, where t_r denotes the number of vertices in the first r − 1 generations in t.
We let (Gn (t))t≥1 denote the graph exploration process from a uniformly chosen vertex
o ∈ [n]. Here Gn (t) is the exploration where we have paired precisely t−1 half-edges, in the
breadth-first manner as described in Definition 1.25, while we also indicate the half-edges
incident to the vertices found. Thus, Gn (1) consists of o ∈ [n] and its Dn = do half-
edges, and every further exploration corresponds to the pairing of a half-edge. In particular,
from (Gn (t))t≥1 , we can retrieve Br(Gn ) (o) for every r ≥ 0, where Gn = CMn (d). The
following lemma proves that we can couple the graph exploration to the branching process in
such a way that (Gn (t))t∈[mn ] is equal to (BPn (t))t∈[mn ] whenever mn → ∞ sufficiently
slowly. In the statement, we write (Ĝ_n(t), B̂P_n(t))_{t≥1} for the coupling of (G_n(t))_{t∈[m_n]} and (BP_n(t))_{t∈[m_n]}:
Lemma 4.2 (Coupling graph exploration and branching process) Subject to Conditions 1.7(a),(b), there exists a coupling (Ĝ_n(t), B̂P_n(t))_{t≥1} of (G_n(t))_{t≥1} and (BP_n(t))_{t≥1} such that

    P((Ĝ_n(t))_{t∈[m_n]} ≠ (B̂P_n(t))_{t∈[m_n]}) = o(1),    (4.2.6)
y′_m to obtain Ĝ_n(m), and we give all sibling half-edges of y′_m the ghost status (where we recall that the sibling half-edges of a half-edge y are those half-edges unequal to y that are incident to the same vertex as is y).
When the half-edge has the real status, it needs to be paired both in Ĝ_n(m) and B̂P_n(m). To obtain Ĝ_n(m), this half-edge needs to be paired with a uniform “free” half-edge, i.e., one that has not been paired so far. For B̂P_n(m), this restriction does not hold. We now show how these two choices can be conveniently coupled.
For B̂P_n(m), we draw a uniform half-edge y_m from the collection of all half-edges, independently of the past randomness. Let U_m denote the vertex to which y_m is incident. We then let the mth individual in (B̂P_n(t))_{t≥1} have precisely d_{U_m} − 1 children. Note that d_{U_m} − 1 has the same distribution as D*_n − 1 and, by construction, the collection (d_{U_t} − 1)_{t≥1} is iid. This constructs B̂P_n(m), except for the statuses of the sibling half-edges incident to U_m, which we describe below.
For Ĝ_n(m), when y_m is still free, i.e., it has not yet been paired in (Ĝ_n(t))_{t∈[m−1]}, we let x_m be paired with y_m in Ĝ_n(m); we have thus also constructed (Ĝ_n(t), B̂P_n(t))_{t∈[m]}. We give all the other half-edges of U_m the status “real” when U_m has not yet appeared in Ĝ_n(m − 1), and otherwise we give them the ghost status. The latter case implies that a cycle appears in (Ĝ_n(t))_{t∈[m]}. By construction, such a cycle does not occur in (B̂P_n(t))_{t∈[m]}, where reused vertices are simply repeated several times.
A difference in the coupling arises when y_m has already been paired in (Ĝ_n(t))_{t∈[m−1]}, in which case we give all the sibling half-edges of y_m the ghost status. For Ĝ_n(m), we draw a uniform unpaired half-edge y′_m and pair x_m with y′_m instead, to obtain Ĝ_n(m), and we give all the sibling half-edges of y′_m the ghost status. Clearly, this might give rise to a difference between Ĝ_n(m) and B̂P_n(m).
We continue the above exploration algorithm until it terminates at some time T_n. Since each step pairs exactly one half-edge, we have that T_n = |E(C(o))|, so that the algorithm takes T_n ≤ ℓ_n/2 steps. The final result is then (Ĝ_n(t), B̂P_n(t))_{t∈[T_n]}. At this moment, however, the branching-process tree (B̂P_n(t))_{t≥1} has not been fully explored, since the tree vertices corresponding to ghost half-edges in (B̂P_n(t))_{t≥1} have not been explored. We complete the tree exploration (B̂P_n(t))_{t≥1} by iid drawing of children of all the ghost tree vertices until the full tree is obtained.
We emphasize that the law of (B̂P_n(t))_{t≥1} obtained above is not the same as that of (BP_n(t))_{t≥1}, since the order in which half-edges are paired is chosen in such a way that (Ĝ_n(t))_{t∈[T_n]} has the same law as the graph exploration process (G_n(t))_{t∈[T_n]}. However, with σ_n the first time that a ghost half-edge is paired, we have that (B̂P_n(t))_{t∈[σ_n]} does have
Half-edge reuse. In the above coupling, a half-edge reuse occurs when y_m has already been paired and is being reused in the branching process. As a result, for (Ĝ_n(t))_{t∈[m]}, we need to redraw y_m to obtain y′_m, which is used instead in (Ĝ_n(t))_{t∈[m]};
Vertex reuse. A vertex reuse occurs when U_m = U_{m′} for some m′ < m. In the above coupling, this means that y_m is a half-edge that has not yet been paired in (Ĝ_n(t))_{t∈[m−1]}, but it is incident to a half-edge that has already been paired in (Ĝ_n(t))_{t∈[m−1]}. In particular, the vertex U_m to which it is incident has already appeared in (Ĝ_n(t))_{t∈[m−1]}, and it is being reused in the branching process. In this case, a copy of U_m appears in (B̂P_n(t))_{t∈[m]}, while a cycle appears in (Ĝ_n(t))_{t∈[m]}.
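Both error events are rare when m_n = o(√n); the sketch below (with a hypothetical degree sequence, drawing the branching-process half-edges y_m uniformly as in the coupling) estimates the expected number of vertex reuses by simulation and compares it with the union bound computed next:

    import numpy as np

    rng = np.random.default_rng(7)
    n = 100_000
    d = rng.integers(1, 6, size=n)               # hypothetical degree sequence
    owner = np.repeat(np.arange(n), d)           # owner[h] = vertex incident to half-edge h
    ell_n = len(owner)
    m_n = int(np.sqrt(n) / 2)                    # number of draws, m_n = o(sqrt(n))

    def vertex_reuses():
        ys = rng.integers(ell_n, size=m_n)       # y_m uniform on all half-edges, iid
        vs = owner[ys]                           # U_m = vertex incident to y_m
        return m_n - len(set(vs.tolist()))       # draws hitting an already-seen vertex

    trials = [vertex_reuses() for _ in range(200)]
    bound = (m_n * (m_n - 1) / 2) * float(np.sum((d / ell_n) ** 2))  # sum over v of C(m_n,2) d_v^2/ell_n^2
    print(np.mean(trials), bound)

Both numbers are small and of the same order, in line with the o(1) bounds derived below.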
Half-Edge Reuse
At time m − 1, precisely 2m − 1 half-edges are forbidden for use by (Ĝ_n(t))_{t∈[m]}. The probability that the half-edge y_m equals one of these half-edges is

    (2m − 1)/ℓ_n.    (4.2.7)

Hence the expected number of half-edge reuses before time m_n is

    Σ_{m=1}^{m_n} (2m − 1)/ℓ_n = m_n²/ℓ_n = o(1),    (4.2.8)

when m_n = o(√n). The Markov inequality ([V1, Theorem 2.17]) shows that the probability that a half-edge reuse occurs is also o(1) when m_n = o(√n).
Vertex Reuse
The probability that vertex v is chosen in the mth draw of (B̂P_n(t))_{t≥1} is equal to d_v/ℓ_n. The probability that vertex v is drawn twice before time m_n is therefore at most

    (m_n(m_n − 1)/2) d_v²/ℓ_n².    (4.2.9)

The expected number of vertex reuses up to time m_n is thus at most
We next study the second moment of N_{n,r}(t) and show that it is almost its first moment squared:
Lemma 4.4 (Concentration of the number of trees) Subject to Conditions 1.7(a),(b),

    E[N_{n,r}(t)²]/n² → µ(B̄_r^{(G)}(o) = t)².    (4.2.13)

Consequently, N_{n,r}(t)/n →ᴾ µ(B̄_r^{(G)}(o) = t).
Proof Let o_1, o_2 ∈ [n] be two vertices chosen uar from [n], independently. We start by computing

    E[N_{n,r}(t)²]/n² = P(B̄_r^{(Gn)}(o_1) = B̄_r^{(Gn)}(o_2) = t).    (4.2.14)

Recall that |B_r^{(Gn)}(o_1)| →ᵈ |B_r^{(G)}(o)|, where (G, o) ∼ µ denotes the local weak limit of CM_n(d) derived above. Since |B_r^{(G)}(o)| is a tight random variable, o_2 ∉ B_{2r}^{(Gn)}(o_1) whp (recall Corollary 2.20), so that also

    E[N_{n,r}(t)²]/n² = P(B̄_r^{(Gn)}(o_1) = B̄_r^{(Gn)}(o_2) = t, o_2 ∉ B_{2r}^{(Gn)}(o_1)) + o(1).    (4.2.15)
We now condition on B̄_r^{(Gn)}(o_1) = t, and write

    P(B̄_r^{(Gn)}(o_1) = B̄_r^{(Gn)}(o_2) = t, o_2 ∉ B_{2r}^{(Gn)}(o_1))
      = P(B̄_r^{(Gn)}(o_2) = t | B̄_r^{(Gn)}(o_1) = t, o_2 ∉ B_{2r}^{(Gn)}(o_1))
      × P(B̄_r^{(Gn)}(o_1) = t, o_2 ∉ B_{2r}^{(Gn)}(o_1)).    (4.2.16)
We already know that P(B̄_r^{(Gn)}(o_1) = t) → µ(B̄_r^{(G)}(o) = t), so that also
In Exercise 4.4, the reader can prove that (4.2.17) does indeed hold.
Conditional on B̄_r^{(Gn)}(o_1) = t and o_2 ∉ B_{2r}^{(Gn)}(o_1), the probability that B̄_r^{(Gn)}(o_2) = t is the same as the probability that B̄_r^{(Gn′)}(o_2) = t in CM_{n′}(d′), which is obtained by removing all vertices in B_r^{(Gn)}(o_1). Thus, since B̄_r^{(Gn)}(o_1) = t, we have that n′ = n − |V(t)| and d′ is the corresponding degree sequence. The key point is that the degree distribution d′ still satisfies Conditions 1.7(a),(b). Therefore, we also have
so that M_0 = E[N_{n,r}(t)] and M_{ℓ_n/2} = N_{n,r}(t). We use the Azuma–Hoeffding inequality [V1, Theorem 2.27] to obtain the concentration of M_{ℓ_n/2} − M_0. For this, we investigate, for t ∈ [ℓ_n/2],

    M_t − M_{t−1} = E[N_{n,r}(t) | F_t] − E[N_{n,r}(t) | F_{t−1}].    (4.2.20)

In the first term, we reveal one more pairing compared with the second term. We now study the effect of this extra pairing. Let ((x_s, y_s))_{s∈[ℓ_n/2]} be the pairing conditional on F_t, where we let the pairing ((x_s, y_s))_{s∈[ℓ_n/2]} be such that x_s < y_s. We now construct a pairing ((x′_s, y′_s))_{s∈[ℓ_n/2]} that has the correct distribution under F_{t−1}, while ((x_s, y_s))_{s∈[ℓ_n/2]} and ((x′_s, y′_s))_{s∈[ℓ_n/2]} differ by at most two edges almost surely, i.e., by switching one pairing.
For this, we let (x_s, y_s) = (x′_s, y′_s) for s ∈ [t − 1]. Then we let x_t be the half-edge with lowest label that has not been paired yet at time t, and y_t its pair prescribed by F_t. Further, we also let x′_t = x_t, and we let y′_t be a pair of x_t chosen independently of y_t from the set of available half-edges prescribed by F_{t−1}. Then, clearly, ((x_s, y_s))_{s∈[t]} and ((x′_s, y′_s))_{s∈[t]}
have the correct distributions. We complete the proof by describing how the remaining half-edges can be paired in such a way that at most two edges are different in ((x_s, y_s))_{s∈[ℓ_n/2]} and ((x′_s, y′_s))_{s∈[ℓ_n/2]}.
Let (x_a, y_a) be the unique pair of ((x_s, y_s))_{s∈[ℓ_n/2]} such that y′_t ∈ {x_a, y_a}. Then, we pair y_t in ((x′_s, y′_s))_{s∈[ℓ_n/2]} to {x_a, y_a} \ {y′_t}. Thus, in ((x′_s, y′_s))_{s∈[ℓ_n/2]}, y_t is paired with y_a when y′_t = x_a, and y_t is paired with x_a when y′_t = y_a. This means that the pair of edges (x_t, y_t) and (x_a, y_a) in ((x_s, y_s))_{s∈[ℓ_n/2]} is switched to (x_t, y′_t) and either the ordered version of {x_a, y_t} or that of {y_a, y_t} in ((x′_s, y′_s))_{s∈[ℓ_n/2]}. All other pairs in ((x′_s, y′_s))_{s∈[ℓ_n/2]} are the same as in ((x_s, y_s))_{s∈[ℓ_n/2]}. Since (x_t, y′_t) is paired independently of (x_t, y_t), the conditional distribution of ((x′_s, y′_s))_{s∈[ℓ_n/2]} given F_{t−1} is the same as that of ((x_s, y_s))_{s∈[ℓ_n/2]} given F_{t−1}, as required.
Let N′_{n,r}(t) be the number of vertices whose r-neighborhood is isomorphic to t in ((x′_s, y′_s))_{s∈[ℓ_n/2]}. The above coupling gives that

    M_t − M_{t−1} = E[N_{n,r}(t) − N′_{n,r}(t) | F_t].    (4.2.21)
When switching two edges, the number of vertices whose r-neighborhood is isomorphic to t cannot change by more than 4c, where c = Σ_{k=0}^{r} d^k and d is the maximal degree in t. Indeed, the presence of an edge {u, v} in the resulting multigraph G_n affects the event {B_r^{(Gn)}(i) ≃ t} only if there exists a path of length at most r in G_n between i and {u, v}, the maximal degree along which is at most d. For the given choice of {u, v} there are at most 2c such values of i ∈ [n]. Since a switch changes two edges, we obtain that |N′_{n,r}(t) − N_{n,r}(t)| ≤ 4c. Thus, Azuma–Hoeffding [V1, Theorem 2.27] implies that (with the time variable n in [V1, Theorem 2.27] replaced by ℓ_n/2)

    P(|N_{n,r}(t) − E[N_{n,r}(t)]| ≥ nε) = P(|M_{ℓ_n/2} − M_0| ≥ nε) ≤ 2e^{−(nε)²/[16c²ℓ_n]}.    (4.2.22)
Since this vanishes exponentially, and so is summable, we have proved the following corol-
lary:
Corollary 4.5 (Almost sure local convergence) The local convergence in Theorem 4.1 in
fact occurs almost surely.
second proof uses the concentration inequality in (4.2.22). We refer the reader to Section 4.6
for further discussion.
Proof We rely on Theorem 1.13, for which (4.2.23) provides the assumption. We will
compare the neighborhood probabilities in UGn (d) with those in CMn (d), and show that
these are asymptotically equal. We then use Theorem 4.1 to reach the conclusion.
It is convenient to order all half-edges in UG_n(d) randomly. We then write {u →ʲ v in UG_n(d)} for the event that the jth half-edge incident to u connects to v in UG_n(d). For CM_n(d), we also order the half-edges in [ℓ_n] randomly, and we write {u →ʲ v in CM_n(d)} for the event that the jth half-edge incident to u connects to v in CM_n(d).
Fix an ordered tree t and write Gn = UGn (d). Let us start by computing P(B̄r(Gn ) (o1 ) =
t), where we write B̄r(Gn ) (o) for the ordered version of Br(Gn ) (o), in which we order all
half-edges incident to o, as well as the forward half-edges incident to v ∈ Br(Gn ) (o) \ {o},
according to their labels. Here we bear in mind that the half-edge at v connecting v to the
(unique) vertex closer to the root o is not one of the forward edges.
Recall (1.3.45) in Theorem 1.13. Since the degrees in t are bounded, we can make the approximation

    P(u ∼ v_i | EU_{i−1}) = (1 + o(1)) (d_u − i + 1)d_{v_i}/ℓ_n.    (4.2.25)
Taking the product over i ∈ [d_u], we conclude that

    P(∂B_1^{(Gn)}(u) = {v_1, . . . , v_{d_u}}) = (1 + o(1)) d_u! ∏_{i=1}^{d_u} d_{v_i}/ℓ_n.    (4.2.26)
Recalling that B̄_1^{(Gn)}(u) is the ordered version of the 1-neighborhood of u and noting that there are d_u! orderings of the edges incident to u, each of them equally likely, we thus obtain that

    P(∂B̄_1^{(Gn)}(u) = (v_1, . . . , v_{d_u})) = (1 + o(1)) (d_u!/d_u!) ∏_{j=1}^{d_u} d_{v_j}/ℓ_n
      = (1 + o(1)) ∏_{j=1}^{d_u} d_{v_j}/ℓ_n.    (4.2.27)
Alternatively,

    P(u →ʲ v_j ∀j ∈ [d_u] in UG_n(d)) = (1 + o(1)) ∏_{j=1}^{d_u} d_{v_j}/ℓ_n,    (4.2.28)
so that

    P(u →ʲ v_j ∀j ∈ [d_u] in UG_n(d)) = (1 + o(1)) P(u →ʲ v_j ∀j ∈ [d_u] in CM_n(d)).    (4.2.30)

This shows that, conditional on o = u, the neighborhood sets of u in UG_n(d) can be coupled whp to those in CM_n(d).
We continue by investigating the neighborhood set of another vertex v ∈ Br(Gn ) (u). For
this, we note that one edge has already been used to connect v to Br(Gn ) (u), so there are dv −1
edges remaining, which we will call forward edges. Let v be the sth vertex in B̄r(Gn ) (u), and
let Fs−1 denote all the information about the edges and vertices that have been explored
before vertex v. Then we compute

    P(v →ʲ v_j ∀j ∈ [d_v − 1] in UG_n(d) | F_{s−1}) = (1 + o(1)) ∏_{j=1}^{d_v−1} d_{v_j}/ℓ_n,    (4.2.31)
where we use that there are (d_v − 1)! orderings of the d_v − 1 forward edges. For CM_n(d), we instead compute

    P(v →ʲ v_j ∀j ∈ [d_v − 1] in CM_n(d) | F_{s−1}) = ∏_{j=1}^{d_v−1} d_{v_j}/(ℓ_n − 2j − 2s − 1)
      = (1 + o(1)) ∏_{j=1}^{d_v−1} d_{v_j}/ℓ_n,    (4.2.32)
be the total numbers of half-edges incident to high- and low-degree vertices, respectively.
We start by pairing the d≥K half-edges incident to high-degree vertices, and we call their
pairing good when the half-edges incident to a high-degree vertex are all connected to dis-
tinct low-degree vertices. We also call a sub-pairing (i.e., a pairing of a subset of the d≥K
half-edges incident to high-degree vertices) good when the half-edges in it are such that all
half-edges incident to the same vertex are paired with distinct vertices.
Let n_{[k]} denote the number of low-degree vertices. Note that, independently of how earlier half-edges have been paired, the probability that the pairing of a half-edge keeps the sub-pairing good is at least (n_{[k]} − d_{≥K})/ℓ_n ≥ α for some α > 0, when k is such that n_{[k]} ≥ εn and when K is large enough that d_{≥K} ≤ εn/2, which we assume from now on.
Let E_n be the event that the pairing of half-edges incident to high-degree vertices is good. Then, by the above,

    P(E_n) ≥ α^{d_{≥K}}.    (4.2.41)

Now choose K = K(ε) sufficiently large that log(1/α)d_{≥K} ≤ εn/2 for every n. Then we obtain

    P(E_n) ≥ e^{−εn/2}.    (4.2.42)
Having paired the half-edges incident to the high-degree vertices, we pair the remaining half-edges uniformly. Note that CM_n(d) is simple precisely when this pairing produces a simple graph. Since the maximal degree is now bounded by K, the probability of the simplicity of this graph is Θ(1) ≥ e^{−εn/2} for n large (recall (1.3.41)). Thus, we arrive at

    P(CM_n(d) simple) ≥ e^{−εn}.    (4.2.43)

Since ε > 0 is arbitrary, the claim follows, when noting that obviously P(CM_n(d) simple) ≤ 1 = e^{−o(n)}.
Proof of Corollary 4.7. By (4.2.22) and Lemma 4.8,

    P(|N_{n,r}(t) − E[N_{n,r}(t)]| ≥ nε | CM_n(d) simple) ≤ 2e^{−(nε)²/[16c²ℓ_n]} e^{o(n)} = e^{−Θ(n)},    (4.2.44)

which completes the proof, since CM_n(d), conditioned on simplicity, has the same law as UG_n(d) by (1.3.29) and the discussion below it.
In this section we investigate the connected components in the configuration model. Simi-
larly to the Erdős–Rényi random graph, we identify when the configuration model whp has
a giant component. Again, this condition has the interpretation that an underlying branching
process describing the exploration of a cluster has a strictly positive survival probability.
For a graph G, we recall that vk (G) denotes the number of vertices of degree k in G and
|E(G)| the number of edges. The main result concerning the size and structure of the largest
connected components of CMn (d) is the following:
Theorem 4.9 (Phase transition in CM_n(d)) Consider CM_n(d) subject to Conditions 1.7(a),(b). Assume that p_2 = P(D = 2) < 1. Let C_max and C_(2) be the largest and second largest connected components of CM_n(d) (breaking ties arbitrarily).
(a) If ν = E[D(D − 1)]/E[D] > 1, then there exist ξ ∈ [0, 1), ζ ∈ (0, 1] such that

    |C_max|/n →ᴾ ζ,
    v_k(C_max)/n →ᴾ p_k(1 − ξ^k) for every k ≥ 0,
    |E(C_max)|/n →ᴾ ½E[D](1 − ξ²),

while |C_(2)|/n →ᴾ 0 and |E(C_(2))|/n →ᴾ 0.
(b) If ν = E[D(D − 1)]/E[D] ≤ 1, then |C_max|/n →ᴾ 0 and |E(C_max)|/n →ᴾ 0.
Consequently, the same result holds for the uniform random graph with degree sequence d satisfying Conditions 1.7(a),(b), under the extra assumption that Σ_{i∈[n]} d_i² = O(n).
where ξ satisfies

    ξ = Σ_{k≥0} p*_k ξ^k.    (4.3.2)

Finally, an edge consists of two half-edges, and an edge is part of the giant component precisely when this is true for one vertex incident to it, which occurs with probability 1 − ξ². There are in total ℓ_n/2 = nE[D_n]/2 ≈ nE[D]/2 edges, which explains why |E(C_max)|/n →ᴾ ½E[D](1 − ξ²). Therefore, the results in Theorem 4.9 have a simple heuristic explanation.
Our first example is when d_v = 2 for all v ∈ [n], so we are studying a random 2-regular graph. In this case the components are cycles, and the distribution of cycle lengths in CM_n(d) is given by the Ewens sampling formula ESF(½); see, e.g., Arratia et al. (2003). This implies that |C_max|/n converges in distribution to a non-degenerate distribution on [0, 1] (Arratia et al., 2003, Lemma 5.7) and not to any constant as in Theorem 4.9. Moreover, the same is true for |C_(2)|/n (and for |C_(3)|/n, . . .), so in this case there are several large components.
To see this result intuitively, we note that in the exploration of a cluster we start with
one vertex with two half-edges. When pairing a half-edge, it connects to a vertex that again
has two half-edges. Therefore, the number of half-edges to be paired is always equal to 2,
up to the moment when the cycle is closed, and the cluster is completed. When there are
m = αn free half-edges left, the probability of closing up the cycle equals 1/m = 1/(αn),
and, thus, the time this takes is of order n. A slight extension of this reasoning shows that
the time it takes to close a cycle is nTn , where Tn converges to a limiting non-degenerate
random variable (see Exercise 4.5).
Our second example for p2 = 1 is obtained by adding a small number of vertices of
degree 1. More precisely, we let n1 → ∞ be such that n1 /n → 0 and n2 = n − n1 . In this
case, components can either be cycles, or strings of vertices with degree 2 terminated with
two vertices with degree 1. When n1 → ∞, it is more likely that a long string of vertices of
degree 2 will be terminated by a vertex of degree 1 than by closing the cycle, as for the latter
we need to pair to a unique half-edge while for the former we have n1 choices. Therefore,
intuitively this implies that |Cmax | = oP (n) (see Exercise 4.6 for details).
Our third example for p2 = 1 is obtained by instead adding a small number of vertices
of degree 4 (i.e., n4 → ∞ such that n4 /n → 0, and n2 = n − n4 .) We can regard
each vertex of degree 4 as two vertices of degree 2 that have been identified. Therefore,
to obtain CMn (d) with this degree distribution, we can start from a configuration model
having n0 = n + n4 vertices, and uniformly identify n4 pairs of vertices of degree 2. Since
the configuration model with n0 = n+n4 vertices of degree 2 has many components having
size of order n, most of these merge into one giant component upon identification of these
pairs. As a result, |Cmax | = n − oP (n), so there is a giant component containing almost
everything (see Exercise 4.7).
We conclude that the case where p2 = P(D = 2) = 1 is quite sensitive to the precise
properties of the degree structure, which are not captured by the limiting distribution (pk )k≥1
only. In what follows, we thus ignore the case where p2 = 1.
For the latter, we refer to Exercise 4.8. Thus, it suffices to check the assumptions in Theorems
2.28–2.31. The uniform integrability of Dn = d(G on
n)
follows from Conditions 1.7(a),(b).
For the assumptions in Theorem 2.28, the local convergence in probability follows from
Theorem 4.1, so we are left to prove the crucial hypothesis in (2.6.7), which the remainder
of the proof does.
We first prove (2.6.7) under the assumption that dv ≤ b for all v ∈ [n]. At the end
of the proof, we will lift this assumption. To start with our proof of (2.6.7), applied to
Gn = CMn (d) under the condition that dv ≤ b for all v ∈ [n], we first use the al-
ternative formulation from Lemma 2.33, and note that (2.6.39) holds for the unimodular
branching-process tree with root offspring distribution (pk )k≥0 given by pk = P(D = k).
Thus, instead of proving (2.6.7), it suffices to prove (2.6.38), which we do later.
Recall from (2.6.54) that, with o1 , o2 ∈ [n] chosen independently and uar,
1 h i
E # (x, y) ∈ [n] × [n] : |∂B (Gn )
r (x)|, |∂Br
(Gn )
(y)| ≥ r, x ←→
/ y
n2
= P(|∂Br (o1 )|, |∂Br (o2 )| ≥ r, o1 ←→
(Gn ) (Gn )
/ o2 ). (4.3.7)
Thus, (2.6.38) states that
lim lim sup P(|∂Br(Gn ) (o1 )|, |∂Br(Gn ) (o2 )| ≥ r, o1 ←→
/ o2 ) = 0. (4.3.8)
r→∞ n→∞
consist of the individuals in the k th generation of this branching process. Since all degrees
are bounded, |Bk(Gnn ) (o1 )| ≤ (1 + b)mn = Θ(mn ). Let Cn (1) denote the event that this
perfect coupling happens, so that
k n = inf k : |Bk(Gn ) (o2 )| ≥ mn , (4.3.11)
and, again since all degree are bounded, |Bk(Gn ) (o2 )| ≤ (1 + b)mn = Θ(mn ). Further, for
n
δ > 0, we let
Cn (2) = (|∂Bk(Gn ) (o2 )|)k≤kn = (|BP(2)
k |)k≤kn
where Cn (2, 1) and Cn (2, 2) refer to the events in the first and second line of (4.3.12), respec-
tively. Here (BP(2)
k )k≥0 is again
√an n-dependent unimodular branching process independent
of (BP(1) )
k k≥0 . With m n = o( n), we will later pick mn such that mn mn n, to reach
our conclusion. The following lemma shows that also Cn (2) occurs whp:
Lemma 4.10 (Coupling beyond Lemma 4.2) Consider CMn (d) and let m2n /`n → ∞.
Then, for every δ > 0,
Proof The fact that Cn (2, 1) = (|∂Bk(Gn ) (o2 )|)k≤kn = (|BP(2) k |)k≤kn occurs whp fol-
lows in the same way as in (4.3.10). We thus investigate the bounds in Cn (2, 1) only for
k ∈ (k n , k n ].
Define an = (m2n /`n )1+δ where δ > 0. Let mn = (b + 1)mn denote the maximal
number of vertex explorations needed to explore the kn th generation. Recalling the notation
in Lemma 4.2, G b n (mn ) and BP
c n (mn ) denote the half-edges and individuals found up to
b n (mn ) \ BP
the mn th step of the exploration starting from o2 ; let G c n (mn ) \
c n (mn ) and BP
b n (mn ) denote the sets of half-edges that are in one, but not the other, exploration. Then
G
|G
b n (m̄n )| − |BP
c n (m̄n )| ≤ cn #{half-edge reuses up to time m̄n }. (4.3.17)
Thus, by (4.2.8),
m̄2n
E 1Cn (2,1)∩Dn |G
h i
b n (m̄n )| − |BP
c n (m̄n )| ≤ cn . (4.3.18)
+ `n
We continue with the second term in (4.3.15), which is similar. We note that |G
b n (t + 1)| −
|G
b n (t)| can be smaller than |BP
c n (t+1)|−|BP
c n (t)| when a half-edge reuse occurs, or when
a vertex reuse occurs. Thus, again using that the total number of secondary ghosts, together
with the single primary ghost, is at most cn , on Cn (2, 1) ∩ Dn ,
|BP
c n (m̄n )| − |G
b n (m̄n )| ≤ cn #{half-edge and vertex reuses up to time m̄n }. (4.3.19)
We conclude that
1 m̄2 ` δ/2
n
P(Cn (2)c ) ≤ 2cn n = O = o(1), (4.3.21)
an `n m̄2n
when taking cn → ∞ such that cn = o((m̄2n /`n )δ/2 ).
We now define the successful coupling event Cn to be
Cn = Cn (1) ∩ Cn (2), so that P(Cn ) = 1 − o(1). (4.3.22)
Recall that νn = E[Dn (Dn − 1)]/E[Dn ] denotes the expected forward degree of a
uniform half-edge in Gn = CMn (d), which equals the expected offspring of the branching
processes (BP(i)
k )k≥0 . Define
Dn = b(i) (i) (i)
k ≤ |BPr+k | ≤ b̄k ∀i ∈ [2], k ≥ 0 , (4.3.23)
where (b(i) (i) (i)
k )k≥0 and (b̄k )k≥0 satisfy the recursions b0 = b̄(i)
0 = b(i)
0 , while, for some
1
α ∈ ( 2 , 1),
(i) α (i) α
b(i) (i)
k+1 = bk νn − (b̄k ) , b̄(i) (i)
k+1 = b̄k νn + (b̄k ) . (4.3.24)
The following lemma investigates the asymptotics of (b(i) (i)
k )k≤kn −r and (b̄k )k≤kn −r :
(i) k
For the lower bound, we use that b̄(i)
k ≤ Ār b0 νn to obtain
α (i) α αk
b(i) (i)
k+1 ≥ bk νn − Ār (b0 ) νn . (4.3.28)
We use induction to show that
(i) k
b(i)
k ≥ ak b0 νn , (4.3.29)
4.3 The Giant in the Configuration Model 157
where a0 = 1 and
ak+1 = ak − Āαr r1−α νn(α−1)k−1 . (4.3.30)
The initialization follows since b(i) (i)
0 = b0 and a0 = 1. To advance the induction hypothesis,
we substitute it to obtain that
(i) k+1
b(i)
k+1 ≥ ak b0 νn − Āαr (b(i) α αk
0 ) νn
k+1 α−1 (α−1)k−1
ak − Āαr (b(i)
= b(i)
0 νn 0 ) νn
k+1
ak − Āαr rα−1 νn(α−1)k−1 = ak+1 b(i) k+1
≥ b(i)
0 νn 0 νn , (4.3.31)
by (4.3.30). Finally, ak is decreasing, and thus ak & a ≡ 1/Ar , where
Y −1
Ar = 1 − Āαr r−(1−α) νn−(1−α)k <∞
k≥0
where
Dn,k = b(i) (i) (i)
k ≤ |BPr+k | ≤ b̄k ∀i ∈ [2] . (4.3.34)
Note that, when |BP(i) (i) (i) (i)
r+k | > b̄k and |BPr+k−1 | ≤ b̄k−1 ,
α
|BP(i) (i) (i) (i) (i)
r+k | − νn |BPr+k−1 | > b̄k − νn b̄k−1 = (b̄k−1 ) (4.3.35)
while when |BP(i) (i) (i) (i)
r+k | < bk and |BPr+k−1 | ≥ bk−1 ,
α
|BP(i) (i) (i) (i) (i)
r+k | − νn |BPr+k−1 | < bk − νn bk−1 = −(b̄k−1 ) . (4.3.36)
Thus,
c
Dn,k ∩ Dn,k−1 (4.3.37)
α α
(1) (1)
(2) (2)
⊆ |BPr+k | − νn |BPr+k−1 | ≥ (b̄k−1 ) ∪ |BPr+k | − νn |BPr+k−1 | ≥ (b̄(2)
(1)
k−1 ) .
By the Chebychev inequality, conditional on Dn,k−1 ,
P |BP(i) α
(i)
r+k | − νn |BPr+k−1 | ≥ (b̄k−1 ) | Dn,k−1
(i)
(4.3.38)
(i)
Var(|BPr+k | | Dn,k−1 ) σn2 E[|BP(i)
r+k−1 | | Dn,k−1 ]
≤ ≤ ≤ σn2 (b̄(i)
k−1 )
1−2α
,
(b̄(i)
k−1 )
2α (b̄(i)
k−1 )
2α
158 Connected Components in Configuration Models
so that σn2 ≤ b(b − 1)2 is uniformly bounded. Thus, by the union bound for i ∈ {1, 2},
P(Dn,k
c
∩ Dn,k−1 ) ≤ 2σn2 (b̄(i)
k−1 )
1−2α
, (4.3.40)
and we conclude that
X
P(Dnc ) ≤ 2σn2 1−2α 1−2α
(b̄(1)
k−1 ) + (b̄(2)
k−1 ) . (4.3.41)
k≥1
The claim now follows from Lemma 4.11 together with α ∈ ( 12 , 1) and the fact that σn2 ≤
b(b − 1)2 remains uniformly bounded.
When ∂Bk(Gnn ) (o1 ) ∩ ∂Bk(Gn ) (o2 ) 6= ∅, we have o1 ←→ o2 so this does not contribute to
n
(4.3.42).
On the other hand, when ∂Bk(Gnn ) (o1 ) ∩ ∂Bk(Gn ) (o2 ) = ∅, by Lemma 4.11 and when
n
m2n /`n → ∞ sufficiently slowly, |∂Bk(Gnn ) (o1 )| = ΘP (mn ) and |∂Bk(Gnn ) (o2 )| = ΘP (mn ).
The same bounds hold for the number of half-edges Zk(1)n and Zk(2) incident to ∂Bk(Gnn ) (o1 )
n
and ∂Bk(Gn ) (o2 ), respectively, since Zk(1)n ≥ |∂Bk(Gnn+1
)
(o 1 )| and Zk
(2)
≥ |∂Bk(Gnn+1
)
(o2 )|, so
n n
(1) (2)
that also Zkn = ΘP (mn ) and Zk = ΘP (mn ).
n
Conditional on having paired some half-edges incident to ∂Bk(Gnn ) (o1 ), each further such
half-edge has probability at least 1 − Zk(2) /`n of being paired with a half-edge incident to
n
∂Bk(Gnn ) (o2 ), thus creating a path between o1 and o2 . The latter conditional probability is
independent of the pairing of the earlier half-edges. Thus, the probability that ∂Bk(Gnn ) (o1 ) is
not directly connected to ∂Bk(Gn ) (o2 ) is at most
n
Zk(2) Zk(1)n /2
1− n , (4.3.45)
`n
4.3 The Giant in the Configuration Model 159
since at least Zk(1) /2 pairings need to be performed. This probability vanishes when mn mn
n
n. As a result, as n → ∞,
P(|∂Br(Gn ) (o1 )|, |∂Br(Gn ) (o2 )| ≥ r, o1 ←→
/ o2 ; Gn ) = o(1), (4.3.46)
as required. This completes the proof of (2.6.7) for CMn (d) with uniformly bounded de-
grees, and indeed shows that distCMn (d) (o1 , o2 ) ≤ 2r + k n + k n + 1 whp on the event that
|∂Br(Gn ) (o1 )|, |∂Br(Gn ) (o2 )| ≥ r.
Recall the definition of CMn0 (d0 ) in Theorem 1.11 and its proof. This implies that CMn0 (d0 )
has at most (1+ε)n vertices, and that the (at most εn) extra vertices compared with CMn (d)
all have degree 1, while the vertices in [n] have degree d0v ≤ b. Further, with Cmax
0
the largest
0
connected component in CMn0 (d ), by (4.3.47), we have
0
|Cmax | ≤ |Cmax | + εn, (4.3.48)
so that
P(|Cmax | ≥ n(ζ 0 − 2ε)) → 1, (4.3.49)
0
|/n −→ ζ 0 . We denote the limiting parameters of CMn0 (d0 ) by (p0k )k≥1 , ξ 0 and
P
when |Cmax
0
ζ , and note that, for ε as in (4.3.47), when ε & 0, we have
p0k → pk , ξ 0 → ξ, ζ 0 → ζ, (4.3.50)
so that we can take b sufficiently large that, for all ε > 0,
P(|Cmax | ≥ n(ζ − 3ε)) → 1. (4.3.51)
This proves the required lower bound on |Cmax |, while the upper bound follows from Corol-
lary 2.27.
Remark 4.13 (Small-world properties of CMn (d)) We next discuss the consequences of
the above proof to the small-world nature of CMn (d), in a similar way to the proof of
Theorem 2.36. Here we should consider the use of the degree truncation in Theorem 1.11.
Let o1 , o2 ∈ [n] be chosen uar. Recall from Theorem 1.11(c) that distCMn (d) (o1 , o2 ) ≤
distCMn0 (d0 ) (o1 , o2 ). The above “giant is almost local” proof shows that, whp, if n → ∞
followed by r → ∞, then
distCMn0 (d0 ) (o1 , o2 ) ≤ 2r + k n + k n + 1. (4.3.52)
Lemma 4.11 implies the asymptotics k n = (1 + oP (1)) log mn / log νn0 and k n = (1 +
160 Connected Components in Configuration Models
oP (1)) log mn /log νn0 , where νn0 = E[Dn0 0 (Dn0 0 − 1)]/E[Dn0 0 ]. Thus, on the event Gn =
Cn ∩ D n ,
log n
distCMn (d) (o1 , o2 ) ≤ (1 + ε). (4.3.54)
log ν
be the probability generating function for the probability distribution (pk )k≥1 given by pk =
P(D = k). Recall that, for a non-negative random variable D, the random variable D?
denotes its size-biased distribution. Define further, again for s ∈ [0, 1],
?
X
G?D (s) = E[sD ] = p?k sk = G0D (s)/E[D], (4.3.56)
k≥0
Note that G?D (1) = 1, and thus H(0) = H(1) = 0. Note also that
d X
H 0 (1) = E[D] 1 − G?D (1) = E[D] 1 − kp?k
ds k≥0
X
= E[D] − k(k − 1)pk = −E[D(D − 2)]. (4.3.58)
k≥1
for the exploration of the giant component in the configuration model. Regard each edge
as consisting of two half-edges, each half-edge having one endpoint. We label the vertices
as sleeping or awake (i.e., used) and the half-edges as sleeping, active, or dead (already
paired into edges); the sleeping and active half-edges are also called living. We start with all
vertices and half-edges sleeping. Pick a vertex and label its half-edges as active. Then take
any active half-edge, say x, and find its partner y in the graph; label these two half-edges as
dead. Further, if the endpoint of y is sleeping, label it as awake and all other half-edges of
the vertex incident to y as active. Repeat as long as there are active half-edges. When there is
no active half-edge left, we have obtained the first connected component in the graph. Then
start again with another vertex until all components are found.
We apply this algorithm to CMn (d), revealing its edges during the process. Thus we
initially only observe the vertex degrees and the half-edges, not how they are paired with
form edges. Hence, each time we need a partner for a half-edge, this partner is uniformly
distributed over all other living half-edges. It is here that we are using the specific structure
of the configuration model, which simplifies the analysis substantially.
We make the random choices of finding a partner for the half-edges by associating iid
random maximal lifetimes Ex to the half-edge x, where Ex has an Exp(1) distribution. We
interpret these lifetimes as clocks, and changes in our exploration process occur only when
the clock of a half-edge rings. In other words, each half-edge dies spontaneously at rate 1
(unless killed earlier). Each time we need to find a partner for a half-edge x, we then wait
until the next living half-edge unequal to x dies and take that one. This process in continuous
time can be formulated as an algorithm, constructing CMn (d) and exploring its components
simultaneously, as follows. Recall that we start with all vertices and half-edges sleeping. The
exploration is then formalized in the following three steps:
Step 1 When there is no active half-edge (as at the beginning), select a sleeping vertex and
declare it awake and all its half-edges active. For definiteness, we choose the vertex by
choosing a half-edge uar among all sleeping half-edges. When there is no sleeping half-
edge left, the process stops; the remaining sleeping vertices are all isolated and we have
explored all other components.
Step 2 Pick an active half-edge (which one does not matter) and kill it, i.e., change its status to
dead.
Step 3 Wait until the next half-edge dies (spontaneously, as a result of its clock ringing). This
half-edge is paired with the one killed in step Step 2 to form an edge of the graph. If
the vertex incident to it is sleeping, then we change this vertex to awake and all other
half-edges incident to it to active. Repeat from Step 1.
The above randomized algorithm is such that components are created between the succes-
sive times at which Step 1 is performed, where we say that Step 1 is performed when there
is no active half-edge and, as a result, a new vertex is chosen whose connected component
we continue exploring.
The vertices in the component created during one of these intervals between the suc-
cessive times at which Step 1 is performed are the vertices that are awakened during the
interval. Note also that a component is completed and Step 1 is performed exactly when the
4.3 The Giant in the Configuration Model 163
number of active half-edges is 0 and a half-edge dies at a vertex where all other half-edges (if
any) are dead. Below, we investigate the behavior of the key characteristics of the algorithm.
be the number of living half-edges. For definiteness, we define these random functions to be
right-continuous.
P
Let us first look at L(t). We start with `n = i∈[n] di half-edges, all sleeping and thus
living, but we immediately perform Step 1 and Step 2 and kill one of them. Thus, L(0) =
`n − 1. Next, as soon as a living half-edge dies, we perform Step 3 and then (instantly)
either Step 2 or both Step 1 and Step 2. Since Step 1 does not change the number of
living half-edges while Step 2 and Step 3 each decrease it by 1, the total result is that L(t)
is decreased by 2 each time one of the living half-edges dies, except when the last living one
dies and the process terminates. Because of this simple dynamics of t 7→ L(t), we can give
sharp asymptotics of L(t) when n → ∞:
Proof The process t 7→ L(t) satisfies L(0) = `n − 1, and it decreases by 2 at rate L(t).
As a result, it is closely related to a death process. We study such processes in the following
lemma:
Lemma 4.15 (Asymptotics of death processes) Let d, γ > 0 be given and let (N (x) (t))t≥1
be a Markov process such that N (x) (0) = x almost surely, and the dynamics of t 7→
(N (x) (t))t≥1 is such that from position y , it jumps down by d at a rate γy . In other words,
the waiting time until the next event is Exp(1/γy) and each jump is of size d downwards.
Then, for every t0 ≥ 0,
h 2
i
E sup N (x) (t) − e−γdt x ≤ 8d(eγdt0 − 1)x + 8d2 . (4.3.61)
t≤t0
Proof The proof follows by distinguishing several cases. First assume that d = 1 and
that x is an integer. In this case, the process is a standard pure death process taking the
values x, x − 1, x − 2, . . . , 0, describing the number of particles alive when the particles
die independently at rate γ > 0. As is well known, and is easily seen by regarding N (x) (t)
as the sum of x independent copies of the process N (1) (t), the process (eγt N (x) (t))t≥1 , is
a martingale starting in x. Furthermore, for every t ≥ 0, the random variable N (x) (t) has
a Bin(x, e−γt ) distribution, since each particle (of which there are x) has a probability of
dying before time t equal to e−γt , and the different particles die independently.
Application of Doob’s martingale inequality (recall (1.5.4)), now in continuous time,
164 Connected Components in Configuration Models
yields
h 2
i h 2
i h 2 i
E sup N (x) (t) − e−γt x ≤ E sup eγt N (x) (t) − x ≤ 4E eγt N (x) (t0 ) − x
t≤t0 t≤t0
2γt
= 4e Var(N (x) (t0 )) ≤ 4(eγt0 − 1)x. (4.3.62)
This proves the claim when x is integer.
Next, we still assume
that d = 1, but let x > 0 be arbitrary. We can couple the two
processes N (x) (t) t≥1 and N (bxc) (t))t≥1 with different initial values in such a way that
whenever the smaller one jumps by 1, so does the other. This coupling keeps
|N (x) (t) − N (bxc) (t)| < 1 (4.3.63)
for all t ≥ 0, and thus,
sup N (bxc) (t) − e−γt bxc ≤ sup N (x) (t) − e−γt x + 2, (4.3.64)
t≤t0 t≤t0
Finally, for a general d > 0, we observe that N (x) (t)/d is a process of the same type with the
parameters (γ, d, x) replaced by (γd, 1, x/d), and the general result follows from (4.3.65)
and (4.3.62).
The proof of Proposition 4.14 follows from Lemma 4.15 with d = 2, x = `n − 1 =
nE[Dn ] − 1, and γ = 1.
We continue by considering the sleeping half-edges S(t). Let Vk (t) be the number of
sleeping vertices of degree k at time t, so that
∞
X
S(t) = kVk (t). (4.3.66)
k=1
Note that Step 2 does not affect sleeping half-edges, and that Step 3 implies that each
sleeping vertex of degree k is eliminated (i.e., awakened) with intensity k , independently of
what happens to all other vertices. However, some sleeping vertices eliminated by Step 1,
which complicates the dynamics of t 7→ Vk (t).
It is here that the depletion-of-points-and-half-edges effect enters the analysis of the com-
ponent structure of CMn (d). This effect is complicated, but we will see that it is quite
harmless, as can be understood by noting that we apply Step 1 only when we have com-
pleted the exploration of an entire component. Since we are mainly interested in settings
where the giant component is large, we will see that we will not be using Step 1 very often
before having completely explored the giant component. After having completed the explo-
ration of the giant component, we start using Step 1 again quite frequently, but it will turn
out that then it is very unlikely to be exploring any particularly large connected component.
Thus, we can have a setting in mind where the number of applications of Step 1 is quite
small. With this intuition in mind, we first ignore the effect of Step 1 by letting Vek (t) be the
number of vertices of degree k such that all its half-edges x have maximal lifetimes Ex > t
i.e., none of its k half-edges would have died spontaneously up to time t, assuming they all
4.3 The Giant in the Configuration Model 165
escaped Step 1. We conclude that, intuitively, the difference between Vk (t) and Vek (t) can
be expected to be insignificant. We thus start by focussing on the dynamics of (Vek (t))t≥1 ,
ignoring the effect of Step 1, and later correct for this omission.
For a given half-edge, we call the half-edges incident to the same vertex its sibling half-
edges. Further, let
∞
X
S(t)
e = k Vek (t) (4.3.67)
k=1
denote the number of half-edges whose sibling half-edges have all escaped spontaneous
death up to time t. Comparing with (4.3.66), we see that the process S(t)
e ignores the effect
of Step 1 in an identical way to Vek (t).
Recall the functions GD , G?D from (4.3.55) and (4.3.56), and define, for s ∈ [0, 1],
h(s) = sE[D]G?D (s). (4.3.68)
Then, we can identify the asymptotics of (Vek (t))t≥1 in a similar way to that in Proposition
4.14:
Lemma 4.16 (Number of living vertices of degree k ) Subject to Conditions 1.7(a),(b), as
n → ∞ and for any t0 ≥ 0 fixed,
sup |n−1 Vek (t) − pk e−kt | −→ 0
P
(4.3.69)
t≤t0
Proof The statement (4.3.69) again follows from Lemma 4.15, now with γ = k , x = nk
and d = 1. We can replace p(Gk
n)
= nk /n by pk by Condition 1.7(a).
By Condition 1.7(b), the sequence of random variables (Dn )n≥1 is uniformly integrable,
which means that for every ε > 0 there exists K < ∞ such that k>K knk /n =
P
E[Dn 1{D Pn >k} ] < ε for all n. We may further assume (or deduce from Fatou’s inequal-
ity) that k>K kpk < ε and obtain by (4.3.69) that, whp,
∞
X
−1 e − h(e−t )| = sup
sup |n S(t) k(n−1 Vek (t) − pk e−kt )
t≤t0 t≤t0
k=1
K n
k
X X
≤ k sup |n−1 Vek (t) − pk e−kt | + k + pk
k=1
t≤t0
k>K
n
≤ ε + ε + ε,
proving (4.3.71). An almost identical argument yields (4.3.70).
Remarkably, the difference between S(t) and S(t)e is easily estimated. The following
result can be viewed as the key to why this approach works. Indeed, it gives a uniform upper
bound on the difference due to the application of Step 1:
166 Connected Components in Configuration Models
Lemma 4.17 (Effect of Step 1) Let dmax := maxv∈[n] dv be the maximum degree of
CMn (d). Then
0 ≤ S(t)
e − S(t) < sup (S(s)
e − L(s)) + dmax . (4.3.72)
0≤s≤t
(a) If ν = E[D(D − 1)]/E[D] > 1 and p1 > 0, then there is a unique ξ ∈ (0, 1) such
that H(ξ) = 0. Moreover, H(s) < 0 for all s ∈ (0, ξ) and H(s) > 0 for all s ∈ (ξ, 1).
(b) If ν = E[D(D − 1)]/E[D] ≤ 1, then H(s) < 0 for all s ∈ (0, 1).
Proof As remarked earlier, H(0) = H(1) = 0 and H 0 (1) = −E[D(D − 2)]. Fur-
thermore, if we define φ(s) := H(s)/s, then φ(s) = E[D](s − G?D (s)) is a concave
function on (0, 1], and it is strictly concave unless pk = 0 for all k ≥ 3, in which case
H 0 (1) = −E[D(D − 2)] = p1 > 0. Indeed, p1 + p2 = 1 when pk = 0 for all k ≥ 3.
Since we assume that p2 < 1, we thus obtain that p1 > 0 in this case.
In case (b), we thus have that φ is concave and φ0 (1) = H 0 (1) − H(1) ≥ 0, with
either the concavity or the inequality strict, and thus φ0 (s) > 0 for all s ∈ (0, 1), whence
φ(s) < φ(1) = 0 for s ∈ (0, 1).
In case (a), H 0 (1) < 0, and thus H(s) > 0 for s close to 1. Further, when p1 > 0,
H (0) = −h0 (0) = −p1 < 0, and thus H(s) ≤ 0 for s close to 0. Hence, there is at least
0
one ξ ∈ (0, 1) with H(ξ) = 0 and, since H(s)/s is strictly concave and also H(1) = 0,
there is at most one such ξ and the result follows.
Further, by Condition 1.7(b), dmax = o(n), and thus dmax /n → 0. Therefore, (4.3.76)
and (4.3.78) yield
sup n−1 |A(t) − A(t)| = sup n−1 |S(t)
P
e e − S(t)| −→ 0. (4.3.79)
t≤θ t≤θ
168 Connected Components in Configuration Models
Thus, by (4.3.75),
sup |n−1 A(t) − H(e−t )| −→ 0.
P
(4.3.80)
t≤θ
This is the work horse of our argument. By Lemma 4.18, we know that t 7→ H(e−t ) is
positive on (0, − log ξ) when ν > 1. Thus, exploration in the interval (0, − log ξ) will find
the giant component. In particular, we need to show that whp no large connected component
is found before or after this interval (showing that the giant is unique), and we need to
investigate the properties of the giant, in terms of its number of edges, vertices of degree k ,
etc. We now provide these details.
Let 0 < ε < θ/2. Since H(e−t ) > 0 on the compact interval [ε, θ − ε], (4.3.80) implies
that A(t) remains whp positive on [ε, θ − ε], and thus we have not started exploring a new
component in this interval.
On the other hand, again by Lemma 4.18(a), H(e−(θ+ε) ) < 0 and (4.3.75) implies that
−1 e
n A(θ + ε) −→ H(e−(θ+ε) ), while A(θ + ε) ≥ 0. Thus, with ∆ = |H(e−θ−ε )|/2 > 0,
P
whp
e + ε) − S(θ + ε) = A(θ + ε) − A(θ
S(θ e + ε) ≥ −A(θ
e + ε) > n∆, (4.3.81)
Let T1 be the last time that Step 1 was performed before time θ/2. Let T2 be the next time
that Step 1 is performed (by convention, T2 = ∞ if such a time does not exist). We have
shown that, for any ε > 0 and whp, 0 ≤ T1 ≤ ε and θ − ε ≤ T2 ≤ θ + ε. In other words,
P P
T1 −→ 0 and T2 −→ θ. We conclude that we have found one component that is explored
P P
between time T1 −→ 0 and time T2 −→ θ. This is our candidate for the giant component,
and we continue to study its properties, i.e., its size, its number of edges, and its number of
vertices of degree k . These properties are stated separately in the next proposition, so that
we are able to reuse them later on:
Proposition 4.19 (Connected component properties) Let T1? and T2? be two random times
when Step 1 is performed, with T1? ≤ T2? , and assume that T1? −→ t1 and T2? −→ t2
P P
Below, we apply Proposition 4.19 to T1 = oP (1) and T2 = θ + oP (1). We can identify the
values of the above constants for t1 = 0 and t2 = θ as e−t1 = 1, e−t2 = ξ , GD (e−t1 ) = 1,
GD (e−t2 ) = 1 − ζ , h(e−t1 ) = 2E[D], h(e−t2 ) = 2E[D]ξ 2 (see Exercise 4.9).
By Proposition 4.19 and Exercise 4.9, Theorem 4.9(a) follows if we can prove that the
4.3 The Giant in the Configuration Model 169
connected component found between times T1 and T2 is indeed the giant component. This
will be proved after we complete the proof of Proposition 4.19:
Proof The set of vertices C ? contains all vertices awakened in the interval [T1? , T2? ) and
no others, and thus (writing Vk (t−) = lims%t Vk (s))
vk (C ? ) = Vk (T1? −) − Vk (T2? −), k ≥ 1. (4.3.85)
where the latter equality follows since H(1) = 0. Now, (4.3.75) and (4.3.76) imply, in
analogy with (4.3.78) and (4.3.79), that n−1 inf t≤T2? A(t)
P
e −→ 0 and thus also
Hence (4.3.87) implies that supt≤T2? |Vek (t) − Vk (t)| = oP (n) for every k ≥ 1. Conse-
quently, using Lemma 4.16, for j = 1, 2, we have
?
Vk (Tj? −) = Vek (Tj? −) + oP (n) = npk e−kTj + oP (n) = npk e−ktj + oP (n), (4.3.89)
P∞
and (4.3.82) follows by (4.3.85). Similarly, using k=0 (Vek (t) − Vk (t)) ≤ S(t)
e − S(t),
∞
X ∞
X
?
|C | = (Vk (T1? −) − Vk (T2? −)) = (Vek (T1? −) − Vek (T2? )) + oP (n) (4.3.90)
k=1 k=1
? ?
= nGD (e−T1 ) − nGD (e−T2 ) + oP (n),
and
∞
X ∞
X
2|E(C ? )| = k(Vk (T1? −) − Vk (T2? )) = k(Vek (T1? −) − Vek (T2? )) + oP (n)
k=1 k=1
−T1? −T2?
= nh(e ) − nh(e ) + oP (n). (4.3.91)
Thus, (4.3.83) and (4.3.84) follow from the convergence Ti? −→ ti and the continuity of
P
is found.
No late large component. In order to study the probability of finding a component containing
at least η`n edges after Cmax
0
is found, we start by letting T3 be the first time after time T2
e − S(t) increases by at most dmax = o(n) each time
that Step 1 is performed. Since S(t)
Step 1 is performed, we obtain from (4.3.87) that
e − S(t)) ≤ sup (S(t)
sup (S(t) e − S(t)) + dmax = oP (n). (4.3.96)
t≤T3 t≤T2
Comparing this with (4.3.81), for every ε > 0 and whp we have that θ + ε > T3 . Since also
T3 > T2 −→ θ, it follows that T3 −→ θ. If C 0 is the component created between times T2
P P
and T3 , then Proposition 4.19 applied to T2 and T3 yields |C 0 |/n −→ 0 and |E(C 0 )| −→ 0.
P P
chosen at random by Step 1 at time T2 to start the component C 0 would belong to C . If this
occurred, we would clearly have that C = C 0 . Consequently,
P(a component C with |E(C )| ≥ η`n is found after Cmax
0
)
≤ η −1 P(|E(C 0 )| ≥ η`n ) → 0, (4.3.97)
since |E(C 0 )| −→ 0.
P
Completion of the proof of Theorem 4.9(a). Combining (4.3.95) and (4.3.97), we see that
4.3 The Giant in the Configuration Model 171
Further, again whp, |E(C(2) )| < η`n . Consequently, the results for Cmax follow from
P
(4.3.92)–(4.3.94). We have further shown that |E(C(2) )|/`n −→ 0, which implies that
P P
|E(C(2) )|/n −→ 0 and |C(2) |/n −→ 0 because `n = Θ(n) and |C(2) | ≤ |E(C(2) )| + 1.
This completes the proof of Theorem 4.9(a).
Completion of the proof of Theorem 4.9(b). The proof of Theorem 4.9(b) is similar to the
last step in the proof for Theorem 4.9(a). Indeed, let T1 = 0 and let T2 be the next time that
Step 1 is performed, and let T2 = ∞ if this does not occur. Then
sup |A(t) − A(t)|
e = sup |S(t)
e − S(t)| ≤ 2dmax = o(n). (4.3.98)
t≤T2 t≤T2
For every ε > 0, n−1 A(ε) −→ H(e−ε ) < 0 by (4.3.75) and Lemma 4.18(b), while
P
e
P
A(ε) ≥ 0, and it follows from (4.3.98) that whp T2 < ε. Hence, T2 −→ 0. We apply
Proposition 4.19 (which holds in this case too, with θ = 0) and find that if C is the first
P
component found, then |E(C )|/n −→ 0.
Let η > 0. If |E(Cmax )| ≥ η`n , then the probability that the first half-edge chosen by
Step 1 belongs to Cmax , and thus C = Cmax , is 2|E(Cmax )|/(2`n ) ≥ η , and hence,
P(|E(Cmax )| ≥ η`n ) ≤ η −1 P(|E(C )| ≥ η`n ) → 0. (4.3.99)
The results follows since `n = Θ(n) by Condition 1.7(b) and |Cmax | ≤ |E(Cmax )| + 1.
This completes the proof of Theorem 4.9(b), and thus that of Theorem 4.9.
Proof By [V1, Corollary 7.17], and since d = (dv )v∈[n] satisfies Conditions 1.7(a)–(c),
any event En that occurs whp for CMn (d) also occurs whp for UGn (d). By Theorem 4.9,
the event En = { |Cmax |/n − ζ ≤ ε} occurs whp for CMn (d), so it also holds whp for
UGn (d). The proof for the other properties is identical.
Note that it is not obvious how to extend Theorem 4.20 to the case where ν = ∞, which
we discuss now:
Theorem 4.21 (Giant in uniform graph with given degrees for ν = ∞) Consider UGn (d),
where the degrees d satisfy Conditions 1.7(a),(b), and assume that there exists τ ∈ (2, 3)
such that, for every x ≥ 1,
[1 − Fn ](x) ≤ cF x−(τ −1) . (4.3.100)
172 Connected Components in Configuration Models
Then, Theorem 4.9 extends to the uniform simple graph with degree sequence d.
Sketch of proof. We do not present the entire proof, but rather sketch it. We will show that,
for every ε > 0, there exists δ = δ(ε) > 0 such that
and
P(|vk (Cmax ) − pk (1 − ξ k )n| > εn) ≤ e−δn . (4.3.102)
This exponential concentration is quite convenient, as it allows us to extend the result to the
setting of uniform random graphs by conditioning CMn (d) to be simple. Indeed, by Lemma
4.8, it follows that the result also holds for the uniform simple random graph UGn (d) when
Conditions 1.7(a),(b) hold. In Exercise 4.10 below, the reader is invited to fill in the details
of the proof of Theorem 4.21.
We refer to Section 4.5 for a further discussion of (4.3.102). There, we discuss approxi-
mations for P(CMn (d) simple) under conditions such as (4.3.100).
We next prove Theorem 3.19 for rank-1 inhomogeneous random graphs, as already stated
in Theorem 3.20 and as restated here for convenience:
Theorem 4.22 (Phase transition in rank-1 random graphs) Let w satisfy Condition 1.1(a)–
(c). Then the results in Theorem 4.9 also hold for GRGn (w), CLn (w), and NRn (w).
Proof Let dv be the degree of vertex v ∈ [n] in GRGn (w) defined in [V1, (1.3.18)], where
we use a small letter to avoid confusion with Dn , which is the degree of a uniform vertex
in [n]. By [V1, Theorem 7.18], the law of GRGn (w) conditioned on the degrees d and
CMn (d) conditioned on being simple agree (recall also Theorem 1.4). By Theorem 1.9,
(dv )v∈[n] satisfies Conditions 1.7(a)–(c) in probability. Then, by [V1, Theorem 7.18] and
Theorem 4.9, the results in Theorem 4.9 also hold for GRGn (w). By [V1, Theorem 6.20],
the same result applies to CLn (w), and, by [V1, Exercise 6.39], also to NRn (w).
Unfortunately, when ν = ∞ we cannot rely on the fact that, by [V1, Theorem 7.18], the
law of GRGn (w) conditioned on the degrees d and CMn (d) conditioned on being simple
agree. Indeed, when ν = ∞, the probability that CMn (d) is simple vanishes. Therefore,
we instead rely on a truncation argument to extend Theorem 4.22 to the case where ν = ∞.
It is here that the monotonicity of GRGn (w) in terms of the edge probabilities can be used
rather conveniently:
Theorem 4.23 (Phase transition in GRGn (w)) Let w satisfy Conditions 1.1(a),(b). Then,
the results in Theorem 4.9 also hold for GRGn (w), CLn (w), and NRn (w).
P
Proof We prove only that |Cmax |/n −→ ζ ; the other statements in Theorem 4.9 can be
proved in a similar fashion (see Exercises 4.11 and 4.12 below). We prove Theorem 4.23
only for NRn (w), the proofs for GRGn (w) and CLn (w) being similar. The required upper
bound |Cmax |/n ≤ ζ + oP (1) follows by the local convergence in probability in Theorem
3.14 and Corollary 2.27.
4.4 Connectivity of Configuration Models 173
For the lower bound, we bound NRn (w) from below by a random graph with edge prob-
abilities
−(wu ∧K)(wv ∧K)/`n
uv = 1 − e
p(K) . (4.3.103)
Therefore, we also have |Cmax | |Cmax
(K)
|, where Cmax
(K)
is the largest connected component
in the inhomogeneous random graph with edge probabilities (puv (K)
)u,v∈[n] . Let
1 X
wv(K) = (wv ∧ K) (wu ∧ K), (4.3.104)
`n u∈[n]
so that the edge probabilities in (4.3.103) correspond to the Norros–Reittu model with
weights (wv(K) )v∈[n] . It is not hard to see that, when Condition 1.1(a) holds for (wv )v∈[n] ,
Conditions 1.1(a)–(c) hold for (wv(K) )v∈[n] , where the limiting random variable equals W (K) =
(W ∧ K)E[(W ∧ K)]/E[W ]. Therefore, Theorem 4.22 applies to (wv(K) )v∈[n] . We deduce
P
that |Cmax
(K)
|/n −→ ζ (K) , which is the survival probability of the two-stage mixed-Poisson
branching process with mixing variable W (K) . Since ζ (K) → ζ when K → ∞, we conclude
P
that |Cmax |/n −→ ζ .
Assume that P(D = 2) < 1. By Theorem 4.9, we see that |Cmax |/n −→ 1 when P(D ≥
P
2) = 1, as in this case the survival probability ζ of the local limit equals 1. In this section,
we investigate the conditions under which CMn (d) is whp connected, i.e., Cmax = [n] and
|Cmax | = n. Our main result shows that this occurs whp when dmin = minv∈[n] dv ≥ 3:
Theorem 4.24 (Connectivity of CMn (d)) Assume that Conditions 1.7(a),(b) hold. Fur-
ther, assume that dv ≥ 3 for every v ∈ [n]. Then
P(CMn (d) disconnected) = o(1). (4.4.1)
If Condition 1.7(a) holds with p1 = p2 = 0, then ν ≥ 2 > 1 is immediate, so we
are always in the supercritical regime. Also, ζ = 1 when p1 = p2 = 0, since survival
of the unimodular branching-process tree occurs with probability 1. Therefore, Theorem
4.9 implies that the largest connected component has size n(1 + oP (1)) when Conditions
1.7(a),(b) hold. Theorem 4.24 extends this to the statement that CMn (d) is whp connected.
Theorem 4.24 yields an important difference between the generalized random graph and
the configuration model, also from a practical point of view. Indeed, for the generalized
random graph to be whp connected, the degrees must tend to infinity. This has already been
observed for ERn (p) in [V1, Theorem 5.8]. The configuration model can be connected
while the average degree is bounded. Many real-world networks are connected, which makes
the configuration model often more suitable than inhomogeneous random graphs from this
perspective (recall Table 4.1 and Figure 4.1).
Proof The proof is based on a relatively simple counting argument. We recall that a config-
uration denotes a pairing of all the half-edges. We note that the probability of a configuration
equals 1/(`n − 1)!!. On the event that CMn (d) is disconnected, there exists a set of vertices
I ⊆ [n] with |I| ≤ bn/2c such that all half-edges incident to vertices in I are paired only
174 Connected Components in Configuration Models
with half-edges incident to other vertices in I . For I ⊆ [n], we let `n (I) denote the total
degree of I , i.e.,
X
`n (I) = di . (4.4.2)
i∈I
Since dmin ≥ 3, we can use Theorem 4.9 to conclude that most edges are in Cmax , and
I 6= Cmax . Therefore, `n (I) = o(`n ) = o(n), and we may, without loss of generality,
assume that `n (I) ≤ `n /2. We denote by En the event that there exists a collection of
connected components I consisting of |I| ≤ bn/2c vertices for which the sum of degrees
is at most `n (I) ≤ `n /2, so that En occurs whp, i.e.
P(Enc ) = o(1). (4.4.3)
Clearly, in order for the half-edges incident to vertices in I to be paired only to other
half-edges incident to vertices in I , `n (I) needs to be even. The number of configurations
for which this happens is bounded above by
(`n (I) − 1)!!(`n (I c ) − 1)!!. (4.4.4)
As a result,
X (`n (I) − 1)!!(`n (I c ) − 1)!!
P(CMn (d) disconnected; En ) ≤
I⊆[n]
(`n − 1)!!
`n (I)/2
X Y `n (I) − 2j + 1
= , (4.4.5)
I⊆[n] j=1
`n − 2j + 1
where the sum over I ⊆ [n] is restricted to sets I for which 1 ≤ |I| ≤ bn/2c and
`n (I) ≤ `n /2 is even. Exercise 4.13 uses (4.4.5) to bound the probability of the existence
of an isolated vertex (i.e., a vertex with only self-loops).
Define
x
Y 2x − 2j + 1
f (x) = , (4.4.6)
` − 2j + 1
j=1 n
so that
X
P(CMn (d) disconnected; En ) ≤ f (`n (I)/2). (4.4.7)
I⊆[n]
We can rewrite
Qx Qx−1 x−1
(2x − 2j + 1) (2i + 1) Y 2j + 1
f (x) = Qj=1
x = Qx−1i=0 = , (4.4.8)
j=1 (`n − 2j − 1) k=0 (`n − 2k + 1)
` − 2j + 1
j=0 n
where we set j = x−i and j = k+1 in the second equality. Thus, for x ≤ `n /4, x 7→ f (x)
is decreasing because
f (x + 1) 2x + 1
= ≤ 1. (4.4.9)
f (x) `n − 2x + 1
4.4 Connectivity of Configuration Models 175
Since `n (I) ≤ `n /2, we also have that `n (I)/2 ≤ `n /4, so that f (`n (I)/2) ≤ f (a) for
any a ≤ `n (I)/2. Now, since di ≥ 3 for every i ∈ [n] and `n (I) ≤ `n /2 is even,
Define
!
n
hn (m) = f (d3m/2e), (4.4.12)
m
so that
bn/2c
X
P(CMn (d) disconnected; En ) ≤ hn (m). (4.4.13)
m=1
hn (m + 1) n − m f (d3(m + 1)/2e)
= . (4.4.14)
hn (m) m+1 f (d3m/2e)
Note that, for m odd,
f (d3(m + 1)/2e) f ((3m + 1)/2 + 1) 3m + 2
= = . (4.4.15)
f (d3m/2e) f ((3m + 1)/2) `n − 3m
while, for m even,
f (d3(m + 1)/2e) f (3m/2 + 2) 3m + 3 3m + 1
= = . (4.4.16)
f (d3m/2e) f (3m/2) `n − 3m − 1 `n − 3m + 1
Thus, for m odd and using `n ≥ 3n,
hn (m + 1) n − m 3m + 2 3(n − m)
= ≤ ≤ 1, (4.4.17)
hn (m) m + 1 `n − 3m `n − 3m
while, for m even and using `n ≥ 3n,
hn (m + 1) n − m 3m + 3 3m + 1 3m + 1
= ≤ . (4.4.18)
hn (m) m + 1 `n − 3m − 1 `n − 3m + 1 `n − 3m − 1
Thus, we obtain that, for m ≤ n/2 and since `n ≥ 3n, there exists a c > 0 such that
hn (m + 1) c
≤1+ . (4.4.19)
hn (m) n
176 Connected Components in Configuration Models
We then follow the proof of Theorem 4.24, and now define I as the collection of com-
ponents that satisfies `n (I) ≤ `n /2. It should be remarked that in this case we cannot rely
upon Theorem 4.9, which implies (4.4.3). Theorem 4.9 was used to show that `n (I) ≤ `n /2
and |I| ≤ n/2 whp. The fact that `n (I) ≤ `n /2 was used in (4.4.9) to show that x 7→ f (x)
is decreasing for the appropriate x, and this still holds. The fact that |I| ≤ n/2 was used to
restrict the sum over m in (4.4.11) and the formulas that followed it, which we can now no
longer use, and thus we need an alternative argument.
In the current setting, since the degrees are all in {3, 4, 5},
3|I| ≤ `n (I) ≤ `n /2, (4.4.26)
so that m ≤ `n /6. Following the proof of Theorem 4.24 up to (4.4.13), we thus arrive at
b`n /6c
X
P(CMn (d) disconnected) ≤ hn (m). (4.4.27)
m=1
The bound in (4.4.17) remains unchanged since it did not rely on m ≤ n/2, while, for
m ≤ `n /6, (4.4.18) can be bounded as follows:
3m + 1
≤ 1 + O(1/n). (4.4.28)
`n − 3m − 1
As a result, both (4.4.17) and (4.4.18) remain valid, proving that hn (m + 1)/hn (m) ≤
1 + c/n. We conclude that the proof can be completed as for Theorem 4.24.
The above proof is remarkably simple, and requires very little of the precise degree dis-
tribution to be satisfied except for dmin ≥ 3. In what follows, we investigate what happens
when this fails. We first continue by showing that CMn (d) is with positive probability dis-
connected when n1 , the number of vertices of degree 1, satisfies n1 n1/2 :
Proposition 4.26 (Disconnectivity of CMn (d) when n1 n1/2 ) Let Conditions 1.7(a),(b)
hold, and assume that n1 n1/2 . Then
lim P(CMn (d) connected) = 0. (4.4.29)
n→∞
Proof We note that CMn (d) is disconnected when there are two vertices of degree 1 whose
half-edges are paired with each other. When the half-edges of two vertices of degree 1 are
paired with each other, we say that a 2-pair is created. Then, since after i pairings of degree-1
vertices to higher-degree vertices, there are `n − n1 − i + 1 half-edges incident to higher-
degree vertices, out of a total of `n − 2i + 1 unpaired half-edges, we have
n1
Y `n − n 1 − i + 1
P(CMn (d) contains no 2-pair) =
i=1
`n − 2i + 1
n1
Y n1 − i
= 1− . (4.4.30)
i=1
`n − 2i + 1
For each i ≥ 1,
n1 − i n1 − i
1− ≤1− ≤ e−(n1 −i)/`n , (4.4.31)
`n − 2i + 1 `n
178 Connected Components in Configuration Models
so that we arrive at
n1
Y
P(CMn (d) contains no 2-pair) ≤ e−(n1 −i)/`n
i=1
−n1 (n1 −1)/[2`n ]
=e = o(1), (4.4.32)
Proposition 4.27 (Disconnectivity of CMn (d) when p2 > 0) Let Conditions 1.7(a),(b)
hold, and assume that p2 > 0. Then,
By assumption, p2 > 0, so that also λ2 > 0. By investigating the higher factorial moments,
d
and using [V1, Theorem 2.6], it follows that Pn (2) −→ Poi(λ2 ), so that
as required. The proof that [V1, Theorem 2.6] can be applied is Exercise 4.18.
Theorem 4.28 (Almost-connectivity of CMn (d) when p1 = 0) Consider CMn (d) where
the degrees d satisfy Conditions 1.7(a),(b), and assume that p2 ∈ (0, 1). Also assume that
dv ≥ 2 for every v ∈ [n]. Then
d
X
n − |Cmax | −→ kXk , (4.4.37)
k≥2
where (Xk )k≥2 are independent Poisson random variables with parameters λk = λk 2k−1 /k
with λ = p2 /E[D]. Consequently,
P 2 k
P(CMn (d) connected) → e− k≥2 (2λ ) /(2k)
∈ (0, 1). (4.4.38)
4.5 Related Results for Configuration Models 179
Rather than giving the complete proof of Theorem 4.28, we give a sketch of it:
Sketch of proof of Theorem 4.28. Let Pn (k) denote the number of k -cycles consisting of
degree-2 vertices, for k ≥ 2. Obviously, every vertex in such a cycle is not part of the giant
component, so that
X
n − |Cmax | ≥ kPn (k). (4.4.39)
k≥2
d
A multivariate moment method allows one to prove that (Pn (k))k≥2 −→ (Xk )k≥2 , where
(Xk )k≥2 are independent Poisson random variables with parameters (see Exercise 4.19)
In order to complete the argument, two approaches are possible (and have been used in
the literature). First, Federico and van der Hofstad (2017) used counting arguments to show
that as soon as a connected component has at least one vertex v of degree dv ≥ 3, then it
is whp part of the giant component Cmax . This then proves that (4.4.39) is whp an equality.
See also Exercise 4.20.
Alternatively, and more in the style of Łuczak (1992), one can pair up all the half-edges
incident to vertices of degree 2, and then realize that the graph, after pairing of all these
degree-2 vertices, is again a configuration model with a changed degree distribution. The
cycles consisting of only degree-2 vertices will be removed, so that we need only to consider
the contribution of pairing strings of degree-2 vertices to vertices of degrees at least 3. If both
ends of the string are each connected to two distinct vertices of degrees ds , dt at least 3, then
we can imagine this string to correspond to a single vertex of degree ds + dt − 2 ≥ 4, which
is sufficiently large.
Unfortunately, it is also possible that the string of degree-2 vertices is connected to the
same vertex u of degree du ≥ 3, thus possibly reducing the degree by 2. When du ≥ 5,
there are still at least three half-edges remaining at u. Thus, we need only to consider the case
where we create a cycle of vertices of degree 2 with one vertex u in it of degree du = 3 or
du = 4, respectively, which corresponds to vertices of remaining degree 1 or 2, respectively.
In Exercise 4.21, the reader is asked to prove that there is a bounded number of such cycles.
We conclude that it suffices to extend the proof of Theorem 4.24 to the setting where there
is a bounded number of vertices of degrees 1 and 2. The above argument can be repeated
for the degree-2 vertices. We can deal with the degree-1 vertices in a similar way. Pairing
the degree-1 vertices again leads to vertices of remaining degree at least 3 − 1 = 2 after the
pairing, which is fine when the remaining degree is at least 3; otherwise they can be dealt
with in the same way as the other degree-2 vertices. We refrain from giving more details.
In this section we discuss related results on connected components for the configuration
model. We start by discussing the subcritical behavior of the configuration model.
180 Connected Components in Configuration Models
there are many multi-edges between vertices of degree of order dmax in CMn (d), and the
conditioning on being simple thus has a dramatic effect.
As a consequence of Theorem 4.30, we obtain that, subject to its assumptions,
n `
n E[Dn2 ] 3 X o
P(CMn (d) simple) = exp − + + + log(1 + du dv /`n ) + o(1) ;
2 2E[Dn ] 4 1≤u<v≤n
(4.5.3)
recall (4.2.39) for a more general, but weaker, estimate. In Exercise 4.29 the reader can show
that (4.5.3) is indeed e−o(n) under the conditions of Theorem 4.30.
|E(H)|
κ(G) = max (4.5.4)
∅6=H⊆G |V (H)|
be the density of the densest subgraph of G. It is far from obvious that the asymptotics of
κ(Gn ) can be described using local convergence methodologies, but a deep relation exists:
Theorem 4.31 (Densest subgraph of sparse CM) Consider CMn (d) subject to Condition
1.7(a), and assume that P(D = 1) < 1 as well as that there exists θ > 0 such that
P
Then κ(CMn (d)) −→ κ(µ), where µ is the law of the unimodular branching process with
root offspring distribution (pk )k≥1 with pk = P(D = k), and κ(µ) is defined in (4.5.9)
below.
Theorem 4.31 describes the convergence of the edge density of the densest subgraph of
CMn (d) as well as the fact that its limit is a functional of the local limit, as described in
Theorem 4.1. Theorem 4.31 holds under a strong degree assumption, in the sense that the
degree distribution has exponentially small tails. It is unclear whether Theorem 4.31 remains
valid when (4.5.5) fails. We refer to the notes and discussion in Section 4.6 for more details.
Exercise 4.30 shows that Condition 1.7(a) and (4.5.5) imply that Conditions 1.7(b),(c) hold.
Let us now shed some light on the how the proof of Theorem 4.31 can be related to local
convergence. This proof is beautiful, while at the same time also technically demanding.
The proof highlights how this link can be used to define κ(µ), as well as to establish the
convergence of κ(CMn (d)) to it. This connection is through load balancing problems.
Let G = (V (G), E(G)) be a finite, simple, undirected graph. As before, we write E(G) ~
for the set of directed edges, formed by replacing each edge {u, v} ∈ E(G) by the two
directed edges (u, v) and (v, u). An allocation on G is a map θ : E(G)~ → [0, 1] satisfying
θ(u, v) + θ(v, u) = 1 for every {u, v} ∈ E(G). The load induced by θ at a vertex o ∈
182 Connected Components in Configuration Models
V (G) is given by
X
∂θ(o) := θ(o, v). (4.5.6)
v : {v,o}∈E(G)
~
An allocation θ is balanced when, for every (u, v) ∈ E(G), ∂θ(u) < ∂θ(v) implies that
θ(u, v) = 0.
When we are thinking of each edge as carrying a unit amount of load, an allocation needs
to be chosen that distributes load over its endpoints in such a way that the total load is as
balanced as possible across the graph. Thus,
P a balanced allocation optimizes this allocation
problem, in that a balanced θ minimizes v∈V (G) f (∂θ(v)) either over some strictly convex
f : [0, 1] → [0, ∞), or over all convex f : [0, 1] → [0, ∞).
From now on, we let θ denote a balanced allocation. Remarkably, it can be seen that
∂θ(v) measures the local density of G at v ∈ V (G). In particular, in terms of this load
balancing problem,
κ(G) = max ∂θ(v). (4.5.7)
v∈V (G)
We are left with studying the vector (∂θ(v))v∈[n] . Denote the empirical load distribution
by
1
1{∂θ(v)∈A} ,
X
LG (A) = (4.5.8)
|V (G)| v∈V (G)
for every Borel set A ⊆ [0, ∞). When Gn converges locally, one would also expect that
LGn (A) → L(A) for some limiting measure L. This indeed turns out to be true (but is
technically quite challenging). In fact, it turns out that if Gn converges locally to (G, o) ∼ µ
then L = Lµ . In terms of L, we have the characterization that
κ(µ) = sup{t ∈ R : L[t, ∞) > 0}. (4.5.9)
Unfortunately, this is not the end of the story. Indeed, by the above, one would expect that
P
κ(Gn ) −→ κ(µ) if Gn converges locally in probability to (G, o) ∼ µ. This, however, is
far from obvious as the graph parameter κ(G) is too sensitive to be controlled only by local
convergence. Indeed, let Gn converge locally, and add a disjoint clique Kmn to Gn of size
mn = o(n) to obtain G+ +
n . Then, obviously, κ(Gn ) = max{κ(Gn ), (mn − 1)/2}. Thus,
the precise structure of the graph Gn is highly relevant, and CMn (d) under the condition
(4.5.5) turns out to be “nice enough.”
We do not prove Theorem 4.31 but instead indicate how (4.5.5) can be used to show that
κ(CMn (d)) is bounded. This proceeds in four key steps.
In the first step, we investigate the number of edges NS between vertices in a set S ⊆
[n], and show that NSPis stochastically bounded by a binomial random variable with mean
d2S /m, where dS = v∈S dv is the total degree of the set S . This can be seen by pairing
the half-edges one by one, giving priority to the half-edges incident to vertices in S . Let
(Xt )t≥1 denote the Markov chain that describes the number of edges with both endpoints in
S after t pairings. Then, conditioning on (Xs )ts=1 , the probability that Xt+1 = Xt + 1 is
(dS − Xt − t − 1)+ dS − t − 1 d
≤ 1{t≤dS } ≤ S 1{t≤dS } . (4.5.10)
`n − 2t − 1 `n − 2t − 1 `n
4.5 Related Results for Configuration Models 183
using the crude bounds x2r ≤ (2r)!ex for x = dS θ, and (2r)!/r! ≤ (2r)r . As a result, with
Xk,r denoting the number of subgraphs in CMn (d) with k vertices and at least r edges,
X 2r r X Y
E[Xk,r ] ≤ P(NS ≥ r) ≤ 2`
eθds
|S|=k
θ n
|S|=k s∈S
2r r 1 X k 2r r e X k
≤ 2 eθdv ≤ 2 eθdv , (4.5.13)
θ `n k! v∈[n] θ `n k v∈[n]
since k! ≥ (k/e)k . We can rewrite the resulting bound slightly more conveniently. Denote
α = sup E[Dn ], λ = sup E[eθDn ], (4.5.14)
n≥1 n≥1
and pick θ > 0 small enough that λ < ∞. It is here that (4.5.5) is crucially used. Then,
2r r eλn k
E[Xk,r ] ≤ . (4.5.15)
θ2 αn k
In the third step, we first note that, for any set S ⊆ [n] with |S| ≥ nδ , the edge density
of S is at most
dS `n E[Dn ]
≤ = , (4.5.16)
2|S| 2δn 2δ
which remains uniformly bounded. Thus, to show that κ(CMn (d)) is uniformly bounded, it
suffices to analyze sets of size at most δn. For δ ∈ (0, 1) and t > 1, we then let Zδ,t denote
the number of subsets S with |S| ≤ δn and |E(S)| ≥ t|S| in CMn (d). We next show that
there exists a δ > 0 such that, for every t > 1, there exists a χ < ∞ such that
log n t−1
E[Zδ,t ] ≤ χ . (4.5.17)
n
In particular, Zδ,t = 0 whp, so that the density of the densest subgraph is bounded by
(1 + ε)(1 ∧ E[D]/(2δ)). In order to see (4.5.17), we note that
δn
X
E[Zδ,t ] = E[Xk,dkte ]. (4.5.18)
k=1
By (4.5.15),
2dkte dkte k dkte−k
E[Xk,dkte ] ≤ (eλ)k ≤ f (k/n)k , (4.5.19)
θ2 αk n
184 Connected Components in Configuration Models
where we define
2(t + 1) t+1
f (δ) = 1 ∨ (eλ)δ t−1 . (4.5.20)
θ2 α
We choose δ ∈ (0, 1) small enough that f (δ) < 1. Note that δ 7→ f (δ) is increasing, so
that, for every 1 ≤ m ≤ δn,
m
X δn
X
E[Zδ,t ] = f (m/n)k + f (δ)k
k=1 k=m+1
f (m/n) f (δ)m
≤ + . (4.5.21)
1 − f (m/n) 1 − f (δ)
Finally, choose m = c log n with c fixed. Then f (m/n) is of order (log n/n)t−1 , while
f (δ)m (log n/n)t−1 when c is large enough. This proves (4.5.17).
The fourth step concludes the proof. The bound in (4.5.17) shows that κ(CMn (d)) re-
mains uniformly bounded. Further, it also shows that either there is a set of size at least δn
whose density is at least t, or Zδ,t > 0. The latter occurs with vanishing probability for the
appropriate δ > 0, so that whp there is a set of size at least δn whose density is at least t.
The fact that such high-density sets must be large is crucial to go from the convergence of
LGn to Lµ (which follows from local convergence) to that of κ(CMn (d)) to κ(µ) (which,
as we have seen, generally does not follow from local convergence). Indeed, local conver-
gence has direct implications on the local properties of only a positive proportion of vertices,
so problems might arise in this convergence should the maximum in (4.5.7) be carried by a
vanishing proportion of vertices.
with infinite variance degree is not exponentially small (recall Lemma 4.8), this implies Theorem 4.21. In
their statement of the main result implying Theorem 4.21, Bollobás and Riordan (2015) used a condition
slightly different from (4.3.100), namely, that there exists a p > 1 such that
E[Dnp ] → E[Dp ] < ∞. (4.6.1)
It is straightforward to show that (4.6.1) for some p > 1 holds precisely when (4.3.100) holds for some
τ > 2. See Exercise 4.31.
The sharpest results for n2 = n(1 − o(1)) are in Federico (2023), to which we refer for details. There,
Federico proved the results in Exercises 4.6 and 4.7 and derived the exact asymptotics of n − |Cmax |.
Barbour and Röllin (2019) proved a central limit theorem for the giant in Theorem 4.9, where the asymp-
totics of the variance already had already been identified by Ball and Neal (2017). Janson (2020a) (see also
Janson (2020b)) lifted the simplicity condition under Condition 1.7(a)–(c) using switchings, so that the
results extend to uniform random graphs with prescribed degrees, as conjectured in Barbour and Röllin
(2019). Janson and Luczak (2008) proved related central limit theorems for the k-core in the configuration
model.
Exercise 4.4 (Proof of no-overlap property in (4.2.17)) Subject to the conditions in Theorem 4.1, prove
that P(Br(Gn ) (o1 ) ' t, o2 ∈ B2r
(Gn )
(o1 )) → 0, and conclude that the no-overlap property in (4.2.17)
holds.
Exercise 4.5 (Component size of vertex 1 in a 2-regular graph) Consider CMn (d) where all degrees are
equal to 2, i.e., n2 = n. Let C (1) denote the size of the connected component of vertex 1. Show that
d
|C (1)|/n −→ T, (4.7.1)
√
where P(T ≤ x) = 1 − 1 − x.
Exercise 4.6 (Component size in a 2-regular graph with some degree-1 vertices) Consider CMn (d) with
n1 → ∞ with n1 /n → 0, and n2 = n − n1 . Let C (1) denote the size of the connected component of
vertex 1. Show that
P
|C (1)|/n −→ 0. (4.7.2)
Exercise 4.7 (Component size in a 2-regular graph with some degree-4 vertices) Consider CMn (d) with
n4 → ∞ with n4 /n → 0, and n2 = n − n4 . Let C (1) denote the the size of the connected component of
vertex 1. Show that
P
|C (1)|/n −→ 1. (4.7.3)
Exercise 4.8 (Expected degree giant in CMn (d)) Prove that Eµ do 1{|C (o)|=∞} = E[D](1 − ξ 2 )
h i
as claimed in (4.3.6), where µ is the law of the unimodular branching-process tree with root offspring
distribution (pk )k≥0 given by pk = P(D = k).
Exercise 4.9 (Limiting constants in Theorem 4.9) Recall the constants t1 = 0 and t2 = θ = − log ξ,
where ξ is the zero of H given by Lemma 4.18(a). Prove that for t1 = 0 and t2 = θ, e−t1 = 1, e−t2 = ξ,
GD (e−t1 ) = 1, GD (e−t2 ) = 1 − ζ, h(e−t1 ) = 2E[D], and h(e−t2 ) = 2E[D]ξ 2 , where, for θ = ∞, e−t2
should be interpreted as 0.
Exercise 4.10 (Giant in UGn (d) for ν = ∞) Combine (4.2.39) and (4.3.101)–(4.3.102) to complete the
proof of the identification of the giant in UGn (d) for ν = ∞ in Theorem 4.21.
Exercise 4.11 (Number of degree-k vertices in giant NRn (w)) Let w satisfy Conditions 1.1(a),(b). Adapt
the proof of |Cmax |/n −→ ζ in Theorem 4.23 to show that vk (Cmax )/n −→ pk (1 − ξ k ) for NRn (w).
P P
Exercise 4.12 (Number of edges in giant NRn (w)) Let w satisfy Conditions 1.1(a),(b). Use Exercise
4.11 to show that |E(Cmax )|/n −→ 12 E[W ](1 − ξ 2 ).
P
Exercise 4.13 (Isolated vertex in CMn (d)) Use (4.4.5) to show that, when dv ≥ 3 for all v ∈ [n],
3n
P(∃ isolated vertex in CMn (d)) ≤ . (4.7.4)
(2`n − 1)(2`n − 3)
Exercise 4.14 (Isolated vertex (Cont.)) Use (4.4.11) to reprove Exercise 4.13. Hence, the bound in (4.4.11)
is quite sharp.
Exercise 4.15 (Connected component of size 2) Use (4.4.11) to prove that, when dv ≥ 3 for all v ∈ [n],
15n(n − 1)
P(∃ component of size 2 in CMn (d)) ≤ . (4.7.5)
(2`n − 1)(2`n − 3)(2`n − 5)
Exercise 4.16 (Lower bound on probability CMn (d) disconnected) Show that
P(CMn (d) disconnected) ≥ c/n
for some c > 0 when P(D = 3) > 0 and E[D] < ∞.
Exercise 4.17 (Lower bound on probability CMn (d) disconnected) Show that
P(CMn (d) disconnected) ≥ c/n
for some c > 0 when P(D = 4) > 0 and E[D] < ∞.
Exercise 4.18 (Factorial moments of P_n(2)) Consider CM_n(d) subject to Conditions 1.7(a),(b), and assume that p_2 > 0. Let P_n(2) denote the number of 2-cycles consisting of two vertices of degree 2. Prove that, for every k ≥ 1 and with λ_2 = p_2^2/E[D]^2,
E[(P_n(2))_k] → λ_2^k, (4.7.6)
where we recall that (x)_k = x(x − 1) · · · (x − k + 1). Conclude that P_n(2) \xrightarrow{d} Poi(λ_2).
Exercise 4.19 (Cycles in CM_n(d)) Let P_n(k) denote the number of k-cycles consisting of degree-2 vertices, for k ≥ 2. Let λ = p_2/E[D]. Use the multivariate moment method to prove that (P_n(k))_{k≥2} \xrightarrow{d} (X_k)_{k≥2}, where (X_k)_{k≥2} are independent Poisson random variables with parameters λ_k = (2λ)^k/(2k), consistently with Exercise 4.18 for k = 2.
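The Poisson limit in Exercises 4.18–4.19 is easy to probe numerically. The following minimal sketch (not from the text; it assumes NumPy, samples CM_n(d) by pairing half-edges uniformly, and uses a helper name of our choosing) estimates E[P_n(2)] for a degree sequence mixing degrees 2 and 3 and compares it with λ_2 = p_2^2/E[D]^2 from Exercise 4.18.

```python
import numpy as np
from collections import Counter

def count_2cycles(degrees, rng):
    """Sample CM_n(d) by pairing half-edges uniformly, then count 2-cycles:
    double edges joining two distinct vertices that both have degree 2."""
    half_edges = np.repeat(np.arange(len(degrees)), degrees)
    rng.shuffle(half_edges)                     # uniform perfect matching
    pairs = half_edges.reshape(-1, 2)
    counts = Counter(tuple(sorted(e)) for e in pairs)
    return sum(1 for (u, v), c in counts.items()
               if u != v and c == 2 and degrees[u] == degrees[v] == 2)

rng = np.random.default_rng(1)
n, n2 = 2000, 1600                              # n2 vertices of degree 2, rest degree 3
degrees = np.array([2] * n2 + [3] * (n - n2))
lam2 = (n2 / n) ** 2 / degrees.mean() ** 2      # lambda_2 = p_2^2 / E[D]^2
sims = [count_2cycles(degrees, rng) for _ in range(500)]
print("empirical E[P_n(2)] = %.3f, lambda_2 = %.3f" % (np.mean(sims), lam2))
```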
Exercise 4.20 (C_max when d_min = 2) Consider CM_n(d) with d_min = 2 and assume that P(D ≥ 3) > 0. Show that Theorem 4.28 holds if P(∃v : d_v ≥ 3 and v ∉ C_max) = o(1).
Exercise 4.21 (Cycles of degree-2 vertices with one other vertex) Subject to Conditions 1.7(a),(b) and d_min ≥ 2, show that the expected number of cycles consisting of vertices of degree 2 with a starting and ending vertex of degree k converges to
\frac{k(k − 1)p_k}{2E[D]^2} \sum_{ℓ≥1} (2p_2/E[D])^ℓ.
Exercise 4.22 (Subcritical power-law GRGn (w) in Theorem 3.22) Use the size of the largest connected
component in the subcritical power-law CMn (d) in Theorem 4.29, combined with Theorem 1.9, to identify
the largest connected component in the subcritical power-law GRGn (w) in Theorem 3.22.
Exercise 4.23 (Sharp asymptotics in Theorem 4.29) Recall the setting of the largest subcritical connected component in CM_n(d) in Theorem 4.29. Prove that |C_max| = \frac{d_max}{1 − ν}(1 + o_P(1)) precisely when d_max = Θ(n^{1/(τ−1)}).
Exercise 4.24 (Sub-polynomial subcritical clusters) Use Theorem 4.29 to prove that |C_max| = o_P(n^ε) for every ε > 0 when (4.5.1) holds for every τ > 1. Thus, when the maximal degree is sub-polynomial in n, so is the size of the largest connected component.
Exercise 4.25 (Single tree asymptotics in Theorem 4.29) Assume that the conditions in Theorem 4.29
hold. Use Theorem 4.1 to prove that the tree rooted at any half-edge incident to the vertex of maximal
degree converges in distribution to a subcritical branching process with expected total progeny 1/(1 − ν).
Exercise 4.26 (Two-tree asymptotics in Theorem 4.29) Assume that the conditions in Theorem 4.29 hold.
Use the local convergence in Theorem 4.1 to prove that the two trees rooted at any pair of half-edges
incident to the vertex of maximal degree jointly converge in distribution to two independent subcritical
branching processes with expected total progeny 1/(1 − ν).
Exercise 4.27 (Theorem 4.29 when d_max = o(log n)) Assume that the subcritical conditions in Theorem 4.29 hold, so that ν < 1. Suppose that d_max = o(log n). Do you expect |C_max| = \frac{d_max}{1 − ν}(1 + o_P(1)) to hold? Note: No proof is expected; a reasonable argument will suffice.
Exercise 4.28 (Theorem 4.29 when d_max ≫ log n) Assume that the subcritical conditions in Theorem 4.29 hold, so that ν < 1. Suppose that d_max ≫ log n. Do you expect |C_max| = \frac{d_max}{1 − ν}(1 + o_P(1)) to hold? Note: No proof is expected; a reasonable argument will suffice.
Exercise 4.29 (Probability of simplicity in Theorem 4.30) Subject to the conditions in Theorem 4.30, show that (4.5.3) implies that P(CM_n(d) simple) = e^{−o(n)}, as proved in Lemma 4.8.
Exercise 4.30 (Exponential moments) Show that Condition 1.7(a) and \sup_{n≥1} E[e^{θD_n}] < ∞ as in (4.5.5) imply that E[D_n^p] → E[D^p] for every p > 0. Conclude that then also Conditions 1.7(b)–(c) hold.
Exercise 4.31 (Moments versus tails) Show that E[D_n^p] → E[D^p] < ∞ for some p > 1 precisely when [1 − F_n](x) ≤ c_F x^{−(τ−1)} for all x ≥ 1 and some τ > 2.
Chapter 5
Connected Components in Preferential Attachment Models
Abstract
In this chapter we investigate the connectivity structure of preferential attach-
ment models. We start by discussing an important tool: exchangeable random
variables and their distribution as described in de Finetti’s Theorem. We ap-
ply these results to Pólya urn schemes, which, in turn, we use to describe the
distribution of the degrees in preferential attachment models.
It turns out that Pólya urn schemes can also be used to describe the local limit
of preferential attachment models. A crucial ingredient is the fact that the edges
in the Pólya urn representation are conditionally independent, given the appro-
priate randomness. The resulting local limit is the Pólya point tree, a specific
multi-type branching process with continuous types.
The models discussed so far share the property that they are static and their edge-connection
probabilities are close to being independent. As discussed at great length in [V1, Chapter
8], see also Section 1.3.5, preferential attachment models were invented for their dynamic
structure: since edges incident to younger vertices connect to older vertices in a way that
favours high-degree vertices, preferential attachment models develop power-law degree dis-
tributions. This intuitive dynamics comes at the expense of creating dynamic models in which edge-connection probabilities are hard to compute. As a result, proofs for
preferential attachment models are generally substantially harder than those for inhomoge-
neous random graphs and configuration models.
In this chapter, we explain how this difference can be overcome, to some extent, by real-
izing that the degree evolution in preferential attachment models can be described in terms
of exchangeable random variables. Because of this, we can describe these models in terms
of independent edges, given some appropriate extra randomness.
The notion of exchangeability is rather strong and implies, for example, that the distribution of X_i is the same for every i (see Exercise 5.1), as well as that (X_i, X_j) has the same distribution for every i ≠ j.
Clearly, when a sequence of random variables is iid then it is also exchangeable (see Exer-
cise 5.2). A second example arises when we take a sequence of random variables that are iid
conditionally on some random variables. An example is a sequence of Bernoulli random variables that are iid conditionally on their success probability U, where U itself is random.
This is called a mixture of iid random variables. Remarkably, the distribution of an infinite
sequence of exchangeable random variables is always such a mixture. This is the content of
de Finetti’s Theorem, which we state and prove here in the case where (Xi )i≥1 are indicator
variables:
Theorem 5.2 (De Finetti’s Theorem) Let (X_i)_{i≥1} be an infinite sequence of exchangeable random variables, and assume that X_i ∈ {0, 1}. Then there exists a random variable U with P(U ∈ [0, 1]) = 1 such that, for all n ≥ 1 and k ∈ [n],
P(X_1 = · · · = X_k = 1, X_{k+1} = · · · = X_n = 0) = E\big[U^k (1 − U)^{n−k}\big]. (5.2.1)
The theorem of de Finetti (Theorem 5.2) states that an infinite exchangeable sequence of
indicators has the same distribution as an independent Bernoulli sequence with a random
success probability U . Thus, the different elements of the sequence are not independent, but
their dependence enters only through the success probability U .
The proof of Theorem 5.2 can be relatively easily extended to more general settings, for
example, when Xi takes on a finite number of values. Since we are relying on Theorem 5.2
only for indicator variables, we refrain from stating this more general version.
Define S_n to be the number of ones in (X_i)_{i=1}^n, i.e.,
S_n = \sum_{k=1}^n X_k. (5.2.2)
The reader is asked to prove (5.2.3) in Exercise 5.4. Equation (5.2.3) also allows us to compute the distribution of U. Indeed, when we suppose that
\lim_{n→∞} P(S_n ∈ (an, bn)) = \int_a^b f(u)\,du, (5.2.4)
where f is a density, then (5.2.3) implies that f is in fact the density of the random variable U. This is useful in applications of de Finetti’s Theorem (Theorem 5.2). Equation (5.2.4) follows by noting that S_n/n \xrightarrow{a.s.} U by the strong law of large numbers applied to the conditional law given U. In Exercise 5.3, you are asked to fill in the details.
Proof of Theorem 5.2. The proof makes use of Helly’s Theorem, which states that any sequence of bounded random variables has a weakly converging subsequence. We fix m ≥ n and condition on S_m to write
P(X_1 = · · · = X_k = 1, X_{k+1} = · · · = X_n = 0) (5.2.5)
= \sum_{j=k}^m P(X_1 = · · · = X_k = 1, X_{k+1} = · · · = X_n = 0 \mid S_m = j)\,P(S_m = j).
By exchangeability, and conditional on S_m = j, each sequence (X_i)_{i=1}^m containing precisely j ones is equally likely. There are precisely \binom{m}{j} such sequences, and precisely \binom{m−n}{j−k} of them start with k ones followed by n − k zeros. Thus,
P(X_1 = · · · = X_k = 1, X_{k+1} = · · · = X_n = 0 \mid S_m = j) = \binom{m−n}{j−k} \Big/ \binom{m}{j}. (5.2.6)
Combining (5.2.5) and (5.2.6), and approximating the ratio of binomial coefficients for large m, one obtains
P(X_1 = · · · = X_k = 1, X_{k+1} = · · · = X_n = 0) = \lim_{m→∞} E\big[Y_m^k (1 − Y_m)^{n−k}\big], (5.2.9)
where Y_m = S_m/m. Note that it is here that we make use of the fact that (X_i)_{i≥1} is an infinite exchangeable sequence of random variables. Equation (5.2.9) is the point of departure for the completion of the proof.
We have that Y_m ∈ [0, 1] since S_m ∈ [0, m], so that the sequence of random variables (Y_m)_{m≥1} is bounded. By Helly’s Theorem, it thus contains a weakly converging subsequence, i.e., there exists a subsequence (Y_{m_l})_{l≥1} with \lim_{l→∞} m_l = ∞ and a random variable U such that Y_{m_l} \xrightarrow{d} U. Since the random variable Y_m^k (1 − Y_m)^{n−k} is uniformly bounded for each k, n, Lebesgue’s Dominated Convergence Theorem ([V1, Theorem A.1]) gives that
\lim_{m→∞} E\big[Y_m^k (1 − Y_m)^{n−k}\big] = \lim_{l→∞} E\big[Y_{m_l}^k (1 − Y_{m_l})^{n−k}\big] = E\big[U^k (1 − U)^{n−k}\big]. (5.2.10)
This completes the proof. Yet a careful reader may wonder whether the above proof on the basis of subsequences is enough. Indeed, it is possible that another subsequence (Y_{m'_l})_{l≥1} with \lim_{l→∞} m'_l = ∞ has a different limiting random variable V such that Y_{m'_l} \xrightarrow{d} V. However, from (5.2.9) we then conclude that E\big[V^k (1 − V)^{n−k}\big] = E\big[U^k (1 − U)^{n−k}\big] for every k, n. In particular, E[V^k] = E[U^k] for every k ≥ 0. Since the random variables U, V are almost surely bounded by 1 and have the same moments, they also have the same distribution. We conclude that Y_{m_l} \xrightarrow{d} U for every subsequence (m_l)_{l≥1} along which (Y_{m_l})_{l≥1} converges, and this is equivalent to Y_m \xrightarrow{d} U.
The theorem of de Finetti implies that if Xk and Xn are coordinates of an infinite ex-
changeable sequence of indicators then they are positively correlated; see Exercise 5.5. Thus,
it is impossible for infinite exchangeable sequences of indicator variables to be negatively
correlated, which is somewhat surprising.
In the proof of de Finetti’s Theorem, it is imperative that the sequence (Xi )i≥1 is infinite.
This is not merely a technicality of the proof. Rather, there are finite exchangeable sequences
of random variables for which the equality (5.2.1) does not hold. Indeed, take an urn filled
with b blue and r red balls, and draw balls successively without replacement. Thus, the urn
is sequentially being depleted, and it will be empty after the (b + r)th ball is drawn. Let X_i denote the indicator that the ith ball drawn is blue. Then, clearly, the sequence (X_i)_{i=1}^{b+r} is exchangeable. However,
P(X_1 = X_2 = 1) = \frac{b(b − 1)}{(b + r)(b + r − 1)} < \Big(\frac{b}{b + r}\Big)^2 = P(X_1 = 1)\,P(X_2 = 1), (5.2.11)
so that X_1 and X_2 are negatively correlated.
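The strict inequality in (5.2.11) is easy to verify with exact rational arithmetic; the following short sketch (an illustration, not part of the text) evaluates both sides for a small urn.

```python
from fractions import Fraction

def without_replacement_correlation(b, r):
    """Exact P(X1 = X2 = 1) and P(X1 = 1)P(X2 = 1) for draws without
    replacement from an urn with b blue and r red balls, as in (5.2.11)."""
    joint = Fraction(b, b + r) * Fraction(b - 1, b + r - 1)
    product = Fraction(b, b + r) ** 2
    return joint, product

joint, product = without_replacement_correlation(b=3, r=2)
print(joint, "<", product)   # 3/10 < 9/25, so X1 and X2 are negatively correlated
```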
After drawing a ball, it is replaced together with a second ball of the same color; we denote
this Pólya urn scheme by ((Bn , Rn ))n≥0 . Naturally, since we always replace one ball by
two balls, the total number of balls Bn + Rn = b0 + r0 + n is deterministic.
In this section, we restrict to the case where there exist a_r, a_b > 0 such that
W_b(k) = k + a_b \qquad and \qquad W_r(k) = k + a_r, (5.2.13)
i.e., both weight functions are linear with the same slope but possibly a different intercept.
Our main result concerning Pólya urn schemes is the following theorem:
Theorem 5.3 (Limit theorem for linear Pólya urn schemes) Let ((B_n, R_n))_{n≥0} be a Pólya urn scheme starting with (B_0, R_0) = (b_0, r_0) balls of each color, and with linear weight functions W_b and W_r as in (5.2.13) for some a_r, a_b > 0. Then, as n → ∞,
\frac{B_n}{B_n + R_n} \xrightarrow{a.s.} U, (5.2.14)
where U has a Beta distribution with parameters a = b_0 + a_b and b = r_0 + a_r, and, for all k ≤ n,
P(B_n = b_0 + k) = E\big[P\big(Bin(n, U) = k\big)\big]. (5.2.15)
Before proving Theorem 5.3, let us comment on its remarkable content. Clearly, the num-
ber of blue balls Bn is not a binomial random variable, as early draws of blue balls reinforce
the proportion of blue balls in the end. However, (5.2.15) states that we can first draw a ran-
dom variable U and then, conditionally on that random variable, the number of blue balls
is binomial. This is an extremely useful perspective, as we will see later on. The urn con-
ditioned on the limiting variable U is sometimes called a Pólya urn with strength U , and
Theorem 5.3 implies that this is a mere binomial experiment given the strength. The parameters a = b_0 + a_b and b = r_0 + a_r of the Beta distribution indicate the initial weights of each of the two colors.
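Theorem 5.3 is straightforward to test by simulation. The sketch below (illustrative only; it assumes NumPy and the linear weights W_b(k) = k + a_b, W_r(k) = k + a_r of (5.2.13)) runs the urn repeatedly and compares the empirical mean and variance of the final blue fraction with those of the Beta(b_0 + a_b, r_0 + a_r) limit.

```python
import numpy as np

def polya_urn_fraction(b0, r0, a_b, a_r, steps, rng):
    """Run the linear Polya urn for `steps` draws; a blue ball is drawn
    with probability (B + a_b)/(B + a_b + R + a_r). Return the blue fraction."""
    b, r = b0, r0
    for _ in range(steps):
        if rng.random() < (b + a_b) / (b + a_b + r + a_r):
            b += 1
        else:
            r += 1
    return b / (b + r)

rng = np.random.default_rng(0)
b0, r0, a_b, a_r = 1, 1, 0.5, 1.5
fracs = np.array([polya_urn_fraction(b0, r0, a_b, a_r, 2000, rng)
                  for _ in range(1000)])
a, b = b0 + a_b, r0 + a_r                       # Beta parameters of Theorem 5.3
print("mean: %.3f vs %.3f" % (fracs.mean(), a / (a + b)))
print("var:  %.4f vs %.4f" % (fracs.var(), a * b / ((a + b) ** 2 * (a + b + 1))))
```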
Proof of Theorem 5.3. We start with the almost sure convergence in (5.2.14). Let M_n = (B_n + a_b)/(B_n + R_n + a_b + a_r). Note that
E[M_{n+1} \mid (B_l)_{l=1}^n] = \frac{1}{B_{n+1} + R_{n+1} + a_b + a_r} E[B_{n+1} + a_b \mid B_n]
= \frac{1}{B_{n+1} + R_{n+1} + a_b + a_r} \Big[B_n + a_b + \frac{B_n + a_b}{B_n + R_n + a_b + a_r}\Big]
= \frac{B_n + a_b}{B_{n+1} + R_{n+1} + a_b + a_r} \cdot \frac{B_n + R_n + a_b + a_r + 1}{B_n + R_n + a_b + a_r}
= \frac{B_n + a_b}{B_n + R_n + a_b + a_r} = M_n, (5.2.16)
since Bn+1 + Rn+1 + ab + ar = Bn + Rn + ab + ar + 1. As a result, (Mn )n≥0 is a
non-negative martingale, and thus converges almost surely to some random variable U by
the Martingale Convergence Theorem ([V1, Theorem 2.24]).
We continue by identifying the limiting random variable in (5.2.14), which will follow from (5.2.15). Let X_n denote the indicator that the nth ball drawn is blue. We first show that (X_n)_{n≥1} is an infinite exchangeable sequence. Note that
B_n = b_0 + \sum_{j=1}^n X_j, \qquad R_n = r_0 + \sum_{j=1}^n (1 − X_j) = r_0 + b_0 + n − B_n. (5.2.17)
while
\prod_{t=1}^n W_b(b_{t−1})^{x_t} = \prod_{m=0}^{k−1} (b_0 + a_b + m), \qquad \prod_{t=1}^n W_r(r_{t−1})^{1−x_t} = \prod_{j=0}^{n−k−1} (r_0 + a_r + j). (5.2.20)
Thus, we arrive at
P\big((X_t)_{t=1}^n = (x_t)_{t=1}^n\big) = \frac{\prod_{m=0}^{k−1} (b + m) \prod_{j=0}^{n−k−1} (r + j)}{\prod_{t=0}^{n−1} (b + r + t)}, (5.2.21)
where we abbreviate b = b_0 + a_b and r = r_0 + a_r.
of k ones and n − k zeros. Each sequence has the same probability, given by (5.2.21). Thus,
P(S_n = k) = \binom{n}{k} \frac{\prod_{m=0}^{k−1} (b + m) \prod_{j=0}^{n−k−1} (r + j)}{\prod_{t=0}^{n−1} (b + r + t)}
= \frac{Γ(n + 1)}{Γ(k + 1)Γ(n − k + 1)} × \frac{Γ(k + b)}{Γ(b)} × \frac{Γ(n − k + r)}{Γ(r)} × \frac{Γ(b + r)}{Γ(n + b + r)}
= \frac{Γ(b + r)}{Γ(r)Γ(b)} × \frac{Γ(k + b)}{Γ(k + 1)} × \frac{Γ(n − k + r)}{Γ(n − k + 1)} × \frac{Γ(n + 1)}{Γ(n + b + r)}. (5.2.22)
For k and n − k large, by [V1, (8.3.9)],
P(S_n = k) = \frac{Γ(b + r)}{Γ(r)Γ(b)} \frac{k^{b−1} (n − k)^{r−1}}{n^{b+r−1}} (1 + o(1)). (5.2.23)
where Ui is independent of (U1 , . . . , Ui−1 ) and has a Beta distribution with parameters
a = ki + ai and b = k[i,`] + a[i,`] . This gives not only an extension of Theorem 5.3 to urns
with multiple colors but also an appealing independence structure of the limits.
to be possible, we need d_1 + d_2 to be even, and the graph may contain self-loops and multiple edges. After this, we successively attach vertices to older vertices with probability proportional to the degree plus δ > −1. We do not allow for self-loops in the growth of the trees, so that the structures connected to vertices 1 and 2 are trees (but the entire structure is not when d_1 + d_2 > 2). This is a generalization of (PA_n^{(1,δ)}(b))_{n≥2}, in which we are more flexible in choosing the initial graph. The model for (PA_n^{(1,δ)}(b))_{n≥1} arises when d_1 = d_2 = 2 (see Exercise 5.8). For (PA_n^{(1,δ)}(d))_{n≥1}, d_1 = d_2 = 1 is the most relevant choice (recall from Section 1.3.5 that (PA_n^{(1,δ)}(d))_{n≥1} starts at time 1 with two vertices and one edge between them).
We decompose the growing tree into two trees. For i = 1, 2, we let Ti (n) be the tree
of vertices that are closer to vertex i than to vertex 3 − i. Thus, the tree T2 (n) consists of
those vertices for which the path in the tree from the vertex to vertex 1 passes through vertex
2, and T1 (n) consists of the remainder of the scale-free tree. Let Si (n) = |Ti (n)| denote
the number of vertices in Ti (n). Clearly, S1 (n) + S2 (n) = n, which is the total number
of vertices in the tree at time n. We can apply Theorem 5.3 to describe the relative sizes of
T1 (n) and T2 (n):
Theorem 5.4 (Tree decomposition for scale-free trees) For scale-free trees with initial degrees d_1, d_2 ≥ 1, as n → ∞,
\frac{S_1(n)}{n} \xrightarrow{a.s.} U, (5.2.29)
where U has a Beta distribution with parameters a = (d_1 + δ)/(2 + δ) and b = (d_2 + δ)/(2 + δ), and
P(S_1(n) = k) = E\big[P\big(Bin(n − 1, U) = k − 1\big)\big]. (5.2.30)
By Theorem 5.4, we can decompose a scale-free tree into two disjoint scale-free trees
each of which contains an almost surely positive proportion of the vertices.
Proof The evolution of (S_1(n))_{n≥2} can be viewed as a Pólya urn scheme. Indeed, when S_1(n) = s_1(n), the probability of attaching the (n + 1)th vertex to T_1(n) is equal to
\frac{(2s_1(n) + d_1 − 2) + δs_1(n)}{(2s_1(n) + d_1 − 2) + δs_1(n) + (2s_2(n) + d_2 − 2) + δs_2(n)}, (5.2.31)
since the number of vertices in T_i(n) equals S_i(n), while the total degree of T_i(n) equals 2S_i(n) + d_i − 2. We can rewrite this as
\frac{s_1(n) + (d_1 − 2)/(2 + δ)}{s_1(n) + s_2(n) + (d_1 + d_2 − 4)/(2 + δ)}, (5.2.32)
which is equal to (5.2.12) in the case (5.2.13) when r_0 = b_0 = 1 and a_b = (d_1 − 2)/(2 + δ), a_r = (d_2 − 2)/(2 + δ). Therefore, Theorem 5.4 follows directly from Theorem 5.3.
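The urn embedding in this proof can also be run forwards; the sketch below (illustrative, with arbitrary parameter choices) grows the two subtree sizes using the attachment probability (5.2.31) and compares the mean of S_1(n)/n with that of the Beta limit in Theorem 5.4.

```python
import numpy as np

def subtree_fraction(d1, d2, delta, steps, rng):
    """Attach `steps` new vertices, each joining T1 with the probability
    in (5.2.31); return the final fraction S1/(S1 + S2)."""
    s1, s2 = 1, 1
    for _ in range(steps):
        w1 = (2 * s1 + d1 - 2) + delta * s1
        w2 = (2 * s2 + d2 - 2) + delta * s2
        if rng.random() < w1 / (w1 + w2):
            s1 += 1
        else:
            s2 += 1
    return s1 / (s1 + s2)

rng = np.random.default_rng(7)
d1, d2, delta = 3, 1, 0.0
xs = np.array([subtree_fraction(d1, d2, delta, 2000, rng) for _ in range(1000)])
a, b = (d1 + delta) / (2 + delta), (d2 + delta) / (2 + delta)
print("simulated mean %.3f vs Beta mean %.3f" % (xs.mean(), a / (a + b)))
```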
We continue by adapting the above argument to the size of the connected component of, or subtree containing, vertex 1 in PA_n^{(1,δ)}(a) (recall Section 1.3.5), which we denote by S'_1(n):
Theorem 5.5 (Tree decomposition for preferential attachment trees) For PA_n^{(1,δ)}(a), as n → ∞,
\frac{S'_1(n)}{n} \xrightarrow{a.s.} U', (5.2.33)
where U' has a mixed Beta distribution with random parameters a = I + 1 and b = 1 + (1 + δ)/(2 + δ), and where, for ℓ ≥ 2,
P(I = ℓ) = P(first vertex that is not connected to vertex 1 is vertex ℓ). (5.2.34)
Consequently,
P(S'_1(n) = k) = E\big[P\big(Bin(n − 1, U') = k − 1\big)\big]. (5.2.35)
Theorem 5.6 (Relative degrees in scale-free trees) For (PA_n^{(1,δ)}(a))_{n≥1}, as n → ∞,
\frac{D_k(n)}{D_{[k]}(n)} \xrightarrow{a.s.} ψ_k, (5.2.38)
where D_{[k]}(n) = D_1(n) + · · · + D_k(n) and ψ_k ∼ Beta(1 + δ, (k − 1)(2 + δ)).
By Theorem 1.17, D_k(n) n^{−1/(2+δ)} \xrightarrow{a.s.} ξ_k, where ξ_k is positive almost surely by the argument in the proof of [V1, Theorem 8.14]. It thus follows from Theorem 5.6 that ψ_k = ξ_k/(ξ_1 + · · · + ξ_k). We conclude that Theorem 5.6 allows us to identify properties of the law of the limiting degrees.
Proof of Theorem 5.6. Define the sequence of stopping times (τ_k(n))_{n≥2k−1}, with τ_k(2k − 1) = k − 1, by
τ_k(n) = \inf\{t : D_{[k]}(t) = n\}, (5.2.39)
i.e., τ_k(n) is the time at which the total degree of the vertices in [k] equals n. The initial condition τ_k(2k − 1) = k − 1 is chosen such that the half-edge incident to vertex k is already considered to be present at time k − 1, but the receiving end of that edge is not. This guarantees that the attachment of the edge of vertex k is also properly taken into account.
Note that τ_k(n) < ∞ for every n, since D_j(n) \xrightarrow{a.s.} ∞ as n → ∞ for every j. Moreover, since τ_k(n) \xrightarrow{a.s.} ∞ as n → ∞,
\lim_{n→∞} \frac{D_k(n)}{D_{[k]}(n)} = \lim_{n→∞} \frac{D_k(τ_k(n))}{D_{[k]}(τ_k(n))} = \lim_{n→∞} \frac{D_k(τ_k(n))}{n}. (5.2.40)
Now, the random variables \big((D_k(τ_k(n)), D_{[k−1]}(τ_k(n)))\big)_{n≥2k−1} form a Pólya urn scheme, with D_k(τ_k(2k − 1)) = 1 and D_{[k−1]}(τ_k(2k − 1)) = 2k − 2. The edge at time τ_k(n) is attached to vertex k with probability
\frac{D_k(τ_k(n)) + δ}{n + kδ}, (5.2.41)
which is the probability of a Pólya urn scheme having linear weights as in (5.2.13) with a_b = δ, a_r = (k − 1)δ, b_0 = 1, and r_0 = 2(k − 1). Thus, the statement follows from Theorem 5.3.
Theorem 5.6 is easily extended to (PA_n^{(1,δ)}(b))_{n≥1}:
Theorem 5.7 (Relative degrees in scale-free trees) For (PA_n^{(1,δ)}(b))_{n≥1}, as n → ∞,
\frac{D_k(n)}{D_{[k]}(n)} \xrightarrow{a.s.} ψ'_k, (5.2.42)
where ψ'_k ∼ Beta(1 + δ, (2k − 1) + (k − 1)δ) for k ≥ 3, and ψ'_2 ∼ Beta(2 + δ, 2 + δ).
The dynamics for (PA_n^{(1,δ)}(b))_{n≥1} are slightly different from those of (PA_n^{(1,δ)}(a))_{n≥1}, since PA_n^{(1,δ)}(b) does not allow for self-loops in the growth of the tree. Indeed, now the random variables \big((D_k(τ_k(n)), D_{[k−1]}(τ_k(n)))\big)_{n≥2k} form a Pólya urn scheme, starting with D_k(τ_k(2k)) = 1 and D_{[k−1]}(τ_k(2k)) = 2k − 1. The edge at time τ_k(n) is attached to vertex k with probability
\frac{D_k(τ_k(n)) + δ}{n + kδ}, (5.2.43)
which are the probabilities of a Pólya urn scheme in (5.2.12) in the linear weight case in (5.2.13) with a_b = δ, a_r = (k − 1)δ, b_0 = 1, and r_0 = 2k − 1. The setting is a little different for k = 2, since vertex 3 attaches to vertices 1 and 2 with equal probability, so that ψ'_2 ∼ Beta(2 + δ, 2 + δ). Thus, again the statement follows from Theorem 5.3. See Exercise 5.10 for the complete proof. We conclude that, even though (PA_n^{(1,δ)}(a))_{n≥1} and (PA_n^{(1,δ)}(b))_{n≥1} have the same asymptotic degree distribution, the limiting degree ratios in Theorems 5.6 and 5.7 are different.
In this section we study the local limit of preferential attachment models, which is a more difficult subject than that of inhomogeneous random graphs or configuration models. Indeed, it turns out that the local limit is not described by a homogeneous unimodular branching process but rather by an inhomogeneous multi-type branching process.
called the Pólya point tree. Its nodes are labeled by finite words, using the Ulam–Harris
labeling of trees in Section 1.5, as w = w1 w2 · · · wl , each carrying an age as well as a label
Y or O denoting whether the child is younger or older than its parent in the tree.
The root ∅ has age U∅ , where U∅ is chosen uar in [0, 1]. The root is special, and has
no label in {Y, O}, since it has no parent. Having discussed the root of the tree, we now
construct the remainder of the tree by recursion.
In the recursion step, we assume that the Ulam–Harris word w (recall Section 1.5) and the corresponding age variable A_w ∈ [0, 1] have been chosen in a previous step. For j ≥ 1, let wj be the jth child of w, and set
m_−(w) = m if w is the root or of label O, and m_−(w) = m − 1 if w is of label Y. (5.3.1)
The intuition behind (5.3.1) is that m_−(w) equals the number of older children of w, which equals m when w is older than its parent, and m − 1 when w is younger than its parent.
Recall that a Gamma distribution with parameters r and λ has the density given in (1.5.1). Let Γ have a Gamma distribution with parameters r = m + δ and λ = 1, and let Γ* be its size-biased version, which has a Gamma distribution with parameters r = m + δ + 1 and λ = 1 (see Exercise 5.11). We then take
Γ_w ∼ Γ if w is the root or of label Y, and Γ_w ∼ Γ* if w is of label O, (5.3.2)
independently of everything else.
Let w1, . . . , w m_−(w) be the children of w having label O, and let their ages A_{w1}, . . . , A_{w m_−(w)} be given by
A_{wj} = U_{wj}^{1/χ} A_w, (5.3.3)
where (U_{wj})_{j=1}^{m_−(w)} are iid uniform random variables on [0, 1] that are independent of everything else, and let
χ = \frac{m + δ}{2m + δ}. (5.3.4)
Further, let (A_{w(m_−(w)+j)})_{j≥1} be the (ordered) points of a Poisson point process on [A_w, 1] with intensity
ρ_w(x) = \frac{Γ_w}{τ − 1} \frac{x^{1/(τ−1)−1}}{A_w^{1/(τ−1)}}, (5.3.5)
where we recall that τ = 3 + δ/m by (1.3.63), and the nodes (w(m_−(w) + j))_{j≥1} have label Y. The children of w are the nodes wj, with labels O and Y.
The above random tree is known as the Pólya point tree. The Pólya point tree is a
multi-type discrete-time branching process, where the type of a node w is equal to the pair
(aw , tw ), with aw ∈ [0, 1] corresponding to the age of the vertex, and tw ∈ {Y, O} to
its label. Thus, the type space S = [0, 1] × {Y, O} of the multi-type branching process is
continuous.
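The recursive description above translates directly into a sampler. The sketch below (an illustration under the definitions of this section; the function name root_offspring is ours) draws the root's age and Gamma strength, generates the m older children via (5.3.3), and generates the younger children as a Poisson process with intensity (5.3.5), sampled by inversion of its cumulative intensity.

```python
import numpy as np

def root_offspring(m, delta, rng):
    """Sample the ages of the root's children in the Polya point tree."""
    tau = 3 + delta / m
    chi = (m + delta) / (2 * m + delta)
    p = 1 / (tau - 1)                         # = 1 - chi
    age = rng.random()                        # age of the root, uniform on [0,1]
    gamma = rng.gamma(m + delta)              # Gamma(m + delta, 1) strength
    older = age * rng.random(m) ** (1 / chi)  # m older children, cf. (5.3.3)
    # Poisson process on [age, 1] with intensity (5.3.5); its total mass is
    # gamma * (1 - age**p) / age**p, obtained by integrating the intensity
    mass = gamma * (1 - age ** p) / age ** p
    k = rng.poisson(mass)
    v = rng.random(k)                         # inverse-CDF sampling of the points
    younger = (age ** p + v * (1 - age ** p)) ** (1 / p)
    return age, np.sort(older), np.sort(younger)

rng = np.random.default_rng(3)
age, older, younger = root_offspring(m=2, delta=0.5, rng=rng)
print("root age:", round(age, 3))
print("older children:", np.round(older, 3))
print("younger children:", np.round(younger, 3))
```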
Let us discuss the offspring structure of the above process. Obviously, there are finitely many children of label O. Further, note that 1/(τ − 1) = m/(2m + δ) > 0, so the intensity ρ_w in (5.3.5) of the Poisson process is integrable. Thus every vertex in the random tree almost surely has finitely many children.
[Figure 5.1: empirical tail probabilities P(X > x) against the degree, on a log-log scale, in panels (a) and (b), for the degree distribution, the size-biased degree distribution, and the degree distribution of a random friend.]
With the above description in hand, we are ready to state our main result concerning local convergence of PA_n^{(m,δ)}(d):
See Figures 5.1 and 5.2 for examples of the various degree distributions in the preferential attachment model, where we plot the degree distribution itself, the size-biased degree distribution, and the degree distribution of a random neighbor of a uniform vertex. In contrast to the generalized random graph and the configuration model (recall Figures 3.2 and 4.2), the latter two degree distributions are different, particularly for small values of n, as in Figure 5.1, even though their power-law exponents do agree.
Extensions of Theorem 5.8 to other models, including those where self-loops are allowed, are given in Section 5.4.4 below. These results show that Theorem 5.8 is quite robust to minor changes in the model definition, and that it also applies to PA_n^{(m,δ)}(a) and PA_n^{(m,δ)}(b) with the same limit. We refer to the discussion in Section 5.7 for more details and also the history of Theorem 5.8.
The proof of Theorem 5.8 is organized as follows. We start in Section 5.3.2 by investigating consequences of Theorem 5.8 for the degree structure of PA_n^{(m,δ)}(d) (or any other graph having the same local limit). In Section 5.3.3 we prove that PA_n^{(m,δ)}(d) can be represented in terms of conditionally independent edges by relying on a Pólya urn description. The remainder of the proof of local convergence is deferred to Section 5.4.
[Figure 5.2: empirical tail probabilities P(X > x) against the degree, on a log-log scale, in panels (a) and (b), for the degree distribution, the size-biased degree distribution, and the degree distribution of a random friend.]
Note that the limiting degree distribution in (5.3.6) is equal to that for PA_n^{(m,δ)}(a) in (1.3.60), again exemplifying that the details of the model have little influence on the limiting degree sequence. It is not hard to see from Lemma 5.9 that
p_k = c_{m,δ} k^{−τ} (1 + O(1/k)), \qquad p'_k = c'_{m,δ} k^{−(τ−1)} (1 + O(1/k)), (5.3.8)
for some constants c_{m,δ} and c'_{m,δ} and with τ = 3 + δ/m (see Exercise 5.12). We conclude that there is a form of size biasing, in that older neighbors of a uniform vertex have a limiting degree distribution that satisfies a power law (like the degree of the random vertex itself), but with an exponent that is one lower than that of the vertex itself (recall Figures 5.1 and 5.2). Exercises 5.13–5.15 study the joint distribution (D, D') and various conditional power laws.
Proof of Lemma 5.9 subject to Theorem 5.8. We note that local convergence in probability
implies the convergence of the degree distribution. It thus suffices to study the distribution
of the degree of the root in the Pólya point tree. We first condition on the age A∅ = U∅ of
the root of the Pólya point tree, where U∅ is standard uniform. Let D be the degree of the
root. Conditioning on A∅ = a, the degree D is m plus a Poisson variable with parameter
\int_a^1 \frac{Γ_∅}{τ − 1} \frac{x^{1/(τ−1)−1}}{a^{1/(τ−1)}}\,dx = Γ_∅ \frac{1 − a^{1/(τ−1)}}{a^{1/(τ−1)}} ≡ Γ_∅ κ(a), (5.3.9)
where Γ_∅ is a Gamma variable with parameters r = m + δ and λ = 1. Thus, taking the expectation wrt Γ_∅, we obtain
P(D = k \mid A_∅ = a) = \int_0^∞ P(D = k \mid A_∅ = a, Γ_∅ = y) \frac{y^{m+δ−1}}{Γ(m + δ)} e^{−y}\,dy
= \int_0^∞ e^{−yκ(a)} \frac{(yκ(a))^{k−m}}{(k − m)!} \frac{y^{m+δ−1}}{Γ(m + δ)} e^{−y}\,dy
= \frac{κ(a)^{k−m}}{(1 + κ(a))^{k+δ}} \frac{Γ(k + δ)}{(k − m)!\,Γ(m + δ)}
= \frac{Γ(k + δ)}{(k − m)!\,Γ(m + δ)} (1 − a^{1/(τ−1)})^{k−m} a^{(m+δ)/(τ−1)}, (5.3.10)
where we use κ(a)/(1 + κ(a)) = 1 − a^{1/(τ−1)} and 1/(1 + κ(a)) = a^{1/(τ−1)}. We thus conclude that
P(D = k) = \int_0^1 P(D = k \mid A_∅ = a)\,da = \int_0^1 \frac{Γ(k + δ)}{(k − m)!\,Γ(m + δ)} (1 − a^{1/(τ−1)})^{k−m} a^{(m+δ)/(τ−1)}\,da. (5.3.11)
Recall that
\int_0^1 u^{p−1} (1 − u)^{q−1}\,du = \frac{Γ(p)Γ(q)}{Γ(p + q)}. (5.3.12)
Using the integral transform u = a^{1/(τ−1)}, for which da = (τ − 1)u^{τ−2}\,du, we arrive at
P(D = k) = (τ − 1) \frac{Γ(k + δ)}{(k − m)!\,Γ(m + δ)} \int_0^1 (1 − u)^{k−m} u^{m+δ+1+δ/m}\,du
= (τ − 1) \frac{Γ(k + δ)}{(k − m)!\,Γ(m + δ)} \frac{Γ(k − m + 1)Γ(m + 2 + δ + δ/m)}{Γ(k + 3 + δ + δ/m)}
= (τ − 1) \frac{Γ(k + δ)Γ(m + 2 + δ + δ/m)}{Γ(m + δ)Γ(k + 3 + δ + δ/m)}. (5.3.13)
Since τ − 1 = (2m + δ)/m by (1.3.63), this proves (5.3.6).
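The mixed-Poisson representation used in this proof can be checked numerically. The sketch below (illustrative only; it assumes NumPy and evaluates (5.3.13) via log-Gamma functions for stability) samples D = m + Poi(Γ_∅κ(A_∅)) and compares the empirical mass function with the closed form.

```python
import numpy as np
from math import lgamma, exp

def pk(k, m, delta):
    """The limiting law (5.3.13), valid for k >= m."""
    tau = 3 + delta / m
    return (tau - 1) * exp(lgamma(k + delta)
                           + lgamma(m + 2 + delta + delta / m)
                           - lgamma(m + delta)
                           - lgamma(k + 3 + delta + delta / m))

m, delta = 2, 0.5
tau = 3 + delta / m
rng = np.random.default_rng(5)
N = 200_000
a = rng.random(N)                              # ages of the root
gam = rng.gamma(m + delta, size=N)             # Gamma strengths
kappa = (1 - a ** (1 / (tau - 1))) / a ** (1 / (tau - 1))
D = m + rng.poisson(gam * kappa)               # root degree, cf. (5.3.9)
for k in range(m, m + 4):
    print(k, "empirical %.4f" % np.mean(D == k), "formula %.4f" % pk(k, m, delta))
```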
We next extend this to the convergence in distribution of D'_o(n), for which we again note that local convergence implies the convergence of the degree distribution of neighbors of the root, and so in particular of D'_o(n). It thus suffices to study the distribution of the degree of a uniform neighbor of the root in the Pólya point tree. We first condition on the age A_∅ = U_∅ of the root of the Pólya point tree, where U_∅ is standard uniform, and recall that the age A_{∅1} of one of the m older vertices to which ∅ is connected has distribution A_{∅1} = U_{∅1}^{1/χ} A_∅, where U_{∅1} is uniform on [0, 1] and 1/χ = (τ − 1)/(τ − 2) by (5.3.4). Let D' be the degree of vertex ∅1. By (5.3.5), conditioning on A_{∅1} = b, the degree D' is m plus a Poisson variable with parameter
\int_b^1 \frac{Γ_{∅1}}{τ − 1} \frac{x^{1/(τ−1)−1}}{b^{1/(τ−1)}}\,dx = Γ_{∅1} \frac{1 − b^{1/(τ−1)}}{b^{1/(τ−1)}} ≡ Γ_{∅1} κ(b), (5.3.14)
P(D' = k \mid A_{∅1} = b) = \int_0^∞ P(D' = k \mid A_{∅1} = b, Γ_{∅1} = y) \frac{y^{m+δ}}{Γ(m + 1 + δ)} e^{−y}\,dy
= \int_0^∞ e^{−yκ(b)} \frac{(yκ(b))^{k−m}}{(k − m)!} \frac{y^{m+δ}}{Γ(m + 1 + δ)} e^{−y}\,dy
= \frac{κ(b)^{k−m}}{(1 + κ(b))^{k+1+δ}} \frac{Γ(k + 1 + δ)}{(k − m)!\,Γ(m + 1 + δ)} (5.3.15)
= \frac{Γ(k + 1 + δ)}{(k − m)!\,Γ(m + 1 + δ)} (1 − b^{1/(τ−1)})^{k−m} b^{(m+1+δ)/(τ−1)},
where we again use that κ(b)/(1 + κ(b)) = 1 − b^{1/(τ−1)} and 1/(1 + κ(b)) = b^{1/(τ−1)}.
We next use that A_{∅1} = U_{∅1}^{1/χ} A_∅, where A_∅ is uniform on [0, 1]. Recall that the vector (A_∅, U_{∅1}) has density 1 on [0, 1]^2. Define the random vector (A_∅, A_{∅1}) = (A_∅, U_{∅1}^{1/χ} A_∅), so that (A_∅, A_{∅1}) has joint density \frac{τ − 2}{τ − 1} a^{−(τ−2)/(τ−1)} b^{−1/(τ−1)} on {(a, b) : b ≤ a}. Thus, P(D' = k) equals
\frac{τ − 2}{τ − 1} \int_0^1 \int_0^a a^{−(τ−2)/(τ−1)} b^{−1/(τ−1)}\,P(D' = k \mid A_{∅1} = b)\,db\,da
= \frac{τ − 2}{τ − 1} \frac{Γ(k + 1 + δ)}{(k − m)!\,Γ(m + 1 + δ)} \int_0^1 \int_0^a a^{−(τ−2)/(τ−1)} (1 − b^{1/(τ−1)})^{k−m} b^{(m+1+δ)/(τ−1)−1/(τ−1)}\,db\,da (5.3.17)
= (τ − 2)(τ − 1) \frac{Γ(k + 1 + δ)}{(k − m)!\,Γ(m + 1 + δ)} \int_0^1 \int_0^u (1 − v)^{k−m} v^{m+1+δ+δ/m}\,dv\,du,
where we have now used the integral transform a = u^{τ−1} and b = v^{τ−1}. Recall (5.3.12).
Interchanging the integrals over u and v thus leads to the conclusion that P(D' = k) equals
(τ − 2)(τ − 1) \frac{Γ(k + 1 + δ)}{(k − m)!\,Γ(m + 1 + δ)} \int_0^1 (1 − v)^{k−m+1} v^{m+1+δ+δ/m}\,dv
= (τ − 2)(τ − 1) \frac{Γ(k + 1 + δ)}{(k − m)!\,Γ(m + 1 + δ)} \frac{Γ(m + 2 + δ + δ/m)Γ(k − m + 2)}{Γ(k + 4 + δ + δ/m)}
= \frac{2m + δ}{m^2} \frac{Γ(m + 2 + δ + δ/m)}{Γ(m + δ)} \frac{(k − m + 1)Γ(k + 1 + δ)}{Γ(k + 4 + δ + δ/m)}, (5.3.18)
as required.
Here the latter equality follows simply by induction on k ≥ 1 (see Exercise 5.16). Finally, let I_k^{(n)} = [S_{k−1}^{(n)}, S_k^{(n)}). We now construct a graph as follows:
• conditional on ψ_1, . . . , ψ_n, choose (U_{k,i})_{k∈[n],i∈[m]} as a sequence of independent random variables, with U_{k,i} chosen uar from the (random) interval [0, S_{k−1}^{(n)}];
• for k ∈ [n] and j < k, join the vertices j and k when U_{k,i} ∈ I_j^{(n)} for some i ∈ [m] (with multiple edges between j and k if there are several such i).
Call the resulting random multi-graph on [n] the finite-size Pólya graph of size n. The main result for PA_n^{(m,δ)}(d) is as follows:
Theorem 5.10 (Finite-graph Pólya version of PA_n^{(m,δ)}(d)) Fix m ≥ 1 and δ > −m. Then, the distribution of PA_n^{(m,δ)}(d) is the same as that of the finite-size Pólya graph of size n.
The importance of Theorem 5.10 is that the edges in the finite-size Pólya graph are independent conditional on the Beta variables (ψ_k)_{k∈[n]}, in a similar way as for (5.2.15) in Theorem 5.3. This independence makes explicit computations possible. Exercises 5.17–5.18, for example, use Theorem 5.10 to derive properties of the number of multiple edges in PA_n^{(m,δ)}(d) for m = 2.
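The construction preceding Theorem 5.10 is short enough to implement directly. The sketch below (illustrative; the function name is ours, and we read the construction as letting each vertex k = 2, . . . , n send out m edges while vertex 1 only receives) samples the Beta strengths, builds the intervals I_j^{(n)} from the S_k^{(n)}, and places edges by locating uniform points in those intervals.

```python
import numpy as np

def finite_size_polya_graph(n, m, delta, rng):
    """Sample the finite-size Polya graph: edges are conditionally
    independent given the Beta strengths (psi_k)."""
    psi = np.zeros(n + 1)
    psi[1] = 1.0                                     # convention psi_1 = 1
    for k in range(2, n + 1):
        psi[k] = rng.beta(m + delta, (2 * k - 3) * m + delta * (k - 1))
    S = np.ones(n + 1)                               # S[k] = S_k^{(n)}, S[n] = 1
    for k in range(n - 1, -1, -1):
        S[k] = S[k + 1] * (1 - psi[k + 1])           # S_0 = 0 since psi_1 = 1
    edges = []
    for k in range(2, n + 1):
        u = rng.uniform(0, S[k - 1], size=m)         # U_{k,i} on [0, S_{k-1}]
        hits = np.searchsorted(S, u, side="right")   # u in I_j = [S_{j-1}, S_j)
        edges.extend((k, int(j)) for j in hits)
    return edges

rng = np.random.default_rng(2)
edges = finite_size_polya_graph(n=200, m=2, delta=0.0, rng=rng)
print("degree of vertex 1:", sum((u == 1) + (v == 1) for u, v in edges))
```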
In terms of the above Pólya point tree, the proof shows that the Gamma variables that define the “strengths” Γ_w are inherited from the Beta random variables (ψ_k)_{k∈[n]}, while the age variables A_w are inherited from the random variables (S_k^{(n)})_{k∈[n]} (see Lemmas 5.17 and 5.18 below).
while, for j ≥ 3,
ψ'_j ∼ Beta\big(m + δ, (2j − 1)m + δ(j − 1)\big). (5.3.21)
Recall also Theorem 5.7. The above changes affect the finite-size Pólya graph in only a minor way.
Let us give some insight into the proof of Theorem 5.10, after which we give two full
proofs. The first proof relies on Pólya urn methods and the second on a direct computation.
For the Pólya urn proof, we rely on the fact that there is a close connection between
the preferential attachment model and the Pólya urn model, in the following sense. Every
new connection that a vertex gains can be represented by a new ball added to the urn cor-
responding to that vertex, as in Theorems 5.6 and 5.7. As time progresses, the number of
urns corresponding to the vertices changes, which is a major complication. As it turns out,
however, the attachment probabilities are consistent, which allows the Pólya urn description
to be extended to this setting of increasing numbers of urns. Let us now make this intuition
precise.
Pólya urn proof of Theorem 5.10. Let us first consider a two-urn model, where the number of balls in one urn represents the degree of a particular vertex k, and the number of balls in the other represents the sum of the degrees of the vertices [k − 1], as in Theorems 5.6 and 5.7. We start this process at the point when n = k, and k has connected to precisely m vertices in [k − 1]. Note that at this point, by the structure of PA_n^{(m,δ)}(d), the urn representing the degree of vertex k has m balls, while the other urn, corresponding to the vertices in [k − 1], has (2k − 3)m balls.
Consider a time in the evolution of PA_n^{(m,δ)}(d) when we have n − 1 ≥ k old vertices and i − 1 edges between the new vertex n and [n − 1]. Assume that at this point the degree of k is d_k and the sum of the degrees of the vertices in [k − 1] is d_{[k−1]}. The probability that the ith edge from n to [n − 1] is attached to k is then
\frac{d_k + δ}{2m(n − 1) + (1 + δ)(i − 1)}, (5.3.22)
while the probability that it is connected to a vertex in [k − 1] is equal to
\frac{d_{[k−1]} + δ(k − 1)}{2m(n − 1) + (1 + δ)(i − 1)}. (5.3.23)
Thus, conditioned on connecting to [k], the probability that the ith edge from n to [n − 1] is attached to k is (d_k + δ)/(kδ + d_{[k]}), while the probability that the ith edge from n to [n − 1] is attached to [k − 1] is (d_{[k−1]} + δ(k − 1))/(kδ + d_{[k]}).
Taking into account that the two urns start with m and (2k − 3)m balls, respectively, we see that the evolution of the two urns is a Pólya urn with strengths ψ_k and 1 − ψ_k, where ψ_k ∼ Beta(m + δ, (2k − 3)m + δ(k − 1)) (recall Theorem 5.3). We next use this to complete the proof of Theorem 5.10, where we use induction. Indeed, using the two-urn process as an inductive input, we construct the finite-size Pólya graph defined in Theorem 5.10 in a similar way as for the Pólya urns with multiple colors in (5.2.28).
Let X_t ∈ [⌈t/m⌉] be the vertex receiving the tth edge (the other endpoint of this edge being the vertex ⌈t/m⌉ + 1). For t ∈ [m], X_t is deterministic (and equal to 1, since we start at time 1 with two vertices and m edges between them); however, beginning at t = m + 1, we have a two-urn model, starting with m balls in each urn. As shown above, the two urns can be described as Pólya urns with strengths 1 − ψ_2 and ψ_2. Once t > 2m, X_t can take three values but, conditioned on X_t ≤ 2, the process continues to be a two-urn model with strengths 1 − ψ_2 and ψ_2.
To determine the probability of the event that Xt ≤ 2, we now use the above two-urn
model with k = 3, which gives that the probability of the event Xt ≤ 2 is 1 − ψ3 , at least as
long as t ≤ 3m. Combining these two-urn models, we get a three-urn model with strengths
(1 − ψ2 )(1 − ψ3 ), ψ2 (1 − ψ3 ), and ψ3 . Again, this model remains valid for t > 3m, as long
as we condition on Xt ≤ 3. Continuing inductively, we see that the sequence Xt evolves in
stages:
• For t ∈ [m], the variable X_t is deterministic: X_t = 1.
• For t = m + 1, . . . , 2m, the distribution of X_t ∈ {1, 2} is described by a two-urn model with strengths 1 − ψ_2 and ψ_2, where ψ_2 ∼ Beta(m + δ, m + δ).
• In general, for t = m(k − 1) + 1, . . . , km, the distribution of X_t ∈ [k] is described by a k-urn model with strengths
φ_j^{(k)} = ψ_j \prod_{i=j+1}^k (1 − ψ_i), \qquad j ∈ [k]. (5.3.24)
Here the Beta variable ψ_k in (5.3.19) is chosen at the beginning of the kth stage, independently of the previously chosen strengths ψ_1, . . . , ψ_{k−1} (for convenience, we set ψ_1 = 1).
Note that the random variables φ_j^{(k)} can be expressed in terms of the random variables introduced in Theorem 5.10 as follows. By (5.3.20), S_k^{(n)} = \prod_{j=k+1}^n (1 − ψ_j). This implies that φ_j^{(n)} = ψ_j S_j^{(n)}, which relates the strengths φ_j^{(n)} to the random variables defined right before Theorem 5.10, and shows that the process derived above is indeed the process given in the theorem.
We next give a direct proof of Theorem 5.10, which is of independent interest as it also
indicates how the conditional independence of edges can be used effectively:
Direct proof of Theorem 5.10. In what follows, we let PA'_n denote the finite-size Pólya graph of size n. Our aim is to show that P(PA'_n = G) = P(PA_n^{(m,δ)}(d) = G) for any graph G. Here, we think of G as a directed and edge-labeled graph, where every vertex has out-degree m and the out-edges are labeled as [m]. Thus, the out-edges are the edges from young to old. Indeed, recall from Section 1.3.5 that the graph starts at time 1 with two vertices and m edges between them. The vertex set of PA_n^{(m,δ)}(d) is [n]. In the proof, it is convenient
to denote the labeled edge set of G as \vec{E}(G) = {(u, v_j(u), j) : u ∈ [n], j ∈ [m]}, where v_j(u) < u is the vertex to which the jth edge of u is attached in G. We can assume that v_j(2) = 1 for all j ∈ [m], since PA_n^{(m,δ)}(d) starts at time 1 with two vertices having m edges between them.
Fix an edge-labeled graph G for which P(PA_n^{(m,δ)}(d) = G) > 0. On the one hand, we can compute directly that
P(PA_n^{(m,δ)}(d) = G) = \prod_{u∈[3,n], j∈[m]} \frac{d^{(G)}_{v_j(u)}(u) + δ}{2m(u − 2) + j − 1 + δ(u − 1)}. (5.3.25)
Thus, we are left with showing that P(PA'_n = G) is equal to the rhs of (5.3.27).
To identify P(PA'_n = G), it is convenient to condition on the Beta variables (ψ_j)_{j∈[n]}. We denote the conditional measure by P_n; i.e., for every event E,
P_n(E) = P(E \mid (ψ_j)_{j∈[n]}). (5.3.28)
The advantage of this measure is that the edges are now conditionally independent, which allows us to give an exact formula for the probability that a certain graph occurs. We start by computing the edge probabilities under P_n, where we recall that {u \stackrel{j}{→} v} is the event that the jth edge of u connects to v:
Lemma 5.12 (Edge probabilities in PA'_n conditioned on Beta variables) Fix m ≥ 1 and δ > −m, and consider PA'_n. For any u > v and j ∈ [m],
P_n(u \stackrel{j}{→} v) = ψ_v (1 − ψ)_{(v,u)}, (5.3.29)
where, for A ⊆ [n],
(1 − ψ)_A = \prod_{a∈A} (1 − ψ_a). (5.3.30)
Proof Recall the construction between (5.3.19) and Theorem 5.10. When we condition on (ψ_j)_{j∈[n]}, the only randomness left is that in the uniform random variables (U_{k,i})_{k∈[n],i∈[m]}, where U_{k,i} is uniform on [0, S_{k−1}^{(n)}]. Then, u \stackrel{j}{→} v occurs precisely when U_{u,j} ∈ I_v^{(n)} = [S_{v−1}^{(n)}, S_v^{(n)}), which occurs with P_n-probability equal to |I_v^{(n)}|/S_{u−1}^{(n)}. Note that
|I_v^{(n)}| = S_v^{(n)} − S_{v−1}^{(n)} = (1 − ψ)_{[v+1,n]} − (1 − ψ)_{[v,n]} = ψ_v (1 − ψ)_{(v,n]}, (5.3.31)
where p_s = p_s^{(G)} and q_s = q_s^{(G)} are given by
p_s = d_s^{(G)} − m, \qquad q_s = \sum_{u∈[3,n]} \sum_{j∈[m]} 1_{\{s∈(v_j(u),u)\}}. (5.3.35)
Proof Multiply the factors P_n(u \stackrel{j}{→} v_j(u)) for every labeled edge (u, v_j(u), j) ∈ \vec{E}(G), and collect the powers of ψ_s and 1 − ψ_s.
We note that ps equals the number of edges in the graph G that point towards s. This is
relevant, since in (5.3.29) in Lemma 5.12 every older vertex v in an edge receives a factor
ψv . Further, again by (5.3.29), there are factors 1 − ψs for every s ∈ (v, u) and all edges
(v, u), so qs counts how many factors 1 − ψs occur.
When taking expectations wrt (ψ_v)_{v∈[n]}, by Corollary 5.13, we obtain expectations of the form E[ψ^p (1 − ψ)^q], where ψ ∼ Beta(α, β) and p, q ≥ 0. These are computed in the following lemma:
Lemma 5.14 (Expectations of powers of Beta variables) For all p, q ∈ N and ψ ∼ Beta(α, β),
E[ψ^p (1 − ψ)^q] = \frac{(α + p − 1)_p (β + q − 1)_q}{(α + β + p + q − 1)_{p+q}}, (5.3.36)
where, as before, (x)_m = x(x − 1) · · · (x − m + 1) denotes the mth falling factorial of x.
Proof A direct computation based on the density of a Beta random variable in (1.5.2) yields
E[ψ^p (1 − ψ)^q] = \frac{B(α + p, β + q)}{B(α, β)} = \frac{Γ(α + β)}{Γ(α)Γ(β)} \frac{Γ(α + p)Γ(β + q)}{Γ(α + β + p + q)} = \frac{(α + p − 1)_p (β + q − 1)_q}{(α + β + p + q − 1)_{p+q}}. (5.3.37)
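Lemma 5.14 lends itself to a quick Monte Carlo sanity check; the sketch below (illustrative only, with arbitrary parameters) compares a simulated moment with the falling-factorial expression (5.3.36).

```python
import numpy as np

def falling(x, k):
    """Falling factorial (x)_k = x(x-1)...(x-k+1)."""
    out = 1.0
    for i in range(k):
        out *= x - i
    return out

alpha, beta, p, q = 2.5, 4.0, 3, 2
rhs = (falling(alpha + p - 1, p) * falling(beta + q - 1, q)
       / falling(alpha + beta + p + q - 1, p + q))      # (5.3.36)
rng = np.random.default_rng(4)
psi = rng.beta(alpha, beta, size=1_000_000)
print("Monte Carlo %.5f vs formula %.5f" %
      (np.mean(psi ** p * (1 - psi) ** q), rhs))
```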
The above computation, when applied to Corollary 5.13, leads to the following expression for the probability of observing a particular edge-labeled multi-graph G:
Corollary 5.15 (Graph probabilities in PA'_n) Fix m ≥ 1 and δ > −m, and consider PA'_n. For any edge-labeled multi-graph G,
P(PA'_n = G) = \prod_{s=2}^{n−1} \frac{(α + p_s − 1)_{p_s} (β_s + q_s − 1)_{q_s}}{(α + β_s + p_s + q_s − 1)_{p_s+q_s}}, (5.3.38)
where α = m + δ, β_s = (2s − 3)m + δ(s − 1), and p_s = p_s^{(G)} and q_s = q_s^{(G)} are defined in (5.3.35).
Note that the contribution for s = n equals 1, since p_n^{(G)} = d_n^{(G)} − m = 0 almost surely in PA'_n. Corollary 5.15 allows us to complete the direct proof of Theorem 5.10:
Corollary 5.16 (Graph probabilities in PA'_n and PA_n^{(m,δ)}(d)) Fix m ≥ 1 and δ > −m, and consider PA'_n and PA_n^{(m,δ)}(d). For any edge-labeled multi-graph G,
P(PA'_n = G) = P(PA_n^{(m,δ)}(d) = G), (5.3.39)
where α = m + δ, β_s = (2s − 3)m + δ(s − 1), and p_s = p_s^{(G)} and q_s = q_s^{(G)} are defined in (5.3.35). Consequently, Corollaries 5.13 and 5.15 also hold for PA_n^{(m,δ)}(d).
Proof We evaluate (5.3.38) in Corollary 5.15 explicitly. Since α = m + δ and p_s = d_s^{(G)} − m,
(α + p_s − 1)_{p_s} = \prod_{i=0}^{d_s^{(G)}−m−1} (i + m + δ), (5.3.40)
so that
\prod_{s=2}^{n−1} (α + p_s − 1)_{p_s} = \prod_{s=2}^{n−1} \prod_{i=0}^{d_s^{(G)}−m−1} (i + m + δ), (5.3.41)
which produces the first product in (5.3.27), except for the s = 1 factor.
We next identify the other factors, for which we start by analyzing q_s^{(G)}. We can use
\sum_{u∈[3,n]} \sum_{j∈[m]} 1_{\{v_j(u)∈[s−1]\}} = d_{[s−1]}^{(G)} − m(s − 1), (5.3.43)
since the lhs counts the in-edges in [s − 1] except for those from vertex 2 to vertex 1, while d_{[s−1]}^{(G)} counts all in- and out-edges, and there are exactly m(s − 2) out-edges in [s − 1] and m edges from vertex 2 to vertex 1. Further, note that, for s ∈ [n],
\sum_{u∈[3,n]} \sum_{j∈[m]} 1_{\{s∈[u,n]\}} = m \sum_{u∈[3,s]} 1_{\{s∈[u,n]\}} = m(s − 2). (5.3.44)
Thus,
q_s^{(G)} = d_{[s−1]}^{(G)} − m(2s − 3). (5.3.45)
As a result, by (5.3.35) and the recursions
p_s^{(G)} + q_s^{(G)} = q_{s+1}^{(G)} + m, \qquad α + β_s = β_{s+1} − m, (5.3.46)
we obtain
(α + β_s + p_s + q_s − 1)_{p_s+q_s} = (β_{s+1} + q_{s+1} − 1)_{q_{s+1}+m} = (β_{s+1} + q_{s+1} − 1)_{q_{s+1}} (β_{s+1} − 1)_m. (5.3.47)
Therefore, by (5.3.45) and since β_s = (2s − 3)m + δ(s − 1),
\prod_{s=2}^{n−1} \frac{(β_s + q_s − 1)_{q_s}}{(α + β_s + p_s + q_s − 1)_{p_s+q_s}} = \prod_{s=2}^{n−1} \frac{1}{(β_{s+1} − 1)_m} \prod_{s=2}^{n−1} \frac{(β_s + q_s − 1)_{q_s}}{(β_{s+1} + q_{s+1} − 1)_{q_{s+1}}}
= (δ + d_1^{(G)} − 1)_{d_1^{(G)}−m} \prod_{s=2}^{n−1} \frac{1}{(m(2s − 1) + δs − 1)_m}. (5.3.48)
Indeed, the starting value in the telescoping product equals, again by (5.3.45), with β_2 = m + δ and q_2 = d_1^{(G)} − m,
where j = m − i + 1, as required.
In this section we complete the proof of the local convergence of preferential attachment
models to the Pólya point tree in Theorem 5.8. This section is organized as follows. In Sec-
tion 5.4.1 we discuss some necessary preliminaries, such as the convergence of the rescaled
Beta variables to Gamma variables and the regularity properties of the Pólya point tree. Our
local convergence proof again relies on a second-moment method for the number of vertices
whose ordered r-neighborhood agrees with a specific ordered tree t. We investigate the first
moment of the subgraph counts in Section 5.4.2 and handle the second moment in Section
5.4.3. We close in Section 5.4.4 by discussing the local convergence of related preferential
attachment models.
This proves the upper bound in (5.4.1) for k ≥ K and K large enough. The inequality in
(5.4.4) is obviously true for x ≥ 1, so we will assume that x ∈ [0, 1) from now on. Then
we can write, with b = (2k − 3)m + δ(k − 1) − 1,
Note that y ↦ e^{−by}(1 − y)^{−b} is increasing, while y ↦ 1_{\{y≤x\}} is decreasing, so that, by the correlation inequality in [V1, Lemma 2.14],
as required.
We continue with the lower bound in (5.4.1), and now instead aim to prove that, for x ≤ (log k)^2/b and again with b = (2k − 3)m + δ(k − 1) − 1,
We now write
P(ψ_k ≤ (1 − ε)x) = \frac{\int_0^{(1−ε)xb} y^{α_k−1} (1 − y/b)^b\,dy}{\int_0^b y^{α_k−1} (1 − y/b)^b\,dy} = \frac{E[1_{\{χ'_k≤(1−ε)xb\}} e^{χ'_k} (1 − χ'_k/b)^b]}{E[1_{\{χ'_k≤b\}} e^{χ'_k} (1 − χ'_k/b)^b]} = \frac{E[1_{\{χ'_k≤(1−ε)xb\}} e^{χ'_k} (1 − χ'_k/b)^b \mid χ'_k ≤ b]}{E[e^{χ'_k} (1 − χ'_k/b)^b \mid χ'_k ≤ b]}. (5.4.8)
Thus, for the lower bound in (5.4.1), it suffices to show that, for all k large enough and for x ≤ (log k)^2/b,
P(χ'_k ≤ (1 − ε)xb \mid χ'_k ≤ b) ≤ P(χ'_k ≤ xb). (5.4.10)
In turn, this follows from the statement that e(x) ≥ 0 for x ≤ (log k)^2/b, where
e(x) = P(xb(1 − ε) < χ'_k ≤ bx) − P(χ'_k > b)\,P(χ'_k ≤ xb). (5.4.12)
We bound the first term on the rhs of (5.4.12) from below as follows:
and
P(χ'_k ≤ xb) = \int_0^{xb} \frac{y^{m+δ−1}}{Γ(m + δ)} e^{−y}\,dy ≤ \frac{[xb]^{m+δ−1}}{Γ(m + δ)} \int_0^{xb} e^{−y}\,dy ≤ \frac{[xb]^{m+δ−1}}{Γ(m + δ)}. (5.4.15)
Substitution yields that
e(x) ≥ \frac{[xb]^{m+δ−1}}{Γ(m + δ)} \big[(1 − ε)^{m+δ−1} (e^{−xb(1−ε)} − e^{−bx}) − 2^{m+δ} e^{−b/2}\big], (5.4.16)
which, for any ε ∈ (0, 1), is non-negative for all x < 1/3, say, and b = (2k − 3)m + δ(k − 1) sufficiently large. This is much more than is needed.
We complete the proof by showing that χ_k ≤ (log k)^2 for all k ≥ K with probability at least 1 − ε, for which we note that
P(χ_k ≥ (log k)^2) ≤ E[e^{χ_k/2}]\,e^{−(log k)^2/2} = 2^{m+δ} e^{−(log k)^2/2}, (5.4.17)
Proposition 5.18 (Asymptotics of S_k^{(n)}) Recall that χ = (m + δ)/(2m + δ). For every ε > 0, there exist η > 0 and K < ∞ such that, for all n ≥ K and with probability at least 1 − ε,
\max_{k∈[n]} \big|S_k^{(n)} − (k/n)^χ\big| ≤ η, (5.4.18)
and
\big|S_k^{(n)} − (k/n)^χ\big| ≤ ε (k/n)^χ \qquad for all k ∈ [n] \setminus [K]. (5.4.19)
Proof We will give the intuition behind Proposition 5.18. We recall from (5.3.20) that S_k^{(n)} = \prod_{i=k+1}^n (1 − ψ_i), where (ψ_k)_{k∈[n]} are independent random variables. We write
\log S_k^{(n)} = \sum_{i=k+1}^n \log(1 − ψ_i) = \sum_{i=k+1}^n E[\log(1 − ψ_i)] + \sum_{i=k+1}^n \big(\log(1 − ψ_i) − E[\log(1 − ψ_i)]\big). (5.4.20)
Note that (M_n)_{n≥k}, with M_n = \sum_{i=k+1}^n (\log(1 − ψ_i) − E[\log(1 − ψ_i)]), is a martingale. Thus, by Kolmogorov’s inequality in (1.5.5), for all t ≥ k,
P\Big(\sup_{k≤n≤t} \Big|\sum_{i=k+1}^n \big(\log(1 − ψ_i) − E[\log(1 − ψ_i)]\big)\Big| ≥ ε\Big) ≤ ε^{−2} \sum_{i=k+1}^t Var(\log(1 − ψ_i)). (5.4.21)
Using that |\log(1 − x)| ≤ x/(1 − x) for all x ∈ [0, 1), together with Lemma 5.14, we obtain the bound
Var(\log(1 − ψ_i)) ≤ E[(\log(1 − ψ_i))^2] ≤ E\Big[\frac{ψ_i^2}{(1 − ψ_i)^2}\Big] = O(i^{−2}), (5.4.22)
so that, for all t ≥ k,
P\Big(\sup_{k≤n≤t} \Big|\log S_k^{(n)} − \sum_{i=k+1}^n E[\log(1 − ψ_i)]\Big| ≥ ε\Big) ≤ \frac{C}{ε^2} \sum_{i≥k} \frac{1}{i^2}, (5.4.23)
which can be made small by letting k ≥ K and K be large. This shows that the random part in (5.4.20) is whp small for k ≥ K.
To compute the asymptotics of the deterministic first part in (5.4.20), we now use that x ≤ −\log(1 − x) ≤ x + x^2/(1 − x) for all x ∈ [0, 1), so that
0 ≤ \sum_{i=k+1}^n \big(−E[\log(1 − ψ_i)]\big) − \sum_{i=k+1}^n E[ψ_i] ≤ \sum_{i=k+1}^n E\Big[\frac{ψ_i^2}{1 − ψ_i}\Big] ≤ C \sum_{i≥k} \frac{1}{i^2}, (5.4.24)
which can again be made small when k ≥ K with K large. Further, by Lemma 5.14, with α = m + δ and β_i = (2i − 3)m + δ(i − 1), we have
\sum_{i=k+1}^n E[ψ_i] = \sum_{i=k+1}^n \frac{m + δ}{(2i − 2)m + δi} = \frac{m + δ}{2m + δ} \log(n/k) + O(1/k) = χ \log(n/k) + O(1/k), (5.4.25)
since \sum_{i=k+1}^n 1/i = \log(n/k) + O(1/k). We conclude that
\log S_k^{(n)} = χ \log(k/n) + O(1/k), (5.4.26)
which completes the proof of (5.4.19). The proof of (5.4.18) follows easily, since (5.4.19) is stronger for k ≥ K, while E[S_k^{(n)}] = o(1) for k ∈ [K]. We omit further details.
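Proposition 5.18 can be illustrated by sampling the Beta variables once. The sketch below (illustrative; parameters are arbitrary) computes S_k^{(n)} = ∏_{j=k+1}^n (1 − ψ_j) and compares it with (k/n)^χ for a few values of k.

```python
import numpy as np

m, delta = 2, 1.0
chi = (m + delta) / (2 * m + delta)
n = 10_000
rng = np.random.default_rng(6)
psi = np.array([rng.beta(m + delta, (2 * k - 3) * m + delta * (k - 1))
                for k in range(2, n + 1)])      # psi[i] corresponds to psi_{i+2}
S = np.ones(n + 1)                              # S[k] = S_k^{(n)}, S[n] = 1
for k in range(n - 1, 0, -1):
    S[k] = S[k + 1] * (1 - psi[k - 1])          # multiply by 1 - psi_{k+1}
for k in (100, 1000, 5000):
    print(k, "S_k^(n) = %.4f" % S[k], "(k/n)^chi = %.4f" % ((k / n) ** chi))
```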
density of the ages in the Pólya point tree. For this, it is useful to have a description of this joint density.
Before we can formulate our main result concerning this joint density, we introduce some further notation. Recall the definition of the Pólya point tree in Section 5.3.1, and also the Poisson intensities in (5.3.5) and the corresponding Gamma variables in (5.3.2). Below, we write x ↦ ρ_w(x; Γ_w) for the Poisson intensity in (5.3.5) conditioned on the Gamma variable Γ_w.
Fix an ordered tree t, and let (G, o) be the Pólya point tree. In what follows, it is useful to regard B_r^{(G)}(v) as a rooted edge-marked graph, where an edge receives a label in [m] corresponding to the label of the directed edge (directed from young to old) that gives rise to that edge (in either possible direction). Thus, in the pre-limiting preferential attachment model, the edge {u, v} receives label j when u \stackrel{j}{→} v or when v \stackrel{j}{→} u.
We denote this marked ordered neighborhood as \bar{B}_r^{(G)}(o). The edge labels are almost contained in the ordered tree t, but not quite, since when a vertex has label Y it is unclear which edge of its parent gave rise to this connection, and, together with the m − 1 edge labels of its older children, these edge labels should equal [m]. We slightly abuse notation and also write t for this edge-labeled tree, and we write \bar{B}_r^{(G)}(o) = t when the two graphs are the same as edge-labeled trees.
For (a_w)_{w∈V(t)} ∈ [0, 1]^{|V(t)|}, we define f_t\big((a_w)_{w∈V(t)}\big) to be the density of the ages in the Pólya point tree when the ordered r-neighborhood \bar{B}_r^{(G)}(o) in the Pólya point tree equals t. Thus,
μ\big(\bar{B}_r^{(G)}(o) = t,\ A_w ∈ a_w\,da_w\ ∀w ∈ V(t)\big) = f_t\big((a_w)_{w∈V(t)}\big) \prod_{w∈V(t)} da_w. (5.4.27)
Note that (a_w)_{w∈V(t)} ↦ f_t\big((a_w)_{w∈V(t)}\big) is a sub-probability measure, as it need not integrate to 1. We let \bar{t} denote a rooted vertex- and edge-marked tree, where the vertex labels corresponding to the ages of the nodes are in [0, 1] and the edge labels are in [m]. Thus,
\bar{t} = \big(t, (a_w)_{w∈V(t)}\big), (5.4.28)
where a_w ∈ [0, 1] is the age of w ∈ V(t). The following proposition identifies the density f_t\big((a_w)_{w∈V(t)}\big) in (5.4.27), which corresponds to the density of the ages in the Pólya point tree when the edge-marked neighborhood equals t:
Proposition 5.19 (Joint density of the Pólya point tree) The density f_t\big((a_w)_{w∈V(t)}\big) in (5.4.27) satisfies
f_t\big((a_w)_{w∈V(t)}\big) = E\big[g_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big)\big], (5.4.29)
where (χ_w)_{w∈V(t)} are iid Gamma variables with parameters r = m + δ and λ = 1, and, with d_w^{(in)}(\bar{t}) = \#\{\{v, w\} ∈ E(t) : a_v > a_w\} the in-degree of w in \bar{t},
g_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big) = \prod_{w∈V(t)} \Big(\frac{χ_w}{2m + δ}\Big)^{d_w^{(in)}(\bar{t})} \prod_{w∈V°(t)} e^{−\int_{a_w}^1 ρ_w(dt;χ_w)} \prod_{(w,wℓ)∈E(t)} \frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}, (5.4.30)
where V°(t) denotes the set of vertices in the tree t that are at a distance strictly smaller than r from the root.
Proof The proof is split into several steps. We start by removing the size-biasing of the Gamma variables in (5.3.2):
f_t\big((a_w)_{w∈V(t)}\big) = E\big[f_t\big((a_w)_{w∈V(t)}; (Γ_w)_{w∈V(t)}\big)\big]. (5.4.31)
Now recall the size-biasing in (5.3.2), present for all individuals of label O. In terms of these random variables, note that, for each function h : ℝ → ℝ, and using that E[Y] = m + δ,
E[h(Y^\star)] = E\Big[h(Y) \frac{Y}{E[Y]}\Big] = E\Big[h(Y) \frac{Y}{m + δ}\Big]. (5.4.32)
Thus, with (χ_w)_{w∈V(t)} a collection of iid Gamma random variables with parameters m + δ and 1,
f_t\big((a_w)_{w∈V(t)}\big) = E\Big[\prod_{w∈V(t)} \Big(\frac{χ_w}{m + δ}\Big)^{1_{\{label(w)=O\}}} f_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big)\Big], (5.4.33)
where we write label(w) for the label of w. We claim that g_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big) in (5.4.30) is given by
g_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big) = \prod_{w∈V(t)} \Big(\frac{χ_w}{m + δ}\Big)^{1_{\{label(w)=O\}}} f_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big). (5.4.34)
Then d_w(\bar{t}) = d_w^{(in)}(\bar{t}) when w has label Y, while d_w(\bar{t}) = d_w^{(in)}(\bar{t}) − 1 when w has label O. We can then rewrite the first factor on the rhs of (5.4.30) as follows:
for a_{wℓ} ∈ [0, a_w]. These ages are iid, so that the joint density of the O children of w is
\prod_{ℓ=1}^{m_−(w)} \frac{m + δ}{2m + δ}\,\frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}. (5.4.39)
and a_{wℓ} > a_w for all ℓ > m_−(w). Here, we also note that a_{wℓ} < a_{w(ℓ+1)} for all w and ℓ > m_−(w) such that wℓ, w(ℓ + 1) ∈ V(t). By (5.3.5),
ρ_w(a_{wℓ}; χ_w) = \frac{χ_w}{τ − 1} \frac{a_{wℓ}^{1/(τ−1)−1}}{a_w^{1/(τ−1)}} = \frac{m}{2m + δ}\,χ_w\,a_{wℓ}^{−χ} a_w^{−(1−χ)} = \frac{m}{2m + δ}\,χ_w\,\frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}. (5.4.41)
Since the number of ℓ-values with a_{wℓ} > a_w in (5.4.40) equals d_w(\bar{t}), this leads to
e^{−\int_{a_w}^1 ρ_w(dt;χ_w)} \Big(\frac{m}{2m + δ}\Big)^{d_w(\bar{t})} χ_w^{d_w(\bar{t})} \prod_{ℓ : a_{wℓ}>a_w} \frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}. (5.4.42)
When we recall that each ℓ-value with a_{wℓ} > a_w is assigned an edge label in [m], which occurs independently with probability 1/m, the density of the edge-labeled younger children of w is given by
e^{−\int_{a_w}^1 ρ_w(dt;χ_w)} \Big(\frac{1}{2m + δ}\Big)^{d_w(\bar{t})} χ_w^{d_w(\bar{t})} \prod_{ℓ : a_{wℓ}>a_w} \frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}. (5.4.43)
Multiplying Out
We multiply (5.4.39) and (5.4.43) to obtain that the density of the ages of the children of w, for each w ∈ V(t), is given by
χ_w^{d_w(\bar{t})} e^{−\int_{a_w}^1 ρ_w(dt;χ_w)} \Big(\frac{1}{2m + δ}\Big)^{d_w(\bar{t})} \Big(\frac{m + δ}{2m + δ}\Big)^{m_−(w)} \prod_{ℓ : wℓ∈V(t)} \frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}.
The above holds for all w ∈ V°(t), i.e., all w that are at a distance strictly smaller than r from the root ∅. We next multiply over all such w, to obtain
f_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big) = \prod_{w∈V°(t)} \Big(\frac{χ_w}{2m + δ}\Big)^{d_w(\bar{t})} \Big(\frac{m + δ}{2m + δ}\Big)^{m_−(w)} e^{−\int_{a_w}^1 ρ_w(dt;χ_w)} × \prod_{(w,wℓ)∈E(t)} \frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}. (5.4.44)
Since
\sum_{w∈V°(t)} m_−(w) = \sum_{w∈V(t)} 1_{\{label(w)=O\}}, (5.4.45)
this is indeed the same as the rhs of (5.4.37). This proves (5.4.34), and thus (5.4.29).
Lemma 5.20 (Regularity of Pólya point tree) Consider the Pólya point tree (G, ∅). Fix
r ≥ 1 and ε > 0. Then there exist constants η > 0 and K < ∞ such that, with probability
at least 1 − ε,
Proof The proof of this lemma is standard, and can be obtained, for example, by induction on r or by using Proposition 5.19. The last bound follows from the continuous nature of the random variables A_w, which implies that A_w ≠ A_{w'} for all distinct pairs w, w', so that any finite number of them are pairwise separated by at least η for an appropriate η = η(ε), with probability at least 1 − ε.
With B_r^{(G)}(∅) the r-neighborhood of ∅ in the Pólya point tree (which is itself also ordered), we aim to show that
\frac{N_{n,r}(t)}{n} \xrightarrow{P} μ\big(B_r^{(G)}(∅) = t\big), (5.4.47)
where, again, B_r^{(G)}(∅) = t denotes that the ordered trees B_r^{(G)}(∅) and t agree, and μ denotes the law of the Pólya point tree.
Proving the convergence of Nn,r (t) is much harder than for inhomogeneous random
graphs and configuration models, considered in Theorems 3.14 and 4.1, respectively, as the
type of a vertex is crucial in determining the number and types of its children, and the type
space is continuous.
We start with the first moment, for which we note that
where o_n ∈ [n] is a uniform vertex. We do this using an explicit computation, akin to the one used in the direct proof of Theorem 5.10. In fact, we will prove a stronger statement, in which we also study the vertex labels in \bar{B}_r^{(G_n)}(o_n) and compare them with the density in Proposition 5.19. Let us introduce some necessary notation.
Recall the definition of the rooted vertex- and edge-marked tree \bar{t} in (5.4.28), where the vertex labels were in [0, 1] and the edge labels in [m]. We fix a tree t of height exactly r. We let the vertex v_w = ⌈na_w⌉ ∈ [n] correspond to the node in the tree having age a_w. With a slight abuse of notation, we also write \bar{B}_r^{(G_n)}(o_n) = \bar{t} to denote that the vertices, edges, and edge labels in \bar{B}_r^{(G_n)}(o_n) are given by those in \bar{t}. Note that this is rather different from B_r^{(G_n)}(o_n) ≃ t as defined in Definition 2.3, where t was unlabeled and we were investigating whether B_r^{(G_n)}(o_n) and t are isomorphic, and even different from \bar{B}_r^{(G_n)}(o_n) = t as in (5.4.46), where only the edges receive marks, and not the vertices. The definition of \bar{B}_r^{(G_n)}(o) = \bar{t} used here is tailor-made to study the local convergence of PA_n^{(m,δ)}(d) as a marked graph, where the vertex marks denote the vertex labels or ages of the vertices, and the edges also receive marks in [m].
Let \bar{t} = (t, (v_w)_{w∈V(t)}) be the vertex-marked version of t, now with v_w = ⌈na_w⌉ ∈ [n] denoting the vertex label of the tree node w in t (instead of the age as in (5.4.28)). Below, we write v ∈ V(\bar{t}) to indicate that there exists a w ∈ V(t) with v_w = v. Also, let ∂V(\bar{t}) denote the vertices at distance exactly r from the root of \bar{t}, and let V°(\bar{t}) = V(\bar{t}) \setminus ∂V(\bar{t}) denote the restriction of \bar{t} to all vertices at distance at most r − 1 from its root. The main result of this section is the following theorem:
Theorem 5.21 (Marked local weak convergence) Fix m ≥ 1 and δ > −m, and consider
Gn = PA(m,δ)
n (d). Uniformly for vw ≥ εn and χ̂vw ≤ K for all w ∈ V (t), where
χ̂v = fv (ψv ) and when (vw )w∈V (t) are all distinct,
and 1,
E[gt (aw )w∈V (t) ; (χw )w∈V (t) ] = ft (aw )w∈V (t) ,
(5.4.50)
and thus PA(m,δ)
n (d) converges to the Pólya point tree in the marked local weak sense.
By Lemma 5.17, (χ̂v )v∈[n] are iid Gamma variables with parameters m + δ and 1. The
coordinates of the sequence (χw )w∈V (t) defined by
χw = χ̂vw (5.4.51)
are indeed iid when the (vw )w∈V (t) are distinct. By (5.4.50), the relation (5.4.49) can be
seen as a density theorem for the densities of the ages of vertices in r-neighborhoods.
Note that the type space of the Pólya point tree equals S = {Y, O} × [0, ∞) (except for
the root, which has a type only in [0, 1]). However, the {Y, O}-components of the types are
deterministic when one knows the ages in B̄r(Gn ) (on ), so these do not need to receive much
attention in what follows.
We prove Theorem 5.21 below. The main ingredient to the proof is Proposition 5.22,
which gives an explicit description for the lhs of (5.4.49):
Proposition 5.22 (Law of vertex- and edge-marked neighborhoods in PA(m,δ) n (d)) Fix
m ≥ 2 and δ > −m, and consider Gn = PAn (d). Let t̄ = (t, (vw )w∈V (t) ) be a rooted
(m,δ)
vertex- and edge-marked tree with root on . Fix t̄ such that (vw )w∈V (t) ) are all distinct, with
the oldest vertex having age at least εn. Then, for all (ψv )v∈V (t̄) such that ψv ≤ K/v for
all v ∈ V (t̄), as n → ∞,
P B̄r(Gn ) (on ) = t̄ | (ψv )v∈V (t̄)
1 + oP (1) Y p0v Y
= ψv exp − (2m + δ)nψv (v/n)χ (1 − (v/n)1−χ )
n v∈V (t̄) ◦
v∈V (t̄)
Y (βs + qs0 − 1)qs0
× , (5.4.52)
s∈[n]\V (t̄)
(α + βs + qs0 − 1)qs0
1{s∈(v (u),u)} .
X X
qs0 = j
(5.4.54)
u∈V (t̄) j∈[m]
Proof We start by analyzing the conditional law of B̄r(Gn ) (on ) given all (ψv )v∈[n] . After
this, we take the expectation wrt ψv for v 6∈ B̄r(Gn ) (on ) to get the claim.
We first condition on all (ψv )v∈[n] and use Lemma 5.12 to obtain, for a vertex-marked edge-
labeled tree t̄,
n
1 Y p0v Y 0
Pn (B̄r(Gn ) (on ) = t̄) = ψv (1 − ψs )qs
n v∈V (t̄) s=2
Y Y
× [1 − Pu,v ], (5.4.56)
v∈V ◦ (t̄) j
u,j : u6 v
where the 1/n is due to the uniform choice of the root, the first double product is due to all
the required edges to ensure that B̄r(Gn ) (on ) ⊆ t̄, while the second double product is due to
all the other edges, which must be absent, so that B̄r(Gn ) (on ) really equals t̄.
No-Further-Edge Probability
We continue by analyzing the second line in (5.4.56), which, for clarity, we call the no-
further-edge probability. First of all, since we are exploring the r-neighborhood of o, the
j
only edges that are not allowed are of the form u v , where v ∈ V ◦ (t̄) and u > v , i.e.,
they are younger vertices than those in V (t̄) that do not form edges in t̄.
◦
Recall that the minimal age of a vertex in t̄ is εn. Further, by Lemma 5.17, with over-
whelming probability, ψv ≤ (log n)2 /n for all v ≥ εn. In particular, Pu,v is small uni-
formly in v ∈ V (t̄) and u > v . Since there are only finitely many elements in V ◦ (t̄), we
can thus approximate as follows:
Y Y Y Y
[1 − Pu,v ] = (1 + oP (1)) [1 − Pu,v ]. (5.4.57)
v∈V ◦ (t̄) j v∈V ◦ (t̄) u,j : u>v
u,j : u6 v
− sχ | −→ 0.
P
We will take v = dsne for some s ∈ [ε, 1]. By Lemma 5.18, sups∈[ε,1] |Sns
(n)
X Z 1
t−χ dt −→ 0.
P
sup 1/Su −
(n)
(5.4.60)
s∈[ε,1] s
u∈(sn,n]
5.4 Proof of Local Convergence for Preferential Attachment Models 223
We conclude that
1
m m
X Z
tχ dt = [1 − s1−χ ]
P
(n) Pu,sn −→ m
nψsn Ssn u∈(sn,n] s 1−χ
= (2m + δ)[1 − s1−χ ]. (5.4.61)
As a result,
X
m Pu,v = (1 + oP (1))nψsn Ssn
(n)
(2m + δ)[1 − s1−χ ]
u∈(sn,n]
Therefore,
Y Y Y
[1 − Pu,v ] = (1 + oP (1)) e−(2m+δ)(vψv )κ(v/n) . (5.4.64)
v∈V (t̄) j v∈V ◦ (t̄)
u,j : u6 v
In order to relate this to the Gamma variables in the description of the Pólya point tree
(recall (5.3.2)–(5.3.5)), we need to look into the precise structure of the rhs of (5.4.52), as
some of its ingredients give rise to the size-biasing in (5.3.2). In the proof below, we restrict
to the (χ̂k )k≥1 for which χ̂v ≤ K for all v ∈ V (t̄), which occurs whp for K large by
Lemma 5.20 and since v > εn.
where the error term is uniform on the event that χ̂v ≤ K for all v ∈ V (t̄).
where we recall ρw from (5.3.5), and its conditional form given Γw denoted by ρw (x; Γw ).
For w ∈ V (t), let vw be such that aw n = vw and write χw = χ̂vw . Using (5.4.66), this
leads to
Y Y − R 1 ρ (dt;χ̂ )
e−(2m+δ)(vψv )κ(v/n) = (1 + oP (1)) e v/n wv vw
where the error term is uniformly bounded. We recall that the edges in our edge-marked
tree t̄ are given by E(~ t̄) = {(u, vj (u), j) : u ∈ [n], j ∈ [m]}. We use (5.4.54) and βs =
(2s − 3)m + δ(s − 1) to write
X X X 1{s∈(v,u)}
qs0 /βs =
s∈[n]\V (t̄)
βs
(u,vj (u),j)∈E(t̄) s∈[n]\V (t̄)
X X 1{s∈(v,u)}
= . (5.4.71)
(u,vj (u),j)∈E(t̄) s∈[n]\V (t̄)
(2s − 3)m + δ(s − 1)
Note that (u, vj (u), j) ∈ E(t̄) when there exists w ∈ V (t) and ` such that (w, w`) ∈
E(t), so that
!χ
Y v χ Y vw ∧ vw`
= . (5.4.74)
u
(u,vj (u),j)∈E(t̄) (w,w`)∈E(t)
vw ∨ vw`
Combining this further with (5.4.69) and using (5.4.30) in Proposition 5.19, we obtain
(5.4.49) with (χw )w∈V (t) = (χ̂dnaw e )w∈V (t) , which is indeed an iid sequence of Gamma(m+
δ, 1) random variables when the (dnaw e)w∈V (t) are distinct, and we recall (5.4.30).
226 Connected Components in Preferential Attachment Models
the Pólya point tree. Further, Theorem 5.23 establishes a local density limit theorem for the
vertex marks.
The proof of Theorem 5.23 follows that of Theorem 5.21, so we can be more succinct. We
have the following characterization of the conditional law of the vertex- and edge-marked
versions of B̄r(Gn ) (o1 ) and B̄r(Gn ) (o2 ), where Gn = PA(m,δ)
n (d), which is a generalization of
Proposition 5.22 to two neighborhoods:
Proposition 5.24 (Law of neighborhoods in PA(m,δ) n (d)) Fix m ≥ 1 and δ > −m, and
(m,δ)
consider Gn = PAn (d). Let t̄1 and t̄2 be two rooted vertex- and edge-marked trees
with distinct and disjoint vertex sets and root vertices o1 and o2 , respectively. Uniformly for
vw > εn and χvw ≤ K for all w ∈ V (t1 ) ∪ V (t2 ), where χv = fv (ψv ), as n → ∞,
P B̄r(Gn ) (o1 ) = t̄1 , B̄r(Gn ) (o2 ) = t̄2 (ψv )v∈V (t̄1 )∪V (t̄2 )
0
Y Y
= (1 + oP (1)) ψvpv e−(2m+δ)(vψv )κ(v/n)
v∈V (t̄1 )∪V (t̄2 ) v∈V ◦ (t̄1 )∪V ◦ (t̄2 )
Y (βs + qs0 − 1)qs0
× , (5.4.80)
s∈[n]\(V (t̄1 )∪V (t̄2 ))
(α + βs + qs0 − 1)qs0
where now
1{u 1{s∈(v,u)} .
X
qs0 = ju
v}
(5.4.82)
u,v∈V (t̄1 )∪V (t̄2 )
Proof We first condition on all (ψv )v∈[n] and use Lemma 5.12 to obtain, for two trees t̄1
and t̄2 , as in (5.4.56),
n
1 Y 0
Y 0
Pn (B̄r (o1 ) = t̄1 , B̄r (o2 ) = t̄2 ) = 2 ψvpv (1 − ψs )qs
n v∈V (t̄1 )∪V (t̄2 ) s=2
Y Y
× [1 − Pu,v ], (5.4.83)
v∈V ◦ (t̄ )∪V ◦ (t̄ ) j
1 2
u,j : u6 v
where the factor 1/n2 is due to the uniform choices of the vertices o1 , o2 , the first double
product is due to all the edges required to ensure that B̄r(Gn ) (oi ) ⊆ t̄i for i ∈ {1, 2},
while the second double product is due to the edges that are not allowed to be there, so that
B̄r(Gn ) (oi ) really equals t̄i for i ∈ {1, 2}. The remainder of the proof follows the steps in
the proof of Proposition 5.22, and is omitted.
We continue by proving Theorem 5.23. This proof follows that of Theorem 5.21, using
Proposition 5.24 instead of Proposition 5.22. The major difference is that in Proposition
5.24, we assume that V (t̄1 ) and V (t̄2 ) are disjoint, which we prove to be the case whp next.
228 Connected Components in Preferential Attachment Models
Disjoint Neighborhoods
We note that, as in Corollary 2.20,
h i
P(B̄r(Gn ) (o1 ) ∩ B̄r(Gn ) (o2 ) 6= ∅) = 1 − E |B2r
(Gn )
(o1 )|/n = 1 − o(1),
with
m(m+δ)
2m+δ
for st = OO,
m(m+1+δ)
for st = OY,
2m+δ
cst = (m−1)(m+δ) (5.4.89)
2m+δ
for st = YO,
m(m+δ)
for st = YY.
2m+δ
Proof Recall (5.3.3). Note that U 1/χ has density χy χ−1 . Thus, for [a, b] ⊆ [0, x], and with
χ = (m + δ)/(2m + δ),
b/x
hZ i bχ − aχ
κ (x, O), ([a, b], O) = mE χy χ−1 dy = m
. (5.4.90)
a/x xχ
Further, by (5.3.5), for [a, b] ⊆ [x, 1] and noting that 1/(τ − 1) = m/(2m + δ) = 1 − χ,
we have
Z b 1/(τ −1)−1
h y i
κ (x, O), ([a, b], Y) = (τ − 1)E Poi Γ?
dy
a x1/(τ −1)
1/(τ −1) 1/(τ −1)
b −a
= E[Γ? ]
x −1)
1/(τ
b1−χ − a1−χ
= (m + 1 + δ) . (5.4.91)
x1−χ
Similarly, for [a, b] ⊆ [0, x],
b/x
hZ i bχ − aχ
κ (x, Y), ([a, b], O) = (m − 1)E χy χ−1 dy = (m − 1)
, (5.4.92)
a/x xχ
Proof We do not present the entire proof but, rather, explain how the proof of Theorem 5.8
can be adapted. References can be found in the notes in Section 5.7. The proof of Theorem
5.8 has two main steps, the first being the fixed-graph Pólya urn representation in Theorem
5.10, which is the crucial starting point of the analysis. In the second step, this representation
is used to perform the second-moment method for the marked neighborhood counts. In the
present proof, we focus on the first part, as this is the most sensitive to minor changes in the
model.
(1,δ/m)
PAmn (b) and PA(1,δ/m)
mn (d) are the same, except that PA(1,δ/m)
mn (b) starts with two vertices
with two edges between them, while PA(1,δ/m) mn (d) starts with two vertices with one edge
between them. This different starting graph was addressed in Remark 5.11, where it was
explained that the finite-graph Pólya version is changed only in a minor way. We can thus
use the results obtained thus far, together with a collapsing procedure, to obtain the Pólya
urn description of PA(1,δ/m)
mn (b).
For PA(m,δ)
n (a), we also use that it can be obtained from PA(1,δ/m)
mn (a) by collapsing
the vertices [mv] \ [m(v − 1)] in PA(1,δ/m)mn (a) into vertex v in PA(m,δ)
n (a). However,
(1,δ/m) (1,δ/m)
PAmn (a) is not quite the same as PAmn (d). Instead, we use the description in Theo-
rem 5.6 and compare it with that in Theorem 5.7 to see that now the Beta variables are given
by (ψj0 )j∈[n] with ψ10 = 1, and, for j ≥ 2,
where (Γw,i )i∈[m] are iid Gamma variables with parameters given in (5.3.2) for m = 1
and with δ replaced by δ/m. Recall that the sum of iid Gamma parameters with parameters
(r
Pim )i∈[m] and scale parameter λ = 1 is again Gamma distributed, now with parameter r =
i=1 ri and scale parameter λ = 1. Recall that either ri = 1 + δ/m or ri = 1 + δ/m + 1
and that there is no ri = 1 + δ/m + 1 when w has label Y, while Pmthere is exactly one i
with ri = 1 + δ/m + 1 when i has label O. Thus, we see that i=1 Γw,i has a Gamma
distribution with parameters r = m + δ and λ = 1 when the label of w is Y, while it has
parameters r = m + δ + 1 and λ = 1 when the label of w is O, as in (5.3.2) for m ≥ 2. We
refrain from giving more details.
note that In = Nn − Nn−1 = 1 precisely when all m edges of vertex n are attached to
vertex n. Thus,
Y 2e − 1 + δ
P(In = 1) = . (5.5.3)
j∈[m]
(2m + δ)n + (2j − 1 + δ)
For m ≥ 2,
∞
X
P(In = 1) < ∞, (5.5.4)
n=2
so that, almost surely, In = 1 occurs only finitely often. As a result, limn→∞ Nn < ∞
almost surely since
X ∞
Nn ≤ 1 + In . (5.5.5)
n=2
This implies that, for m ≥ 2, PAn (a) almost surely contains only finitely many con-
(m,δ)
We condition on n=K 1{Nn >Nn−1 } = 0, so that no new connected components are formed
P∞
after time K , and the number of connected components can only decrease in time. Let Fs
denote the σ -algebra generated by (PA(m,δ) n (a))sn=1 . We are left with proving that, for n
sufficiently large, the vertices in [K] are whp all connected in PA(m,δ)
n (a).
The proof proceeds in two steps. We show that, if Nn ≥ 2 and n is large, P(N2n − Nn ≤
−1 | Fn ) is uniformly bounded from below. Indeed, we condition on Fn for which Nn ≥ 2
and Nn ≤ K . Then, using Nn ≤ K , PA(m,δ) n (a) must have one connected component of
size at least n/K , while every other component has at least one vertex in it, and its degree
is at least m. Fix s ∈ [2n] \ [n]. Then, the probability that the first edge of vs(m) connects to
the connected component of size at least n/K , while the second connects to the connected
component of size at least 1, is, conditional on Fs−1 , at least
m + δ (m + δ)n/K ε
≥ , (5.5.7)
2(2m + δ)n 2(2m + δ)n n
for some ε > 0 and uniformly in s. Thus, conditional on Fn , the probability that this
happens for at least one s ∈ [2n] \ [n] is at least
ε n
1− 1− ≥ η > 0, (5.5.8)
n
uniformly for every n. Thus, when Nn ≥ 2, P(N2n − Nn ≤ −1 | Fn ) ≥ η . As a result,
a.s.
Nn −→ 1, so that NT = 1 for some T < ∞ almost surely. Without loss of generality, we
5.6 Further Results for Preferential Attachment Models 233
can take T ≥ K . When n=K 1{Nn >Nn−1 } = 0, if NT = 1 for some T then Nn = 1 for
P∞
all n ≥ T . This proves that PA(m,δ)
n (a) is whp connected for all n ≥ T , where T is large,
which implies Theorem 5.27.
Exercise 5.31 investigates the all-time connectivity of (PA(m,δ)
n (a))n≥1 .
it turns out that the parameter γ > 0 (which, by convention, is always taken to be 1 for
(PA(m,δ)
n )n≥1 ) is now the parameter that determines the tail behaviour of the degree distri-
bution (recall Exercise 1.23). In Exercises 5.23 and 5.24, the reader is invited to compute
the average degree of this affine model, as well as the number of edges added at time n for
large n.
The case of general attachment functions k 7→ f (k) is more delicate to describe. We start
by introducing some notation.We call a preferential attachment function f : N0 7→ (0, ∞)
concave when
f (0) ≤ 1 and ∆f (k) := f (k + 1) − f (k) < 1 for all k ≥ 0. (5.6.4)
Concavity implies the existence of the limit
f (k)
γ := lim = min ∆f (k). (5.6.5)
k→∞ k k≥0
The following result investigates the proportion of vertices v ∈ [n] whose connected
component C (v) at time n has size k :
Theorem 5.29 (Component sizes of Bernoulli preferential attachment models) Let f be a
concave attachment function. The Bernoulli preferential attachment model with condition-
n satisfies that, for every k ≥ 1,
ally independent edges BPA(f )
1 P
#{v ∈ [n] : |C (v)| = k} −→ µ(|T | = k), (5.6.6)
n
where |T | is the total progeny of an appropriate multi-type branching process.
The limiting multi-type branching process in Theorem 5.29 is such that µ(|T | = k) > 0
for all k ≥ 1, so that BPA(f )
n is disconnected whp. While Theorem 5.29 does not quite prove
the local convergence of BPA(f )
n , it is strongly related. See Section 5.7 for a more detailed
discussion. Thus, in what follows, we discuss the proof as if the theorem does yield local
convergence.
We next describe the local limit and degree evolution in more detail. We start with two
main building blocks. Let (Zt )t≥0 be a pure-birth Markov process with birth rate f (k) when
it is in state k , starting from Z0 = 0 (i.e., it jumps from k to k + 1 at rate f (k)). Further,
for σ ≥ 0, let (Zt − 1[σ,∞) (t))t≥0 be the process (Zt )t≥0 conditioned on having a jump
[σ]
at time σ .
Let S := {Y} × R ∪ ({O} × [0, ∞)) × R be the type space, considering the label as
an element in {Y} ∪ ({O} × [0, ∞)) and the location as being in R. It turns out that the
location of a vertex t ∈ N in BPA(f )
n corresponds to log (t/n), and we allow for t > n in
our description. Individuals of label Y correspond to individuals that are younger than their
parent in the tree. Individuals of label (O, σ) correspond to individuals that are older than
their parent, and for them we need to record the relative location of an individual compared
with its parent (roughly corresponding to the log of the ratio of their ages).
The local limit is a multi-type branching process with the following properties and off-
spring distributions. The root has label Y and location −E , where E is a standard exponen-
tial random variable with parameter 1; this variable corresponds to log(U ) with U uniform
in [0, 1]. A particle of label Y at location x has younger children of label Y with relative
5.6 Further Results for Preferential Attachment Models 235
locations at the jumps of the process (Zt )t≥0 , so that their locations are equal to x + πi ,
where πi is the ith jump of (Zt )t≥0 . A particle of label Y at location x has older children
with labels (O, −πi ), where (πi )i≥0 are the points in a Poisson point process on (−∞, 0]
with intensity measure given by
et E[f (Z−t )] dt, (5.6.7)
their locations being x + πi . A particle of label (O, σ) generates offspring
B of labels in O × [0, ∞) in the same manner as for a parent of label Y;
B of label Y with locations at the jumps of (Zt − 1[σ,∞) (t))t≥0 plus x.
[σ]
The above describes the evolution of the branching process limit for all times. This can
be interesting when investigating, e.g., the degree evolutions and graph structures of the
vertices in [n] at all times t ≥ n. However, for local convergence, we are interested only
in the subgraph of vertices in [n], so that only vertices with a negative location matter.
Thus, finally, we kill all particles with location x > 0 together with their entire tree of
descendants. This describes the local limit of BPA(f n , and |T | is the total progeny of this
)
instead of (Zt )t≥0 , where we see that the offspring distribution depends on the jump σ of
the individual of label (O, σ).
We close this discussion by explaining how the children having labels in {O} × [0, ∞)
arise. Note that these children are the same for individuals having label Y as well as for those
having label (O, σ) with σ ≥ 0. Since the connection decisions are independent and edge
probabilities are small for n large, the number of connections to vertices in the range [a, b]n
are close to a Poisson random variable with an appropriate parameter, thus leading to an
appropriate Poisson process. The expected number of neighbors of vertex qn with ages in
[a, b]n is roughly
bn
" #
f (Di(in) (qn)) 1 b 1 b
X Z Z
E ≈ E[f (Dun (qn))]du ≈
(in)
E[f (Zlog(q/u) )]du
i=an
qn q a q a
Z log(q/b) Z log(b/q)
= e−t E[f (Zt )]dt = et E[f (Z−t )]dt,
log(q/a) log(a/q)
(5.6.12)
as in (5.6.7). When the age is in [a, b]n, the location is in [log(a), log(b)], so the change
in location compared with q is in [log(a/q), log(b/q)]. This explains how the children with
labels in {O} × [0, ∞) arise, and completes our discussion of the local structure of BPA(f )
n .
1
( 21 − γ)2
γ≥ 2
or β> . (5.6.13)
1−γ
Consequently, if Cmax and C(2) denote the largest and second largest connected components
in BPA(f
n ,
)
P P
|Cmax |/n −→ ζ, |C(2) |/n −→ 0, (5.6.14)
Next, define a linear operator Aα on the Banach space C(S) of continuous, bounded
5.7 Notes and Discussion for Chapter 5 237
(i.e., without an intermediate update of the degrees while attaching the m edges incident to the newest
vertex), and a conditional model in which the edges are attached to distinct vertices. This shows that the
result is quite robust, as Theorem 5.26 also indicates.
A related version of Theorem 5.10 for δ = 0 was proved by Bollobás and Riordan (2004a) in terms of
a pairing representation. This applies to PA(m,δ)
n (a) with δ = 0. Another related version of Theorem 5.10
is proved in Rudas et al. (2007) and applies to general preferential attachment trees with m = 1. Its proof
relies on a continuous-time embedding in terms of continuous-time branching processes. We further refer
to Lo (2021) for results on the local convergence of preferential attachment trees with additive fitness.
Exercise 5.2 (Iid sequences are exchangeable) Show that (Xi )i≥1 forms an infinite sequence of exchange-
able random variables if (Xi )i≥1 are iid.
Exercise 5.3 (Limiting density in de Finetti’s Theorem) Use de Finetti’s Theorem (Theorem 5.2) to prove
a.s.
that Sn /n −→ U , where U appears in (5.2.1). Use this to prove (5.2.4).
Exercise 5.4 (Number of ones in (Xi )n Prove that P(Sn = k) = E P Bin(n, U ) = k in (5.2.3)
i=1 )
follows from de Finetti’s Theorem (Theorem 5.2).
Exercise 5.5 (Positive correlation of exchangeable random variables) Let (Xi )i≥1 be an infinite sequence
of exchangeable random variables. Prove that
Prove that equality holds if and only if (Xk )k≥1 are iid.
Exercise 5.6 (Limiting density of mixing distribution for Pólya urn schemes) Show that (5.2.24) identifies
the limiting density in (5.2.4).
Exercise 5.7 (Uniform recursive trees) A uniform recursive tree is obtained by starting with a single
vertex, and successively attaching the (n + 1)th vertex to a uniformly chosen vertex in [n]. Prove that, for
uniform recursive trees, the tree decomposition in Theorem 5.4 is such that
S1 (n) a.s.
−→ U, (5.8.2)
S1 (n) + S2 (n)
where U is uniform on [0, 1]. Use this to prove that P(S1 (n) = k) = 1/n for each k ∈ [n].
Exercise 5.8 (Scale-free trees) Recall the model studied in Theorem 5.4, where at time n = 2, we start
with two vertices of which vertex 1 has degree d1 and vertex 2 has degree d2 . After this, we successively
attach vertices to older vertices with probabilities proportional to the degree plus δ > −1 as in (1.3.64).
Show that the model for (PA(1,δ)
n (b))n≥1 , for which the graph at time n = 2 consists of two vertices joined
by two edges, arises when d1 = d2 = 2. What does Theorem 5.4 imply for (PA(1,δ) n (b))n≥1 ?
Exercise 5.9 (Relative degrees of vertices 1 and 2) Use Theorem 5.6 to compute limn→∞ P(D2 (n) ≥
xD1 (n)) for (PA(1,δ)
n (a))n≥1 .
Exercise 5.10 (Proof of Theorem 5.7) Complete the proof of Theorem 5.7 on the relative degrees in
scale-free trees for (PA(1,δ)
n (b))n≥1 by adapting the proof of Theorem 5.6.
Exercise 5.11 (Size-biased Gamma is again Gamma) Let X have a Gamma distribution with shape pa-
rameter r and scale parameter λ. Show that its size-biased version X ? has a Gamma distribution with
shape parameter r + 1 and scale parameter λ.
Exercise 5.12 (Power-law exponents in PA(m,δ)
n (d)) Use Lemma 5.9 to prove the power-law relations in
(5.3.8) of the asymptotic degree and neighbor degree distributions in PA(m,δ)
n (d), and identify the constants
cm,δ and c0m,δ appearing in them.
2m + δ Γ(k + 1 + δ) Γ(j + δ)
P(D = j, D0 = k) =
m2 (k − m)!Γ(m + 1 + δ) (j − m)!Γ(m + δ)
Z 1Z 1
× (1 − v)k−m v m+1+δ+δ/m (1 − u)j−m um+δ dudv. (5.8.3)
0 v
Exercise 5.17 (Multiple edges and Theorem 5.10) Fix m = 2. Let Mn denote the number of edges in
(m,δ)
PAn (d) that need to be removed so that no multiple edges remain. Use Theorem 5.10 to show that,
conditional on (ψk )k≥1 , the sequence (Mn+1 − Mn )n≥2 is an independent sequence with
n
X ϕ k 2
P Mn+1 − Mn = 1 | (ψk )k≥1 = . (5.8.6)
k=1
Sn(n)
Exercise 5.18 (Multiple edges and Theorem 5.10 (cont.)) Fix m = 2. Let Mn denote the number of edges
(m,δ)
in PAn (d) that need to be removed so that no multiple edges remain, as in Exercise 5.17. Use Exercise
5.17 to show that
n
" #
X ϕ 2
k
E[Mn+1 − Mn ] = E . (5.8.7)
k=1
Sn(n)
Exercise 5.19 (Multiple edges and Theorem 5.10 (cont.)) Fix m = 2. Let Mn denote the number of edges
in PA(m,δ)
n (d) that need to be removed so that no multiple edges remain, as in Exercise 5.17. Compute
" #
ϕ 2
k
E .
Sn(n)
Exercise 5.20 (Multiple edges and Theorem 5.10 (cont.)) Fix m = 2. Let Mn denote the number of
multiple edges in PA(m,δ)
n (d), as in Exercise 5.18, and fix δ > −1. Use Exercise 5.19 to show that
E[Mn ]/ log n → c, and identify c > 0. What happens when δ ∈ (−2, −1)?
Exercise 5.21 (Almost sure limit of normalized product of ψ’s) Let (ψj )j≥1 be independent Beta random
variables with parameters α = m + δ, β = (2j − 3)m + δ(j − 1) as in (5.3.19). Fix k ≥ 1. Prove that
(Mn (k))n≥k+1 , where
n
Y 1 − ψj
Mn (k) = , (5.8.8)
E[1 − ψj ]
j=k+1
is a multiplicative positive martingale. Thus, Mn (k) converges almost surely by the Martingale Conver-
gence Theorem ([V1, Theorem 2.24]).
ExerciseQ 5.22 (Almost sure limit of normalized product of Sk(n) ) Use Exercise 5.21, combined with the
fact that nj=k+1 E[1−ψj ] = ck (k/n) (1+o(1)) for some ck > 0, to conclude that (k/n)
χ χ Qn
j=k+1 (1−
ψj ) = (k/n)χ Sk(n) converges almost surely for fixed k.
Exercise 5.23 (Recursion formula for total edges in affine BPA(f t )
)
Consider the affine BPA(f )
n with
f (k) = γk + β. Derive a recursion formula for E[|E(BPA(f n
)
)|], where we recall that |E(BPA (f )
n )| is the
n . Identify µ such that E[|E(BPAn )|]/n → µ.
total number of edges in BPA(f ) (f )
Exercise 5.24 (Number of edges per vertex in affine BPAt(f ) ) Consider the affine BPA(f )
n with f (k) =
(f ) P
γk + β, as in Exercise 5.23. Argue that |E(BPAn )|/n −→ µ.
Exercise 5.25 (Degree of last vertex in affine BPAt(f ) ) Use the conclusion of Exercise 5.24 to show that
d
Dn (n) −→ Poi(µ).
Exercise 5.26 (CLT for number of connected components for m = 1) Show that the number of connected
components Nn in PA(1,δ)
n satisfies a central limit theorem when n → ∞, with equal asymptotic mean and
variance given by
1+δ 1+δ
E[Nn ] = log n(1 + o(1)), Var(Nn ) = log n(1 + o(1)). (5.8.9)
2+δ 2+δ
5.8 Exercises for Chapter 5 241
Exercise 5.27 (Number of connected components for m = 1) Use Exercise 5.26 to show that the number
P
of connected components Nn in PA(1,δ)
n satisfies Nn / log n −→ (1 + δ)/(2 + δ).
Exercise 5.28 (Number of self-loops in PA(m,δ)
n ) Fix m ≥ 1 and δ > −m. Use a similar analysis
P
to that in Exercise 5.26 to show that the number of self-loops Sn in PA(m,δ)
n (a) satisfies Sn / log n −→
(m + 1)(m + δ)/[2(2m + δ)].
Exercise 5.29 (Number of self-loops in PA(m,δ)
n (b)) Fix m ≥ 1 and δ > −m. Use a similar analysis
P
to that in Exercise 5.28 to show that the number of self-loops Sn in PA(m,δ)
n (b) satisfies Sn / log n −→
(m − 1)(m + δ)/[2(2m + δ)].
Exercise 5.30 (All-time connectivity for (PA(m,δ)
n (a))n≥1 ) Fix m ≥ 2. Show that the probability that
(PAn (m,δ)
(a))n≥1 is connected for all times n ≥ 1 equals P(In = 0 ∀n ≥ 2), where we recall that In is
the indicator that all the m edges of vertex n create self-loops.
Exercise 5.31 (All-time connectivity for (PA(m,δ)
n (a))n≥1 (cont.)) Fix m ≥ 2. Show that the probability
that (PA(m,δ)
n (a))n≥1 is connected for all times n ≥ 1 is in (0, 1).
Part III
Summary of Part II
So far, we have considered the simplest connectivity properties possible. We focused on
vertex degrees in Volume 1, and in Part II of this book we extended this to the local structure
of the random graphs involved as well as the existence and uniqueness of a macroscopic
connected, or giant, component. We can summarize the results obtained in the following
meta theorem:
The above means, informally, that the existence of the giant component is quite robust
when τ ∈ (2, 3), while it is not when τ > 3. This informally extends even to the random
removal of edges, exemplifying the robust nature of the giant when τ ∈ (2, 3). These results
make the general philosophy that “random graphs with similar degree characteristics behave
alike” precise, at least at the level of the existence and robustness of a giant component. The
precise condition guaranteeing the existence of the giant varies, but generally amounts to the
survival of the local limit.
In more detail, Part III is organized as follows. We study distances in general inhomoge-
neous random graphs in Chapter 6 and those in the configuration model, as well the closely
243
244
related uniform random graph with prescribed degrees, in Chapter 7. In the last chapter of
this part, Chapter 8, we study distances in the preferential attachment model.
C HAPTER 6
S MALL -W ORLD P HENOMENA IN
I NHOMOGENEOUS R ANDOM G RAPHS
Abstract
In this chapter we investigate the small-world structure in rank-1 and general
inhomogeneous random graphs. For this, we develop path-counting techniques
that are interesting in their own right.
12
10
2
105 106 107
Size
Figure 6.1 Median of typical distances in the 727 networks of size larger than
10,000 from the KONECT data base.
245
246 Small-World Phenomena in Inhomogeneous Random Graphs
specialize to rank-1 inhomogeneous random graphs, for which we can characterize their
ultra-small-world structure in more detail.
The proofs for the main results are in Sections 6.3–6.5. In Section 6.3 we prove lower
bounds on typical distances. In Section 6.4 we prove the corresponding upper bounds in
the doubly logarithmic regime, and in Section 6.5 we discuss path-counting techniques to
obtain the logarithmic upper bound for τ > 3. In Section 6.6 we discuss related results for
distances in inhomogeneous random graphs, including their diameter. We close the chapter
with notes and discussion in Section 6.7 and exercises in Section 6.8.
In this section we consider the distances between vertices of IRGn (κn ), where, as usual,
(κn ) is a graphical sequence of kernels with limit κ.
Recall that we write distG (u, v) for the graph distance between the vertices u, v ∈ [n] in
a graph G having vertex set V (G) = [n]. Here the graph distance between u and v is the
minimum number of edges in the graph G in all paths from u to v . Further, by convention,
we let distG (u, v) = ∞ when u, v are in different connected components. We define the
typical distance to be distG (o1 , o2 ), where o1 , o2 are two vertices that are chosen uar from
the vertex set [n].
It is possible that no path connecting o1 and o2 exists; then distIRGn (κn ) (o1 , o2 ) = ∞.
By Theorem 3.19, P(distIRGn (κn ) (o1 , o2 ) = ∞) → 1 − ζ 2 > 0, since ζ < 1 (see
Exercise 3.32). In particular, when ζ = 0, which is equivalent to ν = kT κ k ≤ 1,
P(distIRGn (κn ) (o1 , o2 ) = ∞) → 1. Therefore, in our main results, we condition on o1
and o2 being connected, and consider only cases where ζ > 0.
distances are of order ΘP (log n) when supx,y,n κn (x, y) < ∞, so that IRGn (κn ) is not
an ultra-small world. When kT κ k = ∞, a truncation argument can be used to prove that
distIRGn (κn ) (o1 , o2 ) = oP (log n), but its exact asymptotics is unclear. See Exercise 6.2.
The intuition behind Theorem 6.1 is that, by (3.4.7) and (3.4.8), a Poisson multi-type
branching process with kernel κ has neighborhoods that grow exponentially, i.e., the number
of vertices at distance k grows like kT κ kk . Thus, if we are to examine the distance between
two vertices o1 and o2 chosen uar from [n] then we need to explore the neighborhood of
vertex o1 up to the moment that it “catches” vertex o2 . For this to happen, the neighborhood
must have size of order n, so that we need kT κ kk = ν k ∼ n, i.e., k = kn ∼ logν n.
However, proving such a fact is quite tricky, since there are far fewer possible further vertices
to explore when the neighborhood has size proportional to n. The proof overcomes this fact
by exploring from the two vertices o1 and o2 simultaneously up to the first moment that
their neighborhoods share a common vertex, since then the shortest path is obtained. √ It turns
out that shared vertices start appearing when the neighborhoods have size roughly n. At
this moment, the neighborhood exploration
√ is still quite close to that in the local branching-
process limit. Since kT κ kr = ν r ∼ n when r = rn ∼ 21 logν n, this still predicts that
distances are close to 2rn ∼ logν n.
We next specialize to rank-1 inhomogeneous random graphs, where we also investigate
in more detail what happens when ν = ∞ in the case where the degree power-law exponent
τ satisfies τ ∈ (2, 3).
where the upper bound holds for all x ≥ 1, while the lower bound is required to hold only
for 1 ≤ x ≤ nβ for some β > 12 .
The assumption in (6.2.5) precisely is what we need, and it states that [1 − Fn ](x) obeys
power-law type bounds for appropriate values of x. Note that the lower bound in (6.2.5)
cannot be valid for all x, since Fn (x) > 0 implies that Fn (x) ≥ 1/n so that the lower and
upper bounds in (6.2.5) are contradictory when x n1/(τ −1) . Thus, the lower bound can
hold only for x = O(n1/(τ −1) ). When τ ∈ (2, 3), we have that 1/(τ − 1) ∈ ( 12 , 1), and we
need the lower bound to hold only for x ≤ nβ for some β ∈ ( 12 , 1). Exercises 6.3 and 6.4
give simpler conditions for (6.2.5) in special cases, such as iid weights.
The main result on graph distances in the case of infinite-variance weights is as follows:
Theorem 6.3 (Typical distances in rank-1 random graphs with infinite-variance weights)
Consider NRn (w), where the weights w = (wv )v∈[n] satisfy Conditions 1.1(a),(b) and
(6.2.5). Then, with o1 , o2 chosen independently and uar from [n] and conditional on o1 ←→
o2 ,
distNRn (w) (o1 , o2 ) P 2
−→ . (6.2.6)
log log n | log (τ − 2)|
The same result applies, under identical conditions, to GRGn (w) and CLn (w).
Theorem 6.3 implies that NRn (w), with w satisfying (6.3.21), is an ultra-small world.
See Figure 6.2 for a simulation of the typical distances in GRGn (w) with τ = 2.5 and
τ = 3.5, respectively, where the distances are noticeably smaller in the ultra-small setting
with τ = 2.5 compared with the small-world case with τ = 3.5.
In the next two sections we prove Theorems 6.2 and 6.3. The main tool to study typical
distances in NRn (w) is by comparison with branching processes. For τ > 3, the branching-
process approximation has finite mean, and we can make use of the martingale limit results
for the number of individuals in generation k as k → ∞, so that this number grows exponen-
tially as ν k . This explains the logarithmic growth of the typical distances. When τ ∈ (2, 3),
on the other hand, the branching process has infinite mean. In this case, the number of
6.2 Small-World Phenomena in Inhomogeneous Random Graphs 249
(a) (b)
0.2
0.3
0.15
Proportion
Proportion
0.2
0.1
0.1
0.05
0
2 3 4 5 6 7 8 9 10 11 12 2 4 6 8 10 12 14 16 18 20 22 24
Typical Distance Typical Distance
Figure 6.2 Typical distances between 2,000 pairs of vertices in the generalized
random graph with n = 100, 000, for (a) τ = 2.5 and (b) τ = 3.5.
individuals in generation k , conditional on survival of the branching process, grows super-
exponentially in k , which explains why the typical distances grow doubly logarithmically.
See Section 7.4, where this is explained in more detail in the context of the configuration
model.
The super-exponential growth implies that a path between two vertices typically passes
through vertices with larger and larger weights as we move away from the starting and
ending vertices. Thus, starting from the first vertex o1 ∈ [n], the path connecting o1 to o2
uses vertices whose weights first grow until the midpoint of the path is reached, and then
decrease again to reach o2 . This can be understood by noting that the probability that a vertex
with weight w is not connected to any vertex with weight larger than y > w in NRn (w) is
where Fn? (y) = i∈[n] wi 1{wi ≤y} /`n is the distribution function of Wn? , to be introduced
P
in (6.3.24) below. When (6.2.5) holds, it follows that [1 − Fn? ](y) is close to y −(τ −2) ; the
size-biasing increases the power by 1 (recall Lemma 1.23). Therefore, the probability that
a vertex with weight w is not connected to any vertex with weight larger than y > w in
−(τ −2)
NRn (w) is approximately e−cwy for some c > 0. For w large, this probability is
small when y w1/(τ −2) . Thus, a vertex of weight w is whp connected to a vertex of
weight approximately w1/(τ −2) , where 1/(τ − 2) > 1 for τ ∈ (2, 3).
distNRn (w) (o1 , o2 ) can be achieved by showing that the expected number of paths between
o1 and o2 having a given number of steps vanishes.
Fix Gn = NRn (w). When proving upper bounds on typical distances, we do need to
consider carefully the conditioning on distGn (o1 , o2 ) < ∞. Indeed, distGn (o1 , o2 ) = ∞
does actually occur with positive probability, for example when o1 and o2 are in two dis-
tinct connected components. To overcome this difficulty, we condition on Br(Gn ) (o1 ) and
Br(Gn ) (o2 ) in such a way that ∂Br(Gn ) (o1 ) 6= ∅ and ∂Br(Gn ) (o2 ) 6= ∅ hold, which, for
r large, makes the event that distNRn (w) (o1 , o2 ) = ∞ quite unlikely. In Section 6.4, we
prove the doubly logarithmic upper bound for τ ∈ (2, 3). Surprisingly, this proof is simpler
than that for logarithmic distances, primarily because we know that the shortest paths for
τ ∈ (2, 3) generally go from lower-weight vertices to higher-weight ones, until the hubs are
reached, and then they go back.
In Section 6.5 we investigate the variance of the number of paths between sets of ver-
tices in NRn (w), using an intricate path-counting method that estimates the sum, over
pairs of paths, of the probability that they are both occupied. For this, the precise joint
topology of these pairs of paths is crucial. We use a second-moment method to show that,
under the conditional laws, given Br(Gn ) (o1 ) and Br(Gn ) (o2 ) such that ∂Br(Gn ) (o1 ) 6= ∅
and ∂Br(Gn ) (o2 ) 6= ∅, whp there is a path of appropriate length linking ∂Br(Gn ) (o1 ) and
∂Br(Gn ) (o2 ). This proves the logarithmic upper bound when τ > 3. In each of our proofs,
we formulate the precise results as separate theorems and prove them under conditions that
are slightly weaker than those in Theorems 6.1, 6.2, and 6.3.
In this section we prove lower bounds on typical distances. In Section 6.3.1 we prove the
lower bound in Theorem 6.1(a), first in the setting of Theorem 6.2; this is followed by the
proof of Theorem 6.1(a). In Section 6.3.2 we prove the doubly logarithmic lower bound on
distances for infinite-variance degrees for NRn (w) in Theorem 6.3.
Then, for any ε > 0, with o1 , o2 chosen independently and uar from [n],
P(distNRn (w) (o1 , o2 ) ≤ (1 − ε) logν n) = o(1). (6.3.2)
The same result applies, under identical conditions, to GRGn (w) and CLn (w).
Proof The idea behind the proof of Theorem 6.4 is that it is quite unlikely for a path
containing far fewer than logν n edges to exist. In order to show this, we use a first-moment
bound and show that the expected number of occupied paths connecting the two vertices
6.3 Typical-Distance Lower Bounds in Inhomogeneous Random Graphs 251
~π
π0 = uπ1 π2 π3 π4 π5 π6 π7 π8 π9 π10 π11 π12 = v
chosen uar from [n] having length at most (1 − ε) logν n is o(1). We will now fill in the
details.
We set kn = d(1 − ε) logν ne. Then, conditioning on o1 , o2 gives
kn
1 X X
P(distNRn (w) (o1 , o2 ) ≤ kn ) = 2 P(distNRn (w) (u, v) = k)
n u,v∈[n] k=0
kn
1 1 X X
= + 2 P(distNRn (w) (u, v) = k). (6.3.3)
n n u,v∈[n] : u6=v k=1
In this section and in Section 6.5, we make use of path-counting techniques (see in par-
ticular Section 6.5.1). Here, we show that short paths are unlikely to exist by giving upper
bounds on the expected number of paths of various types. In Section 6.5.1 we give bounds
on the variance of the number of paths of various types, so as to show that long paths are
quite likely to exist. Such variance bounds are quite challenging, and here we give some
basics to highlight the main ideas in a much simpler setting.
See Figure 6.3 for an example of a 12-step self-avoiding path between u and v .
When distNRn (w) (u, v) = k , there must be a path of length k such that all edges {πl , πl+1 }
are occupied in NRn (w), for l = 0, . . . , k − 1. The probability that the edge {πl , πl+1 } is
occupied in NRn (w) is equal to
For CLn (w) and GRGn (w), an identical upper bound holds, which explains why the proof
of Theorem 6.4 for NRn (w) applies verbatim to those models. By the union bound or
Boole’s inequality,
Therefore
k−1
wu wv X wπ2 l
Y
P(distNRn (w) (u, v) = k) ≤
`n π ∈Pk (u,v) l=1
~
`n
k−1
!
wu wv Y X wπ2 l wu wv k−1
≤ = ν , (6.3.7)
`n l=1 π ∈[n] `n `n n
l
when δ = δ(ε) > 0 is chosen such that (1 − ε)/ log(ν + δ) < 1, since kn = d(1 −
ε) logν ne. This completes the proof of Theorem 6.4.
The condition (6.3.1) is slightly weaker than Condition 1.1(c), which is assumed in The-
orem 6.2, as shown in Exercises 6.5 and 6.6. Exercise 6.7 extends the proof of Theorem
6.4 to show that (distNRn (w) (o1 , o2 ) − log n/ log νn )− is tight, where we write (x)− =
max{−x, 0}.
We close this section by extending the above result to settings where νn is not necessarily
bounded, the most interesting case being τ = 3:
Corollary 6.6 (Lower bound on typical distances for rank-1 random graphs for τ = 3)
Consider NRn (w), and let νn be given in (6.3.1). Then, for any ε > 0, with o1 , o2 chosen
independently and uar from [n],
The same result applies, under identical conditions, to GRGn (w) and CLn (w).
The proof of Corollary 6.6 is left as Exercise 6.8. In the case where τ = 3 and [1−Fn ](x)
6.3 Typical-Distance Lower Bounds in Inhomogeneous Random Graphs 253
is, for a large range of x values, of order x−2 (which is stronger than τ = 3), it can be
expected that νn = Θ(log n), so that, in that case,
log n
P distNRn (w) (o1 , o2 ) ≤ (1 − ε) = o(1). (6.3.11)
log log n
Exercise 6.9 investigates the situation where τ = 3. Exercise 6.10 considers the case
τ ∈ (2, 3), where Corollary 6.6 unfortunately does not give particularly interesting results.
Lower Bound on Typical Distances for General IRGs: Proof of Theorem 6.1(a)
The proof of the upper bound in Theorem 6.1(a) is closely related to that in Theorem 6.4.
Note that
X k−1
Y κn (xπl , xπl+1 )
P(distIRGn (κn ) (u, v) = k) ≤ , (6.3.12)
π ,...,π ∈[n] l=0
n
1 k−1
where ni denotes the number of vertices of type i ∈ [t] and where the probability that there
exists an edge between vertices of types i and j is equal to κ(n) (i, j)/n.
Under the conditions in Theorem 6.1(a), we have µi(n) = ni /n → µ(i) and κ(n) (i, j) →
κ(i, j) as n → ∞. This also implies that kT κn k → ν , where ν is largest eigenvalue of the
(n)
matrix M = (Mij )i,j∈[t] with Mij = κ(i, j)µ(j). Denoting Mij = κ(n) (i, j)nj /n →
Mij , we obtain
1
P(distIRGn (κ) (o1 , o2 ) = k) ≤ h(µ(n) )T , [M(n) ]k 1i, (6.3.17)
n
254 Small-World Phenomena in Inhomogeneous Random Graphs
where 1 is the all-1s vector. Obviously, since there are t < ∞ types,
√
h(µ(n) )T , [M(n) ]k 1i ≤ kM(n) kk kµ(n) kk1k ≤ kM(n) kk t. (6.3.18)
Thus,
√
t
P(distIRGn (κ) (o1 , o2 ) = k) ≤ kM(n) kk . (6.3.19)
n
We conclude that
P(distIRGn (κn ) (o1 , o2 ) ≤ (1 − ε) logνn n) = o(1), (6.3.20)
where νn = kM(n) k → ν . This proves Theorem 6.1(a) in the finite-type setting.
We next extend the proof of Theorem 6.1(a) to the infinite-type setting. Assume that the
conditions in Theorem 6.1(a) hold. Recall the bound in (3.3.20), which bounds κn from
above by κ̄m , which is of finite type. Then, use the fact that kT κ̄m k & kT κ k = ν > 1 to
conclude that P(distIRGn (κn ) (o1 , o2 ) ≤ (1 − ε) logν n) = o(1) holds under the conditions
of Theorem 6.1(a). This completes the proof of Theorem 6.1(a).
where
1 X
Fn? (x) = wi 1{wi ≤x} , (6.3.24)
`n i∈[n]
so that (6.3.23) is small when y is too large. The main contribution to νn , on the other hand,
comes from vertices having maximal weight of the order n1/(τ −1) .
This problem is resolved by a suitable truncation argument on the weights of vertices in
occupied paths, P which effectively removes these high-weight vertices. Therefore, instead of
obtaining νn = v∈[n] wv2 /`n , we obtain a version of this sum restricted to vertices having
a relatively small weight. Effectively, this means that we split the space of all paths into
good paths, i.e., paths that avoid high-weight vertices, and bad paths, which are paths that
use high-weight vertices.
We now present the details of this argument. We again start from
1 1 X
P(distNRn (w) (o1 , o2 ) ≤ kn ) = + 2 P(distNRn (w) (u, v) ≤ kn ). (6.3.25)
n n u,v∈[n] : u6=v
When distNRn (w) (u, v) ≤ kn , there exists an occupied path ~π ∈ Pk (u, v) for some k ≤ kn .
We fix an increasing sequence of numbers (bl )l≥0 that serve as truncation values for the
weights of vertices along our occupied path. We determine the precise values of (bl )l≥0 ,
which is a quite delicate procedure, below.
Definition 6.8 (Good and bad paths) Fix k ≥ 1. Recall the definitions of k -step self-
avoiding paths Pk (u, v) and Pk (u) from Definition 6.5. We say that a path ~π ∈ Pk (u, v) is
good when wπl ≤ bl ∧ bk−l for every l ∈ [k], and bad otherwise. Let GP k (u, v) be the set
of good paths in Pk (u, v), and let
BP k (u) = {~π ∈ Pk (u) : wπk > bk , wπl ≤ bl ∀l < k} (6.3.26)
denote the set of bad paths of length k starting in u. J
The condition wπl ≤ bl ∧ bk−l for every l = 0, . . . , k is equivalent to the statement that
wπl ≤ bl for l ≤ k/2, while wπl ≤ bk−l for k/2 ≤ l ≤ k . Thus, bl provides an upper
bound on the weight of the lth and (k − l)th vertices of the occupied path, ensuring that the
weights in it cannot be too large. See Figure 6.4 for a visualization of a good path and the
bounds on the weight of its vertices.
Let
Ek (u, v) = {∃~π ∈ GP k (u, v) : ~π occupied} (6.3.27)
denote the event that there exists a good path of length k between u and v .
Let Fk (u) be the event that there exists a bad path of length k starting from u, i.e.,
Fk (u) = {∃~π ∈ BP k (u) : ~π occupied}. (6.3.28)
Then, since distNRn (w) (u, v) ≤ kn implies that there either is a good path between vertices
u and v , or a bad path starting in u or in v , for u 6= v ,
kn
[
{distNRn (w) (u, v) ≤ kn } ⊆ Fk (u) ∪ Fk (v) ∪ Ek (u, v) , (6.3.29)
k=1
256 Small-World Phenomena in Inhomogeneous Random Graphs
π5 : w π5 ≤ b5 ~π
π4 : w π4 ≤ b4 π6 : w π6 ≤ b4
π3 : w π3 ≤ b3 π7 : w π7 ≤ b3
π2 : w π2 ≤ b2 π8 : w π8 ≤ b2
π1 : w π1 ≤ b1 π9 : w π9 ≤ b1
Figure 6.4 A 10-step good path connecting π0 = u and π10 = v and the upper
bounds on the weight of its vertices. Vertices with large weights are higher in the
figure.
In order to estimate the probabilities P(Fk (u)) and P(Ek (u, v)), we introduce some no-
tation. For b ≥ 0, define the truncated second moment
1 X 2
νn (b) = w 1{wi ≤b} (6.3.31)
`n i∈[n] i
to be the restriction of νn to vertices of weight at most b, and recall that Fn? (x) from
(6.3.24) denotes the distribution function of Wn? , the size-biased version of Wn . The fol-
lowing lemma gives bounds on P(Fk (u)) and P(Ek (u, v)) in terms of the tail distribution
function 1 − Fn? and νn (b), which, in turn, we bound using Lemmas 1.23 and 1.22, respec-
tively:
Lemma 6.9 (Truncated path probabilities) For every k ≥ 1, (bl )l≥0 with bl ≥ 0 and
l 7→ bl non-decreasing, in NRn (w), CLn (w), and GRGn (w),
k−1
Y
P(Fk (u)) ≤ wu [1 − Fn? ](bk ) νn (bl ) (6.3.32)
l=1
and
k−1
wu wv Y
P(Ek (u, v)) ≤ νn (bl ∧ bk−l ). (6.3.33)
`n l=1
When bl = ∞ for each l, the bound in (6.3.33) equals that obtained in (6.3.7).
6.3 Typical-Distance Lower Bounds in Inhomogeneous Random Graphs 257
k−1
Y
= wu [1 − Fn? ](bk ) νn (bl ). (6.3.35)
l=1
since wπl ≤ bl ∧ bk−l . Now follow the steps in the proof of (6.3.32). Again the same bound
applies to CLn (w) and GRGn (w).
In order to apply Lemma 6.9 effectively, we use Lemmas 1.22 and 1.23 to derive bounds
on [1 − Fn? ](x) and νn (b):
Lemma 6.10 (Bounds on sums) Suppose that the weights w = (wv )v∈[n] satisfy Condition
1.1(a) and that there exist τ ∈ (2, 3) and c2 such that, for all x ≥ 1, (6.3.21) holds. Then,
there exists a constant c?2 > 0 such that, for all x ≥ 1,
[1 − Fn? ](x) ≤ c?2 x−(τ −2) , (6.3.37)
and there exists a cν > 0 such that, for all b ≥ 1,
νn (b) ≤ cν b3−τ . (6.3.38)
Proof The bound in (6.3.37) follows from Lemma 1.23, and the bound in (6.3.38) from
(1.4.13) in Lemma 1.22 with a = 2 > τ −1 when τ ∈ (2, 3). For both lemmas, the assump-
tions follow from (6.3.21). See Exercise 6.13 below for the bound on νn (b) in (6.3.38).
With Lemmas 6.9 and 6.10 in hand, we are ready to choose (bl )l≥0 and to complete the
proof of Theorem 6.7:
Proof of Theorem 6.7. Take kn = d2(1 − ε) log log n/| log (τ − 2)|e. By (6.3.25) and
(6.3.29),
kn h
1 X 2 X 1 X i
P(distNRn (w) (o1 , o2 ) ≤ kn ) ≤ + P(Fk (u)) + 2 P(Ek (u, v)) ,
n k=1 n u∈[n] n u,v∈[n] : u6=v
(6.3.39)
258 Small-World Phenomena in Inhomogeneous Random Graphs
where the term 1/n is due to o1 = o2 for which distNRn (w) (o1 , o2 ) = 0. We use Lemmas 6.9
and 6.10 to provide bounds on P(Fk (u)) and P(Ek (u, v)). These bounds are quite similar.
We first describe how we choose the truncation values (bl )l≥0 in such a way that [1 −
Fn? ](bk ) is small enough to make P(Fk (u)) small, and, for this choice of (bl )l≥0 , we show
that the contribution due to P(Ek (u, v)) is small. This means that it is quite unlikely that
u or v is connected to a vertex at distance k with too high a weight, i.e., with a weight at
least bk . At the same time, it is also unlikely that there is a good path ~π ∈ Pk (u, v) whose
weights are all small, i.e., for which wπk ≤ bk for every k ≤ kn , because k is too small to
achieve this.
By Lemma 6.9, we wish to choose bk in such a way that
k−1
1 X `n Y
P(Fk (u)) ≤ [1 − Fn? ](bk ) νn (bl ) (6.3.40)
n u∈[n] n l=0
1/(τ −2)
is small. Below (6.2.7), it was argued that we should choose bk such that bk ≈ bk−1 . In
order to make the contribution due to P(Fk (u)) small, however, we will take bk somewhat
larger. We now make this argument precise.
We take δ ∈ (0, τ − 2) sufficiently small and let
a = 1/(τ − 2 − δ) > 1. (6.3.41)
Take b0 = eA for some constant A ≥ 0 sufficiently large, and define (bl )l≥0 recursively by
l −l
bl = bal−1 , which implies that bl = ba0 = eA(τ −2−δ) . (6.3.42)
We will start from (6.3.30). By Lemma 6.9, we obtain an upper bound on P(Fk (u)) in terms
of factors νn (bl ) and [1 − Fn? ](bk ), which are bounded in Lemma 6.10. We start by applying
the bound on νn (bl ) to obtain
k−1 k−1 Pk−1
al
Y Y
νn (bl ) ≤ cν b3−τ
l = ck−1
ν eA(3−τ ) l=1
l=1 l=1
k (3−τ )/(a−1)
≤ ck−1
ν eA(3−τ )a /(a−1)
= ck−1
ν bk . (6.3.43)
Combining (6.3.43) with the bound on [1 − Fn? ](bk ) in Lemma 6.10 yields, for k ≥ 1,
−(τ −2)+(3−τ )/(a−1)
P(Fk (u)) ≤ c?2 wu ck−1
ν bk . (6.3.44)
Since 3 − τ + δ < 1 when τ ∈ (2, 3) and δ ∈ (0, τ − 2), we have
(τ − 2) − (3 − τ )/(a − 1) = (τ − 2) − (3 − τ )(τ − 2 − δ)/(3 − τ + δ)
= δ/(3 − τ + δ) > δ, (6.3.45)
so that, for k ≥ 1,
P(Fk (u)) ≤ c?2 wu ck−1
ν b−δ
k . (6.3.46)
As a result, for each δ > 0,
kn
1 XX 1 X ? X k−1 −δ X
P(Fk (u)) ≤ c2 wu cν bk = O(1) ck−1
ν b−δ
k ≤ ε, (6.3.47)
n u∈[n] k=1 n u∈[n] k≥1 k≥1
6.4 Doubly Logarithmic Upper Bound for Infinite-Variance Weights 259
`n 2(3−τ )/(a−1)
≤ 2
kn ckνn −1 bdkn /2e , (6.3.49)
n
by (6.3.42). We complete the proof by analyzing this bound.
Recall that k ≤ kn = d2(1 − ε) log log n/| log(τ − 2)|e. Take δ = δ(ε) > 0 small
enough that (τ − 2 − δ)−(kn +1)/2 ≤ (log n)1−ε/4 . Then, by (6.3.42),
−(kn +1)/2 1−ε/4
bdkn /2e ≤ eA(τ −2−δ) ≤ eA(log n) , (6.3.50)
and we conclude that
kn
X 1 X `n
2
P(Ek (u, v)) ≤ 2 kn ckνn exp 2A(3 − τ )(log n)1−ε/4 ) = o(1), (6.3.51)
k=1
n u,v∈[n]
n
since kn = O(log log n) and `n /n2 = Θ(1/n). This completes the proof of Theorem
6.7.
In this section we prove the doubly logarithmic upper bound on typical distances in the case
where the asymptotic weight distribution has infinite variance. Throughout this section, we
assume that there exist τ ∈ (2, 3), β > 21 and c1 > 0 such that, uniformly in n and x ≤ nβ ,
whp form a complete graph or clique. In the second step, we prove a doubly logarithmic
upper bound on the distance between a vertex and the set of giant-weight vertices. The latter
bound holds only when the vertex is in the giant component, a fact that we need to take into
account carefully. In the final step, we complete the proof of Theorem 6.11.
Proposition 6.13 (Connecting to Giantn ) Consider NRn (w) under the conditions of The-
orem 6.11. Let u ∈ [n] be such that wu > 1. Then, there exist c, c?1 > 0 and η > 0 such
that
log log n ? η
P distNRn (w) (u, Giantn ) ≥ (1 + ε) ≤ ce−c1 wu . (6.4.10)
| log (τ − 2)|
Consequently, if Wr (u) = k∈∂Br(Gn ) (u) wk denotes the weight of vertices at graph dis-
P
Thus, x` is the maximal-weight neighbor of x`−1 in NRn (w). We stop the above recursion
when wx` ≥ nβ , since then x` ∈ Giantn . Recall the heuristic approach below (6.2.7),
which shows that a vertex with weight w is whp connected to a vertex with weight w1/(τ −2) .
We now make this precise.
We take a = 1/(τ − 2 + δ), where we choose δ > 0 small enough that a > 1. By
(6.3.24),
We split the argument depending on whether wxa` ≤ nβ . First, when wxa` ≤ nβ , by (6.4.1)
and uniformly for x ≤ nβ ,
xn
[1 − Fn? ](x) ≥ [1 − Fn ](x) ≥ c?1 x−(τ −2) , (6.4.14)
`n
where, for n large enough, we can take c?1 = c1 /(2E[W ]). Therefore
Second, when wxa` > nβ but wx` < nβ , we can use (6.4.14) for x = nβ to obtain
P(wx`+1 < nβ | (xs )s≤` ) ≤ exp − c?1 wx` n−β(τ −2) ≤ exp − c?1 nβ[1−(τ −2)]/a
least 1. It is here that we use the relation of the edge probabilities in NRn (w) and Poisson
random variables.
uv satisfy puv ≥ puv for all
By [V1, (6.8.12) and (6.8.13)], the edge probabilities p(CL) (CL) (NR)
6.4 Doubly Logarithmic Upper Bound for Infinite-Variance Weights 263
u, v ∈ [n], so the results immediately carry over to CLn (w). For Gn = GRGn (w),
however, for all z, v ∈ [n] \ V (Br(Gn ) (u)), we have
(GRG)
P zv ∈ E(Gn ) | Br(Gn ) (u) = 1 − (1 − p(GRG) ) ≥ 1 − e−pzv
zv
Since
P(distNRn (w) (o1 , o2 ) ≤ 2kn | distNRn (w) (o1 , o2 ) < ∞)
P(distNRn (w) (o1 , o2 ) ≤ 2kn )
= , (6.4.23)
P(distNRn (w) (o1 , o2 ) < ∞)
this follows from the two bounds
lim sup P(distNRn (w) (o1 , o2 ) < ∞) ≤ ζ 2 , (6.4.24)
n→∞
where ζ = µ(|C (o)| = ∞) > 0 is the survival probability of the branching-process ap-
proximation to the neighborhoods of NRn (w), as identified in Theorem 3.18. For (6.4.24),
we make the following split, for some r ≥ 1:
P(distNRn (w) (o1 , o2 ) < ∞)
≤ P(|∂Br(Gn ) (o1 )| > 0, |∂Br(Gn ) (o2 )| > 0, distNRn (w) (o1 , o2 ) > 2r)
+ P(distNRn (w) (o1 , o2 ) ≤ 2r). (6.4.26)
To prove (6.4.25), we fix r ≥ 1 and write
P(distNRn (w) (o1 , o2 ) ≤ 2kn )
≥ P(2r < distNRn (w) (o1 , o2 ) ≤ 2kn )
≥ P distNRn (w) (oi , Giantn ) ≤ kn , i = 1, 2, distNRn (w) (o1 , o2 ) > 2r
≥ P(|∂Br(Gn ) (o1 )| > 0, |∂Br(Gn ) (o1 )| > 0, distNRn (w) (o1 , o2 ) > 2r)
− 2P distNRn (w) (o1 , Giantn ) > kn , |∂Br(Gn ) (o1 )| > 0 .
(6.4.27)
264 Small-World Phenomena in Inhomogeneous Random Graphs
The first terms in (6.4.26) and (6.4.27) are the same. By Corollary 2.19(b), this term satisfies
P(|∂Br(Gn ) (o1 )| > 0, |∂Br(Gn ) (o1 )| > 0, distNRn (w) (o1 , o2 ) > 2r)
= P(|∂Br(Gn ) (o)| > 0)2 + o(1), (6.4.28)
which converges to ζ 2 when r → ∞.
We are left with showing that the second terms in (6.4.26) and (6.4.27) vanish when first
n → ∞ followed by r → ∞. By Corollary 2.20, P(distNRn (w) (o1 , o2 ) ≤ 2r) = o(1),
which completes the proof of (6.4.24).
For the second term in (6.4.27), we condition on Br(Gn ) (o1 ), and use that ∂Br(Gn ) (o1 ) is
measurable wrt Br(Gn ) (o1 ), to obtain
P distNRn (w) (o1 , Giantn ) > kn , |∂Br(Gn ) (o1 )| > 0
= E 1{|∂Br(Gn ) (o1 )|>0} P distNRn (w) (o1 , Giantn ) > kn | Br(Gn ) (o1 ) .
h i
(6.4.29)
1{|∂Br(Gn ) (o1 )|>0} P distNRn (w) (o1 , Giantn ) > kn | Br(Gn ) (o1 ) −→
P
0. (6.4.33)
By Lebesgue’s Dominated Convergence Theorem [V1, Theorem A.1] this implies
when first n → ∞ followed by r → ∞. This proves (6.4.25), and thus completes the proof
of the upper bound in Theorem 6.3 for NRn (w). The proofs for GRGn (w) and CLn (w)
are similar, and are left as Exercise 6.14.
In this section we give the proof of the logarithmic upper bound typical distances in rank-
1 random graphs with finite-variance weights stated in Theorem 6.2. For this, we use the
second-moment method to show that whp there exists a path of at most (1 + ε) logν n
6.5 Logarithmic Upper Bound for Finite-Variance Weights 265
edges between o1 and o2 , when o1 and o2 are such that ∂Br(Gn ) (o1 ), ∂Br(Gn ) (o2 ) 6= ∅ for
Gn = CLn (w). This proves Theorem 6.2 for CLn (w).
The extensions to NRn (w) and GRGn (w) follow by asymptotic equivalence of these
graphs, as discussed in [V1, Section 6.7]. Even though this shows that NRn (w), CLn (w),
and GRGn (w) all behave similarly, for our second-moment methods, we will need to be
especially careful about the model with which we are working.
To apply the second-moment method, we give a bound on the variance of the number of
paths of given lengths using path-counting techniques. This section is organized as follows.
In Section 6.5.1 we highlight our path-counting techniques. In Section 6.5.2 we apply these
methods to give upper bounds on typical distances for finite-variance weights. We also in-
vestigate the case where τ = 3, for which we prove that typical distances are bounded by
log n/ log log n under appropriate conditions.
The above path-counting methods can also be used to study general inhomogeneous ran-
dom graphs, as discussed in Section 6.5.3, where we prove Theorem 6.1(b) and use its proof
ideas to complete the proof of the law of large numbers for the giant in Theorem 3.19.
denote the number of self-avoiding paths of length k between the vertices a and b, where we
recall that a path ~π is self-avoiding when it visits every vertex at most once (see Definition
6.5). Let
nk(a, b) = E[Nk(a, b)] (6.5.2)

denote the expected number of occupied paths of length k connecting a and b. Define

n̄k(a, b) = ua ub ( Σ_{i∈I\{a,b}} ui² )^{k−1},    n̲k(a, b) = ua ub ( Σ_{i∈Ia,b,k} ui² )^{k−1}, (6.5.3)

where Ia,b,k is the subset of I in which a and b, as well as the k − 1 vertices with highest weights, have been removed. In Section 6.3 we proved implicitly an upper bound on E[Nk(a, b)] of the form (see also Exercise 6.15)

E[Nk(a, b)] ≤ n̄k(a, b). (6.5.4)

In this section we prove that n̲k(a, b) is a lower bound on E[Nk(a, b)] and use related bounds to prove a variance bound on Nk(a, b).
Before stating our main result, we introduce some further notation. Let

νI = Σ_{i∈I} ui²,    γI = Σ_{i∈I} ui³ (6.5.5)
denote the sums of squares and third powers of (ui )i∈I , respectively. Our aim is to show
that whp paths of length k exist between the vertices a and b for an appropriate choice of k .
We do this by applying a second-moment method to Nk (a, b), for which we need a lower
bound on E[Nk (a, b)] and an upper bound on Var(Nk (a, b)) such that Var(Nk (a, b)) =
o(E[Nk (a, b)]2 ) (recall [V1, Theorem 2.18]), as in the next proposition, which is interesting
in its own right:
Proposition 6.14 (Variance of numbers of paths) For any k ≥ 1, a, b ∈ I and (ui)i∈I,

E[Nk(a, b)] ≥ n̲k(a, b), (6.5.6)

while, assuming that νI > 1,

Var(Nk(a, b)) ≤ nk(a, b) + n̄k(a, b)² ( γIνI²/(νI − 1) (1/ua + 1/ub) + γI²νI/(ua ub (νI − 1)²) + ek ), (6.5.7)

where

ek = k (1 + γI/(ua νI)) (1 + γI/(ub νI)) (νI/(νI − 1)) (e^{2k³γI²/νI³} − 1). (6.5.8)
Remark 6.15 (Path-counting and existence of k -step paths) Path-counting methods are
highly versatile. While in Proposition 6.14 we focus on Chung–Lu-type inhomogeneous
random graphs, we will apply them to general inhomogeneous random graphs with finitely
many types in Section 6.5.3 and to the configuration model in Section 7.3.3. For such
applications, we need to slightly modify our bounds, particularly those in Lemma 6.18,
owing to a slightly altered dependence structure between the occupation statuses of distinct
paths. J
We apply Proposition 6.14 in cases where E[Nk(a, b)] ≥ n̲k(a, b) → ∞, by taking I to be a large subset of [n] and ui to equal wi/√ℓn for CLn(w). In this case, νI ≈ νn ≈ ν > 1. In our applications of Proposition 6.14, the ratio n̄k(a, b)/n̲k(a, b) will be bounded, and k³γI²/νI³ = o(1), so that the term involving ek is an error term. The starting and ending vertices a, b ∈ I will correspond to a union of vertices in [n] of quite large size; this relies on the local limit stated in Theorem 3.18. As a result, γI/ua and γI/ub are typically small, so that also

Var(Nk(a, b))/E[Nk(a, b)]² ≈ γIνI²/(νI − 1) (1/ua + 1/ub) + γI²νI/(ua ub (νI − 1)²) (6.5.9)

is small. As a result, whp there exists a path of k steps, as required. The choice of a, b, and I is quite delicate, which explains why we formulated Proposition 6.14 in such generality.
We next prove Proposition 6.14, which, in particular for (6.5.7), requires some serious
combinatorial arguments.
Proof of Proposition 6.14. Recall Definition 6.5 and that Nk(a, b) is a sum of indicators:

Nk(a, b) = Σ_{~π∈Pk(a,b)} 1{~π occupied in Gn}, (6.5.10)

so that

E[Nk(a, b)] = Σ_{~π∈Pk(a,b)} uπ0 uπk Π_{l=1}^{k−1} uπl². (6.5.11)

The lower bound (6.5.6) follows by restricting the sum in (6.5.11) to paths whose internal vertices all lie in Ia,b,k: since Ia,b,k excludes the k − 1 vertices of highest weight, each of the k − 1 successive sums over the remaining choices of internal vertex is at least Σ_{i∈Ia,b,k} ui².
To compute Var(Nk(a, b)), we again start from (6.5.10), which yields

Var(Nk(a, b)) = Σ_{~π,~ρ∈Pk(a,b)} [ P(~π, ~ρ occupied) − P(~π occ.) P(~ρ occ.) ], (6.5.13)

where we abbreviate {~π occupied in Gn} to {~π occupied} or {~π occ.} when no confusion can arise.
For ~π, ~ρ, we denote the edges that the paths ~π and ~ρ have in common by ~π ∩ ~ρ. The occupation statuses of ~π and ~ρ are independent precisely when ~π ∩ ~ρ = ∅, so that

Var(Nk(a, b)) ≤ Σ_{~π,~ρ∈Pk(a,b): ~π∩~ρ≠∅} P(~π, ~ρ occupied). (6.5.14)
Define ~ρ \ ~π to be the edges in ~ρ that are not part of ~π, so that

P(~π, ~ρ occupied) = P(~π occupied) P(~ρ occupied | ~π occupied)
= Π_{l=0}^{k−1} uπl uπl+1 Π_{e∈~ρ\~π} uē ue̲, (6.5.15)

where ē and e̲ denote the two endpoints of the edge e. The diagonal terms ~π = ~ρ contribute at most nk(a, b) to Var(Nk(a, b)). Thus, from now on, we consider (~π, ~ρ) such that ~π ≠ ~ρ and ~π ∩ ~ρ ≠ ∅.
The probability in (6.5.15) needs to be summed over all possible pairs of paths (~π, ~ρ) with ~π ≠ ~ρ that share at least one edge. In order to do this effectively, we introduce some notation.
Let l = |~π ∩ ~ρ| denote the number of edges in ~π ∩ ~ρ, so that l ≥ 1 precisely when ~π ∩ ~ρ ≠ ∅. Note that l ∈ [k − 2], since ~π and ~ρ are distinct self-avoiding paths of length k between the same vertices a and b. Let k − l = |~ρ \ ~π| ≥ 2 be the number of edges in ~ρ that are not part of ~π.
Let m denote the number of connected subpaths in ~ρ \ ~π, so that m ≥ 1 whenever ~π ≠ ~ρ. Since π0 = ρ0 = a and πk = ρk = b, these subpaths start and end in vertices along the path ~π. We can thus view the subpaths in ~ρ \ ~π as excursions of the path ~ρ from the walk ~π. By construction, between two excursions there is at least one edge that ~π and ~ρ have in common. We next characterize this excursion structure:
Definition 6.16 ((Edge-)shapes of pairs of paths) Let m be the number of connected subpaths in ~ρ \ ~π. We define the shape of the pair (~π, ~ρ) by

Shape(~π, ~ρ) = (~xm+1, ~sm, ~tm, ~om+1, ~rm+1), (6.5.17)

where
(1) ~xm+1 ∈ N0^{m+1}, where xj ≥ 0 is the length of the subpath in ~ρ ∩ ~π in between the (j − 1)th and the jth subpath of ~π \ ~ρ. Here x1 ≥ 0 is the number of common edges in the subpath of ~ρ ∩ ~π that contains a, while xm+1 ≥ 0 is the number of common edges in the subpath of ~ρ ∩ ~π that contains b. For j ∈ {2, ..., m}, xj ≥ 1;
(2) ~sm ∈ N^m, where sj ≥ 1 is the number of edges in the jth subpath of ~π \ ~ρ;
(3) ~tm ∈ N^m, where tj ≥ 1 is the number of edges in the jth subpath of ~ρ \ ~π;
(4) ~om+1 ∈ [m + 1]^{m+1}, where oj is the order of the jth common subpath in ~ρ ∩ ~π of the path ~π in ~ρ, e.g., o2 = 5 means that the second subpath that ~π has in common with ~ρ is the fifth subpath that ~ρ has in common with ~π. Note that o1 = 1 and om+1 = m + 1, since ~π and ~ρ start and end in a and b, respectively;
(5) ~rm+1 ∈ {0, 1}^{m+1}, where rj describes the direction in which the jth common subpath in ~ρ ∩ ~π of the path ~π is traversed by ~ρ, with rj = 1 when the direction is the same for ~π and ~ρ and rj = 0 otherwise. Thus, r1 = rm+1 = 1. J
The information in Shape(~π, ~ρ) in Definition 6.16 is precisely what is needed to piece together the topology of the two paths, except for information about the vertices involved in ~π and ~ρ. The subpaths of ~ρ \ ~π in Definition 6.16 avoid the edges in ~π but may contain vertices that appear in ~π. This explains why we call the shapes edge-shapes. See Figure 6.5 for an example of a pair of paths (~π, ~ρ) and its corresponding shape.
We next discuss properties of shapes and use shapes to analyze Var(Nk(a, b)) further. Recall that l = |~π ∩ ~ρ| denotes the number of common edges in ~π and ~ρ, and m the number of connected subpaths in ~ρ \ ~π. Then

Σ_{j=1}^{m+1} xj = l,    Σ_{j=1}^{m} sj = Σ_{j=1}^{m} tj = k − l. (6.5.18)
Figure 6.5 An example of a pair of paths (~π, ~ρ) and its corresponding shape.
Let Shapem,l denote the set of shapes corresponding to pairs of paths (~π, ~ρ) with m excursions and l common edges, so that (6.5.18) holds. Then,

Var(Nk(a, b)) ≤ nk(a, b) + Σ_{l=1}^{k−2} Σ_{m=1}^{k−l} Σ_{σ∈Shapem,l} Σ_{~π,~ρ∈Pk(a,b): Shape(~π,~ρ)=σ} P(~π, ~ρ occupied). (6.5.19)
We continue by investigating the structure of the vertices in (~π, ~ρ). Fix a pair of paths (~π, ~ρ) such that Shape(~π, ~ρ) = σ for some σ ∈ Shapem,l. There are k + 1 vertices in ~π. Every subpath of ~ρ \ ~π starts and ends in a vertex that is also in ~π. There are m connected subpaths in ~ρ \ ~π and l = |~π ∩ ~ρ| common edges, so that there are at most k − l − m extra vertices in ~ρ \ ~π. We conclude that the union of paths ~π ∪ ~ρ visits at most 2k + 1 − l − m distinct vertices, and thus at most 2k − 1 − l − m vertices unequal to a or b.
Vertex a is in 1 + δx1,0 edges and vertex b is in 1 + δxm+1,0 edges. Of the other k − 1 vertices in ~π, precisely 2m − δx1,0 − δxm+1,0 are in three edges, while the remaining k − 1 − 2m + δx1,0 + δxm+1,0 vertices are in two or four edges. The remaining k − l − m vertices in ~ρ \ ~π that are not in ~π are in two edges. By construction, ~π and ~ρ are self-avoiding, so the k + 1 vertices in ~π, and those in ~ρ, are distinct. In contrast, the k − l − m vertices in ~ρ \ ~π may intersect those of ~π.
We summarize the vertex information of ~π and ~ρ in the vector (v1, ..., v2k−1−l−m) ∈ I^{2k−1−l−m} denoting the vertices in the union of ~π and ~ρ that are unequal to a or b. We order these vertices as follows:

⊲ the vertices (v1, ..., v2m−a1−am+1) are in three edges, in the same order as their appearance in ~π, where we denote a1 = δx1,0, am+1 = δxm+1,0;
⊲ the vertices (v2m−a1−am+1+1, ..., vk−1) are the ordered vertices in ~π that are not in three edges and are unequal to a or b, listed in the same order as in ~π;
⊲ the vertices (vk, ..., v2k−1−l−m) are the ordered vertices in ~ρ that are not in three edges and are unequal to a or b, listed in the same order as in ~ρ.

Thus, vertices that are in four edges in ~π ∪ ~ρ occur twice in (v1, ..., v2k−1−l−m). The vector (v1, ..., v2k−1−l−m) is precisely the missing information needed to reconstruct (~π, ~ρ) from σ:
Lemma 6.17 (Bijection of pairs of paths) There is a one-to-one correspondence between the pairs of paths (~π, ~ρ) and the shape σ combined with the vertices (v1, ..., v2k−1−l−m) as described above.
Proof We have already observed that the shape σ of (~π, ~ρ) determines the intersection structure of (~π, ~ρ) precisely, and, as such, it contains all the information needed to piece together the two paths (~π, ~ρ), except for the information about the vertices involved in these paths. Every vertex in ~π ∪ ~ρ appears in two, three, or four edges. The vertices that occur in three edges occur at the start of (v1, ..., v2k−1−l−m), and the other vertices are those in ~π \ ~ρ and ~ρ \ ~π, respectively. The above ordering ensures that we can uniquely determine where these vertices are located along the paths ~π and ~ρ.
Fix the pair of paths (~π, ~ρ) for which Shape(~π, ~ρ) = σ for some σ ∈ Shapem,l, and recall that a1 = δx1,0, am+1 = δxm+1,0. Then, by (6.5.15) and Lemma 6.17,

P(~π, ~ρ occupied) = ua^{1+a1} ub^{1+am+1} Π_{s=1}^{2m−a1−am+1} uvs³ Π_{t=2m−a1−am+1+1}^{2k−1−l−m} uvt². (6.5.20)
Fix σ ∈ Shapem,l. We bound from above the sum over ~π, ~ρ ∈ Pk(a, b) such that Shape(~π, ~ρ) = σ by summing (6.5.20) over all (v1, ..., v2k−1−l−m) ∈ I^{2k−1−l−m}, to obtain

Σ_{~π,~ρ∈Pk(a,b): Shape(~π,~ρ)=σ} P(~π, ~ρ occupied)
≤ ua ub γI^{2m} νI^{2k−1−3m−l} (ua νI/γI)^{δx1,0} (ub νI/γI)^{δxm+1,0}
= n̄k(a, b)² γI^{2(m−1)} νI^{−3(m−1)−l} (γI/(ua νI))^{1−δx1,0} (γI/(ub νI))^{1−δxm+1,0}. (6.5.21)
Therefore, we arrive at

Var(Nk(a, b)) ≤ nk(a, b) + n̄k(a, b)² Σ_{l=1}^{k−2} Σ_{m=1}^{k−l} γI^{2(m−1)} νI^{−3(m−1)−l} Σ_{σ∈Shapem,l} (γI/(ua νI))^{1−δx1,0} (γI/(ub νI))^{1−δxm+1,0}. (6.5.22)
Equation (6.5.22) is our first main result on Var(Nk (a, b)), and we are left with inves-
tigating the combinatorial nature of the sums over the shapes. We continue to bound the
number of shapes in the following lemma:
Lemma 6.18 (Bounds on the number of shapes)
(a) For m = 1, the number of shapes in Shapem,l with fixed a1 = δx1,0, am+1 = δxm+1,0 equals l when a1 = am+1 = 0, 1 when a1 + am+1 = 1, and 0 when a1 = am+1 = 1.
(b) For m ≥ 2, the number of shapes in Shapem,l with fixed a1 = δx1,0, am+1 = δxm+1,0 is bounded by

2^{m−1} (m − 1)! \binom{k−l−1}{m−1}² \binom{l}{m − a1 − am+1}, (6.5.23)

and hence by

k (2k³)^{m−1}/(m − 1)!. (6.5.24)
Proof We repeatedly use a 'stars and bars' count: for a ≥ 0 and b ≥ 1, there are \binom{a+b−1}{b−1} sequences (x1, ..., xb) ∈ N0^b such that Σ_{j=1}^{b} xj = a. Indeed, lay out a ones and b − 1 zeros in a row; each arrangement corresponds to a unique such sequence when we let xi be the number of ones in between the (i − 1)th and ith chosen zero. Similarly, there are \binom{a−1}{b−1} possible sequences (y1, ..., yb) ∈ N^b such that Σ_{j=1}^{b} yj = a, since we can apply the previous equality to (y1 − 1, ..., yb − 1) ∈ N0^b.
Using the above, we continue to count the number of shapes. The number of vectors (s1, ..., sm) ∈ N^m such that sj ≥ 1 and Σ_{j=1}^{m} sj = k − l equals \binom{k−l−1}{m−1}. The same applies to (t1, ..., tm) ∈ N^m such that tj ≥ 1 and Σ_{j=1}^{m} tj = k − l.
In counting the number of possible ~xm+1 such that Σ_{j=1}^{m+1} xj = l, we need to count their numbers separately for x1 = 0 and x1 ≥ 1, and for xm+1 = 0 and xm+1 ≥ 1. When m = 1, the number is zero when x1 = x2 = 0, since x1 = x2 = 0 implies that the paths share no edges. Recall a1, am+1, and suppose that m − a1 − am+1 ≥ 0. Then, there are

\binom{l}{m − a1 − am+1}

possible choices of ~xm+1 with fixed a1 = δx1,0, am+1 = δxm+1,0. The claims in part (a), as well as that in (6.5.23) in part (b), follow by multiplying these bounds on the number of choices for ~rm+1, ~om+1, ~sm, ~tm and ~xm+1.
To prove (6.5.24) in part (b), we continue by obtaining the bounds

(m − 1)! \binom{k−l−1}{m−1}² = (1/(m − 1)!) ((k − l − 1)!/(k − l − m)!)² ≤ k^{2(m−1)}/(m − 1)!, (6.5.25)

\binom{l}{m − a1 − am+1} ≤ l^{m−a1−am+1}/(m − a1 − am+1)! ≤ k^m. (6.5.26)

Therefore, the number of shapes in Shapem,l is bounded, for each l ≥ 1 and m ≥ 2, by

2^{m−1} k^{2(m−1)}/(m − 1)! × k^m = k (2k³)^{m−1}/(m − 1)!,

as required.
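The two counting identities used in this proof are easy to confirm by brute force; the following snippet (standard library only) verifies them for small a and b:

```python
from itertools import product
from math import comb

# 'Stars and bars': sequences in N_0^b (resp. N^b) with coordinates summing to a.
a, b = 7, 3
nonneg = sum(1 for xs in product(range(a + 1), repeat=b) if sum(xs) == a)
positive = sum(1 for xs in product(range(1, a + 1), repeat=b) if sum(xs) == a)
assert nonneg == comb(a + b - 1, b - 1)      # 36 sequences in N_0^b
assert positive == comb(a - 1, b - 1)        # 15 sequences in N^b
print(nonneg, positive)
```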
We are now ready to complete the proof of Proposition 6.14:

Proof of Proposition 6.14. By (6.5.22) and applying Lemma 6.18, it suffices to show that the sum of

|Shapem,l| × (γI²/νI³)^{m−1} νI^{−l} (γI/(ua νI))^{1−a1} (γI/(ub νI))^{1−am+1} (6.5.28)

over l ∈ [k − 2], m ∈ [k − l], and a1, am+1 ∈ {0, 1} (where, by convention, \binom{l}{−1} = 0), is bounded by the contribution in parentheses in the second term in (6.5.7).
We start with m = 1, for which we obtain that the sum of (6.5.28) over the other variables l ∈ [k − 2] and a1, am+1 ∈ {0, 1} equals

(1/ua + 1/ub) γI νI Σ_{l=1}^{∞} νI^{−(l−1)} + (γI²/(ua ub νI)) Σ_{l=1}^{∞} l νI^{−(l−1)}
= γI νI²/(νI − 1) (1/ua + 1/ub) + γI² νI/(ua ub (νI − 1)²), (6.5.29)

where we use that, for a ∈ [0, 1),

Σ_{l=1}^{∞} a^{l−1} = 1/(1 − a),    Σ_{l=1}^{∞} l a^{l−1} = 1/(1 − a)². (6.5.30)

The terms in (6.5.29) are the first two terms that are multiplied by n̄k(a, b)² on the rhs of (6.5.7).
This leaves us to bound the contribution when m ≥ 2. Since (6.5.24) is independent of l, we can start by summing (6.5.28) over l ∈ [k] and over a1, am+1 ∈ {0, 1}, to obtain a bound of the form (recall (6.5.8))

k (1 + γI/(ua νI)) (1 + γI/(ub νI)) (νI/(νI − 1)) Σ_{m≥2} ((2k³)^{m−1}/(m − 1)!) (γI²/νI³)^{m−1}
= k (1 + γI/(ua νI)) (1 + γI/(ub νI)) (νI/(νI − 1)) (e^{2k³γI²/νI³} − 1) = ek. (6.5.31)

After multiplication by n̄k(a, b)², the term in (6.5.31) is the same as the last term appearing on the rhs of (6.5.7). Summing the bounds in (6.5.29) and (6.5.31) proves (6.5.7).
Exercises 6.16–6.20 study various consequences of our path-counting techniques. In the next subsection, we use Proposition 6.14 to prove upper bounds on graph distances.
Theorem 6.19 (Logarithmic upper bound on typical distances in NRn(w)) Consider NRn(w), where the weights w = (wv)v∈[n] satisfy Conditions 1.1(a)–(c) with ν = E[W²]/E[W] ∈ (1, ∞). Then, for any ε > 0, with o1, o2 chosen independently and uar from [n],

P(distNRn(w)(o1, o2) ≤ (1 + ε) logν n | distNRn(w)(o1, o2) < ∞) = 1 + o(1). (6.5.32)

The same result applies, under identical conditions, to GRGn(w) and CLn(w).
Theorem 6.19 provides the upper bound on the typical distances that matches Theorem
6.4, and together these two theorems prove Theorem 6.2. The remainder of this subsection
is devoted to the proof of Theorem 6.19.
The 1 − ε factors in (6.5.34) are due to the fact that the edge probabilities in the graph on
{a, b} ∪ [n] \ (Br(Gn ) (o1 ) ∪ Br(Gn ) (o2 )) are not exactly of the form pij = ui uj . Indeed, for
i, j ∈ {a, b}, the edge probabilities are slightly different. When Conditions 1.1(a)–(c) hold,
however, the bound almost holds for a and b, which explains the factors 1 − ε.
We formalize the above ideas in the following lemma, where we write Wr(oi) = Σ_{v∈∂Br(Gn)(oi)} wv for the total weight of the boundary ∂Br(Gn)(oi):

Lemma 6.20 (Weak convergence of boundary weights) As n → ∞,

(Wr(o1), Wr(o2)) −→d ( Σ_{j=1}^{Zr(1)} W?(1)(j), Σ_{j=1}^{Zr(2)} W?(2)(j) ),

where (Zm(1), Zm(2))m≥0 are the generation sizes of two independent unimodular branching processes as in Theorem 3.18, and (W?(1)(j))j≥1 and (W?(2)(j))j≥1 are two independent sequences of iid random variables with distribution F?.
Proof It is now convenient to start with Gn = NRn(w). By Corollary 2.19, |∂Br(Gn)(o1)| and |∂Br(Gn)(o2)| jointly converge in distribution to (Zr(1), Zr(2)), which are independent generation sizes of the local limit of NRn(w) as in Theorem 3.18. Each of the individuals in ∂Br(Gn)(o1) and ∂Br(Gn)(o2) receives a mark Mi with weight wMi. By Proposition 3.16, these marks are iid random variables conditioned to be unthinned, where whp no vertex in Br(Gn)(o1) ∪ Br(Gn)(o2) is thinned. Then, Wr(oi) = Σ_{j=1}^{|∂Br(Gn)(oi)|} Wn?(i)(j), where (Wn?(i)(j))j≥1 are iid copies of Wn?. By Conditions 1.1(a),(b), Wn? −→d W?, so that also Wr(oi) −→d Σ_{j=1}^{Zr(i)} W?(i)(j).
The joint convergence follows in a similar fashion, now using local convergence in probability. As discussed before, the above results extend trivially to GRGn(w) and CLn(w) by asymptotic equivalence.
In order to apply Proposition 6.14, we start by relating the random graph obtained by restricting CLn(w) to the vertex set Ia,b to the model on the vertex set Ia,b ∪ {a, b} with edge probabilities pij = ui uj, with ui = wi/√ℓn for i ∈ Ia,b and ua, ub given by (6.5.34). For this, we note that for i, j ∈ Ia,b, this equality holds by definition of CLn(w). We next take i = a and j ∈ Ia,b; the argument for i = b and j ∈ Ia,b is identical.
The conditional probability that j ∈ Ia,b is connected to at least one vertex in ∂Br(Gn)(o1) equals 1 − Π_{v∈∂Br(Gn)(o1)}(1 − wv wj/ℓn). By inclusion–exclusion, we obtain that

1 − Π_{v∈∂Br(Gn)(o1)} (1 − wv wj/ℓn) ≥ Σ_{v∈∂Br(Gn)(o1)} wv wj/ℓn − Σ_{v1,v2∈∂Br(Gn)(o1)} wv1 wv2 wj²/(2ℓn²)
≥ Wr(o1) wj/ℓn − Wr(o1)² wj²/(2ℓn²). (6.5.38)
By Conditions 1.1(a)–(c), wj = o(√n) (recall Exercise 1.8), and Wr(o1) is a tight sequence of random variables (see also Lemma 6.21 below), so that, whp for any ε > 0,

1 − Π_{v∈∂Br(Gn)(o1)} (1 − wv wj/ℓn) ≥ (1 − ε) Wr(o1) wj/ℓn. (6.5.39)
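Since the elementary inequality behind (6.5.38)–(6.5.39) is easy to get wrong, here is a quick numerical sanity check; the values xv (standing in for wv wj/ℓn) are arbitrary small test values:

```python
import math, random

# Bonferroni-type bound: 1 - prod(1 - x_v) >= S - S^2/2 with S = sum x_v.
rng = random.Random(3)
for _ in range(1000):
    xs = [rng.uniform(0.0, 0.05) for _ in range(rng.randint(1, 20))]
    lhs = 1 - math.prod(1 - x for x in xs)
    s = sum(xs)
    assert lhs >= s - s * s / 2
print("inequality verified on 1000 random instances")
```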
With the choices in (6.5.34), we see that our graph is bounded below by that studied
in Proposition 6.14. By the above description, it is clear that all our arguments will be
conditional, given Br(Gn ) (o1 ) and Br(Gn ) (o2 ). For this, we define Pr to be the conditional
distribution given Br(Gn ) (o1 ) and Br(Gn ) (o2 ), and we let Er and Varr be the corresponding
conditional expectation and variance.
In order to apply Proposition 6.14, we investigate the quantities appearing in it:

Lemma 6.21 (Parameters in path counting) Under the conditions of Theorem 6.19, and conditioning on Br(Gn)(o1) and Br(Gn)(o2) with ∂Br(Gn)(o1) ≠ ∅, ∂Br(Gn)(o2) ≠ ∅, and with a = ∂Br(Gn)(o1), b = ∂Br(Gn)(o2), for k = kn = ⌈(1 + ε) logν n⌉ − 2r,

n̲k(a, b) −→P ∞,    n̄k(a, b) = (1 + oP(1)) n̲k(a, b), (6.5.40)

and, as n → ∞,

Varr(Nk(a, b))/Er[Nk(a, b)]² ≤ Kν²/(ν − 1) (1/(√ℓn ua) + 1/(√ℓn ub)) + K²ν²/((ν − 1) ℓn ua ub) + oP(1). (6.5.41)
Proof By (6.5.3),

n̲k(a, b) = ua ub νIa,b,k^{k−1},    and    n̄k(a, b)/n̲k(a, b) = (νIa,b/νIa,b,k)^{k−1}. (6.5.42)

We start by investigating νI. Denote

ν(K) = E[W² 1{W≤K}]/E[W]. (6.5.43)
Then, by (6.5.36) and since Br(Gn)(o1) and Br(Gn)(o2) contain a finite number of vertices,

νIa,b −→P ν(K). (6.5.44)

The same applies to νIa,b,k. Then, with K > 0 chosen sufficiently large that ν(K) ≥ ν − ε/2 and with k = kn = ⌈(1 + ε) logν n⌉ − 2r,

n̲k(a, b) = ua ub νIa,b,k^{k−1} = (Wr(o1)Wr(o2)/ℓn) n^{(1+ε) log νIa,b,k/log ν − 1} −→P ∞, (6.5.45)

when K and n are so large that (1 + ε) log ν(K)/log ν > 1. This proves the first property in (6.5.40).
To prove the second property in (6.5.40), we note that the set Ia,b,k is obtained from Ia,b by removing the k − 1 vertices with highest weight. Since wi ≤ K for all i ∈ I (recall (6.5.36)), νIa,b ≤ νIa,b,k + kK/ℓn. Since k ≤ A log n, we therefore arrive at

n̄k(a, b)/n̲k(a, b) = (νIa,b/νIa,b,k)^{k−1} ≤ (1 + kK/(ℓn νIa,b,k))^{k−1} = 1 + oP(1). (6.5.46)

Further, by (6.5.36),
γI ≤ νI max_{i∈I} ui ≤ νI K/√ℓn, (6.5.47)

so that, for k ≤ A log n with A > 1 fixed,

(1 + γI/(ua νI)) (1 + γI/(ub νI)) k (e^{2k³γI²/νI³} − 1) = oP(1). (6.5.48)
Substituting these bounds into (6.5.7) and using (6.5.40) yields the claim.
Proof of Theorem 6.19. Indeed, (6.5.49) implies that P(distCLn(w)(o1, o2) > kn | distCLn(w)(o1, o2) < ∞) = o(1), since P(distCLn(w)(o1, o2) < ∞) → ζ² > 0 by Theorem 3.20.
We rewrite

P(distCLn(w)(o1, o2) > kn, ∂Br(Gn)(o1) ≠ ∅, ∂Br(Gn)(o2) ≠ ∅)
≤ P(Nkn−2r(∂Br(Gn)(o1), ∂Br(Gn)(o2)) = 0, ∂Br(Gn)(o1) ≠ ∅, ∂Br(Gn)(o2) ≠ ∅)
≤ E[ Pr(Nkn−2r(∂Br(Gn)(o1), ∂Br(Gn)(o2)) = 0) 1{∂Br(Gn)(o1)≠∅, ∂Br(Gn)(o2)≠∅} ],
where we recall that Pr is the conditional distribution given Br(Gn ) (o1 ) and Br(Gn ) (o2 ).
By Lemma 6.21 and the Chebychev inequality [V1, Theorem 2.18], the conditional probability of {distCLn(w)(o1, o2) > kn}, given Br(Gn)(o1), Br(Gn)(o2), is at most

Varr(Nkn−2r(a, b))/Er[Nkn−2r(a, b)]² ≤ Kν²/(ν − 1) (1/(√ℓn ua) + 1/(√ℓn ub)) + K²ν²/((ν − 1) ℓn ua ub) + oP(1). (6.5.51)
When ∂Br(Gn)(o1) ≠ ∅ and ∂Br(Gn)(o2) ≠ ∅, by (6.5.34) and as n → ∞,

1/(√ℓn ua) + 1/(√ℓn ub) −→P (1 − ε)^{−1} ( Σ_{j=1}^{Zr(1)} W?(1)(j) )^{−1} + (1 − ε)^{−1} ( Σ_{j=1}^{Zr(2)} W?(2)(j) )^{−1} −→P 0, (6.5.52)

when r → ∞. Therefore, with first n → ∞ followed by r → ∞,

Pr(Nk−2r(a, b) = 0 | ∂Br(Gn)(o1) ≠ ∅, ∂Br(Gn)(o2) ≠ ∅) −→P 0, (6.5.53)
Lemma 6.23 (Typical distances in core) Under the conditions of Theorem 6.22, let o1′, o2′ ∈ Coren be chosen with probabilities proportional to their weights, i.e.,

P(oi′ = j) = wj / Σ_{v∈Coren} wv, (6.5.60)

and let Hn′ be the graph distance between o1′, o2′ in Coren. Then, for any ε > 0, there exists an η ∈ (0, 1) in (6.5.58) such that

P(Hn′ ≤ (1 + ε) log n/log log n) → 1. (6.5.61)
Lemma 6.24 (From periphery to core) Under the conditions of Theorem 6.22, let o1, o2 be two vertices chosen uar from [n]. Then, for any η > 0 in (6.5.58),

P(distCLn(w)(o1, Coren) ≤ νn^{1−η}, distCLn(w)(o2, Coren) ≤ νn^{1−η}) → ζ². (6.5.62)

Further, CLn(w), GRGn(w), and NRn(w) are asymptotically equivalent when restricted to the edges in [n] × {v : wv ≤ βn} for any βn = o(√n).
Proof of Theorem 6.22 subject to Lemmas 6.23 and 6.24. To see that Lemmas 6.23 and 6.24 imply Theorem 6.22, we note that

distCLn(w)(o1, o2) ≤ distCLn(w)(o1, Coren) + distCLn(w)(o2, Coren) + distCLn(w)(o1′, o2′), (6.5.63)

where o1′, o2′ ∈ Coren are the vertices in Coren found first in the breadth-first search from o1 and o2, respectively. By the asymptotic equivalence of CLn(w), GRGn(w), and NRn(w) on [n] × {v : wv ≤ βn}, stated in Lemma 6.24, whp distCLn(w)(o1, Coren) = distNRn(w)(o1, Coren) and distCLn(w)(o2, Coren) = distNRn(w)(o2, Coren), so we can work with NRn(w) outside Coren. Then, by Proposition 3.16, o1′, o2′ ∈ Coren are chosen with probabilities proportional to their weights, as assumed in Lemma 6.23.
Fix kn = ⌈(1 + ε) log n/log log n⌉. We conclude that, when n is sufficiently large that νn^{1−η} ≤ εkn/4,

P(distCLn(w)(o1, o2) ≤ kn)
≥ P(distCLn(w)(oi, Coren) ≤ νn^{1−η}, i = 1, 2)
  × P(distCLn(w)(o1′, o2′) ≤ (1 − ε/2)kn | distCLn(w)(oi, Coren) ≤ νn^{1−η}, i = 1, 2). (6.5.64)

By Lemma 6.24, the first probability converges to ζ², and by Lemma 6.23 the second probability converges to 1. We conclude that

P(distCLn(w)(o1, o2) ≤ (1 + ε) log n/log log n) → ζ². (6.5.65)
Since also P(distCLn (w) (o1 , o2 ) < ∞) → ζ 2 , this completes the proof of Theorem 6.22.
The proofs of Lemmas 6.23 and 6.24 follow from path-counting techniques similar to
those carried out earlier. Exercises 6.21–6.24 complete the proof of Lemma 6.23. Exercise
6.25 asks you to verify the asymptotic equivalence stated in Lemma 6.24, while Exercise
6.26 asks you to give the proof of (6.5.62) in Lemma 6.24.
In this case, ‖Tκn‖ is the largest eigenvalue of the matrix M(n)ij = κn(i, j)µn(j), which converges to the largest eigenvalue of the matrix Mij = κ(i, j)µ(j) and equals ν = ‖Tκ‖ ∈ (1, ∞), by assumption. Without loss of generality, we may assume that µ(i) > 0 for all i ∈ [t]. This sets the stage for our analysis.
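In the finite-types case, ν = ‖Tκ‖ can thus be computed numerically as the Perron root of the t × t matrix (κ(i, j)µ(j))i,j. The sketch below does this by power iteration; the two-type kernel and type distribution are invented for the example:

```python
kappa = [[2.0, 1.0], [1.0, 3.0]]     # illustrative symmetric kernel on 2 types
mu = [0.6, 0.4]                      # illustrative type distribution
t = len(mu)
M = [[kappa[i][j] * mu[j] for j in range(t)] for i in range(t)]

x = [1.0] * t
for _ in range(200):                 # power iteration for the Perron root
    y = [sum(M[i][j] * x[j] for j in range(t)) for i in range(t)]
    norm = max(y)
    x = [v / norm for v in y]
print(f"nu = ||T_kappa|| ~ {norm:.4f}")   # supercritical precisely when nu > 1
```

Power iteration converges here because M has strictly positive entries, so the Perron–Frobenius eigenvalue is simple and dominant.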
Fix Gn = IRGn(κn); fix r ≥ 1 and assume that ∂Br(Gn)(o1), ∂Br(Gn)(o2) ≠ ∅. We will prove that

P(distIRGn(κn)(o1, o2) ≤ (1 + ε) logν n | Br(Gn)(o1), Br(Gn)(o2)) = 1 + oP(1). (6.5.68)

We follow the proof of Theorem 6.19 and rely on path-counting techniques. We again take

a = ∂Br(Gn)(o1),    b = ∂Br(Gn)(o2), (6.5.69)

and

Ia,b = [n] \ (Br(Gn)(o1) ∪ Br(Gn)(o2)). (6.5.70)

Recall from (6.5.1) that Nk(a, b) denotes the number of k-step occupied self-avoiding paths connecting a and b.
We aim to use the second-moment method for Nk(a, b), for which we need to investigate the mean and variance of Nk(a, b). Let Pr denote the conditional probability given Br(Gn)(o1) and Br(Gn)(o2), and let Er and Varr denote the corresponding conditional expectation and variance. We compute

Er[Nk(a, b)] = Σ_{~π∈Pk(a,b)} P(~π occupied in Gn) = Σ_{~π∈Pk(a,b)} Π_{l=0}^{k−1} κn(πl, πl+1)/n ≤ (1/n) ⟨x, Tκn^k y⟩, (6.5.71)

where x = (xi)_{i=1}^{t} and y = (yi)_{i=1}^{t}, with xi the number of type-i vertices in ∂Br(Gn)(o1) and yi the number of type-i vertices in ∂Br(Gn)(o2), respectively. An identical lower bound holds with an extra factor (µ̲n − k/n)/µ̲n, where µ̲n = min_{j∈[t]} µn(j) → min_{j∈[t]} µ(j) > 0, by assumption.
Recall the notation and results in Section 3.4, and in particular Theorem 3.10(b). The types of o1 and o2 are asymptotically independent, and the probability that o1 has type j is equal to µn(j), which converges to µ(j). On the event that the type of o1 equals j1, the vector of the numbers of individuals in ∂Br(Gn)(o1) converges in distribution to (Zr(1,j1)(i))i∈[t], which, by Theorem 3.10(b), is close to M∞ xκ(i) for some strictly positive random variable M∞. We conclude that

x −→d Zr(1,j1),    y −→d Zr(2,j2), (6.5.72)

where the limiting branching processes are independent. Equation (6.5.72) replaces the convergence in Lemma 6.20 for GRGn(w).
We conclude that, for k = kn = ⌈(1 + ε) logν n⌉, conditioning on Br(Gn)(o1) and Br(Gn)(o2) such that a = ∂Br(Gn)(o1) and b = ∂Br(Gn)(o2), with ∂Br(Gn)(o1), ∂Br(Gn)(o2) ≠ ∅, we have

Er[Nk(a, b)] −→P ∞. (6.5.73)
For the variance, we again start from

P(~π, ~ρ occupied) = P(~π occupied) P(~ρ occupied | ~π occupied). (6.5.74)
Now recall the definition of a shape in (6.5.17), in Definition 6.16. Fix σ ∈ Shapem,l and ~ρ ∈ Pk(a, b) with Shape(~π, ~ρ) = σ. The factor P(~ρ occupied | ~π occupied), summed out over the free vertices of ~ρ (i.e., those that are not also vertices in ~π), gives rise to m factors of the form Tκn^{ti}(iπui, iπvi)/n, for i ∈ [m] and some vertices πui and πvi on the path (πi)_{i=0}^{k}. We use that, uniformly in q ≥ 1,

max_{i,j∈[t]} (1/n) Tκn^q(i, j) ≤ (C/n) ‖Tκn‖^q. (6.5.75)
Thus, for each of the m subpaths of length ti we obtain a factor (C/n)‖Tκn‖^{ti}. Using that Σ_{i=1}^{m} ti = k − l, by (6.5.18), we arrive at

Σ_{~π,~ρ∈Pk(a,b): Shape(~π,~ρ)=σ} P(~π, ~ρ occupied) ≤ Er[Nk(a, b)] Π_{i=1}^{m} (C/n)‖Tκn‖^{ti} ≤ Er[Nk(a, b)] (C/n)^m ‖Tκn‖^{k−l}. (6.5.76)

This replaces (6.5.21). The proof can now be completed in an identical way to that of (6.5.7), combined with that of (6.5.49) in the proof of Theorem 6.19. We omit further details.
We condition on Br(Gn)(o1) and Br(Gn)(o2), and note that the events {|∂Br(Gn)(o1)| ≥ r} and {|∂Br(Gn)(o2)| ≥ r} are measurable with respect to Br(Gn)(o1) and Br(Gn)(o2), to obtain

P(o1 ↮ o2, |∂Br(Gn)(o1)| ≥ r, |∂Br(Gn)(o2)| ≥ r) = E[ 1{|∂Br(Gn)(o1)|≥r, |∂Br(Gn)(o2)|≥r} P(o1 ↮ o2 | Br(Gn)(o1), Br(Gn)(o2)) ].

In the proof of Theorem 6.1(b) we showed that, on {∂Br(Gn)(o1), ∂Br(Gn)(o2) ≠ ∅},

P(distIRGn(κn)(o1, o2) ≤ (1 + ε) logν n | Br(Gn)(o1), Br(Gn)(o2)) = 1 − oP(1), (6.5.79)

so that also

P(o1 ↮ o2 | Br(Gn)(o1), Br(Gn)(o2)) = oP(1). (6.5.80)

Since P(o1 ↮ o2 | Br(Gn)(o1), Br(Gn)(o2)) ≤ 1, the Dominated Convergence Theorem [V1, Theorem A.1] completes the proof of (2.6.38) for IRGn(κn), as required.
Figure 6.6 (a) Diameters of the 727 networks of size larger than 10,000, from the KONECT data base, and (b) the 721 diameters that are at most 40.
6.6 Related Results on Distances in Inhomogeneous Random Graphs

In this section we discuss some related results for inhomogeneous random graphs. While we give some intuition about their proofs, we do not include them in full detail.
See Figure 6.6 for the diameters of networks in the KONECT data base. While there are
some networks with quite large diameters (often corresponding to road or other spatial net-
works), the diameters in the majority of the networks are quite small.
We next investigate the diameter of an IRGn(κn), which tends to be much larger than the typical distances, owing to long thin lines that are distributed as an IRGn(κn) with a subcritical κn, by a duality principle for IRGn(κn). Before we state the results, we introduce the notion of the dual kernel:

Definition 6.25 (Dual kernel for IRGn(κn)) Let (κn) be a sequence of supercritical kernels with limit κ. The limiting dual kernel is the kernel κ̂ defined by κ̂(x, y) = κ(x, y) with reference measure dµ̂(x) = (1 − ζκ(x))µ(dx). Note that this reference measure integrates to 1 − ζκ > 0, not to 1. J
The dual kernel describes the graph that remains after the removal of the giant component. Here, the reference measure µ̂ measures the structure of the types of vertices in the graph. Indeed, a vertex x is in the giant component with probability ζκ(x); if in fact it is in the giant then it must be removed. Thus, µ̂ describes the proportion of vertices, of various types, that are outside the giant component. As before, we define the operator Tκ̂ by

(Tκ̂ f)(x) = ∫_S κ̂(x, y) f(y) µ̂(dy) = ∫_S κ(x, y) f(y) [1 − ζκ(y)] µ(dy), (6.6.2)
where the corresponding operator norm ‖Tκ̂‖ is taken with respect to ‖·‖µ̂, with

‖f‖µ̂² = ∫_S f²(x) µ̂(dx). (6.6.4)
The following theorem describes the diameter in terms of the above notation:

Theorem 6.26 (Diameter of IRGn(κn) in the finite-types case) Let (κn) be a sequence of kernels with limit κ, which has finitely many types. If 0 < ‖Tκ‖ < 1 then

diam(IRGn(κn))/log n −→P 1/log(1/‖Tκ‖) (6.6.5)

as n → ∞. If ‖Tκ‖ > 1 and κ is irreducible then

diam(IRGn(κn))/log n −→P 2/log(1/‖Tκ̂‖) + 1/log ‖Tκ‖, (6.6.6)

where κ̂ is the dual kernel to κ.
If we compare Theorem 6.26 with Theorem 6.2 then we see that the diameter has the same scaling as the typical distance when ‖Tκ‖ < ∞, but that diam(IRGn(κn))/log n converges in probability to a strictly larger limit than the one when distIRGn(κn)(o1, o2)/log n is conditioned on being finite. This effect is particularly noticeable in the case of rank-1 models with τ ∈ (2, 3), where, conditional on its being finite, distIRGn(κn)(o1, o2)/log log n converges in probability to a finite limit, while diam(IRGn(κn))/log n converges to a non-zero limit. This can be explained by noticing that the diameter in IRGn(κn) is due to very thin lines of length of order log n. Since these lines involve only very few vertices, they do not contribute to distIRGn(κn)(o1, o2) but they do contribute to diam(IRGn(κn)). This is another argument for why we prefer to work with typical distances rather than the diameter. Exercise 6.28 investigates the consequences for ERn(λ/n).
We do not prove Theorem 6.26 here. For GRGn (w), it also follows from Theorem 7.19
below, which states a related result for the configuration model.
Equation (6.6.7) implies that the degrees have finite variance; see Exercise 6.29.
Theorem 6.27 (Limit law for the typical distance in NRn(w)) Consider NRn(w), where the weights w = (wv)v∈[n] are given by wv = [1 − F]^{−1}(v/n) as in (1.3.15), with F satisfying (6.6.7), and let ν = E[W²]/E[W] > 1. For k ≥ 1, define ak = ⌊logν k⌋ − logν k ∈ (−1, 0]. Then, there exist random variables (Ra)a∈(−1,0] with

lim_{K→∞} inf_{a∈(−1,0]} P(|Ra| ≤ K) = 1 (6.6.8)

such that, as n → ∞ and for all k ∈ Z, with o1, o2 chosen independently and uar from [n],

P(distNRn(w)(o1, o2) − ⌊logν n⌋ = k | distNRn(w)(o1, o2) < ∞) = P(Ran = k) + o(1).
6.7 Notes and Discussion

Versions of these results were first proved for CLn(w), in the case of admissible deterministic weights. We refer to (Chung and Lu, 2003, p. 94) for the definition of admissible weight sequences.
Theorem 6.2 has a long history, and many versions of it have been proved in the literature. We refer the
reader to Chung and Lu (2002a, 2003) for the Chung–Lu model, and van den Esker et al. (2008) for its
extensions to the Norros–Reittu model and the generalized random graph.
Theorem 6.3 for the random graph with prescribed expected degrees, or Chung–Lu model, was first proved by Chung and Lu (2002a, 2003), in the case of deterministic weights wv = c(n/v)^{1/(τ−1)} having average degree strictly greater than 1 and maximum weight m satisfying log m ≫ log n/log log n. These restrictions were lifted in (Durrett, 2007, Theorem 4.5.2). Indeed, the bound on the average degree is not necessary, since, for τ ∈ (2, 3), ν = ∞ and therefore the IRG is always supercritical. An upper bound as in Theorem 6.3 for the Norros–Reittu model with iid weights was proved by Norros and Reittu (2006). Theorem 6.3 has been proved in many versions, both fully as well as in partial forms; see, e.g., Norros and Reittu (2006); Chung and Lu (2002a, 2003); Dereich et al. (2012).
6.8 Exercises for Chapter 6

Exercise 6.3 (Power-law tails in key example of deterministic weights) Let w be defined as wv = [1 − F]^{−1}(v/n) as in (1.3.15), and assume that F satisfies
Exercise 6.13 (Bound on truncated forward degree νn(b)) Assume that (6.3.21) holds. Prove the bound on νn(b) in (6.3.38) by combining (1.4.12) in Lemma 1.22 with ℓn = Θ(n) by Conditions 1.1(a),(b).
Exercise 6.14 (Ultra-small distances for CLn (w) and GRGn (w)) Complete the proof of the doubly
logarithmic upper bound on typical distances in Theorem 6.11 for CLn (w) and GRGn (w).
Exercise 6.15 (Upper bound on the expected number of paths) Consider an inhomogeneous random graph with edge probabilities pij = ui uj for (ui)i∈[n] ∈ [0, 1]^n. Prove (6.5.4), which states that

E[Nk(a, b)] ≤ ua ub ( Σ_{i∈I\{a,b}} ui² )^{k−1}.
Exercise 6.16 (Variance of two-paths) Consider an inhomogeneous random graph with edge probabilities pij = ui uj for (ui)i∈[n] ∈ [0, 1]^n. Prove that Var(Nk(a, b)) ≤ E[Nk(a, b)] for k = 2.
Exercise 6.17 (Variance of three-paths) Consider an inhomogeneous random graph with edge probabilities pij = ui uj for (ui)i∈[n] ∈ [0, 1]^n. Compute Var(N3(a, b)) explicitly, and compare it with the bound in (6.5.7).
Exercise 6.18 (Connections between sets in NRn(w)) Let A, B ⊆ [n] be two disjoint sets of vertices. Prove that

P(A directly connected to B in NRn(w)) = 1 − e^{−wA wB/ℓn}, (6.8.8)

where wA = Σ_{a∈A} wa is the weight of A.
Exercise 6.19 (Expectation of paths between sets in ERn(λ/n)) Consider ERn(λ/n). Fix A, B ⊆ [n] with A ∩ B = ∅, and let Nk(A, B) denote the number of self-avoiding paths of length k connecting A to B (where a path connecting A and B avoids A and B except for the starting point and endpoint). Show that, for k(|A| + |B|)/n = o(1),

E[Nk(A, B)] = (λ^k/n) |A||B| (1 − (|A| + |B|)/n)^k (1 + o(1)). (6.8.9)
Exercise 6.20 (Variance of path counts for ERn(λ/n) (cont.)) In the setting of Exercise 6.19, use Proposition 6.14 to bound the variance of Nk(A, B), and prove that

Nk(A, B)/E[Nk(A, B)] −→P 1 (6.8.10)

when |A|, |B| → ∞ with |A| + |B| = o(n/k) and k = ⌈logλ n⌉.
Exercise 6.21 (Logarithmic bound for νn when τ = 3) Define

a = o1′,    b = o2′,    I = {i ∈ [n] : wi ∈ [K, √βn]}, (6.8.11)

where o1′, o2′ are independent copies from the size-biased distribution in (6.5.60). Prove that τ = 3 in the form of (6.5.55) and (6.5.56) implies that νI ≥ c log βn for some c > 0 and all n sufficiently large. It may be helpful to use

(1/n) Σ_{i∈I} wi² = E[Wn² 1{Wn∈[K,√βn]}] = 2 ∫_0^{√βn} x [Fn(√βn) − Fn(x ∨ K)] dx. (6.8.12)
Exercise 6.22 (Expected number of paths within Coren diverges) Recall the setting in (6.8.11) in Exercise 6.21. Fix η > 0. Prove that

E[Nk(a, b)] → ∞

for a = o1′, b = o2′, and k = ⌈(1 + η) log n/log νn⌉.
Exercise 6.23 (Concentration of number of paths within Coren) Recall the setting in (6.8.11) in Exercise 6.21. Prove that

Var(Nk(a, b))/E[Nk(a, b)]² → 0

for a = o1′, b = o2′, and k = ⌈(1 + η) log n/log νn⌉.
Exercise 6.24 (Concentration of number of paths within Coren ) Complete the proof of Lemma 6.23 on
the basis of Exercises 6.21–6.23.
Exercise 6.25 (Asymptotic equivalence in Lemma 6.24) Recall the conditions of Theorem 6.22. Prove that CLn(w), GRGn(w), and NRn(w) are asymptotically equivalent when restricted to the edges in [n] × {v : wv ≤ βn} for any βn = o(√n). Hint: Use the asymptotic equivalence in [V1, Theorem 6.18] for general inhomogeneous random graphs.
Exercise 6.26 (Completion of the proof of Lemma 6.24) Complete the proof of (6.5.62) in Lemma 6.24
by adapting the arguments in (6.5.51)–(6.5.54).
Exercise 6.27 (Concentration of the giant in IRGs) In Section 6.5.3, Theorem 3.19 for finite-type inhomo-
geneous random graphs was proved using a path-counting method based on Theorem 6.1(b). Give a direct
proof of the “giant is almost local” condition in (2.6.38) by adapting the argument in Section 2.6.4 for the
Erdős–Rényi random graph. You may assume that µ(s) > 0 for every s ∈ [t] for which µn (s) > 0.
Exercise 6.28 (Diameter of ERn(λ/n)) Recall the asymptotics of the diameter of IRGn(κn) in Theorem 6.26. For ERn(λ/n), show that ‖Tκ‖ = λ and ‖Tκ̂‖ = µλ, where µλ is the dual parameter in [V1, (3.6.6)], so that Theorem 6.26 becomes

diam(ERn(λ/n))/log n −→P 2/log(1/µλ) + 1/log λ. (6.8.13)
Exercise 6.29 (Finite variance of degrees when (6.6.7) holds) Prove that (6.6.7) implies that E[W²] < ∞. Use this to prove that the degrees have uniformly bounded variance when (6.6.7) holds.
Exercise 6.30 (Tightness of centered typical distances in NRn(w)) Prove that, under the conditions of Theorem 6.27, and conditional on distNRn(w)(o1, o2) < ∞, the sequence (distNRn(w)(o1, o2) − ⌊logν n⌋)n≥2 is tight.
Exercise 6.31 (Non-convergence of centered typical distances in NRn(w)) Prove that, under the conditions of Theorem 6.27, and conditional on distNRn(w)(o1, o2) < ∞, the sequence distNRn(w)(o1, o2) − ⌊logν n⌋ does not converge weakly when the distribution of Ra depends continuously on a and when there are a, b ∈ (−1, 0] such that the distribution of Ra is not equal to that of Rb.
Exercise 6.32 (Extension of Theorem 6.27 to GRGn (w) and CLn (w)) Use [V1, Theorem 6.18] to prove
that Theorem 6.27 holds verbatim for GRGn (w) and CLn (w) when (6.6.7) holds. Hint: Use asymptotic
equivalence.
Exercise 6.33 (Extension of the lower bound in Theorem 6.28 to all α > −1/2) Consider NRn(w) with weights w satisfying

P(Wn > x) = x^{−2} (log x)^{2α+o(1)}, (6.8.14)

for all x ≤ n^ε for some ε > 0, where Wn = wo is the weight of a uniform vertex o in [n]. Prove the lower bound in Theorem 6.28 for all α > −1/2.
Exercise 6.34 (Extension of the lower bound in Theorem 6.28 to α < −1/2) Consider NRn(w) as in Exercise 6.33, but now with α < −1/2. Let ν = E[W²]/E[W] < ∞. Prove that the lower bound in Theorem 6.28 is replaced by

P(distGRGn(w)(o1, o2) ≤ (1 − ε) logν n) = o(1). (6.8.15)
Chapter 7

Small-World Phenomena in Configuration Models
Abstract
In this chapter we investigate the distance structure of the configuration model
by investigating its typical distances and its diameter. We adapt the path-
counting techniques in Section 6.5 to the configuration model, and obtain typ-
ical distances from the “giant is almost local” proof. To understand the ultra-
small distances for infinite-variance degree configuration models, we investi-
gate the generation growth of infinite-mean branching processes. The relation
to branching processes informally leads to the power-iteration technique, which
allows one to deduce typical distance results in a relatively straightforward way.
In this chapter we investigate graph distances in the configuration model. We start with a
motivating example.
Motivating Example
Recall Figure 1.5(a), in which graph distances in the Autonomous Systems (AS) graph in the
Internet, also called AS counts, are shown. A relevant question is whether such a histogram
can be predicted by the graph distances in a random graph model having a similar degree
structure and size to the AS graph. Figure 7.1 compares simulations of the typical distances
for τ ∈ (2, 3) with the distances in the AS graph, with n = 10, 940 equal to the number of
autonomous systems, and τ = 2.25 the best approximation to the degree power-law expo-
nent of the AS graph. We see that the typical distances in the configuration model CMn (d)
and the AS counts are quite close. Further, Figure 7.2 shows the 90% percentile of typical
Figure 7.1 Number of AS traversed in hopcount data (lighter gray), and, for comparison, the model (darker gray) with τ = 2.25, n = 10,940.
Figure 7.2 90th percentile of typical distances in the 727 networks of size larger than 10,000 from the KONECT data base.
distances in the KONECT data base (recall that Figure 6.1 indicates the median value). We
see that this 90th percentile mostly remains relatively small, even for large networks.
Figures 7.1 and 7.2 again raise the question of how graph distances depend on the structure
of the random graphs and real-world networks in question, such as their size and degree
structure. The configuration model is highly flexible, in the sense that it offers complete
freedom in the choice of the degree distribution. Thus, we can use the configuration model
(CM) to single out the relation between graph distances and degree structure, in a similar
way to that in which we investigate the giant component size and connectivity as a function
of the degree distribution, as discussed in detail in Chapter 4. Finally, we can verify whether
graph distances in CM are closely related to those in inhomogeneous random graphs, as dis-
cussed in Chapter 6, so as to detect another sign of the wished-for universality of structural
properties of random graphs with similar degree distributions.
In this section we describe the main results on typical distances in the CM, both in the case of
finite-variance degrees and in the case of infinite-variance degrees. These results are proved
in the following section.
Figure 7.3 Typical distances between 2,000 pairs of vertices in the configuration model with n = 100,000 and (a) τ = 2.5 and (b) τ = 3.5.
cases τ = 2 and τ = 3. In these cases it can be expected that the results depend on finer
properties of the degrees. We will present some results along the way, when such results
follow relatively straightforwardly from our proofs.
7.3 Proofs of Small-World Results for the Configuration Model

In this section we give the proofs of Theorems 7.1 and 7.2 describing the small-world properties in CMn(d). These proofs are adaptations of the proofs of Theorems 6.2 and 6.3, and we focus on the differences in the proofs.
The section is organized as follows. In Section 7.3.1 we give a branching-process ap-
proximation for the neighborhoods of a pair of uniform vertices in CMn (d), using the local
convergence in Theorem 4.1 and its proof. In Section 7.3.2 we use path-counting upper
bounds to prove lower bounds on typical distances by the first-moment method. In Section
7.3.3 we employ path-counting techniques similar to those in Section 6.5.1 using second-
moment methods, adapted to the CM, where edges are formed by pairing half-edges. We use
these to prove logarithmic upper bounds on graph distances. We close in Section 7.3.4 by
proving doubly logarithmic upper bounds on typical distances of CMs with infinite-variance
degrees. We also discuss the diameter of the core of high-degree vertices.
For a path ~π as in (7.3.3), we write ~π ⊆ CMn (d) when the path ~π in (7.3.3) is present in
CMn (d), so that the half-edge corresponding to si is paired with the half-edge correspond-
ing to ti+1 for i = 0, . . . , k − 1. Without loss of generality, we assume throughout that the
path ~π is self-avoiding, i.e., that π0 , . . . , πk are distinct vertices.
In this section, we perform first-moment computations on the number of paths present in
CMn (d). In the next section, we perform second-moment methods.
Proof The probability that the path ~π in (7.3.3) is present in CMn(d) is equal to

P(~π ⊆ CMn(d)) = Π_{i=1}^{k} 1/(ℓn − 2i + 1), (7.3.6)
Substituting π0 = a, πk = b, we arrive at

E[Nk(a, b)] = (da db/(ℓn − 2k + 1)) Σ*_{π1,...,πk−1} Π_{i=1}^{k−1} dπi(dπi − 1)/(ℓn − 2i + 1), (7.3.8)

where the sum is over distinct elements of I \ {a, b} (as indicated by the asterisk). Let R denote the subset of vertices of I \ {a, b} for which di ≥ 2. Then

E[Nk(a, b)] = (da db/(ℓn − 2k + 1)) Σ*_{π1,...,πk−1∈R} Π_{i=1}^{k−1} dπi(dπi − 1)/(ℓn − 2i + 1). (7.3.9)
By an inequality of Maclaurin (Hardy et al., 1988, Theorem 52), for r = |R|, 2 ≤ k ≤ r + 1, and any (ai)i∈R with ai ≥ 0, we have

((r − k + 1)!/r!) Σ*_{π1,...,πk−1∈R} Π_{i=1}^{k−1} aπi ≤ ( (1/r) Σ_{i∈R} ai )^{k−1}. (7.3.10)
Finally, we arrive at

E[Nk(a, b)] ≤ (da db/(ℓn − 2k + 1)) (ℓn νI/r)^{k−1} Π_{i=1}^{k−1} (r − i + 1)/(ℓn − 2i + 1)
≤ (da db/(ℓn − 2k + 1)) (ℓn/(ℓn − 2k + 3)) νI^{k−1} Π_{i=0}^{k−2} (1 − i/r)/(1 − 2i/ℓn). (7.3.12)
Theorem 7.6 (Logarithmic lower bound on typical distances in CMn(d)) Consider CMn(d) where the degrees d = (dv)v∈[n] satisfy Conditions 1.7(a),(b). Then, for any ε > 0, with o1, o2 chosen independently and uar from [n],

P(distCMn(d)(o1, o2) ≤ (1 − ε) logνn n) = o(1). (7.3.14)

We leave the proof of Theorem 7.6, which is almost identical to that of Theorem 6.4, with (7.3.5) in Proposition 7.5 to hand, as Exercise 7.5.
We next investigate the τ = 3 case, where the degree distribution has logarithmic corrections to the power law, as also investigated in Theorem 6.28 for NRn(w):

Corollary 7.7 (Critical τ = 3 case: interpolation) Consider CMn(d) where the degrees d = (dv)v∈[n] satisfy Conditions 1.7(a),(b), and there exists an α such that, for all x ≥ 1,

[1 − Fn](x) ≤ c2 x^{−2} (log x)^{2α}. (7.3.15)

Let o1, o2 be chosen independently and uar from [n]. Then, for any ε > 0 and α > −1/2,

P(distCMn(d)(o1, o2) ≤ (1 − ε) log n/((2α + 1) log log n)) = o(1), (7.3.16)

while, for α < −1/2 and with ν = limn→∞ νn < ∞,

P(distCMn(d)(o1, o2) ≤ (1 − ε) logν n) = o(1). (7.3.17)
Equation (7.3.20) replaces the similar identity (6.3.6) for CLn(w). We see that wπ0 and wπk in (6.3.6) are replaced by dπ0 and dπk in (7.3.20), and, for i ∈ [k − 1], the factors wπi² in (6.3.6) are replaced by dπi(dπi − 1) in (7.3.20), while the factors ℓn in (6.3.6) are replaced by ℓn − 2i + 1 for i ∈ [k] in (7.3.20).
Define, as in (6.3.31),

νn(b) = (1/ℓn) Σ_{v∈[n]} dv(dv − 1) 1{dv≤b}. (7.3.21)

Then, the arguments in Section 6.3.2 imply that (see in particular Exercise 6.12)

P(distCMn(d)(u, v) ≤ kn) ≤ (du dv/ℓn) Σ_{k=1}^{kn} (ℓn^k (ℓn − 2k − 1)!!/(ℓn − 1)!!) Π_{l=1}^{k−1} νn(bl ∧ bk−l)
+ (du + dv) Σ_{k=1}^{kn} (ℓn^k (ℓn − 2k − 1)!!/(ℓn − 1)!!) [1 − Fn?](bk) Π_{l=1}^{k} νn(bl), (7.3.22)
i.e., the bound in (6.8.7) is changed by factors ℓn^k(ℓn − 2k − 1)!!/(ℓn − 1)!! in the sums. For k = O(log log n) and when Conditions 1.7(a),(b) hold,

ℓn^k (ℓn − 2k − 1)!!/(ℓn − 1)!! = Π_{i=1}^{k} ℓn/(ℓn − 2i + 1) = 1 + O(k²/ℓn) = 1 + o(1), (7.3.23)

so this change has a negligible effect. Since (6.3.38) in Lemma 6.10 applies under the conditions of Theorem 7.8, we can follow the proof of Theorem 6.7 verbatim.
7.3.3 Path-Counting Lower Bounds and Resulting Distance Upper Bounds
In this subsection we provide upper bounds on typical distances in CMn (d). We start by
using the “giant is almost local” results proved in Section 4.3.1; see in particular Remark
4.13. After this, we continue with path-counting techniques similar to those in Section 6.5.1,
focussing on the variance of the number of paths in CMn (d). Such estimates turn out to be
extremely versatile and can be used extensively to prove various upper bounds on distances,
as we will show in the remainder of the section.
Theorem 7.9 (Logarithmic upper bound on graph distances in CMn (d)) Consider CMn (d)
where the degrees d = (dv )v∈[n] satisfy Conditions 1.7(a)–(c) with ν = E[D(D−1)]/E[D]
∈ (1, ∞). Then, for any ε > 0, with o1 , o2 chosen independently and uar from [n],
P(distCMn (d) (o1 , o2 ) ≤ (1 + ε) logν n | distCMn (d) (o1 , o2 ) < ∞) = 1 + o(1). (7.3.24)
Proof Recall Section 4.3.1, where the degree-truncation technique from Theorem 1.11 was used with b sufficiently large. Recall that CMn′(d′) denotes the CM after the degree-truncation method has been applied. Then, for Gn = CMn′(d′) with d′max ≤ b, the proof shows that, when |∂Br(Gn)(o1)|, |∂Br(Gn)(o2)| ≥ r, whp also distCMn′(d′)(o1, o2) ≤ logνn′ n (1 + oP(1)) (recall Remark 4.13). Here νn′ = E[Dn′(Dn′ − 1)]/E[Dn′] denotes the expected forward degree of a uniform half-edge in CMn′(d′).
We next relate this to the bound in (7.3.24). We note that νn′ → ν′ when Conditions 1.7(a),(b) hold, where, by construction,

ν′ = E[D′(D′ − 1)]/E[D′] = E[(D ∧ b)((D ∧ b) − 1)]/E[D]. (7.3.25)

The latter equality holds since Σ_{v∈[n′]} d′v = Σ_{v∈[n]} dv, and d′v = dv ∧ b for v ∈ [n], while d′v = 1 for v ∈ [n′] \ [n]. Thus, we also have

Σ_{v∈[n′]} d′v(d′v − 1) = Σ_{v∈[n]} d′v(d′v − 1) = Σ_{v∈[n]} (dv ∧ b)((dv ∧ b) − 1). (7.3.26)

Since also P(distCMn(d)(o1, o2) < ∞) → ζ² and P(distCMn′(d′)(o1, o2) < ∞) → (ζ′)², where ζ′ = ζ′(b) → ζ when b → ∞, this gives the first proof of the upper bound in Theorem 7.9.
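The effect of the truncation in (7.3.25) can be seen numerically. The sketch below uses an illustrative power-law degree law P(D = d) proportional to d^{−τ} on d ≥ 2 with τ = 3.5 (truncated at a large dmax for computability); the truncated ratio ν′ increases towards ν as b grows:

```python
tau, dmax = 3.5, 10**5
ds = range(2, dmax + 1)                       # degrees at least 2 (illustrative)
w = [d ** (-tau) for d in ds]
Z = sum(w)                                    # normalizing constant
ED = sum(d * p for d, p in zip(ds, w)) / Z
nu = sum(d * (d - 1) * p for d, p in zip(ds, w)) / Z / ED
for b in (2, 4, 8, 16, 32, 64):
    nub = sum(min(d, b) * (min(d, b) - 1) * p for d, p in zip(ds, w)) / Z / ED
    print(f"b = {b:3d}: nu' = {nub:.4f}   (nu = {nu:.4f})")
```

Since ν′ ↑ ν as b → ∞, the truncated exponent logν′ n can be made arbitrarily close to logν n, which is precisely how the (1 + ε) slack in (7.3.24) is used.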
Exercise 7.7 shows that the analysis in Section 4.3.1 can be performed without the degree-
truncation argument of Theorem 1.11 when lim supn→∞ E[Dn3 ] < ∞. Exercise 7.8 extends
the condition to lim supn→∞ E[Dnp ] < ∞ for some p > 2.
where Ia,b,k is the subset of I in which a and b, as well as the k − 1 indices with highest degrees, have been removed. Let

νI = (1/ℓn) Σ_{i∈I} di(di − 1),    γI = (1/ℓn^{3/2}) Σ_{i∈I} di(di − 1)(di − 2). (7.3.30)
The following proposition replaces the similar Proposition 6.14 for CLn(w), which was crucial in deriving upper bounds on typical distances:

Proposition 7.10 (Variance of number of paths) For any k ≥ 1, a, b ∈ I,

Proof Recall that Nk(a, b) is the number of paths ~π of length k between the vertices a and b, where a path was defined in (7.3.3). Since Nk(a, b) is a sum of indicators, its variance can be written as in (6.5.13); the sum over all ~π and ~ρ gives rise to the first contribution to ek′. For the other contributions, we follow the proof of (6.5.13) for NRn(w), and omit further details.
With Proposition 7.10 in hand, we can straightforwardly adapt the proof of Theorem 6.19
to CMn (d) to prove Theorem 7.9. We leave this proof of Theorem 7.9 as Exercise 7.9.
Theorem 7.11 (Doubly logarithmic upper bound on typical distance for τ ∈ (2, 3)) Consider CMn(d) where the degrees d = (dv)v∈[n] satisfy Conditions 1.7(a),(b) and (7.3.38). Then, for any ε > 0, with o1, o2 chosen independently and uar from [n],

lim_{n→∞} P( distCMn(d)(o1, o2) ≤ 2(1 + ε) log log n/|log(τ − 2)| | distCMn(d)(o1, o2) < ∞ ) = 1. (7.3.39)
Exercise 7.10 explores a proof of Theorem 7.11, based on the proof for NRn(w) in Theorem 6.11, that is an alternative to the proof given below.

Proof This proof of Theorem 7.11 makes precise the statement that vertices of large degree d are directly connected to vertices of degree approximately d^{1/(τ−2)}.
Connectivity of Sets
We start by studying the connectivity of sets in CMn(d), for which we rely on the following connectivity lemma, which is of independent interest:

Lemma 7.12 (Connectivity of sets in CMn(d)) For any two sets of vertices A, B ⊆ [n],

P(A not directly connected to B in CMn(d)) ≤ e^{−dA dB/(2ℓn)}, (7.3.40)

where, for any A ⊆ [n],

dA = Σ_{a∈A} da. (7.3.41)
We now resume the proof of Theorem 7.11, for CMn(d) whose degrees satisfy Conditions 1.7(a),(b) and (7.3.38). Fix r ≥ 1, and condition on Br(Gn)(o1) and Br(Gn)(o2) such that ∂Br(Gn)(o1) ≠ ∅ and ∂Br(Gn)(o2) ≠ ∅. By Corollary 7.3, (Zr(n;1), Zr(n;2)) −→d (Zr(1), Zr(2)), and Zr(n;1) and Zr(n;2) are whp quite large since we are conditioning on Zr(n;1) ≥ 1 and Zr(n;2) ≥ 1. Fix C > 1 large, and note that, by Lemma 7.12, the conditional probability that none of the Zr(n;i) half-edges is paired to a vertex of degree at least d is at most

exp{ −Zr(n;i) Σ_{v∈[n]} dv 1{dv≥d} / (2ℓn) }. (7.3.44)
where c = c1/(2 supn E[Dn]). With d = (Zr(n;i))^{1/(τ−2+ε)}, the probability in (7.3.44) is at most exp{−c(Zr(n;i))^{ε/(τ−2+ε)}}. Call the maximal-degree vertex to which one of the Zr(n;i) half-edges is paired the first power-iteration vertex.
We now iterate these ideas. Denote uk = uk(i) = (Zr(n;i))^{(1/(τ−2+ε))^k}. Then, the probability that the (k − 1)th power-iteration vertex is not paired to a vertex of degree at least uk is at most exp{−c uk−1^{ε/(τ−2+ε)}}, and we call the maximum-degree vertex to which the (k − 1)th power-iteration vertex is paired the kth power-iteration vertex.
We iterate this until we reach one of the hubs in {v : dv > n^β}, where β > 1/2, for which we need at most kn? iterations, with kn? satisfying

ukn? = (Zr(n;i))^{(1/(τ−2+ε))^{kn?}} ≥ n^β. (7.3.46)

The smallest kn? for which this occurs is

kn? = ⌈ (log log(n^β) − log log(Zr(n;i))) / |log(τ − 2 + ε)| ⌉. (7.3.47)
Finally, the probability that power iteration fails from vertex oi is at most

Σ_{k=1}^{∞} exp{−c uk−1^{ε/(τ−2+ε)}} −→P 0, (7.3.48)
Define Coren to be the set of vertices of degree at least (log n)^σ. Then the diameter of the core is bounded in the following theorem, which is interesting in its own right:
Theorem 7.13 (Diameter of the core) Consider CMn(d) where the degrees d = (dv)v∈[n] satisfy Conditions 1.7(a),(b) and (7.3.38). For any σ > 1/(3 − τ), the diameter of Coren is whp bounded above by

2 log log n/|log(τ − 2)| + 1. (7.3.50)
We prove Theorem 7.13 below and start by setting up the notation for it. We note that (7.2.2) implies that, for some β ∈ (1/2, 1/(τ − 1)), there are vertices of degree at least u1 = n^β. Define

Γ1 = {v ∈ [n] : dv ≥ u1}, (7.3.52)

so that Γ1 ≠ ∅. For some constant C > 0 to be determined later on, and for k ≥ 2, we recursively define

uk = C log n · uk−1^{τ−2}. (7.3.53)

Lemma 7.14 (Asymptotics of the recursion) For every k ≥ 1,

uk = (C log n)^{ak} n^{bk}, (7.3.54)

where

bk = β(τ − 2)^{k−1},    ak = [1 − (τ − 2)^{k−1}]/(3 − τ). (7.3.55)

Further, for k ≥ 2, define

Γk = {v ∈ [n] : dv ≥ uk}. (7.3.57)
The key step in the proof of Theorem 7.13 is the following proposition, showing that whp every vertex in Γk is connected to a vertex in Γk−1:

Proposition 7.15 (Connectivity between Γk−1 and Γk) Consider CMn(d) where the degrees d = (dv)v∈[n] satisfy Conditions 1.7(a),(b) and (7.3.38). Fix k ≥ 2, and take C > 2E[D]/c1, with c1 as in (7.3.38). Then the probability that there exists an i ∈ Γk that is not directly connected to Γk−1 in CMn(d) is at most n^{−δ}, for some δ > 0 that is independent of k.
By (7.3.60) and Lemma 7.12, using Boole's inequality, the probability that there exists a v ∈ Γk that is not directly connected to Γk−1 is bounded by

n exp{ −uk n uk−1 [1 − F(uk−1)]/(2ℓn) } ≤ n exp{ −c uk (uk−1)^{2−τ}/(2E[Dn]) } = n^{1−cC/(2E[Dn])}, (7.3.61)

where we use (7.3.53). By Conditions 1.7(a),(b), E[Dn] → E[D], so that, as n → ∞ and taking C > 2E[D]/c, we obtain the claim for any δ < cC/(2E[D]) − 1.
We now complete the proof of Theorem 7.13:

Proof of Theorem 7.13. Fix

kn? = ⌊ log log n / |log(τ − 2)| ⌋. (7.3.62)

As a result of Proposition 7.15 and the fact that kn? n^{−δ} = o(1), whp every i ∈ Γk is directly connected to Γk−1 for all k ≤ kn?. By Exercise 7.11, Γ1 whp forms a complete graph. As a result, the diameter of Γkn? is at most 2kn? + 1. Therefore, it suffices to prove that

Coren ⊆ Γkn?. (7.3.63)

By (7.3.53), in turn, this is equivalent to ukn? ≤ (log n)^σ, for any σ > 1/(3 − τ). According to Lemma 7.14,

ukn? = (C log n)^{akn?} n^{bkn?}. (7.3.64)

We note that n^{bkn?} = exp{β(τ − 2)^{kn?−1} log n}. Since, for τ ∈ (2, 3),

x (τ − 2)^{log x/|log(τ−2)|} = x × x^{−1} = 1, (7.3.65)

we find with x = log n that n^{bkn?} ≤ e^{1/(τ−2)}. Further, ak → 1/(3 − τ) as k → ∞, so that (C log n)^{akn?} = (C log n)^{1/(3−τ)+o(1)}. We conclude that

ukn? = (log n)^{1/(3−τ)+o(1)}, (7.3.66)

so that, by choosing n sufficiently large, we can make 1/(3 − τ) + o(1) ≤ σ. This completes the proof of Theorem 7.13.
Exercise 7.12 studies an alternative proof of Theorem 7.11 that proceeds by showing that
whp a short path exists between ∂Br(Gn ) (o1 ) and Coren when ∂Br(Gn ) (o1 ) is non-empty.
Critical Case τ = 2
We next discuss the critical case where τ = 2 and the degree distribution has logarithmic corrections. We use adaptations of the power-iteration technique. Let us focus on one specific example, where

[1 − Fn](x) ≥ (c1/x) (log x)^{−α}, (7.3.67)

for all x ≤ n^β and some β > 1/2. We take α > 1, since otherwise (Dn)n≥1 might not be uniformly integrable. Our main result is as follows:
Theorem 7.16 (Example of ultra-ultra-small distances for τ = 2) Consider CMn (d)
where the degrees d = (dv )v∈[n] satisfy Conditions 1.7(a),(b) and (7.3.67). Fix r ≥ 1 and
(1−ε)/(α−1)
let u0 = r. Define, for k ≥ 1, recursively uk = exp uk−1 . Let kn? = inf{k : uk ≥
β
n }. Then, with o1 , o2 chosen independently and uar from [n],
P(distCMn (d) (o1 , o2 ) ≤ 2(kn? + r) + 1 | distCMn (d) (o1 , o2 ) < ∞) → 1, (7.3.68)
when first n → ∞ followed by r → ∞.
Proof We start from the setting discussed just above (7.3.44) and see how power iteration
applies in this case. We compute
1 X
dv 1{dv ≥d} = E[Dn 1{Dn ≥d} ] = P(Dn 1{Dn ≥d} > k)
X
n v∈[n] k≥0
X X
= P(Dn > k) = [1 − Fn ](k). (7.3.69)
k≥d k≥d
(1−ε)/(α−1)
Denote u0 = r, and recursively define uk = exp{uk−1 } for k ≥ 1. Again the k th
power-iteration vertex is the maximal-degree vertex to which the (k − 1)th power-iteration
vertex is connected.
The probability that the (k − 1)th power-iteration vertex is not paired to a vertex of
degree at least uk is at most exp{−cuεk−1 }. Recall that kn? = inf{k : uk ≥ nβ } denotes the
number of steps needed to reach a hub. Since the set of hubs {v : dv ≥ nβ } is whp a clique,
by Exercise 7.11, the probability that distCMn (d) (o1 , o2 ) > 2(r + kn? ) + 1 is at most
ε
X
oP (1) + 2 e−cuk−1 → 0, (7.3.72)
k≥1
Recall the branching-process limit for neighborhoods in CMn (d) in Corollary 7.3. When
τ ∈ (2, 3), the branching processes (Zj(1) )j≥0 and (Zj(2) )j≥0 are well defined but have infi-
nite mean in generations 2, 3, etc. In this section we give a scaling result for the generation
sizes for branching processes with infinite mean. This result is crucial to describe the fluctua-
tions of the typical distances in CMn (d), and it also allows us to understand how ultra-small
distances of order log log n arise. The main result in this section is as follows:
Theorem 7.17 (Branching processes with infinite mean) Let (Zk )k≥0 be a branching pro-
cess with offspring distribution Z1 = X having distribution function FX . Assume that there
exist α ∈ (0, 1) and a non-negative non-increasing function x 7→ γ(x) such that
x−α−γ(x) ≤ 1 − FX (x) ≤ x−α+γ(x) , for large x, (7.4.1)
where x 7→ γ(x) satisfies
In the proof of Theorem 7.17 for the special case γ(x) = c(log x)γ−1 , we rely on regu-
larly varying functions. Note, however, that (7.4.1) does not assume that x 7→ 1 − FX (x)
is regularly varying (meaning that xα [1 − FX (x)] is slowly varying; recall Definition 1.19).
Thus, instead, we work with the special slowly varying functions
1 − FX± (x) = x−α±γ(x) . (7.4.2)
Proof of Theorem 7.17 for γ(x) = c(log x)γ−1 . The proof is divided into five main steps.
The Split
We first assume that P(Z1 ≥ 1) = 1, so that the survival probability equals 1. Define
Mk = αk log(Zk ∨ 1). (7.4.3)
For i ≥ 1, we define
!
i (Zi ∨ 1)
Yi = α log . (7.4.4)
(Zi−1 ∨ 1)1/α
We make the split
Mk = Y1 + Y2 + · · · + Yk . (7.4.5)
P∞ this split, it is clear that the almost sure convergence of Mk follows when the sum
From
i=0 Yi converges, which is the case when, in turn,
∞
X
E Yi < ∞.
(7.4.6)
i=1
so that Yi = Ui + Vi and
E Yi ≤ E Ui + E Vi .
(7.4.11)
We bound each of these terms separately.
When (7.4.1) holds, and since limx→∞ γ(x) = 0, there exists a constant Cε ≥ 1 such
that, for all n ≥ 1,
un ≤ Cε n1/α+ε . (7.4.12)
This gives a first bound on n 7→ un . We next substitute this bound into (7.4.1) and use that
x 7→ xγ(x) is non-decreasing together with γ(x) = (log x)γ−1 , to obtain
1 + o(1) = n[1 − FX (un )] ≥ n un−α−γ(un )
h n oi
1/α+ε γ
≥ n u−αn exp log C ε n . (7.4.13)
In turn, this implies that there exists a constant c > 0 such that
γ
un ≤ n1/α ec(log n) . (7.4.14)
1/α −c(log n)γ
In a similar way, we can show the matching lower bound un ≥ n e . As a result,
E Ui ≤ cαi E (log (Zi−1 ∨ 1))γ .
(7.4.15)
Using the concavity of x 7→ xγ for γ ∈ [0, 1), as well as Jensen’s inequality, we arrive at
γ
E Ui ≤ cαi E (log (Zi−1 ∨ 1)) = αi(1−γ) E[Mi−1 ]γ .
(7.4.16)
By (7.4.5) and (7.4.7), which implies that E[Mi−1 ] ≤ Kκ/(1 − κ), we arrive at
Kκ γ
E Ui ≤ αi(1−γ) c
, (7.4.17)
1−κ
γ
so that (7.4.7) follows for Ui , with κ = α1−γ < 1 and K replaced by c 1−κ Kκ
. An identical
argument implies that
Kκ γ
E log(u+ − i(1−γ)
Zi−1 ∨1 /u Zi−1 ∨1 ) ≤ α c . (7.4.18)
1−κ
Logarithmic Moment of an Asymptotically Stable Random Variable
In this step, which is the most technical, we bound E Vi . We note that by [V1, Theorem
2.33] and for Zi quite large, the random variable (Zi ∨ 1)/(uZi−1 ∨1 ) should be close to
being a stable random variable. We first add and subtract a convenient additional term, and
write
E Vi = E Vi − 2E log(u+ −
Zi−1 ∨1 /uZi−1 ∨1 )
+ −
+ 2E log(uZi−1 ∨1 /uZi−1 ∨1 ) .
(7.4.19)
308 Small-World Phenomena in Configuration Models
The latter term is bounded in (7.4.18). For the first term, we will rely on stochastic domina-
tion results in terms of 1 − FX± in (7.4.2).
We make use of the relation to stable distributions by obtaining the bound
E Vi − 2E log(u+ −
Zi−1 ∨1 /uZi−1 ∨1 )
n o
≤ αi sup E log Sm /um − 2 log(u+ −
m /u m ) , (7.4.20)
m≥1
where Sm = X1 + · · · + Xm , and (Xi )m i=1 are iid copies of the offspring distribution X .
Our aim is to prove that there exists a constant C > 0 such that, for all m ≥ 1,
E log Sm /um − 2 log(u+ −
m /um ) ≤ C. (7.4.21)
In order to prove (7.4.21), we note that it suffices to obtain the bounds
E log Sm /um + − log(u+ −
m /um ) ≤ C+ , (7.4.22)
E log Sm /um − − log(u+ −
m /um ) ≤ C− , (7.4.23)
≤ E log u− − −
m /(Sm ∧ um ) , (7.4.24)
where x ∧ y = min{x, y} and we have used (7.4.9). The random variables Xi− have a
regularly varying tail, so that we can use extreme-value theory in order to bound the above
quantity.
− −
The function x 7→ log (um /(x ∧ um ) is non-increasing, and, since Sm ≥ X(m) where
− −
X(m) = max1≤i≤m Xi , we arrive at
E log u− − −
≤ E log u− − −
m /(Sm ∧ um ) m /(X(m) ∧ um ) . (7.4.25)
We next use that, for x ≥ 1, x 7→ log(x) is concave, so that, for every s ≥ 0,
1
E log u− − −
= E log (u− − − s
m /(X(m) ∧ um ) m /(X(m) ∧ um ))
s
1
− s
≤ log E u− −
m /(X(m) ∧ um )
s
1 1 − −s
≤ + log (u− m ) s
E (X(m) ) , (7.4.26)
s s
where, in the last step, we have made use of the fact that u− − −
m /(x ∧ um ) ≤ 1 + um /x.
−s s −
Now rewrite X(m) as (−Y(m) ) , where Yj = −1/Xj and Y(m) = max1≤j≤m Yj . Clearly,
Yj ∈ [−1, 0] since Xi− ≥ 1, so that E[(−Y1 )s ] < ∞. Also, u− −
m Y(m) = −um /X(m)
7.4 Branching Processes with Infinite Mean 309
and which together with (7.4.28) proves (7.4.22) with C+ = 1/s + 2s/2 Cs /s.
By (7.4.1),
1/α
P(Y1 > x) = P(Z1 > ex ) = e−x(1+o(1)) , (7.4.33)
which shows that Y1 satisfies (7.4.31). The equality in (7.4.32) together with (7.4.4) suggests
that the tails of Y1 are equal to those of Y , which heuristically explains (7.4.31). Exercise
7.23 gives an example where the limit Y is exactly exponential, so that the asymptotics in
Theorem 7.18 is exact. The key behind this argument is the fact that Y in Theorem 7.17
satisfies the distributional equation
d X
Y = α max Yi , (7.4.34)
i=1
where (Yi )i≥1 are iid copies of Y that are independent of X (see Exercise 7.22).
(the first approximation being true for fairly small k , but not necessarily for large k ), this
suggests that distCMn (d) (o1 , o2 ) ≈ kn , where, by Theorem 7.17,
n o
Θ(n) = Zk(1)n = exp (τ − 2)−kn Y (1) (1 + oP (1)) , (7.4.37)
which in turn suggests that distCMn (d) (o1 , o2 ) ≈ log log n/| log(τ −2)|. Of course, for such
values the branching-process approximation may fail miserably, and in fact it does. This is
exemplified by the fact that the rhs of (7.4.37) can become much larger than n, which is
clearly impossible for |∂Bk(Gnn ) (o1 )|.
More intriguingly, we see that the proposed typical distances are a factor 2 too small
compared with Theorem 7.2. The reason is that the double-exponential growth can clearly
no longer be valid when Zk(n;1) becomes too large, and thus, Zk(n;1) must be far away from Zk(1)
in this regime. The whole problem is that we are using the branching-process approximation
well beyond its “expiry date.”
Hence, let us try this again but, rather than using it for one neighborhood, let us use the
branching-process approximation from two sides. Now we rely on the statement that
P(distCMn (d) (o1 , o2 ) ≤ 2k) = P(∂Bk(Gn ) (o1 ) ∩ ∂Bk(Gn ) (o2 ) 6= ∅). (7.4.38)
Again using (7.4.36), we see that
log |∂Bk(Gn ) (oi )| ≈ (τ − 2)−k Y (i) (1 + oP (1)), i ∈ {1, 2},
7.5 Diameter of the Configuration Model 311
where Y (1) and Y (2) are independent copies of the random variable Y in Theorem 7.17.
We see that |∂Bk(Gn ) (o1 )| and |∂Bk(Gn ) (o2 )| grow roughly at the same pace, and, in par-
ticular, we have |∂Bk(Gn ) (oi )| = nΘ(1) roughly at the same time, namely, when k ≈
log log n/| log(τ − 2)|. Thus, we conclude that
distCMn (d) (o1 , o2 ) ≈ 2 log log n/| log(τ − 2)|,
as rigorously proved in Theorem 7.2. We will see in more detail in Theorem 7.25 that the
above growth from two sides does allow for better branching-process approximations.
P(Qk > q | Qk−1 = qk−1 ) = 1− P(Qk ≤ q | Qk−1 = qk−1 ) = 1−FX (q)qk−1 . (7.4.39)
that ξ is the extinction probability of the branching process with offspring distribution p? ,
and further define
d ? X X
µ= GD (z) z=ξ = kξ k−1 p?k = k(k + 1)ξ k−1 pk+1 /E[D]. (7.5.1)
dz k≥0 k≥1
When ξ < 1, we also have that µ < 1. Then, the main result is as follows:
Theorem 7.19 (Diameter of the configuration model) Consider CMn (d) where the de-
grees d = (dv )v∈[n] satisfy Conditions 1.7(a),(b). Assume that E[Dn2 ] → E[D2 ] ∈ (0, ∞)∪
{∞}, where ν = E[D(D − 1)]/E[D] > 1. Assume further that n1 = 0 when p1 = 0, and
that n2 = 0 when p2 = 0. Then,
diam(CMn (d)) P 1 1{p >0} 1{p =0,p >0}
−→ +2 1 + 1 2
. (7.5.2)
log n log ν | log µ| | log p?1 |
For finite-variance degrees, we note that, by Theorems 7.1 and 7.19, the diameter of the
configuration model is strictly larger than the typical distance, except when p1 = p2 = 0.
In the latter case, the degrees are at least 3, so that thin lines, consisting of degree-2 vertices
connected to each other, are not possible and the configuration model is whp connected
(recall Theorem 4.24). By [V1, Corollary 7.17] (recall also the discussion around (1.3.41)),
Theorem 7.19 also applies to uniform random graphs with a given degree sequence, when
the degrees have finite second moment, as in the examples below.
We also remark that Theorem 7.19 applies not only to the finite-variance case but also to
the finite-mean and infinite-variance case. In the latter case, the diameter is of order log n
unless p1 = p2 = 0, in which case Theorem 7.19 implies that the diameter is oP (log n). We
will discuss the latter case in more detail in Theorem 7.20 below.
Again, we make essential use of [V1, Theorem 7.18] (recall also Theorem 1.4 and the dis-
cussion below (1.3.29)), which relates the configuration model and the generalized random
graph. We note that ERn (λ/n) is the same as GRGn (w), where wv = nλ/(n − λ) for all
v ∈ [n] (recall [V1, Exercise 6.1]).
Clearly, w = (nλ/(n − λ))v∈[n] satisfies Conditions 1.1(a)–(c), so that the degree se-
quence of ERn (λ/n) also satisfies Conditions 1.7(a)–(c), where the convergence holds in
probability (recall [V1, Theorem 5.12]). From the above identifications and using [V1, The-
orem 7.18], we find that
diam(ERn (λ/n)) P 1 2
−→ + . (7.5.4)
log n log λ | log µλ |
This identifies the diameter of the Erdős–Rényi random graph, for which Theorem 7.19
agrees with Theorem 6.26. Exercise 7.30 investigates the diameter of GRGn (w).
close to log n/[2| log p?1 |]. Now, it turns out that pairs of such vertices realize the asymptotic
diameter, which explains why the diameter is close to log n[1/ log ν + 1/| log p?1 |].
Finally, we discuss what happens when p1 = p2 = 0. In this case, the assumption in
Theorem 7.19 implies that n1 = n2 = 0, so that dmin ≥ 3. Then, CMn (d) is whp connected
(recall Theorem 4.24), and the 2-core is the graph itself. Also, there cannot be any long
thin parts of the giant, since every vertex has degree at least 3 so that local neighborhoods
grow exponentially with overwhelming probability. Therefore, the graph distances and the
diameter have the same asymptotics, as proved by Theorem 7.19 when dmin ≥ 3.
The above case-distinctions explain the intuition behind Theorem 7.19. This intuition is
far from a proof; see Section 7.7 for references.
Lemma 7.22 (Moments of number of minimally k -connected vertices) Let CMn (d) sat-
isfy dmin ≥ 3, ndmin > dmin (dmin − 1)k−1 . For kn ≤ (1 − ε) log log n/ log (dmin − 1),
Mkn
E[Mkn ] → ∞,
P
and −→ 1. (7.5.6)
E[Mkn ]
We leave the proof of Lemma 7.22 to Exercises 7.31–7.33. To complete the proof of the
lower bound on the diameter, we fix ε > 0 sufficiently small and take
l log log n m
kn? = (1 − ε) .
log (dmin − 1)
Clearly,
?
dmin (dmin − 1)kn −1 ≤ (log n)1−ε ≤ `n /8, (7.5.7)
vertices of degree dmin , whp there must be at least two minimally kn? -connected vertices
whose kn? -neighborhoods are disjoint. We fix two such vertices and denote them by v1
?
and v2 . We note that v1 and v2 have precisely dmin (dmin − 1)kn −1 unpaired half-edges
in ∂Bk(Gn? n ) (v1 ) and ∂Bk(Gn? n ) (v2 ). Let A12 denote the event that v1 , v2 are minimally kn? -
connected and their kn? -neighborhoods are disjoint.
Conditional on A12 , the random graph found by collapsing the half-edges in ∂Bk(Gn? n ) (v1 )
to a single vertex a and the half-edges in ∂Bk(Gn? n ) (v1 ) to a single vertex b is a configuration
model on the vertex set {a, b} ∪ [n] \ (Bk(Gn? n ) (v1 ) ∪ Bk(Gn? n ) (v2 )), having degrees d0 given by
?
d0a = d0b = dmin (dmin − 1)kn −1 and d0i = di for every i ∈ [n] \ (Bk(Gn? n ) (v1 ) ∪ Bk(Gn? n ) (v2 )).
By the truncated first-moment method on paths, performed in the proof of Theorem 7.8
(recall (7.3.22)), it follows that, for any ε > 0,
2 log log n
P distCMn (d) (∂Bk(Gn? n ) (v1 ), ∂Bk(Gn? n ) (v2 )) ≤ (1−ε) A12 = o(1). (7.5.9)
| log (τ − 2)|
316 Small-World Phenomena in Configuration Models
Therefore, whp,
2 log log n
diam(CMn (d)) ≥ (1 − ε) + 2kn?
| log (τ − 2)|
h 2 2 i
= (1 − ε) log log n + . (7.5.10)
| log (τ − 2)| log (dmin − 1)
Since ε > 0 is arbitrary, this suggests the lower bound in Theorem 7.20.
3
N=10
4
0.6 N=10
5
N=10
probability
0.4
0.2
0.0
0 1 2 3 4 5
hopcount
Figure 7.4 Typical distances for τ = 1.8 and n = 103 , 104 , 105 .
We will now study the configuration model CMn (d) where the degrees d = (dv )v∈[n] are
an iid sequence of random variables with distribution F satisfying (7.6.1).
We will make heavy use of the results and notation used in [V1, Theorem 7.23], which
we first recall: the random probability distribution P = (Pi )i≥1 is given by
Pi = Zi /Z, (7.6.2)
−1/(τ −1) Pi
where Zi = Γi and Γi = Ei with (Ei )i≥1 an iid sequence of exponential
j=1
P −1/(τ −1)
random variables with parameter 1 and where Z = i≥1 Γi . The latter is finite
almost surely, since 1/(τ − 1) > 1 for τ ∈ (1, 2) (see Exercise 7.34).
Recall further that MP,k is a multinomial distribution with parameters k and (random)
probabilities P = (Pi )i≥1 . Thus, MP,k = (B1 , B2 , . . .), where, conditional on P =
(Pi )i≥1 , Bi is the number of outcomes i in k independent trials such that each outcome
is equal to i with probability Pi .
In [V1, Theorem 7.23], the random variable MP,D1 appears, where D1 is independent of
P = (Pi )i≥1 . We let MP,D
(1)
1
and MP,D
(2)
2
be two such random variables that are conditionally
independent given P = (Pi )i≥1 (but share the same P = (Pi )i≥1 sequence). In terms of
this notation, the main result on distances in CMn (d) when the degrees have infinite mean
is the following:
Theorem 7.23 (Distances in CMn (d) with iid infinite mean degrees) Consider CMn (d)
where the degrees d = (dv )v∈[n] are a sequence of iid copies of D satisfying (7.6.1) for
some τ ∈ (1, 2). Then, with o1 , o2 chosen independently and uar from [n],
lim P(distCMn (d) (o1 , o2 ) = 2) = 1 − lim P(distCMn (d) (o1 , o2 ) = 3) = pF ∈ (0, 1).
n→∞ n→∞
The probability pF can be identified as the probability that an outcome occurs both in MP,D
(1)
1
and MP,D2 , where D1 and D2 are two iid copies of D.
(2)
large, the probability that vertex 1 is not connected to any of the vertices corresponding to
(d(n+1−i) )i∈[K] converges to 0 when first n → ∞ followed by K → ∞.
Let Pn denote the conditional probability given the degrees (dv )v∈[n] . For i ∈ [n], we let
vi be the vertex corresponding to the ith largest degree d(n+1−i) . By Lemma 7.12,
Pn (vi not directly connected to vj ) ≤ exp − d(n+1−i) d(n+1−j) /2`n .
(7.6.4)
Moreover, d(n+1−i) , d(n+1−j) ≥ n1/(τ −1)−ε whp for n sufficiently large and any ε > 0, while
whp `n ≤ n1/(τ −1)+ε . As a result, whp,
when ε > 0 is sufficiently small. Therefore, for fixed K and for every i, j ∈ [K], the
vertices vi and vj are whp neighbors. This implies that the vertices corresponding to the
highest degrees whp form a complete graph.
We have already concluded that 1 is whp connected to vi for some i ≤ K . In the same
way, we conclude that vertex 2 is whp connected to vj for some j ≤ K . Since vi is whp
directly connected to vj , we conclude that
degrees d = (dv )v∈[n] are a sequence of iid copies of D satisfying that there exist τ > 3
and c < ∞ such that, for all x ≥ 1,
Let ν = E[D(D − 1)]/E[D] > 1. For k ≥ 1, let ak = blogν kc − logν k ∈ (−1, 0]. Then,
there exist random variables (Ra )a∈(−1,0] such that, as n → ∞ and for all k ∈ Z, with
o1 , o2 chosen independently and uar from [n],
P distCMn (d) (o1 , o2 ) − blogν nc = k | distCMn (d) (o1 , o2 ) < ∞
(7.6.9)
= P(Ran = k) + o(1).
The random variables (Ra )a∈(−1,0] can be identified by
where Y (1) and Y (2) are independent limit copies of Y in (7.6.7) and κ = E[D]/(ν − 1).
In words, Theorem 7.24 states that, for τ > 3, the graph distance distCMn (d) (o1 , o2 )
between two randomly chosen connected vertices grows as logν n, where n is the size of the
graph, and that the fluctuations around this leading asymptotics remain uniformly bounded
in n. Exercise 7.37 shows that distCMn (d) (o1 , o2 ) − blogν nc converges in distribution along
appropriately chosen subsequences.
The law of Ra is involved, and in most cases cannot be computed exactly. The reason for
this is the fact that the random variables Y (1) and Y (2) that appear in its statement are hard
to compute explicitly (see also [V1, Chapter 3]).
Let us give two examples where the law of Y is known. The first example is the r-regular
random graphs, for which all degrees in the graph are equal to some r ≥ 3. In this case,
E[D] = r, ν = r − 1, and Y = 1 almost surely. In particular, P(distCMn (d) (o1 , o2 ) <
∞) = 1 + o(1). Therefore,
r
P(Ra > k) = exp − (r − 1)a+k ,
(7.6.11)
r−2
and distCMn (d) (o1 , o2 ) is asymptotically equal to logr−1 n. Note that the distribution Ra de-
pends explicitly on a, so that distCMn (d) (o1 , o2 )−blogν nc does not converge in distribution
(see also Exercise 7.38).
The second example for which Y can be explicitly computed is where p? is the proba-
bility mass function of a geometric random variable, in which case the branching-process
generation sizes with offspring p? , conditioned to be positive, converge to an exponential
random variable with parameter 1. This example corresponds to
1
p?j = p(1 − p)j−1 , so that pj = p(1 − p)j−2 , ∀j ≥ 1, (7.6.12)
jcp
and cp is a normalization constant. For p > 12 , Y has the same law as the sum of D1
copies of a random variable that is equal to 0 with probability (1 − p)/p and an exponential
random variable with parameter 1 with probability (2p − 1)/p. Even in this simple case the
computation of the exact law of Ra is non-trivial.
320 Small-World Phenomena in Configuration Models
where cl = 1 if l is even, and zero otherwise, and Y (1) , Y (2) are two independent copies of
the limit random variable Y in Theorem 7.17.
In words, Theorem 7.25 states that for τ ∈ (2, 3) the graph distance distCMn (d) (o1 , o2 )
between two randomly chosen connected vertices grows proportionally to the log log of the
size of the graph and that the fluctuations around this mean remain uniformly bounded in n.
We next discuss an extension, obtained by possibly truncating the degree distribution. In
order to state the result, we make the following assumption that makes (7.6.13) more precise:
Condition 7.26 (Truncated infinite-variance degrees) Fix ε > 0. There exists a βn ∈
(0, 1/(τ − 1)] such that Fn (x) = 1 for x ≥ nβn (1+ε) , while, for all x ≤ nβn (1−ε) ,
Ln (x)
1 − Fn (x) = , (7.6.16)
xτ −1
with τ ∈ (2, 3) and a function Ln (x) that satisfies, for some constant C > 0 and γ ∈ (0, 1),
that, for all x ≤ nβn (1−ε) ,
γ−1 γ−1
x−C(log x) ≤ Ln (x) ≤ xC(log x) . (7.6.17)
Theorem 7.27 (Fluctuations of the distances CMn (d) for truncated infinite-variance de-
grees) Consider CMn (d) where the degrees d = (dv )v∈[n] satisfy Condition 7.26 for
some τ ∈ (2, 3). Assume that dmin ≥ 2, and that there exists κ > 0 such that
max{dTV (Fn , F ), dTV (Fn? , F ? )} ≤ n−κβn . (7.6.18)
7.7 Notes and Discussion for Chapter 7 321
When βn → 1/(τ − 1), we further require that the limit random variable Y in Theorem
7.17 has no point mass on (0, ∞). Then, with o1 , o2 chosen independently and uar from [n],
and conditional on o1 ←→ o2 ,
log log(nβn ) 1
distCMn (d) (o1 , o2 ) − 2 − (7.6.19)
| log(τ − 2)| βn (τ − 3)
is a tight sequence of random variables.
Which of the two terms in (7.6.19) dominates depends sensitively on the choice of βn .
When βn → β ∈ (0, 1/(τ − 1)], the first term dominates. When βn = (log n)−γ for some
γ ∈ (0, 1), the second term dominates. Both terms are of the same order of magnitude when
βn = Θ(1/ log log n).
For supercritical graphs, typical distances are at most of order log n. The boundary point
in (7.6.19) corresponds to βn = Θ(1/ log n), in which case nβn = Θ(1) and Theorem
7.1 applies. Thus, even after truncation of the degrees, in the infinite-variance case, typical
distances are always ultra-small.
Fernandez de la Vega (1982). For a nice discussion and results about the existence of a large k-core in the
configuration model, we refer to Janson and Luczak (2007).
Exercise 7.2 (Poisson degree example) Consider the degree sequence d = (dv )v∈[n] satisfying the Pois-
son degree limit as formulated in (1.7.4) and (1.7.5) with λ > 1. Let o1 , o2 be two independent vertices
chosen uar from [n]. Identify a number a such that, conditional on distCMn (d) (o1 , o2 ) < ∞,
P
distCMn (d) (o1 , o2 )/ log n −→ a. (7.8.2)
Exercise 7.3 (Power-law degree example) Consider the degree sequence d = (dv )v∈[n] with dv =
[1 − F ]−1 (v/n), where F is the distribution of a random variable D having generating function, for
α ∈ (0, 1),
GX (s) = s − (1 − s)α+1 /(α + 1) (7.8.3)
as in Exercise 1.14. Identify a number a such that, conditional on distCMn (d) (o1 , o2 ) < ∞,
P
distCMn (d) (o1 , o2 )/ log log n −→ a. (7.8.4)
Exercise 7.4 (Branching-process approximation in Corollary 7.3) Use Corollary 2.19 to prove the
branching-process approximation for CMn (d) in (7.3.2) in Corollary 7.3. Hint: Use that, with Gn =
CMn (d), Zl(n;i) = |∂Bl(Gn ) (oi )| for all l ∈ [r] when Br+1
(Gn )
(oi ) is a tree, and then use Theorem 4.1.
Exercise 7.5 (Proof of logarithmic lower bound distances in Theorem 7.6) Let o1 , o2 be two independent
vertices chosen uar from [n]. Use Proposition 7.5 with a = o1 , b = o2 , I = [n] to prove the logarithmic
lower bound on typical distances in Theorem 7.6.
Exercise 7.6 (Proof of distances for τ = 3 in Corollary 7.7) Let o1 , o2 be two independent vertices chosen
uar from [n]. Use Theorem 7.6 to prove the logarithmic and logarithmic divided by log log lower bounds on
typical distances in Corollary 7.7 when Conditions 1.7(a),(b) and (7.3.15) hold.
7.8 Exercises for Chapter 7 323
Exercise 7.7 (Proof of logarithmic upper bound distances without degree truncation) Check that the “giant
is almost local” analysis in Section 4.3.1 can be performed without the degree-truncation argument of
Theorem 1.11 when lim supn→∞ E[Dn3 ] < ∞. Hint: Note that lim supn→∞ E[Dn3 ] < ∞ implies that the
Chebychev inequality can be used without degree truncation.
Exercise 7.8 (Proof of logarithmic upper bound distances without degree truncation (cont.)) Extend the
proof in Exercise 7.7 to the case where lim supn→∞ E[Dnp ] < ∞ for some p > 2. Hint: Instead of the
Chebychev inequality, use the Marcinkiewicz–Zygmund inequality (Gut, 2005, Corollary 8.2), a form of
i=1 with E[Xi ] = 0 and all q ∈ (1, 2],
which states that, for iid random variables (Xi )m
m
h X qi
E Xi ≤ nE[|X1 |q ]. (7.8.5)
i=1
Exercise 7.9 (Proof of logarithmic upper bound on typical distances in Theorem 7.9) Use Proposition
7.10 to prove the logarithmic upper bound on typical distances in Theorem 7.9 by adapting the proof of the
related result for NRn (w) in Theorem 6.19.
Exercise 7.10 (Alternative proof of log log typical distances in Theorem 7.11) Give an alternative proof
of the doubly logarithmic upper bound on typical distances in Theorem 7.11 by adapting the proof of the
related result for NRn (w) in Theorem 6.11.
Exercise 7.11 (The hubs Γ1 form whp a complete graph) Use Lemma 7.12 and β > 21 to show that, whp,
the set of hubs in Γ1 in (7.3.52) forms a complete graph, i.e., whp, every pair i, j ∈ Γ1 are neighbors in
CMn (d).
Exercise 7.12 (Second alternative proof of log log typical distances in Theorem 7.11 using the core) Give
an alternative proof of the doubly logarithmic upper bound on typical distances in Theorem 7.11 by using
the diameter of the core in Theorem 7.13, and an application of the second-moment method for the existence
of paths in Proposition 7.10.
Exercise 7.13 (Typical distances when τ = 2 in Theorem 7.16) Recall the definition of kn? for the critical
case τ = 2 studied in Theorem 7.16. Show that kn? = o(log?p (n)) for every p ≥ 1, where log?p (n) is
obtained by taking the logarithm of n p times.
Exercise 7.14 (Typical distances when τ = 2 in Theorem 7.16) Recall kn? from Exercise 7.13. Investigate
heuristically the size of kn? .
Exercise 7.15 (Another example of typical distances when τ = 2) Adapt the upper bound on typical
distances for τ = 2 in Theorem 7.16 to degree sequences for which τ = 2, in which (7.3.67) is replaced by
γ
[1 − Fn ](x) ≥ c1 e−c(log x) /x for some c, c1 , γ ∈ (0, 1), and all x ≤ nβ for some β > 21 .
Exercise 7.16 (Infinite mean under conditions in Theorem 7.17) Prove that E[X] = ∞ when the condi-
tions in Theorem 7.17 are satisfied. Extend this to show that E[X s ] = ∞ for every s > α ∈ (0, 1).
Exercise 7.17 (Example of infinite-mean branching process) Prove that γ(x) = (log x)γ−1 , for some
γ ∈ [0, 1), satisfies the assumptions in Theorem 7.17.
Exercise 7.18 (Telescoping sum identity for generation sizes in infinite-mean branching processes) Con-
sider an infinite-mean branching process as in Theorem 7.17. Prove the telescoping-sum identity (7.4.3) for
αk log (Zk ∨ 1).
Exercise 7.19 (Conditions in Theorem 7.17 for individuals with infinite line of descent) Prove that p(∞)
in [V1, (3.4.2)] satisfies the conditions in Theorem 7.17 with the function x 7→ γ ? (x) given by γ ? (x) =
γ(x) + c/ log x.
Exercise 7.20 (Convergence for Zk + 1) Show that, under the conditions of Theorem 7.17, it also holds
that αk log(Zk + 1) converges to Y almost surely.
Exercise 7.21 (Branching processes with infinite mean: case X ≥ 0) Use the branching process of the
number of vertices with infinite line of descent in [V1, Theorem 3.12] to extend the proof of Theorem 7.17
to the case where X ≥ 0.
324 Small-World Phenomena in Configuration Models
Exercise 7.22 (Distributional identity of limit in Theorem 7.17) Let Y be the limit of k 7→ αk log(Zk ∨ 1)
in Theorem 7.17. Prove (7.4.34) by showing that
d X
Y = α max Yi ,
i=1
where X denotes the offspring variable of the infinite-mean branching process and (Yi )i≥1 is a sequence
of iid copies of Y .
Exercise 7.23 (Exponential limit in Theorem 7.17) Let the offspring X have generating function
GX (s) = 1 − (1 − s)α (7.8.6)
with α ∈ (0, 1) as in Exercise 1.5. Use (7.4.34) as well as Theorem 7.18 to show that the limit Y of
k 7→ αk log(Zk ∨ 1) in Theorem 7.17 has an exact exponential distribution.
Exercise 7.24 (Maximum process for infinite-mean branching processes) Recall the maximum process
for infinite-mean branching processes, for which we let Q0 = 1, and, given Qk−1 = qk−1 , let Qk denote
the maximal offspring of the qk−1 individuals in the (k − 1)th generation. Show that (Qk )k≥0 is a Markov
chain, for which the transition probabilities can be derived from (recall (7.4.39))
P(Qk > q | Qk−1 = qk−1 ) = 1 − FX (q)qk−1 . (7.8.7)
Exercise 7.25 (Telescoping sum identity for maximum process, for infinite-mean branching processes)
Recall the maximum process (Qk )k≥0 for infinite-mean branching processes from Exercise 7.24. Prove the
telescoping-sum identity for αk log Qk in (7.4.40).
Exercise 7.26 (Convergence of the maximum process for infinite-mean branching processes) Recall the
a.s.
maximum process (Qk )k≥0 for infinite-mean branching processes from Exercise 7.24. Show αk log Qk −→
Q∞ under the conditions of Theorem 7.17, by adapting the proof of double-exponential growth of the
generation sizes in Theorem 7.17. You may for simplicity assume that the offspring distribution satisfies
P(X > k) = c/kα for all k ≥ kmin and c = kminα
.
Exercise 7.27 (Diameter of “soup” of cycles) Prove that in a graph consisting solely of cycles, the diam-
eter is equal to the longest cycle divided by 2.
Exercise 7.28 (Longest cycle in a 2-regular graph) Let Mn denote the size of the longest cycle in a
d
2-regular graph. Prove that Mn /n −→ M for some M . What can you say about the distribution of M ?
Exercise 7.29 (Diameter result for ERn (λ/n)) Fix λ > 1, and recall the constants in the limit of
diam(ERn (λ/n))/ log n in (7.5.4), as a consequence of Theorem 7.19. Prove that ν in Theorem 7.19
equals ν = λ and that µ in Theorem 7.19 equals µ = µλ , where µλ ∈ [0, 1) is the dual parameter, i.e., the
unique µ < 1 satisfying
µe−µ = λe−λ . (7.8.8)
Exercise 7.30 (Diameter of GRGn (w)) Consider GRGn (w) where the weights w = (wv )v∈[n] satisfy
Conditions 1.1(a)–(c). Identify the limit in probability of diam(GRGn (w))/ log n. Can this limit be zero?
Exercise 7.31 (Expectation of the number of minimally k-connected vertices) Recall the definition of
minimally k-connected vertices in Definition 7.21. Prove that, for all k ≥ 1,
dmin (dmin −1)k−1
Y dmin (ndmin − (i − 1))
E[Mk ] = ndmin . (7.8.9)
i=1
`n − 2i + 1
Exercise 7.32 (Second moment of the number of minimally k-connected vertices) Recall the definition
of minimally k-connected vertices in Definition 7.21. Prove that, for all k such that dmin (dmin − 1)k−1 ≤
`n /8,
h dmin 2ndmin d2min (dmin − 1)2k i
E[Mk2 ] ≤ E[Mk ]2 + E[Mk ] (dmin − 1)k + . (7.8.10)
dmin − 2 (dmin − 2)`n
7.8 Exercises for Chapter 7 325
Exercise 7.33 (Concentration of the number of minimally k-connected vertices: proof of Lemma 7.22)
Recall the definition of minimally k-connected vertices in Definition 7.21. Use Exercises 7.31 and 7.32 to
complete the proof of Lemma 7.22.
Exercise 7.34 (Sum of Gamma variables is finite almost surely) Fix τ ∈ (1, 2). Let (Ei )i≥1 be iid
−1/(τ −1)
exponentials, and Γi = ij=1 Ei be (dependent) Gamma variables. Show that Z = i≥1 Γi
P P
is
almost surely finite.
Exercise 7.35 (Typical distance is at least 2 whp for τ ∈ (1, 2)) Complete the argument that
P(distCMn (d) (o1 , o2 ) = 1) = o(1)
in the proof of the typical distance for τ ∈ (1, 2) in Theorem 7.23.
Exercise 7.36 (Typical distance equals 2 whp for τ = 1) Let (dv )v∈[n] be a sequence of iid copies
of D with distribution function F satisfying that x 7→ [1 − F ](x) is slowly varying at ∞. Prove that
P P
P satisfies that distCMn (d) (o1 , o2 ) −→ 2. You may use without proof that Mn /Sn −→ 1, where
CMn (d)
Sn = v∈[n] Dv and Mn = maxv∈[n] Dv .
Exercise 7.37 (Convergence along subsequences (van der Hofstad et al. (2005))) Fix an integer n1 . Prove
that, under the assumptions in Theorem 7.24, and conditional on distCMn (d) (o1 , o2 ) < ∞, along the
subsequence nk = bn1 ν k−1 c the sequence of random variables distCMn (d) (o1 , o2 ) − blogν nk c converges
in distribution to Ran1 as k → ∞.
Exercise 7.38 (Non-convergence of graph distances for random regular graph) Let dv = r for every v ∈
[n] and let nr be even. Recall (7.6.11). Show that Theorem 7.24 implies that distCMn (d) (o1 , o2 ) − blogν nc
does not converge in distribution.
Exercise 7.39 (Tightness of the hopcount (van der Hofstad et al. (2005))) Prove that, under the assump-
tions in Theorem 7.24:
(a) conditional on distCMn (d) (o1 , o2 ) < ∞ and whp, the random variable distCMn (d) (o1 , o2 ) is in be-
tween (1 ± ε) logν n for any ε > 0;
(b) conditional on distCMn (d) (o1 , o2 ) < ∞, the random variables distCMn (d) (o1 , o2 ) − logν n form a
tight sequence, i.e.,
lim lim sup P |distCMn (d) (o1 , o2 ) − logν n| ≤ K distCMn (d) (o1 , o2 ) < ∞ = 1. (7.8.11)
K→∞ n→∞
As a consequence, prove that the same result applies to a uniform random graph with degrees (dv )v∈[n] .
Hint: Make use of [V1, Theorem 7.21].
C HAPTER 8
S MALL -W ORLD P HENOMENA IN
P REFERENTIAL ATTACHMENT M ODELS
Abstract
In this chapter we investigate graph distances in preferential attachment mod-
els. We focus on typical distances as well as on the diameter of preferential
attachment models. We again rely on path-counting techniques, as well as local
limit results. Since the local limit is a rather involved quantity, some parts of our
analysis are considerably harder than those in Chapters 6 and 7.
In Chapters 6 and 7, we saw that generalized random graphs and configuration models with
finite-variance degrees are small worlds, whereas random graphs with infinite-variance de-
grees are ultra-small worlds. In the small-world setting, distances are roughly logν n, where
ν describes the exponential growth of the branching-process approximation of local neigh-
borhoods in the random graphs in question. For preferential attachment models with δ > 0,
for which the degree power-law exponent equals τ = 3 + δ/m > 3, it is highly unclear
whether the neighborhoods grow exponentially, since the Pólya point tree that arises as the
local limit in Theorem 5.8 is a rather intricate object (recall also Theorem 5.21). This local
limit also makes path-counting estimates, the central method for bounding typical distances,
much more involved.
The ultra-small behavior in generalized random graphs and configuration models, on the
other hand, can be understood informally in terms of two effects. First, we note that such
random graph models contain super-hubs, whose degrees are much larger than n1/2 and
which form a complete graph of connections. Second, vertices of large degree d 1 are
typically connected to vertices of much larger degree, more precisely of degree roughly
d1/(τ −2) , by the power-iteration method. When combined, these two effects mean that it
takes roughly log log n/| log(τ − 2)| steps from a typical vertex to reach one of the super-
hubs, and thus roughly 2 log log n/| log(τ −2)| steps to connect two typical vertices to each
other. Of course the proofs are more technical, but this is the bottom line.
For preferential attachment models with δ ∈ (−m, 0) and m ≥ 2, however, vertices of
large degree d tend to be the old vertices, but old vertices are not necessarily connected to
much older vertices, which would be necessary to increase their degree from d to d1/(τ −2) .
However, vertices of degree d 1 do tend to be connected to vertices that are in turn con-
nected to a vertex of degree roughly d1/(τ −2) . This gives rise to a two-step power-iteration
property. We conclude that distances seem about twice as large in preferential attachment
models with infinite-variance degrees than in the corresponding generalized random graphs
or configuration models, and that this can be explained by differences in the local connec-
tivity structure. Unfortunately, owing to their dynamic nature, the results for preferential
attachment models are harder to prove, and they are less complete.
327
328 Small-World Phenomena in Preferential Attachment Models
The above explanations depend crucially on the local structure of the generalized random
graph as well as the configuration model. In this chapter we will see that, for the preferential
attachment model, such considerations need to be subtly adapted.
(1,δ)
and PAn (d). Such results are interesting in their own right and at the same time pro-
vide natural upper bounds on distances for m ≥ 2 owing to the fact that PA(m,δ) n (a) and
(m,δ) (1,δ/m)
PAn (b) can be obtained by collapsing blocks of m vertices in (PAn (a))n≥1 and
(PAn(1,δ/m) (b))n≥1 .
Let the height of a tree T on the vertex set [n] be defined as
height(T ) = max distT (1, v), (8.2.1)
v∈[n]
where distT (u, v) denotes the graph distance between vertices u and v in the tree T , and 1
denotes the root of the tree. We start by studying various distances in the tree PA(1,δ)
n :
k−1
\
{~π ⊆ PAn (b)} =
(1,δ)
{πi πi+1 }. (8.2.5)
i=0
1
1 + δ k Γ(v −
2+δ
)Γ(u) k−1
Y 1
P(~π ⊆ PA(1,δ)
n (a)) = 1+δ 1 . (8.2.8)
2+δ Γ(u + 2+δ
)Γ(v) i=1 πi − 2+δ
J
330 Small-World Phenomena in Preferential Attachment Models
Proof of Proposition 8.2. We claim that the events {πi πi+1 } are independent, i.e., it
holds that, for every ~π = (π0 , . . . , πk ),
k−1
\ k−1
Y
P {πi πi+1 } = P(πi πi+1 ). (8.2.9)
i=0 i=0
h k−1
\ i
P(~π ⊆ PAn (1,δ)
(b)) = E P {πi πi+1 } PA(1,δ)
π1 −1 (b)
i=0
= E 1Tk−1
h i
i=1 {πi πi+1 } P π0 π1 | PA(1,δ)
π0 −1 (b) , (8.2.10)
Tk−1
since the event i=1 {πi πi+1 } is measurable with respect to PA(1,δ)
π0 −1 (b) because π0 −
1 ≥ πi for all i ∈ [k − 1]. Furthermore, from [V1, (8.2.2)],
Dπ1 (π0 − 1) + δ
P π0
π1 | PA(1,δ)
π0 −1 (b) = . (8.2.11)
(2 + δ)(π0 − 1)
In particular,
h D (π − 1) + δ i
π1 0
P π0 π1 = E
. (8.2.12)
(2 + δ)(π0 − 1)
Therefore,
Dπ1 (π0 − 1) + δ i
n (b)) = E 1 k−1
h
P(~π ⊆ PA(1,δ) T
i=1 {πi πi+1 }
(2 + δ)(π0 − 1)
k−1 h D (π − 1) + δ i
π1 0
\
=P {πi πi+1 } E , (8.2.13)
i=1
(2 + δ)(π0 − 1)
since the random variable Dπ1 (π0 − 1) depends only on how many edges are connected
Tk−1
to π1 after time π1 , and is thus independent of the event i=1 {πi πi+1 }, which only
depends on the attachment of the edges up to and including time π1 . We conclude that
k−1
\
P(~π ⊆ PA(1,δ)
n (b)) = P π0 π1 P {πi πi+1 } . (8.2.14)
i=1
As a result,
1
1+δ Γ(πi − 1 + 2+δ )Γ(πi+1 )
P(πi πi+1 ) = 1
(2 + δ)(πi − 1) Γ(πi − 1)Γ(πi+1 + 2+δ )
1
1 + δ Γ(πi − 1 + 2+δ )Γ(πi+1 )
= 1 , (8.2.17)
2 + δ Γ(πi )Γ(πi+1 + 2+δ )
so that
1 + δ k k−1
Y Γ(πi − 1 + 1
2+δ
)Γ(πi+1 )
P(~π ⊆ PAn (b)) =
(1,δ)
1
2+δ i=0
Γ(πi )Γ(πi+1 + 2+δ )
1+δ 1
1 + δ k Γ(π0 −
2+δ
)Γ(πk ) k−1
Y Γ(πi − 1 + 2+δ )
= 1 1
2+δ Γ(π0 )Γ(πk + 2+δ ) i=0 Γ(πi + 2+δ )
1+δ
1 + δ k Γ(u −
2+δ
)Γ(v) k−1
Y 1
= 1 , (8.2.18)
2+δ Γ(u)Γ(v + 2+δ ) i=1 πi − 1+δ
2+δ
which proves (8.2.6). Since the path between vertex u and v in PA(1,δ)
n (b) is unique,
X k−1
\
P(distPA(1,δ)
n (b)
(u, v) = k) = P {πi πi+1 } , (8.2.19)
~
π i=0
where again the sum is over all ordered vectors ~π = (π0 , . . . , πk ) with π0 = u and πk = v .
Thus, (8.2.7) follows immediately from (8.2.6). This completes the proof of Proposition
8.2.
For the proof of Remark 8.3, (8.2.15) is replaced with
k−1
" #
Y Dπi+1 (πi − 1) + δ
P(~π ⊆ PAn (a)) =
(1,δ)
E , (8.2.20)
i=0
(2 + δ)(πi − 1) + 1 + δ
and (8.2.16) is replaced with, for n ≥ s,
1
Γ(n + 1)Γ(s − 2+δ )
E Ds (n) + δ = (1 + δ)
1+δ
. (8.2.21)
Γ(n + 2+δ )Γ(s)
After these changes, the proof follows the same steps (see Exercise 8.3).
immediately prove the first upper bound in (8.2.3). Further, (8.2.23) implies that, almost
surely for large n,
height(PA(1,δ)
n (b)) (1 + δ)
≤ (1 + ε) . (8.2.24)
log n (2 + δ)θ
Indeed, height(PA(1,δ) n (b)) > log n(1 + ε)(1 + δ)/(2 + δ)θ for n large precisely when
there exists an m large (m must at least satisfy m ≥ log n(1 + ε)(1 + δ)/(2 + δ)θ) such
that distPA(1,δ)
n (b)
(1, m) > log m(1 + ε)(1 + δ)/(2 + δ)θ. Since the latter almost surely
does not happen for m large, by (8.2.23), it follows that (8.2.24) does not hold for n large.
Thus, (8.2.22) and (8.2.23) also prove the second upper bound in (8.2.3).
By the triangle inequality,
distPA(1,δ)
n (b)
(o1 , o2 ) ≤ distPA(1,δ)
n (b)
(1, o1 ) + distPA(1,δ)
n (b)
(1, o2 ), (8.2.25)
diam(PAn (b)) ≤ 2 height(PAn (b)),
(1,δ) (1,δ)
(8.2.26)
so that (8.2.22) and (8.2.23) imply the upper bounds in (8.2.4), and thus all those in Theorem
8.1, for PA(1,δ)
n (b).
We proceed to prove (8.2.22) and (8.2.23) for PA(1,δ)
n (b), and start with some preparations.
We use (8.2.7) and symmetry to obtain
1 + δ k Γ(n − 1+δ k−1
2+δ
)Γ(1) X∗ 1 Y 1
P(distPA(1,δ) (1, n) = k) = 1 ,
n (b)
2+δ Γ(1 + 2+δ
)Γ(n) ~ (k − 1)! i=1 ti − 1+δ
2+δ
tk−1
(8.2.27)
where the sum now is over all vectors ~tk−1 = (t1 , . . . , tk−1 ), with 1 < ti < n, having
distinct coordinates. We can upper bound this sum by leaving out the restriction that the
coordinates of ~tk−1 are distinct, so that
!k−1
1 + δ k Γ(n − 1+δ
2+δ
) 1
n−1
X 1
P(distPA(1,δ) (1, n) = k) ≤ 1 .
n (b)
2+δ Γ(1 + 2+δ )Γ(n) (k − 1)! s=2
s − 1+δ
2+δ
(8.2.28)
Since x 7→ 1/x is monotonically decreasing and (1 + δ)/(2 + δ) ∈ (0, 1),
n−1 n−1 n
1 X 1 1
X Z
1+δ
≤ ≤1+ dx ≤ log (en). (8.2.29)
s=2
s − 2+δ s=2
s−1 1 x
1+δ
Also, we use [V1, (8.3.9)] to bound Γ(n − 2+δ
)/Γ(n) ≤ Cδ n−(1+δ)/(2+δ) , for some con-
stant Cδ > 0, so that
k−1
1+δ
2+δ
log (en)
− 1+δ
P(distPA(1,δ) (1, n) = k) ≤ Cδ n 2+δ
n (b)
(k − 1)!
1 + δ
= Cδ P Poi log (en) = k − 1 . (8.2.30)
2+δ
Now we are ready to prove (8.2.22) for PA(1,δ)
n (b). We note that o is chosen uar from [n],
8.2 Logarithmic Distances in Preferential Attachment Trees 333
so that, with C denoting a generic constant that may change from line to line,
n
1X
P(distPA(1,δ) (1, o) = k) = P(distPA(1,δ) (s, 1) = k)
n (b)
n s=1 n (b)
k−1
1+δ
1 Xn
1+δ 2+δ
log (es)
≤ Cδ s− 2+δ
n s=1 (k − 1)!
k−1
1+δ
2+δ
log (en) X n
1+δ
≤ Cδ s− 2+δ
n(k − 1)! s=1
k−1
1+δ
2+δ
log (en) 1+δ
≤C n− 2+δ
(k − 1)!
1 + δ
= C P Poi log (en) = k − 1 . (8.2.31)
2+δ
Therefore,
1 + δ
P(distPA(1,δ) (1, o) > k) ≤ C P Poi log (es) ≥ k . (8.2.32)
n (b)
2+δ
Fix ε > 0 and take k = kn = d (1+ε)(1+δ)
(2+δ)
log (en)e, to arrive at
P(distPA(1,δ)
n (b)
(1, o) > kn )
1 + δ (1 + ε)(1 + δ)
≤ C P Poi log (en) ≥ log (en) = o(1), (8.2.33)
2+δ (2 + δ)
by the law of large numbers and for any ε > 0, as required.
We continue by proving (8.2.23) for PA(1,δ)
n (b). By (8.2.30),
1 + δ
P(distPA(1,δ) (1, n) > k) ≤ Kδ P Poi log (en) ≥ k . (8.2.34)
n (b)
2+δ
Take kn = da log ne with a > (1 + δ)/(2 + δ). We use the Borel–Cantelli lemma to see
that dist(1, n) > kn will occur only finitely often when (8.2.34) is summable. We then
use the large-deviation bounds for Poisson random variables in [V1, Exercise 2.20] with
λ = (1 + δ)/(2 + δ) to obtain that
P(distPA(1,δ)
n (b)
(1, n) > a log n) ≤ Kδ n−p , (8.2.35)
with
a(2 + δ) 1+δ
p = a log −a+ . (8.2.36)
1+δ 2+δ
Let x be the solution of
1+δ
x(log (x(2 + δ)/(1 + δ)) − 1) + = 1, (8.2.37)
2+δ
so that x = (1 + δ)/[(2 + δ)θ]. Then, for every a > x,
P(distPA(1,δ)
n (b)
(1, n) > a log n) = O(n−p ), (8.2.38)
334 Small-World Phenomena in Preferential Attachment Models
where p in (8.2.36) satisfies p > 1 for any a > (1 + δ)/[(2 + δ)θ]. As a result, by the
Borel–Cantelli lemma, the event {distPA(1,δ)
n (b)
(1, n) > kn }, with kn = da log ne and
a > (1 + δ)/[(2 + δ)θ], occurs only finitely often, so that (8.2.23) holds.
Proof of the lower bound on distPA(1,δ)
n (b)
(1, o) in Theorem 8.1 for PA(1,δ)
n (b). By (8.2.31),
!
1 + δ
P(distPA(1,δ) (1, o) ≤ k) ≤ C P Poi log (en) ≤ k . (8.2.39)
n (b)
2+δ
Fix kn = d[(1 − ε)(1 + δ)/(2 + δ)] log (2 + δ)ne, and note that P(distPA(1,δ) n (b)
(1, o) ≤
kn ) = o(1) by the law of large numbers.
a.s. (1+δ)
n (b) / log n −→ (2+δ)θ in Theorem 8.1, we
To complete the proof that height PA(1,δ)
(1+δ)
use the second-moment method to prove that height PA(1,δ) n (b) ≤ (1 − ε) (2+δ)θ log n has
vanishing probability. Together with (8.2.24), this certainly proves that
height PA(1,δ)
n (b) P (1 + δ)
−→ .
log n (2 + δ)θ
However, since height PA(1,δ)
n (b) is a non-decreasing sequence of random variables, this
also implies convergence almost surely, as we argue in more detail below Proposition 8.4.
(1−ε)(1+δ)
n (b) ≤
We formalize the statement that height PA(1,δ) (2+δ)θ
log n as follows:
(1 − ε)(1 + δ)
≥ log nk−1
(2 + δ)θ
(1 + δ)
≥ (1 − ε)(1 − α) log n, (8.2.41)
(2 + δ)θ
where the third inequality follows from the almost sure lower bound on height PA(1,δ)
nk−1 (b) .
The above bound holds for all ε, α > 0, so that letting ε, α & 0 proves our claim.
We omit the proof of Proposition 8.4, and refer to the notes and discussion in Section
8.10 for details. The proof relies on a continuous-time embedding of preferential attachment
models, first invented by Pittel (1994), and the height of such trees, found using a beautiful
8.2 Logarithmic Distances in Preferential Attachment Trees 335
argument by Kingman (1975) (see Exercises 8.4 and 8.5). In the remainder of the proofs in
this chapter, we rely only on the upper bound in (8.2.24).
We continue by proving the lower bound on diam PA(1,δ) n (b) in Theorem 8.1:
Proof of the lower bound on diam PA(1,δ) n (b) in Theorem 8.1. We use the lower bound on
height(PA(1,δ)
n (b)) in Theorem 8.1 and the decomposition of scale-free trees in Theorem
5.4. Theorem 5.4 states that PA(1,δ)
n (b) can be decomposed into two scale-free trees hav-
ing similar distributions as copies PA(1,δ) (1,δ) (1,δ)
S1 (n) (b1 ) and PAn−S1 (n) (b2 ), where (PAn (b1 ))n≥1
and (PA(1,δ)
n (b2 ))n≥1 are independent scale-free tree processes, and the law of S1 (n) is
described in (5.2.30). By this tree decomposition,
n (b) ≥ height PAS1 (n) (b1 ) + height PAn−S1 (n) (b2 ) .
(1,δ) (1,δ)
diam PA(1,δ) (8.2.42)
distPA(1,δ) (b)
(1, V ) P
n
−→ 0. (8.2.45)
log n
We leave the proof of Lemma 8.5 as Exercise 8.16, which we postpone until we have
discussed path-counting techniques for preferential attachment models.
Proof of Theorem 8.1 for PA(1,δ) (1,δ) (1,δ)
n (d) and PAn (a). The proof of Theorem 8.1 for PAn (d)
(1,δ)
follows the same line of argument as for PAn (b), where we note that the only difference
in PA(1,δ) (1,δ)
n (d) and PAn (b) is in the graph for n = 2. We omit further details.
336 Small-World Phenomena in Preferential Attachment Models
To prove Theorem 8.1 for PA(1,δ)n (a), we note that the connected components of PAn (a)
(1,δ)
(1,δ) (1,δ)
are similar in distribution to single scale-free tree PAt1 (b1 ), . . . , PAtNn (bNn ), apart from
the initial degree of the root. Here ti denotes the size of the ith tree at time n, and we recall
P
that Nn denotes the total number of trees at time n. Since Nn / log n −→ (1 + δ)/(2 + δ)
whp the largest connected component has size at least εn/ log n. Since
by Exercise 5.26,
log εn/ log n = (1+o(1)) log n, the distances in these trees are closely related to those in
PA(1,δ) (1,δ)
n (b). Theorem 8.1 for PAn (a) then follows similarly to the proof for PAn (b).
(1,δ)
(a) (b)
0.2
0.3
0.15
Proportion
Proportion
0.2
0.1
0.1
0.05
0
2 4 6 8 10 12 14 16 18 20 22 24 2 3 4 5 6 7 8 9 10 11 12
Typical Distance Typical Distance
Figure 8.1 Typical distances between 2,000 pairs of vertices in the preferential
attachment model with n = 100, 000 and (a) τ = 2.5; (b) τ = 3.5.
Theorem 8.7 (Ultra-small typical distances for δ < 0) Consider PA(m,δ)
n (a) with m ≥ 2
and δ ∈ (−m, 0). Let o1 , o2 be chosen independently and uar from [n]. As n → ∞,
distPA(m,δ) (a)
(o1 , o2 ) P 4
n
−→ . (8.3.2)
log log n | log (τ − 2)|
These results also apply to PA(m,δ)
n (b) and PA(m,δ)
n (d) under identical conditions.
Exercise 8.6 investigates an example of the above result. Interestingly, the limiting con-
stant 4/| log (τ − 2)| appearing in Theorem 8.7 replaces the limit 2/| log (τ − 2)| in The-
orem 6.3 for the Norros–Reittu model NRn (w) and in Theorem 7.2 for the configuration
model CMn (d) when the power-law exponent τ satisfies τ ∈ (2, 3). Thus, typical distances
are twice as large for PA(m,δ)
n compared with CMn (d) with the same power-law exponent.
This can be intuitively explained as follows. For the configuration model CMn (d), vertices
with degree d 1 are likely to be directly connected to vertices of degree ≈ d1/(τ −2) (see,
e.g., Lemma 7.12), which is the whole idea behind the power-iteration methodology.
For PA(m,δ)
n , this is not the case. However, pairs of high-degree vertices are likely to be at
distance 2, as whp there is a young vertex that connects to both older vertices. This makes
distances in PA(m,δ)
n effectively twice as big as those for CMn (d) with the same degree
sequence. This effect is special for δ < 0 and is studied in more detail in Exercises 8.7 and
8.8, while Exercise 8.9 shows that this effect is absent when δ > 0.
The lower bounds in Theorem 8.8 also apply to PAn(m,0) (b) and PA(m,0)
n (d).
In this section we study the probability that a certain path is present in PA(m,δ) n . Recall from
Definition 6.5 that we call a k -step path ~π = (π0 , π1 , . . . , πk ) self-avoiding when πi 6= πj
for all 1 ≤ i < j ≤ k . The following proposition studies the probability that a path ~π is
present in PA(m,δ)
n :
Proposition 8.9 (Path counting in preferential attachment models) Consider PA(m,δ) n (a)
with m ≥ 2. Denote γ = m/(2m + δ). Fix k ≥ 0 and let ~π = (π0 , π1 , . . . , πk ) be a k -step
8.4 Path Counting in Preferential Attachment Models 339
self-avoiding path. Then, there exists a constant C > 0 such that, for all k ≥ 1,
k−1
Y 1
P(~π ⊆ PA(m,δ)
n (a)) ≤ (Cm)k 1−γ . (8.4.1)
i=0 (πi ∧ πi+1 )γ (πi ∨ πi+1 )
This result also applies to PA(m,δ)
n (b) and PA(m,δ)
n (d).
Paths are formed by repeatedly forming edges. When m = 1, paths always go from
younger to older vertices. When m ≥ 2 this monotonicity property is lost, which makes
proofs harder. We start by investigating intersections of events that specify which edges are
present in PA(m,δ)
n . More precise results appear in Propositions 8.14 and 8.15 below.
The following lemma shows that the events Env ,v , for different v , are negatively correlated:
Lemma 8.10 (Negative correlation for edge connections in preferential attachment models)
Consider PA(m,δ) n (a) with m ≥ 2. Fix k ≥ 1. For distinct v1 , v2 , . . . , vk ∈ [n] and all
nv1 , . . . , nvk ≥ 1,
\ Y
P Envt ,vt ≤ P(Envt ,vt ). (8.4.3)
t∈[k] t∈[k]
To advance the induction, we assume that (8.4.3) holds for all k , all distinct vertices
v1 , v2 , . . . , vk ∈ [n], all nv1 , . . . , nvk ≥ 1, and all choices of u(v i
s)
, ji(vs ) such that we
(vs )
have maxi,s m(ui − 1) + ji (vs )
≤ e − 1, and we extend it to all k , all distinct ver-
tices v1 , v2 , . . . , vk ∈ [n], all nv1 , . . . , nvk ≥ 1, and all choices of u(v i
s)
, ji(vs ) such that
(vs ) (vs )
maxi,s m(ui − 1) + ji ≤ e. Clearly, by induction, we may restrict attention to the case
for which maxi,s m(u(v i
s)
− 1) + ji(vs ) = e.
We note that there is a unique choice of u, j such that m(u − 1) + j = e. There are
two possibilities: (1) either there is exactly one choice of s and u(v i
s)
, ji(vTs)
such that u(v i
s)
=
u, ji = j , or (2) there are at least two such choices. In the latter case, t∈[k] Envt ,vt = ∅,
(vs )
since the eth edge is connected to a unique vertex. Hence, there is nothing to prove.
We are left with investigating the case where there exists a unique s and ui(vs ) , ji(vs ) such
that u(vi
s)
= u, ji(vs ) = j . Denote the restriction of Envs ,vs to all other edges by
(vs )
\ ji
En0 vs ,vs =
u(v
i
s)
vs . (8.4.4)
(vs ) (vs )
i∈[nv ] : (ui ,ji )6=(u,j)
Tk
By construction, all edge numbers of events in En0 v ,v ∩ i=1 : si 6=s Envi ,vi are at most e − 1.
By conditioning, we obtain
where we have used that the event En0 v ,v ∩ Envt ,vt is measurable with respect
T
t∈[k] : vt 6=vs
to PA(1,δ/m)
e−1 (a). We compute
j Dvs (u − 1, j − 1) + δ
P(u vs | PA(1,δ/m)
e−1 (a)) = , (8.4.7)
zu,j
where we recall that Dvs (u − 1, j − 1) is the degree of vertex vs after j − 1 edges of vertex
u have been attached, and we write the normalization constant in (8.4.7) as
1
X
Dvs (u − 1, j − 1) = m + j0 , (8.4.9)
{u0 vs }
(u0 ,j 0 ) : mu0 +j 0 ≤e−1
hypothesis for each of these terms. Thus, we obtain, using also that m + δ ≥ 0,
\ m+δ Y
P Envt ,vt ≤ P(En0 vs ,vs ) P(Envt ,vt ) (8.4.10)
t∈[k]
zu,j t∈[k] : v 6=v t s
j0
X P(En0 v ,v ∩ {u0 vs }) Y
+ P(Envt ,vt ).
(u0 ,j 0 ) : mu0 +j 0 ≤e−1
zu,j t∈[k] : vt 6=vs
and the advancement of the induction hypothesis is complete when we note that
Dvs (u − 1, j − 1) + δ i
E 1En0 v ,vs
h
= P(Envs ,vs ). (8.4.12)
s zu,j
The claim in Lemma 8.10 follows by induction.
Dv (u2 − 1) + δ
= E 1{u 1 v} . (8.4.17)
1 (u2 − 1)(2 + δ) + 1 + δ
We use the following iteration, for u > u1 :
1
E 1{u 1 v} (Dv (u − 1) + δ)
h i
= 1+
(2 + δ)(u − 1) + 1 + δ 1
u
1
h i
= 1+δ
E 1
{u1 v}
(D v (u − 1) + δ)
u − 1 + 2+δ
Γ(u + 1)Γ(u1 + 1+δ ) h
E 1{u 1 v} (Dv (u1 ) + δ) .
i
2+δ
= 1+δ
(8.4.18)
Γ(u + 2+δ )Γ(u1 + 1) 1
Therefore,
1 1
P u1 v, u2 v
1 Γ(u2 )Γ(u1 + 1+δ )
1
h i
2+δ
= 1 E 1 (D v (u1 ) + δ)
(u2 − 1)(2 + δ) + 1 + δ Γ(u2 − 2+δ )Γ(u1 + 1) {u1 v}
1{u
h i
We thus need to compute E 1
v}
(D v (u1 ) + δ) . We use recursion to obtain
1
1{u
h i
E 1
v}
(Dv (u1 ) + δ) PA(m,δ)
u1 −1 (a)
1
1
= (Dv (u1 − 1) + 1 + δ)P u1
v | PA(m,δ)
u1 −1 (a)
(Dv (u1 − 1) + δ)(Dv (u1 − 1) + 1 + δ)
= , (8.4.20)
(u1 − 1)(2 + δ) + 1 + δ
8.4 Path Counting in Preferential Attachment Models 343
1
since Dv (u1 ) = Dv (u1 − 1) + 1 on the event {u1 v}. By [V1, Proposition 8.15],
3+δ 1+δ
Γ(u + 2+δ
)Γ(v + 2+δ
)
E[(Dv (u) + δ)(Dv (u) + 1 + δ)] = 1+δ 3+δ
(2 + δ)(1 + δ). (8.4.21)
Γ(u + 2+δ
)Γ(v + 2+δ
)
Consequently,
E 1{u
h i
1
v}
(Dv (u1 − 1) + δ)
1
1 1+δ
Γ(u1 + 2+δ
)Γ(v 2+δ
)+
= 1 3+δ
(2 + δ)(1 + δ)
[(u1 − 1)(2 + δ) + 1 + δ]Γ(u1 − 2+δ )Γ(v + 2+δ
)
1
Γ(u1 + 2+δ )Γ(v + 1+δ
2+δ
)
= (1 + δ) 1+δ 3+δ
. (8.4.22)
Γ(u1 + 2+δ )Γ(v + 2+δ )
Combining (8.4.19)–(8.4.22), we arrive at
1 1
P u1
v, u2 v
1+δ 1 1+δ
1 + δ Γ(u2 )Γ(u1 + 2+δ ) Γ(u1 + 2+δ
)Γ(v + 2+δ
)
= 1+δ
× 1+δ 3+δ
2 + δ Γ(u2 + 2+δ )Γ(u1 + 1) Γ(u1 + 2+δ
)Γ(v + 2+δ
)
1
1 + δ Γ(u2 )Γ(u1 + 2+δ
)Γ(v+ 1+δ
2+δ
)
= 1+δ 3+δ
, (8.4.23)
2 + δ Γ(u2 + 2+δ )Γ(u1 + 1)Γ(v + 2+δ )
as claimed in (8.4.15).
j
The proof of (8.4.16) for m ≥ 2 follows by again recalling that u v occurs when
1
(m−1)u+j [mv]\[m(v−1)] and we replace δ by δ/m. Now there are two possibilities,
1 1
depending on whether m(u1 − 1) + j1 v1 and m(u2 − 1) + j2 v2 hold for the same
v1 = v2 ∈ [mv] \ [m(v − 1)] or for two different v1 , v2 ∈ [mv] \ [m(v − 1)].
For v1 = v2 , we use (8.4.15) to obtain a contribution that is asymptotically equal to
m m+δ 1
(1 + o(1)), (8.4.24)
m 2m + δ (u1 u2 )1−γ v 2γ
2
where the factor m comes from the m distinct choices for v1 = v2 and the factor 1/m2
originates since we need to multiply u1 , u2 and v in (8.4.15) by m.
For v1 6= v2 , we use the negative correlation in Lemma 8.10 to bound this contribu-
tion from above by the product of the probabilities in (8.4.14), so that the contribution is
asymptotically bounded by
m(m − 1) m + δ 2 1
(1 + o(1)). (8.4.25)
m2 2m + δ (u1 u2 )1−γ v 2γ
Summing (8.4.24) and (8.4.25) completes the proof of (8.4.16).
with either
j
Envs ,vs = {u vs }, (8.4.29)
where u > vs satisfy {u, vs } = {πi , πi+1 } for some i ∈ {0, . . . , k − 1} and j = ji ∈ [m],
or
s1 s2
Envs ,vs = {u1 vs , u2 vs }, (8.4.30)
where u1 , u2 > vs satisfy (u1 , vs , u2 ) = (πi , πi+1 , πi+2 ) for some i ∈ {0, . . . , k − 1} and
(s1 , s2 ) = (ji , ji+1 ) ∈ [m]2 .
In the first case, by (8.4.13),
j M1
P(Envs ,vs ) = P u
vs ≤ , (8.4.31)
u vsγ
1−γ
Theorem 8.13 (Logarithmic lower distances for δ > 0 and m ≥ 2) Consider PA(m,δ)
n (a)
with δ > 0 and m ≥ 2. Then, as n → ∞,
P(distPA(m,δ)
n (a)
(o1 , o2 ) ≤ (1 − ε) logν n) → 0, (8.5.1)
where ν > 1 is the spectral radius of the offspring operator T κ of the Pólya point tree
defined in (8.5.61). These results also apply to PA(m,δ)
n (b) and PA(m,δ)
n (d).
The proof of Theorem 8.13 is organized as follows. We prove path-counting bounds for
PA(m,δ)
n (a) and PA(m,δ)
n (b) with δ > 0 in Section 8.5.1. Those for PA(m,δ)
n (d) are deferred
to Section 8.5.2 and are based on the Pólya finite graph description of PA(m,δ)
n (d). We prove
Theorem 8.13 in Section 8.5.3. In Section 8.5.4 we prove the resulting lower bounds for
δ = 0 and m ≥ 2.
where
s (1{x>y,s=O} + 1{x<y,s=Y} )
c(∅)
κ∅ (x, (y, s)) = , (8.5.4)
(x ∨ y)χ (x ∧ y)1−χ
with c(∅)
Y
= m, c(∅)
O
= m + δ . Further, let
f ? (y, s) = f (y, s? ), where O
?
= Y, Y? = O. (8.5.5)
Our main path-counting result is the following proposition, which will be crucial in obtaining the lower bound on typical distances:

Proposition 8.14 (Path-counting and multi-type branching processes) Consider PA_n^{(m,δ)}(a) for m ≥ 2. For every ε > 0, there exists a K = K_ε such that, with o_1, o_2 chosen independently and uar from [n],

P(dist_{PA_n^{(m,δ)}(a)}(o_1, o_2) = k) ≤ (K/n) (1+ε)^k ⟨f, T_κ^{k−2} f^⋆⟩.   (8.5.6)

Equivalently, with Z_k the generation size of the multi-type branching process started from an individual of uniform age and with type ∅,

P(dist_{PA_n^{(m,δ)}(a)}(o_1, o_2) = k) ≤ (K/n) (1+ε)^k E[Z_k].   (8.5.7)

These results also apply to PA_n^{(m,δ)}(b) and PA_n^{(m,δ)}(d).
Below, we give a proof based on the negative correlations (8.4.3) in Lemma 8.10, as well as Lemma 8.11. In Section 8.5.2 we redo the analysis for PA_n^{(m,δ)}(d), using its finite-graph Pólya version in Theorem 5.10.

Proof We start by analyzing the consequences of (8.4.3) in Lemma 8.10 for the existence of paths. Note that

P(dist_{PA_n^{(m,δ)}}(o_1, o_2) = k) ≤ (1/n²) Σ_{~π^e} P(~π^e ⊆ PA_n^{(m,δ)}),   (8.5.8)

where we sum over all self-avoiding edge-labeled paths ~π^e, as in Definition 8.12.
This improves upon the obvious upper bound m^k that was used in Proposition 8.9.
Connection Probabilities
We rely on the asymptotic equalities in (8.4.14) and (8.4.16) in Lemma 8.11. Fix ε > 0. We note that the bounds with an extra factor 1 + ε/2 in (8.4.14) and (1 + ε/2)² in (8.4.16) can be used only when the inequality π_i > M holds for some M = M_ε for the relevant π_i. For the contributions where π_i ≤ M, we use the uniform bounds in (8.4.14) and (8.4.16). Note that the number of i for which π_i ≤ M can be at most M, since ~π is self-avoiding. Thus, in total, this gives rise to an additional factor (M_1 ∨ M_2)^M ≡ K. We next look at the asymptotic equalities in (8.4.14) and (8.4.16), and conclude that the product over the constants equals

[(m+δ)/(2m+δ)]^{k−c} [(m+1+δ)/(2m+δ)]^c,   (8.5.12)

where now c = c(~π) is the number of OY reversals in the vertices in ~π, and is thus defined by

c(~π) = Σ_{i=0}^{k−1} 1{label(π_i) = O, label(π_{i+1}) = Y}.   (8.5.13)
since x ↦ x^{−p} is decreasing. Also using that (1 + ε/2)(1 + ε/4) ≤ 1 + ε for ε > 0 sufficiently small, we can rewrite this as

P(dist_{PA_n^{(m,δ)}}(o_1, o_2) = k) ≤ (1+ε)^k (K/n) ∫_0^1 ··· ∫_0^1 A(~x) Π_{i=0}^k x_i^{−a(p_i)} dx_0 ··· dx_k.   (8.5.19)
Comparison with T_κ
We next relate the (k+1)-fold integral in (8.5.19) to the operator T_κ defined in (8.5.2). For this, we note that, by Lemma 5.25 and (8.5.3)–(8.5.5), ⟨f, T_κ^{k−2} f^⋆⟩ equals

∫_0^1 ··· ∫_0^1 Σ_{t_1,...,t_k} c_{label(x_1)}^{(∅)} c_{label(x_k)^⋆}^{(∅)} Π_{i=1}^{k−1} c_{t_{i−1},t_i} / ((x_{i−1} ∨ x_i)^χ (x_{i−1} ∧ x_i)^{1−χ}) dx_0 ··· dx_k.   (8.5.20)

Again, we note that c_{label(x_1)}^{(∅)}, c_{label(x_k)^⋆}^{(∅)}, and (c_{t_{i−1},t_i})_{i∈[k−1]} are determined by (x_i)_{i=0}^{k−1} alone. It is not hard to see that, since χ = 1 − γ,

Π_{i=1}^k 1/((x_{i−1} ∨ x_i)^χ (x_{i−1} ∧ x_i)^{1−χ}) = Π_{i=0}^k x_i^{−a(p_i)},   (8.5.21)

as required, so that the powers of x_1, ..., x_k in (8.5.19) and in (8.5.20) agree. Indeed, note that a_i = a(p_i) = 2(1−γ) + p_i(2γ−1) and p_i = 1{x_i < x_{i−1}} + 1{x_i < x_{i+1}}, so that, since 1 − γ = χ, −a(p_i) equals the power of x_i in

1 / ((x_i ∨ x_{i−1})^χ (x_i ∧ x_{i−1})^{1−χ} (x_i ∨ x_{i+1})^χ (x_i ∧ x_{i+1})^{1−χ}).   (8.5.22)

We conclude that

∫_0^1 ··· ∫_0^1 A(~x) Π_{i=0}^k x_i^{−a(p_i)} dx_0 ··· dx_k = ⟨f, T_κ^{k−2} f^⋆⟩,   (8.5.23)
Consider PA_n^{(m,δ)}(d) with m ≥ 2 and δ > −m. For any edge-labeled self-avoiding path ~π^e as in Definition 8.12,

P_n(~π^e ⊆ PA_n^{(m,δ)}(d)) = Π_{s=1}^n ψ_s^{p_s} Π_{s=1}^n (1 − ψ_s)^{q_s},   (8.5.25)

where

p_s = Σ_{i=0}^k 1{π_i = s} [1{π_{i−1} > s} + 1{π_{i+1} > s}],   q_s = Σ_{i=0}^{k−1} 1{s ∈ (π_i, π_{i+1}) ∪ (π_{i+1}, π_i)}.   (8.5.26)
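The bookkeeping in (8.5.26) is easy to get wrong by hand. The following short sketch (ours) computes (p_s) and (q_s) for a given self-avoiding path, with the convention that the out-of-range indicators involving π_{−1} and π_{k+1} are zero:

    from collections import Counter

    def path_exponents(pi):
        """p_s and q_s from (8.5.26) for a self-avoiding path pi = (pi_0, ..., pi_k)."""
        k = len(pi) - 1
        p, q = Counter(), Counter()
        for i, s in enumerate(pi):
            if i > 0 and pi[i - 1] > s:
                p[s] += 1
            if i < k and pi[i + 1] > s:
                p[s] += 1
        for i in range(k):
            lo, hi = min(pi[i], pi[i + 1]), max(pi[i], pi[i + 1])
            for s in range(lo + 1, hi):  # s strictly between the edge endpoints
                q[s] += 1
        return p, q

    p, q = path_exponents((5, 2, 9, 4))
    print(dict(p))  # each edge contributes 1 to p at its older (smaller) endpoint
    print(dict(q))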
The terms in (8.5.28) and (8.5.29) correspond to the prefactors in (8.4.14) and (8.4.16) in Lemma 8.11. Further, the factors m and m−1 arise owing to the sum over (j_i)_{i=0}^{k−1}. Combining with the sums over (π_i)_{i=0}^k and (j_i)_{i=0}^{k−1} gives rise to the factor ⟨f, T_κ^{k−2} f^⋆⟩ as in Proposition 8.14, as we prove next.
We compute

Π_{s=2}^n (β_s + q_s − 1)_{q_s} / (α + β_s + p_s + q_s − 1)_{p_s+q_s}
  = Π_{s=2}^n [1/(α + β_s + p_s + q_s − 1)_{p_s}] × (β_s + q_s − 1)_{q_s} / (α + β_s + q_s − 1)_{q_s}
  = Π_{s=2}^n [1/(α + β_s + p_s + q_s − 1)_{p_s}] Π_{i=0}^{q_s−1} (β_s + i)/(α + β_s + i)
  = Π_{s=2}^n [1/(α + β_s + p_s + q_s − 1)_{p_s}] Π_{i=0}^{q_s−1} (1 − α/(α + β_s + i)).   (8.5.31)
Note also that

Π_{s=1}^n s^{−p_s} = Π_{i=1}^k π_i^{−(1{π_{i−1} > π_i} + 1{π_{i+1} > π_i})} = Π_{i=1}^k 1/(π_{i−1} ∧ π_i),   (8.5.32)

where π_min = min_{i=0,...,k} π_i, and we have used that p_s = 0 for s < π_min. This bounds the first factor in (8.5.31).
We continue by analyzing the second factor in (8.5.31). By a Taylor expansion,

log(1 − α/(α + β_s + i)) = −α/(α + β_s + i) + O((α + β_s + i)^{−2}),   (8.5.35)
so that

Π_{s=1}^n Π_{i=0}^{q_s−1} (1 − α/(α + β_s + i))
  = exp( −Σ_{s=1}^n Σ_{i=0}^{q_s−1} [α/(α + β_s + i) + O((α + β_s + i)^{−2})] )
  = exp( O(1) Σ_{s=1}^n q_s s^{−2} − Σ_{s=1}^n Σ_{i=0}^{q_s−1} α/(α + β_s + i) ).   (8.5.36)
Further, since q_s = Σ_{i=0}^{k−1} 1{s ∈ (π_i, π_{i+1}) ∪ (π_{i+1}, π_i)} by (8.5.26), we have

−γ Σ_{s=1}^n q_s/s = −γ Σ_{i=0}^{k−1} Σ_{s=π_i∧π_{i+1}+1}^{π_i∨π_{i+1}−1} 1/s.   (8.5.38)
Further,

Σ_{s=1}^n q_s/s² ≤ Σ_{i=0}^{k−1} Σ_{s=1}^n 1{s > π_i ∧ π_{i+1}}/s² ≤ Σ_{i=0}^{k−1} c/(π_i ∧ π_{i+1})
  ≤ 2c Σ_{l=0}^k 1/(π_min + l) ≤ 2c log(1 + k/π_min),   (8.5.40)

since ~π^e is self-avoiding. Using this with π_min = 1 yields the bound 2c log(1 + k). We conclude that

Π_{s=1}^n Π_{i=0}^{q_s−1} (1 − α/(α + β_s + i)) = (1 + k/π_min)^{O(1)} Π_{i=1}^k (π_{i−1} ∧ π_i)^γ / (π_{i−1} ∨ π_i)^γ.   (8.5.41)
By (8.5.9), there are m^{k−b}(m−1)^b choices for the edge labels. By (8.5.30) and (8.5.14),

m^{k−b}(m−1)^b Π_{s=1}^n (α + p_s − 1)_{p_s} = A(~π),   (8.5.43)
where q_i = label(π_i). Apart from the factor (1 + k/π_min)^{O(1)}, this agrees exactly with the summand in (8.5.17). Thus, we may follow the analysis of (8.5.17). Bounding π_min ≥ 1 and summing this over the vertices in ~π, combined with an approximation of the discrete sum by an integral, leads to the claim in (8.5.24).
Here,

⟨f, T_κ^{k−2} f^⋆⟩ = Σ_{s,t} ∫_{S²} f(x, s) κ^{⋆(k−2)}((x,s), (y,t)) f^⋆(y, t) dx dy,   (8.5.44)

where κ^{⋆1}((x,s), (y,t)) = κ((x,s), (y,t)), and we define, recursively,

κ^{⋆k}((x,s), (y,t)) = Σ_{r∈{O,Y}} ∫_0^1 κ^{⋆(k−1)}((x,s), (z,r)) κ((z,r), (y,t)) dz.   (8.5.45)
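The recursion (8.5.45) is straightforward to evaluate numerically by discretizing the age coordinate. The sketch below (ours) does this by the midpoint rule for a kernel of the product form in (8.5.20); the constants c[s][t] are placeholders for the c_{s,t} fixed earlier in the section, so the printed number is illustrative only:

    import numpy as np

    N = 200                                  # grid points on (0, 1)
    x = (np.arange(N) + 0.5) / N             # midpoints; integral of f -> f.sum()/N
    m, delta = 2, 1.0
    chi = (m + delta) / (2 * m + delta)
    c = np.array([[m, m + delta],            # placeholder c_{s,t}, s,t in {O, Y}
                  [m, m + delta]], dtype=float)

    # kernel[s, i, t, j] = kappa((x_i, s), (x_j, t)), shape (8.5.20)
    hi = np.maximum.outer(x, x)
    lo = np.minimum.outer(x, x)
    base = 1.0 / (hi ** chi * lo ** (1 - chi))
    kernel = c[:, None, :, None] * base[None, :, None, :]

    def star(k):
        """Iterated kernel kappa^{*k} of (8.5.45), as a (2, N, 2, N) array."""
        out = kernel.copy()
        for _ in range(k - 1):
            # sum over the intermediate type r, integrate over z (factor 1/N)
            out = np.einsum('sirz,rztj->sitj', out, kernel) / N
        return out

    f = np.ones((2, N))                      # a test function f(y, s)
    f_star = f[::-1]                         # f*(y, s) = f(y, s*), with O* = Y
    k = 5
    inner = np.einsum('si,sitj,tj->', f, star(k - 2), f_star) / N ** 2
    print(inner)                             # <f, T_kappa^{k-2} f*> for this kappa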
(c/x^{1−χ})(c/y^{1−χ}) [(x/y)^{1/2−χ} + (y/x)^{1/2−χ}] = c² x^{2χ−3/2}/√y + c² y^{2χ−3/2}/√x.   (8.5.48)
= (1/2) [2/(2χ−1)]^{k−1} ⟨1, M^{k−2} 1⟩,

where 1 = (1,1)^T is the constant vector. A similar computation, now computing the integrals from left to right instead, shows that the contribution due to y^{2χ−3/2}/√x is bounded by

(1/2) [2/(2χ−1)]^{k−1} ⟨(M^*)^{k−2} 1, 1⟩ = (1/2) [2/(2χ−1)]^{k−1} ⟨1, M^{k−2} 1⟩,   (8.5.56)

so that we end up with

⟨f, T_κ^{k−2} f^⋆⟩ ≤ (c²/2) [2/(2χ−1)]^{k−1} ⟨1, M^{k−2} 1⟩.   (8.5.57)
The matrix M has largest eigenvalue

λ_M = [m(m+δ) + √(m(m−1)(m+δ)(2m+1+δ))] / (2m+δ),   (8.5.58)

and smallest eigenvalue

μ_M = [m(m+δ) − √(m(m−1)(m+δ)(2m+1+δ))] / (2m+δ) < λ_M.   (8.5.59)

A simple computation using that m > −δ shows that μ_M > 0. Thus,
where the equality follows since χ = (m + δ)/(2m + δ). Then this proves the required estimate. Write

k_n^⋆ = ⌈(1 − 2ε) log_ν n⌉.   (8.5.63)

Then, by Proposition 8.14, with K′ = 2Kc² and for all n sufficiently large,

P(dist_{PA_n^{(m,δ)}(a)}(o_1, o_2) ≤ k_n^⋆) ≤ Σ_{k=0}^{k_n^⋆} (K′/n) [(1+ε)ν]^{k−2} ≤ (K′ k_n^⋆/n) [(1+ε)ν]^{k_n^⋆}
  ≤ (K′ k_n^⋆/n) n^{log(1+ε)+1−2ε} = o(1),   (8.5.64)

since log(1 + ε) ≤ ε < 2ε, as required. The proof for PA_n^{(m,δ)}(d) proceeds identically, now using Proposition 8.15 instead.
Thus,

P(dist_{PA_n^{(m,0)}}(o_1, o_2) = k) ≤ (4(Cm)^k/n) (Σ_{s=1}^n 1/s)^{k−1} ≤ (4(Cm)^k/n) (log n)^{k−1}.   (8.5.68)

As a result,

P(dist_{PA_n^{(m,0)}}(o_1, o_2) ≤ k_n^⋆) ≤ Σ_{k≤k_n^⋆} (4(Cm)^k/n) (log n)^{k−1} ≤ 4 Σ_{k≤k_n^⋆} 2^{−k} (log n)^{−1} → 0,   (8.5.69)

since (2Cm log n)^{k_n^⋆} ≤ n, which follows from (8.5.65). This implies that the typical distances are whp at least k_n^⋆ in (8.5.65). Since k_n^⋆ ≥ (1 − ε) log n / log log n, this completes the proof of the lower bound on the graph distances for δ = 0 in Theorem 8.8.

In Exercise 8.18, the reader is asked to prove that the distance between vertices n−1 and n is also whp at least k_n^⋆ in (8.5.65). Exercise 8.19 considers whether the above proof implies that the distance between vertices 1 and 2 is whp at least k_n^⋆ in (8.5.65) as well.
In this section we prove the lower bound in Theorem 8.7 for δ < 0. We do so in a more general setting, by assuming an upper bound on the probability of the existence of paths that is inspired by Proposition 8.9:

Assumption 8.18 (Path probabilities) There exist constants κ > 0 and γ > 0 such that, for all n and all self-avoiding paths ~π = (π_0, ..., π_k) ∈ [n]^{k+1},

P(~π ⊆ PA_n) ≤ Π_{i=1}^k κ (π_{i−1} ∧ π_i)^{−γ} (π_i ∨ π_{i−1})^{γ−1}.   (8.6.1)
This truncated first-moment method looks rather different from those presented in the proofs of Theorems 6.7 and 7.8 for NR_n(w) and CM_n(d). This difference also explains why distances are twice as large for PA_n in Theorem 8.19 compared with Theorems 6.7 and 7.8.

Let us now briefly explain the truncated first-moment method. We start with an explanation of the (unconstrained) first-moment bound and its shortcomings. Let u, v ≥ εn be distinct vertices of PA_n. Then, by Assumption 8.18, for k_n ∈ N,

P(dist_{PA_n}(u, v) ≤ 2k_n) ≤ Σ_{k=1}^{2k_n} Σ_{~π} P(~π ⊆ PA_n),

where the sum is over all self-avoiding paths ~π = (π_0, ..., π_k) with π_0 = u and π_k = v.
The shortcoming of the above bound is that the paths that contribute most to the total weight are those that connect u or v quickly to very old vertices. However, such paths are quite unlikely to be present. This explains why the very old vertices have to be removed in order to obtain a reasonable estimate, and why doing so leads only to small errors. For this, and similarly to Section 6.3.2 for NR_n(w), we split the paths into good and bad paths:

Definition 8.20 (Good and bad paths for PA_n) For a decreasing sequence g = (g_l)_{l=0,...,k} of positive integers, we consider a path ~π = (π_0, ..., π_k) to be good when π_l ∧ π_{k−l} ≥ g_l for all l ∈ {0, ..., k}. We denote the event that there exists a good path of length k between u and v by E_k(u, v). We further let F_l(v) denote the event that there exists a bad path of length l in PA_n starting at v. This means that there exists a path ~π ⊆ PA_n, with v = π_0, such that π_0 ≥ g_0, ..., π_{l−1} ≥ g_{l−1}, but π_l < g_l, i.e., a path that first crosses the threshold after exactly l steps.
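To make Assumption 8.18 and Definition 8.20 concrete, the following sketch (ours) computes the path weight p(~π) of (8.6.9) and checks goodness; the truncation sequence g below is made up for illustration:

    import numpy as np

    def p_weight(pi, kappa, gamma):
        """Path weight p(pi) from Assumption 8.18 / (8.6.9)."""
        pi = np.asarray(pi)
        lo = np.minimum(pi[:-1], pi[1:])
        hi = np.maximum(pi[:-1], pi[1:])
        return np.prod(kappa * lo ** (-gamma) * hi ** (gamma - 1))

    def is_good(pi, g):
        """Goodness in the sense of Definition 8.20: pi_l ^ pi_{k-l} >= g_l."""
        k = len(pi) - 1
        return all(min(pi[l], pi[k - l]) >= g[l] for l in range(k + 1))

    n, gamma, kappa = 10_000, 0.6, 1.0
    g = [n // 10, n // 100, n // 1000, 2]     # decreasing thresholds (made up)
    path = (9_000, 4_000, 500, 7_000)
    print(p_weight(path, kappa, gamma), is_good(path, g))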
The truncated first-moment estimate arises when one bounds the probabilities of the existence of certain good or bad paths by their expected numbers. Owing to the split into good and bad paths, these sums behave better than they would without the split. Inequality (8.6.8) is identical to inequality (6.3.30), used in the proof of Theorem 6.7. However, the notion of good has changed, owing to the fact that the vertices no longer have a weight, but rather an age.

By Assumption 8.18,

P(~π ⊆ PA_n) ≤ p(~π).   (8.6.9)
We conclude that all terms on the rhs of (8.6.8) are now bounded in terms of f_{l,n}(u, v), as

P(dist_{PA_n}(u, v) ≤ 2k_n) ≤ Σ_{k=1}^{k_n} Σ_{w=1}^{g_k−1} f_{k,n}(u, w) + Σ_{k=1}^{k_n} Σ_{w=1}^{g_k−1} f_{k,n}(v, w)
  + Σ_{k=1}^{2k_n} Σ_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^n f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋}).   (8.6.13)
We establish upper bounds on f_{k,n}(u, v) and use these to show that the rightmost term in (8.6.8) remains small when k = k_n is chosen appropriately. Our aim is to provide an upper bound of the form f_{k,n}(u, w) ≤ α_k w^{−γ} + 1{w > g_{k−1}} β_k w^{γ−1}, for suitably chosen parameters α_k, β_k ≥ 0. Key to this choice is the following lemma:
Then there exists a constant c = c(γ, κ) > 1 such that, for all u ∈ [n],

Σ_{w=1}^n q_ℓ(w) p(w, u) ≤ c (α log(n/ℓ) + β n^{2γ−1}) u^{−γ},

as required. This advances the induction hypothesis, and thus completes the proof.
We next use (8.6.15) to prove Theorem 8.19. We start with the contributions due to bad paths. Summing over (8.6.27) in Lemma 8.23, and using (8.6.18) and (8.6.26), we obtain

Σ_{w=1}^{g_k−1} f_{k,n}(v, w) ≤ α_k g_k^{1−γ} 2^{1−γ}/(1−γ) ≤ 2ε/(π² k²),   (8.6.30)

which, when summed over all k ≥ 1, is bounded by ε/3. Hence, together, the first two summands on the rhs of (8.6.13) are smaller than 2ε/3. This shows that the probability that there exists a bad path from either u or v is small, uniformly in u, v ≥ εn.
We continue with the contributions due to good paths, which is the most delicate part of the argument. For this, it remains to choose k_n as large as possible while ensuring that g_{k_n} ≥ 2 and

Σ_{k=1}^{2k_n} Σ_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^n f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋}) ≤ ε/3.   (8.6.31)

Proving (8.6.31) for the appropriate k_n is the main content of the remainder of this section. Recall from Definition 8.22 that g_k is the largest integer satisfying (8.6.22), and that the parameters α_k, β_k are defined via the equalities in (8.6.23) and (8.6.24). To establish lower bounds for the decay of g_k, we instead investigate the growth of η_k = n/g_k > 0 for large k:

Proposition 8.24 (Inductive bound on η_k) Recall Definition 8.22, and let η_k = n/g_k. Let ε ∈ (0, 1). Then there exists a constant B = B_ε such that, for any k = O(log log n),

η_k ≤ e^{B(τ−2)^{−k/2}},   (8.6.32)

where we recall the degree power-law exponent τ = 1 + 1/γ from (8.6.2).
Exercise 8.22 asks the reader to relate the above bound to the growth of the Pólya point tree.

Before turning to the proof of Proposition 8.24, we comment on it. Recall that we are summing over π_k ≥ g_k, which is equivalent to n/π_k ≤ η_k. The sums in (8.6.13) are such that the summands obey this bound for appropriate values of k.

Compare this with (6.3.30), where, instead, the weights obey w_{π_k} ≤ b_k. We see that η_k plays a similar role to b_k. Recall that w_{π_k} is indeed close to the degree of π_k in GRG_n(w), while the degree of vertex π_k in PA_n^{(m,δ)} is close to (n/π_k)^{1/(τ−1)} by [V1, (8.3.12)]. Thus, the truncation n/π_k ≤ η_k can be interpreted as a bound of order e^{B(τ−2)^{−k/2}} on the degree of π_k. Note, however, that b_k ≈ e^{(τ−2)^{−k}} by (6.3.42), which grows roughly twice as quickly as η_k. This is again a sign that distances in PA_n are twice as large as those in GRG_n(w).
Before proving Proposition 8.24, we first derive a recursive bound on η_k:

Lemma 8.25 (Recursive bound on η_k) Recall Definition 8.22, and let η_k = n/g_k. Then there exists a constant C > 0, independent of ε > 0, such that

η_{k+2}^{1−γ} ≤ C (η_k^γ + η_{k+1}^{1−γ} log η_{k+1}).   (8.6.33)

Proof By the definition of g_k in (8.6.22) and the fact that γ − 1 < 0,

η_{k+2}^{1−γ} = n^{1−γ} g_{k+2}^{γ−1} ≤ n^{1−γ} [1/(1−γ)] (π²/ε) (k+2)² α_{k+2}.   (8.6.34)
Further, by the recursion defining α_{k+2},

n^{1−γ} [1/(1−γ)] (π²/ε) (k+2)² α_{k+2} = [c/(1−γ)] (π²/ε) (k+2)² n^{1−γ} (α_{k+1} log η_{k+1} + β_{k+1} n^{2γ−1}).   (8.6.35)

We bound each of the two terms in (8.6.35) separately. By (8.6.26),

[c/(1−γ)] (π²/ε) (k+2)² n^{1−γ} α_{k+1} log η_{k+1}
  ≤ [c 2^{1−γ}/(1−γ)] (π²/ε) (k+2)² n^{1−γ} [(1−γ)ε/(π²(k+1)²)] g_{k+1}^{γ−1} log η_{k+1}
  = c 2^{1−γ} [(k+2)²/(k+1)²] η_{k+1}^{1−γ} log η_{k+1},   (8.6.37)

while, for the second term,

[c/(1−γ)] (π²/ε) (k+2)² n^{1−γ} β_{k+1} n^{2γ−1} = [c/(1−γ)] (π²/ε) (k+2)² β_{k+1} n^γ.   (8.6.38)

By the recursion defining β_{k+1},

[c/(1−γ)] (π²/ε) (k+2)² β_{k+1} n^γ = [c/(1−γ)] (π²/ε) (k+2)² n^γ c (α_k g_k^{1−2γ} + β_k log η_k),   (8.6.39)
which again leads to two terms that we bound separately. For the first term in (8.6.39), we again use the fact that

α_k ≤ 2^{1−γ} [(1−γ)ε/(π²k²)] g_k^{γ−1},

to arrive at

[c/(1−γ)] (π²/ε) (k+2)² n^γ c α_k g_k^{1−2γ} ≤ [c 2^{1−γ}/(1−γ)] (π²/ε) (k+2)² n^γ c [(1−γ)ε/(π²k²)] g_k^{γ−1} g_k^{1−2γ} = c² 2^{1−γ} [(k+2)²/k²] η_k^γ,   (8.6.40)

which contributes to the first term on the rhs of (8.6.33).
By Definition 8.22 we have cβ_k n^{2γ−1} ≤ α_{k+1}, so that, using (8.6.36), the second term in (8.6.39) is bounded by

[c/(1−γ)] (π²/ε) (k+2)² n^γ c β_k log η_k
  ≤ [c/(1−γ)] (π²/ε) (k+2)² α_{k+1} n^{1−γ} log η_k
  ≤ c 2^{1−γ} [(k+2)²/(k+1)²] g_{k+1}^{γ−1} n^{1−γ} log η_k
  = c 2^{1−γ} [(k+2)²/(k+1)²] η_{k+1}^{1−γ} log η_k.   (8.6.41)
Since k 7→ gk is decreasing, it follows that k 7→ ηk is increasing, so that
Proof of Proposition 8.24. We prove the proposition by induction on k, and start by initializing the induction. For k = 0,

η_0 = n/g_0 = n/⌈εn⌉ ≤ ε^{−1} ≤ e^B,   (8.6.43)

so that, by (8.6.44),

η_k ≤ C(2C)^{1/(τ−2)} (η_{k−4}^{1/(τ−2)²} + η_{k−3}^{1/(τ−2)} (log η_{k−3})^{1/[(1−γ)(τ−2)]}).
For the first term in (8.6.48), we use the upper bound η_0 ≤ 1/ε to obtain

C^{Σ_{l=0}^{k/2}(τ−2)^{−l}} η_0^{(τ−2)^{−k/2}} ≤ C^{Σ_{l=0}^{k/2}(τ−2)^{−l}} e^{(B/2)(τ−2)^{−k/2}} ≤ (1/2) e^{B(τ−2)^{−k/2}},   (8.6.49)

when B ≥ 2 log(1/ε), where we use that C^{Σ_{l=0}^{k/2}(τ−2)^{−l}} ≤ (1/2) e^{(B/2)(τ−2)^{−k/2}} for B large, since C is independent of ε.
For the second term in (8.6.48), we use the induction hypothesis to obtain

Σ_{i=1}^{k/2} C^{Σ_{l=0}^{i−1}(τ−2)^{−l}} η_{k−2i+1}^{(τ−2)^{−(i−1)}} (log η_{k−2i+1})^{(τ−2)^{−(i−1)}/(1−γ)}
  ≤ Σ_{i=1}^{k/2} C^{Σ_{l=0}^{i−1}(τ−2)^{−l}} e^{B(τ−2)^{−(k−1)/2}} [B(τ−2)^{−(k−2i+1)/2}]^{(τ−2)^{−(i−1)}/(1−γ)}.   (8.6.50)
We can write

e^{B(τ−2)^{−(k−1)/2}} = e^{B(τ−2)^{−k/2}} e^{B(τ−2)^{−k/2}(√(τ−2)−1)}.   (8.6.51)
Since √(τ−2) − 1 < 0, for k = O(log log n) we can take B large enough that, uniformly in k ≥ 1,

Σ_{i=1}^{k/2} C^{Σ_{l=0}^{i−1}(τ−2)^{−l}} e^{B(τ−2)^{−k/2}(√(τ−2)−1)} [B(τ−2)^{−(k−2i+1)/2}]^{(τ−2)^{−(i−1)}/(1−γ)} < 1/2.   (8.6.52)
We can now sum the bounds in (8.6.49) and (8.6.50)–(8.6.52) to obtain

η_k ≤ (1/2 + 1/2) e^{B(τ−2)^{−k/2}} = e^{B(τ−2)^{−k/2}},   (8.6.53)

as required. This advances the induction hypothesis, and thus completes the proof of Proposition 8.24 for B ≥ 2 log(1/ε).
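To see the double-exponential growth quantified in (8.6.32) concretely, the sketch below (ours) iterates the recursion of Lemma 8.25 with equality, in log-scale to avoid overflow, and compares log η_k with the rate (τ−2)^{−k/2} of Proposition 8.24; all constants are illustrative, not from the text:

    import numpy as np

    # L_{k+2} = [log C + logaddexp(gamma*L_k, (1-gamma)*L_{k+1} + log L_{k+1})]/(1-gamma),
    # where L_k = log(eta_k), i.e., Lemma 8.25 taken with equality.
    gamma, C = 0.75, 2.0
    tau = 1 + 1 / gamma                        # so tau - 2 = (1 - gamma)/gamma < 1
    L = [np.log(10.0), np.log(10.0)]           # made-up initial values L_0, L_1
    for k in range(28):
        L.append((np.log(C) + np.logaddexp(gamma * L[k],
                  (1 - gamma) * L[k + 1] + np.log(L[k + 1]))) / (1 - gamma))
    for k in range(0, 30, 6):
        print(k, L[k] * (tau - 2) ** (k / 2))  # roughly stabilizes, as in (8.6.32)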
We are now ready to complete the proof of Theorem 8.19:

Completion of the proof of Theorem 8.19. Recall that we were left with proving (8.6.31), i.e., that, uniformly in u, v ≥ εn,

Σ_{k=1}^{2k_n} Σ_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^n f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋}) ≤ ε.   (8.6.54)

A crucial part of the proof is the optimal choice of k_n. By Proposition 8.24,

g_k ≥ n/η_k ≥ n e^{−B(τ−2)^{−k/2}}.   (8.6.55)
Then

Σ_{k=1}^{2k_n} Σ_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^n f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋})
  ≤ Σ_{k=1}^{2k_n} Σ_{w=g_{⌊k/2⌋}}^n (α_{⌈k/2⌉} w^{−γ} + 1{w > g_{⌈k/2⌉−1}} β_{⌈k/2⌉} w^{γ−1})²
  ≤ 2 Σ_{k=1}^{2k_n} Σ_{w=g_{⌊k/2⌋}}^n (α_{⌈k/2⌉}² w^{−2γ} + 1{w > g_{⌈k/2⌉−1}} β_{⌈k/2⌉}² w^{2(γ−1)}).   (8.6.57)
This gives two terms, which we estimate one at a time. For the first term, using that γ > 1/2 and that k ↦ α_k is non-decreasing, while k ↦ g_k is non-increasing, by (8.6.18) we find that

2 Σ_{k=1}^{2k_n} Σ_{w=g_{⌊k/2⌋}}^n α_{⌈k/2⌉}² w^{−2γ} ≤ [2/(2γ−1)] Σ_{k=1}^{2k_n} α_{⌈k/2⌉}² (g_{⌊k/2⌋} − 1)^{1−2γ}   (8.6.58)
  ≤ [2/(2γ−1)] Σ_{k=1}^{2k_n} α_{⌈k/2⌉}² g_{⌈k/2⌉}^{1−2γ} = [4 × 2^{2γ−1}/(2γ−1)] (η_{k_n}/n) Σ_{k=1}^{k_n} α_k² g_k^{2−2γ}.
Similarly, for the second term,

2 Σ_{k=1}^{2k_n} Σ_{w=g_{⌈k/2⌉−1}}^n β_{⌈k/2⌉}² w^{2(γ−1)} ≤ [4/(2γ−1)] Σ_{k=1}^{k_n} β_k² n^{2γ−1}.   (8.6.61)
This term is bounded by

C η_{k_n+1}^{2−2γ} ε²/n ≤ C η_{k_n+1} ε²/n,   (8.6.62)

as in (8.6.58)–(8.6.60), since γ ∈ (1/2, 1).
We conclude that, using (8.6.32) in Proposition 8.24,

Σ_{k=1}^{2k_n} Σ_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^n f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋}) ≤ ε.
Let us explain the philosophy of the proof. Note that Core_n requires only information about PA_n^{(m,δ)}, while we are going to study its diameter in PA_{2n}^{(m,δ)}. This allows us to use the edges originating from the vertices in [2n] \ [n] as a sprinkling of the graph that creates shortcuts in PA_n^{(m,δ)}. Such shortcuts shorten graph distances tremendously. We call the vertices that create such shortcuts n-connectors.

Basically, this argument shows that a vertex v ∈ [n] of large degree D_v(n) ≫ 1 will likely have an n-connector to a vertex u ∈ [n] satisfying D_u(n) ≥ D_v(n)^{1/(τ−2)}. This is related to the power-iteration argument for the configuration model discussed below Proposition 7.15. However, for preferential attachment models, we emphasize that it takes two steps to link a vertex of large degree to another vertex of even larger degree. In the proof for the configuration model in Theorem 7.13, this happened in only one step. Therefore, distances in PA_n^{(m,δ)} for δ < 0 are (at least in terms of upper bounds) twice as large as the corresponding distances for a configuration model with a similar degree structure. Let us now state our main result, for which we require some notation.
For A ⊆ [2n], we write
The proof of Theorem 8.26 is divided into several smaller steps. We start by proving that
the diameter of the inner core Innern , which is defined by
is whp bounded by some finite constant Kδ < ∞. After this, we show that the distance
from the outer core, given by Outern = Coren \Innern , to the inner core can be bounded
by 2 log log n/| log (τ − 2)|. This shows that the diameter of the outer core is bounded
by 4 log log n/| log (τ − 2)| + Kδ , as required. We now give the details, starting with the
diameter of the inner core:
Before proving Proposition 8.27, we first introduce the important notion of an n-connector
between two sets of vertices A, B ⊆ [n], which plays a crucial role throughout the proof:
Definition 8.28 (n-connector) Fix two sets of vertices A and B. We say that the vertex j ∈ [2n] \ [n] is an n-connector between A and B if one of the edges incident to j connects to a vertex in A, while another edge incident to j connects to a vertex in B. Thus, when there exists an n-connector between A and B, the distance between A and B in PA_{2n}^{(m,δ)} is at most 2.
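As an aside, the following sketch (ours, with a made-up data structure) makes Definition 8.28 concrete: given, for each j ∈ [2n] \ [n], the endpoints of the m edges of j, it lists the n-connectors between two vertex sets A and B:

    def n_connectors(edge_targets, A, B):
        """n-connectors in the sense of Definition 8.28.

        edge_targets maps each j in [2n] \\ [n] to the endpoints of its m edges;
        j is an n-connector when one edge hits A and a *different* edge hits B.
        """
        A, B = set(A), set(B)
        out = []
        for j, targets in edge_targets.items():
            hits_a = [i for i, t in enumerate(targets) if t in A]
            hits_b = [i for i, t in enumerate(targets) if t in B]
            if any(ia != ib for ia in hits_a for ib in hits_b):
                out.append(j)
        return out

    # toy example with n = 6, m = 2: vertices 7..12 attach to [6]
    edges = {7: (1, 5), 8: (2, 2), 9: (4, 6), 10: (3, 1), 11: (6, 6), 12: (2, 5)}
    print(n_connectors(edges, A={1, 2}, B={5, 6}))  # -> [7, 12]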
The next lemma gives bounds on the probability that an n-connector does not exist:

Lemma 8.29 (Connectivity sets in infinite-variance degree preferential attachment models) Consider PA_{2n}^{(m,δ)}(a) with m ≥ 2 and δ ∈ (−m, 0). For any two sets of vertices A, B ⊆ [n], there exists η = η(m, δ) > 0 such that

P(no n-connector for A and B | PA_n^{(m,δ)}(a)) ≤ e^{−ηD_A(n)D_B(n)/n},   (8.7.6)

where, for any A ⊆ [n],

D_A(n) = Σ_{a∈A} D_a(n)   (8.7.7)

denotes the total degree of the vertices in A at time n. These results also apply to PA_{2n}^{(m,δ)}(b) and PA_{2n}^{(m,δ)}(d) under identical conditions.

Lemma 8.29 plays the same role for preferential attachment models as Lemma 7.12 does for configuration models.
Proof We give only the proof for PA_{2n}^{(m,δ)}(a); the proofs for PA_{2n}^{(m,δ)}(b) and PA_{2n}^{(m,δ)}(d) are identical. We note that, for two sets of vertices A and B, conditional on PA_n^{(m,δ)}(a), the probability that j ∈ [2n] \ [n] is an n-connector for A and B is at least

(D_A(n) + δ|A|)(D_B(n) + δ|B|) / [2n(2m+δ)]²,   (8.7.8)

independently of whether the other vertices are n-connectors.
Since D_i(n) + δ ≥ m + δ > 0 for every i ≤ n, and δ < 0, for every i ∈ B we have

D_i(n) + δ = D_i(n)(1 + δ/D_i(n)) ≥ D_i(n)(1 + δ/m) = D_i(n)(m+δ)/m,   (8.7.9)

and thus D_A(n) + δ|A| ≥ D_A(n)(m+δ)/m. As a result, for η = (m+δ)²/(2m(2m+δ))² > 0, the probability that j ∈ [2n] \ [n] is an n-connector for A and B is at least ηD_A(n)D_B(n)/n², independently of whether the other vertices are n-connectors. Therefore, conditional on PA_n^{(m,δ)}(a), the probability that there is no n-connector for A and B is bounded above by

(1 − ηD_A(n)D_B(n)/n²)^n ≤ e^{−ηD_A(n)D_B(n)/n},   (8.7.10)

as required.
We now give the proof of Proposition 8.27:
Proof of Proposition 8.27. By [V1, Theorem 8.3 and Exercise 8.20], whp Inner_n contains at least √n vertices. Denote the first √n vertices of Inner_n by I. We rely on Lemma 8.29. Recall that D_i(n) ≥ n^{1/[2(τ−1)]}(log n)^{−1/2} for all i ∈ I. Observe that n^{1/(τ−1)−1} = o(1) for τ > 2, so that, for any i, j ∈ I, the probability that there exists an n-connector for i and j is bounded below by

1 − exp{−ηn^{1/(τ−1)−1}(log n)^{−1}} ≥ p_n ≡ n^{−(τ−2)/(τ−1)}(log n)^{−2},   (8.7.11)

for n sufficiently large.

We wish to couple Inner_n to an Erdős–Rényi random graph with N_n = √n vertices and edge probability p_n, which we denote by ER_{N_n}(p_n). For this, for i, j ∈ [N_n], we say that an edge between i and j is present when there exists an n-connector connecting the ith and jth vertices in I.
We now prove that this graph is stochastically bounded below by ER_{N_n}(p_n). Note that (8.7.11) does not guarantee this coupling; instead, we need to prove that the lower bound holds uniformly when i and j belong to I, independently of the previous edges. For this, we order the N_n(N_n−1)/2 edges in an arbitrary way and bound the conditional probability that the lth edge is present, conditional on all previous edges, from below by p_n for every l. This proves the claimed stochastic lower bound.
Indeed, the lth edge is present precisely when there exists an n-connector connecting the corresponding vertices, which we call i and j in I. Moreover, we shall not make use of the first vertices that were used to n-connect the previous edges. This removes at most N_n(N_n−1)/2 ≤ n/2 possible n-connectors, after which at least another n/2 remain. The probability that one of them is an n-connector for the ith and jth vertices in I is, for n sufficiently large, bounded below by

1 − exp{−ηn^{1/(τ−1)−2}(log n)^{−1} n/2} ≥ p_n ≡ n^{−(τ−2)/(τ−1)}(log n)^{−2},

using 1 − e^{−x} ≥ x/2 for x ∈ [0, 1] and η/2 ≥ 1/log n for n sufficiently large. This proves the claimed stochastic domination between the random graph on I and ER_{N_n}(p_n). Next, we show that diam(ER_{N_n}(p_n)) is, whp, uniformly bounded by a constant.
we show that diam(ERNn (pn )) is, whp, uniformly bounded by a constant.
For this we use the result in (Bollobás, 2001, Corollary 10.12), which gives sharp bounds
on the diameter of an Erdős–Rényi random graph. Indeed, this result implies that if pd N d−1 −
2 log N → ∞, while pd−1 N d−2 − 2 log N → −∞, then diam(ERN (p)) = d, whp. In
our case, N = Nn = n1/2 and
p = pn = n−(τ −2)/(τ −1) (log n)−2 = N −2(τ −2)/(τ −1) (2 log N )−2 ,
−1 −1
which implies that, whp, τ3−τ < d ≤ τ3−τ + 1. Thus, we obtain that the diameter of I in
τ −1
PA2n is whp bounded by 2d ≤ 2( 3−τ + 1). In Exercise 8.23, the reader is asked to prove
(m,δ)
Proposition 8.30 (Distance between outer and inner core) Consider PA_{2n}^{(m,δ)}(a) with m ≥ 2 and δ ∈ (−m, 0). The inner core Inner_n can whp be reached from any vertex in the outer core Outer_n using no more than 2 log log n/|log(τ−2)| edges in PA_{2n}^{(m,δ)}(a), i.e., whp,

max_{i∈Outer_n} min_{j∈Inner_n} dist_{PA_{2n}^{(m,δ)}(a)}(i, j) ≤ 2 log log n/|log(τ−2)|.   (8.7.12)

These results also apply to PA_{2n}^{(m,δ)}(b) and PA_{2n}^{(m,δ)}(d).
On the event that the bounds in (8.7.21) hold, by Lemma 8.29 we obtain that the conditional probability, given PA_n^{(m,δ)}, that there exists an i ∈ Γ_k such that there is no n-connector between i and Γ_{k−1} is bounded, using Boole's inequality, by

n exp(−ηB[u_{k−1}]^{2−τ} u_k) = n e^{−ηBD log n} ≤ n^{−(1+ζ)},   (8.7.22)

where we have used (8.7.18) and taken D = 2(1+ζ)/(ηB).
We now complete the proof of Proposition 8.30. Fix

k_n^⋆ = ⌊log log n/|log(τ−2)|⌋.   (8.7.23)

By Lemma 8.31, and since k_n^⋆ n^{−ζ} = o(1) for all ζ > 0, the distance between Γ_{k_n^⋆} and Inner_n is at most 2k_n^⋆. Therefore, we are done if we can show that

Outer_n ⊆ {i : D_i(n) ≥ (log n)^σ} ⊆ Γ_{k_n^⋆} = {i : D_i(n) ≥ u_{k_n^⋆}},   (8.7.24)

so that it suffices to prove that (log n)^σ ≥ u_{k_n^⋆} for any σ > 1/(3−τ). This follows from (7.3.66), which implies that

u_{k_n^⋆} = (log n)^{1/(3−τ)+o(1)};   (8.7.25)

by picking n sufficiently large, we see that this is smaller than (log n)^σ for any σ > 1/(3−τ). This completes the proof of Proposition 8.30.
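To get a feel for (8.7.23)–(8.7.25): by (8.7.22), the thresholds satisfy u_{k−1}^{2−τ} u_k = D log n, i.e., u_k = D log n · u_{k−1}^{τ−2}, whose fixed point (D log n)^{1/(3−τ)} is of the same order as (8.7.25) and is reached after about log log n/|log(τ−2)| steps. The numerical illustration below is ours; n, τ, D, and the starting value are made up:

    import math

    n, tau, D = 10 ** 8, 2.5, 1.0
    u = n ** (1 / (tau - 1))                 # start at the maximal-degree scale
    fixed_point = (D * math.log(n)) ** (1 / (3 - tau))
    k = 0
    while u > 2 * fixed_point:
        u = D * math.log(n) * u ** (tau - 2)  # one power-iteration step, (8.7.22)
        k += 1
    print(k, math.log(math.log(n)) / abs(math.log(tau - 2)))  # both about 4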
Proof of Theorem 8.26. We note that whp diam_{2n}(Core_n) ≤ K_δ + 2k_n^⋆, where k_n^⋆ in (8.7.23) is the upper bound on max_{i∈Outer_n} min_{j∈Inner_n} dist_{PA_{2n}^{(m,δ)}}(i, j) in Proposition 8.30, and we have made use of Proposition 8.27. This proves Theorem 8.26.
Together with Theorem 8.26, Theorem 8.32 proves the upper bound in Theorem 8.7:

Proof of the upper bound in Theorem 8.7. Choose o_1, o_2 ∈ [2n] independently and uar. Using the triangle inequality, we obtain the bound

dist_{PA_{2n}^{(m,δ)}}(o_1, o_2) ≤ dist_{PA_{2n}^{(m,δ)}}(o_1, Core_n) + dist_{PA_{2n}^{(m,δ)}}(o_2, Core_n) + diam_{2n}(Core_n).   (8.7.26)

By Theorem 8.32, the first two terms are each whp bounded by C log log log n. Further, by Theorem 8.26, the third term is bounded by (1 + o_P(1)) 4 log log n/|log(τ−2)|. This completes the proof of the upper bound in Theorem 8.7.
Exercise 8.24 shows that dist_{PA_{2n}^{(m,δ)}}(o_1, o_2) − 4 log log n/|log(τ−2)| is upper tight when dist_{PA_{2n}^{(m,δ)}}(o_1, Core_n) is tight.
Proof of Theorem 8.32. We use the same ideas as in the proof of Theorem 8.26, but now start from a vertex of large degree at time n instead. We need to show that, for fixed ε > 0, a uniformly chosen vertex o ∈ [(2 − ε)n] can whp be connected to Core_n using no more than C log log log n edges in PA_{2n}^{(m,δ)}. This is done in two steps.

In the first step, we explore the neighborhood of o in PA_{2n}^{(m,δ)} until we find a vertex v_0 with degree D_{v_0}(n) ≥ u_0, where u_0 will be determined below. Denote the set of all vertices in PA_{2n}^{(m,δ)} that can be reached from o using exactly k different edges of PA_{2n}^{(m,δ)} by S_k. Denote the first k for which there is a vertex in S_k whose degree at time n is at least u by

T_u^{(o)} = inf{k : S_k ∩ {v : D_v(n) ≥ u} ≠ ∅}.   (8.7.27)

Recall the local convergence in Theorem 5.26, as well as the fact that each vertex v has m older neighbors v_1, ..., v_m, whose ages a_{v_1}, ..., a_{v_m} are distributed as U_{v_i}^{(τ−2)/(τ−1)} a_v, where a_v is the age of v. Therefore, whp, there is a vertex in S_k with arbitrarily small age, and thus also with arbitrarily large degree, at time n. As a result, there exists a C = C_{u,ε} such that, for sufficiently large n,
The second step is to show that a vertex v_0 satisfying D_{v_0}(n) ≥ u_0 for sufficiently large u_0 can be joined to the core using O(log log log n) edges. To this end, we apply Lemma 8.29 to obtain that, for any vertex a with D_a(n) ≥ w_a, the probability that there does not exist a vertex b with D_b(n) ≥ w_b that is connected to a by an n-connector, conditional on PA_n^{(m,δ)}, is at most

exp{−ηD_a(n)D_B(n)/n}.   (8.7.29)

We thus obtain that the probability that such a b does not exist is at most

exp(−η′ w_a w_b^{2−τ}),   (8.7.31)

where η′ = ηc. Fix ε > 0 such that (1−ε)/(τ−2) > 1. We then iteratively take u_k = u_{k−1}^{(1−ε)/(τ−2)}, to see that the probability that there exists a k for which there does not exist a v_k with D_{v_k}(n) ≥ u_k is at most

Σ_{l=1}^k exp(−η′ u_{l−1}^ε).   (8.7.32)
we obtain that the probability that there exists an l with l ≤ k for which there does not exist
Now fix k = k_n = ⌈C log log log n⌉ and choose u_0 sufficiently large that

Σ_{l=1}^{k_n} exp(−η′ u_0^{εκ^{l−1}}) ≤ ε/2.   (8.7.35)

Then we obtain that, with probability at least 1 − ε/2, v_0 is connected in k_n steps to a vertex v_{k_n} with D_{v_{k_n}}(n) ≥ u_0^{κ^{k_n}}. Since, for C ≥ 1/log κ, we have

u_0^{κ^{k_n}} ≥ u_0^{log log n} ≥ (log n)^σ   (8.7.36)

when log u_0 ≥ σ, we obtain that v_{k_n} ∈ Core_n whp.
Since PA_T^{(m,δ)}(a) is connected whp,

diam_{PA_n^{(m,δ)}(a)}([T]) ≤ diam(PA_T^{(m,δ)}(a)),   (8.8.4)

which is a tight random variable. Then, similarly to (8.8.2),

dist_{PA_n^{(m,δ)}(a)}(u, [T]) ≤ dist_{PA_{mn}^{(1,δ/m)}(a)}(u, [mT]).   (8.8.5)

As in the proof for PA_n^{(m,δ)}(b), whp, when T is sufficiently large,

max_{u∈[mn]} dist_{PA_{mn}^{(1,δ/m)}(a)}(u, [mT]) ≤ c log n.   (8.8.6)
M_k →^P ∞.   (8.8.9)

Exercises 8.26–8.28 prove Lemma 8.36 for PA_n^{(m,δ)}(d); they rely on the arguments in the proof of Proposition 5.22. The proofs for PA_n^{(m,δ)}(a) and PA_n^{(m,δ)}(b) are quite similar.
To complete the lower bound on diam(PA_n^{(m,δ)}) in Theorem 8.34, we take two vertices u, v ∈ M_k with k = (1 − ε) log log n/log m. By definition, ∂B_k^{(G_n)}(u), ∂B_k^{(G_n)}(v) ⊂ [n] \ [n/2]. We can then adapt the proof of Theorem 8.19 to show that the distance between ∂B_k^{(G_n)}(u) and ∂B_k^{(G_n)}(v) is whp still bounded from below by 4 log log n/|log(τ−2)|. Therefore, whp,

diam(PA_n^{(m,δ)}) ≥ dist_{PA_n^{(m,δ)}}(u, v) = 2k + dist_{PA_n^{(m,δ)}}(∂B_k^{(G_n)}(u), ∂B_k^{(G_n)}(v))
  ≥ 2(1−ε) log log n/log m + 4 log log n/|log(τ−2)|.   (8.8.10)

This gives an informal proof of the lower bound.
Critical Case δ = 0
We close this section by discussing the diameter for δ = 0:
t_n ≈ n^{−2m/δ}, which is an interesting case and explains why the supremum in (8.9.1) can basically be restricted to t ∈ [n, n^{−2m/δ+o(1)}]. This sheds light on precisely what happens when t = t_n = n exp{(log n)^α} for α = 1, a case that is left open above.

The probability that one of the m edges of vertex n+t+1 connects to u, and another one to v (which certainly makes the distance between u and v equal to 2), is close to

m(m−1) E[ (D_v(n+t)+δ)/((2m+δ)(n+t)) × (D_u(n+t)+δ)/((2m+δ)(n+t)) | PA_n^{(m,δ)} ]
  = (1+o_P(1)) [m(m−1)/((2m+δ)² t²)] E[(D_v(n+t)+δ)(D_u(n+t)+δ) | PA_n^{(m,δ)}]
  = (1+o_P(1)) [m(m−1)/((2m+δ)² t²)] (t/n)^{2/(2+δ/m)} (D_v(n)+δ)(D_u(n)+δ)
  = (1+o_P(1)) [m(m−1)/(2m+δ)²] t^{−2(m+δ)/(2m+δ)} n^{−2/(2+δ/m)} (D_v(n)+δ)(D_u(n)+δ).   (8.9.2)
If we take u = o_1^{(n)}, v = o_2^{(n)}, we have that D_v(n) →^d D_1, D_u(n) →^d D_2, where (D_1, D_2) are two iid copies of the random variable with asymptotic degree distribution P(D = k) = p_k in (1.3.60). Thus, the conditional expectation of the total number of double attachments to both o_1^{(n)} and o_2^{(n)} up to time n+t is close to

Σ_{s=1}^t [m(m−1)(D_1+δ)(D_2+δ)/(2m+δ)²] s^{−2(m+δ)/(2m+δ)} n^{−2/(2+δ/m)}
  ≈ [m(m−1)(D_1+δ)(D_2+δ)/((2m+δ)(−δ))] n^{−2m/(2m+δ)} t^{−δ/(2m+δ)},   (8.9.3)

which becomes Θ_P(1) when t = Kn^{−2m/δ}. The above events, for different t, are close to being independent. This suggests that the process of attaching to both o_1^{(n)} and o_2^{(n)} is, conditional on their degrees (D_1, D_2), Poisson with some random intensity.
dist_{BPA_n^{(f)}}(o_1^{(n)}, o_2^{(n)}) − 4 log log n/|log(τ−2)|   (8.9.5)

is a tight sequence of random variables.
The situation of affine preferential attachment functions f in (8.9.4) where γ ∈ (0, 1/2), for which the degree power-law exponent satisfies τ = 1 + 1/γ > 3, is not so well understood, but one can conjecture that, again, the distance between o_1^{(n)} and o_2^{(n)} is whp logarithmic with some base related to the multi-type branching process that describes its local limit.
The following theorem, for which γ = 1/2 so that τ = 3, describes nicely how the addition of an extra power of a logarithm in the degree distribution affects the distances:
Theorem 8.40 (Critical case: interpolation) Consider BPA_n^{(f)}, where the concave attachment rule f satisfies that there exists α > 0 such that

f(k) = k/2 + αk/(2 log k) + o(k/log k).   (8.9.6)

Choose o_1, o_2 independently and uar from [n]. Then, conditional on o_1 ←→ o_2,

dist_{BPA_n^{(f)}}(o_1, o_2) = (1 + o_P(1)) [1/(1+α)] log n/log log n.   (8.9.7)
Exercise 8.37 shows that the degree distribution for f in (8.9.6) satisfies Σ_{l>k} p_l ≈ k^{−2}(log k)^{−2α}, as for GRG_n(w) in Theorem 6.28. Comparing Theorems 8.40 and 6.28, we see that, for large α, the typical distances in BPA_n^{(f)} are about twice as large as those in GRG_n(w) with similar degrees. This gives an explanation for the occurrence of the extra factor 2 in Theorem 8.7 compared with Theorem 6.3 for the Norros–Reittu model NR_n(w), and Theorem 7.2 for the configuration model CM_n(d), when the power-law exponent τ satisfies τ ∈ (2, 3). Note that this extra factor is absent precisely when α = 0.
Exercise 8.8 (All early vertices are whp at distance 2 for δ < 0) Let δ ∈ (−m, 0) and m ≥ 2. Extend Exercise 8.7 to the statement that, for K ≥ 1 fixed,

lim_{n→∞} P(dist_{PA_n^{(m,δ)}}(i, j) ≤ 2 ∀i, j ∈ [K]) = 1.   (8.11.3)

Exercise 8.9 (Early vertices are not at distance 2 when δ > 0) Let δ > 0 and m ≥ 2. Show that

lim_{K→∞} lim_{n→∞} P(dist_{PA_n^{(m,δ)}}(i, j) = 2 ∀i, j ∈ [K]) = 0.   (8.11.4)

Exercise 8.14 (Negative correlations for m = 1) Show that, for m = 1, Lemma 8.10 implies that if (π_0, ..., π_k) contains different coordinates from (ρ_0, ..., ρ_k), then

P(∩_{i=0}^{k−1} {π_i ⇝ π_{i+1}} ∩ ∩_{i=0}^{k−1} {ρ_i ⇝ ρ_{i+1}}) ≤ P(∩_{i=0}^{k−1} {π_i ⇝ π_{i+1}}) P(∩_{i=0}^{k−1} {ρ_i ⇝ ρ_{i+1}}).   (8.11.5)

Exercise 8.17 (Most-recent common ancestor in PA_n^{(1,δ)} (cont.)) Fix o_1, o_2 to be two vertices in [n] chosen uar, and let V be the oldest vertex that the path from 1 to o_1 and that from 1 to o_2 have in common in PA_n^{(1,δ)}. Extend Exercise 8.16 to show that dist_{PA_n^{(1,δ)}(b)}(1, V) is tight.

Exercise 8.28 (Concentration of the number of minimally k-connected vertices: proof of Lemma 8.36) Consider PA_n^{(m,δ)}(d) with m ≥ 2 and δ ∈ (−m, 0), as in Exercise 8.26. Use Exercises 8.26 and 8.27 to prove that M_k/E[M_k] →^P 1 for all k ≤ (1−ε) log log n/log m, as in Lemma 8.36.

Exercise 8.29 (Monotonicity of distances in PA_n^{(m,δ)}) Fix m ≥ 1 and δ > −m. Show that n ↦ dist_{PA_n^{(m,δ)}}(i, j) is non-increasing for n ≥ i ∨ j.
Informally, these results quantify the "six degrees of separation" paradigm in random graphs, where we see that random graphs with very heavy-tailed degrees have ultra-small typical distances, as could perhaps be expected.
Often, even the lines of proof of these results are similar, relying on clever path-counting techniques. In particular, the results show that in both generalized random graphs and configuration models, in the τ ∈ (2, 3) regime, vertices of high degree, say k, are typically connected to vertices of even higher degree, of order k^{1/(τ−2)}. In the preferential attachment model, on the other hand, this is not true; yet vertices of degree k tend to be connected to vertices of degree k^{1/(τ−2)} in two steps, making typical distances roughly twice as large.
Overview of Part IV
In Part IV we study several related random graph models that can be seen as extensions of
the simple models studied so far. They incorporate novel features, such as directed edges,
clustering, communities, and/or geometry. The important aspect in Part IV will be to verify
to what extent the main results informally described in Meta Theorems A (see the start of
Part III) and B (see above) remain valid, and otherwise, to what extent they need to be
adapted. We will not give complete proofs but instead informally explain why results are
similar to those in Meta Theorems A and B or, instead, why they are different.
CHAPTER 9

RELATED MODELS

Abstract
In this chapter we discuss some related random graph models that have been studied in the literature. We explain their relevance, as well as some of their properties. We discuss directed random graphs, random graphs with local and global community structures, as well as spatial random graphs.
Here, we discuss real-world network models. We start in Section 9.1.1 by discussing citation
networks in detail. In Section 9.1.2 we draw conclusions about network modeling.
Many real-world networks also display a community structure, in that certain parts are more densely connected than the rest of the network, and these communities are relevant in practice.
In citation networks, vertices denote scientific papers and the directed edges correspond
to citations of one paper to another. Obviously, such citations are directed, since it makes a
difference whether your paper cites mine, or my paper cites yours.
Citation networks grow in time. Indeed, papers do not disappear, so a citation, once made
in a published paper, does not disappear either. Further, their growth is enormous. Figure
9.1(a) shows that the number of papers in various fields grows exponentially in time, mean-
ing that more and more papers are being written. If you ever wondered why scientists seem
to be ever more busy, then this may be an obvious explanation.
In Figure 9.1(a) we display the number of papers in three different domains, namely,
Probability and Statistics (PS), Electrical Engineering (EE), and Biotechnology and Applied
Microbiology (BT). The data comes from the Web of Science data base. While exponential
growth is quite prominent in the data, it is somewhat unclear how this exponential growth
arises. It could be due either to the fact that the number of journals that are listed in Web of
Science grows over time or to the fact that journals contain more and more papers. However,
the exponential growth was observed as early as the 1980’s; see the book by Derek de Solla
Price (1986), appropriately called Little science, big science.
As you can see, we have already restricted to certain subfields in science, the reason being
that the publication and citation cultures in different fields are vastly different. Thus, we have
attempted to go to a situation in which the networks that we investigate are somewhat more
homogeneous. For this, it is relevant to be able to distinguish such fields, and to decide which
papers (or journals) contribute to which field. This is a fairly daunting task. However, it is
also an ill-defined task, as no subdomain is truly homogeneous. Let me restrict myself to
probability and statistics, as I happen to know this area best. In probability and statistics,
there are subdomains that are very pure, as well as areas that are highly applied such as
applied statistics. These areas do indeed have different publication and citation cultures.
Thus, science as a whole is probably hierarchical, in that large scientific disciplines can be
identified, that can, in turn, be subdivided into smaller subdomains, etc. However, one should
stop somewhere, and the three scientific disciplines relating to Figure 9.1 are homogeneous
enough to make our point.
Figure 9.1(b) shows the log–log plot for the in-degree distribution in these three citation
networks. We notice that these data sets seem to have empirical power-law citation distri-
butions. Thus, on average, papers attract few citations but the variability in the number of
citations is rather substantial. We are also interested in the dynamics of the citation distri-
bution of the papers published in a given year, as time proceeds. This can be observed in
Figure 9.2. We see a dynamical power law, meaning that at any time the degree distribution
of a cohort of papers from a given time period (in this case 1984) is close to a power law, but
the exponent changes over time (and in fact decreases, which corresponds to heavier tails).
When time grows quite large, the power law approaches a fixed value.
Interestingly, the existence of power-law in-degrees in citation networks also has a long
history. Derek de Solla Price (1965) observed it and even proposed a model for it that relied
on a preferential attachment mechanism, more than two decades before Barabási and Albert
(1999) proposed the first preferential attachment model.
Figure 9.1 (a) Number of publications per year (logarithmic y axis). (b) Log–log plot for the in-degree distribution tail in citation networks.
Figure 9.2 Degree distribution for papers from 1984 versus time.
We wish to discuss two further properties of citation networks and their dynamics. In
Figure 9.3 we see that the majority of papers stop receiving citations after some time, while
a few others keep being cited for longer times. This inhomogeneity in the evolution of vertex
in-degrees is not present in classical preferential attachment models, where the degree of
every fixed vertex grows as a positive power of the graph size. Figure 9.3 shows that the
number of citations of papers published in the same year can be rather different, and the
majority of papers actually stop receiving citations quite soon. In particular, after a first
increase the average increment of citations decreases over time (see Figure 9.4). We observe
Figure 9.3 Time evolution for citations of 20 randomly chosen papers from 1980 for PS and EE, and from 1982 for BT.
Figure 9.4 Average citation increment over a 20-year time window for papers published in different years. PS presents an aging effect different from EE and BT, showing that papers in PS receive citations longer than papers in EE and BT.
Figure 9.5 Distribution of the ages of cited papers, for PS, EE, and BT, for different citing years.
a difference in this aging effect between the PS data set and the other two data sets, due to the
fact that in PS scientists tend to cite older papers than in EE or BT, again exemplifying the
differences in citation and publication patterns in different fields. Nevertheless, the average
increment of citations received by papers in different years tends to decrease over time for
all three data sets.
A last characteristic that we observe is the log-normal distribution of the age of cited papers. In Figure 9.5, we plot the distribution of the ages of cited papers, looking at references made by papers in different years. We have used a 20-year time window in order to compare different citing years. Notice that this log-normal distribution seems to be rather stable over time, and the shape of the curve is also similar for different fields.
Let us summarize the differences between citation networks and the random graph mod-
els that form the basis of network science. First, citation networks are directed, which is
different from the typically undirected models that we have discussed so far. However, it is
not hard to adapt our models to become directed, and we explain this in Section 9.2. Second,
citation networks have a substantial community structure, in that parts of the network exist
that are much more densely connected than the network as a whole. We can argue that both
communities exist on a macroscopic scale, for example in terms of the various scientific
disciplines of which science consists, as well as on a microscopic scale, where research net-
works of small groups of scientists create subnetworks that are more densely connected than
the whole network. One could even argue that geography plays an important role in citation
networks, since many collaborations between scientists are within their own university or
country, even though we all work with various researchers around the globe.
Third, citation networks are dynamic, like preferential attachment models (PAMs), but their time evolution is quite different from that of PAMs, as the linear growth in PAMs is replaced by exponential growth in citation networks. Further, papers in citation networks seem to age, as seen in both Figures 9.3 and 9.4, in that citation rates become smaller for large times, in such a way that typically papers completely stop receiving citations at some (random) point in time.
In conclusion, finding an appropriate model for citation networks is quite a challenge, and
one should be quite humble in one’s expectation that the standard models are in any way
indicative of the complexity of real-world networks.
Now it would be very remarkable if any system existing in the real world could be exactly
represented by any simple model. However, cunningly chosen parsimonious models often
do provide remarkably useful approximations. For example, the law P V = RT relating
pressure P , volume V and temperature T of an “ideal” gas via a constant R is not exactly
true for any real gas, but it frequently provides a useful approximation and furthermore
its structure is informative since it springs from a physical view of the behavior of gas
molecules. For such a model there is no need to ask the question “Is the model true?” If
“truth” is to be the “whole truth” the answer must be “No.” The only question of interest
is “Is the model illuminating and useful?”
Thus, we should not feel discouraged at all! In particular, it is important to know when
to include extra features into the model at hand, so that it becomes more “useful.” For this,
the first step is to come up with models that do incorporate these extra features. Many of
the models discussed so far can rather straightforwardly be adapted to include features such
Figure 9.6 Maximum (a) out- and (b) in-degrees of the 229 networks of size larger than 10,000 from the KONECT data base.
as directedness, community structure, and geometry. Further, these properties can also be
combined. The simpler models that do not have such features serve as a useful model for
comparison, and can thus act as a “benchmark” for more complex situations. In this way,
the understanding of simple models often helps one to understand more complex models
since many properties, tools, and ideas can be extended to them. In some cases the extra
features give rise to a richer behavior, which then merits being studied in full detail. In this
way, network science has moved significantly forward compared with the models described
so far. The aim of this chapter is to highlight some of the lessons learned.
We discuss directed random graphs in Section 9.2, random graphs with macroscopic or
global communities in Section 9.3, random graphs with microscopic or local communities
in Section 9.4, and spatial random graphs in Section 9.5.
Many real-world networks are directed, in the sense that edges are oriented. For example, in
the World-Wide Web, the vertices are web pages, and the edges are the hyperlinks between
them. One could forget about these directions, but that would discard a wealth of informa-
tion. For example, in citation networks it makes a substantial difference whether my paper
cites yours, or yours cites mine. See Figure 9.6 for the maximum out- and in-degrees in the
KONECT data base.
This section is organized as follows. We start by defining directed graphs or digraphs.
After this, we discuss various models invented for them. We discuss directed inhomogeneous
random graphs in Section 9.2.1, directed configuration models in Section 9.2.2, and directed
preferential attachment models in Section 9.2.3.
A digraph G = (V(G), E(G)) on the vertex set V(G) = [n] has an edge set E(G) that is a subset of the set [n]² = {(u, v) : u, v ∈ [n]} of all ordered pairs of elements of [n]. Elements of E(G) are called directed edges or arcs.
Figure 9.7 Proportion of vertices in the largest strongly connected component (LSCC) in the 229 networks of size larger than 10,000 from the KONECT data base.
Figure 9.8 The WWW according to Broder et al. (2000), with updated numbers from Fujita et al. (2019): SCC 51.28%, IN 31.96%, OUT 6.05%, Tendrils 4.61%, Tube 0.26%, disconnected components 5.84%.
Every vertex has a forward connected component, consisting of the vertices to which it is connected by a directed path, as well as a backward connected component. Further, every vertex has a strongly connected component (SCC), consisting of those vertices to which there exist both a forward and a backward path. Exercise 9.1 shows that the SCCs in a graph are well defined. See Figure 9.7 for the proportion of vertices in the largest SCC in the KONECT data base.
The different notions of connectivity divide the graph up into several disjoint parts. Often,
there is a unique largest SCC that contains a positive proportion of the graph. The IN-part
of the graph consists of collections of vertices that are forward connected to the largest
SCC, but not backward connected. Further, the OUT-parts of the graph are the collections of
vertices that are backward connected to the largest SCC, but not forward connected. Finally,
there are the parts of the graph that are in neither of these parts, and consist of their own
SCC and IN- and OUT-parts. See Figure 9.8 for a description of these parts of the WWW,
as well as an estimate of their relative sizes.
Let us explain how this marking is defined. For a vertex v , let m(v) denote its mark.
We then define an isomorphism φ : V (G1 ) → V (G2 ) between two labeled rooted graphs
(G1 , o1 ) and (G2 , o2 ) to be an isomorphism between (G1 , o1 ) and (G2 , o2 ) that respects the
marks, i.e., for which m1 (v) = m2 (φ(v)) for every v ∈ V (G1 ), where m1 and m2 denote
the degree-mark functions on G1 and G2 respectively. We then define R? as in (2.2.2), and
the metric on rooted degree-marked graphs as in (2.2.3). We call the resulting notion of local
convergence (LC) marked forward LC and marked backward LC, respectively.
Even when we are considering forward–backward neighborhoods, the addition of marks is
necessary. Indeed, while for the root of this graph we do know by construction its in- and out-
degrees, for the other vertices in the forward and backward neighborhoods this information
is still not available.
Exercises 9.3–9.5 investigate the various notions of local convergence for some directed
random graphs that are naturally derived from undirected graphs.
While the above discussion may not be relevant for all questions that one may wish to
investigate using local convergence techniques, it is useful for the discussion of PageRank,
as we discuss next.
Convergence of PageRank
Recall the definition of PageRank from [V1, Section 1.5]. Let us first explain the solution in the absence of dangling ends, so that d_v^{(out)} ≥ 1 for all v ∈ [n]. Let G_n = (V(G_n), E(G_n)) denote a digraph, where we let n = |V(G_n)| denote the number of vertices. Fix the damping factor α ∈ (0, 1). Then we let the vector of PageRanks (R_v^{(G_n)})_{v∈V(G_n)} be the unique solution to the equation

R_v^{(G_n)} = α Σ_{u→v} R_u^{(G_n)}/d_u^{(out)} + 1 − α,   (9.2.3)

satisfying the normalization Σ_{v∈V(G_n)} R_v^{(G_n)} = n.
The damping parameter α ∈ (0, 1) guarantees that (9.2.3) has a unique solution. This solution can be understood in terms of the stationary distribution of a "bored surfer." Indeed, denote π_v = R_v^{(G_n)}/n, so that (π_v)_{v∈V(G_n)} is a probability distribution that satisfies a relation similar to that of (R_v^{(G_n)})_{v∈V(G_n)} in (9.2.3), namely

π_v = α Σ_{u→v} π_u/d_u^{(out)} + (1−α)/n.   (9.2.4)

Therefore, (π_v)_{v∈V(G_n)} is the stationary distribution of a random walker which, with probability α, jumps according to a simple random walk, i.e., it chooses any of the out-edges with equal probability, while, with probability 1 − α, the walker is bored, forgets about the search they were doing, and jumps to a uniform vertex in V(G_n).
In the presence of dangling ends, we can just redistribute their mass equally over all vertices, so that (9.2.3) becomes

R_v^{(G_n)} = α Σ_{u→v} (R_u^{(G_n)}/d_u^{(out)}) 1{u ∉ D} + (α/n) Σ_{u∈D} R_u^{(G_n)} + 1 − α,   (9.2.5)

where D = {v : d_v^{(out)} = 0} ⊆ V(G_n) denotes the collection of dangling vertices.
The damping factor α is quite crucial. When α = 0, the stationary distribution is just
πv = 1/n for every v ∈ V (Gn ), so that all vertices have PageRank 1. This is not very
informative. On the other hand, PageRank possibly converges quite slowly when α is close
to 1, and this is also not what we want. Experimentally, α = 0.85 seems to work well and
strikes a nice balance between these two extremes.
We next investigate the convergence of the PageRank distribution on a directed graph
sequence (Gn )n≥1 that converges locally:
Theorem 9.1 (Existence of asymptotic PageRank distribution) Consider a sequence of directed random graphs (G_n)_{n∈N}, and let o_n ∈ V(G_n) be chosen uar. Then the following hold:

(a) If G_n converges locally weakly in the marked backward sense to (Ḡ, ō) ∼ µ̄, then there exists a limiting distribution R_∅^{(Ḡ)}, with E_µ̄[R_∅^{(Ḡ)}] ≤ 1, such that

R_{o_n}^{(G_n)} →^d R_∅^{(Ḡ)}.   (9.2.6)

(b) If G_n converges locally in probability in the marked backward sense to (G, o) ∼ µ, then there exists a limiting distribution R_∅^{(G)}, with E_µ[R_∅^{(G)}] ≤ 1, such that, for every r > 0 that is a continuity point of the distribution of R_∅^{(G)},

(1/n) Σ_{v∈V(G_n)} 1{R_v^{(G_n)} > r} →^P µ(R_∅^{(G)} > r).   (9.2.7)
Interestingly, the positivity of the damping factor also allows us to give a power-iteration formula for R_v^{(G_n)}, and thus for R_∅. Indeed, let

A_{i,j}^{(G_n)} = 1{j→i}/d_j^{(out)},  i, j ∈ V(G_n),   (9.2.8)

denote the (normalized) adjacency matrix of the graph G_n. Then R_v^{(G_n)} can be computed as follows:

R_v^{(G_n)} = (1 − α) Σ_{k=0}^∞ α^k Σ_{i∈V(G_n)} (A^{(G_n)})^k_{v,i}.   (9.2.9)

As a result, when G_n converges locally in probability in the marked backward sense with limit (G, ∅), then R_∅^{(G)} can be computed:

R_∅^{(G)} = (1 − α) Σ_{k=0}^∞ α^k Σ_{i∈V(G)} (A^{(G)})^k_{∅,i},   (9.2.10)

where A^{(G)}_{i,j} is the normalized adjacency matrix of the backward local limit G. We refer to the notes in Section 9.6 for a discussion of Theorem 9.1, including consideration of an error in its original statement.
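For concreteness, here is a small sketch (ours) of PageRank on a digraph: it iterates the fixed-point equation (9.2.5), with dangling mass spread uniformly, and normalizes so that the ranks sum to n, as below (9.2.3). The toy edge list is made up:

    import numpy as np

    def pagerank(n, edges, alpha=0.85, tol=1e-12):
        """PageRank (R_v) solving (9.2.5), normalized so that sum(R) = n.

        edges is a list of directed pairs (u, v) meaning u -> v.
        """
        d_out = np.zeros(n)
        for u, _ in edges:
            d_out[u] += 1
        pi = np.full(n, 1.0 / n)                 # stationary version, (9.2.4)
        while True:
            nxt = np.full(n, (1 - alpha) / n)
            nxt += alpha * pi[d_out == 0].sum() / n   # dangling mass, uniform
            for u, v in edges:
                nxt[v] += alpha * pi[u] / d_out[u]
            if np.abs(nxt - pi).sum() < tol:
                return n * nxt                   # R_v = n * pi_v
            pi = nxt

    # toy digraph on 5 vertices; vertex 4 is a dangling end
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0), (0, 4)]
    print(pagerank(5, edges))

Since α < 1, the iteration is a contraction, which is the same mechanism that makes the power-iteration series (9.2.9) converge.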
P(D_∅^{(in)} > r) ≈ r^{−(τ_in−1)} (we are deliberately being vague about what ≈ means in this context).
Exercises 9.6 and 9.7 investigate the implications of (9.2.10) for the power-law hypothesis for random graphs having bounded out-degrees.
S = [t]. Such kernels are also highly convenient to approximate more general models, as
was exemplified in an undirected setting in Chapter 3.
Directed rank-1 inhomogeneous random graphs. We next generalize rank-1 inhomogeneous random graphs. For v ∈ [n], let w_v^{(in)} and w_v^{(out)} be its respective in- and out-weights, which have the interpretation of the asymptotic average in- and out-degrees, respectively, under a summation-symmetry condition on the weights. Then the directed generalized random graph DGRG_n(w) has edge probabilities given by

p_{uv} = p_{uv}^{(DGRG)} = w_u^{(out)} w_v^{(in)} / (ℓ_n + w_u^{(out)} w_v^{(in)}),   (9.2.14)

where

ℓ_n = (1/2) Σ_{v∈[n]} (w_v^{(out)} + w_v^{(in)}).   (9.2.15)

Let (W_n^{(out)}, W_n^{(in)}) = (w_o^{(out)}, w_o^{(in)}) denote the in- and out-weights of a uniformly chosen vertex o ∈ [n]. Similarly to Condition 1.1, we assume that

(W_n^{(out)}, W_n^{(in)}) →^d (W^{(out)}, W^{(in)})   (9.2.16)

and

E[W_n^{(out)}] → E[W^{(out)}],  E[W_n^{(in)}] → E[W^{(in)}],   (9.2.17)

where (W^{(out)}, W^{(in)}) is the limiting in- and out-weight distribution. Exercise 9.8 investigates the expected number of edges in this setting.
When thinking of w_v^{(out)} and w_v^{(in)} as corresponding to the approximate out- and in-degrees of vertex v ∈ [n], it is reasonable to assume that the total out- and in-weights are close to one another. Indeed, if D_v^{(out)} and D_v^{(in)} denote the out- and in-degrees of vertex v ∈ [n], then we know that (recall Exercise 9.2)

    Σ_{v∈[n]} D_v^{(out)} = Σ_{v∈[n]} D_v^{(in)}.   (9.2.19)

Thus, if indeed w_v^{(out)} and w_v^{(in)} are approximately equal to the out- and in-degrees of vertex v ∈ [n], then also

    Σ_{v∈[n]} w_v^{(out)} ≈ Σ_{v∈[n]} w_v^{(in)}.   (9.2.20)
These numbers are independent for disjoint subsets A and for different individuals. The two branching processes X(s) and Y(s) correspond to the forward and backward limits of DIRG_n(κ), respectively.
We now extend the discussion by defining the marks of these branching processes. With each individual of type r we associate independent marks having Poisson distributions with means ∫_S κ(r,u) µ(du) and ∫_S κ(u,r) µ(du), respectively. These random variables correspond to the "in-degrees" for the forward exploration process X(s) and the "out-degrees" for the backward exploration process Y(s). Finally, for the forward–backward setting, we let the marked branching processes X(s) and Y(s) be independent. We call these objects Poisson marked branching processes with kernel κ. As in Section 3.4.3, we let T_κ be defined as in (3.4.15), i.e., for f : S → R, we let (T_κ f)(x) = ∫_S κ(x,y) f(y) µ(dy).
Theorem 9.2 (Local convergence of DIRGn (κ)) Suppose that κ is irreducible and con-
tinuous almost everywhere on (S × S, µ × µ) and that (9.2.13) holds. Then, DIRGn (κ)
converges locally in probability in the marked forward, backward, and forward–backward
sense to the above Poisson marked branching processes with kernel κ, where the law of the
type of the root ∅ is µ.
We do not give a proof of Theorem 9.2, but refer to Section 9.6 for its history. In Exercise
9.10 the reader is asked to determine the local limit of the directed Erdős–Rényi random
graph. Exercise 9.11 proves Theorem 9.2 in the case of finite-type kernels, while Exercise
9.12 investigates the local convergence of the directed generalized random graph.
becomes positive. Here ζ_X(s) and ζ_Y(s) denote the survival probabilities of X(s) and Y(s),
respectively. The following theorem describes the phase transition in DIRGn (κ):
Theorem 9.3 (Phase transition in DIRG_n(κ)) Suppose that κ is irreducible and continuous almost everywhere on (S × S, µ × µ), and that (9.2.13) holds. Then

    |C_max|/n →^P ζ,   (9.2.23)

while |C_(2)|/n →^P 0 and |E(C_(2))|/n →^P 0.
Theorem 9.3 is the directed version of Theorem 3.19. Owing to the directed nature of the random graph involved, it is now harder to identify when ζ > 0. In the finite-type case, this is determined by whether the largest eigenvalue of the mean-offspring matrix exceeds 1; in the infinite-type case, this is less clear. Exercises 9.13 and 9.14 investigate the conditions for a giant component to exist for directed Erdős–Rényi and generalized random graphs.
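In the finite-type case, the criterion can be checked mechanically; the following sketch (with an illustrative two-type kernel of our own choosing) computes the largest eigenvalue of the mean-offspring matrix and compares it with 1:

    import numpy as np

    # Hypothetical two-type example: M[s, r] = kappa(s, r) * mu(r) is the expected
    # number of type-r children of a type-s individual in the limiting process.
    kappa = np.array([[3.0, 0.5],
                      [0.5, 2.0]])
    mu = np.array([0.5, 0.5])
    M = kappa * mu                               # multiplies column r by mu(r)

    spectral_radius = max(abs(np.linalg.eigvals(M)))
    print("zeta > 0:", spectral_radius > 1)      # supercritical when this exceeds 1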
In order for a graph with in- and out-degree sequence d = (d^{(in)}, d^{(out)}) to exist, we need that (recall Exercise 9.2)

    Σ_{v∈[n]} d_v^{(in)} = Σ_{v∈[n]} d_v^{(out)},   (9.2.24)

and that

    E[D_n^{(in)}] → E[D^{(in)}] and E[D_n^{(out)}] → E[D^{(out)}].   (9.2.26)
Exercise 9.17 investigates the convergence of the numbers of self-loops and multi-edges in
DCMn (d).
Let

    p_{k,l} = P(D^{(in)} = k, D^{(out)} = l)   (9.2.28)

denote the asymptotic joint in- and out-degree distribution. We refer to (p_{k,l})_{k,l≥0} simply as the asymptotic degree distribution of DCM_n(d). The distribution (p_{k,l})_{k,l≥0} plays a role for DCM_n(d) similar to the role that (p_k)_{k≥0} plays for CM_n(d). We further define

    p_k^{⋆(in)} = Σ_l l p_{k,l} / E[D^{(out)}],   p_l^{⋆(out)} = Σ_k k p_{k,l} / E[D^{(in)}].   (9.2.29)

The distributions (p_k^{⋆(in)})_{k≥0} and (p_l^{⋆(out)})_{l≥0} correspond to the asymptotic forward in- and out-degrees of a uniformly chosen edge in DCM_n(d).
whereas every other vertex except the root has independent out-degree with law (p_l^{⋆(out)})_{l≥0}. Further, for a vertex of out-degree l, we let the mark (corresponding to its asymptotic in-degree) be k with probability p_{k,l}/p_l^{(out)}. For the marked backward branching process, we reverse the roles of in- and out-degrees. For the marked forward–backward branching process, we let the root have joint in- and out-degree distribution (p_{k,l})_{k,l≥0}, and define the forward and backward processes and marks as before. We call the above branching process the marked unimodular branching process with degree distribution (p_{k,l})_{k,l≥0}.
The following theorem describes the local convergence in DCMn (d):
Theorem 9.4 (Local convergence of DCMn (d)) Suppose that the out- and in-degrees in a
directed configuration model DCMn (d) satisfy (9.2.25) and (9.2.26). Then DCMn (d) con-
verges locally in probability in the marked forward, backward, and forward–backward sense
to the above marked unimodular branching process with degree distribution (pk,l )k,l≥0 .
It will not come as a surprise that Theorem 9.4 is the directed version of Theorem 4.1,
and Exercise 9.18 asks the reader to prove Theorem 9.4 by adapting its proof.
Then, ζ^{(out)} has the interpretation of the asymptotic probability that a uniform vertex has a large forward cluster, while ζ^{(in)} is that of a uniform vertex having a large backward cluster. Further, let

    ψ = Σ_{k,l} p_{k,l} (1 − θ^{(in)})^l (1 − θ^{(out)})^k,   (9.2.32)

so that ψ has the interpretation of the asymptotic probability that a uniform vertex has both a finite forward and a finite backward cluster. We conclude that 1 − ψ is the probability that a uniform vertex has either a large forward or a large backward cluster, and thus

    ζ = ζ^{(out)} + ζ^{(in)} − (1 − ψ)   (9.2.33)
has the interpretation of the asymptotic probability that a uniform vertex has both a large
forward and a large backward cluster. Finally, we let
    ν = Σ_{k≥0} k p_k^{⋆(in)} = Σ_{k,l} k l p_{k,l} / E[D^{(out)}] = E[D^{(in)} D^{(out)}] / E[D^{(out)}].   (9.2.34)

Alternatively, ν = Σ_{k≥0} k p_k^{⋆(out)} = E[D^{(in)} D^{(out)}]/E[D^{(in)}] by (9.2.27). The main result
concerning the size of the giant is as follows:
Theorem 9.5 (Phase transition in DCMn (d)) Suppose that the out- and in-degrees in the
directed configuration model DCMn (d) satisfy (9.2.25) and (9.2.26).
(a) When ν > 1, ζ in (9.2.33) satisfies ζ ∈ (0, 1], and

    |C_max|/n →^P ζ,   (9.2.35)

while |C_(2)|/n →^P 0 and |E(C_(2))|/n →^P 0.

(b) When ν ≤ 1, ζ in (9.2.33) satisfies ζ = 0, so that |C_max|/n →^P 0 and |E(C_max)|/n →^P 0.
Theorem 9.5 is the adaptation to DCMn (d) of the existence of the giant for CMn (d)
in Theorem 4.9. In Exercise 9.19, the reader is asked to prove that the probability that |C_max|/n exceeds ζ + ε vanishes whenever a graph sequence converges locally in
probability in the marked forward–backward sense. In Exercise 9.20, this is used to prove
Theorem 9.5(b).
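The quantities entering Theorem 9.5 are easily computed numerically from a joint degree distribution. The sketch below is ours: the matrix p is an arbitrary illustrative choice, and the survival probabilities ζ^{(in)}, ζ^{(out)}, θ^{(in)}, θ^{(out)} must be supplied from the fixed-point equations that we have not restated here.

    import numpy as np

    # Hypothetical joint law p[k, l] = P(D_in = k, D_out = l) on {0,...,K}^2,
    # symmetrized so that E[D_in] = E[D_out], consistent with (9.2.24).
    K = 5
    rng = np.random.default_rng(1)
    p = rng.random((K + 1, K + 1))
    p = (p + p.T) / (2 * p.sum())

    k = np.arange(K + 1)
    E_in = (k[:, None] * p).sum()                   # E[D_in]
    E_out = (k[None, :] * p).sum()                  # E[D_out]
    E_prod = (k[:, None] * k[None, :] * p).sum()    # E[D_in * D_out]

    nu = E_prod / E_out                             # (9.2.34)
    assert np.isclose(nu, E_prod / E_in)            # alternative formula; E_in = E_out
    print("giant exists (Theorem 9.5):", nu > 1)

    def zeta(p, zeta_in, zeta_out, theta_in, theta_out):
        """(9.2.32)-(9.2.33); the zetas and thetas are the survival
        probabilities from fixed-point equations not restated here."""
        k = np.arange(p.shape[0])
        psi = (p * (1 - theta_in) ** k[None, :] * (1 - theta_out) ** k[:, None]).sum()
        return zeta_out + zeta_in - (1 - psi)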
Then, conditional on o_1 → o_2,

    dist_{DCM_n(d)}(o_1, o_2) / log n →^P 1/log ν.   (9.2.38)
Theorem 9.6 is the directed version of Theorem 7.1. The philosophy behind the proof is quite similar: a breadth-first exploration process shows that |∂B_r^{(out)}(o_1)| grows roughly like ν^r, so in order to "catch" o_2, one needs r ≈ log_ν n = log n/log ν (recall the discussion of the various directed neighborhoods below (9.2.1)). Of course, at this stage the branching-process approximation starts to fail, which is why one needs to grow the neighborhoods from two sides and to use that |∂B_r^{(in)}(o_2)| also grows roughly like ν^r.
It would be tempting to believe that (9.2.37) is more than what is needed; this is the con-
tent of Exercise 9.21. In Exercise 9.23, the reader is asked to prove that distDCMn (d) (o1 , o2 )
= oP (log n) when ν = ∞.
    c_1 x^{−(τ^{(out)}−2+δ)} ≤ (1/n) Σ_{v∈[n]} d_v^{(in)} 1_{{d_v^{(out)} > x}} ≤ c_2 x^{−(τ^{(out)}−2−δ)},   (9.2.39)

where the upper bound holds for every x ≥ 1, while the lower bound is required to hold only for 1 ≤ x ≤ n^β for some β > 1/2. The main result is then as follows:
Theorem 9.7 (Doubly logarithmic typical distances in DCMn (d)) Suppose that the out-
and in-degrees in the directed configuration model DCMn (d) satisfy (9.2.25), (9.2.26), and
(9.2.39). Then, conditional on o1 → o2 ,
    dist_{DCM_n(d)}(o_1, o_2) / log log n →^P 1/|log(τ^{(in)} − 2)| + 1/|log(τ^{(out)} − 2)|.   (9.2.40)
Theorem 9.7 is the directed version of Theorem 7.2.
(p_k^{⋆(out)})_{k≥0}, respectively. Write

    1/ν^{(in)} = (1/E[D^{(in)}]) ∂²f(s,t)/∂s∂t |_{s=1−θ^{(in)}, t=1},   (9.2.43)

and

    1/ν^{(out)} = (1/E[D^{(out)}]) ∂²f(s,t)/∂s∂t |_{s=1, t=1−θ^{(out)}}.   (9.2.44)
Then, the diameter in the directed configuration model DCMn (d) behaves as follows:
Theorem 9.8 (Logarithmic diameter in DCMn (d)) Suppose that the out- and in-degrees
in the directed configuration model DCMn (d) satisfy (9.2.25) and (9.2.26). Further, assume
that (9.2.37) holds. Then, if ν = E[D^{(in)} D^{(out)}]/E[D^{(out)}] > 1,

    diam(DCM_n(d)) / log n →^P 1/log ν + 1/log ν^{(in)} + 1/log ν^{(out)}.   (9.2.45)
Theorem 9.8 is the directed version of Theorem 7.19. The interpretation of the various terms is similar to that in Theorem 7.19: the terms involving ν^{(in)} and ν^{(out)} indicate the depths of the deepest traps, where a trap means that a neighborhood survives for a long time without gaining substantial mass, i.e., it remains thin. The term involving ν^{(in)} gives the depth of the largest in-trap, for which the in-neighborhood is thin, and that involving ν^{(out)} the depth of the largest out-trap, for which the out-neighborhood is thin. These numbers are determined by first taking r such that

    P(|∂B_r^{(in/out)}(o)| ∈ [1, K]) = Θ(1/n),   (9.2.46)

where ∂B_r^{(in/out)}(o) corresponds to the boundary of the backward r-neighborhood for ν^{(in)} and to that of the forward r-neighborhood for ν^{(out)}, while K is arbitrary and large. Owing to large deviations for supercritical branching processes, one can expect that

    P(|∂B_r^{(in/out)}(o)| ∈ [1, K]) ≈ (ν^{(in/out)})^{−r}.   (9.2.47)

Then we can identify r^{(in)} = log_{ν^{(in)}}(n) and r^{(out)} = log_{ν^{(out)}}(n). The solutions to (9.2.47) are given by (9.2.43) and (9.2.44). For those special vertices u, v for which |∂B_{r^{(in)}}^{(in)}(u)| ∈
temporal networks in which younger vertices can connect only to older vertices, such as in
citation networks (recall Section 9.1.1). The connectivity structure of such directed versions
is not particularly interesting. For example, the strongly connected component is always
small (see Exercise 9.24). Below, we discuss the PageRank of this model.
Theorem 9.9 (Power-law PageRank distribution of directed PAM) Let (R_v^{(G_n)})_{v∈V(G_n)} be the PageRank vector with damping factor α of the directed preferential attachment model G_n with δ ≥ 0 and m ≥ 1, where edges in the normal preferential attachment model are directed from young to old. Let R_∅ be the limiting distribution of the PageRank R_{o_n}^{(G_n)} of a uniform vertex, as derived in Theorem 9.1. Then there exist constants 0 < c_1 ≤ c_2 < ∞ such that, for any r ≥ 1,

    c_1 r^{−(τ^{(PR)}−1)} ≤ P_µ(R_∅ > r) ≤ c_2 r^{−(τ^{(PR)}−1)},   (9.2.48)

where µ is the law of the local limit of this directed preferential attachment model, and where the exponent τ^{(PR)} is identified below.
Theorem 9.9 implies that the PageRank power-law hypothesis, as explained in [V1, Section 1.5] and restated in Section 9.2, is false in general. Indeed, the PageRank distribution obeys a power law, as formulated above, with exponent τ^{(PR)} = 1 + (2 + δ/m)/(1 + (m + δ)α/m), while the in-degree obeys a power law with exponent τ = 3 + δ/m. Note that τ^{(PR)} → 1 + 1/α for δ → ∞, while τ^{(in)} = τ = 3 + δ/m → ∞ for δ → ∞. Thus, the power-law exponent of the directed preferential attachment PageRank remains uniformly bounded in δ, while that of the in-degree distribution grows infinitely large.
This suggests that the PageRank distribution could have power-law tails even for random
graphs with thin-tailed in-degree distributions.
Since the PageRank distribution obeys a power law, it is of interest to investigate the
maximal PageRank in a network of size n. The theorem below gives a result for the very
first vertex:
Theorem 9.10 (PageRank of first vertex in a directed preferential attachment tree) Let (R_v^{(G_n)})_{v∈V(G_n)} be the PageRank vector with damping factor α of the directed preferential attachment tree G_n with δ ≥ 0 and m = 1, defined above. Then there exists a limiting random variable R such that

    n^{−(1+(1+δ)α)/(2+δ)} R_1^{(G_n)} →^{a.s.} R.   (9.2.49)
Theorem 9.10 shows that the PageRank of vertex 1 has the same order of magnitude as the maximum of n random variables with power-law exponent τ^{(PR)} = 1 + (2 + δ/m)/(1 + (m + δ)α/m) would have. It would be of interest to extend Theorem 9.10 to other values of m, as well as to the maximal PageRank max_{v∈[n]} R_v^{(G_n)}.
Many real-world networks have communities that are global in size. For example, when
dividing science into its core fields, citation networks have just such a global community
structure, as discussed in Section 9.1.1. In Belgian telecommunication networks of who calls whom, the division into the French- and the Flemish-speaking parts is clearly visible (Blondel et al., 2008), while in US politics the division into Republicans and Democrats has a pronounced effect on the network structure of social interactions between politicians (Mucha et al., 2010).
In this section we discuss random graph models for networks with a global community
structure. The section is organized as follows. In Section 9.3.1 we discuss stochastic block
models, which are the models of choice for networks with community structures. In Section
9.3.2 we consider degree-corrected stochastic block models, which are similar to stochas-
tic block models but allow for more pronounced inhomogeneity in the degree structure. In
Sections 9.3.3 and 9.3.4, we study configuration models and preferential attachment models
with global communities, respectively. We introduce the models, state the most important
results in them, and also discuss the topic of community detection in such models, a topic
that has attracted considerable attention owing to its practical importance.
Exercise 3.11 then shows that the resulting random graph is graphical as in Definition 3.3(a),
so that the results in Chapters 3 and 6 apply. As a result, we will not spend much time on the
degree distribution and the giant and graph distances in this model, as they were addressed
there. Exercise 9.25 elaborates on the degree structure, while Exercise 9.26 investigates fur-
ther the conditions for a giant to exist.
Let us mention that, for the stochastic block model to be a good model for networks with
a global community structure, one would expect that the edge probabilities of the internal
edges between vertices of the same type are larger than those of the external edges between
vertices of different types. In terms of formulas, this means that κ(s, s) > κ(s, r) for all
s, r ∈ S = [t] with s 6= r. For example, the bipartite Erdős–Rényi random graph has a
structure that is quite opposite to a random graph with global communities (as vertices only
have neighbors of a different type).
where the maximum is over all possible permutations p from [t] to [t]. If such an algorithm does not exist, then we call the problem unsolvable.
The maximum over permutations of the types in (9.3.2) is due to the fact that the type la-
bels generally have no meaning in real-world networks, so that they can be permuted without
changing anything. Exercise 9.27 shows that (9.3.2) is indeed false for random guessing.
Community detection is most difficult when the degree distributions of vertices of all the different types are the same. This is not surprising, as otherwise one could aim to classify vertices on the basis of their degrees. As a result, from now on we assume that the expected degrees of all types of vertices are the same. Some ideas about how one can prove that the problem is solvable for unequal expected degrees can be obtained from Exercises 9.28 and 9.29.
We start by considering the case where there are just two types, so that we can take the
edge probability puv to be a/n for vertices of the same type, and b/n for vertices of opposite
types. Here we think of a > b. The question whether community detection is solvable is
answered in the following theorem:
Theorem 9.12 (Stochastic block model threshold) Take n to be even. Consider a stochastic
block model of two types, each having n/2 vertices, where the edge probability puv is a/n
for vertices of the same type, and b/n for vertices of opposite types, where a > b. Then, the
community detection problem is solvable as in Definition 9.11 when

    (a − b)² / (2(a + b)) > 1,   (9.3.3)

while it is unsolvable when

    (a − b)² / (2(a + b)) < 1.   (9.3.4)
Theorem 9.12 is quite surprising. Indeed, it shows that, in order to have a chance at community detection, not only should a > b hold, but (a − b)² should also be sufficiently large compared to a + b. Further, the transition in Theorem 9.12 is sharp, in the sense that (9.3.3) and (9.3.4) complement each other. It is unclear what happens in the critical case when (a − b)² = 2(a + b). The solvable case in (9.3.3) is sometimes called an "achievability result," the unsolvable case in (9.3.4) an "impossibility result." We do not give the full proof of Theorem 9.12, as it is quite involved. The proof of the solvable case also shows that the proportion of pairs of vertices that are correctly classified as being of the same type converges to 1 when (a − b)²/[2(a + b)] grows large.
In Exercise 9.30, the reader is asked to show that (9.3.3) implies that a − b > 2 (and thus
a + b > 2), and to conclude that a giant thus exists in this setting.
While the results for a general number of types t are less complete, there is an achievability result when p_{uv} = a/n for vertices of the same type and p_{uv} = b/n for all vertices of different types, in which (9.3.3) is replaced by

    (a − b)² / (t(a + (t − 1)b)) > 1,   (9.3.5)
which indeed reduces to (9.3.3) for t = 2. Also, there are many results on whether efficient algorithms for community detection exist. In general, this means that not only should a detection algorithm achieving (9.3.2) exist, but it should also run in reasonable time (say Θ(n log n) for fixed t). We refer to Section 9.6 for a more elaborate discussion of such results.
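As a quick illustration (ours, not the text's), the thresholds (9.3.3) and (9.3.5) can be evaluated directly:

    def detectable(a, b, t=2):
        """Achievability condition (9.3.5); reduces to (9.3.3) for t = 2."""
        return (a - b) ** 2 / (t * (a + (t - 1) * b)) > 1

    print(detectable(5.0, 1.0))   # True:  16 / 12 > 1
    print(detectable(3.0, 2.0))   # False:  1 / 10 < 1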
Let us continue this subsection by explaining how thresholds such as (9.3.3) and (9.3.5)
can be interpreted. Interestingly, there is a close connection with multi-type branching pro-
cesses. Consider a branching process with finitely many types. Kesten and Stigum (1966)
asked in this context when it would be possible to estimate the type of the root while ob-
serving the types of the vertices in generation k for very large k . In this case, the expected
offspring matrix equals Ms,r = κ(s, r)µ(r), which is a t × t matrix. Let λ1 > λ2 be the
two largest eigenvalues of M. Then, the Kesten–Stigum criterion is that estimation of the
root type is possible with probability strictly larger than 1/t when
    λ_2² / λ_1 > 1.   (9.3.6)
Next, consider a general finite-type inhomogeneous random graph, with limiting type
distribution µ(s) and expected offspring matrix Ms,r = κ(s, r)µ(r). Obviously, the local
limit of the stochastic block model is the above multi-type branching process, so a link
between the two detection problems can indeed be expected. Under the condition in (9.3.6),
it is believed that the community detection problem is solvable and even that communities
can be detected in polynomial time. For t = 2, this is sharp, as we have seen above. For
t ≥ 3, the picture is much more involved. It is believed that, for t ≥ 4, a double phase
transition occurs: detection should be possible in polynomial time when λ22 /λ1 > 1, much
harder but still possible (i.e., the best algorithms take a time that is exponentially long in the
size of the network) when λ22 /λ1 > c? for some 0 < c? < 1, and information-theoretically
impossible when λ22 /λ1 < c? . However, this is not yet known in the general case.
The way to get from a condition like (9.3.6) to an algorithm for community detection is
by using the two largest eigenvalues of the so-called non-backtracking matrix of the random graph, defined below, and then obtaining an estimate for the partition from the eigenvectors corresponding to these eigenvalues. The leading eigenvalue converges to λ_1 in probability, while the second is bounded by |λ_2|. This, together with a good approximation to the
corresponding eigenvectors, suggests a specific estimation procedure that we explain now.
Let B be the non-backtracking matrix of the graph G. This means that B is indexed by the oriented edges E⃗(G) = {(u,v) : {u,v} ∈ E(G)}, so that B = (B_{e,f})_{e,f∈E⃗(G)}. For an edge e ∈ E⃗(G), denote e = (e_1, e_2), and write

    B_{e,f} = 1_{{e_2 = f_1, e_1 ≠ f_2}},   (9.3.7)

which indicates that e ends in the vertex in which f starts, but e is not the reversal of f. The latter property explains the name non-backtracking matrix.
Now we come to the eigenvalues. We restrict ourselves to the case where t = 2, even though some results extend, with modifications, to higher values of t. Let λ_1(B) and λ_2(B)
denote the two leading eigenvalues of B. Then, for the stochastic block model,

    λ_1(B) →^P λ_1,   λ_2(B) →^P λ_2,   (9.3.8)

where we recall that λ_1 > λ_2 are the two largest eigenvalues of M, where M_{s,r} = κ(s,r)µ(r). It turns out that, for the Erdős–Rényi random graph with edge probability (a + b)/(2n) ≡ α/n, the first eigenvalue satisfies λ_1(B) →^P λ_1 = α, while the second eigenvalue satisfies λ_2(B) ≤ √α + o_P(1). Note that this does not follow from (9.3.8), since M is then a 1 × 1 matrix. For the stochastic block model with t = 2, instead, λ_2(B) →^P λ_2 = (a − b)/2. Thus, we can expect that the graph is a stochastic block model when

    λ_2(B)² / λ_1(B) > 1,   (9.3.9)
while if the reverse inequality holds then we are not even sure whether the model is an
Erdős–Rényi random graph or a stochastic block model. In the latter case the graph is so
random and homogeneously distributed that we are not able to make a good estimate of
the types of the vertices, which strongly suggests that this case is unsolvable. This at least
informally explains (9.3.6).
Finally, we explain how the above analysis of eigenvalues can be used to estimate the types. Assume that λ_2²/λ_1 > 1. Let ξ_2(B) : E⃗(G) → R denote the normalized eigenvector corresponding to λ_2(B). We fix a deterministic threshold θ > 0. Then, we estimate σ̂(v) = 1 when

    Σ_{e : e_2 = v} ξ_2(e) ≥ θ/√n,   (9.3.10)

and otherwise σ̂(v) = 2. This estimator can then be shown to achieve (9.3.2), owing to the sufficient separation of the eigenvalues.
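The following Python sketch assembles the non-backtracking matrix of (9.3.7) and applies the threshold rule (9.3.10). It is a brute-force illustration of ours (dense linear algebra, so suited only to small graphs), and its output labels are meaningful only up to the global permutation of types allowed in (9.3.2).

    import numpy as np

    def nb_partition(adj, theta=1.0):
        """Two-type estimate from the second eigenvector of B; `adj` maps each
        vertex of a simple undirected graph to the set of its neighbors."""
        edges = [(u, v) for u in adj for v in adj[u]]       # oriented edges
        idx = {e: i for i, e in enumerate(edges)}
        B = np.zeros((len(edges), len(edges)))
        for e1, e2 in edges:
            for f2 in adj[e2]:                              # f = (e2, f2) starts at e2
                if f2 != e1:                                # exclude the reversal of e
                    B[idx[(e1, e2)], idx[(e2, f2)]] = 1.0
        vals, vecs = np.linalg.eig(B)                       # B is not symmetric
        xi2 = np.real(vecs[:, np.argsort(-vals.real)[1]])
        xi2 /= np.linalg.norm(xi2)
        cut = theta / np.sqrt(len(adj))
        # sigma-hat(v) = 1 when the sum over edges e with e_2 = v exceeds the cut,
        # as in (9.3.10); the sign of an eigenvector is arbitrary, so the labels
        # are determined only up to a global swap.
        return {v: 1 if sum(xi2[idx[(u, v)]] for u in adj[v]) >= cut else 2
                for v in adj}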
we pair the half-edges incident to vertices of type 1 uar to those incident to vertices of type
2, without replacement. As a result, there are edges only between vertices of types 1 and 2,
and the total number of edges is given in (9.3.16).
Using the above definition, we let the edges between vertices of types s, r be given by a
bipartite configuration model between the vertices in {v : σ(v) = s} and {v : σ(v) = r},
Special cases of this model are the configuration model, for which t = 1, and the bipartite configuration model itself, for which t = 2 and d_v^{(r)} = 0 for every v with σ(v) = r.
Let µ_n(s) denote the proportion of vertices of type s. We again assume, as in (9.3.1), that the type distribution µ_n(s) = n_s/n satisfies, for all s ∈ [t],

    lim_{n→∞} µ_n(s) = lim_{n→∞} n_s/n = µ(s).   (9.3.18)
Also, in order to describe the local and global properties of the configuration model with
global communities, one should make assumptions similar to those for the original configu-
ration model in Condition 1.7 but now for the matrix of degree distributions. For example, it
is natural to assume that, for all s ∈ [t], the joint distribution function of all the type degrees satisfies

    F_n^{(s)}(x_1, ..., x_t) = (1/n_s) Σ_{v : σ(v)=s} 1_{{d_v^{(1)} ≤ x_1, ..., d_v^{(t)} ≤ x_t}} → F^{(s)}(x_1, ..., x_t),   (9.3.19)

for all x_1, ..., x_t ∈ R and some limiting joint distribution F^{(s)} : R^t → [0, 1]. Further, it is natural to assume that an adaptation of Condition 1.7(b) holds for all these degrees, such as that, for all s, r ∈ [t],

    (1/n_s) Σ_{v : σ(v)=s} d_v^{(r)} → E[D^{(s,r)}],   (9.3.20)

where D^{(s,r)} is the rth coordinate of the random vector whose distribution function is F^{(s)}, i.e.,

    P(D^{(s,r)} ≤ x_r) = lim_{x_1,...,x_{r−1},x_{r+1},...,x_t → ∞} F^{(s)}(x_1, ..., x_t).   (9.3.21)
While the configuration model, as well as its bipartite version, has attracted substantial attention, the above extension has not. Exercises 9.35–9.37 informally investigate some of the properties of this extension.
Let the graph Gn at time n be given. At time n + 1, let vertex n + 1 have a type σ(n + 1)
that is chosen in an iid way from [t], where
The probability distribution η ? that solves hs (η ? ) = 0 for all s ∈ [t] can be shown to be
unique. We next define the crucial parameters in the model.
For s, r ∈ [t], let

    θ⋆(s, r) = κ(s, r) / Σ_{r′∈[t]} κ(s, r′) η⋆(r′),   (9.3.25)

and write

    θ⋆(s) = Σ_{r∈[t]} µ(r) θ⋆(s, r).   (9.3.26)
We let n_s = #{v : σ(v) = s} denote the type count. Next, we study the degree distribution in the above preferential attachment model with global communities. For s ∈ [t], define

    P_k^{(s)} = (1/n_s) Σ_{v∈[n]} 1_{{D_v(n) = k, σ(v) = s}}   (9.3.27)

to be the degree distribution of the types in the model, where D_v(n) denotes the degree of vertex v at time n and n_s equals the number of vertices of type s ∈ [t]. The main result on
the degree distribution is as follows:
κ(s, s) = a > 1 for all s ∈ [t], and κ(s, r) = 1 for all s, r ∈ [t] with s 6= r, it is unclear
whether Err < (t − 1)/t.
Many more detailed results can be proved, for example that the probability that the label of vertex v is estimated wrongly converges to zero uniformly for all v ∈ [n] \ [δn], for any δ > 0. Also, there exists an algorithm that estimates σ(v) correctly whp provided that v = o(n). We refrain from discussing such results further.
In the previous section we investigated settings where the models have a finite number of communities, making the communities global. This setting is realistic when we would like to partition a network of choice into a finite number of parts, for example corresponding to the main scientific fields in citation or collaboration networks, or to the continents in the Internet. However, in many other settings this is not realistic. Indeed, most communities in social networks correspond to smaller entities, such as school classes, families, sports teams, etc. In most real-world settings, it is not even clear what communities look like. As a result, community detection has become an art.
The topic is relevant, since most models (including the models with global community structure from Section 9.3) have rather low clustering. For example, consider a general inhomogeneous random graph IRG_n(κ_n) with kernel κ_n, and assume that κ_n(x,y) ≤ n. Then, the expected number of triangles in IRG_n(κ_n) is close to

    E[# triangles in IRG_n(κ_n)] = (1/(6n³)) Σ_{i,j,k∈[n]} κ_n(x_i, x_j) κ_n(x_j, x_k) κ_n(x_k, x_i),   (9.4.1)
Figure 9.9 Clustering coefficients in the 727 networks of size larger than 10,000 from the KONECT data base.
realistically model the community structure in real-world networks. Below, we consider several
models that attempt to do so. In Section 9.4.1 we start by discussing inhomogeneous random
graphs with community structures. We continue in Section 9.4.2 by describing the hierar-
chical configuration model as well as some close cousins; this is followed by a discussion
of random intersection graphs in Section 9.4.3 and exponential random graphs in Section
9.4.4.
Model Introduction
We will repeatedly make use of notation from Chapter 3. Let F consist of one representative
of each isomorphism class of finite connected graphs, chosen so that if F ∈ F has r vertices
then V (F ) = [r] = {1, 2, . . . , r}. Simple examples of such an F are the complete graphs
on r vertices, but other examples are also possible. Recall that S denotes the type space.
Given F ∈ F with r vertices, let κ_F : S^r → [0, ∞) be a measurable function. The function κ_F is called the kernel corresponding to F. A sequence κ̃ = (κ_F)_{F∈F} is a kernel family. Let κ̃ be a particular kernel family and n an integer. We define a random graph IRG_n(κ̃)
with vertex set [n]. First let x_1, x_2, ..., x_n ∈ S be iid with distribution µ. Given x = (x_1, ..., x_n), construct IRG_n(κ̃) as follows, starting from the empty graph. For each r and each F ∈ F with |V(F)| = r, and for every r-tuple of distinct vertices (v_1, ..., v_r) ∈ [n]^r, add a copy of F on the vertices v_1, ..., v_r (with vertex i of F mapped to v_i) with probability

    p(v_1, ..., v_r; F) = (κ_F(x_{v_1}, ..., x_{v_r}) / n^{r−1}) ∧ 1,   (9.4.3)

all these choices being independent. We assume throughout that κ_F is invariant under permutations of the vertices of the graph F.
The reason for dividing by n^{r−1} in (9.4.3) is that we wish to consider sparse graphs. Indeed, our main interest is the case when IRG_n(κ̃) has O(n) edges. As it turns out, we can be slightly more general; however, when κ_F is integrable (which we assume), the expected number of added copies of each graph F is O(n). Below, all incompletely specified integrals are taken with respect to the appropriate r-fold product measure µ^r on S^r.
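A brute-force sampler of IRG_n(κ̃) following (9.4.3) might look as follows (a sketch of ours: the interface, kernels, and parameter values are illustrative, and the loop over r-tuples limits this to small n):

    import itertools, random

    def sample_irg(n, kernels, sample_type, seed=0):
        """kernels: list of (edges_of_F, r, kappa_F) with F on {0,...,r-1};
        sample_type(rng) draws one iid type from mu. Returns types and edges."""
        rng = random.Random(seed)
        x = [sample_type(rng) for _ in range(n)]
        E = set()
        for F_edges, r, kappa_F in kernels:
            for vs in itertools.permutations(range(n), r):   # distinct r-tuples
                if rng.random() < min(kappa_F(*(x[v] for v in vs)) / n ** (r - 1), 1.0):
                    for i, j in F_edges:                     # vertex i of F -> vs[i]
                        E.add(frozenset((vs[i], vs[j])))
        return x, E

    # Flat kernels for single edges (K_2) and triangles (K_3); types uniform on [0,1].
    kernels = [([(0, 1)], 2, lambda a, b: 1.0),
               ([(0, 1), (1, 2), (2, 0)], 3, lambda a, b, c: 0.5)]
    x, E = sample_irg(50, kernels, lambda rng: rng.random())
    print(len(E), "edges")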
In the special case where all κ_F are zero apart from κ_{K_2}, the kernel corresponding to an edge, we recover (essentially) a special case of the inhomogeneous random graph model discussed in Chapter 3. In this case, given x, two vertices i and j are joined with probability

    (κ_{K_2}(x_i, x_j) + κ_{K_2}(x_j, x_i))/n + O((κ_{K_2}(x_i, x_j) + κ_{K_2}(x_j, x_i))²/n²).   (9.4.4)
Figure 9.11 The relation between the community edge density 2e_in/(s(s−1)) and the community size s can be approximated by a power law. (a) Amazon co-purchasing network, (b) Gowalla social network, (c) English word relations, (d) Google web graph. Figures taken from Stegehuis et al. (2016b).
The correction term will never matter, so we may as well replace κ_{K_2} by its symmetrized version.

For any kernel family κ̃, let κ_e be the corresponding edge kernel, defined by

    κ_e(x, y) = Σ_F Σ_{(i,j) : {i,j}∈E(F)} ∫_{S^{|V(F)|−2}} κ_F(x_1, ..., x_{i−1}, x, x_{i+1}, ..., x_{j−1}, y, x_{j+1}, ..., x_{|V(F)|}),   (9.4.5)

where the second sum runs over all 2|E(F)| ordered pairs (i, j) with {i, j} ∈ E(F), and we integrate over all variables apart from x and y. Note that the sum need not always converge. Since every term is positive, this causes no problems: we simply allow κ_e(x, y) = ∞ for some x, y. Given x_i and x_j, the probability that i and j are joined in IRG_n(κ̃) is at most κ_e(x_i, x_j)/n. In other words, κ_e captures the edge probabilities in IRG_n(κ̃), but not the correlations.
Number of Edges
Before proceeding to deeper properties, let us note that the expected number of added copies of F is (1 + o(1)) n ∫_{S^{|V(F)|}} κ_F. Unsurprisingly, the actual number turns out to be concentrated.
Existence of a Giant
We next consider the emergence of the giant component. For this, the linear operator T_{κ_e}, defined by

    (T_{κ_e} f)(x) = ∫_S κ_e(x, y) f(y) µ(dy),   (9.4.9)
we write this as ξ(κ) < ∞. This means that the expected number of edges in IRG_n(κ̃) is O(n) (see Theorem 9.15), and thus the expected degree of a uniform vertex is bounded.
(c) We say that a symmetric edge kernel κ_e : S² → [0, ∞) is reducible if
by a local graph G_i = (V(G_i), E(G_i)). We assign each of the d_i half-edges incident to vertex i to a vertex in G_i in an arbitrary way. Thus, vertex i is replaced by a pair consisting of the community graph G_i and the inter-community degrees d^{(b)} = (d_u^{(b)})_{u∈V(G_i)}, satisfying Σ_{u∈V(G_i)} d_u^{(b)} = d_i. Naturally, the size of the graph becomes n = Σ_{v∈[N]} |V(G_v)|.
As a result, we obtain a graph with two levels of hierarchy; its local structure is described
by the local graphs (Gi )i∈[N ] whereas its global structure is described by the configuration
model CMN (d). This model is called the hierarchical configuration model. A natural as-
sumption is that the degree sequence d = (di )i∈[N ] satisfies Condition 1.7(a),(b) with n
replaced by N , while the empirical distribution of the graphs satisfies, as N → ∞,
    µ_n(H, d⃗) = (1/N) Σ_{i∈[N]} 1_{{G_i = H, (d_u^{(b)})_{u∈V(G_i)} = d⃗}} → µ(H, d⃗),   (9.4.15)

for every connected graph H and every degree vector d⃗ = (d_h)_{h∈V(H)}, and some probability distribution µ on graphs with integer marks associated with the vertices. We assume that µ_n(H, d⃗) = 0 for all H that are disconnected. Indeed, we think of the graphs (G_i)_{i∈[N]} as describing the local community structure of the graph, so it makes sense to assume that all (G_i)_{i∈[N]} are connected. In particular, (9.4.15) shows that a typical community has bounded size.
We often also make assumptions on the average size of the community of a random vertex. For this, it is necessary to impose that, with µ_n(H) = Σ_{d⃗} µ_n(H, d⃗) and µ(H) = Σ_{d⃗} µ(H, d⃗) the community distributions,

    Σ_H |V(H)| µ_n(H) = (1/N) Σ_{i∈[N]} |V(G_i)| → Σ_H |V(H)| µ(H) < ∞.   (9.4.16)
Equation (9.4.16) indicates that the community of a random vertex has a tight size, since
the community size of a random vertex has a size-biased community distribution (see Exer-
cise 9.43). The degree structure of the hierarchical configuration model is determined by the
model description. We next discuss the giant and the distances in the hierarchical configura-
tion model.
Theorem 9.19 (Giant in hierarchical configuration model) Assume that the inter-community
degree sequence d = (di )i∈[N ] satisfies Conditions 1.7(a),(b) with N replacing n and with
limit D, while the communities satisfy (9.4.15) and (9.4.16). Then, there exists ζ ∈ [0, 1]
such that
    (1/n)|C_max| →^P ζ,   (1/n)|C_(2)| →^P 0.   (9.4.17)

Write ν = E[D(D − 1)]/E[D]. Then, ζ > 0 precisely when ν > 1.
Since the communities (G_i)_{i∈[N]} are connected, the sizes of the clusters in the hierarchical configuration model are closely related to those in CM_N(d). Indeed, for v ∈ [n], let i_v
where C′(i) denotes the connected component of i in CM_N(d). This allows one to move back and forth between the hierarchical configuration model and the corresponding configuration model CM_N(d) that describes the inter-community connections.
It also allows us to identify the limit ζ. Let ξ ∈ [0, 1] be the extinction probability of the local limit of CM_N(d) of a vertex of degree 1, so that a vertex of degree d survives with probability 1 − ξ^d. Then,

    ζ = Σ_H Σ_{d⃗} |V(H)| µ(H, d⃗) [1 − ξ^d],   (9.4.19)

where d = Σ_{v∈V(H)} d_v. Further, ξ = 1 precisely when ν ≤ 1; see, e.g., Theorem 4.9. This explains the result in Theorem 9.19. In Exercise 9.45, the reader is asked to fill in the details.
Before moving to graph distances, we discuss an example of the hierarchical configura-
tion model that has attracted attention under the name configuration model with household
structure.
As a result, the degree distribution in the household model is the size-biased degree distribution of the configuration model that describes its inter-community structure. In particular, this implies that if the limiting degree distribution D in CM_N(d) obeys a power law with exponent τ′, then the limiting degree distribution in the household model obeys a power law with exponent τ = τ′ − 1. This is sometimes called a power-law shift, and it is clearly visible in Figure 9.12.
Figure 9.12 The degree distribution of the household model follows a power law with a smaller exponent than the community-size distribution and the outside-degree distribution.
u is being paired to that incident to v, dist_{HCM_n(G)}(u, v) = 2 dist_{CM_N(d)}(i_u, i_v) − 1, where we recall that i_v is such that v ∈ V(G_{i_v}). Thus, distances in HCM_n(G) are asymptotically
twice as large as those in CMN (d). The reason is that paths in the household model alternate
between intra-community edges and inter-community edges, because the inter-community
degrees are all equal to 1 so there is no way to jump to a vertex using an inter-community
edge and leave through an inter-community edge again. This is different from general hier-
archical configuration models.
d_v^{(tr)} "third-triangles" consisting of pairs of half-edges.

The graph is built by (a) recursively choosing two half-edges uar without replacement and pairing them into edges (as for CM_n(d)), and (b) choosing triples of third-triangles uar without replacement and drawing edges between the three vertices incident to the chosen third-triangles.

Let (D_n^{(si)}, D_n^{(tr)}) denote the numbers of simple edges and triangles incident to a uniform vertex in [n], and assume that (D_n^{(si)}, D_n^{(tr)}) →^d (D^{(si)}, D^{(tr)}) for some limiting distribution (D^{(si)}, D^{(tr)}). Newman (2009) performed a generating-function analysis to investigate when a giant component is expected to exist. The criterion that Newman found is that a giant exists
when

    (E[(D^{(si)})²]/E[D^{(si)}] − 2) (2E[(D^{(tr)})²]/E[D^{(tr)}] − 3) < 2E[D^{(si)} D^{(tr)}]² / (E[D^{(si)}] E[D^{(tr)}]).   (9.4.21)
When D^{(tr)} = 0 almost surely, so that there are no triangles, this reduces to

    E[(D^{(si)})²]/E[D^{(si)}] − 2 > 0,   (9.4.22)

which is equivalent to ν = E[D^{(si)}(D^{(si)} − 1)]/E[D^{(si)}] > 1 (recall Theorem 4.9).
It would be of interest to analyze this model mathematically. While the extra triangles do create extra clustering in the graph, in that the graph is no longer locally tree-like, the community structure of the graph is less clear. Of course, the above setting can be generalized to arbitrary cliques and possibly other community structures, but this would make the mathematical analysis substantially more involved. Exercises 9.46 and 9.47 investigate the local and global clustering coefficients, respectively.
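Under our reading of (9.4.21), the criterion can be evaluated directly from samples of (D^{(si)}, D^{(tr)}); the sketch below is ours, and its conclusion should be checked against the exercises.

    import numpy as np

    def newman_giant(d_si, d_tr):
        """Evaluate the giant criterion (9.4.21) from samples of (D_si, D_tr)."""
        d_si, d_tr = np.asarray(d_si, float), np.asarray(d_tr, float)
        lhs = ((d_si ** 2).mean() / d_si.mean() - 2) * \
              (2 * (d_tr ** 2).mean() / d_tr.mean() - 3)
        rhs = 2 * (d_si * d_tr).mean() ** 2 / (d_si.mean() * d_tr.mean())
        return lhs < rhs

    rng = np.random.default_rng(2)
    # Independent Poisson simple-edge and triangle degrees (our toy choice).
    print(newman_giant(rng.poisson(2.0, 10 ** 5), rng.poisson(0.5, 10 ** 5)))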
and making all the edges between vertices and groups conditionally independent given the weights (w_v)_{v∈[n]}. In the theorem below, we assume that (w_v)_{v∈[n]} is a sequence of iid random variables with finite mean:

Theorem 9.20 (Degrees in random intersection graph with iid vertex weights) Consider the above random intersection graph, with m = βn^α groups, vertex weights (w_v)_{v∈[n]} that are iid copies of W ∼ F with finite mean, and group-membership probabilities p_{va} = (γ w_v n^{−(1+α)/2}) ∧ 1. Then, for any v ∈ [n]:

(a) D_v →^P 0 when α < 1;
(b) D_v →^d Σ_{i=1}^X Y_i when α = 1, where (Y_i)_{i≥1} are iid Poi(γ) random variables and X ∼ Poi(βγW);
(c) D_v →^d X, where X ∼ Poi(βγ²W), when α > 1.
Theorem 9.20 can be understood as follows. The expected number of groups to which individual v belongs is roughly (βn^α) × (γ w_v n^{−(1+α)/2}) = βγ w_v n^{−(1−α)/2}. When α < 1, this is close to zero, so that D_v = 0 whp. For α = 1, it is close to Poisson with parameter βγw_v, and the number of other individuals in each of these groups is approximately Poi(γ) distributed. For α > 1, individual v belongs to a number of groups that tends to infinity as n → ∞, while each group has expected size n^{(1−α)/2}, which vanishes. The latter means that group sizes are generally 0 or 1, asymptotically independently, giving rise to the Poisson distribution specified in part (c). Part (b) is the most interesting and interpolates between the two extremes.
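A quick simulation of the α = 1 case in Theorem 9.20(b) (our own sanity check, with W taken Exp(1) and illustrative β, γ):

    import numpy as np

    rng = np.random.default_rng(0)
    n, beta, gamma = 20000, 1.0, 2.0
    m = int(beta * n)                       # m = beta * n^alpha with alpha = 1
    w = rng.exponential(1.0, size=n)        # iid weights with finite mean
    p = np.minimum(gamma * w / n, 1.0)      # p_va = min(gamma*w_v*n^{-(1+alpha)/2}, 1)

    def degree_of_0():
        g0 = rng.binomial(m, p[0])          # groups containing individual 0
        others = set()
        for _ in range(g0):                 # other members of each such group
            others.update(np.flatnonzero(rng.random(n - 1) < p[1:]))
        return len(others)

    degs = [degree_of_0() for _ in range(300)]
    # Conditionally on w_0, the mean degree should be close to beta*gamma^2*w_0.
    print(np.mean(degs), beta * gamma ** 2 * w[0])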
Here n is the number of individuals, while m is the number of groups. Naturally, in order for the model to be well defined, we require

    Σ_{v∈[n]} d_v^{(ve)} = Σ_{a∈[m]} d_a^{(gr)}.   (9.4.27)
• the number of groups in which the root participates has law D^{(ve)}, which is the limiting law of D_n^{(ve)} = d_o^{(ve)} with o ∈ [n] chosen uar;
• the number of groups in which every other vertex participates has law Y⋆ − 1, where Y⋆ is the size-biased version of D^{(ve)};
• the numbers of vertices per group are iid random variables with law X⋆, where X⋆ is the size-biased version of the limiting law of D_n^{(gr)} = d_V^{(gr)}, where now V ∈ [m] is a uniformly chosen group.
By Theorem 9.21, the degree distribution of the random intersection graph with prescribed groups is equal to

    D = Σ_{i=1}^{D^{(ve)}} (X_i⋆ − 1),   (9.4.28)
which can be compared with Theorem 9.20(b). The intuition behind Theorem 9.21 is that the random intersection graph can easily be obtained from the bipartite configuration model by making all group members direct neighbors. By construction, the local limit of the bipartite configuration model can be described by an alternating branching process of the size-biased vertex and group distributions. Note that the local limit in Theorem 9.21 is not a tree; vertices can be in multiple groups. However, Theorem 9.21 does imply that the probability that a uniform vertex has a neighbor with which it shares two group memberships vanishes; see Exercise 9.48. Thus, the overlap between groups is generally a single vertex.

Theorem 9.21 also has implications for the clustering coefficients. Indeed, by Theorem 2.23 the local clustering coefficient of the random intersection graph converges; by Theorem 2.22, the same holds for the global clustering coefficient under a finite second-moment condition on the degrees. See Exercises 9.49 and 9.50 for more details.
The exponential random graph is a way to leverage the randomness and still obtain a model that one can write down. Indeed, let F be a collection of subgraphs, and suppose that we observe that, in our favorite real-world network, the number of occurrences of subgraph F equals α_F for every F ∈ F. Let us now write down what this might mean. Let F be a graph on |V(F)| = m vertices. For a graph G on n vertices and vertices v_1, ..., v_m, let G|_{(v_i)_{i∈[m]}} be the subgraph spanned by (v_i)_{i∈[m]}. This means that the vertex set of G|_{(v_i)_{i∈[m]}} equals [m], while its edge set equals {{i,j} : {v_i, v_j} ∈ E(G)}. The number of occurrences of F in G can then be written as

    N_F(G) = Σ_{v_1,...,v_m∈V(G)} 1_{{G|_{(v_i)_{i∈[m]}} = F}}.   (9.4.31)

Here, it is convenient to recall that we may equivalently write G = ([n], (x_uv)_{1≤u<v≤n}), where x_uv ∈ {0,1} and x_uv = 1 if and only if {u,v} ∈ E(G). Then, we can write N_F(G) = N_F(x).
In order to define a measure, we can take a so-called exponential family of the form

    p_{β⃗}(x) = (1/Z_n(β⃗)) e^{Σ_{F∈F} β_F N_F(x)},   (9.4.32)

where Z_n(β⃗) is the normalization constant

    Z_n(β⃗) = Σ_x e^{Σ_{F∈F} β_F N_F(x)}.   (9.4.33)

In this case, E[N_F(X)] = α_F for all F ∈ F when P(X = x) = p_{β⃗}(x). Further, when conditioning on N_F(x) = q_F for some parameters (q_F)_{F∈F}, the conditional exponential random graph is uniform over the set of graphs with this property. This is a conditioning property of exponential random graphs.
We next discuss two examples that we know quite well, and that arise as exponential
random graphs with certain specific subgraph counts:
Example 9.23 (ER_n(λ/n) and edge subgraphs) Take N_F(x) = N_{K_2}(x) = Σ_{u,v∈[n]} x_uv = 2|E(G_n)|, so that we put a restriction on the expected number of edges, or complete graphs of size 2, in the graph. Then we see that, with G_n = ([n], (x_uv)_{1≤u<v≤n}),

    Z_n(β⃗) = Σ_x e^{2β|E(G_n)|} = (1 + e^{2β})^{n(n−1)/2},   (9.4.35)

and

    p_{β⃗}(x) = (1/Z_n(β⃗)) e^{2β|E(G_n)|} = Π_{1≤u<v≤n} e^{2β x_uv}/(1 + e^{2β}).   (9.4.36)
Thus, the different edges are independent, and an edge is present with probability e^{2β}/(1 + e^{2β}) and absent with probability 1/(1 + e^{2β}). In a sparse setting, we aim at

    E[|E(G_n)|] = (λ/2)(n − 1),   (9.4.37)

so that the average degree per vertex is precisely equal to λ. The constraint in (9.4.34) thus reduces to

    (n(n−1)/2) e^{2β}/(1 + e^{2β}) = (λ/2)(n − 1).   (9.4.38)

This leads to ER_n(λ/n), where

    e^{2β}/(1 + e^{2β}) = λ/n,   (9.4.39)

that is, e^{2β} = λ/(n − λ). This shows that ER_n(λ/n) is an example of an exponential random graph with a constraint on the expected number of edges in the graph. Further, by the conditioning property of exponential random graphs, conditional on |E(G_n)| = m, the distribution is uniform over all graphs with m edges.
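Both identities in this example are easy to verify numerically; the following check (ours) enumerates all graphs on four vertices:

    import itertools, math

    n, beta = 4, 0.3
    pairs = list(itertools.combinations(range(n), 2))
    # (9.4.35): the sum over all graphs of e^{2 beta |E|} factorizes over pairs.
    Z = sum(math.exp(2 * beta * sum(x))
            for x in itertools.product((0, 1), repeat=len(pairs)))
    assert abs(Z - (1 + math.exp(2 * beta)) ** len(pairs)) < 1e-8

    # (9.4.39): the tilt realizing ER_n(lambda/n) has e^{2 beta} = lambda/(n - lambda).
    lam = 1.5
    beta_er = 0.5 * math.log(lam / (n - lam))
    assert abs(math.exp(2 * beta_er) / (1 + math.exp(2 * beta_er)) - lam / n) < 1e-12
    print("checks passed")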
Example 9.24 (GRG_n(w) and vertex degrees) The second example arises when we fix the expected degrees of all the vertices. This occurs when we take N_v(x) = Σ_{u∈[n]} x_vu = d_v^{(G_n)} for every v ∈ [n]. In this case, with G_n = ([n], (x_uv)_{1≤u<v≤n}), we have

    Z_n(β⃗) = Σ_x e^{Σ_{v∈[n]} β_v d_v^{(G_n)}} = Σ_x e^{Σ_{1≤u<v≤n} (β_u + β_v) x_uv} = Π_{1≤u<v≤n} (1 + e^{β_u + β_v})   (9.4.40)

and

    p_{β⃗}(x) = (1/Z_n(β⃗)) e^{Σ_{v∈[n]} β_v d_v^{(G_n)}} = Π_{1≤u<v≤n} e^{(β_u + β_v) x_uv}/(1 + e^{β_u + β_v}).   (9.4.41)
Thus the different edges are still independent, edge {u,v} being present with probability e^{β_u + β_v}/(1 + e^{β_u + β_v}) and absent with probability 1/(1 + e^{β_u + β_v}). In a sparse setting, we aim for

    E[d_v^{(G_n)}] = α_v,   (9.4.42)

so that the average degree of vertex v is precisely equal to α_v. The constraint in (9.4.34) thus reduces to

    Σ_{u≠v} e^{β_v + β_u}/(1 + e^{β_v + β_u}) = α_v.   (9.4.43)
Thus, GRG_n(w) is an example of an exponential random graph with a constraint on the expected degrees of all the vertices. Further, by the conditioning property of exponential random graphs, conditional on d_v^{(G_n)} = α_v for all v ∈ [n], the distribution is uniform over all graphs with these degrees. This gives an alternative proof of Theorem 1.4.
We note that this does not exactly fit the format in (9.4.31), since we have fixed the expected vertex degrees rather than subgraph counts. However, the model where we use the numbers of k-stars for all k in (9.4.31) is closely related to the model where we fix the expected degrees of all the vertices.
Now that we have discussed two quite nice examples of exponential random graphs, let us discuss their intricacies. The above choices, in Examples 9.23 and 9.24, are quite special, in the sense that the exponent in (9.4.32) is linear in the edge occupation statuses (x_uv)_{1≤u<v≤n}. This gives rise to exponential random graphs that have independent edges. However, for more intricate subgraph counts, such as triangles, this linearity no longer holds. Indeed, the number of triangles is a cubic function of (x_uv)_{1≤u<v≤n}. In such cases the edges are no longer independent, making the exponential random graph very hard to study.

Indeed, the exponential form in (9.4.32) naturally leads to large deviations in random graphs, a topic that is much better understood in the dense setting, where the number of edges grows proportionally to n². In the sparse setting such problems are hard, and sometimes ill defined, for example because the model may have phase transitions (see, e.g., Häggström and Jonasson (1999)). Such phase transitions imply that the problem of estimating parameters β⃗ = (β_F)_{F∈F} such that the expected subgraph counts are exactly as intended may be ill defined. We refer to the notes and discussion in Section 9.6 for more background and references.
The models described so far do not incorporate geometry at all. Yet, geometry may be rele-
vant (see, e.g., Wong et al. (2006) and the references therein). In many networks the vertices
are located somewhere in space and their locations may indeed be relevant. People who live
closer to one another are more likely to know each other, even though we all know people
who live far away from us. This is a very direct link to the geometric properties of networks.
However, the geometry may also be much more indirect or latent. For example, people who
have similar interests are also more likely to know one another. Thus, when we are associat-
ing a whole bunch of attributes with vertices in the network, vertices with similar attributes
(age, interests, hobbies, profession, music preference, etc.) may be more likely to know each
other. In any case, we are rather directly led to studying networks where the vertices are em-
bedded in some general geometric space. These are what we refer to as spatial networks.
One further aspect of spatial random graphs deserves to be mentioned. Owing to the
fact that nearby vertices are more likely to be neighbors, conversely it is also true that two
neighbors of a vertex are more likely to be connected. Therefore, geometry rather naturally
leads to clustering.
where W^{(1)}, W^{(2)} are two independent exponential random variables with parameter 1. Alternatively, it can be seen that T = (G_1 + G_2 − G_3)/2, where G_1, G_2, G_3 are three independent Gumbel random variables having density f_G(x) = e^x e^{−e^x} on R.
Interestingly, the method of proof of Theorem 9.25 is quite close to that of Theorem 7.24.
Indeed, again the parts of the graph that can be reached in a distance at most t are analyzed.
Let P1 and P2 be two uniform points along the circle, so that Dn has the same distribution
as the distance between P1 and P2 . Denote by R(1) (t) and R(2) (t) the parts of the graph
that can be reached within a distance t from P1 and P2 , respectively. Then Dn = 2Tn ,
where T_n is the first time that R^{(1)}(t) and R^{(2)}(t) have a nonempty intersection. The proof
then consists of showing that, up to time Tn , the processes R(1) (t) and R(2) (t) are close to
certain continuous-time branching processes, primarily owing to the fact that the probability
that there are two intervals that overlap is quite small. The random variables W (1) and W (2)
can be viewed as appropriate martingale limits of these branching processes.
Comparing Theorem 9.25 with Theorem 7.24, we see that the rescaled distance Dn , after
subtraction of the correct multiple of log n, converges in distribution, while in Theorem 7.24,
convergence is at best along subsequences. This is due to the fact that Dn is a continuous
random variable, while graph distances are integer-valued. Therefore, graph distances suffer
from discretization effects. In the next paragraph, we will see that the graph distances in the
small-world model suffer from similar issues.
Further, the case where ρ = λn k → 0 has been studied, and there the behavior is closely
related to that in Theorem 9.25. See Section 9.6 for an extensive discussion. The parameter
ν arises as the largest eigenvalue of the offspring matrix of an appropriate two-type branch-
ing process that describes the local neighborhoods in the discrete small-world model. This
branching process has two types in that there is a difference between an interval starting
immediately after a shortcut, and intervals that have previously been found, owing to the
“hesitation” arising from the fact that a long-range edge now has length 1 rather than 0.
Figure 9.13 Examples of hyperbolic graphs for n = 250, with (a) τ = 2.5 and (b) τ = 3.5, and average degree approximately 5.
We deduce that the model is scale-free, meaning that the asymptotic degree distribution has infinite variance, precisely when α ∈ (1/2, 1); otherwise the degree distribution obeys a power law with a larger degree exponent. Let us informally explain the power law in Theorem 9.27. For a vertex v with radial coordinate r_v, we define its type t_v by

    t_v = e^{(R − r_v)/2}.   (9.5.11)

Then, the degree D_v of vertex v can be approximated by a Poisson random variable with mean t_v, so that D_v is of order t_v. Furthermore, the random variables (t_v)_{v≥1} are distributed as a power law with exponent τ = 2α + 1, so that the degrees have a power-law distribution as well, with the same exponent.
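The tail of the types can be checked by simulation. In the sketch below (ours) we sample radial coordinates from the quasi-uniform density proportional to e^{α(r−R)} on [0, R] — an approximation to the standard hyperbolic radial law that we adopt here as an assumption — and verify that P(t_v > x) ≈ x^{−2α}, i.e., τ = 2α + 1:

    import numpy as np

    rng = np.random.default_rng(3)
    alpha, R, N = 0.75, 12.0, 200000          # predicts tau = 2 * alpha + 1 = 2.5

    # Inverse-CDF sampling from the density proportional to e^{alpha (r - R)} on [0, R].
    u = rng.random(N)
    r = R + np.log(u + (1 - u) * np.exp(-alpha * R)) / alpha
    t = np.exp((R - r) / 2)                   # types as in (9.5.11)

    x = 10.0
    print((t > x).mean(), x ** (-2 * alpha))  # empirical vs predicted tail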
The exact form of p_k involves several special functions. Its identification is quite impressive, and its proof is rather involved. For the purposes of this book, however, the exact shape of p_k is not so relevant.
Much more is known about the local structure of the hyperbolic random graph, for exam-
ple, its local limit has been identified. We postpone this discussion to the next subsection, in
which we discuss the local limit in geometric inhomogeneous random graphs. It turns out
that we can interpret the hyperbolic random graph as a special case which is interesting in
its own right.
also have the necessary clustering to make them appropriate models. Of course, the question
of how to embed them precisely is highly relevant and also quite difficult.
The example that has attracted the most attention is the Internet. In Figure 9.15, you can see a hyperbolic embedding of the Internet, as performed by Boguñá et al. (2010). We see that the regions on the boundary of the outer circle can be grouped in a fairly natural way, where the countries in which the autonomous systems reside seem to be grouped according to their geography, with some exceptions (for example, it is not clear why Kenya is almost next to the Netherlands). This hyperbolic geometry is first of all quite interesting, but it could also be helpful in sustaining the ever-growing Internet traffic.
iid, for example as power-law random variables. We write (xv )v∈[n] and (wv )v∈[n] for the
realizations of (Xv )v∈[n] and (Wv )v∈[n] .
The edges are conditionally independent given (x_v)_{v∈[n]} and (w_v)_{v∈[n]}, where the conditional probability that the edge between u and v is present equals

    p_{u,v} = κ_n(‖x_u − x_v‖, w_u, w_v),   (9.5.13)

for some κ_n : [0, ∞)³ → [0, ∞). A prime example of such a GIRG is the so-called product GIRG, for which

    κ_n(t, w_u, w_v) = λ (1 ∧ (w_u w_v / Σ_{i∈[n]} w_i)^{max{α,1}} t^{−dα}),   (9.5.14)

where α, λ > 0 are appropriate parameters. Often, we assume that the vertex weights obey
a power law, i.e.,
    P(W > w) = L(w) w^{−(τ−1)}   (9.5.15)
for some slowly varying function L : [0, ∞) → (0, ∞). As is usual, the literature treats a
variety of models and settings, and we refer to the notes and discussion in Section 9.6 for
more details. To describe the local limit of GIRGs, we need to assume a more restrictive
setting:
Assumption 9.30 (Limiting connection probabilities exist) Assume the following:

(a) the vertex weights (w_v)_{v∈[n]} satisfy Condition 1.1(a) for some limiting random variable W;
(b) the vertex locations (x_v)_{v∈[n]} are a sequence of iid uniform locations on [−1/2, 1/2]^d that are independent of (w_v)_{v∈[n]};
(c) there exists a function κ : [0, ∞)³ → [0, ∞) such that κ_n(n^{−1/d} t, x_n, y_n) → κ(t, x, y) for all x_n → x and y_n → y, where κ satisfies that there exists α > 0 such that, for all t large enough,

    E[κ(t, W_1, W_2)] ≤ t^{−α},   (9.5.16)

with W_1, W_2 two copies of the limiting random variable W in part (a).
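For concreteness, here is a sketch (ours; the parameter values and the exact Pareto weight law are illustrative) that samples a product GIRG with the connection probabilities (9.5.13)–(9.5.14) on the unit torus:

    import numpy as np

    rng = np.random.default_rng(7)
    n, d, alpha, lam, tau = 1000, 2, 1.5, 1.0, 2.8
    x = rng.random((n, d))                             # locations on the unit torus
    w = rng.random(n) ** (-1.0 / (tau - 1))            # P(W > w) = w^{-(tau-1)}, w >= 1

    diff = np.abs(x[:, None, :] - x[None, :, :])
    t = np.linalg.norm(np.minimum(diff, 1.0 - diff), axis=2)   # torus distance
    np.fill_diagonal(t, np.inf)                        # excludes self-loops

    kern = lam * np.minimum(
        1.0, (np.outer(w, w) / w.sum()) ** max(alpha, 1.0) * t ** (-d * alpha))
    adj = np.triu(rng.random((n, n)) < kern, k=1)      # conditionally independent edges
    print(int(adj.sum()), "edges")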
In the limit T ↘ 0, this gives rise to the (hard) hyperbolic graph studied in the previous subsection. The identification is given in the following theorem:
Exercise 9.54 asks the reader to investigate the proof of Theorem 9.32.
For u, v ∈ V(G_n), we independently let {u,v} ∈ E(G_n) with probability p(u,v), and {u,v} ∉ E(G_n) otherwise.
Theorem 9.34 (Giant in GIRGs) Let G_n be the GIRG defined in (9.5.35) and (9.5.36), with p ≥ 0, α ∈ [0, ∞], σ ∈ [0, ∞), and β > 0. Then |C_max|/n →^P ζ, where ζ is the survival probability of the Poisson infinite GIRG with edge probabilities given by (9.5.35) and (9.5.36), while |C_(2)|/n →^P 0 for all α > 0.
Special cases of the GIRG defined in (9.5.35) and (9.5.36) are the product GIRG, as well as certain cases of the GIRG in Assumption 9.30. By the construction of the inhomogeneous Poisson process in (9.5.35), and conditional on |V(G_n)| = n_0, the variables ((x_i, w_i))_{i∈[n_0]} are iid, with the x_i uniform on [−n^{1/d}/2, n^{1/d}/2]^d and the w_i iid copies of a Pareto random variable W with P(W > w) = w^{−(τ−1)} for w ≥ 1.
Let us complete this discussion by considering the situation when d = 1 using the local
convergence statement in Theorem 9.33. Here, there is no infinite component in the local
limit for τ > 3, so that no giant exists in the pre-limit either by Corollary 2.27.
We call a vertex $u$ strongly isolated when there does not exist an occupied edge $\{v_1, v_2\} \in E(G_n)$ with $v_1 \le u \le v_2$ in the graph. In particular, this means that the connected component of $u$ is finite. We prove that the expected number of occupied edges $\{v_1, v_2\}$ with $v_1 \le u \le v_2$ is bounded. Indeed, for the local limit of the hard hyperbolic graph in $d = 1$, this expected number is equal to
\[
\mathbb{E}\Big[\sum_{v_1 \le u \le v_2} \frac{1}{1 + \big(\|v_1 - v_2\|/(c\, W_{v_1} W_{v_2})\big)^{\alpha}}\Big]
\le C \sum_{v_1 \le u \le v_2} \mathbb{E}\Big[\frac{(W_{v_1} W_{v_2})^{\alpha}}{(W_{v_1} W_{v_2})^{\alpha} + \|v_1 - v_2\|^{\alpha}}\Big]
\le C \sum_{k \ge 1} k\, \mathbb{E}\Big[\frac{X^{\alpha}}{X^{\alpha} + k^{\alpha}}\Big], \qquad (9.5.37)
\]
where $X = W_1 W_2$ is the product of two independent $W$ variables. Now, when $\mathbb{P}(W > w) = w^{-(\tau-1)}$, it is not hard to see that
\[
\mathbb{P}(X^{\alpha} > x) \le C \frac{\log x}{x^{(\tau-1)/\alpha}}. \qquad (9.5.38)
\]
In turn, this implies that
\[
\mathbb{E}\Big[\frac{X^{\alpha}}{X^{\alpha} + k^{\alpha}}\Big] \le C \frac{\log k}{k^{\tau-1}}. \qquad (9.5.39)
\]
We conclude that, when multiplied by $k$, this is summable when $\tau > 3$, so that the expected
number of occupied edges $\{v_1, v_2\}$ with $v_1 \le u \le v_2$ is bounded. In turn, this suggests
that this number equals zero with strictly positive probability (beware: these numbers are
not independent for different $u$), and hence that a positive proportion of vertices is strongly
isolated. However, when there is a positive proportion of strongly isolated vertices, there
cannot be an infinite component. This intuitively explains why the existence of the giant
component is restricted to $\tau \in (2, 3)$. Thus, the absence of a giant in hyperbolic graphs
with power-law exponent $\tau > 3$ is intimately related to the fact that this model is
inherently one-dimensional.
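The summability underlying this argument is easy to probe numerically. The following Python sketch estimates $k\,\mathbb{E}[X^{\alpha}/(X^{\alpha}+k^{\alpha})]$ by Monte Carlo for Pareto weights; it merely illustrates the decay predicted by (9.5.39), with arbitrarily chosen parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
tau, alpha = 3.5, 1.5              # tau > 3: the regime where the sum converges
n_samples = 10**6

# X = W1 * W2 with P(W > w) = w^{-(tau - 1)} for w >= 1, as in (9.5.15)
w1 = rng.uniform(size=n_samples) ** (-1.0 / (tau - 1))
w2 = rng.uniform(size=n_samples) ** (-1.0 / (tau - 1))
x_alpha = (w1 * w2) ** alpha

for k in [1, 10, 100, 1000]:
    term = np.mean(x_alpha / (x_alpha + float(k) ** alpha))
    # (9.5.39) bounds E[X^a / (X^a + k^a)] by C log(k) / k^{tau - 1}, so
    # k * term should decay roughly like k^{-(tau - 2)} log k, hence be summable
    print(k, k * term)
```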
where $D_v^{(n)}$ denotes the degree of vertex $v \in V(G_n)$ in $G_n$. Thus, $D_n(x_{n+1})$ denotes the total degree of all vertices located in $B_r(x_{n+1})$. The $m$ edges are connected to vertices $(y_1, \ldots, y_m)$ conditionally independently given $(G_n, x_{n+1})$. Thus, for all $v \in V(G_n) \cap B_r(x_{n+1})$,
\[
\mathbb{P}(y_i = v \mid G_n) = \frac{D_v^{(n)}}{\max(D_n(x_{n+1}), \alpha m A_r n)}, \qquad (9.5.42)
\]
while
\[
\mathbb{P}(y_i = x_{n+1} \mid G_n) = 1 - \frac{D_n(x_{n+1})}{\max(D_n(x_{n+1}), \alpha m A_r n)}, \qquad (9.5.43)
\]
where we denote Ar = µ(Br (u)); α ≥ 0 is a parameter, and r = rn is a radius to be chosen
appropriately. The parameter r may depend on the size of the graph. The degree sequence
of the model that arises is characterized in the following theorem:
Theorem 9.37 (Degrees in preferential attachment models with uniform locations on the sphere) Let $S$ be the surface of a three-dimensional unit ball. Take $r_n = n^{\beta - 1/2}\log n$, where $\beta \in (0, \frac12)$ is a constant. Finally, let $\alpha > 2$ and $m \ge 1$. In the above geometric preferential attachment model given by (9.5.41)–(9.5.43),
\[
P_k^{(n)} \xrightarrow{\;\mathbb{P}\;} p_k, \qquad (9.5.44)
\]
where $p_k = C k^{-(\alpha+1)}(1 + o(1))$ for $k$ large.
Theorem 9.37 allows for r = rn = o(1), so that vertices can make connections only to
vertices that are close by.
We next discuss a setting where $r_n$ remains fixed. Let us first introduce the model. We again assume that $S$ is a metric space, and $\mu$ is the uniform measure on $S$. Further, let $\alpha \colon \mathbb{R}_+ \to \mathbb{R}_+$ be an attractiveness function. The graph process is denoted by $(G_n)_{n \ge 0}$. Here $G_0$ is assumed to be a connected graph with $n_0$ vertices and $e_0$ edges. We let the spatial locations $(X_i)_{i \ge 1}$ be iid draws from $\mu$ on $S$. Each vertex enters the graph with $m$ edges to be connected to the vertices already in the graph. Denote the receiving vertices by $(V_i^{(n+1)})_{i \in [m]}$. Conditional on $G_n$ and $X_{n+1}$, we let the vertices $(V_i^{(n+1)})_{i \in [m]}$ be conditionally iid with
\[
\mathbb{P}(V_i^{(n+1)} = u \mid X_{n+1}, G_n) = \frac{(\deg_{G_n}(u) + \delta)\, \alpha(|u - X_{n+1}|)}{\sum_{i \in [n]} (\deg_{G_n}(X_i) + \delta)\, \alpha(|X_i - X_{n+1}|)}. \qquad (9.5.45)
\]
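In algorithmic form, one attachment step of (9.5.45) can be sketched as follows; the attractiveness function is passed in as a callable, and the seed graph, locations, and parameter values are illustrative assumptions.

```python
import numpy as np

def attach_new_vertex(degrees, locations, x_new, m, delta, attract, rng):
    """One step of the attachment rule (9.5.45): the new vertex at x_new picks
    m receivers, conditionally iid with probability proportional to
    (deg(u) + delta) * attract(|u - x_new|)."""
    dist = np.linalg.norm(locations - x_new, axis=1)
    scores = (degrees + delta) * attract(dist)
    probs = scores / scores.sum()
    # conditionally iid choices, so repeated receivers (multi-edges) may occur
    return rng.choice(len(degrees), size=m, p=probs)

rng = np.random.default_rng(0)
degrees = np.array([2.0, 1.0, 1.0])            # a small connected seed graph G_0
locations = rng.uniform(size=(3, 2))           # illustrative locations in the unit square
receivers = attach_new_vertex(degrees, locations, rng.uniform(size=2),
                              m=2, delta=0.5, attract=lambda r: np.exp(-r), rng=rng)
print(receivers)
```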
The main result about the degree distribution in the above model is as follows:
Theorem 9.38 (Degrees in preferential attachment models with uniform locations) Let $S$ be a general metric space, and let $\mu$ be the uniform measure on it. Let $m \ge 1$ and $\delta \ge 0$. For $m = 1$ assume that $I_1 < \infty$ and, for $m \ge 2$, that $I_2 < \infty$ in (9.5.46). Let $\alpha$ be continuous, and for $\delta = 0$ assume that $\alpha(r) \ge \alpha_0 > 0$. In the geometric preferential attachment model described in (9.5.45),
\[
P_k^{(n)} \xrightarrow{\;\mathbb{P}\;} p_k = (2 + \delta/m)\, \frac{\Gamma(k + \delta)\Gamma(m + 2 + \delta + \delta/m)}{\Gamma(m + \delta)\Gamma(k + 3 + \delta + \delta/m)}. \qquad (9.5.47)
\]
The asymptotic degree distribution in (9.5.47) is identical to that in the non-geometric
preferential attachment model; see, e.g., (1.3.60). This is true since we are working on a
fixed metric space, where vertices become more and more dense. Thus, locally, the behavior
is very similar to a normal preferential attachment model, i.e., the spatial effects are “washed
away.” Remarkably, δ ∈ (−m, 0) is not allowed even though that model is perfectly well
defined.
So far, we have discussed settings where $I_1 < \infty$, so that, in particular, the geometric
component is not very pronounced. We continue by studying a setting where $r \mapsto \alpha(r)$ is quite
large for small $r$, so that the proximity of the vertices becomes much more pronounced:
be the degree distribution of types in the model, where $D_v(n)$ denotes the degree of vertex $v$ at time $n$. The main result on the degree distribution is as follows:

Theorem 9.40 (Degrees in preferential attachment models with non-uniform locations) Let $S = [t]$ be a finite space, and let $\mu$ be a measure on it. Assume that $\alpha(x, y) \ge \alpha_0 > 0$. In the geometric preferential attachment model described in (9.5.45),
\[
P_{s,k}^{(n)} \xrightarrow{\;\mathbb{P}\;} p_{s,k} = \frac{2\mu_s}{\phi(s)}\, \frac{\Gamma(k)\Gamma(m + \phi(s)^{-1})}{\Gamma(m)\Gamma(k + 1 + \phi(s)^{-1})}, \qquad (9.5.50)
\]
where $\phi(s)$ satisfies
\[
\phi(s) = \sum_{r \in [t]} \frac{\alpha(s, r)}{\sum_{j \in [t]} \alpha(j, r)\,\nu(j)}\, \mu(r); \qquad (9.5.51)
\]
of influence in some metric space, for example the torus $[0,1]^d$ for some dimension $d$, for which the metric equals
\[
d(x, y) = \min\{\|x - y + u\|_\infty : u \in \{0, 1, -1\}^d\}. \qquad (9.5.52)
\]
When a new vertex arrives, it is uniformly located somewhere in the unit cube, and it con-
nects to each of the older vertices in whose region of influence it lands independently and
with fixed probability p. These regions of influence evolve as time proceeds, in such a way
that the volume of the influence region of the vertex $v$ at time $n$ is equal to
\[
R_v(n) = \frac{a_1 D_v(n) + a_2}{n + a_3}, \qquad (9.5.53)
\]
where now Dv (n) is the in-degree of vertex v at time n and a1 , a2 , a3 are parameters which
are chosen such that pa1 ≤ 1. One of the main results is that this model is a scale-free graph
process with limiting degree distribution (pk )k≥0 satisfying (1.1.9) with τ = 1 + 1/(pa1 ) ∈
[2, ∞):
Theorem 9.41 (Degrees in preferential attachment models with influence) In the above preferential attachment model with influence, where the volume of the influence region is given by (9.5.53), for $k \le (n^{1/8}/\log n)^{4pa_1/(2pa_1+1)}$,
\[
P_k^{(n)} = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{D_v(n) = k\}} \xrightarrow{\;\mathbb{P}\;} p_k, \qquad (9.5.54)
\]
where
\[
p_k = \frac{p^k}{1 + kpa_1 + pa_2} \prod_{j=0}^{k-1} \frac{ja_1 + a_2}{1 + jpa_1 + pa_2}. \qquad (9.5.55)
\]
In Exercise 9.53, the reader is asked to prove that $p_k = c k^{-(1+1/(pa_1))}(1 + o(1))$ for the $p_k$ in (9.5.55).
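As numerical evidence for this power law (not a proof), the following sketch evaluates $p_k$ from (9.5.55) in log-space and estimates the local slope of $\log p_k$ against $\log k$, which should approach $-(1 + 1/(pa_1))$; the parameter values are arbitrary subject to $pa_1 \le 1$.

```python
import numpy as np

p, a1, a2 = 0.5, 1.0, 1.0          # illustrative parameters with p * a1 <= 1
K = 20_000

j = np.arange(K)                   # product factors for j = 0, ..., K - 1
log_ratio = np.log(p) + np.log(j * a1 + a2) - np.log(1 + j * p * a1 + p * a2)
ks = np.arange(1, K + 1)
# log p_k = sum of the first k log-ratios minus log(1 + k p a1 + p a2)
log_pk = np.cumsum(log_ratio) - np.log(1 + ks * p * a1 + p * a2)

# local slope of log p_k vs log k; should approach -(1 + 1/(p * a1)) = -3 here
for k in [100, 1000, 10_000]:
    print(k, (log_pk[2 * k - 1] - log_pk[k - 1]) / np.log(2.0))
```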
Figure 9.16 Examples of scale-free percolation graphs, with (a) τ = 2.5 and (b) τ = 3.5.
For models on $\mathbb{Z}^d$, the definition of power-law degrees needs some adaptation. Indeed, we say that an infinite random graph has power-law degrees when
\[
p_k = \mathbb{P}(D_o = k), \qquad (9.5.56)
\]
where $D_x$ is the degree of vertex $x \in \mathbb{Z}^d$ and $o \in \mathbb{Z}^d$ is the origin, satisfies (1.4.4) for some $\tau > 1$. This is a reasonable definition. Indeed, let $B_r(o) = [-r, r]^d \cap \mathbb{Z}^d$ be a cube of width $2r$ around the origin $o \in \mathbb{Z}^d$, denote $n = (2r+1)^d$, and, for each $k \ge 0$, let
\[
P_k^{(n)} = \frac{1}{n} \sum_{x \in B_r(o)} \mathbb{1}_{\{D_x = k\}} \qquad (9.5.57)
\]
denote the empirical degree distribution in $B_r(o)$.
Scale-Free Percolation
Let each vertex $x \in \mathbb{Z}^d$ be equipped with an iid weight $W_x$. Conditional on the weights $(W_x)_{x \in \mathbb{Z}^d}$, the edges in the graph are independent, and the probability that there is an edge between $x$ and $y$ is
\[
p_{xy} = 1 - e^{-\lambda (W_x W_y)^{\alpha} / |x-y|^{\alpha d}}, \qquad (9.5.58)
\]
for α, λ ∈ (0, ∞). Here, the parameter α > 0 describes the long-range nature of the model,
while we think of λ > 0 as the percolation parameter. In terms of the weight distribution,
we are mainly interested in settings where the Wx have unbounded support in [0, ∞) and
particularly when they vary substantially, as in (9.5.15). The name scale-free percolation is
justified by the following theorem:
Theorem 9.42 (Power-law degrees for power-law weights) Fix d ≥ 1, consider scale-
free percolation as in (9.5.58), and assume that the vertex weights are iid random variables
satisfying (9.5.15).
Figure 9.18 Examples of a scale-free percolation graph on the plane, with (a) τ = 2.5 and (b) τ = 3.5.

Figure 9.19 Degree distributions of the scale-free percolation on the planar graphs in Figure 9.18.
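Pictures like Figures 9.18–9.19 are straightforward to reproduce in simulation. The sketch below samples scale-free percolation (9.5.58) on a finite box of $\mathbb{Z}^2$ and prints the empirical degree tail, which should decay roughly like $k^{-(\tau-1)}$; the box size and parameters are illustrative.

```python
import numpy as np

def scale_free_percolation_box(r, tau, alpha, lam, seed=0):
    """Sample scale-free percolation (9.5.58) on the box [-r, r]^2 of Z^2
    and return the degree of every vertex."""
    rng = np.random.default_rng(seed)
    xs = np.array([(i, j) for i in range(-r, r + 1) for j in range(-r, r + 1)])
    n = len(xs)
    w = rng.uniform(size=n) ** (-1.0 / (tau - 1))   # P(W > w) = w^{-(tau - 1)}
    deg = np.zeros(n, dtype=int)
    for u in range(n - 1):
        dist = np.linalg.norm(xs[u] - xs[u + 1:], axis=1)
        p = 1.0 - np.exp(-lam * (w[u] * w[u + 1:]) ** alpha / dist ** (alpha * 2))
        hit = rng.uniform(size=p.size) < p
        deg[u] += hit.sum()
        deg[u + 1:][hit] += 1
    return deg

deg = scale_free_percolation_box(r=30, tau=2.5, alpha=2.0, lam=1.0)
for k in [1, 2, 4, 8, 16, 32]:
    print(k, np.mean(deg > k))      # empirical tail, roughly k^{-(tau - 1)}
```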
similar phenomenon for rank-1 inhomogeneous random graphs, such as the Norros–Reittu
model and the configuration model, where the giant is robust (recall, e.g., Theorem 3.20).
We close our discussion on scale-free percolation by studying the graph distances. In
finite graphs, typical distances are obtained by choosing two vertices uar from the vertex set,
and studying how the graph distance between them evolves when the network size n → ∞.
For infinite models, however, we replace this by studying the graph distances between far-
away vertices, i.e., we study $\mathrm{dist}_G(x, y)$ for $|x - y|$ large. By translation invariance, this is
the same as studying $\mathrm{dist}_G(o, x)$ for $|x|$ large. Below is the main result:
Theorem 9.44 (Distances in scale-free percolation) Fix $d \ge 1$, consider scale-free percolation as in (9.5.58), with iid weights $(W_x)_{x \in \mathbb{Z}^d}$ satisfying (9.5.15), $\alpha > 1$, and $\tau > 2$, and let $\lambda > \lambda_c$. Then, conditional on $o \longleftrightarrow x$,

(a) for $\tau \in (2,3)$ and $\alpha > 1$,
\[
\frac{\mathrm{dist}_G(o,x)}{\log\log|x|} \xrightarrow{\;\mathbb{P}\;} \frac{2}{|\log(\tau-2)|}; \qquad (9.5.62)
\]
(b) for $\tau > 3$ and $\alpha \in (1,2)$, whp for every $\varepsilon > 0$,
\[
(\log|x|)^{\Delta'-\varepsilon} \le \mathrm{dist}_G(o,x) \le (\log|x|)^{\Delta+\varepsilon}, \qquad (9.5.63)
\]
[Figure (caption lost): phase diagram of graph distances in scale-free percolation in terms of α and τ, with the regimes dist_G(o,x) ≈ 2 log log|x|/|log(τ−2)|, dist_G(o,x) ≈ (log|x|)^∆, and dist_G(o,x) ≳ |x|, separated by the lines α = 1 and α = 2.]
\[
\mathbb{E}\Big[\sum_{x \in \mathbb{Z}^d} \mathbb{1}_{\{\{o,x\} \in E(G)\}}\, |x - o|\Big] < \infty. \qquad (9.5.65)
\]
Since it is hard even to construct the spatial configuration model, it may not come as a
surprise that few results are known for this particular model.
We close this section by discussing a particular matching for d = 1 that is rather natural.
Give each half-edge a direction uar, meaning that it points to the left or to the right with
equal probability, independently across the half-edges. The edges are then obtained by pairing
half-edges pointing to each other, first exhausting all possible connections between nearest
neighbors, then linking second-nearest neighbors, and so on.
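In simulation form, the pairing can be organized in rounds: in round $\ell$, every remaining right-pointing half-edge at $x$ is matched, as far as possible, with a remaining left-pointing half-edge at $x + \ell$. The following finite-box Python sketch (which ignores boundary effects and uses an illustrative Poisson degree distribution) implements this.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
degrees = rng.poisson(3, size=n)        # illustrative degrees with finite mean

# each half-edge independently points left or right with probability 1/2
right = rng.binomial(degrees, 0.5)      # right-pointing half-edges at each site
left = degrees - right                  # left-pointing half-edges at each site

edges = []
for ell in range(1, n):                 # round ell: pair half-edges at distance ell
    matched = np.minimum(right[: n - ell], left[ell:])
    for x in np.flatnonzero(matched):
        edges.extend([(x, x + ell)] * int(matched[x]))
    right[: n - ell] -= matched
    left[ell:] -= matched
    if right.sum() == 0 or left.sum() == 0:
        break

lengths = np.array([y - x for x, y in edges])
print("edges:", len(edges), "mean length:", lengths.mean(), "max:", lengths.max())
```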
Assuming that D has finite mean, it is known that this algorithm leads to a well-defined
configuration, but that the expected length of the longest edge attached to a given vertex is
infinite. Indeed, let $N$ be the length of the edge to the furthest neighbor of $o$. Then $N < \infty$
almost surely. If instead we assign the directions with a probability that is not equal to $\frac12$, then $N = \infty$
with positive probability. Further, let $N_1$ denote the length of the first edge incident to $o$.
Then, E[N1 ] = ∞ when E[D] < ∞. Thus, edges have finite (spatial) length, but their
lengths have infinite mean.
In this chapter we have given an extensive introduction to random graph models for networks that are
directed, and have community structures and geometry. Obviously, these additional features can also be
combined, but there is only a limited literature on that, and we refrain from discussing such models.
We have not been able to cover all the relevant models that have attracted attention in the literature.
Particularly for dynamic random graphs, we have not discussed some of the relevant models. Examples are
copying or duplication models; these are dynamic random graphs in which new vertices copy a portion of
the neighbors of an older vertex (Kumar et al. (2000)).
Another class of dynamic models that has attracted considerable attention consists of models aimed at delaying
or accelerating the birth of the giant component. Indeed, one can obtain the combinatorial Erdős–Rényi
random graph by adding edges uniformly one by one, until the desired number of edges is added; then the
distribution is the same as that of the Erdős–Rényi random graph with the same number of edges. For this
process there is a giant when the number of added edges is $m = cn$ with $c > \frac12$, while there is no giant for
$m = cn$ with $c < \frac12$.
This process can be modified using a "power of choice" by considering a pair of edges at each time step.
Achlioptas raised the question whether it is possible to select one of the two exposed edges at each stage
in such a way as to either delay or accelerate the birth of a giant. Spencer and Wormald (2007) aptly called this
birth control for giants. In general, one can select the edge for which the connected components on
either side are the smallest in some sense. Bohman and Frieze (2001, 2002) studied settings where the first
edge is taken when it connects two isolated vertices, but otherwise the second edge is chosen. They showed
that whp there is no giant yet when the number of added edges is $m = cn$ for some $c > 0.535$, so that this rule indeed
delays the birth of the giant. Spencer and Wormald (2007) narrowed down the birth of the giant as lying
between $m = 0.8293n$ and $m = 0.9823n$.
Intuitively, one may guess that the birth of the giant is delayed most when the chosen edge minimizes
the product of the connected component sizes of the vertices in the edge. This is the so-called product rule,
and is sometimes also called explosive percolation since the size of the giant grows very fast after the giant
is first formed. In fact, this led Achlioptas et al. (2009) to the conjecture that the limiting size of the giant
for m = cn might be discontinuous around the critical value. This, however, turns out not to be true, as
proved by Riordan and Warnke (2011). It is as yet unclear how the limiting proportion of vertices in the
giant grows slightly beyond the critical value, though.
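The delaying effect of such rules is easy to observe in a small simulation. The sketch below compares the uniform process with the Bohman–Frieze rule described above, tracking the largest component with a union–find structure; it is illustrative only, and the values of $c$ are arbitrary.

```python
import numpy as np

class UnionFind:
    """Disjoint sets with path halving and union by size."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]

def largest_component_fraction(n, m, rule, rng):
    """Add m edges by the given rule; return |C_max| / n."""
    uf = UnionFind(n)
    for _ in range(m):
        e1 = rng.integers(0, n, size=2)
        e2 = rng.integers(0, n, size=2)
        if rule == "uniform":
            u, v = e1
        else:   # Bohman-Frieze: take e1 iff it joins two isolated vertices
            isolated = (uf.size[uf.find(e1[0])] == 1
                        and uf.size[uf.find(e1[1])] == 1)
            u, v = e1 if isolated else e2
        uf.union(u, v)
    return max(uf.size[uf.find(x)] for x in range(n)) / n

rng = np.random.default_rng(3)
n = 100_000
for c in [0.45, 0.50, 0.55, 0.60]:
    print(c, largest_component_fraction(n, int(c * n), "uniform", rng),
          largest_component_fraction(n, int(c * n), "BF", rng))
```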
The notes to this chapter are substantially more extensive than those in other chapters, because many
models are being discussed, and we only have limited space. As before, the notes can be used to learn more
about the models and to get pointers to the literature.
For citation networks, there is a rich literature proposing models for them using preferential attachment
schemes and adaptations of these, mainly in the complex-networks literature. Aging effects, i.e., taking
account of the age of a paper in its likelihood of obtaining citations, have been extensively considered as
the starting point to investigate the dynamics of citation networks; see Wang et al. (2009, 2008); Hajra
and Sen (2005, 2006); Csárdi (2006). Here the idea is that old papers are less likely to be cited than new
papers. Such aging has been observed in many citation network data sets and makes PAMs with weight
functions depending only on the degree ill-suited for them. Indeed, PAMs could more aptly be called the-
old-get-richer models, i.e., in general old vertices have the highest degrees. In citation networks, however,
papers with many citations appear all the time. Wang et al. (2013) investigated a model that incorporates
these effects; see also Wang et al. (2014) for a comment on the methods in that paper. On the basis of
empirical data, they suggested a model where the aging function follows a log-normal distribution with
paper-dependent parameters, and the preferential attachment function is the identity. Wang et al. (2013)
estimated the fitness function rather than using the more classical approach where the latter is taken to be
an iid sample of random variables.
which would imply the limiting statement in (9.2.18). Lee and Olvera-Cravioto (2020) used the results in
Cao and Olvera-Cravioto (2020) to prove that the limiting PageRank of such directed generalized random
graphs exists and that the solution obeys the same recurrence relation as on a branching-process tree. In
particular, under certain independence assumptions, this implies that the PageRank power-law hypothesis
is valid for such models.
Directed configuration models were investigated by Cooper and Frieze (2004), who proved Theorem 9.5.
In fact, the results in Cooper and Frieze (2004) also prove detailed bounds on the strongly connected com-
ponent in the subcritical regime, as well as precise bounds on the number of vertices whose forward and
backward clusters are large and the asymptotic size of forward and backward clusters. A substantially sim-
pler proof was given by Cai and Perarnau (2021).
Both Cooper and Frieze (2004) as well as Cai and Perarnau (2021) made additional assumptions on the
degree distribution. In particular, they assumed that $\mathbb{E}[D_n^{(in)} D_n^{(out)}] \to \mathbb{E}[D^{(in)} D^{(out)}] < \infty$, which we
do not. Further, Cooper and Frieze (2004) assumed that $\boldsymbol{d}$ is proper, which is a technical requirement on
the degree sequence stating that (a) $\mathbb{E}[(D_n^{(in)})^2] = O(1)$ and $\mathbb{E}[(D_n^{(out)})^2] = O(1)$; (b) $\mathbb{E}[D_n^{(in)}(D_n^{(out)})^2] = o(n^{1/12}\log n)$. In view of the fact that such conditions do not appear in Theorem 4.9, these conditions can
be expected to be suboptimal for Theorem 9.5 to hold. We next explain how they can be avoided by a
suitable degree-truncation argument:
Assume that the out- and in-degrees in the directed configuration model $\mathrm{DCM}_n(\boldsymbol{d})$ satisfy (9.2.25) and
(9.2.26). By Exercise 9.19 below, $|\mathcal{C}_{\max}| \le n(\zeta + \varepsilon)$ whp for $n$ large and any $\varepsilon > 0$; Exercise 9.19 is
proved by an adaptation of the proof of Corollary 2.27 in the undirected setting. Thus, we need only show
that $|\mathcal{C}_{\max}| \ge n(\zeta - \varepsilon)$ whp for $n$ large and any $\varepsilon > 0$.
Fix b > 1 large. We now construct a lower bounding directed configuration model where all the degrees
are bounded above by b. This is similar to the degree-truncation argument for the undirected configuration
model discussed in Section 1.3.3 (recall Theorem 1.11). When $v$ is such that $d_v = d_v^{(out)} + d_v^{(in)} \ge b$, we
split $v$ into $n_v = \lceil d_v/b \rceil$ vertices, and we deterministically redistribute all out- and in-half-edges over the
$n_v$ vertices in an arbitrary way such that all the vertices that used to correspond
to $v$ now have both out- and in-degree bounded by $b$. Denote the corresponding random graph by $\mathrm{DCM}_n'$.
The resulting degree sequence again satisfies (9.2.25) and (9.2.26). Moreover, for b > 1 large and by
(9.2.25) and (9.2.26), the limits in (9.2.25) and (9.2.26) for the new degree sequence are quite close to the
original limits of the old degree sequence, while the degrees are now bounded. As a result, we can apply
the original result in Cooper and Frieze (2004) or Cai and Perarnau (2021) to the new setting.
Denote the size of the largest SCC in $\mathrm{DCM}_n'$ by $|\mathcal{C}_{\max}'|$. Obviously, since we split vertices, $|\mathcal{C}_{\max}'| \le |\mathcal{C}_{\max}| + \sum_{v \in [n]} (n_v - 1)$. Therefore, $|\mathcal{C}_{\max}| \ge |\mathcal{C}_{\max}'| - \sum_{v \in [n]} (n_v - 1)$. Take $b$ sufficiently large that $\sum_{v \in [n]} (n_v - 1) \le \varepsilon n/3$ and that $\zeta' \ge \zeta - \varepsilon/3$, where $\zeta'$ is the forward–backward survival probability of the limiting $\mathrm{DCM}_n'$ and $\zeta$ that of $\mathrm{DCM}_n(\boldsymbol{d})$. Finally, for every $\varepsilon > 0$, whp $|\mathcal{C}_{\max}'| \ge n(\zeta' - \varepsilon/3)$. As a result, we obtain that, again whp and as required,
\[
|\mathcal{C}_{\max}| \ge |\mathcal{C}_{\max}'| - n\varepsilon/3 \ge n(\zeta' - 2\varepsilon/3) \ge n(\zeta - \varepsilon). \qquad (9.6.3)
\]
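In algorithmic form, the vertex-splitting step can be sketched as follows; the block-wise redistribution of half-edges is one arbitrary valid choice among the many allowed by the argument above.

```python
def split_degrees(d_in, d_out, b):
    """Split every vertex v with total degree d_v >= b into ceil(d_v / b)
    vertices, deterministically redistributing its half-edges in blocks of
    size at most b. Returns the new lists of in- and out-degrees."""
    new_in, new_out = [], []
    for di, do in zip(d_in, d_out):
        dv = di + do
        parts = max(-(dv // -b), 1)          # ceil(dv / b), at least 1
        for k in range(parts):
            lo, hi = k * b, min((k + 1) * b, dv)
            n_in = max(0, min(hi, di) - lo)  # in-half-edges occupy slots [0, di)
            new_in.append(n_in)
            new_out.append((hi - lo) - n_in)
    return new_in, new_out

# example: a heavy vertex is split so that every total degree is at most b = 4
print(split_degrees([10, 1], [6, 2], b=4))
```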
Chen and Olvera-Cravioto (2013) studied a way to obtain nearly iid in- and out-degrees in the directed
configuration model. Here, the problem is that if $((d_v^{(in)}, d_v^{(out)}))_{v \in [n]}$ is an iid bivariate sequence with
equal means, then $\sum_{v \in [n]} (d_v^{(in)} - d_v^{(out)})$ has Gaussian fluctuations at best, so that it will likely not be zero.
Chen and Olvera-Cravioto (2013) indicated how the excess in- or out-half-edges can be removed so as to
keep the degrees close to iid. Further, they showed that the removal of self-loops and multiple directed edges
does not significantly change the degree distribution (so that, in particular, one would expect the local
limits to be the same, but Chen and Olvera-Cravioto (2013) considered only the degree distribution).
Theorem 9.6 was first proved by van der Hoorn and Olvera-Cravioto (2018) under stronger
assumptions, but then the claim proved is also much stronger. Indeed, van der Hoorn and Olvera-Cravioto
(2018) not only identified the first-order asymptotics, as in Theorem 9.6, but also fluctuations like
those stated for the undirected configuration model in Theorem 7.24. This proof is substantially harder
than that of (Cai and Perarnau, 2023, Proposition 7.7). Theorem 9.7 was proved by Colenbrander (2022).
Theorem 9.8 was proved by Cai and Perarnau (2023).
Directed preferential attachment models and their PageRank. The backward local limit of the preferential
attachment model with random out-degrees was identified by Banerjee et al. (2023), using a collapsing
procedure on a continuous-time branching process in the spirit of Garavaglia and van der Hofstad (2018).
Theorem 9.9 is (Banerjee and Olvera-Cravioto, 2022, Theorem 3.1), Theorem 9.10 is (Banerjee and Olvera-
Cravioto, 2022, Theorem 3.3).
Karoński et al. (1999); Stark (2004). Theorem 9.20 is (Deijfen and Kets, 2009, Theorem 1.1), where the
authors also proved that clustering can be tuned. The model was also investigated for more general distribu-
tions of groups per vertex by Godehardt and Jaworski (2003); Jaworski et al. (2006). Random intersection
graphs with prescribed degrees and groups are studied in a non-rigorous way in Newman (2003); Newman
and Park (2003). We refer to Bloznelis et al. (2015) for a survey of recent results.
Rybarczyk (2011) studied various properties of the random intersection graph when each vertex is in
precisely d groups that are all chosen uar from the collection of groups. In particular, Rybarczyk (2011)
proves results on the giant as in Theorem 9.22, as well as on the diameter of the graph, which is ΘP (log n)
when the model is sparse.
Bloznelis (2009, 2010a,b) studied a general random intersection model, where the sizes of groups are
iid random variables, and the sets of the vertices in them are chosen uar from the vertex set. His results
include distances (Bloznelis (2009)) and component sizes (Bloznelis (2010a,b)). Bloznelis (2013) studied
the degree and clustering structure in this setting.
Theorem 9.21 is proved in Kurauskas (2022); see also van der Hofstad et al. (2021). Both papers investi-
gate more general settings: Kurauskas (2022) also allows for settings with independent group memberships,
while van der Hofstad et al. (2021) also allows for more general group structures than the complete graph.
Theorem 9.22 is proved in van der Hofstad et al. (2022).
Random intersection graphs with communities. van der Hofstad et al. (2021) proposed a model that com-
bines the random intersection graph with more general communities than complete graphs. van der Hofstad
et al. (2021) identified the local limit, as well as the nature of the overlaps between different communities.
van der Hofstad et al. (2022) identified the giant component, also when percolation is being performed on
the model. See Vadon et al. (2019) for an informal description of the model, aimed at a broad audience.
Exponential random graphs. For a general introduction to exponential random graphs, we refer to Snijders
et al. (2006) and Wasserman and Pattison (1996). Frank and Strauss (1986) discussed the notion of Markov
graphs, for which the edges of the graph form a Markov field. The general exponential random graph is
only a Markov field when the subgraphs are restricted to edges, stars of any kind, and triangles. This is
exemplified by Example 9.24, where general degrees are used and give rise to a model with independent
edges. Kass and Wasserman (1996) discussed relations to Bayesian statistics.
For a discussion on the relation between statistical mechanics and exponential models, we refer to Jaynes
(1957). Let us now explain the relation between exponential random graphs and entropy maximization. Let
$(p_x)_{x \in \mathcal{X}}$ be a probability measure on a general discrete set $\mathcal{X}$. We define its entropy by
\[
H(p) = -\sum_{x \in \mathcal{X}} p_x \log p_x. \qquad (9.6.4)
\]
Entropy measures the amount of randomness in a system. Shannon (1948) proved that the entropy is
the unique quantity that is positive, increases with increasing uncertainty, and is additive for independent
sources of uncertainty, so it is a very natural quantity.
The relation to exponential random graphs is that they are the random graphs that, for fixed expected
values of the subgraph counts $N_F(G_n)$, optimize the entropy. Indeed, recall that $X = (X_{i,j})_{1 \le i < j \le n}$ are
the edge statuses of the graph, so that $G_n$ is uniquely characterized by $X$. Then maximize $H(p)$ over all
the $p$ such that $\sum_x N_F(x)\, p(x) = \alpha_F$ for some given $\alpha_F$ and all subgraphs $F \in \mathcal{F}$, where $\mathcal{F}$ is an appropriate
set of subgraphs. Then, using Lagrange multipliers, the optimization problem reduces to
\[
p_{\vec{\beta}}(x) = \frac{1}{Z}\, e^{\sum_{F \in \mathcal{F}} \beta_F N_F(x)}, \qquad (9.6.5)
\]
where $Z = Z_n(\vec{\beta})$ is the normalization constant given in (9.4.33), and $\vec{\beta} = (\beta_F)_{F \in \mathcal{F}}$ is chosen as the
solution to (9.4.34). This implies that, indeed, the exponential random graph model optimizes the entropy
under this subgraph constraint.
See also Kass and Wasserman (1996) for a discussion of maximum entropy and a reference to its long
history as well as a critique of the method.
An important question is how one can find the appropriate $\vec{\beta} = (\beta_F)_{F \in \mathcal{F}}$ such that (9.4.34) holds.
This is particularly difficult, since the computation of the normalization constant $Z_n(\vec{\beta})$ in (9.4.33) is quite
hard. Often, Markov chain Monte Carlo (MCMC) techniques are used to sample efficiently from $p_{\vec{\beta}}$. In this
case, such MCMC techniques perform a form of Glauber dynamics, for which (9.4.32) is the stationary
distribution. One can then try to solve (9.4.34) by keeping track of the value of $N_F$ in the simulation.
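As a concrete illustration, here is a minimal Glauber sampler for an edge–triangle exponential random graph on a small vertex set; the $\beta$ values are arbitrary, and no claim is made about how fast this chain mixes.

```python
import numpy as np

def glauber_ergm(n, beta_edge, beta_tri, steps, seed=0):
    """Glauber dynamics targeting p(x) proportional to
    exp(beta_edge * N_edges(x) + beta_tri * N_triangles(x)).

    Each step picks a pair {i, j} uar and resamples its edge status from the
    conditional law given the rest of the graph."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n), dtype=int)
    for _ in range(steps):
        i, j = rng.choice(n, size=2, replace=False)
        # triangles closed by {i, j} = number of common neighbors of i and j
        d_tri = int(A[i] @ A[j])
        h = beta_edge + beta_tri * d_tri      # log-odds of the edge being present
        p_on = 1.0 / (1.0 + np.exp(-h))
        A[i, j] = A[j, i] = int(rng.uniform() < p_on)
    return A

A = glauber_ergm(n=50, beta_edge=-2.0, beta_tri=0.1, steps=200_000)
print("edges:", A.sum() // 2, "triangles:", int(np.trace(A @ A @ A)) // 6)
```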
However, these methods can be very slow, as well as daunting, since the behavior of $N_F(X)$ under $p_{\vec{\beta}}$
may undergo a phase transition in $\vec{\beta}$, making $\sum_x N_F(x)\, p_{\vec{\beta}}(x)$ very sensitive to small changes of $\vec{\beta}$. See in
particular Chatterjee and Diaconis (2013) for a discussion on this topic.
Bhamidi et al. (2011) (see also Bhamidi et al. (2008)) studied the mixing time of the exponential random
graph when edges are changed dynamically in a Glauber way. The results were somewhat disappointing,
since either edges are close to being iid as in an Erdős–Rényi random graph or the mixing is very slow.
These results, however, apply only to dense settings where the number of edges grows quadratically with
the number of vertices. This problem is closely related to large deviations on dense Erdős–Rényi random
graphs. See also Chatterjee (2017) for background on such large deviations and Chatterjee and Varadhan
(2011) for the original paper.
Results on sparse exponential random graphs are limited. We refer to Chakraborty et al. (2021) for a
discussion of how sparse exponential random graphs with a linear number of triangles can be obtained.
For more background on sufficient statistics and their relation to symmetries, we refer to Diaconis (1992).
For a discussion of the relation between information theory and exponential models, we refer to Shore and
Johnson (1980).
and Abdullah et al. (2017) identified their ultra-small-world behavior in Theorem 9.29. Bläsius et al. (2018)
studied the size of the largest cliques in hyperbolic graphs.
Geometric inhomogeneous random graphs (GIRG)s. Product GIRGs were introduced by Bringmann et al.
(2019) and Bringmann et al. (2020). The relation between the hyperbolic random graph and the product
GIRG as described in Theorem 9.31 can be found in (Bringmann et al., 2017, Section 7), where limits
are derived up to constants. Theorem 9.31 was first proved in (Komjáthy and Lodewijks, 2020, Section
7) under conditions slightly different from Assumption 9.30. The current statement of Assumption 9.30
is Assumptions 1.5–1.7 in van der Hofstad et al. (2023). An adaptation of the proof of Theorem 9.31 can
be found in Section 2.1.3 in van der Hofstad et al. (2023). Komjáthy and Lodewijks (2020) also studied
its weighted distances focussing on the case where τ ∈ (2, 3). Theorem 9.34 was proved in Bringmann
et al. (2020) for product GIRGs with τ ∈ (2, 3), except for the law of large numbers of the giant. This
again follows by a “giant is almost local” proof combined with the bound on the second largest component,
proved in Bringmann et al. (2020), and the identification of the local limit in van der Hofstad et al. (2023).
The result for the model in (9.5.35) and (9.5.36) is (Jorritsma et al., 2023, Corollary 2.3). The GIRGs in
(9.5.35) and (9.5.36) are called interpolating kernel-based spatial random graphs by Jorritsma et al. (2023).
The main focus of Jorritsma et al. (2023) is the size of the second largest connected component |C(2) |, on
which the authors proved sharp polylogarithmic bounds with the correct exponent.
The local convergence in probability in Theorem 9.33 is proved in van der Hofstad et al. (2023) using
path-counting techniques. Local weak convergence for product GIRGs was proved under slightly different
assumptions by Komjáthy and Lodewijks (2020) (see (Komjáthy and Lodewijks, 2020, Assumption 2.5)).
In more detail, a coupling version of Theorem 9.33 is stated in (Komjáthy and Lodewijks, 2020, Claim
3.3), where a blown-up version of the product GIRG is bounded from below and above by the limiting
model with slightly smaller and larger intensities, respectively. Take a vertex uar in the product GIRG.
Then whp it is also present in the lower and upper bounding Poisson infinite GIRG. Similarly, whp none of
the edges within a ball of intrinsic radius r will be different in the three models, which proves local weak
convergence. Local convergence in probability would follow from a coupling of the neighborhoods of two
uniformly chosen vertices in the GIRG to two independent limiting copies. Such independence is argued
in (Komjáthy and Lodewijks, 2020, proof of Theorem 2.12), in particular in the text around (Komjáthy and
Lodewijks, 2020, (3.16)). It can be expected that a hyperbolic random graph in d dimensions can be mapped
to a product GIRG in d−1 dimensions. The one-dimensional nature of the model for d = 2 discussed below
Theorem 9.34 should thus not arise when d ≥ 3, and one can expect a giant to exist for τ > 3 also.
Local limits as arising in Theorem 9.33, in turn, were studied by Hirsch (2017). Fountoulakis (2015)
studied an early version of a geometric Chung–Lu model.
Spatial preferential attachment models. Our exposition follows Jordan (2010, 2013); Flaxman et al. (2006,
2007). Jordan (2010) treats the case of uniform locations of the vertices, a problem first suggested by
Flaxman et al. (2006, 2007). Jordan (2013) studies preferential attachment models where the vertices are
located in a general metric space with not-necessarily-uniform location of the vertices. This is more difficult,
as then the power-law degree exponent depends on the location of the vertices. Theorem 9.37 follows from
(Flaxman et al., 2006, Theorem 1(a)), which is quite a bit sharper, as it states detailed concentration results
as well. Further results involve the proof of connectivity of the resulting graph and an upper bound on
the diameter of order O(log (n/r)) when r ≥ n−1/2 log n, m ≥ K log n for some large enough K,
and α ≥ 0. Flaxman et al. (2007) generalized these results to the setting where, instead of a unit ball, a
smoother version is used, while the majority of points were still within a distance rn = o(1). Theorem
9.38 is (Jordan, 2010, Theorem 2.1). Theorem 9.39 is (Jordan and Wade, 2015, Theorem 2.4). (Jordan and
Wade, 2015, Theorem 2.2) shows that the degree distribution for $\alpha(r) = \exp\{(\log(1/r))^{\gamma}\}$ with $\gamma > 2/3$ is
the same as that of the so-called online nearest-neighbor graph, for which (Jordan and Wade, 2015, Theorem 2.1)
shows that the limiting degree distribution has exponential tails. Manna and Sen (2002) studied geometric
preferential attachment models from a simulation perspective.
Theorem 9.40 is (Jordan, 2013, Theorems 2.1 and 2.2). (Jordan, 2013, Theorem 2.3) contains partial
results for the setting where S is infinite. These results are slightly weaker, as they do not characterize the
degree power-law exponent exactly.
Aiello et al. (2008) gave an interpretation of spatial preferential attachment models in terms of influence
regions and proved Theorem 9.41 (see (Aiello et al., 2008, Theorem 1.1)). Further results involve the study
of maximal in-degrees and the total number of edges. See also Janssen et al. (2016) for a version with
non-uniform locations.
Jacob and Mörters (2015) studied the degree distribution and local clustering in a related geometric
preferential attachment model. Jacob and Mörters (2017) studied the robustness of the giant component in
that model, and also presented heuristics that distances are ultra-small in the case where the degrees have
infinite variance.
For a relation between preferential attachment graphs with so-called fertility and aging, and a geometric
competition-induced growth model for networks, we refer to Berger et al. (2004, 2005) and the references
therein. Zuev et al. (2015) studied how geometric preferential attachment models give rise to soft commu-
nities.
Complex network models on the hypercubic lattice. Below we give references to the literature.
Scale-free percolation was introduced by Deijfen et al. (2013). We have adapted the parameter choices,
so that the model is closer to the geometric inhomogeneous random graph. In particular, in Deijfen et al.
(2013), (9.5.58) is replaced with
\[
p_{xy} = 1 - e^{-\lambda W_x W_y / |x-y|^{\alpha}}, \qquad (9.6.6)
\]
and then the power-law exponent for the degrees is such that $\mathbb{P}(D_o > k) \approx k^{-\gamma}$, where $\gamma = \alpha(\tau-1)/d$
and $\tau$ is the weight power-law exponent as in (9.5.15). The current set-up has the advantage that the degree
power-law exponent agrees with that of the weight distribution.
The fact that λc < ∞ holds in most cases is (Deijfen et al., 2013, Theorem 3.1). Theorem 9.43(a) is
(Deijfen et al., 2013, Theorem 4.2), Theorem 9.43(b) is (Deijfen et al., 2013, Theorem 4.4). Deprez et al.
(2015) showed that the percolation function is continuous when α ∈ (d, 2d), i.e., θ(λc ) = 0. However, in
full generality, continuity of the percolation function at λ = λc when λc > 0 is unknown.
Theorem 9.44(a) was proved in Deijfen et al. (2013); van der Hofstad and Komjáthy (2017), see in
particular Corollary 1.4 in van der Hofstad and Komjáthy (2017). Theorem 9.44(b) was proved in Heyden-
reich et al. (2017); Hao and Heydenreich (2023), following up on similar results for long-range percolation
proved by Biskup (2004); Biskup and Lin (2019). In long-range percolation, edges are present independently,
and the probability that the edge $\{x,y\}$ is present equals $|x-y|^{-\alpha d + o(1)}$ for some $\alpha > 0$. In
this case, detailed results exist for the limiting behavior of $\mathrm{dist}_G(o,x)$ depending on the value of $\alpha$. For
example, in Benjamini et al. (2004), it is shown that the diameter of this infinite percolation model equals
$\lceil 1/(1-\alpha) \rceil$ almost surely when $\alpha \in (0,1)$. Theorem 9.44(c) is (Deprez et al., 2015, Theorem 8(b)).
Deprez and Wüthrich (2019) investigated graph distances in the continuum scale-free percolation model;
a related result was proved in the long-range percolation setting by Sönmez (2021), who also addressed
bounds on graph distances for α ∈ {1, 2}.
There is some follow-up work on scale-free percolation. Hirsch (2017) proposed a continuum model for
scale-free percolation. Deprez et al. (2015) argued that scale-free percolation can be used to model real-life
networks. Heydenreich et al. (2017) established recurrence and transience criteria for random walks on the
infinite connected component. For long-range percolation this was proved by Berger (2002).
Spatial configuration models on the lattice were introduced by Deijfen and Jonasson (2006); see also
Deijfen and Meester (2006). In our exposition, we follow Jonasson (2009), who studied more general un-
derlying graphs, such as trees or other infinite transitive graphs. Theorem 9.45 is (Jonasson, 2009, Theorem
3.1). (Jonasson, 2009, Theorem 3.2) extended Theorem 9.45 to settings where the degrees are not iid, but
rather translation invariant. In this case, it is still necessary for (9.5.65) that $\mathbb{E}[D^{(d+1)/d}] < \infty$, but this
may not be enough. Sharper conditions are restricted to the setting where $d = 1$ (where no condition
of the form $\mathbb{E}[D^k] < \infty$ suffices) and $d = 2$ (for which it suffices that $\mathbb{E}[D^{(d+1)/(d-1)+\alpha}] < \infty$ for some
$\alpha > 0$, but if $\mathbb{E}[D^{(d+1)/(d-1)-\alpha}] = \infty$ for some $\alpha > 0$, then there exist translation-invariant matchings
for which (9.5.65) fails).
We next discuss the properties of the model in $d = 1$ where the directions of the half-edges are chosen
independently. (Deijfen and Meester, 2006, Proposition 2.1) shows that, when the direction is chosen with
probability $p \ne \frac12$, the maximal edge length $N$ is infinite with positive probability. (Deijfen and Meester,
2006, Theorem 2.1) shows that $N < \infty$ almost surely when $p = \frac12$, while (Deijfen and Meester, 2006,
Theorem 4.1) implies that $\mathbb{E}[N] = \infty$ when $p = \frac12$.
Deijfen (2009) studied a related model where the vertices are a Poisson point process on Rd . This model
was further studied by Deijfen et al. (2012). In the latter paper, it is shown, surprisingly, that for
any sequence of iid degrees of the points of the Poisson process, there are translation-invariant matchings
that percolate, as well as matchings that do not. Further, such a matching can be a factor, where a translation-
invariant matching is called a factor if it is a deterministic function of the Poisson process and of the degrees
of the vertices in the Poisson process, that is, if it does not involve any additional randomness. See (Deijfen
et al., 2012, Theorem 1.1) for more details.
A threshold scale-free percolation model. We finally discuss the results by Yukich (2006) on another infinite
geometric random graph model. We start by taking an iid sequence $(W_x)_{x \in \mathbb{Z}^d}$ of random variables on
$[1, \infty)$ satisfying (9.5.15) with a constant slowly varying function. Fix $\delta \in (0,1]$. The edge $\{x, y\}$, for
$x, y \in \mathbb{Z}^d$, appears in the random graph precisely when
exists, so that the model has a power-law degree sequence with power-law exponent $\tau$ (recall (1.4.3)). The
intuitive explanation of (9.6.8) is as follows. Suppose we condition on the value of $W_o = w$. Then, the
conditional distribution of $D_o$ given that $W_o = w$ is equal to
\[
D_o = \sum_{x \in \mathbb{Z}^d} \mathbb{1}_{\{|x| \le \min\{W_o^{\tau/d},\, W_x^{\tau/d}\}\}} = \sum_{x \colon |x| \le w^{\tau/d}} \mathbb{1}_{\{|x| \le W_x^{\tau/d}\}}. \qquad (9.6.9)
\]
Note that the random variables $(\mathbb{1}_{\{|x| \le W_x^{\tau/d}\}})_{x \in \mathbb{Z}^d}$ are independent Bernoulli random variables with parameter
\[
\mathbb{P}(\mathbb{1}_{\{|x| \le W_x^{\tau/d}\}} = 1) = \mathbb{P}(W \ge |x|^{d/\tau}) = |x|^{-d(\tau-1)/\tau}. \qquad (9.6.10)
\]
In order for $D_o \ge k$ to occur for $k$ large, we must have that $W_o = w$ is quite large, and, in this case, a
central limit theorem should hold for $D_o$, with mean equal to
\[
\mathbb{E}[D_o \mid W_o = w] = \sum_{x \colon |x| \le w^{\tau/d}} |x|^{-d(\tau-1)/\tau} = c\,w(1 + o(1)), \qquad (9.6.11)
\]
for some explicit constant $c = c(\tau, d)$. Furthermore, the conditional variance of $D_o$ given that $W_o = w$ is
bounded above by its conditional expectation, so that the conditional distribution of $D_o$ given that $W_o = w$
is highly concentrated. We omit further details, and merely note that this heuristic can be made precise by
using standard concentration results. Assuming sufficient concentration, we obtain that the probability that
$D_o \ge k$ is asymptotically equal to the probability that $W > w_k$, where $w_k$ is determined by the equation
$c\,w_k = k$ (cf. (9.6.11)), so that
\[
\mathbb{P}(D_o > k) = \mathbb{P}(W > k/c)(1 + o(1)) = (k/c)^{-(\tau-1)}(1 + o(1)). \qquad (9.6.13)
\]
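The heuristic (9.6.11) can also be checked numerically: the following sketch evaluates $\sum_{x\colon |x| \le w^{\tau/d}} |x|^{-d(\tau-1)/\tau}$ in $d = 2$ and confirms that it grows linearly in $w$; the value of $\tau$ is an arbitrary illustration.

```python
import numpy as np

def conditional_mean_degree(w, tau, d=2):
    """Evaluate the sum over 0 < |x| <= w^{tau/d} of |x|^{-d(tau-1)/tau} on Z^2,
    i.e., the conditional mean degree in (9.6.11)."""
    radius = w ** (tau / d)
    r = int(np.ceil(radius))
    i, j = np.mgrid[-r:r + 1, -r:r + 1]
    dist = np.sqrt(i ** 2 + j ** 2)
    mask = (dist > 0) & (dist <= radius)
    return (dist[mask] ** (-d * (tau - 1) / tau)).sum()

tau = 2.5
for w in [10, 20, 40, 80]:
    # (9.6.11) predicts growth c * w, so this ratio should stabilize
    print(w, conditional_mean_degree(w, tau) / w)
```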
The result in Theorem 9.46 shows that distances in the model given by (9.6.7) are much smaller than
those in normal percolation models. Recall Meta Theorem B at the start of Part III. While Theorem 9.46
resembles the results in Meta Theorem B, the differences reside in the fact that distances are ultra-small
independently of the exact value of the degree power-law exponent.
Again, the result in Theorem 9.46 can be compared with similar results for long-range percolation (recall
the discussion of scale-free percolation).
Exercise 9.3 (Local convergence for randomly directed graphs) Let (Gn )n≥1 be a random graph se-
quence that converges locally in probability. Give each edge e a random orientation, by orienting e =
{u, v} as e = (u, v) with probability 12 and as e = (v, u) with probability 12 , independently across edges.
Show that the resulting digraph converges locally in probability in the marked forward, backward, and
forward–backward senses.
Exercise 9.4 (Local convergence for randomly directed graphs (cont.)) In the setting of Exercise 9.3,
assume instead that $(G_n)$ converges locally weakly. Conclude that the resulting digraph converges
locally weakly in the marked forward, backward, and forward–backward senses.
Exercise 9.5 (Local convergence for directed version of $\mathrm{PA}_n^{(m,\delta)}(d)$) Consider the edges in $\mathrm{PA}_n^{(m,\delta)}(d)$
to be oriented from young to old, so that the resulting digraph has out-degree $m$ and random in-degrees.
Use Theorems 5.8 and 5.21 to show that this digraph converges locally in probability in the marked forward,
backward, and forward–backward senses.
Exercise 9.6 (Power-law lower bound for PageRank on directed version of $\mathrm{PA}_n^{(m,\delta)}(d)$) Recall the directed version of $\mathrm{PA}_n^{(m,\delta)}(d)$ in Exercise 9.5. Use (9.2.10) to show that there exists a constant $c = c(\alpha, \delta, m) > 0$ such that
\[
\mathbb{P}(R_\emptyset > r) \ge c\, r^{-\tau}, \qquad \text{where } \tau = 3 + \frac{\delta}{m} \qquad (9.7.2)
\]
is the power-law exponent of $\mathrm{PA}_n^{(m,\delta)}(d)$. What does this say about the PageRank power-law hypothesis for the directed version of $\mathrm{PA}_n^{(m,\delta)}(d)$?
Exercise 9.7 (Power-law lower bound for PageRank on digraphs with bounded out-degrees) Let $(G_n)_{n\ge1}$
be a random digraph sequence that converges locally in probability in the marked backward sense to
$(D, \emptyset) \sim \mu$. Assume that there exist $0 < a, b < \infty$ such that $d_v^{(out)} \in [a, b]$ for all $v \in V(G_n)$. Assume
that $\mu(D_\emptyset^{(in)} > r) \ge c\, r^{-\gamma}$ for some $\gamma > 0$. Use (9.2.10) to show that there exists a constant $c' > 0$
such that $\mu(R_\emptyset > r) \ge c'\, r^{-\gamma}$.
Exercise 9.8 (Mean number of edges in $\mathrm{DGRG}_n(\boldsymbol{w})$) Consider the directed generalized random graph,
as formulated in (9.2.14) and (9.2.15). Assume that the weight-regularity condition in (9.2.16) holds. Let
$X_{ij}$ be the indicator that there is a directed edge from $i$ to $j$ (with $X_{ii} = 0$ for all $i \in [n]$ by convention).
Show that
\[
\frac{1}{n}\, \mathbb{E}\Big[\sum_{i,j \in [n]} X_{ij}\Big] \to \frac{\mathbb{E}[W^{(in)}]\,\mathbb{E}[W^{(out)}]}{\mathbb{E}[W^{(in)} + W^{(out)}]}. \qquad (9.7.3)
\]
Conclude that the limit equals $\frac12 \mathbb{E}[W^{(in)}] = \frac12 \mathbb{E}[W^{(out)}]$ when the symmetry condition in (9.2.18) holds.
Exercise 9.9 (Number of edges in $\mathrm{DGRG}_n(\boldsymbol{w})$) In the setting of (9.7.3) in Exercise 9.8, show that
\[
\frac{1}{n} \sum_{i,j \in [n]} X_{ij} \xrightarrow{\;\mathbb{P}\;} \frac{\mathbb{E}[W^{(in)}]\,\mathbb{E}[W^{(out)}]}{\mathbb{E}[W^{(in)} + W^{(out)}]}, \qquad (9.7.4)
\]
which equals $\frac12 \mathbb{E}[W^{(in)}] = \frac12 \mathbb{E}[W^{(out)}]$ when the symmetry condition in (9.2.18) holds.
Exercise 9.10 (Local limit of directed Erdős–Rényi random graph) Use Theorem 9.2 to describe the local
limit of the directed Erdős–Rényi random graph.
Exercise 9.11 (Local convergence for finite-type directed inhomogeneous random graphs) Adapt the proof
of Theorem 3.14 to prove Theorem 9.2 in the case of finite-type kernels. Here, we recall that a kernel κ is
called finite type when (s, r) 7→ κ(s, r) takes on finitely many values.
Exercise 9.12 (Local convergence for DGRGn (w)) Consider the directed generalized random graph
as formulated in (9.2.14) and (9.2.15). Assume that the weight-regularity condition in (9.2.16) holds. Use
Theorem 9.2 to determine the local limit in probability of DGRGn (w). Is the local limit of the forward and
backward neighborhoods a single- or a multi-type branching process?
Exercise 9.13 (Phase transition for directed Erdős–Rényi random graph) For the directed Erdős–Rényi
random graph, show that ζ in Theorem 9.3 satisfies ζ > 0 precisely when λ > 1.
Exercise 9.14 (Phase transition for directed generalized random graph) Consider the directed general-
ized random graph, as formulated in (9.2.14) and (9.2.15). Assume that the weight-regularity condition in
(9.2.16) holds. What is the condition on the asymptotic weight distribution (W (out) , W (in) ) in (9.2.16) that
is equivalent to ζ > 0 in Theorem 9.3?
Exercise 9.15 (Correlations of out- and in-degrees of a randomly directed graph) In an undirected graph
$G$, randomly direct each edge by orienting $e = \{u,v\}$ as $(u,v)$ with probability $\frac12$ and as $(v,u)$ with
probability $\frac12$, as in Exercise 9.3. Let $v \in V(G)$ be a vertex in $G$ of degree $d_v$. What is the correlation
coefficient between its out- and in-degrees in the randomly directed version of $G$? Note: The correlation
coefficient $\rho(X, Y)$ between two random variables $X$ and $Y$ equals $\mathrm{Cov}(X,Y)/\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}$.
Exercise 9.16 (Equivalence of convergence of in- and out-degrees in $\mathrm{DCM}_n(\boldsymbol{d})$) Show that (9.2.24)
implies that $\mathbb{E}[D^{(out)}] = \mathbb{E}[D^{(in)}]$ when $(D_n^{(in)}, D_n^{(out)}) \xrightarrow{\;d\;} (D^{(in)}, D^{(out)})$, $\mathbb{E}[D_n^{(in)}] \to \mathbb{E}[D^{(in)}]$, and
$\mathbb{E}[D_n^{(out)}] \to \mathbb{E}[D^{(out)}]$.
Exercise 9.17 (Self-loops and multiple edges in $\mathrm{DCM}_n(\boldsymbol{d})$) Argue that the proof of [V1, Proposition
7.13] can be adapted to show that the number of self-loops in $\mathrm{DCM}_n(\boldsymbol{d})$ converges to a Poisson random
variable with parameter $\mathbb{E}[D^{(in)} D^{(out)}]$ when $(D_n^{(in)}, D_n^{(out)}) \xrightarrow{\;d\;} (D^{(in)}, D^{(out)})$ and
\[
\mathbb{E}[D_n^{(in)} D_n^{(out)}] \to \mathbb{E}[D^{(in)} D^{(out)}]. \qquad (9.7.5)
\]
What can you say about the number of multiple edges in $\mathrm{DCM}_n(\boldsymbol{d})$? Note: No proof is expected; a reasonable argument suffices.
Exercise 9.18 (Local convergence for DCMn (d) in Theorem 9.4) Give a proof of the local limit result in
Theorem 9.4 by suitably adapting the proof of Theorem 4.1.
Exercise 9.19 (One-sided law of large numbers for SSC) Adapt the proof of Corollary 2.27 to show that
when Gn = ([n], E(Gn )) converges locally in probability in the forward–backward sense to (G, o) having
distribution µ, then the size of the largest strongly connected component |Cmax | satisfies that, for every
ε > 0 fixed,
P(|Cmax | ≤ n(ζ + ε)) → 1, (9.7.6)
where ζ = µ(|C (o)| = ∞) is the forward–backward survival probability of the limiting graph (G, o) (i.e.,
the probability that both the forward and the backward component of o have infinite size).
Exercise 9.20 (Subcritical directed configuration model) Let $\mathrm{DCM}_n(\boldsymbol{d})$ be a directed configuration
model that satisfies the degree-regularity conditions in (9.2.25) and (9.2.26). Let $\mathcal{C}_{\max}$ denote its largest
strongly connected component. Use Exercise 9.19 to show that $|\mathcal{C}_{\max}|/n \xrightarrow{\;\mathbb{P}\;} 0$ when $\zeta = 0$, where
ζ = µ(|C (o)| = ∞) is the forward–backward survival probability of the limiting graph (G, o). This
proves the subcritical result in Theorem 9.5(b).
Exercise 9.21 (Logarithmic growth of typical distances in the directed configuration model) Let DCMn (d)
be a directed configuration model that satisfies the degree-regularity conditions in (9.2.25) and (9.2.26). Ar-
gue heuristically why the logarithmic typical distance result in Theorem 9.6 remains valid when (9.2.37)
is replaced by the weaker condition that (Dn(in) Dn(out) )n≥1 is uniformly integrable. Also, give an example
where this uniform integrability is true, but (9.2.37) is not.
Exercise 9.22 (Logarithmic growth of typical distances in the directed configuration model (cont.)) Let
DCMn (d) be a directed configuration model that satisfies the degree-regularity conditions in (9.2.25) and
(9.2.26). Give a formal result of the claim in Exercise 9.21 by a suitable degree-truncation argument, as
explained above (9.6.3).
Exercise 9.23 (Ultra-small distances in the directed configuration model) Let DCMn (d) be a directed
configuration model that satisfies the degree-regularity conditions in (9.2.25) and (9.2.26). Use the degree-
truncation argument, as explained above (9.6.3), to show that distDCMn (d) (o1 , o2 ) = oP (log n) when
ν = ∞.
Exercise 9.24 (Strongly connected component in temporal networks) Let G be a temporal network, in
which vertices have a time label of their birth and edges are oriented from younger to older vertices. What
do the strongly connected components of G look like?
Exercise 9.25 (Degree structure in stochastic block models) Recall the definition of the stochastic block
model in Section 9.3.1, and assume that the type regularity condition in (9.3.1) holds. What is the asymptotic
expected degree of this model? When do all vertices have the same asymptotic expected degree?
Exercise 9.26 (Giant in stochastic block models) Recall the definition of the stochastic block model in
Section 9.3.1, and assume that the type regularity condition in (9.3.1) holds. When is there a giant compo-
nent?
Exercise 9.27 (Random guessing in stochastic block models) Consider a stochastic block model with t
types as introduced in Section 9.3.1 and assume that each of the types occurs equally often. Let σ̂(v) be a
random guess, so that (σ̂(v))v∈[n] is an iid vector, with σ̂(v) = s with probability 1/t for every s ∈ [t].
Show that (9.3.2) does not hold, i.e., show that the probability that
\[
\max_{p \colon [t] \to [t]} \frac{1}{n} \sum_{v \in [n]} \Big[\mathbb{1}_{\{\hat\sigma(v) = (p \circ \sigma)(v)\}} - \frac{1}{t}\Big] \ge \varepsilon
\]
vanishes.
Exercise 9.28 (Degree structure in stochastic block models with unequal expected degrees) Let $n$ be even.
Consider the stochastic block model with two types and $n/2$ vertices of each type. Let $p_{ij} = a_1/n$ when
$i, j$ have type 1, $p_{ij} = a_2/n$ when $i, j$ have type 2, and $p_{ij} = b/n$ when $i, j$ have different types. For
$i \in \{1,2\}$ and $k \in \mathbb{N}_0$, let $N_{i,k}(n)$ denote the number of vertices of type $i$ and degree $k$. Show that
\[
\frac{N_{i,k}(n)}{n/2} \xrightarrow{\;\mathbb{P}\;} \mathbb{P}(\mathrm{Poi}(\lambda_i) = k), \qquad (9.7.7)
\]
where $\lambda_i = (a_i + b)/2$.
Exercise 9.29 (Community detection in stochastic block models with unequal expected degrees) In the
setting of Exercise 9.28, assume that a1 > a2 . Consider the following greedy community detection algo-
rithm: let σ̂(v) = 1 for the n/2 vertices v ∈ [n] of highest degree, and σ̂(v) = 2 for the remaining vertices
(breaking ties randomly when necessary). Argue that this algorithm achieves the solvability condition in
(9.3.2).
Exercise 9.30 (Parameter conditions for solvable stochastic block models) Consider the stochastic block
model in the setting of Theorem 9.12, and assume that (a − b)2 > 2(a + b), so that the community detection
problem is solvable. Show that a − b > 2 and thus also a + b > 2. Conclude that this model has a giant.
Exercise 9.31 (Parameter conditions for solvable stochastic block models (cont.)) In the setting of Exer-
cise 9.30, show that also the vertices of type 1 only (resulting in an Erdős–Rényi random graph of size n/2
and edge probability a/n) have a giant. What are the conditions for the vertices of type 2 to have a giant?
Exercise 9.32 (Degree structure in degree-corrected stochastic block models) Recall the definition of the
degree-corrected stochastic block model in (9.3.11) in Section 9.3.2, and assume that the type regularity
condition in (9.3.1) holds. Assume further that $\mathbb{E}[X_u^p] < \infty$ for some $p > 1$. What is the asymptotic
expected degree of a vertex $v$ of weight $x_v$ in this model? What are the restrictions on $(\kappa(s,r))_{s,r \in \mathcal{S}}$ such
that the expected degree of vertex $v$ with weight $x_v$ is equal to $x_v(1 + o_{\mathbb{P}}(1))$?
Exercise 9.33 (Equal average degrees in degree-corrected stochastic block models) Recall the definition
of the degree-corrected stochastic block model in (9.3.11) in Section 9.3.2, and assume that (9.3.1) holds.
Let the number of types $t \ge 2$ be arbitrary and assume that $\kappa(s,r) = b$ for all $s \ne r$, while $\kappa(s,s) = a$.
Assume that µ(s) = 1/t for every s ∈ [t]. Compute the asymptotic average degree of a vertex of type s,
and show that it is independent of s.
Exercise 9.34 (Giant in the degree-corrected stochastic block models) Recall the definition of the degree-
corrected stochastic block model in Section 9.3.2, and assume that the type regularity condition in (9.3.1)
holds. When is there a giant component?
Exercise 9.35 (Degrees in configuration models with global communities) Recall the definition of the
configuration models with global communities in Section 9.3.3, and assume that the degree regularity con-
ditions in (9.3.18), (9.3.19), and (9.3.20) hold. What is the asymptotic average degree of this model? When
do all vertices have the same asymptotic expected degree?
Exercise 9.36 (Local limit in configuration models with global communities) Recall the definition of the
configuration models with global communities in Section 9.3.3, and assume that (9.3.18), (9.3.19), and
(9.3.20) hold. What is the local limit of this model? Note: No proof is expected; a reasonable argument
suffices.
Exercise 9.37 (Giant in configuration models with global communities) Recall the definition of the con-
figuration model with global communities in Section 9.3.3, and assume that (9.3.18), (9.3.19), and (9.3.20)
hold. When is there a giant component? Note: No proof is expected; a reasonable argument suffices.
Exercise 9.38 (Degree distribution in preferential attachment model with global communities) Show that
$(p_k(\theta))_{k \ge m}$ in (9.3.29) is a probability distribution for all $\theta$, i.e., show that $\sum_{k \ge m} p_k(\theta) = 1$ and $p_k(\theta) \ge 0$ for all $k \ge m$.
Exercise 9.39 (Degree distribution in preferential attachment model with global communities) In the
preferential attachment model with global communities studied in Theorem 9.14, show that also the global
degree distribution given by $P_k(n) = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{D_v(n) = k\}}$ converges almost surely.

Exercise 9.40 (Power-law degrees in preferential attachment model with global communities) In the preferential attachment models with global communities studied in Theorem 9.14, show that the global degree
distribution has a power-law tail with exponent $\tau = 1 + 1/\max_{s \in [r]} \theta^\star(s)$, provided that $\mu(s^\star) > 0$ for
at least one $s^\star \in [r]$ satisfying $\theta^\star(s^\star) = \max_{s \in [r]} \theta^\star(s)$.
Exercise 9.41 (Clustering in model with edges and triangles) Show that the global clustering coefficient in
the model where each pair of vertices is independently connected with probability $\lambda/n$, as for $\mathrm{ER}_n(\lambda/n)$,
and each triple forms a triangle with probability $\mu/n^2$, independently for all triplets and independently of
the status of the edges, converges to $\mu/((\lambda+\mu)^2 + \mu)$.
Exercise 9.42 (Local limit in inhomogeneous random graph with communities) Recall the definition of
the inhomogeneous random graph with communities in Section 9.4.1. What is the local limit of this model?
Note: No proof is expected; a reasonable argument suffices.
Exercise 9.43 (Size-biased community size distribution in HCM) In the hierarchical configuration model
introduced in Section 9.4.2, choose a vertex o uar from [n]. Let Go be the community containing o. Show
that (9.4.16) implies that |V (Go )| converges in distribution, and identify its limiting distribution.
Exercise 9.44 (Local limit in hierarchical configuration model) Recall the definition of the hierarchical
configuration model in Theorem 9.19. What is the local limit of this model? Note: No proof is expected; a
reasonable argument suffices.
Exercise 9.45 (Law of large numbers for |Cmax | in hierarchical configuration model) Use Theorem 4.9 to
prove the law of large numbers for the giant in the hierarchical configuration model in Theorem 9.19, and
prove that ζ is given by (9.4.19).
Exercise 9.46 (Local clustering for configuration model with clustering) Recall the configuration model
with clustering defined in Section 9.4.2. Let $(D_n^{(si)}, D_n^{(tr)})$ denote the number of simple edges and triangles
incident to a uniform vertex in $[n]$, and assume that $(D_n^{(si)}, D_n^{(tr)}) \xrightarrow{\;d\;} (D^{(si)}, D^{(tr)})$ for some limiting distribution $(D^{(si)}, D^{(tr)})$. Compute the local clustering coefficient of this model under the extra assumptions
that $\mathbb{E}[D_n^{(si)}] \to \mathbb{E}[D^{(si)}] < \infty$ and $\mathbb{E}[D_n^{(tr)}] \to \mathbb{E}[D^{(tr)}] < \infty$.

Exercise 9.47 (Global clustering for configuration model with clustering) In the setting of Exercise 9.46,
compute the global clustering coefficient of this model under the extra assumption that also $\mathbb{E}[(D_n^{(si)})^2] \to \mathbb{E}[(D^{(si)})^2] < \infty$ and $\mathbb{E}[(D_n^{(tr)})^2] \to \mathbb{E}[(D^{(tr)})^2] < \infty$.
Exercise 9.48 (Single overlap in random intersection graph) Consider the random intersection graph with
prescribed communities as defined in Section 9.4.3, under the conditions of Theorem 9.21. Show that it is
unlikely for a uniform vertex to have a neighbor with which it shares two groups.
Exercise 9.49 (Local clustering in the random intersection graph) Consider the random intersection graph
with prescribed communities as defined in Section 9.4.3, under the conditions of Theorem 9.21. Show that
the local clustering coefficient converges. When is this limit strictly positive?
Exercise 9.50 (Global clustering in the random intersection graph) Consider the random intersection
graph with prescribed communities as defined in Section 9.4.3, under the conditions of Theorem 9.21. What
are the conditions on the group membership and size distributions that imply that the convergence of the
global clustering coefficient in Theorem 2.22 follows? When is the limit of the global clustering coefficient
strictly positive?
Exercise 9.51 (Degree distribution in the discrete small-world model) Recall the discrete small-world
model in Section 9.5.1 as studied in Theorem 9.26, but now with λ, ρ > 0 and k fixed. What is the limit of
the probability that a uniform vertex has degree l for l ≥ 0?
Exercise 9.52 (Degree distribution in the geometric preferential attachment model with non-uniform locations) Recall that (9.5.50) in Theorem 9.40 identifies the degree distribution of the geometric preferential attachment model at each of the elements z_i ∈ S. Conclude what the degree distribution of the entire graph is. Does it obey a degree power law and, if so, what is the degree power-law exponent?
Exercise 9.53 (Power-law degrees for the spatial preferential attachment model with influence) Prove that, for p_k in (9.5.55) and for k large,
p_k = c k^{−(1+1/(p a_1))} (1 + o(1)),    (9.7.8)
so that the spatial preferential attachment model with influence indeed has a power-law degree distribution.
Exercise 9.54 (Degree distribution in the GIRG) Investigate the degree distribution in the GIRG in Theorem 9.32 using a second-moment method.
Exercise 9.55 (Degree moments in scale-free percolation (Deijfen et al. (2013))) Recall that D_o denotes the degree of the origin in the scale-free percolation model defined in (9.5.58). Show that E[D_o^p] < ∞ when p < τ − 1 and E[D_o^p] = ∞ when p > τ − 1. In particular, the variance of the degrees is finite precisely when τ > 3.
Exercise 9.56 (Positive correlation between edge statuses in scale-free percolation) Show that, for scale-free percolation, for all distinct x, y, z and for λ > 0,
P({x, y} and {x, z} occupied) ≥ P({x, y} occupied) P({x, z} occupied),    (9.7.9)
the inequality being strict when P(W_o = 0) < 1. In other words, the edge statuses are positively correlated.
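A hedged sketch of the mechanism behind (9.7.9): conditionally on the weights, the two edge statuses are independent. Writing p(u, v) for the conditional occupation probability of an edge whose endpoints carry weights u and v (non-decreasing in each argument), and f(w) = E[p(w, W_y)], g(w) = E[p(w, W_z)], both non-decreasing,
\[
P(\{x,y\} \text{ and } \{x,z\} \text{ occupied}) = E[f(W_x)\,g(W_x)] \ge E[f(W_x)]\,E[g(W_x)] = P(\{x,y\} \text{ occupied})\,P(\{x,z\} \text{ occupied}),
\]
by Chebyshev's association inequality for the single random variable W_x.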
Exercise 9.57 (Local convergence of PageRank) Assume that G_n converges locally in probability in the marked backward sense to (G, o) ∼ µ. Use (9.6.2) to show that, for every η, ε > 0, whp,
(1/n) Σ_{v∈V(G_n)} 1{R_v^(G_n) > r} ≤ µ(R_o^(G) > r − ε) + η,    (9.7.10)
and
(1/n) Σ_{v∈V(G_n)} 1{R_v^(G_n) > r} ≥ µ(R_o^(G) > r) − η.    (9.7.11)
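The empirical tails on the left-hand sides of (9.7.10)–(9.7.11) are straightforward to evaluate on a finite graph. A minimal sketch (our code, not the book's; it assumes that the normalization in (9.6.2) makes the PageRank scores average to one, i.e., that R_v is n times the standard stationary score):

```python
# Empirical PageRank tail (1/n) * #{v : R_v > r} on a finite digraph.
# Our sketch, not the book's code; we assume R_v is n times the score
# returned by networkx's pagerank, so that the R_v average to 1.
import networkx as nx

g = nx.gnp_random_graph(2000, 3 / 2000, seed=42, directed=True)
scores = nx.pagerank(g, alpha=0.85)        # standard scores, summing to 1
n = g.number_of_nodes()
R = {v: n * s for v, s in scores.items()}  # graph-normalized: mean 1

def empirical_tail(R, r):
    """(1/n) * #{v : R_v > r}, as on the left of (9.7.10)-(9.7.11)."""
    return sum(1 for x in R.values() if x > r) / len(R)

for r in (0.5, 1.0, 2.0, 4.0):
    print(r, empirical_tail(R, r))
```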
Exercise 9.58 (Local convergence of PageRank (cont.)) Use Exercise 9.57 to complete the proof of Theorem 9.1(ii).
Appendix A
Metric Space Structure of Rooted Graphs
Abstract
In this appendix we highlight some properties of and results about metric
spaces, including separable metric spaces and Borel measures on them, as used
throughout this book. We also present some missing details in the proof that
the space of rooted graphs is a separable metric space. Finally, we discuss what
compact sets look like in this topology and relate this to tightness criteria.
as in (2.2.2) and (2.2.3) for rooted graphs. Thus, the topology on rooted graphs can be seen as a special case
of a local topology. In a similar way, the metric on marked rooted graphs in (2.3.14) and (2.3.15) can be
viewed as a local topology. In this section we discuss local topologies in the general setting.
We next show that local topologies form a Polish space:
Theorem A.6 (Local topologies form a Polish space) Assume that {[x]r : x ∈ X } is Polish for every
r ≥ 1. Then the space (X , dloc ) is a Polish space, that is, (X , dloc ) is a metric, separable, and complete
space. Furthermore, a subset A ⊂ X is pre-compact (meaning that its closure is compact) if and only if the
sets {[x]r : x ∈ A} are pre-compact for every r ≥ 0.
Proof Let us first show that dloc is a distance. The symmetry and triangle inequality will then follow
directly. The fact that dloc (x, y) = 0 precisely when x = y is also easy, since if [x]r = [y]r for all r > 0
then x = y by (A.2.1).
We next show the separability of (X, dloc). For any x ∈ X, we have dloc(x, [x]r) ≤ 1/(r + 1), and we have assumed that the set {[x]r : x ∈ X} of all restrictions of elements in X to radius r is separable. Thus, the union over r of countable dense subsets of the sets {[x]r : x ∈ X} is a countable set that is dense in (X, dloc).
For the completeness of (X , dloc ), we let (xn )n≥1 be a Cauchy sequence for dloc . Then, for every r,
the restriction [xn ]r is again Cauchy and, by the completeness of {[x]r : x ∈ X }, [xn ]r thus converges for
dX to a certain element yr ∈ {[x]r : x ∈ X}. By the continuity of x ↦ [x]r we deduce that [ys]r = yr
for any s ≥ r, and so by the coherence property (A.2.1), we can define a unique element y ∈ X such that
yr = [y]r . It is then clear that xn → y for dloc .
We complete the proof by characterizing the compact sets. The condition in the theorem is clearly necessary for A to be pre-compact, for otherwise there exist r0 ≥ 0, some ε > 0, and a sequence (xn)n≥1 in A whose restrictions of radius r0 are all at distance at least ε from each other. Such a sequence cannot admit a convergent subsequence. Conversely, a subset A satisfying the condition of the theorem is easily seen to be pre-compact for dloc: just cover it with balls of radius 1/(r + 1) centered on a 1/(r + 1)-net for dX of the restrictions of A to radius r, to get a 1/(r + 1)-net for dloc.
We next proceed to discuss the convergence of random variables on (X, dloc). We first recall that a random variable X is a measurable function from the underlying probability space (Ω, F, P) with values in the Polish space (X, dloc) endowed with the Borel σ-field denoted by Bloc. Therefore, the natural notion of convergence in distribution states that the sequence of random variables (Xn)n≥0 converges in distribution (for the local topology) towards a random variable X, which we denote as Xn →ᵈ X, if, for any bounded continuous function h : X → R,
E[h(Xn)] → E[h(X)].    (A.2.3)
The main result of this section is the following theorem:
Theorem A.7 (Convergence of finite-dimensional distributions implies tightness) Assume that {[x]r : x ∈
X } is Polish for every r ≥ 1. The local topology satisfies the following properties:
(a) A family (Xi )i∈I of random variables with values in X is tight in the local topology if and only if the
family ([Xi ]r )i∈I is tight for every r ≥ 0.
(b) Let X1 and X2 be two random variables with values in (X, dloc) such that P([X1]r ∈ A) = P([X2]r ∈ A) for any A ∈ Bloc and any r ≥ 0. Then X1 and X2 are equal in distribution.
(c) Xn →ᵈ X in the local topology when, for every r ≥ 1 and all Borel sets A ∈ Bloc,
P([Xn]r ∈ A) → P([X]r ∈ A).    (A.2.4)
Theorem A.7 is remarkable, since the convergence of P([Xn ]r ∈ A) is equivalent to the convergence of
finite-dimensional distributions. Normally, one would expect to need this convergence to be combined with
tightness to obtain convergence in distribution. Owing to the special nature of the local topology introduced
in this section (recall also Remark A.5), however, this convergence, combined with the fact that the limit is
a probability measure, implies tightness.
Proof of Theorem A.7 Part (a) follows directly from the compactness statement in Theorem A.6.
For part (b), we consider the family of events
M = { {x ∈ X : [x]r ∈ A} : A ∈ Bloc, r ≥ 0 }.    (A.2.5)
It is easy to see that the family M generates the Borel σ-field on X and moreover that M is stable under
finite intersections. It follows from the monotone class theorem that two random variables X1 and X2
agreeing on M have the same law.
For part (c), we have already seen above that the sets {[x]r ∈ A} are stable under finite intersections, and it is easy to see that any open set of the local topology can be written as a countable union of such sets. The result then follows from (Billingsley, 1968, Theorem 2.2). In particular, we deduce that for a sequence of random variables (Xn)n≥1 to converge in distribution it is necessary and sufficient that (Xn)n≥1 be tight and that P([Xn]r ∈ A) converge for every r ≥ 0 and every A ∈ Bloc. The two conditions are necessary, since if P([Xn]r ∈ A) converges to a limit that does not have full mass, then Xn →ᵈ X does not hold. From this, we conclude that if P([Xn]r ∈ A) → µ([X]r ∈ A), where µ has full mass, then Xn →ᵈ X does indeed follow.
Let [G⋆] denote the set of equivalence classes in G⋆. This is the set on which the distance dG⋆ acts.
In this section we prove that ([G⋆], dG⋆) is a Polish space:
Theorem A.8 (Rooted graphs form a Polish space) dG⋆ is a well-defined metric on [G⋆]. Further, the metric space ([G⋆], dG⋆) is Polish.
We give an explicit proof of Theorem A.8, even though completeness and separability might also be concluded from Theorem A.6, together with the observation that {[G, o]r : (G, o) ∈ G⋆} is Polish for every r ≥ 1. This must be the case since completeness is obvious, while separability follows because dG⋆((Gn, on), (Gm, om)) ≤ ε implies that B_r^{(G_n)}(o_n) ≃ B_r^{(G_m)}(o_m) for all r ≤ 1/ε − 1.
The proof of Theorem A.8 is divided into several steps. These proof steps are a little involved, since we need to deal with isomorphism classes of rooted graphs, rather than with rooted graphs themselves. This requires us to show that statements hold irrespective of the representative rooted graph chosen. We start in Proposition A.10 below by showing that dG⋆ is an ultrametric, which is a slightly stronger property than being a metric, and which also implies that (G⋆, dG⋆) is a metric space. In Proposition A.12, we show that the metric space (G⋆, dG⋆) is complete, and in Proposition A.14, we show that it is separable. After that, we can complete the proof of Theorem A.8.
In the remainder of this section, we often work with the r-neighborhoods B_r^{(G)}(o) of o in G. We emphasize that we consider B_r^{(G)}(o) to be a rooted graph, with root o (recall (2.2.1)).
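To make the metric concrete, here is a small computational sketch (ours, not from the book) that evaluates dG⋆ between two finite rooted graphs by comparing their rooted r-neighborhoods; networkx is assumed, the root is encoded as a node mark so that isomorphisms must map root to root, and for graphs that agree to large depth the search is truncated at max_radius.

```python
# A small illustration (not the book's code) of the metric d_{G*}: two rooted
# graphs are at distance 1/(r*+1), where r* is the largest radius at which
# their rooted r-neighborhoods are isomorphic (with root mapped to root).
import networkx as nx
from networkx.algorithms import isomorphism

def rooted_ball(graph, root, radius):
    """The neighborhood B_r(o), kept rooted by marking the root node."""
    ball = nx.ego_graph(graph, root, radius=radius)
    nx.set_node_attributes(ball, False, "is_root")
    ball.nodes[root]["is_root"] = True
    return ball

def d_star(g1, o1, g2, o2, max_radius=30):
    """1/(r*+1), truncating the search at max_radius; B_0 always matches."""
    root_match = isomorphism.categorical_node_match("is_root", False)
    r_star = 0
    for r in range(1, max_radius + 1):
        if not nx.is_isomorphic(rooted_ball(g1, o1, r),
                                rooted_ball(g2, o2, r),
                                node_match=root_match):
            break
        r_star = r
    return 1 / (r_star + 1)

# A 6-cycle and a long path rooted at its middle agree up to radius 2 but
# not radius 3, so their distance is 1/(2+1) = 1/3:
print(d_star(nx.cycle_graph(6), 0, nx.path_graph(11), 5))
```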
Our aim is to use (ψr)r≥0 to construct an isomorphism between (G1, o1) and (G2, o2).
Set V_k^{(G_1)} = V(B_k^{(G_1)}(o_1)). Let ψr|_{V_0^{(G_1)}} be the restriction of ψr to V_0^{(G_1)} = {o1}. Then we know that ψr(v) = o2 for every v ∈ V_0^{(G_1)} and r ≥ 0. We next let ψr|_{V_1^{(G_1)}} be the restriction of ψr to V_1^{(G_1)}. Then ψr|_{V_1^{(G_1)}} is an isomorphism between B_1^{(G_1)}(o_1) and B_1^{(G_2)}(o_2) for every r. Since there are only finitely many such isomorphisms, the same isomorphism, say φ′1, needs to be repeated infinitely many times in the sequence (ψr|_{V_1^{(G_1)}})_{r≥1}. Let N1 denote the values of r for which ψr|_{V_1^{(G_1)}} = φ′1.
Now we extend this argument to k = 2. Let ψr|_{V_2^{(G_1)}} be the restriction of ψr to V_2^{(G_1)}. Again, ψr|_{V_2^{(G_1)}} is an isomorphism between B_2^{(G_1)}(o_1) and B_2^{(G_2)}(o_2) for every r. Since there are again only finitely many such isomorphisms, the same isomorphism, say φ′2, needs to be repeated infinitely many times in the sequence (ψr|_{V_2^{(G_1)}})_{r∈N1}. Let N2 denote the values of r ∈ N1 for which ψr|_{V_2^{(G_1)}} = φ′2.
We next generalize this argument to general k ≥ 2. Let ψr|_{V_k^{(G_1)}} be the restriction of ψr to V_k^{(G_1)}. Again, ψr|_{V_k^{(G_1)}} is an isomorphism between B_k^{(G_1)}(o_1) and B_k^{(G_2)}(o_2) for every r. Since there are again only finitely many such isomorphisms, the same isomorphism, say φ′k, needs to be repeated infinitely many times in the sequence (ψr|_{V_k^{(G_1)}})_{r∈N_{k−1}}. Let Nk denote the values of r ∈ N_{k−1} for which ψr|_{V_k^{(G_1)}} = φ′k. Writing U1 = V_1^{(G_1)} and Uk = V_k^{(G_1)} \ V_{k−1}^{(G_1)} for k ≥ 2, we define
ψ(v) = ψ∞(v) = Σ_{k≥1} φ′k(v) 1{v ∈ Uk}.    (A.3.7)
We claim that ψ is the desired isomorphism between (G1, o1) and (G2, o2). The map ψ is clearly bijective, since φ′k : Uk → φ′k(Uk) is bijective. Further, let u, v ∈ V(G1), and denote
k = max{distG1(o1, u), distG1(o1, v)}.
Then u, v ∈ V_k^{(G_1)}. Because φ′k is an isomorphism between B_k^{(G_1)}(o_1) and B_k^{(G_2)}(o_2), it follows that φ′k(u), φ′k(v) ∈ V(B_k^{(G_2)}(o_2)), and further that {φ′k(u), φ′k(v)} ∈ E(B_k^{(G_2)}(o_2)) precisely when {u, v} ∈ E(B_k^{(G_1)}(o_1)). Since ψ = φ′k on V_k^{(G_1)}, it then also follows that {ψ(u), ψ(v)} ∈ E(B_k^{(G_2)}(o_2)) precisely when {u, v} ∈ E(B_k^{(G_1)}(o_1)), as required. Finally, ψ(o1) = φ′1(o1) = o2, since φ′k(o1) = o2 for every k ≥ 1. This completes the proof.
Proof of Proposition A.9. We note that if (Ĝ1, ô1) ≃ (G1, o1) and (Ĝ2, ô2) ≃ (G2, o2), then B_r^{(G_1)}(o_1) ≃ B_r^{(G_2)}(o_2) if and only if B_r^{(Ĝ_1)}(ô_1) ≃ B_r^{(Ĝ_2)}(ô_2). Therefore dG⋆((G1, o1), (G2, o2)) is independent of the exact choice of representatives in the equivalence classes of (G1, o1) and (G2, o2). In particular, dG⋆((G1, o1), (G2, o2)) is constant on such equivalence classes. This makes dG⋆([G1, o1], [G2, o2]) well defined for [G1, o1], [G2, o2] ∈ [G⋆].
Proof of Proposition A.10. (a) Assume that dG⋆((G1, o1), (G2, o2)) = 0. Then B_r^{(G_1)}(o_1) ≃ B_r^{(G_2)}(o_2) for all r ≥ 0, so that, by Lemma A.11, we also have (G1, o1) ≃ (G2, o2), as required.
The proof of (b) is trivial and omitted.
For (c) and i, j ∈ [3], let
r_{ij} = sup{r : B_r^{(G_i)}(o_i) ≃ B_r^{(G_j)}(o_j)}.    (A.3.8)
Then B_r^{(G_1)}(o_1) ≃ B_r^{(G_3)}(o_3) for all r ≤ r_{13} and B_r^{(G_2)}(o_2) ≃ B_r^{(G_3)}(o_3) for all r ≤ r_{23}. We conclude that B_r^{(G_1)}(o_1) ≃ B_r^{(G_2)}(o_2) for all r ≤ min{r_{13}, r_{23}}, so that r_{12} ≥ min{r_{13}, r_{23}}. This implies that
1/(r_{12} + 1) ≤ max{1/(r_{13} + 1), 1/(r_{23} + 1)},
which in turn implies the claim (recall (2.2.2)).
We define φr : V(B_r^{(G_r)}(o_r)) → Vr recursively as follows. Let φ0 be the unique isomorphism from V(B_0^{(G_0)}(o_0)) = {o0} to V0 = {1}. Let ψr be an isomorphism between (G_{r−1}, o_{r−1}) and B_{r−1}^{(G_r)}(o_r), and let ηr be an arbitrary bijection from V(Gr) \ V(B_{r−1}^{(G_r)}(o_r)) to Vr \ V_{r−1}. Define
φr(v) = φ_{r−1}(ψ_r^{−1}(v)) for v ∈ V(B_{r−1}^{(G_r)}(o_r)), and φr(v) = ηr(v) for v ∈ V(Gr) \ V(B_{r−1}^{(G_r)}(o_r)).    (A.3.9)
Proof of Theorem A.8. The function dG⋆ is well defined on [G⋆] × [G⋆] by Proposition A.9. Proposition A.10 implies that dG⋆ is an (ultra)metric on [G⋆]. Finally, Proposition A.12 proves that ([G⋆], dG⋆) is complete, while Proposition A.14 proves that ([G⋆], dG⋆) is separable. Thus, ([G⋆], dG⋆) is a Polish space.
Thus, H⋆(r) contains those rooted graphs whose r-neighborhood is the same as that of a rooted graph in H⋆. Clearly, H⋆(r) ↓ H⋆ as r → ∞. Therefore, also µ(H⋆(r)) ↓ µ(H⋆) and µ′(H⋆(r)) ↓ µ′(H⋆). Finally, note that (G, o) ∈ H⋆(r) if and only if B_r^{(G)}(o) ∈ H⋆(r). Thus,
µ(H⋆(r)) = Σ_{H⋆∈H⋆(r)} µ(B_r^{(G)}(o) ≃ H⋆)    (A.3.16)
(where we realize that the fact that the sum is over equivalence classes makes the events {B_r^{(G)}(o) ≃ H⋆} disjoint). Since µ(B_r^{(G)}(o) ≃ H⋆) = µ′(B_r^{(G)}(o) ≃ H⋆), we conclude that
µ(H⋆(r)) = Σ_{H⋆∈H⋆(r)} µ(B_r^{(G)}(o) ≃ H⋆) = Σ_{H⋆∈H⋆(r)} µ′(B_r^{(G)}(o) ≃ H⋆) = µ′(H⋆(r)).    (A.3.17)
∆r(G, o) = max{d_v^{(G)} : v ∈ V(B_r^{(G)}(o))}, where d_v^{(G)} denotes the degree of v ∈ V(G). Then, a closed family of equivalence classes of rooted graphs [K] ⊆ [G⋆] is compact if and only if, for every r ≥ 1,
sup_{(G,o)∈K} ∆r(G, o) < ∞.    (A.3.19)
Proof Recall from (Rudin, 1991, Theorem A.4) that a closed subset K of a complete metric space is compact precisely when it is totally bounded, meaning that, for every ε > 0, the set K can be covered by finitely many balls of radius ε. As a result, for
every r ≥ 1, there must be rooted graphs (F1, o1), . . . , (Fℓ, oℓ) such that K is covered by the finitely many open sets
{(G, o) : B_r^{(G)}(o) ≃ B_r^{(F_i)}(o_i)}.    (A.3.20)
Equivalently, every (G, o) ∈ K satisfies B_r^{(G)}(o) ≃ B_r^{(F_i)}(o_i) for some i ∈ [ℓ]. In turn, this is equivalent to the statement that the set
Ar = {B_r^{(G)}(o) : (G, o) ∈ K}    (A.3.21)
is finite for every r ≥ 1.
We finally prove that Ar is finite for every r ≥ 1 precisely when (A.3.19) holds. Denote ∆r = sup_{(G,o)∈K} ∆r(G, o). If ∆r is finite for every r ≥ 1 then, because every (G, o) ∈ K is connected, the graphs B_r^{(G)}(o) can have at most 1 + ∆r + · · · + ∆r^r vertices, so that there are only finitely many possibilities for B_r^{(G)}(o) up to isomorphism, and Ar is indeed finite.
By assumption, limd→∞ f(d) = 0. Write m(G) = E[d_{o_G}^{(G)}]. Thus, 1 ≤ m(G) ≤ f(0) < ∞. Write µ⋆G for the degree-biased probability measure on {(G, v) : v ∈ V(G)}, that is,
µ⋆G[(G, v)] = (d_v^{(G)} / m(G)) × µG[(G, v)],    (A.3.23)
and oG for the corresponding root. Since µG ≤ m(G)µ⋆G ≤ f(0)µ⋆G, it suffices to show that {µ⋆G : G ∈ A} is tight. Note that {d_{o_G}^{(G)} : G ∈ A} is tight by assumption.
For r ∈ N, let F_r^M(v) be the event that there is some vertex at distance at most r from v whose degree is larger than M. Let X be a uniform random neighbor of oG. Because µ⋆G is a stationary measure for simple random walk, F_r^M(oG) and F_r^M(X) have the same probability. Also,
P(F_{r+1}^M(oG) | d_{o_G}^{(G)}) ≤ d_{o_G}^{(G)} P(F_r^M(X) | d_{o_G}^{(G)}).    (A.3.24)
We claim that, for all r ∈ N and ε > 0, there exists M < ∞ such that P(F_r^M(X)) ≤ ε for all G ∈ A. This clearly implies that {µ⋆G : G ∈ A} is tight. We prove the claim by induction on r.
The statement for r = 0 is trivial. Given that the property holds for r, let us now show it for r + 1. Given ε > 0, choose d sufficiently large that P(d_{o_G}^{(G)} > d) ≤ ε/2 for all G ∈ A. Also, choose M sufficiently large that P(F_r^M(oG)) ≤ ε/(2d) for all G ∈ A. Write F for the event that d_{o_G}^{(G)} > d. Then, by conditioning on d_{o_G}^{(G)} and using (A.3.24), we see that
P(F_{r+1}^M(oG)) ≤ ε/2 + E[1_{F^c} d_{o_G}^{(G)} P(F_r^M(X) | d_{o_G}^{(G)})] ≤ ε/2 + d P(F_r^M(X)) = ε/2 + d P(F_r^M(oG)) ≤ ε,
which advances the induction and proves the claim.
Abbe, E., and Sandon, C. 2018. Proof of the achievability conjectures for the general stochastic block
model. Comm. Pure Appl. Math., 71(7), 1334–1406.
Abdullah, M. A., Bode, M., and Fountoulakis, N. 2017. Typical distances in a geometric model for complex networks. Internet Math., 38 pp.
Achlioptas, D., D’Souza, R., and Spencer, J. 2009. Explosive percolation in random networks. Science,
323(5920), 1453–1455.
Aiello, W., Bonato, A., Cooper, C., Janssen, J., and Pralat, P. 2008. A spatial web graph model with local
influence regions. Internet Math., 5(1-2), 175–196.
Aldous, D. 1985. Exchangeability and related topics. Pages 1–198 of: École d’été de probabilités de
Saint-Flour, XIII–1983. Lecture Notes in Math., vol. 1117. Springer.
Aldous, D. 1991. Asymptotic fringe distributions for general families of random trees. Ann. Appl. Probab.,
1(2), 228–266.
Aldous, D., and Lyons, R. 2007. Processes on unimodular random networks. Electron. J. Probab., 12(54),
1454–1508.
Aldous, D., and Steele, J.M. 2004. The objective method: probabilistic combinatorial optimization and
local weak convergence. Pages 1–72 of: Probability on discrete structures. Encyclopaedia Math. Sci.,
vol. 110. Springer.
Anantharam, V., and Salez, J. 2016. The densest subgraph problem in sparse random graphs. Ann. Appl.
Probab., 26(1), 305–327.
Andreis, L., König, W., and Patterson, R. 2021. A large-deviations principle for all the cluster sizes of a
sparse Erdős–Rényi graph. Random Structures Algorithms, 59(4), 522–553.
Andreis, L., König, W., and Patterson, R. 2023. A large-deviations principle for all the components in a
sparse inhomogeneous random graph. Probab. Theory Rel. Fields, 186(1-2), 521–620.
Angel, O., and Schramm, O. 2003. Uniform infinite planar triangulations. Comm. Math. Phys., 241(2-3),
191–213.
Antunović, T., Mossel, E., and Rácz, M. 2016. Coexistence in preferential attachment networks. Combin.
Probab. Comput., 25(6), 797–822.
Arratia, R., Barbour, A. D., and Tavaré, S. 2003. Logarithmic combinatorial structures: a probabilistic approach. EMS Monographs in Mathematics. European Mathematical Society (EMS), Zürich.
Artico, I., Smolyarenko, I., Vinciotti, V., and Wit, E. C. 2020. How rare are power-law networks really? Proc. Roy. Soc. A, 476(2241), 20190742.
Athreya, K., and Ney, P. 1972. Branching processes. New York: Springer-Verlag. Die Grundlehren der
mathematischen Wissenschaften, Band 196.
Backhausz, Á., and Szegedy, B. 2022. Action convergence of operators and graphs. Canad. J. Math., 74(1),
72–121.
Ball, F., and Neal, P. 2002. A general model for stochastic SIR epidemics with two levels of mixing. Math.
Biosci., 180, 73–102. John A. Jacquez memorial volume.
Ball, F., and Neal, P. 2004. Poisson approximations for epidemics with two levels of mixing. Ann. Probab.,
32(1B), 1168–1200.
Ball, F., and Neal, P. 2008. Network epidemic models with two levels of mixing. Math. Biosci., 212(1),
69–87.
Ball, F., and Neal, P. 2017. The asymptotic variance of the giant component of configuration model random
graphs. Ann. Appl. Probab., 27(2), 1057–1092.
Ball, F., Mollison, D., and Scalia-Tomba, G. 1997. Epidemics with two levels of mixing. Ann. Appl.
Probab., 7(1), 46–89.
Ball, F., Sirl, D., and Trapman, P. 2009. Threshold behaviour and final outcome of an epidemic on a random
network with household structure. Adv. Appl. Probab., 41(3), 765–796.
Ball, F., Sirl, D., and Trapman, P. 2010. Analysis of a stochastic SIR epidemic on a random network
incorporating household structure. Math. Biosci., 224(2), 53–73.
Banerjee, S., and Olvera-Cravioto, M. 2022. PageRank asymptotics on directed preferential attachment
networks. Ann. Appl. Probab., 32(4), 3060–3084.
Banerjee, S., Deka, P., and Olvera-Cravioto, M. 2023. Local weak limits for collapsed branching processes with random out-degrees. arXiv:2302.00562 [math.PR].
Barabási, A.-L. 2002. Linked: The new science of networks. Perseus Publishing.
Barabási, A.-L. 2016. Network science. Cambridge University Press.
Barabási, A.-L. 2018. Love is all you need: Clauset’s fruitless search for scale-free networks. Blog post
available at www.barabasilab.com/post/love-is-all-you-need.
Barabási, A.-L., and Albert, R. 1999. Emergence of scaling in random networks. Science, 286(5439),
509–512.
Barbour, A. D., and Reinert, G. 2001. Small worlds. Random Structures Algorithms, 19(1), 54–74.
Barbour, A. D., and Reinert, G. 2004. Correction: “Small worlds” [Random Structures Algorithms 19(1)
(2001) 54–74; MR1848027]. Random Structures Algorithms, 25(1), 115.
Barbour, A. D., and Reinert, G. 2006. Discrete small world networks. Electron. J. Probab., 11(47), 1234–
1283 (electronic).
Barbour, A. D., and Röllin, A. 2019. Central limit theorems in the configuration model. Ann. Appl. Probab.,
29(2), 1046–1069.
Beirlant, J., Goegebeur, Y., Segers, J., and Teugels, J. 2006. Statistics of extremes: theory and applications.
John Wiley and Sons.
Bender, E. A., and Canfield, E. R. 1978. The asymptotic number of labelled graphs with given degree
sequences. J. Combin. Theory (A), 24, 296–307.
Benjamini, I., and Schramm, O. 2001. Recurrence of distributional limits of finite planar graphs. Electron.
J. Probab., 6(23), 13 pp. (electronic).
Benjamini, I., Kesten, H., Peres, Y., and Schramm, O. 2004. Geometry of the uniform spanning forest:
transitions in dimensions 4, 8, 12, ... Ann. Math. (2), 160(2), 465–491.
Benjamini, I., Lyons, R., and Schramm, O. 2015. Unimodular random trees. Ergodic Theory Dynam.
Systems, 35(2), 359–373.
Berger, N. 2002. Transience, recurrence and critical behavior for long-range percolation. Comm. Math.
Phys., 226(3), 531–558.
Berger, N., Borgs, C., Chayes, J. T., D’Souza, R. M., and Kleinberg, R. D. 2004. Competition-induced
preferential attachment. Pages 208–221 of: Automata, languages and programming. Lecture Notes in
Comput. Sci., vol. 3142. Springer.
Berger, N., Borgs, C., Chayes, J. T., D’Souza, R. M., and Kleinberg, R. D. 2005. Degree distribution of
competition-induced preferential attachment graphs. Combin. Probab. Comput., 14(5-6), 697–721.
Berger, N., Borgs, C., Chayes, J., and Saberi, A. 2014. Asymptotic behavior and distributional limits of
preferential attachment graphs. Ann. Probab., 42(1), 1–40.
Bhamidi, S., Bresler, G., and Sly, A. 2008. Mixing time of exponential random graphs. Pages 803–812 of:
FOCS ’08: Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science.
IEEE Computer Society.
Bhamidi, S., Bresler, G., and Sly, A. 2011. Mixing time of exponential random graphs. Ann. Appl. Probab.,
21(6), 2146–2170.
Bhamidi, S., Evans, S., and Sen, A. 2012. Spectra of large random trees. J. Theoret. Probab., 25(3),
613–654.
Bhattacharya, A., Chen, B., van der Hofstad, R., and Zwart, B. 2020. Consistency of the PLFit estimator
for power-law data. arXiv:2002.06870 [math.PR].
Billingsley, P. 1968. Convergence of probability measures. John Wiley and Sons.
Bingham, N. H., Goldie, C. M., and Teugels, J. L. 1989. Regular variation. Encyclopedia of Mathematics
and its Applications, vol. 27. Cambridge University Press.
Biskup, M. 2004. On the scaling of the chemical distance in long-range percolation models. Ann. Probab.,
32(4), 2938–2977.
Biskup, M., and Lin, J. 2019. Sharp asymptotic for the chemical distance in long-range percolation. Random
Structures Algorithms, 55(3), 560–583.
Bläsius, T., Friedrich, T., and Krohmer, A. 2018. Cliques in hyperbolic random graphs. Algorithmica,
80(8), 2324–2344.
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. 2008. Fast unfolding of communities in
large networks. J. Statis. Mech.: Theory and Experiment, 2008(10).
Bloznelis, M. 2009. A note on log log distances in a power law random intersection graph. arXiv:0911.5127
[math.PR].
Bloznelis, M. 2010a. Component evolution in general random intersection graphs. SIAM J. Discrete Math.,
24(2), 639–654.
Bloznelis, M. 2010b. The largest component in an inhomogeneous random intersection graph with cluster-
ing. Electron. J. Combin., 17(1), Research Paper 110, 17.
Bloznelis, M. 2013. Degree and clustering coefficient in sparse random intersection graphs. Ann. Appl.
Probab., 23(3), 1254–1289.
Bloznelis, M., Götze, F., and Jaworski, J. 2012. Birth of a strongly connected giant in an inhomogeneous
random digraph. J. Appl. Probab., 49(3), 601–611.
Bloznelis, M., Godehardt, E., Jaworski, J., Kurauskas, V., and Rybarczyk, K. 2015. Recent progress in com-
plex network analysis: models of random intersection graphs. Pages 69–78 of: Data science, learning
by latent structures, and knowledge discovery. Springer.
Bode, M., Fountoulakis, N., and Müller, T. 2015. On the largest component of a hyperbolic model of
complex networks. Electron. J. Combin., 22(3), Paper 3.24, 46.
Boguñá, M., Papadopoulos, F., and Krioukov, D. 2010. Sustaining the internet with hyperbolic mapping. Nature Commun., 1(1), 1–8.
Bohman, T., and Frieze, A. 2001. Avoiding a giant component. Random Structures Algorithms, 19(1),
75–85.
Bohman, T., and Frieze, A. 2002. Addendum to “Avoiding a giant component” [Random Structures Algo-
rithms 19(1) (2001), 75–85; MR1848028]. Random Structures Algorithms, 20(1), 126–130.
Boldi, P., and Vigna, S. 2004. The WebGraph Framework I: compression techniques. Pages 595–601 of:
Proc. 13th International World Wide Web Conference (WWW 2004). ACM Press.
Boldi, P., Rosa, M., Santini, M., and Vigna, S. 2011. Layered label propagation: a multiresolution
coordinate-free ordering for compressing social networks. Pages 587–596 of: Proceedings of the 20th
International Conference on the World Wide Web. ACM Press.
Bollobás, B. 1980. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs.
European J. Combin., 1(4), 311–316.
Bollobás, B. 2001. Random graphs. Second edn. Cambridge Studies in Advanced Mathematics, vol. 73.
Cambridge University Press.
Bollobás, B., and Fernandez de la Vega, W. 1982. The diameter of random regular graphs. Combinatorica,
2(2), 125–134.
Bollobás, B., and Riordan, O. 2004a. The diameter of a scale-free random graph. Combinatorica, 24(1),
5–34.
Bollobás, B., and Riordan, O. 2004b. Shortest paths and load scaling in scale-free trees. Phys. Rev. E, 69,
036114.
Bollobás, B., and Riordan, O. 2006. Percolation. Cambridge University Press.
Bollobás, B., and Riordan, O. 2015. An old approach to the giant component problem. J. Combin. Theory
Ser. B, 113, 236–260.
Bollobás, B., Riordan, O., Spencer, J., and Tusnády, G. 2001. The degree sequence of a scale-free random
graph process. Random Structures Algorithms, 18(3), 279–290.
Bollobás, B., Janson, S., and Riordan, O. 2005. The phase transition in the uniformly grown random graph
has infinite order. Random Structures Algorithms, 26(1-2), 1–36.
Bollobás, B., Janson, S., and Riordan, O. 2007. The phase transition in inhomogeneous random graphs.
Random Structures Algorithms, 31(1), 3–122.
Bollobás, B., Janson, S., and Riordan, O. 2011. Sparse random graphs with clustering. Random Structures
Algorithms, 38(3), 269–323.
Bordenave, C. 2016. Lecture notes on random graphs and probabilistic combinatorial optimization. Version April 8, 2016. Available at www.math.univ-toulouse.fr/~bordenave/coursRG.pdf.
Bordenave, C., and Caputo, P. 2015. Large deviations of empirical neighborhood distribution in sparse
random graphs. Probab. Theory Rel. Fields, 163(1-2), 149–222.
Bordenave, C., and Lelarge, M. 2010. Resolvent of large random graphs. Random Structures Algorithms,
37(3), 332–352.
Bordenave, C., Lelarge, M., and Salez, J. 2011. The rank of diluted random graphs. Ann. Probab., 39(3),
1097–1121.
Bordenave, C., Lelarge, M., and Salez, J. 2013. Matchings on infinite graphs. Probab. Theory Rel. Fields,
157(1-2), 183–208.
Bordenave, C., Lelarge, M., and Massoulié, L. 2018. Nonbacktracking spectrum of random graphs: com-
munity detection and nonregular Ramanujan graphs. Ann. Probab., 46(1), 1–71.
Box, G. E. P. 1976. Science and statistics. J. Amer. Statist. Assoc., 71(356), 791–799.
Box, G. E. P. 1979. Robustness in the strategy of scientific model building. Pages 201–236 of: Robustness
in statistics. Elsevier.
Bringmann, K., Keusch, R., and Lengler, J. 2017. Sampling geometric inhomogeneous random graphs in
linear time. In: Proceeding of the 25th Annual European Symposium on Algorithms (ESA 2017). Schloss
Dagstuhl-Leibniz-Zentrum fuer Informatik.
Bringmann, K., Keusch, R., and Lengler, J. 2019. Geometric inhomogeneous random graphs. Theoret.
Comput. Sci., 760, 35–54.
Bringmann, K., Keusch, R., and Lengler, J. 2020. Average distance in a general class of scale-free networks
with underlying geometry. arXiv: 1602.05712 [cs.DM].
Britton, T., Deijfen, M., and Martin-Löf, A. 2006. Generating simple random graphs with prescribed degree
distribution. J. Statist. Phys., 124(6), 1377–1397.
Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J.
2000. Graph structure in the Web. Computer Networks, 33, 309–320.
Broido, A., and Clauset, A. 2019. Scale-free networks are rare. Nature Commun., 10(1), 1017.
Cai, X. S., and Perarnau, G. 2021. The giant component of the directed configuration model revisited.
ALEA Lat. Am. J. Probab. Math. Statist., 18(2), 1517–1528.
Cai, X. S., and Perarnau, G. 2023. The diameter of the directed configuration model. Ann. Inst. Henri
Poincaré Probab. Stat., 59(1), 244–270.
Callaway, D. S., Hopcroft, J. E., Kleinberg, J. M., Newman, M. E. J., and Strogatz, S. H. 2001. Are randomly
grown graphs really random? Phys. Rev. E, 64, 041902.
Cao, J., and Olvera-Cravioto, M. 2020. Connectivity of a general class of inhomogeneous random digraphs.
Random Structures Algorithms, 56(3), 722–774.
Caravenna, F., Garavaglia, A., and van der Hofstad, R. 2019. Diameter in ultra-small scale-free random
graphs. Random Structures Algorithms, 54(3), 444–498.
Chakraborty, S., van der Hofstad, R., and den Hollander, F. 2021. Sparse random graphs with many trian-
gles. arXiv:2112.06526 [math.PR].
Chatterjee, S. 2017. Large deviations for random graphs. Lecture Notes in Mathematics, vol. 2197.
Springer. Lecture notes from the 45th Probability Summer School held in Saint-Flour, June 2015.
Chatterjee, S., and Diaconis, P. 2013. Estimating and understanding exponential random graph models.
Ann. Statist., 41(5), 2428–2461.
Chatterjee, S., and Durrett, R. 2009. Contact processes on random graphs with power law degree distribu-
tions have critical value 0. Ann. Probab., 37(6), 2332–2356.
Chatterjee, S., and Varadhan, S. R. S. 2011. The large deviation principle for the Erdős–Rényi random
graph. European J. Combin., 32(7), 1000–1017.
Chen, N., and Olvera-Cravioto, M. 2013. Directed random graphs with given degree distributions. Stoch.
Syst., 3(1), 147–186.
Chung, F., and Lu, L. 2001. The diameter of sparse random graphs. Adv. in Appl. Math., 26(4), 257–279.
Chung, F., and Lu, L. 2002a. The average distances in random graphs with given expected degrees. Proc.
Natl. Acad. Sci. USA, 99(25), 15879–15882 (electronic).
Chung, F., and Lu, L. 2002b. Connected components in random graphs with given expected degree se-
quences. Ann. Comb., 6(2), 125–145.
Chung, F., and Lu, L. 2003. The average distance in a random graph with given expected degrees. Internet
Math., 1(1), 91–113.
Chung, F., and Lu, L. 2004. Coupling online and offline analyses for random power law graphs. Internet
Math., 1(4), 409–461.
Chung, F., and Lu, L. 2006a. Complex graphs and networks. CBMS Regional Conference Series in Math-
ematics, vol. 107.
Chung, F., and Lu, L. 2006b. The volume of the giant component of a random graph with given expected
degrees. SIAM J. Discrete Math., 20, 395–411.
Clauset, A., Shalizi, C., and Newman, M. E. J. 2009. Power-law distributions in empirical data. SIAM
Review, 51(4), 661–703.
Cohen, R., and Havlin, S. 2003. Scale-free networks are ultrasmall. Phys. Rev. Lett., 90, 058701, 1–4.
Colenbrander, D. 2022. Ultra-small world phenomenon in the directed configuration model. M.Phil. thesis,
Eindhoven University of Technology.
Collevecchio, A., Cotar, C., and LiCalzi, M. 2013. On a preferential attachment and generalized Pólya’s
urn model. Ann. Appl. Probab., 23(3), 1219–1253.
Cooper, C., and Frieze, A. 2004. The size of the largest strongly connected component of a random digraph
with a given degree sequence. Combin. Probab. Comput., 13(3), 319–337.
Corten, R. 2012. Composition and structure of a large online social network in the Netherlands. PLOS
ONE, 7(4), 1–8.
Coscia, M. 2021. The atlas for the aspiring network scientist. arXiv:2101.00863 [cs.CY].
Csárdi, G. 2006. Dynamics of citation networks. Pages 698–709 of: Proceedings of the International
Conference on Artificial Neural Networks 2006. Lecture Notes in Computer Science, vol. 4131. Springer.
Curien, N. 2018. Random graphs: the local convergence perspective. Version October 17, 2018. Available at www.imo.universite-paris-saclay.fr/~curien/enseignement.html.
Danielsson, J., de Haan, L., Peng, L., and de Vries, C. G. 2001. Using a bootstrap method to choose the
sample fraction in tail index estimation. J. Multivariate Anal., 76(2), 226–248.
Darling, D. A. 1970. The Galton–Watson process with infinite mean. J. Appl. Probab., 7, 455–456.
Davies, P. L. 1978. The simple branching process: a note on convergence when the mean is infinite. J. Appl.
Probab., 15(3), 466–480.
Decelle, A., Krzakala, F., Moore, C., and Zdeborová, L. 2011. Asymptotic analysis of the stochastic block
model for modular networks and its algorithmic applications. Phys. Rev. E, 84(6), 066106.
Deijfen, M. 2009. Stationary random graphs with prescribed iid degrees on a spatial Poisson process.
Electron. Commun. Probab., 14, 81–89.
Deijfen, M., and Jonasson, J. 2006. Stationary random graphs on Z with prescribed iid degrees and finite
mean connections. Electron. Commun. Probab., 11, 336–346 (electronic).
Deijfen, M., and Kets, W. 2009. Random intersection graphs with tunable degree distribution and clustering.
Probab. Engrg. Inform. Sci., 23(4), 661–674.
Deijfen, M., and Meester, R. 2006. Generating stationary random graphs on Z with prescribed independent,
identically distributed degrees. Adv. Appl. Probab., 38(2), 287–298.
Deijfen, M., van den Esker, H., van der Hofstad, R., and Hooghiemstra, G. 2009. A preferential attachment
model with random initial degrees. Ark. Mat., 47(1), 41–72.
Deijfen, M., Häggström, O., and Holroyd, A. 2012. Percolation in invariant Poisson graphs with i.i.d.
degrees. Ark. Mat., 50(1), 41–58.
Deijfen, M., van der Hofstad, R., and Hooghiemstra, G. 2013. Scale-free percolation. Ann. Instit. Henri
Poincaré (B) Prob. Statist., 49(3), 817–838.
Dembo, A., and Montanari, A. 2010a. Gibbs measures and phase transitions on sparse random graphs.
Braz. J. Probab. Statist., 24(2), 137–211.
Dembo, A., and Montanari, A. 2010b. Ising models on locally tree-like graphs. Ann. Appl. Probab., 20(2),
565–592.
Deprez, P., and Wüthrich, M. 2019. Scale-free percolation in continuum space. Commun. Math. Statist.,
7(3), 269–308.
Deprez, P., Hazra, R., and Wüthrich, M. 2015. Inhomogeneous long-range percolation for real-life network
modeling. Risks, 3(1), 1–23.
Dereich, S., and Mörters, P. 2009. Random networks with sublinear preferential attachment: degree evolu-
tions. Electron. J. Probab., 14, 1222–1267.
Dereich, S., and Mörters, P. 2011. Random networks with concave preferential attachment rule. Jahresber.
Dtsch. Math.-Ver., 113(1), 21–40.
Dereich, S., and Mörters, P. 2013. Random networks with sublinear preferential attachment: the giant
component. Ann. Probab., 41(1), 329–384.
Dereich, S., Mönch, C., and Mörters, P. 2012. Typical distances in ultrasmall random networks. Adv. Appl.
Probab., 44(2), 583–601.
Dereich, S., Mönch, C., and Mörters, P. 2017. Distances in scale free networks at criticality. Electron. J.
Probab., 22, Paper No. 77, 38.
Diaconis, P. 1992. Sufficiency as statistical symmetry. Pages 15–26 of: American Mathematical Society
centennial publications, Vol. II.
Ding, J., Kim, J. H., Lubetzky, E., and Peres, Y. 2010. Diameters in supercritical random graphs via first
passage percolation. Combin. Probab. Comput., 19(5-6), 729–751.
Ding, J., Kim, J. H., Lubetzky, E., and Peres, Y. 2011. Anatomy of a young giant component in the random
graph. Random Structures Algorithms, 39(2), 139–178.
Dommers, S., van der Hofstad, R., and Hooghiemstra, G. 2010. Diameters in preferential attachment graphs.
J. Statist. Phys., 139, 72–107.
Dorogovtsev, S. N., Mendes, J. F. F., and Samukhin, A. N. 2000. Structure of growing networks with
preferential linking. Phys. Rev. Lett., 85(21), 4633–4636.
Dort, L., and Jacob, E. 2023. Local weak limit of dynamical inhomogeneous random graphs. arXiv:
2303.17437 [math.PR].
Draisma, G., de Haan, L., Peng, L., and Pereira, T. 1999. A bootstrap-based method to achieve optimality
in estimating the extreme-value index. Extremes, 2(4), 367–404.
Drees, H., Janßen, A., Resnick, S., and Wang, T. 2020. On a minimum distance procedure for threshold
selection in tail analysis. SIAM J. Math. Data Sci., 2(1), 75–102.
Drmota, M. 2009. Random trees: an interplay between combinatorics and probability. Springer.
Durrett, R. 2003. Rigorous result for the CHKNS random graph model. Pages 95–104 of: Discrete random walks (Paris, 2003). Association of Discrete Mathematics and Theoretical Computer Science.
Durrett, R. 2007. Random graph dynamics. Cambridge Series in Statistical and Probabilistic Mathematics.
Cambridge University Press.
Eckhoff, M., Goodman, J., van der Hofstad, R., and Nardi, F. R. 2013. Short paths for first passage perco-
lation on the complete graph. J. Statist. Phys., 151(6), 1056–1088.
Elek, G. 2007. On limits of finite graphs. Combinatorica, 27(4), 503–507.
Erdős, P., and Rényi, A. 1959. On random graphs. I. Publ. Math. Debrecen, 6, 290–297.
Erdős, P., and Rényi, A. 1960. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int.
Közl., 5, 17–61.
Erdős, P., and Rényi, A. 1961a. On the evolution of random graphs. Bull. Inst. Internat. Statist., 38, 343–
347.
Erdős, P., and Rényi, A. 1961b. On the strength of connectedness of a random graph. Acta Math. Acad. Sci.
Hungar., 12, 261–267.
Erdős, P., Greenhill, C., Mezei, T., Miklós, I., Soltész, D., and Soukup, L. 2022. The mixing time of switch
Markov chains: a unified approach. European J. Combin., 99, Paper No. 103421, 46.
van den Esker, H., van der Hofstad, R., Hooghiemstra, G., and Znamenski, D. 2006. Distances in random
graphs with infinite mean degrees. Extremes, 8, 111–140.
van den Esker, H., van der Hofstad, R., and Hooghiemstra, G. 2008. Universality for the distance in finite
variance random graphs. J. Statist. Phys., 133(1), 169–202.
Faloutsos, C., Faloutsos, P., and Faloutsos, M. 1999. On power-law relationships of the internet topology.
Computer Commun. Rev., 29, 251–262.
Federico, L. 2023. Almost-2-regular random graphs. Australas. J. Combin., 86, 76–96.
Federico, L., and van der Hofstad, R. 2017. Critical window for connectivity in the configuration model.
Combin. Probab. Comput., 26(5), 660–680.
Fernholz, D., and Ramachandran, V. 2007. The diameter of sparse random graphs. Random Structures
Algorithms, 31(4), 482–516.
Fienberg, S., and Wasserman, S. 1981. Categorical data analysis of single sociometric relations. Sociologi-
cal Methodology, 12, 156–192.
Fill, J., Scheinerman, E., and Singer-Cohen, K. 2000. Random intersection graphs when m = ω(n): an
equivalence theorem relating the evolution of the G(n, m, p) and G(n, p) models. Random Structures
Algorithms, 16(2), 156–176.
Flaxman, A., Frieze, A., and Vera, J. 2006. A geometric preferential attachment model of networks. Internet
Math., 3(2), 187–205.
Flaxman, A., Frieze, A., and Vera, J. 2007. A geometric preferential attachment model of networks II. In:
Proceedings of Workshop on Algorithms and Models for the Web Graph 2007.
Fountoulakis, N. 2015. On a geometrization of the Chung-Lu model for complex networks. J. Complex
Netw., 3(3), 361–387.
Fountoulakis, N., van der Hoorn, P., Müller, T., and Schepers, M. 2021. Clustering in a hyperbolic model of complex networks. Electron. J. Probab., 26, 1–132.
Frank, O., and Strauss, D. 1986. Markov graphs. J. Amer. Statist. Assoc., 81(395), 832–842.
Friedrich, T., and Krohmer, A. 2015. On the diameter of hyperbolic random graphs. Pages 614–625
of: Automata, languages, and programming. Part II. Lecture Notes in Computer Science, vol. 9135.
Springer.
Friedrich, T., and Krohmer, A. 2018. On the diameter of hyperbolic random graphs. SIAM J. Discrete
Math., 32(2), 1314–1334.
Fujita, Y., Kichikawa, Y., Fujiwara, Y., Souma, W., and Iyetomi, H. 2019. Local bow-tie structure of the
web. Applied Netw. Sci., 4(1), 1–15.
Gamarnik, D., Nowicki, T., and Swirszcz, G. 2006. Maximum weight independent sets and matchings
in sparse random graphs. Exact results using the local weak convergence method. Random Structures
Algorithms, 28(1), 76–106.
Gao, P., and Greenhill, C. 2021. Mixing time of the switch Markov chain and stable degree sequences.
Discrete Appl. Math., 291, 143–162.
Gao, P., and Wormald, N. 2016. Enumeration of graphs with a heavy-tailed degree sequence. Adv. Math.,
287, 412–450.
Gao, P., van der Hofstad, R., Southwell, A., and Stegehuis, C. 2020. Counting triangles in power-law
uniform random graphs. Electron. J. Combin., 27(3), Paper No. 3.19, 28.
Garavaglia, A., and van der Hofstad, R. 2018. From trees to graphs: collapsing continuous-time branching
processes. J. Appl. Probab., 55(3), 900–919.
Garavaglia, A., van der Hofstad, R., and Woeginger, G. 2017. The dynamics of power laws: fitness and
aging in preferential attachment trees. J. Statist. Phys., 168(6), 1137–1179.
Garavaglia, A., van der Hofstad, R., and Litvak, N. 2020. Local weak convergence for PageRank. Ann.
Appl. Probab., 30(1), 40–79.
Garavaglia, A., Hazra, R., van der Hofstad, R., and Ray, R. 2022. Universality of the local limit in prefer-
ential attachment models. arXiv:2212.05551 [math.PR].
Gilbert, E. N. 1959. Random graphs. Ann. Math. Statist., 30, 1141–1144.
Gleiser, P., and Danon, L. 2003. Community structure in jazz. Adv. Complex Systems, 06(04), 565–573.
Godehardt, E., and Jaworski, J. 2003. Two models of random intersection graphs for classification. Pages
67–81 of: Exploratory data analysis in empirical research. Stud. Classification Data Anal. Knowledge
Organ. Springer.
Goñi, J., Esteban, F., de Mendizábal, N., Sepulcre, J., Ardanza-Trevijano, S., Agirrezabal, I., and Villoslada,
P. 2008. A computational analysis of protein–protein interaction networks in neurodegenerative diseases.
BMC Systems Biology, 2(1), 52.
Grimmett, G. 1999. Percolation. 2nd edn. Springer.
Gugelmann, L., Panagiotou, K., and Peter, U. 2012. Random hyperbolic graphs: degree sequence and
clustering. Pages 573–585 of: Proceedings of the International Colloquium on Automata, Languages,
and Programming. Springer.
Gulikers, L., Lelarge, M., and Massoulié, L. 2017a. Non-backtracking spectrum of degree-corrected
stochastic block models. Pages 1–27 of: Proceedings of the 8th Innovations in Theoretical Computer
Science Conference. LIPIcs. Leibniz Int. Proc. Inform., vol. 67. Schloss Dagstuhl–Leibniz-Zentrum für
Informatik. Art. No. 44.
Gulikers, L., Lelarge, M., and Massoulié, L. 2017b. A spectral method for community detection in moder-
ately sparse degree-corrected stochastic block models. Adv. Appl. Probab., 49(3), 686–721.
Gulikers, L., Lelarge, M., and Massoulié, L. 2018. An impossibility result for reconstruction in the degree-
corrected stochastic block model. Ann. Appl. Probab., 28(5), 3002–3027.
Gut, A. 2005. Probability: a graduate course. Springer Texts in Statistics. Springer.
Häggström, O., and Jonasson, J. 1999. Phase transition in the random triangle model. J. Appl. Probab.,
36(4), 1101–1115.
Hajek, B. 1990. Performance of global load balancing by local adjustment. IEEE Trans. Inform. Theory,
36(6), 1398–1414.
Hajek, B. 1996. Balanced loads in infinite networks. Ann. Appl. Probab., 6(1), 48–75.
Hajek, B., and Sankagiri, S. 2019. Community recovery in a preferential attachment graph. IEEE Trans.
Inform. Theory, 65(11), 6853–6874.
Hajra, K.B., and Sen, P. 2005. Aging in citation networks. Physica A. Statist. Mech. Applic., 346(1-2),
44–48.
Hajra, K.B., and Sen, P. 2006. Modelling aging characteristics in citation networks. Physica A: Statist.
Mech. Applic., 368(2), 575–582.
Hall, P. 1981. Order of magnitude of moments of sums of random variables. J. London Math. Soc., 24(2),
562–568.
Hall, P., and Welsh, A. 1984. Best attainable rates of convergence for estimates of parameters of regular
variation. Ann. Statist., 12(3), 1079–1084.
Halmos, P. 1950. Measure theory. Van Nostrand.
Hao, N., and Heydenreich, M. 2023. Graph distances in scale-free percolation: the logarithmic case. J.
Appl. Probab., 60(1), 295–313.
Hardy, G. H., Littlewood, J. E., and Pólya, G. 1988. Inequalities. Cambridge Mathematical Library. Cam-
bridge University Press. Reprint of the 1952 edition.
Harris, T. 1963. The theory of branching processes. Die Grundlehren der Mathematischen Wissenschaften,
Band 119. Springer-Verlag.
Hatami, H., Lovász, L., and Szegedy, B. 2014. Limits of locally-globally convergent graph sequences.
Geom. Funct. Anal., 24(1), 269–296.
Heydenreich, M., Hulshof, T., and Jorritsma, J. 2017. Structures in supercritical scale-free percolation. Ann.
Appl. Probab., 27(4), 2569–2604.
Hill, B. M. 1975. A simple general approach to inference about the tail of a distribution. Ann. Statist., 3(5),
1163–1174.
Hirsch, C. 2017. From heavy-tailed Boolean models to scale-free Gilbert graphs. Braz. J. Probab. Statist.,
31(1), 111–143.
van der Hofstad, R. 2017. Random graphs and complex networks. Volume 1. Cambridge Series in Statistical
and Probabilistic Mathematics. Cambridge University Press.
van der Hofstad, R. 2021. The giant in random graphs is almost local. arXiv:2103.11733 [math.PR].
van der Hofstad, R., and Komjáthy, J. 2017. Explosion and distances in scale-free percolation.
arXiv:1706.02597 [math.PR].
van der Hofstad, R., and Komjáthy, J. 2017. When is a scale-free graph ultra-small? J. Statist. Phys., 169(2),
223–264.
van der Hofstad, R., and Litvak, N. 2014. Degree–degree dependencies in random graphs with heavy-tailed
degrees. Internet Math., 10(3-4), 287–334.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. 2005. Distances in random graphs with finite
variance degrees. Random Structures Algorithms, 27(1), 76–123.
van der Hofstad, R., Hooghiemstra, G., and Znamenski, D. 2007a. Distances in random graphs with finite
mean and infinite variance degrees. Electron. J. Probab., 12(25), 703–766 (electronic).
van der Hofstad, R., Hooghiemstra, G., and Znamenski, D. 2007b. A phase transition for the diameter of
the configuration model. Internet Math., 4(1), 113–128.
van der Hofstad, R., van Leeuwaarden, J. S. H., and Stegehuis, C. 2017. Hierarchical configuration model.
Internet Math. arXiv:1512.08397 [math.PR].
van der Hofstad, R., Komjáthy, J., and Vadon, V. 2021. Random intersection graphs with communities. Adv.
Appl. Probab., 53(4), 1061–1089.
van der Hofstad, R., Komjáthy, J., and Vadon, V. 2022. Phase transition in random intersection graphs with
communities. Random Structures Algorithms, 60(3), 406–461.
van der Hofstad, R., van der Hoorn, P., and Maitra, N. 2023. Local limits of spatial inhomogeneous random
graphs. Adv. Appl. Probab., 1–48.
Holland, P., Laskey, K., and Leinhardt, S. 1983. Stochastic blockmodels: first steps. Social Netw., 5(2),
109–137.
Holme, P. 2019. Rare and everywhere: perspectives on scale-free networks. Nature Commun., 10(1), 1016.
van der Hoorn, P., and Olvera-Cravioto, M. 2018. Typical distances in the directed configuration model.
Ann. Appl. Probab., 28(3), 1739–1792.
Howes, N. 1995. Modern analysis and topology. Universitext. Springer-Verlag.
Jacob, E., and Mörters, P. 2015. Spatial preferential attachment networks: power laws and clustering coef-
ficients. Ann. Appl. Probab., 25(2), 632–662.
Jacob, E., and Mörters, P. 2017. Robustness of scale-free spatial networks. Ann. Probab., 45(3), 1680–1722.
Janson, S. 2004. Functional limit theorems for multitype branching processes and generalized Pólya urns.
Stochastic Process. Appl., 110(2), 177–245.
Janson, S. 2008. The largest component in a subcritical random graph with a power law degree distribution.
Ann. Appl. Probab., 18(4), 1651–1668.
Janson, S. 2009. Standard representation of multivariate functions on a general probability space. Electron.
Commun. Probab., 14, 343–346.
Janson, S. 2010a. Asymptotic equivalence and contiguity of some random graphs. Random Structures
Algorithms, 36(1), 26–45.
Janson, S. 2010b. Susceptibility of random graphs with given vertex degrees. J. Combin., 1(3-4), 357–387.
Janson, S. 2011. Probability asymptotics: notes on notation. arXiv:1108.3924 [math.PR].
Janson, S. 2020a. Asymptotic normality in random graphs with given vertex degrees. Random Structures
Algorithms, 56(4), 1070–1116.
Janson, S. 2020b. Random graphs with given vertex degrees and switchings. Random Structures and
Algorithms, 57(1), 3–31.
Janson, S., and Luczak, M. 2007. A simple solution to the k-core problem. Random Structures Algorithms,
30(1-2), 50–62.
Janson, S., and Luczak, M. 2008. Asymptotic normality of the k-core in random graphs. Ann. Appl. Probab.,
18(3), 1085–1137.
Janson, S., and Luczak, M. 2009. A new approach to the giant component problem. Random Structures
Algorithms, 34(2), 197–216.
Janson, S., Łuczak, T., and Ruciński, A. 2000. Random graphs. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley-Interscience.
Janssen, J., Prałat, P., and Wilson, R. 2016. Nonuniform distribution of nodes in the spatial preferential
attachment model. Internet Math., 12(1-2), 121–144.
Jaworski, J., Karoński, M., and Stark, D. 2006. The degree of a typical vertex in generalized random
intersection graph models. Discrete Math., 306(18), 2152–2165.
Jaynes, E. T. 1957. Information theory and statistical mechanics. Phys. Rev., 106(2), 620–630.
Jonasson, J. 2009. Invariant random graphs with iid degrees in a general geography. Probab. Theory Rel.
Fields, 143(3-4), 643–656.
Jordan, J. 2010. Degree sequences of geometric preferential attachment graphs. Adv. Appl. Probab., 42(2),
319–330.
Jordan, J. 2013. Geometric preferential attachment in non-uniform metric spaces. Electron. J. Probab., 18,
no. 8, 15.
Jordan, J., and Wade, A. 2015. Phase transitions for random geometric preferential attachment graphs. Adv.
Appl. Probab., 47(2), 565–588.
Jorritsma, J., and Komjáthy, J. 2022. Distance evolutions in growing preferential attachment graphs. Ann.
Appl. Probab., 32(6), 4356–4397.
Jorritsma, J., Komjáthy, J., and Mitsche, D. 2023. Cluster-size decay in supercritical kernel-based spatial
random graphs. arXiv:2303.00724 [math.PR].
Kallenberg, O. 2002. Foundations of modern probability. Second edn. Springer.
Kallenberg, O. 2017. Random measures, theory and applications. Probability Theory and Stochastic Mod-
elling, vol. 77. Springer.
Karoński, M., Scheinerman, E., and Singer-Cohen, K. 1999. On random intersection graphs: the subgraph
problem. Combin. Probab. Comput., 8(1-2), 131–159.
Karp, R.M. 1990. The transitive closure of a random digraph. Random Structures Algorithms, 1(1), 73–93.
Karrer, B., and Newman, M. E. J. 2011. Stochastic blockmodels and community structure in networks.
Phys. Rev. E, 83(1), 016107.
Kass, R.E., and Wasserman, L. 1996. The selection of prior distributions by formal rules. J. Amer. Statist.
Assoc., 91(435), 1343–1370.
Kesten, H. 1982. Percolation theory for mathematicians. Progress in Probability and Statistics, vol. 2.
Birkhäuser.
Kesten, H., and Stigum, B. P. 1966. A limit theorem for multidimensional Galton-Watson processes. Ann.
Math. Statist., 37, 1211–1223.
Kingman, J. F. C. 1975. The first birth problem for an age-dependent branching process. Ann. Probab.,
3(5), 790–801.
Kiwi, M., and Mitsche, D. 2015. A bound for the diameter of random hyperbolic graphs. Pages 26–39
of: 2015 Proceedings of the 12th Workshop on Analytic Algorithmics and Combinatorics (ANALCO).
SIAM.
Kiwi, M., and Mitsche, D. 2019. On the second largest component of random hyperbolic graphs. SIAM J.
Discrete Math., 33(4), 2200–2217.
Komjáthy, J., and Lodewijks, B. 2020. Explosion in weighted hyperbolic random graphs and geometric inhomogeneous random graphs. Stochastic Process. Appl., 130(3), 1309–1367.
Krioukov, D., Papadopoulos, F., Kitsak, M., Vahdat, A., and Boguñá, M. 2010. Hyperbolic geometry of
complex networks. Phys. Rev. E, 82(3), 036106, 18.
Krioukov, D., Kitsak, M., Sinkovits, R., Rideout, D., Meyer, D., and Boguñá, M. 2012. Network cosmology.
Sci. Rep., 2.
Krzakala, F., Moore, C., Mossel, E., Neeman, J., Sly, A., Zdeborová, L., and Zhang, P. 2013. Spectral
redemption in clustering sparse networks. Proc. National Acad. Sci., 110(52), 20935–20940.
Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. 2000. Stochastic
models for the Web graph. Pages 57–65 of: Proceedings of the 42nd Annual IEEE Symposium on
Foundations of Computer Science.
Kunegis, J. 2013. KONECT: the Koblenz network collection. Pages 1343–1350 of: Proceedings of the
22nd International Conference on World Wide Web.
Kunegis, J. 2017. The Koblenz network collection.
Kurauskas, V. 2022. On local weak limit and subgraph counts for sparse random graphs. J. Appl. Probab.,
59(3), 755–776.
Last, G., and Penrose, M. 2018. Lectures on the Poisson process. Institute of Mathematical Statistics
Textbooks, vol. 7. Cambridge University Press.
Lee, J., and Olvera-Cravioto, M. 2020. PageRank on inhomogeneous random digraphs. Stochastic Process.
Appl., 130(4), 2312–2348.
Leskelä, L. 2019. Random graphs and network statistics. Available at http://math.aalto.fi/~lleskela/LectureNotes004.html.
Leskovec, J., and Krevl, A. 2014 (Jun). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.
Leskovec, J., Kleinberg, J., and Faloutsos, C. 2007. Graph evolution: densification and shrinking diameters.
ACM Trans. Knowledge Discovery from Data (TKDD), 1(1), 2.
Leskovec, J., Lang, K., Dasgupta, A., and Mahoney, M. 2009. Community structure in large networks:
natural cluster sizes and the absence of large well-defined clusters. Internet Math., 6(1), 29–123.
Litvak, N., and van der Hofstad, R. 2013. Uncovering disassortativity in large scale-free networks. Phys.
Rev. E, 87(2), 022801.
Lo, T. Y. Y. 2021. Weak local limit of preferential attachment random trees with additive fitness.
arXiv:2103.00900 [math.PR].
Lovász, L. 2012. Large networks and graph limits. American Mathematical Society Colloquium Publica-
tions, vol. 60. American Mathematical Society, Providence, RI.
Łuczak, T. 1992. Sparse random graphs with a given degree sequence. Pages 165–182 of: Random graphs,
Vol. 2 (Poznań, 1989). Wiley.
Lyons, R. 2005. Asymptotic enumeration of spanning trees. Combin. Probab. Comput., 14(4), 491–522.
Manna, S., and Sen, P. 2002. Modulated scale-free network in Euclidean space. Phys. Rev. E, 66(6), 066114.
Massoulié, L. 2014. Community detection thresholds and the weak Ramanujan property. Pages 694–703
of: Proceedings of the 2014 ACM Symposium on Theory of Computing. ACM.
McKay, B. D. 1981. Subgraphs of random graphs with specified degrees. Congressus Numerantium, 33,
213–223.
McKay, B. D. 2011. Subgraphs of random graphs with specified degrees. In: Proceedings of the Interna-
tional Congress of Mathematicians 2010. Hindustan Book Agency.
McKay, B. D., and Wormald, N. 1990. Asymptotic enumeration by degree sequence of graphs of high
degree. European J. Combin., 11(6), 565–580.
Meester, R., and Roy, R. 1996. Continuum percolation. Cambridge Tracts in Mathematics, vol. 119.
Cambridge University Press.
Milewska, M., van der Hofstad, R., and Zwart, B. 2023. Dynamic random intersection graph: Dynamic
local convergence and giant structure. arXiv: 2308.15629 [math.PR].
Molloy, M., and Reed, B. 1995. A critical point for random graphs with a given degree sequence. Random
Structures Algorithms, 6(2-3), 161–179.
Molloy, M., and Reed, B. 1998. The size of the giant component of a random graph with a given degree
sequence. Combin. Probab. Comput., 7(3), 295–305.
Molloy, M., Surya, E., and Warnke, L. 2022. The degree-restricted random process is far from uniform.
arXiv:2211.00835v1 [math.CO].
Moore, C., and Newman, M. E. J. 2000. Epidemics and percolation in small-world networks. Phys. Rev. E,
61, 5678–5682.
Mossel, E., Neeman, J., and Sly, A. 2015. Reconstruction and estimation in the planted partition model.
Probab. Theory Rel. Fields, 162(3-4), 431–461.
Mossel, E., Neeman, J., and Sly, A. 2016. Belief propagation, robust reconstruction and optimal recovery
of block models. Ann. Appl. Probab., 26(4), 2211–2256.
Mossel, E., Neeman, J., and Sly, A. 2018. A proof of the block model threshold conjecture. Combinatorica,
38(3), 665–708.
Mucha, P. J., Richardson, T., Macon, K., Porter, M. A., and Onnela, J.-P. 2010. Community structure in
time-dependent, multiscale, and multiplex networks. Science, 328(5980), 876–878.
Nair, J., Wierman, A., and Zwart, B. 2022. The fundamentals of heavy tails: properties, emergence, and
estimation. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
Newman, M. E. J. 2003. Properties of highly clustered networks. Phys. Rev. E, 68(2), 026121.
Newman, M. E. J. 2009. Random graphs with clustering. Phys. Rev. Lett., 103(Jul), 058701.
Newman, M. E. J. 2010. Networks: an introduction. Oxford University Press.
Newman, M. E. J., and Park, J. 2003. Why social networks are different from other types of networks.
Phys. Rev. E, 68(3), 036122.
Newman, M. E. J., and Watts, D. J. 1999. Scaling and percolation in the small-world network model. Phys.
Rev. E, 60, 7332–7344.
Newman, M. E. J., Moore, C., and Watts, D. J. 2000a. Mean-field solution of the small-world network
model. Phys. Rev. Lett., 84, 3201–3204.
Newman, M. E. J., Strogatz, S., and Watts, D. 2000b. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E, 64, 026118, 1–17.
Newman, M. E. J., Strogatz, S., and Watts, D. 2002. Random graph models of social networks. Proc.
National Acad. Sci., 99, 2566–2572.
Newman, M. E. J., Watts, D. J., and Barabási, A.-L. 2006. The structure and dynamics of networks. Princeton Studies in Complexity. Princeton University Press.
Norros, I., and Reittu, H. 2006. On a conditionally Poissonian graph process. Adv. Appl. Probab., 38(1),
59–75.
Noutsos, D. 2006. On Perron–Frobenius property of matrices having some negative entries. Linear Algebra
Appl., 412(2-3), 132–153.
O’Connell, N. 1998. Some large deviation results for sparse random graphs. Probab. Theory Rel. Fields,
110(3), 277–285.
Parthasarathy, K. R. 1967. Probability measures on metric spaces. Probability and Mathematical Statistics,
No. 3. Academic Press.
Pemantle, R. 2007. A survey of random processes with reinforcement. Probab. Surv., 4, 1–79 (electronic).
Pickands III, J. 1968. Moment convergence of sample extremes. Ann. Math. Statist., 39, 881–889.
Pittel, B. 1994. Note on the heights of random recursive trees and random m-ary search trees. Random
Structures Algorithms, 5(2), 337–347.
Price, D. J. de Solla. 1965. Networks of scientific papers. Science, 149, 510–515.
Price, D. J. de Solla. 1986. Little science, big science... and beyond. Columbia University Press.
Reittu, H., and Norros, I. 2004. On the power law random graph model of massive data networks. Performance Evaluation, 55(1-2), 3–23.
Resnick, S. 2007. Heavy-tail phenomena. Springer Series in Operations Research and Financial Engineering. Springer. (Probabilistic and statistical modeling).
Riordan, O., and Warnke, L. 2011. Explosive percolation is continuous. Science, 333(6040), 322–324.
Riordan, O., and Wormald, N. 2010. The diameter of sparse random graphs. Combin. Probab. Comput., 19(5-6), 835–926.
Ross, S. M. 1996. Stochastic processes. Second edn. Wiley Series in Probability and Statistics. John Wiley
and Sons.
Ruciński, A., and Wormald, N. 2002. Connectedness of graphs generated by a random d-process. J. Aust.
Math. Soc., 72(1), 67–85.
Rudas, A., Tóth, B., and Valkó, B. 2007. Random trees and general branching processes. Random Structures
Algorithms, 31(2), 186–202.
Rudin, W. 1987. Real and complex analysis. McGraw-Hill.
Rudin, W. 1991. Functional analysis. International Series in Pure and Applied Mathematics. McGraw-Hill.
Rybarczyk, K. 2011. Diameter, connectivity, and phase transition of the uniform random intersection graph.
Discrete Math., 311(17), 1998–2019.
Salez, J. 2013. Weighted enumeration of spanning subgraphs in locally tree-like graphs. Random Structures
Algorithms, 43(3), 377–397.
Schuh, H.-J., and Barbour, A. D. 1977. On the asymptotic behaviour of branching processes with infinite
mean. Adv. Appl. Probab., 9(4), 681–723.
Seneta, E. 1973. The simple branching process with infinite mean. I. J. Appl. Probab., 10, 206–212.
Seneta, E. 1974. Regularly varying functions in the theory of simple branching processes. Adv. Appl.
Probab., 6, 408–420.
Shannon, C. E. 1948. A mathematical theory of communication. Bell System Tech. J., 27, 379–423, 623–
656.
Shepp, L. A. 1989. Connectedness of certain random graphs. Israel J. Math., 67(1), 23–33.
Shore, J. E., and Johnson, R. W. 1980. Axiomatic derivation of the principle of maximum entropy and the
principle of minimum cross-entropy. IEEE Trans. Inform. Theory, 26(1), 26–37.
Simon, H. A. 1955. On a class of skew distribution functions. Biometrika, 42, 425–440.
Singer, K. 1996. Random intersection graphs. ProQuest LLC, Ann Arbor, MI. PhD Thesis, The Johns
Hopkins University.
Smythe, R., and Mahmoud, H. 1994. A survey of recursive trees. Teor. Ĭmovīr. Mat. Statist., 1–29.
Snijders, T. A., Pattison, P., Robins, G., and Handcock, M. 2006. New specifications for exponential random graph models. Sociological Methodology, 36(1), 99–153.
Söderberg, B. 2002. General formalism for inhomogeneous random graphs. Phys. Rev. E, 66(6), 066121,
6.
Söderberg, B. 2003a. Properties of random graphs with hidden color. Phys. Rev. E, 68(2), 026107, 12.
Söderberg, B. 2003b. Random graph models with hidden color. Acta Phys. Polonica B, 34, 5085–5102.
Söderberg, B. 2003c. Random graphs with hidden color. Phys. Rev. E, 68(1), 015102, 4.
Sönmez, E. 2021. Graph distances of continuum long-range percolation. Braz. J. Probab. Statist., 35(3),
609–624.
Spencer, J., and Wormald, N. 2007. Birth control for giants. Combinatorica, 27(5), 587–628.
Stark, D. 2004. The vertex degree distribution of random intersection graphs. Random Structures Algorithms, 24(3), 249–258.
Stegehuis, C., van der Hofstad, R., and van Leeuwaarden, J. S. H. 2016a. Epidemic spreading on complex
networks with community structures. Sci. Rep., 6, 29748.
Stegehuis, C., van der Hofstad, R., and van Leeuwaarden, J. S. H. 2016b. Power-law relations in random
networks with communities. Phys. Rev. E, 94, 012302.
Sundaresan, S., Fischhoff, I., Dushoff, J., and Rubenstein, D. 2007. Network metrics reveal differences in
social organization between two fission–fusion species, Grevy’s zebra and onager. Oecologia, 151(1),
140–149.
Turova, T. S. 2011. The largest component in subcritical inhomogeneous random graphs. Combin. Probab. Comput., 20(1), 131–154.
Turova, T. S., and Vallier, T. 2010. Merging percolation on Z^d and classical random graphs: phase transition.
Random Structures Algorithms, 36(2), 185–217.
Ugander, J., Karrer, B., Backstrom, L., and Marlow, C. 2011. The anatomy of the Facebook social graph.
arXiv:1111.4503 [cs.SI].
Vadon, V., Komjáthy, J., and van der Hofstad, R. 2019. A new model for overlapping communities with
arbitrary internal structure. Applied Network Science, 4(1), 42.
Voitalov, I., van der Hoorn, P., van der Hofstad, R., and Krioukov, D. 2019. Scale-free networks well done.
Phys. Rev. Res., 1(3), 033034.
Wang, D., Song, C., and Barabási, A. L. 2013. Quantifying long-term scientific impact. Science, 342(6154),
127–132.
Wang, J., Mei, Y., and Hicks, D. 2014. Comment on “Quantifying long-term scientific impact”. Science, 345(6193), 149.
Wang, M., Yu, G., and Yu, D. 2008. Measuring the preferential attachment mechanism in citation networks.
Physica A: Statist. Mech. Applic., 387(18), 4692–4698.
Wang, M., Yu, G., and Yu, D. 2009. Effect of the age of papers on the preferential attachment in citation
networks. Physica A: Statist. Mech. Applic., 388(19), 4273–4276.
Wasserman, S., and Pattison, P. 1996. Logit models and logistic regressions for social networks. Psychometrika, 61(3), 401–425.
Watts, D. J. 1999. Small worlds. The dynamics of networks between order and randomness. Princeton
Studies in Complexity. Princeton University Press.
Watts, D. J. 2003. Six degrees. The science of a connected age. W. W. Norton & Co.
Watts, D. J., and Strogatz, S. H. 1998. Collective dynamics of ‘small-world’ networks. Nature, 393, 440–
442.
Wong, L. H., Pattison, P., and Robins, G. 2006. A spatial model for social networks. Physica A: Statist.
Mech. Applic., 360(1), 99–120.
Wormald, N. 1981. The asymptotic connectivity of labelled regular graphs. J. Combin. Theory Ser. B,
31(2), 156–167.
Wormald, N. 1999. Models of random regular graphs. Pages 239–298 of: Surveys in combinatorics, 1999
(Canterbury). London Math. Soc. Lecture Note Series, vol. 267. Cambridge University Press.
Yukich, J. E. 2006. Ultra-small scale-free geometric networks. J. Appl. Probab., 43(3), 665–677.
Yule, G. U. 1925. A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S.
Phil. Trans. Roy. Soc. London, B, 213, 21–87.
Zhao, Y., Levina, E., and Zhu, J. 2012. Consistency of community detection in networks under degree-
corrected stochastic block models. Ann. Statist., 40(4), 2266–2292.
Zuev, K., Boguñá, M., Bianconi, G., and Krioukov, D. 2015. Emergence of soft communities from geometric preferential attachment. Sci. Rep., 5, 9421.
GLOSSARY
INDEX
  hub, 7
  log–log plot degree distribution, 7
  scale-free, 6, 7
  scale-free nature, 5
  “scale-free networks are rare”, 7
  small-world phenomenon, 11
  spatial structure, 14
  super-spreader, 7
Repeated configuration model, 25
Rooted graph
  isomorphism, 49
  metric, 49
  neighborhood, 48
Scale-free percolation, 444, 456
  degree, 445
  small-world nature, 447
  ultra-small-world nature, 447
Scale-free tree, 328
  diameter, 328
  height, 328
  typical distance, 328
Self-avoiding path, 251
Size-biased distribution, 121
Small world, 247
Small-world model, 429
  continuous circle model, 429
  small-world nature, 429, 430
Small-world nature
  Chung–Lu model, 247
  configuration model, 291
  Erdős–Rényi random graph, 86
  generalized random graph, 247
  inhomogeneous random graph, 246
  Norros–Reittu model, 247
  preferential attachment model, 336
  scale-free percolation, 447
  small-world model, 429, 430
Sparse network, 5
Spatial configuration model, 447
  matching, 448
Spatial preferential attachment model, 440
  degree distribution, 443
Spatial random graph, 428
  clustering, 428
Spatial structure, 14
Stochastic domination, 40, 121
Theorem
  de Finetti, 190
  Helly, 192
  Perron–Frobenius, 110
  Potter, 37
Tightness, 469
  general metric space, 469
Tree, 41
  exploration, 41
  height, 328
  ordered, 41
  rooted, 41
  Ulam–Harris labeling, 41
2-core, 313
Two-regular graph
  diameter, 324
  longest cycle, 324
Typical distance, 13
Ultra-small distance
  power-iteration for configuration model, 300, 303
Ultra-small world, 247, 248
Ultra-small-world nature
  Chung–Lu model, 248
  configuration model, 291
  generalized random graph, 248
  geometric inhomogeneous random graph, 439
  hyperbolic random graph, 433
  Norros–Reittu model, 248
  preferential attachment model, 337, 338
  scale-free percolation, 447
Uniform integrability, 67
Uniform random graph with prescribed degrees, 27
  edge probabilities, 28
  giant component, 171
  switching algorithm, 27
  using configuration model, 25
Uniform recursive trees, 378
Universality, 37
  typical distances, 338
With high probability, 40