RANDOM GRAPHS AND COMPLEX NETWORKS
Volume 2
PREFACE
Targets. In this book, which is Volume 2 of a sequence of two books, we study local limits,
connected components, and small-world properties of random graph models for complex
networks. Volume 1 describes the preliminaries of random graphs as models for real-world
networks, as investigated since 1999. These networks turned out to be rather different from
classical random graph models, for example in the number of connections that the elements
make. As a result, a wealth of new models was invented to capture these properties. Volume
1 studies these models as well as their degree structure. Volume 2 summarizes the insights
developed in this exciting period related to the local, connectivity, and small-world structure
of the proposed random graph models. While Volume 1 is intended to be used for a master-level course, where students have limited prior knowledge of special topics in probability,
Volume 2 describes the more involved notions that have been the focus of attention of the
research community in the past two decades.
Volume 2 is intended to be used for a PhD level course, a reading seminar, or for re-
searchers wishing to obtain a consistent and extended overview of the results and method-
ologies developed in this scientific area. Volume 1 includes many of the preliminaries, such
as the convergence of random variables, probabilistic bounds, coupling, martingales, and
branching processes, and we frequently rely on these results.
The sequence of Volumes 1 and 2 aims to be self-contained. In Volume 2, we briefly repeat
some of the preliminaries on random graphs, including an introduction to the key models and
their degree distributions, as discussed in detail in Volume 1. In Volume 2, we aim to give
detailed and complete proofs. When we do not give proofs, we provide heuristics, as well
as extensive pointers to the literature. We further discuss several more recent random graph
models that aim to more realistically model real-world networks, as they incorporate their
directed nature, their community structure, and/or their spatial embedding.
Developments. The field of random graphs was pioneered in 1959–1960 by Erdős and Rényi
(1959; 1960; 1961a; 1961b), in the context of the probabilistic method. The initial work by
Erdős and Rényi incited a great amount of follow-up in the field, initially mainly in the
combinatorics community. See the standard references on the subject by Bollobás (2001)
and Janson, Łuczak, and Ruciński (2000) for the state of the art. Erdős and Rényi (1960)
gives a rather complete picture of the various phase transitions that occur in the Erdős–Rényi
random graph. This initial work did not aim to model real-world networks realistically.
In the period after 1999, owing to the fact that data sets of large real-world networks be-
came abundantly available, their structure has attracted enormous attention in mathematics
as well as in various applied domains. This is exemplified by the fact that one of the first
articles in the field, by Barabási and Albert (1999), has attracted over 40,000 citations. One
of the main conclusions from this overwhelming body of work is that many real-world net-
works share two fundamental properties. The first is that they are highly inhomogeneous, in
the sense that different vertices play rather different roles in the networks. This property is
exemplified by the degree structure of the real-world networks obeying power laws: these
networks are scale-free. This scale-free nature of real-world networks has prompted the
community to come up with many novel random graph models that, unlike the Erdős–Rényi
random graph, do have power-law degree sequences. This was the key focus in Volume 1.
Content. In this book, we pick up on the trail left in Volume 1, where we now focus on the
connectivity structure between vertices. Connectivity can be summarized in two key aspects
of real-world networks: the facts that they are highly connected, as exemplified by the fact
that they tend to have one giant component containing a large proportion of the vertices (if
not all of them), and that they are small world, in that most pairs of vertices are separated by
short paths. We discuss the available methods for these proofs, including path-counting tech-
niques, branching-process approximations, exchangeable random variables, and de Finetti’s
theorems. We pay particular attention to a recent technique, called local convergence, that
makes the statement that random graphs “locally look like trees” precise.
This book consists of four parts. In Part I, consisting of Chapters 1 and 2, we start in
Chapter 1 by repeating some definitions from Volume 1, including the random graph mod-
els studied in the present book, which are inhomogeneous random graphs, configuration
models, and preferential attachment models. We also discuss general topics that are impor-
tant in random graph theory, such as power-law distributions and their properties. In Chapter
2, we continue by discussing local convergence, an extremely powerful technique that plays
a central role in the theory of random graphs and in this book. In Part II, consisting of Chap-
ters 3–5, we discuss local limits and large connected components in random graph models.
In Chapter 3, we further extend the definition of the generalized random graph to general
inhomogeneous random graphs. In Chapter 4, we discuss the local limit and large connected
components in the configuration model, and in Chapter 5, we discuss the local structure in,
and connectivity of, preferential attachment models. In Part III, consisting of Chapters 6–8,
we study the small-world nature of random graphs, starting with inhomogeneous random
graphs, continuing with the configuration model, and ending with the preferential attach-
ment model. In Part IV, consisting of Chapter 9, we study related random graph models and
their structure.
Along the way, we give many exercises that should help the reader to obtain a deeper
understanding of the material by working on the solutions. These exercises appear in the last
section of each of the chapters, and, when applicable, we refer to them at the appropriate
place in the text. We also provide extensive notes in the penultimate section of each chapter,
where we discuss the links to the literature and some extensions.
Literature. We have tried to give as many references to the literature as possible. However,
the number of papers on random graphs has exploded. In MathSciNet (see www.ams.org/
mathscinet), there were, on December 21, 2006, a total of 1,428 papers that contain
the phrase “random graphs” in the review text; on September 29, 2008, this number had
increased to 1,614, to 2,346 on April 9, 2013; to 2,986 on April 21, 2016; and to 12,038 on
October 5, 2020. These are merely the papers on the topic in the mathematics community.
What is special about random graph theory is that it is extremely multidisciplinary, and many
papers using random graphs are currently written in economics, biology, theoretical physics,
and computer science. For example, in Scopus (see www.scopus.com/scopus/home.
url), again on December 21, 2006, there were 5,403 papers that contain the phrase “random
graph” in the title, abstract or keywords; on September 29, 2008, this had increased to 7,928;
to 13,987 on April 9, 2013; to 19,841 on April 21, 2016; and to 30,251 on October 5, 2020. It
can be expected that these numbers will continue to increase, rendering it utterly impossible
to review all the literature.
In June 2014, we decided to split the preliminary version of this book up into two books.
This has several reasons and advantages, particularly since Volume 2 is more tuned towards
a research audience, while Volume 1 is aimed at an audience of master students of varying
backgrounds. The pdf-versions of both Volumes 1 and 2 can be obtained from
www.win.tue.nl/~rhofstad/NotesRGCN.html.
For errata for this book and Volume 1, or possible outlines for courses based on them, readers
are encouraged to look at this website or e-mail me. Also, for a more playful approach to
networks for a broad audience, including articles, videos, and demos of many of the models
treated in this book, we refer all readers to the NetworkPages at www.networkspages.nl. The NetworkPages provide an interactive website developed by and for all those who
are interested in networks. Finally, we have relied on various real-world networks data sets
provided by the KONECT project; see http://konect.cc as well as Kunegis (2013)
for more details.
Thanks. This book, as well as Volume 1, would not have been possible without the help
and encouragement of many people. I particularly thank Gerard Hooghiemstra for encour-
aging me to write it, and for using it at Delft University of Technology almost simultane-
ously while I was using it at Eindhoven University of Technology in the spring of 2006 and
again in the fall of 2008. I thank Gerard for many useful comments, solutions to exercises,
and suggestions for improvements of the presentation throughout the book. Together with
Piet Van Mieghem, we entered the world of random graphs in 2001, and I have tremen-
dously enjoyed exploring this field together with them, as well as with Henri van den Esker,
Dmitri Znamenski, Mia Deijfen, Shankar Bhamidi, Johan van Leeuwaarden, Júlia Komjáthy,
Nelly Litvak and many others.
I thank Christian Borgs, Jennifer Chayes, Gordon Slade, and Joel Spencer for joint work
on random graphs that are like the Erdős–Rényi random graph but do have geometry. Spe-
cial thanks go to Gordon Slade, who introduced me to the exciting world of percolation,
which is closely linked to the world of random graphs (see the classic text on percolation
by Grimmett (1999)). It is striking to see two communities working on two such closely
related topics with different methods and even different terminology, and it has taken a
long time to build bridges between the two subjects. I am very happy that these bridges
are now rapidly appearing, and the level of communication between different communities
has increased significantly. I hope that this book helps to further enhance this communica-
tion. Frank den Hollander deserves a special mention. Frank, you have been important as a
driving force throughout my career, and I am very happy now to be working with you on
fascinating random graph problems!
Further, I thank
Marie Albenque, Yeganeh Alimohammadi, Rangel Baldasso, Gianmarco Bet,
Shankar Bhamidi, Finbar Bogerd, Marko Boon, Christian Borgs, Hao Can,
Francesco Caravenna, Rui Castro, Kota Chisaki, Deen Colenbrander, Nicolas Curien,
Umberto De Ambroggio, Mia Deijfen, Michel Dekking, Serte Donderwinkel,
Dylan Dronnier, Henri van den Esker, Lorenzo Federico, Federica Finazzi, Allison Fisher,
Lucas Gerin, Cristian Giardinà, Claudia Giberti, Jesse Goodman, Rowel Gündlach,
Rajat Hazra, Markus Heydenreich, Frank den Hollander, Yusuke Ide, Simon Irons,
Emmanuel Jacob, Svante Janson, Guido Janssen, Lancelot James, Martin van Jole,
Joost Jorritsma, Willemien Kets, Heejune Kim, Bas Kleijn, Júlia Komjáthy, Norio Konno,
Dima Krioukov, John Lapeyre, Lasse Leskelä, Nelly Litvak, Neeladri Maitra,
Abbas Mehrabian, Marta Milewska, Steven Miltenburg, Mislav Mišković, Christian Mönch,
Peter Mörters, Mirko Moscatelli, Jan Nagel, Sidharthan Nair, Alex Olssen,
Mariana Olvera-Cravioto, Helena Peña, Manish Pandey, Rounak Ray, Nathan Ross,
Christoph Schumacher, Matteo Sfragara, Karoly Simon, Lars Smolders, Clara Stegehuis,
Dominik Tomecki, Nicola Turchi, Viktória Vadon, Thomas Vallier, Irène Ayuso Ventura,
Xiaotin Yu, Haodong Zhu and Bert Zwart
for remarks and ideas that have improved the content and presentation of these books sub-
stantially. Wouter Kager read the February 2007 version of this book in its entirety, giving
many ideas for improvements in the arguments and the methodology. Artëm Sapozhnikov,
Maren Eckhoff, and Gerard Hooghiemstra read and commented on the October 2011 ver-
sion. Haodong Zhu read the December 2023 version completely, and corrected several typos.
Particular thanks go to Dennis Timmers, Eefje van den Dungen, Joop van de Pol,
Rowel Gündlach and Lourens Touwen, who, as my student assistants, have been a great
help in the development of this pair of books, in making figures, providing solutions to some
of the exercises, checking proofs, and keeping the references up to date. Maren Eckhoff
also provided many solutions to the exercises, for which I am grateful! Sándor Kolumbán,
Robert Fitzner, and Lourens Touwen helped me to turn all pictures of real-world networks
as well as simulations of network models into a unified style, a feat that is beyond my
LaTeX skills. A big thanks for that! Also my thanks for suggestions and help with figures to
Marko Boon, Alessandro Garavaglia, Dimitri Krioukov, Vincent Kusters, Clara Stegehuis,
Piet Van Mieghem, and Yana Volkovich. A special thanks to my running mates Jan and
Ruud, whose continuing support has been extremely helpful for me.
Support. This work would not have been possible without the generous support of the
Netherlands Organization for Scientific Research (NWO) through VIDI grant 639.032.304,
VICI grant 639.033.806, and the Gravitation NETWORKS grant 024.002.003.
POSSIBLE COURSE OUTLINES
[Diagram: possible course outlines, showing routes through the material. The legible boxes are: Random Graphs and Complex Networks Introduction [V1, Chapter 1]; Probabilistic Methods [V1, Chapter 2]; Branching Processes [V1, Chapter 3]; Phase Transition [V1, Chapter 4]; Erdős–Rényi Random Graph Revisited [V1, Chapter 5]; Preferential Attachment Model Introduction [V1, Chapter 8]; Connectivity [V2, Chapter 5]; Small world [V2, Chapter 8]; Related Models [V2, Chapter 9]; with further boxes for the Erdős–Rényi random graph, inhomogeneous random graphs, and the configuration model.]
Here is some more explanation, as well as a possible itinerary of a master- or PhD-level course on random graphs based on Volume 2. For a course outline based on Volume 1, we refer to [V1, Preface]; for alternative routes through the material, we refer to the book's website at www.win.tue.nl/~rhofstad/NotesRGCN.html:
▷ Start with the introduction to real-world networks in [V2, Chapter 1], which forms the
inspiration for what follows. For readers wishing for a more substantial introduction, do visit
Volume 1 for an extensive introduction to the models discussed here.
▷ Continue with [V2, Chapter 2] on the local convergence of (random and non-random)
graphs, as this is a crucial tool in the book and has developed into a key methodology in the
field.
The material in this book is rather substantial, and probably too much to be treated in one
course. Thus, we give two alternative approaches to teaching coherent parts of this book:
▷ You can either take one of the models and discuss the different chapters in Volume 2 that focus on it. [V2, Chapters 3 and 6] discuss inhomogeneous random graphs, [V2, Chapters 4 and 7] discuss configuration models, while [V2, Chapters 5 and 8] focus on preferential attachment models.
▷ The alternative is that you take one of the topics and work through it in detail. [V2,
Part II] discusses the local limits and largest connected components or phase transition in
our random graph models, while [V2, Part III] treats their small-world nature.
If you have further questions and/or suggestions about course outlines, feel free to contact me. Refer to www.win.tue.nl/~rhofstad/NotesRGCN.html for further suggestions on how to lecture from Volume 2.
Part I
Preliminaries
CHAPTER 1
INTRODUCTION AND PRELIMINARIES
Abstract
In this chapter, we draw motivation from real-world networks and formulate
random graph models for them. We focus on some of the models that have re-
ceived the most attention in the literature, namely, Erdős–Rényi random graphs,
inhomogeneous random graphs, configuration models, and preferential attach-
ment models. We follow van der Hofstad (2017), which we refer to as [V1], both
for motivation and for the introduction to the random graph models involved.
… exercises in Section 1.7. We give few references to the literature within this chapter, but defer a
discussion of the history of the various models to the extensive notes in Section 1.6.
1.1 Motivation: Real-World Networks

In the past two decades, an enormous research effort has been devoted to modeling various real-world phenomena using networks. Networks arise in various applications,
ranging from the connections between friends in friendship networks to the connectivity of
neurons in the brain, to the relations between companies and countries in economics, and the
hyperlinks between webpages in the World-Wide Web. The advent of the computer era has
made many network data sets available. Around 1999–2000, various groups started to inves-
tigate network data from an empirical perspective. [V1, Chapter 1] gives many examples of
real-world networks and the empirical findings from them. Here we give some basics.
where, for a set A, we write |A| for its size. Exercise 1.1 asks you to prove (1.1.3).
The average degree in a network is equal to
\[
\frac{1}{|V(G)|} \sum_{v \in V(G)} d_v^{(G)} = \frac{2|E(G)|}{|V(G)|}. \tag{1.1.4}
\]
[Figure 1.1: average degrees of the networks in the KONECT data base (vertical axis: average degree, log scale); compare Figure 1.2.]
\[
n_k \approx c_n k^{-\tau}, \tag{1.1.6}
\]
and thus
\[
\log n_k \approx \log c_n - \tau \log k, \tag{1.1.7}
\]
Figure 1.2 Maximal degrees in the 727 networks of size larger than 10,000 from
the KONECT data base. Linear regression gives $\log d_{\max} = 0.742 + 0.519 \log n$.
so that the plot of $\log k \mapsto \log n_k$ is close to a straight line. This is the reason why degree sequences in networks are often depicted in a log–log fashion, rather than in the more customary form of $k \mapsto n_k$. Here, and in the remainder of this section, we write $\approx$ to denote an uncontrolled approximation. The power-law exponent $\tau$ can be estimated by the absolute value of the slope of the line in the log–log plot. Naturally, we must have that
\[
\sum_k n_k = |V(G_n)| < \infty, \tag{1.1.8}
\]
laws with exponents $\tau$ satisfying $\tau \in (2,3)$, so that random variables with such degrees have infinite variance. Since maximal degrees of networks of size $n$ can be expected to grow as $n^{1/(\tau-1)}$ (see Exercise 1.2 for an illuminating example), Figure 1.2 suggests that, on average, $1/(\tau-1) \approx 0.519$, so that, again on average, $\tau \approx 2.93$, which is in line with such predictions.
For the Internet, log–log plots of degree sequences first appeared in a paper by the Falout-
sos brothers (1999) (see Figure 1.3(b) for the degree sequence in the Autonomous Systems
graph, where the degree distribution looks relatively smooth because it is binned). Here,
the power-law exponent is estimated as τ ≈ 2.15–2.20. Figure 1.3(a) displays the degree
distribution in the Internet Movie Data base (IMDb), in which the vertices are actors and
two actors are connected when they have acted together in a movie. Figure 1.4 displays the
degree sequences of both the in-degrees and the out-degrees in various World-Wide Web data bases.
Figure 1.3 (a) Log–log plot of the degree sequence in the 2007 Internet Movie Data base. (b) Log–log plot of the probability mass function of the Autonomous Systems degree sequence in April 2014, from Krioukov et al. (2012) (data courtesy of Dmitri Krioukov). This degree distribution looks smoother than others (see, e.g., Figures 1.3(a) and 1.4), due to binning of the data.
Figure 1.4 The probability mass function of the in- and out-degree sequences in
the Berkeley-Stanford and Google competition graph data sets of the World Wide
Web in Leskovec et al. (2009). (a) In-degree; (b) out-degree.
Table 1.1 For comparison, fits of scale-free and alternative distributions to real-world networks
taken from (Broido and Clauset, 2019, Table 1). Listed are the percentage of network data sets that
favor the power-law model MPL , the alternative model MAlt , or neither, under a likelihood-ratio test,
along with the form of the alternative distribution indicated by the alternative density $x \mapsto f(x)$.
wrote a blog post containing detailed criticism of the methods and results in Broido and
Clauset (2019), see also Voitalov et al. (2019). Holme (2019) summarized the status of the
arguments in 2019, reaching an almost philosophical conclusion:
Still, it often feels like the topic of scale-free networks transcends science – debating them
probably has some dimension of collective soul searching as our field slowly gravitates
toward data science, away from complexity science.
So, what did the discussion focus on? Here is a list of questions:
What are power-law data? An important question in the discussion on power-law degree
distributions is how to interpret the approximation sign in (1.1.9). Most approaches start
by assuming that the data are realizations of independent and identically distributed (iid)
random variables. This can only be an assumption, as degree distributions are mostly
graphical (meaning that they can arise as degree sequences of graphs without self-loops
and multiple edges), which introduces dependencies between them (if only because the
sum of the degrees needs to be even). However, without this assumption, virtually any
analysis becomes impossible, so let us assume this as well.
Under the above assumption, one needs to infer the degree distribution from the sample
of degrees obtained from a real-world network. We denote the asymptotic degree distri-
bution by pk , i.e., the proportion of vertices of degree k in the infinite-graph limit. Under
this assumption, $p_k^{(G_n)}$ in (1.1.9) is the empirical probability mass function corresponding to the true underlying degree distribution $(p_k)_{k \ge 0}$. The question is thus what probability
mass functions (pk )k≥0 correspond to a power law.
Broido and Clauset (2019) interpreted the power-law assumption as
\[
p_k = c\,k^{-\tau} \qquad \text{for all } k \ge k_{\min}, \tag{1.1.10}
\]
and $p_k$ arbitrary for $k \in [k_{\min} - 1]$; here $c > 0$ is chosen appropriately. The inclusion of the $k_{\min}$ parameter is based on the observation that small values of $k$ generally do not satisfy the pure power law (see also Clauset et al. (2009), where (1.1.10) first appeared).
Barabási (2018) instead argued from the perspective of generative models (such as the
preferential attachment models described in Section 1.3.5, as well as in Chapters 5 and
8):
In other words, by 2001 it was pretty clear that there is no one-size-fits-all formula for
the degree distribution for networks driven by the scale-free mechanism. A pure power
law only emerges in simple idealised models, driven by only growth and preferential
attachment, and free of any additional effects.
Bear in mind that this dynamical approach is very different from that of Broido and
Clauset (2019), as the degrees in generative models can hardly be expected to be real-
izations of an iid sample! Barabási (2018) instead advocated a theory that predicts power
laws with exponential truncation for many settings, meaning that
\[
p_k = c\,k^{-\tau} \mathrm{e}^{-Ak} \qquad \text{for all } k \ge d_{\min}, \tag{1.1.11}
\]
where $d_{\min}$ denotes the minimal degree in the graph and $c, A > 0$ are appropriate con-
stants, but the theory also allows for “additional effects,” such as vertex fitnesses that
describe intrinsic differences in how likely it is to connect to vertices, and that may be
realistic in some real-world networks.
Voitalov et al. (2019) took a static approach related to that of Broido and Clauset
(2019), but instead assumed more general power laws of the form
\[
1 - F(x) = \sum_{k > x} p_k = x^{-(\tau - 1)} L(x) \qquad \text{for all } x \ge 1, \tag{1.1.12}
\]
where $x \mapsto L(x)$ is a so-called slowly varying function, meaning a function that does not
change the power-law exponent, in that it grows or decays more slowly than any power
at infinity. See [V1, Definition 1.5], or Definition 1.19 below, for a precise definition. In
particular, distributions that satisfy (1.1.10) also satisfy (1.1.12), but not necessarily the
other way around.
The advantage of working with (1.1.12) is that this definition is quite general, yet a
large body of work within the extreme-value statistics community becomes available.
These results, as summarized in Voitalov et al. (2019), allow for the “most accurate”
ways of estimating the power-law exponent τ , which brings us to the next question.
How to estimate the power-law exponent? Since Broido and Clauset (2019) interpreted
the power-law assumption as in (1.1.10), estimating the model parameters then boiled
down to estimating kmin and τ . For this, Broido and Clauset (2019) relied on the first pa-
per on estimating power-law exponents in the area of networks, by Clauset et al. (2009),
who proposed the power-law-fit method (PLFit). This method chooses the best possible $k_{\min}$ on the basis of the difference between the empirical degree distribution for values above $k_{\min}$ and the power-law distribution function based on (1.1.10), with an appropriately estimated value $\hat\tau$ of $\tau$, as proposed by Hill (1975), for realizations above $k_{\min}$. The estimator $\hat\tau_{\mathrm{PLFit}}$ is then the estimator of $\tau$ corresponding to the optimal $k_{\min}$.
The PLFit method was recently proved to be a consistent method by Bhattacharya et al. (2020), which means that the estimator will, in the limit, converge in probability to the correct value $\tau$, even under the weaker assumption in (1.1.12). Of course, the question remains whether $\hat\tau_{\mathrm{PLFit}}$ is a good estimator, for example in the sense that the rate of convergence of $\hat\tau_{\mathrm{PLFit}}$ to $\tau$ is optimal. The results and simulations in Drees et al. (2020) suggest that, even in the case of a pure power law as in (1.1.10) with $k_{\min} = 1$, $\hat\tau_{\mathrm{PLFit}}$ is outperformed by more classical estimators (such as the maximum likelihood estimator for $\tau$). Voitalov et al. (2019) rely on the estimators proposed in the extreme-value lit-
erature; see e.g. Danielsson et al. (2001); Draisma et al. (1999); Hall and Welsh (1984)
for such methods and Resnick (2007); Beirlant et al. (2006) for extensive overviews of
extreme-value statistics.
The dynamical approach by Barabási (2018) instead focusses on estimating the parameters in the proposed dynamical models, a highly interesting topic that is beyond the scope of this book.
How to perform tests? When confronted with a model, or with two competing models
such as in Table 1.1, a statistician would often like to compare the fit of these models
to the data, so as to be able to choose between them. When both models are parametric,
meaning that they involve a finite number of parameters, like the models in Table 1.1,
this can be done using a so-called likelihood-ratio test. For this, one computes the likeli-
hood of the data (basically the probability that the model in question gives rise to exactly
what was found in the data) for each of the models, and then takes the ratio of the two
likelihoods. In the settings in Table 1.1, this means that the likelihood of the data for the
power-law model is divided by that for the alternative model. When this exceeds a certain
threshold, the test does not reject the possibility that the data comes from a power law,
otherwise it rejects the null hypothesis of a power-law degree distribution. This is done for
each of the networks in the data base, and Table 1.1 indicates the percentages for which
each of the models is deemed the most likely.
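Schematically, such a likelihood-ratio comparison can be carried out as follows (a sketch of ours with synthetic data, using continuous densities in place of degree distributions for simplicity):

```python
import math
import random

# Schematic sketch: fit a Pareto (power-law) model and a shifted-exponential
# model on x >= 1 by maximum likelihood, and compare their log-likelihoods.
def log_likelihood_ratio(xs):
    n = len(xs)
    sum_log = sum(math.log(x) for x in xs)
    alpha = n / sum_log                    # Pareto MLE: f(x) = a x^(-a-1)
    ll_pareto = n * math.log(alpha) - (alpha + 1) * sum_log
    sum_shift = sum(x - 1.0 for x in xs)
    lam = n / sum_shift                    # MLE: f(x) = lam e^(-lam (x-1))
    ll_exp = n * math.log(lam) - lam * sum_shift
    return ll_pareto - ll_exp              # positive favors the power law

random.seed(2)
pareto_sample = [random.random() ** (-1.0 / 1.5) for _ in range(10_000)]
exp_sample = [1.0 + random.expovariate(1.0) for _ in range(10_000)]
print(log_likelihood_ratio(pareto_sample) > 0)   # True
print(log_likelihood_ratio(exp_sample) > 0)      # False
```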
Unfortunately, such likelihood ratio tests can be performed only when one compares
parametric settings. The setting in (1.1.12) is non-parametric, as it involves the unknown
slowly varying function $x \mapsto L(x)$, and thus, in that setting, no statistical test can be performed unless one makes parametric assumptions on the shape of $x \mapsto L(x)$ (by assuming, for example, that $L(x)$ is a power of $\log x$). Thus, the parametric choice in
(1.1.10) is crucial in that it allows for a testing procedure to be performed. Alternatively,
if one does not believe in the “pure” power-law form as in (1.1.10), then tests are no
longer feasible. What approach should one then follow? See Artico et al. (2020) for a
related testing procedure, in which the authors reached a rather different conclusion than
that of Broido and Clauset (2019).
How to partition networks? Broido and Clauset (2019) investigated a large body of net-
works, relying on a data base consisting of 927 real-world networks from the KONECT
project; see http://konect.cc as well as Kunegis (2013). We are also relying on this
data base for graphs showing network properties, such as average and maximal degrees,
etc. These networks vary in size, as well as in their properties (directed versus undirected,
static versus temporal, etc.). In their paper, Broido and Clauset (2019) report percentages
of networks having certain properties; see for example Table 1.1.
A substantial part of the discussion around Broido and Clauset (2019) focusses on
whether these percentages are representative. Take the example of a directed network,
which has several degree distributions, namely, in-degree, out-degree, and total degree
distributions (in the latter, the directions are simply ignored). This “diversity of degree
distributions” becomes even more pronounced when the network is temporal, meaning
that edges come and go as time progresses. When does one say that a temporal network
has a power-law degree distribution? When one of these degree distributions is classified
as power-law, when a certain percentage of them is, or when all of them are?
What is our approach in this book? We prefer to avoid the precise debate about whether
power laws in degree distributions are omnipresent or rare. We view power laws as a way
to model settings where there is a large amount of variability in the data, and where the
maximum values of the degrees are several orders of magnitude larger than the average
values (compare Figures 1.1 and 1.2). Power laws predict such differences in scale.
There is little debate about the fact that degree distributions in networks tend to be
highly inhomogeneous. Power laws are the model of choice to model such inhomo-
geneities, certainly in settings where empirical moments (for example, empirical vari-
ances) are very large. Further, inhomogeneities lead to interesting differences in structure
of the networks in question, which will be a focal point of this book. All the alternative
models in Table 1.1 have tails that are too thin for such differences to emerge. Thus, it
is natural to focus on models with power-law degrees to highlight the relation between
degree structure and network topology. Therefore, we often consider degree distributions
that are either exactly described by power laws or are bounded above or below by them.
The focus then resides in how the degree power-law exponent τ changes the network
topology.
Figure 1.5 (a) Number of Autonomous Systems traversed in hopcount data. (b)
Internet hopcount data (courtesy of Hongsuda Tangmunarunkit).
wise e-mail messages could not be delivered between pairs of vertices in distinct connected
components.
Graph distances between pairs of vertices tend to be quite small in most networks. For
example, in the Internet, IP packets cannot use more than a threshold of physical links, and if
distances in the Internet were larger than this threshold then the e-mail service would simply
break down. Thus, the Internet graph has evolved in such a way that typical distances are
relatively small, even though the Internet itself is rather large. As seen in Figure 1.5(a), the
number of Autonomous Systems (ASs) traversed by an e-mail data set, sometimes referred
to as the AS-count, is typically at most 7. In Figure 1.5(b), the proportion of routers traversed
by an e-mail message between two uniformly chosen routers, referred to as the hopcount, is
shown. It shows that the number of routers traversed is at most 27. Figure 1.6 shows typical
distances in the IMDb; the distances are quite small despite the fact that the network contains
more than one million vertices.
The small-world nature of real-world networks is highly significant. Indeed, in small
worlds, news can spread quickly as relatively few people are needed to spread it between two
typical individuals. This is quite helpful in the Internet, where e-mail messages hop along
the edges of the network. At the other end of the spectrum, it also implies that infectious
diseases can spread quite quickly, as just a few infections can carry the disease to a large part
of the population. This implies that diseases have a large potential of becoming pandemic,
as the corona pandemic has made painfully clear.
Let us continue this discussion by formally introducing graph distances, as displayed in
Figures 1.5 and 1.6. For a graph G = (V (G), E(G)) and a pair of vertices u, v ∈ V (G),
we let the graph distance distG (u, v) between u and v be equal to the minimal number of
edges in a path linking u and v . When u and v are not in the same connected component,
we set distG (u, v) = ∞. We are interested in settings where G has a high amount of
connectivity, so that many pairs of vertices are connected to one another by short paths. In
order to describe the typical distances between vertices, we draw $o_1$ and $o_2$ independently and uniformly at random (uar) from $V(G)$, and consider
\[
\mathrm{dist}_G(o_1, o_2). \tag{1.1.13}
\]
Figure 1.6 Typical distances in the Internet Movie Data base (IMDb) in 2003.
The quantity in (1.1.13) is a random variable even for deterministic graphs, owing to the
presence of the two uar-chosen vertices o1 , o2 ∈ V (G). Figures 1.5 and 1.6 display the
probability mass functions of this random variable for some real-world networks.
Often, we consider distG (o1 , o2 ) conditional on distG (o1 , o2 ) < ∞. This means that we
consider the typical number of edges between a uniformly chosen pair of connected vertices.
As a result, distG (o1 , o2 ) is sometimes referred to as the typical distance.
The nice property of distG (o1 , o2 ) is that its distribution tells us something about all
possible distances in the graph. An alternative and frequently used measure of distance in a
graph is the diameter of the graph $G$, defined as
\[
\mathrm{diam}(G) = \max_{u, v \in V(G)} \mathrm{dist}_G(u, v).
\]
However, the diameter has several disadvantages. First, in many instances, the diameter
is algorithmically more difficult to compute than the typical distances (since one has to
compute the distances between all pairs of vertices and maximize over them). Second, it
is a number instead of a distribution of a random variable, and therefore contains far less
information than the distribution distG (o1 , o2 ). Finally, the diameter is highly sensitive to
relatively small changes in the graph G under consideration. For example, adding a relatively
small string of connected vertices to a graph (each of the vertices in the string having degree
2) may drastically change the diameter, while it hardly influences the typical distances.
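The empirical distributions in Figures 1.5 and 1.6 can be approximated by sampling vertex pairs uar and running breadth-first search, as in the following sketch (ours; the adjacency-list representation and toy graph are assumptions):

```python
import random
from collections import deque

# Sketch: approximate the law of dist_G(o1, o2) by sampling uar vertex
# pairs and running breadth-first search on an adjacency list.
def graph_distance(adj, source, target):
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                if w == target:
                    return dist[w]
                queue.append(w)
    return float("inf")   # o1 and o2 lie in different components

def typical_distance_sample(adj, samples=1000):
    vertices = list(adj)
    draws = [graph_distance(adj, random.choice(vertices),
                            random.choice(vertices)) for _ in range(samples)]
    # Condition on the distance being finite, as for the typical distance.
    return [d for d in draws if d < float("inf")]

random.seed(3)
cycle = {v: [(v - 1) % 10, (v + 1) % 10] for v in range(10)}  # toy graph
print(sorted(set(typical_distance_sample(cycle, samples=100))))
```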
(a) their degree correlations, measuring the extent to which high-degree vertices tend to be
connected to high-degree vertices rather than to low-degree vertices (and vice versa);
(b) their clustering, measuring the extent to which pairs of neighbors of vertices are neighbors
themselves;
(c) their community structure, measuring the extent to which the network has more densely-
connected subgraphs;
(d) their spatial structure, where the spatial component either describes true vertex locations in real-world networks, or instead some latent geometry in them. The spatial structure is such that nearby vertices are more likely to be connected.
See, e.g., the book by Newman (2010) for an extensive discussion of such features, as
well as the algorithmic problems that arise from them. We also refer the reader to Chapter
9, where we discuss several related models that focus on these properties.
In this section we discuss how random graph sequences can be used to model real-world
networks. We start by discussing graph sequences.
Graph Sequences
Motivated by the previous section, in which empirical evidence was discussed showing that
many real-world networks are scale free and small world, we set about the question of how to
model them. Since many networks are quite large, mathematically, we model real-world net-
works by graph sequences (Gn )n≥1 , where Gn = (V (Gn ), E(Gn )) has size |V (Gn )| = n
and we take the limit n → ∞. Since most real-world networks are such that the average
degree remains bounded, we will focus on the sparse regime. In the sparse regime (recall
(1.1.2) and (1.1.3)), it is assumed that
\[
\limsup_{n \to \infty} \mathbb{E}[D_n] = \limsup_{n \to \infty} \frac{1}{|V(G_n)|} \sum_{v \in V(G_n)} d_v^{(G_n)} < \infty. \tag{1.2.1}
\]
Furthermore, we aim to study graphs that are asymptotically well behaved. For example,
we often either assume, or prove, that the typical degree distribution converges, i.e., there
exists a limiting degree random variable D such that
\[
D_n \xrightarrow{d} D, \tag{1.2.2}
\]
where $\xrightarrow{d}$ denotes weak convergence of random variables. Also, we assume that our graphs
are small worlds, which is often translated in the asymptotic sense that there exists a constant
K < ∞ such that
\[
\lim_{n \to \infty} \mathbb{P}(\mathrm{dist}_{G_n}(o_1, o_2) \le K \log n) = 1, \tag{1.2.3}
\]
where $n$ denotes the network size. Sometimes, we even discuss ultra-small worlds, for which
\[
\lim_{n \to \infty} \mathbb{P}(\mathrm{dist}_{G_n}(o_1, o_2) \le \varepsilon \log n) = 1 \tag{1.2.4}
\]
for every ε > 0. In what follows, we discuss random graph models that share these two
features.
1.3 Random Graph Models
We start with the most basic and simple random graph model, which has proved to be a
source of tremendous inspiration, both for its mathematical beauty, as well as for providing
a starting point for the analysis of random graphs.
The Erdős–Rényi random graph has vertex set $[n] = \{1, \dots, n\}$, and the edge $uv$ is occupied or present with probability $p$, and vacant or absent
otherwise, independently of all the other edges. Here we denote the edge between vertices
u, v ∈ [n] by uv . The parameter p is called the edge probability. The above random graph is
denoted by ERn (p). The model is named after Erdős and Rényi, since they made profound
contributions in the study of this model. Exercise 1.3 investigates the uniform nature of
$\mathrm{ER}_n(p)$ with $p = \tfrac12$. Alternatively speaking, $\mathrm{ER}_n(p)$ with $p = \tfrac12$ is the null model, where
we take no properties of the network into account except for the total number of edges. The
vertices in this model have expected degree (n − 1)/2, which is quite large. As a result, this
model is not sparse at all. Thus, we next make this model sparse by making p smaller.
Since each edge is occupied with probability p, we obtain that
\[
\mathbb{P}(D_n = k) = \binom{n-1}{k} p^k (1-p)^{n-1-k} = \mathbb{P}(\mathrm{Bin}(n-1, p) = k), \tag{1.3.1}
\]
where Bin(m, p) is a binomial random variable with m trials and success probability p.
Note that
\[
\mathbb{E}[D_n] = (n-1)p, \tag{1.3.2}
\]
so for this model to be sparse, we need that p becomes small with n. Thus, we take
\[
p = \frac{\lambda}{n}, \tag{1.3.3}
\]
and study the graph as λ is held fixed while n → ∞. In this regime, we know that
\[
D_n \xrightarrow{d} D, \tag{1.3.4}
\]
with D ∼ Poi(λ), where Poi(λ) is a Poisson random variable with mean λ. It turns out that
this result can be strengthened to the statement that the proportion of vertices with degree
k also converges to the probability mass function of a Poisson random variable (see [V1,
Section 5.4], and in particular [V1, Theorem 5.12]), i.e., for every k ≥ 0,
\[
P_k^{(n)} = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{d_v = k\}} \xrightarrow{\mathbb{P}} p_k \equiv \mathrm{e}^{-\lambda} \frac{\lambda^k}{k!}. \tag{1.3.5}
\]
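A small simulation sketch (ours, with arbitrary parameter choices) confirms (1.3.5) by comparing the empirical degree proportions of $\mathrm{ER}_n(\lambda/n)$ with the Poisson probability mass function:

```python
import math
import random
from collections import Counter

# Sketch: sample ER_n(lambda/n) and compare the empirical proportions
# P_k^(n) with the Poisson limit p_k = e^(-lambda) lambda^k / k!.
def erdos_renyi_degrees(n, lam):
    p = lam / n
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:   # edge uv is present independently
                degree[u] += 1
                degree[v] += 1
    return degree

random.seed(4)
n, lam = 2000, 2.0
counts = Counter(erdos_renyi_degrees(n, lam))
for k in range(6):
    p_k = math.exp(-lam) * lam ** k / math.factorial(k)
    print(k, round(counts[k] / n, 3), round(p_k, 3))
```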
We denote the resulting graph by GRGn (w). In many cases, the vertex weights actually
depend on $n$, and it would be more appropriate (but also more cumbersome) to write the weights as $w^{(n)} = (w_v^{(n)})_{v \in [n]}$. To keep the notation simple, we refrain from making the dependence on $n$ explicit. A special case of the generalized random graph occurs when we take $w_v \equiv \frac{n\lambda}{n - \lambda}$, in which case $p_{uv} = \lambda/n$ for all $u, v \in [n]$, so that we retrieve the Erdős–Rényi random graph $\mathrm{ER}_n(\lambda/n)$.
The generalized random graph GRGn (w) is close to many other inhomogeneous random
graph models, such as the random graph with prescribed expected degrees or Chung–Lu
model, denoted by CLn (w), where instead
\[
p_{uv} = p_{uv}^{(\mathrm{CL})} = \min(w_u w_v / \ell_n, 1). \tag{1.3.8}
\]
A further adaptation is the so-called Poissonian random graph or Norros–Reittu model, denoted by $\mathrm{NR}_n(w)$, for which
\[
p_{uv} = p_{uv}^{(\mathrm{NR})} = 1 - \mathrm{e}^{-w_u w_v / \ell_n}. \tag{1.3.9}
\]
Here $\ell_n = \sum_{v \in [n]} w_v$ denotes the total weight, and $F_n(x) = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{w_v \le x\}}$ denotes the empirical distribution function of the weights. We can interpret $F_n$ as the distribution of the weight of a uniformly chosen vertex in $[n]$ (see
Exercise 1.7). We denote the weight of a uniformly chosen vertex o in [n] by Wn = wo , so
that, by Exercise 1.7, Wn has distribution function Fn .
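All three rank-1 models can be generated with a single loop over vertex pairs, as in the following sketch (ours; the GRG connection probability $w_u w_v/(\ell_n + w_u w_v)$ is recalled from [V1], since its display falls outside the excerpt above):

```python
import math
import random

# Sketch generating GRG_n(w), CL_n(w), or NR_n(w) from a weight sequence.
def rank_one_graph(weights, kind="GRG"):
    n, ln = len(weights), sum(weights)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            x = weights[u] * weights[v] / ln
            if kind == "GRG":
                p = x / (1.0 + x)        # equals w_u w_v/(l_n + w_u w_v)
            elif kind == "CL":
                p = min(x, 1.0)          # Chung-Lu, (1.3.8)
            else:
                p = 1.0 - math.exp(-x)   # Norros-Reittu, (1.3.9)
            if random.random() < p:
                edges.add((u, v))
    return edges

random.seed(5)
print(len(rank_one_graph([3.0, 2.0, 2.0, 1.0, 1.0], kind="NR")))
```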
The degree distribution can converge only when the vertex weights are sufficiently regular.
We often assume that the vertex weights satisfy the following regularity conditions, which
turn out to imply convergence of the degree distribution in the generalized random graph:
Condition 1.1 (Regularity conditions for vertex weights) There exists a distribution func-
tion F such that, as n → ∞, the following conditions hold:
(a) Weak convergence of vertex weights. As n → ∞,
\[
W_n \xrightarrow{d} W, \tag{1.3.11}
\]
where $W_n$ and $W$ have distribution functions $F_n$ and $F$, respectively. Equivalently, for any $x$ for which $x \mapsto F(x)$ is continuous,
\[
\lim_{n \to \infty} F_n(x) = F(x). \tag{1.3.12}
\]
which is itself random. Therefore, in Condition 1.1 we require random variables to converge,
and there are several notions of convergence that may be used. The notion of convergence
that we assume is convergence in probability (see [V1, Section 6.2]). J
Let us now discuss some canonical examples of weight distributions that satisfy the Reg-
ularity Condition 1.1. A canonical choice is to take
\[
w_v = [1 - F]^{-1}(v/n), \tag{1.3.15}
\]
where $[1-F]^{-1}$ is the generalized inverse function of $1 - F$, defined, for $u \in (0,1)$, by (recall [V1, (6.2.14) and (6.2.15)])
\[
[1 - F]^{-1}(u) = \inf\{x \colon [1-F](x) \le u\}. \tag{1.3.16}
\]
For the choice (1.3.15), we can explicitly compute Fn as (see [V1, (6.2.17)])
\[
F_n(x) = \frac{1}{n} \big( \lfloor n F(x) \rfloor + 1 \big) \wedge 1, \tag{1.3.17}
\]
where x ∧ y denotes the minimum of x, y ∈ R. It is not hard to see that Condition 1.1(a)
holds for $(w_v)_{v \in [n]}$ as in (1.3.15), while Condition 1.1(b) holds when $\mathbb{E}[W] \in (0, \infty)$, and Condition 1.1(c) holds when $\mathbb{E}[W^2] < \infty$, as can be concluded from Exercise 1.9.
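As a concrete instance (our illustration, with a Pareto choice of $F$ as an assumption), the generalized inverse in (1.3.15)–(1.3.16) is explicit:

```python
# Sketch of the canonical choice (1.3.15) for a Pareto weight distribution
# F(x) = 1 - x^(-(tau - 1)) on [1, infinity): here the generalized inverse
# gives w_v = [1 - F]^(-1)(v/n) = (v/n)^(-1/(tau - 1)).
def canonical_weights(n, tau):
    return [(v / n) ** (-1.0 / (tau - 1.0)) for v in range(1, n + 1)]

weights = canonical_weights(10, 2.5)
print([round(w, 3) for w in weights])   # decreasing in v, with w_n = 1
```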
The degree of vertex $v$ in $\mathrm{GRG}_n(w)$ is given by
\[
d_v = \sum_{u \in [n]} \mathbb{1}_{\{uv \in E(\mathrm{GRG}_n(w))\}}. \tag{1.3.18}
\]
For k ≥ 0, we let
\[
P_k^{(n)} = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{d_v = k\}} \tag{1.3.19}
\]
denote the proportion of vertices with degree $k$ in $\mathrm{GRG}_n(w)$. We call $(P_k^{(n)})_{k \ge 0}$ the degree sequence of $\mathrm{GRG}_n(w)$. We denote the probability mass function of a mixed-Poisson distribution by $p_k$, i.e., for $k \ge 0$,
\[
p_k = \mathbb{E}\Big[ \mathrm{e}^{-W} \frac{W^k}{k!} \Big], \tag{1.3.20}
\]
where W is a random variable having distribution function F from Condition 1.1. The main
result concerning the vertex degrees is as follows:
Theorem 1.3 (Degree sequence of GRGn (w)) Assume that Conditions 1.1(a),(b) hold.
Then, for every ε > 0,
\[
\mathbb{P}\Big( \sum_{k=0}^{\infty} |P_k^{(n)} - p_k| \ge \varepsilon \Big) \to 0, \tag{1.3.21}
\]
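The mixed-Poisson limit in (1.3.20) is straightforward to approximate numerically, as in the following Monte Carlo sketch (ours, with a Pareto weight law as an illustrative assumption):

```python
import math
import random

# Sketch: Monte Carlo approximation of the mixed-Poisson probabilities
# p_k = E[e^(-W) W^k / k!] from (1.3.20), for a Pareto weight variable W.
def mixed_poisson_pmf(k, w_samples):
    return sum(math.exp(-w) * w ** k / math.factorial(k)
               for w in w_samples) / len(w_samples)

random.seed(6)
w_samples = [random.random() ** (-1.0 / 1.5) for _ in range(100_000)]
print([round(mixed_poisson_pmf(k, w_samples), 4) for k in range(5)])
```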
conditioned on {dv (X) = dv ∀v ∈ [n]}, is uniform over all graphs with degrees (dv )v∈[n] .
is even.
We wish to construct a simple graph such that $d = (d_v)_{v \in [n]}$ are the degrees of the $n$ vertices. Even when $\ell_n = \sum_{v \in [n]} d_v$ is even, however, this is not always possible. Therefore, instead, we construct a multi-graph. One way of obtaining such a multi-graph with the given degree sequence is to pair the half-edges attached to the different vertices in a uniform way.
Two half-edges together form an edge, thus creating the edges in the graph. Let us explain
this in more detail.
To construct the multi-graph where vertex v has degree dv for all v ∈ [n], we have n
separate vertices and, incident to vertex v , we have dv half-edges. Every half-edge needs
to be connected to another half-edge to form an edge, and by forming all edges we build
the graph. For this, the half-edges are numbered in an arbitrary order from 1 to $\ell_n$. We start by randomly connecting the first half-edge with one of the $\ell_n - 1$ remaining half-edges.
Once paired, two half-edges form a single edge of the multi-graph, and these half-edges are
removed from the list of half-edges that need to be paired. Hence, a half-edge can be seen
as the left or the right half of an edge. We continue the procedure of randomly choosing and
pairing the half-edges until all half-edges are connected, and we call the resulting graph the
configuration model with degree sequence d, abbreviated as CMn (d). The pairing of the
half-edges that induces the configuration model graph is sometimes called a configuration.
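In code, the pairing amounts to a single shuffle of the list of half-edges, as in this sketch (ours):

```python
import random
from collections import Counter

# Sketch of the uniform pairing described above: vertex v contributes d_v
# half-edges; after a uniform shuffle, consecutive half-edges form edges.
def configuration_model(d):
    assert sum(d) % 2 == 0, "the total degree l_n must be even"
    half_edges = [v for v, dv in enumerate(d) for _ in range(dv)]
    random.shuffle(half_edges)           # uniform matching of half-edges
    edges = Counter()
    for i in range(0, len(half_edges), 2):
        u, v = sorted((half_edges[i], half_edges[i + 1]))
        edges[(u, v)] += 1               # entries (v, v) are self-loops
    return edges                         # multi-graph as edge multiplicities

random.seed(7)
print(configuration_model([3, 2, 2, 1]))
```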
A careful reader may worry about the order in which the half-edges are being paired.
In fact, this ordering turns out to be irrelevant since the random pairing of half-edges is
completely exchangeable. It can even be done in a random fashion, which will be useful
when investigating neighborhoods in the configuration model. See e.g., [V1, Definition 7.5
and Lemma 7.6] for more details on this exchangeability.
Interestingly, one can rather explicitly compute the distribution of CMn (d). To do so,
note that CMn (d) is characterized by the random vector (Xuv )1≤u≤v≤n . Here Xuv is the
number of edges between vertex u and v , and Xvv is the number of self-loops incident to
vertex v , so that
\[
d_v = X_{vv} + \sum_{u \in [n]} X_{uv}. \tag{1.3.28}
\]
Note furthermore that Xvv appears twice in (1.3.28), which is natural, since a self-loop
consists of two half-edges. This does not conflict with the definition of dv for GRGn (w),
since $X_{uu} = 0$ and $X_{uv} \in \{0, 1\}$ for $\mathrm{GRG}_n(w)$.
In terms of this notation, and writing $G = (x_{uv})_{u,v \in [n]}$ to denote a multi-graph on $[n]$,
\[
\mathbb{P}(\mathrm{CM}_n(d) = G) = \frac{1}{(\ell_n - 1)!!} \, \frac{\prod_{v \in [n]} d_v!}{\prod_{v \in [n]} 2^{x_{vv}} \prod_{1 \le u \le v \le n} x_{uv}!}. \tag{1.3.29}
\]
See, e.g., [V1, Proposition 7.7] for this result. In particular, P(CMn (d) = G) is the same
for each simple G, where G is simple when xvv = 0 for every v ∈ [n] and xuv ∈ {0, 1}
for every 1 ≤ u < v ≤ n. Thus, the configuration model conditioned on simplicity is a
uniform random graph with the prescribed degree distribution. This is quite relevant, as it
gives a convenient way to obtain such a uniform graph, which is a highly non-trivial fact.
Remark 1.6 (What’s in a name continued?) The name configuration model was invented
by Bollobás (1980), who considered the matching of half-edges to be the configuration on
which the model is based. The model of study for Bollobás (1980) was the uniform simple
random regular graph, where all degrees are the same, as we discuss further below. Molloy
and Reed (1995, 1998) extended it to general degrees. As a result, it is sometimes also called
the Molloy–Reed model. With Xuv equal to the number of edges between vertices u and v ,
\[
\mathbb{E}[X_{uv}] = \frac{d_u d_v}{\ell_n - 1}, \tag{1.3.30}
\]
since each of the $d_v$ half-edges incident to vertex $v$ has probability $d_u/(\ell_n - 1)$ to be con-
nected to vertex u. Since (1.3.30) is close to the edge probability puv in rank-1 random
graphs (recall Remark 1.5), rank-1 random graphs are sometimes called soft configuration
models. The configuration-model degree constraint is instead viewed as a hard constraint.J
The uniform nature of the configuration model conditioned on simplicity partly explains
its popularity, and it has become one of the most highly studied random graph models. It also
implies that, conditioned on simplicity, the configuration model is the null model for a real-
world network where all the degrees are fixed. This allows one to distinguish the relevance
of the degree inhomogeneity from other features of the network, such as its community
structure, clustering, etc.
As for GRGn (w), we again impose regularity conditions on the degree sequence d. In
order to state these assumptions, we introduce some notation. We denote the degree of a
uniformly chosen vertex o in [n] by Dn = do . The random variable Dn has distribution
function Fn given by
\[
F_n(x) = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{d_v \le x\}}, \tag{1.3.31}
\]
which is the empirical distribution of the degrees. We assume that the vertex degrees satisfy
the following regularity conditions:
Condition 1.7 (Regularity conditions for vertex degrees)
(a) Weak convergence of vertex degrees. There exists a distribution function F such that,
as n → ∞,
\[
D_n \xrightarrow{d} D, \tag{1.3.32}
\]
where $D_n$ and $D$ have distribution functions $F_n$ and $F$, respectively. Equivalently, for any $x \in \mathbb{R}$,
\[
\lim_{n \to \infty} F_n(x) = F(x). \tag{1.3.33}
\]
and denote the related degree sequence in the erased configuration model $(P_k^{(\mathrm{er})})_{k \ge 1}$ by
\[
P_k^{(\mathrm{er})} = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{D_v^{(\mathrm{er})} = k\}}. \tag{1.3.39}
\]
From the notation it should be clear that $(p_k^{(n)})_{k \ge 1}$ is a deterministic sequence when $d = (d_v)_{v \in [n]}$ is deterministic, while $(P_k^{(\mathrm{er})})_{k \ge 1}$ is a random sequence, since the erased degrees $(D_v^{(\mathrm{er})})_{v \in [n]}$ form a random vector even when $d = (d_v)_{v \in [n]}$ is deterministic.
Now we are ready to state the main result concerning the degree sequence of the erased
configuration model:
Theorem 1.8 (Degree sequence of erased configuration model with fixed degrees) For
fixed degrees d satisfying Conditions 1.7(a),(b), the degree sequence of the erased config-
uration model $(P_k^{(\mathrm{er})})_{k \ge 1}$ converges in probability to $(p_k)_{k \ge 1}$. More precisely, for every $\varepsilon > 0$,
\[
\mathbb{P}\Big( \sum_{k=1}^{\infty} |P_k^{(\mathrm{er})} - p_k| \ge \varepsilon \Big) \to 0. \tag{1.3.40}
\]
where
\[
\nu = \frac{\mathbb{E}[D(D-1)]}{\mathbb{E}[D]} \tag{1.3.42}
\]
is the expected forward degree. This is a realistic option when $\mathbb{E}[D^2] < \infty$. Unfortunately, this is not an option when the asymptotic degrees obey an asymptotic power law with $\tau \in (2,3)$ (as, e.g., in (1.1.12)), since then $\mathbb{E}[D^2] = \infty$. Note that, by (1.3.29), $\mathrm{CM}_n(d)$
conditioned on simplicity is a uniform random graph with the prescribed degree sequence.
We denote this random graph by UGn (d). We return to the difficulty of generating simple
graphs with infinite-variance degrees in Section 1.3.4 below.
Proof See [V1, Theorem 7.19]. The weak convergence in Condition 1.7(a) follows from
Theorem 1.3.
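Numerically, the empirical counterpart of $\nu$ in (1.3.42) is immediate to compute from a degree sequence, as in this one-line sketch (ours):

```python
# Sketch: the empirical counterpart of nu = E[D(D-1)]/E[D] in (1.3.42),
# computed directly from a degree sequence d.
def empirical_nu(d):
    return sum(dv * (dv - 1) for dv in d) / sum(d)

print(empirical_nu([3, 2, 2, 1]))   # (6 + 2 + 2 + 0) / 8 = 1.25
```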
Remark 1.10 (Proving results for GRGn (w) through CMn (d)) Combined with Theorem
1.4, Theorem 1.9 allows us to prove many results for the generalized random graph by first
proving them for the configuration model under appropriate conditions on its degrees, and
then extending them to the generalized random graph by proving that its degrees satisfy the
assumptions made. In particular, any property that holds in probability for CMn (d) can be
extended to GRGn (w) in this way. See [V1, Sections 6.6 and 7.5] for more details. This
strategy is also frequently used in the present volume. J
(a) the degrees in $\mathrm{CM}_{n'}(d')$ are a truncated version of those in $\mathrm{CM}_n(d)$, i.e., $d'_v = (d_v \wedge b)$ for $v \in [n]$, and $d'_v = 1$ for $v \in [n'] \setminus [n]$;
(b) the total degree in $\mathrm{CM}_{n'}(d')$ is the same as that in $\mathrm{CM}_n(d)$, i.e., $\sum_{v \in [n']} d'_v = \sum_{v \in [n]} d_v$;
(c) for all $u, v \in [n]$, if $u$ and $v$ are connected in $\mathrm{CM}_{n'}(d')$, then so are $u$ and $v$ in $\mathrm{CM}_n(d)$, i.e., $\mathrm{dist}_{\mathrm{CM}_n(d)}(u, v) \le \mathrm{dist}_{\mathrm{CM}_{n'}(d')}(u, v)$ almost surely.
Remark 1.12 (Truncation of degrees in range) The construction that proves Theorem 1.11 is highly flexible, and also allows for a degree truncation that maintains restrictions on the minimal degree $d_{\min} = \min_{v \in [n]} d_v$. Indeed, fix $b \ge 2$. There exists a related configuration model $\mathrm{CM}_{n'}(d')$ satisfying (b) and (c) in Theorem 1.11, while (a) is replaced by $d'_v = d_v$ when $d_v < 2b$, by $d'_v = b$ when $d_v \ge 2b$ for $v \in [n]$, and by $b \le d'_v < 2b$ for $v \in [n'] \setminus [n]$, so that $d'_{\min} = \min_{v \in [n']} d'_v \ge d_{\min} \wedge b$. J
Proof The proof relies on an “explosion” or “fragmentation” of the vertices $[n]$ in $\mathrm{CM}_n(d)$. Label the half-edges from 1 to $\ell_n$. We go through the vertices $v \in [n]$ one by one. When $d_v \le b$, we do nothing. When $d_v > b$, we let $d'_v = b$ and keep the $b$ half-edges with the lowest labels. The remaining $d_v - b$ half-edges are exploded from vertex $v$, in that they are incident to vertices of degree 1 in $\mathrm{CM}_{n'}(d')$, and are given vertex labels above $n$. We give the exploded half-edges the remaining labels of the half-edges incident to $v$. Thus, the half-edges receive labels both in $\mathrm{CM}_n(d)$ as well as in $\mathrm{CM}_{n'}(d')$, and the labels of the half-edges incident to $v \in [n]$ in $\mathrm{CM}_{n'}(d')$ are a subset of those in $\mathrm{CM}_n(d)$. In total, we thus create an extra $n_+ = \sum_{v \in [n]} (d_v - b) \vee 0$ “exploded” vertices of degree 1, and $n' = n + n_+$, where $x \vee y$ denotes the maximum of $x, y \in \mathbb{R}$.

We then pair the half-edges randomly, in the same way in $\mathrm{CM}_n(d)$ as in $\mathrm{CM}_{n'}(d')$. This means that when the half-edge with label $x$ is paired with the half-edge with label $y$ in $\mathrm{CM}_n(d)$, then also the half-edge with label $x$ is paired with the half-edge with label $y$ in $\mathrm{CM}_{n'}(d')$, for all $x, y \in [\ell_n]$.
We now check parts (a)–(c). Obviously parts (a) and (b) follow from the construction. For part (c), we note that all exploded vertices in $[n'] \setminus [n]$ have degree 1. Further, for vertices $u, v \in [n]$, if there exists a path in $\mathrm{CM}_{n'}(d')$ connecting them then the intermediate vertices have degree at least 2, so that they cannot correspond to exploded vertices and must therefore in $\mathrm{CM}_{n'}(d')$ have labels in $[n]$. Thus, the same path of paired half-edges also exists in $\mathrm{CM}_n(d)$, so that $u$ and $v$ are also connected in $\mathrm{CM}_n(d)$.
We conclude by adapting the construction to prove the statement in Remark 1.12. We again go through the vertices $v \in [n]$ one by one. When $d_v < 2b$, we do nothing. When $d_v \ge 2b$, we let $d'_v = b$ and keep the $b$ half-edges with the lowest labels. The remaining $d_v - b$ half-edges are exploded from vertex $v$, in that they are incident to “exploded” vertices that all have degree $b$ in $\mathrm{CM}_{n'}(d')$, possibly except for one vertex that has degree in $[b, 2b)$, and are given vertex labels above $n$. This means that a vertex of degree $d_v \ge 2b$ is replaced by one vertex in $[n]$ and $\lfloor d_v/b \rfloor - 1$ vertices in $[n'] \setminus [n]$, of which all, possibly except for the last vertex, have degree $b$, and the degree of the last vertex equals $d_v - b(\lfloor d_v/b \rfloor - 1) \in [b, 2b)$. We again give the exploded half-edges the remaining labels of the half-edges incident to $v$. This identifies the desired construction for Remark 1.12. For part (c), we note that the half-edges incident to exploded vertices arise from the same vertex in $[n]$ as before explosion, so a path between vertices $u', v' \in [n']$ in $\mathrm{CM}_{n'}(d')$ implies that a path between the vertices $u, v \in [n]$ that correspond to $u', v'$ exists. This implies that part (c) holds.
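On the level of degree sequences, the explosion in the proof of Theorem 1.11 reads as follows (a sketch of ours, for the version with degree-1 exploded vertices):

```python
# Sketch of the explosion in the proof of Theorem 1.11: degrees above b
# are capped at b, and every removed half-edge becomes a new vertex of
# degree 1, so that the total degree is preserved.
def truncate_degrees(d, b):
    d_prime = [min(dv, b) for dv in d]
    n_plus = sum(max(dv - b, 0) for dv in d)   # number of exploded vertices
    d_prime += [1] * n_plus
    assert sum(d_prime) == sum(d)              # property (b) of Theorem 1.11
    return d_prime

print(truncate_degrees([5, 3, 1, 1], b=2))    # [2, 2, 1, 1] plus four 1s
```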
equals dv for all v ∈ [n]. We assume that such a simple graph exists, i.e., we assume that
d = (dv )v∈[n] is graphical.
In order to describe the dynamics of the switch chain, choose two edges {u, v} and {x, y}
uar from the edge set E(G), where G is the current simple graph. The possible switches of
these two edges are (1) {u, x} and {v, y}; (2) {v, x} and {u, y}; and (3) {u, v} and {x, y}
(so that no change is made). Choose each of these three options with probability equal to $\tfrac13$, and write the chosen edges as $e_1, e_2$. Accept the switch when the resulting graph with edge set $\{e_1, e_2\} \cup \big( E(G) \setminus \{\{u, v\}, \{x, y\}\} \big)$ is simple, and reject the switch otherwise (so that the
graph remains unchanged under the dynamics).
It is not hard to see that the resulting Markov chain is aperiodic and irreducible. Further,
the switch chain is doubly stochastic since it is reversible. As a result, its stationary dis-
tribution is the uniform random graph with prescribed degree sequence d, which we have
denoted by UGn (d), as required.
The above method works rather generally, and, in the limit of infinitely many switches,
produces a sample from UGn (d) for every graphical degree sequence, even when the de-
grees are large. As a result, this chain is the method of choice to produce a sample of
UGn (d) when the probability of simplicity of the configuration model vanishes. However,
it is unclear precisely how often one needs to switch in order for the Markov chain to be
sufficiently close to the uniform (and thus stationary) distribution. See the notes in Section
1.6 for a discussion of the history of the switch chain, as well as the available results about
its convergence.
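A single step of the switch chain can be sketched as follows (our illustration; the graph is stored as a set of frozenset edges so that degenerate proposals are easy to reject):

```python
import random

# Sketch of one switch-chain step on a simple graph; vertex degrees are
# preserved by every accepted move.
def switch_step(edges):
    e1, e2 = random.sample(list(edges), 2)
    (u, v), (x, y) = sorted(e1), sorted(e2)
    option = random.randrange(3)
    if option == 2:
        return edges                     # option (3): keep {u,v}, {x,y}
    if option == 0:
        new1, new2 = frozenset((u, x)), frozenset((v, y))
    else:
        new1, new2 = frozenset((v, x)), frozenset((u, y))
    candidate = (edges - {e1, e2}) | {new1, new2}
    # Reject self-loops (singleton frozensets) and multi-edges (collisions).
    if len(new1) < 2 or len(new2) < 2 or len(candidate) != len(edges):
        return edges
    return candidate

random.seed(8)
g = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
for _ in range(10):
    g = switch_step(g)
print(sorted(tuple(sorted(e)) for e in g))
```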
Remark 1.14 (Relation to $\mathrm{ECM}_n(d)$ and $\mathrm{GRG}_n(w)$) Theorem 1.13 shows that, when $d_u d_v \gg \ell_n$,
\[
1 - \mathbb{P}(\{u, v\} \in E(\mathrm{UG}_n(d))) = (1 + o(1)) \frac{\ell_n}{d_u d_v}. \tag{1.3.46}
\]
since $d_1 \le (c_F n)^{1/(\tau - 1)} + 1$. Since $\tau \in (2, 3)$, the above is $o(n)$.
Proof of Theorem 1.13. To compute the asymptotics of P({u, v} ∈ E(UGn (d)) | EU ),
we switch between two classes of graphs, S and S̄. Class S consists of graphs in which all
edges in {{u, v}} ∪ U are present, whereas S̄ consists of all graphs in which every {s, t} ∈ U is
present, but {u, v} is not. Recall that EU = {{s, t} ∈ E(UGn (d)) ∀ {s, t} ∈ U } denotes
the event that {s, t} is an edge for every {s, t} ∈ U . Then, since the law on simple graphs
is uniform (see also Exercise 1.18),
P({u, v} ∈ E(UGn (d)) | EU ) = |S|/(|S| + |S̄|) = 1/(1 + |S̄|/|S|), (1.3.50)
and we are left to compute the asymptotics of |S̄|/|S|.
For this, we define an operation called a forward switching that converts a graph G ∈
S into a graph G′ ∈ S̄ (see Figure 1.7). The reverse operation, converting G′ into G, is called a backward
switching.

Figure 1.7 Forward and backward switchings. The edge {u, v} is present on the
left, but not on the right.
Then we estimate |S̄|/|S| by counting the number of forward switchings that can
be applied to the graph G ∈ S, and the number of backward switchings that can be applied
to the graph G′ ∈ S̄. In our switching, we wish to have control on whether {u, v} is present
or not, so we tune it to take this restriction into account.
The forward switching on G ∈ S is defined by choosing two edges and specifying their
ends as {x, a} and {y, b}. We write these as directed edges (x, a) and (y, b), since the roles of
x and a (and of y and b) are different, as indicated in Figure 1.7. We assume that EU occurs.
The choice must satisfy the following constraints:
For (b′), there are O(1) choices for choosing {a, b} since |U | = O(1), and at most
(du − |Uu |)(dv − |Uv |) choices for x and y. Thus, the number of choices for case (b′) is
O((du − |Uu |)(dv − |Uv |)) = o((du − |Uu |)(dv − |Uv |)ℓn ).
For (c′), the case where a or b is equal to x or y corresponds to a 2-path starting from u or
v together with a single edge from u or v. Since o(ℓn ) bounds the number of 2-paths starting
from u or v, and du − |Uu | + dv − |Uv | bounds the number of ways to choose the single edge,
there are o(ℓn (dv − |Uv |)) + o(ℓn (du − |Uu |)) total choices. If a or b is equal to u or v, there
are (du − |Uu |)(dv − |Uv |) ways to choose x and y, and at most du + dv ways to choose
the last vertex as a neighbor of u or v. Thus, there are O((du − |Uu |)(dv − |Uv |)dmax ) =
o((du − |Uu |)(dv − |Uv |)ℓn ) total choices, since dmax = O(n1/(τ −1) ) = o(n) = o(ℓn ).
We conclude that the number of backward switchings that can be applied to any graph
G′ ∈ S̄ is (du − |Uu |)(dv − |Uv |)ℓn (1 + o(1)), so that

E[b(G′ )] = (du − |Uu |)(dv − |Uv |)ℓn (1 + o(1)). (1.3.54)
Conclusion
Combining (1.3.52), (1.3.53), and (1.3.54) results in
|S̄|/|S| = (1 + o(1)) ℓ2n /((du − |Uu |)(dv − |Uv |)ℓn ), (1.3.55)

and thus (1.3.50) yields

P({u, v} ∈ E(UGn (d)) | EU ) = 1/(1 + |S̄|/|S|)
= (1 + o(1)) (du − |Uu |)(dv − |Uv |)/(ℓn + (du − |Uu |)(dv − |Uv |)). (1.3.56)
Remark 1.16 (Uniform random graphs and configuration models) Owing to the close links
between uniform random graphs with prescribed degrees and configuration models, we treat
the two models together, in Chapters 4 and 7. J
This preferential attachment mechanism is called affine, since the attachment probabilities
in (1.3.57) depend in an affine way on the degrees of the random graph PA(1,δ) n (a).
The model with m > 1 is defined in terms of the model for m = 1 as follows. Fix δ ≥
−m. We start with PA(1,δ/m)mn (a), and denote the vertices in PA(1,δ/m)mn (a) by
v1(1) , . . . , vmn(1) . Then we identify or collapse the m vertices v1(1) , . . . , vm(1) in
PA(1,δ/m)mn (a) to become vertex v1(m) in PA(m,δ)n (a). In doing so, we let all the edges
that are incident to any of the vertices in v1(1) , . . . , vm(1) be incident to the new vertex
v1(m) in PA(m,δ)n (a). Then, we collapse the m vertices vm+1(1) , . . . , v2m(1) in
PA(1,δ/m)mn (a) to become vertex v2(m) in PA(m,δ)n (a), etc. More generally, we collapse
the m vertices v(j−1)m+1(1) , . . . , vjm(1) in PA(1,δ/m)mn (a) to become vertex vj(m) in
PA(m,δ)n (a). This defines the model for general m ≥ 1.
The resulting graph PA(m,δ) n (a) is a multi-graph with precisely n vertices and mn edges,
so that the total degree is equal to 2mn. The model with δ = 0 is sometimes called the
proportional model. The inclusion of the extra parameter δ > −m is relevant, though, as
we will see later. It can be useful to think of edges and vertices as carrying weights, where a
vertex has weight δ and an edge has weight 1. Then, the vertex vn+1(1) attaches its edges with
a probability proportional to the weight of the vertex plus the edges to which it is incident.
This, for example, explains why PA(1,δ/m)mn (a) needs to be used in the collapsing procedure,
rather than PA(1,δ)mn (a).
The preferential attachment model (PA(m,δ) n (a))n≥1 is increasing in time, in the sense
that vertices and edges, once they have appeared, remain there forever. Thus, the degrees
are monotonically increasing in time. Moreover, vertices with a high degree have a higher
chance of attracting further edges of later vertices. Therefore, the model is sometimes called
the rich-get-richer model. It is not hard to see that Di (n) −→a.s. ∞ for each fixed i ≥ 1,
as n → ∞ (see Exercise 1.20). As a result, one could also call the preferential attachment
model the old-get-richer model.
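The affine mechanism is easy to simulate. The following minimal sketch (our own illustration, not the book's code) grows the degree sequence for m = 1 and δ > −1; the starting graph and self-loop conventions differ between versions (a), (b), and (d) of the model, so this shows the mechanism rather than any one definitive version:

```python
import random

# Each new vertex attaches one edge to an old vertex chosen with probability
# proportional to (degree + delta); this is the essential affine rule.

def pa_degrees(n, delta, seed=None):
    rng = random.Random(seed)
    deg = [2]                                # start: one vertex with a self-loop
    for _ in range(n - 1):
        total = sum(d + delta for d in deg)  # positive, since delta > -1
        u = rng.random() * total
        acc, target = 0.0, len(deg) - 1
        for i, d in enumerate(deg):
            acc += d + delta
            if u <= acc:
                target = i
                break
        deg[target] += 1                     # the old vertex gains the edge...
        deg.append(1)                        # ...and the new vertex has degree 1
    return deg

# The earliest vertices tend to have the largest degrees: the old get richer.
print(sorted(pa_degrees(2000, delta=0.0, seed=1), reverse=True)[:5])
```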
The following theorem describes the evolution of the degree of fixed vertices:
Theorem 1.17 (Degrees of fixed vertices) Consider PA(m,δ)n (a) with m ≥ 1 and δ > −m.
Then, Di (n)/n1/(2+δ/m) converges almost surely to a random variable ξi as n → ∞.
for the (random) proportion of vertices with degree k at time n. For m ≥ 1 and δ > −m,
we define (pk )k≥0 by pk = 0 for k = 0, . . . , m − 1 and, for k ≥ m,
pk = (2 + δ/m) Γ(k + δ)Γ(m + 2 + δ + δ/m) / (Γ(m + δ)Γ(k + 3 + δ + δ/m)). (1.3.60)
It turns out that (pk )k≥0 is a probability mass function (see [V1, Section 8.4]). It arises as
the limiting degree distribution for PA(m,δ)n (a), as shown in the following theorem:
Theorem 1.18 (Degree sequence in preferential attachment model) Consider PA(m,δ)n (a)
with m ≥ 1 and δ > −m. There exists a constant C = C(m, δ) > 0 such that, as n → ∞,

P( maxk |Pk (n) − pk | ≥ C √(log n/n) ) = o(1). (1.3.61)
Proof See [V1, Theorem 8.3].
We next investigate the scale-free properties of (pk )k≥0 by investigating the asymptotics
of pk for k large. By (1.3.60) and Stirling’s formula, as k → ∞ we have
pk = cm,δ k −τ (1 + O(1/k)), (1.3.62)
where

τ = 3 + δ/m > 2, and cm,δ = (2 + δ/m) Γ(m + 2 + δ + δ/m)/Γ(m + δ). (1.3.63)
Therefore, by Theorem 1.18 and (1.3.62), the asymptotic degree sequence of PA(m,δ)n (a) is
close to a power law with exponent τ = 3 + δ/m. We note that any exponent τ > 2 can be
obtained by choosing δ > −m and m ≥ 1 appropriately.
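The asymptotics (1.3.62)–(1.3.63) can be checked numerically from (1.3.60). The following sketch (ours) evaluates pk stably via log-gamma functions; log pk / log k should approach −τ = −(3 + δ/m) as k grows:

```python
from math import lgamma, exp, log

# Evaluate p_k from (1.3.60) using log-gamma to avoid overflow.
def p_k(k, m, delta):
    a = log(2 + delta / m)
    a += lgamma(k + delta) + lgamma(m + 2 + delta + delta / m)
    a -= lgamma(m + delta) + lgamma(k + 3 + delta + delta / m)
    return exp(a)

m, delta = 2, 1.0
tau = 3 + delta / m
for k in [10, 100, 1000, 10000]:
    # log p_k / log k should approach -tau = -3.5 here
    print(k, log(p_k(k, m, delta)) / log(k))
```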
P( v(m)n+1,j+1 −→ v(m)i | PA(m,δ)n,j (d) ) = (Di (n, j) + δ)/(n(2m + δ)) for i ∈ [n]. (1.3.65)
Here, Di (n, j) is the degree of vertex v(m)i after the connection of the edges incident to the
first n vertices, as well as the first j edges incident to vertex v(m)n+1 , and PA(m,δ)n,j (d) is the
graph of the first n vertices, as well as the first j edges incident to vertex v(m)n+1 . The model
is by default connected, and at time n consists of n + 1 vertices and mn edges. For m = 1,
apart from the different starting graphs, models (b) and (d) are identical. Indeed, PA(1,δ)n (b)
for n = 2 consists of two vertices with two edges between them, while PA(1,δ)n (d) for n = 2
consists of two vertices with one edge between them.
Many other adaptations are possible and have been investigated in the literature, such
as settings where the m edges incident to v(m)n+1 are independently connected as in (1.3.65)
with j = 0. We refrain from discussing these. It is not hard to verify that Theorem 1.18
holds for all these adaptations, which explains why authors have often opted for the version
of the model that is most convenient for them. From the perspective of local convergence, it
turns out that (PA(m,δ)n (d))n≥1 is the most convenient, as we will see in Chapter 5. On the
other hand, Theorem 1.17 requires minor adaptations between the models, particularly since
the limiting random variables (ξi )i≥1 do depend on the precise model.
In particular, log(1/pk )/ log k → 1 + 1/γ when f (k)/k → γ ∈ (0, 1) (see Exercise 1.23).
Remarkably, when f (k) = γk + β, the power-law exponent of the degree distribution does
not depend on β. The restriction that f (k + 1) − f (k) < 1 is needed to prevent the degrees
from exploding. Further, log(1/pk ) ∼ k1−α /(γ(1 − α)) when f (k) ∼ γk α for some
α ∈ (0, 1) (see Exercise 1.24). Interestingly, there exists a persistent hub, i.e., a vertex
that has maximal degree for all but finitely many times, when Σk≥1 1/f (k)2 < ∞. When
Σk≥1 1/f (k)2 = ∞, this does not happen.
in the same universality class, or rather in different ones, and why. We will see that the
degree distribution decides the universality class for a wide range of models, as one might
possibly hope. This also explains why the degree distribution plays such a dominant role in
the investigation of random graphs. See Chapter 9 for more details.
In this book, we frequently deal with random variables having an (asymptotic) power-law
distribution. For such random variables, we often need to investigate truncated moments,
and we also often deal with their sized-biased distribution. In this section, we collect some
results concerning power-law random variables. We start by recalling the definition of a
power-law distribution:
Definition 1.19 (Power-law distributions) We say that X has a power-law distribution with
exponent τ when there exists a function x ↦ L(x) that is slowly varying at infinity such
that
1 − FX (x) = P(X > x) = L(x)x−(τ −1) . (1.4.1)
Here, we recall that a function x ↦ L(x) is slowly varying at infinity when, for every t > 0,

limx→∞ L(xt)/L(x) = 1. (1.4.2)

J
A crucial result about slowly varying functions is Potter’s Theorem, which we next recall:
Theorem 1.20 (Potter’s Theorem) Let x ↦ L(x) be slowly varying at infinity. For every
δ > 0, there exists a constant Cδ ≥ 1 such that, for all x ≥ 1,
x−δ /Cδ ≤ L(x) ≤ Cδ xδ . (1.4.3)
Theorem 1.20 implies that the tail of any general power-law distribution, as in Definition
1.19, can be bounded above and below by that of a pure power-law distribution (i.e., one
without a slowly varying function) with a slightly adapted power-law exponent. As a result,
we can often deal with pure power laws instead.
We continue by studying the relation between power-law tails of the empirical degree
distribution and bounds on the degrees themselves:
Lemma 1.21 (Tail and degree bounds) Let d = (dv )v∈[n] be a degree sequence, d(1) ≥
d(2) ≥ · · · ≥ d(n−1) ≥ d(n) its non-increasing ordered version, and

Fn (x) = (1/n) Σv∈[n] 1{dv ≤x} (1.4.4)

its empirical distribution function. Then

[1 − Fn ](x) ≤ cF x−(τ −1) ∀x ≥ 1 (1.4.5)

implies that

d(v) ≤ (cF n/v)1/(τ −1) + 1 ∀v ∈ [n], (1.4.6)

while

d(v) ≤ (cF n/v)1/(τ −1) ∀v ∈ [n] (1.4.7)

implies that

[1 − Fn ](x) ≤ cF x−(τ −1) ∀x ≥ 1. (1.4.8)
Proof Assume first that (1.4.5) holds. For every v ∈ [n], the number of vertices with
degree at least d(v) is at least v . By (1.4.5), for every v ∈ [n],
cF n(d(v) − 1)1−τ ≥ n[1 − Fn ](d(v) − 1) ≥ v. (1.4.9)
Thus, d(v) ≤ (cF n/v)1/(τ −1) + 1, as required.
Next, assume that (1.4.7) holds. Then
[1 − Fn ](x) = (1/n) Σv∈[n] 1{dv >x} = (1/n) Σv∈[n] 1{d(v) >x}
≤ (1/n) Σv∈[n] 1{(cF n/v)1/(τ −1) >x}
= (1/n) Σv∈[n] 1{v<ncF x−(τ −1) } ≤ cF x−(τ −1) , (1.4.10)
as required.
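Both implications of Lemma 1.21 can be illustrated on a concrete sequence. In the following sketch (our own toy example; cF and τ are chosen arbitrarily), the degree sequence is built so that (1.4.7) holds, and the bounds (1.4.6) and (1.4.8) are then verified directly:

```python
# Toy degree sequence with d_(v) = floor((c_F n / v)^{1/(tau-1)}).
n, tau, c_F = 10_000, 2.5, 1.0
d = sorted((int((c_F * n / v) ** (1 / (tau - 1))) for v in range(1, n + 1)),
           reverse=True)

# Degree bound as in (1.4.6): the v-th largest degree is at most
# (c_F n / v)^{1/(tau-1)} + 1.
assert all(d[v - 1] <= (c_F * n / v) ** (1 / (tau - 1)) + 1
           for v in range(1, n + 1))

# Tail bound (1.4.8): the empirical tail is at most c_F x^{-(tau-1)}.
for x in [1, 2, 5, 10, 50]:
    tail = sum(dv > x for dv in d) / n
    assert tail <= c_F * x ** (1 - tau) + 1e-12
```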
We next study truncated moments of random variables whose tail is bounded by that of a
power law:
Lemma 1.22 (Truncated moments) Let X be a non-negative random variable whose dis-
tribution function FX (x) = P(X ≤ x) satisfies, for every x ≥ 1,
1 − FX (x) ≤ CX x−(τ −1) . (1.4.11)
Then, for all a < τ − 1, there exists a constant CX (a) such that, for all ` ≥ 1,
E[X a 1{X>`} ] ≤ CX (a)`a−(τ −1) , (1.4.12)
while, for a > τ − 1 and all ` ≥ 1,
E[X a 1{X≤`} ] ≤ CX (a)`a−(τ −1) . (1.4.13)
Proof We note that, for any cumulative distribution function x ↦ FX (x) on the non-
negative reals, we have a partial integration identity, stating that, for every f : R → R,

∫_u^∞ f (x) FX (dx) = f (u)[1 − FX (u)] + ∫_u^∞ [f (x) − f (u)] FX (dx)
= f (u)[1 − FX (u)] + ∫_u^∞ ∫_u^x f ′(y) dy FX (dx)
= f (u)[1 − FX (u)] + ∫_u^∞ f ′(y) ∫_y^∞ FX (dx) dy
= f (u)[1 − FX (u)] + ∫_u^∞ f ′(y)[1 − FX (y)] dy, (1.4.14)

provided that either (a) y ↦ f ′(y)[1 − FX (y)] is absolutely integrable, or (b) x ↦ f (x) is
either non-decreasing or non-increasing, so that f ′(y)[1 − FX (y)] has a fixed sign. Here, the
interchange of the integration order is allowed by Fubini’s Theorem for non-negative func-
tions (Halmos, 1950, Section 36, Theorem B) when x ↦ f (x) is non-decreasing, and by
Fubini’s Theorem for absolutely integrable functions (Halmos, 1950, Section 36, Theorem
C) when y ↦ f ′(y)[1 − FX (y)] is absolutely integrable. Similarly, for f with f (0) = 0,

∫_0^u f (x) FX (dx) = ∫_0^u ∫_0^x f ′(y) dy FX (dx) = ∫_0^u f ′(y) ∫_y^u FX (dx) dy
= ∫_0^u f ′(y)[FX (u) − FX (y)] dy. (1.4.15)
Let us introduce some standard notation used throughout this book, and recall some prop-
erties of trees and Poisson processes.
Abbreviations
We write rhs for right-hand side, and lhs for left-hand side. Further, we abbreviate with
respect to by wrt.
Random variables
We write X =d Y to denote that X and Y have the same distribution. We write X ∼ Be(p)
when X has a Bernoulli distribution with success probability p, i.e., P(X = 1) = 1 −
when X has a Bernoulli distribution with success probability p, i.e., P(X = 1) = 1 −
P(X = 0) = p. We write X ∼ Bin(n, p) when the random variable X has a binomial
distribution with parameters n and p, and we write X ∼ Poi(λ) when X has a Poisson
distribution with parameter λ.
We write X ∼ Exp(λ) when X has an exponential distribution with mean 1/λ. We
write X ∼ Gam(r, λ) when X has a gamma distribution with scale parameter λ and shape
parameter r, for which the density, for x ≥ 0, is given by
fX (x) = λr xr−1 e−λx /Γ(r), (1.5.1)
where r, λ > 0, and we recall (1.3.58), while fX (x) = 0 for x < 0. The random variable
Gam(r, λ) has mean r/λ and variance r/λ2 . Finally, we write X ∼ Beta(α, β) when X
has a beta distribution with parameters α, β > 0, so that X has density, for x ∈ [0, 1],
fX (x) = xα−1 (1 − x)β−1 /B(α, β), (1.5.2)
where
B(α, β) = Γ(α)Γ(β)/Γ(α + β) (1.5.3)
is the Beta-function, while fX (x) = 0 for x ∉ [0, 1]. We sometimes abuse notation, and
write, e.g., P(Bin(n, p) = k) to denote P(X = k) when X ∼ Bin(n, p).
We call a sequence of random variables (Xi )i≥1 independent and identically distributed
(iid) when they are independent, and Xi has the same distribution as X1 for every i ≥ 1.
For a finite set X , we say that X ∈ X is drawn uniformly at random (uar) when X has the
uniform distribution on X .
Stochastic Domination
We recall that a random variable X is stochastically dominated by a random variable Y
when FX (x) = P(X ≤ x) ≥ FY (x) = P(Y ≤ x) for every x ∈ R. We write this as
X ⪯ Y . See [V1, Section 2.3] for more details on stochastic ordering.
|u| < |v| or |u| = |v| and u = ∅u1 · · · uk and v = ∅v1 · · · vk are such that (u1 , . . . , uk ) <
(v1 , . . . , vk ) in the lexicographic sense. J
We next explain the breadth-first exploration of t:
Definition 1.25 (Breadth-first exploration of a tree) For a tree t of size |V (t)| = t, we let
(ai )ti=1 be the elements of V (t), ordered according to the breadth-first ordering of t (recall
Definition 1.24). For i ≥ 1, let xi denote the number of children of vertex ai . Thus, if dv
denotes the degree of v ∈ V (t) in the tree t, we have x1 = da1 = d∅ and xi = dai − 1 for
i ≥ 2. The recursion
si = si−1 + xi − 1 for i ≥ 1, with s0 = 1, (1.5.7)
describes the evolution of the number of unexplored vertices in the breadth-first exploration.
Thus, for a finite tree t of size |V (t)| = t, we have si > 0 for i ∈ {0, . . . , t − 1}, while st = 0. J
The sequence (xi )ti=1 gives an alternative encoding of the tree t that is often convenient.
Indeed, by Exercise 1.28, the sequence (xi )ti=1 is in one-to-one correspondence with t.
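This encoding and its inverse are easy to make concrete. In the following sketch (ours), a tree whose vertices are labeled in breadth-first order is encoded into (xi ) and decoded back, and the recursion (1.5.7) is checked:

```python
from collections import deque

def encode(children, root=0):
    """x_i = number of children of the i-th vertex in breadth-first order."""
    xs = []
    queue = deque([root])
    while queue:
        v = queue.popleft()
        xs.append(len(children[v]))
        queue.extend(children[v])
    return xs

def decode(xs):
    """Rebuild the child lists, labeling vertices in breadth-first order."""
    children = [[] for _ in xs]
    nxt = 1                        # label of the next unseen vertex
    for i, x in enumerate(xs):
        children[i] = list(range(nxt, nxt + x))
        nxt += x
    return children

t = [[1, 2], [3, 4], [], [], []]   # a 5-vertex tree, labels in BFS order
xs = encode(t)                     # [2, 2, 0, 0, 0]
assert decode(xs) == t

# The recursion s_i = s_{i-1} + x_i - 1, s_0 = 1, hits 0 exactly at i = |V(t)|:
s = 1
for i, x in enumerate(xs, start=1):
    s += x - 1
    assert (s > 0) == (i < len(xs))
```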
Sections 1.1–1.3 are for the most part summaries of chapters in Volume 1, to which we refer for notes and
discussion, so we restrict ourselves here to the exceptions.
The literature on switch chains focusses on two key aspects: first, their rapid mixing (Erdős et al. (2022); Gao and Greenhill (2021),
and various related papers, for which we refer to Erdős et al. (2022)), and, second, counting the number
of simple graphs using switch chain arguments (as in Gao and Wormald (2016)), which is the approach
that we take in this section. Rapid mixing means that the mixing time of the switch chain is bounded by
an explicit power of the number of vertices (or number of edges, or both combined). The powers, however,
tend to be large, and thus “rapid mixing” may not be rapid enough to give good guarantees when one is
trying to sample a uniform random graph of the degree distribution of some real-world network. Theorem
1.13 is adapted from Gao et al. (2020), where it was used to compute the number of triangles in uniform
random graphs with power-law degree distributions having infinite variance. See also Janson (2020b) for a
relation between the configuration model and uniform random graphs using switchings.
Preferential attachment models were first introduced in the context of complex networks by Barabási and
Albert (1999). Bollobás et al. (2001) studied the model by Barabási and Albert (1999), and later many other
papers followed on this, and related, models. Barabási and Albert (1999) and Bollobás et al. (2001) focussed
on the proportional model, for which δ = 0. The affine model was proposed by Dorogovtsev et al. (2000).
All these works were pre-dated by Price (1965); Simon (1955); Yule (1925); see [V1, Chapter 8] for more
details on the literature. The Bernoulli preferential attachment model was introduced and investigated by
Dereich and Mörters (2009, 2011, 2013).
Exercise 1.9 (Domination weights) Let Wn have the distribution function Fn from (1.3.17). Show that
Wn is stochastically dominated by the random variable W having distribution function F . Here we recall
that Wn is stochastically dominated by W when P(Wn ≤ w) ≥ P(W ≤ w) for all w ∈ R.
Exercise 1.10 (Degree of uniformly chosen vertex in GRGn (w)) Prove that the asymptotic degree in
GRGn (w) satisfies (1.3.22) under the conditions of Theorem 1.3.
Exercise 1.11 (Power-law degrees in generalized random graphs) Prove that, under the conditions of
Theorem 1.3, the degree power-law tail in (1.3.24) for GRGn (w) follows from the weight power-law tail
in (1.3.23). Does the converse also hold?
Exercise 1.12 (Degree example) Let the degree sequence d = (dv )v∈[n] be given by
dv = 1 + (v mod 3). (1.7.3)
Show that Conditions 1.7(a)–(c) hold. What is the limiting degree variable D?
Exercise 1.13 (Poisson degree example) Let the degree sequence d = (dv )v∈[n] satisfy
nk /n → e−λ λk /k! (1.7.4)
and
Σk≥0 k nk /n → λ, Σk≥0 k2 nk /n → λ(λ + 1). (1.7.5)
Show that Conditions 1.7(a)–(c) hold. What is the limiting degree variable D?
Exercise 1.14 (Power-law degree example) Consider the random variable D having generating function,
for α ∈ (0, 1),

GD (s) = s + (1 − s)α+1 /(α + 1). (1.7.6)
What is the probability mass function of D?
Exercise 1.15 (Power-law degree example) Consider the random variable D having generating function
(1.7.6) with α ∈ (0, 1). Show that D has an asymptotic power-law distribution and compute its power-law
exponent.
Exercise 1.16 (Power-law degree example (cont.)) Consider the degree sequence d = (dv )v∈[n] with
dv = [1 − F ]−1 (v/n), where F is the distribution of a random variable D having generating function
(1.7.6) with α ∈ (0, 1). Show that Conditions 1.7(a) and (b) hold, but Condition 1.7(c) does not.
Exercise 1.17 (Number of erased edges) Assume that Conditions 1.7(a) and (b) hold. Show that Theorem
1.8 implies that the number of erased edges in ECMn (d) is oP (n).
Exercise 1.18 (Edge probability of uniform random graphs with prescribed degrees) Prove the formula
for the (conditional) edge probabilities in uniform random graphs with prescribed degrees in (1.3.50).
Exercise 1.19 (Edge probability of uniform random graphs with prescribed degrees (cont.)) Prove the
formula for the number of switches with and without a specific edge in uniform random graphs with pre-
scribed degrees in (1.3.51). Hint: Use an “out-is-in” argument: every switch out of a graph in S is a switch
into a graph in S̄, so counting the switches from the S side and from the S̄ side gives the same number.
Exercise 1.20 (Degrees grow to infinity almost surely) Consider the preferential attachment model
PA(m,δ)n (a). Fix m = 1 and i ≥ 1. Prove that Di (n) −→a.s. ∞ as n → ∞, by using Σns=i Is ⪯ Di (n),
where (In )n≥i is a sequence of independent Bernoulli random variables with P(In = 1) = (1 + δ)/(n(2 + δ) + 1 + δ).
What does this imply for m > 1?
Exercise 1.21 (Degrees of fixed vertices) Consider the preferential attachment model PA(m,δ)
n (a). Prove
Theorem 1.17 for m = 1 and δ > −1 using the martingale convergence theorem and the fact that
Mi (n) = (Di (n) + δ)/(1 + δ) · Πn−1s=i−1 ((2 + δ)s + 1 + δ)/((2 + δ)(s + 1)) (1.7.7)

is a martingale.
Abstract
In this chapter we discuss local convergence, which describes the intuitive no-
tion that a finite graph, seen from the perspective of a typical vertex, looks like
a certain limiting graph. Local convergence plays a profound role in random
graph theory.
We give general definitions of local convergence in several probabilistic senses.
We then show that local convergence in its various forms is equivalent to the
appropriate convergence of subgraph counts. We continue by discussing several
implications of local convergence, concerning local neighborhoods, clustering,
assortativity, and PageRank. We further investigate the relation between local
convergence and the size of the giant component, making the statement that the
giant is “almost local” precise.
The local convergence of finite graphs was first introduced by Benjamini and Schramm
(2001) and, in a different context, independently by Aldous and Steele (2004). It describes
the intuitive notion that a finite graph, seen from the perspective of a vertex that is chosen
uar from the vertex set, looks like a certain limiting graph. This is already useful in that
it makes precise the notion that a finite cube in Zd with large side length is locally much
like Zd itself. However, it plays an even more profound role in random graph theory. For
example, local convergence to some limiting tree, which often occurs in random graphs,
as we will see throughout this book, is referred to as locally tree-like behavior. Such trees
are often branching processes; see for example [V1, Section 4.1] where this is worked out
for the Erdős–Rényi random graph. Since trees are generally simpler objects than graphs,
this means that, to understand a random graph, it often suffices to understand a branching
process tree instead.
Local convergence is a central technique in random graph theory, since many properties
of random graphs are in fact determined by a local limit. For example, the asymptotic num-
ber of spanning trees, the partition function of the Ising model, and the spectral distribution
of the adjacency matrix of the graph all turn out to be computable in terms of the local limit.
We refer to Section 2.7 for an extensive discussion of the highly non-trivial consequences of
local convergence. Owing to its enormous power, local convergence has become an indis-
pensable tool in the random graph theory of sparse graphs. In this book we will see several
examples of quantities whose convergence and limit are determined by the local limit, in-
cluding clustering, the size of the giant in most cases, and the PageRank distribution of
sparse random graphs. In this chapter, we lay the general foundations of local convergence.
Local weak convergence is a notion of the weak convergence of finite rooted graphs. In
general, weak convergence is equivalent to the convergence of expectations of continuous
functions. For continuity, one needs a topology. Therefore, we start by discussing the topol-
ogy of rooted graphs that is at the center of local weak convergence. We start with some
definitions:
Definition 2.1 (Locally finite and rooted graphs) A rooted graph is a pair (G, o), where
G = (V (G), E(G)) is a graph with vertex set V (G), edge set E(G), and root vertex
o ∈ V (G). Further, a rooted or non-rooted graph is called locally finite when each of its
vertices has finite degree (though not necessarily uniformly bounded). J
In Definition 2.1, graphs can have finitely or infinitely many vertices, but we always have
graphs in mind that are locally finite. Also, in the definitions below, the graphs are deter-
ministic and we clearly indicate when we move to random graphs instead. We next define
neighborhoods as rooted subgraphs of a rooted graph, for which we recall that distG denotes
the graph distance in the graph G:
Definition 2.2 (Neighborhoods as rooted graphs) For a rooted graph (G, o), we let Br(G) (o)
denote the (rooted) subgraph of (G, o) of all vertices at graph distance at most r away from
o. Formally, this means that Br(G) (o) = (V (Br(G) (o)), E(Br(G) (o)), o), where

V (Br(G) (o)) = {u : distG (o, u) ≤ r}, (2.2.1)
E(Br(G) (o)) = {{u, v} ∈ E(G) : distG (o, u), distG (o, v) ≤ r}.

Also, let ∂Br(G) (o) denote the (unrooted) graph with vertex set V (∂Br(G) (o)) =
V (Br(G) (o)) \ V (Br−1(G) (o)) and edge set E(∂Br(G) (o)) = E(Br(G) (o)) \ E(Br−1(G) (o)). J
2.3), this is indeed a dense countable subset. We discuss the metric structure of the space of
rooted graphs in more detail in Appendix A.3. Exercises 2.4 and 2.5 study such aspects.
In this section, we discuss the local weak convergence of deterministic graphs (Gn , on ),
rooted at a uniform vertex on ∈ V (Gn ), whose size tends to infinity as n → ∞. This section
is organized as follows. In Section 2.3.1, we give the definitions of local weak convergence
of (possibly disconnected) finite graphs. In Section 2.3.2, we provide a convenient criterion
to prove local weak convergence and discuss tightness. In Section 2.3.3, we show that when
the limit has full support on some subset of rooted graphs, convergence can be restricted to
that set. In Section 2.3.4, we discuss two examples of graphs that converge locally weakly.
We close in Section 2.3.5 by discussing the local weak convergence of marked graphs, which
turns out to be useful in many applications of local weak convergence.
We next use these conventions to define the local weak convergence of finite graphs:
Definition 2.6 (Local weak convergence) Let Gn = (V (Gn ), E(Gn )) denote a finite
(possibly disconnected) graph. Let (Gn , on ) be the rooted graph obtained by letting on ∈
V (Gn ) be chosen uar, and restricting Gn to the connected component C (on ) of on in Gn .
We say that (Gn , on ) converges locally weakly to the connected rooted graph (G, o), which
is a (possibly random) element of G? having law µ, when, for every bounded and continuous
function h : G? → R,

E[h(Gn , on )] → Eµ [h(G, o)], (2.3.3)

where the expectation on the rhs of (2.3.3) is wrt (G, o) having law µ, while the expectation
on the lhs is wrt the random vertex on . We denote the above convergence by (Gn , on ) −→d
(G, o). J
Of course, by (2.3.2), the values h(Gn , on ) give you information only about C (on ),
which may be only a small portion of the graph when Gn is disconnected. However, since
we are sampling on ∈ V (Gn ) uar, actually we may “see” every connected component, so
in distribution we do observe the graph as a whole.
Since later we apply local weak convergence ideas to random graphs, we need to be
absolutely clear about with respect to what we take the expectation. Indeed, the expectation
in (2.3.3) is wrt the random root on ∈ V (Gn ), and is thus equal to
E[h(Gn , on )] = (1/|V (Gn )|) Σv∈V (Gn ) h(Gn , v). (2.3.4)
The notion of local weak convergence plays a central role in this book. It may be hard
to grasp, and it also may appear to be rather weak. In what follows, we discuss examples
of graphs that converge locally weakly. Further, in Section 2.5 we discuss examples of how
local weak convergence may be used to obtain interesting consequences for graphs, such as
their clustering and degree–degree dependencies, measured through the assortativity coeffi-
cient. We continue by discussing a convenient criterion for proving local weak convergence.
Theorem 2.7 (Criterion for local weak convergence) The sequence of finite rooted graphs
((Gn , on ))n≥1 converges locally weakly to (G, o) ∼ µ precisely when, for every rooted
graph H? ∈ G? and all integers r ≥ 0,

p(Gn ) (H? ) = (1/|V (Gn )|) Σv∈V (Gn ) 1{Br(Gn ) (v)≃H? } → µ(Br(G) (o) ≃ H? ), (2.3.5)

where Br(Gn ) (v) is the rooted r-neighborhood of v in Gn , and Br(G) (o) is the rooted r-
neighborhood of o in the limiting graph (G, o).
Proof This is a standard weak convergence argument. First, the local weak convergence
in Definition 2.6 implies that (2.3.5) holds, since we can take h(G, o) = 1{Br(G) (o)≃H? } ,
so that p(Gn ) (H? ) = E[h(Gn , on )], and h : G? → {0, 1} is bounded and continuous (see
Exercise 2.6). For the other direction, since µ is a probability measure on G? , the sequence
((Gn , on ))n≥1 is tight; see Theorem A.7 in Appendix A.2. By tightness, every subsequence
of ((Gn , on ))n≥1 has a further subsequence that converges in distribution. We work along
that subsequence, and note that the limiting law is that of (G, o), since the laws of Br(G) (o)
for all r ≥ 1 uniquely identify the law of (G, o) (see Proposition A.15 in Appendix A.3.5).
Since this is true for every subsequence, the local weak limit is (G, o).
Theorem 2.7 shows that the proportion of vertices in Gn whose neighborhoods look like
H? converges to a (possibly random) limit. See Exercise 2.8, where you are asked to con-
struct an example where the local weak limit of a sequence of deterministic graphs actually
is random. You are asked to prove local weak convergence for some examples in Exercises
2.9 and 2.10. Appendix A.3.6 discusses tightness in G? in more detail.
and all r ≥ 1.
We will apply Theorem 2.8 in particular when the limit is almost surely a tree. Then
Theorem 2.8 implies that we have to investigate only rooted graphs H? that are finite trees
of height at most r themselves.
Proof The set T? (r) is countable. Therefore, since µ((G, o) ∈ T? ) = 1, for every ε > 0
there exist an m = m(ε) and a subset T? (r, m) of size at most m such that µ(Br(G) (o) ∈
T? (r, m)) ≥ 1 − ε. Fix this set. Then we bound
Since ε > 0 is arbitrary, we conclude that P(Br(Gn ) (on ) ∉ T? (r)) → 0. In particular, this
and denote the resulting graph by (Gn , on ). We claim that (Gn , on ) −→d (Zd , o), which we
now prove. We rely on Theorem 2.7, which shows that we need to prove the convergence of
subgraph proportions.
Let µ be the point measure on (Zd , o), so that µ(Br(G) (o) ≃ Br(Zd ) (o)) = 1. Thus, by
Theorem 2.8, it remains to show that p(Gn ) (Br(Zd ) (o)) → 1 (recall (2.3.5)). For this, we note
that Br(Gn ) (on ) ≃ Br(Zd ) (o) unless on happens to lie within a distance strictly smaller than r
from one of the boundaries of [n]d . This means that one of the coordinates of on is either in
[r − 1] or in [n] \ [n − r + 1]. Since this occurs with vanishing probability, the claim
follows.
In the above case, we see that the local weak limit is deterministic, as one would have
expected. One can generalize the above to the local weak convergence of tori as well.
and edge set as follows. Let v = ∅v1 · · · vk and u = ∅u1 · · · u` be two vertices. We say
that u is the parent of v when ` = k − 1 and ui = vi for all i ∈ [k − 1]. Then we say
that two vertices u and v are neighbors when u is the parent of v or vice versa. We obtain a
graph with
|V (Td,n )| = 1 + d + · · · + d(d − 1)n−1 (2.3.10)
vertices.
Let on denote a vertex chosen uar from V (Td,n ). To study the local weak limit of
(Td,n , on ), we first consider the so-called canopy tree. For this, we take the graph Td,n ,
root it at any leaf, which we will call the root-leaf, and take the limit n → ∞. Denote
this graph by Tcd , which we consider to be an unrooted graph, but we keep the root-leaf for
reference purposes. This graph has a unique infinite path from the root-leaf. Let oℓ be the
ℓth vertex on this infinite path (the root-leaf being o0 ), and consider (Tcd , oℓ ). Define the
limiting measure µ by

µ((Tcd , oℓ )) ≡ µℓ = (d − 2)(d − 1)−(ℓ+1) , ℓ ≥ 0. (2.3.11)
Fix Gn = Td,n . We claim that (Gn , on ) ≡ (Td,n , on ) −→d (G, o) with law µ in (2.3.11).
We again rely on Theorem 2.7, which shows that we need to prove only the convergence of
the subgraph proportions.
By Theorem 2.8, it remains to show that p(Gn ) (Br(Tcd ) (oℓ )) → µℓ (recall (2.3.5)). When
n is larger than r (which we now assume), Br(Gn ) (on ) ≃ Br(Tcd ) (oℓ ) precisely when on has
distance ℓ from the closest leaf. There are

d(d − 1)k−1 (2.3.12)

vertices at distance k from the root, out of a total of |V (Td,n )| = d(d − 1)n (1 + o(1))/(d − 2).
Having distance ℓ to the closest leaf in Td,n is the same as having distance k = n − ℓ
from the root. Thus,

p(Gn ) (Br(Tcd ) (oℓ )) = d(d − 1)k−1 /|V (Td,n )| → (d − 2)(d − 1)−(ℓ+1) = µℓ , (2.3.13)

as required.
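The computation (2.3.12)–(2.3.13) can be checked numerically. The following sketch (ours) computes, for finite n, the exact proportion of vertices of Td,n at distance ℓ from the closest leaf and compares it with µℓ:

```python
# Fraction of vertices of T_{d,n} at distance l from the leaves vs. the canopy
# measure mu_l = (d-2)(d-1)^{-(l+1)} from (2.3.11).
d, n = 4, 20
sizes = [d * (d - 1) ** (k - 1) for k in range(1, n + 1)]  # vertices at depth k
total = 1 + sum(sizes)                                      # plus the root
for l in range(5):
    # depth k = n - l is exactly distance l from the closest leaf
    frac = sizes[n - l - 1] / total
    print(l, frac, (d - 2) * (d - 1) ** -(l + 1))
```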
We see that, for truncated regular trees, the local weak limit is random, where the ran-
domness originates from the choice of the random root of the graph. More precisely, it is
due to how far away the chosen root is from the leaves of the finite tree. Perhaps
surprisingly, the local limit of a truncated regular tree is not the infinite regular tree.
with φ : V (Br(G1 ) (o1 )) → V (Br(G2 ) (o2 )) running over all isomorphisms between Br(G1 ) (o1 )
and Br(G2 ) (o2 ) satisfying φ(o1 ) = o2 . J
When Ξ is a finite set, we can simply let dΞ (a, b) = 1{a≠b} , so that (2.3.14)–(2.3.15) state
that not only should the neighborhoods Br(G1 ) (o1 ) and Br(G2 ) (o2 ) be isomorphic, but also the
corresponding marks on the vertices and half-edges in Br(G1 ) (o1 ) and Br(G2 ) (o2 ) should all
be the same.
Definition 2.10 puts a metric structure on marked rooted graphs. With this metric topology
in hand, we can simply adapt all convergence statements to this setting. We refrain from
stating all these extensions explicitly. See Exercise 2.17 for an application of marked graphs
to directed graphs. For example, the marked rooted graph setting is a way to formalize the
setting of multi-graphs in Remark 2.4 (see Exercise 2.18).
Having discussed the notion of local weak convergence for deterministic graphs, we now
move on to random graphs. Here the situation becomes more delicate, as now we have
double randomness, both in the random root as well as the random graph. This gives rise to
surprising subtleties.
We next discuss the local convergence of random graphs. This section is organized as fol-
lows. In Section 2.4.1 we define what it means for a sequence of random graphs to converge
locally, as well as which different versions thereof exist. In Section 2.4.2 we then give a use-
ful criterion to verify the local convergence of random graphs. In Section 2.4.3 we prove the
completeness of the limit by showing that, when the limit is supported on a subset of rooted
graphs, then one needs only to verify the convergence for that subset. In many examples that
we encounter in this book, this subset is the collection of trees. We close with two examples:
that of random regular graphs in Section 2.4.4 and that of the Erdős–Rényi random graph
ERn (λ/n) in Section 2.4.5.
Definition 2.11 (Local convergence of random graphs) Let (Gn )n≥1 with Gn = (V (Gn ), E(Gn ))
denote a sequence of finite (possibly disconnected) random graphs. Then,
(a) Gn converges locally weakly to the random rooted graph (Ḡ, ō) ∼ µ̄ when

E[h(Gn , on )] → Eµ̄ [h(Ḡ, ō)] (2.4.1)

for every bounded and continuous function h : G? → R, where the expectation E on the
lhs of (2.4.1) is wrt the random vertex on and the random graph Gn . This is equivalent
to (Gn , on ) −→d (Ḡ, ō).
dom variable due to its dependence on the random graph Gn , converges in probability to
Eµ [h(G, o)], which is possibly also a random variable, in that µ might be a random proba-
bility distribution on G? . When, instead, we have local weak convergence, only expectations
wrt the random graph of the form E[h(Gn , on )] converge, and the limiting measure µ̄ is
deterministic.
Remark 2.12 (Local convergence in probability and rooted versus unrooted graphs) Usu-
ally, if we have a sequence of objects xn living in some space X , and xn converges to
x, then x also lives in X . In the above definitions of local convergence in probability and
almost surely, respectively, we take a graph sequence (Gn )n≥1 that converges locally in
probability and almost surely, respectively, to a rooted graph (G, o) ∼ µ. One might have
guessed that this is related to (Gn , on ) −→P (G, o) and (Gn , on ) −→a.s. (G, o), but in fact
it is quite different. Let us restrict attention to local convergence in probability. Indeed,
(Gn , on ) −→P (G, o) is a very strong and arguably not so useful statement. For one, it re-
quires that ((Gn , on ))n≥1 and (G, o) live on the same probability space, which is not often
evidently the case. Further, sampling on gives rise to a variability in (Gn , on ) that is hard to
capture by the limit (G, o). Indeed, when n varies, the root on in (Gn , on ) will have to be
redrawn every once in a while, and it seems difficult to do this in such a way that (Gn , on )
is consistently close to (G, o). J
Remark 2.13 (Random measure interpretation of local convergence in probability) The
following observations turn local convergence in probability into the convergence of ob-
jects living on the same space, namely, the space of probability measures on rooted graphs.
Denote the empirical neighborhood measure µn on G? by

µn (H? ) = (1/|V (Gn )|) Σv∈V (Gn ) 1{(Gn ,v)∈H? } , (2.4.4)
for every measurable subset H? of G? . Then, (Gn )n≥1 converges locally in probability to
the random rooted graph (G, o) ∼ µ when

µn (H? ) −→P µ(H? ) (2.4.5)

for every measurable subset H? of G? . This is equivalent to Definition 2.11(b), since, for
every bounded and continuous h : G? → R, and denoting the conditional expected value of
h(Gn , on ) when (Gn , on ) ∼ µn by Eµn [h(Gn , on ) | Gn ], we have

Eµn [h(Gn , on ) | Gn ] = (1/|V (Gn )|) Σv∈V (Gn ) h(Gn , v) = E[h(Gn , on ) | Gn ], (2.4.6)
Further, if (Gn )n≥1 converges almost surely to (G, o), then (Gn )n≥1 also converges locally
in probability to (G, o).
Proof Note that Eµ [h(G, o)] is a bounded random variable, and so is E[h(Gn , on ) | Gn ].
Therefore, by the Dominated Convergence Theorem [V1, Theorem A.1], the expectations
also converge. We conclude that

E[h(Gn , on )] = E[E[h(Gn , on ) | Gn ]] → E[Eµ [h(G, o)]], (2.4.9)

which proves that the claim holds, with the limit identified in (2.4.8). The relation between
local convergence almost surely and in probability follows from that for random variables
and Definition 2.11.
In most of our examples, the law µ of the local limit in probability is actually determin-
istic, in which case µ̄ = µ. However, there are some cases where this is not true. A simple
example arises as follows. For ERn (λ/n), the local limit in probability turns out to be a
Poi(λ) branching process (see Section 2.4.5). Therefore, when considering ERn (X/n),
where X is uniform on [0, 2], the local limit in probability will be a Poi(X) branching pro-
cess. Here, the expected numbers of offspring, conditional on the random variable X, are
random and related, as they are all equal to X. This is not the same as a mixed-Poisson branching process
with offspring distribution Poi(X), since, for the local limit in probability of ERn (X/n),
we draw X only once. We refer to Section 2.4.5 for more details on local convergence for
ERn (λ/n).
We have added the notion of local convergence in the almost sure sense, even though
for random graphs this notion is often not highly useful. Indeed, almost sure convergence
for random graphs can already be tricky, since for static models such as the Erdős–Rényi
random graph and the configuration model, there is no obvious relation between the graphs
of size n and those of size n + 1. This of course is different for the preferential attachment
model, which forms a (consistent) random graph process.
(c) Gn converges locally almost surely to (G, o) ∼ µ precisely when, for every rooted
graph H? ∈ G? and all integers r ≥ 0,

p(Gn ) (H? ) = (1/|V (Gn )|) Σv∈V (Gn ) 1{Br(Gn ) (v)≃H? } −→a.s. µ(Br(G) (o) ≃ H? ). (2.4.12)
Proof This follows from Theorem 2.7. Indeed, for part (a), it follows directly, as part (a)
deals with local weak convergence as in Theorem 2.7. For convergence almost surely as
in part (c), this also follows directly. For part (b), we need an extra argument. By the Sko-
rokhod Embedding Theorem, for each H? , there exists a probability space on which the
convergence in (2.4.12) occurs almost surely. The same holds for any finite subcollection of
H? ∈ G? , and, since the set of graphs H? that can occur as r-neighborhoods is countable, it
can even be extended to all such H? ∈ G? . Thus, the statement again follows from Theorem
2.7.
In what follows, we are mainly interested in local convergence in probability, since this is
the notion that is the most powerful and useful in the setting of random graphs.
Recall that T? ⊂ G? is a subset of the space of rooted graphs, and that T? (r) ⊆ T? is
the subset of T? of graphs for which the distance between any vertex and the root is at most
r. Then, we have the following result:
Theorem 2.16 (Local convergence and subsets) Let (Gn )n≥1 be a sequence of rooted
graphs. Let (Ḡ, ō) be a random variable on G? having law µ̄. Let T? ⊂ G? be a subset of
the space of rooted graphs. Assume that µ̄((Ḡ, ō) ∈ T? ) = 1. Then, (Gn , on ) −→d (Ḡ, ō)
1 − E[p(Gn ) (Br(Td ) (o))] → 0. Next, we show that E[p(Gn ) (Br(Td ) (o))2 ] → 1, which shows
that Var(p(Gn ) (Br(Td ) (o))) → 0, and thus p(Gn ) (Br(Td ) (o)) −→P 1. Now,
completes the proof for the configuration model. Since we have proved the convergence
in probability of the subgraph proportions, convergence in probability follows when we
condition on simplicity (recall [V1, Corollary 7.17]), and thus the proof also follows for
random regular graphs. We leave the details of this argument as Exercise 2.19.
where Gn = ERn (λ/n), and the law µ of (G, o) is that of a Poi(λ) branching process. We
see that, in this case, µ is deterministic, as it will be in most examples encountered in this
book. In (2.4.20), we may without loss of generality assume that t is a finite tree of depth at
most r, since otherwise both sides are zero.
the breadth-first exploration of the tree t in Definition 1.25, which is described in terms of
(xi )ti=1 as in (1.5.7) and the corresponding vertices (ai )ti=1 , where t = |V (t)| denotes the
number of vertices in t. Further, note that (G, o) is, by construction, an ordered tree, and
therefore Br(G) (o) inherits this ordering. We make this explicit by writing B̄r(G) (o) for the
ordered version of Br(G) (o). Therefore, we can write B̄r(G) (o) = t to indicate that the two
ordered trees B̄r(G) (o) and t agree. In terms of this notation, one can compute

µ(B̄r(G) (o) = t) = Πi∈[t] : dist(∅,ai )<r e−λ λxi /xi ! , (2.4.17)

where dist(∅, v) is the tree distance between v ∈ V (t) and the root ∅ ∈ V (t). We note that
B̄r(G) (o) = t says nothing about the degrees of the vertices that are at distance exactly r away
from the root ∅, which is why we restrict to vertices ai with dist(∅, ai ) < r in (2.4.17).
Further, µ(B̄r (o) = t′ ) = µ(B̄r (o) = t) for each ordered tree t′ that is isomorphic to the
tree t. This is because the root degrees and degree sequences of the non-root vertices are the
same for all trees that are isomorphic to t, and the right-hand side of (2.4.17) depends only
on the degree of the root and the degrees of all other non-root vertices (recall also Definition
1.25 and Exercise 1.28). Therefore,

µ(Br(G) (o) ≃ t) = #(t) Πi∈[t] : dist(∅,ai )<r e−λ λxi /xi ! , (2.4.18)

where #(t) is the number of ordered trees that are isomorphic to t. This identifies the right-
hand side of (2.4.20).
We note further that, by permuting the labels of all the children of any vertex in t, we
obtain a rooted tree that is isomorphic to t, and there are Πi∈[t] xi ! such permutations.
However, not all of them may lead to distinct ordered trees. In our analysis, the precise
value of #(t) will be irrelevant.
It is convenient to order also the vertices in Br(Gn ) (on ), where Gn = ERn (λ/n). This
can be achieved by ordering the forward children of a vertex in Br(Gn ) (on ) according to their
vertex labels. We denote the result by B̄r(Gn ) (on ), which is an ordered graph. Then, we can
again write B̄r(Gn ) (on ) = t to indicate that the two ordered graphs B̄r(Gn ) (on ) and t agree. This
implies that Br(Gn ) (on ) is a tree (so there are no cycles within depth r), and that its ordered
version is equal to the ordered tree t. Then, as in (2.4.18),

p(Gn ) (t) = (1/n) Σv∈[n] 1{Br(Gn ) (v)≃t} = #(t) (1/n) Σv∈[n] 1{B̄r(Gn ) (v)=t} . (2.4.19)
The second-moment method shows that Nn,r (t) is well concentrated around nµ(Br(G) (o) =
t). We start by investigating the first moment of Nn,r (t), which equals

E[Nn,r (t)] = Σv∈[n] P(B̄r(Gn ) (v) = t) = nP(B̄r(Gn ) (1) = t), (2.4.22)
where the latter step uses the fact that the distributions of the neighborhoods of all vertices
in ERn (λ/n) are the same.
We recall the breadth-first description of an ordered tree in Definitions 1.24 and 1.25
in Section 1.5. Let vi ∈ [n] denote the vertex label of the ith vertex that is explored in the
breadth-first exploration. Let Xi denote the number of forward neighbors of vi , except when
vi is at graph distance r from vertex 1, in which case we set Xi = 0 by convention. Further,
let Yi denote the number of edges leading to already found, but not yet explored, vertices.
Then, B̄r(Gn ) (1) = t occurs precisely when (Xi , Yi ) = (xi , 0) for all i ∈ [t]. Therefore,
P(B̄r(Gn ) (1) = t) = Πti=1 P((Xi , Yi ) = (xi , 0) | (X[i−1] , Y[i−1] ) = (x[i−1] , 0[i−1] )). (2.4.24)
Conditional on (X[i−1] , Y[i−1] ) = (x[i−1] , 0[i−1] ), for all i for which vi is at a distance at
most r − 1 from vertex 1, we have
since there are si−1 active vertices, and Yi counts the number of edges between vi and
any other vertex. Finally, Xi and Yi are conditionally independent given (X[i−1] , Y[i−1] ) =
(x[i−1] , 0[i−1] ), owing to the independence of the edges in ERn (λ/n). Note that the distance
of vi from vertex 1 is exactly equal to the distance of the corresponding vertex ai ∈ V (t) to
the root ∅ ∈ V (t). Therefore, since P(Bin(ni , λ/n) = xi ) → e−λ λxi /xi !,
P(B̄r(Gn ) (1) = t) = Πi∈[t] : dist(∅,ai )<r P(Bin(ni , λ/n) = xi ) × Πi∈[t] (1 − λ/n)si−1
→ Πi∈[t] : dist(∅,ai )<r e−λ λxi /xi ! = µ(B̄r(G) (o) = t), (2.4.27)
P(B̄r(Gn ) (1) = t) = (1/n) E[Nn,r (t)] → µ(B̄r(G) (o) = t). (2.4.28)
P(B̄r(Gn ) (2) = t, distGn (1, 2) > 2r | B̄r(Gn ) (1) = t) → µ(B̄r(G) (o) = t), (2.4.37)
as required.
In this section we discuss some consequences of local convergence that will either prove
to be useful in what follows or describe how network statistics are determined by the local
limit.
(b) Assume that Gn converges locally in probability to (G, o) ∼ µ on G? . Then, for every
m ≥ 1, with o(1)n , o(2)n two independent uniformly chosen vertices in V (Gn ),

(|∂Br(Gn ) (o(1)n )|, |∂Br(Gn ) (o(2)n )|)mr=1 −→d (|∂Br(G) (o(1) )|, |∂Br(G) (o(2) )|)mr=1 , (2.5.2)

P(Bm(Gn ) (o(1)n ) ≃ t1 , Bm(Gn ) (o(2)n ) ≃ t2 | Gn )
−→P µ(Bm(G) (o(1) ) ≃ t1 ) µ(Bm(G) (o(2) ) ≃ t2 ). (2.5.5)
Taking the expectation proves the claim (the reader is invited to provide the fine details of
this argument in Exercise 2.30).
In the above discussion, it is crucial to note that the limits in (2.5.2) correspond to two
independent copies of (G, o) having law µ, but with the same µ, where µ is a random
probability measure on G? . It is here that the possible randomness of µ manifests itself.
Recall also the example below Corollary 2.14.
We continue by showing that local convergence implies that the graph distance between
two uniform vertices tends to infinity:
Corollary 2.20 (Large distances) Let (Gn )n≥1 be a graph sequence whose sizes |V (Gn )|
tend to infinity. Let o(1)n , o(2)n be two vertices chosen independently and uar from V (Gn ).
Assume that (Gn , on ) −→d (Ḡ, ō) ∼ µ̄. Then

distGn (o(1)n , o(2)n ) −→P ∞. (2.5.6)
Proof It suffices to prove that, for every r ≥ 1,

P(distGn (o(1)n , o(2)n ) ≤ r) = o(1). (2.5.7)

For this, we use that o(2)n is chosen uar from V (Gn ) independently of o(1)n , so that

P(distGn (o(1)n , o(2)n ) ≤ r) = E[|Br(Gn ) (o(1)n )|/|V (Gn )|] = E[|Br(Gn ) (on )|/|V (Gn )|]. (2.5.8)
By Corollary 2.19(a), |Br(Gn ) (on )| is a tight random variable, so that

|Br(Gn ) (on )|/|V (Gn )| −→P 0. (2.5.9)
Further, |Br(Gn ) (on )|/|V (Gn )| ≤ 1 almost surely. Thus, by the Dominated Convergence
Theorem ([V1, Theorem A.1]), E[|Br(Gn ) (on )|/|V (Gn )|] = o(1) for every r ≥ 1, so that the
claim follows.
We close this section by showing that local convergence implies that the number of con-
nected components converges:
Corollary 2.21 (Number of connected components) Let (Gn )n≥1 be a sequence of graphs
whose sizes |V (Gn )| tend to infinity, and let Qn denote the number of connected components
in Gn .
(a) Assume that (Gn , on ) −→d (Ḡ, ō) ∼ µ̄. Then

E[Qn /|V (Gn )|] → Eµ̄ [1/|C (ō)|], (2.5.10)
where |C (ō)| is the size of the connected component of ō in Ḡ.
(b) Assume that Gn converges locally in probability to (G, o) ∼ µ. Then
where on ∈ V (Gn ) is chosen uar. Since h(G, o) = 1/|C (o)| is a bounded and continuous
function (where, by convention, h(G, o) = 0 when |C (o)| = ∞; see Exercise 2.22), the
claim follows.
For part (b), instead, we have

Qn /|V (Gn )| = E[1/|C (on )| | Gn ] −→P Eµ [1/|C (o)|], (2.5.14)

as required.
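Corollary 2.21 is easy to test empirically. The following sketch (ours) counts the connected components of ERn (λ/n) with a union–find structure, so that Qn /n can be compared with Eµ [1/|C (o)|] for the Poi(λ) branching-process limit discussed in Section 2.4.5:

```python
import random

def er_components(n, lam, rng):
    """Number of connected components of ER_n(lam/n)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    p = lam / n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                parent[find(u)] = find(v)
    return len({find(v) for v in range(n)})

rng = random.Random(0)
n, lam = 1000, 1.5
# Q_n / n; the infinite component of the limit contributes 0 to E[1/|C(o)|].
print(er_components(n, lam, rng) / n)
```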
Let

WGn = Σi,j,k∈V (Gn ) 1{ij,jk∈E(Gn )} = Σv∈V (Gn ) dv (dv − 1) (2.5.15)
denote twice the number of wedges in the graph Gn . The factor two arises because the
wedge ij, jk is the same as the wedge kj, ji, but it is counted twice in (2.5.15). We further
let
∆Gn = Σi,j,k∈V (Gn ) 1{ij,jk,ik∈E(Gn )} (2.5.16)
denote six times the number of triangles in Gn . The global clustering coefficient CCGn in
Gn is defined as
CCGn = ∆Gn /WGn . (2.5.17)
The global clustering coefficient measures the proportion of wedges for which the closing
edge is also present. As such, it can be thought of as the probability that two random friends
of a random individual are friends themselves.
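Both clustering coefficients are direct transcriptions of their definitions. The following sketch (ours) computes the global coefficient (2.5.17) and a local average in the spirit of (2.5.28) (with the convention that vertices of degree at most 1 contribute 0) from an adjacency-set representation:

```python
from itertools import combinations

def clustering_coefficients(adj):
    # W_{G_n} = sum_v d_v (d_v - 1): twice the number of wedges.
    wedges = sum(len(nb) * (len(nb) - 1) for nb in adj.values())
    # Delta_{G_n}(v): twice the number of triangles containing v.
    delta_v = {v: 2 * sum(1 for u, w in combinations(sorted(nb), 2) if w in adj[u])
               for v, nb in adj.items()}
    global_cc = sum(delta_v.values()) / wedges       # (2.5.17)
    local_cc = sum(delta_v[v] / (len(nb) * (len(nb) - 1))
                   for v, nb in adj.items() if len(nb) > 1) / len(adj)
    return global_cc, local_cc

# A triangle {0,1,2} with a pendant edge {2,3}: global 0.6, local 7/12.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_coefficients(adj))
```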
The following theorem describes the conditions for the clustering coefficient to converge.
In its statement, we recall that a sequence (Xn )n≥1 of random variables is uniformly inte-
grable when
limK→∞ lim supn→∞ E[|Xn | 1{|Xn |>K} ] = 0. (2.5.18)
Theorem 2.22 (Convergence of global clustering coefficient) Let (Gn )n≥1 be a sequence
of graphs whose sizes |V (Gn )| tend to infinity. Assume that Gn converges locally in prob-
ability to (G, o) ∼ µ. Further, assume that Dn = d(Gn )on is such that (Dn2 )n≥1 is uniformly
integrable, and that µ(do > 1) > 0. Then

CCGn −→P Eµ [∆G (o)] / Eµ [do (do − 1)], (2.5.19)

where ∆G (o) = Σu,v∈∂B1 (o) 1{{u,v}∈E(G)} denotes twice the number of triangles in G that
contain o as a vertex.
Proof We write

CCGn = E[∆Gn (on ) | Gn ] / E[d(Gn )on (d(Gn )on − 1) | Gn ], (2.5.20)
that the convergence of their expectations over on does not follow immediately from local
convergence in probability. It is here that we need to make use of the uniform integrability
of (Dn2 )n≥1 , where Dn = d(Gn )on . We make the split
E[d(Gn )on (d(Gn )on − 1) 1{(d(Gn )on )2 ≤K} | Gn ] −→P Eµ [do (do − 1) 1{d2o ≤K} ], (2.5.22)
since h(G, o) = do (do − 1) 1{d2o ≤K} is a bounded continuous function. Further, by the
uniform integrability of (Dn2 )n≥1 = ((d(Gn )on )2 )n≥1 , and with E denoting the expectation
wrt on as well as wrt the random graph, for every ε > 0 there exists an N = N (ε) sufficiently
large such that, uniformly in n ≥ N (ε),

P( E[d(Gn )on (d(Gn )on − 1) 1{(d(Gn )on )2 >K} | Gn ] ≥ ε )
≤ (1/ε) E[(d(Gn )on )2 1{(d(Gn )on )2 >K} ] ≤ ε. (2.5.24)
It follows that E[d(Gn )on (d(Gn )on − 1) | Gn ] −→P Eµ [do (do − 1)], as required. Since
µ(do > 1) > 0, the limit Eµ [do (do − 1)] is strictly positive.
The proof that E[∆Gn (on ) | Gn ] −→P Eµ [∆G (o)] is similar, where now we make the
split

E[∆Gn (on ) | Gn ] = E[∆Gn (on ) 1{(d(Gn )on )2 ≤K} | Gn ] + E[∆Gn (on ) 1{(d(Gn )on )2 >K} | Gn ].

The first term is the conditional expectation of a bounded
continuous functional, and therefore converges in probability. The second term, on the other
hand, satisfies

E[∆Gn (on ) 1{(d(Gn )on )2 >K} | Gn ] ≤ E[(d(Gn )on )2 1{(d(Gn )on )2 >K} | Gn ], (2.5.26)
Here, we can think of ∆Gn (v)/[dv (dv − 1)] as the proportion of edges present between
neighbors of v , and then (2.5.28) takes the average of this. The following theorem implies
its convergence without any further uniform integrability conditions, and thus justifies the
name local clustering coefficient:
Theorem 2.23 (Convergence of local clustering coefficient) Let (Gn )n≥1 be a sequence of
graphs whose sizes |V (Gn )| tend to infinity. Assume that Gn converges locally in probability
to (G, o) ∼ µ. Then

CCGn −→P Eµ [∆G (o)/(do (do − 1))]. (2.5.29)
Proof We now write

CCGn = E[∆Gn (on )/(d(Gn )on (d(Gn )on − 1)) | Gn ], (2.5.30)

and note that h(G, o) = ∆G (o)/[do (do − 1)] is a bounded continuous functional. Therefore,

E[∆Gn (on )/(d(Gn )on (d(Gn )on − 1)) | Gn ] −→P Eµ [∆G (o)/(do (do − 1))], (2.5.31)

as required.
There are more versions of clustering coefficients. Convergence of the so-called clustering
spectrum is discussed in the notes in Section 2.7.
Note that

pe(Gn ) (H? ) = P(Br(Gn ) (e) ≃ H? | Gn ), (2.5.33)
where e = (e, ē) is a uniformly chosen directed edge from E⃗(Gn ). Thus, pe(Gn ) (H? ) is the
edge-equivalent of p(Gn ) (H? ) in (2.3.5). We next study its asymptotics:
Theorem 2.24 (Convergence of neighborhoods of edges) Let (Gn )n≥1 be a sequence of
graphs whose sizes |V (Gn )| tend to infinity. Assume that Gn converges locally in proba-
bility to (G, o) ∼ µ. Assume further that (d(Gn )on )n≥1 is a uniformly integrable sequence of
random variables, and that µ(do ≥ 1) > 0. Then, for every H? ∈ G? ,

pe(Gn ) (H? ) −→P Eµ [do 1{Br(G) (o)≃H? } ] / Eµ [do ]. (2.5.34)
Proof We recall (2.5.32), and note that

(2/|V (Gn )|) |E(Gn )| = (1/|V (Gn )|) Σv∈V (Gn ) d(Gn )v = E[d(Gn )on | Gn ]. (2.5.35)
Therefore, since (d(Gn )on )n≥1 is uniformly integrable, local convergence in probability implies
that

(2/|V (Gn )|) |E(Gn )| −→P Eµ [do ]. (2.5.36)

Since µ(do ≥ 1) > 0, it follows that Eµ [do ] > 0.
Further, we rewrite

(1/|V (Gn )|) Σ(u,v)∈E⃗(Gn ) 1{Br(Gn ) (u)≃H? } = (1/|V (Gn )|) Σu∈V (Gn ) d(Gn )u 1{Br(Gn ) (u)≃H? }
= E[d(Gn )on 1{Br(Gn ) (on )≃H? } | Gn ], (2.5.37)

where on is a uniformly chosen vertex in V (Gn ). Again, since (d(Gn )on )n≥1 is uniformly
Therefore, by (2.5.32), taking the ratio of the terms in (2.5.36) and (2.5.38) proves the claim.
Thus, p(Gn )k,l is the probability that a random directed edge connects a vertex of degree k with
one of degree l. By convention, we define p(Gn )k,l = 0 when k = 0. The following theorem
proves that the degree–degree distribution converges when the graph converges locally in
probability:
Theorem 2.25 (Degree–degree convergence) Let (Gn )n≥1 be a sequence of graphs whose
sizes |V (Gn )| tend to infinity. Assume that Gn converges locally in probability to (G, o) ∼
µ. Assume further that (d(Gn )on )n≥1 is a uniformly integrable sequence of random variables,
and that µ(do ≥ 1) > 0. Then, for every k, l with k ≥ 1,

p(Gn )k,l −→P k µ(do = k, dV = l), (2.5.40)

= k E[1{d(Gn )on =k, d(Gn )V =l} | Gn ], (2.5.41)

Thus, by (2.5.39), taking the ratio of the terms in (2.5.36) and (2.5.42) proves the claim.
We finally discuss the consequences for the assortativity coefficient (recall [V1, Section
1.5]). We now write the degrees in Gn as (dv )v∈V (Gn ) to avoid notational clutter. Define the
assortativity coefficient as
where we recall that E⃗(Gn ) is the collection of directed edges, and we make the abbreviation
di = d(Gn )i for i ∈ V (Gn ). We can recognize ρGn in (2.5.43) as the empirical correlation
coefficient of the two-dimensional sequence of variables (de , dē )e∈E⃗(Gn ) . As a result, it is
the correlation between the coordinates of the two-dimensional random variable of which
(p(Gn )k,l )k,l≥1 is the joint probability mass function. We can rewrite the assortativity coefficient
ρGn more conveniently as
ρ_{G_n} = \frac{\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j - \big(\sum_{i\in V(G_n)} d_i^2\big)^2/|\vec{E}(G_n)|}{\sum_{i\in V(G_n)} d_i^3 - \big(\sum_{i\in V(G_n)} d_i^2\big)^2/|\vec{E}(G_n)|}.   (2.5.44)
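Formula (2.5.44) can be evaluated with three degree sums and one sum over directed edges. The sketch below is a direct transcription; it is undefined for regular graphs, where the denominator vanishes (consistent with the condition µ(d_o = r) < 1 in Theorem 2.26 below).

```python
def assortativity(adj):
    """Assortativity coefficient rho_{G_n} via (2.5.44); adj maps each
    vertex to its set of neighbors."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    m = sum(deg.values())  # |vec E(G_n)|: each edge counted in both directions
    s_edge = sum(deg[u] * deg[v] for u, nbrs in adj.items() for v in nbrs)
    s2 = sum(d ** 2 for d in deg.values())
    s3 = sum(d ** 3 for d in deg.values())
    return (s_edge - s2 ** 2 / m) / (s3 - s2 ** 2 / m)
```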
The following theorem gives conditions for the convergence of ρ_{G_n} when G_n converges locally in probability:

Theorem 2.26 (Assortativity convergence) Let (G_n)_{n≥1} be a sequence of graphs whose sizes |V(G_n)| tend to infinity. Assume that G_n converges locally in probability to (G, o) ∼ µ. Assume further that D_n = d_{o_n}^{(G_n)} is such that (D_n^3)_{n≥1} is uniformly integrable, and that µ(d_o = r) < 1 for every r ≥ 0. Then, with V a uniform neighbor of o,

ρ_{G_n} \xrightarrow{\mathbb{P}} \frac{\mathbb{E}_\mu[d_o^2 d_V] - \mathbb{E}_\mu[d_o^2]^2/\mathbb{E}_\mu[d_o]}{\mathbb{E}_\mu[d_o^3] - \mathbb{E}_\mu[d_o^2]^2/\mathbb{E}_\mu[d_o]}.   (2.5.45)
Proof We start with (2.5.44), and consider the various terms. We divide all the sums by n. Then, by local convergence in probability and the uniform integrability of ((d_{o_n}^{(G_n)})^3)_{n≥1}, which implies that (d_{o_n}^{(G_n)})_{n≥1} is also uniformly integrable,

\frac{1}{n}|\vec{E}(G_n)| = \mathbb{E}\big[d_{o_n}^{(G_n)} \mid G_n\big] \xrightarrow{\mathbb{P}} \mathbb{E}_\mu[d_o].   (2.5.46)

Again by local convergence and the uniform integrability of ((d_{o_n}^{(G_n)})^3)_{n≥1}, which implies that ((d_{o_n}^{(G_n)})^2)_{n≥1} is also uniformly integrable,

\frac{1}{|V(G_n)|}\sum_{i\in V(G_n)} d_i^2 = \mathbb{E}\big[(d_{o_n}^{(G_n)})^2 \mid G_n\big] \xrightarrow{\mathbb{P}} \mathbb{E}_\mu[d_o^2].   (2.5.47)

Further, again by local convergence in probability and the assumed uniform integrability of ((d_{o_n}^{(G_n)})^3)_{n≥1},

\frac{1}{|V(G_n)|}\sum_{i\in V(G_n)} d_i^3 = \mathbb{E}\big[(d_{o_n}^{(G_n)})^3 \mid G_n\big] \xrightarrow{\mathbb{P}} \mathbb{E}_\mu[d_o^3].   (2.5.48)

This identifies the limits of all but one of the sums appearing in (2.5.44). Details are left to the reader in Exercise 2.26. Further, \mathbb{E}_\mu[d_o^3] - \mathbb{E}_\mu[d_o^2]^2/\mathbb{E}_\mu[d_o] > 0 since µ(d_o = r) < 1 for every r ≥ 0 (see Exercise 2.27).
We finally consider the last term, involving the product of the degrees across edges, i.e.,

\frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j = \frac{1}{|V(G_n)|}\sum_{u\in V(G_n)} d_u^2 \Big(\frac{1}{d_u}\sum_{v\colon v\sim u} d_v\Big) = \mathbb{E}\big[d_{o_n}^2 d_V \mid G_n\big],   (2.5.49)

where V is a uniform neighbor of o_n. When the degrees are uniformly bounded, the functional h(G, o) = d_o^2\,\mathbb{E}[d_V \mid G] is bounded and continuous, so that it converges. However, the degrees are not necessarily bounded, so a truncation argument is needed.
We make the split

\frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j = \frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j 1_{\{d_i\le K, d_j\le K\}} + \frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j \big(1 - 1_{\{d_i\le K, d_j\le K\}}\big).   (2.5.50)

By local convergence in probability (or by Theorem 2.25), since the functional is now bounded and continuous,

\frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j 1_{\{d_i\le K, d_j\le K\}} \xrightarrow{\mathbb{P}} \mathbb{E}_\mu\big[d_o^2 d_V 1_{\{d_o\le K, d_V\le K\}}\big].   (2.5.52)
We are left with showing that the second contribution in (2.5.50) is small. We bound this contribution as follows:

\frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j \big(1_{\{d_i>K\}} + 1_{\{d_j>K\}}\big) = \frac{2}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j 1_{\{d_i>K\}},   (2.5.53)

which, by the Cauchy–Schwarz inequality, is at most

2\,\mathbb{E}\big[(d_{o_n}^{(G_n)})^3 1_{\{d_{o_n}^{(G_n)}>K\}} \mid G_n\big]^{1/2}\,\mathbb{E}\big[(d_{o_n}^{(G_n)})^3 \mid G_n\big]^{1/2}.   (2.5.54)
By the uniform integrability of ((d_{o_n}^{(G_n)})^3)_{n≥1}, there exist K = K(ε) and N = N(ε) such that, for all n ≥ N,

\mathbb{E}\big[(d_{o_n}^{(G_n)})^3 1_{\{d_{o_n}^{(G_n)}>K\}}\big] \le ε^4/4.   (2.5.55)

By the Markov inequality, it follows that

\mathbb{P}\Big(\mathbb{E}\big[(d_{o_n}^{(G_n)})^3 1_{\{d_{o_n}^{(G_n)}>K\}} \mid G_n\big] \ge \frac{ε^3}{4}\Big) \le \frac{4}{ε^3}\,\mathbb{E}\big[(d_{o_n}^{(G_n)})^3 1_{\{d_{o_n}^{(G_n)}>K\}}\big] \le ε.   (2.5.56)
As a result, with probability at least 1 − ε and for ε > 0 sufficiently small to accommodate the factor \mathbb{E}[(d_{o_n}^{(G_n)})^3 \mid G_n]^{1/2} (which is uniformly bounded by the uniform integrability of ((d_{o_n}^{(G_n)})^3)_{n≥1}),

\frac{1}{|V(G_n)|}\sum_{(i,j)\in\vec{E}(G_n)} d_i d_j \big(1_{\{d_i>K\}} + 1_{\{d_j>K\}}\big) \le ε^{3/2}\,\mathbb{E}\big[(d_{o_n}^{(G_n)})^3 \mid G_n\big]^{1/2} \le ε.   (2.5.57)

This completes the proof of Theorem 2.26.
2.6 Giant Component is Almost Local

We continue by investigating the size of the giant component when the graph converges locally. Here, we simplify the notation by assuming that G_n = (V(G_n), E(G_n)) is such that |V(G_n)| = n, and we recall that

|C_{\max}| = \max_{v\in V(G_n)} |C(v)|   (2.6.1)

denotes the maximal connected component size. While Corollary 2.21 shows that the number of connected components is well behaved in the local topology, the proportion of vertices in the giant is not so nicely behaved.
Let

Z_{\ge k} = \sum_{v\in V(G_n)} 1_{\{|C(v)|\ge k\}}   (2.6.3)

denote the number of vertices in connected components of size at least k. Assume that G_n converges locally in probability to (G, o). Then we conclude that, with ζ_{\ge k} = µ(|C(o)| ≥ k) (see Exercise 2.32),

\frac{Z_{\ge k}}{n} = \mathbb{E}\big[1_{\{|C(o_n)|\ge k\}} \mid G_n\big] \xrightarrow{\mathbb{P}} ζ_{\ge k}.   (2.6.4)
For every k ≥ 1,

\{|C_{\max}| \ge k\} = \{Z_{\ge k} \ge k\},   (2.6.5)

and |C_{\max}| \le Z_{\ge k} on those realizations where the event \{Z_{\ge k} \ge 1\} holds. Note that ζ = \lim_{k\to\infty} ζ_{\ge k} = µ(|C(o)| = ∞). We take k so large that ζ ≥ ζ_{\ge k} − ε/2. Then, for every such k, ε > 0, and all n large enough that n(ζ + ε) ≥ k,

\mathbb{P}\big(|C_{\max}| \ge n(ζ + ε)\big) \le \mathbb{P}\big(Z_{\ge k} \ge n(ζ + ε)\big) \le \mathbb{P}\big(Z_{\ge k} \ge n(ζ_{\ge k} + ε/2)\big) = o(1).   (2.6.6)
We conclude that while local convergence cannot determine the size of the largest connected component, it can prove an upper bound on |C_{\max}|. In this book, we often extend this to |C_{\max}|/n \xrightarrow{\mathbb{P}} ζ = µ(|C(o)| = ∞), but this is no longer a consequence of local convergence alone. In Exercise 2.31, you are asked to give an example where |C_{\max}|/n \xrightarrow{\mathbb{P}} η < ζ, even though G_n does converge locally in probability to (G, o) ∼ µ. Therefore, in general, more involved arguments must be used. The next theorem shows that one relatively simple condition suffices. In its statement, we write x ↮ y for the statement that C(x) and C(y) are disjoint:
Theorem 2.28 (The giant is almost local) Let G_n = (V(G_n), E(G_n)) denote a random graph of size |V(G_n)| = n. Assume that G_n converges locally in probability to (G, o) ∼ µ. Assume further that

\lim_{k\to\infty} \limsup_{n\to\infty} \frac{1}{n^2}\,\mathbb{E}\big[\#\{(x,y)\in V(G_n)\times V(G_n)\colon |C(x)|, |C(y)| \ge k,\ x ↮ y\}\big] = 0.   (2.6.7)

Then, if C_{\max} and C_{(2)} denote the largest and second largest connected components (with ties broken arbitrarily),

\frac{|C_{\max}|}{n} \xrightarrow{\mathbb{P}} ζ = µ(|C(o)| = ∞), \qquad \frac{|C_{(2)}|}{n} \xrightarrow{\mathbb{P}} 0.   (2.6.8)
Remark 2.29 (“Giant is almost local” proofs) Theorem 2.28 shows that the relatively mild
condition in (2.6.7) suffices for the giant to have the expected limit. In fact, it is necessary
and sufficient; see Exercise 2.34. It is most useful when we can easily show that vertices with
large clusters are likely to be connected, and it will be applied to the Erdős–Rényi random
graph below, to configuration models in Section 4.3, and to inhomogeneous random graphs
with finitely many types in Section 6.5.3. J
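The quantities in (2.6.8) are straightforward to examine numerically once the component sizes are known. A minimal sketch (the iterative depth-first search and the function name are our own choices):

```python
def component_sizes(adj):
    """Connected component sizes of a graph given as a dict of neighbor
    sets, in decreasing order; sizes[0]/n estimates |C_max|/n."""
    seen, sizes = set(), []
    for source in adj:
        if source in seen:
            continue
        seen.add(source)
        stack, size = [source], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```

For ER_n(λ/n) with λ > 1, the first entry divided by n approaches the survival probability ζ_λ, while the second entry divided by n vanishes, in line with Theorem 2.28.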
We now start with the proof of Theorem 2.28. Recall that ζ = µ(|C(o)| = ∞) might be a random variable when µ is a random probability measure on rooted graphs. We first note that, by Corollary 2.27, the statement follows on the event that ζ = µ(|C(o)| = ∞) = 0, so that it suffices to prove Theorem 2.28 on the event that ζ > 0. By conditioning on this event, we may assume that ζ > 0 almost surely.
We recall that the vector (|C_{(i)}|)_{i≥1} denotes the cluster sizes ordered in size, from large to small with ties broken arbitrarily, so that |C_{(1)}| = |C_{\max}|. The following lemma gives a useful estimate of the sum of squares of these ordered cluster sizes. In its statement, we write X_{n,k} = o_{k,\mathbb{P}}(1) when

\lim_{k\to\infty} \limsup_{n\to\infty} \mathbb{P}(|X_{n,k}| > ε) = 0.   (2.6.9)
Lemma 2.30 (Sum of squares of cluster sizes) Under the conditions of Theorem 2.28,

\frac{1}{n^2}\sum_{i\ge 1} |C_{(i)}|^2 1_{\{|C_{(i)}|\ge k\}} = ζ^2 + o_{k,\mathbb{P}}(1).   (2.6.10)

Proof We use that, by local convergence in probability and for any k ≥ 1 fixed (recall (2.6.4)),

\frac{1}{n} Z_{\ge k} = \frac{1}{n}\sum_{i\ge 1} |C_{(i)}| 1_{\{|C_{(i)}|\ge k\}} = ζ + o_{k,\mathbb{P}}(1),   (2.6.11)

by Exercise 2.23. Further,

ζ^2 + o_{k,\mathbb{P}}(1) = \frac{Z_{\ge k}^2}{n^2} = \frac{1}{n^2}\sum_{i\ge 1} |C_{(i)}|^2 1_{\{|C_{(i)}|\ge k\}} + o_{k,\mathbb{P}}(1).   (2.6.12)

Indeed,

\frac{1}{n^2}\sum_{i,j\ge 1,\ i\neq j} |C_{(i)}||C_{(j)}| 1_{\{|C_{(i)}|, |C_{(j)}|\ge k\}} = \frac{1}{n^2}\#\{(x,y)\in V(G_n)\times V(G_n)\colon |C(x)|, |C(y)| \ge k,\ x ↮ y\},   (2.6.13)

which, by the Markov inequality, and abbreviating (x,y) ∈ V(G_n) × V(G_n) to (x,y), satisfies

\lim_{k\to\infty}\limsup_{n\to\infty} \mathbb{P}\Big(\frac{1}{n^2}\#\{(x,y)\colon |C(x)|, |C(y)| \ge k,\ x ↮ y\} \ge ε\Big) \le \lim_{k\to\infty}\limsup_{n\to\infty} \frac{1}{ε n^2}\,\mathbb{E}\big[\#\{(x,y)\colon |C(x)|, |C(y)| \ge k,\ x ↮ y\}\big] = 0,   (2.6.14)

by (2.6.7). This proves the claim.
Theorem 2.31 (Degrees and edges in the giant) Under the conditions of Theorem 2.28, with v_ℓ(C_{\max}) denoting the number of vertices of degree ℓ in C_{\max},

\frac{v_ℓ(C_{\max})}{n} \xrightarrow{\mathbb{P}} µ(|C(o)| = ∞, d_o = ℓ),   (2.6.18)

and

\frac{|E(C_{\max})|}{n} \xrightarrow{\mathbb{P}} \frac{1}{2}\,\mathbb{E}_\mu\big[d_o 1_{\{|C(o)|=∞\}}\big].   (2.6.19)

Proof The proof follows that of Theorem 2.28. We now define, for k ≥ 1, A ⊆ \mathbb{N}, and with d_v the degree of v in G_n,

Z_{A,\ge k} = \sum_{v\in V(G_n)} 1_{\{|C(v)|\ge k,\ d_v\in A\}}.   (2.6.20)

Assume that G_n converges locally in probability to (G, o). Then we conclude that

\frac{Z_{A,\ge k}}{n} \xrightarrow{\mathbb{P}} µ(|C(o)| \ge k, d_o \in A).   (2.6.21)
Since |C_{\max}| ≥ k whp by Theorem 2.28, we thus obtain, for every A ⊆ \mathbb{N},

\frac{1}{n}\sum_{a\in A} v_a(C_{\max}) \le \frac{Z_{A,\ge k}}{n} \xrightarrow{\mathbb{P}} µ(|C(o)| \ge k, d_o \in A),   (2.6.22)
Therefore, along the subsequence (n_l)_{l≥1} that attains the lim inf in (2.6.25), with asymptotic probability κ > 0, and using (2.6.24),

\frac{|C_{\max}|}{n} = \frac{1}{n}\big[|C_{\max}| - v_ℓ(C_{\max})\big] + \frac{v_ℓ(C_{\max})}{n} \le \big[µ(|C(o)| = ∞, d_o \in \{ℓ\}^c) + ε/2\big] + \big[µ(|C(o)| = ∞, d_o = ℓ) - ε\big] \le µ(|C(o)| = ∞) - ε/2,   (2.6.26)

which contradicts Theorem 2.28. We conclude that (2.6.25) cannot hold, so that (2.6.18) follows.
For (2.6.19), we note that

|E(C_{\max})| = \frac{1}{2}\sum_{ℓ\ge 1} ℓ\, v_ℓ(C_{\max}).   (2.6.27)

We divide by n and split the sum over ℓ into small and large ℓ:

\frac{|E(C_{\max})|}{n} = \frac{1}{2n}\sum_{ℓ\in[K]} ℓ\, v_ℓ(C_{\max}) + \frac{1}{2n}\sum_{ℓ>K} ℓ\, v_ℓ(C_{\max}).   (2.6.28)
By uniform integrability,

\lim_{K\to\infty}\limsup_{n\to\infty} \mathbb{E}\big[d_{o_n}^{(G_n)} 1_{\{d_{o_n}^{(G_n)}>K\}}\big] = 0.   (2.6.31)
and

\frac{1}{n}\sum_{v\notin C_{\max}} 1_{\{B_r^{(G_n)}(v)\simeq H_\star\}} \xrightarrow{\mathbb{P}} µ(|C(o)| < ∞, B_r^{(G)}(o) \simeq H_\star).   (2.6.34)

Proof The convergence in (2.6.34) follows from that in (2.6.33) combined with the fact that, by assumption,

\frac{1}{n}\sum_{v\in V(G_n)} 1_{\{B_r^{(G_n)}(v)\simeq H_\star\}} \xrightarrow{\mathbb{P}} µ(B_r^{(G)}(o) \simeq H_\star).   (2.6.35)
The convergence in (2.6.33) can be proved as for Theorem 2.31, now using that, for every \mathcal{H}_\star ⊆ \mathcal{G}_\star,

\frac{1}{n} Z_{\mathcal{H}_\star,\ge k} \equiv \frac{1}{n}\sum_{v\in V(G_n)} 1_{\{|C(v)|\ge k,\ B_r^{(G_n)}(v)\in\mathcal{H}_\star\}} \xrightarrow{\mathbb{P}} µ\big(|C(o)| \ge k, B_r^{(G)}(o) \in \mathcal{H}_\star\big).

We then argue by contradiction again as in (2.6.25) and (2.6.26). We leave the details to the reader.
µ(|C(o)| \ge k, |\partial B_r^{(G)}(o)| < r) \to 0, \qquad µ(|C(o)| < k, |\partial B_r^{(G)}(o)| \ge r) \to 0.   (2.6.39)
Proof Denote

P_k = \#\{(x,y)\colon |C(x)|, |C(y)| \ge k,\ x ↮ y\},   (2.6.40)
P'_r = \#\{(x,y)\colon |\partial B_r^{(G_n)}(x)|, |\partial B_r^{(G_n)}(y)| \ge r,\ x ↮ y\}.   (2.6.41)

Then,

|P_k - P'_r| \le 2n\big(Z_{<r,\ge k} + Z_{\ge r,<k}\big),   (2.6.42)

where

Z_{<r,\ge k} = \sum_{v\in V(G_n)} 1_{\{|\partial B_r^{(G_n)}(v)|<r,\ |C(v)|\ge k\}},   (2.6.43)
Z_{\ge r,<k} = \sum_{v\in V(G_n)} 1_{\{|\partial B_r^{(G_n)}(v)|\ge r,\ |C(v)|<k\}}.   (2.6.44)
Our aim is to show that, for every ε > 0, we can find r = r_ε such that, for every b_0^{(1)}, b_0^{(2)} ≥ r and s_0^{(1)}, s_0^{(2)} fixed,

\limsup_{n\to\infty} \mathbb{P}_r\big(o_1 ↮ o_2\big) \le ε.   (2.6.56)
Under \mathbb{P}_r,

|\partial B_{r+1}(o_1) \setminus B_r(o_2)| \sim \mathrm{Bin}(n_1^{(1)}, p_1^{(1)}),   (2.6.57)

where

n_1^{(1)} = n - s_0^{(1)} - s_0^{(2)}, \qquad p_1^{(1)} = 1 - \Big(1 - \frac{λ}{n}\Big)^{b_0^{(1)}}.   (2.6.58)

Here, we note that the vertices in \partial B_r(o_2) play a different role from those in \partial B_r(o_1), as they can be in \partial B_{r+1}(o_1), but those in \partial B_r(o_1) cannot. This explains the slightly asymmetric form with respect to o_1 and o_2 in (2.6.58).
We are led to studying concentration properties of binomial random variables. For this, we rely on the following lemma:

Lemma 2.35 (Concentration of binomials) Let X ∼ Bin(m, p). Then, for every δ > 0,

\mathbb{P}\big(|X - \mathbb{E}[X]| \ge δ\,\mathbb{E}[X]\big) \le 2\exp\Big(-\frac{δ^2\,\mathbb{E}[X]}{2(1+δ/3)}\Big).   (2.6.59)
Proof This is a direct consequence of [V1, Theorem 2.21].
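The bound (2.6.59) is easy to check by simulation. A small Monte Carlo sketch, assuming nothing beyond the standard library (the default values of m, p, δ, and the number of trials are arbitrary choices):

```python
import math
import random

def check_binomial_concentration(m=1000, p=0.01, delta=0.5, trials=10000):
    """Compares the empirical probability that |X - E[X]| >= delta * E[X]
    for X ~ Bin(m, p) with the bound 2 exp(-delta^2 E[X] / (2(1 + delta/3)))."""
    mean = m * p
    hits = 0
    for _ in range(trials):
        x = sum(random.random() < p for _ in range(m))
        if abs(x - mean) >= delta * mean:
            hits += 1
    bound = 2 * math.exp(-delta ** 2 * mean / (2 * (1 + delta / 3)))
    return hits / trials, bound
```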
Lemma 2.35 ensures that whp the boundary size |\partial B_{r+1}(o_1)| is close to λ|\partial B_r(o_1)| for r and n large, so that the boundary grows by a factor λ > 1. Repeated applications lead to the statement that |\partial B_{r+k}(o_1)| ≈ λ^k |\partial B_r(o_1)|. Thus, in roughly \log_λ n steps, the boundary will have expanded to n^a vertices. However, in order to make this precise, we need that (1) the sum of complementary probabilities in Lemma 2.35 is still quite small uniformly in k and for r large; and (2) we have good control over the number of vertices in the boundaries, not just in terms of lower bounds, but also in terms of upper bounds, as that gives control over the number of vertices that have not yet been used. For the latter, we also need to deal with the δ-dependence in (2.6.59).
We prove (2.6.56) by first growing |\partial B_{r+k}(o_1)| for k ≥ 1 until it is very large (much larger than \sqrt{n} will suffice), and then, outside B_{r+k}(o_1) for the appropriate k, growing |\partial B_{r+k}(o_2)| for k ≥ 1 until it is also very large (now \sqrt{n} will suffice). Then, it is very likely that there is a direct edge between the resulting boundaries. We next provide the details.
Define b_k^{(1)} = b_0^{(1)}[λ(1-ε)]^k and \bar{b}_k^{(1)} = b_0^{(1)}[λ(1+ε)]^k, which serve as lower and upper bounds on |\partial B'_{r+k}(o_1)| that we will prove to hold whp, where we choose ε > 0 small enough that λ(1-ε) > 1. We let

s_{k-1}^{(1)} = s_0^{(1)} + \sum_{l=0}^{k-1} \bar{b}_l^{(1)}   (2.6.60)
denote the resulting upper bound on |B'_{r+k-1}(o_1)|. We fix a ∈ (\tfrac12, 1), and let

k \le k_n^\star = k_n^\star(ε) = \lceil a \log_{λ(1-ε)} n \rceil,   (2.6.61)
and note that there exists C > 1 such that

s_{k-1}^{(1)} \le s_0^{(1)} + \sum_{l=0}^{k-1} b_0^{(1)}[λ(1+ε)]^l \le s_0^{(1)} + \frac{b_0^{(1)}}{λ(1+ε)-1}[λ(1+ε)]^k \le C n^{a\log λ(1+ε)/\log λ(1-ε)},   (2.6.62)

uniformly in k ≤ k_n^\star. We choose a ∈ (\tfrac12, 1) so that a\log λ(1+ε)/\log λ(1-ε) ∈ (\tfrac12, 1).
Define the good event by

E_{r,[k]}^{(1)} = \bigcap_{l\in[k]} E_{r,l}^{(1)}, \qquad \text{where} \quad E_{r,k}^{(1)} = \{b_k^{(1)} \le |\partial B'_{r+k}(o_1)| \le \bar{b}_k^{(1)}\}.   (2.6.63)
We write

\mathbb{P}_r\big(E_{r,[k]}^{(1)}\big) = \prod_{l\in[k]} \mathbb{P}_r\big(E_{r,l}^{(1)} \mid E_{r,[l-1]}^{(1)}\big),   (2.6.64)

so that

\mathbb{P}_r\big(E_{r,[k]}^{(1)}\big) \ge 1 - \sum_{l\in[k]} \mathbb{P}_r\big((E_{r,l}^{(1)})^c \mid E_{r,[l-1]}^{(1)}\big).   (2.6.65)
With the above choices, we have, conditional on |\partial B'_{r+l-1}(o_1)| = b_{l-1} ∈ [b_{l-1}^{(1)}, \bar{b}_{l-1}^{(1)}] and |B'_{r+l-1}(o_1)| = s_{l-1} ≤ s_{l-1}^{(1)},

|\partial B'_{r+l}(o_1)| \sim \mathrm{Bin}(n_l^{(1)}, p_l^{(1)}),   (2.6.66)

where

n_l^{(1)} = n - s_{l-1} - s_0^{(2)}, \qquad p_l^{(1)} = 1 - \Big(1 - \frac{λ}{n}\Big)^{b_{l-1}}.   (2.6.67)
The fact that we grow \partial B'_{r+l-1}(o_1) outside of B_r(o_2) is reflected in the subtraction of s_0^{(2)} in n_l^{(1)}. We aim to apply Lemma 2.35 to |\partial B'_{r+l}(o_1)|, with δ = ε/2, for which it suffices to prove bounds on the (conditional) expectation n_l^{(1)} p_l^{(1)}. We use that

\frac{b_{l-1}λ}{n} - \frac{(b_{l-1}λ)^2}{2n^2} \le p_l^{(1)} \le \frac{b_{l-1}λ}{n}.   (2.6.68)
Therefore, with \mathbb{E}_r denoting expectation wrt \mathbb{P}_r,

\mathbb{E}_r\big[|\partial B'_{r+l}(o_1)| \mid E_{r,[l-1]}^{(1)}\big] = n_l^{(1)} p_l^{(1)} \le n\,\frac{b_{l-1}λ}{n} = λ b_{l-1},   (2.6.69)

which provides the upper bound on n_l^{(1)} p_l^{(1)}. For the lower bound, we use the lower bound in (2.6.68) to note that p_l^{(1)} \ge (1-ε/4)λ b_{l-1}/n for n sufficiently large, since we are on E_{r,[l-1]}^{(1)}. Further, n_l^{(1)} \ge (1-ε/4)n on E_{r,[l-1]}^{(1)}, uniformly in l ≤ k_n^\star. We conclude that, for n sufficiently large,

n_l^{(1)} p_l^{(1)} \ge (1-ε/4)^2\, n\,\frac{b_{l-1}λ}{n} \ge (1-ε/2)λ b_{l-1}.   (2.6.70)
Recall the definition of E_{r,l}^{(1)} in (2.6.63). As a result, \big||\partial B'_{r+l}(o_1)| - n_l^{(1)} p_l^{(1)}\big| \le (ε/2) n_l^{(1)} p_l^{(1)} implies that b_l^{(1)} \le |\partial B'_{r+l}(o_1)| \le \bar{b}_l^{(1)}. Thus, by Lemma 2.35 with δ = ε/2,

\mathbb{P}_r\big((E_{r,l}^{(1)})^c \mid E_{r,[l-1]}^{(1)}\big) \le \mathbb{P}_r\Big(\big||\partial B'_{r+l}(o_1)| - n_l^{(1)} p_l^{(1)}\big| \ge (ε/2) n_l^{(1)} p_l^{(1)} \,\Big|\, E_{r,[l-1]}^{(1)}\Big) \le 2\exp\Big(-\frac{ε^2(1-ε/2)λ b_{l-1}^{(1)}}{8(1+ε/6)}\Big) = 2\exp\big(-qλ b_{l-1}^{(1)}\big),   (2.6.71)

where q = ε^2(1-ε/2)/[8(1+ε/6)] > 0.
We conclude that, for n sufficiently large,

\mathbb{P}_r\big(E_{r,[k]}^{(1)}\big) \ge 1 - 2\sum_{l=1}^{k} e^{-qλ b_{l-1}^{(1)}},   (2.6.72)

which is our key estimate for the neighborhood growth in ER_n(λ/n).
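The growth of the successive boundaries can be observed directly on simulated graphs. A sketch of the breadth-first computation of |\partial B_r(o)| (the dict-of-neighbor-sets encoding and the function name are ours):

```python
def boundary_sizes(adj, o, rmax):
    """Sizes |partial B_r(o)| for r = 1, ..., rmax, from a breadth-first
    exploration of the graph adj started at o; in ER_n(lambda/n) these
    grow roughly like lambda^r while they remain small compared with n."""
    level, seen, sizes = {o}, {o}, []
    for _ in range(rmax):
        nxt = set()
        for v in level:
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    nxt.add(w)
        sizes.append(len(nxt))
        level = nxt
    return sizes
```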
Then, conditional on the above, as well as on |\partial B'_{r+k-1}(o_2)| = b_{k-1}^{(2)} ∈ [b_{k-1}^{(2)}, \bar{b}_{k-1}^{(2)}] and |B'_{r+k-1}(o_2)| = s_{k-1}^{(2)} ≤ s_{k-1}^{(2)}, we have

|\partial B'_{r+k}(o_2)| \sim \mathrm{Bin}(n_k^{(2)}, p_k^{(2)}),   (2.6.76)

where now

n_k^{(2)} = n - s_{k_n^\star}^{(1)} - s_{k-1}^{(2)}, \qquad p_k^{(2)} = 1 - \Big(1 - \frac{λ}{n}\Big)^{b_{k-1}^{(2)}}.   (2.6.77)
Let \mathbb{P}_{r,k_n^\star} denote the conditional probability given |\partial B_r(o_i)| = b_0^{(i)} with b_0^{(i)} ≥ r.
On E_{r_ε,[k_n^\star]},

|\partial B'_{r+k_n^\star}(o_1)| \ge b_{k_n^\star}^{(1)} = b_0^{(1)}[λ(1-ε)]^{k_n^\star} \ge r_ε n^a,   (2.6.82)

where we choose a such that a ∈ (\tfrac12, 1). An identical bound holds for |\partial B'_{r+k_n^\star}(o_2)|. Therefore, the total number of possible direct edges between \partial B'_{r_ε+k_n^\star}(o_1) and \partial B'_{r_ε+k_n^\star}(o_2) is at least (r_ε n^a)^2, which is much larger than n when a > \tfrac12. Each of these potential edges is present independently with probability λ/n. Therefore,

\mathbb{P}_r\big(\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2) > 2(k_n^\star + r_ε) + 1 \mid E_{r_ε,[k_n^\star]}\big) \le \Big(1 - \frac{λ}{n}\Big)^{(r_ε n^a)^2} = o(1).   (2.6.83)
We conclude that, for n sufficiently large,

\mathbb{P}_r\big(\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2) \le 2(k_n^\star + r_ε) + 1 \mid E_{r_ε,[k_n^\star]}\big) = 1 - o(1).   (2.6.84)
Theorem 2.36 (Small-world nature of the Erdős–Rényi random graph) Consider ER_n(λ/n) with λ > 1. Then, conditional on o_1 ←→ o_2,

\frac{\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2)}{\log n} \xrightarrow{\mathbb{P}} \frac{1}{\log λ}.   (2.6.86)

Proof The lower bound follows directly from (2.4.34), which implies that

\mathbb{P}\big(\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2) \le k\big) = \mathbb{E}\big[|B_k(o_1)|/n\big] \le \frac{1}{n}\sum_{l=0}^{k} λ^l = \frac{λ^{k+1}-1}{n(λ-1)}.   (2.6.87)

Applying this to k = \lceil (1-η)\log_λ n \rceil shows that, for any η > 0,

\mathbb{P}\big(\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2) \le (1-η)\log_λ n\big) \to 0.   (2.6.88)
For the upper bound, we start by noting that

\mathbb{P}\big(o_1 ←→ o_2 \mid ER_n(λ/n)\big) = \frac{1}{n^2}\sum_{i\ge 1} |C_{(i)}|^2 \xrightarrow{\mathbb{P}} ζ_λ^2,   (2.6.89)
When the second moment of the degrees stays uniformly bounded, graph distances grow logarithmically, as in Theorem 2.36. If, on the other hand, the second moment blows up with the graph size, then distances are smaller. In particular, these typical distances are often doubly logarithmic when the degrees obey a power law with exponent τ ∈ (2, 3), so that even a moment of order 2 − ε is infinite for some ε > 0. Anyone who has done some numerical work will realize that in practice there is little difference between \log\log n and a constant, even when n is quite large.
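Typical distances such as those in Theorem 2.36 can be estimated by breadth-first search between random pairs of vertices. A minimal sketch (pairs falling in different components are simply skipped; names and defaults are our own):

```python
import random
from collections import deque

def typical_distance(adj, samples=200, rng=random):
    """Average graph distance between uniformly chosen pairs of vertices,
    estimated by breadth-first search; compare with log n / log lambda."""
    verts, dists = list(adj), []
    for _ in range(samples):
        o1, o2 = rng.choice(verts), rng.choice(verts)
        dist, queue, seen = {o1: 0}, deque([o1]), {o1}
        while queue:
            v = queue.popleft()
            if v == o2:
                dists.append(dist[v])
                break
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    dist[w] = dist[v] + 1
                    queue.append(w)
    return sum(dists) / len(dists) if dists else float("inf")
```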
One of the main conclusions of the local convergence results in Part II is that the most
popular random graph models for inhomogeneous real-world networks are locally tree-like,
in that the majority of neighborhoods of vertices have no cycles. This is for example true
for the Erdős–Rényi random graph, see Theorem 2.18, since the local limit is a branching
process tree. In many real-world settings, however, this is not realistic. Certainly in social
networks, many triangles and even cliques of larger size exist. Therefore, in Part IV, con-
sisting of Chapter 9, we investigate some adaptations of the models discussed in Parts II and
III. These models may incorporate clustering or community structure; they may be directed or live in a geometric space. All these aspects have received tremendous attention in the
literature. Therefore, with Part IV in hand, the reader will be able to access the literature
more easily.
There is an extensive body of work studying dense graph limits, using the theory of graphons; see Lovász
(2012) and the references therein. Links have been built between this theory and the so-called “local–global”
limits of sparse graphs called graphings. Elek (2007) proved that local limits of bounded-degree graphs are
graphings; see also Hatami et al. (2014) for an extension. The related notion of graphops was defined in
Backhausz and Szegedy (2022).
where n_k is the number of vertices of degree k in G_n. It is not hard to adapt the proof of Theorem 2.23 to show that, under its assumptions, c_{G_n}(k) \xrightarrow{\mathbb{P}} c_G(k), where

c_G(k) = \mathbb{E}_\mu\Big[\frac{\Delta_G(o)}{k(k-1)} \,\Big|\, d_o = k\Big],   (2.7.2)

and the convergence holds for all k for which p_k^{(G)} = µ(d_o^{(G)} = k) > 0. See Exercise 2.37.
The convergence of the assortativity coefficient in Theorem 2.26 is restricted to degree distributions
that have uniformly integrable third moments. In general, an empirical correlation coefficient needs a finite
variance of the random variables to converge to the correlation coefficient. Nelly Litvak and the author (see
van der Hofstad and Litvak (2014) and Litvak and van der Hofstad (2013)) proved that when the random
variables do not have finite variance, such convergence (even for an iid sample) can be to a proper random
variable that has support containing a subinterval of [−1, 0] and a subinterval of [0, 1], giving problems in
interpretation.
For networks, ρGn in (2.5.44) is always well defined, and gives a value in [−1, 1]. However, also for
networks there is a problem with this definition. Indeed, van der Hofstad and Litvak (2014) and Litvak and
van der Hofstad (2013) proved that if a limiting value of ρGn exists for a sequence of networks and the
third moment of the degree of a random vertex is not uniformly integrable, then lim inf n→∞ ρGn ≥ 0, so
no asymptotically disassortative graph sequences exist for power-law networks with infinite third-moment
degrees. Naturally, other ways of classifying the degree–degree dependence can be proposed, such as the
correlation of their ranks. Here, a sequence of numbers x1 , . . . , xn has ranks r1 , . . . , rn when xi is the ri th
largest of x1 , . . . , xn . Ties tend to be broken by giving random ranks for the equal values. For practical
purposes, a scatter plot of the values might be the most useful way to gain insight into degree–degree
dependencies.
Several related graph properties or parameters have been investigated using local convergence. Lyons
(2005) showed that the exponential growth rate of the number of spanning trees of a finite connected graph
can be computed through the local limit. See also Salez (2013) for weighted spanning subgraphs, and
Gamarnik et al. (2006) for maximum-weight independent sets. Bhamidi et al. (2012) identified the limiting
spectral distribution of the graph adjacency matrix of finite random trees using local convergence, and Bordenave et al. (2011) proved the convergence of the spectral measure of sparse random graphs (see also
Bordenave and Lelarge (2010) and Bordenave et al. (2013) for related results). A property that is almost
local is the density of the densest subgraph in a random graph, as shown by Anantharam and Salez (2016)
and studied in more detail in Section 4.5.
and d_v^{(G)} is the degree of v in G.
Exercise 2.3 (Distance to rooted graph ball) Recall the definition of the ball B_r^{(G)}(o) around o in the graph G in (2.2.1). Show that d_{\mathcal{G}_\star}\big(B_r^{(G)}(o), (G, o)\big) \le 1/(r+1). When does equality hold?
Exercise 2.4 (Countable number of graphs with bounded radius) Fix r. Show that there is a countable
number of isomorphism classes of rooted graphs (G, o) with radius at most r. Here, we let the radius
rad(G, o) of a rooted graph (G, o) be equal to rad(G, o) = maxv∈V (G) distG (o, v) where distG denotes
the graph distance in G.
Exercise 2.5 (G? is separable) Use Exercise 2.4 above to show that the set of rooted graphs G? has a
countable dense set, and is thus separable. (See also Proposition A.12 in Appendix A.3.2.)
Exercise 2.6 (Continuity of local neighborhood functions) Fix H_\star ∈ \mathcal{G}_\star. Show that h: \mathcal{G}_\star \to \{0,1\} given by h(G, o) = 1_{\{B_r^{(G)}(o) \simeq H_\star\}} is continuous.
Exercise 2.7 (Bounded number of graphs with bounded radius and degrees) Show that there are only a
bounded number of isomorphism classes of rooted graphs (G, o) with radius at most r for which the degree
of every vertex is at most k.
Exercise 2.8 (Random local weak limit) Construct the simplest (in your opinion) possible example where
the local weak limit of a sequence of deterministic graphs is random.
Exercise 2.9 (Local weak limit of line and cycle) Let G_n be given by V(G_n) = [n], E(G_n) = \{\{i, i+1\}: i ∈ [n-1]\}, i.e., a line. Show that (G_n, o_n) converges to (\mathbb{Z}, 0). Show that the same is true for the cycle, for which E(G_n) = \{\{i, i+1\}: i ∈ [n-1]\} \cup \{\{1, n\}\}.

Exercise 2.10 (Local weak limit of finite tree) Let G_n be the tree of depth k, in which every vertex except the 3 \cdot 2^{k-1} leaves has degree 3. Here n = 3(2^k - 1). What is the local weak limit of G_n?
Exercise 2.11 (Uniform integrability and convergence of size-biased degrees) Show that when (d_{o_n}^{(G_n)})_{n≥1} forms a uniformly integrable sequence of random variables, there exists a subsequence along which D_n^\star, the size-biased version of D_n = d_{o_n}^{(G_n)}, converges in distribution.

Exercise 2.12 (Uniform integrability and degree regularity condition) For G_n = CM_n(d), show that Conditions 1.7(a),(b) imply that (d_{o_n}^{(G_n)})_{n≥1} is a uniformly integrable sequence of random variables.
Exercise 2.13 (Adding a small disjoint graph does not change local weak limit) Let Gn be a graph that
converges in the local weak sense. Let an ∈ N be such that an = o(n), and add a disjoint copy of an
arbitrary graph of size an to Gn . Denote the resulting graph by G0n . Show that G0n has the same local weak
limit as Gn .
Exercise 2.14 (Local weak convergence does not imply uniform integrability of the degree of a random vertex) In the setting of Exercise 2.13, add a complete graph of size a_n to G_n. Let a_n^2 \gg n. Show that the degree of a vertex chosen uar in G'_n is not uniformly integrable.
Exercise 2.15 (Local limit of random 2-regular graph) Show that the configuration model CMn (d) with
dv = 2 for all v ∈ [n] converges locally in probability to (Z, 0). Conclude that the same applies to the
random 2-regular graph.
Exercise 2.16 (Independent neighborhoods of different vertices) Let G_n converge locally in probability to (G, o). Let (o_n^{(1)}, o_n^{(2)}) be two independent uniformly chosen vertices in V(G_n). Show that (G_n, o_n^{(1)}) and (G_n, o_n^{(2)}) jointly converge to two conditionally independent copies of (G, o) given µ.
Exercise 2.17 (Directed graphs as marked graphs) There are several ways to describe directed graphs as
marked graphs. Give one.
Exercise 2.18 (Multi-graphs as marked graphs) Use the formalism of marked rooted graphs in Definition
2.10 to cast the setting of multi-graphs discussed in Remark 2.4 into this framework.
Exercise 2.19 (Uniform d-regular simple graph) Use Theorem 2.17 and (1.3.41) to show that the uniform
random d-regular graph (which is the same as the d-regular configuration model conditioned on simplicity)
also converges locally in probability to the infinite d-regular tree.
Exercise 2.20 (Local weak convergence and subsets) Recall the statement of Theorem 2.16. Prove that local weak convergence (G_n, o_n) \xrightarrow{d} (\bar{G}, \bar{o}) follows when (2.4.10) holds for all H_\star ∈ \mathcal{T}_\star(r) and all r ≥ 1.
Exercise 2.21 (Local convergence in probability and subsets) Recall the statement of Theorem 2.16. Prove
that Gn converges locally in probability to (G, o) when (2.4.11) holds for all H? ∈ T? (r) and all r ≥ 1.
Extend this to almost sure local convergence and (2.4.12).
Exercise 2.22 (Functional for number of connected components is continuous) Prove that h(G, o) =
1/|C (o)| is a bounded and continuous function, where, by convention, h(G, o) = 0 when |C (o)| = ∞.
Exercise 2.23 (Convergence of proportion of vertices in large clusters) Recall the notion of a random variable X_{n,k} being o_{k,\mathbb{P}}(1) in (2.6.9). Recall the definition of Z_{\ge k} in (2.6.3). Show that Z_{\ge k}/n = ζ + o_{k,\mathbb{P}}(1).
Exercise 2.24 (Convergence of sum of squares of cluster sizes) Show that, under the conditions of Theorem 2.28 and with ζ = µ(|C(o)| = ∞),

\frac{1}{n^2}\sum_{i\ge 1} |C_{(i)}|^2 \xrightarrow{\mathbb{P}} ζ^2.   (2.8.2)
Exercise 2.25 (Expected boundary of balls in Erdős–Rényi random graphs) Prove that \mathbb{E}\big[|\partial B_r^{(G_n)}(1)|\big] \le λ^r for G_n = ER_n(λ/n) and every r ≥ 0. This can be done, for example, by using induction and showing that, for every r ≥ 1,

\mathbb{E}\big[|\partial B_r^{(G_n)}(1)| \,\big|\, B_{r-1}^{(G_n)}(1)\big] \le λ|\partial B_{r-1}^{(G_n)}(1)|.   (2.8.3)
Exercise 2.26 (Uniform integrability and moment convergence) Assume that D_n = d_{o_n}^{(G_n)} is such that (D_n^3)_{n≥1} is uniformly integrable. Assume further that G_n converges locally in probability to (G, o). Prove that \mathbb{E}[(d_{o_n}^{(G_n)})^3 \mid G_n] \xrightarrow{\mathbb{P}} \mathbb{E}_\mu[d_o^3]. Conclude that (2.5.47) and (2.5.48) hold. Hint: You need to be very careful, as \mathbb{E}_\mu[d_o^3] may be a random variable when µ is a random measure.

Exercise 2.27 (Positivity of the denominator in (2.5.44)) Use the Cauchy–Schwarz inequality to show that \mathbb{E}_\mu[d_o^3] - \mathbb{E}_\mu[d_o^2]^2/\mathbb{E}_\mu[d_o] > 0 when µ(d_o = r) < 1 for every r ≥ 0.
Exercise 2.28 (Example of weak convergence where convergence in probability fails) Construct an ex-
ample where Gn converges locally weakly to (G, o), but not locally in probability.
Exercise 2.29 (Continuity of neighborhood functions) Fix m ≥ 1 and ℓ_1, ..., ℓ_m. Show that

h(G, o) = 1_{\{|\partial B_k^{(G)}(o)| = ℓ_k\ \forall k \le m\}}   (2.8.4)

is a bounded and continuous function.
Exercise 2.30 (Proof of (2.5.2)) Let Gn converge locally in probability to (G, o). Prove the joint conver-
gence in distribution of the neighborhood sizes in (2.5.2) using Exercise 2.16.
Exercise 2.31 (Example where the proportion in the giant is smaller than the survival probability) Construct an example where G_n converges locally in probability to (G, o) ∼ µ, while |C_{\max}|/n \xrightarrow{\mathbb{P}} η < ζ = µ(|C(o)| = ∞).
Exercise 2.32 (Convergence of the proportion of vertices in clusters of size at least k) Let G_n converge locally in probability to (G, o) as n → ∞. Show that Z_{\ge k} in (2.6.3) satisfies Z_{\ge k}/n \xrightarrow{\mathbb{P}} ζ_{\ge k} = µ(|C(o)| ≥ k) for every k ≥ 1.
Exercise 2.33 (Upper bound on |C_{\max}| using local convergence) Let G_n = (V(G_n), E(G_n)) denote a random graph of size |V(G_n)| = n. Assume that G_n converges locally in probability to (G, o) ∼ µ as n → ∞, and assume that the survival probability of the limiting graph (G, o) satisfies ζ = µ(|C(o)| = ∞) = 0. Show that |C_{\max}|/n \xrightarrow{\mathbb{P}} 0.
Exercise 2.34 (Sufficiency of (2.6.7) for almost locality of the giant) Let G_n = (V(G_n), E(G_n)) denote a random graph of size |V(G_n)| = n. Assume that G_n converges locally in probability to (G, o) ∼ µ and write ζ = µ(|C(o)| = ∞) for the survival probability of the limiting graph (G, o). Assume that

\limsup_{k\to\infty} \limsup_{n\to\infty} \frac{1}{n^2}\,\mathbb{E}\big[\#\{(x,y) ∈ V(G_n)\times V(G_n)\colon |C(x)|, |C(y)| \ge k,\ x ↮ y\}\big] > 0.   (2.8.5)

Show that |C_{\max}|/n does not converge in probability to ζ.
Exercise 2.35 (Lower bound on graph distances in Erdős–Rényi random graphs) Use Exercise 2.25 to show that, for every ε > 0,

\lim_{n\to\infty} \mathbb{P}\Big(\frac{\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2)}{\log_λ n} \le 1 - ε\Big) = 0.   (2.8.7)
Exercise 2.36 (Lower bound on graph distances in Erdős–Rényi random graphs) Use Exercise 2.25 to show that

\lim_{K\to\infty} \limsup_{n\to\infty} \mathbb{P}\big(\mathrm{dist}_{ER_n(λ/n)}(o_1,o_2) \le \log_λ n - K\big) = 0.   (2.8.8)
Overview of Part II
In this Part, we study local limits and connected components in random graphs, and the
relation between them. In more detail, we investigate the connected components of uniform
vertices, thus also describing the local limits of these random graphs. Further, we study the
existence and structure of the largest connected component, sometimes also called the giant
component when it contains a positive (as opposed to zero) proportion of the vertices in the
graph.
In many random graphs, such a giant component exists when there are sufficiently many
connections, while the largest connected component is much smaller than the number of
vertices when there are few connections. Thus, these random graphs undergo a phase tran-
sition. We identify the size of the giant component, as well as its structure in terms of the
degrees of its vertices. We also investigate whether the graph is fully connected. General
inhomogeneous random graphs are studied in Chapter 3, and the configuration model, as well as the closely related uniform random graph with prescribed degrees, in Chapter 4. In the
last chapter of this part, Chapter 5, we study the connected components and local limits of
preferential attachment models.
Chapter 3

Connected Components in General Inhomogeneous Random Graphs
Abstract
In this chapter, we introduce the general setting of inhomogeneous random
graphs that are generalizations of the Erdős–Rényi and generalized random
graphs. In inhomogeneous random graphs the status of edges is independent,
with unequal edge-occupation probabilities. While these edge probabilities are
moderated by vertex weights in generalized random graphs, in the general set-
ting they are described in terms of a kernel.
The main results in this chapter concern the degree structure, the multi-type
branching process local limits, and the phase transition in these inhomogeneous
random graphs. We also discuss various examples, and indicate that they can
have rather different structures.
In this chapter we discuss general inhomogeneous random graphs, which are sparse random
graphs in which the edge statuses are independent. We investigate their local limits, as well
as their connectivity structure, and their giant component. This is inspired by the fact that
many real-world networks are highly connected, in the sense that their largest connected
component contains a large proportion of the total vertices of the graph. See Table 3.1 for
many examples and Figure 3.1 for the proportion of vertices in the maximal connected
components in the KONECT data base.
[Figure 3.1: Relative size of the largest connected component (LCC) in the networks of the KONECT data base.]

Table 3.1 The rows in this table correspond to the following real-world networks:
Protein–protein interactions in the blood of people with Alzheimer's disease.
Protein–protein interactions in the blood of people with multiple sclerosis.
IMDb collaboration network, where actors are connected when they have co-acted in a movie.
DBLP collaboration network, where scientists are connected when they have co-authored a paper.
Interactions between zebras, where zebras are connected when they have interacted during the observation phase.

Table 3.1 and Figure 3.1 raise the question of how one can view settings where giant
components exist. We know that there is a phase transition in the size of the giant component
in ERn (λ/n); recall [V1, Chapter 4]. A main topic in the present chapter is to investigate the
conditions for a giant component to be present in general inhomogeneous random graphs;
this occurs precisely when the local limit has a positive survival probability (recall Section
2.6). Therefore, we also investigate the local convergence of inhomogeneous random graphs
in this chapter.
We will study much more general models, where edges are present independently, than
in the generalized random graph in [V1, Chapter 6]; see also Section 1.3.2. There, vertices
have weights associated to them, and the edge-occupation probabilities are approximately
proportional to the product of the weights of the vertices that the edge connects. This means
that vertices with high weights have relatively large probabilities of connections to all other
vertices, a property that may not always be appropriate. Let us illustrate this by an example,
which is a continuation of [V1, Example 6.1]:
Example 3.1 (Population of two types: general setting) Suppose that we have a complex
network in which there are n1 vertices of type-1 and n2 of type-2. Type-1 individuals have
on average m1 neighbors, type-2 individuals m2 , where m1 6= m2 . Further, suppose that
the probability that a type-1 individual is a friend of a type-2 individual is quite different
from the probability that a type-1 individual is a friend of another type-1 individual.
In the generalized random graph model proposed in [V1, Example 6.3], the probability that a type-s individual is a friend of a type-r individual (where s, r ∈ [2]) equals m_s m_r/(ℓ_n + m_s m_r), where ℓ_n = n_1 m_1 + n_2 m_2. Approximating this probability by m_s m_r/ℓ_n, we see that the probability that a type-1 individual is a friend of a type-2 individual is highly related to the probability that a type-1 individual is a friend of a type-1 individual. Indeed, take two type-1 and two type-2 individuals. Then, the probability that the type-1 individuals are friends and the type-2 individuals are friends is almost the same as the probability that the first type-1 individual is a friend of the first type-2 individual, and that the second type-1 individual is a friend of the second type-2 individual. Thus, there is some, possibly unwanted and artificial, symmetry in the model.
How can one create instances where the edge probabilities between vertices of the same type are much larger, or alternatively much smaller, than they would be for the generalized random graph? In sexual networks, there are likely to be more edges between the different sexes than within them, while in highly polarized societies most connections are within the groups. In the two extremes, we either have a bipartite graph, where vertices are connected only to vertices of the other type, or a disjoint union of two Erdős–Rényi random graphs, consisting of the vertices of the two types and no edges between them. We aim to be able to obtain anything in between. In particular, the problem with the generalized random graph originates in the approximate product structure of the edge probabilities. In this chapter, we deviate from such a product structure. J
We assume that our individuals (vertices) have types which are in a certain type space S .
When there are individuals of just two types, as in Example 3.1, then it suffices to take
S = {1, 2}. However, the model allows for rather general sets of types of the individuals,
both finite as well as (countably or even uncountably) infinite type spaces. An example
of an uncountably infinite type space arises when the types are related to the ages of the
individuals in the population. Also, the setting of the generalized random graph with w_i satisfying (1.3.15) corresponds to the uncountable type-space setting when the distribution function F is that of a continuous random variable W.
how many individuals there are of a given type. This is described in terms of a measure µn ,
where, for A ⊆ S , µn (A) denotes the proportion of individuals having a type in A.
In our general model, instead of vertex weights, the edge probabilities are moderated by a
kernel κ : S 2 → [0, ∞). The probability that two vertices of types x1 and x2 are connected
is approximately κ(x1 , x2 )/n, and different edges are present independently. Since there
are many choices for κ, we arrive at a rather flexible model.
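A naive O(n^2) sampler makes this definition concrete: each pair \{u, v\} is connected independently with probability \min\{κ(x_u, x_v)/n, 1\}. The function name and the encoding of types as a list are our own; the two-type table below is a hypothetical kernel in the spirit of Example 3.1, with more edges within types than between them.

```python
import random

def sample_irg(types, kappa, seed=None):
    """Samples IRG_n(kappa): vertex u carries type types[u], and each pair
    {u, v} is joined independently with probability min(kappa(x_u, x_v)/n, 1).
    Returns the graph as a dict of neighbor sets."""
    rng = random.Random(seed)
    n = len(types)
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < min(kappa(types[u], types[v]) / n, 1.0):
                adj[u].add(v)
                adj[v].add(u)
    return adj

# Hypothetical two-type kernel: strong within-type, weak between-type edges.
table = {(1, 1): 6.0, (2, 2): 6.0, (1, 2): 1.0, (2, 1): 1.0}
g = sample_irg([1] * 500 + [2] * 500, lambda x, y: table[(x, y)], seed=42)
```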
(iii)

\frac{1}{n^2}\sum_{1\le u<v\le n} [κ(x_u, x_v) \wedge n] \to \frac{1}{2}\iint_{S^2} κ(x,y)\,µ(dx)\,µ(dy).   (3.2.3)

Similarly, a sequence (κ_n)_{n≥1} of kernels is called graphical with limit κ when, for µ-almost every y, z,

y_n \to y \quad\text{and}\quad z_n \to z \quad\text{imply that}\quad κ_n(y_n, z_n) \to κ(y, z),   (3.2.4)

where κ satisfies conditions (a) and (b) above, and

\frac{1}{n^2}\sum_{1\le u<v\le n} [κ_n(x_u, x_v) \wedge n] \to \frac{1}{2}\iint_{S^2} κ(x,y)\,µ(dx)\,µ(dy).   (3.2.5)
(b) A kernel κ is called reducible if there exists A ⊆ S with 0 < µ(A) < 1 such that κ = 0 a.e. on A × (S \setminus A); otherwise κ is irreducible. J
We now discuss the above definitions. Below, we will take p_{uv} = [κ_n(x_u, x_v) \wedge n]/n. Then the assumptions in (3.2.2), (3.2.3), (3.2.5) imply that the expected number of edges \mathbb{E}[|E(\mathrm{IRG}_n(κ_n))|] is proportional to n, and that the proportionality constant is precisely \frac{1}{2}\iint_{S^2} κ(x,y)\,µ(dx)\,µ(dy). Thus, in the terminology of [V1, Chapter 1], the model is sparse (recall Section 1.1.1). This sparsity allows us to approximate graphical kernels by bounded
ones in such a way that the number of removed edges is oP (n), a fact that will be crucially
used in what follows. Indeed, bounded graphical kernels can be well approximated by step
functions similarly to the way in which continuous functions on R can be well approximated
by step functions. In turn, such step functions on S × S correspond to random graphs with
vertices having only finitely many different types.
We extend the setting to n-dependent sequences (κn )n≥1 of kernels in (3.2.4), as in many
natural cases the kernels do depend on n. In particular, this allows us to deal with several
closely related and natural notions of the edge probabilities, all at the same time (see, e.g.,
(3.2.6) and (3.2.7) below), showing that identical results hold in each of these cases.
Roughly speaking, κ is reducible if the vertex set [n] of IRGn (κ) can be split into two
parts in such a way that the probability of an edge from one part to the other is zero, and κ
is irreducible otherwise. For reducible kernels, we could equally well have started with each
of these parts separately, explaining why the notion of irreducibility is quite natural.
In many cases, we take S = [0, 1], x_i = i/n, and µ the Lebesgue measure on [0, 1].
Then, clearly, (3.2.1) is satisfied. In fact, Janson (2009) shows that we can always restrict
to S = [0, 1] by suitably adapting the other choices of our model. However, for notational
purposes, it is more convenient to work with general S . For example, when S = {1} is just
a single type, the model reduces to the Erdős–Rényi random graph, and, in the setting where
S = [0, 1], this is slightly more cumbersome, as can be worked out in detail in Exercise 3.1.
this follows immediately from [V1, Theorem 6.18] (see Exercise 3.4). In the next section,
we discuss some examples of inhomogeneous random graphs.
100 Connected Components in General Inhomogeneous Random Graphs
Chung–Lu Model
For CL_n(w) with w = (w_v)_{v∈[n]}, where w_v = [1-F]^{-1}(v/n) as in (1.3.15), we take S = [0,1], x_v = v/n and, with ψ(x) = [1-F]^{-1}(x),

κ_n(x, y) = ψ(x)ψ(y)\,n/ℓ_n.   (3.2.9)

For CL_n(w) with w = (w_v)_{v∈[n]} satisfying Condition 1.1 in Section 1.3.2, instead, we take S = [0,1], x_v = v/n, and

κ_n(u/n, v/n) = w_u w_v/\mathbb{E}[W_n].   (3.2.10)

Exercises 3.5 and 3.6 study the Chung–Lu random graph in the present framework.
Thus, despite the inhomogeneity that is present, every vertex in the graph has (asymptot-
ically) the same number of expected offspring. Exercise 3.8 shows that the Erdős–Rényi
random graph, the homogeneous bipartite random graph, and the stochastic block model are
all homogeneous random graphs. In such settings, the level of inhomogeneity is limited.
Sum Kernels
We have already seen that product kernels are special, as they give rise to the Chung–Lu model or its close relatives, the generalized random graph and the Norros–Reittu model. For sum kernels, instead, we take κ(x,y) = ψ(x) + ψ(y), so that

p_{uv} = \min\{(ψ(u/n) + ψ(v/n))/n, 1\}.   (3.2.13)
We start by investigating the degrees of the vertices of IRG_n(κ_n). As we shall see, the degree of a vertex of a given type x is asymptotically Poisson with mean

λ(x) = \int_S κ(x, y)\,µ(dy)   (3.3.1)

that (possibly) depends on the type x ∈ S. This leads to a mixed-Poisson distribution for the degree D of a (uniformly chosen) random vertex of IRG_n(κ_n). We recall that N_k(n) denotes the number of vertices of IRG_n(κ_n) with degree k, i.e.,

N_k(n) = \sum_{v∈[n]} 1_{\{d_v = k\}},   (3.3.2)
Recall that, in the finite-type case, the edge probability between vertices of types s and
r is denoted by (κn (s, r) ∧ n)/n. Further, (3.2.4) implies that κn (s, r) → κ(s, r) for
every s, r ∈ [t], while (3.2.1) implies that the number ns of vertices of type s satisfies
µn (s) = ns /n → µ(s) for some probability distribution (µ(s))s∈[t] .
[Figure: The degree distribution, size-biased degree distribution, and random friend degree distribution in two real-world networks, shown as tail probabilities P(X > x) against the degrees on a log–log scale; panels (a) and (b).]
Assume that n ≥ \max κ. The random variables (d_{v,r})_{r∈[t]} are independent, and d_{v,r} ∼ \mathrm{Bin}(n_r - 1_{\{s=r\}}, κ_n(s,r)/n) \xrightarrow{d} \mathrm{Poi}(µ(r)κ(s,r)), where n_r is the number of vertices with type r and µ(r) = \lim_{n\to\infty} n_r/n is the limiting type distribution. Hence

d_v \xrightarrow{d} \mathrm{Poi}\Big(\sum_{r∈[t]} κ(s,r)µ(r)\Big) = \mathrm{Poi}(λ(s)),   (3.3.8)

where λ(s) = \int κ(s,r)\,µ(dr) = \sum_{r∈[t]} κ(s,r)µ(r). Consequently,

\mathbb{P}(d_v = k) \to \mathbb{P}(\mathrm{Poi}(λ(s)) = k) = \frac{λ(s)^k}{k!}\,e^{-λ(s)}.   (3.3.9)
Let N_{k,s}(n) be the number of vertices in IRG_n(κ_n) of type s with degree k. Then

\frac{1}{n}\mathbb{E}[N_{k,s}(n)] = \frac{1}{n}\,n_s\,\mathbb{P}(d_v = k) \to µ(s)\,\mathbb{P}(\mathrm{Poi}(λ(s)) = k).   (3.3.10)

It is easily checked that \mathrm{Var}(N_{k,s}(n)) = O(n) (see Exercise 3.12). Hence,

\frac{1}{n}N_{k,s}(n) \xrightarrow{\mathbb{P}} \mathbb{P}(\mathrm{Poi}(λ(s)) = k)\,µ(s),   (3.3.11)

and thus, summing over s ∈ [t],

\frac{1}{n}N_k(n) = \frac{1}{n}\sum_{s∈[t]} N_{k,s}(n) \xrightarrow{\mathbb{P}} \sum_{s∈[t]} \mathbb{P}(\mathrm{Poi}(λ(s)) = k)\,µ(s) = \mathbb{P}(D = k).   (3.3.12)
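For a finite type space, the limiting mixed-Poisson probability mass function in (3.3.12) is a weighted sum of Poisson probabilities with means λ(s) = Σ_r κ(s,r)µ(r). A small sketch (the function name and the list encodings are our own):

```python
import math

def mixed_poisson_pmf(k, mu, lam):
    """P(D = k) = sum_s mu[s] * P(Poi(lam[s]) = k), as in (3.3.12);
    mu[s] are the type frequencies and lam[s] the Poisson means lambda(s)."""
    return sum(
        m * math.exp(-l) * l ** k / math.factorial(k) for m, l in zip(mu, lam)
    )

# Example with two types: lam[s] = sum_r kappa[s][r] * mu[r].
kappa = [[6.0, 1.0], [1.0, 6.0]]
mu = [0.5, 0.5]
lam = [sum(kappa[s][r] * mu[r] for r in range(2)) for s in range(2)]
pmf = [mixed_poisson_pmf(k, mu, lam) for k in range(10)]
```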
In order to prove Theorem 3.4 in the general case, we approximate a sequence of graph-
ical kernels (κn ) by appropriate regular finite kernels, as we explain in detail in the next
subsection.
Our aim is to find finite-type approximations of κ_n that bound κ_n from above and below. It is here that the metric structure of S, as well as the convergence properties of κ_n and the a.e.-continuity of (x, y) \mapsto κ(x, y) in Definition 3.3, are crucially used:

Proposition 3.5 (Finite-type approximations of general kernels) If (κ_n)_{n≥1} is a graphical sequence of kernels on a vertex space (S, µ, (x_n)_{n≥1}) with limit κ, then there exist sequences (κ_m)_{m≥1}, and (\bar{κ}_m)_{m≥1} when (3.3.13) holds, of finite-type kernels on the same vertex space satisfying the following:
(a) if κ is irreducible, then so are κ_m and \bar{κ}_m for all large enough m;
(b) κ_m(x, y) \nearrow κ(x, y) for (µ×µ)-a.e. x, y ∈ S;
(c) \bar{κ}_m(x, y) \searrow κ(x, y) for (µ×µ)-a.e. x, y ∈ S.
Let us now give some details. We find these finite-type approximations by giving a partition P_m of S on which κ_n(x, y) is almost constant when x and y are inside cells of the partition. Fix m ≥ 1; this indicates the number of cells in the partition of S. Given a sequence of finite partitions P_m = \{A_{m1}, ..., A_{mM_m}\} of S and an x ∈ S, we define the function x \mapsto i_m(x) by requiring that

x ∈ A_{m, i_m(x)}.   (3.3.14)

Thus, i_m(x) indicates the cell in P_m containing x. For A ⊆ S, we write \mathrm{diam}(A) = \sup\{\mathrm{dist}(x, y): x, y ∈ A\}, where dist(·,·) denotes the distance on S. We obtain the following key approximation result:

Lemma 3.6 (Approximating partition) There exists a sequence of finite partitions P_m = \{A_{m1}, ..., A_{mM_m}\} of S such that:
(a) each A_{mi} is measurable and µ(∂A_{mi}) = 0;
(b) for each m, P_{m+1} refines P_m, i.e., each A_{mi} is a union \bigcup_{j∈J_{mi}} A_{m+1,j} for some set J_{mi};
(c) for almost every x ∈ S, \mathrm{diam}(A_{m,i_m(x)}) \to 0 as m \to \infty, where i_m(x) is defined by (3.3.14).
Proof This proof is a little technical. When S = (0,1] and µ is continuous, we can take P_m to be the dyadic partition into intervals of length 2^{-m}. If S = (0,1] and µ is arbitrary, then we can do almost the same: we only shift the endpoints of the intervals a little when necessary to avoid point masses of µ.
In general, we can proceed as follows. Let z_1, z_2, ... be a dense sequence of points in S. For any z_i, the balls B_d(z_i) = \{y ∈ S: \mathrm{dist}(y, z_i) \le d\}, for d > 0, have disjoint boundaries, and thus all except at most countably many of them are µ-continuity sets. Consequently, for every m ≥ 1, we may choose balls B_{mi} = B_{d_{mi}}(z_i) that are µ-continuity sets and have radii satisfying 1/m < d_{mi} < 2/m. Then \bigcup_i B_{mi} = S, and if we define B'_{mi} := B_{mi} \setminus \bigcup_{j<i} B_{mj}, we obtain for each m an infinite partition \{B'_{mi}\}_{i≥1} of S into µ-continuity sets, each with diameter at most 4/m. To get a finite partition, we choose q_m large enough to ensure that, with B'_{m0} := \bigcup_{i>q_m} B'_{mi}, we have µ(B'_{m0}) < 2^{-m}; then \{B'_{mi}\}_{i=0}^{q_m} is a partition of S for each m, with \mathrm{diam}(B'_{mi}) \le 4/m for i ≥ 1.
Finally, we let P_m consist of all intersections \bigcap_{l=1}^{m} B'_{l,i_l} with 0 ≤ i_l ≤ q_l; then conditions (a) and (b) are satisfied. Condition (c) follows from the Borel–Cantelli Lemma: as \sum_m µ(B'_{m0}) is finite, a.e. x is in finitely many of the sets B'_{m0}. For any such x, if m is large enough then x ∈ B'_{mi} for some i ≥ 1, so the cell of P_m containing x has diameter at most \mathrm{diam}(B'_{mi}) \le 4/m.
Now we are ready to complete the proof of Proposition 3.5.
Proof of Proposition 3.5. Recall from Definition 3.3 that a kernel κ is a symmetric measurable function on S × S that is a.e. continuous. Recall also that κ_n is a graphical sequence of kernels, so that it satisfies the convergence properties in (3.2.4). Fixing a sequence of partitions with the properties described in Lemma 3.6, we can define sequences of lower and upper approximations to κ by

κ_m(x, y) = \inf\{κ(x', y') : x' ∈ A_{m,i_m(x)}, y' ∈ A_{m,i_m(y)}\},   (3.3.15)
\bar{κ}_m(x, y) = \sup\{κ(x', y') : x' ∈ A_{m,i_m(x)}, y' ∈ A_{m,i_m(y)}\}.   (3.3.16)

We thus replace κ by its infimum or supremum on each A_{mi} × A_{mj}. As \bar{κ}_m might be +∞, we use it only for bounded κ_n as in (3.3.13). Obviously, κ_m and \bar{κ}_m are constant on A_{m,i} × A_{m,j} for every i, j, so that κ_m and \bar{κ}_m correspond to finite-type kernels (see Exercise 3.14).
By Lemma 3.6(b),

κ_m \le κ_{m+1} \quad\text{and}\quad \bar{κ}_m \ge \bar{κ}_{m+1}.   (3.3.17)

Furthermore, since κ is almost everywhere continuous, by Lemma 3.6(c),

κ_m(x, y) \to κ(x, y) \quad\text{and}\quad \bar{κ}_m(x, y) \to κ(x, y) \quad\text{for (µ×µ)-a.e. } (x, y) ∈ S^2.   (3.3.18)

If (κ_n) is a graphical sequence of kernels with limit κ, then we similarly define

κ_m(x, y) := \inf\{(κ \wedge κ_n)(x', y') : x' ∈ A_{m,i_m(x)}, y' ∈ A_{m,i_m(y)}, n \ge m\},   (3.3.19)
\bar{κ}_m(x, y) := \sup\{(κ \vee κ_n)(x', y') : x' ∈ A_{m,i_m(x)}, y' ∈ A_{m,i_m(y)}, n \ge m\}.   (3.3.20)

By (3.3.17), κ_m \le κ_{m+1}, and, by Lemma 3.6(c) and (3.2.4) in Definition 3.3(a),

κ_m(x, y) \nearrow κ(x, y) \quad\text{as } m \to \infty, \text{ for (µ×µ)-a.e. } (x, y) ∈ S^2.   (3.3.21)

This proves part (b) of Proposition 3.5. The proof of part (c) is similar. For the irreducibility in part (a), we may assume that κ is irreducible. In fact, κ_m may be reducible for some m. We omit the proof that κ_m can be adapted in such a way that the adapted version is irreducible.
Since κ_m \le κ, we can obviously construct our random graph in such a way that all edges in IRG_n(κ_m) are also present in IRG_n(κ_n), which we write as IRG_n(κ_m) ⊆ IRG_n(κ_n), and in what follows we will assume this. See also Exercise 3.15. Similarly, we shall assume that IRG_n(\bar{κ}_m) ⊇ IRG_n(κ_n) when κ_n is bounded as in (3.3.13). Moreover, when n ≥ m,

κ_n \ge κ_m,   (3.3.22)

and we may assume that IRG_n(κ_m) ⊆ IRG_n(κ_n). By the convergence of the sequence of kernels (κ_n), we further obtain that the number of edges also converges. Thus, in bounding κ_n, we do not create or destroy too many edges. This provides the starting point of our analysis, which we provide in the following subsection.
Since λ^{(m)}(x) \le λ(x), we can couple the limiting degrees in such a way that D^{(m)} \le D almost surely, and thus

\mathbb{P}(D \neq D^{(m)}) = \mathbb{P}(D - D^{(m)} \ge 1) \le \mathbb{E}[D - D^{(m)}] = \iint_{S^2} κ(x,y)\,µ(dx)\,µ(dy) - \iint_{S^2} κ_m(x,y)\,µ(dx)\,µ(dy) < ε.   (3.3.27)

Combining (3.3.25), (3.3.26), and (3.3.27), we see that |N_k(n)/n - \mathbb{P}(D = k)| < 4ε whp, as required.
Bounded Kernels
First of all, the above proof is exemplary of several proofs that we will use in this chapter as well as in Chapter 6. The current proof is particularly simple, as it makes use only of the lower bounding finite-type inhomogeneous random graph, while in many settings we also need the upper bound. This upper bound applies only to bounded kernels κ_n as in (3.3.13). As a result, we need to study the effect of bounding κ_n, for example by approximating it by κ_n(x,y) ∧ K for large enough K.
Corollary 3.7 (Power-law tails for the degree sequence) Let (κ_n) be a graphical sequence of kernels with limit κ. Suppose that

where the first limit is for k fixed and n → ∞, and the second for k → ∞.
Proof It suffices to show that \mathbb{P}(D > k) = c_W k^{-(τ-1)}(1 + o(1)); the remaining conclusions then follow from Theorem 3.4. For any ε > 0, as k → ∞,

It follows that \mathbb{P}(D > k) = \mathbb{P}(\mathrm{Poi}(W) > k) = c_W k^{-(τ-1)}(1 + o(1)) as k → ∞. Exercise 3.16 asks you to fill in the details of this argument.
Corollary 3.7 shows that the general inhomogeneous random graph does include natural
cases with power-law degree distributions. Recall that we have already observed in [V1,
Theorem 6.7] that this is the case for GRGn (w) when the weights sequence w is chosen
appropriately.
In order to study further properties of IRGn (κn ), we need to understand the neighborhood
structure of vertices. This will be crucially used in the next section, where we study the
local convergence properties of IRGn (κn ). For simplicity, let us restrict ourselves first to
the finite-types case. As we have seen, nice kernels can be arbitrarily well approximated by
finite-type kernels, so this should be a good start. Then, for a vertex of type s, the number
of neighbors of type r is close to Poisson-distributed with approximate mean κ(s, r)µ(r).
Even when we assume independence of the neighborhood structures of different vertices, we
still do not arrive at a classical branching process as discussed in [V1, Chapter 3]. Instead,
we can describe the neighborhood structure with a branching process in which we keep track
of the type of each vertex. For general κ and µ, we can even have a continuum of types. Such
branching processes are called multi-type branching processes. In this section, we discuss
some of the basics of these processes.
be the joint probability generating function of the offspring of an individual of type s ∈ [t]. We write

G(z) = (G^{(1)}(z), ..., G^{(t)}(z))   (3.4.3)
for the vector of generating functions. We now generalize [V1, Theorem 3.1] to the multi-type case.
Let ζ be the smallest solution in the lexicographic order on \mathbb{R}^t to

ζ = 1 - G(1 - ζ).   (3.4.4)

It turns out that ζ is the vector whose sth component equals the survival probability of (Z_k^{(s)})_{k≥0}. Define

G_k^{(s)}(z) = \mathbb{E}\Big[\prod_{r∈[t]} z_r^{Z_{k,r}^{(s)}}\Big],   (3.4.5)
there exists l such that (T_κ^l)_{s,r} > 0, where the matrix T_κ^l is the lth power of T_κ. We call a multi-type branching process positively regular if there exists l such that (T_κ^l)_{s,r} > 0 for all s, r ∈ [t]. J
The definition of irreducible multi-type branching processes in Definition 3.9 is closely related to that of irreducible random graph kernels in Definition 3.3. The name irreducibility can be understood since it implies that the Markov chain of the number of individuals of the various types is irreducible.
By the Perron–Frobenius theorem, in the positively regular case, the matrix T_κ has a unique largest eigenvalue equal to ‖T_κ‖ with non-negative left eigenvector x_κ, and the eigenvalue ‖T_κ‖ can be computed as

‖T_κ‖ = \sup_{x\colon ‖x‖ \le 1} ‖T_κ x‖, \qquad \text{where } ‖x‖ = \Big(\sum_{s∈[t]} x_s^2\Big)^{1/2}.   (3.4.6)
(a) The survival probability ζ is the largest solution to ζ = 1 - G(1 - ζ), and ζ > 0 precisely when ‖T_κ‖ > 1.
(b) Assume that ‖T_κ‖ > 1. Let x_κ be the unique positive left eigenvector of T_κ. Then, as k → ∞, the martingale M_k = x_κ Z_k^{(i)} ‖T_κ‖^{-k} converges almost surely to a non-negative limit on the event of survival precisely when \mathbb{E}[Z_1^{(s)} \log(Z_1^{(s)})] < ∞ for all s ∈ [t], where Z_1^{(s)} = ‖\mathbf{Z}_1^{(s)}‖_1 = \sum_{r∈[t]} Z_{1,r}^{(s)} is the total number of offspring of a type-s individual.
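For a finite type space with a symmetric, irreducible kernel, ‖T_κ‖ is the Perron–Frobenius eigenvalue of the t × t mean-offspring matrix with entries κ(s, r)µ(r), which can be approximated by power iteration. A minimal sketch under these assumptions (the function name and the list-of-lists encoding are ours):

```python
def operator_norm(kappa, mu, iterations=1000):
    """Largest eigenvalue of the mean-offspring matrix M[s][r] =
    kappa[s][r] * mu[r], via power iteration; the branching process is
    supercritical precisely when the returned value exceeds 1."""
    t = len(mu)
    x = [1.0] * t
    norm = 0.0
    for _ in range(iterations):
        y = [sum(kappa[s][r] * mu[r] * x[r] for r in range(t)) for s in range(t)]
        norm = max(abs(v) for v in y)  # eigenvalue estimate at this step
        x = [v / norm for v in y]
    return norm
```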
Again, it can be seen, in a way similar to that above, that ζ_κ > 0 if and only if ‖T_κ‖ > 1, where now the linear operator T_κ is defined, for f: S → \mathbb{R}, by

(T_κ f)(x) = \int_S κ(x, y) f(y)\,µ(dy),   (3.4.15)

for any (measurable) function f such that this integral is defined (finite or +∞) for a.e. x ∈ S.
Note that T_κ f is defined for every f ≥ 0, with 0 ≤ T_κ f ≤ ∞. If κ ∈ L^1(S × S), as we assume throughout, then T_κ f is also defined for every bounded f. In this case T_κ f ∈ L^1(S) and thus T_κ f is finite almost everywhere.
The consideration of multi-type branching processes with a possibly uncountable number of types requires some functional analysis. Similarly to the finite-type case in (3.4.6), we define

‖T_κ‖ = \sup\{‖T_κ f‖ : f \ge 0, ‖f‖ \le 1\} \le ∞.   (3.4.16)
Thus, also

‖T_κ‖ = ‖ψ‖^2_{L^2(µ)},   (3.4.22)

since ψ is the unique non-negative eigenfunction, and a basis of eigenfunctions can be found by taking a basis in the space orthogonal to ψ (each member of which will have eigenvalue 0). Thus, the rank-1 multi-type branching process is supercritical when ‖ψ‖^2_{L^2(µ)} > 1, critical when ‖ψ‖^2_{L^2(µ)} = 1, and subcritical when ‖ψ‖^2_{L^2(µ)} < 1.
The rank-1 case is rather special, and not only since we can explicitly compute the eigenvectors of the operator T_κ. It also turns out that the rank-1 multi-type case reduces to a single-type branching process with mixed-Poisson offspring distribution. For this, we recall the construction right below Lemma 3.11. We compute that

λ(x) = \int_S ψ(x)ψ(y)\,µ(dy) = ψ(x)\int_S ψ(y)\,µ(dy),   (3.4.23)

We conclude that every individual chooses its type independently of the type of its parent. This means that this multi-type branching process reduces to a single-type branching process with offspring distribution \mathrm{Poi}(W_λ), where

\mathbb{P}(W_λ ∈ A) = \frac{\int_A ψ(y)\,µ(dy)}{\int_S ψ(z)\,µ(dz)}.   (3.4.25)

This makes the rank-1 setting particularly appealing.
We now introduce some helpful notation along the lines of that in Section 1.5. We let
BP≤r denote the branching process up to and including generation r, where, for each in-
dividual v in the rth generation, we record its type as Q(v). It is convenient to think of
the branching-process tree, denoted as BP, as being labeled in the Ulam–Harris way (recall
Section 1.5), so that a vertex v in generation r has a label ∅a1 · · · ar , where ai ∈ N. When
applied to BP, we denote this process by (BP(t))t≥1 , where BP(t) consists of precisely
t + 1 vertices and their types (with BP(0) equal to the root ∅ and its type Q(∅)). We recall
Definitions 1.24 and 1.25 for details.
We can represent this by a sum of independent Poisson multi-type processes with intensities ∆κ_n(x, y), and can associate a label n with each individual that arises from ∆κ_n(x, y). Then the branching process BP^{(n)}_{\le r} is obtained by keeping all vertices with labels at most n, while BP_{\le r} is obtained by keeping all vertices. Consequently, BP^{(n)}_{\le r} \xrightarrow{d} BP_{\le r} follows since κ_n → κ. Further, 1 - ζ^{(n)}_{\ge k}(x) = ζ^{(n)}_{<k}(x) = \mathbb{P}(|BP^{(n)}_{\le k}| < k), which thus also converges.
and each of its offspring receives an independent type with distribution Q(x) given by

ρ(Q(x) ∈ A) = \frac{\int_A κ(x, y)\,µ(dy)}{\int_S κ(x, y)\,µ(dy)}.   (3.5.3)

The proof of Theorem 3.14 follows a familiar pattern: we first prove it for the finite-type case, and then use finite-type approximations to extend the proof to the infinite-type case. We use ρ for the law of the local limit rather than µ, as in Chapter 2, to avoid confusion with the limiting type measure µ appearing in the definition of IRG_n(κ_n).
denote the number of vertices whose ordered local neighborhood up to generation r, including their types, equals (t, q). Here, in B̄_r^{(Gn;Q)}(v), we record the types of the vertices in B̄_r^{(Gn)}(v). Theorem 2.15 implies that in order to prove Theorem 3.14, we need to show that

    N_{n,r}(t, q)/n →ᴾ ρ(B̄_r^{(G,Q)}(o) = (t, q)),    (3.5.5)

where (B̄_r^{(G,Q)}(o))_{r≥0} are the vertex-marked r-neighborhoods of the unimodular branching process (G, o) ∼ ρ described in Theorem 3.14, including the types of the tree vertices.
Recall that these neighborhoods are ordered trees. This implies the convergence in proba-
bility of marked rooted graphs, discussed in Section 2.3.5, and the usual local convergence
in probability of the (unmarked) neighborhood follows by summing over the types of the
vertices in B̄_r^{(Gn)}(v). Let

    N_{n,r}(t) = Σ_{v∈[n]} 1{B̄_r^{(Gn)}(v) = t}.    (3.5.6)
Then, indeed, since there is only a finite number of types, (3.5.5) also implies that N_{n,r}(t)/n →ᴾ ρ(t), where, with a slight abuse of notation, we write

    ρ(t) = Σ_q ρ(B̄_r^{(G,Q)}(o) = (t, q)) = ρ(B̄_r^{(G)}(o) = t)    (3.5.7)
for the probability that the branching process produces a certain marked tree. We can then
apply Theorem 2.15.
To prove (3.5.5), we follow the usual pattern of using a second-moment method. We first
prove that the first moment satisfies E[Nn,r (t, q)]/n → ρ(B̄r(G,Q) (o) = (t, q)), after which
we prove that Var(N_{n,r}(t, q)) = o(n²). Then, (3.5.5) follows by the Chebyshev inequality ([V1, Theorem 2.18]).
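Explicitly, the Chebyshev step here reads

    P(|N_{n,r}(t, q) − E[N_{n,r}(t, q)]| ≥ εn) ≤ Var(N_{n,r}(t, q))/(ε²n²) = o(1) for every ε > 0,

so that N_{n,r}(t, q)/n − E[N_{n,r}(t, q)]/n →ᴾ 0, and the first-moment convergence then identifies the limit in (3.5.5).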
since we first draw a Poisson λ(q(v)) number of children, and then assign a type q to each of
them with probability κ(q(v), q)µ(q)/λ(q(v)). This is true independently for all v ∈ V (t)
with |v| ≤ r − 1, so that
    ρ(B̄_r^{(G,Q)}(o) = (t, q)) = ∏_{v∈V(t): |v|≤r−1} e^{−λ(q(v))} (1/d_v!) ∏_{j=1}^{d_v} κ(q(v), q(vj))µ(q(vj)).    (3.5.10)
For a comparison with the graph exploration, it turns out to be convenient to rewrite this probability slightly. Let t_{≤r−1} = {v : |v| ≤ r − 1} denote the vertices in the first r − 1 generations of t and let |t_{≤r−1}| denote its size. We can order the elements of t_{≤r−1} in their lexicographic or Ulam–Harris ordering as (v_i)_{i=1}^{|t_{≤r−1}|} (recall Definition 1.24 in Section 1.5). Then we can write

    ρ(B̄_r^{(G,Q)}(o) = (t, q)) = ∏_{i=1}^{|t_{≤r−1}|} e^{−λ(q(v_i))} (1/d_{v_i}!) ∏_{j=1}^{d_{v_i}} κ(q(v_i), q(v_i j))µ(q(v_i j)).    (3.5.11)
Let us now turn to IRG_n(κ_n). Fix a vertex v ∈ [n] of type q(v). Recall that n_q denotes the number of type-q vertices. The probability of obtaining a sequence of d_v neighbors of (ordered) types (q(v1), . . . , q(vd_v)) equals

    (1/d_v!) ∏_{q∈S} (1 − κ_n(q(v), q)/n)^{n_q − m_q} ∏_{j=1}^{d_v} (κ_n(q(v), q(vj))/n) [n_{q(vj)} − m_{q(vj)}(j − 1)],    (3.5.12)

where m_q = #{i : q(vi) = q} is the number of type-q vertices in (q(v1), . . . , q(vd_v)) and m_q(j) = #{i ≤ j : q(vi) = q} is the number of type-q vertices in (q(v1), . . . , q(vj)).
Here, the first factor, 1/d_v!, arises since we are assigning an ordering on all vertices uar; the second factor, involving the product over q ∈ S, since all other edges (except for the specified ones) need to be absent; and the third factor, involving the product over j ∈ [d_v], specifies that the edges to vertices of the (ordered) sequence of types are present.
When n → ∞, κ_n(q(v), q) → κ(q(v), q) for every q ∈ S since n_q/n → µ(q), so that

    (1/d_v!) ∏_{q∈S} (1 − κ_n(q(v), q)/n)^{n_q − m_q} ∏_{j=1}^{d_v} κ_n(q(v), q(vj)) [n_{q(vj)} − m_{q(vj)}(j − 1)]/n
        → e^{−λ(q(v))} (1/d_v!) ∏_{j=1}^{d_v} κ(q(v), q(vj))µ(q(vj)),    (3.5.13)
as required. The above computation, however, ignores the depletion-of-points effect: fewer and fewer vertices remain available as the exploration proceeds.
To describe this, recall the lexicographic ordering of the elements in t_{≤r−1} as (v_i)_{i=1}^{|t_{≤r−1}|}, and, for a type q, let m_q(i) = #{j ∈ [i] : q(v_j) = q} denote the number of type-q individuals in (t, q) encountered up to and including the ith exploration. Then

    P(B̄_r^{(Gn;Q)}(o) = (t, q)) = ∏_{i=1}^{|t_{≤r−1}|} (1/d_{v_i}!) ∏_{q∈S} (1 − κ_n(q(v_i), q)/n)^{n_q − m_q(i−1)}
        × ∏_{j=1}^{d_{v_i}} (κ_n(q(v_i), q(v_i j))/n) [n_{q(v_i j)} − m_{q(v_i j)}(i + j − 1)].    (3.5.14)
As n → ∞, this converges to the rhs of (3.5.11), as required. This completes the proof of
(3.5.8), and thus the convergence of the first moment, which, in turn, implies local weak
convergence.
We now condition on B̄_r^{(Gn;Q)}(o_1) ≃ (t, q), and write
We already know that P(B̄_r^{(Gn;Q)}(o_1) = (t, q)) → ρ(B̄_r^{(G,Q)}(o) = (t, q)), so that also
In Exercise 3.24, you can prove that (3.5.19) does indeed hold.
We next investigate the conditional probability given B̄_r^{(Gn;Q)}(o_1) = (t, q) and o_2 ∉ B_{2r}^{(Gn)}(o_1), by noting that the probability that B̄_r^{(Gn;Q)}(o_2) = (t, q) is the same as the probability that B̄_r^{(Gn;Q)}(o_2) = (t, q) in IRG_{n′}(κ_n), which is obtained by removing the vertices in B_r^{(Gn;Q)}(o_1), as well as the edges from them, from IRG_n(κ_n). We conclude that the resulting random graph has n′ = n − |V(t)| vertices, and n′_q = n_q − m_q vertices of type q ∈ [t], where m_q is the number of type-q vertices in (t, q). Further, κ_{n′}(s, r) = κ_n(s, r)n′/n. The whole point is that κ_{n′}(s, r) → κ(s, r) and n′_q/n → µ(q) still hold. Therefore, we also have

    P(B̄_r^{(Gn;Q)}(o_2) = (t, q) | B̄_r^{(Gn;Q)}(o_1) = (t, q), o_2 ∉ B_{2r}^{(Gn)}(o_1)) → ρ(B̄_r^{(G,Q)}(o) = (t, q)).    (3.5.20)
and we have proved that E[N_{n,r}(t, q)²]/n² → ρ(B̄_r^{(G,Q)}(o) = (t, q))². From this, (3.5.15) follows directly since E[N_{n,r}(t, q)]/n → ρ(B̄_r^{(G,Q)}(o) = (t, q)). As a result, N_{n,r}(t, q)/n →ᴾ ρ(B̄_r^{(G,Q)}(o) = (t, q)), as required.
Lemma 3.15 completes the proof of Theorem 3.14 in the finite-type case.
holds whp. We let K denote the maximal degree in t. Let N^{(m)}_{n,r}(t, q) denote N_{n,r}(t, q) for the kernel κ_m (and keep N_{n,r}(t, q) as in (3.5.4) for the kernel κ_n).
If a vertex v is such that B_r^{(Gn;Q)}(v) ≃ (t, q) in IRG_n(κ_m), but not in IRG_n(κ_n), or vice versa, then one vertex in B_{r−1}^{(Gn;Q)}(v) needs to have a different degree in IRG_n(κ_m)
    + Σ_{u,v} 1{u ∈ B_{r−1}^{(Gn;Q)}(v), B̄_r^{(Gn;Q)}(v) = (t, q) in IRG_n(κ_n)} 1{D_u^{(m)} ≠ D_u}.    (3.5.22)
Recall that the maximal degree of any vertex in V(t) is K. Further, if B̄_r^{(Gn;Q)}(v) = (t, q) and u ∈ B_{r−1}^{(Gn;Q)}(v), then all the vertices on the path between u and v have degree at most K. Therefore,

    Σ_v 1{u ∈ B_{r−1}^{(Gn;Q)}(v), B̄_r^{(Gn;Q)}(v) = (t, q) in IRG_n(κ_m)} ≤ Σ_{ℓ≤r−1} K^ℓ ≤ (K^r − 1)/(K − 1),    (3.5.23)
and, in the same way,

    Σ_v 1{u ∈ B_{r−1}^{(Gn;Q)}(v), B̄_r^{(Gn;Q)}(v) = (t, q) in IRG_n(κ_n)} ≤ (K^r − 1)/(K − 1).    (3.5.24)
We thus conclude that whp

    |N^{(m)}_{n,r}(t, q) − N_{n,r}(t, q)| ≤ 2 (K^r − 1)/(K − 1) Σ_{u∈[n]} 1{D_u^{(m)} ≠ D_u}
        ≤ 2 (K^r − 1)/(K − 1) Σ_{u∈[n]} (D_u^{(m)} − D_u) ≤ 2 (K^r − 1)/(K − 1) ε′n.    (3.5.25)
kernels with finitely many types. Recall the rank-1 Poisson branching processes defined in Section 3.4.3, and recall that such branching processes were shown there to be equivalent to single-type mixed-Poisson branching processes.
We now make the connection between the thinned marked mixed-Poisson branching pro-
cess and neighborhood exploration precise:
Proposition 3.16 (Connected components as thinned marked branching processes) The connected component C(o) of a uniformly chosen vertex is equal in distribution to the set of vertices {M_v : v unthinned} ⊆ [n] and the edges between them inherited from the marked mixed-Poisson branching process (X_v, M_v)_v. Here, {M_v : v unthinned} consists of the marks of unthinned tree nodes encountered in the marked mixed-Poisson branching process up to the end of the exploration. Consequently, the sets of vertices at graph distance r from o have the same distribution as

    ({M_v : v unthinned, |v| = r})_{r≥0}.    (3.5.31)
This proves that the set of marks of the children of the root in the MMPBP has the same distribution as the set of neighbors of the chosen vertex in NR_n(w).
Next, we look at the number of new elements of C (o) neighboring a vertex found in
the exploration. Fix one such vertex, and let its tree label be v = ∅v1 . First, condition on
Mv = l, and assume that v is not thinned. Conditional on Mv = l, the number of children
of v in the MMPBP has distribution Poi(wl ). Each of these Poi(wl ) children receives an iid
mark. Let Xv,j denote the number of children of v that receive mark j .
By Lemma 3.11, (X_{v,j})_{j∈[n]} is again a vector of independent Poisson random variables with parameters w_l w_j/ℓ_n. Owing to the thinning, a mark appears within the offspring of individual v precisely when X_{v,j} ≥ 1, and these events are independent. In particular, for each j that has not appeared as the mark of an unthinned vertex, the probability that it occurs as the child of a vertex having mark l equals 1 − e^{−w_j w_l/ℓ_n} = p^{(NR)}_{lj}, as required.
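The following sketch (an illustration only; the weight sequence is a hypothetical choice) explores the cluster of a vertex in NR_n(w) by running the marked mixed-Poisson branching process and thinning every mark that has appeared before, exactly as in the construction above:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1000
    w = 1.0 + rng.pareto(2.5, size=n)      # hypothetical weight sequence with finite second moment
    ell_n = w.sum()

    def explore_cluster(root):
        # marks of unthinned individuals found so far form the cluster of the root
        seen = {int(root)}
        queue = [int(root)]
        while queue:
            v = queue.pop()
            num_children = rng.poisson(w[v])                       # Poi(w_v) children
            marks = rng.choice(n, size=num_children, p=w / ell_n)  # iid marks, P(M = j) = w_j / ell_n
            for m in marks:
                if m not in seen:      # thinning: only the first occurrence of a mark survives
                    seen.add(int(m))
                    queue.append(int(m))
        return seen

    print(len(explore_cluster(rng.integers(n))))

The surviving marks and the edges found along the way then have the distribution of C(o) in NR_n(w), in the sense of Proposition 3.16.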
to the finite-type case for convenience. Further, we let the edge probabilities of our random
graph be given by
    p_{uv} = p^{(NR)}_{uv} = 1 − e^{−κ_n(s_u, s_v)/n},    (3.5.33)
where su ∈ [t] is the type of vertex u ∈ [n] and S = [t] is the collection of types.
Let us introduce some notation. Recall that n_s denotes the number of vertices of type s ∈ [t], and write n_{≤s} = Σ_{r≤s} n_r. Define the intervals I_s = [n_{≤s}] \ [n_{≤s−1}] (where, by convention, I_0 is the empty set). We note that all vertices in the intervals I_s play the same role, and this is used crucially in the coupling that we present below.
We now describe the cluster exploration of a uniformly chosen vertex o ∈ [n], which has
type s with probability µn (s) = ns /n. To define the cluster of o, as well as the types of
the vertices in it, we define the mark distribution of a tree vertex of type r to be the random
variable M (r) with distribution
    P(M(r) = ℓ) = 1/n_r,  ℓ ∈ I_r.    (3.5.34)
Let (X_v, T_v, M_v)_v be a collection of random variables, where:
(a) the root ∅ has type s with probability µ_n(s) = n_s/n, and, given the type s of the root, the number of children X_∅ of the root has a mixed-Poisson distribution with random parameter λ_n(s) = Σ_{r∈[t]} κ_n(s, r)µ_n(r), where each child v with |v| = 1 of ∅ independently receives a type T_v, where T_v = r with probability κ_n(s, r)µ_n(r)/λ_n(s);
(b) given its type s, the number of children X_v of a tree vertex v has a mixed-Poisson distribution with parameter λ_n(s) = Σ_{r∈[t]} κ_n(s, r)µ_n(r), and child vj with j ≥ 1 of v receives a type T_{vj}, where T_{vj} = r with probability κ_n(s, r)µ_n(r)/λ_n(s);
(c) given that a tree vertex v has type r, it receives an independent mark M_v(r) with distribution in (3.5.34).
We call (Xv , Tv , Mv )v a marked multi-type Poisson branching process. Then, the follow-
ing extension of Proposition 3.16 holds:
Theorem 3.18 (Locally tree-like nature of GRG_n(w)) Assume that Conditions 1.1(a),(b) hold. Then GRG_n(w) converges locally in probability to the unimodular branching-process tree, with offspring distribution (p_k)_{k≥0} given by

    p_k = P(D = k) = E[e^{−W} W^k/k!].    (3.5.36)

This result also applies to NR_n(w) and CL_n(w) under the same conditions.
Theorem 3.18 follows directly from Theorem 3.14. However, we give an alternative proof
that relies on the locally tree-like nature of CMn (d) proved in Theorem 4.1 below and the
relation between GRGn (w) and CMn (d) discussed in Section 1.3 and Theorem 1.9. This
approach is interesting in itself, since it allows for general proofs for GRGn (w) by proving
the result first for CMn (d), and then merely extending it to GRGn (w). We frequently rely
on such a proof strategy for GRGn (w).
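As a sanity check on (3.5.36), one can compare the empirical degrees of a simulated GRG_n(w) with Monte Carlo estimates of p_k = E[e^{−W}W^k/k!]; the sketch below uses hypothetical uniform weights and is an illustration only:

    import math
    import numpy as np

    rng = np.random.default_rng(5)
    n = 2000
    w = rng.uniform(0.5, 2.5, size=n)      # hypothetical iid weights W
    ell_n = w.sum()

    # GRG_n(w): edge {u, v} present independently with probability w_u w_v / (ell_n + w_u w_v)
    pw = np.outer(w, w)
    upper = np.triu(rng.random((n, n)) < pw / (ell_n + pw), k=1)
    degrees = (upper | upper.T).sum(axis=1)

    for k in range(5):
        pk = np.mean(np.exp(-w) * w**k / math.factorial(k))  # Monte Carlo for E[e^{-W} W^k / k!]
        print(k, round(float((degrees == k).mean()), 4), round(float(pk), 4))

Already at this moderate size, the empirical degree frequencies and the mixed-Poisson probabilities agree to within sampling error.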
In this section we discuss the phase transition in IRGn (κn ). The main result shows that
there is a giant component when the associated multi-type branching process is supercritical
(recall Definition 3.12), while otherwise there is not:
Theorem 3.19 (Giant component of IRG) Let (κ_n) be a sequence of irreducible graphical kernels with limit κ, and let C_max and C_(2) denote the two largest connected components of IRG_n(κ_n) (breaking ties arbitrarily). Then,

    |C_max|/n →ᴾ ζ_κ,    (3.6.1)

and |C_(2)|/n →ᴾ 0. In all cases ζ_κ < 1, while ζ_κ > 0 precisely when ‖T_κ‖ > 1.
Theorem 3.19 is a generalization of the law of large numbers for the largest connected
component in [V1, Theorem 4.8] for ERn (λ/n) (see Exercise 3.31); recall also Theorem
2.34.
We do not give a complete proof of Theorem 3.19 in this chapter. The upper bound follows directly from the local convergence in Theorem 3.14, together with Corollary 2.27. For the lower bound, it suffices to prove this for kernels with finitely many types, by Proposition 3.5. This proof is deferred to Section 6.5.3 in Chapter 6. We close this section by discussing a few examples of Theorem 3.19.
branching process with mean offspring λ/2. This is not surprising, since the degree of each
vertex is Bin(n/2, λ/n), so the bipartite random graph of size n is, in terms of its local
structure, closely related to the Erdős–Rényi random graph of size n/2.
Finite-Type Case
The bipartite random graph can also be viewed as a random graph with two types of vertices
(i.e., the vertices [n/2] and [n] \ [n/2]). We now generalize our results to the finite-type
case, in which we have seen that κn is equivalent to a t × t matrix (κn (s, r))s,r∈[t] , where
t denotes the number of types. In this case, IRGn (κn ) has vertices of t different types (or
colors), say ns vertices of type s, where two vertices of type s and r are joined by an edge
with probability n−1 κn (s, r) ∧ 1. Exercises 3.29 and 3.30 investigate the phase transition in
the finite-type case.
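In the finite-type case, T_κ acts as the t × t matrix M(s, r) = κ(s, r)µ(r), and for a symmetric kernel ‖T_κ‖ is its Perron root, so supercriticality can be checked numerically. A sketch with a hypothetical two-type kernel (the numbers are illustrative, not from the text):

    import numpy as np

    # hypothetical symmetric two-type kernel and limiting type proportions
    kappa = np.array([[3.0, 0.5],
                      [0.5, 1.0]])
    mu = np.array([0.3, 0.7])

    M = kappa * mu[None, :]                  # M[s, r] = kappa(s, r) mu(r), the matrix form of T_kappa
    norm = max(abs(np.linalg.eigvals(M)))    # Perron root; equals ||T_kappa|| for symmetric kappa
    print(norm, "supercritical" if norm > 1 else "not supercritical")

For these values the Perron root is 1.05 > 1, so the corresponding IRG has a giant component by Theorem 3.19.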
In the case where E[W²] = ∞, on the other hand, we take f_ε(x) = cψ(x)1{x∈[ε,1]}, where c = c_ε is such that ‖f_ε‖ = 1. Then, ‖T_κ f_ε‖ → ∞, so that ‖T_κ‖ = ∞, and CL_n(w) is always supercritical in this regime.
Theorem 3.20 (Phase transition in generalized random graphs) Suppose that Conditions 1.1(a),(b) hold and consider the random graphs GRG_n(w), CL_n(w), or NR_n(w), letting n → ∞. Denote p_k = P(Poi(W) = k) as defined below (1.3.22). Let C_max and C_(2) be the largest and second largest components of GRG_n(w), CL_n(w), or NR_n(w).
(a) If ν = E[W²]/E[W] > 1, then there exist ξ ∈ (0, 1), ζ ∈ (0, 1) such that

    |C_max|/n →ᴾ ζ,
    v_k(C_max)/n →ᴾ p_k(1 − ξ^k), for every k ≥ 0,

while |C_(2)|/n →ᴾ 0 and |E(C_(2))|/n →ᴾ 0. Further, ½E[W](1 − ξ²) > ζ, so that the
The proof of Theorem 3.20, except for the proof of the linear complexity, is deferred to
Section 4.3.2 in Chapter 4, where a similar result is proved for the configuration model.
By the strong relation between the configuration model and the generalized random graph
(recall Theorem 1.9), this result can be seen to imply Theorem 3.20.
Let us discuss some implications of Theorem 3.20, focussing on the supercritical case
where ν = E[W 2 ]/E[W ] > 1. In this case, the parameter ξ is the extinction probability of
a branching process with offspring distribution p?k = P(Poi(W ? ) = k), where W ? is the
size-biased version of W . Thus,
    ξ = G_{Poi(W*)}(ξ) = E[e^{W*(ξ−1)}],    (3.6.5)

where G_{Poi(W*)}(s) = E[s^{Poi(W*)}] is the probability generating function of a mixed-Poisson random variable with mixing distribution W*.
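Numerically, ξ can be found by iterating (3.6.5) from any starting point in [0, 1); the Monte Carlo sketch below approximates W* by weighted resampling of W samples, with an exponential law for W as a hypothetical choice:

    import numpy as np

    rng = np.random.default_rng(3)
    w = rng.exponential(2.0, size=200_000)               # hypothetical W; here nu = E[W^2]/E[W] = 4 > 1
    w_star = rng.choice(w, size=200_000, p=w / w.sum())  # size-biased resampling approximates W*

    xi = 0.5
    for _ in range(200):                                  # iterate xi = E[exp(W*(xi - 1))]
        xi = float(np.exp(w_star * (xi - 1.0)).mean())

    zeta = 1.0 - float(np.exp(w * (xi - 1.0)).mean())     # zeta = 1 - G_D(xi), with G_D(s) = E[e^{W(s-1)}]
    print(xi, zeta)

Since the map s ↦ E[e^{W*(s−1)}] is increasing with fixed points ξ and 1, the iteration converges monotonically to ξ from any starting point in [0, 1).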
Further, since v_k(C_max)/n →ᴾ p_k(1 − ξ^k) and |C_max|/n →ᴾ ζ, it must hold that

    ζ = Σ_{k≥0} p_k(1 − ξ^k) = 1 − G_D(ξ),    (3.6.6)

where G_D(s) = E[s^D] is the probability generating function of D = Poi(W). We also note that |E(C_max)|/n →ᴾ η, and compute

    η = ½ Σ_{k≥0} k p_k (1 − ξ^k) = ½E[W] Σ_{k≥0} (k p_k/E[W]) (1 − ξ^k)
      = ½E[W] (1 − ξ G_{Poi(W*)}(ξ)) = ½E[W](1 − ξ²),    (3.6.7)

as required.
We now compare the limiting total number of edges with |C_max|. Recall the useful correlation inequality in [V1, Lemma 2.14], which states that E[f(X)g(X)] ≥ E[f(X)]E[g(X)] for any non-decreasing functions f and g and random variable X. Applying this to f(k) = k and g(k) = 1 − ξ^k, which are both increasing, leads to

    Σ_{k≥0} k p_k (1 − ξ^k) > (Σ_{k≥0} k p_k)(Σ_{k≥0} (1 − ξ^k) p_k) = E[W]ζ.    (3.6.8)

As a result, by (3.6.7),

    η = ½ Σ_{k≥0} k p_k (1 − ξ^k) > ½E[W]ζ.    (3.6.9)
Thus, the average degree η/ζ in the giant component is strictly larger than the average degree
in the entire graph E[W ]/2.
We finally show that η > ζ, so that the giant has linear complexity. By convexity of x ↦ x^{k−1} and the fact that ξ < 1, for k ≥ 1, we have

    Σ_{i=0}^{k−1} ξ^i ≤ k(1 + ξ^{k−1})/2,    (3.6.10)
The lhs of (3.6.12) equals ζ by (3.6.6). We next investigate the rhs of (3.6.12). Recall that

    Σ_k k p_k = E[W],    (3.6.13)

and, by (3.6.5),

    Σ_k (k p_k/E[W]) ξ^{k−1} = ξ.    (3.6.14)

Hence, the rhs of (3.6.12) is

    (1 − ξ)(E[W] + E[W]ξ)/2 = E[W](1 − ξ²)/2 = η,    (3.6.15)

which is the limit in probability of |E(C_max)|/n. Thus, ζ < η.
jargon, we are dealing with site percolation rather than with bond percolation. We will start
by relating the obtained graph to an inhomogeneous random graph.
Note that when we explore a connected component of a vertex after a random attack, the
vertex may not have been affected by the attack, which has probability p. After this, in the
exploration, we always inspect an edge between a vertex that is unaffected by the attack
and a vertex of which we do not yet know whether it has been attacked or not. As a result,
for random attacks, the probability that it is affected equals p independently of the past
randomness. Therefore, it is similar to the random graph where puv is replaced by p × puv .
For a branching process, this identification is exact, and we have that ζκ,p = pζpκ , where
ζκ,p denotes the survival probability of the unimodular multi-type marked branching-process
tree in Theorem 3.14, where additionally each individual in the tree is killed with probability
1 − p independently of all other randomness. For CLn (w), this equality is only asymptotic.
In the case where E[W 2 ] < ∞, so that ν < ∞, this means that there exists a critical value
pc = 1/ν , such that, if p > pc , the giant component persists in CLn (w), where vertices
are removed with probability 1 − p, while the giant component is destroyed for p ≤ pc .
Thus, when E[W²] < ∞, CL_n(w) is sensitive to random attacks. When E[W²] = ∞, on the other hand, ν = ∞ as well, so that the giant component persists for every p ∈ (0, 1], and the graph is called robust to random attacks. Here we must note that the size of the giant component does decrease, since ζ_{κ,p} < pζ_κ < ζ_κ!
To mimic a deliberate attack, we remove a proportion p of the vertices with highest weight. For convenience, we assume that w = (w_1, . . . , w_n) is non-increasing. Then, removing a proportion p of the vertices with highest weight means that w is replaced with w(p), given by w_v(p) = w_v 1{v>np}, and we denote the resulting edge probabilities by
In this case, the resulting graph on [n] \ [np] is again a Chung–Lu model for which ν is replaced with ν(p), given by
where U is uniform on [0, 1] and we recall that we have written ψ(u) = [1 − F]^{−1}(u). Now, for any distribution function F, E[[1 − F]^{−1}(U)² 1{U>p}] < ∞, so that, for p sufficiently close to 1, ν(p) < 1 (see Exercise 3.39). Thus, the CL_n(w) model is always sensitive to deliberate attacks.
Phase Transitions in Uniformly Grown Random Graphs and for Sum Kernels
Recall the definition of the uniformly grown random graph in (3.2.12). A vertex v is connected independently with all u ∈ [v − 1] with probability p_{uv} = λ/v. This leads to an inhomogeneous random graph with type space S = [0, 1] and limiting kernel κ(x, y) = λ/(x ∨ y). It is non-trivial to compute ‖T_κ‖, but remarkably this can be done, to yield ‖T_κ‖ = 4λ, so that a giant exists for all λ > 1/4. We do not give the proof of ‖T_κ‖ = 4λ here, and refer to Exercise 3.40 for details. Exercise 3.41 investigates when there is a giant for sum kernels, as in (3.2.13).
In this section, we discuss some related results for inhomogeneous random graphs. While
we give intuition about their proofs, we do not include them in full detail.
typical vertex is close to a branching process, so that it is whp bounded, and its expected
connected component size is close to 1/(1 − ν). Thus, the best way to obtain a large con-
nected component is to start with a vertex with high weight wi , and let all of its roughly wi
children be independent branching processes. Therefore, in expectation, each child is con-
nected to another 1/(1 − ν) different vertices, leading to a connected component size of
roughly wi /(1 − ν). This is clearly largest when wi = maxj∈[n] wj = wmax , leading to an
intuitive explanation of Theorem 3.22.
Theorems 3.21 and 3.22 raise the question of what the precise conditions are for |C_max| to be of order log n. Intuitively, if w_max ≫ log n then |C_max| = (w_max/(1 − ν))(1 + o_P(1)), whereas if w_max = Θ(log n) then |C_max| = Θ_P(log n) as well. In Turova (2011), it was proved that |C_max|/log n converges in probability to a finite constant when ν < 1 and the weights are iid with distribution function F with E[e^{αW}] < ∞ for some α > 0, i.e., exponential tails are sufficient.
the k parameter in (3.7.6) measures how many such exponential contributions arise before the graph remaining after the removal of all large components becomes such that its giant has whp size at most ⌈nu⌉.
Interestingly, applying Theorem 3.23 to u = 1 also provides the relation

    lim_{n→∞} (1/n) log P(ER_n(λ/n) connected) = log(1 − e^{−λ});    (3.7.9)

see also Exercise 3.42.
2006, Proposition 3.1), where the connections between NRn (w) and Poisson branching processes were
first exploited to prove the versions of Theorem 6.3 in Chapter 6.
F]^{−1}(v/n) as in (1.3.15). Assume that w_v² = o(ℓ_n). Show that the edge probabilities in CL_n(w̃) are
Further, show that CL_n(w̃) and CL_n(w) are asymptotically equivalent whenever (E[W_n] − E[W])² = o(1/n²).
Exercise 3.7 (Definitions 3.2 and 3.3 for the homogeneous bipartite graph) Prove that Definitions 3.2 and
3.3 hold for the homogeneous bipartite graph.
Exercise 3.8 (Examples of homogeneous random graphs) Show that the Erdős–Rényi random graph, the
homogeneous bipartite random graph, and the stochastic block model are all homogeneous random graphs.
Exercise 3.9 (Homogeneous bipartite graph) Prove that the homogeneous bipartite random graph is a
special case of the finite-type case.
Exercise 3.10 (Irreducibility for the finite-types case) Prove that, in the finite-type case, irreducibility
follows when there exists an m such that the mth power of the matrix (κ(s, r)µ(r))s,r∈[t] contains no
zeros.
Exercise 3.11 (Graphical limit in the finite-types case) Prove that, in the finite-type case, the convergence of µ_n in (3.2.1) holds precisely when, for every type s ∈ S,

    lim_{n→∞} n_s/n = µ(s).    (3.9.3)
Exercise 3.12 (Variance of number of vertices of degree k and type s) Let IRGn (κn ) be a finite-type in-
homogeneous random graph with graphical sequence of kernels κn . Let Nk,s (n) be the number of vertices
of degree k and type s. Show that Var(Nk,s (n)) = O(n).
Exercise 3.13 (Proportion of isolated vertices in inhomogeneous random graphs) Let IRGn (κn ) be an
inhomogeneous random graph with a graphical sequence of kernels κn that converges to κ. Show that the
proportion of isolated vertices converges to

    N_0(n)/n →ᴾ p_0 = ∫ e^{−λ(x)} µ(dx).    (3.9.4)

Conclude that p_0 > 0 when ∫ λ(x)µ(dx) < ∞.
Exercise 3.14 (Upper and lower bounding finite-type kernels) Prove that the kernels κm and κm in
(3.3.15) and (3.3.16) are of finite type.
Exercise 3.15 (Inclusion of graphs for larger κ) Let κ′ ≤ κ hold a.e. Show that we can couple IRG_n(κ′) and IRG_n(κ) in such a way that IRG_n(κ′) ⊆ IRG_n(κ).
Exercise 3.16 (Tails of Poisson variables) Use the stochastic domination of Poisson random variables
with different parameters, as well as the concentration properties of Poisson variables, to complete the
proof of (3.3.30), showing that the tail asymptotics of the weight distribution and that of the mixed-Poisson
random variable with that weight agree.
Exercise 3.17 (Power laws for sum kernels) Let κ(x, y) = ψ(x) + ψ(y) for a continuous function ψ : [0, 1] → [0, ∞), and let the reference measure µ be uniform on [0, 1]. Use Corollary 3.7 to identify when the degree distribution satisfies a power law. How is the tail behavior of D related to that of ψ?
Exercise 3.18 (Survival probability of individual with random type) Consider a multi-type branching process where the root has type s with probability µ(s) for all s ∈ [t]. Show that the survival probability ζ equals

    ζ = Σ_{s∈[t]} ζ^{(s)} µ(s).    (3.9.5)
Exercise 3.21 (Singularity of multi-type branching process) Prove that G(z) = Mz for some matrix M
precisely when each individual in the multi-type branching process has exactly one offspring almost surely.
Exercise 3.22 (Erdős–Rényi random graph) Prove that NRn (w) = ERn (λ/n) when w is constant with
wv = −n log (1 − λ/n) for all v ∈ [n].
Exercise 3.23 (Homogeneous Poisson multi-type branching processes) In analogy with the homogeneous random graph as defined in (3.2.11), we call a Poisson multi-type branching process homogeneous when the expected offspring of a tree vertex of type x equals λ(x) = λ for all x ∈ S. Consider a homogeneous Poisson multi-type branching process with parameter λ. Show that the function φ(x) = 1 is an eigenvector of T_κ with eigenvalue λ. Conclude that (Z_j/λ^j)_{j≥0} is a martingale, where (Z_j)_{j≥0} denotes the number of individuals in the jth generation, irrespective of the starting distribution.
Exercise 3.24 (Proof of no-overlap property in (3.5.19)) Prove that P(B̄_r^{(Gn;Q)}(o_1) = (t, q), o_2 ∈ B_{2r}^{(Gn;Q)}(o_1)) → 0, and conclude that (3.5.19) holds.
Exercise 3.25 (Unimodular mixed-Poisson branching process) Recall the definition of a unimodular
branching process in Definition 1.26. Prove that the mixed-Poisson branching process described in (3.5.29)
and (3.5.30) is indeed unimodular.
Exercise 3.26 (Branching process domination of Erdős–Rényi random graph) Show that Exercise 3.22 together with Proposition 3.16 imply that |C(o)| ⪯ T*, where T* is the total progeny of a Poisson branching process with mean −n log(1 − λ/n) offspring.
Exercise 3.27 (Local convergence of ERn (λ/n)) Use Theorem 3.18 to show that ERn (λ/n) converges
locally in probability to the Poisson branching process with parameter λ.
Exercise 3.28 (Coupling to a multi-type Poisson branching process) Prove the stochastic relation be-
tween multi-type Poisson branching processes and neighborhoods in Norros–Reittu inhomogeneous random
graphs in Proposition 3.17 by adapting the proof of Proposition 3.16.
Exercise 3.29 (Phase transition for t = 2) Let ζ_κ^{(1)} and ζ_κ^{(2)} denote the survival probabilities of an irreducible multi-type branching process with two types starting from vertices of types 1 and 2, respectively. Give necessary and sufficient conditions for ζ_κ^{(i)} > 0 to hold for i ∈ {1, 2}.
Exercise 3.30 (The size of small components in the finite-type case) Prove that, in the finite-types case,
when (κn ) converges to a limiting kernel κ, then supx,y,n κn (x, y) < ∞ holds, so that the results of
Theorem 3.21 apply in the sub- and supercritical cases.
Exercise 3.31 (Law of large numbers for |C_max| for ER_n(λ/n)) Prove that, for the Erdős–Rényi random graph, Theorem 3.19 implies that |C_max|/n →ᴾ ζ_λ, where ζ_λ is the survival probability of a Poisson branching process with mean-λ offspring.
Exercise 3.32 (Connectivity of uniformly chosen vertices) Suppose we draw two vertices independently and uar from [n] in IRG_n(κ_n). Prove that Theorem 3.20 implies that the probability that the vertices are connected converges to ζ².
Exercise 3.33 (The size of small components for CLn (w)) Use Theorem 3.21 to prove that, for CLn (w)
with weights given by (1.3.15) and 1 < ν < ∞, the second largest cluster has size |C(2) | = OP (log n)
when W has bounded support or is almost surely bounded below by ε > 0 with E[W ] < ∞. Further,
|Cmax | = OP (log n) when W has bounded support and ν < 1. Here W is a random variable with
distribution function F .
Exercise 3.34 (Average degree in two populations) Show that the average degree is close to p m_1 + (1 − p)m_2 in the setting of Example 3.1 with n_1 vertices of type 1 satisfying n_1/n → p.
Exercise 3.35 (Phase transition for two populations) Show that ζ > 0 precisely when [p m_1² + (1 − p)m_2²]/[p m_1 + (1 − p)m_2] > 1 in the setting of Example 3.1 with n_1 vertices of type 1 satisfying n_1/n → p.
Exercise 3.36 (Phase transition for two populations (cont.)) In the setting of Exercise 3.35, find an example
of p, m1 , m2 where the average degree is less than 1, yet there exists a giant component.
Exercise 3.37 (Degree sequence of giant component for rank 1) Consider GRG_n(w) as in Theorem 3.20. Show that the proportion of vertices of C_max having degree ℓ is close to p_ℓ(1 − ξ^ℓ)/ζ.
Exercise 3.38 (Degree sequence of complement of giant component) Consider GRG_n(w) as in Theorem 3.20. Show that when ξ < 1, the proportion of vertices outside the giant component C_max having degree ℓ is close to p_ℓ ξ^ℓ/(1 − ζ). Conclude that the degree sequence of the complement of the giant component never satisfies a power law. Can you give an intuitive explanation for this?
Exercise 3.39 (Finiteness of ν(p)) Prove that ν(p) in (3.6.17) satisfies ν(p) < ∞ for every p ∈ (0, 1].
Exercise 3.40 (Phase transition of uniformly grown random graphs) Recall the uniformly grown random graph in (3.2.12). Look up the proof that ‖T_κ‖ = 4λ in (Bollobás et al., 2007, Section 16.1).
Exercise 3.41 (Phase transition of sum kernels) Recall the inhomogeneous random graph with sum kernel
in (3.2.13). When does it have a giant?
Exercise 3.42 (Connectivity probability of sparse ER_n(λ/n)) Use Theorem 3.23 to prove that

    lim_{n→∞} (1/n) log P(ER_n(λ/n) connected) = log(1 − e^{−λ}),

as in (3.7.9).
CHAPTER 4
CONNECTED COMPONENTS IN CONFIGURATION MODELS
Abstract
In this chapter we investigate the local limit of the configuration model, identify
when it has a giant component, and find its size and degree structure. We give
two proofs, one based on a “the giant is almost local” argument, and the other
based on a continuous-time exploration of the connected components in the
configuration model. Further results include its connectivity transition.
In this chapter we study the connectivity structure of the configuration model. We focus
on the local connectivity, by investigating its local limit, as well as the global connectivity,
by identifying its giant component and connectivity transition. In inhomogeneous random
graphs there always is a positive proportion of vertices that are isolated (recall Exercise
3.13). In many real-world examples, we observe the presence of a giant component (recall
Table 3.1). In many of these examples the giant is almost the whole graph and sometimes,
by definition, it is the whole graph. For example, the Internet needs to be connected in such a
way as to allow e-mail messages to be sent between any pair of vertices. In many other real-
world examples, though, it is not at all obvious whether, or why, the network is connected.
See Figure 4.1 (which is the same as Figure 3.1), and observe that there are quite a few connected networks in the KONECT database.
Table 4.1 invites us to think about what makes networks (close to) fully connected. We
[Table 4.1: relative size of the largest connected component (LCC) in six real-world networks.]
Table 4.1 The rows in the above table represent the following six real-world networks:
In the California road network, vertices represent intersections or endpoints of roads.
In the Facebook network, vertices represent the users and the edges Facebook friendships.
Hyves was a Dutch social media platform. Vertices represent users, and edges friendships.
The arXiv astro-physics network represents authors of papers within the astro-physics section of
arXiv, where an edge between authors represents that they have co-authored a paper.
In the high-voltage power network in western USA, the vertices represent transformer substations
and generators, and the edges transmission cables.
In the jazz-musicians data set, vertices represent musicians and connections indicate past
collaborations.
investigate this question here in the context of the configuration model. The advantage of the
configuration model is its high flexibility in degree structure, so that all degrees can have at
least a certain minimal value. We will see that this can give rise to connected random graphs
that at the same time remain sparse, as is the case in many real-world networks.
We start by investigating the locally tree-like nature of the configuration model. Recall the
unimodular branching-process tree from Definition 1.26. Our main result is as follows:
Theorem 4.1 (Locally tree-like nature of the configuration model) Assume that Con-
ditions 1.7(a),(b) hold. Then CMn (d) converges locally in probability to the unimodu-
lar branching-process tree (G, o) ∼ µ with root offspring distribution (pk )k≥0 given by
pk = P(D = k).
Before starting the proof of Theorem 4.1, let us informally explain the above connection
between local neighborhoods and branching processes. We note that the asymptotic off-
spring distribution at the root is equal to (pk )k≥0 , where pk = P(D = k) is the asymptotic
degree distribution. Indeed, fix Gn = CMn (d). Then, the probability that a random vertex
has degree k is equal to

    p_k^{(Gn)} = P(D_n = k) = n_k/n,    (4.2.1)

where n_k denotes the number of vertices with degree k. By Condition 1.7(a), p_k^{(Gn)} converges to p_k = P(D = k), for every k ≥ 1. This explains the offspring of the root of our branching-process approximation.
[Figure 4.2: degree distribution, size-biased degree distribution, and random-friend degree distribution in the configuration model with n = 100,000 and (a) τ = 2.5; (b) τ = 3.5; log–log plots of P(X > x) against the degrees.]
The offspring distribution of individuals in the first and later generations is given by

    p*_k = (k + 1)p_{k+1}/E[D].    (4.2.2)
We now explain this heuristically, by examining the degree of the vertex to which the first
half-edge incident to the root is paired. By the uniform matching of half-edges, the probabil-
ity that a vertex of degree k is chosen is proportional to k . Ignoring the fact that the root and
one half-edge have already been chosen (which does have a minor effect on the number of
available or free half-edges), the degree of the vertex incident to the chosen half-edge equals
k with probability equal to k p_k^{(Gn)}/E[D_n] (recall (4.2.1)). See Figure 4.2 for an example of
the degree distribution in the configuration model, where we show the degree distribution
itself, the size-biased degree distribution, and the degree distribution of a random neighbor
of a uniform vertex, for two values of τ . As can be guessed from the local limit, the latter
two degree distributions are virtually indistinguishable.
However, one of the half-edges is used in connecting to the root, so that, for a vertex
incident to the root to have k offspring, it needs to connect its half-edge to a vertex of degree
k + 1. Therefore, the probability that the offspring, or “forward degree,” of any of the direct neighbors of the root is k equals

    p*_k^{(Gn)} = (k + 1)p^{(Gn)}_{k+1}/E[D_n].    (4.2.3)

Thus, (p*_k^{(Gn)})_{k≥0} can be interpreted as the forward degree distribution of vertices in the cluster exploration. When Conditions 1.7(a),(b) hold, we also have p*_k^{(Gn)} → p*_k, where (p*_k)_{k≥0} is defined in (4.2.2). As a result, we often refer to (p*_k)_{k≥0} as the asymptotic forward degree distribution.
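The size-biasing in (4.2.2) and (4.2.3) is easy to check by simulation; the sketch below (with a hypothetical degree sequence, ignoring pairing constraints exactly as in the heuristic above) records the forward degree of the owner of a uniform half-edge and compares it with (k + 1)p_{k+1}/E[D]:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 200_000
    d = rng.integers(1, 6, size=n)       # hypothetical degree sequence with degrees 1..5
    if d.sum() % 2:                      # total degree must be even to pair half-edges
        d[0] += 1

    # the vertex a uniform half-edge belongs to has the size-biased degree distribution
    half_edge_owner = np.repeat(np.arange(n), d)
    neighbor = rng.choice(half_edge_owner, size=100_000)
    forward = d[neighbor] - 1            # one half-edge is used by the connecting edge

    for k in range(6):
        empirical = float((forward == k).mean())
        theory = float((d == k + 1).mean()) * (k + 1) / float(d.mean())  # (k+1) p_{k+1} / E[D]
        print(k, round(empirical, 4), round(theory, 4))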
The above heuristic argues that any direct neighbor of the root has a number of for-
ward neighbors with asymptotic law (p?k )k≥0 . However, every time we pair two half-edges,
the number of free or available half-edges decreases by 2. Similarly to the depletion-of-
points effect in the exploration of connected components in the Erdős–Rényi random graph
ERn (λ/n), the configuration model CMn (d) suffers from a depletion-of-points-and-half-
edges effect. Thus, by iteratively connecting half-edges in a breadth-first way, the offspring
distribution changes along the way, which potentially gives trouble.
Luckily, the number of available half-edges is initially ℓ_n − 1, which is very large when Conditions 1.7(a),(b) hold, since then ℓ_n/n = E[D_n] → E[D] > 0. Thus, we can pair many half-edges before we start noticing that their number decreases. As a result, the degrees of different vertices in the exploration process are close to being iid, leading to a branching-process approximation of neighborhoods in the configuration model. In order to prove Theorem 4.1, we need to pair only a bounded number of edges, but our approximation extends significantly beyond this.
In order to start the proof of Theorem 4.1 based on (2.4.11), we introduce some notation.
First, we let B̄r(Gn ) (v) denote the ordered version of Br(Gn ) (v), obtained by ordering the half-
edges randomly and performing a breadth-first exploration from the smallest to the largest
labeled half-edge. We again write B̄r(Gn ) (v) = t to denote that this ordered neighborhood is
equal to the ordered tree t.
Fix a rooted ordered tree t with r generations, and let

    N_{n,r}(t) = Σ_{v∈[n]} 1{B̄_r^{(Gn)}(v) = t}    (4.2.4)
denote the number of vertices in Gn = CMn (d) whose ordered local neighborhood up to
generation r equals t. By Theorem 2.15, to prove Theorem 4.1, we need to show that

    N_{n,r}(t)/n →ᴾ µ(B̄_r^{(G)}(o) = t),    (4.2.5)

where (G, o) ∼ µ denotes the unimodular branching process with root offspring distribution (p_k)_{k≥1}. Here, we also rely on Theorem 2.8 to see that it suffices to prove (4.2.5) for trees, since the unimodular branching-process tree is a tree with probability 1.
To prove (4.2.5), and as we have done before, we use a second-moment method. We start by proving that the first moment satisfies E[N_{n,r}(t)]/n → µ(B̄_r^{(G)}(o) ≃ t), after which we prove that Var(N_{n,r}(t)) = o(n²). Then (4.2.5) follows from the Chebyshev inequality [V1, Theorem 2.18].
the root ∅ and its neighbors. We explore it in breadth-first order as in Definition 1.25 in
Section 1.5.
Clearly, by Conditions 1.7(a),(b), we have D_n →ᵈ D and D*_n →ᵈ D*, which implies that BP_n(t) →ᵈ BP(t) for every finite t, where BP(t) is the restriction of the unimodular branching process (G, o) with root offspring distribution (p_k)_{k≥1} to its first t individuals (see Exercise 4.1). Note that, for t a fixed rooted tree of at most r generations, B_r^{(G)}(o) ≃ t precisely when BP(t_r) ≃ t, where t_r denotes the number of vertices in the first r − 1 generations in t.
We let (Gn (t))t≥1 denote the graph exploration process from a uniformly chosen vertex
o ∈ [n]. Here Gn (t) is the exploration where we have paired precisely t−1 half-edges, in the
breadth-first manner as described in Definition 1.25, while we also indicate the half-edges
incident to the vertices found. Thus, Gn (1) consists of o ∈ [n] and its Dn = do half-
edges, and every further exploration corresponds to the pairing of a half-edge. In particular,
from (Gn (t))t≥1 , we can retrieve Br(Gn ) (o) for every r ≥ 0, where Gn = CMn (d). The
following lemma proves that we can couple the graph exploration to the branching process in
such a way that (Gn (t))t∈[mn ] is equal to (BPn (t))t∈[mn ] whenever mn → ∞ sufficiently
slowly. In the statement, we write (Ĝ_n(t), B̂P_n(t))_{t≥1} for the coupling of (G_n(t))_{t∈[m_n]} and (BP_n(t))_{t∈[m_n]}:
Lemma 4.2 (Coupling graph exploration and branching process) Subject to Conditions 1.7(a),(b), there exists a coupling (Ĝ_n(t), B̂P_n(t))_{t≥1} of (G_n(t))_{t≥1} and (BP_n(t))_{t≥1} such that

    P((Ĝ_n(t))_{t∈[m_n]} ≠ (B̂P_n(t))_{t∈[m_n]}) = o(1),    (4.2.6)
y′_m to obtain Ĝ_n(m), and we give all sibling half-edges of y′_m the ghost status (where we recall that the sibling half-edges of a half-edge y are those half-edges unequal to y that are incident to the same vertex as is y).
When the half-edge has the real status, it needs to be paired both in Ĝ_n(m) and B̂P_n(m). To obtain Ĝ_n(m), this half-edge needs to be paired with a uniform “free” half-edge, i.e., one that has not been paired so far. For B̂P_n(m), this restriction does not hold. We now show how these two choices can be conveniently coupled.
For B̂P_n(m), we draw a uniform half-edge y_m from the collection of all half-edges, independently of the past randomness. Let U_m denote the vertex to which y_m is incident. We then let the mth individual in (B̂P_n(t))_{t≥1} have precisely d_{U_m} − 1 children. Note that d_{U_m} − 1 has the same distribution as D*_n − 1 and, by construction, the collection (d_{U_t} − 1)_{t≥1} is iid. This constructs B̂P_n(m), except for the statuses of the sibling half-edges incident to U_m, which we describe below.
For Ĝ_n(m), when y_m is still free, i.e., it has not yet been paired in (Ĝ_n(t))_{t∈[m−1]}, we let x_m be paired with y_m in Ĝ_n(m); we have thus also constructed (Ĝ_n(t), B̂P_n(t))_{t∈[m]}. We give all the other half-edges of U_m the status “real” when U_m has not yet appeared in Ĝ_n(m − 1), and otherwise we give them the ghost status. The latter case implies that a cycle appears in (Ĝ_n(t))_{t∈[m]}. By construction, such a cycle does not occur in (B̂P_n(t))_{t∈[m]}, where reused vertices are simply repeated several times.
A difference in the coupling arises when y_m has already been paired in (Ĝ_n(t))_{t∈[m−1]}, in which case we give all the sibling half-edges of y_m the ghost status. For Ĝ_n(m), we draw a uniform unpaired half-edge y′_m and pair x_m with y′_m instead, to obtain Ĝ_n(m), and we give all the sibling half-edges of y′_m the ghost status. Clearly, this might give rise to a difference between Ĝ_n(m) and B̂P_n(m).
We continue the above exploration algorithm until it terminates at some time T_n. Since each step pairs exactly one half-edge, we have that T_n = |E(C(o))|, so that the algorithm takes T_n ≤ ℓ_n/2 steps. The final result is then (Ĝ_n(t), B̂P_n(t))_{t∈[T_n]}. At this moment, however, the branching-process tree (B̂P_n(t))_{t≥1} has not been fully explored, since the tree vertices corresponding to ghost half-edges in (B̂P_n(t))_{t≥1} have not been explored. We complete the tree exploration (B̂P_n(t))_{t≥1} by iid drawing of children of all the ghost tree vertices until the full tree is obtained.
We emphasize that the law of (B̂P_n(t))_{t≥1} obtained above is not the same as that of (BP_n(t))_{t≥1}, since the order in which half-edges are paired is chosen in such a way that (Ĝ_n(t))_{t∈[T_n]} has the same law as the graph exploration process (G_n(t))_{t∈[T_n]}. However, with σ_n the first time that a ghost half-edge is paired, we have that (B̂P_n(t))_{t∈[σ_n]} does have
Half-edge reuse. In the above coupling, a half-edge reuse occurs when y_m has already been paired and is being reused in the branching process. As a result, for (Ĝ_n(t))_{t∈[m]}, we need to redraw y_m to obtain y′_m, which is used instead in (Ĝ_n(t))_{t∈[m]};
Vertex reuse. A vertex reuse occurs when U_m = U_{m′} for some m′ < m. In the above coupling, this means that y_m is a half-edge that has not yet been paired in (Ĝ_n(t))_{t∈[m−1]}, but it is incident to a half-edge that has already been paired in (Ĝ_n(t))_{t∈[m−1]}. In particular, the vertex U_m to which it is incident has already appeared in (Ĝ_n(t))_{t∈[m−1]}, and it is being reused in the branching process. In this case, a copy of U_m appears in (B̂P_n(t))_{t∈[m]}, while a cycle appears in (Ĝ_n(t))_{t∈[m]}.
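Both error events are rare when m_n = o(√n); the sketch below (with a hypothetical degree sequence, drawing the branching-process half-edges y_m uniformly as in the coupling) estimates the expected number of vertex reuses by simulation and compares it with the union bound computed next:

    import numpy as np

    rng = np.random.default_rng(7)
    n = 100_000
    d = rng.integers(1, 6, size=n)               # hypothetical degree sequence
    owner = np.repeat(np.arange(n), d)           # owner[h] = vertex incident to half-edge h
    ell_n = len(owner)
    m_n = int(np.sqrt(n) / 2)                    # number of draws, m_n = o(sqrt(n))

    def vertex_reuses():
        ys = rng.integers(ell_n, size=m_n)       # y_m uniform on all half-edges, iid
        vs = owner[ys]                           # U_m = vertex incident to y_m
        return m_n - len(set(vs.tolist()))       # draws hitting an already-seen vertex

    trials = [vertex_reuses() for _ in range(200)]
    bound = (m_n * (m_n - 1) / 2) * float(np.sum((d / ell_n) ** 2))  # sum over v of C(m_n,2) d_v^2/ell_n^2
    print(np.mean(trials), bound)

Both numbers are small and of the same order, in line with the o(1) bounds derived below.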
Half-Edge Reuse
At time m − 1, precisely 2m − 1 half-edges are forbidden for use by (Ĝ_n(t))_{t∈[m]}. The probability that the half-edge y_m equals one of these half-edges is

    (2m − 1)/ℓ_n.    (4.2.7)

Hence the expected number of half-edge reuses before time m_n is

    Σ_{m=1}^{m_n} (2m − 1)/ℓ_n = m_n²/ℓ_n = o(1),    (4.2.8)

when m_n = o(√n). The Markov inequality ([V1, Theorem 2.17]) shows that the probability that a half-edge reuse occurs is also o(1) when m_n = o(√n).
Vertex Reuse
The probability that vertex v is chosen in the mth draw of (B̂P_n(t))_{t≥1} is equal to d_v/ℓ_n. The probability that vertex v is drawn twice before time m_n is therefore at most

    (m_n(m_n − 1)/2) d_v²/ℓ_n².    (4.2.9)

The expected number of vertex reuses up to time m_n is thus at most
We next study the second moment of N_{n,r}(t) and show that it is almost its first moment squared:
Lemma 4.4 (Concentration of the number of trees) Subject to Conditions 1.7(a),(b),

    E[N_{n,r}(t)²]/n² → µ(B̄_r^{(G)}(o) = t)².    (4.2.13)

Consequently, N_{n,r}(t)/n →ᴾ µ(B̄_r^{(G)}(o) = t).
Proof Let o_1, o_2 ∈ [n] be two vertices chosen uar from [n], independently. We start by computing

    E[N_{n,r}(t)²]/n² = P(B̄_r^{(Gn)}(o_1) = B̄_r^{(Gn)}(o_2) = t).    (4.2.14)

Recall that |B_r^{(Gn)}(o_1)| →ᵈ |B_r^{(G)}(o)|, where (G, o) ∼ µ denotes the local weak limit of CM_n(d) derived above. Since |B_r^{(G)}(o)| is a tight random variable, o_2 ∉ B_{2r}^{(Gn)}(o_1) whp (recall Corollary 2.20), so that also

    E[N_{n,r}(t)²]/n² = P(B̄_r^{(Gn)}(o_1) = B̄_r^{(Gn)}(o_2) = t, o_2 ∉ B_{2r}^{(Gn)}(o_1)) + o(1).    (4.2.15)
We now condition on B̄_r^{(Gn)}(o_1) = t, and write

    P(B̄_r^{(Gn)}(o_1) = B̄_r^{(Gn)}(o_2) = t, o_2 ∉ B_{2r}^{(Gn)}(o_1))
      = P(B̄_r^{(Gn)}(o_2) = t | B̄_r^{(Gn)}(o_1) = t, o_2 ∉ B_{2r}^{(Gn)}(o_1))
      × P(B̄_r^{(Gn)}(o_1) = t, o_2 ∉ B_{2r}^{(Gn)}(o_1)).    (4.2.16)
We already know that P(B̄_r^{(Gn)}(o_1) = t) → µ(B̄_r^{(G)}(o) = t), so that also
In Exercise 4.4, the reader can prove that (4.2.17) does indeed hold.
Conditional on B̄_r^{(Gn)}(o_1) = t and o_2 ∉ B_{2r}^{(Gn)}(o_1), the probability that B̄_r^{(Gn)}(o_2) = t is the same as the probability that B̄_r^{(Gn′)}(o_2) = t in CM_{n′}(d′), which is obtained by removing all vertices in B_r^{(Gn)}(o_1). Thus, since B̄_r^{(Gn)}(o_1) = t, we have that n′ = n − |V(t)| and d′ is the corresponding degree sequence. The key point is that the degree distribution d′ still satisfies Conditions 1.7(a),(b). Therefore, we also have
so that M_0 = E[N_{n,r}(t)] and M_{ℓ_n/2} = N_{n,r}(t). We use the Azuma–Hoeffding inequality [V1, Theorem 2.27] to obtain the concentration of M_{ℓ_n/2} − M_0. For this, we investigate, for t ∈ [ℓ_n/2],

    M_t − M_{t−1} = E[N_{n,r}(t) | F_t] − E[N_{n,r}(t) | F_{t−1}].    (4.2.20)

In the first term, we reveal one more pairing compared with the second term. We now study the effect of this extra pairing. Let ((x_s, y_s))_{s∈[ℓ_n/2]} be the pairing conditional on F_t, where we let the pairing ((x_s, y_s))_{s∈[ℓ_n/2]} be such that x_s < y_s. We now construct a pairing ((x′_s, y′_s))_{s∈[ℓ_n/2]} that has the correct distribution under F_{t−1}, while ((x_s, y_s))_{s∈[ℓ_n/2]} and ((x′_s, y′_s))_{s∈[ℓ_n/2]} differ by at most two edges almost surely, i.e., by switching one pairing.
For this, we let (x_s, y_s) = (x′_s, y′_s) for s ∈ [t − 1]. Then we let x_t be the half-edge with lowest label that has not been paired yet at time t, and y_t its pair prescribed by F_t. Further, we also let x′_t = x_t, and we let y′_t be a pair of x_t chosen independently of y_t from the set of available half-edges prescribed by F_{t−1}. Then, clearly, ((x_s, y_s))_{s∈[t]} and ((x′_s, y′_s))_{s∈[t]}
have the correct distributions. We complete the proof by describing how the remaining half-edges can be paired in such a way that at most two edges are different in ((x_s, y_s))_{s∈[ℓ_n/2]} and ((x′_s, y′_s))_{s∈[ℓ_n/2]}.
Let (x_a, y_a) be the unique pair of ((x_s, y_s))_{s∈[ℓ_n/2]} such that y′_t ∈ {x_a, y_a}. Then, we pair y_t in ((x′_s, y′_s))_{s∈[ℓ_n/2]} to {x_a, y_a} \ {y′_t}. Thus, in ((x′_s, y′_s))_{s∈[ℓ_n/2]}, y_t is paired with y_a when y′_t = x_a, and y_t is paired with x_a when y′_t = y_a. This means that the pair of edges (x_t, y_t) and (x_a, y_a) in ((x_s, y_s))_{s∈[ℓ_n/2]} is switched to (x_t, y′_t) and either the ordered version of {x_a, y_t} or that of {y_a, y_t} in ((x′_s, y′_s))_{s∈[ℓ_n/2]}. All other pairs in ((x′_s, y′_s))_{s∈[ℓ_n/2]} are the same as in ((x_s, y_s))_{s∈[ℓ_n/2]}. Since (x_t, y′_t) is paired independently of (x_t, y_t), the conditional distribution of ((x′_s, y′_s))_{s∈[ℓ_n/2]} given F_{t−1} is the same as that of ((x_s, y_s))_{s∈[ℓ_n/2]} given F_{t−1}, as required.
Let N′_{n,r}(t) be the number of vertices whose r-neighborhood is isomorphic to t in ((x′_s, y′_s))_{s∈[ℓ_n/2]}. The above coupling gives that

    M_t − M_{t−1} = E[N_{n,r}(t) − N′_{n,r}(t) | F_t].    (4.2.21)
When switching two edges, the number of vertices whose r-neighborhood is isomorphic to t cannot change by more than 4c, where c = Σ_{k=0}^{r} d^k and d is the maximal degree in t. Indeed, the presence of an edge {u, v} in the resulting multigraph G_n affects the event {B_r^{(Gn)}(i) ≃ t} only if there exists a path of length at most r in G_n between i and {u, v}, the maximal degree along which is at most d. For the given choice of {u, v} there are at most 2c such values of i ∈ [n]. Since a switch changes two edges, we obtain that |N′_{n,r}(t) − N_{n,r}(t)| ≤ 4c. Thus, Azuma–Hoeffding [V1, Theorem 2.27] implies that (with the time variable n in [V1, Theorem 2.27] replaced by ℓ_n/2)

    P(|N_{n,r}(t) − E[N_{n,r}(t)]| ≥ nε) = P(|M_{ℓ_n/2} − M_0| ≥ nε) ≤ 2e^{−(nε)²/[16c²ℓ_n]}.    (4.2.22)
Since this vanishes exponentially, and so is summable, we have proved the following corol-
lary:
Corollary 4.5 (Almost sure local convergence) The local convergence in Theorem 4.1 in
fact occurs almost surely.
second proof uses the concentration inequality in (4.2.22). We refer the reader to Section 4.6
for further discussion.
Proof We rely on Theorem 1.13, for which (4.2.23) provides the assumption. We will
compare the neighborhood probabilities in UGn (d) with those in CMn (d), and show that
these are asymptotically equal. We then use Theorem 4.1 to reach the conclusion.
It is convenient to order all half-edges in UG_n(d) randomly. We then write {u →ʲ v in UG_n(d)} for the event that the jth half-edge incident to u connects to v in UG_n(d). For CM_n(d), we also order the half-edges in [ℓ_n] randomly, and we write {u →ʲ v in CM_n(d)} for the event that the jth half-edge incident to u connects to v in CM_n(d).
Fix an ordered tree t and write Gn = UGn (d). Let us start by computing P(B̄r(Gn ) (o1 ) =
t), where we write B̄r(Gn ) (o) for the ordered version of Br(Gn ) (o), in which we order all
half-edges incident to o, as well as the forward half-edges incident to v ∈ Br(Gn ) (o) \ {o},
according to their labels. Here we bear in mind that the half-edge at v connecting v to the
(unique) vertex closer to the root o is not one of the forward edges.
Recall (1.3.45) in Theorem 1.13. Since the degrees in t are bounded, we can make the approximation

    P(u ∼ v_i | EU_{i−1}) = (1 + o(1)) (d_u − i + 1)d_{v_i}/ℓ_n.    (4.2.25)
Taking the product over i ∈ [d_u], we conclude that

    P(∂B_1^{(Gn)}(u) = {v_1, . . . , v_{d_u}}) = (1 + o(1)) d_u! ∏_{i=1}^{d_u} d_{v_i}/ℓ_n.    (4.2.26)
Recalling that B̄_1^{(Gn)}(u) is the ordered version of the 1-neighborhood of u and noting that there are d_u! orderings of the edges incident to u, each of them equally likely, we thus obtain that

    P(∂B̄_1^{(Gn)}(u) = (v_1, . . . , v_{d_u})) = (1 + o(1)) (d_u!/d_u!) ∏_{j=1}^{d_u} d_{v_j}/ℓ_n
      = (1 + o(1)) ∏_{j=1}^{d_u} d_{v_j}/ℓ_n.    (4.2.27)
Alternatively,

    P(u →ʲ v_j ∀j ∈ [d_u] in UG_n(d)) = (1 + o(1)) ∏_{j=1}^{d_u} d_{v_j}/ℓ_n,    (4.2.28)
so that

    P(u →ʲ v_j ∀j ∈ [d_u] in UG_n(d)) = (1 + o(1)) P(u →ʲ v_j ∀j ∈ [d_u] in CM_n(d)).    (4.2.30)

This shows that, conditional on o = u, the neighborhood sets of u in UG_n(d) can be coupled whp to those in CM_n(d).
We continue by investigating the neighborhood set of another vertex v ∈ Br(Gn ) (u). For
this, we note that one edge has already been used to connect v to Br(Gn ) (u), so there are dv −1
edges remaining, which we will call forward edges. Let v be the sth vertex in B̄r(Gn ) (u), and
let Fs−1 denote all the information about the edges and vertices that have been explored
before vertex v. Then we compute

    P(v →ʲ v_j ∀j ∈ [d_v − 1] in UG_n(d) | F_{s−1}) = (1 + o(1)) ∏_{j=1}^{d_v−1} d_{v_j}/ℓ_n,    (4.2.31)
where we use that there are (d_v − 1)! orderings of the d_v − 1 forward edges. For CM_n(d), we instead compute

    P(v →ʲ v_j ∀j ∈ [d_v − 1] in CM_n(d) | F_{s−1}) = ∏_{j=1}^{d_v−1} d_{v_j}/(ℓ_n − 2j − 2s − 1)
      = (1 + o(1)) ∏_{j=1}^{d_v−1} d_{v_j}/ℓ_n,    (4.2.32)
be the total numbers of half-edges incident to high- and low-degree vertices, respectively.
We start by pairing the d≥K half-edges incident to high-degree vertices, and we call their
pairing good when the half-edges incident to a high-degree vertex are all connected to dis-
tinct low-degree vertices. We also call a sub-pairing (i.e., a pairing of a subset of the d≥K
half-edges incident to high-degree vertices) good when the half-edges in it are such that all
half-edges incident to the same vertex are paired with distinct vertices.
Let n_{[k]} denote the number of low-degree vertices. Note that, independently of how earlier half-edges have been paired, the probability that the pairing of a half-edge keeps the sub-pairing good is at least (n_{[k]} − d_{≥K})/ℓ_n ≥ α for some α > 0, when k is such that n_{[k]} ≥ εn and when K is large enough that d_{≥K} ≤ εn/2, which we assume from now on.
Let E_n be the event that the pairing of half-edges incident to high-degree vertices is good. Then, by the above,

    P(E_n) ≥ α^{d_{≥K}}.    (4.2.41)

Now choose K = K(ε) sufficiently large that log(1/α)d_{≥K} ≤ εn/2 for every n. Then we obtain

    P(E_n) ≥ e^{−εn/2}.    (4.2.42)
Having paired the half-edges incident to the high-degree vertices, we pair the remaining half-edges uniformly. Note that CM_n(d) is simple precisely when this pairing produces a simple graph. Since the maximal degree is now bounded by K, the probability of the simplicity of this graph is Θ(1) ≥ e^{−εn/2} for n large (recall (1.3.41)). Thus, we arrive at

    P(CM_n(d) simple) ≥ e^{−εn}.    (4.2.43)

Since ε > 0 is arbitrary, the claim follows, when noting that obviously P(CM_n(d) simple) ≤ 1 = e^{−o(n)}.
Proof of Corollary 4.7. By (4.2.22) and Lemma 4.8,

    P(|N_{n,r}(t) − E[N_{n,r}(t)]| ≥ nε | CM_n(d) simple) ≤ 2e^{−(nε)²/[16c²ℓ_n]} e^{o(n)} = e^{−Θ(n)},    (4.2.44)

which completes the proof, since CM_n(d), conditioned on simplicity, has the same law as UG_n(d) by (1.3.29) and the discussion below it.
In this section we investigate the connected components in the configuration model. Simi-
larly to the Erdős–Rényi random graph, we identify when the configuration model whp has
a giant component. Again, this condition has the interpretation that an underlying branching
process describing the exploration of a cluster has a strictly positive survival probability.
For a graph G, we recall that vk (G) denotes the number of vertices of degree k in G and
|E(G)| the number of edges. The main result concerning the size and structure of the largest
connected components of CMn (d) is the following:
Theorem 4.9 (Phase transition in CM_n(d)) Consider CM_n(d) subject to Conditions 1.7(a),(b). Assume that p_2 = P(D = 2) < 1. Let C_max and C_(2) be the largest and second largest connected components of CM_n(d) (breaking ties arbitrarily).
(a) If ν = E[D(D − 1)]/E[D] > 1, then there exist ξ ∈ [0, 1), ζ ∈ (0, 1] such that

    |C_max|/n →ᴾ ζ,
    v_k(C_max)/n →ᴾ p_k(1 − ξ^k) for every k ≥ 0,
    |E(C_max)|/n →ᴾ ½E[D](1 − ξ²),

while |C_(2)|/n →ᴾ 0 and |E(C_(2))|/n →ᴾ 0.
(b) If ν = E[D(D − 1)]/E[D] ≤ 1, then |C_max|/n →ᴾ 0 and |E(C_max)|/n →ᴾ 0.
Consequently, the same result holds for the uniform random graph with degree sequence d satisfying Conditions 1.7(a),(b), under the extra assumption that Σ_{i∈[n]} d_i² = O(n).
where ξ satisfies

    ξ = Σ_{k≥0} p*_k ξ^k.    (4.3.2)

Finally, an edge consists of two half-edges, and an edge is part of the giant component precisely when this is true for one vertex incident to it, which occurs with probability 1 − ξ². There are in total ℓ_n/2 = nE[D_n]/2 ≈ nE[D]/2 edges, which explains why |E(C_max)|/n →ᴾ ½E[D](1 − ξ²). Therefore, the results in Theorem 4.9 have a simple heuristic explanation.
Our first example is when d_v = 2 for all v ∈ [n], so we are studying a random 2-regular graph. In this case the components are cycles, and the distribution of cycle lengths in CM_n(d) is given by the Ewens sampling formula ESF(½); see, e.g., Arratia et al. (2003). This implies that |C_max|/n converges in distribution to a non-degenerate distribution on [0, 1] (Arratia et al., 2003, Lemma 5.7) and not to any constant as in Theorem 4.9. Moreover, the same is true for |C_(2)|/n (and for |C_(3)|/n, . . .), so in this case there are several large components.
To see this result intuitively, we note that in the exploration of a cluster we start with
one vertex with two half-edges. When pairing a half-edge, it connects to a vertex that again
has two half-edges. Therefore, the number of half-edges to be paired is always equal to 2,
up to the moment when the cycle is closed, and the cluster is completed. When there are
m = αn free half-edges left, the probability of closing up the cycle equals 1/m = 1/(αn),
and, thus, the time this takes is of order n. A slight extension of this reasoning shows that
the time it takes to close a cycle is nTn , where Tn converges to a limiting non-degenerate
random variable (see Exercise 4.5).
Our second example for p2 = 1 is obtained by adding a small number of vertices of
degree 1. More precisely, we let n1 → ∞ be such that n1 /n → 0 and n2 = n − n1 . In this
case, components can either be cycles, or strings of vertices with degree 2 terminated with
two vertices with degree 1. When n1 → ∞, it is more likely that a long string of vertices of
degree 2 will be terminated by a vertex of degree 1 than by closing the cycle, as for the latter
we need to pair to a unique half-edge while for the former we have n1 choices. Therefore,
intuitively this implies that |Cmax | = oP (n) (see Exercise 4.6 for details).
Our third example for p2 = 1 is obtained by instead adding a small number of vertices
of degree 4 (i.e., n4 → ∞ such that n4 /n → 0, and n2 = n − n4 .) We can regard
each vertex of degree 4 as two vertices of degree 2 that have been identified. Therefore,
to obtain CMn (d) with this degree distribution, we can start from a configuration model
having n0 = n + n4 vertices, and uniformly identify n4 pairs of vertices of degree 2. Since
the configuration model with n0 = n+n4 vertices of degree 2 has many components having
size of order n, most of these merge into one giant component upon identification of these
pairs. As a result, |Cmax | = n − oP (n), so there is a giant component containing almost
everything (see Exercise 4.7).
We conclude that the case where p2 = P(D = 2) = 1 is quite sensitive to the precise
properties of the degree structure, which are not captured by the limiting distribution (pk )k≥1
only. In what follows, we thus ignore the case where p2 = 1.
For the latter, we refer to Exercise 4.8. Thus, it suffices to check the assumptions in Theorems
2.28–2.31. The uniform integrability of Dn = d(G on
n)
follows from Conditions 1.7(a),(b).
For the assumptions in Theorem 2.28, the local convergence in probability follows from
Theorem 4.1, so we are left to prove the crucial hypothesis in (2.6.7), which the remainder
of the proof does.
We first prove (2.6.7) under the assumption that dv ≤ b for all v ∈ [n]. At the end
of the proof, we will lift this assumption. To start with our proof of (2.6.7), applied to
Gn = CMn (d) under the condition that dv ≤ b for all v ∈ [n], we first use the al-
ternative formulation from Lemma 2.33, and note that (2.6.39) holds for the unimodular
branching-process tree with root offspring distribution (pk )k≥0 given by pk = P(D = k).
Thus, instead of proving (2.6.7), it suffices to prove (2.6.38), which we do later.
Recall from (2.6.54) that, with o1 , o2 ∈ [n] chosen independently and uar,
1 h i
E # (x, y) ∈ [n] × [n] : |∂B (Gn )
r (x)|, |∂Br
(Gn )
(y)| ≥ r, x ←→
/ y
n2
= P(|∂Br (o1 )|, |∂Br (o2 )| ≥ r, o1 ←→
(Gn ) (Gn )
/ o2 ). (4.3.7)
Thus, (2.6.38) states that
lim lim sup P(|∂Br(Gn ) (o1 )|, |∂Br(Gn ) (o2 )| ≥ r, o1 ←→
/ o2 ) = 0. (4.3.8)
r→∞ n→∞
consist of the individuals in the k th generation of this branching process. Since all degrees
are bounded, |Bk(Gnn ) (o1 )| ≤ (1 + b)mn = Θ(mn ). Let Cn (1) denote the event that this
perfect coupling happens, so that
k n = inf k : |Bk(Gn ) (o2 )| ≥ mn , (4.3.11)
and, again since all degree are bounded, |Bk(Gn ) (o2 )| ≤ (1 + b)mn = Θ(mn ). Further, for
n
δ > 0, we let
Cn (2) = (|∂Bk(Gn ) (o2 )|)k≤kn = (|BP(2)
k |)k≤kn
where Cn (2, 1) and Cn (2, 2) refer to the events in the first and second line of (4.3.12), respec-
tively. Here (BP(2)
k )k≥0 is again
√an n-dependent unimodular branching process independent
of (BP(1) )
k k≥0 . With m n = o( n), we will later pick mn such that mn mn n, to reach
our conclusion. The following lemma shows that also Cn (2) occurs whp:
Lemma 4.10 (Coupling beyond Lemma 4.2) Consider CMn (d) and let m2n /`n → ∞.
Then, for every δ > 0,
Proof The fact that Cn (2, 1) = (|∂Bk(Gn ) (o2 )|)k≤kn = (|BP(2) k |)k≤kn occurs whp fol-
lows in the same way as in (4.3.10). We thus investigate the bounds in Cn (2, 1) only for
k ∈ (k n , k n ].
Define an = (m2n /`n )1+δ where δ > 0. Let mn = (b + 1)mn denote the maximal
number of vertex explorations needed to explore the kn th generation. Recalling the notation
in Lemma 4.2, G b n (mn ) and BP
c n (mn ) denote the half-edges and individuals found up to
b n (mn ) \ BP
the mn th step of the exploration starting from o2 ; let G c n (mn ) \
c n (mn ) and BP
b n (mn ) denote the sets of half-edges that are in one, but not the other, exploration. Then
G
|G
b n (m̄n )| − |BP
c n (m̄n )| ≤ cn #{half-edge reuses up to time m̄n }. (4.3.17)
Thus, by (4.2.8),
m̄2n
E 1Cn (2,1)∩Dn |G
h i
b n (m̄n )| − |BP
c n (m̄n )| ≤ cn . (4.3.18)
+ `n
We continue with the second term in (4.3.15), which is similar. We note that |G
b n (t + 1)| −
|G
b n (t)| can be smaller than |BP
c n (t+1)|−|BP
c n (t)| when a half-edge reuse occurs, or when
a vertex reuse occurs. Thus, again using that the total number of secondary ghosts, together
with the single primary ghost, is at most cn , on Cn (2, 1) ∩ Dn ,
|BP
c n (m̄n )| − |G
b n (m̄n )| ≤ cn #{half-edge and vertex reuses up to time m̄n }. (4.3.19)
We conclude that
1 m̄2 ` δ/2
n
P(Cn (2)c ) ≤ 2cn n = O = o(1), (4.3.21)
an `n m̄2n
when taking cn → ∞ such that cn = o((m̄2n /`n )δ/2 ).
We now define the successful coupling event Cn to be
Cn = Cn (1) ∩ Cn (2), so that P(Cn ) = 1 − o(1). (4.3.22)
Recall that νn = E[Dn (Dn − 1)]/E[Dn ] denotes the expected forward degree of a
uniform half-edge in Gn = CMn (d), which equals the expected offspring of the branching
processes (BP(i)
k )k≥0 . Define
Dn = b(i) (i) (i)
k ≤ |BPr+k | ≤ b̄k ∀i ∈ [2], k ≥ 0 , (4.3.23)
where (b(i) (i) (i)
k )k≥0 and (b̄k )k≥0 satisfy the recursions b0 = b̄(i)
0 = b(i)
0 , while, for some
1
α ∈ ( 2 , 1),
(i) α (i) α
b(i) (i)
k+1 = bk νn − (b̄k ) , b̄(i) (i)
k+1 = b̄k νn + (b̄k ) . (4.3.24)
The following lemma investigates the asymptotics of (b(i) (i)
k )k≤kn −r and (b̄k )k≤kn −r :
(i) k
For the lower bound, we use that b̄(i)
k ≤ Ār b0 νn to obtain
α (i) α αk
b(i) (i)
k+1 ≥ bk νn − Ār (b0 ) νn . (4.3.28)
We use induction to show that
(i) k
b(i)
k ≥ ak b0 νn , (4.3.29)
4.3 The Giant in the Configuration Model 157
where a0 = 1 and
ak+1 = ak − Āαr r1−α νn(α−1)k−1 . (4.3.30)
The initialization follows since b(i) (i)
0 = b0 and a0 = 1. To advance the induction hypothesis,
we substitute it to obtain that
(i) k+1
b(i)
k+1 ≥ ak b0 νn − Āαr (b(i) α αk
0 ) νn
k+1 α−1 (α−1)k−1
ak − Āαr (b(i)
= b(i)
0 νn 0 ) νn
k+1
ak − Āαr rα−1 νn(α−1)k−1 = ak+1 b(i) k+1
≥ b(i)
0 νn 0 νn , (4.3.31)
by (4.3.30). Finally, ak is decreasing, and thus ak & a ≡ 1/Ar , where
Y −1
Ar = 1 − Āαr r−(1−α) νn−(1−α)k <∞
k≥0
where
Dn,k = b(i) (i) (i)
k ≤ |BPr+k | ≤ b̄k ∀i ∈ [2] . (4.3.34)
Note that, when |BP(i) (i) (i) (i)
r+k | > b̄k and |BPr+k−1 | ≤ b̄k−1 ,
α
|BP(i) (i) (i) (i) (i)
r+k | − νn |BPr+k−1 | > b̄k − νn b̄k−1 = (b̄k−1 ) (4.3.35)
while when |BP(i) (i) (i) (i)
r+k | < bk and |BPr+k−1 | ≥ bk−1 ,
α
|BP(i) (i) (i) (i) (i)
r+k | − νn |BPr+k−1 | < bk − νn bk−1 = −(b̄k−1 ) . (4.3.36)
Thus,
c
Dn,k ∩ Dn,k−1 (4.3.37)
α α
(1) (1)
(2) (2)
⊆ |BPr+k | − νn |BPr+k−1 | ≥ (b̄k−1 ) ∪ |BPr+k | − νn |BPr+k−1 | ≥ (b̄(2)
(1)
k−1 ) .
By the Chebychev inequality, conditional on Dn,k−1 ,
P |BP(i) α
(i)
r+k | − νn |BPr+k−1 | ≥ (b̄k−1 ) | Dn,k−1
(i)
(4.3.38)
(i)
Var(|BPr+k | | Dn,k−1 ) σn2 E[|BP(i)
r+k−1 | | Dn,k−1 ]
≤ ≤ ≤ σn2 (b̄(i)
k−1 )
1−2α
,
(b̄(i)
k−1 )
2α (b̄(i)
k−1 )
2α
158 Connected Components in Configuration Models
so that σn2 ≤ b(b − 1)2 is uniformly bounded. Thus, by the union bound for i ∈ {1, 2},
P(Dn,k
c
∩ Dn,k−1 ) ≤ 2σn2 (b̄(i)
k−1 )
1−2α
, (4.3.40)
and we conclude that
X
P(Dnc ) ≤ 2σn2 1−2α 1−2α
(b̄(1)
k−1 ) + (b̄(2)
k−1 ) . (4.3.41)
k≥1
The claim now follows from Lemma 4.11 together with α ∈ ( 12 , 1) and the fact that σn2 ≤
b(b − 1)2 remains uniformly bounded.
When ∂Bk(Gnn ) (o1 ) ∩ ∂Bk(Gn ) (o2 ) 6= ∅, we have o1 ←→ o2 so this does not contribute to
n
(4.3.42).
On the other hand, when ∂Bk(Gnn ) (o1 ) ∩ ∂Bk(Gn ) (o2 ) = ∅, by Lemma 4.11 and when
n
m2n /`n → ∞ sufficiently slowly, |∂Bk(Gnn ) (o1 )| = ΘP (mn ) and |∂Bk(Gnn ) (o2 )| = ΘP (mn ).
The same bounds hold for the number of half-edges Zk(1)n and Zk(2) incident to ∂Bk(Gnn ) (o1 )
n
and ∂Bk(Gn ) (o2 ), respectively, since Zk(1)n ≥ |∂Bk(Gnn+1
)
(o 1 )| and Zk
(2)
≥ |∂Bk(Gnn+1
)
(o2 )|, so
n n
(1) (2)
that also Zkn = ΘP (mn ) and Zk = ΘP (mn ).
n
Conditional on having paired some half-edges incident to ∂Bk(Gnn ) (o1 ), each further such
half-edge has probability at least 1 − Zk(2) /`n of being paired with a half-edge incident to
n
∂Bk(Gnn ) (o2 ), thus creating a path between o1 and o2 . The latter conditional probability is
independent of the pairing of the earlier half-edges. Thus, the probability that ∂Bk(Gnn ) (o1 ) is
not directly connected to ∂Bk(Gn ) (o2 ) is at most
n
Zk(2) Zk(1)n /2
1− n , (4.3.45)
`n
4.3 The Giant in the Configuration Model 159
since at least Zk(1) /2 pairings need to be performed. This probability vanishes when mn mn
n
n. As a result, as n → ∞,
P(|∂Br(Gn ) (o1 )|, |∂Br(Gn ) (o2 )| ≥ r, o1 ←→
/ o2 ; Gn ) = o(1), (4.3.46)
as required. This completes the proof of (2.6.7) for CMn (d) with uniformly bounded de-
grees, and indeed shows that distCMn (d) (o1 , o2 ) ≤ 2r + k n + k n + 1 whp on the event that
|∂Br(Gn ) (o1 )|, |∂Br(Gn ) (o2 )| ≥ r.
Recall the definition of CMn0 (d0 ) in Theorem 1.11 and its proof. This implies that CMn0 (d0 )
has at most (1+ε)n vertices, and that the (at most εn) extra vertices compared with CMn (d)
all have degree 1, while the vertices in [n] have degree d0v ≤ b. Further, with Cmax
0
the largest
0
connected component in CMn0 (d ), by (4.3.47), we have
0
|Cmax | ≤ |Cmax | + εn, (4.3.48)
so that
P(|Cmax | ≥ n(ζ 0 − 2ε)) → 1, (4.3.49)
0
|/n −→ ζ 0 . We denote the limiting parameters of CMn0 (d0 ) by (p0k )k≥1 , ξ 0 and
P
when |Cmax
0
ζ , and note that, for ε as in (4.3.47), when ε & 0, we have
p0k → pk , ξ 0 → ξ, ζ 0 → ζ, (4.3.50)
so that we can take b sufficiently large that, for all ε > 0,
P(|Cmax | ≥ n(ζ − 3ε)) → 1. (4.3.51)
This proves the required lower bound on |Cmax |, while the upper bound follows from Corol-
lary 2.27.
Remark 4.13 (Small-world properties of CMn (d)) We next discuss the consequences of
the above proof to the small-world nature of CMn (d), in a similar way to the proof of
Theorem 2.36. Here we should consider the use of the degree truncation in Theorem 1.11.
Let o1 , o2 ∈ [n] be chosen uar. Recall from Theorem 1.11(c) that distCMn (d) (o1 , o2 ) ≤
distCMn0 (d0 ) (o1 , o2 ). The above “giant is almost local” proof shows that, whp, if n → ∞
followed by r → ∞, then
distCMn0 (d0 ) (o1 , o2 ) ≤ 2r + k n + k n + 1. (4.3.52)
Lemma 4.11 implies the asymptotics k n = (1 + oP (1)) log mn / log νn0 and k n = (1 +
160 Connected Components in Configuration Models
oP (1)) log mn /log νn0 , where νn0 = E[Dn0 0 (Dn0 0 − 1)]/E[Dn0 0 ]. Thus, on the event Gn =
Cn ∩ D n ,
log n
distCMn (d) (o1 , o2 ) ≤ (1 + ε). (4.3.54)
log ν
be the probability generating function for the probability distribution (pk )k≥1 given by pk =
P(D = k). Recall that, for a non-negative random variable D, the random variable D?
denotes its size-biased distribution. Define further, again for s ∈ [0, 1],
?
X
G?D (s) = E[sD ] = p?k sk = G0D (s)/E[D], (4.3.56)
k≥0
Note that G?D (1) = 1, and thus H(0) = H(1) = 0. Note also that
d X
H 0 (1) = E[D] 1 − G?D (1) = E[D] 1 − kp?k
ds k≥0
X
= E[D] − k(k − 1)pk = −E[D(D − 2)]. (4.3.58)
k≥1
for the exploration of the giant component in the configuration model. Regard each edge
as consisting of two half-edges, each half-edge having one endpoint. We label the vertices
as sleeping or awake (i.e., used) and the half-edges as sleeping, active, or dead (already
paired into edges); the sleeping and active half-edges are also called living. We start with all
vertices and half-edges sleeping. Pick a vertex and label its half-edges as active. Then take
any active half-edge, say x, and find its partner y in the graph; label these two half-edges as
dead. Further, if the endpoint of y is sleeping, label it as awake and all other half-edges of
the vertex incident to y as active. Repeat as long as there are active half-edges. When there is
no active half-edge left, we have obtained the first connected component in the graph. Then
start again with another vertex until all components are found.
We apply this algorithm to CMn (d), revealing its edges during the process. Thus we
initially only observe the vertex degrees and the half-edges, not how they are paired with
form edges. Hence, each time we need a partner for a half-edge, this partner is uniformly
distributed over all other living half-edges. It is here that we are using the specific structure
of the configuration model, which simplifies the analysis substantially.
We make the random choices of finding a partner for the half-edges by associating iid
random maximal lifetimes Ex to the half-edge x, where Ex has an Exp(1) distribution. We
interpret these lifetimes as clocks, and changes in our exploration process occur only when
the clock of a half-edge rings. In other words, each half-edge dies spontaneously at rate 1
(unless killed earlier). Each time we need to find a partner for a half-edge x, we then wait
until the next living half-edge unequal to x dies and take that one. This process in continuous
time can be formulated as an algorithm, constructing CMn (d) and exploring its components
simultaneously, as follows. Recall that we start with all vertices and half-edges sleeping. The
exploration is then formalized in the following three steps:
Step 1 When there is no active half-edge (as at the beginning), select a sleeping vertex and
declare it awake and all its half-edges active. For definiteness, we choose the vertex by
choosing a half-edge uar among all sleeping half-edges. When there is no sleeping half-
edge left, the process stops; the remaining sleeping vertices are all isolated and we have
explored all other components.
Step 2 Pick an active half-edge (which one does not matter) and kill it, i.e., change its status to
dead.
Step 3 Wait until the next half-edge dies (spontaneously, as a result of its clock ringing). This
half-edge is paired with the one killed in step Step 2 to form an edge of the graph. If
the vertex incident to it is sleeping, then we change this vertex to awake and all other
half-edges incident to it to active. Repeat from Step 1.
The above randomized algorithm is such that components are created between the succes-
sive times at which Step 1 is performed, where we say that Step 1 is performed when there
is no active half-edge and, as a result, a new vertex is chosen whose connected component
we continue exploring.
The vertices in the component created during one of these intervals between the suc-
cessive times at which Step 1 is performed are the vertices that are awakened during the
interval. Note also that a component is completed and Step 1 is performed exactly when the
4.3 The Giant in the Configuration Model 163
number of active half-edges is 0 and a half-edge dies at a vertex where all other half-edges (if
any) are dead. Below, we investigate the behavior of the key characteristics of the algorithm.
be the number of living half-edges. For definiteness, we define these random functions to be
right-continuous.
P
Let us first look at L(t). We start with `n = i∈[n] di half-edges, all sleeping and thus
living, but we immediately perform Step 1 and Step 2 and kill one of them. Thus, L(0) =
`n − 1. Next, as soon as a living half-edge dies, we perform Step 3 and then (instantly)
either Step 2 or both Step 1 and Step 2. Since Step 1 does not change the number of
living half-edges while Step 2 and Step 3 each decrease it by 1, the total result is that L(t)
is decreased by 2 each time one of the living half-edges dies, except when the last living one
dies and the process terminates. Because of this simple dynamics of t 7→ L(t), we can give
sharp asymptotics of L(t) when n → ∞:
Proof The process t 7→ L(t) satisfies L(0) = `n − 1, and it decreases by 2 at rate L(t).
As a result, it is closely related to a death process. We study such processes in the following
lemma:
Lemma 4.15 (Asymptotics of death processes) Let d, γ > 0 be given and let (N (x) (t))t≥1
be a Markov process such that N (x) (0) = x almost surely, and the dynamics of t 7→
(N (x) (t))t≥1 is such that from position y , it jumps down by d at a rate γy . In other words,
the waiting time until the next event is Exp(1/γy) and each jump is of size d downwards.
Then, for every t0 ≥ 0,
h 2
i
E sup N (x) (t) − e−γdt x ≤ 8d(eγdt0 − 1)x + 8d2 . (4.3.61)
t≤t0
Proof The proof follows by distinguishing several cases. First assume that d = 1 and
that x is an integer. In this case, the process is a standard pure death process taking the
values x, x − 1, x − 2, . . . , 0, describing the number of particles alive when the particles
die independently at rate γ > 0. As is well known, and is easily seen by regarding N (x) (t)
as the sum of x independent copies of the process N (1) (t), the process (eγt N (x) (t))t≥1 , is
a martingale starting in x. Furthermore, for every t ≥ 0, the random variable N (x) (t) has
a Bin(x, e−γt ) distribution, since each particle (of which there are x) has a probability of
dying before time t equal to e−γt , and the different particles die independently.
Application of Doob’s martingale inequality (recall (1.5.4)), now in continuous time,
164 Connected Components in Configuration Models
yields
h 2
i h 2
i h 2 i
E sup N (x) (t) − e−γt x ≤ E sup eγt N (x) (t) − x ≤ 4E eγt N (x) (t0 ) − x
t≤t0 t≤t0
2γt
= 4e Var(N (x) (t0 )) ≤ 4(eγt0 − 1)x. (4.3.62)
This proves the claim when x is integer.
Next, we still assume
that d = 1, but let x > 0 be arbitrary. We can couple the two
processes N (x) (t) t≥1 and N (bxc) (t))t≥1 with different initial values in such a way that
whenever the smaller one jumps by 1, so does the other. This coupling keeps
|N (x) (t) − N (bxc) (t)| < 1 (4.3.63)
for all t ≥ 0, and thus,
sup N (bxc) (t) − e−γt bxc ≤ sup N (x) (t) − e−γt x + 2, (4.3.64)
t≤t0 t≤t0
Finally, for a general d > 0, we observe that N (x) (t)/d is a process of the same type with the
parameters (γ, d, x) replaced by (γd, 1, x/d), and the general result follows from (4.3.65)
and (4.3.62).
The proof of Proposition 4.14 follows from Lemma 4.15 with d = 2, x = `n − 1 =
nE[Dn ] − 1, and γ = 1.
We continue by considering the sleeping half-edges S(t). Let Vk (t) be the number of
sleeping vertices of degree k at time t, so that
∞
X
S(t) = kVk (t). (4.3.66)
k=1
Note that Step 2 does not affect sleeping half-edges, and that Step 3 implies that each
sleeping vertex of degree k is eliminated (i.e., awakened) with intensity k , independently of
what happens to all other vertices. However, some sleeping vertices eliminated by Step 1,
which complicates the dynamics of t 7→ Vk (t).
It is here that the depletion-of-points-and-half-edges effect enters the analysis of the com-
ponent structure of CMn (d). This effect is complicated, but we will see that it is quite
harmless, as can be understood by noting that we apply Step 1 only when we have com-
pleted the exploration of an entire component. Since we are mainly interested in settings
where the giant component is large, we will see that we will not be using Step 1 very often
before having completely explored the giant component. After having completed the explo-
ration of the giant component, we start using Step 1 again quite frequently, but it will turn
out that then it is very unlikely to be exploring any particularly large connected component.
Thus, we can have a setting in mind where the number of applications of Step 1 is quite
small. With this intuition in mind, we first ignore the effect of Step 1 by letting Vek (t) be the
number of vertices of degree k such that all its half-edges x have maximal lifetimes Ex > t
i.e., none of its k half-edges would have died spontaneously up to time t, assuming they all
4.3 The Giant in the Configuration Model 165
escaped Step 1. We conclude that, intuitively, the difference between Vk (t) and Vek (t) can
be expected to be insignificant. We thus start by focussing on the dynamics of (Vek (t))t≥1 ,
ignoring the effect of Step 1, and later correct for this omission.
For a given half-edge, we call the half-edges incident to the same vertex its sibling half-
edges. Further, let
∞
X
S(t)
e = k Vek (t) (4.3.67)
k=1
denote the number of half-edges whose sibling half-edges have all escaped spontaneous
death up to time t. Comparing with (4.3.66), we see that the process S(t)
e ignores the effect
of Step 1 in an identical way to Vek (t).
Recall the functions GD , G?D from (4.3.55) and (4.3.56), and define, for s ∈ [0, 1],
h(s) = sE[D]G?D (s). (4.3.68)
Then, we can identify the asymptotics of (Vek (t))t≥1 in a similar way to that in Proposition
4.14:
Lemma 4.16 (Number of living vertices of degree k ) Subject to Conditions 1.7(a),(b), as
n → ∞ and for any t0 ≥ 0 fixed,
sup |n−1 Vek (t) − pk e−kt | −→ 0
P
(4.3.69)
t≤t0
Proof The statement (4.3.69) again follows from Lemma 4.15, now with γ = k , x = nk
and d = 1. We can replace p(Gk
n)
= nk /n by pk by Condition 1.7(a).
By Condition 1.7(b), the sequence of random variables (Dn )n≥1 is uniformly integrable,
which means that for every ε > 0 there exists K < ∞ such that k>K knk /n =
P
E[Dn 1{D Pn >k} ] < ε for all n. We may further assume (or deduce from Fatou’s inequal-
ity) that k>K kpk < ε and obtain by (4.3.69) that, whp,
∞
X
−1 e − h(e−t )| = sup
sup |n S(t) k(n−1 Vek (t) − pk e−kt )
t≤t0 t≤t0
k=1
K n
k
X X
≤ k sup |n−1 Vek (t) − pk e−kt | + k + pk
k=1
t≤t0
k>K
n
≤ ε + ε + ε,
proving (4.3.71). An almost identical argument yields (4.3.70).
Remarkably, the difference between S(t) and S(t)e is easily estimated. The following
result can be viewed as the key to why this approach works. Indeed, it gives a uniform upper
bound on the difference due to the application of Step 1:
166 Connected Components in Configuration Models
Lemma 4.17 (Effect of Step 1) Let dmax := maxv∈[n] dv be the maximum degree of
CMn (d). Then
0 ≤ S(t)
e − S(t) < sup (S(s)
e − L(s)) + dmax . (4.3.72)
0≤s≤t
(a) If ν = E[D(D − 1)]/E[D] > 1 and p1 > 0, then there is a unique ξ ∈ (0, 1) such
that H(ξ) = 0. Moreover, H(s) < 0 for all s ∈ (0, ξ) and H(s) > 0 for all s ∈ (ξ, 1).
(b) If ν = E[D(D − 1)]/E[D] ≤ 1, then H(s) < 0 for all s ∈ (0, 1).
Proof As remarked earlier, H(0) = H(1) = 0 and H 0 (1) = −E[D(D − 2)]. Fur-
thermore, if we define φ(s) := H(s)/s, then φ(s) = E[D](s − G?D (s)) is a concave
function on (0, 1], and it is strictly concave unless pk = 0 for all k ≥ 3, in which case
H 0 (1) = −E[D(D − 2)] = p1 > 0. Indeed, p1 + p2 = 1 when pk = 0 for all k ≥ 3.
Since we assume that p2 < 1, we thus obtain that p1 > 0 in this case.
In case (b), we thus have that φ is concave and φ0 (1) = H 0 (1) − H(1) ≥ 0, with
either the concavity or the inequality strict, and thus φ0 (s) > 0 for all s ∈ (0, 1), whence
φ(s) < φ(1) = 0 for s ∈ (0, 1).
In case (a), H 0 (1) < 0, and thus H(s) > 0 for s close to 1. Further, when p1 > 0,
H (0) = −h0 (0) = −p1 < 0, and thus H(s) ≤ 0 for s close to 0. Hence, there is at least
0
one ξ ∈ (0, 1) with H(ξ) = 0 and, since H(s)/s is strictly concave and also H(1) = 0,
there is at most one such ξ and the result follows.
Further, by Condition 1.7(b), dmax = o(n), and thus dmax /n → 0. Therefore, (4.3.76)
and (4.3.78) yield
sup n−1 |A(t) − A(t)| = sup n−1 |S(t)
P
e e − S(t)| −→ 0. (4.3.79)
t≤θ t≤θ
168 Connected Components in Configuration Models
Thus, by (4.3.75),
sup |n−1 A(t) − H(e−t )| −→ 0.
P
(4.3.80)
t≤θ
This is the work horse of our argument. By Lemma 4.18, we know that t 7→ H(e−t ) is
positive on (0, − log ξ) when ν > 1. Thus, exploration in the interval (0, − log ξ) will find
the giant component. In particular, we need to show that whp no large connected component
is found before or after this interval (showing that the giant is unique), and we need to
investigate the properties of the giant, in terms of its number of edges, vertices of degree k ,
etc. We now provide these details.
Let 0 < ε < θ/2. Since H(e−t ) > 0 on the compact interval [ε, θ − ε], (4.3.80) implies
that A(t) remains whp positive on [ε, θ − ε], and thus we have not started exploring a new
component in this interval.
On the other hand, again by Lemma 4.18(a), H(e−(θ+ε) ) < 0 and (4.3.75) implies that
−1 e
n A(θ + ε) −→ H(e−(θ+ε) ), while A(θ + ε) ≥ 0. Thus, with ∆ = |H(e−θ−ε )|/2 > 0,
P
whp
e + ε) − S(θ + ε) = A(θ + ε) − A(θ
S(θ e + ε) ≥ −A(θ
e + ε) > n∆, (4.3.81)
Let T1 be the last time that Step 1 was performed before time θ/2. Let T2 be the next time
that Step 1 is performed (by convention, T2 = ∞ if such a time does not exist). We have
shown that, for any ε > 0 and whp, 0 ≤ T1 ≤ ε and θ − ε ≤ T2 ≤ θ + ε. In other words,
P P
T1 −→ 0 and T2 −→ θ. We conclude that we have found one component that is explored
P P
between time T1 −→ 0 and time T2 −→ θ. This is our candidate for the giant component,
and we continue to study its properties, i.e., its size, its number of edges, and its number of
vertices of degree k . These properties are stated separately in the next proposition, so that
we are able to reuse them later on:
Proposition 4.19 (Connected component properties) Let T1? and T2? be two random times
when Step 1 is performed, with T1? ≤ T2? , and assume that T1? −→ t1 and T2? −→ t2
P P
Below, we apply Proposition 4.19 to T1 = oP (1) and T2 = θ + oP (1). We can identify the
values of the above constants for t1 = 0 and t2 = θ as e−t1 = 1, e−t2 = ξ , GD (e−t1 ) = 1,
GD (e−t2 ) = 1 − ζ , h(e−t1 ) = 2E[D], h(e−t2 ) = 2E[D]ξ 2 (see Exercise 4.9).
By Proposition 4.19 and Exercise 4.9, Theorem 4.9(a) follows if we can prove that the
4.3 The Giant in the Configuration Model 169
connected component found between times T1 and T2 is indeed the giant component. This
will be proved after we complete the proof of Proposition 4.19:
Proof The set of vertices C ? contains all vertices awakened in the interval [T1? , T2? ) and
no others, and thus (writing Vk (t−) = lims%t Vk (s))
vk (C ? ) = Vk (T1? −) − Vk (T2? −), k ≥ 1. (4.3.85)
where the latter equality follows since H(1) = 0. Now, (4.3.75) and (4.3.76) imply, in
analogy with (4.3.78) and (4.3.79), that n−1 inf t≤T2? A(t)
P
e −→ 0 and thus also
Hence (4.3.87) implies that supt≤T2? |Vek (t) − Vk (t)| = oP (n) for every k ≥ 1. Conse-
quently, using Lemma 4.16, for j = 1, 2, we have
?
Vk (Tj? −) = Vek (Tj? −) + oP (n) = npk e−kTj + oP (n) = npk e−ktj + oP (n), (4.3.89)
P∞
and (4.3.82) follows by (4.3.85). Similarly, using k=0 (Vek (t) − Vk (t)) ≤ S(t)
e − S(t),
∞
X ∞
X
?
|C | = (Vk (T1? −) − Vk (T2? −)) = (Vek (T1? −) − Vek (T2? )) + oP (n) (4.3.90)
k=1 k=1
? ?
= nGD (e−T1 ) − nGD (e−T2 ) + oP (n),
and
∞
X ∞
X
2|E(C ? )| = k(Vk (T1? −) − Vk (T2? )) = k(Vek (T1? −) − Vek (T2? )) + oP (n)
k=1 k=1
−T1? −T2?
= nh(e ) − nh(e ) + oP (n). (4.3.91)
Thus, (4.3.83) and (4.3.84) follow from the convergence Ti? −→ ti and the continuity of
P
is found.
No late large component. In order to study the probability of finding a component containing
at least η`n edges after Cmax
0
is found, we start by letting T3 be the first time after time T2
e − S(t) increases by at most dmax = o(n) each time
that Step 1 is performed. Since S(t)
Step 1 is performed, we obtain from (4.3.87) that
e − S(t)) ≤ sup (S(t)
sup (S(t) e − S(t)) + dmax = oP (n). (4.3.96)
t≤T3 t≤T2
Comparing this with (4.3.81), for every ε > 0 and whp we have that θ + ε > T3 . Since also
T3 > T2 −→ θ, it follows that T3 −→ θ. If C 0 is the component created between times T2
P P
and T3 , then Proposition 4.19 applied to T2 and T3 yields |C 0 |/n −→ 0 and |E(C 0 )| −→ 0.
P P
chosen at random by Step 1 at time T2 to start the component C 0 would belong to C . If this
occurred, we would clearly have that C = C 0 . Consequently,
P(a component C with |E(C )| ≥ η`n is found after Cmax
0
)
≤ η −1 P(|E(C 0 )| ≥ η`n ) → 0, (4.3.97)
since |E(C 0 )| −→ 0.
P
Completion of the proof of Theorem 4.9(a). Combining (4.3.95) and (4.3.97), we see that
4.3 The Giant in the Configuration Model 171
Further, again whp, |E(C(2) )| < η`n . Consequently, the results for Cmax follow from
P
(4.3.92)–(4.3.94). We have further shown that |E(C(2) )|/`n −→ 0, which implies that
P P
|E(C(2) )|/n −→ 0 and |C(2) |/n −→ 0 because `n = Θ(n) and |C(2) | ≤ |E(C(2) )| + 1.
This completes the proof of Theorem 4.9(a).
Completion of the proof of Theorem 4.9(b). The proof of Theorem 4.9(b) is similar to the
last step in the proof for Theorem 4.9(a). Indeed, let T1 = 0 and let T2 be the next time that
Step 1 is performed, and let T2 = ∞ if this does not occur. Then
sup |A(t) − A(t)|
e = sup |S(t)
e − S(t)| ≤ 2dmax = o(n). (4.3.98)
t≤T2 t≤T2
For every ε > 0, n−1 A(ε) −→ H(e−ε ) < 0 by (4.3.75) and Lemma 4.18(b), while
P
e
P
A(ε) ≥ 0, and it follows from (4.3.98) that whp T2 < ε. Hence, T2 −→ 0. We apply
Proposition 4.19 (which holds in this case too, with θ = 0) and find that if C is the first
P
component found, then |E(C )|/n −→ 0.
Let η > 0. If |E(Cmax )| ≥ η`n , then the probability that the first half-edge chosen by
Step 1 belongs to Cmax , and thus C = Cmax , is 2|E(Cmax )|/(2`n ) ≥ η , and hence,
P(|E(Cmax )| ≥ η`n ) ≤ η −1 P(|E(C )| ≥ η`n ) → 0. (4.3.99)
The results follows since `n = Θ(n) by Condition 1.7(b) and |Cmax | ≤ |E(Cmax )| + 1.
This completes the proof of Theorem 4.9(b), and thus that of Theorem 4.9.
Proof By [V1, Corollary 7.17], and since d = (dv )v∈[n] satisfies Conditions 1.7(a)–(c),
any event En that occurs whp for CMn (d) also occurs whp for UGn (d). By Theorem 4.9,
the event En = { |Cmax |/n − ζ ≤ ε} occurs whp for CMn (d), so it also holds whp for
UGn (d). The proof for the other properties is identical.
Note that it is not obvious how to extend Theorem 4.20 to the case where ν = ∞, which
we discuss now:
Theorem 4.21 (Giant in uniform graph with given degrees for ν = ∞) Consider UGn (d),
where the degrees d satisfy Conditions 1.7(a),(b), and assume that there exists τ ∈ (2, 3)
such that, for every x ≥ 1,
[1 − Fn ](x) ≤ cF x−(τ −1) . (4.3.100)
172 Connected Components in Configuration Models
Then, Theorem 4.9 extends to the uniform simple graph with degree sequence d.
Sketch of proof. We do not present the entire proof, but rather sketch it. We will show that,
for every ε > 0, there exists δ = δ(ε) > 0 such that
and
P(|vk (Cmax ) − pk (1 − ξ k )n| > εn) ≤ e−δn . (4.3.102)
This exponential concentration is quite convenient, as it allows us to extend the result to the
setting of uniform random graphs by conditioning CMn (d) to be simple. Indeed, by Lemma
4.8, it follows that the result also holds for the uniform simple random graph UGn (d) when
Conditions 1.7(a),(b) hold. In Exercise 4.10 below, the reader is invited to fill in the details
of the proof of Theorem 4.21.
We refer to Section 4.5 for a further discussion of (4.3.102). There, we discuss approxi-
mations for P(CMn (d) simple) under conditions such as (4.3.100).
We next prove Theorem 3.19 for rank-1 inhomogeneous random graphs, as already stated
in Theorem 3.20 and as restated here for convenience:
Theorem 4.22 (Phase transition in rank-1 random graphs) Let w satisfy Condition 1.1(a)–
(c). Then the results in Theorem 4.9 also hold for GRGn (w), CLn (w), and NRn (w).
Proof Let dv be the degree of vertex v ∈ [n] in GRGn (w) defined in [V1, (1.3.18)], where
we use a small letter to avoid confusion with Dn , which is the degree of a uniform vertex
in [n]. By [V1, Theorem 7.18], the law of GRGn (w) conditioned on the degrees d and
CMn (d) conditioned on being simple agree (recall also Theorem 1.4). By Theorem 1.9,
(dv )v∈[n] satisfies Conditions 1.7(a)–(c) in probability. Then, by [V1, Theorem 7.18] and
Theorem 4.9, the results in Theorem 4.9 also hold for GRGn (w). By [V1, Theorem 6.20],
the same result applies to CLn (w), and, by [V1, Exercise 6.39], also to NRn (w).
Unfortunately, when ν = ∞ we cannot rely on the fact that, by [V1, Theorem 7.18], the
law of GRGn (w) conditioned on the degrees d and CMn (d) conditioned on being simple
agree. Indeed, when ν = ∞, the probability that CMn (d) is simple vanishes. Therefore,
we instead rely on a truncation argument to extend Theorem 4.22 to the case where ν = ∞.
It is here that the monotonicity of GRGn (w) in terms of the edge probabilities can be used
rather conveniently:
Theorem 4.23 (Phase transition in GRGn (w)) Let w satisfy Conditions 1.1(a),(b). Then,
the results in Theorem 4.9 also hold for GRGn (w), CLn (w), and NRn (w).
P
Proof We prove only that |Cmax |/n −→ ζ ; the other statements in Theorem 4.9 can be
proved in a similar fashion (see Exercises 4.11 and 4.12 below). We prove Theorem 4.23
only for NRn (w), the proofs for GRGn (w) and CLn (w) being similar. The required upper
bound |Cmax |/n ≤ ζ + oP (1) follows by the local convergence in probability in Theorem
3.14 and Corollary 2.27.
4.4 Connectivity of Configuration Models 173
For the lower bound, we bound NRn (w) from below by a random graph with edge prob-
abilities
−(wu ∧K)(wv ∧K)/`n
uv = 1 − e
p(K) . (4.3.103)
Therefore, we also have |Cmax | |Cmax
(K)
|, where Cmax
(K)
is the largest connected component
in the inhomogeneous random graph with edge probabilities (puv (K)
)u,v∈[n] . Let
1 X
wv(K) = (wv ∧ K) (wu ∧ K), (4.3.104)
`n u∈[n]
so that the edge probabilities in (4.3.103) correspond to the Norros–Reittu model with
weights (wv(K) )v∈[n] . It is not hard to see that, when Condition 1.1(a) holds for (wv )v∈[n] ,
Conditions 1.1(a)–(c) hold for (wv(K) )v∈[n] , where the limiting random variable equals W (K) =
(W ∧ K)E[(W ∧ K)]/E[W ]. Therefore, Theorem 4.22 applies to (wv(K) )v∈[n] . We deduce
P
that |Cmax
(K)
|/n −→ ζ (K) , which is the survival probability of the two-stage mixed-Poisson
branching process with mixing variable W (K) . Since ζ (K) → ζ when K → ∞, we conclude
P
that |Cmax |/n −→ ζ .
Assume that P(D = 2) < 1. By Theorem 4.9, we see that |Cmax |/n −→ 1 when P(D ≥
P
2) = 1, as in this case the survival probability ζ of the local limit equals 1. In this section,
we investigate the conditions under which CMn (d) is whp connected, i.e., Cmax = [n] and
|Cmax | = n. Our main result shows that this occurs whp when dmin = minv∈[n] dv ≥ 3:
Theorem 4.24 (Connectivity of CMn (d)) Assume that Conditions 1.7(a),(b) hold. Fur-
ther, assume that dv ≥ 3 for every v ∈ [n]. Then
P(CMn (d) disconnected) = o(1). (4.4.1)
If Condition 1.7(a) holds with p1 = p2 = 0, then ν ≥ 2 > 1 is immediate, so we
are always in the supercritical regime. Also, ζ = 1 when p1 = p2 = 0, since survival
of the unimodular branching-process tree occurs with probability 1. Therefore, Theorem
4.9 implies that the largest connected component has size n(1 + oP (1)) when Conditions
1.7(a),(b) hold. Theorem 4.24 extends this to the statement that CMn (d) is whp connected.
Theorem 4.24 yields an important difference between the generalized random graph and
the configuration model, also from a practical point of view. Indeed, for the generalized
random graph to be whp connected, the degrees must tend to infinity. This has already been
observed for ERn (p) in [V1, Theorem 5.8]. The configuration model can be connected
while the average degree is bounded. Many real-world networks are connected, which makes
the configuration model often more suitable than inhomogeneous random graphs from this
perspective (recall Table 4.1 and Figure 4.1).
Proof The proof is based on a relatively simple counting argument. We recall that a config-
uration denotes a pairing of all the half-edges. We note that the probability of a configuration
equals 1/(`n − 1)!!. On the event that CMn (d) is disconnected, there exists a set of vertices
I ⊆ [n] with |I| ≤ bn/2c such that all half-edges incident to vertices in I are paired only
174 Connected Components in Configuration Models
with half-edges incident to other vertices in I . For I ⊆ [n], we let `n (I) denote the total
degree of I , i.e.,
X
`n (I) = di . (4.4.2)
i∈I
Since dmin ≥ 3, we can use Theorem 4.9 to conclude that most edges are in Cmax , and
I 6= Cmax . Therefore, `n (I) = o(`n ) = o(n), and we may, without loss of generality,
assume that `n (I) ≤ `n /2. We denote by En the event that there exists a collection of
connected components I consisting of |I| ≤ bn/2c vertices for which the sum of degrees
is at most `n (I) ≤ `n /2, so that En occurs whp, i.e.
P(Enc ) = o(1). (4.4.3)
Clearly, in order for the half-edges incident to vertices in I to be paired only to other
half-edges incident to vertices in I , `n (I) needs to be even. The number of configurations
for which this happens is bounded above by
(`n (I) − 1)!!(`n (I c ) − 1)!!. (4.4.4)
As a result,
X (`n (I) − 1)!!(`n (I c ) − 1)!!
P(CMn (d) disconnected; En ) ≤
I⊆[n]
(`n − 1)!!
`n (I)/2
X Y `n (I) − 2j + 1
= , (4.4.5)
I⊆[n] j=1
`n − 2j + 1
where the sum over I ⊆ [n] is restricted to sets I for which 1 ≤ |I| ≤ bn/2c and
`n (I) ≤ `n /2 is even. Exercise 4.13 uses (4.4.5) to bound the probability of the existence
of an isolated vertex (i.e., a vertex with only self-loops).
Define
x
Y 2x − 2j + 1
f (x) = , (4.4.6)
` − 2j + 1
j=1 n
so that
X
P(CMn (d) disconnected; En ) ≤ f (`n (I)/2). (4.4.7)
I⊆[n]
We can rewrite
Qx Qx−1 x−1
(2x − 2j + 1) (2i + 1) Y 2j + 1
f (x) = Qj=1
x = Qx−1i=0 = , (4.4.8)
j=1 (`n − 2j − 1) k=0 (`n − 2k + 1)
` − 2j + 1
j=0 n
where we set j = x−i and j = k+1 in the second equality. Thus, for x ≤ `n /4, x 7→ f (x)
is decreasing because
f (x + 1) 2x + 1
= ≤ 1. (4.4.9)
f (x) `n − 2x + 1
4.4 Connectivity of Configuration Models 175
Since `n (I) ≤ `n /2, we also have that `n (I)/2 ≤ `n /4, so that f (`n (I)/2) ≤ f (a) for
any a ≤ `n (I)/2. Now, since di ≥ 3 for every i ∈ [n] and `n (I) ≤ `n /2 is even,
Define
!
n
hn (m) = f (d3m/2e), (4.4.12)
m
so that
bn/2c
X
P(CMn (d) disconnected; En ) ≤ hn (m). (4.4.13)
m=1
hn (m + 1) n − m f (d3(m + 1)/2e)
= . (4.4.14)
hn (m) m+1 f (d3m/2e)
Note that, for m odd,
f (d3(m + 1)/2e) f ((3m + 1)/2 + 1) 3m + 2
= = . (4.4.15)
f (d3m/2e) f ((3m + 1)/2) `n − 3m
while, for m even,
f (d3(m + 1)/2e) f (3m/2 + 2) 3m + 3 3m + 1
= = . (4.4.16)
f (d3m/2e) f (3m/2) `n − 3m − 1 `n − 3m + 1
Thus, for m odd and using `n ≥ 3n,
hn (m + 1) n − m 3m + 2 3(n − m)
= ≤ ≤ 1, (4.4.17)
hn (m) m + 1 `n − 3m `n − 3m
while, for m even and using `n ≥ 3n,
hn (m + 1) n − m 3m + 3 3m + 1 3m + 1
= ≤ . (4.4.18)
hn (m) m + 1 `n − 3m − 1 `n − 3m + 1 `n − 3m − 1
Thus, we obtain that, for m ≤ n/2 and since `n ≥ 3n, there exists a c > 0 such that
hn (m + 1) c
≤1+ . (4.4.19)
hn (m) n
176 Connected Components in Configuration Models
We then follow the proof of Theorem 4.24, and now define I as the collection of com-
ponents that satisfies `n (I) ≤ `n /2. It should be remarked that in this case we cannot rely
upon Theorem 4.9, which implies (4.4.3). Theorem 4.9 was used to show that `n (I) ≤ `n /2
and |I| ≤ n/2 whp. The fact that `n (I) ≤ `n /2 was used in (4.4.9) to show that x 7→ f (x)
is decreasing for the appropriate x, and this still holds. The fact that |I| ≤ n/2 was used to
restrict the sum over m in (4.4.11) and the formulas that followed it, which we can now no
longer use, and thus we need an alternative argument.
In the current setting, since the degrees are all in {3, 4, 5},
3|I| ≤ `n (I) ≤ `n /2, (4.4.26)
so that m ≤ `n /6. Following the proof of Theorem 4.24 up to (4.4.13), we thus arrive at
b`n /6c
X
P(CMn (d) disconnected) ≤ hn (m). (4.4.27)
m=1
The bound in (4.4.17) remains unchanged since it did not rely on m ≤ n/2, while, for
m ≤ `n /6, (4.4.18) can be bounded as follows:
3m + 1
≤ 1 + O(1/n). (4.4.28)
`n − 3m − 1
As a result, both (4.4.17) and (4.4.18) remain valid, proving that hn (m + 1)/hn (m) ≤
1 + c/n. We conclude that the proof can be completed as for Theorem 4.24.
The above proof is remarkably simple, and requires very little of the precise degree dis-
tribution to be satisfied except for dmin ≥ 3. In what follows, we investigate what happens
when this fails. We first continue by showing that CMn (d) is with positive probability dis-
connected when n1 , the number of vertices of degree 1, satisfies n1 n1/2 :
Proposition 4.26 (Disconnectivity of CMn (d) when n1 n1/2 ) Let Conditions 1.7(a),(b)
hold, and assume that n1 n1/2 . Then
lim P(CMn (d) connected) = 0. (4.4.29)
n→∞
Proof We note that CMn (d) is disconnected when there are two vertices of degree 1 whose
half-edges are paired with each other. When the half-edges of two vertices of degree 1 are
paired with each other, we say that a 2-pair is created. Then, since after i pairings of degree-1
vertices to higher-degree vertices, there are `n − n1 − i + 1 half-edges incident to higher-
degree vertices, out of a total of `n − 2i + 1 unpaired half-edges, we have
n1
Y `n − n 1 − i + 1
P(CMn (d) contains no 2-pair) =
i=1
`n − 2i + 1
n1
Y n1 − i
= 1− . (4.4.30)
i=1
`n − 2i + 1
For each i ≥ 1,
n1 − i n1 − i
1− ≤1− ≤ e−(n1 −i)/`n , (4.4.31)
`n − 2i + 1 `n
178 Connected Components in Configuration Models
so that we arrive at
n1
Y
P(CMn (d) contains no 2-pair) ≤ e−(n1 −i)/`n
i=1
−n1 (n1 −1)/[2`n ]
=e = o(1), (4.4.32)
Proposition 4.27 (Disconnectivity of CMn (d) when p2 > 0) Let Conditions 1.7(a),(b)
hold, and assume that p2 > 0. Then,
By assumption, p2 > 0, so that also λ2 > 0. By investigating the higher factorial moments,
d
and using [V1, Theorem 2.6], it follows that Pn (2) −→ Poi(λ2 ), so that
as required. The proof that [V1, Theorem 2.6] can be applied is Exercise 4.18.
Theorem 4.28 (Almost-connectivity of CMn (d) when p1 = 0) Consider CMn (d) where
the degrees d satisfy Conditions 1.7(a),(b), and assume that p2 ∈ (0, 1). Also assume that
dv ≥ 2 for every v ∈ [n]. Then
d
X
n − |Cmax | −→ kXk , (4.4.37)
k≥2
where (Xk )k≥2 are independent Poisson random variables with parameters λk = λk 2k−1 /k
with λ = p2 /E[D]. Consequently,
P 2 k
P(CMn (d) connected) → e− k≥2 (2λ ) /(2k)
∈ (0, 1). (4.4.38)
4.5 Related Results for Configuration Models 179
Rather than giving the complete proof of Theorem 4.28, we give a sketch of it:
Sketch of proof of Theorem 4.28. Let Pn (k) denote the number of k -cycles consisting of
degree-2 vertices, for k ≥ 2. Obviously, every vertex in such a cycle is not part of the giant
component, so that
X
n − |Cmax | ≥ kPn (k). (4.4.39)
k≥2
d
A multivariate moment method allows one to prove that (Pn (k))k≥2 −→ (Xk )k≥2 , where
(Xk )k≥2 are independent Poisson random variables with parameters (see Exercise 4.19)
In order to complete the argument, two approaches are possible (and have been used in
the literature). First, Federico and van der Hofstad (2017) used counting arguments to show
that as soon as a connected component has at least one vertex v of degree dv ≥ 3, then it
is whp part of the giant component Cmax . This then proves that (4.4.39) is whp an equality.
See also Exercise 4.20.
Alternatively, and more in the style of Łuczak (1992), one can pair up all the half-edges
incident to vertices of degree 2, and then realize that the graph, after pairing of all these
degree-2 vertices, is again a configuration model with a changed degree distribution. The
cycles consisting of only degree-2 vertices will be removed, so that we need only to consider
the contribution of pairing strings of degree-2 vertices to vertices of degrees at least 3. If both
ends of the string are each connected to two distinct vertices of degrees ds , dt at least 3, then
we can imagine this string to correspond to a single vertex of degree ds + dt − 2 ≥ 4, which
is sufficiently large.
Unfortunately, it is also possible that the string of degree-2 vertices is connected to the
same vertex u of degree du ≥ 3, thus possibly reducing the degree by 2. When du ≥ 5,
there are still at least three half-edges remaining at u. Thus, we need only to consider the case
where we create a cycle of vertices of degree 2 with one vertex u in it of degree du = 3 or
du = 4, respectively, which corresponds to vertices of remaining degree 1 or 2, respectively.
In Exercise 4.21, the reader is asked to prove that there is a bounded number of such cycles.
We conclude that it suffices to extend the proof of Theorem 4.24 to the setting where there
is a bounded number of vertices of degrees 1 and 2. The above argument can be repeated
for the degree-2 vertices. We can deal with the degree-1 vertices in a similar way. Pairing
the degree-1 vertices again leads to vertices of remaining degree at least 3 − 1 = 2 after the
pairing, which is fine when the remaining degree is at least 3; otherwise they can be dealt
with in the same way as the other degree-2 vertices. We refrain from giving more details.
In this section we discuss related results on connected components for the configuration
model. We start by discussing the subcritical behavior of the configuration model.
180 Connected Components in Configuration Models
there are many multi-edges between vertices of degree of order dmax in CMn (d), and the
conditioning on being simple thus has a dramatic effect.
As a consequence of Theorem 4.30, we obtain that, subject to its assumptions,
n `
n E[Dn2 ] 3 X o
P(CMn (d) simple) = exp − + + + log(1 + du dv /`n ) + o(1) ;
2 2E[Dn ] 4 1≤u<v≤n
(4.5.3)
recall (4.2.39) for a more general, but weaker, estimate. In Exercise 4.29 the reader can show
that (4.5.3) is indeed e−o(n) under the conditions of Theorem 4.30.
|E(H)|
κ(G) = max (4.5.4)
∅6=H⊆G |V (H)|
be the density of the densest subgraph of G. It is far from obvious that the asymptotics of
κ(Gn ) can be described using local convergence methodologies, but a deep relation exists:
Theorem 4.31 (Densest subgraph of sparse CM) Consider CMn (d) subject to Condition
1.7(a), and assume that P(D = 1) < 1 as well as that there exists θ > 0 such that
P
Then κ(CMn (d)) −→ κ(µ), where µ is the law of the unimodular branching process with
root offspring distribution (pk )k≥1 with pk = P(D = k), and κ(µ) is defined in (4.5.9)
below.
Theorem 4.31 describes the convergence of the edge density of the densest subgraph of
CMn (d) as well as the fact that its limit is a functional of the local limit, as described in
Theorem 4.1. Theorem 4.31 holds under a strong degree assumption, in the sense that the
degree distribution has exponentially small tails. It is unclear whether Theorem 4.31 remains
valid when (4.5.5) fails. We refer to the notes and discussion in Section 4.6 for more details.
Exercise 4.30 shows that Condition 1.7(a) and (4.5.5) imply that Conditions 1.7(b),(c) hold.
Let us now shed some light on the how the proof of Theorem 4.31 can be related to local
convergence. This proof is beautiful, while at the same time also technically demanding.
The proof highlights how this link can be used to define κ(µ), as well as to establish the
convergence of κ(CMn (d)) to it. This connection is through load balancing problems.
Let G = (V (G), E(G)) be a finite, simple, undirected graph. As before, we write E(G) ~
for the set of directed edges, formed by replacing each edge {u, v} ∈ E(G) by the two
directed edges (u, v) and (v, u). An allocation on G is a map θ : E(G)~ → [0, 1] satisfying
θ(u, v) + θ(v, u) = 1 for every {u, v} ∈ E(G). The load induced by θ at a vertex o ∈
182 Connected Components in Configuration Models
V (G) is given by
X
∂θ(o) := θ(o, v). (4.5.6)
v : {v,o}∈E(G)
~
An allocation θ is balanced when, for every (u, v) ∈ E(G), ∂θ(u) < ∂θ(v) implies that
θ(u, v) = 0.
When we are thinking of each edge as carrying a unit amount of load, an allocation needs
to be chosen that distributes load over its endpoints in such a way that the total load is as
balanced as possible across the graph. Thus,
P a balanced allocation optimizes this allocation
problem, in that a balanced θ minimizes v∈V (G) f (∂θ(v)) either over some strictly convex
f : [0, 1] → [0, ∞), or over all convex f : [0, 1] → [0, ∞).
From now on, we let θ denote a balanced allocation. Remarkably, it can be seen that
∂θ(v) measures the local density of G at v ∈ V (G). In particular, in terms of this load
balancing problem,
κ(G) = max ∂θ(v). (4.5.7)
v∈V (G)
We are left with studying the vector (∂θ(v))v∈[n] . Denote the empirical load distribution
by
1
1{∂θ(v)∈A} ,
X
LG (A) = (4.5.8)
|V (G)| v∈V (G)
for every Borel set A ⊆ [0, ∞). When Gn converges locally, one would also expect that
LGn (A) → L(A) for some limiting measure L. This indeed turns out to be true (but is
technically quite challenging). In fact, it turns out that if Gn converges locally to (G, o) ∼ µ
then L = Lµ . In terms of L, we have the characterization that
κ(µ) = sup{t ∈ R : L[t, ∞) > 0}. (4.5.9)
Unfortunately, this is not the end of the story. Indeed, by the above, one would expect that
P
κ(Gn ) −→ κ(µ) if Gn converges locally in probability to (G, o) ∼ µ. This, however, is
far from obvious as the graph parameter κ(G) is too sensitive to be controlled only by local
convergence. Indeed, let Gn converge locally, and add a disjoint clique Kmn to Gn of size
mn = o(n) to obtain G+ +
n . Then, obviously, κ(Gn ) = max{κ(Gn ), (mn − 1)/2}. Thus,
the precise structure of the graph Gn is highly relevant, and CMn (d) under the condition
(4.5.5) turns out to be “nice enough.”
We do not prove Theorem 4.31 but instead indicate how (4.5.5) can be used to show that
κ(CMn (d)) is bounded. This proceeds in four key steps.
In the first step, we investigate the number of edges NS between vertices in a set S ⊆
[n], and show that NSPis stochastically bounded by a binomial random variable with mean
d2S /m, where dS = v∈S dv is the total degree of the set S . This can be seen by pairing
the half-edges one by one, giving priority to the half-edges incident to vertices in S . Let
(Xt )t≥1 denote the Markov chain that describes the number of edges with both endpoints in
S after t pairings. Then, conditioning on (Xs )ts=1 , the probability that Xt+1 = Xt + 1 is
(dS − Xt − t − 1)+ dS − t − 1 d
≤ 1{t≤dS } ≤ S 1{t≤dS } . (4.5.10)
`n − 2t − 1 `n − 2t − 1 `n
4.5 Related Results for Configuration Models 183
using the crude bounds x2r ≤ (2r)!ex for x = dS θ, and (2r)!/r! ≤ (2r)r . As a result, with
Xk,r denoting the number of subgraphs in CMn (d) with k vertices and at least r edges,
X 2r r X Y
E[Xk,r ] ≤ P(NS ≥ r) ≤ 2`
eθds
|S|=k
θ n
|S|=k s∈S
2r r 1 X k 2r r e X k
≤ 2 eθdv ≤ 2 eθdv , (4.5.13)
θ `n k! v∈[n] θ `n k v∈[n]
since k! ≥ (k/e)k . We can rewrite the resulting bound slightly more conveniently. Denote
α = sup E[Dn ], λ = sup E[eθDn ], (4.5.14)
n≥1 n≥1
and pick θ > 0 small enough that λ < ∞. It is here that (4.5.5) is crucially used. Then,
2r r eλn k
E[Xk,r ] ≤ . (4.5.15)
θ2 αn k
In the third step, we first note that, for any set S ⊆ [n] with |S| ≥ nδ , the edge density
of S is at most
dS `n E[Dn ]
≤ = , (4.5.16)
2|S| 2δn 2δ
which remains uniformly bounded. Thus, to show that κ(CMn (d)) is uniformly bounded, it
suffices to analyze sets of size at most δn. For δ ∈ (0, 1) and t > 1, we then let Zδ,t denote
the number of subsets S with |S| ≤ δn and |E(S)| ≥ t|S| in CMn (d). We next show that
there exists a δ > 0 such that, for every t > 1, there exists a χ < ∞ such that
log n t−1
E[Zδ,t ] ≤ χ . (4.5.17)
n
In particular, Zδ,t = 0 whp, so that the density of the densest subgraph is bounded by
(1 + ε)(1 ∧ E[D]/(2δ)). In order to see (4.5.17), we note that
δn
X
E[Zδ,t ] = E[Xk,dkte ]. (4.5.18)
k=1
By (4.5.15),
2dkte dkte k dkte−k
E[Xk,dkte ] ≤ (eλ)k ≤ f (k/n)k , (4.5.19)
θ2 αk n
184 Connected Components in Configuration Models
where we define
2(t + 1) t+1
f (δ) = 1 ∨ (eλ)δ t−1 . (4.5.20)
θ2 α
We choose δ ∈ (0, 1) small enough that f (δ) < 1. Note that δ 7→ f (δ) is increasing, so
that, for every 1 ≤ m ≤ δn,
m
X δn
X
E[Zδ,t ] = f (m/n)k + f (δ)k
k=1 k=m+1
f (m/n) f (δ)m
≤ + . (4.5.21)
1 − f (m/n) 1 − f (δ)
Finally, choose m = c log n with c fixed. Then f (m/n) is of order (log n/n)t−1 , while
f (δ)m (log n/n)t−1 when c is large enough. This proves (4.5.17).
The fourth step concludes the proof. The bound in (4.5.17) shows that κ(CMn (d)) re-
mains uniformly bounded. Further, it also shows that either there is a set of size at least δn
whose density is at least t, or Zδ,t > 0. The latter occurs with vanishing probability for the
appropriate δ > 0, so that whp there is a set of size at least δn whose density is at least t.
The fact that such high-density sets must be large is crucial to go from the convergence of
LGn to Lµ (which follows from local convergence) to that of κ(CMn (d)) to κ(µ) (which,
as we have seen, generally does not follow from local convergence). Indeed, local conver-
gence has direct implications on the local properties of only a positive proportion of vertices,
so problems might arise in this convergence should the maximum in (4.5.7) be carried by a
vanishing proportion of vertices.
with infinite variance degree is not exponentially small (recall Lemma 4.8), this implies Theorem 4.21. In
their statement of the main result implying Theorem 4.21, Bollobás and Riordan (2015) used a condition
slightly different from (4.3.100), namely, that there exists a p > 1 such that
E[Dnp ] → E[Dp ] < ∞. (4.6.1)
It is straightforward to show that (4.6.1) for some p > 1 holds precisely when (4.3.100) holds for some
τ > 2. See Exercise 4.31.
The sharpest results for n2 = n(1 − o(1)) are in Federico (2023), to which we refer for details. There,
Federico proved the results in Exercises 4.6 and 4.7 and derived the exact asymptotics of n − |Cmax |.
Barbour and Röllin (2019) proved a central limit theorem for the giant in Theorem 4.9, where the asymp-
totics of the variance already had already been identified by Ball and Neal (2017). Janson (2020a) (see also
Janson (2020b)) lifted the simplicity condition under Condition 1.7(a)–(c) using switchings, so that the
results extend to uniform random graphs with prescribed degrees, as conjectured in Barbour and Röllin
(2019). Janson and Luczak (2008) proved related central limit theorems for the k-core in the configuration
model.
Exercise 4.4 (Proof of no-overlap property in (4.2.17)) Subject to the conditions in Theorem 4.1, prove
that P(Br(Gn ) (o1 ) ' t, o2 ∈ B2r
(Gn )
(o1 )) → 0, and conclude that the no-overlap property in (4.2.17)
holds.
Exercise 4.5 (Component size of vertex 1 in a 2-regular graph) Consider CMn (d) where all degrees are
equal to 2, i.e., n2 = n. Let C (1) denote the size of the connected component of vertex 1. Show that
d
|C (1)|/n −→ T, (4.7.1)
√
where P(T ≤ x) = 1 − 1 − x.
Exercise 4.6 (Component size in a 2-regular graph with some degree-1 vertices) Consider CMn (d) with
n1 → ∞ with n1 /n → 0, and n2 = n − n1 . Let C (1) denote the size of the connected component of
vertex 1. Show that
P
|C (1)|/n −→ 0. (4.7.2)
Exercise 4.7 (Component size in a 2-regular graph with some degree-4 vertices) Consider CMn (d) with
n4 → ∞ with n4 /n → 0, and n2 = n − n4 . Let C (1) denote the the size of the connected component of
vertex 1. Show that
P
|C (1)|/n −→ 1. (4.7.3)
Exercise 4.8 (Expected degree giant in CMn (d)) Prove that Eµ do 1{|C (o)|=∞} = E[D](1 − ξ 2 )
h i
as claimed in (4.3.6), where µ is the law of the unimodular branching-process tree with root offspring
distribution (pk )k≥0 given by pk = P(D = k).
Exercise 4.9 (Limiting constants in Theorem 4.9) Recall the constants t1 = 0 and t2 = θ = − log ξ,
where ξ is the zero of H given by Lemma 4.18(a). Prove that for t1 = 0 and t2 = θ, e−t1 = 1, e−t2 = ξ,
GD (e−t1 ) = 1, GD (e−t2 ) = 1 − ζ, h(e−t1 ) = 2E[D], and h(e−t2 ) = 2E[D]ξ 2 , where, for θ = ∞, e−t2
should be interpreted as 0.
Exercise 4.10 (Giant in UGn (d) for ν = ∞) Combine (4.2.39) and (4.3.101)–(4.3.102) to complete the
proof of the identification of the giant in UGn (d) for ν = ∞ in Theorem 4.21.
Exercise 4.11 (Number of degree-k vertices in giant NRn (w)) Let w satisfy Conditions 1.1(a),(b). Adapt
the proof of |Cmax |/n −→ ζ in Theorem 4.23 to show that vk (Cmax )/n −→ pk (1 − ξ k ) for NRn (w).
P P
Exercise 4.12 (Number of edges in giant NRn (w)) Let w satisfy Conditions 1.1(a),(b). Use Exercise
4.11 to show that |E(Cmax )|/n −→ 12 E[W ](1 − ξ 2 ).
P
Exercise 4.13 (Isolated vertex in CMn (d)) Use (4.4.5) to show that, when dv ≥ 3 for all v ∈ [n],
3n
P(∃ isolated vertex in CMn (d)) ≤ . (4.7.4)
(2`n − 1)(2`n − 3)
Exercise 4.14 (Isolated vertex (Cont.)) Use (4.4.11) to reprove Exercise 4.13. Hence, the bound in (4.4.11)
is quite sharp.
Exercise 4.15 (Connected component of size 2) Use (4.4.11) to prove that, when dv ≥ 3 for all v ∈ [n],
15n(n − 1)
P(∃ component of size 2 in CMn (d)) ≤ . (4.7.5)
(2`n − 1)(2`n − 3)(2`n − 5)
Exercise 4.16 (Lower bound on probability CMn (d) disconnected) Show that
P(CMn (d) disconnected) ≥ c/n
for some c > 0 when P(D = 3) > 0 and E[D] < ∞.
Exercise 4.17 (Lower bound on probability CMn (d) disconnected) Show that
P(CMn (d) disconnected) ≥ c/n
for some c > 0 when P(D = 4) > 0 and E[D] < ∞.
Exercise 4.18 (Factorial moments of P_n(2)) Consider CM_n(d) subject to Conditions 1.7(a),(b), and assume that p_2 > 0. Let P_n(2) denote the number of 2-cycles consisting of two vertices of degree 2. Prove that, for every k ≥ 1 and with λ_2 = p_2^2/E[D]^2,
E[(P_n(2))_k] → λ_2^k, (4.7.6)
where we recall that (x)_k = x(x − 1) · · · (x − k + 1). Conclude that P_n(2) \xrightarrow{d} Poi(λ_2).
Exercise 4.19 (Cycles in CM_n(d)) Let P_n(k) denote the number of k-cycles consisting of degree-2 vertices, for k ≥ 2. Let λ = p_2/E[D]. Use the multivariate moment method to prove that (P_n(k))_{k≥2} \xrightarrow{d} (X_k)_{k≥2}, where (X_k)_{k≥2} are independent Poisson random variables with parameters λ_k = (2λ)^k/(2k), consistently with Exercise 4.18 for k = 2.
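The Poisson limit in Exercises 4.18–4.19 is easy to probe numerically. The following minimal sketch (not from the text; it assumes NumPy, samples CM_n(d) by pairing half-edges uniformly, and uses a helper name of our choosing) estimates E[P_n(2)] for a degree sequence mixing degrees 2 and 3 and compares it with λ_2 = p_2^2/E[D]^2 from Exercise 4.18.

```python
import numpy as np
from collections import Counter

def count_2cycles(degrees, rng):
    """Sample CM_n(d) by pairing half-edges uniformly, then count 2-cycles:
    double edges joining two distinct vertices that both have degree 2."""
    half_edges = np.repeat(np.arange(len(degrees)), degrees)
    rng.shuffle(half_edges)                     # uniform perfect matching
    pairs = half_edges.reshape(-1, 2)
    counts = Counter(tuple(sorted(e)) for e in pairs)
    return sum(1 for (u, v), c in counts.items()
               if u != v and c == 2 and degrees[u] == degrees[v] == 2)

rng = np.random.default_rng(1)
n, n2 = 2000, 1600                              # n2 vertices of degree 2, rest degree 3
degrees = np.array([2] * n2 + [3] * (n - n2))
lam2 = (n2 / n) ** 2 / degrees.mean() ** 2      # lambda_2 = p_2^2 / E[D]^2
sims = [count_2cycles(degrees, rng) for _ in range(500)]
print("empirical E[P_n(2)] = %.3f, lambda_2 = %.3f" % (np.mean(sims), lam2))
```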
Exercise 4.20 (C_max when d_min = 2) Consider CM_n(d) with d_min = 2 and assume that P(D ≥ 3) > 0. Show that Theorem 4.28 holds if P(∃v : d_v ≥ 3 and v ∉ C_max) = o(1).
Exercise 4.21 (Cycles of degree-2 vertices with one other vertex) Subject to Conditions 1.7(a),(b) and d_min ≥ 2, show that the expected number of cycles consisting of vertices of degree 2 with a starting and ending vertex of degree k converges to
\frac{k(k − 1)p_k}{2E[D]^2} \sum_{ℓ≥1} (2p_2/E[D])^ℓ.
Exercise 4.22 (Subcritical power-law GRGn (w) in Theorem 3.22) Use the size of the largest connected
component in the subcritical power-law CMn (d) in Theorem 4.29, combined with Theorem 1.9, to identify
the largest connected component in the subcritical power-law GRGn (w) in Theorem 3.22.
Exercise 4.23 (Sharp asymptotics in Theorem 4.29) Recall the setting of the largest subcritical connected component in CM_n(d) in Theorem 4.29. Prove that |C_max| = \frac{d_max}{1 − ν}(1 + o_P(1)) precisely when d_max = Θ(n^{1/(τ−1)}).
Exercise 4.24 (Sub-polynomial subcritical clusters) Use Theorem 4.29 to prove that |C_max| = o_P(n^ε) for every ε > 0 when (4.5.1) holds for every τ > 1. Thus, when the maximal degree is sub-polynomial in n, so is the size of the largest connected component.
Exercise 4.25 (Single tree asymptotics in Theorem 4.29) Assume that the conditions in Theorem 4.29
hold. Use Theorem 4.1 to prove that the tree rooted at any half-edge incident to the vertex of maximal
degree converges in distribution to a subcritical branching process with expected total progeny 1/(1 − ν).
Exercise 4.26 (Two-tree asymptotics in Theorem 4.29) Assume that the conditions in Theorem 4.29 hold.
Use the local convergence in Theorem 4.1 to prove that the two trees rooted at any pair of half-edges
incident to the vertex of maximal degree jointly converge in distribution to two independent subcritical
branching processes with expected total progeny 1/(1 − ν).
Exercise 4.27 (Theorem 4.29 when d_max = o(log n)) Assume that the subcritical conditions in Theorem 4.29 hold, so that ν < 1. Suppose that d_max = o(log n). Do you expect |C_max| = \frac{d_max}{1 − ν}(1 + o_P(1)) to hold? Note: No proof is expected; a reasonable argument will suffice.
Exercise 4.28 (Theorem 4.29 when d_max ≫ log n) Assume that the subcritical conditions in Theorem 4.29 hold, so that ν < 1. Suppose that d_max ≫ log n. Do you expect |C_max| = \frac{d_max}{1 − ν}(1 + o_P(1)) to hold? Note: No proof is expected; a reasonable argument will suffice.
Exercise 4.29 (Probability of simplicity in Theorem 4.30) Subject to the conditions in Theorem 4.30, show that (4.5.3) implies that P(CM_n(d) simple) = e^{−o(n)}, as proved in Lemma 4.8.
Exercise 4.30 (Exponential moments) Show that Condition 1.7(a) and \sup_{n≥1} E[e^{θD_n}] < ∞ as in (4.5.5) imply that E[D_n^p] → E[D^p] for every p > 0. Conclude that then also Conditions 1.7(b)–(c) hold.
Exercise 4.31 (Moments versus tails) Show that E[D_n^p] → E[D^p] < ∞ for some p > 1 precisely when [1 − F_n](x) ≤ c_F x^{−(τ−1)} for all x ≥ 1 and some τ > 2.
Chapter 5
Connected Components in Preferential Attachment Models
Abstract
In this chapter we investigate the connectivity structure of preferential attach-
ment models. We start by discussing an important tool: exchangeable random
variables and their distribution as described in de Finetti’s Theorem. We ap-
ply these results to Pólya urn schemes, which, in turn, we use to describe the
distribution of the degrees in preferential attachment models.
It turns out that Pólya urn schemes can also be used to describe the local limit
of preferential attachment models. A crucial ingredient is the fact that the edges
in the Pólya urn representation are conditionally independent, given the appro-
priate randomness. The resulting local limit is the Pólya point tree, a specific
multi-type branching process with continuous types.
The models discussed so far share the property that they are static and their edge-connection
probabilities are close to being independent. As discussed at great length in [V1, Chapter
8], see also Section 1.3.5, preferential attachment models were invented for their dynamic
structure: since edges incident to younger vertices connect to older vertices in a way that
favours high-degree vertices, preferential attachment models develop power-law degree dis-
tributions. This intuitive dynamics comes at the expense of creating dynamic models in which edge-connection probabilities are hard to compute. As a result, proofs for
preferential attachment models are generally substantially harder than those for inhomoge-
neous random graphs and configuration models.
In this chapter, we explain how this difference can be overcome, to some extent, by real-
izing that the degree evolution in preferential attachment models can be described in terms
of exchangeable random variables. Because of this, we can describe these models in terms
of independent edges, given some appropriate extra randomness.
The notion of exchangeability is rather strong and implies, for example, that the distribution of X_i is the same for every i (see Exercise 5.1), as well as that (X_i, X_j) has the same distribution for every i ≠ j.
Clearly, when a sequence of random variables is iid then it is also exchangeable (see Exer-
cise 5.2). A second example arises when we take a sequence of random variables that are iid
conditionally on some random variables. An example is a sequence of Bernoulli random variables that are iid conditionally on their success probability U, where U itself is random.
This is called a mixture of iid random variables. Remarkably, the distribution of an infinite
sequence of exchangeable random variables is always such a mixture. This is the content of
de Finetti’s Theorem, which we state and prove here in the case where (Xi )i≥1 are indicator
variables:
Theorem 5.2 (De Finetti’s Theorem) Let (X_i)_{i≥1} be an infinite sequence of exchangeable random variables, and assume that X_i ∈ {0, 1}. Then there exists a random variable U with P(U ∈ [0, 1]) = 1 such that, for all n ≥ 1 and k ∈ [n],
P(X_1 = · · · = X_k = 1, X_{k+1} = · · · = X_n = 0) = E\big[U^k (1 − U)^{n−k}\big]. (5.2.1)
The theorem of de Finetti (Theorem 5.2) states that an infinite exchangeable sequence of
indicators has the same distribution as an independent Bernoulli sequence with a random
success probability U . Thus, the different elements of the sequence are not independent, but
their dependence enters only through the success probability U .
The proof of Theorem 5.2 can be relatively easily extended to more general settings, for
example, when Xi takes on a finite number of values. Since we are relying on Theorem 5.2
only for indicator variables, we refrain from stating this more general version.
Define S_n to be the number of ones in (X_i)_{i=1}^n, i.e.,
S_n = \sum_{k=1}^n X_k. (5.2.2)
The reader is asked to prove (5.2.3) in Exercise 5.4. Equation (5.2.3) also allows us to compute the distribution of U. Indeed, when we suppose that
\lim_{n→∞} P(S_n ∈ (an, bn)) = \int_a^b f(u)\,du, (5.2.4)
where f is a density, then (5.2.3) implies that f is in fact the density of the random variable U. This is useful in applications of de Finetti’s Theorem (Theorem 5.2). Equation (5.2.4) follows by noting that S_n/n \xrightarrow{a.s.} U by the strong law of large numbers applied to the conditional law given U. In Exercise 5.3, you are asked to fill in the details.
Proof of Theorem 5.2. The proof makes use of Helly’s Theorem, which states that any sequence of bounded random variables has a weakly converging subsequence. We fix m ≥ n and condition on S_m to write
P(X_1 = · · · = X_k = 1, X_{k+1} = · · · = X_n = 0) (5.2.5)
= \sum_{j=k}^m P(X_1 = · · · = X_k = 1, X_{k+1} = · · · = X_n = 0 \mid S_m = j)\,P(S_m = j).
By exchangeability, and conditional on S_m = j, each sequence (X_i)_{i=1}^m containing precisely j ones is equally likely. There are precisely \binom{m}{j} such sequences, and precisely \binom{m−n}{j−k} of them start with k ones followed by n − k zeros. Thus,
P(X_1 = · · · = X_k = 1, X_{k+1} = · · · = X_n = 0 \mid S_m = j) = \binom{m−n}{j−k} \Big/ \binom{m}{j}. (5.2.6)
Combining (5.2.5) and (5.2.6), and approximating the ratio of binomial coefficients for large m, one obtains
P(X_1 = · · · = X_k = 1, X_{k+1} = · · · = X_n = 0) = \lim_{m→∞} E\big[Y_m^k (1 − Y_m)^{n−k}\big], (5.2.9)
where Y_m = S_m/m. Note that it is here that we make use of the fact that (X_i)_{i≥1} is an infinite exchangeable sequence of random variables. Equation (5.2.9) is the point of departure for the completion of the proof.
We have that Y_m ∈ [0, 1] since S_m ∈ [0, m], so that the sequence of random variables (Y_m)_{m≥1} is bounded. By Helly’s Theorem, it thus contains a weakly converging subsequence, i.e., there exists a subsequence (Y_{m_l})_{l≥1} with \lim_{l→∞} m_l = ∞ and a random variable U such that Y_{m_l} \xrightarrow{d} U. Since the random variable Y_m^k (1 − Y_m)^{n−k} is uniformly bounded for each k, n, Lebesgue’s Dominated Convergence Theorem ([V1, Theorem A.1]) gives that
\lim_{m→∞} E\big[Y_m^k (1 − Y_m)^{n−k}\big] = \lim_{l→∞} E\big[Y_{m_l}^k (1 − Y_{m_l})^{n−k}\big] = E\big[U^k (1 − U)^{n−k}\big]. (5.2.10)
This completes the proof. Yet a careful reader may wonder whether the above proof on the basis of subsequences is enough. Indeed, it is possible that another subsequence (Y_{m'_l})_{l≥1} with \lim_{l→∞} m'_l = ∞ has a different limiting random variable V such that Y_{m'_l} \xrightarrow{d} V. However, from (5.2.9) we then conclude that E\big[V^k (1 − V)^{n−k}\big] = E\big[U^k (1 − U)^{n−k}\big] for every k, n. In particular, E[V^k] = E[U^k] for every k ≥ 0. Since the random variables U, V are almost surely bounded by 1 and have the same moments, they also have the same distribution. We conclude that Y_{m_l} \xrightarrow{d} U for every subsequence (m_l)_{l≥1} along which (Y_{m_l})_{l≥1} converges, and this is equivalent to Y_m \xrightarrow{d} U.
The theorem of de Finetti implies that if Xk and Xn are coordinates of an infinite ex-
changeable sequence of indicators then they are positively correlated; see Exercise 5.5. Thus,
it is impossible for infinite exchangeable sequences of indicator variables to be negatively
correlated, which is somewhat surprising.
In the proof of de Finetti’s Theorem, it is imperative that the sequence (Xi )i≥1 is infinite.
This is not merely a technicality of the proof. Rather, there are finite exchangeable sequences
of random variables for which the equality (5.2.1) does not hold. Indeed, take an urn filled
with b blue and r red balls, and draw balls successively without replacement. Thus, the urn
is sequentially being depleted, and it will be empty after the (b + r)th ball is drawn. Let X_i denote the indicator that the ith ball drawn is blue. Then, clearly, the sequence (X_i)_{i=1}^{b+r} is exchangeable. However,
P(X_1 = X_2 = 1) = \frac{b(b − 1)}{(b + r)(b + r − 1)} < \Big(\frac{b}{b + r}\Big)^2 = P(X_1 = 1)\,P(X_2 = 1), (5.2.11)
so that X_1 and X_2 are negatively correlated.
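The strict inequality in (5.2.11) is easy to verify with exact rational arithmetic; the following short sketch (an illustration, not part of the text) evaluates both sides for a small urn.

```python
from fractions import Fraction

def without_replacement_correlation(b, r):
    """Exact P(X1 = X2 = 1) and P(X1 = 1)P(X2 = 1) for draws without
    replacement from an urn with b blue and r red balls, as in (5.2.11)."""
    joint = Fraction(b, b + r) * Fraction(b - 1, b + r - 1)
    product = Fraction(b, b + r) ** 2
    return joint, product

joint, product = without_replacement_correlation(b=3, r=2)
print(joint, "<", product)   # 3/10 < 9/25, so X1 and X2 are negatively correlated
```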
After drawing a ball, it is replaced together with a second ball of the same color; we denote
this Pólya urn scheme by ((Bn , Rn ))n≥0 . Naturally, since we always replace one ball by
two balls, the total number of balls Bn + Rn = b0 + r0 + n is deterministic.
In this section, we restrict to the case where there exist a_r, a_b > 0 such that
W_b(k) = k + a_b \qquad and \qquad W_r(k) = k + a_r, (5.2.13)
i.e., both weight functions are linear with the same slope but possibly a different intercept.
Our main result concerning Pólya urn schemes is the following theorem:
Theorem 5.3 (Limit theorem for linear Pólya urn schemes) Let ((B_n, R_n))_{n≥0} be a Pólya urn scheme starting with (B_0, R_0) = (b_0, r_0) balls of each color, and with linear weight functions W_b and W_r as in (5.2.13) for some a_r, a_b > 0. Then, as n → ∞,
\frac{B_n}{B_n + R_n} \xrightarrow{a.s.} U, (5.2.14)
where U has a Beta distribution with parameters a = b_0 + a_b and b = r_0 + a_r, and, for all k ≤ n,
P(B_n = b_0 + k) = E\big[P\big(Bin(n, U) = k\big)\big]. (5.2.15)
Before proving Theorem 5.3, let us comment on its remarkable content. Clearly, the num-
ber of blue balls Bn is not a binomial random variable, as early draws of blue balls reinforce
the proportion of blue balls in the end. However, (5.2.15) states that we can first draw a ran-
dom variable U and then, conditionally on that random variable, the number of blue balls
is binomial. This is an extremely useful perspective, as we will see later on. The urn con-
ditioned on the limiting variable U is sometimes called a Pólya urn with strength U , and
Theorem 5.3 implies that this is a mere binomial experiment given the strength. The parameters a = b_0 + a_b and b = r_0 + a_r of the Beta distribution indicate the initial weights of each of the two colors.
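Theorem 5.3 is straightforward to test by simulation. The sketch below (illustrative only; it assumes NumPy and the linear weights W_b(k) = k + a_b, W_r(k) = k + a_r of (5.2.13)) runs the urn repeatedly and compares the empirical mean and variance of the final blue fraction with those of the Beta(b_0 + a_b, r_0 + a_r) limit.

```python
import numpy as np

def polya_urn_fraction(b0, r0, a_b, a_r, steps, rng):
    """Run the linear Polya urn for `steps` draws; a blue ball is drawn
    with probability (B + a_b)/(B + a_b + R + a_r). Return the blue fraction."""
    b, r = b0, r0
    for _ in range(steps):
        if rng.random() < (b + a_b) / (b + a_b + r + a_r):
            b += 1
        else:
            r += 1
    return b / (b + r)

rng = np.random.default_rng(0)
b0, r0, a_b, a_r = 1, 1, 0.5, 1.5
fracs = np.array([polya_urn_fraction(b0, r0, a_b, a_r, 2000, rng)
                  for _ in range(1000)])
a, b = b0 + a_b, r0 + a_r                       # Beta parameters of Theorem 5.3
print("mean: %.3f vs %.3f" % (fracs.mean(), a / (a + b)))
print("var:  %.4f vs %.4f" % (fracs.var(), a * b / ((a + b) ** 2 * (a + b + 1))))
```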
Proof of Theorem 5.3. We start with the almost sure convergence in (5.2.14). Let M_n = (B_n + a_b)/(B_n + R_n + a_b + a_r). Note that
E[M_{n+1} \mid (B_l)_{l=1}^n] = \frac{1}{B_{n+1} + R_{n+1} + a_b + a_r} E[B_{n+1} + a_b \mid B_n]
= \frac{1}{B_{n+1} + R_{n+1} + a_b + a_r} \Big[B_n + a_b + \frac{B_n + a_b}{B_n + R_n + a_b + a_r}\Big]
= \frac{B_n + a_b}{B_{n+1} + R_{n+1} + a_b + a_r} \cdot \frac{B_n + R_n + a_b + a_r + 1}{B_n + R_n + a_b + a_r}
= \frac{B_n + a_b}{B_n + R_n + a_b + a_r} = M_n, (5.2.16)
since Bn+1 + Rn+1 + ab + ar = Bn + Rn + ab + ar + 1. As a result, (Mn )n≥0 is a
non-negative martingale, and thus converges almost surely to some random variable U by
the Martingale Convergence Theorem ([V1, Theorem 2.24]).
We continue by identifying the limiting random variable in (5.2.14), which will follow from (5.2.15). Let X_n denote the indicator that the nth ball drawn is blue. We first show that (X_n)_{n≥1} is an infinite exchangeable sequence. Note that
B_n = b_0 + \sum_{j=1}^n X_j, \qquad R_n = r_0 + \sum_{j=1}^n (1 − X_j) = r_0 + b_0 + n − B_n. (5.2.17)
while
\prod_{t=1}^n W_b(b_{t−1})^{x_t} = \prod_{m=0}^{k−1} (b_0 + a_b + m), \qquad \prod_{t=1}^n W_r(r_{t−1})^{1−x_t} = \prod_{j=0}^{n−k−1} (r_0 + a_r + j). (5.2.20)
Thus, we arrive at
P\big((X_t)_{t=1}^n = (x_t)_{t=1}^n\big) = \frac{\prod_{m=0}^{k−1} (b + m) \prod_{j=0}^{n−k−1} (r + j)}{\prod_{t=0}^{n−1} (b + r + t)}, (5.2.21)
where we abbreviate b = b_0 + a_b and r = r_0 + a_r.
of k ones and n − k zeros. Each sequence has the same probability, given by (5.2.21). Thus,
P(S_n = k) = \binom{n}{k} \frac{\prod_{m=0}^{k−1} (b + m) \prod_{j=0}^{n−k−1} (r + j)}{\prod_{t=0}^{n−1} (b + r + t)}
= \frac{Γ(n + 1)}{Γ(k + 1)Γ(n − k + 1)} × \frac{Γ(k + b)}{Γ(b)} × \frac{Γ(n − k + r)}{Γ(r)} × \frac{Γ(b + r)}{Γ(n + b + r)}
= \frac{Γ(b + r)}{Γ(r)Γ(b)} × \frac{Γ(k + b)}{Γ(k + 1)} × \frac{Γ(n − k + r)}{Γ(n − k + 1)} × \frac{Γ(n + 1)}{Γ(n + b + r)}. (5.2.22)
For k and n − k large, by [V1, (8.3.9)],
P(S_n = k) = \frac{Γ(b + r)}{Γ(r)Γ(b)} \frac{k^{b−1} (n − k)^{r−1}}{n^{b+r−1}} (1 + o(1)). (5.2.23)
where Ui is independent of (U1 , . . . , Ui−1 ) and has a Beta distribution with parameters
a = ki + ai and b = k[i,`] + a[i,`] . This gives not only an extension of Theorem 5.3 to urns
with multiple colors but also an appealing independence structure of the limits.
to be possible, we need d_1 + d_2 to be even, and the graph may contain self-loops and multiple edges. After this, we successively attach vertices to older vertices with probability proportional to the degree plus δ > −1. We do not allow for self-loops in the growth of the trees, so that the structures connected to vertices 1 and 2 are trees (but the entire structure is not when d_1 + d_2 > 2). This is a generalization of (PA_n^{(1,δ)}(b))_{n≥2}, in which we are more flexible in choosing the initial graph. The model for (PA_n^{(1,δ)}(b))_{n≥1} arises when d_1 = d_2 = 2 (see Exercise 5.8). For (PA_n^{(1,δ)}(d))_{n≥1}, d_1 = d_2 = 1 is the most relevant choice (recall from Section 1.3.5 that (PA_n^{(1,δ)}(d))_{n≥1} starts at time 1 with two vertices and one edge between them).
We decompose the growing tree into two trees. For i = 1, 2, we let Ti (n) be the tree
of vertices that are closer to vertex i than to vertex 3 − i. Thus, the tree T2 (n) consists of
those vertices for which the path in the tree from the vertex to vertex 1 passes through vertex
2, and T1 (n) consists of the remainder of the scale-free tree. Let Si (n) = |Ti (n)| denote
the number of vertices in Ti (n). Clearly, S1 (n) + S2 (n) = n, which is the total number
of vertices in the tree at time n. We can apply Theorem 5.3 to describe the relative sizes of
T1 (n) and T2 (n):
Theorem 5.4 (Tree decomposition for scale-free trees) For scale-free trees with initial degrees d_1, d_2 ≥ 1, as n → ∞,
\frac{S_1(n)}{n} \xrightarrow{a.s.} U, (5.2.29)
where U has a Beta distribution with parameters a = (d_1 + δ)/(2 + δ) and b = (d_2 + δ)/(2 + δ), and
P(S_1(n) = k) = E\big[P\big(Bin(n − 1, U) = k − 1\big)\big]. (5.2.30)
By Theorem 5.4, we can decompose a scale-free tree into two disjoint scale-free trees
each of which contains an almost surely positive proportion of the vertices.
Proof The evolution of (S_1(n))_{n≥2} can be viewed as a Pólya urn scheme. Indeed, when S_1(n) = s_1(n), the probability of attaching the (n + 1)th vertex to T_1(n) is equal to
\frac{(2s_1(n) + d_1 − 2) + δs_1(n)}{(2s_1(n) + d_1 − 2) + δs_1(n) + (2s_2(n) + d_2 − 2) + δs_2(n)}, (5.2.31)
since the number of vertices in T_i(n) equals S_i(n), while the total degree of T_i(n) equals 2S_i(n) + d_i − 2. We can rewrite this as
\frac{s_1(n) + (d_1 − 2)/(2 + δ)}{s_1(n) + s_2(n) + (d_1 + d_2 − 4)/(2 + δ)}, (5.2.32)
which is equal to (5.2.12) in the case (5.2.13) when r_0 = b_0 = 1 and a_b = (d_1 − 2)/(2 + δ), a_r = (d_2 − 2)/(2 + δ). Therefore, Theorem 5.4 follows directly from Theorem 5.3.
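The urn embedding in this proof can also be run forwards; the sketch below (illustrative, with arbitrary parameter choices) grows the two subtree sizes using the attachment probability (5.2.31) and compares the mean of S_1(n)/n with that of the Beta limit in Theorem 5.4.

```python
import numpy as np

def subtree_fraction(d1, d2, delta, steps, rng):
    """Attach `steps` new vertices, each joining T1 with the probability
    in (5.2.31); return the final fraction S1/(S1 + S2)."""
    s1, s2 = 1, 1
    for _ in range(steps):
        w1 = (2 * s1 + d1 - 2) + delta * s1
        w2 = (2 * s2 + d2 - 2) + delta * s2
        if rng.random() < w1 / (w1 + w2):
            s1 += 1
        else:
            s2 += 1
    return s1 / (s1 + s2)

rng = np.random.default_rng(7)
d1, d2, delta = 3, 1, 0.0
xs = np.array([subtree_fraction(d1, d2, delta, 2000, rng) for _ in range(1000)])
a, b = (d1 + delta) / (2 + delta), (d2 + delta) / (2 + delta)
print("simulated mean %.3f vs Beta mean %.3f" % (xs.mean(), a / (a + b)))
```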
We continue by adapting the above argument to the size of the connected component of, or subtree containing, vertex 1 in PA_n^{(1,δ)}(a) (recall Section 1.3.5), which we denote by S'_1(n):
Theorem 5.5 (Tree decomposition for preferential attachment trees) For PA_n^{(1,δ)}(a), as n → ∞,
\frac{S'_1(n)}{n} \xrightarrow{a.s.} U', (5.2.33)
where U' has a mixed Beta distribution with random parameters a = I + 1 and b = 1 + (1 + δ)/(2 + δ), and where, for ℓ ≥ 2,
P(I = ℓ) = P(first vertex that is not connected to vertex 1 is vertex ℓ). (5.2.34)
Consequently,
P(S'_1(n) = k) = E\big[P\big(Bin(n − 1, U') = k − 1\big)\big]. (5.2.35)
Theorem 5.6 (Relative degrees in scale-free trees) For (PA_n^{(1,δ)}(a))_{n≥1}, as n → ∞,
\frac{D_k(n)}{D_{[k]}(n)} \xrightarrow{a.s.} ψ_k, (5.2.38)
where D_{[k]}(n) = D_1(n) + · · · + D_k(n) and ψ_k ∼ Beta(1 + δ, (k − 1)(2 + δ)).
By Theorem 1.17, D_k(n) n^{−1/(2+δ)} \xrightarrow{a.s.} ξ_k, where ξ_k is positive almost surely by the argument in the proof of [V1, Theorem 8.14]. It thus follows from Theorem 5.6 that ψ_k = ξ_k/(ξ_1 + · · · + ξ_k). We conclude that Theorem 5.6 allows us to identify properties of the law of the limiting degrees.
Proof of Theorem 5.6. Define the sequence of stopping times (τ_k(n))_{n≥2k−1}, with τ_k(2k − 1) = k − 1, by
τ_k(n) = \inf\{t : D_{[k]}(t) = n\}, (5.2.39)
i.e., τ_k(n) is the time at which the total degree of the vertices in [k] equals n. The initial condition τ_k(2k − 1) = k − 1 is chosen such that the half-edge incident to vertex k is already considered to be present at time k − 1, but the receiving end of that edge is not. This guarantees that the attachment of the edge of vertex k is also properly taken into account.
Note that τ_k(n) < ∞ for every n, since D_j(n) \xrightarrow{a.s.} ∞ as n → ∞ for every j. Moreover, since τ_k(n) \xrightarrow{a.s.} ∞ as n → ∞,
\lim_{n→∞} \frac{D_k(n)}{D_{[k]}(n)} = \lim_{n→∞} \frac{D_k(τ_k(n))}{D_{[k]}(τ_k(n))} = \lim_{n→∞} \frac{D_k(τ_k(n))}{n}. (5.2.40)
Now, the random variables \big((D_k(τ_k(n)), D_{[k−1]}(τ_k(n)))\big)_{n≥2k−1} form a Pólya urn scheme, with D_k(τ_k(2k − 1)) = 1 and D_{[k−1]}(τ_k(2k − 1)) = 2k − 2. The edge at time τ_k(n) is attached to vertex k with probability
\frac{D_k(τ_k(n)) + δ}{n + kδ}, (5.2.41)
which is the probability of a Pólya urn scheme having linear weights as in (5.2.13) with a_b = δ, a_r = (k − 1)δ, b_0 = 1, and r_0 = 2(k − 1). Thus, the statement follows from Theorem 5.3.
Theorem 5.6 is easily extended to (PA_n^{(1,δ)}(b))_{n≥1}:
Theorem 5.7 (Relative degrees in scale-free trees) For (PA_n^{(1,δ)}(b))_{n≥1}, as n → ∞,
\frac{D_k(n)}{D_{[k]}(n)} \xrightarrow{a.s.} ψ'_k, (5.2.42)
where ψ'_k ∼ Beta(1 + δ, (2k − 1) + (k − 1)δ) for k ≥ 3, and ψ'_2 ∼ Beta(2 + δ, 2 + δ).
The dynamics for (PA_n^{(1,δ)}(b))_{n≥1} are slightly different from those of (PA_n^{(1,δ)}(a))_{n≥1}, since PA_n^{(1,δ)}(b) does not allow for self-loops in the growth of the tree. Indeed, now the random variables \big((D_k(τ_k(n)), D_{[k−1]}(τ_k(n)))\big)_{n≥2k} form a Pólya urn scheme, starting with D_k(τ_k(2k)) = 1 and D_{[k−1]}(τ_k(2k)) = 2k − 1. The edge at time τ_k(n) is attached to vertex k with probability
\frac{D_k(τ_k(n)) + δ}{n + kδ}, (5.2.43)
which are the probabilities of a Pólya urn scheme in (5.2.12) in the linear weight case in (5.2.13) with a_b = δ, a_r = (k − 1)δ, b_0 = 1, and r_0 = 2k − 1. The setting is a little different for k = 2, since vertex 3 attaches to vertices 1 and 2 with equal probability, so that ψ'_2 ∼ Beta(2 + δ, 2 + δ). Thus, again the statement follows from Theorem 5.3. See Exercise 5.10 for the complete proof. We conclude that, even though (PA_n^{(1,δ)}(a))_{n≥1} and (PA_n^{(1,δ)}(b))_{n≥1} have the same asymptotic degree distribution, the limiting degree ratios in Theorems 5.6 and 5.7 are different.
In this section we study the local limit of preferential attachment models, which is a more difficult subject than that of inhomogeneous random graphs or configuration models. Indeed, it turns out that the local limit is not described by a homogeneous unimodular branching process but rather by an inhomogeneous multi-type branching process.
called the Pólya point tree. Its nodes are labeled by finite words, using the Ulam–Harris
labeling of trees in Section 1.5, as w = w1 w2 · · · wl , each carrying an age as well as a label
Y or O denoting whether the child is younger or older than its parent in the tree.
The root ∅ has age U∅ , where U∅ is chosen uar in [0, 1]. The root is special, and has
no label in {Y, O}, since it has no parent. Having discussed the root of the tree, we now
construct the remainder of the tree by recursion.
In the recursion step, we assume that the Ulam–Harris word w (recall Section 1.5) and the corresponding age variable A_w ∈ [0, 1] have been chosen in a previous step. For j ≥ 1, let wj be the jth child of w, and set
m_−(w) = m if w is the root or of label O, and m_−(w) = m − 1 if w is of label Y. (5.3.1)
The intuition behind (5.3.1) is that m_−(w) equals the number of older children of w, which equals m when w is older than its parent, and m − 1 when w is younger than its parent.
Recall that a Gamma distribution with parameters r and λ has the density given in (1.5.1). Let Γ have a Gamma distribution with parameters r = m + δ and λ = 1, and let Γ* be its size-biased version, which has a Gamma distribution with parameters r = m + δ + 1 and λ = 1 (see Exercise 5.11). We then take
Γ_w ∼ Γ if w is the root or of label Y, and Γ_w ∼ Γ* if w is of label O, (5.3.2)
independently of everything else.
Let w1, . . . , w m_−(w) be the children of w having label O, and let their ages A_{w1}, . . . , A_{w m_−(w)} be given by
A_{wj} = U_{wj}^{1/χ} A_w, (5.3.3)
where (U_{wj})_{j=1}^{m_−(w)} are iid uniform random variables on [0, 1] that are independent of everything else, and let
χ = \frac{m + δ}{2m + δ}. (5.3.4)
Further, let (A_{w(m_−(w)+j)})_{j≥1} be the (ordered) points of a Poisson point process on [A_w, 1] with intensity
ρ_w(x) = \frac{Γ_w}{τ − 1} \frac{x^{1/(τ−1)−1}}{A_w^{1/(τ−1)}}, (5.3.5)
where we recall that τ = 3 + δ/m by (1.3.63), and the nodes (w(m_−(w) + j))_{j≥1} have label Y. The children of w are the nodes wj, with labels O and Y.
The above random tree is known as the Pólya point tree. The Pólya point tree is a
multi-type discrete-time branching process, where the type of a node w is equal to the pair
(aw , tw ), with aw ∈ [0, 1] corresponding to the age of the vertex, and tw ∈ {Y, O} to
its label. Thus, the type space S = [0, 1] × {Y, O} of the multi-type branching process is
continuous.
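The recursive description above translates directly into a sampler. The sketch below (an illustration under the definitions of this section; the function name root_offspring is ours) draws the root's age and Gamma strength, generates the m older children via (5.3.3), and generates the younger children as a Poisson process with intensity (5.3.5), sampled by inversion of its cumulative intensity.

```python
import numpy as np

def root_offspring(m, delta, rng):
    """Sample the ages of the root's children in the Polya point tree."""
    tau = 3 + delta / m
    chi = (m + delta) / (2 * m + delta)
    p = 1 / (tau - 1)                         # = 1 - chi
    age = rng.random()                        # age of the root, uniform on [0,1]
    gamma = rng.gamma(m + delta)              # Gamma(m + delta, 1) strength
    older = age * rng.random(m) ** (1 / chi)  # m older children, cf. (5.3.3)
    # Poisson process on [age, 1] with intensity (5.3.5); its total mass is
    # gamma * (1 - age**p) / age**p, obtained by integrating the intensity
    mass = gamma * (1 - age ** p) / age ** p
    k = rng.poisson(mass)
    v = rng.random(k)                         # inverse-CDF sampling of the points
    younger = (age ** p + v * (1 - age ** p)) ** (1 / p)
    return age, np.sort(older), np.sort(younger)

rng = np.random.default_rng(3)
age, older, younger = root_offspring(m=2, delta=0.5, rng=rng)
print("root age:", round(age, 3))
print("older children:", np.round(older, 3))
print("younger children:", np.round(younger, 3))
```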
Let us discuss the offspring structure of the above process. Obviously, there are finitely many children of label O. Further, note that 1/(τ − 1) = m/(2m + δ) > 0, so the intensity ρ_w in (5.3.5) of the Poisson process is integrable. Thus every vertex in the random tree almost surely has finitely many children.
[Figure 5.1: empirical tail probabilities P(X > x) against the degree, on a log-log scale, in panels (a) and (b), for the degree distribution, the size-biased degree distribution, and the degree distribution of a random friend.]
With the above description in hand, we are ready to state our main result concerning local convergence of PA_n^{(m,δ)}(d):
See Figures 5.1 and 5.2 for examples of the various degree distributions in the preferential attachment model, where we plot the degree distribution itself, the size-biased degree distribution, and the degree distribution of a random neighbor of a uniform vertex. In contrast to the generalized random graph and the configuration model (recall Figures 3.2 and 4.2), the latter two degree distributions are different, particularly for small values of n, as in Figure 5.1, even though their power-law exponents do agree.
Extensions of Theorem 5.8 to other models, including those where self-loops are allowed, are given in Section 5.4.4 below. These results show that Theorem 5.8 is quite robust to minor changes in the model definition, and that it also applies to PA_n^{(m,δ)}(a) and PA_n^{(m,δ)}(b) with the same limit. We refer to the discussion in Section 5.7 for more details and also the history of Theorem 5.8.
The proof of Theorem 5.8 is organized as follows. We start in Section 5.3.2 by investigating consequences of Theorem 5.8 for the degree structure of PA_n^{(m,δ)}(d) (or any other graph having the same local limit). In Section 5.3.3 we prove that PA_n^{(m,δ)}(d) can be represented in terms of conditionally independent edges by relying on a Pólya urn description. The remainder of the proof of local convergence is deferred to Section 5.4.
[Figure 5.2: empirical tail probabilities P(X > x) against the degree, on a log-log scale, in panels (a) and (b), for the degree distribution, the size-biased degree distribution, and the degree distribution of a random friend.]
Note that the limiting degree distribution in (5.3.6) is equal to that for PA_n^{(m,δ)}(a) in (1.3.60), again exemplifying that the details of the model have little influence on the limiting degree sequence. It is not hard to see from Lemma 5.9 that
p_k = c_{m,δ} k^{−τ} (1 + O(1/k)), \qquad p'_k = c'_{m,δ} k^{−(τ−1)} (1 + O(1/k)), (5.3.8)
for some constants c_{m,δ} and c'_{m,δ} and with τ = 3 + δ/m (see Exercise 5.12). We conclude that there is a form of size biasing, in that older neighbors of a uniform vertex have a limiting degree distribution that satisfies a power law (like the degree of the random vertex itself), but with an exponent that is one lower than that of the vertex itself (recall Figures 5.1 and 5.2). Exercises 5.13–5.15 study the joint distribution (D, D') and various conditional power laws.
Proof of Lemma 5.9 subject to Theorem 5.8. We note that local convergence in probability
implies the convergence of the degree distribution. It thus suffices to study the distribution
of the degree of the root in the Pólya point tree. We first condition on the age A∅ = U∅ of
the root of the Pólya point tree, where U∅ is standard uniform. Let D be the degree of the
root. Conditioning on A∅ = a, the degree D is m plus a Poisson variable with parameter
\int_a^1 \frac{Γ_∅}{τ − 1} \frac{x^{1/(τ−1)−1}}{a^{1/(τ−1)}}\,dx = Γ_∅ \frac{1 − a^{1/(τ−1)}}{a^{1/(τ−1)}} ≡ Γ_∅ κ(a), (5.3.9)
where Γ_∅ is a Gamma variable with parameters r = m + δ and λ = 1. Thus, taking the expectation wrt Γ_∅, we obtain
P(D = k \mid A_∅ = a) = \int_0^∞ P(D = k \mid A_∅ = a, Γ_∅ = y) \frac{y^{m+δ−1}}{Γ(m + δ)} e^{−y}\,dy
= \int_0^∞ e^{−yκ(a)} \frac{(yκ(a))^{k−m}}{(k − m)!} \frac{y^{m+δ−1}}{Γ(m + δ)} e^{−y}\,dy
= \frac{κ(a)^{k−m}}{(1 + κ(a))^{k+δ}} \frac{Γ(k + δ)}{(k − m)!\,Γ(m + δ)}
= \frac{Γ(k + δ)}{(k − m)!\,Γ(m + δ)} (1 − a^{1/(τ−1)})^{k−m} a^{(m+δ)/(τ−1)}, (5.3.10)
where we use κ(a)/(1 + κ(a)) = 1 − a^{1/(τ−1)} and 1/(1 + κ(a)) = a^{1/(τ−1)}. We thus conclude that
P(D = k) = \int_0^1 P(D = k \mid A_∅ = a)\,da = \int_0^1 \frac{Γ(k + δ)}{(k − m)!\,Γ(m + δ)} (1 − a^{1/(τ−1)})^{k−m} a^{(m+δ)/(τ−1)}\,da. (5.3.11)
Recall that
\int_0^1 u^{p−1} (1 − u)^{q−1}\,du = \frac{Γ(p)Γ(q)}{Γ(p + q)}. (5.3.12)
Using the integral transform u = a^{1/(τ−1)}, for which da = (τ − 1)u^{τ−2}\,du, we arrive at
P(D = k) = (τ − 1) \frac{Γ(k + δ)}{(k − m)!\,Γ(m + δ)} \int_0^1 (1 − u)^{k−m} u^{m+δ+1+δ/m}\,du
= (τ − 1) \frac{Γ(k + δ)}{(k − m)!\,Γ(m + δ)} \frac{Γ(k − m + 1)Γ(m + 2 + δ + δ/m)}{Γ(k + 3 + δ + δ/m)}
= (τ − 1) \frac{Γ(k + δ)Γ(m + 2 + δ + δ/m)}{Γ(m + δ)Γ(k + 3 + δ + δ/m)}. (5.3.13)
Since τ − 1 = (2m + δ)/m by (1.3.63), this proves (5.3.6).
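The mixed-Poisson representation used in this proof can be checked numerically. The sketch below (illustrative only; it assumes NumPy and evaluates (5.3.13) via log-Gamma functions for stability) samples D = m + Poi(Γ_∅κ(A_∅)) and compares the empirical mass function with the closed form.

```python
import numpy as np
from math import lgamma, exp

def pk(k, m, delta):
    """The limiting law (5.3.13), valid for k >= m."""
    tau = 3 + delta / m
    return (tau - 1) * exp(lgamma(k + delta)
                           + lgamma(m + 2 + delta + delta / m)
                           - lgamma(m + delta)
                           - lgamma(k + 3 + delta + delta / m))

m, delta = 2, 0.5
tau = 3 + delta / m
rng = np.random.default_rng(5)
N = 200_000
a = rng.random(N)                              # ages of the root
gam = rng.gamma(m + delta, size=N)             # Gamma strengths
kappa = (1 - a ** (1 / (tau - 1))) / a ** (1 / (tau - 1))
D = m + rng.poisson(gam * kappa)               # root degree, cf. (5.3.9)
for k in range(m, m + 4):
    print(k, "empirical %.4f" % np.mean(D == k), "formula %.4f" % pk(k, m, delta))
```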
We next extend this to the convergence in distribution of D'_o(n), for which we again note that local convergence implies the convergence of the degree distribution of neighbors of the root, and so in particular of D'_o(n). It thus suffices to study the distribution of the degree of a uniform neighbor of the root in the Pólya point tree. We first condition on the age A_∅ = U_∅ of the root of the Pólya point tree, where U_∅ is standard uniform, and recall that the age A_{∅1} of one of the m older vertices to which ∅ is connected has distribution A_{∅1} = U_{∅1}^{1/χ} A_∅, where U_{∅1} is uniform on [0, 1] and 1/χ = (τ − 1)/(τ − 2) by (5.3.4). Let D' be the degree of vertex ∅1. By (5.3.5), conditioning on A_{∅1} = b, the degree D' is m plus a Poisson variable with parameter
\int_b^1 \frac{Γ_{∅1}}{τ − 1} \frac{x^{1/(τ−1)−1}}{b^{1/(τ−1)}}\,dx = Γ_{∅1} \frac{1 − b^{1/(τ−1)}}{b^{1/(τ−1)}} ≡ Γ_{∅1} κ(b), (5.3.14)
P(D' = k \mid A_{∅1} = b) = \int_0^∞ P(D' = k \mid A_{∅1} = b, Γ_{∅1} = y) \frac{y^{m+δ}}{Γ(m + 1 + δ)} e^{−y}\,dy
= \int_0^∞ e^{−yκ(b)} \frac{(yκ(b))^{k−m}}{(k − m)!} \frac{y^{m+δ}}{Γ(m + 1 + δ)} e^{−y}\,dy
= \frac{κ(b)^{k−m}}{(1 + κ(b))^{k+1+δ}} \frac{Γ(k + 1 + δ)}{(k − m)!\,Γ(m + 1 + δ)} (5.3.15)
= \frac{Γ(k + 1 + δ)}{(k − m)!\,Γ(m + 1 + δ)} (1 − b^{1/(τ−1)})^{k−m} b^{(m+1+δ)/(τ−1)},
where we again use that κ(b)/(1 + κ(b)) = 1 − b^{1/(τ−1)} and 1/(1 + κ(b)) = b^{1/(τ−1)}.
We next use that A_{∅1} = U_{∅1}^{1/χ} A_∅, where A_∅ is uniform on [0, 1]. Recall that the vector (A_∅, U_{∅1}) has density 1 on [0, 1]^2. Define the random vector (A_∅, A_{∅1}) = (A_∅, U_{∅1}^{1/χ} A_∅), so that (A_∅, A_{∅1}) has joint density \frac{τ − 2}{τ − 1} a^{−(τ−2)/(τ−1)} b^{−1/(τ−1)} on {(a, b) : b ≤ a}. Thus, P(D' = k) equals
\frac{τ − 2}{τ − 1} \int_0^1 \int_0^a a^{−(τ−2)/(τ−1)} b^{−1/(τ−1)}\,P(D' = k \mid A_{∅1} = b)\,db\,da
= \frac{τ − 2}{τ − 1} \frac{Γ(k + 1 + δ)}{(k − m)!\,Γ(m + 1 + δ)} \int_0^1 \int_0^a a^{−(τ−2)/(τ−1)} (1 − b^{1/(τ−1)})^{k−m} b^{(m+1+δ)/(τ−1)−1/(τ−1)}\,db\,da (5.3.17)
= (τ − 2)(τ − 1) \frac{Γ(k + 1 + δ)}{(k − m)!\,Γ(m + 1 + δ)} \int_0^1 \int_0^u (1 − v)^{k−m} v^{m+1+δ+δ/m}\,dv\,du,
where we have now used the integral transform a = u^{τ−1} and b = v^{τ−1}. Recall (5.3.12).
Interchanging the integrals over u and v thus leads to the conclusion that P(D' = k) equals
(τ − 2)(τ − 1) \frac{Γ(k + 1 + δ)}{(k − m)!\,Γ(m + 1 + δ)} \int_0^1 (1 − v)^{k−m+1} v^{m+1+δ+δ/m}\,dv
= (τ − 2)(τ − 1) \frac{Γ(k + 1 + δ)}{(k − m)!\,Γ(m + 1 + δ)} \frac{Γ(m + 2 + δ + δ/m)Γ(k − m + 2)}{Γ(k + 4 + δ + δ/m)}
= \frac{2m + δ}{m^2} \frac{Γ(m + 2 + δ + δ/m)}{Γ(m + δ)} \frac{(k − m + 1)Γ(k + 1 + δ)}{Γ(k + 4 + δ + δ/m)}, (5.3.18)
as required.
Here the latter equality follows simply by induction on k ≥ 1 (see Exercise 5.16). Finally, let I_k^{(n)} = [S_{k−1}^{(n)}, S_k^{(n)}). We now construct a graph as follows:
• conditional on ψ_1, . . . , ψ_n, choose (U_{k,i})_{k∈[n],i∈[m]} as a sequence of independent random variables, with U_{k,i} chosen uar from the (random) interval [0, S_{k−1}^{(n)}];
• for k ∈ [n] and j < k, join the vertices j and k when U_{k,i} ∈ I_j^{(n)} for some i ∈ [m] (with multiple edges between j and k if there are several such i).
Call the resulting random multi-graph on [n] the finite-size Pólya graph of size n. The main result for PA_n^{(m,δ)}(d) is as follows:
Theorem 5.10 (Finite-graph Pólya version of PA_n^{(m,δ)}(d)) Fix m ≥ 1 and δ > −m. Then, the distribution of PA_n^{(m,δ)}(d) is the same as that of the finite-size Pólya graph of size n.
The importance of Theorem 5.10 is that the edges in the finite-size Pólya graph are independent conditional on the Beta variables (ψ_k)_{k∈[n]}, in a similar way as for (5.2.15) in Theorem 5.3. This independence makes explicit computations possible. Exercises 5.17–5.18, for example, use Theorem 5.10 to derive properties of the number of multiple edges in PA_n^{(m,δ)}(d) for m = 2.
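The construction preceding Theorem 5.10 is short enough to implement directly. The sketch below (illustrative; the function name is ours, and we read the construction as letting each vertex k = 2, . . . , n send out m edges while vertex 1 only receives) samples the Beta strengths, builds the intervals I_j^{(n)} from the S_k^{(n)}, and places edges by locating uniform points in those intervals.

```python
import numpy as np

def finite_size_polya_graph(n, m, delta, rng):
    """Sample the finite-size Polya graph: edges are conditionally
    independent given the Beta strengths (psi_k)."""
    psi = np.zeros(n + 1)
    psi[1] = 1.0                                     # convention psi_1 = 1
    for k in range(2, n + 1):
        psi[k] = rng.beta(m + delta, (2 * k - 3) * m + delta * (k - 1))
    S = np.ones(n + 1)                               # S[k] = S_k^{(n)}, S[n] = 1
    for k in range(n - 1, -1, -1):
        S[k] = S[k + 1] * (1 - psi[k + 1])           # S_0 = 0 since psi_1 = 1
    edges = []
    for k in range(2, n + 1):
        u = rng.uniform(0, S[k - 1], size=m)         # U_{k,i} on [0, S_{k-1}]
        hits = np.searchsorted(S, u, side="right")   # u in I_j = [S_{j-1}, S_j)
        edges.extend((k, int(j)) for j in hits)
    return edges

rng = np.random.default_rng(2)
edges = finite_size_polya_graph(n=200, m=2, delta=0.0, rng=rng)
print("degree of vertex 1:", sum((u == 1) + (v == 1) for u, v in edges))
```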
In terms of the above Pólya point tree, the proof shows that the Gamma variables that define the “strengths” Γ_w are inherited from the Beta random variables (ψ_k)_{k∈[n]}, while the age variables A_w are inherited from the random variables (S_k^{(n)})_{k∈[n]} (see Lemmas 5.17 and 5.18 below).
while, for j ≥ 3,
ψ'_j ∼ Beta\big(m + δ, (2j − 1)m + δ(j − 1)\big). (5.3.21)
Recall also Theorem 5.7. The above changes affect the finite-size Pólya graph in only a minor way.
Let us give some insight into the proof of Theorem 5.10, after which we give two full
proofs. The first proof relies on Pólya urn methods and the second on a direct computation.
For the Pólya urn proof, we rely on the fact that there is a close connection between
the preferential attachment model and the Pólya urn model, in the following sense. Every
new connection that a vertex gains can be represented by a new ball added to the urn cor-
responding to that vertex, as in Theorems 5.6 and 5.7. As time progresses, the number of
urns corresponding to the vertices changes, which is a major complication. As it turns out,
however, the attachment probabilities are consistent, which allows the Pólya urn description
to be extended to this setting of increasing numbers of urns. Let us now make this intuition
precise.
Pólya urn proof of Theorem 5.10. Let us first consider a two-urn model, where the number of balls in one urn represents the degree of a particular vertex k, and the number of balls in the other represents the sum of the degrees of the vertices [k − 1], as in Theorems 5.6 and 5.7. We start this process at the point when n = k, and k has connected to precisely m vertices in [k − 1]. Note that at this point, by the structure of PA_n^{(m,δ)}(d), the urn representing the degree of vertex k has m balls, while the other urn, corresponding to the vertices in [k − 1], has (2k − 3)m balls.
Consider a time in the evolution of PA_n^{(m,δ)}(d) when we have n − 1 ≥ k old vertices and i − 1 edges between the new vertex n and [n − 1]. Assume that at this point the degree of k is d_k and the sum of the degrees of the vertices in [k − 1] is d_{[k−1]}. The probability that the ith edge from n to [n − 1] is attached to k is then
\frac{d_k + δ}{2m(n − 1) + (1 + δ)(i − 1)}, (5.3.22)
while the probability that it is connected to a vertex in [k − 1] is equal to
\frac{d_{[k−1]} + δ(k − 1)}{2m(n − 1) + (1 + δ)(i − 1)}. (5.3.23)
Thus, conditioned on connecting to [k], the probability that the ith edge from n to [n − 1] is attached to k is (d_k + δ)/(kδ + d_{[k]}), while the probability that the ith edge from n to [n − 1] is attached to [k − 1] is (d_{[k−1]} + δ(k − 1))/(kδ + d_{[k]}).
Taking into account that the two urns start with m and (2k − 3)m balls, respectively, we see that the evolution of the two urns is a Pólya urn with strengths ψ_k and 1 − ψ_k, where ψ_k ∼ Beta(m + δ, (2k − 3)m + δ(k − 1)) (recall Theorem 5.3). We next use this to complete the proof of Theorem 5.10, where we use induction. Indeed, using the two-urn process as an inductive input, we construct the finite-size Pólya graph defined in Theorem 5.10 in a similar way as for the Pólya urns with multiple colors in (5.2.28).
Let X_t ∈ [⌈t/m⌉] be the vertex receiving the tth edge (the other endpoint of this edge being the vertex ⌈t/m⌉ + 1). For t ∈ [m], X_t is deterministic (and equal to 1, since we start at time 1 with two vertices and m edges between them); however, beginning at t = m + 1, we have a two-urn model, starting with m balls in each urn. As shown above, the two urns can be described as Pólya urns with strengths 1 − ψ_2 and ψ_2. Once t > 2m, X_t can take three values but, conditioned on X_t ≤ 2, the process continues to be a two-urn model with strengths 1 − ψ_2 and ψ_2.
To determine the probability of the event that Xt ≤ 2, we now use the above two-urn
model with k = 3, which gives that the probability of the event Xt ≤ 2 is 1 − ψ3 , at least as
long as t ≤ 3m. Combining these two-urn models, we get a three-urn model with strengths
(1 − ψ2 )(1 − ψ3 ), ψ2 (1 − ψ3 ), and ψ3 . Again, this model remains valid for t > 3m, as long
as we condition on Xt ≤ 3. Continuing inductively, we see that the sequence Xt evolves in
stages:
• For t ∈ [m], the variable X_t is deterministic: X_t = 1.
• For t = m + 1, . . . , 2m, the distribution of X_t ∈ {1, 2} is described by a two-urn model with strengths 1 − ψ_2 and ψ_2, where ψ_2 ∼ Beta(m + δ, m + δ).
• In general, for t = m(k − 1) + 1, . . . , km, the distribution of X_t ∈ [k] is described by a k-urn model with strengths
φ_j^{(k)} = ψ_j \prod_{i=j+1}^k (1 − ψ_i), \qquad j ∈ [k]. (5.3.24)
Here the Beta variable ψ_k in (5.3.19) is chosen at the beginning of the kth stage, independently of the previously chosen strengths ψ_1, . . . , ψ_{k−1} (for convenience, we set ψ_1 = 1).
Note that the random variables φ_j^{(k)} can be expressed in terms of the random variables introduced in Theorem 5.10 as follows. By (5.3.20), S_k^{(n)} = \prod_{j=k+1}^n (1 − ψ_j). This implies that φ_j^{(n)} = ψ_j S_j^{(n)}, which relates the strengths φ_j^{(n)} to the random variables defined right before Theorem 5.10, and shows that the process derived above is indeed the process given in the theorem.
We next give a direct proof of Theorem 5.10, which is of independent interest as it also
indicates how the conditional independence of edges can be used effectively:
Direct proof of Theorem 5.10. In what follows, we let PA'_n denote the finite-size Pólya graph of size n. Our aim is to show that P(PA'_n = G) = P(PA_n^{(m,δ)}(d) = G) for any graph G. Here, we think of G as a directed and edge-labeled graph, where every vertex has out-degree m and the out-edges are labeled as [m]. Thus, the out-edges are the edges from young to old. Indeed, recall from Section 1.3.5 that the graph starts at time 1 with two vertices and m edges between them. The vertex set of PA_n^{(m,δ)}(d) is [n]. In the proof, it is convenient
to denote the labeled edge set of G as \vec{E}(G) = {(u, v_j(u), j) : u ∈ [n], j ∈ [m]}, where v_j(u) < u is the vertex to which the jth edge of u is attached in G. We can assume that v_j(2) = 1 for all j ∈ [m], since PA_n^{(m,δ)}(d) starts at time 1 with two vertices having m edges between them.
Fix an edge-labeled graph G for which P(PA_n^{(m,δ)}(d) = G) > 0. On the one hand, we can compute directly that
P(PA_n^{(m,δ)}(d) = G) = \prod_{u∈[3,n], j∈[m]} \frac{d^{(G)}_{v_j(u)}(u) + δ}{2m(u − 2) + j − 1 + δ(u − 1)}. (5.3.25)
Thus, we are left with showing that P(PA'_n = G) is equal to the rhs of (5.3.27).
To identify P(PA'_n = G), it is convenient to condition on the Beta variables (ψ_j)_{j∈[n]}. We denote the conditional measure by P_n; i.e., for every event E,
P_n(E) = P(E \mid (ψ_j)_{j∈[n]}). (5.3.28)
The advantage of this measure is that the edges are now conditionally independent, which allows us to give an exact formula for the probability that a certain graph occurs. We start by computing the edge probabilities under P_n, where we recall that {u \stackrel{j}{→} v} is the event that the jth edge of u connects to v:
Lemma 5.12 (Edge probabilities in PA'_n conditioned on Beta variables) Fix m ≥ 1 and δ > −m, and consider PA'_n. For any u > v and j ∈ [m],
P_n(u \stackrel{j}{→} v) = ψ_v (1 − ψ)_{(v,u)}, (5.3.29)
where, for A ⊆ [n],
(1 − ψ)_A = \prod_{a∈A} (1 − ψ_a). (5.3.30)
Proof Recall the construction between (5.3.19) and Theorem 5.10. When we condition on (ψ_j)_{j∈[n]}, the only randomness left is that in the uniform random variables (U_{k,i})_{k∈[n],i∈[m]}, where U_{k,i} is uniform on [0, S_{k−1}^{(n)}]. Then, u \stackrel{j}{→} v occurs precisely when U_{u,j} ∈ I_v^{(n)} = [S_{v−1}^{(n)}, S_v^{(n)}), which occurs with P_n-probability equal to |I_v^{(n)}|/S_{u−1}^{(n)}. Note that
|I_v^{(n)}| = S_v^{(n)} − S_{v−1}^{(n)} = (1 − ψ)_{[v+1,n]} − (1 − ψ)_{[v,n]} = ψ_v (1 − ψ)_{(v,n]}, (5.3.31)
where p_s = p_s^{(G)} and q_s = q_s^{(G)} are given by
p_s = d_s^{(G)} − m, \qquad q_s = \sum_{u∈[3,n]} \sum_{j∈[m]} 1_{\{s∈(v_j(u),u)\}}. (5.3.35)
Proof Multiply the factors P_n(u \stackrel{j}{→} v_j(u)) for every labeled edge (u, v_j(u), j) ∈ \vec{E}(G), and collect the powers of ψ_s and 1 − ψ_s.
We note that ps equals the number of edges in the graph G that point towards s. This is
relevant, since in (5.3.29) in Lemma 5.12 every older vertex v in an edge receives a factor
ψv . Further, again by (5.3.29), there are factors 1 − ψs for every s ∈ (v, u) and all edges
(v, u), so qs counts how many factors 1 − ψs occur.
When taking expectations wrt (ψ_v)_{v∈[n]}, by Corollary 5.13, we obtain expectations of the form E[ψ^p (1 − ψ)^q], where ψ ∼ Beta(α, β) and p, q ≥ 0. These are computed in the following lemma:
Lemma 5.14 (Expectations of powers of Beta variables) For all p, q ∈ N and ψ ∼ Beta(α, β),
E[ψ^p (1 − ψ)^q] = \frac{(α + p − 1)_p (β + q − 1)_q}{(α + β + p + q − 1)_{p+q}}, (5.3.36)
where, as before, (x)_m = x(x − 1) · · · (x − m + 1) denotes the mth falling factorial of x.
Proof A direct computation based on the density of a Beta random variable in (1.5.2) yields
E[ψ^p (1 − ψ)^q] = \frac{B(α + p, β + q)}{B(α, β)} = \frac{Γ(α + β)}{Γ(α)Γ(β)} \frac{Γ(α + p)Γ(β + q)}{Γ(α + β + p + q)} = \frac{(α + p − 1)_p (β + q − 1)_q}{(α + β + p + q − 1)_{p+q}}. (5.3.37)
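Lemma 5.14 lends itself to a quick Monte Carlo sanity check; the sketch below (illustrative only, with arbitrary parameters) compares a simulated moment with the falling-factorial expression (5.3.36).

```python
import numpy as np

def falling(x, k):
    """Falling factorial (x)_k = x(x-1)...(x-k+1)."""
    out = 1.0
    for i in range(k):
        out *= x - i
    return out

alpha, beta, p, q = 2.5, 4.0, 3, 2
rhs = (falling(alpha + p - 1, p) * falling(beta + q - 1, q)
       / falling(alpha + beta + p + q - 1, p + q))      # (5.3.36)
rng = np.random.default_rng(4)
psi = rng.beta(alpha, beta, size=1_000_000)
print("Monte Carlo %.5f vs formula %.5f" %
      (np.mean(psi ** p * (1 - psi) ** q), rhs))
```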
The above computation, when applied to Corollary 5.13, leads to the following expression for the probability of observing a particular edge-labeled multi-graph G:
Corollary 5.15 (Graph probabilities in PA'_n) Fix m ≥ 1 and δ > −m, and consider PA'_n. For any edge-labeled multi-graph G,
P(PA'_n = G) = \prod_{s=2}^{n−1} \frac{(α + p_s − 1)_{p_s} (β_s + q_s − 1)_{q_s}}{(α + β_s + p_s + q_s − 1)_{p_s+q_s}}, (5.3.38)
where α = m + δ, β_s = (2s − 3)m + δ(s − 1), and p_s = p_s^{(G)} and q_s = q_s^{(G)} are defined in (5.3.35).
Note that the contribution for s = n equals 1, since p_n^{(G)} = d_n^{(G)} − m = 0 almost surely in PA'_n. Corollary 5.15 allows us to complete the direct proof of Theorem 5.10:
Corollary 5.16 (Graph probabilities in PA'_n and PA_n^{(m,δ)}(d)) Fix m ≥ 1 and δ > −m, and consider PA'_n and PA_n^{(m,δ)}(d). For any edge-labeled multi-graph G,
P(PA'_n = G) = P(PA_n^{(m,δ)}(d) = G), (5.3.39)
where α = m + δ, β_s = (2s − 3)m + δ(s − 1), and p_s = p_s^{(G)} and q_s = q_s^{(G)} are defined in (5.3.35). Consequently, Corollaries 5.13 and 5.15 also hold for PA_n^{(m,δ)}(d).
Proof We evaluate (5.3.38) in Corollary 5.15 explicitly. Since α = m + δ and p_s = d_s^{(G)} − m,
(α + p_s − 1)_{p_s} = \prod_{i=0}^{d_s^{(G)}−m−1} (i + m + δ), (5.3.40)
so that
\prod_{s=2}^{n−1} (α + p_s − 1)_{p_s} = \prod_{s=2}^{n−1} \prod_{i=0}^{d_s^{(G)}−m−1} (i + m + δ), (5.3.41)
which produces the first product in (5.3.27), except for the s = 1 factor.
We next identify the other factors, for which we start by analyzing q_s^{(G)}. We can use
\sum_{u∈[3,n]} \sum_{j∈[m]} 1_{\{v_j(u)∈[s−1]\}} = d_{[s−1]}^{(G)} − m(s − 1), (5.3.43)
since the lhs counts the in-edges in [s − 1] except for those from vertex 2 to vertex 1, while d_{[s−1]}^{(G)} counts all in- and out-edges, and there are exactly m(s − 2) out-edges in [s − 1] and m edges from vertex 2 to vertex 1. Further, note that, for s ∈ [n],
\sum_{u∈[3,n]} \sum_{j∈[m]} 1_{\{s∈[u,n]\}} = m \sum_{u∈[3,s]} 1_{\{s∈[u,n]\}} = m(s − 2). (5.3.44)
Thus,
q_s^{(G)} = d_{[s−1]}^{(G)} − m(2s − 3). (5.3.45)
As a result, by (5.3.35) and the recursions
p_s^{(G)} + q_s^{(G)} = q_{s+1}^{(G)} + m, \qquad α + β_s = β_{s+1} − m, (5.3.46)
we obtain
(α + β_s + p_s + q_s − 1)_{p_s+q_s} = (β_{s+1} + q_{s+1} − 1)_{q_{s+1}+m} = (β_{s+1} + q_{s+1} − 1)_{q_{s+1}} (β_{s+1} − 1)_m. (5.3.47)
Therefore, by (5.3.45) and since β_s = (2s − 3)m + δ(s − 1),
\prod_{s=2}^{n−1} \frac{(β_s + q_s − 1)_{q_s}}{(α + β_s + p_s + q_s − 1)_{p_s+q_s}} = \prod_{s=2}^{n−1} \frac{1}{(β_{s+1} − 1)_m} \prod_{s=2}^{n−1} \frac{(β_s + q_s − 1)_{q_s}}{(β_{s+1} + q_{s+1} − 1)_{q_{s+1}}}
= (δ + d_1^{(G)} − 1)_{d_1^{(G)}−m} \prod_{s=2}^{n−1} \frac{1}{(m(2s − 1) + δs − 1)_m}. (5.3.48)
Indeed, the starting value in the telescoping product equals, again by (5.3.45), with β_2 = m + δ and q_2 = d_1^{(G)} − m,
where j = m − i + 1, as required.
In this section we complete the proof of the local convergence of preferential attachment
models to the Pólya point tree in Theorem 5.8. This section is organized as follows. In Sec-
tion 5.4.1 we discuss some necessary preliminaries, such as the convergence of the rescaled
Beta variables to Gamma variables and the regularity properties of the Pólya point tree. Our
local convergence proof again relies on a second-moment method for the number of vertices
whose ordered r-neighborhood agrees with a specific ordered tree t. We investigate the first
moment of the subgraph counts in Section 5.4.2 and handle the second moment in Section
5.4.3. We close in Section 5.4.4 by discussing the local convergence of related preferential
attachment models.
This proves the upper bound in (5.4.1) for k ≥ K and K large enough. The inequality in
(5.4.4) is obviously true for x ≥ 1, so we will assume that x ∈ [0, 1) from now on. Then
we can write, with b = (2k − 3)m + δ(k − 1) − 1,
Note that y ↦ e^{−by}(1 − y)^{−b} is increasing, while y ↦ 1_{\{y≤x\}} is decreasing, so that, by the correlation inequality in [V1, Lemma 2.14],
as required.
We continue with the lower bound in (5.4.1), and now instead aim to prove that, for x ≤ (log k)^2/b and again with b = (2k − 3)m + δ(k − 1) − 1,
We now write
P(ψ_k ≤ (1 − ε)x) = \frac{\int_0^{(1−ε)xb} y^{α_k−1} (1 − y/b)^b\,dy}{\int_0^b y^{α_k−1} (1 − y/b)^b\,dy} = \frac{E[1_{\{χ'_k≤(1−ε)xb\}} e^{χ'_k} (1 − χ'_k/b)^b]}{E[1_{\{χ'_k≤b\}} e^{χ'_k} (1 − χ'_k/b)^b]} = \frac{E[1_{\{χ'_k≤(1−ε)xb\}} e^{χ'_k} (1 − χ'_k/b)^b \mid χ'_k ≤ b]}{E[e^{χ'_k} (1 − χ'_k/b)^b \mid χ'_k ≤ b]}. (5.4.8)
Thus, for the lower bound in (5.4.1), it suffices to show that, for all k large enough and for x ≤ (log k)^2/b,
P(χ'_k ≤ (1 − ε)xb \mid χ'_k ≤ b) ≤ P(χ'_k ≤ xb). (5.4.10)
In turn, this follows from the statement that e(x) ≥ 0 for x ≤ (log k)^2/b, where
e(x) = P(xb(1 − ε) < χ'_k ≤ bx) − P(χ'_k > b)\,P(χ'_k ≤ xb). (5.4.12)
We bound the first term on the rhs of (5.4.12) from below as follows:
and
P(χ'_k ≤ xb) = \int_0^{xb} \frac{y^{m+δ−1}}{Γ(m + δ)} e^{−y}\,dy ≤ \frac{[xb]^{m+δ−1}}{Γ(m + δ)} \int_0^{xb} e^{−y}\,dy ≤ \frac{[xb]^{m+δ−1}}{Γ(m + δ)}. (5.4.15)
Substitution yields that
e(x) ≥ \frac{[xb]^{m+δ−1}}{Γ(m + δ)} \big[(1 − ε)^{m+δ−1} (e^{−xb(1−ε)} − e^{−bx}) − 2^{m+δ} e^{−b/2}\big], (5.4.16)
which, for any ε ∈ (0, 1), is non-negative for all x < 1/3, say, and b = (2k − 3)m + δ(k − 1) sufficiently large. This is much more than is needed.
We complete the proof by showing that χ_k ≤ (log k)^2 for all k ≥ K with probability at least 1 − ε, for which we note that
P(χ_k ≥ (log k)^2) ≤ E[e^{χ_k/2}]\,e^{−(log k)^2/2} = 2^{m+δ} e^{−(log k)^2/2}, (5.4.17)
Proposition 5.18 (Asymptotics of S_k^{(n)}) Recall that χ = (m + δ)/(2m + δ). For every ε > 0, there exist η > 0 and K < ∞ such that, for all n ≥ K and with probability at least 1 − ε,
\max_{k∈[n]} \big|S_k^{(n)} − (k/n)^χ\big| ≤ η, (5.4.18)
and
\big|S_k^{(n)} − (k/n)^χ\big| ≤ ε (k/n)^χ \qquad for all k ∈ [n] \setminus [K]. (5.4.19)
Proof We will give the intuition behind Proposition 5.18. We recall from (5.3.20) that S_k^{(n)} = \prod_{i=k+1}^n (1 − ψ_i), where (ψ_k)_{k∈[n]} are independent random variables. We write
\log S_k^{(n)} = \sum_{i=k+1}^n \log(1 − ψ_i) = \sum_{i=k+1}^n E[\log(1 − ψ_i)] + \sum_{i=k+1}^n \big(\log(1 − ψ_i) − E[\log(1 − ψ_i)]\big). (5.4.20)
Note that (M_n)_{n≥k}, with M_n = \sum_{i=k+1}^n (\log(1 − ψ_i) − E[\log(1 − ψ_i)]), is a martingale. Thus, by Kolmogorov’s inequality in (1.5.5), for all t ≥ k,
P\Big(\sup_{k≤n≤t} \Big|\sum_{i=k+1}^n \big(\log(1 − ψ_i) − E[\log(1 − ψ_i)]\big)\Big| ≥ ε\Big) ≤ ε^{−2} \sum_{i=k+1}^t Var(\log(1 − ψ_i)). (5.4.21)
Using that |\log(1 − x)| ≤ x/(1 − x) for all x ∈ [0, 1), together with Lemma 5.14, we obtain the bound
Var(\log(1 − ψ_i)) ≤ E[(\log(1 − ψ_i))^2] ≤ E\Big[\frac{ψ_i^2}{(1 − ψ_i)^2}\Big] = O(i^{−2}), (5.4.22)
so that, for all t ≥ k,
P\Big(\sup_{k≤n≤t} \Big|\log S_k^{(n)} − \sum_{i=k+1}^n E[\log(1 − ψ_i)]\Big| ≥ ε\Big) ≤ \frac{C}{ε^2} \sum_{i≥k} \frac{1}{i^2}, (5.4.23)
which can be made small by letting k ≥ K and K be large. This shows that the random part in (5.4.20) is whp small for k ≥ K.
To compute the asymptotics of the deterministic first part in (5.4.20), we now use that x ≤ −\log(1 − x) ≤ x + x^2/(1 − x) for all x ∈ [0, 1), so that
0 ≤ \sum_{i=k+1}^n \big(−E[\log(1 − ψ_i)]\big) − \sum_{i=k+1}^n E[ψ_i] ≤ \sum_{i=k+1}^n E\Big[\frac{ψ_i^2}{1 − ψ_i}\Big] ≤ C \sum_{i≥k} \frac{1}{i^2}, (5.4.24)
which can again be made small when k ≥ K with K large. Further, by Lemma 5.14, with α = m + δ and β_i = (2i − 3)m + δ(i − 1), we have
\sum_{i=k+1}^n E[ψ_i] = \sum_{i=k+1}^n \frac{m + δ}{(2i − 2)m + δi} = \frac{m + δ}{2m + δ} \log(n/k) + O(1/k) = χ \log(n/k) + O(1/k), (5.4.25)
since \sum_{i=k+1}^n 1/i = \log(n/k) + O(1/k). We conclude that
\log S_k^{(n)} = χ \log(k/n) + O(1/k), (5.4.26)
which completes the proof of (5.4.19). The proof of (5.4.18) follows easily, since (5.4.19) is stronger for k ≥ K, while E[S_k^{(n)}] = o(1) for k ∈ [K]. We omit further details.
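Proposition 5.18 can be illustrated by sampling the Beta variables once. The sketch below (illustrative; parameters are arbitrary) computes S_k^{(n)} = ∏_{j=k+1}^n (1 − ψ_j) and compares it with (k/n)^χ for a few values of k.

```python
import numpy as np

m, delta = 2, 1.0
chi = (m + delta) / (2 * m + delta)
n = 10_000
rng = np.random.default_rng(6)
psi = np.array([rng.beta(m + delta, (2 * k - 3) * m + delta * (k - 1))
                for k in range(2, n + 1)])      # psi[i] corresponds to psi_{i+2}
S = np.ones(n + 1)                              # S[k] = S_k^{(n)}, S[n] = 1
for k in range(n - 1, 0, -1):
    S[k] = S[k + 1] * (1 - psi[k - 1])          # multiply by 1 - psi_{k+1}
for k in (100, 1000, 5000):
    print(k, "S_k^(n) = %.4f" % S[k], "(k/n)^chi = %.4f" % ((k / n) ** chi))
```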
density of the ages in the Pólya point tree. For this, it is useful to have a description of this joint density.
Before we can formulate our main result concerning this joint density, we introduce some further notation. Recall the definition of the Pólya point tree in Section 5.3.1, and also the Poisson intensities in (5.3.5) and the corresponding Gamma variables in (5.3.2). Below, we write x ↦ ρ_w(x; Γ_w) for the Poisson intensity in (5.3.5) conditioned on the Gamma variable Γ_w.
Fix an ordered tree t, and let (G, o) be the Pólya point tree. In what follows, it is useful to regard B_r^{(G)}(v) as a rooted edge-marked graph, where an edge receives a label in [m] corresponding to the label of the directed edge (directed from young to old) that gives rise to that edge (in either possible direction). Thus, in the pre-limiting preferential attachment model, the edge {u, v} receives label j when u \stackrel{j}{→} v or when v \stackrel{j}{→} u.
We denote this marked ordered neighborhood as \bar{B}_r^{(G)}(o). The edge labels are almost contained in the ordered tree t, but not quite, since when a vertex has label Y it is unclear which edge of its parent gave rise to this connection, and, together with the m − 1 edge labels of its older children, these edge labels should equal [m]. We slightly abuse notation and also write t for this edge-labeled tree, and we write \bar{B}_r^{(G)}(o) = t when the two graphs are the same as edge-labeled trees.
For (a_w)_{w∈V(t)} ∈ [0, 1]^{|V(t)|}, we define f_t\big((a_w)_{w∈V(t)}\big) to be the density of the ages in the Pólya point tree when the ordered r-neighborhood \bar{B}_r^{(G)}(o) in the Pólya point tree equals t. Thus,
μ\big(\bar{B}_r^{(G)}(o) = t,\ A_w ∈ a_w\,da_w\ ∀w ∈ V(t)\big) = f_t\big((a_w)_{w∈V(t)}\big) \prod_{w∈V(t)} da_w. (5.4.27)
Note that (a_w)_{w∈V(t)} ↦ f_t\big((a_w)_{w∈V(t)}\big) is a sub-probability measure, as it need not integrate to 1. We let \bar{t} denote a rooted vertex- and edge-marked tree, where the vertex labels corresponding to the ages of the nodes are in [0, 1] and the edge labels are in [m]. Thus,
\bar{t} = \big(t, (a_w)_{w∈V(t)}\big), (5.4.28)
where a_w ∈ [0, 1] is the age of w ∈ V(t). The following proposition identifies the density f_t\big((a_w)_{w∈V(t)}\big) in (5.4.27), which corresponds to the density of the ages in the Pólya point tree when the edge-marked neighborhood equals t:
Proposition 5.19 (Joint density of the Pólya point tree) The density f_t\big((a_w)_{w∈V(t)}\big) in (5.4.27) satisfies
f_t\big((a_w)_{w∈V(t)}\big) = E\big[g_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big)\big], (5.4.29)
where (χ_w)_{w∈V(t)} are iid Gamma variables with parameters r = m + δ and λ = 1, and, with d_w^{(in)}(\bar{t}) = \#\{\{v, w\} ∈ E(t) : a_v > a_w\} the in-degree of w in \bar{t},
g_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big) = \prod_{w∈V(t)} \Big(\frac{χ_w}{2m + δ}\Big)^{d_w^{(in)}(\bar{t})} \prod_{w∈V°(t)} e^{−\int_{a_w}^1 ρ_w(dt;χ_w)} \prod_{(w,wℓ)∈E(t)} \frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}, (5.4.30)
where V°(t) denotes the set of vertices in the tree t that are at a distance strictly smaller than r from the root.
Proof The proof is split into several steps. We start by removing the size-biasing of the Gamma variables in (5.3.2):
f_t\big((a_w)_{w∈V(t)}\big) = E\big[f_t\big((a_w)_{w∈V(t)}; (Γ_w)_{w∈V(t)}\big)\big]. (5.4.31)
Now recall the size-biasing in (5.3.2), present for all individuals of label O. In terms of these random variables, note that, for each function h : ℝ → ℝ, and using that E[Y] = m + δ,
E[h(Y^\star)] = E\Big[h(Y) \frac{Y}{E[Y]}\Big] = E\Big[h(Y) \frac{Y}{m + δ}\Big]. (5.4.32)
Thus, with (χ_w)_{w∈V(t)} a collection of iid Gamma random variables with parameters m + δ and 1,
f_t\big((a_w)_{w∈V(t)}\big) = E\Big[\prod_{w∈V(t)} \Big(\frac{χ_w}{m + δ}\Big)^{1_{\{label(w)=O\}}} f_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big)\Big], (5.4.33)
where we write label(w) for the label of w. We claim that g_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big) in (5.4.30) is given by
g_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big) = \prod_{w∈V(t)} \Big(\frac{χ_w}{m + δ}\Big)^{1_{\{label(w)=O\}}} f_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big). (5.4.34)
Then d_w(\bar{t}) = d_w^{(in)}(\bar{t}) when w has label Y, while d_w(\bar{t}) = d_w^{(in)}(\bar{t}) − 1 when w has label O. We can then rewrite the first factor on the rhs of (5.4.30) as follows:
for a_{wℓ} ∈ [0, a_w]. These ages are iid, so that the joint density of the O children of w is
\prod_{ℓ=1}^{m_−(w)} \frac{m + δ}{2m + δ}\,\frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}. (5.4.39)
and a_{wℓ} > a_w for all ℓ > m_−(w). Here, we also note that a_{wℓ} < a_{w(ℓ+1)} for all w and ℓ > m_−(w) such that wℓ, w(ℓ + 1) ∈ V(t). By (5.3.5),
ρ_w(a_{wℓ}; χ_w) = \frac{χ_w}{τ − 1} \frac{a_{wℓ}^{1/(τ−1)−1}}{a_w^{1/(τ−1)}} = \frac{m}{2m + δ}\,χ_w\,a_{wℓ}^{−χ} a_w^{−(1−χ)} = \frac{m}{2m + δ}\,χ_w\,\frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}. (5.4.41)
Since the number of ℓ-values with a_{wℓ} > a_w in (5.4.40) equals d_w(\bar{t}), this leads to
e^{−\int_{a_w}^1 ρ_w(dt;χ_w)} \Big(\frac{m}{2m + δ}\Big)^{d_w(\bar{t})} χ_w^{d_w(\bar{t})} \prod_{ℓ : a_{wℓ}>a_w} \frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}. (5.4.42)
When we recall that each ℓ-value with a_{wℓ} > a_w is assigned an edge label in [m], which occurs independently with probability 1/m, the density of the edge-labeled younger children of w is given by
e^{−\int_{a_w}^1 ρ_w(dt;χ_w)} \Big(\frac{1}{2m + δ}\Big)^{d_w(\bar{t})} χ_w^{d_w(\bar{t})} \prod_{ℓ : a_{wℓ}>a_w} \frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}. (5.4.43)
Multiplying Out
We multiply (5.4.39) and (5.4.43) to obtain that the density of the ages of the children of w, for each w ∈ V(t), is given by
χ_w^{d_w(\bar{t})} e^{−\int_{a_w}^1 ρ_w(dt;χ_w)} \Big(\frac{1}{2m + δ}\Big)^{d_w(\bar{t})} \Big(\frac{m + δ}{2m + δ}\Big)^{m_−(w)} \prod_{ℓ : wℓ∈V(t)} \frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}.
The above holds for all w ∈ V°(t), i.e., all w that are at a distance strictly smaller than r from the root ∅. We next multiply over all such w, to obtain
f_t\big((a_w)_{w∈V(t)}; (χ_w)_{w∈V(t)}\big) = \prod_{w∈V°(t)} \Big(\frac{χ_w}{2m + δ}\Big)^{d_w(\bar{t})} \Big(\frac{m + δ}{2m + δ}\Big)^{m_−(w)} e^{−\int_{a_w}^1 ρ_w(dt;χ_w)} × \prod_{(w,wℓ)∈E(t)} \frac{1}{(a_w ∧ a_{wℓ})^{1−χ} (a_w ∨ a_{wℓ})^χ}. (5.4.44)
Since
\sum_{w∈V°(t)} m_−(w) = \sum_{w∈V(t)} 1_{\{label(w)=O\}}, (5.4.45)
this is indeed the same as the rhs of (5.4.37). This proves (5.4.34), and thus (5.4.29).
Lemma 5.20 (Regularity of Pólya point tree) Consider the Pólya point tree (G, ∅). Fix
r ≥ 1 and ε > 0. Then there exist constants η > 0 and K < ∞ such that, with probability
at least 1 − ε,
Proof The proof of this lemma is standard, and can be obtained, for example, by induction on r or by using Proposition 5.19. The last bound follows from the continuous nature of the random variables A_w, which implies that A_w ≠ A_{w'} for all distinct pairs w, w', so that any finite number of them are pairwise separated by at least η for an appropriate η = η(ε), with probability at least 1 − ε.
With B_r^{(G)}(∅) the r-neighborhood of ∅ in the Pólya point tree (which is itself also ordered), we aim to show that
\frac{N_{n,r}(t)}{n} \xrightarrow{P} μ\big(B_r^{(G)}(∅) = t\big), (5.4.47)
where, again, B_r^{(G)}(∅) = t denotes that the ordered trees B_r^{(G)}(∅) and t agree, and μ denotes the law of the Pólya point tree.
Proving the convergence of Nn,r (t) is much harder than for inhomogeneous random
graphs and configuration models, considered in Theorems 3.14 and 4.1, respectively, as the
type of a vertex is crucial in determining the number and types of its children, and the type
space is continuous.
We start with the first moment, for which we note that
where o_n ∈ [n] is a uniform vertex. We do this using an explicit computation, akin to the one used in the direct proof of Theorem 5.10. In fact, we will prove a stronger statement, in which we also study the vertex labels in \bar{B}_r^{(G_n)}(o_n) and compare them with the density in Proposition 5.19. Let us introduce some necessary notation.
Recall the definition of the rooted vertex- and edge-marked tree \bar{t} in (5.4.28), where the vertex labels were in [0, 1] and the edge labels in [m]. We fix a tree t of height exactly r. We let the vertex v_w = ⌈na_w⌉ ∈ [n] correspond to the node in the tree having age a_w. With a slight abuse of notation, we also write \bar{B}_r^{(G_n)}(o_n) = \bar{t} to denote that the vertices, edges, and edge labels in \bar{B}_r^{(G_n)}(o_n) are given by those in \bar{t}. Note that this is rather different from B_r^{(G_n)}(o_n) ≃ t as defined in Definition 2.3, where t was unlabeled and we were investigating whether B_r^{(G_n)}(o_n) and t are isomorphic, and even different from \bar{B}_r^{(G_n)}(o_n) = t as in (5.4.46), where only the edges receive marks, and not the vertices. The definition of \bar{B}_r^{(G_n)}(o) = \bar{t} used here is tailor-made to study the local convergence of PA_n^{(m,δ)}(d) as a marked graph, where the vertex marks denote the vertex labels or ages of the vertices, and the edges also receive marks in [m].
Let \bar{t} = (t, (v_w)_{w∈V(t)}) be the vertex-marked version of t, now with v_w = ⌈na_w⌉ ∈ [n] denoting the vertex label of the tree node w in t (instead of the age as in (5.4.28)). Below, we write v ∈ V(\bar{t}) to indicate that there exists a w ∈ V(t) with v_w = v. Also, let ∂V(\bar{t}) denote the vertices at distance exactly r from the root of \bar{t}, and let V°(\bar{t}) = V(\bar{t}) \setminus ∂V(\bar{t}) denote the restriction of \bar{t} to all vertices at distance at most r − 1 from its root. The main result of this section is the following theorem:
Theorem 5.21 (Marked local weak convergence) Fix m ≥ 1 and δ > −m, and consider
Gn = PA(m,δ)
n (d). Uniformly for vw ≥ εn and χ̂vw ≤ K for all w ∈ V (t), where
χ̂v = fv (ψv ) and when (vw )w∈V (t) are all distinct,
and 1,
E[gt (aw )w∈V (t) ; (χw )w∈V (t) ] = ft (aw )w∈V (t) ,
(5.4.50)
and thus PA(m,δ)
n (d) converges to the Pólya point tree in the marked local weak sense.
By Lemma 5.17, (χ̂v )v∈[n] are iid Gamma variables with parameters m + δ and 1. The
coordinates of the sequence (χw )w∈V (t) defined by
χw = χ̂vw (5.4.51)
are indeed iid when the (vw )w∈V (t) are distinct. By (5.4.50), the relation (5.4.49) can be
seen as a density theorem for the densities of the ages of vertices in r-neighborhoods.
Note that the type space of the Pólya point tree equals S = {Y, O} × [0, ∞) (except for
the root, which has a type only in [0, 1]). However, the {Y, O}-components of the types are
deterministic when one knows the ages in B̄r(Gn ) (on ), so these do not need to receive much
attention in what follows.
We prove Theorem 5.21 below. The main ingredient to the proof is Proposition 5.22,
which gives an explicit description for the lhs of (5.4.49):
Proposition 5.22 (Law of vertex- and edge-marked neighborhoods in PA(m,δ) n (d)) Fix
m ≥ 2 and δ > −m, and consider Gn = PAn (d). Let t̄ = (t, (vw )w∈V (t) ) be a rooted
(m,δ)
vertex- and edge-marked tree with root on . Fix t̄ such that (vw )w∈V (t) ) are all distinct, with
the oldest vertex having age at least εn. Then, for all (ψv )v∈V (t̄) such that ψv ≤ K/v for
all v ∈ V (t̄), as n → ∞,
P B̄r(Gn ) (on ) = t̄ | (ψv )v∈V (t̄)
1 + oP (1) Y p0v Y
= ψv exp − (2m + δ)nψv (v/n)χ (1 − (v/n)1−χ )
n v∈V (t̄) ◦
v∈V (t̄)
Y (βs + qs0 − 1)qs0
× , (5.4.52)
s∈[n]\V (t̄)
(α + βs + qs0 − 1)qs0
1{s∈(v (u),u)} .
X X
qs0 = j
(5.4.54)
u∈V (t̄) j∈[m]
Proof We start by analyzing the conditional law of B̄r(Gn ) (on ) given all (ψv )v∈[n] . After
this, we take the expectation wrt ψv for v 6∈ B̄r(Gn ) (on ) to get the claim.
We first condition on all (ψv )v∈[n] and use Lemma 5.12 to obtain, for a vertex-marked edge-
labeled tree t̄,
n
1 Y p0v Y 0
Pn (B̄r(Gn ) (on ) = t̄) = ψv (1 − ψs )qs
n v∈V (t̄) s=2
Y Y
× [1 − Pu,v ], (5.4.56)
v∈V ◦ (t̄) j
u,j : u6 v
where the 1/n is due to the uniform choice of the root, the first double product is due to all
the required edges to ensure that B̄r(Gn ) (on ) ⊆ t̄, while the second double product is due to
all the other edges, which must be absent, so that B̄r(Gn ) (on ) really equals t̄.
No-Further-Edge Probability
We continue by analyzing the second line in (5.4.56), which, for clarity, we call the no-
further-edge probability. First of all, since we are exploring the r-neighborhood of o, the
j
only edges that are not allowed are of the form u v , where v ∈ V ◦ (t̄) and u > v , i.e.,
they are younger vertices than those in V (t̄) that do not form edges in t̄.
◦
Recall that the minimal age of a vertex in t̄ is εn. Further, by Lemma 5.17, with over-
whelming probability, ψv ≤ (log n)2 /n for all v ≥ εn. In particular, Pu,v is small uni-
formly in v ∈ V (t̄) and u > v . Since there are only finitely many elements in V ◦ (t̄), we
can thus approximate as follows:
Y Y Y Y
[1 − Pu,v ] = (1 + oP (1)) [1 − Pu,v ]. (5.4.57)
v∈V ◦ (t̄) j v∈V ◦ (t̄) u,j : u>v
u,j : u6 v
− sχ | −→ 0.
P
We will take v = dsne for some s ∈ [ε, 1]. By Lemma 5.18, sups∈[ε,1] |Sns
(n)
X Z 1
t−χ dt −→ 0.
P
sup 1/Su −
(n)
(5.4.60)
s∈[ε,1] s
u∈(sn,n]
5.4 Proof of Local Convergence for Preferential Attachment Models 223
We conclude that
1
m m
X Z
tχ dt = [1 − s1−χ ]
P
(n) Pu,sn −→ m
nψsn Ssn u∈(sn,n] s 1−χ
= (2m + δ)[1 − s1−χ ]. (5.4.61)
As a result,
X
m Pu,v = (1 + oP (1))nψsn Ssn
(n)
(2m + δ)[1 − s1−χ ]
u∈(sn,n]
Therefore,
Y Y Y
[1 − Pu,v ] = (1 + oP (1)) e−(2m+δ)(vψv )κ(v/n) . (5.4.64)
v∈V (t̄) j v∈V ◦ (t̄)
u,j : u6 v
In order to relate this to the Gamma variables in the description of the Pólya point tree
(recall (5.3.2)–(5.3.5)), we need to look into the precise structure of the rhs of (5.4.52), as
some of its ingredients give rise to the size-biasing in (5.3.2). In the proof below, we restrict
to the (χ̂k )k≥1 for which χ̂v ≤ K for all v ∈ V (t̄), which occurs whp for K large by
Lemma 5.20 and since v > εn.
where the error term is uniform on the event that χ̂v ≤ K for all v ∈ V (t̄).
where we recall ρw from (5.3.5), and its conditional form given Γw denoted by ρw (x; Γw ).
For w ∈ V (t), let vw be such that aw n = vw and write χw = χ̂vw . Using (5.4.66), this
leads to
Y Y − R 1 ρ (dt;χ̂ )
e−(2m+δ)(vψv )κ(v/n) = (1 + oP (1)) e v/n wv vw
where the error term is uniformly bounded. We recall that the edges in our edge-marked
tree t̄ are given by E(~ t̄) = {(u, vj (u), j) : u ∈ [n], j ∈ [m]}. We use (5.4.54) and βs =
(2s − 3)m + δ(s − 1) to write
X X X 1{s∈(v,u)}
qs0 /βs =
s∈[n]\V (t̄)
βs
(u,vj (u),j)∈E(t̄) s∈[n]\V (t̄)
X X 1{s∈(v,u)}
= . (5.4.71)
(u,vj (u),j)∈E(t̄) s∈[n]\V (t̄)
(2s − 3)m + δ(s − 1)
Note that (u, vj (u), j) ∈ E(t̄) when there exists w ∈ V (t) and ` such that (w, w`) ∈
E(t), so that
!χ
Y v χ Y vw ∧ vw`
= . (5.4.74)
u
(u,vj (u),j)∈E(t̄) (w,w`)∈E(t)
vw ∨ vw`
Combining this further with (5.4.69) and using (5.4.30) in Proposition 5.19, we obtain
(5.4.49) with (χw )w∈V (t) = (χ̂dnaw e )w∈V (t) , which is indeed an iid sequence of Gamma(m+
δ, 1) random variables when the (dnaw e)w∈V (t) are distinct, and we recall (5.4.30).
226 Connected Components in Preferential Attachment Models
the Pólya point tree. Further, Theorem 5.23 establishes a local density limit theorem for the
vertex marks.
The proof of Theorem 5.23 follows that of Theorem 5.21, so we can be more succinct. We
have the following characterization of the conditional law of the vertex- and edge-marked
versions of B̄r(Gn ) (o1 ) and B̄r(Gn ) (o2 ), where Gn = PA(m,δ)
n (d), which is a generalization of
Proposition 5.22 to two neighborhoods:
Proposition 5.24 (Law of neighborhoods in PA(m,δ) n (d)) Fix m ≥ 1 and δ > −m, and
(m,δ)
consider Gn = PAn (d). Let t̄1 and t̄2 be two rooted vertex- and edge-marked trees
with distinct and disjoint vertex sets and root vertices o1 and o2 , respectively. Uniformly for
vw > εn and χvw ≤ K for all w ∈ V (t1 ) ∪ V (t2 ), where χv = fv (ψv ), as n → ∞,
P B̄r(Gn ) (o1 ) = t̄1 , B̄r(Gn ) (o2 ) = t̄2 (ψv )v∈V (t̄1 )∪V (t̄2 )
0
Y Y
= (1 + oP (1)) ψvpv e−(2m+δ)(vψv )κ(v/n)
v∈V (t̄1 )∪V (t̄2 ) v∈V ◦ (t̄1 )∪V ◦ (t̄2 )
Y (βs + qs0 − 1)qs0
× , (5.4.80)
s∈[n]\(V (t̄1 )∪V (t̄2 ))
(α + βs + qs0 − 1)qs0
where now
1{u 1{s∈(v,u)} .
X
qs0 = ju
v}
(5.4.82)
u,v∈V (t̄1 )∪V (t̄2 )
Proof We first condition on all (ψv )v∈[n] and use Lemma 5.12 to obtain, for two trees t̄1
and t̄2 , as in (5.4.56),
n
1 Y 0
Y 0
Pn (B̄r (o1 ) = t̄1 , B̄r (o2 ) = t̄2 ) = 2 ψvpv (1 − ψs )qs
n v∈V (t̄1 )∪V (t̄2 ) s=2
Y Y
× [1 − Pu,v ], (5.4.83)
v∈V ◦ (t̄ )∪V ◦ (t̄ ) j
1 2
u,j : u6 v
where the factor 1/n2 is due to the uniform choices of the vertices o1 , o2 , the first double
product is due to all the edges required to ensure that B̄r(Gn ) (oi ) ⊆ t̄i for i ∈ {1, 2},
while the second double product is due to the edges that are not allowed to be there, so that
B̄r(Gn ) (oi ) really equals t̄i for i ∈ {1, 2}. The remainder of the proof follows the steps in
the proof of Proposition 5.22, and is omitted.
We continue by proving Theorem 5.23. This proof follows that of Theorem 5.21, using
Proposition 5.24 instead of Proposition 5.22. The major difference is that in Proposition
5.24, we assume that V (t̄1 ) and V (t̄2 ) are disjoint, which we prove to be the case whp next.
228 Connected Components in Preferential Attachment Models
Disjoint Neighborhoods
We note that, as in Corollary 2.20,
h i
P(B̄r(Gn ) (o1 ) ∩ B̄r(Gn ) (o2 ) 6= ∅) = 1 − E |B2r
(Gn )
(o1 )|/n = 1 − o(1),
with
m(m+δ)
2m+δ
for st = OO,
m(m+1+δ)
for st = OY,
2m+δ
cst = (m−1)(m+δ) (5.4.89)
2m+δ
for st = YO,
m(m+δ)
for st = YY.
2m+δ
Proof Recall (5.3.3). Note that U 1/χ has density χy χ−1 . Thus, for [a, b] ⊆ [0, x], and with
χ = (m + δ)/(2m + δ),
b/x
hZ i bχ − aχ
κ (x, O), ([a, b], O) = mE χy χ−1 dy = m
. (5.4.90)
a/x xχ
Further, by (5.3.5), for [a, b] ⊆ [x, 1] and noting that 1/(τ − 1) = m/(2m + δ) = 1 − χ,
we have
Z b 1/(τ −1)−1
h y i
κ (x, O), ([a, b], Y) = (τ − 1)E Poi Γ?
dy
a x1/(τ −1)
1/(τ −1) 1/(τ −1)
b −a
= E[Γ? ]
x −1)
1/(τ
b1−χ − a1−χ
= (m + 1 + δ) . (5.4.91)
x1−χ
Similarly, for [a, b] ⊆ [0, x],
b/x
hZ i bχ − aχ
κ (x, Y), ([a, b], O) = (m − 1)E χy χ−1 dy = (m − 1)
, (5.4.92)
a/x xχ
Proof We do not present the entire proof but, rather, explain how the proof of Theorem 5.8
can be adapted. References can be found in the notes in Section 5.7. The proof of Theorem
5.8 has two main steps, the first being the fixed-graph Pólya urn representation in Theorem
5.10, which is the crucial starting point of the analysis. In the second step, this representation
is used to perform the second-moment method for the marked neighborhood counts. In the
present proof, we focus on the first part, as this is the most sensitive to minor changes in the
model.
(1,δ/m)
PAmn (b) and PA(1,δ/m)
mn (d) are the same, except that PA(1,δ/m)
mn (b) starts with two vertices
with two edges between them, while PA(1,δ/m) mn (d) starts with two vertices with one edge
between them. This different starting graph was addressed in Remark 5.11, where it was
explained that the finite-graph Pólya version is changed only in a minor way. We can thus
use the results obtained thus far, together with a collapsing procedure, to obtain the Pólya
urn description of PA(1,δ/m)
mn (b).
For PA(m,δ)
n (a), we also use that it can be obtained from PA(1,δ/m)
mn (a) by collapsing
the vertices [mv] \ [m(v − 1)] in PA(1,δ/m)mn (a) into vertex v in PA(m,δ)
n (a). However,
(1,δ/m) (1,δ/m)
PAmn (a) is not quite the same as PAmn (d). Instead, we use the description in Theo-
rem 5.6 and compare it with that in Theorem 5.7 to see that now the Beta variables are given
by (ψj0 )j∈[n] with ψ10 = 1, and, for j ≥ 2,
where (Γw,i )i∈[m] are iid Gamma variables with parameters given in (5.3.2) for m = 1
and with δ replaced by δ/m. Recall that the sum of iid Gamma parameters with parameters
(r
Pim )i∈[m] and scale parameter λ = 1 is again Gamma distributed, now with parameter r =
i=1 ri and scale parameter λ = 1. Recall that either ri = 1 + δ/m or ri = 1 + δ/m + 1
and that there is no ri = 1 + δ/m + 1 when w has label Y, while Pmthere is exactly one i
with ri = 1 + δ/m + 1 when i has label O. Thus, we see that i=1 Γw,i has a Gamma
distribution with parameters r = m + δ and λ = 1 when the label of w is Y, while it has
parameters r = m + δ + 1 and λ = 1 when the label of w is O, as in (5.3.2) for m ≥ 2. We
refrain from giving more details.
note that In = Nn − Nn−1 = 1 precisely when all m edges of vertex n are attached to
vertex n. Thus,
Y 2e − 1 + δ
P(In = 1) = . (5.5.3)
j∈[m]
(2m + δ)n + (2j − 1 + δ)
For m ≥ 2,
∞
X
P(In = 1) < ∞, (5.5.4)
n=2
so that, almost surely, In = 1 occurs only finitely often. As a result, limn→∞ Nn < ∞
almost surely since
X ∞
Nn ≤ 1 + In . (5.5.5)
n=2
This implies that, for m ≥ 2, PAn (a) almost surely contains only finitely many con-
(m,δ)
We condition on n=K 1{Nn >Nn−1 } = 0, so that no new connected components are formed
P∞
after time K , and the number of connected components can only decrease in time. Let Fs
denote the σ -algebra generated by (PA(m,δ) n (a))sn=1 . We are left with proving that, for n
sufficiently large, the vertices in [K] are whp all connected in PA(m,δ)
n (a).
The proof proceeds in two steps. We show that, if Nn ≥ 2 and n is large, P(N2n − Nn ≤
−1 | Fn ) is uniformly bounded from below. Indeed, we condition on Fn for which Nn ≥ 2
and Nn ≤ K . Then, using Nn ≤ K , PA(m,δ) n (a) must have one connected component of
size at least n/K , while every other component has at least one vertex in it, and its degree
is at least m. Fix s ∈ [2n] \ [n]. Then, the probability that the first edge of vs(m) connects to
the connected component of size at least n/K , while the second connects to the connected
component of size at least 1, is, conditional on Fs−1 , at least
m + δ (m + δ)n/K ε
≥ , (5.5.7)
2(2m + δ)n 2(2m + δ)n n
for some ε > 0 and uniformly in s. Thus, conditional on Fn , the probability that this
happens for at least one s ∈ [2n] \ [n] is at least
ε n
1− 1− ≥ η > 0, (5.5.8)
n
uniformly for every n. Thus, when Nn ≥ 2, P(N2n − Nn ≤ −1 | Fn ) ≥ η . As a result,
a.s.
Nn −→ 1, so that NT = 1 for some T < ∞ almost surely. Without loss of generality, we
5.6 Further Results for Preferential Attachment Models 233
can take T ≥ K . When n=K 1{Nn >Nn−1 } = 0, if NT = 1 for some T then Nn = 1 for
P∞
all n ≥ T . This proves that PA(m,δ)
n (a) is whp connected for all n ≥ T , where T is large,
which implies Theorem 5.27.
Exercise 5.31 investigates the all-time connectivity of (PA(m,δ)
n (a))n≥1 .
it turns out that the parameter γ > 0 (which, by convention, is always taken to be 1 for
(PA(m,δ)
n )n≥1 ) is now the parameter that determines the tail behaviour of the degree distri-
bution (recall Exercise 1.23). In Exercises 5.23 and 5.24, the reader is invited to compute
the average degree of this affine model, as well as the number of edges added at time n for
large n.
The case of general attachment functions k 7→ f (k) is more delicate to describe. We start
by introducing some notation.We call a preferential attachment function f : N0 7→ (0, ∞)
concave when
f (0) ≤ 1 and ∆f (k) := f (k + 1) − f (k) < 1 for all k ≥ 0. (5.6.4)
Concavity implies the existence of the limit
f (k)
γ := lim = min ∆f (k). (5.6.5)
k→∞ k k≥0
The following result investigates the proportion of vertices v ∈ [n] whose connected
component C (v) at time n has size k :
Theorem 5.29 (Component sizes of Bernoulli preferential attachment models) Let f be a
concave attachment function. The Bernoulli preferential attachment model with condition-
n satisfies that, for every k ≥ 1,
ally independent edges BPA(f )
1 P
#{v ∈ [n] : |C (v)| = k} −→ µ(|T | = k), (5.6.6)
n
where |T | is the total progeny of an appropriate multi-type branching process.
The limiting multi-type branching process in Theorem 5.29 is such that µ(|T | = k) > 0
for all k ≥ 1, so that BPA(f )
n is disconnected whp. While Theorem 5.29 does not quite prove
the local convergence of BPA(f )
n , it is strongly related. See Section 5.7 for a more detailed
discussion. Thus, in what follows, we discuss the proof as if the theorem does yield local
convergence.
We next describe the local limit and degree evolution in more detail. We start with two
main building blocks. Let (Zt )t≥0 be a pure-birth Markov process with birth rate f (k) when
it is in state k , starting from Z0 = 0 (i.e., it jumps from k to k + 1 at rate f (k)). Further,
for σ ≥ 0, let (Zt − 1[σ,∞) (t))t≥0 be the process (Zt )t≥0 conditioned on having a jump
[σ]
at time σ .
Let S := {Y} × R ∪ ({O} × [0, ∞)) × R be the type space, considering the label as
an element in {Y} ∪ ({O} × [0, ∞)) and the location as being in R. It turns out that the
location of a vertex t ∈ N in BPA(f )
n corresponds to log (t/n), and we allow for t > n in
our description. Individuals of label Y correspond to individuals that are younger than their
parent in the tree. Individuals of label (O, σ) correspond to individuals that are older than
their parent, and for them we need to record the relative location of an individual compared
with its parent (roughly corresponding to the log of the ratio of their ages).
The local limit is a multi-type branching process with the following properties and off-
spring distributions. The root has label Y and location −E , where E is a standard exponen-
tial random variable with parameter 1; this variable corresponds to log(U ) with U uniform
in [0, 1]. A particle of label Y at location x has younger children of label Y with relative
5.6 Further Results for Preferential Attachment Models 235
locations at the jumps of the process (Zt )t≥0 , so that their locations are equal to x + πi ,
where πi is the ith jump of (Zt )t≥0 . A particle of label Y at location x has older children
with labels (O, −πi ), where (πi )i≥0 are the points in a Poisson point process on (−∞, 0]
with intensity measure given by
et E[f (Z−t )] dt, (5.6.7)
their locations being x + πi . A particle of label (O, σ) generates offspring
B of labels in O × [0, ∞) in the same manner as for a parent of label Y;
B of label Y with locations at the jumps of (Zt − 1[σ,∞) (t))t≥0 plus x.
[σ]
The above describes the evolution of the branching process limit for all times. This can
be interesting when investigating, e.g., the degree evolutions and graph structures of the
vertices in [n] at all times t ≥ n. However, for local convergence, we are interested only
in the subgraph of vertices in [n], so that only vertices with a negative location matter.
Thus, finally, we kill all particles with location x > 0 together with their entire tree of
descendants. This describes the local limit of BPA(f n , and |T | is the total progeny of this
)
instead of (Zt )t≥0 , where we see that the offspring distribution depends on the jump σ of
the individual of label (O, σ).
We close this discussion by explaining how the children having labels in {O} × [0, ∞)
arise. Note that these children are the same for individuals having label Y as well as for those
having label (O, σ) with σ ≥ 0. Since the connection decisions are independent and edge
probabilities are small for n large, the number of connections to vertices in the range [a, b]n
are close to a Poisson random variable with an appropriate parameter, thus leading to an
appropriate Poisson process. The expected number of neighbors of vertex qn with ages in
[a, b]n is roughly
bn
" #
f (Di(in) (qn)) 1 b 1 b
X Z Z
E ≈ E[f (Dun (qn))]du ≈
(in)
E[f (Zlog(q/u) )]du
i=an
qn q a q a
Z log(q/b) Z log(b/q)
= e−t E[f (Zt )]dt = et E[f (Z−t )]dt,
log(q/a) log(a/q)
(5.6.12)
as in (5.6.7). When the age is in [a, b]n, the location is in [log(a), log(b)], so the change
in location compared with q is in [log(a/q), log(b/q)]. This explains how the children with
labels in {O} × [0, ∞) arise, and completes our discussion of the local structure of BPA(f )
n .
1
( 21 − γ)2
γ≥ 2
or β> . (5.6.13)
1−γ
Consequently, if Cmax and C(2) denote the largest and second largest connected components
in BPA(f
n ,
)
P P
|Cmax |/n −→ ζ, |C(2) |/n −→ 0, (5.6.14)
Next, define a linear operator Aα on the Banach space C(S) of continuous, bounded
5.7 Notes and Discussion for Chapter 5 237
(i.e., without an intermediate update of the degrees while attaching the m edges incident to the newest
vertex), and a conditional model in which the edges are attached to distinct vertices. This shows that the
result is quite robust, as Theorem 5.26 also indicates.
A related version of Theorem 5.10 for δ = 0 was proved by Bollobás and Riordan (2004a) in terms of
a pairing representation. This applies to PA(m,δ)
n (a) with δ = 0. Another related version of Theorem 5.10
is proved in Rudas et al. (2007) and applies to general preferential attachment trees with m = 1. Its proof
relies on a continuous-time embedding in terms of continuous-time branching processes. We further refer
to Lo (2021) for results on the local convergence of preferential attachment trees with additive fitness.
Exercise 5.2 (Iid sequences are exchangeable) Show that (Xi )i≥1 forms an infinite sequence of exchange-
able random variables if (Xi )i≥1 are iid.
Exercise 5.3 (Limiting density in de Finetti’s Theorem) Use de Finetti’s Theorem (Theorem 5.2) to prove
a.s.
that Sn /n −→ U , where U appears in (5.2.1). Use this to prove (5.2.4).
Exercise 5.4 (Number of ones in (Xi )n Prove that P(Sn = k) = E P Bin(n, U ) = k in (5.2.3)
i=1 )
follows from de Finetti’s Theorem (Theorem 5.2).
Exercise 5.5 (Positive correlation of exchangeable random variables) Let (Xi )i≥1 be an infinite sequence
of exchangeable random variables. Prove that
Prove that equality holds if and only if (Xk )k≥1 are iid.
Exercise 5.6 (Limiting density of mixing distribution for Pólya urn schemes) Show that (5.2.24) identifies
the limiting density in (5.2.4).
Exercise 5.7 (Uniform recursive trees) A uniform recursive tree is obtained by starting with a single
vertex, and successively attaching the (n + 1)th vertex to a uniformly chosen vertex in [n]. Prove that, for
uniform recursive trees, the tree decomposition in Theorem 5.4 is such that
S1 (n) a.s.
−→ U, (5.8.2)
S1 (n) + S2 (n)
where U is uniform on [0, 1]. Use this to prove that P(S1 (n) = k) = 1/n for each k ∈ [n].
Exercise 5.8 (Scale-free trees) Recall the model studied in Theorem 5.4, where at time n = 2, we start
with two vertices of which vertex 1 has degree d1 and vertex 2 has degree d2 . After this, we successively
attach vertices to older vertices with probabilities proportional to the degree plus δ > −1 as in (1.3.64).
Show that the model for (PA(1,δ)
n (b))n≥1 , for which the graph at time n = 2 consists of two vertices joined
by two edges, arises when d1 = d2 = 2. What does Theorem 5.4 imply for (PA(1,δ) n (b))n≥1 ?
Exercise 5.9 (Relative degrees of vertices 1 and 2) Use Theorem 5.6 to compute limn→∞ P(D2 (n) ≥
xD1 (n)) for (PA(1,δ)
n (a))n≥1 .
Exercise 5.10 (Proof of Theorem 5.7) Complete the proof of Theorem 5.7 on the relative degrees in
scale-free trees for (PA(1,δ)
n (b))n≥1 by adapting the proof of Theorem 5.6.
Exercise 5.11 (Size-biased Gamma is again Gamma) Let X have a Gamma distribution with shape pa-
rameter r and scale parameter λ. Show that its size-biased version X ? has a Gamma distribution with
shape parameter r + 1 and scale parameter λ.
Exercise 5.12 (Power-law exponents in PA(m,δ)
n (d)) Use Lemma 5.9 to prove the power-law relations in
(5.3.8) of the asymptotic degree and neighbor degree distributions in PA(m,δ)
n (d), and identify the constants
cm,δ and c0m,δ appearing in them.
2m + δ Γ(k + 1 + δ) Γ(j + δ)
P(D = j, D0 = k) =
m2 (k − m)!Γ(m + 1 + δ) (j − m)!Γ(m + δ)
Z 1Z 1
× (1 − v)k−m v m+1+δ+δ/m (1 − u)j−m um+δ dudv. (5.8.3)
0 v
Exercise 5.17 (Multiple edges and Theorem 5.10) Fix m = 2. Let Mn denote the number of edges in
(m,δ)
PAn (d) that need to be removed so that no multiple edges remain. Use Theorem 5.10 to show that,
conditional on (ψk )k≥1 , the sequence (Mn+1 − Mn )n≥2 is an independent sequence with
n
X ϕ k 2
P Mn+1 − Mn = 1 | (ψk )k≥1 = . (5.8.6)
k=1
Sn(n)
Exercise 5.18 (Multiple edges and Theorem 5.10 (cont.)) Fix m = 2. Let Mn denote the number of edges
(m,δ)
in PAn (d) that need to be removed so that no multiple edges remain, as in Exercise 5.17. Use Exercise
5.17 to show that
n
" #
X ϕ 2
k
E[Mn+1 − Mn ] = E . (5.8.7)
k=1
Sn(n)
Exercise 5.19 (Multiple edges and Theorem 5.10 (cont.)) Fix m = 2. Let Mn denote the number of edges
in PA(m,δ)
n (d) that need to be removed so that no multiple edges remain, as in Exercise 5.17. Compute
" #
ϕ 2
k
E .
Sn(n)
Exercise 5.20 (Multiple edges and Theorem 5.10 (cont.)) Fix m = 2. Let Mn denote the number of
multiple edges in PA(m,δ)
n (d), as in Exercise 5.18, and fix δ > −1. Use Exercise 5.19 to show that
E[Mn ]/ log n → c, and identify c > 0. What happens when δ ∈ (−2, −1)?
Exercise 5.21 (Almost sure limit of normalized product of ψ’s) Let (ψj )j≥1 be independent Beta random
variables with parameters α = m + δ, β = (2j − 3)m + δ(j − 1) as in (5.3.19). Fix k ≥ 1. Prove that
(Mn (k))n≥k+1 , where
n
Y 1 − ψj
Mn (k) = , (5.8.8)
E[1 − ψj ]
j=k+1
is a multiplicative positive martingale. Thus, Mn (k) converges almost surely by the Martingale Conver-
gence Theorem ([V1, Theorem 2.24]).
ExerciseQ 5.22 (Almost sure limit of normalized product of Sk(n) ) Use Exercise 5.21, combined with the
fact that nj=k+1 E[1−ψj ] = ck (k/n) (1+o(1)) for some ck > 0, to conclude that (k/n)
χ χ Qn
j=k+1 (1−
ψj ) = (k/n)χ Sk(n) converges almost surely for fixed k.
Exercise 5.23 (Recursion formula for total edges in affine BPA(f t )
)
Consider the affine BPA(f )
n with
f (k) = γk + β. Derive a recursion formula for E[|E(BPA(f n
)
)|], where we recall that |E(BPA (f )
n )| is the
n . Identify µ such that E[|E(BPAn )|]/n → µ.
total number of edges in BPA(f ) (f )
Exercise 5.24 (Number of edges per vertex in affine BPAt(f ) ) Consider the affine BPA(f )
n with f (k) =
(f ) P
γk + β, as in Exercise 5.23. Argue that |E(BPAn )|/n −→ µ.
Exercise 5.25 (Degree of last vertex in affine BPAt(f ) ) Use the conclusion of Exercise 5.24 to show that
d
Dn (n) −→ Poi(µ).
Exercise 5.26 (CLT for number of connected components for m = 1) Show that the number of connected
components Nn in PA(1,δ)
n satisfies a central limit theorem when n → ∞, with equal asymptotic mean and
variance given by
1+δ 1+δ
E[Nn ] = log n(1 + o(1)), Var(Nn ) = log n(1 + o(1)). (5.8.9)
2+δ 2+δ
5.8 Exercises for Chapter 5 241
Exercise 5.27 (Number of connected components for m = 1) Use Exercise 5.26 to show that the number
P
of connected components Nn in PA(1,δ)
n satisfies Nn / log n −→ (1 + δ)/(2 + δ).
Exercise 5.28 (Number of self-loops in PA(m,δ)
n ) Fix m ≥ 1 and δ > −m. Use a similar analysis
P
to that in Exercise 5.26 to show that the number of self-loops Sn in PA(m,δ)
n (a) satisfies Sn / log n −→
(m + 1)(m + δ)/[2(2m + δ)].
Exercise 5.29 (Number of self-loops in PA(m,δ)
n (b)) Fix m ≥ 1 and δ > −m. Use a similar analysis
P
to that in Exercise 5.28 to show that the number of self-loops Sn in PA(m,δ)
n (b) satisfies Sn / log n −→
(m − 1)(m + δ)/[2(2m + δ)].
Exercise 5.30 (All-time connectivity for (PA(m,δ)
n (a))n≥1 ) Fix m ≥ 2. Show that the probability that
(PAn (m,δ)
(a))n≥1 is connected for all times n ≥ 1 equals P(In = 0 ∀n ≥ 2), where we recall that In is
the indicator that all the m edges of vertex n create self-loops.
Exercise 5.31 (All-time connectivity for (PA(m,δ)
n (a))n≥1 (cont.)) Fix m ≥ 2. Show that the probability
that (PA(m,δ)
n (a))n≥1 is connected for all times n ≥ 1 is in (0, 1).
Part III
Summary of Part II
So far, we have considered the simplest connectivity properties possible. We focused on
vertex degrees in Volume 1, and in Part II of this book we extended this to the local structure
of the random graphs involved as well as the existence and uniqueness of a macroscopic
connected, or giant, component. We can summarize the results obtained in the following
meta theorem:
The above means, informally, that the existence of the giant component is quite robust
when τ ∈ (2, 3), while it is not when τ > 3. This informally extends even to the random
removal of edges, exemplifying the robust nature of the giant when τ ∈ (2, 3). These results
make the general philosophy that “random graphs with similar degree characteristics behave
alike” precise, at least at the level of the existence and robustness of a giant component. The
precise condition guaranteeing the existence of the giant varies, but generally amounts to the
survival of the local limit.
In more detail, Part III is organized as follows. We study distances in general inhomoge-
neous random graphs in Chapter 6 and those in the configuration model, as well the closely
243
244
related uniform random graph with prescribed degrees, in Chapter 7. In the last chapter of
this part, Chapter 8, we study distances in the preferential attachment model.
C HAPTER 6
S MALL -W ORLD P HENOMENA IN
I NHOMOGENEOUS R ANDOM G RAPHS
Abstract
In this chapter we investigate the small-world structure in rank-1 and general
inhomogeneous random graphs. For this, we develop path-counting techniques
that are interesting in their own right.
12
10
2
105 106 107
Size
Figure 6.1 Median of typical distances in the 727 networks of size larger than
10,000 from the KONECT data base.
245
246 Small-World Phenomena in Inhomogeneous Random Graphs
specialize to rank-1 inhomogeneous random graphs, for which we can characterize their
ultra-small-world structure in more detail.
The proofs for the main results are in Sections 6.3–6.5. In Section 6.3 we prove lower
bounds on typical distances. In Section 6.4 we prove the corresponding upper bounds in
the doubly logarithmic regime, and in Section 6.5 we discuss path-counting techniques to
obtain the logarithmic upper bound for τ > 3. In Section 6.6 we discuss related results for
distances in inhomogeneous random graphs, including their diameter. We close the chapter
with notes and discussion in Section 6.7 and exercises in Section 6.8.
In this section we consider the distances between vertices of IRGn (κn ), where, as usual,
(κn ) is a graphical sequence of kernels with limit κ.
Recall that we write distG (u, v) for the graph distance between the vertices u, v ∈ [n] in
a graph G having vertex set V (G) = [n]. Here the graph distance between u and v is the
minimum number of edges in the graph G in all paths from u to v . Further, by convention,
we let distG (u, v) = ∞ when u, v are in different connected components. We define the
typical distance to be distG (o1 , o2 ), where o1 , o2 are two vertices that are chosen uar from
the vertex set [n].
It is possible that no path connecting o1 and o2 exists; then distIRGn (κn ) (o1 , o2 ) = ∞.
By Theorem 3.19, P(distIRGn (κn ) (o1 , o2 ) = ∞) → 1 − ζ 2 > 0, since ζ < 1 (see
Exercise 3.32). In particular, when ζ = 0, which is equivalent to ν = kT κ k ≤ 1,
P(distIRGn (κn ) (o1 , o2 ) = ∞) → 1. Therefore, in our main results, we condition on o1
and o2 being connected, and consider only cases where ζ > 0.
distances are of order ΘP (log n) when supx,y,n κn (x, y) < ∞, so that IRGn (κn ) is not
an ultra-small world. When kT κ k = ∞, a truncation argument can be used to prove that
distIRGn (κn ) (o1 , o2 ) = oP (log n), but its exact asymptotics is unclear. See Exercise 6.2.
The intuition behind Theorem 6.1 is that, by (3.4.7) and (3.4.8), a Poisson multi-type
branching process with kernel κ has neighborhoods that grow exponentially, i.e., the number
of vertices at distance k grows like kT κ kk . Thus, if we are to examine the distance between
two vertices o1 and o2 chosen uar from [n] then we need to explore the neighborhood of
vertex o1 up to the moment that it “catches” vertex o2 . For this to happen, the neighborhood
must have size of order n, so that we need kT κ kk = ν k ∼ n, i.e., k = kn ∼ logν n.
However, proving such a fact is quite tricky, since there are far fewer possible further vertices
to explore when the neighborhood has size proportional to n. The proof overcomes this fact
by exploring from the two vertices o1 and o2 simultaneously up to the first moment that
their neighborhoods share a common vertex, since then the shortest path is obtained. √ It turns
out that shared vertices start appearing when the neighborhoods have size roughly n. At
this moment, the neighborhood exploration
√ is still quite close to that in the local branching-
process limit. Since kT κ kr = ν r ∼ n when r = rn ∼ 21 logν n, this still predicts that
distances are close to 2rn ∼ logν n.
We next specialize to rank-1 inhomogeneous random graphs, where we also investigate
in more detail what happens when ν = ∞ in the case where the degree power-law exponent
τ satisfies τ ∈ (2, 3).
where the upper bound holds for all x ≥ 1, while the lower bound is required to hold only
for 1 ≤ x ≤ nβ for some β > 12 .
The assumption in (6.2.5) precisely is what we need, and it states that [1 − Fn ](x) obeys
power-law type bounds for appropriate values of x. Note that the lower bound in (6.2.5)
cannot be valid for all x, since Fn (x) > 0 implies that Fn (x) ≥ 1/n so that the lower and
upper bounds in (6.2.5) are contradictory when x n1/(τ −1) . Thus, the lower bound can
hold only for x = O(n1/(τ −1) ). When τ ∈ (2, 3), we have that 1/(τ − 1) ∈ ( 12 , 1), and we
need the lower bound to hold only for x ≤ nβ for some β ∈ ( 12 , 1). Exercises 6.3 and 6.4
give simpler conditions for (6.2.5) in special cases, such as iid weights.
The main result on graph distances in the case of infinite-variance weights is as follows:
Theorem 6.3 (Typical distances in rank-1 random graphs with infinite-variance weights)
Consider NRn (w), where the weights w = (wv )v∈[n] satisfy Conditions 1.1(a),(b) and
(6.2.5). Then, with o1 , o2 chosen independently and uar from [n] and conditional on o1 ←→
o2 ,
distNRn (w) (o1 , o2 ) P 2
−→ . (6.2.6)
log log n | log (τ − 2)|
The same result applies, under identical conditions, to GRGn (w) and CLn (w).
Theorem 6.3 implies that NRn (w), with w satisfying (6.3.21), is an ultra-small world.
See Figure 6.2 for a simulation of the typical distances in GRGn (w) with τ = 2.5 and
τ = 3.5, respectively, where the distances are noticeably smaller in the ultra-small setting
with τ = 2.5 compared with the small-world case with τ = 3.5.
In the next two sections we prove Theorems 6.2 and 6.3. The main tool to study typical
distances in NRn (w) is by comparison with branching processes. For τ > 3, the branching-
process approximation has finite mean, and we can make use of the martingale limit results
for the number of individuals in generation k as k → ∞, so that this number grows exponen-
tially as ν k . This explains the logarithmic growth of the typical distances. When τ ∈ (2, 3),
on the other hand, the branching process has infinite mean. In this case, the number of
6.2 Small-World Phenomena in Inhomogeneous Random Graphs 249
(a) (b)
0.2
0.3
0.15
Proportion
Proportion
0.2
0.1
0.1
0.05
0
2 3 4 5 6 7 8 9 10 11 12 2 4 6 8 10 12 14 16 18 20 22 24
Typical Distance Typical Distance
Figure 6.2 Typical distances between 2,000 pairs of vertices in the generalized
random graph with n = 100, 000, for (a) τ = 2.5 and (b) τ = 3.5.
individuals in generation k , conditional on survival of the branching process, grows super-
exponentially in k , which explains why the typical distances grow doubly logarithmically.
See Section 7.4, where this is explained in more detail in the context of the configuration
model.
The super-exponential growth implies that a path between two vertices typically passes
through vertices with larger and larger weights as we move away from the starting and
ending vertices. Thus, starting from the first vertex o1 ∈ [n], the path connecting o1 to o2
uses vertices whose weights first grow until the midpoint of the path is reached, and then
decrease again to reach o2 . This can be understood by noting that the probability that a vertex
with weight w is not connected to any vertex with weight larger than y > w in NRn (w) is
where Fn? (y) = i∈[n] wi 1{wi ≤y} /`n is the distribution function of Wn? , to be introduced
P
in (6.3.24) below. When (6.2.5) holds, it follows that [1 − Fn? ](y) is close to y −(τ −2) ; the
size-biasing increases the power by 1 (recall Lemma 1.23). Therefore, the probability that
a vertex with weight w is not connected to any vertex with weight larger than y > w in
−(τ −2)
NRn (w) is approximately e−cwy for some c > 0. For w large, this probability is
small when y w1/(τ −2) . Thus, a vertex of weight w is whp connected to a vertex of
weight approximately w1/(τ −2) , where 1/(τ − 2) > 1 for τ ∈ (2, 3).
distNRn (w) (o1 , o2 ) can be achieved by showing that the expected number of paths between
o1 and o2 having a given number of steps vanishes.
Fix Gn = NRn (w). When proving upper bounds on typical distances, we do need to
consider carefully the conditioning on distGn (o1 , o2 ) < ∞. Indeed, distGn (o1 , o2 ) = ∞
does actually occur with positive probability, for example when o1 and o2 are in two dis-
tinct connected components. To overcome this difficulty, we condition on Br(Gn ) (o1 ) and
Br(Gn ) (o2 ) in such a way that ∂Br(Gn ) (o1 ) 6= ∅ and ∂Br(Gn ) (o2 ) 6= ∅ hold, which, for
r large, makes the event that distNRn (w) (o1 , o2 ) = ∞ quite unlikely. In Section 6.4, we
prove the doubly logarithmic upper bound for τ ∈ (2, 3). Surprisingly, this proof is simpler
than that for logarithmic distances, primarily because we know that the shortest paths for
τ ∈ (2, 3) generally go from lower-weight vertices to higher-weight ones, until the hubs are
reached, and then they go back.
In Section 6.5 we investigate the variance of the number of paths between sets of ver-
tices in NRn (w), using an intricate path-counting method that estimates the sum, over
pairs of paths, of the probability that they are both occupied. For this, the precise joint
topology of these pairs of paths is crucial. We use a second-moment method to show that,
under the conditional laws, given Br(Gn ) (o1 ) and Br(Gn ) (o2 ) such that ∂Br(Gn ) (o1 ) 6= ∅
and ∂Br(Gn ) (o2 ) 6= ∅, whp there is a path of appropriate length linking ∂Br(Gn ) (o1 ) and
∂Br(Gn ) (o2 ). This proves the logarithmic upper bound when τ > 3. In each of our proofs,
we formulate the precise results as separate theorems and prove them under conditions that
are slightly weaker than those in Theorems 6.1, 6.2, and 6.3.
In this section we prove lower bounds on typical distances. In Section 6.3.1 we prove the
lower bound in Theorem 6.1(a), first in the setting of Theorem 6.2; this is followed by the
proof of Theorem 6.1(a). In Section 6.3.2 we prove the doubly logarithmic lower bound on
distances for infinite-variance degrees for NRn (w) in Theorem 6.3.
Then, for any ε > 0, with o1 , o2 chosen independently and uar from [n],
P(distNRn (w) (o1 , o2 ) ≤ (1 − ε) logν n) = o(1). (6.3.2)
The same result applies, under identical conditions, to GRGn (w) and CLn (w).
Proof The idea behind the proof of Theorem 6.4 is that it is quite unlikely for a path
containing far fewer than logν n edges to exist. In order to show this, we use a first-moment
bound and show that the expected number of occupied paths connecting the two vertices
6.3 Typical-Distance Lower Bounds in Inhomogeneous Random Graphs 251
~π
π0 = uπ1 π2 π3 π4 π5 π6 π7 π8 π9 π10 π11 π12 = v
chosen uar from [n] having length at most (1 − ε) logν n is o(1). We will now fill in the
details.
We set kn = d(1 − ε) logν ne. Then, conditioning on o1 , o2 gives
kn
1 X X
P(distNRn (w) (o1 , o2 ) ≤ kn ) = 2 P(distNRn (w) (u, v) = k)
n u,v∈[n] k=0
kn
1 1 X X
= + 2 P(distNRn (w) (u, v) = k). (6.3.3)
n n u,v∈[n] : u6=v k=1
In this section and in Section 6.5, we make use of path-counting techniques (see in par-
ticular Section 6.5.1). Here, we show that short paths are unlikely to exist by giving upper
bounds on the expected number of paths of various types. In Section 6.5.1 we give bounds
on the variance of the number of paths of various types, so as to show that long paths are
quite likely to exist. Such variance bounds are quite challenging, and here we give some
basics to highlight the main ideas in a much simpler setting.
See Figure 6.3 for an example of a 12-step self-avoiding path between u and v .
When distNRn (w) (u, v) = k , there must be a path of length k such that all edges {πl , πl+1 }
are occupied in NRn (w), for l = 0, . . . , k − 1. The probability that the edge {πl , πl+1 } is
occupied in NRn (w) is equal to
For CLn (w) and GRGn (w), an identical upper bound holds, which explains why the proof
of Theorem 6.4 for NRn (w) applies verbatim to those models. By the union bound or
Boole’s inequality,
Therefore
k−1
wu wv X wπ2 l
Y
P(distNRn (w) (u, v) = k) ≤
`n π ∈Pk (u,v) l=1
~
`n
k−1
!
wu wv Y X wπ2 l wu wv k−1
≤ = ν , (6.3.7)
`n l=1 π ∈[n] `n `n n
l
when δ = δ(ε) > 0 is chosen such that (1 − ε)/ log(ν + δ) < 1, since kn = d(1 −
ε) logν ne. This completes the proof of Theorem 6.4.
The condition (6.3.1) is slightly weaker than Condition 1.1(c), which is assumed in The-
orem 6.2, as shown in Exercises 6.5 and 6.6. Exercise 6.7 extends the proof of Theorem
6.4 to show that (distNRn (w) (o1 , o2 ) − log n/ log νn )− is tight, where we write (x)− =
max{−x, 0}.
We close this section by extending the above result to settings where νn is not necessarily
bounded, the most interesting case being τ = 3:
Corollary 6.6 (Lower bound on typical distances for rank-1 random graphs for τ = 3)
Consider NRn (w), and let νn be given in (6.3.1). Then, for any ε > 0, with o1 , o2 chosen
independently and uar from [n],
The same result applies, under identical conditions, to GRGn (w) and CLn (w).
The proof of Corollary 6.6 is left as Exercise 6.8. In the case where τ = 3 and [1−Fn ](x)
6.3 Typical-Distance Lower Bounds in Inhomogeneous Random Graphs 253
is, for a large range of x values, of order x−2 (which is stronger than τ = 3), it can be
expected that νn = Θ(log n), so that, in that case,
log n
P distNRn (w) (o1 , o2 ) ≤ (1 − ε) = o(1). (6.3.11)
log log n
Exercise 6.9 investigates the situation where τ = 3. Exercise 6.10 considers the case
τ ∈ (2, 3), where Corollary 6.6 unfortunately does not give particularly interesting results.
Lower Bound on Typical Distances for General IRGs: Proof of Theorem 6.1(a)
The proof of the upper bound in Theorem 6.1(a) is closely related to that in Theorem 6.4.
Note that
X k−1
Y κn (xπl , xπl+1 )
P(distIRGn (κn ) (u, v) = k) ≤ , (6.3.12)
π ,...,π ∈[n] l=0
n
1 k−1
where ni denotes the number of vertices of type i ∈ [t] and where the probability that there
exists an edge between vertices of types i and j is equal to κ(n) (i, j)/n.
Under the conditions in Theorem 6.1(a), we have µi(n) = ni /n → µ(i) and κ(n) (i, j) →
κ(i, j) as n → ∞. This also implies that kT κn k → ν , where ν is largest eigenvalue of the
(n)
matrix M = (Mij )i,j∈[t] with Mij = κ(i, j)µ(j). Denoting Mij = κ(n) (i, j)nj /n →
Mij , we obtain
1
P(distIRGn (κ) (o1 , o2 ) = k) ≤ h(µ(n) )T , [M(n) ]k 1i, (6.3.17)
n
254 Small-World Phenomena in Inhomogeneous Random Graphs
where 1 is the all-1s vector. Obviously, since there are t < ∞ types,
√
h(µ(n) )T , [M(n) ]k 1i ≤ kM(n) kk kµ(n) kk1k ≤ kM(n) kk t. (6.3.18)
Thus,
√
t
P(distIRGn (κ) (o1 , o2 ) = k) ≤ kM(n) kk . (6.3.19)
n
We conclude that
P(distIRGn (κn ) (o1 , o2 ) ≤ (1 − ε) logνn n) = o(1), (6.3.20)
where νn = kM(n) k → ν . This proves Theorem 6.1(a) in the finite-type setting.
We next extend the proof of Theorem 6.1(a) to the infinite-type setting. Assume that the
conditions in Theorem 6.1(a) hold. Recall the bound in (3.3.20), which bounds κn from
above by κ̄m , which is of finite type. Then, use the fact that kT κ̄m k & kT κ k = ν > 1 to
conclude that P(distIRGn (κn ) (o1 , o2 ) ≤ (1 − ε) logν n) = o(1) holds under the conditions
of Theorem 6.1(a). This completes the proof of Theorem 6.1(a).
where
1 X
Fn? (x) = wi 1{wi ≤x} , (6.3.24)
`n i∈[n]
so that (6.3.23) is small when y is too large. The main contribution to νn , on the other hand,
comes from vertices having maximal weight of the order n1/(τ −1) .
This problem is resolved by a suitable truncation argument on the weights of vertices in
occupied paths, P which effectively removes these high-weight vertices. Therefore, instead of
obtaining νn = v∈[n] wv2 /`n , we obtain a version of this sum restricted to vertices having
a relatively small weight. Effectively, this means that we split the space of all paths into
good paths, i.e., paths that avoid high-weight vertices, and bad paths, which are paths that
use high-weight vertices.
We now present the details of this argument. We again start from
1 1 X
P(distNRn (w) (o1 , o2 ) ≤ kn ) = + 2 P(distNRn (w) (u, v) ≤ kn ). (6.3.25)
n n u,v∈[n] : u6=v
When distNRn (w) (u, v) ≤ kn , there exists an occupied path ~π ∈ Pk (u, v) for some k ≤ kn .
We fix an increasing sequence of numbers (bl )l≥0 that serve as truncation values for the
weights of vertices along our occupied path. We determine the precise values of (bl )l≥0 ,
which is a quite delicate procedure, below.
Definition 6.8 (Good and bad paths) Fix k ≥ 1. Recall the definitions of k -step self-
avoiding paths Pk (u, v) and Pk (u) from Definition 6.5. We say that a path ~π ∈ Pk (u, v) is
good when wπl ≤ bl ∧ bk−l for every l ∈ [k], and bad otherwise. Let GP k (u, v) be the set
of good paths in Pk (u, v), and let
BP k (u) = {~π ∈ Pk (u) : wπk > bk , wπl ≤ bl ∀l < k} (6.3.26)
denote the set of bad paths of length k starting in u. J
The condition wπl ≤ bl ∧ bk−l for every l = 0, . . . , k is equivalent to the statement that
wπl ≤ bl for l ≤ k/2, while wπl ≤ bk−l for k/2 ≤ l ≤ k . Thus, bl provides an upper
bound on the weight of the lth and (k − l)th vertices of the occupied path, ensuring that the
weights in it cannot be too large. See Figure 6.4 for a visualization of a good path and the
bounds on the weight of its vertices.
Let
Ek (u, v) = {∃~π ∈ GP k (u, v) : ~π occupied} (6.3.27)
denote the event that there exists a good path of length k between u and v .
Let Fk (u) be the event that there exists a bad path of length k starting from u, i.e.,
Fk (u) = {∃~π ∈ BP k (u) : ~π occupied}. (6.3.28)
Then, since distNRn (w) (u, v) ≤ kn implies that there either is a good path between vertices
u and v , or a bad path starting in u or in v , for u 6= v ,
kn
[
{distNRn (w) (u, v) ≤ kn } ⊆ Fk (u) ∪ Fk (v) ∪ Ek (u, v) , (6.3.29)
k=1
256 Small-World Phenomena in Inhomogeneous Random Graphs
π5 : w π5 ≤ b5 ~π
π4 : w π4 ≤ b4 π6 : w π6 ≤ b4
π3 : w π3 ≤ b3 π7 : w π7 ≤ b3
π2 : w π2 ≤ b2 π8 : w π8 ≤ b2
π1 : w π1 ≤ b1 π9 : w π9 ≤ b1
Figure 6.4 A 10-step good path connecting π0 = u and π10 = v and the upper
bounds on the weight of its vertices. Vertices with large weights are higher in the
figure.
In order to estimate the probabilities P(Fk (u)) and P(Ek (u, v)), we introduce some no-
tation. For b ≥ 0, define the truncated second moment
1 X 2
νn (b) = w 1{wi ≤b} (6.3.31)
`n i∈[n] i
to be the restriction of νn to vertices of weight at most b, and recall that Fn? (x) from
(6.3.24) denotes the distribution function of Wn? , the size-biased version of Wn . The fol-
lowing lemma gives bounds on P(Fk (u)) and P(Ek (u, v)) in terms of the tail distribution
function 1 − Fn? and νn (b), which, in turn, we bound using Lemmas 1.23 and 1.22, respec-
tively:
Lemma 6.9 (Truncated path probabilities) For every k ≥ 1, (bl )l≥0 with bl ≥ 0 and
l 7→ bl non-decreasing, in NRn (w), CLn (w), and GRGn (w),
k−1
Y
P(Fk (u)) ≤ wu [1 − Fn? ](bk ) νn (bl ) (6.3.32)
l=1
and
k−1
wu wv Y
P(Ek (u, v)) ≤ νn (bl ∧ bk−l ). (6.3.33)
`n l=1
When bl = ∞ for each l, the bound in (6.3.33) equals that obtained in (6.3.7).
6.3 Typical-Distance Lower Bounds in Inhomogeneous Random Graphs 257
k−1
Y
= wu [1 − Fn? ](bk ) νn (bl ). (6.3.35)
l=1
since wπl ≤ bl ∧ bk−l . Now follow the steps in the proof of (6.3.32). Again the same bound
applies to CLn (w) and GRGn (w).
In order to apply Lemma 6.9 effectively, we use Lemmas 1.22 and 1.23 to derive bounds
on [1 − Fn? ](x) and νn (b):
Lemma 6.10 (Bounds on sums) Suppose that the weights w = (wv )v∈[n] satisfy Condition
1.1(a) and that there exist τ ∈ (2, 3) and c2 such that, for all x ≥ 1, (6.3.21) holds. Then,
there exists a constant c?2 > 0 such that, for all x ≥ 1,
[1 − Fn? ](x) ≤ c?2 x−(τ −2) , (6.3.37)
and there exists a cν > 0 such that, for all b ≥ 1,
νn (b) ≤ cν b3−τ . (6.3.38)
Proof The bound in (6.3.37) follows from Lemma 1.23, and the bound in (6.3.38) from
(1.4.13) in Lemma 1.22 with a = 2 > τ −1 when τ ∈ (2, 3). For both lemmas, the assump-
tions follow from (6.3.21). See Exercise 6.13 below for the bound on νn (b) in (6.3.38).
With Lemmas 6.9 and 6.10 in hand, we are ready to choose (bl )l≥0 and to complete the
proof of Theorem 6.7:
Proof of Theorem 6.7. Take kn = d2(1 − ε) log log n/| log (τ − 2)|e. By (6.3.25) and
(6.3.29),
kn h
1 X 2 X 1 X i
P(distNRn (w) (o1 , o2 ) ≤ kn ) ≤ + P(Fk (u)) + 2 P(Ek (u, v)) ,
n k=1 n u∈[n] n u,v∈[n] : u6=v
(6.3.39)
258 Small-World Phenomena in Inhomogeneous Random Graphs
where the term 1/n is due to o1 = o2 for which distNRn (w) (o1 , o2 ) = 0. We use Lemmas 6.9
and 6.10 to provide bounds on P(Fk (u)) and P(Ek (u, v)). These bounds are quite similar.
We first describe how we choose the truncation values (bl )l≥0 in such a way that [1 −
Fn? ](bk ) is small enough to make P(Fk (u)) small, and, for this choice of (bl )l≥0 , we show
that the contribution due to P(Ek (u, v)) is small. This means that it is quite unlikely that
u or v is connected to a vertex at distance k with too high a weight, i.e., with a weight at
least bk . At the same time, it is also unlikely that there is a good path ~π ∈ Pk (u, v) whose
weights are all small, i.e., for which wπk ≤ bk for every k ≤ kn , because k is too small to
achieve this.
By Lemma 6.9, we wish to choose bk in such a way that
k−1
1 X `n Y
P(Fk (u)) ≤ [1 − Fn? ](bk ) νn (bl ) (6.3.40)
n u∈[n] n l=0
1/(τ −2)
is small. Below (6.2.7), it was argued that we should choose bk such that bk ≈ bk−1 . In
order to make the contribution due to P(Fk (u)) small, however, we will take bk somewhat
larger. We now make this argument precise.
We take δ ∈ (0, τ − 2) sufficiently small and let
a = 1/(τ − 2 − δ) > 1. (6.3.41)
Take b0 = eA for some constant A ≥ 0 sufficiently large, and define (bl )l≥0 recursively by
l −l
bl = bal−1 , which implies that bl = ba0 = eA(τ −2−δ) . (6.3.42)
We will start from (6.3.30). By Lemma 6.9, we obtain an upper bound on P(Fk (u)) in terms
of factors νn (bl ) and [1 − Fn? ](bk ), which are bounded in Lemma 6.10. We start by applying
the bound on νn (bl ) to obtain
k−1 k−1 Pk−1
al
Y Y
νn (bl ) ≤ cν b3−τ
l = ck−1
ν eA(3−τ ) l=1
l=1 l=1
k (3−τ )/(a−1)
≤ ck−1
ν eA(3−τ )a /(a−1)
= ck−1
ν bk . (6.3.43)
Combining (6.3.43) with the bound on [1 − Fn? ](bk ) in Lemma 6.10 yields, for k ≥ 1,
−(τ −2)+(3−τ )/(a−1)
P(Fk (u)) ≤ c?2 wu ck−1
ν bk . (6.3.44)
Since 3 − τ + δ < 1 when τ ∈ (2, 3) and δ ∈ (0, τ − 2), we have
(τ − 2) − (3 − τ )/(a − 1) = (τ − 2) − (3 − τ )(τ − 2 − δ)/(3 − τ + δ)
= δ/(3 − τ + δ) > δ, (6.3.45)
so that, for k ≥ 1,
P(Fk (u)) ≤ c?2 wu ck−1
ν b−δ
k . (6.3.46)
As a result, for each δ > 0,
kn
1 XX 1 X ? X k−1 −δ X
P(Fk (u)) ≤ c2 wu cν bk = O(1) ck−1
ν b−δ
k ≤ ε, (6.3.47)
n u∈[n] k=1 n u∈[n] k≥1 k≥1
6.4 Doubly Logarithmic Upper Bound for Infinite-Variance Weights 259
`n 2(3−τ )/(a−1)
≤ 2
kn ckνn −1 bdkn /2e , (6.3.49)
n
by (6.3.42). We complete the proof by analyzing this bound.
Recall that k ≤ kn = d2(1 − ε) log log n/| log(τ − 2)|e. Take δ = δ(ε) > 0 small
enough that (τ − 2 − δ)−(kn +1)/2 ≤ (log n)1−ε/4 . Then, by (6.3.42),
−(kn +1)/2 1−ε/4
bdkn /2e ≤ eA(τ −2−δ) ≤ eA(log n) , (6.3.50)
and we conclude that
kn
X 1 X `n
2
P(Ek (u, v)) ≤ 2 kn ckνn exp 2A(3 − τ )(log n)1−ε/4 ) = o(1), (6.3.51)
k=1
n u,v∈[n]
n
since kn = O(log log n) and `n /n2 = Θ(1/n). This completes the proof of Theorem
6.7.
In this section we prove the doubly logarithmic upper bound on typical distances in the case
where the asymptotic weight distribution has infinite variance. Throughout this section, we
assume that there exist τ ∈ (2, 3), β > 21 and c1 > 0 such that, uniformly in n and x ≤ nβ ,
whp form a complete graph or clique. In the second step, we prove a doubly logarithmic
upper bound on the distance between a vertex and the set of giant-weight vertices. The latter
bound holds only when the vertex is in the giant component, a fact that we need to take into
account carefully. In the final step, we complete the proof of Theorem 6.11.
Proposition 6.13 (Connecting to Giantn ) Consider NRn (w) under the conditions of The-
orem 6.11. Let u ∈ [n] be such that wu > 1. Then, there exist c, c?1 > 0 and η > 0 such
that
log log n ? η
P distNRn (w) (u, Giantn ) ≥ (1 + ε) ≤ ce−c1 wu . (6.4.10)
| log (τ − 2)|
Consequently, if Wr (u) = k∈∂Br(Gn ) (u) wk denotes the weight of vertices at graph dis-
P
Thus, x` is the maximal-weight neighbor of x`−1 in NRn (w). We stop the above recursion
when wx` ≥ nβ , since then x` ∈ Giantn . Recall the heuristic approach below (6.2.7),
which shows that a vertex with weight w is whp connected to a vertex with weight w1/(τ −2) .
We now make this precise.
We take a = 1/(τ − 2 + δ), where we choose δ > 0 small enough that a > 1. By
(6.3.24),
We split the argument depending on whether wxa` ≤ nβ . First, when wxa` ≤ nβ , by (6.4.1)
and uniformly for x ≤ nβ ,
xn
[1 − Fn? ](x) ≥ [1 − Fn ](x) ≥ c?1 x−(τ −2) , (6.4.14)
`n
where, for n large enough, we can take c?1 = c1 /(2E[W ]). Therefore
Second, when wxa` > nβ but wx` < nβ , we can use (6.4.14) for x = nβ to obtain
P(wx`+1 < nβ | (xs )s≤` ) ≤ exp − c?1 wx` n−β(τ −2) ≤ exp − c?1 nβ[1−(τ −2)]/a
least 1. It is here that we use the relation of the edge probabilities in NRn (w) and Poisson
random variables.
uv satisfy puv ≥ puv for all
By [V1, (6.8.12) and (6.8.13)], the edge probabilities p(CL) (CL) (NR)
6.4 Doubly Logarithmic Upper Bound for Infinite-Variance Weights 263
u, v ∈ [n], so the results immediately carry over to CLn (w). For Gn = GRGn (w),
however, for all z, v ∈ [n] \ V (Br(Gn ) (u)), we have
(GRG)
P zv ∈ E(Gn ) | Br(Gn ) (u) = 1 − (1 − p(GRG) ) ≥ 1 − e−pzv
zv
Since
P(distNRn (w) (o1 , o2 ) ≤ 2kn | distNRn (w) (o1 , o2 ) < ∞)
P(distNRn (w) (o1 , o2 ) ≤ 2kn )
= , (6.4.23)
P(distNRn (w) (o1 , o2 ) < ∞)
this follows from the two bounds
lim sup P(distNRn (w) (o1 , o2 ) < ∞) ≤ ζ 2 , (6.4.24)
n→∞
where ζ = µ(|C (o)| = ∞) > 0 is the survival probability of the branching-process ap-
proximation to the neighborhoods of NRn (w), as identified in Theorem 3.18. For (6.4.24),
we make the following split, for some r ≥ 1:
P(distNRn (w) (o1 , o2 ) < ∞)
≤ P(|∂Br(Gn ) (o1 )| > 0, |∂Br(Gn ) (o2 )| > 0, distNRn (w) (o1 , o2 ) > 2r)
+ P(distNRn (w) (o1 , o2 ) ≤ 2r). (6.4.26)
To prove (6.4.25), we fix r ≥ 1 and write
P(distNRn (w) (o1 , o2 ) ≤ 2kn )
≥ P(2r < distNRn (w) (o1 , o2 ) ≤ 2kn )
≥ P distNRn (w) (oi , Giantn ) ≤ kn , i = 1, 2, distNRn (w) (o1 , o2 ) > 2r
≥ P(|∂Br(Gn ) (o1 )| > 0, |∂Br(Gn ) (o1 )| > 0, distNRn (w) (o1 , o2 ) > 2r)
− 2P distNRn (w) (o1 , Giantn ) > kn , |∂Br(Gn ) (o1 )| > 0 .
(6.4.27)
264 Small-World Phenomena in Inhomogeneous Random Graphs
The first terms in (6.4.26) and (6.4.27) are the same. By Corollary 2.19(b), this term satisfies
P(|∂Br(Gn ) (o1 )| > 0, |∂Br(Gn ) (o1 )| > 0, distNRn (w) (o1 , o2 ) > 2r)
= P(|∂Br(Gn ) (o)| > 0)2 + o(1), (6.4.28)
which converges to ζ 2 when r → ∞.
We are left with showing that the second terms in (6.4.26) and (6.4.27) vanish when first
n → ∞ followed by r → ∞. By Corollary 2.20, P(distNRn (w) (o1 , o2 ) ≤ 2r) = o(1),
which completes the proof of (6.4.24).
For the second term in (6.4.27), we condition on Br(Gn ) (o1 ), and use that ∂Br(Gn ) (o1 ) is
measurable wrt Br(Gn ) (o1 ), to obtain
P distNRn (w) (o1 , Giantn ) > kn , |∂Br(Gn ) (o1 )| > 0
= E 1{|∂Br(Gn ) (o1 )|>0} P distNRn (w) (o1 , Giantn ) > kn | Br(Gn ) (o1 ) .
h i
(6.4.29)
1{|∂Br(Gn ) (o1 )|>0} P distNRn (w) (o1 , Giantn ) > kn | Br(Gn ) (o1 ) −→
P
0. (6.4.33)
By Lebesgue’s Dominated Convergence Theorem [V1, Theorem A.1] this implies
when first n → ∞ followed by r → ∞. This proves (6.4.25), and thus completes the proof
of the upper bound in Theorem 6.3 for NRn (w). The proofs for GRGn (w) and CLn (w)
are similar, and are left as Exercise 6.14.
In this section we give the proof of the logarithmic upper bound typical distances in rank-
1 random graphs with finite-variance weights stated in Theorem 6.2. For this, we use the
second-moment method to show that whp there exists a path of at most (1 + ε) logν n
6.5 Logarithmic Upper Bound for Finite-Variance Weights 265
edges between o1 and o2 , when o1 and o2 are such that ∂Br(Gn ) (o1 ), ∂Br(Gn ) (o2 ) 6= ∅ for
Gn = CLn (w). This proves Theorem 6.2 for CLn (w).
The extensions to NRn (w) and GRGn (w) follow by asymptotic equivalence of these
graphs, as discussed in [V1, Section 6.7]. Even though this shows that NRn (w), CLn (w),
and GRGn (w) all behave similarly, for our second-moment methods, we will need to be
especially careful about the model with which we are working.
To apply the second-moment method, we give a bound on the variance of the number of
paths of given lengths using path-counting techniques. This section is organized as follows.
In Section 6.5.1 we highlight our path-counting techniques. In Section 6.5.2 we apply these
methods to give upper bounds on typical distances for finite-variance weights. We also in-
vestigate the case where τ = 3, for which we prove that typical distances are bounded by
log n/ log log n under appropriate conditions.
The above path-counting methods can also be used to study general inhomogeneous ran-
dom graphs, as discussed in Section 6.5.3, where we prove Theorem 6.1(b) and use its proof
ideas to complete the proof of the law of large numbers for the giant in Theorem 3.19.
denote the number of self-avoiding paths of length k between the vertices a and b, where we
recall that a path ~π is self-avoiding when it visits every vertex at most once (see Definition
6.5). Let
nk(a, b) = E[Nk(a, b)] (6.5.2)

denote the expected number of occupied paths of length k connecting a and b. Define

n̄k(a, b) = ua ub ( Σ_{i∈I\{a,b}} ui² )^{k−1},    n̲k(a, b) = ua ub ( Σ_{i∈Ia,b,k} ui² )^{k−1}, (6.5.3)

where Ia,b,k is the subset of I in which a and b, as well as the k − 1 vertices with highest weights, have been removed. In Section 6.3 we proved implicitly an upper bound on E[Nk(a, b)] of the form (see also Exercise 6.15)

E[Nk(a, b)] ≤ n̄k(a, b). (6.5.4)

In this section we prove that n̲k(a, b) is a lower bound on E[Nk(a, b)] and use related bounds to prove a variance bound on Nk(a, b).
Before stating our main result, we introduce some further notation. Let

νI = Σ_{i∈I} ui²,    γI = Σ_{i∈I} ui³ (6.5.5)
denote the sums of squares and third powers of (ui )i∈I , respectively. Our aim is to show
that whp paths of length k exist between the vertices a and b for an appropriate choice of k .
We do this by applying a second-moment method to Nk (a, b), for which we need a lower
bound on E[Nk (a, b)] and an upper bound on Var(Nk (a, b)) such that Var(Nk (a, b)) =
o(E[Nk (a, b)]2 ) (recall [V1, Theorem 2.18]), as in the next proposition, which is interesting
in its own right:
Proposition 6.14 (Variance of numbers of paths) For any k ≥ 1, a, b ∈ I and (ui)i∈I,

E[Nk(a, b)] ≥ n̲k(a, b), (6.5.6)

while, assuming that νI > 1,

Var(Nk(a, b)) ≤ nk(a, b) + n̄k(a, b)² ( γIνI²/(νI − 1) (1/ua + 1/ub) + γI²νI/(ua ub (νI − 1)²) + ek ), (6.5.7)

where

ek = k (1 + γI/(ua νI)) (1 + γI/(ub νI)) (νI/(νI − 1)) (e^{2k³γI²/νI³} − 1). (6.5.8)
Remark 6.15 (Path-counting and existence of k -step paths) Path-counting methods are
highly versatile. While in Proposition 6.14 we focus on Chung–Lu-type inhomogeneous
random graphs, we will apply them to general inhomogeneous random graphs with finitely
many types in Section 6.5.3 and to the configuration model in Section 7.3.3. For such
applications, we need to slightly modify our bounds, particularly those in Lemma 6.18,
owing to a slightly altered dependence structure between the occupation statuses of distinct
paths. J
We apply Proposition 6.14 in cases where E[Nk(a, b)] ≥ n̲k(a, b) → ∞, by taking I to be a large subset of [n] and ui to equal wi/√ℓn for CLn(w). In this case, νI ≈ νn ≈ ν > 1. In our applications of Proposition 6.14, the ratio n̄k(a, b)/n̲k(a, b) will be bounded, and k³γI²/νI³ = o(1), so that the term involving ek is an error term. The starting and ending vertices a, b ∈ I will correspond to a union of vertices in [n] of quite large size; this relies on the local limit stated in Theorem 3.18. As a result, γI/ua and γI/ub are typically small, so that also

Var(Nk(a, b))/E[Nk(a, b)]² ≈ γIνI²/(νI − 1) (1/ua + 1/ub) + γI²νI/(ua ub (νI − 1)²) (6.5.9)

is small. As a result, whp there exists a path of k steps, as required. The choice of a, b, and I is quite delicate, which explains why we formulated Proposition 6.14 in such generality.
We next prove Proposition 6.14, which, in particular for (6.5.7), requires some serious
combinatorial arguments.
Proof of Proposition 6.14. Recall Definition 6.5 and that Nk(a, b) is a sum of indicators:

Nk(a, b) = Σ_{~π∈Pk(a,b)} 1{~π occupied in Gn}, (6.5.10)

so that

E[Nk(a, b)] = Σ_{~π∈Pk(a,b)} uπ0 uπk Π_{l=1}^{k−1} uπl². (6.5.11)

The lower bound (6.5.6) follows by restricting the sum in (6.5.11) to paths whose internal vertices all lie in Ia,b,k: since Ia,b,k excludes the k − 1 vertices of highest weight, each of the k − 1 successive sums over the remaining choices of internal vertex is at least Σ_{i∈Ia,b,k} ui².
To compute Var(Nk(a, b)), we again start from (6.5.10), which yields

Var(Nk(a, b)) = Σ_{~π,~ρ∈Pk(a,b)} [ P(~π, ~ρ occupied) − P(~π occ.) P(~ρ occ.) ], (6.5.13)

where we abbreviate {~π occupied in Gn} to {~π occupied} or {~π occ.} when no confusion can arise.
For ~π, ~ρ, we denote the edges that the paths ~π and ~ρ have in common by ~π ∩ ~ρ. The occupation statuses of ~π and ~ρ are independent precisely when ~π ∩ ~ρ = ∅, so that

Var(Nk(a, b)) ≤ Σ_{~π,~ρ∈Pk(a,b): ~π∩~ρ≠∅} P(~π, ~ρ occupied). (6.5.14)
Define ~ρ \ ~π to be the edges in ~ρ that are not part of ~π, so that

P(~π, ~ρ occupied) = P(~π occupied) P(~ρ occupied | ~π occupied)
= Π_{l=0}^{k−1} uπl uπl+1 Π_{e∈~ρ\~π} uē ue̲, (6.5.15)

where ē and e̲ denote the two endpoints of the edge e. The diagonal terms ~π = ~ρ contribute at most nk(a, b) to Var(Nk(a, b)). Thus, from now on, we consider (~π, ~ρ) such that ~π ≠ ~ρ and ~π ∩ ~ρ ≠ ∅.
The probability in (6.5.15) needs to be summed over all possible pairs of paths (~π, ~ρ) with ~π ≠ ~ρ that share at least one edge. In order to do this effectively, we introduce some notation.
Let l = |~π ∩ ~ρ| denote the number of edges in ~π ∩ ~ρ, so that l ≥ 1 precisely when ~π ∩ ~ρ ≠ ∅. Note that l ∈ [k − 2], since ~π and ~ρ are distinct self-avoiding paths of length k between the same vertices a and b. Let k − l = |~ρ \ ~π| ≥ 2 be the number of edges in ~ρ that are not part of ~π.
Let m denote the number of connected subpaths in ~ρ \ ~π, so that m ≥ 1 whenever ~π ≠ ~ρ. Since π0 = ρ0 = a and πk = ρk = b, these subpaths start and end in vertices along the path ~π. We can thus view the subpaths in ~ρ \ ~π as excursions of the path ~ρ from the walk ~π. By construction, between two excursions there is at least one edge that ~π and ~ρ have in common. We next characterize this excursion structure:
Definition 6.16 ((Edge-)shapes of pairs of paths) Let m be the number of connected subpaths in ~ρ \ ~π. We define the shape of the pair (~π, ~ρ) by

Shape(~π, ~ρ) = (~xm+1, ~sm, ~tm, ~om+1, ~rm+1), (6.5.17)

where
(1) ~xm+1 ∈ N0^{m+1}, where xj ≥ 0 is the length of the subpath in ~ρ ∩ ~π in between the (j − 1)th and the jth subpath of ~π \ ~ρ. Here x1 ≥ 0 is the number of common edges in the subpath of ~ρ ∩ ~π that contains a, while xm+1 ≥ 0 is the number of common edges in the subpath of ~ρ ∩ ~π that contains b. For j ∈ {2, ..., m}, xj ≥ 1;
(2) ~sm ∈ N^m, where sj ≥ 1 is the number of edges in the jth subpath of ~π \ ~ρ;
(3) ~tm ∈ N^m, where tj ≥ 1 is the number of edges in the jth subpath of ~ρ \ ~π;
(4) ~om+1 ∈ [m + 1]^{m+1}, where oj is the order of the jth common subpath in ~ρ ∩ ~π of the path ~π in ~ρ, e.g., o2 = 5 means that the second subpath that ~π has in common with ~ρ is the fifth subpath that ~ρ has in common with ~π. Note that o1 = 1 and om+1 = m + 1, since ~π and ~ρ start and end in a and b, respectively;
(5) ~rm+1 ∈ {0, 1}^{m+1}, where rj describes the direction in which the jth common subpath in ~ρ ∩ ~π of the path ~π is traversed by ~ρ, with rj = 1 when the direction is the same for ~π and ~ρ and rj = 0 otherwise. Thus, r1 = rm+1 = 1. J
The information in Shape(~π, ~ρ) in Definition 6.16 is precisely what is needed to piece together the topology of the two paths, except for information about the vertices involved in ~π and ~ρ. The subpaths of ~ρ \ ~π in Definition 6.16 avoid the edges in ~π but may contain vertices that appear in ~π. This explains why we call the shapes edge-shapes. See Figure 6.5 for an example of a pair of paths (~π, ~ρ) and its corresponding shape.
We next discuss properties of shapes and use shapes to analyze Var(Nk(a, b)) further. Recall that l = |~π ∩ ~ρ| denotes the number of common edges in ~π and ~ρ, and m the number of connected subpaths in ~ρ \ ~π. Then

Σ_{j=1}^{m+1} xj = l,    Σ_{j=1}^{m} sj = Σ_{j=1}^{m} tj = k − l. (6.5.18)
Figure 6.5 An example of a pair of paths (~π, ~ρ) and its corresponding shape.
Let Shapem,l denote the set of shapes corresponding to pairs of paths (~π, ~ρ) with m excursions and l common edges, so that (6.5.18) holds. Then,

Var(Nk(a, b)) ≤ nk(a, b) + Σ_{l=1}^{k−2} Σ_{m=1}^{k−l} Σ_{σ∈Shapem,l} Σ_{~π,~ρ∈Pk(a,b): Shape(~π,~ρ)=σ} P(~π, ~ρ occupied). (6.5.19)
We continue by investigating the structure of the vertices in (~π, ~ρ). Fix a pair of paths (~π, ~ρ) such that Shape(~π, ~ρ) = σ for some σ ∈ Shapem,l. There are k + 1 vertices in ~π. Every subpath of ~ρ \ ~π starts and ends in a vertex that is also in ~π. There are m connected subpaths in ~ρ \ ~π and l = |~π ∩ ~ρ| common edges, so that there are at most k − l − m extra vertices in ~ρ \ ~π. We conclude that the union of paths ~π ∪ ~ρ visits at most 2k + 1 − l − m distinct vertices, and thus at most 2k − 1 − l − m vertices unequal to a or b.
Vertex a is in 1 + δx1,0 edges and vertex b is in 1 + δxm+1,0 edges. Of the other k − 1 vertices in ~π, precisely 2m − δx1,0 − δxm+1,0 are in three edges, while the remaining k − 1 − 2m + δx1,0 + δxm+1,0 vertices are in two or four edges. The remaining k − l − m vertices in ~ρ \ ~π that are not in ~π are in two edges. By construction, ~π and ~ρ are self-avoiding, so the k + 1 vertices in ~π, and those in ~ρ, are distinct. In contrast, the k − l − m vertices in ~ρ \ ~π may intersect those of ~π.
We summarize the vertex information of ~π and ~ρ in the vector (v1, ..., v2k−1−l−m) ∈ I^{2k−1−l−m} denoting the vertices in the union of ~π and ~ρ that are unequal to a or b. We order these vertices as follows:

⊲ the vertices (v1, ..., v2m−a1−am+1) are in three edges, in the same order as their appearance in ~π, where we denote a1 = δx1,0, am+1 = δxm+1,0;
⊲ the vertices (v2m−a1−am+1+1, ..., vk−1) are the ordered vertices in ~π that are not in three edges and are unequal to a or b, listed in the same order as in ~π;
⊲ the vertices (vk, ..., v2k−1−l−m) are the ordered vertices in ~ρ that are not in three edges and are unequal to a or b, listed in the same order as in ~ρ.

Thus, vertices that are in four edges in ~π ∪ ~ρ occur twice in (v1, ..., v2k−1−l−m). The vector (v1, ..., v2k−1−l−m) is precisely the missing information needed to reconstruct (~π, ~ρ) from σ:
Lemma 6.17 (Bijection of pairs of paths) There is a one-to-one correspondence between the pairs of paths (~π, ~ρ) and the shape σ combined with the vertices (v1, ..., v2k−1−l−m) as described above.
Proof We have already observed that the shape σ of (~π, ~ρ) determines the intersection structure of (~π, ~ρ) precisely, and, as such, it contains all the information needed to piece together the two paths (~π, ~ρ), except for the information about the vertices involved in these paths. Every vertex in ~π ∪ ~ρ appears in two, three, or four edges. The vertices that occur in three edges occur at the start of (v1, ..., v2k−1−l−m), and the other vertices are those in ~π \ ~ρ and ~ρ \ ~π, respectively. The above ordering ensures that we can uniquely determine where these vertices are located along the paths ~π and ~ρ.
Fix the pair of paths (~π, ~ρ) for which Shape(~π, ~ρ) = σ for some σ ∈ Shapem,l, and recall that a1 = δx1,0, am+1 = δxm+1,0. Then, by (6.5.15) and Lemma 6.17,

P(~π, ~ρ occupied) = ua^{1+a1} ub^{1+am+1} Π_{s=1}^{2m−a1−am+1} uvs³ Π_{t=2m−a1−am+1+1}^{2k−1−l−m} uvt². (6.5.20)
Fix σ ∈ Shapem,l. We bound from above the sum over ~π, ~ρ ∈ Pk(a, b) such that Shape(~π, ~ρ) = σ by summing (6.5.20) over all (v1, ..., v2k−1−l−m) ∈ I^{2k−1−l−m}, to obtain

Σ_{~π,~ρ∈Pk(a,b): Shape(~π,~ρ)=σ} P(~π, ~ρ occupied)
≤ ua ub γI^{2m} νI^{2k−1−3m−l} (ua νI/γI)^{δx1,0} (ub νI/γI)^{δxm+1,0}
= n̄k(a, b)² γI^{2(m−1)} νI^{−3(m−1)−l} (γI/(ua νI))^{1−δx1,0} (γI/(ub νI))^{1−δxm+1,0}. (6.5.21)
Therefore, we arrive at

Var(Nk(a, b)) ≤ nk(a, b) + n̄k(a, b)² Σ_{l=1}^{k−2} Σ_{m=1}^{k−l} γI^{2(m−1)} νI^{−3(m−1)−l} Σ_{σ∈Shapem,l} (γI/(ua νI))^{1−δx1,0} (γI/(ub νI))^{1−δxm+1,0}. (6.5.22)
Equation (6.5.22) is our first main result on Var(Nk (a, b)), and we are left with inves-
tigating the combinatorial nature of the sums over the shapes. We continue to bound the
number of shapes in the following lemma:
Lemma 6.18 (Bounds on the number of shapes)
(a) For m = 1, the number of shapes in Shapem,l with fixed a1 = δx1,0, am+1 = δxm+1,0 equals l when a1 = am+1 = 0, 1 when a1 + am+1 = 1, and 0 when a1 = am+1 = 1.
(b) For m ≥ 2, the number of shapes in Shapem,l with fixed a1 = δx1,0, am+1 = δxm+1,0 is bounded by

2^{m−1} (m − 1)! \binom{k−l−1}{m−1}² \binom{l}{m − a1 − am+1}, (6.5.23)

and hence by

k (2k³)^{m−1}/(m − 1)!. (6.5.24)
Proof We repeatedly use a 'stars and bars' count: for a ≥ 0 and b ≥ 1, there are \binom{a+b−1}{b−1} sequences (x1, ..., xb) ∈ N0^b such that Σ_{j=1}^{b} xj = a. Indeed, lay out a ones and b − 1 zeros in a row; each arrangement corresponds to a unique such sequence when we let xi be the number of ones in between the (i − 1)th and ith chosen zero. Similarly, there are \binom{a−1}{b−1} possible sequences (y1, ..., yb) ∈ N^b such that Σ_{j=1}^{b} yj = a, since we can apply the previous equality to (y1 − 1, ..., yb − 1) ∈ N0^b.
Using the above, we continue to count the number of shapes. The number of vectors (s1, ..., sm) ∈ N^m such that sj ≥ 1 and Σ_{j=1}^{m} sj = k − l equals \binom{k−l−1}{m−1}. The same applies to (t1, ..., tm) ∈ N^m such that tj ≥ 1 and Σ_{j=1}^{m} tj = k − l.
In counting the number of possible ~xm+1 such that Σ_{j=1}^{m+1} xj = l, we need to count their numbers separately for x1 = 0 and x1 ≥ 1, and for xm+1 = 0 and xm+1 ≥ 1. When m = 1, the number is zero when x1 = x2 = 0, since x1 = x2 = 0 implies that the paths share no edges. Recall a1, am+1, and suppose that m − a1 − am+1 ≥ 0. Then, there are

\binom{l}{m − a1 − am+1}

possible choices of ~xm+1 with fixed a1 = δx1,0, am+1 = δxm+1,0. The claims in part (a), as well as that in (6.5.23) in part (b), follow by multiplying these bounds on the number of choices for ~rm+1, ~om+1, ~sm, ~tm and ~xm+1.
To prove (6.5.24) in part (b), we continue by obtaining the bounds

(m − 1)! \binom{k−l−1}{m−1}² = (1/(m − 1)!) ((k − l − 1)!/(k − l − m)!)² ≤ k^{2(m−1)}/(m − 1)!, (6.5.25)

\binom{l}{m − a1 − am+1} ≤ l^{m−a1−am+1}/(m − a1 − am+1)! ≤ k^m. (6.5.26)

Therefore, the number of shapes in Shapem,l is bounded, for each l ≥ 1 and m ≥ 2, by

2^{m−1} k^{2(m−1)}/(m − 1)! × k^m = k (2k³)^{m−1}/(m − 1)!,

as required.
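The two counting identities used in this proof are easy to confirm by brute force; the following snippet (standard library only) verifies them for small a and b:

```python
from itertools import product
from math import comb

# 'Stars and bars': sequences in N_0^b (resp. N^b) with coordinates summing to a.
a, b = 7, 3
nonneg = sum(1 for xs in product(range(a + 1), repeat=b) if sum(xs) == a)
positive = sum(1 for xs in product(range(1, a + 1), repeat=b) if sum(xs) == a)
assert nonneg == comb(a + b - 1, b - 1)      # 36 sequences in N_0^b
assert positive == comb(a - 1, b - 1)        # 15 sequences in N^b
print(nonneg, positive)
```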
We are now ready to complete the proof of Proposition 6.14:

Proof of Proposition 6.14. By (6.5.22) and applying Lemma 6.18, it suffices to show that the sum of

|Shapem,l| × (γI²/νI³)^{m−1} νI^{−l} (γI/(ua νI))^{1−a1} (γI/(ub νI))^{1−am+1} (6.5.28)

over l ∈ [k − 2], m ∈ [k − l], and a1, am+1 ∈ {0, 1} (where, by convention, \binom{l}{−1} = 0), is bounded by the contribution in parentheses in the second term in (6.5.7).
We start with m = 1, for which we obtain that the sum of (6.5.28) over the other variables l ∈ [k − 2] and a1, am+1 ∈ {0, 1} equals

(1/ua + 1/ub) γI νI Σ_{l=1}^{∞} νI^{−(l−1)} + (γI²/(ua ub νI)) Σ_{l=1}^{∞} l νI^{−(l−1)}
= γI νI²/(νI − 1) (1/ua + 1/ub) + γI² νI/(ua ub (νI − 1)²), (6.5.29)

where we use that, for a ∈ [0, 1),

Σ_{l=1}^{∞} a^{l−1} = 1/(1 − a),    Σ_{l=1}^{∞} l a^{l−1} = 1/(1 − a)². (6.5.30)

The terms in (6.5.29) are the first two terms that are multiplied by n̄k(a, b)² on the rhs of (6.5.7).
This leaves us to bound the contribution when m ≥ 2. Since (6.5.24) is independent of l, we can start by summing (6.5.28) over l ∈ [k] and over a1, am+1 ∈ {0, 1}, to obtain a bound of the form (recall (6.5.8))

k (1 + γI/(ua νI)) (1 + γI/(ub νI)) (νI/(νI − 1)) Σ_{m≥2} ((2k³)^{m−1}/(m − 1)!) (γI²/νI³)^{m−1}
= k (1 + γI/(ua νI)) (1 + γI/(ub νI)) (νI/(νI − 1)) (e^{2k³γI²/νI³} − 1) = ek. (6.5.31)

After multiplication by n̄k(a, b)², the term in (6.5.31) is the same as the last term appearing on the rhs of (6.5.7). Summing the bounds in (6.5.29) and (6.5.31) proves (6.5.7).
Exercises 6.16–6.20 study various consequences of our path-counting techniques. In the next subsection, we use Proposition 6.14 to prove upper bounds on graph distances.
Theorem 6.19 (Logarithmic upper bound on typical distances in NRn(w)) Consider NRn(w), where the weights w = (wv)v∈[n] satisfy Conditions 1.1(a)–(c) with ν = E[W²]/E[W] ∈ (1, ∞). Then, for any ε > 0, with o1, o2 chosen independently and uar from [n],

P(distNRn(w)(o1, o2) ≤ (1 + ε) logν n | distNRn(w)(o1, o2) < ∞) = 1 + o(1). (6.5.32)

The same result applies, under identical conditions, to GRGn(w) and CLn(w).
Theorem 6.19 provides the upper bound on the typical distances that matches Theorem
6.4, and together these two theorems prove Theorem 6.2. The remainder of this subsection
is devoted to the proof of Theorem 6.19.
The 1 − ε factors in (6.5.34) are due to the fact that the edge probabilities in the graph on
{a, b} ∪ [n] \ (Br(Gn ) (o1 ) ∪ Br(Gn ) (o2 )) are not exactly of the form pij = ui uj . Indeed, for
i, j ∈ {a, b}, the edge probabilities are slightly different. When Conditions 1.1(a)–(c) hold,
however, the bound almost holds for a and b, which explains the factors 1 − ε.
We formalize the above ideas in the following lemma, where we write Wr(oi) = Σ_{v∈∂Br(Gn)(oi)} wv for the total weight of the boundary ∂Br(Gn)(oi):

Lemma 6.20 (Weak convergence of boundary weights) As n → ∞,

(Wr(o1), Wr(o2)) −→d ( Σ_{j=1}^{Zr(1)} W?(1)(j), Σ_{j=1}^{Zr(2)} W?(2)(j) ),

where (Zm(1), Zm(2))m≥0 are the generation sizes of two independent unimodular branching processes as in Theorem 3.18, and (W?(1)(j))j≥1 and (W?(2)(j))j≥1 are two independent sequences of iid random variables with distribution F?.
Proof It is now convenient to start with Gn = NRn(w). By Corollary 2.19, |∂Br(Gn)(o1)| and |∂Br(Gn)(o2)| jointly converge in distribution to (Zr(1), Zr(2)), which are independent generation sizes of the local limit of NRn(w) as in Theorem 3.18. Each of the individuals in ∂Br(Gn)(o1) and ∂Br(Gn)(o2) receives a mark Mi with weight wMi. By Proposition 3.16, these marks are iid random variables conditioned to be unthinned, where whp no vertex in Br(Gn)(o1) ∪ Br(Gn)(o2) is thinned. Then, Wr(oi) = Σ_{j=1}^{|∂Br(Gn)(oi)|} Wn?(i)(j), where (Wn?(i)(j))j≥1 are iid copies of Wn?. By Conditions 1.1(a),(b), Wn? −→d W?, so that also Wr(oi) −→d Σ_{j=1}^{Zr(i)} W?(i)(j).
The joint convergence follows in a similar fashion, now using local convergence in probability. As discussed before, the above results extend trivially to GRGn(w) and CLn(w) by asymptotic equivalence.
In order to apply Proposition 6.14, we start by relating the random graph obtained by restricting CLn(w) to the vertex set Ia,b to the model on the vertex set Ia,b ∪ {a, b} with edge probabilities pij = ui uj, with ui = wi/√ℓn for i ∈ Ia,b and ua, ub given by (6.5.34). For this, we note that for i, j ∈ Ia,b, this equality holds by definition of CLn(w). We next take i = a and j ∈ Ia,b; the argument for i = b and j ∈ Ia,b is identical.
The conditional probability that j ∈ Ia,b is connected to at least one vertex in ∂Br(Gn)(o1) equals 1 − Π_{v∈∂Br(Gn)(o1)}(1 − wv wj/ℓn). By inclusion–exclusion, we obtain that

1 − Π_{v∈∂Br(Gn)(o1)} (1 − wv wj/ℓn) ≥ Σ_{v∈∂Br(Gn)(o1)} wv wj/ℓn − Σ_{v1,v2∈∂Br(Gn)(o1)} wv1 wv2 wj²/(2ℓn²)
≥ Wr(o1) wj/ℓn − Wr(o1)² wj²/(2ℓn²). (6.5.38)
By Conditions 1.1(a)–(c), wj = o(√n) (recall Exercise 1.8), and Wr(o1) is a tight sequence of random variables (see also Lemma 6.21 below), so that, whp for any ε > 0,

1 − Π_{v∈∂Br(Gn)(o1)} (1 − wv wj/ℓn) ≥ (1 − ε) Wr(o1) wj/ℓn. (6.5.39)
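Since the elementary inequality behind (6.5.38)–(6.5.39) is easy to get wrong, here is a quick numerical sanity check; the values xv (standing in for wv wj/ℓn) are arbitrary small test values:

```python
import math, random

# Bonferroni-type bound: 1 - prod(1 - x_v) >= S - S^2/2 with S = sum x_v.
rng = random.Random(3)
for _ in range(1000):
    xs = [rng.uniform(0.0, 0.05) for _ in range(rng.randint(1, 20))]
    lhs = 1 - math.prod(1 - x for x in xs)
    s = sum(xs)
    assert lhs >= s - s * s / 2
print("inequality verified on 1000 random instances")
```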
With the choices in (6.5.34), we see that our graph is bounded below by that studied
in Proposition 6.14. By the above description, it is clear that all our arguments will be
conditional, given Br(Gn ) (o1 ) and Br(Gn ) (o2 ). For this, we define Pr to be the conditional
distribution given Br(Gn ) (o1 ) and Br(Gn ) (o2 ), and we let Er and Varr be the corresponding
conditional expectation and variance.
In order to apply Proposition 6.14, we investigate the quantities appearing in it:

Lemma 6.21 (Parameters in path counting) Under the conditions of Theorem 6.19, and conditioning on Br(Gn)(o1) and Br(Gn)(o2) with ∂Br(Gn)(o1) ≠ ∅, ∂Br(Gn)(o2) ≠ ∅, and with a = ∂Br(Gn)(o1), b = ∂Br(Gn)(o2), for k = kn = ⌈(1 + ε) logν n⌉ − 2r,

n̲k(a, b) −→P ∞,    n̄k(a, b) = (1 + oP(1)) n̲k(a, b), (6.5.40)

and, as n → ∞,

Varr(Nk(a, b))/Er[Nk(a, b)]² ≤ Kν²/(ν − 1) (1/(√ℓn ua) + 1/(√ℓn ub)) + K²ν²/((ν − 1) ℓn ua ub) + oP(1). (6.5.41)
Proof By (6.5.3),

n̲k(a, b) = ua ub νIa,b,k^{k−1},    and    n̄k(a, b)/n̲k(a, b) = (νIa,b/νIa,b,k)^{k−1}. (6.5.42)

We start by investigating νI. Denote

ν(K) = E[W² 1{W≤K}]/E[W]. (6.5.43)
Then, by (6.5.36) and since Br(Gn)(o1) and Br(Gn)(o2) contain a finite number of vertices,

νIa,b −→P ν(K). (6.5.44)

The same applies to νIa,b,k. Then, with K > 0 chosen sufficiently large that ν(K) ≥ ν − ε/2 and with k = kn = ⌈(1 + ε) logν n⌉ − 2r,

n̲k(a, b) = ua ub νIa,b,k^{k−1} = (Wr(o1)Wr(o2)/ℓn) n^{(1+ε) log νIa,b,k/log ν − 1} −→P ∞, (6.5.45)

when K and n are so large that (1 + ε) log ν(K)/log ν > 1. This proves the first property in (6.5.40).
To prove the second property in (6.5.40), we note that the set Ia,b,k is obtained from Ia,b by removing the k − 1 vertices with highest weight. Since wi ≤ K for all i ∈ I (recall (6.5.36)), νIa,b ≤ νIa,b,k + kK/ℓn. Since k ≤ A log n, we therefore arrive at

n̄k(a, b)/n̲k(a, b) = (νIa,b/νIa,b,k)^{k−1} ≤ (1 + kK/(ℓn νIa,b,k))^{k−1} = 1 + oP(1). (6.5.46)

Further, by (6.5.36),
γI ≤ νI max_{i∈I} ui ≤ νI K/√ℓn, (6.5.47)

so that, for k ≤ A log n with A > 1 fixed,

(1 + γI/(ua νI)) (1 + γI/(ub νI)) k (e^{2k³γI²/νI³} − 1) = oP(1). (6.5.48)
Substituting these bounds into (6.5.7) and using (6.5.40) yields the claim.
Proof of Theorem 6.19. Indeed, (6.5.49) implies that P(distCLn(w)(o1, o2) > kn | distCLn(w)(o1, o2) < ∞) = o(1), since P(distCLn(w)(o1, o2) < ∞) → ζ² > 0 by Theorem 3.20.
We rewrite

P(distCLn(w)(o1, o2) > kn, ∂Br(Gn)(o1) ≠ ∅, ∂Br(Gn)(o2) ≠ ∅)
≤ P(Nkn−2r(∂Br(Gn)(o1), ∂Br(Gn)(o2)) = 0, ∂Br(Gn)(o1) ≠ ∅, ∂Br(Gn)(o2) ≠ ∅)
≤ E[ Pr(Nkn−2r(∂Br(Gn)(o1), ∂Br(Gn)(o2)) = 0) 1{∂Br(Gn)(o1)≠∅, ∂Br(Gn)(o2)≠∅} ],
where we recall that Pr is the conditional distribution given Br(Gn ) (o1 ) and Br(Gn ) (o2 ).
By Lemma 6.21 and the Chebychev inequality [V1, Theorem 2.18], the conditional probability of {distCLn(w)(o1, o2) > kn}, given Br(Gn)(o1), Br(Gn)(o2), is at most

Varr(Nkn−2r(a, b))/Er[Nkn−2r(a, b)]² ≤ Kν²/(ν − 1) (1/(√ℓn ua) + 1/(√ℓn ub)) + K²ν²/((ν − 1) ℓn ua ub) + oP(1). (6.5.51)
When ∂Br(Gn)(o1) ≠ ∅ and ∂Br(Gn)(o2) ≠ ∅, by (6.5.34) and as n → ∞,

1/(√ℓn ua) + 1/(√ℓn ub) −→P (1 − ε)^{−1} ( Σ_{j=1}^{Zr(1)} W?(1)(j) )^{−1} + (1 − ε)^{−1} ( Σ_{j=1}^{Zr(2)} W?(2)(j) )^{−1} −→P 0, (6.5.52)

when r → ∞. Therefore, with first n → ∞ followed by r → ∞,

Pr(Nk−2r(a, b) = 0 | ∂Br(Gn)(o1) ≠ ∅, ∂Br(Gn)(o2) ≠ ∅) −→P 0, (6.5.53)
Lemma 6.23 (Typical distances in core) Under the conditions of Theorem 6.22, let o1′, o2′ ∈ Coren be chosen with probabilities proportional to their weights, i.e.,

P(oi′ = j) = wj / Σ_{v∈Coren} wv, (6.5.60)

and let Hn′ be the graph distance between o1′, o2′ in Coren. Then, for any ε > 0, there exists an η ∈ (0, 1) in (6.5.58) such that

P(Hn′ ≤ (1 + ε) log n/log log n) → 1. (6.5.61)
Lemma 6.24 (From periphery to core) Under the conditions of Theorem 6.22, let o1, o2 be two vertices chosen uar from [n]. Then, for any η > 0 in (6.5.58),

P(distCLn(w)(o1, Coren) ≤ νn^{1−η}, distCLn(w)(o2, Coren) ≤ νn^{1−η}) → ζ². (6.5.62)

Further, CLn(w), GRGn(w), and NRn(w) are asymptotically equivalent when restricted to the edges in [n] × {v : wv ≤ βn} for any βn = o(√n).
Proof of Theorem 6.22 subject to Lemmas 6.23 and 6.24. To see that Lemmas 6.23 and 6.24 imply Theorem 6.22, we note that

distCLn(w)(o1, o2) ≤ distCLn(w)(o1, Coren) + distCLn(w)(o2, Coren) + distCLn(w)(o1′, o2′), (6.5.63)

where o1′, o2′ ∈ Coren are the vertices in Coren found first in the breadth-first search from o1 and o2, respectively. By the asymptotic equivalence of CLn(w), GRGn(w), and NRn(w) on [n] × {v : wv ≤ βn}, stated in Lemma 6.24, whp distCLn(w)(o1, Coren) = distNRn(w)(o1, Coren) and distCLn(w)(o2, Coren) = distNRn(w)(o2, Coren), so we can work with NRn(w) outside Coren. Then, by Proposition 3.16, o1′, o2′ ∈ Coren are chosen with probabilities proportional to their weights, as assumed in Lemma 6.23.
Fix kn = ⌈(1 + ε) log n/log log n⌉. We conclude that, when n is sufficiently large that νn^{1−η} ≤ εkn/4,

P(distCLn(w)(o1, o2) ≤ kn)
≥ P(distCLn(w)(oi, Coren) ≤ νn^{1−η}, i = 1, 2)
  × P(distCLn(w)(o1′, o2′) ≤ (1 − ε/2)kn | distCLn(w)(oi, Coren) ≤ νn^{1−η}, i = 1, 2). (6.5.64)

By Lemma 6.24, the first probability converges to ζ², and by Lemma 6.23 the second probability converges to 1. We conclude that

P(distCLn(w)(o1, o2) ≤ (1 + ε) log n/log log n) → ζ². (6.5.65)
Since also P(distCLn (w) (o1 , o2 ) < ∞) → ζ 2 , this completes the proof of Theorem 6.22.
The proofs of Lemmas 6.23 and 6.24 follow from path-counting techniques similar to
those carried out earlier. Exercises 6.21–6.24 complete the proof of Lemma 6.23. Exercise
6.25 asks you to verify the asymptotic equivalence stated in Lemma 6.24, while Exercise
6.26 asks you to give the proof of (6.5.62) in Lemma 6.24.
In this case, ‖Tκn‖ is the largest eigenvalue of the matrix M(n)ij = κn(i, j)µn(j), which converges to the largest eigenvalue of the matrix Mij = κ(i, j)µ(j) and equals ν = ‖Tκ‖ ∈ (1, ∞), by assumption. Without loss of generality, we may assume that µ(i) > 0 for all i ∈ [t]. This sets the stage for our analysis.
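In the finite-types case, ν = ‖Tκ‖ can thus be computed numerically as the Perron root of the t × t matrix (κ(i, j)µ(j))i,j. The sketch below does this by power iteration; the two-type kernel and type distribution are invented for the example:

```python
kappa = [[2.0, 1.0], [1.0, 3.0]]     # illustrative symmetric kernel on 2 types
mu = [0.6, 0.4]                      # illustrative type distribution
t = len(mu)
M = [[kappa[i][j] * mu[j] for j in range(t)] for i in range(t)]

x = [1.0] * t
for _ in range(200):                 # power iteration for the Perron root
    y = [sum(M[i][j] * x[j] for j in range(t)) for i in range(t)]
    norm = max(y)
    x = [v / norm for v in y]
print(f"nu = ||T_kappa|| ~ {norm:.4f}")   # supercritical precisely when nu > 1
```

Power iteration converges here because M has strictly positive entries, so the Perron–Frobenius eigenvalue is simple and dominant.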
Fix Gn = IRGn(κn); fix r ≥ 1 and assume that ∂Br(Gn)(o1), ∂Br(Gn)(o2) ≠ ∅. We will prove that

P(distIRGn(κn)(o1, o2) ≤ (1 + ε) logν n | Br(Gn)(o1), Br(Gn)(o2)) = 1 + oP(1). (6.5.68)

We follow the proof of Theorem 6.19 and rely on path-counting techniques. We again take

a = ∂Br(Gn)(o1),    b = ∂Br(Gn)(o2), (6.5.69)

and

Ia,b = [n] \ (Br(Gn)(o1) ∪ Br(Gn)(o2)). (6.5.70)

Recall from (6.5.1) that Nk(a, b) denotes the number of k-step occupied self-avoiding paths connecting a and b.
We aim to use the second-moment method for Nk(a, b), for which we need to investigate the mean and variance of Nk(a, b). Let Pr denote the conditional probability given Br(Gn)(o1) and Br(Gn)(o2), and let Er and Varr denote the corresponding conditional expectation and variance. We compute

Er[Nk(a, b)] = Σ_{~π∈Pk(a,b)} P(~π occupied in Gn) = Σ_{~π∈Pk(a,b)} Π_{l=0}^{k−1} κn(πl, πl+1)/n ≤ (1/n) ⟨x, Tκn^k y⟩, (6.5.71)

where x = (xi)_{i=1}^{t} and y = (yi)_{i=1}^{t}, with xi the number of type-i vertices in ∂Br(Gn)(o1) and yi the number of type-i vertices in ∂Br(Gn)(o2), respectively. An identical lower bound holds with an extra factor (µ̲n − k/n)/µ̲n, where µ̲n = min_{j∈[t]} µn(j) → min_{j∈[t]} µ(j) > 0, by assumption.
Recall the notation and results in Section 3.4, and in particular Theorem 3.10(b). The types of o1 and o2 are asymptotically independent, and the probability that o1 has type j is equal to µn(j), which converges to µ(j). On the event that the type of o1 equals j1, the vector of the numbers of individuals in ∂Br(Gn)(o1) converges in distribution to (Zr(1,j1)(i))i∈[t], which, by Theorem 3.10(b), is close to M∞ xκ(i) for some strictly positive random variable M∞. We conclude that

x −→d Zr(1,j1),    y −→d Zr(2,j2), (6.5.72)

where the limiting branching processes are independent. Equation (6.5.72) replaces the convergence in Lemma 6.20 for GRGn(w).
We conclude that, for k = kn = ⌈(1 + ε) logν n⌉, conditioning on Br(Gn)(o1) and Br(Gn)(o2) such that a = ∂Br(Gn)(o1) and b = ∂Br(Gn)(o2), with ∂Br(Gn)(o1), ∂Br(Gn)(o2) ≠ ∅, we have

Er[Nk(a, b)] −→P ∞. (6.5.73)
For the variance, we again start from

P(~π, ~ρ occupied) = P(~π occupied) P(~ρ occupied | ~π occupied). (6.5.74)
Now recall the definition of a shape in (6.5.17), in Definition 6.16. Fix σ ∈ Shapem,l and ~ρ ∈ Pk(a, b) with Shape(~π, ~ρ) = σ. The factor P(~ρ occupied | ~π occupied), summed out over the free vertices of ~ρ (i.e., those that are not also vertices in ~π), gives rise to m factors of the form Tκn^{ti}(iπui, iπvi)/n, for i ∈ [m] and some vertices πui and πvi on the path (πi)_{i=0}^{k}. We use that, uniformly in q ≥ 1,

max_{i,j∈[t]} (1/n) Tκn^q(i, j) ≤ (C/n) ‖Tκn‖^q. (6.5.75)
Thus, for each of the m subpaths of length ti we obtain a factor (C/n)‖Tκn‖^{ti}. Using that Σ_{i=1}^{m} ti = k − l, by (6.5.18), we arrive at

Σ_{~π,~ρ∈Pk(a,b): Shape(~π,~ρ)=σ} P(~π, ~ρ occupied) ≤ Er[Nk(a, b)] Π_{i=1}^{m} (C/n)‖Tκn‖^{ti} ≤ Er[Nk(a, b)] (C/n)^m ‖Tκn‖^{k−l}. (6.5.76)

This replaces (6.5.21). The proof can now be completed in an identical way to that of (6.5.7), combined with that of (6.5.49) in the proof of Theorem 6.19. We omit further details.
We condition on Br(Gn)(o1) and Br(Gn)(o2), and note that the events {|∂Br(Gn)(o1)| ≥ r} and {|∂Br(Gn)(o2)| ≥ r} are measurable with respect to Br(Gn)(o1) and Br(Gn)(o2), to obtain

P(o1 ↮ o2, |∂Br(Gn)(o1)| ≥ r, |∂Br(Gn)(o2)| ≥ r) = E[ 1{|∂Br(Gn)(o1)|≥r, |∂Br(Gn)(o2)|≥r} P(o1 ↮ o2 | Br(Gn)(o1), Br(Gn)(o2)) ].

In the proof of Theorem 6.1(b) we showed that, on {∂Br(Gn)(o1), ∂Br(Gn)(o2) ≠ ∅},

P(distIRGn(κn)(o1, o2) ≤ (1 + ε) logν n | Br(Gn)(o1), Br(Gn)(o2)) = 1 − oP(1), (6.5.79)

so that also

P(o1 ↮ o2 | Br(Gn)(o1), Br(Gn)(o2)) = oP(1). (6.5.80)

Since P(o1 ↮ o2 | Br(Gn)(o1), Br(Gn)(o2)) ≤ 1, the Dominated Convergence Theorem [V1, Theorem A.1] completes the proof of (2.6.38) for IRGn(κn), as required.
Figure 6.6 (a) Diameters of the 727 networks of size larger than 10,000, from the KONECT data base, and (b) the 721 diameters that are at most 40.
6.6 Related Results on Distances in Inhomogeneous Random Graphs

In this section we discuss some related results for inhomogeneous random graphs. While we give some intuition about their proofs, we do not include them in full detail.
See Figure 6.6 for the diameters of networks in the KONECT data base. While there are
some networks with quite large diameters (often corresponding to road or other spatial net-
works), the diameters in the majority of the networks are quite small.
We next investigate the diameter of an IRGn(κn), which tends to be much larger than the typical distances, owing to long thin lines that are distributed as an IRGn(κn) with a subcritical κn, by a duality principle for IRGn(κn). Before we state the results, we introduce the notion of the dual kernel:

Definition 6.25 (Dual kernel for IRGn(κn)) Let (κn) be a sequence of supercritical kernels with limit κ. The limiting dual kernel is the kernel κ̂ defined by κ̂(x, y) = κ(x, y) with reference measure dµ̂(x) = (1 − ζκ(x))µ(dx). Note that this reference measure integrates to 1 − ζκ > 0, not to 1. J
The dual kernel describes the graph that remains after the removal of the giant component. Here, the reference measure µ̂ measures the structure of the types of vertices in the graph. Indeed, a vertex x is in the giant component with probability ζκ(x); if in fact it is in the giant then it must be removed. Thus, µ̂ describes the proportion of vertices, of various types, that are outside the giant component. As before, we define the operator Tκ̂ by

(Tκ̂ f)(x) = ∫_S κ̂(x, y) f(y) µ̂(dy) = ∫_S κ(x, y) f(y) [1 − ζκ(y)] µ(dy), (6.6.2)
where the corresponding operator norm ‖Tκ̂‖ is taken with respect to ‖·‖µ̂, with

‖f‖µ̂² = ∫_S f²(x) µ̂(dx). (6.6.4)
The following theorem describes the diameter in terms of the above notation:

Theorem 6.26 (Diameter of IRGn(κn) in the finite-types case) Let (κn) be a sequence of kernels with limit κ, which has finitely many types. If 0 < ‖Tκ‖ < 1 then

diam(IRGn(κn))/log n −→P 1/log(1/‖Tκ‖) (6.6.5)

as n → ∞. If ‖Tκ‖ > 1 and κ is irreducible then

diam(IRGn(κn))/log n −→P 2/log(1/‖Tκ̂‖) + 1/log ‖Tκ‖, (6.6.6)

where κ̂ is the dual kernel to κ.
If we compare Theorem 6.26 with Theorem 6.2 then we see that the diameter has the same scaling as the typical distance when ‖Tκ‖ < ∞, but that diam(IRGn(κn))/log n converges in probability to a strictly larger limit than the one when distIRGn(κn)(o1, o2)/log n is conditioned on being finite. This effect is particularly noticeable in the case of rank-1 models with τ ∈ (2, 3), where, conditional on its being finite, distIRGn(κn)(o1, o2)/log log n converges in probability to a finite limit, while diam(IRGn(κn))/log n converges to a non-zero limit. This can be explained by noticing that the diameter in IRGn(κn) is due to very thin lines of length of order log n. Since these lines involve only very few vertices, they do not contribute to distIRGn(κn)(o1, o2) but they do contribute to diam(IRGn(κn)). This is another argument for why we prefer to work with typical distances rather than the diameter. Exercise 6.28 investigates the consequences for ERn(λ/n).
We do not prove Theorem 6.26 here. For GRGn (w), it also follows from Theorem 7.19
below, which states a related result for the configuration model.
Equation (6.6.7) implies that the degrees have finite variance; see Exercise 6.29.
Theorem 6.27 (Limit law for the typical distance in NRn(w)) Consider NRn(w), where the weights w = (wv)v∈[n] are given by wv = [1 − F]^{−1}(v/n) as in (1.3.15), with F satisfying (6.6.7), and let ν = E[W²]/E[W] > 1. For k ≥ 1, define ak = ⌊logν k⌋ − logν k ∈ (−1, 0]. Then, there exist random variables (Ra)a∈(−1,0] with

lim_{K→∞} inf_{a∈(−1,0]} P(|Ra| ≤ K) = 1 (6.6.8)

such that, as n → ∞ and for all k ∈ Z, with o1, o2 chosen independently and uar from [n],

P(distNRn(w)(o1, o2) − ⌊logν n⌋ = k | distNRn(w)(o1, o2) < ∞) = P(Ran = k) + o(1).
6.7 Notes and Discussion

Versions of these results were first proved for CLn(w), in the case of admissible deterministic weights. We refer to (Chung and Lu, 2003, p. 94) for the definition of admissible weight sequences.
Theorem 6.2 has a long history, and many versions of it have been proved in the literature. We refer the
reader to Chung and Lu (2002a, 2003) for the Chung–Lu model, and van den Esker et al. (2008) for its
extensions to the Norros–Reittu model and the generalized random graph.
Theorem 6.3 for the random graph with prescribed expected degrees, or Chung–Lu model, was first proved by Chung and Lu (2002a, 2003), in the case of deterministic weights wv = c(n/v)^{1/(τ−1)} having average degree strictly greater than 1 and maximum weight m satisfying log m ≫ log n/log log n. These restrictions were lifted in (Durrett, 2007, Theorem 4.5.2). Indeed, the bound on the average degree is not necessary, since, for τ ∈ (2, 3), ν = ∞ and therefore the IRG is always supercritical. An upper bound as in Theorem 6.3 for the Norros–Reittu model with iid weights was proved by Norros and Reittu (2006). Theorem 6.3 has been proved in many versions, both fully as well as in partial forms; see, e.g., Norros and Reittu (2006); Chung and Lu (2002a, 2003); Dereich et al. (2012).
6.8 Exercises for Chapter 6

Exercise 6.3 (Power-law tails in key example of deterministic weights) Let w be defined as wv = [1 − F]^{−1}(v/n) as in (1.3.15), and assume that F satisfies
Exercise 6.13 (Bound on truncated forward degree νn(b)) Assume that (6.3.21) holds. Prove the bound on νn(b) in (6.3.38) by combining (1.4.12) in Lemma 1.22 with ℓn = Θ(n) by Conditions 1.1(a),(b).
Exercise 6.14 (Ultra-small distances for CLn (w) and GRGn (w)) Complete the proof of the doubly
logarithmic upper bound on typical distances in Theorem 6.11 for CLn (w) and GRGn (w).
Exercise 6.15 (Upper bound on the expected number of paths) Consider an inhomogeneous random graph with edge probabilities pij = ui uj for (ui)i∈[n] ∈ [0, 1]^n. Prove (6.5.4), which states that

E[Nk(a, b)] ≤ ua ub ( Σ_{i∈I\{a,b}} ui² )^{k−1}.
Exercise 6.16 (Variance of two-paths) Consider an inhomogeneous random graph with edge probabilities pij = ui uj for (ui)i∈[n] ∈ [0, 1]^n. Prove that Var(Nk(a, b)) ≤ E[Nk(a, b)] for k = 2.
Exercise 6.17 (Variance of three-paths) Consider an inhomogeneous random graph with edge probabilities pij = ui uj for (ui)i∈[n] ∈ [0, 1]^n. Compute Var(N3(a, b)) explicitly, and compare it with the bound in (6.5.7).
Exercise 6.18 (Connections between sets in NRn(w)) Let A, B ⊆ [n] be two disjoint sets of vertices. Prove that

P(A directly connected to B in NRn(w)) = 1 − e^{−wA wB/ℓn}, (6.8.8)

where wA = Σ_{a∈A} wa is the weight of A.
Exercise 6.19 (Expectation of paths between sets in ERn(λ/n)) Consider ERn(λ/n). Fix A, B ⊆ [n] with A ∩ B = ∅, and let Nk(A, B) denote the number of self-avoiding paths of length k connecting A to B (where a path connecting A and B avoids A and B except for the starting point and endpoint). Show that, for k(|A| + |B|)/n = o(1),

E[Nk(A, B)] = (λ^k/n) |A||B| (1 − (|A| + |B|)/n)^k (1 + o(1)). (6.8.9)
Exercise 6.20 (Variance of path counts for ERn(λ/n) (cont.)) In the setting of Exercise 6.19, use Proposition 6.14 to bound the variance of Nk(A, B), and prove that

Nk(A, B)/E[Nk(A, B)] −→P 1 (6.8.10)

when |A|, |B| → ∞ with |A| + |B| = o(n/k) and k = ⌈logλ n⌉.
Exercise 6.21 (Logarithmic bound for νn when τ = 3) Define

a = o1′,    b = o2′,    I = {i ∈ [n] : wi ∈ [K, √βn]}, (6.8.11)

where o1′, o2′ are independent copies from the size-biased distribution in (6.5.60). Prove that τ = 3 in the form of (6.5.55) and (6.5.56) implies that νI ≥ c log βn for some c > 0 and all n sufficiently large. It may be helpful to use

(1/n) Σ_{i∈I} wi² = E[Wn² 1{Wn∈[K,√βn]}] = 2 ∫_0^{√βn} x [Fn(√βn) − Fn(x ∨ K)] dx. (6.8.12)
Exercise 6.22 (Expected number of paths within Coren diverges) Recall the setting in (6.8.11) in Exercise 6.21. Fix η > 0. Prove that

E[Nk(a, b)] → ∞

for a = o1′, b = o2′, and k = ⌈(1 + η) log n/log νn⌉.
Exercise 6.23 (Concentration of number of paths within Coren) Recall the setting in (6.8.11) in Exercise 6.21. Prove that

Var(Nk(a, b))/E[Nk(a, b)]² → 0

for a = o1′, b = o2′, and k = ⌈(1 + η) log n/log νn⌉.
Exercise 6.24 (Concentration of number of paths within Coren ) Complete the proof of Lemma 6.23 on
the basis of Exercises 6.21–6.23.
Exercise 6.25 (Asymptotic equivalence in Lemma 6.24) Recall the conditions of Theorem 6.22. Prove that CLn(w), GRGn(w), and NRn(w) are asymptotically equivalent when restricted to the edges in [n] × {v : wv ≤ βn} for any βn = o(√n). Hint: Use the asymptotic equivalence in [V1, Theorem 6.18] for general inhomogeneous random graphs.
Exercise 6.26 (Completion of the proof of Lemma 6.24) Complete the proof of (6.5.62) in Lemma 6.24
by adapting the arguments in (6.5.51)–(6.5.54).
Exercise 6.27 (Concentration of the giant in IRGs) In Section 6.5.3, Theorem 3.19 for finite-type inhomo-
geneous random graphs was proved using a path-counting method based on Theorem 6.1(b). Give a direct
proof of the “giant is almost local” condition in (2.6.38) by adapting the argument in Section 2.6.4 for the
Erdős–Rényi random graph. You may assume that µ(s) > 0 for every s ∈ [t] for which µn (s) > 0.
Exercise 6.28 (Diameter of ERn(λ/n)) Recall the asymptotics of the diameter of IRGn(κn) in Theorem 6.26. For ERn(λ/n), show that ‖Tκ‖ = λ and ‖Tκ̂‖ = µλ, where µλ is the dual parameter in [V1, (3.6.6)], so that Theorem 6.26 becomes

diam(ERn(λ/n))/log n −→P 2/log(1/µλ) + 1/log λ. (6.8.13)
Exercise 6.29 (Finite variance of degrees when (6.6.7) holds) Prove that (6.6.7) implies that E[W²] < ∞. Use this to prove that the degrees have uniformly bounded variance when (6.6.7) holds.
Exercise 6.30 (Tightness of centered typical distances in NRn(w)) Prove that, under the conditions of Theorem 6.27, and conditional on distNRn(w)(o1, o2) < ∞, the sequence (distNRn(w)(o1, o2) − ⌊logν n⌋)n≥2 is tight.
Exercise 6.31 (Non-convergence of centered typical distances in NRn(w)) Prove that, under the conditions of Theorem 6.27, and conditional on distNRn(w)(o1, o2) < ∞, the sequence distNRn(w)(o1, o2) − ⌊logν n⌋ does not converge weakly when the distribution of Ra depends continuously on a and when there are a, b ∈ (−1, 0] such that the distribution of Ra is not equal to that of Rb.
Exercise 6.32 (Extension of Theorem 6.27 to GRGn (w) and CLn (w)) Use [V1, Theorem 6.18] to prove
that Theorem 6.27 holds verbatim for GRGn (w) and CLn (w) when (6.6.7) holds. Hint: Use asymptotic
equivalence.
Exercise 6.33 (Extension of the lower bound in Theorem 6.28 to all α > −1/2) Consider NRn(w) with weights w satisfying

P(Wn > x) = x^{−2} (log x)^{2α+o(1)}, (6.8.14)

for all x ≤ n^ε for some ε > 0, where Wn = wo is the weight of a uniform vertex o in [n]. Prove the lower bound in Theorem 6.28 for all α > −1/2.
Exercise 6.34 (Extension of the lower bound in Theorem 6.28 to α < −1/2) Consider NRn(w) as in Exercise 6.33, but now with α < −1/2. Let ν = E[W²]/E[W] < ∞. Prove that the lower bound in Theorem 6.28 is replaced by

P(distGRGn(w)(o1, o2) ≤ (1 − ε) logν n) = o(1). (6.8.15)
Chapter 7

Small-World Phenomena in Configuration Models
Abstract
In this chapter we investigate the distance structure of the configuration model
by investigating its typical distances and its diameter. We adapt the path-
counting techniques in Section 6.5 to the configuration model, and obtain typ-
ical distances from the “giant is almost local” proof. To understand the ultra-
small distances for infinite-variance degree configuration models, we investi-
gate the generation growth of infinite-mean branching processes. The relation
to branching processes informally leads to the power-iteration technique, which
allows one to deduce typical distance results in a relatively straightforward way.
In this chapter we investigate graph distances in the configuration model. We start with a
motivating example.
Motivating Example
Recall Figure 1.5(a), in which graph distances in the Autonomous Systems (AS) graph in the
Internet, also called AS counts, are shown. A relevant question is whether such a histogram
can be predicted by the graph distances in a random graph model having a similar degree
structure and size to the AS graph. Figure 7.1 compares simulations of the typical distances
for τ ∈ (2, 3) with the distances in the AS graph, with n = 10, 940 equal to the number of
autonomous systems, and τ = 2.25 the best approximation to the degree power-law expo-
nent of the AS graph. We see that the typical distances in the configuration model CMn (d)
and the AS counts are quite close. Further, Figure 7.2 shows the 90% percentile of typical
Figure 7.1 Number of AS traversed in hopcount data (lighter gray), and, for comparison, the model (darker gray) with τ = 2.25, n = 10,940.
Figure 7.2 90th percentile of typical distances in the 727 networks of size larger than 10,000 from the KONECT data base.
distances in the KONECT data base (recall that Figure 6.1 indicates the median value). We
see that this 90th percentile mostly remains relatively small, even for large networks.
Figures 7.1 and 7.2 again raise the question of how graph distances depend on the structure
of the random graphs and real-world networks in question, such as their size and degree
structure. The configuration model is highly flexible, in the sense that it offers complete
freedom in the choice of the degree distribution. Thus, we can use the configuration model
(CM) to single out the relation between graph distances and degree structure, in a similar
way to that in which we investigate the giant component size and connectivity as a function
of the degree distribution, as discussed in detail in Chapter 4. Finally, we can verify whether
graph distances in CM are closely related to those in inhomogeneous random graphs, as dis-
cussed in Chapter 6, so as to detect another sign of the wished-for universality of structural
properties of random graphs with similar degree distributions.
In this section we describe the main results on typical distances in the CM, both in the case of
finite-variance degrees and in the case of infinite-variance degrees. These results are proved
in the following section.
Figure 7.3 Typical distances between 2,000 pairs of vertices in the configuration model with n = 100,000 and (a) τ = 2.5 and (b) τ = 3.5.
cases τ = 2 and τ = 3. In these cases it can be expected that the results depend on finer
properties of the degrees. We will present some results along the way, when such results
follow relatively straightforwardly from our proofs.
7.3 Proofs of Small-World Results for the Configuration Model

In this section we give the proofs of Theorems 7.1 and 7.2 describing the small-world properties in CMn(d). These proofs are adaptations of the proofs of Theorems 6.2 and 6.3, and we focus on the differences in the proofs.
The section is organized as follows. In Section 7.3.1 we give a branching-process ap-
proximation for the neighborhoods of a pair of uniform vertices in CMn (d), using the local
convergence in Theorem 4.1 and its proof. In Section 7.3.2 we use path-counting upper
bounds to prove lower bounds on typical distances by the first-moment method. In Section
7.3.3 we employ path-counting techniques similar to those in Section 6.5.1 using second-
moment methods, adapted to the CM, where edges are formed by pairing half-edges. We use
these to prove logarithmic upper bounds on graph distances. We close in Section 7.3.4 by
proving doubly logarithmic upper bounds on typical distances of CMs with infinite-variance
degrees. We also discuss the diameter of the core of high-degree vertices.
For a path ~π as in (7.3.3), we write ~π ⊆ CMn (d) when the path ~π in (7.3.3) is present in
CMn (d), so that the half-edge corresponding to si is paired with the half-edge correspond-
ing to ti+1 for i = 0, . . . , k − 1. Without loss of generality, we assume throughout that the
path ~π is self-avoiding, i.e., that π0 , . . . , πk are distinct vertices.
In this section, we perform first-moment computations on the number of paths present in
CMn (d). In the next section, we perform second-moment methods.
Proof The probability that the path ~π in (7.3.3) is present in CMn(d) is equal to

P(~π ⊆ CMn(d)) = Π_{i=1}^{k} 1/(ℓn − 2i + 1), (7.3.6)
Substituting π0 = a, πk = b, we arrive at

E[Nk(a, b)] = (da db/(ℓn − 2k + 1)) Σ*_{π1,...,πk−1} Π_{i=1}^{k−1} dπi(dπi − 1)/(ℓn − 2i + 1), (7.3.8)

where the sum is over distinct elements of I \ {a, b} (as indicated by the asterisk). Let R denote the subset of vertices of I \ {a, b} for which di ≥ 2. Then

E[Nk(a, b)] = (da db/(ℓn − 2k + 1)) Σ*_{π1,...,πk−1∈R} Π_{i=1}^{k−1} dπi(dπi − 1)/(ℓn − 2i + 1). (7.3.9)
By an inequality of Maclaurin (Hardy et al., 1988, Theorem 52), for r = |R|, 2 ≤ k ≤ r + 1, and any (ai)i∈R with ai ≥ 0, we have

((r − k + 1)!/r!) Σ*_{π1,...,πk−1∈R} Π_{i=1}^{k−1} aπi ≤ ( (1/r) Σ_{i∈R} ai )^{k−1}. (7.3.10)
Finally, we arrive at

E[Nk(a, b)] ≤ (da db/(ℓn − 2k + 1)) (ℓn νI/r)^{k−1} Π_{i=1}^{k−1} (r − i + 1)/(ℓn − 2i + 1)
≤ (da db/(ℓn − 2k + 1)) (ℓn/(ℓn − 2k + 3)) νI^{k−1} Π_{i=0}^{k−2} (1 − i/r)/(1 − 2i/ℓn). (7.3.12)
Theorem 7.6 (Logarithmic lower bound on typical distances in CMn(d)) Consider CMn(d) where the degrees d = (dv)v∈[n] satisfy Conditions 1.7(a),(b). Then, for any ε > 0, with o1, o2 chosen independently and uar from [n],

P(distCMn(d)(o1, o2) ≤ (1 − ε) logνn n) = o(1). (7.3.14)

We leave the proof of Theorem 7.6, which is almost identical to that of Theorem 6.4, with (7.3.5) in Proposition 7.5 to hand, as Exercise 7.5.
We next investigate the τ = 3 case, where the degree distribution has logarithmic corrections to the power law, as also investigated in Theorem 6.28 for NRn(w):

Corollary 7.7 (Critical τ = 3 case: interpolation) Consider CMn(d) where the degrees d = (dv)v∈[n] satisfy Conditions 1.7(a),(b), and there exists an α such that, for all x ≥ 1,

[1 − Fn](x) ≤ c2 x^{−2} (log x)^{2α}. (7.3.15)

Let o1, o2 be chosen independently and uar from [n]. Then, for any ε > 0 and α > −1/2,

P(distCMn(d)(o1, o2) ≤ (1 − ε) log n/((2α + 1) log log n)) = o(1), (7.3.16)

while, for α < −1/2 and with ν = limn→∞ νn < ∞,

P(distCMn(d)(o1, o2) ≤ (1 − ε) logν n) = o(1). (7.3.17)
Equation (7.3.20) replaces the similar identity (6.3.6) for CLn(w). We see that wπ0 and wπk in (6.3.6) are replaced by dπ0 and dπk in (7.3.20), and, for i ∈ [k − 1], the factors wπi² in (6.3.6) are replaced by dπi(dπi − 1) in (7.3.20), while the factors ℓn in (6.3.6) are replaced by ℓn − 2i + 1 for i ∈ [k] in (7.3.20).
Define, as in (6.3.31),

νn(b) = (1/ℓn) Σ_{v∈[n]} dv(dv − 1) 1{dv≤b}. (7.3.21)

Then, the arguments in Section 6.3.2 imply that (see in particular Exercise 6.12)

P(distCMn(d)(u, v) ≤ kn) ≤ (du dv/ℓn) Σ_{k=1}^{kn} (ℓn^k (ℓn − 2k − 1)!!/(ℓn − 1)!!) Π_{l=1}^{k−1} νn(bl ∧ bk−l)
+ (du + dv) Σ_{k=1}^{kn} (ℓn^k (ℓn − 2k − 1)!!/(ℓn − 1)!!) [1 − Fn?](bk) Π_{l=1}^{k} νn(bl), (7.3.22)
i.e., the bound in (6.8.7) is changed by factors ℓn^k(ℓn − 2k − 1)!!/(ℓn − 1)!! in the sums. For k = O(log log n) and when Conditions 1.7(a),(b) hold,

ℓn^k (ℓn − 2k − 1)!!/(ℓn − 1)!! = Π_{i=1}^{k} ℓn/(ℓn − 2i + 1) = 1 + O(k²/ℓn) = 1 + o(1), (7.3.23)

so this change has a negligible effect. Since (6.3.38) in Lemma 6.10 applies under the conditions of Theorem 7.8, we can follow the proof of Theorem 6.7 verbatim.
7.3.3 Path-Counting Lower Bounds and Resulting Distance Upper Bounds
In this subsection we provide upper bounds on typical distances in CMn (d). We start by
using the “giant is almost local” results proved in Section 4.3.1; see in particular Remark
4.13. After this, we continue with path-counting techniques similar to those in Section 6.5.1,
focussing on the variance of the number of paths in CMn (d). Such estimates turn out to be
extremely versatile and can be used extensively to prove various upper bounds on distances,
as we will show in the remainder of the section.
Theorem 7.9 (Logarithmic upper bound on graph distances in CMn (d)) Consider CMn (d)
where the degrees d = (dv )v∈[n] satisfy Conditions 1.7(a)–(c) with ν = E[D(D−1)]/E[D]
∈ (1, ∞). Then, for any ε > 0, with o1 , o2 chosen independently and uar from [n],
P(distCMn (d) (o1 , o2 ) ≤ (1 + ε) logν n | distCMn (d) (o1 , o2 ) < ∞) = 1 + o(1). (7.3.24)
Proof Recall Section 4.3.1, where the degree-truncation technique from Theorem 1.11 was used with b sufficiently large. Recall that CMn′(d′) denotes the CM after the degree-truncation method has been applied. Then, for Gn = CMn′(d′) with d′max ≤ b, the proof shows that, when |∂Br(Gn)(o1)|, |∂Br(Gn)(o2)| ≥ r, whp also distCMn′(d′)(o1, o2) ≤ logνn′ n (1 + oP(1)) (recall Remark 4.13). Here νn′ = E[Dn′(Dn′ − 1)]/E[Dn′] denotes the expected forward degree of a uniform half-edge in CMn′(d′).
We next relate this to the bound in (7.3.24). We note that νn′ → ν′ when Conditions 1.7(a),(b) hold, where, by construction,

ν′ = E[D′(D′ − 1)]/E[D′] = E[(D ∧ b)((D ∧ b) − 1)]/E[D]. (7.3.25)

The latter equality holds since Σ_{v∈[n′]} d′v = Σ_{v∈[n]} dv, and d′v = dv ∧ b for v ∈ [n], while d′v = 1 for v ∈ [n′] \ [n]. Thus, we also have

Σ_{v∈[n′]} d′v(d′v − 1) = Σ_{v∈[n]} d′v(d′v − 1) = Σ_{v∈[n]} (dv ∧ b)((dv ∧ b) − 1). (7.3.26)

Since also P(distCMn(d)(o1, o2) < ∞) → ζ² and P(distCMn′(d′)(o1, o2) < ∞) → (ζ′)², where ζ′ = ζ′(b) → ζ when b → ∞, this gives the first proof of the upper bound in Theorem 7.9.
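The effect of the truncation in (7.3.25) can be seen numerically. The sketch below uses an illustrative power-law degree law P(D = d) proportional to d^{−τ} on d ≥ 2 with τ = 3.5 (truncated at a large dmax for computability); the truncated ratio ν′ increases towards ν as b grows:

```python
tau, dmax = 3.5, 10**5
ds = range(2, dmax + 1)                       # degrees at least 2 (illustrative)
w = [d ** (-tau) for d in ds]
Z = sum(w)                                    # normalizing constant
ED = sum(d * p for d, p in zip(ds, w)) / Z
nu = sum(d * (d - 1) * p for d, p in zip(ds, w)) / Z / ED
for b in (2, 4, 8, 16, 32, 64):
    nub = sum(min(d, b) * (min(d, b) - 1) * p for d, p in zip(ds, w)) / Z / ED
    print(f"b = {b:3d}: nu' = {nub:.4f}   (nu = {nu:.4f})")
```

Since ν′ ↑ ν as b → ∞, the truncated exponent logν′ n can be made arbitrarily close to logν n, which is precisely how the (1 + ε) slack in (7.3.24) is used.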
Exercise 7.7 shows that the analysis in Section 4.3.1 can be performed without the degree-
truncation argument of Theorem 1.11 when lim supn→∞ E[Dn3 ] < ∞. Exercise 7.8 extends
the condition to lim supn→∞ E[Dnp ] < ∞ for some p > 2.
where Ia,b,k is the subset of I in which a and b, as well as the k − 1 indices with highest degrees, have been removed. Let

νI = (1/ℓn) Σ_{i∈I} di(di − 1),    γI = (1/ℓn^{3/2}) Σ_{i∈I} di(di − 1)(di − 2). (7.3.30)
The following proposition replaces the similar Proposition 6.14 for CLn(w), which was crucial in deriving upper bounds on typical distances:

Proposition 7.10 (Variance of number of paths) For any k ≥ 1, a, b ∈ I,

Proof Recall that Nk(a, b) is the number of paths ~π of length k between the vertices a and b, where a path was defined in (7.3.3). Since Nk(a, b) is a sum of indicators, its variance can be written as in (6.5.13); the sum over all ~π and ~ρ gives rise to the first contribution to ek′. For the other contributions, we follow the proof of (6.5.13) for NRn(w), and omit further details.
With Proposition 7.10 in hand, we can straightforwardly adapt the proof of Theorem 6.19
to CMn (d) to prove Theorem 7.9. We leave this proof of Theorem 7.9 as Exercise 7.9.
Theorem 7.11 (Doubly logarithmic upper bound on typical distance for τ ∈ (2, 3)) Consider CMn(d) where the degrees d = (dv)v∈[n] satisfy Conditions 1.7(a),(b) and (7.3.38). Then, for any ε > 0, with o1, o2 chosen independently and uar from [n],

lim_{n→∞} P( distCMn(d)(o1, o2) ≤ 2(1 + ε) log log n/|log(τ − 2)| | distCMn(d)(o1, o2) < ∞ ) = 1. (7.3.39)
Exercise 7.10 explores a proof of Theorem 7.11, based on the proof for NRn(w) in Theorem 6.11, that is an alternative to the proof given below.

Proof This proof of Theorem 7.11 makes precise the statement that vertices of large degree d are directly connected to vertices of degree approximately d^{1/(τ−2)}.
Connectivity of Sets
We start by studying the connectivity of sets in CMn(d), for which we rely on the following connectivity lemma, which is of independent interest:

Lemma 7.12 (Connectivity of sets in CMn(d)) For any two sets of vertices A, B ⊆ [n],

P(A not directly connected to B in CMn(d)) ≤ e^{−dA dB/(2ℓn)}, (7.3.40)

where, for any A ⊆ [n],

dA = Σ_{a∈A} da. (7.3.41)
We now resume the proof of Theorem 7.11, for CMn(d) whose degrees satisfy Conditions 1.7(a),(b) and (7.3.38). Fix r ≥ 1, and condition on Br(Gn)(o1) and Br(Gn)(o2) such that ∂Br(Gn)(o1) ≠ ∅ and ∂Br(Gn)(o2) ≠ ∅. By Corollary 7.3, (Zr(n;1), Zr(n;2)) −→d (Zr(1), Zr(2)), and Zr(n;1) and Zr(n;2) are whp quite large since we are conditioning on Zr(n;1) ≥ 1 and Zr(n;2) ≥ 1. Fix C > 1 large, and note that, by Lemma 7.12, the conditional probability that none of the Zr(n;i) half-edges is paired to a vertex of degree at least d is at most

exp{ −Zr(n;i) Σ_{v∈[n]} dv 1{dv≥d} / (2ℓn) }. (7.3.44)
where c = c1/(2 supn E[Dn]). With d = (Zr(n;i))^{1/(τ−2+ε)}, the probability in (7.3.44) is at most exp{−c(Zr(n;i))^{ε/(τ−2+ε)}}. Call the maximal-degree vertex to which one of the Zr(n;i) half-edges is paired the first power-iteration vertex.
We now iterate these ideas. Denote uk = uk(i) = (Zr(n;i))^{(1/(τ−2+ε))^k}. Then, the probability that the (k − 1)th power-iteration vertex is not paired to a vertex of degree at least uk is at most exp{−c uk−1^{ε/(τ−2+ε)}}, and we call the maximum-degree vertex to which the (k − 1)th power-iteration vertex is paired the kth power-iteration vertex.
We iterate this until we reach one of the hubs in {v : dv > n^β}, where β > 1/2, for which we need at most kn? iterations, with kn? satisfying

ukn? = (Zr(n;i))^{(1/(τ−2+ε))^{kn?}} ≥ n^β. (7.3.46)

The smallest kn? for which this occurs is

kn? = ⌈ (log log(n^β) − log log(Zr(n;i))) / |log(τ − 2 + ε)| ⌉. (7.3.47)
Finally, the probability that power iteration fails from vertex oi is at most

Σ_{k=1}^{∞} exp{−c uk−1^{ε/(τ−2+ε)}} −→P 0, (7.3.48)
Define Coren to be the set of vertices of degree at least (log n)^σ. Then the diameter of the core is bounded in the following theorem, which is interesting in its own right:
Theorem 7.13 (Diameter of the core) Consider CMn(d) where the degrees d = (dv)v∈[n] satisfy Conditions 1.7(a),(b) and (7.3.38). For any σ > 1/(3 − τ), the diameter of Coren is whp bounded above by

2 log log n/|log(τ − 2)| + 1. (7.3.50)
We prove Theorem 7.13 below and start by setting up the notation for it. We note that (7.2.2) implies that, for some β ∈ (1/2, 1/(τ − 1)), there are vertices of degree at least u1 = n^β. Define

Γ1 = {v ∈ [n] : dv ≥ u1}, (7.3.52)

so that Γ1 ≠ ∅. For some constant C > 0 to be determined later on, and for k ≥ 2, we recursively define

uk = C log n · uk−1^{τ−2}. (7.3.53)

Lemma 7.14 (Asymptotics of the recursion) For every k ≥ 1,

uk = (C log n)^{ak} n^{bk}, (7.3.54)

where

bk = β(τ − 2)^{k−1},    ak = [1 − (τ − 2)^{k−1}]/(3 − τ). (7.3.55)

Further, for k ≥ 2, define

Γk = {v ∈ [n] : dv ≥ uk}. (7.3.57)
The key step in the proof of Theorem 7.13 is the following proposition, showing that whp every vertex in Γk is connected to a vertex in Γk−1:

Proposition 7.15 (Connectivity between Γk−1 and Γk) Consider CMn(d) where the degrees d = (dv)v∈[n] satisfy Conditions 1.7(a),(b) and (7.3.38). Fix k ≥ 2, and take C > 2E[D]/c1, with c1 as in (7.3.38). Then the probability that there exists an i ∈ Γk that is not directly connected to Γk−1 in CMn(d) is at most n^{−δ}, for some δ > 0 that is independent of k.
By (7.3.60) and Lemma 7.12, using Boole's inequality, the probability that there exists a v ∈ Γk that is not directly connected to Γk−1 is bounded by

n exp{ −uk n uk−1 [1 − F(uk−1)]/(2ℓn) } ≤ n exp{ −c uk (uk−1)^{2−τ}/(2E[Dn]) } = n^{1−cC/(2E[Dn])}, (7.3.61)

where we use (7.3.53). By Conditions 1.7(a),(b), E[Dn] → E[D], so that, as n → ∞ and taking C > 2E[D]/c, we obtain the claim for any δ < cC/(2E[D]) − 1.
We now complete the proof of Theorem 7.13:

Proof of Theorem 7.13. Fix

kn? = ⌊ log log n / |log(τ − 2)| ⌋. (7.3.62)

As a result of Proposition 7.15 and the fact that kn? n^{−δ} = o(1), whp every i ∈ Γk is directly connected to Γk−1 for all k ≤ kn?. By Exercise 7.11, Γ1 whp forms a complete graph. As a result, the diameter of Γkn? is at most 2kn? + 1. Therefore, it suffices to prove that

Coren ⊆ Γkn?. (7.3.63)

By (7.3.53), in turn, this is equivalent to ukn? ≤ (log n)^σ, for any σ > 1/(3 − τ). According to Lemma 7.14,

ukn? = (C log n)^{akn?} n^{bkn?}. (7.3.64)

We note that n^{bkn?} = exp{β(τ − 2)^{kn?−1} log n}. Since, for τ ∈ (2, 3),

x (τ − 2)^{log x/|log(τ−2)|} = x × x^{−1} = 1, (7.3.65)

we find with x = log n that n^{bkn?} ≤ e^{1/(τ−2)}. Further, ak → 1/(3 − τ) as k → ∞, so that (C log n)^{akn?} = (C log n)^{1/(3−τ)+o(1)}. We conclude that

ukn? = (log n)^{1/(3−τ)+o(1)}, (7.3.66)

so that, by choosing n sufficiently large, we can make 1/(3 − τ) + o(1) ≤ σ. This completes the proof of Theorem 7.13.
Exercise 7.12 studies an alternative proof of Theorem 7.11 that proceeds by showing that
whp a short path exists between ∂Br(Gn ) (o1 ) and Coren when ∂Br(Gn ) (o1 ) is non-empty.
Critical Case τ = 2
We next discuss the critical case where τ = 2 and the degree distribution has logarithmic corrections. We use adaptations of the power-iteration technique. Let us focus on one specific example, where

[1 − Fn](x) ≥ (c1/x) (log x)^{−α}, (7.3.67)

for all x ≤ n^β and some β > 1/2. We take α > 1, since otherwise (Dn)n≥1 might not be uniformly integrable. Our main result is as follows:
Theorem 7.16 (Example of ultra-ultra-small distances for τ = 2) Consider CMn (d)
where the degrees d = (dv )v∈[n] satisfy Conditions 1.7(a),(b) and (7.3.67). Fix r ≥ 1 and
(1−ε)/(α−1)
let u0 = r. Define, for k ≥ 1, recursively uk = exp uk−1 . Let kn? = inf{k : uk ≥
β
n }. Then, with o1 , o2 chosen independently and uar from [n],
P(distCMn (d) (o1 , o2 ) ≤ 2(kn? + r) + 1 | distCMn (d) (o1 , o2 ) < ∞) → 1, (7.3.68)
when first n → ∞ followed by r → ∞.
Proof We start from the setting discussed just above (7.3.44) and see how power iteration
applies in this case. We compute
1 X
dv 1{dv ≥d} = E[Dn 1{Dn ≥d} ] = P(Dn 1{Dn ≥d} > k)
X
n v∈[n] k≥0
X X
= P(Dn > k) = [1 − Fn ](k). (7.3.69)
k≥d k≥d
(1−ε)/(α−1)
Denote u0 = r, and recursively define uk = exp{uk−1 } for k ≥ 1. Again the k th
power-iteration vertex is the maximal-degree vertex to which the (k − 1)th power-iteration
vertex is connected.
The probability that the (k − 1)th power-iteration vertex is not paired to a vertex of
degree at least uk is at most exp{−cuεk−1 }. Recall that kn? = inf{k : uk ≥ nβ } denotes the
number of steps needed to reach a hub. Since the set of hubs {v : dv ≥ nβ } is whp a clique,
by Exercise 7.11, the probability that distCMn (d) (o1 , o2 ) > 2(r + kn? ) + 1 is at most
ε
X
oP (1) + 2 e−cuk−1 → 0, (7.3.72)
k≥1
Recall the branching-process limit for neighborhoods in CMn (d) in Corollary 7.3. When
τ ∈ (2, 3), the branching processes (Zj(1) )j≥0 and (Zj(2) )j≥0 are well defined but have infi-
nite mean in generations 2, 3, etc. In this section we give a scaling result for the generation
sizes for branching processes with infinite mean. This result is crucial to describe the fluctua-
tions of the typical distances in CMn (d), and it also allows us to understand how ultra-small
distances of order log log n arise. The main result in this section is as follows:
Theorem 7.17 (Branching processes with infinite mean) Let (Zk )k≥0 be a branching pro-
cess with offspring distribution Z1 = X having distribution function FX . Assume that there
exist α ∈ (0, 1) and a non-negative non-increasing function x 7→ γ(x) such that
x−α−γ(x) ≤ 1 − FX (x) ≤ x−α+γ(x) , for large x, (7.4.1)
where x 7→ γ(x) satisfies
In the proof of Theorem 7.17 for the special case γ(x) = c(log x)γ−1 , we rely on regu-
larly varying functions. Note, however, that (7.4.1) does not assume that x 7→ 1 − FX (x)
is regularly varying (meaning that xα [1 − FX (x)] is slowly varying; recall Definition 1.19).
Thus, instead, we work with the special slowly varying functions
1 − FX± (x) = x−α±γ(x) . (7.4.2)
Proof of Theorem 7.17 for γ(x) = c(log x)γ−1 . The proof is divided into five main steps.
The Split
We first assume that P(Z1 ≥ 1) = 1, so that the survival probability equals 1. Define
Mk = αk log(Zk ∨ 1). (7.4.3)
For i ≥ 1, we define
!
i (Zi ∨ 1)
Yi = α log . (7.4.4)
(Zi−1 ∨ 1)1/α
We make the split
Mk = Y1 + Y2 + · · · + Yk . (7.4.5)
P∞ this split, it is clear that the almost sure convergence of Mk follows when the sum
From
i=0 Yi converges, which is the case when, in turn,
∞
X
E Yi < ∞.
(7.4.6)
i=1
so that Yi = Ui + Vi and
E Yi ≤ E Ui + E Vi .
(7.4.11)
We bound each of these terms separately.
When (7.4.1) holds, and since limx→∞ γ(x) = 0, there exists a constant Cε ≥ 1 such
that, for all n ≥ 1,
un ≤ Cε n1/α+ε . (7.4.12)
This gives a first bound on n 7→ un . We next substitute this bound into (7.4.1) and use that
x 7→ xγ(x) is non-decreasing together with γ(x) = (log x)γ−1 , to obtain
1 + o(1) = n[1 − FX (un )] ≥ n un−α−γ(un )
h n oi
1/α+ε γ
≥ n u−αn exp log C ε n . (7.4.13)
In turn, this implies that there exists a constant c > 0 such that
γ
un ≤ n1/α ec(log n) . (7.4.14)
1/α −c(log n)γ
In a similar way, we can show the matching lower bound un ≥ n e . As a result,
E Ui ≤ cαi E (log (Zi−1 ∨ 1))γ .
(7.4.15)
Using the concavity of x 7→ xγ for γ ∈ [0, 1), as well as Jensen’s inequality, we arrive at
γ
E Ui ≤ cαi E (log (Zi−1 ∨ 1)) = αi(1−γ) E[Mi−1 ]γ .
(7.4.16)
By (7.4.5) and (7.4.7), which implies that E[Mi−1 ] ≤ Kκ/(1 − κ), we arrive at
Kκ γ
E Ui ≤ αi(1−γ) c
, (7.4.17)
1−κ
γ
so that (7.4.7) follows for Ui , with κ = α1−γ < 1 and K replaced by c 1−κ Kκ
. An identical
argument implies that
Kκ γ
E log(u+ − i(1−γ)
Zi−1 ∨1 /u Zi−1 ∨1 ) ≤ α c . (7.4.18)
1−κ
Logarithmic Moment of an Asymptotically Stable Random Variable
In this step, which is the most technical, we bound E Vi . We note that by [V1, Theorem
2.33] and for Zi quite large, the random variable (Zi ∨ 1)/(uZi−1 ∨1 ) should be close to
being a stable random variable. We first add and subtract a convenient additional term, and
write
E Vi = E Vi − 2E log(u+ −
Zi−1 ∨1 /uZi−1 ∨1 )
+ −
+ 2E log(uZi−1 ∨1 /uZi−1 ∨1 ) .
(7.4.19)
308 Small-World Phenomena in Configuration Models
The latter term is bounded in (7.4.18). For the first term, we will rely on stochastic domina-
tion results in terms of 1 − FX± in (7.4.2).
We make use of the relation to stable distributions by obtaining the bound
E Vi − 2E log(u+ −
Zi−1 ∨1 /uZi−1 ∨1 )
n o
≤ αi sup E log Sm /um − 2 log(u+ −
m /u m ) , (7.4.20)
m≥1
where Sm = X1 + · · · + Xm , and (Xi )m i=1 are iid copies of the offspring distribution X .
Our aim is to prove that there exists a constant C > 0 such that, for all m ≥ 1,
E log Sm /um − 2 log(u+ −
m /um ) ≤ C. (7.4.21)
In order to prove (7.4.21), we note that it suffices to obtain the bounds
E log Sm /um + − log(u+ −
m /um ) ≤ C+ , (7.4.22)
E log Sm /um − − log(u+ −
m /um ) ≤ C− , (7.4.23)
≤ E log u− − −
m /(Sm ∧ um ) , (7.4.24)
where x ∧ y = min{x, y} and we have used (7.4.9). The random variables Xi− have a
regularly varying tail, so that we can use extreme-value theory in order to bound the above
quantity.
− −
The function x 7→ log (um /(x ∧ um ) is non-increasing, and, since Sm ≥ X(m) where
− −
X(m) = max1≤i≤m Xi , we arrive at
E log u− − −
≤ E log u− − −
m /(Sm ∧ um ) m /(X(m) ∧ um ) . (7.4.25)
We next use that, for x ≥ 1, x 7→ log(x) is concave, so that, for every s ≥ 0,
1
E log u− − −
= E log (u− − − s
m /(X(m) ∧ um ) m /(X(m) ∧ um ))
s
1
− s
≤ log E u− −
m /(X(m) ∧ um )
s
1 1 − −s
≤ + log (u− m ) s
E (X(m) ) , (7.4.26)
s s
where, in the last step, we have made use of the fact that u− − −
m /(x ∧ um ) ≤ 1 + um /x.
−s s −
Now rewrite X(m) as (−Y(m) ) , where Yj = −1/Xj and Y(m) = max1≤j≤m Yj . Clearly,
Yj ∈ [−1, 0] since Xi− ≥ 1, so that E[(−Y1 )s ] < ∞. Also, u− −
m Y(m) = −um /X(m)
7.4 Branching Processes with Infinite Mean 309
and which together with (7.4.28) proves (7.4.22) with C+ = 1/s + 2s/2 Cs /s.
By (7.4.1),
1/α
P(Y1 > x) = P(Z1 > ex ) = e−x(1+o(1)) , (7.4.33)
which shows that Y1 satisfies (7.4.31). The equality in (7.4.32) together with (7.4.4) suggests
that the tails of Y1 are equal to those of Y , which heuristically explains (7.4.31). Exercise
7.23 gives an example where the limit Y is exactly exponential, so that the asymptotics in
Theorem 7.18 is exact. The key behind this argument is the fact that Y in Theorem 7.17
satisfies the distributional equation
d X
Y = α max Yi , (7.4.34)
i=1
where (Yi )i≥1 are iid copies of Y that are independent of X (see Exercise 7.22).
(the first approximation being true for fairly small k , but not necessarily for large k ), this
suggests that distCMn (d) (o1 , o2 ) ≈ kn , where, by Theorem 7.17,
n o
Θ(n) = Zk(1)n = exp (τ − 2)−kn Y (1) (1 + oP (1)) , (7.4.37)
which in turn suggests that distCMn (d) (o1 , o2 ) ≈ log log n/| log(τ −2)|. Of course, for such
values the branching-process approximation may fail miserably, and in fact it does. This is
exemplified by the fact that the rhs of (7.4.37) can become much larger than n, which is
clearly impossible for |∂Bk(Gnn ) (o1 )|.
More intriguingly, we see that the proposed typical distances are a factor 2 too small
compared with Theorem 7.2. The reason is that the double-exponential growth can clearly
no longer be valid when Zk(n;1) becomes too large, and thus, Zk(n;1) must be far away from Zk(1)
in this regime. The whole problem is that we are using the branching-process approximation
well beyond its “expiry date.”
Hence, let us try this again but, rather than using it for one neighborhood, let us use the
branching-process approximation from two sides. Now we rely on the statement that
P(distCMn (d) (o1 , o2 ) ≤ 2k) = P(∂Bk(Gn ) (o1 ) ∩ ∂Bk(Gn ) (o2 ) 6= ∅). (7.4.38)
Again using (7.4.36), we see that
log |∂Bk(Gn ) (oi )| ≈ (τ − 2)−k Y (i) (1 + oP (1)), i ∈ {1, 2},
7.5 Diameter of the Configuration Model 311
where Y (1) and Y (2) are independent copies of the random variable Y in Theorem 7.17.
We see that |∂Bk(Gn ) (o1 )| and |∂Bk(Gn ) (o2 )| grow roughly at the same pace, and, in par-
ticular, we have |∂Bk(Gn ) (oi )| = nΘ(1) roughly at the same time, namely, when k ≈
log log n/| log(τ − 2)|. Thus, we conclude that
distCMn (d) (o1 , o2 ) ≈ 2 log log n/| log(τ − 2)|,
as rigorously proved in Theorem 7.2. We will see in more detail in Theorem 7.25 that the
above growth from two sides does allow for better branching-process approximations.
P(Qk > q | Qk−1 = qk−1 ) = 1− P(Qk ≤ q | Qk−1 = qk−1 ) = 1−FX (q)qk−1 . (7.4.39)
that ξ is the extinction probability of the branching process with offspring distribution p? ,
and further define
d ? X X
µ= GD (z) z=ξ = kξ k−1 p?k = k(k + 1)ξ k−1 pk+1 /E[D]. (7.5.1)
dz k≥0 k≥1
When ξ < 1, we also have that µ < 1. Then, the main result is as follows:
Theorem 7.19 (Diameter of the configuration model) Consider CMn (d) where the de-
grees d = (dv )v∈[n] satisfy Conditions 1.7(a),(b). Assume that E[Dn2 ] → E[D2 ] ∈ (0, ∞)∪
{∞}, where ν = E[D(D − 1)]/E[D] > 1. Assume further that n1 = 0 when p1 = 0, and
that n2 = 0 when p2 = 0. Then,
diam(CMn (d)) P 1 1{p >0} 1{p =0,p >0}
−→ +2 1 + 1 2
. (7.5.2)
log n log ν | log µ| | log p?1 |
For finite-variance degrees, we note that, by Theorems 7.1 and 7.19, the diameter of the
configuration model is strictly larger than the typical distance, except when p1 = p2 = 0.
In the latter case, the degrees are at least 3, so that thin lines, consisting of degree-2 vertices
connected to each other, are not possible and the configuration model is whp connected
(recall Theorem 4.24). By [V1, Corollary 7.17] (recall also the discussion around (1.3.41)),
Theorem 7.19 also applies to uniform random graphs with a given degree sequence, when
the degrees have finite second moment, as in the examples below.
We also remark that Theorem 7.19 applies not only to the finite-variance case but also to
the finite-mean and infinite-variance case. In the latter case, the diameter is of order log n
unless p1 = p2 = 0, in which case Theorem 7.19 implies that the diameter is oP (log n). We
will discuss the latter case in more detail in Theorem 7.20 below.
Again, we make essential use of [V1, Theorem 7.18] (recall also Theorem 1.4 and the dis-
cussion below (1.3.29)), which relates the configuration model and the generalized random
graph. We note that ERn (λ/n) is the same as GRGn (w), where wv = nλ/(n − λ) for all
v ∈ [n] (recall [V1, Exercise 6.1]).
Clearly, w = (nλ/(n − λ))v∈[n] satisfies Conditions 1.1(a)–(c), so that the degree se-
quence of ERn (λ/n) also satisfies Conditions 1.7(a)–(c), where the convergence holds in
probability (recall [V1, Theorem 5.12]). From the above identifications and using [V1, The-
orem 7.18], we find that
diam(ERn (λ/n)) P 1 2
−→ + . (7.5.4)
log n log λ | log µλ |
This identifies the diameter of the Erdős–Rényi random graph, for which Theorem 7.19
agrees with Theorem 6.26. Exercise 7.30 investigates the diameter of GRGn (w).
close to log n/[2| log p?1 |]. Now, it turns out that pairs of such vertices realize the asymptotic
diameter, which explains why the diameter is close to log n[1/ log ν + 1/| log p?1 |].
Finally, we discuss what happens when p1 = p2 = 0. In this case, the assumption in
Theorem 7.19 implies that n1 = n2 = 0, so that dmin ≥ 3. Then, CMn (d) is whp connected
(recall Theorem 4.24), and the 2-core is the graph itself. Also, there cannot be any long
thin parts of the giant, since every vertex has degree at least 3 so that local neighborhoods
grow exponentially with overwhelming probability. Therefore, the graph distances and the
diameter have the same asymptotics, as proved by Theorem 7.19 when dmin ≥ 3.
The above case-distinctions explain the intuition behind Theorem 7.19. This intuition is
far from a proof; see Section 7.7 for references.
Lemma 7.22 (Moments of number of minimally k -connected vertices) Let CMn (d) sat-
isfy dmin ≥ 3, ndmin > dmin (dmin − 1)k−1 . For kn ≤ (1 − ε) log log n/ log (dmin − 1),
Mkn
E[Mkn ] → ∞,
P
and −→ 1. (7.5.6)
E[Mkn ]
We leave the proof of Lemma 7.22 to Exercises 7.31–7.33. To complete the proof of the
lower bound on the diameter, we fix ε > 0 sufficiently small and take
l log log n m
kn? = (1 − ε) .
log (dmin − 1)
Clearly,
?
dmin (dmin − 1)kn −1 ≤ (log n)1−ε ≤ `n /8, (7.5.7)
vertices of degree dmin , whp there must be at least two minimally kn? -connected vertices
whose kn? -neighborhoods are disjoint. We fix two such vertices and denote them by v1
?
and v2 . We note that v1 and v2 have precisely dmin (dmin − 1)kn −1 unpaired half-edges
in ∂Bk(Gn? n ) (v1 ) and ∂Bk(Gn? n ) (v2 ). Let A12 denote the event that v1 , v2 are minimally kn? -
connected and their kn? -neighborhoods are disjoint.
Conditional on A12 , the random graph found by collapsing the half-edges in ∂Bk(Gn? n ) (v1 )
to a single vertex a and the half-edges in ∂Bk(Gn? n ) (v1 ) to a single vertex b is a configuration
model on the vertex set {a, b} ∪ [n] \ (Bk(Gn? n ) (v1 ) ∪ Bk(Gn? n ) (v2 )), having degrees d0 given by
?
d0a = d0b = dmin (dmin − 1)kn −1 and d0i = di for every i ∈ [n] \ (Bk(Gn? n ) (v1 ) ∪ Bk(Gn? n ) (v2 )).
By the truncated first-moment method on paths, performed in the proof of Theorem 7.8
(recall (7.3.22)), it follows that, for any ε > 0,
2 log log n
P distCMn (d) (∂Bk(Gn? n ) (v1 ), ∂Bk(Gn? n ) (v2 )) ≤ (1−ε) A12 = o(1). (7.5.9)
| log (τ − 2)|
316 Small-World Phenomena in Configuration Models
Therefore, whp,
2 log log n
diam(CMn (d)) ≥ (1 − ε) + 2kn?
| log (τ − 2)|
h 2 2 i
= (1 − ε) log log n + . (7.5.10)
| log (τ − 2)| log (dmin − 1)
Since ε > 0 is arbitrary, this suggests the lower bound in Theorem 7.20.
3
N=10
4
0.6 N=10
5
N=10
probability
0.4
0.2
0.0
0 1 2 3 4 5
hopcount
Figure 7.4 Typical distances for τ = 1.8 and n = 103 , 104 , 105 .
We will now study the configuration model CMn (d) where the degrees d = (dv )v∈[n] are
an iid sequence of random variables with distribution F satisfying (7.6.1).
We will make heavy use of the results and notation used in [V1, Theorem 7.23], which
we first recall: the random probability distribution P = (Pi )i≥1 is given by
Pi = Zi /Z, (7.6.2)
−1/(τ −1) Pi
where Zi = Γi and Γi = Ei with (Ei )i≥1 an iid sequence of exponential
j=1
P −1/(τ −1)
random variables with parameter 1 and where Z = i≥1 Γi . The latter is finite
almost surely, since 1/(τ − 1) > 1 for τ ∈ (1, 2) (see Exercise 7.34).
Recall further that MP,k is a multinomial distribution with parameters k and (random)
probabilities P = (Pi )i≥1 . Thus, MP,k = (B1 , B2 , . . .), where, conditional on P =
(Pi )i≥1 , Bi is the number of outcomes i in k independent trials such that each outcome
is equal to i with probability Pi .
In [V1, Theorem 7.23], the random variable MP,D1 appears, where D1 is independent of
P = (Pi )i≥1 . We let MP,D
(1)
1
and MP,D
(2)
2
be two such random variables that are conditionally
independent given P = (Pi )i≥1 (but share the same P = (Pi )i≥1 sequence). In terms of
this notation, the main result on distances in CMn (d) when the degrees have infinite mean
is the following:
Theorem 7.23 (Distances in CMn (d) with iid infinite mean degrees) Consider CMn (d)
where the degrees d = (dv )v∈[n] are a sequence of iid copies of D satisfying (7.6.1) for
some τ ∈ (1, 2). Then, with o1 , o2 chosen independently and uar from [n],
lim P(distCMn (d) (o1 , o2 ) = 2) = 1 − lim P(distCMn (d) (o1 , o2 ) = 3) = pF ∈ (0, 1).
n→∞ n→∞
The probability pF can be identified as the probability that an outcome occurs both in MP,D
(1)
1
and MP,D2 , where D1 and D2 are two iid copies of D.
(2)
large, the probability that vertex 1 is not connected to any of the vertices corresponding to
(d(n+1−i) )i∈[K] converges to 0 when first n → ∞ followed by K → ∞.
Let Pn denote the conditional probability given the degrees (dv )v∈[n] . For i ∈ [n], we let
vi be the vertex corresponding to the ith largest degree d(n+1−i) . By Lemma 7.12,
Pn (vi not directly connected to vj ) ≤ exp − d(n+1−i) d(n+1−j) /2`n .
(7.6.4)
Moreover, d(n+1−i) , d(n+1−j) ≥ n1/(τ −1)−ε whp for n sufficiently large and any ε > 0, while
whp `n ≤ n1/(τ −1)+ε . As a result, whp,
when ε > 0 is sufficiently small. Therefore, for fixed K and for every i, j ∈ [K], the
vertices vi and vj are whp neighbors. This implies that the vertices corresponding to the
highest degrees whp form a complete graph.
We have already concluded that 1 is whp connected to vi for some i ≤ K . In the same
way, we conclude that vertex 2 is whp connected to vj for some j ≤ K . Since vi is whp
directly connected to vj , we conclude that
degrees d = (dv )v∈[n] are a sequence of iid copies of D satisfying that there exist τ > 3
and c < ∞ such that, for all x ≥ 1,
Let ν = E[D(D − 1)]/E[D] > 1. For k ≥ 1, let ak = blogν kc − logν k ∈ (−1, 0]. Then,
there exist random variables (Ra )a∈(−1,0] such that, as n → ∞ and for all k ∈ Z, with
o1 , o2 chosen independently and uar from [n],
P distCMn (d) (o1 , o2 ) − blogν nc = k | distCMn (d) (o1 , o2 ) < ∞
(7.6.9)
= P(Ran = k) + o(1).
The random variables (Ra )a∈(−1,0] can be identified by
where Y (1) and Y (2) are independent limit copies of Y in (7.6.7) and κ = E[D]/(ν − 1).
In words, Theorem 7.24 states that, for τ > 3, the graph distance distCMn (d) (o1 , o2 )
between two randomly chosen connected vertices grows as logν n, where n is the size of the
graph, and that the fluctuations around this leading asymptotics remain uniformly bounded
in n. Exercise 7.37 shows that distCMn (d) (o1 , o2 ) − blogν nc converges in distribution along
appropriately chosen subsequences.
The law of Ra is involved, and in most cases cannot be computed exactly. The reason for
this is the fact that the random variables Y (1) and Y (2) that appear in its statement are hard
to compute explicitly (see also [V1, Chapter 3]).
Let us give two examples where the law of Y is known. The first example is the r-regular
random graphs, for which all degrees in the graph are equal to some r ≥ 3. In this case,
E[D] = r, ν = r − 1, and Y = 1 almost surely. In particular, P(distCMn (d) (o1 , o2 ) <
∞) = 1 + o(1). Therefore,
r
P(Ra > k) = exp − (r − 1)a+k ,
(7.6.11)
r−2
and distCMn (d) (o1 , o2 ) is asymptotically equal to logr−1 n. Note that the distribution Ra de-
pends explicitly on a, so that distCMn (d) (o1 , o2 )−blogν nc does not converge in distribution
(see also Exercise 7.38).
The second example for which Y can be explicitly computed is where p? is the proba-
bility mass function of a geometric random variable, in which case the branching-process
generation sizes with offspring p? , conditioned to be positive, converge to an exponential
random variable with parameter 1. This example corresponds to
1
p?j = p(1 − p)j−1 , so that pj = p(1 − p)j−2 , ∀j ≥ 1, (7.6.12)
jcp
and cp is a normalization constant. For p > 12 , Y has the same law as the sum of D1
copies of a random variable that is equal to 0 with probability (1 − p)/p and an exponential
random variable with parameter 1 with probability (2p − 1)/p. Even in this simple case the
computation of the exact law of Ra is non-trivial.
320 Small-World Phenomena in Configuration Models
where cl = 1 if l is even, and zero otherwise, and Y (1) , Y (2) are two independent copies of
the limit random variable Y in Theorem 7.17.
In words, Theorem 7.25 states that for τ ∈ (2, 3) the graph distance distCMn (d) (o1 , o2 )
between two randomly chosen connected vertices grows proportionally to the log log of the
size of the graph and that the fluctuations around this mean remain uniformly bounded in n.
We next discuss an extension, obtained by possibly truncating the degree distribution. In
order to state the result, we make the following assumption that makes (7.6.13) more precise:
Condition 7.26 (Truncated infinite-variance degrees) Fix ε > 0. There exists a βn ∈
(0, 1/(τ − 1)] such that Fn (x) = 1 for x ≥ nβn (1+ε) , while, for all x ≤ nβn (1−ε) ,
Ln (x)
1 − Fn (x) = , (7.6.16)
xτ −1
with τ ∈ (2, 3) and a function Ln (x) that satisfies, for some constant C > 0 and γ ∈ (0, 1),
that, for all x ≤ nβn (1−ε) ,
γ−1 γ−1
x−C(log x) ≤ Ln (x) ≤ xC(log x) . (7.6.17)
Theorem 7.27 (Fluctuations of the distances CMn (d) for truncated infinite-variance de-
grees) Consider CMn (d) where the degrees d = (dv )v∈[n] satisfy Condition 7.26 for
some τ ∈ (2, 3). Assume that dmin ≥ 2, and that there exists κ > 0 such that
max{dTV (Fn , F ), dTV (Fn? , F ? )} ≤ n−κβn . (7.6.18)
7.7 Notes and Discussion for Chapter 7 321
When βn → 1/(τ − 1), we further require that the limit random variable Y in Theorem
7.17 has no point mass on (0, ∞). Then, with o1 , o2 chosen independently and uar from [n],
and conditional on o1 ←→ o2 ,
log log(nβn ) 1
distCMn (d) (o1 , o2 ) − 2 − (7.6.19)
| log(τ − 2)| βn (τ − 3)
is a tight sequence of random variables.
Which of the two terms in (7.6.19) dominates depends sensitively on the choice of βn .
When βn → β ∈ (0, 1/(τ − 1)], the first term dominates. When βn = (log n)−γ for some
γ ∈ (0, 1), the second term dominates. Both terms are of the same order of magnitude when
βn = Θ(1/ log log n).
For supercritical graphs, typical distances are at most of order log n. The boundary point
in (7.6.19) corresponds to βn = Θ(1/ log n), in which case nβn = Θ(1) and Theorem
7.1 applies. Thus, even after truncation of the degrees, in the infinite-variance case, typical
distances are always ultra-small.
Fernandez de la Vega (1982). For a nice discussion and results about the existence of a large k-core in the
configuration model, we refer to Janson and Luczak (2007).
Exercise 7.2 (Poisson degree example) Consider the degree sequence d = (dv )v∈[n] satisfying the Pois-
son degree limit as formulated in (1.7.4) and (1.7.5) with λ > 1. Let o1 , o2 be two independent vertices
chosen uar from [n]. Identify a number a such that, conditional on distCMn (d) (o1 , o2 ) < ∞,
P
distCMn (d) (o1 , o2 )/ log n −→ a. (7.8.2)
Exercise 7.3 (Power-law degree example) Consider the degree sequence d = (dv )v∈[n] with dv =
[1 − F ]−1 (v/n), where F is the distribution of a random variable D having generating function, for
α ∈ (0, 1),
GX (s) = s − (1 − s)α+1 /(α + 1) (7.8.3)
as in Exercise 1.14. Identify a number a such that, conditional on distCMn (d) (o1 , o2 ) < ∞,
P
distCMn (d) (o1 , o2 )/ log log n −→ a. (7.8.4)
Exercise 7.4 (Branching-process approximation in Corollary 7.3) Use Corollary 2.19 to prove the
branching-process approximation for CMn (d) in (7.3.2) in Corollary 7.3. Hint: Use that, with Gn =
CMn (d), Zl(n;i) = |∂Bl(Gn ) (oi )| for all l ∈ [r] when Br+1
(Gn )
(oi ) is a tree, and then use Theorem 4.1.
Exercise 7.5 (Proof of logarithmic lower bound distances in Theorem 7.6) Let o1 , o2 be two independent
vertices chosen uar from [n]. Use Proposition 7.5 with a = o1 , b = o2 , I = [n] to prove the logarithmic
lower bound on typical distances in Theorem 7.6.
Exercise 7.6 (Proof of distances for τ = 3 in Corollary 7.7) Let o1 , o2 be two independent vertices chosen
uar from [n]. Use Theorem 7.6 to prove the logarithmic and logarithmic divided by log log lower bounds on
typical distances in Corollary 7.7 when Conditions 1.7(a),(b) and (7.3.15) hold.
7.8 Exercises for Chapter 7 323
Exercise 7.7 (Proof of logarithmic upper bound distances without degree truncation) Check that the “giant
is almost local” analysis in Section 4.3.1 can be performed without the degree-truncation argument of
Theorem 1.11 when lim supn→∞ E[Dn3 ] < ∞. Hint: Note that lim supn→∞ E[Dn3 ] < ∞ implies that the
Chebychev inequality can be used without degree truncation.
Exercise 7.8 (Proof of logarithmic upper bound distances without degree truncation (cont.)) Extend the
proof in Exercise 7.7 to the case where lim supn→∞ E[Dnp ] < ∞ for some p > 2. Hint: Instead of the
Chebychev inequality, use the Marcinkiewicz–Zygmund inequality (Gut, 2005, Corollary 8.2), a form of
i=1 with E[Xi ] = 0 and all q ∈ (1, 2],
which states that, for iid random variables (Xi )m
m
h X qi
E Xi ≤ nE[|X1 |q ]. (7.8.5)
i=1
Exercise 7.9 (Proof of logarithmic upper bound on typical distances in Theorem 7.9) Use Proposition
7.10 to prove the logarithmic upper bound on typical distances in Theorem 7.9 by adapting the proof of the
related result for NRn (w) in Theorem 6.19.
Exercise 7.10 (Alternative proof of log log typical distances in Theorem 7.11) Give an alternative proof
of the doubly logarithmic upper bound on typical distances in Theorem 7.11 by adapting the proof of the
related result for NRn (w) in Theorem 6.11.
Exercise 7.11 (The hubs Γ1 form whp a complete graph) Use Lemma 7.12 and β > 21 to show that, whp,
the set of hubs in Γ1 in (7.3.52) forms a complete graph, i.e., whp, every pair i, j ∈ Γ1 are neighbors in
CMn (d).
Exercise 7.12 (Second alternative proof of log log typical distances in Theorem 7.11 using the core) Give
an alternative proof of the doubly logarithmic upper bound on typical distances in Theorem 7.11 by using
the diameter of the core in Theorem 7.13, and an application of the second-moment method for the existence
of paths in Proposition 7.10.
Exercise 7.13 (Typical distances when τ = 2 in Theorem 7.16) Recall the definition of kn? for the critical
case τ = 2 studied in Theorem 7.16. Show that kn? = o(log?p (n)) for every p ≥ 1, where log?p (n) is
obtained by taking the logarithm of n p times.
Exercise 7.14 (Typical distances when τ = 2 in Theorem 7.16) Recall kn? from Exercise 7.13. Investigate
heuristically the size of kn? .
Exercise 7.15 (Another example of typical distances when τ = 2) Adapt the upper bound on typical
distances for τ = 2 in Theorem 7.16 to degree sequences for which τ = 2, in which (7.3.67) is replaced by
γ
[1 − Fn ](x) ≥ c1 e−c(log x) /x for some c, c1 , γ ∈ (0, 1), and all x ≤ nβ for some β > 21 .
Exercise 7.16 (Infinite mean under conditions in Theorem 7.17) Prove that E[X] = ∞ when the condi-
tions in Theorem 7.17 are satisfied. Extend this to show that E[X s ] = ∞ for every s > α ∈ (0, 1).
Exercise 7.17 (Example of infinite-mean branching process) Prove that γ(x) = (log x)γ−1 , for some
γ ∈ [0, 1), satisfies the assumptions in Theorem 7.17.
Exercise 7.18 (Telescoping sum identity for generation sizes in infinite-mean branching processes) Con-
sider an infinite-mean branching process as in Theorem 7.17. Prove the telescoping-sum identity (7.4.3) for
αk log (Zk ∨ 1).
Exercise 7.19 (Conditions in Theorem 7.17 for individuals with infinite line of descent) Prove that p(∞)
in [V1, (3.4.2)] satisfies the conditions in Theorem 7.17 with the function x 7→ γ ? (x) given by γ ? (x) =
γ(x) + c/ log x.
Exercise 7.20 (Convergence for Zk + 1) Show that, under the conditions of Theorem 7.17, it also holds
that αk log(Zk + 1) converges to Y almost surely.
Exercise 7.21 (Branching processes with infinite mean: case X ≥ 0) Use the branching process of the
number of vertices with infinite line of descent in [V1, Theorem 3.12] to extend the proof of Theorem 7.17
to the case where X ≥ 0.
324 Small-World Phenomena in Configuration Models
Exercise 7.22 (Distributional identity of limit in Theorem 7.17) Let Y be the limit of k 7→ αk log(Zk ∨ 1)
in Theorem 7.17. Prove (7.4.34) by showing that
d X
Y = α max Yi ,
i=1
where X denotes the offspring variable of the infinite-mean branching process and (Yi )i≥1 is a sequence
of iid copies of Y .
Exercise 7.23 (Exponential limit in Theorem 7.17) Let the offspring X have generating function
GX (s) = 1 − (1 − s)α (7.8.6)
with α ∈ (0, 1) as in Exercise 1.5. Use (7.4.34) as well as Theorem 7.18 to show that the limit Y of
k 7→ αk log(Zk ∨ 1) in Theorem 7.17 has an exact exponential distribution.
Exercise 7.24 (Maximum process for infinite-mean branching processes) Recall the maximum process
for infinite-mean branching processes, for which we let Q0 = 1, and, given Qk−1 = qk−1 , let Qk denote
the maximal offspring of the qk−1 individuals in the (k − 1)th generation. Show that (Qk )k≥0 is a Markov
chain, for which the transition probabilities can be derived from (recall (7.4.39))
P(Qk > q | Qk−1 = qk−1 ) = 1 − FX (q)qk−1 . (7.8.7)
Exercise 7.25 (Telescoping sum identity for maximum process, for infinite-mean branching processes)
Recall the maximum process (Qk )k≥0 for infinite-mean branching processes from Exercise 7.24. Prove the
telescoping-sum identity for αk log Qk in (7.4.40).
Exercise 7.26 (Convergence of the maximum process for infinite-mean branching processes) Recall the
a.s.
maximum process (Qk )k≥0 for infinite-mean branching processes from Exercise 7.24. Show αk log Qk −→
Q∞ under the conditions of Theorem 7.17, by adapting the proof of double-exponential growth of the
generation sizes in Theorem 7.17. You may for simplicity assume that the offspring distribution satisfies
P(X > k) = c/kα for all k ≥ kmin and c = kminα
.
Exercise 7.27 (Diameter of “soup” of cycles) Prove that in a graph consisting solely of cycles, the diam-
eter is equal to the longest cycle divided by 2.
Exercise 7.28 (Longest cycle in a 2-regular graph) Let Mn denote the size of the longest cycle in a
d
2-regular graph. Prove that Mn /n −→ M for some M . What can you say about the distribution of M ?
Exercise 7.29 (Diameter result for ERn (λ/n)) Fix λ > 1, and recall the constants in the limit of
diam(ERn (λ/n))/ log n in (7.5.4), as a consequence of Theorem 7.19. Prove that ν in Theorem 7.19
equals ν = λ and that µ in Theorem 7.19 equals µ = µλ , where µλ ∈ [0, 1) is the dual parameter, i.e., the
unique µ < 1 satisfying
µe−µ = λe−λ . (7.8.8)
Exercise 7.30 (Diameter of GRGn (w)) Consider GRGn (w) where the weights w = (wv )v∈[n] satisfy
Conditions 1.1(a)–(c). Identify the limit in probability of diam(GRGn (w))/ log n. Can this limit be zero?
Exercise 7.31 (Expectation of the number of minimally k-connected vertices) Recall the definition of
minimally k-connected vertices in Definition 7.21. Prove that, for all k ≥ 1,
dmin (dmin −1)k−1
Y dmin (ndmin − (i − 1))
E[Mk ] = ndmin . (7.8.9)
i=1
`n − 2i + 1
Exercise 7.32 (Second moment of the number of minimally k-connected vertices) Recall the definition
of minimally k-connected vertices in Definition 7.21. Prove that, for all k such that dmin (dmin − 1)k−1 ≤
`n /8,
h dmin 2ndmin d2min (dmin − 1)2k i
E[Mk2 ] ≤ E[Mk ]2 + E[Mk ] (dmin − 1)k + . (7.8.10)
dmin − 2 (dmin − 2)`n
7.8 Exercises for Chapter 7 325
Exercise 7.33 (Concentration of the number of minimally k-connected vertices: proof of Lemma 7.22)
Recall the definition of minimally k-connected vertices in Definition 7.21. Use Exercises 7.31 and 7.32 to
complete the proof of Lemma 7.22.
Exercise 7.34 (Sum of Gamma variables is finite almost surely) Fix τ ∈ (1, 2). Let (Ei )i≥1 be iid
−1/(τ −1)
exponentials, and Γi = ij=1 Ei be (dependent) Gamma variables. Show that Z = i≥1 Γi
P P
is
almost surely finite.
Exercise 7.35 (Typical distance is at least 2 whp for τ ∈ (1, 2)) Complete the argument that
P(distCMn (d) (o1 , o2 ) = 1) = o(1)
in the proof of the typical distance for τ ∈ (1, 2) in Theorem 7.23.
Exercise 7.36 (Typical distance equals 2 whp for τ = 1) Let (dv )v∈[n] be a sequence of iid copies
of D with distribution function F satisfying that x 7→ [1 − F ](x) is slowly varying at ∞. Prove that
P P
P satisfies that distCMn (d) (o1 , o2 ) −→ 2. You may use without proof that Mn /Sn −→ 1, where
CMn (d)
Sn = v∈[n] Dv and Mn = maxv∈[n] Dv .
Exercise 7.37 (Convergence along subsequences (van der Hofstad et al. (2005))) Fix an integer n1 . Prove
that, under the assumptions in Theorem 7.24, and conditional on distCMn (d) (o1 , o2 ) < ∞, along the
subsequence nk = bn1 ν k−1 c the sequence of random variables distCMn (d) (o1 , o2 ) − blogν nk c converges
in distribution to Ran1 as k → ∞.
Exercise 7.38 (Non-convergence of graph distances for random regular graph) Let dv = r for every v ∈
[n] and let nr be even. Recall (7.6.11). Show that Theorem 7.24 implies that distCMn (d) (o1 , o2 ) − blogν nc
does not converge in distribution.
Exercise 7.39 (Tightness of the hopcount (van der Hofstad et al. (2005))) Prove that, under the assump-
tions in Theorem 7.24:
(a) conditional on distCMn (d) (o1 , o2 ) < ∞ and whp, the random variable distCMn (d) (o1 , o2 ) is in be-
tween (1 ± ε) logν n for any ε > 0;
(b) conditional on distCMn (d) (o1 , o2 ) < ∞, the random variables distCMn (d) (o1 , o2 ) − logν n form a
tight sequence, i.e.,
lim lim sup P |distCMn (d) (o1 , o2 ) − logν n| ≤ K distCMn (d) (o1 , o2 ) < ∞ = 1. (7.8.11)
K→∞ n→∞
As a consequence, prove that the same result applies to a uniform random graph with degrees (dv )v∈[n] .
Hint: Make use of [V1, Theorem 7.21].
C HAPTER 8
S MALL -W ORLD P HENOMENA IN
P REFERENTIAL ATTACHMENT M ODELS
Abstract
In this chapter we investigate graph distances in preferential attachment mod-
els. We focus on typical distances as well as on the diameter of preferential
attachment models. We again rely on path-counting techniques, as well as local
limit results. Since the local limit is a rather involved quantity, some parts of our
analysis are considerably harder than those in Chapters 6 and 7.
In Chapters 6 and 7, we saw that generalized random graphs and configuration models with
finite-variance degrees are small worlds, whereas random graphs with infinite-variance de-
grees are ultra-small worlds. In the small-world setting, distances are roughly logν n, where
ν describes the exponential growth of the branching-process approximation of local neigh-
borhoods in the random graphs in question. For preferential attachment models with δ > 0,
for which the degree power-law exponent equals τ = 3 + δ/m > 3, it is highly unclear
whether the neighborhoods grow exponentially, since the Pólya point tree that arises as the
local limit in Theorem 5.8 is a rather intricate object (recall also Theorem 5.21). This local
limit also makes path-counting estimates, the central method for bounding typical distances,
much more involved.
The ultra-small behavior in generalized random graphs and configuration models, on the
other hand, can be understood informally in terms of two effects. First, we note that such
random graph models contain super-hubs, whose degrees are much larger than n1/2 and
which form a complete graph of connections. Second, vertices of large degree d 1 are
typically connected to vertices of much larger degree, more precisely of degree roughly
d1/(τ −2) , by the power-iteration method. When combined, these two effects mean that it
takes roughly log log n/| log(τ − 2)| steps from a typical vertex to reach one of the super-
hubs, and thus roughly 2 log log n/| log(τ −2)| steps to connect two typical vertices to each
other. Of course the proofs are more technical, but this is the bottom line.
For preferential attachment models with δ ∈ (−m, 0) and m ≥ 2, however, vertices of
large degree d tend to be the old vertices, but old vertices are not necessarily connected to
much older vertices, which would be necessary to increase their degree from d to d1/(τ −2) .
However, vertices of degree d 1 do tend to be connected to vertices that are in turn con-
nected to a vertex of degree roughly d1/(τ −2) . This gives rise to a two-step power-iteration
property. We conclude that distances seem about twice as large in preferential attachment
models with infinite-variance degrees than in the corresponding generalized random graphs
or configuration models, and that this can be explained by differences in the local connec-
tivity structure. Unfortunately, owing to their dynamic nature, the results for preferential
attachment models are harder to prove, and they are less complete.
327
328 Small-World Phenomena in Preferential Attachment Models
The above explanations depend crucially on the local structure of the generalized random
graph as well as the configuration model. In this chapter we will see that, for the preferential
attachment model, such considerations need to be subtly adapted.
(1,δ)
and PAn (d). Such results are interesting in their own right and at the same time pro-
vide natural upper bounds on distances for m ≥ 2 owing to the fact that PA(m,δ) n (a) and
(m,δ) (1,δ/m)
PAn (b) can be obtained by collapsing blocks of m vertices in (PAn (a))n≥1 and
(PAn(1,δ/m) (b))n≥1 .
Let the height of a tree T on the vertex set [n] be defined as
height(T ) = max distT (1, v), (8.2.1)
v∈[n]
where distT (u, v) denotes the graph distance between vertices u and v in the tree T , and 1
denotes the root of the tree. We start by studying various distances in the tree PA(1,δ)
n :
k−1
\
{~π ⊆ PAn (b)} =
(1,δ)
{πi πi+1 }. (8.2.5)
i=0
1
1 + δ k Γ(v −
2+δ
)Γ(u) k−1
Y 1
P(~π ⊆ PA(1,δ)
n (a)) = 1+δ 1 . (8.2.8)
2+δ Γ(u + 2+δ
)Γ(v) i=1 πi − 2+δ
J
330 Small-World Phenomena in Preferential Attachment Models
Proof of Proposition 8.2. We claim that the events {πi πi+1 } are independent, i.e., it
holds that, for every ~π = (π0 , . . . , πk ),
k−1
\ k−1
Y
P {πi πi+1 } = P(πi πi+1 ). (8.2.9)
i=0 i=0
h k−1
\ i
P(~π ⊆ PAn (1,δ)
(b)) = E P {πi πi+1 } PA(1,δ)
π1 −1 (b)
i=0
= E 1Tk−1
h i
i=1 {πi πi+1 } P π0 π1 | PA(1,δ)
π0 −1 (b) , (8.2.10)
Tk−1
since the event i=1 {πi πi+1 } is measurable with respect to PA(1,δ)
π0 −1 (b) because π0 −
1 ≥ πi for all i ∈ [k − 1]. Furthermore, from [V1, (8.2.2)],
Dπ1 (π0 − 1) + δ
P π0
π1 | PA(1,δ)
π0 −1 (b) = . (8.2.11)
(2 + δ)(π0 − 1)
In particular,
h D (π − 1) + δ i
π1 0
P π0 π1 = E
. (8.2.12)
(2 + δ)(π0 − 1)
Therefore,
Dπ1 (π0 − 1) + δ i
n (b)) = E 1 k−1
h
P(~π ⊆ PA(1,δ) T
i=1 {πi πi+1 }
(2 + δ)(π0 − 1)
k−1 h D (π − 1) + δ i
π1 0
\
=P {πi πi+1 } E , (8.2.13)
i=1
(2 + δ)(π0 − 1)
since the random variable Dπ1 (π0 − 1) depends only on how many edges are connected
Tk−1
to π1 after time π1 , and is thus independent of the event i=1 {πi πi+1 }, which only
depends on the attachment of the edges up to and including time π1 . We conclude that
k−1
\
P(~π ⊆ PA(1,δ)
n (b)) = P π0 π1 P {πi πi+1 } . (8.2.14)
i=1
As a result,
1
1+δ Γ(πi − 1 + 2+δ )Γ(πi+1 )
P(πi πi+1 ) = 1
(2 + δ)(πi − 1) Γ(πi − 1)Γ(πi+1 + 2+δ )
1
1 + δ Γ(πi − 1 + 2+δ )Γ(πi+1 )
= 1 , (8.2.17)
2 + δ Γ(πi )Γ(πi+1 + 2+δ )
so that
1 + δ k k−1
Y Γ(πi − 1 + 1
2+δ
)Γ(πi+1 )
P(~π ⊆ PAn (b)) =
(1,δ)
1
2+δ i=0
Γ(πi )Γ(πi+1 + 2+δ )
1+δ 1
1 + δ k Γ(π0 −
2+δ
)Γ(πk ) k−1
Y Γ(πi − 1 + 2+δ )
= 1 1
2+δ Γ(π0 )Γ(πk + 2+δ ) i=0 Γ(πi + 2+δ )
1+δ
1 + δ k Γ(u −
2+δ
)Γ(v) k−1
Y 1
= 1 , (8.2.18)
2+δ Γ(u)Γ(v + 2+δ ) i=1 πi − 1+δ
2+δ
which proves (8.2.6). Since the path between vertex u and v in PA(1,δ)
n (b) is unique,
X k−1
\
P(distPA(1,δ)
n (b)
(u, v) = k) = P {πi πi+1 } , (8.2.19)
~
π i=0
where again the sum is over all ordered vectors ~π = (π0 , . . . , πk ) with π0 = u and πk = v .
Thus, (8.2.7) follows immediately from (8.2.6). This completes the proof of Proposition
8.2.
For the proof of Remark 8.3, (8.2.15) is replaced with
k−1
" #
Y Dπi+1 (πi − 1) + δ
P(~π ⊆ PAn (a)) =
(1,δ)
E , (8.2.20)
i=0
(2 + δ)(πi − 1) + 1 + δ
and (8.2.16) is replaced with, for n ≥ s,
1
Γ(n + 1)Γ(s − 2+δ )
E Ds (n) + δ = (1 + δ)
1+δ
. (8.2.21)
Γ(n + 2+δ )Γ(s)
After these changes, the proof follows the same steps (see Exercise 8.3).
immediately prove the first upper bound in (8.2.3). Further, (8.2.23) implies that, almost
surely for large n,
height(PA(1,δ)
n (b)) (1 + δ)
≤ (1 + ε) . (8.2.24)
log n (2 + δ)θ
Indeed, height(PA(1,δ) n (b)) > log n(1 + ε)(1 + δ)/(2 + δ)θ for n large precisely when
there exists an m large (m must at least satisfy m ≥ log n(1 + ε)(1 + δ)/(2 + δ)θ) such
that distPA(1,δ)
n (b)
(1, m) > log m(1 + ε)(1 + δ)/(2 + δ)θ. Since the latter almost surely
does not happen for m large, by (8.2.23), it follows that (8.2.24) does not hold for n large.
Thus, (8.2.22) and (8.2.23) also prove the second upper bound in (8.2.3).
By the triangle inequality,
distPA(1,δ)
n (b)
(o1 , o2 ) ≤ distPA(1,δ)
n (b)
(1, o1 ) + distPA(1,δ)
n (b)
(1, o2 ), (8.2.25)
diam(PAn (b)) ≤ 2 height(PAn (b)),
(1,δ) (1,δ)
(8.2.26)
so that (8.2.22) and (8.2.23) imply the upper bounds in (8.2.4), and thus all those in Theorem
8.1, for PA(1,δ)
n (b).
We proceed to prove (8.2.22) and (8.2.23) for PA(1,δ)
n (b), and start with some preparations.
We use (8.2.7) and symmetry to obtain
1 + δ k Γ(n − 1+δ k−1
2+δ
)Γ(1) X∗ 1 Y 1
P(distPA(1,δ) (1, n) = k) = 1 ,
n (b)
2+δ Γ(1 + 2+δ
)Γ(n) ~ (k − 1)! i=1 ti − 1+δ
2+δ
tk−1
(8.2.27)
where the sum now is over all vectors ~tk−1 = (t1 , . . . , tk−1 ), with 1 < ti < n, having
distinct coordinates. We can upper bound this sum by leaving out the restriction that the
coordinates of ~tk−1 are distinct, so that
!k−1
1 + δ k Γ(n − 1+δ
2+δ
) 1
n−1
X 1
P(distPA(1,δ) (1, n) = k) ≤ 1 .
n (b)
2+δ Γ(1 + 2+δ )Γ(n) (k − 1)! s=2
s − 1+δ
2+δ
(8.2.28)
Since x 7→ 1/x is monotonically decreasing and (1 + δ)/(2 + δ) ∈ (0, 1),
n−1 n−1 n
1 X 1 1
X Z
1+δ
≤ ≤1+ dx ≤ log (en). (8.2.29)
s=2
s − 2+δ s=2
s−1 1 x
1+δ
Also, we use [V1, (8.3.9)] to bound Γ(n − 2+δ
)/Γ(n) ≤ Cδ n−(1+δ)/(2+δ) , for some con-
stant Cδ > 0, so that
k−1
1+δ
2+δ
log (en)
− 1+δ
P(distPA(1,δ) (1, n) = k) ≤ Cδ n 2+δ
n (b)
(k − 1)!
1 + δ
= Cδ P Poi log (en) = k − 1 . (8.2.30)
2+δ
Now we are ready to prove (8.2.22) for PA(1,δ)
n (b). We note that o is chosen uar from [n],
8.2 Logarithmic Distances in Preferential Attachment Trees 333
so that, with C denoting a generic constant that may change from line to line,
n
1X
P(distPA(1,δ) (1, o) = k) = P(distPA(1,δ) (s, 1) = k)
n (b)
n s=1 n (b)
k−1
1+δ
1 Xn
1+δ 2+δ
log (es)
≤ Cδ s− 2+δ
n s=1 (k − 1)!
k−1
1+δ
2+δ
log (en) X n
1+δ
≤ Cδ s− 2+δ
n(k − 1)! s=1
k−1
1+δ
2+δ
log (en) 1+δ
≤C n− 2+δ
(k − 1)!
1 + δ
= C P Poi log (en) = k − 1 . (8.2.31)
2+δ
Therefore,
1 + δ
P(distPA(1,δ) (1, o) > k) ≤ C P Poi log (es) ≥ k . (8.2.32)
n (b)
2+δ
Fix ε > 0 and take k = kn = d (1+ε)(1+δ)
(2+δ)
log (en)e, to arrive at
P(distPA(1,δ)
n (b)
(1, o) > kn )
1 + δ (1 + ε)(1 + δ)
≤ C P Poi log (en) ≥ log (en) = o(1), (8.2.33)
2+δ (2 + δ)
by the law of large numbers and for any ε > 0, as required.
We continue by proving (8.2.23) for PA(1,δ)
n (b). By (8.2.30),
1 + δ
P(distPA(1,δ) (1, n) > k) ≤ Kδ P Poi log (en) ≥ k . (8.2.34)
n (b)
2+δ
Take kn = da log ne with a > (1 + δ)/(2 + δ). We use the Borel–Cantelli lemma to see
that dist(1, n) > kn will occur only finitely often when (8.2.34) is summable. We then
use the large-deviation bounds for Poisson random variables in [V1, Exercise 2.20] with
λ = (1 + δ)/(2 + δ) to obtain that
P(distPA(1,δ)
n (b)
(1, n) > a log n) ≤ Kδ n−p , (8.2.35)
with
a(2 + δ) 1+δ
p = a log −a+ . (8.2.36)
1+δ 2+δ
Let x be the solution of
1+δ
x(log (x(2 + δ)/(1 + δ)) − 1) + = 1, (8.2.37)
2+δ
so that x = (1 + δ)/[(2 + δ)θ]. Then, for every a > x,
P(distPA(1,δ)
n (b)
(1, n) > a log n) = O(n−p ), (8.2.38)
334 Small-World Phenomena in Preferential Attachment Models
where p in (8.2.36) satisfies p > 1 for any a > (1 + δ)/[(2 + δ)θ]. As a result, by the
Borel–Cantelli lemma, the event {distPA(1,δ)
n (b)
(1, n) > kn }, with kn = da log ne and
a > (1 + δ)/[(2 + δ)θ], occurs only finitely often, so that (8.2.23) holds.
Proof of the lower bound on distPA(1,δ)
n (b)
(1, o) in Theorem 8.1 for PA(1,δ)
n (b). By (8.2.31),
!
1 + δ
P(distPA(1,δ) (1, o) ≤ k) ≤ C P Poi log (en) ≤ k . (8.2.39)
n (b)
2+δ
Fix kn = d[(1 − ε)(1 + δ)/(2 + δ)] log (2 + δ)ne, and note that P(distPA(1,δ) n (b)
(1, o) ≤
kn ) = o(1) by the law of large numbers.
a.s. (1+δ)
n (b) / log n −→ (2+δ)θ in Theorem 8.1, we
To complete the proof that height PA(1,δ)
(1+δ)
use the second-moment method to prove that height PA(1,δ) n (b) ≤ (1 − ε) (2+δ)θ log n has
vanishing probability. Together with (8.2.24), this certainly proves that
height PA(1,δ)
n (b) P (1 + δ)
−→ .
log n (2 + δ)θ
However, since height PA(1,δ)
n (b) is a non-decreasing sequence of random variables, this
also implies convergence almost surely, as we argue in more detail below Proposition 8.4.
(1−ε)(1+δ)
n (b) ≤
We formalize the statement that height PA(1,δ) (2+δ)θ
log n as follows:
(1 − ε)(1 + δ)
≥ log nk−1
(2 + δ)θ
(1 + δ)
≥ (1 − ε)(1 − α) log n, (8.2.41)
(2 + δ)θ
where the third inequality follows from the almost sure lower bound on height PA(1,δ)
nk−1 (b) .
The above bound holds for all ε, α > 0, so that letting ε, α & 0 proves our claim.
We omit the proof of Proposition 8.4, and refer to the notes and discussion in Section
8.10 for details. The proof relies on a continuous-time embedding of preferential attachment
models, first invented by Pittel (1994), and the height of such trees, found using a beautiful
8.2 Logarithmic Distances in Preferential Attachment Trees 335
argument by Kingman (1975) (see Exercises 8.4 and 8.5). In the remainder of the proofs in
this chapter, we rely only on the upper bound in (8.2.24).
We continue by proving the lower bound on diam PA(1,δ) n (b) in Theorem 8.1:
Proof of the lower bound on diam PA(1,δ) n (b) in Theorem 8.1. We use the lower bound on
height(PA(1,δ)
n (b)) in Theorem 8.1 and the decomposition of scale-free trees in Theorem
5.4. Theorem 5.4 states that PA(1,δ)
n (b) can be decomposed into two scale-free trees hav-
ing similar distributions as copies PA(1,δ) (1,δ) (1,δ)
S1 (n) (b1 ) and PAn−S1 (n) (b2 ), where (PAn (b1 ))n≥1
and (PA(1,δ)
n (b2 ))n≥1 are independent scale-free tree processes, and the law of S1 (n) is
described in (5.2.30). By this tree decomposition,
n (b) ≥ height PAS1 (n) (b1 ) + height PAn−S1 (n) (b2 ) .
(1,δ) (1,δ)
diam PA(1,δ) (8.2.42)
distPA(1,δ) (b)
(1, V ) P
n
−→ 0. (8.2.45)
log n
We leave the proof of Lemma 8.5 as Exercise 8.16, which we postpone until we have
discussed path-counting techniques for preferential attachment models.
Proof of Theorem 8.1 for PA(1,δ) (1,δ) (1,δ)
n (d) and PAn (a). The proof of Theorem 8.1 for PAn (d)
(1,δ)
follows the same line of argument as for PAn (b), where we note that the only difference
in PA(1,δ) (1,δ)
n (d) and PAn (b) is in the graph for n = 2. We omit further details.
336 Small-World Phenomena in Preferential Attachment Models
To prove Theorem 8.1 for PA(1,δ)n (a), we note that the connected components of PAn (a)
(1,δ)
(1,δ) (1,δ)
are similar in distribution to single scale-free tree PAt1 (b1 ), . . . , PAtNn (bNn ), apart from
the initial degree of the root. Here ti denotes the size of the ith tree at time n, and we recall
P
that Nn denotes the total number of trees at time n. Since Nn / log n −→ (1 + δ)/(2 + δ)
whp the largest connected component has size at least εn/ log n. Since
by Exercise 5.26,
log εn/ log n = (1+o(1)) log n, the distances in these trees are closely related to those in
PA(1,δ) (1,δ)
n (b). Theorem 8.1 for PAn (a) then follows similarly to the proof for PAn (b).
(1,δ)
(a) (b)
0.2
0.3
0.15
Proportion
Proportion
0.2
0.1
0.1
0.05
0
2 4 6 8 10 12 14 16 18 20 22 24 2 3 4 5 6 7 8 9 10 11 12
Typical Distance Typical Distance
Figure 8.1 Typical distances between 2,000 pairs of vertices in the preferential
attachment model with n = 100, 000 and (a) τ = 2.5; (b) τ = 3.5.
Theorem 8.7 (Ultra-small typical distances for δ < 0) Consider PA(m,δ)
n (a) with m ≥ 2
and δ ∈ (−m, 0). Let o1 , o2 be chosen independently and uar from [n]. As n → ∞,
distPA(m,δ) (a)
(o1 , o2 ) P 4
n
−→ . (8.3.2)
log log n | log (τ − 2)|
These results also apply to PA(m,δ)
n (b) and PA(m,δ)
n (d) under identical conditions.
Exercise 8.6 investigates an example of the above result. Interestingly, the limiting con-
stant 4/| log (τ − 2)| appearing in Theorem 8.7 replaces the limit 2/| log (τ − 2)| in The-
orem 6.3 for the Norros–Reittu model NRn (w) and in Theorem 7.2 for the configuration
model CMn (d) when the power-law exponent τ satisfies τ ∈ (2, 3). Thus, typical distances
are twice as large for PA(m,δ)
n compared with CMn (d) with the same power-law exponent.
This can be intuitively explained as follows. For the configuration model CMn (d), vertices
with degree d 1 are likely to be directly connected to vertices of degree ≈ d1/(τ −2) (see,
e.g., Lemma 7.12), which is the whole idea behind the power-iteration methodology.
For PA(m,δ)
n , this is not the case. However, pairs of high-degree vertices are likely to be at
distance 2, as whp there is a young vertex that connects to both older vertices. This makes
distances in PA(m,δ)
n effectively twice as big as those for CMn (d) with the same degree
sequence. This effect is special for δ < 0 and is studied in more detail in Exercises 8.7 and
8.8, while Exercise 8.9 shows that this effect is absent when δ > 0.
The lower bounds in Theorem 8.8 also apply to PAn(m,0) (b) and PA(m,0)
n (d).
In this section we study the probability that a certain path is present in PA(m,δ) n . Recall from
Definition 6.5 that we call a k -step path ~π = (π0 , π1 , . . . , πk ) self-avoiding when πi 6= πj
for all 1 ≤ i < j ≤ k . The following proposition studies the probability that a path ~π is
present in PA(m,δ)
n :
Proposition 8.9 (Path counting in preferential attachment models) Consider PA(m,δ) n (a)
with m ≥ 2. Denote γ = m/(2m + δ). Fix k ≥ 0 and let ~π = (π0 , π1 , . . . , πk ) be a k -step
8.4 Path Counting in Preferential Attachment Models 339
self-avoiding path. Then, there exists a constant C > 0 such that, for all k ≥ 1,
k−1
Y 1
P(~π ⊆ PA(m,δ)
n (a)) ≤ (Cm)k 1−γ . (8.4.1)
i=0 (πi ∧ πi+1 )γ (πi ∨ πi+1 )
This result also applies to PA(m,δ)
n (b) and PA(m,δ)
n (d).
Paths are formed by repeatedly forming edges. When m = 1, paths always go from
younger to older vertices. When m ≥ 2 this monotonicity property is lost, which makes
proofs harder. We start by investigating intersections of events that specify which edges are
present in PA(m,δ)
n . More precise results appear in Propositions 8.14 and 8.15 below.
The following lemma shows that the events Env ,v , for different v , are negatively correlated:
Lemma 8.10 (Negative correlation for edge connections in preferential attachment models)
Consider PA(m,δ) n (a) with m ≥ 2. Fix k ≥ 1. For distinct v1 , v2 , . . . , vk ∈ [n] and all
nv1 , . . . , nvk ≥ 1,
\ Y
P Envt ,vt ≤ P(Envt ,vt ). (8.4.3)
t∈[k] t∈[k]
To advance the induction, we assume that (8.4.3) holds for all k , all distinct vertices
v1 , v2 , . . . , vk ∈ [n], all nv1 , . . . , nvk ≥ 1, and all choices of u(v i
s)
, ji(vs ) such that we
(vs )
have maxi,s m(ui − 1) + ji (vs )
≤ e − 1, and we extend it to all k , all distinct ver-
tices v1 , v2 , . . . , vk ∈ [n], all nv1 , . . . , nvk ≥ 1, and all choices of u(v i
s)
, ji(vs ) such that
(vs ) (vs )
maxi,s m(ui − 1) + ji ≤ e. Clearly, by induction, we may restrict attention to the case
for which maxi,s m(u(v i
s)
− 1) + ji(vs ) = e.
We note that there is a unique choice of u, j such that m(u − 1) + j = e. There are
two possibilities: (1) either there is exactly one choice of s and u(v i
s)
, ji(vTs)
such that u(v i
s)
=
u, ji = j , or (2) there are at least two such choices. In the latter case, t∈[k] Envt ,vt = ∅,
(vs )
since the eth edge is connected to a unique vertex. Hence, there is nothing to prove.
We are left with investigating the case where there exists a unique s and ui(vs ) , ji(vs ) such
that u(vi
s)
= u, ji(vs ) = j . Denote the restriction of Envs ,vs to all other edges by
(vs )
\ ji
En0 vs ,vs =
u(v
i
s)
vs . (8.4.4)
(vs ) (vs )
i∈[nv ] : (ui ,ji )6=(u,j)
Tk
By construction, all edge numbers of events in En0 v ,v ∩ i=1 : si 6=s Envi ,vi are at most e − 1.
By conditioning, we obtain
where we have used that the event En0 v ,v ∩ Envt ,vt is measurable with respect
T
t∈[k] : vt 6=vs
to PA(1,δ/m)
e−1 (a). We compute
j Dvs (u − 1, j − 1) + δ
P(u vs | PA(1,δ/m)
e−1 (a)) = , (8.4.7)
zu,j
where we recall that Dvs (u − 1, j − 1) is the degree of vertex vs after j − 1 edges of vertex
u have been attached, and we write the normalization constant in (8.4.7) as
1
X
Dvs (u − 1, j − 1) = m + j0 , (8.4.9)
{u0 vs }
(u0 ,j 0 ) : mu0 +j 0 ≤e−1
hypothesis for each of these terms. Thus, we obtain, using also that m + δ ≥ 0,
\ m+δ Y
P Envt ,vt ≤ P(En0 vs ,vs ) P(Envt ,vt ) (8.4.10)
t∈[k]
zu,j t∈[k] : v 6=v t s
j0
X P(En0 v ,v ∩ {u0 vs }) Y
+ P(Envt ,vt ).
(u0 ,j 0 ) : mu0 +j 0 ≤e−1
zu,j t∈[k] : vt 6=vs
and the advancement of the induction hypothesis is complete when we note that
Dvs (u − 1, j − 1) + δ i
E 1En0 v ,vs
h
= P(Envs ,vs ). (8.4.12)
s zu,j
The claim in Lemma 8.10 follows by induction.
Dv (u2 − 1) + δ
= E 1{u 1 v} . (8.4.17)
1 (u2 − 1)(2 + δ) + 1 + δ
We use the following iteration, for u > u1 :
1
E 1{u 1 v} (Dv (u − 1) + δ)
h i
= 1+
(2 + δ)(u − 1) + 1 + δ 1
u
1
h i
= 1+δ
E 1
{u1 v}
(D v (u − 1) + δ)
u − 1 + 2+δ
Γ(u + 1)Γ(u1 + 1+δ ) h
E 1{u 1 v} (Dv (u1 ) + δ) .
i
2+δ
= 1+δ
(8.4.18)
Γ(u + 2+δ )Γ(u1 + 1) 1
Therefore,
1 1
P u1 v, u2 v
1 Γ(u2 )Γ(u1 + 1+δ )
1
h i
2+δ
= 1 E 1 (D v (u1 ) + δ)
(u2 − 1)(2 + δ) + 1 + δ Γ(u2 − 2+δ )Γ(u1 + 1) {u1 v}
1{u
h i
We thus need to compute E 1
v}
(D v (u1 ) + δ) . We use recursion to obtain
1
1{u
h i
E 1
v}
(Dv (u1 ) + δ) PA(m,δ)
u1 −1 (a)
1
1
= (Dv (u1 − 1) + 1 + δ)P u1
v | PA(m,δ)
u1 −1 (a)
(Dv (u1 − 1) + δ)(Dv (u1 − 1) + 1 + δ)
= , (8.4.20)
(u1 − 1)(2 + δ) + 1 + δ
8.4 Path Counting in Preferential Attachment Models 343
1
since Dv (u1 ) = Dv (u1 − 1) + 1 on the event {u1 v}. By [V1, Proposition 8.15],
3+δ 1+δ
Γ(u + 2+δ
)Γ(v + 2+δ
)
E[(Dv (u) + δ)(Dv (u) + 1 + δ)] = 1+δ 3+δ
(2 + δ)(1 + δ). (8.4.21)
Γ(u + 2+δ
)Γ(v + 2+δ
)
Consequently,
E 1{u
h i
1
v}
(Dv (u1 − 1) + δ)
1
1 1+δ
Γ(u1 + 2+δ
)Γ(v 2+δ
)+
= 1 3+δ
(2 + δ)(1 + δ)
[(u1 − 1)(2 + δ) + 1 + δ]Γ(u1 − 2+δ )Γ(v + 2+δ
)
1
Γ(u1 + 2+δ )Γ(v + 1+δ
2+δ
)
= (1 + δ) 1+δ 3+δ
. (8.4.22)
Γ(u1 + 2+δ )Γ(v + 2+δ )
Combining (8.4.19)–(8.4.22), we arrive at
1 1
P u1
v, u2 v
1+δ 1 1+δ
1 + δ Γ(u2 )Γ(u1 + 2+δ ) Γ(u1 + 2+δ
)Γ(v + 2+δ
)
= 1+δ
× 1+δ 3+δ
2 + δ Γ(u2 + 2+δ )Γ(u1 + 1) Γ(u1 + 2+δ
)Γ(v + 2+δ
)
1
1 + δ Γ(u2 )Γ(u1 + 2+δ
)Γ(v+ 1+δ
2+δ
)
= 1+δ 3+δ
, (8.4.23)
2 + δ Γ(u2 + 2+δ )Γ(u1 + 1)Γ(v + 2+δ )
as claimed in (8.4.15).
j
The proof of (8.4.16) for m ≥ 2 follows by again recalling that u v occurs when
1
(m−1)u+j [mv]\[m(v−1)] and we replace δ by δ/m. Now there are two possibilities,
1 1
depending on whether m(u1 − 1) + j1 v1 and m(u2 − 1) + j2 v2 hold for the same
v1 = v2 ∈ [mv] \ [m(v − 1)] or for two different v1 , v2 ∈ [mv] \ [m(v − 1)].
For v1 = v2 , we use (8.4.15) to obtain a contribution that is asymptotically equal to
m m+δ 1
(1 + o(1)), (8.4.24)
m 2m + δ (u1 u2 )1−γ v 2γ
2
where the factor m comes from the m distinct choices for v1 = v2 and the factor 1/m2
originates since we need to multiply u1 , u2 and v in (8.4.15) by m.
For v1 6= v2 , we use the negative correlation in Lemma 8.10 to bound this contribu-
tion from above by the product of the probabilities in (8.4.14), so that the contribution is
asymptotically bounded by
m(m − 1) m + δ 2 1
(1 + o(1)). (8.4.25)
m2 2m + δ (u1 u2 )1−γ v 2γ
Summing (8.4.24) and (8.4.25) completes the proof of (8.4.16).
with either
j
Envs ,vs = {u vs }, (8.4.29)
where u > vs satisfy {u, vs } = {πi , πi+1 } for some i ∈ {0, . . . , k − 1} and j = ji ∈ [m],
or
s1 s2
Envs ,vs = {u1 vs , u2 vs }, (8.4.30)
where u1 , u2 > vs satisfy (u1 , vs , u2 ) = (πi , πi+1 , πi+2 ) for some i ∈ {0, . . . , k − 1} and
(s1 , s2 ) = (ji , ji+1 ) ∈ [m]2 .
In the first case, by (8.4.13),
j M1
P(Envs ,vs ) = P u
vs ≤ , (8.4.31)
u vsγ
1−γ
Theorem 8.13 (Logarithmic lower distances for δ > 0 and m ≥ 2) Consider PA(m,δ)
n (a)
with δ > 0 and m ≥ 2. Then, as n → ∞,
P(distPA(m,δ)
n (a)
(o1 , o2 ) ≤ (1 − ε) logν n) → 0, (8.5.1)
where ν > 1 is the spectral radius of the offspring operator T κ of the Pólya point tree
defined in (8.5.61). These results also apply to PA(m,δ)
n (b) and PA(m,δ)
n (d).
The proof of Theorem 8.13 is organized as follows. We prove path-counting bounds for
PA(m,δ)
n (a) and PA(m,δ)
n (b) with δ > 0 in Section 8.5.1. Those for PA(m,δ)
n (d) are deferred
to Section 8.5.2 and are based on the Pólya finite graph description of PA(m,δ)
n (d). We prove
Theorem 8.13 in Section 8.5.3. In Section 8.5.4 we prove the resulting lower bounds for
δ = 0 and m ≥ 2.
where
s (1{x>y,s=O} + 1{x<y,s=Y} )
c(∅)
κ∅ (x, (y, s)) = , (8.5.4)
(x ∨ y)χ (x ∧ y)1−χ
with c(∅)
Y
= m, c(∅)
O
= m + δ . Further, let
f ? (y, s) = f (y, s? ), where O
?
= Y, Y? = O. (8.5.5)
Our main path-counting result is the following proposition, which will be crucial in obtaining the lower bound on typical distances:

Proposition 8.14 (Path-counting and multi-type branching processes) Consider PA_n^{(m,δ)}(a) for m ≥ 2. For every ε > 0, there exists a K = K_ε such that, with o_1, o_2 chosen independently and uar from [n],

P(dist_{PA_n^{(m,δ)}(a)}(o_1, o_2) = k) ≤ (K/n) (1+ε)^k ⟨f, T_κ^{k−2} f^⋆⟩.   (8.5.6)

Equivalently, with Z_k the generation size of the multi-type branching process started from an individual of uniform age and with type ∅,

P(dist_{PA_n^{(m,δ)}(a)}(o_1, o_2) = k) ≤ (K/n) (1+ε)^k E[Z_k].   (8.5.7)

These results also apply to PA_n^{(m,δ)}(b) and PA_n^{(m,δ)}(d).
Below, we give a proof based on the negative correlations (8.4.3) in Lemma 8.10, as well as Lemma 8.11. In Section 8.5.2 we redo the analysis for PA_n^{(m,δ)}(d), using its finite-graph Pólya version in Theorem 5.10.

Proof We start by analyzing the consequences of (8.4.3) in Lemma 8.10 for the existence of paths. Note that

P(dist_{PA_n^{(m,δ)}}(o_1, o_2) = k) ≤ (1/n²) Σ_{~π^e} P(~π^e ⊆ PA_n^{(m,δ)}),   (8.5.8)

where we sum over all self-avoiding edge-labeled paths ~π^e, as in Definition 8.12.
This improves upon the obvious upper bound m^k that was used in Proposition 8.9.
Connection Probabilities
We rely on the asymptotic equalities in (8.4.14) and (8.4.16) in Lemma 8.11. Fix ε > 0. We note that the bounds with an extra factor 1 + ε/2 in (8.4.14) and (1 + ε/2)² in (8.4.16) can be used only when the inequality π_i > M holds for some M = M_ε for the relevant π_i. For the contributions where π_i ≤ M, we use the uniform bounds in (8.4.14) and (8.4.16). Note that the number of i for which π_i ≤ M can be at most M, since ~π is self-avoiding. Thus, in total, this gives rise to an additional factor (M_1 ∨ M_2)^M ≡ K. We next look at the asymptotic equalities in (8.4.14) and (8.4.16), and conclude that the product over the constants equals

[(m+δ)/(2m+δ)]^{k−c} [(m+1+δ)/(2m+δ)]^c,   (8.5.12)

where now c = c(~π) is the number of OY reversals in the vertices in ~π, and is thus defined by

c(~π) = Σ_{i=0}^{k−1} 1{label(π_i) = O, label(π_{i+1}) = Y}.   (8.5.13)
since x ↦ x^{−p} is decreasing. Also using that (1 + ε/2)(1 + ε/4) ≤ 1 + ε for ε > 0 sufficiently small, we can rewrite this as

P(dist_{PA_n^{(m,δ)}}(o_1, o_2) = k) ≤ (1+ε)^k (K/n) ∫_0^1 ··· ∫_0^1 A(~x) Π_{i=0}^k x_i^{−a(p_i)} dx_0 ··· dx_k.   (8.5.19)
Comparison with T_κ
We next relate the (k+1)-fold integral in (8.5.19) to the operator T_κ defined in (8.5.2). For this, we note that, by Lemma 5.25 and (8.5.3)–(8.5.5), ⟨f, T_κ^{k−2} f^⋆⟩ equals

∫_0^1 ··· ∫_0^1 Σ_{t_1,...,t_k} c_{label(x_1)}^{(∅)} c_{label(x_k)^⋆}^{(∅)} Π_{i=1}^{k−1} c_{t_{i−1},t_i} / ((x_{i−1} ∨ x_i)^χ (x_{i−1} ∧ x_i)^{1−χ}) dx_0 ··· dx_k.   (8.5.20)

Again, we note that c_{label(x_1)}^{(∅)}, c_{label(x_k)^⋆}^{(∅)}, and (c_{t_{i−1},t_i})_{i∈[k−1]} are determined by (x_i)_{i=0}^{k−1} alone. It is not hard to see that, since χ = 1 − γ,

Π_{i=1}^k 1/((x_{i−1} ∨ x_i)^χ (x_{i−1} ∧ x_i)^{1−χ}) = Π_{i=0}^k x_i^{−a(p_i)},   (8.5.21)

as required, so that the powers of x_1, ..., x_k in (8.5.19) and in (8.5.20) agree. Indeed, note that a_i = a(p_i) = 2(1−γ) + p_i(2γ−1) and p_i = 1{x_i < x_{i−1}} + 1{x_i < x_{i+1}}, so that, since 1 − γ = χ, −a(p_i) equals the power of x_i in

1 / ((x_i ∨ x_{i−1})^χ (x_i ∧ x_{i−1})^{1−χ} (x_i ∨ x_{i+1})^χ (x_i ∧ x_{i+1})^{1−χ}).   (8.5.22)

We conclude that

∫_0^1 ··· ∫_0^1 A(~x) Π_{i=0}^k x_i^{−a(p_i)} dx_0 ··· dx_k = ⟨f, T_κ^{k−2} f^⋆⟩,   (8.5.23)
Consider PA_n^{(m,δ)}(d) with m ≥ 2 and δ > −m. For any edge-labeled self-avoiding path ~π^e as in Definition 8.12,

P_n(~π^e ⊆ PA_n^{(m,δ)}(d)) = Π_{s=1}^n ψ_s^{p_s} Π_{s=1}^n (1 − ψ_s)^{q_s},   (8.5.25)

where

p_s = Σ_{i=0}^k 1{π_i = s} [1{π_{i−1} > s} + 1{π_{i+1} > s}],   q_s = Σ_{i=0}^{k−1} 1{s ∈ (π_i, π_{i+1}) ∪ (π_{i+1}, π_i)}.   (8.5.26)
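The bookkeeping in (8.5.26) is easy to get wrong by hand. The following short sketch (ours) computes (p_s) and (q_s) for a given self-avoiding path, with the convention that the out-of-range indicators involving π_{−1} and π_{k+1} are zero:

    from collections import Counter

    def path_exponents(pi):
        """p_s and q_s from (8.5.26) for a self-avoiding path pi = (pi_0, ..., pi_k)."""
        k = len(pi) - 1
        p, q = Counter(), Counter()
        for i, s in enumerate(pi):
            if i > 0 and pi[i - 1] > s:
                p[s] += 1
            if i < k and pi[i + 1] > s:
                p[s] += 1
        for i in range(k):
            lo, hi = min(pi[i], pi[i + 1]), max(pi[i], pi[i + 1])
            for s in range(lo + 1, hi):  # s strictly between the edge endpoints
                q[s] += 1
        return p, q

    p, q = path_exponents((5, 2, 9, 4))
    print(dict(p))  # each edge contributes 1 to p at its older (smaller) endpoint
    print(dict(q))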
The terms in (8.5.28) and (8.5.29) correspond to the prefactors in (8.4.14) and (8.4.16) in Lemma 8.11. Further, the factors m and m−1 arise owing to the sum over (j_i)_{i=0}^{k−1}. Combining with the sums over (π_i)_{i=0}^k and (j_i)_{i=0}^{k−1} gives rise to the factor ⟨f, T_κ^{k−2} f^⋆⟩ as in Proposition 8.14, as we prove next.
We compute

Π_{s=2}^n (β_s + q_s − 1)_{q_s} / (α + β_s + p_s + q_s − 1)_{p_s+q_s}
  = Π_{s=2}^n [1/(α + β_s + p_s + q_s − 1)_{p_s}] × (β_s + q_s − 1)_{q_s} / (α + β_s + q_s − 1)_{q_s}
  = Π_{s=2}^n [1/(α + β_s + p_s + q_s − 1)_{p_s}] Π_{i=0}^{q_s−1} (β_s + i)/(α + β_s + i)
  = Π_{s=2}^n [1/(α + β_s + p_s + q_s − 1)_{p_s}] Π_{i=0}^{q_s−1} (1 − α/(α + β_s + i)).   (8.5.31)
Note also that

Π_{s=1}^n s^{−p_s} = Π_{i=1}^k π_i^{−(1{π_{i−1} > π_i} + 1{π_{i+1} > π_i})} = Π_{i=1}^k 1/(π_{i−1} ∧ π_i),   (8.5.32)

where π_min = min_{i=0,...,k} π_i, and we have used that p_s = 0 for s < π_min. This bounds the first factor in (8.5.31).
We continue by analyzing the second factor in (8.5.31). By a Taylor expansion,

log(1 − α/(α + β_s + i)) = −α/(α + β_s + i) + O((α + β_s + i)^{−2}),   (8.5.35)
so that

Π_{s=1}^n Π_{i=0}^{q_s−1} (1 − α/(α + β_s + i))
  = exp( −Σ_{s=1}^n Σ_{i=0}^{q_s−1} [α/(α + β_s + i) + O((α + β_s + i)^{−2})] )
  = exp( O(1) Σ_{s=1}^n q_s s^{−2} − Σ_{s=1}^n Σ_{i=0}^{q_s−1} α/(α + β_s + i) ).   (8.5.36)
Further, since q_s = Σ_{i=0}^{k−1} 1{s ∈ (π_i, π_{i+1}) ∪ (π_{i+1}, π_i)} by (8.5.26), we have

−γ Σ_{s=1}^n q_s/s = −γ Σ_{i=0}^{k−1} Σ_{s=π_i∧π_{i+1}+1}^{π_i∨π_{i+1}−1} 1/s.   (8.5.38)
Further,

Σ_{s=1}^n q_s/s² ≤ Σ_{i=0}^{k−1} Σ_{s=1}^n 1{s > π_i ∧ π_{i+1}}/s² ≤ Σ_{i=0}^{k−1} c/(π_i ∧ π_{i+1})
  ≤ 2c Σ_{l=0}^k 1/(π_min + l) ≤ 2c log(1 + k/π_min),   (8.5.40)

since ~π^e is self-avoiding. Using this with π_min = 1 yields the bound 2c log(1 + k). We conclude that

Π_{s=1}^n Π_{i=0}^{q_s−1} (1 − α/(α + β_s + i)) = (1 + k/π_min)^{O(1)} Π_{i=1}^k (π_{i−1} ∧ π_i)^γ / (π_{i−1} ∨ π_i)^γ.   (8.5.41)
By (8.5.9), there are m^{k−b}(m−1)^b choices for the edge labels. By (8.5.30) and (8.5.14),

m^{k−b}(m−1)^b Π_{s=1}^n (α + p_s − 1)_{p_s} = A(~π),   (8.5.43)
where q_i = label(π_i). Apart from the factor (1 + k/π_min)^{O(1)}, this agrees exactly with the summand in (8.5.17). Thus, we may follow the analysis of (8.5.17). Bounding π_min ≥ 1 and summing this over the vertices in ~π, combined with an approximation of the discrete sum by an integral, leads to the claim in (8.5.24).
Here,

⟨f, T_κ^{k−2} f^⋆⟩ = Σ_{s,t} ∫_{S²} f(x, s) κ^{⋆(k−2)}((x,s), (y,t)) f^⋆(y, t) dx dy,   (8.5.44)

where κ^{⋆1}((x,s), (y,t)) = κ((x,s), (y,t)), and we define, recursively,

κ^{⋆k}((x,s), (y,t)) = Σ_{r∈{O,Y}} ∫_0^1 κ^{⋆(k−1)}((x,s), (z,r)) κ((z,r), (y,t)) dz.   (8.5.45)
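The recursion (8.5.45) is straightforward to evaluate numerically by discretizing the age coordinate. The sketch below (ours) does this by the midpoint rule for a kernel of the product form in (8.5.20); the constants c[s][t] are placeholders for the c_{s,t} fixed earlier in the section, so the printed number is illustrative only:

    import numpy as np

    N = 200                                  # grid points on (0, 1)
    x = (np.arange(N) + 0.5) / N             # midpoints; integral of f -> f.sum()/N
    m, delta = 2, 1.0
    chi = (m + delta) / (2 * m + delta)
    c = np.array([[m, m + delta],            # placeholder c_{s,t}, s,t in {O, Y}
                  [m, m + delta]], dtype=float)

    # kernel[s, i, t, j] = kappa((x_i, s), (x_j, t)), shape (8.5.20)
    hi = np.maximum.outer(x, x)
    lo = np.minimum.outer(x, x)
    base = 1.0 / (hi ** chi * lo ** (1 - chi))
    kernel = c[:, None, :, None] * base[None, :, None, :]

    def star(k):
        """Iterated kernel kappa^{*k} of (8.5.45), as a (2, N, 2, N) array."""
        out = kernel.copy()
        for _ in range(k - 1):
            # sum over the intermediate type r, integrate over z (factor 1/N)
            out = np.einsum('sirz,rztj->sitj', out, kernel) / N
        return out

    f = np.ones((2, N))                      # a test function f(y, s)
    f_star = f[::-1]                         # f*(y, s) = f(y, s*), with O* = Y
    k = 5
    inner = np.einsum('si,sitj,tj->', f, star(k - 2), f_star) / N ** 2
    print(inner)                             # <f, T_kappa^{k-2} f*> for this kappa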
(c/x^{1−χ})(c/y^{1−χ}) [(x/y)^{1/2−χ} + (y/x)^{1/2−χ}] = c² x^{2χ−3/2}/√y + c² y^{2χ−3/2}/√x.   (8.5.48)
= (1/2) [2/(2χ−1)]^{k−1} ⟨1, M^{k−2} 1⟩,

where 1 = (1,1)^T is the constant vector. A similar computation, now computing the integrals from left to right instead, shows that the contribution due to y^{2χ−3/2}/√x is bounded by

(1/2) [2/(2χ−1)]^{k−1} ⟨(M^*)^{k−2} 1, 1⟩ = (1/2) [2/(2χ−1)]^{k−1} ⟨1, M^{k−2} 1⟩,   (8.5.56)

so that we end up with

⟨f, T_κ^{k−2} f^⋆⟩ ≤ (c²/2) [2/(2χ−1)]^{k−1} ⟨1, M^{k−2} 1⟩.   (8.5.57)
The matrix M has largest eigenvalue

λ_M = [m(m+δ) + √(m(m−1)(m+δ)(2m+1+δ))] / (2m+δ),   (8.5.58)

and smallest eigenvalue

μ_M = [m(m+δ) − √(m(m−1)(m+δ)(2m+1+δ))] / (2m+δ) < λ_M.   (8.5.59)

A simple computation using that m > −δ shows that μ_M > 0. Thus,
where the equality follows since χ = (m + δ)/(2m + δ). Then this proves the required estimate. Write

k_n^⋆ = ⌈(1 − 2ε) log_ν n⌉.   (8.5.63)

Then, by Proposition 8.14, with K′ = 2Kc² and for all n sufficiently large,

P(dist_{PA_n^{(m,δ)}(a)}(o_1, o_2) ≤ k_n^⋆) ≤ Σ_{k=0}^{k_n^⋆} (K′/n) [(1+ε)ν]^{k−2} ≤ (K′ k_n^⋆/n) [(1+ε)ν]^{k_n^⋆}
  ≤ (K′ k_n^⋆/n) n^{log(1+ε)+1−2ε} = o(1),   (8.5.64)

since log(1 + ε) ≤ ε < 2ε, as required. The proof for PA_n^{(m,δ)}(d) proceeds identically, now using Proposition 8.15 instead.
Thus,

P(dist_{PA_n^{(m,0)}}(o_1, o_2) = k) ≤ (4(Cm)^k/n) (Σ_{s=1}^n 1/s)^{k−1} ≤ (4(Cm)^k/n) (log n)^{k−1}.   (8.5.68)

As a result,

P(dist_{PA_n^{(m,0)}}(o_1, o_2) ≤ k_n^⋆) ≤ Σ_{k≤k_n^⋆} (4(Cm)^k/n) (log n)^{k−1} ≤ 4 Σ_{k≤k_n^⋆} 2^{−k} (log n)^{−1} → 0,   (8.5.69)

since (2Cm log n)^{k_n^⋆} ≤ n, which follows from (8.5.65). This implies that the typical distances are whp at least k_n^⋆ in (8.5.65). Since k_n^⋆ ≥ (1 − ε) log n / log log n, this completes the proof of the lower bound on the graph distances for δ = 0 in Theorem 8.8.

In Exercise 8.18, the reader is asked to prove that the distance between vertices n−1 and n is also whp at least k_n^⋆ in (8.5.65). Exercise 8.19 considers whether the above proof implies that the distance between vertices 1 and 2 is whp at least k_n^⋆ in (8.5.65) as well.
In this section we prove the lower bound in Theorem 8.7 for δ < 0. We do so in a more general setting, by assuming an upper bound on the probability of the existence of paths that is inspired by Proposition 8.9:

Assumption 8.18 (Path probabilities) There exist constants κ > 0 and γ > 0 such that, for all n and all self-avoiding paths ~π = (π_0, ..., π_k) ∈ [n]^{k+1},

P(~π ⊆ PA_n) ≤ Π_{i=1}^k κ (π_{i−1} ∧ π_i)^{−γ} (π_i ∨ π_{i−1})^{γ−1}.   (8.6.1)
This truncated first-moment method looks rather different from those presented in the proofs of Theorems 6.7 and 7.8 for NR_n(w) and CM_n(d). This difference also explains why distances are twice as large for PA_n in Theorem 8.19 compared with Theorems 6.7 and 7.8.

Let us now briefly explain the truncated first-moment method. We start with an explanation of the (unconstrained) first-moment bound and its shortcomings. Let u, v ≥ εn be distinct vertices of PA_n. Then, by Assumption 8.18, for k_n ∈ N,

P(dist_{PA_n}(u, v) ≤ 2k_n) ≤ Σ_{k=1}^{2k_n} Σ_{~π} P(~π ⊆ PA_n),

where the sum is over all self-avoiding paths ~π = (π_0, ..., π_k) with π_0 = u and π_k = v.
The shortcoming of the above bound is that the paths that contribute most to the total weight are those that connect u or v quickly to very old vertices. However, such paths are quite unlikely to be present. This explains why the very old vertices have to be removed in order to obtain a reasonable estimate, and why doing so leads only to small errors. For this, and similarly to Section 6.3.2 for NR_n(w), we split the paths into good and bad paths:

Definition 8.20 (Good and bad paths for PA_n) For a decreasing sequence g = (g_l)_{l=0,...,k} of positive integers, we consider a path ~π = (π_0, ..., π_k) to be good when π_l ∧ π_{k−l} ≥ g_l for all l ∈ {0, ..., k}. We denote the event that there exists a good path of length k between u and v by E_k(u, v). We further let F_l(v) denote the event that there exists a bad path of length l in PA_n starting at v. This means that there exists a path ~π ⊆ PA_n, with v = π_0, such that π_0 ≥ g_0, ..., π_{l−1} ≥ g_{l−1}, but π_l < g_l, i.e., a path that first crosses the threshold after exactly l steps.
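To make Assumption 8.18 and Definition 8.20 concrete, the following sketch (ours) computes the path weight p(~π) of (8.6.9) and checks goodness; the truncation sequence g below is made up for illustration:

    import numpy as np

    def p_weight(pi, kappa, gamma):
        """Path weight p(pi) from Assumption 8.18 / (8.6.9)."""
        pi = np.asarray(pi)
        lo = np.minimum(pi[:-1], pi[1:])
        hi = np.maximum(pi[:-1], pi[1:])
        return np.prod(kappa * lo ** (-gamma) * hi ** (gamma - 1))

    def is_good(pi, g):
        """Goodness in the sense of Definition 8.20: pi_l ^ pi_{k-l} >= g_l."""
        k = len(pi) - 1
        return all(min(pi[l], pi[k - l]) >= g[l] for l in range(k + 1))

    n, gamma, kappa = 10_000, 0.6, 1.0
    g = [n // 10, n // 100, n // 1000, 2]     # decreasing thresholds (made up)
    path = (9_000, 4_000, 500, 7_000)
    print(p_weight(path, kappa, gamma), is_good(path, g))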
The truncated first-moment estimate arises when one bounds the probabilities of the existence of certain good or bad paths by their expected numbers. Owing to the split into good and bad paths, these sums behave better than they would without the split. Inequality (8.6.8) is identical to inequality (6.3.30), used in the proof of Theorem 6.7. However, the notion of good has changed, owing to the fact that the vertices no longer have a weight, but rather an age.

By Assumption 8.18,

P(~π ⊆ PA_n) ≤ p(~π).   (8.6.9)
We conclude that all terms on the rhs of (8.6.8) are now bounded in terms of f_{l,n}(u, v), as

P(dist_{PA_n}(u, v) ≤ 2k_n) ≤ Σ_{k=1}^{k_n} Σ_{w=1}^{g_k−1} f_{k,n}(u, w) + Σ_{k=1}^{k_n} Σ_{w=1}^{g_k−1} f_{k,n}(v, w)
  + Σ_{k=1}^{2k_n} Σ_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^n f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋}).   (8.6.13)
We establish upper bounds on f_{k,n}(u, v) and use these to show that the rightmost term in (8.6.8) remains small when k = k_n is chosen appropriately. Our aim is to provide an upper bound of the form f_{k,n}(u, w) ≤ α_k w^{−γ} + 1{w > g_{k−1}} β_k w^{γ−1}, for suitably chosen parameters α_k, β_k ≥ 0. Key to this choice is the following lemma:
Then there exists a constant c = c(γ, κ) > 1 such that, for all u ∈ [n],

Σ_{w=1}^n q_ℓ(w) p(w, u) ≤ c (α log(n/ℓ) + β n^{2γ−1}) u^{−γ},

as required. This advances the induction hypothesis, and thus completes the proof.
We next use (8.6.15) to prove Theorem 8.19. We start with the contributions due to bad paths. Summing over (8.6.27) in Lemma 8.23, and using (8.6.18) and (8.6.26), we obtain

Σ_{w=1}^{g_k−1} f_{k,n}(v, w) ≤ α_k g_k^{1−γ} 2^{1−γ}/(1−γ) ≤ 2ε/(π² k²),   (8.6.30)

which, when summed over all k ≥ 1, is bounded by ε/3. Hence, together, the first two summands on the rhs of (8.6.13) are smaller than 2ε/3. This shows that the probability that there exists a bad path from either u or v is small, uniformly in u, v ≥ εn.
We continue with the contributions due to good paths, which is the most delicate part of the argument. For this, it remains to choose k_n as large as possible while ensuring that g_{k_n} ≥ 2 and

Σ_{k=1}^{2k_n} Σ_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^n f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋}) ≤ ε/3.   (8.6.31)

Proving (8.6.31) for the appropriate k_n is the main content of the remainder of this section. Recall from Definition 8.22 that g_k is the largest integer satisfying (8.6.22), and that the parameters α_k, β_k are defined via the equalities in (8.6.23) and (8.6.24). To establish lower bounds for the decay of g_k, we instead investigate the growth of η_k = n/g_k > 0 for large k:

Proposition 8.24 (Inductive bound on η_k) Recall Definition 8.22, and let η_k = n/g_k. Let ε ∈ (0, 1). Then there exists a constant B = B_ε such that, for any k = O(log log n),

η_k ≤ e^{B(τ−2)^{−k/2}},   (8.6.32)

where we recall the degree power-law exponent τ = 1 + 1/γ from (8.6.2).
Exercise 8.22 asks the reader to relate the above bound to the growth of the Pólya point tree.

Before turning to the proof of Proposition 8.24, we comment on it. Recall that we are summing over π_k ≥ g_k, which is equivalent to n/π_k ≤ η_k. The sums in (8.6.13) are such that the summands obey this bound for appropriate values of k.

Compare this with (6.3.30), where, instead, the weights obey w_{π_k} ≤ b_k. We see that η_k plays a similar role to b_k. Recall that w_{π_k} is indeed close to the degree of π_k in GRG_n(w), while the degree of vertex π_k in PA_n^{(m,δ)} is close to (n/π_k)^{1/(τ−1)} by [V1, (8.3.12)]. Thus, the truncation n/π_k ≤ η_k can be interpreted as a bound of order e^{B(τ−2)^{−k/2}} on the degree of π_k. Note, however, that b_k ≈ e^{(τ−2)^{−k}} by (6.3.42), which grows roughly twice as quickly as η_k. This is again a sign that distances in PA_n are twice as large as those in GRG_n(w).
Before proving Proposition 8.24, we first derive a recursive bound on η_k:

Lemma 8.25 (Recursive bound on η_k) Recall Definition 8.22, and let η_k = n/g_k. Then there exists a constant C > 0, independent of ε > 0, such that

η_{k+2}^{1−γ} ≤ C (η_k^γ + η_{k+1}^{1−γ} log η_{k+1}).   (8.6.33)

Proof By the definition of g_k in (8.6.22) and the fact that γ − 1 < 0,

η_{k+2}^{1−γ} = n^{1−γ} g_{k+2}^{γ−1} ≤ n^{1−γ} [1/(1−γ)] (π²/ε) (k+2)² α_{k+2}.   (8.6.34)
Further, by the recursion defining α_{k+2},

n^{1−γ} [1/(1−γ)] (π²/ε) (k+2)² α_{k+2} = [c/(1−γ)] (π²/ε) (k+2)² n^{1−γ} (α_{k+1} log η_{k+1} + β_{k+1} n^{2γ−1}).   (8.6.35)

We bound each of the two terms in (8.6.35) separately. By (8.6.26),

[c/(1−γ)] (π²/ε) (k+2)² n^{1−γ} α_{k+1} log η_{k+1}
  ≤ [c 2^{1−γ}/(1−γ)] (π²/ε) (k+2)² n^{1−γ} [(1−γ)ε/(π²(k+1)²)] g_{k+1}^{γ−1} log η_{k+1}
  = c 2^{1−γ} [(k+2)²/(k+1)²] η_{k+1}^{1−γ} log η_{k+1},   (8.6.37)

while, for the second term,

[c/(1−γ)] (π²/ε) (k+2)² n^{1−γ} β_{k+1} n^{2γ−1} = [c/(1−γ)] (π²/ε) (k+2)² β_{k+1} n^γ.   (8.6.38)

By the recursion defining β_{k+1},

[c/(1−γ)] (π²/ε) (k+2)² β_{k+1} n^γ = [c/(1−γ)] (π²/ε) (k+2)² n^γ c (α_k g_k^{1−2γ} + β_k log η_k),   (8.6.39)
which again leads to two terms that we bound separately. For the first term in (8.6.39), we again use the fact that

α_k ≤ 2^{1−γ} [(1−γ)ε/(π²k²)] g_k^{γ−1},

to arrive at

[c/(1−γ)] (π²/ε) (k+2)² n^γ c α_k g_k^{1−2γ} ≤ [c 2^{1−γ}/(1−γ)] (π²/ε) (k+2)² n^γ c [(1−γ)ε/(π²k²)] g_k^{γ−1} g_k^{1−2γ} = c² 2^{1−γ} [(k+2)²/k²] η_k^γ,   (8.6.40)

which contributes to the first term on the rhs of (8.6.33).
By Definition 8.22 we have cβ_k n^{2γ−1} ≤ α_{k+1}, so that, using (8.6.36), the second term in (8.6.39) is bounded by

[c/(1−γ)] (π²/ε) (k+2)² n^γ c β_k log η_k
  ≤ [c/(1−γ)] (π²/ε) (k+2)² α_{k+1} n^{1−γ} log η_k
  ≤ c 2^{1−γ} [(k+2)²/(k+1)²] g_{k+1}^{γ−1} n^{1−γ} log η_k
  = c 2^{1−γ} [(k+2)²/(k+1)²] η_{k+1}^{1−γ} log η_k.   (8.6.41)
Since k 7→ gk is decreasing, it follows that k 7→ ηk is increasing, so that
Proof of Proposition 8.24. We prove the proposition by induction on k, and start by initializing the induction. For k = 0,

η_0 = n/g_0 = n/⌈εn⌉ ≤ ε^{−1} ≤ e^B,   (8.6.43)

so that, by (8.6.44),

η_k ≤ C(2C)^{1/(τ−2)} (η_{k−4}^{1/(τ−2)²} + η_{k−3}^{1/(τ−2)} (log η_{k−3})^{1/[(1−γ)(τ−2)]}).
For the first term in (8.6.48), we use the upper bound η_0 ≤ 1/ε to obtain

C^{Σ_{l=0}^{k/2}(τ−2)^{−l}} η_0^{(τ−2)^{−k/2}} ≤ C^{Σ_{l=0}^{k/2}(τ−2)^{−l}} e^{(B/2)(τ−2)^{−k/2}} ≤ (1/2) e^{B(τ−2)^{−k/2}},   (8.6.49)

when B ≥ 2 log(1/ε), where we use that C^{Σ_{l=0}^{k/2}(τ−2)^{−l}} ≤ (1/2) e^{(B/2)(τ−2)^{−k/2}} for B large, since C is independent of ε.
For the second term in (8.6.48), we use the induction hypothesis to obtain

Σ_{i=1}^{k/2} C^{Σ_{l=0}^{i−1}(τ−2)^{−l}} η_{k−2i+1}^{(τ−2)^{−(i−1)}} (log η_{k−2i+1})^{(τ−2)^{−(i−1)}/(1−γ)}
  ≤ Σ_{i=1}^{k/2} C^{Σ_{l=0}^{i−1}(τ−2)^{−l}} e^{B(τ−2)^{−(k−1)/2}} [B(τ−2)^{−(k−2i+1)/2}]^{(τ−2)^{−(i−1)}/(1−γ)}.   (8.6.50)
We can write

e^{B(τ−2)^{−(k−1)/2}} = e^{B(τ−2)^{−k/2}} e^{B(τ−2)^{−k/2}(√(τ−2)−1)}.   (8.6.51)
Since √(τ−2) − 1 < 0, for k = O(log log n) we can take B large enough that, uniformly in k ≥ 1,

Σ_{i=1}^{k/2} C^{Σ_{l=0}^{i−1}(τ−2)^{−l}} e^{B(τ−2)^{−k/2}(√(τ−2)−1)} [B(τ−2)^{−(k−2i+1)/2}]^{(τ−2)^{−(i−1)}/(1−γ)} < 1/2.   (8.6.52)
We can now sum the bounds in (8.6.49) and (8.6.50)–(8.6.52) to obtain

η_k ≤ (1/2 + 1/2) e^{B(τ−2)^{−k/2}} = e^{B(τ−2)^{−k/2}},   (8.6.53)

as required. This advances the induction hypothesis, and thus completes the proof of Proposition 8.24 for B ≥ 2 log(1/ε).
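To see the double-exponential growth quantified in (8.6.32) concretely, the sketch below (ours) iterates the recursion of Lemma 8.25 with equality, in log-scale to avoid overflow, and compares log η_k with the rate (τ−2)^{−k/2} of Proposition 8.24; all constants are illustrative, not from the text:

    import numpy as np

    # L_{k+2} = [log C + logaddexp(gamma*L_k, (1-gamma)*L_{k+1} + log L_{k+1})]/(1-gamma),
    # where L_k = log(eta_k), i.e., Lemma 8.25 taken with equality.
    gamma, C = 0.75, 2.0
    tau = 1 + 1 / gamma                        # so tau - 2 = (1 - gamma)/gamma < 1
    L = [np.log(10.0), np.log(10.0)]           # made-up initial values L_0, L_1
    for k in range(28):
        L.append((np.log(C) + np.logaddexp(gamma * L[k],
                  (1 - gamma) * L[k + 1] + np.log(L[k + 1]))) / (1 - gamma))
    for k in range(0, 30, 6):
        print(k, L[k] * (tau - 2) ** (k / 2))  # roughly stabilizes, as in (8.6.32)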
We are now ready to complete the proof of Theorem 8.19:

Completion of the proof of Theorem 8.19. Recall that we were left with proving (8.6.31), i.e., that, uniformly in u, v ≥ εn,

Σ_{k=1}^{2k_n} Σ_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^n f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋}) ≤ ε.   (8.6.54)

A crucial part of the proof is the optimal choice of k_n. By Proposition 8.24,

g_k ≥ n/η_k ≥ n e^{−B(τ−2)^{−k/2}}.   (8.6.55)
Then

Σ_{k=1}^{2k_n} Σ_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^n f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋})
  ≤ Σ_{k=1}^{2k_n} Σ_{w=g_{⌊k/2⌋}}^n (α_{⌈k/2⌉} w^{−γ} + 1{w > g_{⌈k/2⌉−1}} β_{⌈k/2⌉} w^{γ−1})²
  ≤ 2 Σ_{k=1}^{2k_n} Σ_{w=g_{⌊k/2⌋}}^n (α_{⌈k/2⌉}² w^{−2γ} + 1{w > g_{⌈k/2⌉−1}} β_{⌈k/2⌉}² w^{2(γ−1)}).   (8.6.57)
This gives two terms, which we estimate one at a time. For the first term, using that γ > 1/2 and that k ↦ α_k is non-decreasing, while k ↦ g_k is non-increasing, by (8.6.18) we find that

2 Σ_{k=1}^{2k_n} Σ_{w=g_{⌊k/2⌋}}^n α_{⌈k/2⌉}² w^{−2γ} ≤ [2/(2γ−1)] Σ_{k=1}^{2k_n} α_{⌈k/2⌉}² (g_{⌊k/2⌋} − 1)^{1−2γ}   (8.6.58)
  ≤ [2/(2γ−1)] Σ_{k=1}^{2k_n} α_{⌈k/2⌉}² g_{⌈k/2⌉}^{1−2γ} = [4 × 2^{2γ−1}/(2γ−1)] (η_{k_n}/n) Σ_{k=1}^{k_n} α_k² g_k^{2−2γ}.
Similarly, for the second term,

2 Σ_{k=1}^{2k_n} Σ_{w=g_{⌈k/2⌉−1}}^n β_{⌈k/2⌉}² w^{2(γ−1)} ≤ [4/(2γ−1)] Σ_{k=1}^{k_n} β_k² n^{2γ−1}.   (8.6.61)
This term is bounded by

C η_{k_n+1}^{2−2γ} ε²/n ≤ C η_{k_n+1} ε²/n,   (8.6.62)

as in (8.6.58)–(8.6.60), since γ ∈ (1/2, 1).
We conclude that, using (8.6.32) in Proposition 8.24,

Σ_{k=1}^{2k_n} Σ_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^n f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋}) ≤ ε.
Let us explain the philosophy of the proof. Note that Core_n requires only information about PA_n^{(m,δ)}, while we are going to study its diameter in PA_{2n}^{(m,δ)}. This allows us to use the edges originating from the vertices in [2n] \ [n] as a sprinkling of the graph that creates shortcuts in PA_n^{(m,δ)}. Such shortcuts shorten graph distances tremendously. We call the vertices that create such shortcuts n-connectors.

Basically, this argument shows that a vertex v ∈ [n] of large degree D_v(n) ≫ 1 will likely have an n-connector to a vertex u ∈ [n] satisfying D_u(n) ≥ D_v(n)^{1/(τ−2)}. This is related to the power-iteration argument for the configuration model discussed below Proposition 7.15. However, for preferential attachment models, we emphasize that it takes two steps to link a vertex of large degree to another vertex of even larger degree. In the proof for the configuration model in Theorem 7.13, this happened in only one step. Therefore, distances in PA_n^{(m,δ)} for δ < 0 are (at least in terms of upper bounds) twice as large as the corresponding distances for a configuration model with a similar degree structure. Let us now state our main result, for which we require some notation.
For A ⊆ [2n], we write
The proof of Theorem 8.26 is divided into several smaller steps. We start by proving that
the diameter of the inner core Innern , which is defined by
is whp bounded by some finite constant Kδ < ∞. After this, we show that the distance
from the outer core, given by Outern = Coren \Innern , to the inner core can be bounded
by 2 log log n/| log (τ − 2)|. This shows that the diameter of the outer core is bounded
by 4 log log n/| log (τ − 2)| + Kδ , as required. We now give the details, starting with the
diameter of the inner core:
Before proving Proposition 8.27, we first introduce the important notion of an n-connector
between two sets of vertices A, B ⊆ [n], which plays a crucial role throughout the proof:
Definition 8.28 (n-connector) Fix two sets of vertices A and B. We say that the vertex j ∈ [2n] \ [n] is an n-connector between A and B if one of the edges incident to j connects to a vertex in A, while another edge incident to j connects to a vertex in B. Thus, when there exists an n-connector between A and B, the distance between A and B in PA_{2n}^{(m,δ)} is at most 2.
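As an aside, the following sketch (ours, with a made-up data structure) makes Definition 8.28 concrete: given, for each j ∈ [2n] \ [n], the endpoints of the m edges of j, it lists the n-connectors between two vertex sets A and B:

    def n_connectors(edge_targets, A, B):
        """n-connectors in the sense of Definition 8.28.

        edge_targets maps each j in [2n] \\ [n] to the endpoints of its m edges;
        j is an n-connector when one edge hits A and a *different* edge hits B.
        """
        A, B = set(A), set(B)
        out = []
        for j, targets in edge_targets.items():
            hits_a = [i for i, t in enumerate(targets) if t in A]
            hits_b = [i for i, t in enumerate(targets) if t in B]
            if any(ia != ib for ia in hits_a for ib in hits_b):
                out.append(j)
        return out

    # toy example with n = 6, m = 2: vertices 7..12 attach to [6]
    edges = {7: (1, 5), 8: (2, 2), 9: (4, 6), 10: (3, 1), 11: (6, 6), 12: (2, 5)}
    print(n_connectors(edges, A={1, 2}, B={5, 6}))  # -> [7, 12]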
The next lemma gives bounds on the probability that an n-connector does not exist:

Lemma 8.29 (Connectivity sets in infinite-variance degree preferential attachment models) Consider PA_{2n}^{(m,δ)}(a) with m ≥ 2 and δ ∈ (−m, 0). For any two sets of vertices A, B ⊆ [n], there exists η = η(m, δ) > 0 such that

P(no n-connector for A and B | PA_n^{(m,δ)}(a)) ≤ e^{−ηD_A(n)D_B(n)/n},   (8.7.6)

where, for any A ⊆ [n],

D_A(n) = Σ_{a∈A} D_a(n)   (8.7.7)

denotes the total degree of the vertices in A at time n. These results also apply to PA_{2n}^{(m,δ)}(b) and PA_{2n}^{(m,δ)}(d) under identical conditions.

Lemma 8.29 plays the same role for preferential attachment models as Lemma 7.12 does for configuration models.
Proof We give only the proof for PA_{2n}^{(m,δ)}(a); the proofs for PA_{2n}^{(m,δ)}(b) and PA_{2n}^{(m,δ)}(d) are identical. We note that, for two sets of vertices A and B, conditional on PA_n^{(m,δ)}(a), the probability that j ∈ [2n] \ [n] is an n-connector for A and B is at least

(D_A(n) + δ|A|)(D_B(n) + δ|B|) / [2n(2m+δ)]²,   (8.7.8)

independently of whether the other vertices are n-connectors.
Since D_i(n) + δ ≥ m + δ > 0 for every i ≤ n, and δ < 0, for every i ∈ B we have

D_i(n) + δ = D_i(n)(1 + δ/D_i(n)) ≥ D_i(n)(1 + δ/m) = D_i(n)(m+δ)/m,   (8.7.9)

and thus D_A(n) + δ|A| ≥ D_A(n)(m+δ)/m. As a result, for η = (m+δ)²/(2m(2m+δ))² > 0, the probability that j ∈ [2n] \ [n] is an n-connector for A and B is at least ηD_A(n)D_B(n)/n², independently of whether the other vertices are n-connectors. Therefore, conditional on PA_n^{(m,δ)}(a), the probability that there is no n-connector for A and B is bounded above by

(1 − ηD_A(n)D_B(n)/n²)^n ≤ e^{−ηD_A(n)D_B(n)/n},   (8.7.10)

as required.
We now give the proof of Proposition 8.27:
Proof of Proposition 8.27. By [V1, Theorem 8.3 and Exercise 8.20], whp Inner_n contains at least √n vertices. Denote the first √n vertices of Inner_n by I. We rely on Lemma 8.29. Recall that D_i(n) ≥ n^{1/[2(τ−1)]}(log n)^{−1/2} for all i ∈ I. Observe that n^{1/(τ−1)−1} = o(1) for τ > 2, so that, for any i, j ∈ I, the probability that there exists an n-connector for i and j is bounded below by

1 − exp{−ηn^{1/(τ−1)−1}(log n)^{−1}} ≥ p_n ≡ n^{−(τ−2)/(τ−1)}(log n)^{−2},   (8.7.11)

for n sufficiently large.

We wish to couple Inner_n to an Erdős–Rényi random graph with N_n = √n vertices and edge probability p_n, which we denote by ER_{N_n}(p_n). For this, for i, j ∈ [N_n], we say that an edge between i and j is present when there exists an n-connector connecting the ith and jth vertices in I.
We now prove that this graph is stochastically bounded below by ER_{N_n}(p_n). Note that (8.7.11) does not guarantee this coupling; instead, we need to prove that the lower bound holds uniformly when i and j belong to I, independently of the previous edges. For this, we order the N_n(N_n−1)/2 edges in an arbitrary way and bound the conditional probability that the lth edge is present, conditional on all previous edges, from below by p_n for every l. This proves the claimed stochastic lower bound.
Indeed, the lth edge is present precisely when there exists an n-connector connecting the corresponding vertices, which we call i and j in I. Moreover, we shall not make use of the first vertices that were used to n-connect the previous edges. This removes at most N_n(N_n−1)/2 ≤ n/2 possible n-connectors, after which at least another n/2 remain. The probability that one of them is an n-connector for the ith and jth vertices in I is, for n sufficiently large, bounded below by

1 − exp{−ηn^{1/(τ−1)−2}(log n)^{−1} n/2} ≥ p_n ≡ n^{−(τ−2)/(τ−1)}(log n)^{−2},

using 1 − e^{−x} ≥ x/2 for x ∈ [0, 1] and η/2 ≥ 1/log n for n sufficiently large. This proves the claimed stochastic domination between the random graph on I and ER_{N_n}(p_n). Next, we show that diam(ER_{N_n}(p_n)) is, whp, uniformly bounded by a constant.
we show that diam(ERNn (pn )) is, whp, uniformly bounded by a constant.
For this we use the result in (Bollobás, 2001, Corollary 10.12), which gives sharp bounds
on the diameter of an Erdős–Rényi random graph. Indeed, this result implies that if pd N d−1 −
2 log N → ∞, while pd−1 N d−2 − 2 log N → −∞, then diam(ERN (p)) = d, whp. In
our case, N = Nn = n1/2 and
p = pn = n−(τ −2)/(τ −1) (log n)−2 = N −2(τ −2)/(τ −1) (2 log N )−2 ,
−1 −1
which implies that, whp, τ3−τ < d ≤ τ3−τ + 1. Thus, we obtain that the diameter of I in
τ −1
PA2n is whp bounded by 2d ≤ 2( 3−τ + 1). In Exercise 8.23, the reader is asked to prove
(m,δ)
Proposition 8.30 (Distance between outer and inner core) Consider PA_{2n}^{(m,δ)}(a) with m ≥ 2 and δ ∈ (−m, 0). The inner core Inner_n can whp be reached from any vertex in the outer core Outer_n using no more than 2 log log n/|log(τ−2)| edges in PA_{2n}^{(m,δ)}(a), i.e., whp,

max_{i∈Outer_n} min_{j∈Inner_n} dist_{PA_{2n}^{(m,δ)}(a)}(i, j) ≤ 2 log log n/|log(τ−2)|.   (8.7.12)

These results also apply to PA_{2n}^{(m,δ)}(b) and PA_{2n}^{(m,δ)}(d).
On the event that the bounds in (8.7.21) hold, by Lemma 8.29 we obtain that the conditional probability, given PA_n^{(m,δ)}, that there exists an i ∈ Γ_k such that there is no n-connector between i and Γ_{k−1} is bounded, using Boole's inequality, by

n exp(−ηB[u_{k−1}]^{2−τ} u_k) = n e^{−ηBD log n} ≤ n^{−(1+ζ)},   (8.7.22)

where we have used (8.7.18) and taken D = 2(1+ζ)/(ηB).
We now complete the proof of Proposition 8.30. Fix

k_n^⋆ = ⌊log log n/|log(τ−2)|⌋.   (8.7.23)

By Lemma 8.31, and since k_n^⋆ n^{−ζ} = o(1) for all ζ > 0, the distance between Γ_{k_n^⋆} and Inner_n is at most 2k_n^⋆. Therefore, we are done if we can show that

Outer_n ⊆ {i : D_i(n) ≥ (log n)^σ} ⊆ Γ_{k_n^⋆} = {i : D_i(n) ≥ u_{k_n^⋆}},   (8.7.24)

so that it suffices to prove that (log n)^σ ≥ u_{k_n^⋆} for any σ > 1/(3−τ). This follows from (7.3.66), which implies that

u_{k_n^⋆} = (log n)^{1/(3−τ)+o(1)};   (8.7.25)

by picking n sufficiently large, we see that this is smaller than (log n)^σ for any σ > 1/(3−τ). This completes the proof of Proposition 8.30.
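To get a feel for (8.7.23)–(8.7.25): by (8.7.22), the thresholds satisfy u_{k−1}^{2−τ} u_k = D log n, i.e., u_k = D log n · u_{k−1}^{τ−2}, whose fixed point (D log n)^{1/(3−τ)} is of the same order as (8.7.25) and is reached after about log log n/|log(τ−2)| steps. The numerical illustration below is ours; n, τ, D, and the starting value are made up:

    import math

    n, tau, D = 10 ** 8, 2.5, 1.0
    u = n ** (1 / (tau - 1))                 # start at the maximal-degree scale
    fixed_point = (D * math.log(n)) ** (1 / (3 - tau))
    k = 0
    while u > 2 * fixed_point:
        u = D * math.log(n) * u ** (tau - 2)  # one power-iteration step, (8.7.22)
        k += 1
    print(k, math.log(math.log(n)) / abs(math.log(tau - 2)))  # both about 4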
Proof of Theorem 8.26. We note that whp diam_{2n}(Core_n) ≤ K_δ + 2k_n^⋆, where k_n^⋆ in (8.7.23) is the upper bound on max_{i∈Outer_n} min_{j∈Inner_n} dist_{PA_{2n}^{(m,δ)}}(i, j) in Proposition 8.30, and we have made use of Proposition 8.27. This proves Theorem 8.26.
Together with Theorem 8.26, Theorem 8.32 proves the upper bound in Theorem 8.7:

Proof of the upper bound in Theorem 8.7. Choose o_1, o_2 ∈ [2n] independently and uar. Using the triangle inequality, we obtain the bound

dist_{PA_{2n}^{(m,δ)}}(o_1, o_2) ≤ dist_{PA_{2n}^{(m,δ)}}(o_1, Core_n) + dist_{PA_{2n}^{(m,δ)}}(o_2, Core_n) + diam_{2n}(Core_n).   (8.7.26)

By Theorem 8.32, the first two terms are each whp bounded by C log log log n. Further, by Theorem 8.26, the third term is bounded by (1 + o_P(1)) 4 log log n/|log(τ−2)|. This completes the proof of the upper bound in Theorem 8.7.
Exercise 8.24 shows that dist_{PA_{2n}^{(m,δ)}}(o_1, o_2) − 4 log log n/|log(τ−2)| is upper tight when dist_{PA_{2n}^{(m,δ)}}(o_1, Core_n) is tight.
Proof of Theorem 8.32. We use the same ideas as in the proof of Theorem 8.26, but now start from a vertex of large degree at time n instead. We need to show that, for fixed ε > 0, a uniformly chosen vertex o ∈ [(2 − ε)n] can whp be connected to Core_n using no more than C log log log n edges in PA_{2n}^{(m,δ)}. This is done in two steps.

In the first step, we explore the neighborhood of o in PA_{2n}^{(m,δ)} until we find a vertex v_0 with degree D_{v_0}(n) ≥ u_0, where u_0 will be determined below. Denote the set of all vertices in PA_{2n}^{(m,δ)} that can be reached from o using exactly k different edges of PA_{2n}^{(m,δ)} by S_k. Denote the first k for which there is a vertex in S_k whose degree at time n is at least u by

T_u^{(o)} = inf{k : S_k ∩ {v : D_v(n) ≥ u} ≠ ∅}.   (8.7.27)

Recall the local convergence in Theorem 5.26, as well as the fact that each vertex v has m older neighbors v_1, ..., v_m, whose ages a_{v_1}, ..., a_{v_m} are distributed as U_{v_i}^{(τ−2)/(τ−1)} a_v, where a_v is the age of v. Therefore, whp, there is a vertex in S_k with arbitrarily small age, and thus also with arbitrarily large degree, at time n. As a result, there exists a C = C_{u,ε} such that, for sufficiently large n,
The second step is to show that a vertex v_0 satisfying D_{v_0}(n) ≥ u_0 for sufficiently large u_0 can be joined to the core using O(log log log n) edges. To this end, we apply Lemma 8.29 to obtain that, for any vertex a with D_a(n) ≥ w_a, the probability that there does not exist a vertex b with D_b(n) ≥ w_b that is connected to a by an n-connector, conditional on PA_n^{(m,δ)}, is at most

exp{−ηD_a(n)D_B(n)/n}.   (8.7.29)

We thus obtain that the probability that such a b does not exist is at most

exp(−η′ w_a w_b^{2−τ}),   (8.7.31)

where η′ = ηc. Fix ε > 0 such that (1−ε)/(τ−2) > 1. We then iteratively take u_k = u_{k−1}^{(1−ε)/(τ−2)}, to see that the probability that there exists a k for which there does not exist a v_k with D_{v_k}(n) ≥ u_k is at most

Σ_{l=1}^k exp(−η′ u_{l−1}^ε).   (8.7.32)
we obtain that the probability that there exists an l with l ≤ k for which there does not exist
Now fix k = k_n = ⌈C log log log n⌉ and choose u_0 sufficiently large that

Σ_{l=1}^{k_n} exp(−η′ u_0^{εκ^{l−1}}) ≤ ε/2.   (8.7.35)

Then we obtain that, with probability at least 1 − ε/2, v_0 is connected in k_n steps to a vertex v_{k_n} with D_{v_{k_n}}(n) ≥ u_0^{κ^{k_n}}. Since, for C ≥ 1/log κ, we have

u_0^{κ^{k_n}} ≥ u_0^{log log n} ≥ (log n)^σ   (8.7.36)

when log u_0 ≥ σ, we obtain that v_{k_n} ∈ Core_n whp.
Since PA_T^{(m,δ)}(a) is connected whp,

diam_{PA_n^{(m,δ)}(a)}([T]) ≤ diam(PA_T^{(m,δ)}(a)),   (8.8.4)

which is a tight random variable. Then, similarly to (8.8.2),

dist_{PA_n^{(m,δ)}(a)}(u, [T]) ≤ dist_{PA_{mn}^{(1,δ/m)}(a)}(u, [mT]).   (8.8.5)

As in the proof for PA_n^{(m,δ)}(b), whp, when T is sufficiently large,

max_{u∈[mn]} dist_{PA_{mn}^{(1,δ/m)}(a)}(u, [mT]) ≤ c log n.   (8.8.6)
M_k →^P ∞.   (8.8.9)

Exercises 8.26–8.28 prove Lemma 8.36 for PA_n^{(m,δ)}(d); they rely on the arguments in the proof of Proposition 5.22. The proofs for PA_n^{(m,δ)}(a) and PA_n^{(m,δ)}(b) are quite similar.
To complete the lower bound on diam(PA_n^{(m,δ)}) in Theorem 8.34, we take two vertices u, v ∈ M_k with k = (1 − ε) log log n/log m. By definition, ∂B_k^{(G_n)}(u), ∂B_k^{(G_n)}(v) ⊂ [n] \ [n/2]. We can then adapt the proof of Theorem 8.19 to show that the distance between ∂B_k^{(G_n)}(u) and ∂B_k^{(G_n)}(v) is whp still bounded from below by 4 log log n/|log(τ−2)|. Therefore, whp,

diam(PA_n^{(m,δ)}) ≥ dist_{PA_n^{(m,δ)}}(u, v) = 2k + dist_{PA_n^{(m,δ)}}(∂B_k^{(G_n)}(u), ∂B_k^{(G_n)}(v))
  ≥ 2(1−ε) log log n/log m + 4 log log n/|log(τ−2)|.   (8.8.10)

This gives an informal proof of the lower bound.
Critical Case δ = 0
We close this section by discussing the diameter for δ = 0:
t_n ≈ n^{−2m/δ}, which is an interesting case and explains why the supremum in (8.9.1) can basically be restricted to t ∈ [n, n^{−2m/δ+o(1)}]. This sheds light on precisely what happens when t = t_n = n exp{(log n)^α} for α = 1, a case that is left open above.

The probability that one of the m edges of vertex n+t+1 connects to u, and another one to v (which certainly makes the distance between u and v equal to 2), is close to

m(m−1) E[ (D_v(n+t)+δ)/((2m+δ)(n+t)) × (D_u(n+t)+δ)/((2m+δ)(n+t)) | PA_n^{(m,δ)} ]
  = (1+o_P(1)) [m(m−1)/((2m+δ)² t²)] E[(D_v(n+t)+δ)(D_u(n+t)+δ) | PA_n^{(m,δ)}]
  = (1+o_P(1)) [m(m−1)/((2m+δ)² t²)] (t/n)^{2/(2+δ/m)} (D_v(n)+δ)(D_u(n)+δ)
  = (1+o_P(1)) [m(m−1)/(2m+δ)²] t^{−2(m+δ)/(2m+δ)} n^{−2/(2+δ/m)} (D_v(n)+δ)(D_u(n)+δ).   (8.9.2)
If we take u = o_1^{(n)}, v = o_2^{(n)}, we have that D_v(n) →^d D_1, D_u(n) →^d D_2, where (D_1, D_2) are two iid copies of the random variable with asymptotic degree distribution P(D = k) = p_k in (1.3.60). Thus, the conditional expectation of the total number of double attachments to both o_1^{(n)} and o_2^{(n)} up to time n+t is close to

Σ_{s=1}^t [m(m−1)(D_1+δ)(D_2+δ)/(2m+δ)²] s^{−2(m+δ)/(2m+δ)} n^{−2/(2+δ/m)}
  ≈ [m(m−1)(D_1+δ)(D_2+δ)/((2m+δ)(−δ))] n^{−2m/(2m+δ)} t^{−δ/(2m+δ)},   (8.9.3)

which becomes Θ_P(1) when t = Kn^{−2m/δ}. The above events, for different t, are close to being independent. This suggests that the process of attaching to both o_1^{(n)} and o_2^{(n)} is, conditional on their degrees (D_1, D_2), Poisson with some random intensity.
dist_{BPA_n^{(f)}}(o_1^{(n)}, o_2^{(n)}) − 4 log log n/|log(τ−2)|   (8.9.5)

is a tight sequence of random variables.
The situation of affine preferential attachment functions f in (8.9.4) where γ ∈ (0, 1/2), for which the degree power-law exponent satisfies τ = 1 + 1/γ > 3, is not so well understood, but one can conjecture that, again, the distance between o_1^{(n)} and o_2^{(n)} is whp logarithmic with some base related to the multi-type branching process that describes its local limit.
The following theorem, for which γ = 1/2 so that τ = 3, describes nicely how the addition of an extra power of a logarithm in the degree distribution affects the distances:
Theorem 8.40 (Critical case: interpolation) Consider BPA_n^{(f)}, where the concave attachment rule f satisfies that there exists α > 0 such that

f(k) = k/2 + αk/(2 log k) + o(k/log k).   (8.9.6)

Choose o_1, o_2 independently and uar from [n]. Then, conditional on o_1 ←→ o_2,

dist_{BPA_n^{(f)}}(o_1, o_2) = (1 + o_P(1)) [1/(1+α)] log n/log log n.   (8.9.7)
Exercise 8.37 shows that the degree distribution for f in (8.9.6) satisfies Σ_{l>k} p_l ≈ k^{−2}(log k)^{−2α}, as for GRG_n(w) in Theorem 6.28. Comparing Theorems 8.40 and 6.28, we see that, for large α, the typical distances in BPA_n^{(f)} are about twice as large as those in GRG_n(w) with similar degrees. This gives an explanation for the occurrence of the extra factor 2 in Theorem 8.7 compared with Theorem 6.3 for the Norros–Reittu model NR_n(w), and Theorem 7.2 for the configuration model CM_n(d), when the power-law exponent τ satisfies τ ∈ (2, 3). Note that this extra factor is absent precisely when α = 0.
Exercise 8.8 (All early vertices are whp at distance 2 for δ < 0) Let δ ∈ (−m, 0) and m ≥ 2. Extend Exercise 8.7 to the statement that, for K ≥ 1 fixed,

lim_{n→∞} P(dist_{PA_n^{(m,δ)}}(i, j) ≤ 2 ∀i, j ∈ [K]) = 1.   (8.11.3)

Exercise 8.9 (Early vertices are not at distance 2 when δ > 0) Let δ > 0 and m ≥ 2. Show that

lim_{K→∞} lim_{n→∞} P(dist_{PA_n^{(m,δ)}}(i, j) = 2 ∀i, j ∈ [K]) = 0.   (8.11.4)

Exercise 8.14 (Negative correlations for m = 1) Show that, for m = 1, Lemma 8.10 implies that if (π_0, ..., π_k) contains different coordinates from (ρ_0, ..., ρ_k), then

P(∩_{i=0}^{k−1} {π_i ⇝ π_{i+1}} ∩ ∩_{i=0}^{k−1} {ρ_i ⇝ ρ_{i+1}}) ≤ P(∩_{i=0}^{k−1} {π_i ⇝ π_{i+1}}) P(∩_{i=0}^{k−1} {ρ_i ⇝ ρ_{i+1}}).   (8.11.5)

Exercise 8.17 (Most-recent common ancestor in PA_n^{(1,δ)} (cont.)) Fix o_1, o_2 to be two vertices in [n] chosen uar, and let V be the oldest vertex that the path from 1 to o_1 and that from 1 to o_2 have in common in PA_n^{(1,δ)}. Extend Exercise 8.16 to show that dist_{PA_n^{(1,δ)}(b)}(1, V) is tight.

Exercise 8.28 (Concentration of the number of minimally k-connected vertices: proof of Lemma 8.36) Consider PA_n^{(m,δ)}(d) with m ≥ 2 and δ ∈ (−m, 0), as in Exercise 8.26. Use Exercises 8.26 and 8.27 to prove that M_k/E[M_k] →^P 1 for all k ≤ (1−ε) log log n/log m, as in Lemma 8.36.

Exercise 8.29 (Monotonicity of distances in PA_n^{(m,δ)}) Fix m ≥ 1 and δ > −m. Show that n ↦ dist_{PA_n^{(m,δ)}}(i, j) is non-increasing for n ≥ i ∨ j.
Informally, these results quantify the "six degrees of separation" paradigm in random graphs, where we see that random graphs with very heavy-tailed degrees have ultra-small typical distances, as could perhaps be expected.
Often, even the lines of proof of these results are similar, relying on clever path-counting techniques. In particular, the results show that in both generalized random graphs and configuration models, in the τ ∈ (2, 3) regime, vertices of high degree, say k, are typically connected to vertices of even higher degree, of order k^{1/(τ−2)}. In the preferential attachment model, on the other hand, this is not true; yet vertices of degree k tend to be connected to vertices of degree k^{1/(τ−2)} in two steps, making typical distances roughly twice as large.
Overview of Part IV
In Part IV we study several related random graph models that can be seen as extensions of
the simple models studied so far. They incorporate novel features, such as directed edges,
clustering, communities, and/or geometry. The important aspect in Part IV will be to verify
to what extent the main results informally described in Meta Theorems A (see the start of
Part III) and B (see above) remain valid, and otherwise, to what extent they need to be
adapted. We will not give complete proofs but instead informally explain why results are
similar to those in Meta Theorems A and B or, instead, why they are different.
CHAPTER 9

RELATED MODELS

Abstract
In this chapter we discuss some related random graph models that have been studied in the literature. We explain their relevance, as well as some of their properties. We discuss directed random graphs, random graphs with local and global community structures, as well as spatial random graphs.
Here, we discuss real-world network models. We start in Section 9.1.1 by discussing citation
networks in detail. In Section 9.1.2 we draw conclusions about network modeling.
Many real-world networks also display a community structure, in that certain parts are more densely connected than the rest of the network, and these communities are relevant in practice.
In citation networks, vertices denote scientific papers and the directed edges correspond
to citations of one paper to another. Obviously, such citations are directed, since it makes a
difference whether your paper cites mine, or my paper cites yours.
Citation networks grow in time. Indeed, papers do not disappear, so a citation, once made
in a published paper, does not disappear either. Further, their growth is enormous. Figure
9.1(a) shows that the number of papers in various fields grows exponentially in time, mean-
ing that more and more papers are being written. If you ever wondered why scientists seem
to be ever more busy, then this may be an obvious explanation.
In Figure 9.1(a) we display the number of papers in three different domains, namely,
Probability and Statistics (PS), Electrical Engineering (EE), and Biotechnology and Applied
Microbiology (BT). The data comes from the Web of Science data base. While exponential
growth is quite prominent in the data, it is somewhat unclear how this exponential growth
arises. It could be due either to the fact that the number of journals that are listed in Web of
Science grows over time or to the fact that journals contain more and more papers. However,
the exponential growth was observed as early as the 1980’s; see the book by Derek de Solla
Price (1986), appropriately called Little science, big science.
As you can see, we have already restricted to certain subfields in science, the reason being
that the publication and citation cultures in different fields are vastly different. Thus, we have
attempted to go to a situation in which the networks that we investigate are somewhat more
homogeneous. For this, it is relevant to be able to distinguish such fields, and to decide which
papers (or journals) contribute to which field. This is a fairly daunting task. However, it is
also an ill-defined task, as no subdomain is truly homogeneous. Let me restrict myself to
probability and statistics, as I happen to know this area best. In probability and statistics,
there are subdomains that are very pure, as well as areas that are highly applied such as
applied statistics. These areas do indeed have different publication and citation cultures.
Thus, science as a whole is probably hierarchical, in that large scientific disciplines can be
identified, that can, in turn, be subdivided into smaller subdomains, etc. However, one should
stop somewhere, and the three scientific disciplines relating to Figure 9.1 are homogeneous
enough to make our point.
Figure 9.1(b) shows the log–log plot for the in-degree distribution in these three citation
networks. We notice that these data sets seem to have empirical power-law citation distri-
butions. Thus, on average, papers attract few citations but the variability in the number of
citations is rather substantial. We are also interested in the dynamics of the citation distri-
bution of the papers published in a given year, as time proceeds. This can be observed in
Figure 9.2. We see a dynamical power law, meaning that at any time the degree distribution
of a cohort of papers from a given time period (in this case 1984) is close to a power law, but
the exponent changes over time (and in fact decreases, which corresponds to heavier tails).
When time grows quite large, the power law approaches a fixed value.
Interestingly, the existence of power-law in-degrees in citation networks also has a long
history. Derek de Solla Price (1965) observed it and even proposed a model for it that relied
on a preferential attachment mechanism, more than two decades before Barabási and Albert
(1999) proposed the first preferential attachment model.
Figure 9.1 (a) Number of publications per year (logarithmic y axis). (b) Log–log plot for the in-degree distribution tail in citation networks.
Figure 9.2 Degree distribution for papers from 1984 versus time.
We wish to discuss two further properties of citation networks and their dynamics. In
Figure 9.3 we see that the majority of papers stop receiving citations after some time, while
a few others keep being cited for longer times. This inhomogeneity in the evolution of vertex
in-degrees is not present in classical preferential attachment models, where the degree of
every fixed vertex grows as a positive power of the graph size. Figure 9.3 shows that the
number of citations of papers published in the same year can be rather different, and the
majority of papers actually stop receiving citations quite soon. In particular, after a first
increase the average increment of citations decreases over time (see Figure 9.4). We observe
Figure 9.3 Time evolution for citations of 20 randomly chosen papers from 1980 for PS and EE, and from 1982 for BT.
Figure 9.4 Average citation increment over a 20-year time window for papers published in different years. PS presents an aging effect different from EE and BT, showing that papers in PS receive citations longer than papers in EE and BT.
Figure 9.5 Distribution of the ages of cited papers, for PS, EE, and BT, for different citing years.
a difference in this aging effect between the PS data set and the other two data sets, due to the
fact that in PS scientists tend to cite older papers than in EE or BT, again exemplifying the
differences in citation and publication patterns in different fields. Nevertheless, the average
increment of citations received by papers in different years tends to decrease over time for
all three data sets.
A last characteristic that we observe is the log-normal distribution of the age of cited papers. In Figure 9.5, we plot the distribution of the ages of cited papers, looking at references made by papers in different years. We have used a 20-year time window in order to compare different citing years. Notice that this log-normal distribution seems to be rather stable over time, and the shape of the curve is also similar for different fields.
Let us summarize the differences between citation networks and the random graph mod-
els that form the basis of network science. First, citation networks are directed, which is
different from the typically undirected models that we have discussed so far. However, it is
not hard to adapt our models to become directed, and we explain this in Section 9.2. Second,
citation networks have a substantial community structure, in that parts of the network exist
that are much more densely connected than the network as a whole. We can argue that both
communities exist on a macroscopic scale, for example in terms of the various scientific
disciplines of which science consists, as well as on a microscopic scale, where research net-
works of small groups of scientists create subnetworks that are more densely connected than
the whole network. One could even argue that geography plays an important role in citation
networks, since many collaborations between scientists are within their own university or
country, even though we all work with various researchers around the globe.
Third, citation networks are dynamic, like preferential attachment models (PAMs), but their time evolution is quite different from that of PAMs, as the linear growth in PAMs is replaced by exponential growth in citation networks. Further, papers in citation networks seem to age, as seen in both Figures 9.3 and 9.4, in that citation rates become smaller for large times, in such a way that typically papers completely stop receiving citations at some (random) point in time.
In conclusion, finding an appropriate model for citation networks is quite a challenge, and
one should be quite humble in one’s expectation that the standard models are in any way
indicative of the complexity of real-world networks.
Now it would be very remarkable if any system existing in the real world could be exactly
represented by any simple model. However, cunningly chosen parsimonious models often
do provide remarkably useful approximations. For example, the law P V = RT relating
pressure P , volume V and temperature T of an “ideal” gas via a constant R is not exactly
true for any real gas, but it frequently provides a useful approximation and furthermore
its structure is informative since it springs from a physical view of the behavior of gas
molecules. For such a model there is no need to ask the question “Is the model true?” If
“truth” is to be the “whole truth” the answer must be “No.” The only question of interest
is “Is the model illuminating and useful?”
Thus, we should not feel discouraged at all! In particular, it is important to know when
to include extra features into the model at hand, so that it becomes more “useful.” For this,
the first step is to come up with models that do incorporate these extra features. Many of
the models discussed so far can rather straightforwardly be adapted to include features such
Figure 9.6 Maximum (a) out- and (b) in-degrees of the 229 networks of size larger than 10,000 from the KONECT data base.
as directedness, community structure, and geometry. Further, these properties can also be
combined. The simpler models that do not have such features serve as a useful model for
comparison, and can thus act as a “benchmark” for more complex situations. In this way,
the understanding of simple models often helps one to understand more complex models
since many properties, tools, and ideas can be extended to them. In some cases the extra
features give rise to a richer behavior, which then merits being studied in full detail. In this
way, network science has moved significantly forward compared with the models described
so far. The aim of this chapter is to highlight some of the lessons learned.
We discuss directed random graphs in Section 9.2, random graphs with macroscopic or
global communities in Section 9.3, random graphs with microscopic or local communities
in Section 9.4, and spatial random graphs in Section 9.5.
Many real-world networks are directed, in the sense that edges are oriented. For example, in
the World-Wide Web, the vertices are web pages, and the edges are the hyperlinks between
them. One could forget about these directions, but that would discard a wealth of informa-
tion. For example, in citation networks it makes a substantial difference whether my paper
cites yours, or yours cites mine. See Figure 9.6 for the maximum out- and in-degrees in the
KONECT data base.
This section is organized as follows. We start by defining directed graphs or digraphs.
After this, we discuss various models invented for them. We discuss directed inhomogeneous
random graphs in Section 9.2.1, directed configuration models in Section 9.2.2, and directed
preferential attachment models in Section 9.2.3.
A digraph G = (V(G), E(G)) on the vertex set V(G) = [n] has an edge set E(G) that is a subset of the set [n]² = {(u, v) : u, v ∈ [n]} of all ordered pairs of elements of [n]. Elements of E(G) are called directed edges or arcs.
Figure 9.7 Proportion of vertices in the largest strongly connected component (LSCC) in the 229 networks of size larger than 10,000 from the KONECT data base.
Figure 9.8 The WWW according to Broder et al. (2000), with updated numbers from Fujita et al. (2019): SCC 51.28%, IN 31.96%, OUT 6.05%, Tendrils 4.61%, Tube 0.26%, disconnected components 5.84%.
Every vertex has a forward connected component, consisting of the vertices to which it is connected by a directed path, as well as a backward connected component. Further, every vertex has a strongly connected component (SCC), consisting of those vertices to which there exist both a forward and a backward path. Exercise 9.1 shows that the SCCs in a graph are well defined. See Figure 9.7 for the proportion of vertices in the largest SCC in the KONECT data base.
The different notions of connectivity divide the graph up into several disjoint parts. Often,
there is a unique largest SCC that contains a positive proportion of the graph. The IN-part
of the graph consists of collections of vertices that are forward connected to the largest
SCC, but not backward connected. Further, the OUT-parts of the graph are the collections of
vertices that are backward connected to the largest SCC, but not forward connected. Finally,
there are the parts of the graph that are in neither of these parts, and consist of their own
SCC and IN- and OUT-parts. See Figure 9.8 for a description of these parts of the WWW,
as well as an estimate of their relative sizes.
Let us explain how this marking is defined. For a vertex v , let m(v) denote its mark.
We then define an isomorphism φ : V (G1 ) → V (G2 ) between two labeled rooted graphs
(G1 , o1 ) and (G2 , o2 ) to be an isomorphism between (G1 , o1 ) and (G2 , o2 ) that respects the
marks, i.e., for which m1 (v) = m2 (φ(v)) for every v ∈ V (G1 ), where m1 and m2 denote
the degree-mark functions on G1 and G2 respectively. We then define R? as in (2.2.2), and
the metric on rooted degree-marked graphs as in (2.2.3). We call the resulting notion of local
convergence (LC) marked forward LC and marked backward LC, respectively.
Even when we are considering forward–backward neighborhoods, the addition of marks is
necessary. Indeed, while for the root of this graph we do know by construction its in- and out-
degrees, for the other vertices in the forward and backward neighborhoods this information
is still not available.
Exercises 9.3–9.5 investigate the various notions of local convergence for some directed
random graphs that are naturally derived from undirected graphs.
While the above discussion may not be relevant for all questions that one may wish to
investigate using local convergence techniques, it is useful for the discussion of PageRank,
as we discuss next.
Convergence of PageRank
Recall the definition of PageRank from [V1, Section 1.5]. Let us first explain the solution in the absence of dangling ends, so that d_v^{(out)} ≥ 1 for all v ∈ [n]. Let G_n = (V(G_n), E(G_n)) denote a digraph, where we let n = |V(G_n)| denote the number of vertices. Fix the damping factor α ∈ (0, 1). Then we let the vector of PageRanks (R_v^{(G_n)})_{v∈V(G_n)} be the unique solution to the equation

R_v^{(G_n)} = α Σ_{u→v} R_u^{(G_n)}/d_u^{(out)} + 1 − α,   (9.2.3)

satisfying the normalization Σ_{v∈V(G_n)} R_v^{(G_n)} = n.
The damping parameter α ∈ (0, 1) guarantees that (9.2.3) has a unique solution. This solution can be understood in terms of the stationary distribution of a "bored surfer." Indeed, denote π_v = R_v^{(G_n)}/n, so that (π_v)_{v∈V(G_n)} is a probability distribution that satisfies a relation similar to that of (R_v^{(G_n)})_{v∈V(G_n)} in (9.2.3), namely

π_v = α Σ_{u→v} π_u/d_u^{(out)} + (1−α)/n.   (9.2.4)

Therefore, (π_v)_{v∈V(G_n)} is the stationary distribution of a random walker which, with probability α, jumps according to a simple random walk, i.e., it chooses any of the out-edges with equal probability, while, with probability 1 − α, the walker is bored, forgets about the search they were doing, and jumps to a uniform vertex in V(G_n).
In the presence of dangling ends, we can just redistribute their mass equally over all vertices, so that (9.2.3) becomes

R_v^{(G_n)} = α Σ_{u→v} (R_u^{(G_n)}/d_u^{(out)}) 1{u ∉ D} + (α/n) Σ_{u∈D} R_u^{(G_n)} + 1 − α,   (9.2.5)

where D = {v : d_v^{(out)} = 0} ⊆ V(G_n) denotes the collection of dangling vertices.
The damping factor α is quite crucial. When α = 0, the stationary distribution is just
πv = 1/n for every v ∈ V (Gn ), so that all vertices have PageRank 1. This is not very
informative. On the other hand, PageRank possibly converges quite slowly when α is close
to 1, and this is also not what we want. Experimentally, α = 0.85 seems to work well and
strikes a nice balance between these two extremes.
We next investigate the convergence of the PageRank distribution on a directed graph
sequence (Gn )n≥1 that converges locally:
Theorem 9.1 (Existence of asymptotic PageRank distribution) Consider a sequence of directed random graphs (G_n)_{n∈N}, and let o_n ∈ V(G_n) be chosen uar. Then the following hold:

(a) If G_n converges locally weakly in the marked backward sense to (Ḡ, ō) ∼ µ̄, then there exists a limiting distribution R_∅^{(Ḡ)}, with E_µ̄[R_∅^{(Ḡ)}] ≤ 1, such that

R_{o_n}^{(G_n)} →^d R_∅^{(Ḡ)}.   (9.2.6)

(b) If G_n converges locally in probability in the marked backward sense to (G, o) ∼ µ, then there exists a limiting distribution R_∅^{(G)}, with E_µ[R_∅^{(G)}] ≤ 1, such that, for every r > 0 that is a continuity point of the distribution of R_∅^{(G)},

(1/n) Σ_{v∈V(G_n)} 1{R_v^{(G_n)} > r} →^P µ(R_∅^{(G)} > r).   (9.2.7)
Interestingly, the positivity of the damping factor also allows us to give a power-iteration formula for R_v^{(G_n)}, and thus for R_∅. Indeed, let

A_{i,j}^{(G_n)} = 1{j→i}/d_j^{(out)},  i, j ∈ V(G_n),   (9.2.8)

denote the (normalized) adjacency matrix of the graph G_n. Then R_v^{(G_n)} can be computed as follows:

R_v^{(G_n)} = (1 − α) Σ_{k=0}^∞ α^k Σ_{i∈V(G_n)} (A^{(G_n)})^k_{v,i}.   (9.2.9)

As a result, when G_n converges locally in probability in the marked backward sense with limit (G, ∅), then R_∅^{(G)} can be computed:

R_∅^{(G)} = (1 − α) Σ_{k=0}^∞ α^k Σ_{i∈V(G)} (A^{(G)})^k_{∅,i},   (9.2.10)

where A^{(G)}_{i,j} is the normalized adjacency matrix of the backward local limit G. We refer to the notes in Section 9.6 for a discussion of Theorem 9.1, including consideration of an error in its original statement.
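For concreteness, here is a small sketch (ours) of PageRank on a digraph: it iterates the fixed-point equation (9.2.5), with dangling mass spread uniformly, and normalizes so that the ranks sum to n, as below (9.2.3). The toy edge list is made up:

    import numpy as np

    def pagerank(n, edges, alpha=0.85, tol=1e-12):
        """PageRank (R_v) solving (9.2.5), normalized so that sum(R) = n.

        edges is a list of directed pairs (u, v) meaning u -> v.
        """
        d_out = np.zeros(n)
        for u, _ in edges:
            d_out[u] += 1
        pi = np.full(n, 1.0 / n)                 # stationary version, (9.2.4)
        while True:
            nxt = np.full(n, (1 - alpha) / n)
            nxt += alpha * pi[d_out == 0].sum() / n   # dangling mass, uniform
            for u, v in edges:
                nxt[v] += alpha * pi[u] / d_out[u]
            if np.abs(nxt - pi).sum() < tol:
                return n * nxt                   # R_v = n * pi_v
            pi = nxt

    # toy digraph on 5 vertices; vertex 4 is a dangling end
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0), (0, 4)]
    print(pagerank(5, edges))

Since α < 1, the iteration is a contraction, which is the same mechanism that makes the power-iteration series (9.2.9) converge.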
P(D_∅^{(in)} > r) ≈ r^{−(τ_in−1)} (we are deliberately being vague about what ≈ means in this context).
Exercises 9.6 and 9.7 investigate the implications of (9.2.10) for the power-law hypothesis for random graphs having bounded out-degrees.
S = [t]. Such kernels are also highly convenient to approximate more general models, as
was exemplified in an undirected setting in Chapter 3.
Directed rank-1 inhomogeneous random graphs. We next generalize rank-1 inhomogeneous random graphs. For v ∈ [n], let w_v^{(in)} and w_v^{(out)} be its respective in- and out-weights, which have the interpretation of the asymptotic average in- and out-degrees, respectively, under a summation-symmetry condition on the weights. Then the directed generalized random graph DGRG_n(w) has edge probabilities given by

p_{uv} = p_{uv}^{(DGRG)} = w_u^{(out)} w_v^{(in)} / (ℓ_n + w_u^{(out)} w_v^{(in)}),   (9.2.14)

where

ℓ_n = (1/2) Σ_{v∈[n]} (w_v^{(out)} + w_v^{(in)}).   (9.2.15)

Let (W_n^{(out)}, W_n^{(in)}) = (w_o^{(out)}, w_o^{(in)}) denote the in- and out-weights of a uniformly chosen vertex o ∈ [n]. Similarly to Condition 1.1, we assume that

(W_n^{(out)}, W_n^{(in)}) →^d (W^{(out)}, W^{(in)})   (9.2.16)

and

E[W_n^{(out)}] → E[W^{(out)}],  E[W_n^{(in)}] → E[W^{(in)}],   (9.2.17)

where (W^{(out)}, W^{(in)}) is the limiting in- and out-weight distribution. Exercise 9.8 investigates the expected number of edges in this setting.
When thinking of w_v^{(out)} and w_v^{(in)} as corresponding to the approximate out- and in-degrees of vertex v ∈ [n], it is reasonable to assume that the total out- and in-weights are close to one another. Indeed, if D_v^{(out)} and D_v^{(in)} denote the out- and in-degrees of vertex v ∈ [n], then we know that (recall Exercise 9.2)

    Σ_{v∈[n]} D_v^{(out)} = Σ_{v∈[n]} D_v^{(in)}.   (9.2.19)

Thus, if indeed w_v^{(out)} and w_v^{(in)} are approximately equal to the out- and in-degrees of vertex v ∈ [n], then also

    Σ_{v∈[n]} w_v^{(out)} ≈ Σ_{v∈[n]} w_v^{(in)}.   (9.2.20)
These numbers are independent for disjoint subsets A and for different individuals. The two branching processes X(s) and Y(s) correspond to the forward and backward limits of DIRG_n(κ), respectively.
We now extend the discussion by defining the marks of these branching processes. With each individual of type r we associate independent marks having Poisson distributions with means ∫_S κ(r,u) µ(du) and ∫_S κ(u,r) µ(du), respectively. These random variables correspond to the "in-degrees" for the forward exploration process X(s) and the "out-degrees" for the backward exploration process Y(s). Finally, for the forward–backward setting, we let the marked branching processes X(s) and Y(s) be independent. We call these objects Poisson marked branching processes with kernel κ. As in Section 3.4.3, we let T_κ be defined as in (3.4.15), i.e., for f : S → R, we let (T_κ f)(x) = ∫_S κ(x,y) f(y) µ(dy).
Theorem 9.2 (Local convergence of DIRGn (κ)) Suppose that κ is irreducible and con-
tinuous almost everywhere on (S × S, µ × µ) and that (9.2.13) holds. Then, DIRGn (κ)
converges locally in probability in the marked forward, backward, and forward–backward
sense to the above Poisson marked branching processes with kernel κ, where the law of the
type of the root ∅ is µ.
We do not give a proof of Theorem 9.2, but refer to Section 9.6 for its history. In Exercise
9.10 the reader is asked to determine the local limit of the directed Erdős–Rényi random
graph. Exercise 9.11 proves Theorem 9.2 in the case of finite-type kernels, while Exercise
9.12 investigates the local convergence of the directed generalized random graph.
becomes positive. Here ζ_X(s) and ζ_Y(s) denote the survival probabilities of X(s) and Y(s),
respectively. The following theorem describes the phase transition in DIRGn (κ):
Theorem 9.3 (Phase transition in DIRG_n(κ)) Suppose that κ is irreducible and continuous almost everywhere on (S × S, µ × µ), and that (9.2.13) holds. Then

    |C_max|/n →^P ζ,   (9.2.23)

while |C_(2)|/n →^P 0 and |E(C_(2))|/n →^P 0.
Theorem 9.3 is the directed version of Theorem 3.19. Owing to the directed nature of the random graph involved, it is now harder to identify when ζ > 0. In the finite-type case, this is determined by whether the largest eigenvalue of the mean-offspring matrix exceeds 1; in the infinite-type case, this is less clear. Exercises 9.13 and 9.14 investigate the conditions for a giant component to exist for directed Erdős–Rényi and generalized random graphs.
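In the finite-type case, the criterion can be checked mechanically; the following sketch (with an illustrative two-type kernel of our own choosing) computes the largest eigenvalue of the mean-offspring matrix and compares it with 1:

    import numpy as np

    # Hypothetical two-type example: M[s, r] = kappa(s, r) * mu(r) is the expected
    # number of type-r children of a type-s individual in the limiting process.
    kappa = np.array([[3.0, 0.5],
                      [0.5, 2.0]])
    mu = np.array([0.5, 0.5])
    M = kappa * mu                               # multiplies column r by mu(r)

    spectral_radius = max(abs(np.linalg.eigvals(M)))
    print("zeta > 0:", spectral_radius > 1)      # supercritical when this exceeds 1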
In order for a graph with in- and out-degree sequence d = (d^{(in)}, d^{(out)}) to exist, we need that (recall Exercise 9.2)

    Σ_{v∈[n]} d_v^{(in)} = Σ_{v∈[n]} d_v^{(out)},   (9.2.24)

and that

    E[D_n^{(in)}] → E[D^{(in)}] and E[D_n^{(out)}] → E[D^{(out)}].   (9.2.26)
Exercise 9.17 investigates the convergence of the numbers of self-loops and multi-edges in
DCMn (d).
Let

    p_{k,l} = P(D^{(in)} = k, D^{(out)} = l)   (9.2.28)

denote the asymptotic joint in- and out-degree distribution. We refer to (p_{k,l})_{k,l≥0} simply as the asymptotic degree distribution of DCM_n(d). The distribution (p_{k,l})_{k,l≥0} plays a role for DCM_n(d) similar to the role that (p_k)_{k≥0} plays for CM_n(d). We further define

    p_k^{⋆(in)} = Σ_l l p_{k,l} / E[D^{(out)}],   p_l^{⋆(out)} = Σ_k k p_{k,l} / E[D^{(in)}].   (9.2.29)

The distributions (p_k^{⋆(in)})_{k≥0} and (p_l^{⋆(out)})_{l≥0} correspond to the asymptotic forward in- and out-degrees of a uniformly chosen edge in DCM_n(d).
whereas every other vertex except the root has independent out-degree with law (p_l^{⋆(out)})_{l≥0}. Further, for a vertex of out-degree l, we let the mark (corresponding to its asymptotic in-degree) be k with probability p_{k,l}/p_l^{(out)}. For the marked backward branching process, we reverse the roles of in- and out-degrees. For the marked forward–backward branching process, we let the root have joint in- and out-degree distribution (p_{k,l})_{k,l≥0}, and define the forward and backward processes and marks as before. We call the above branching process the marked unimodular branching process with degree distribution (p_{k,l})_{k,l≥0}.
The following theorem describes the local convergence in DCMn (d):
Theorem 9.4 (Local convergence of DCMn (d)) Suppose that the out- and in-degrees in a
directed configuration model DCMn (d) satisfy (9.2.25) and (9.2.26). Then DCMn (d) con-
verges locally in probability in the marked forward, backward, and forward–backward sense
to the above marked unimodular branching process with degree distribution (pk,l )k,l≥0 .
It will not come as a surprise that Theorem 9.4 is the directed version of Theorem 4.1,
and Exercise 9.18 asks the reader to prove Theorem 9.4 by adapting its proof.
Then, ζ^{(out)} has the interpretation of the asymptotic probability that a uniform vertex has a large forward cluster, while ζ^{(in)} is that of a uniform vertex having a large backward cluster. Further, let

    ψ = Σ_{k,l} p_{k,l} (1 − θ^{(in)})^l (1 − θ^{(out)})^k,   (9.2.32)

so that ψ has the interpretation of the asymptotic probability that a uniform vertex has both a finite forward and a finite backward cluster. We conclude that 1 − ψ is the probability that a uniform vertex has either a large forward or a large backward cluster, and thus

    ζ = ζ^{(out)} + ζ^{(in)} − (1 − ψ)   (9.2.33)
has the interpretation of the asymptotic probability that a uniform vertex has both a large
forward and a large backward cluster. Finally, we let
    ν = Σ_{k≥0} k p_k^{⋆(in)} = Σ_{k,l} k l p_{k,l} / E[D^{(out)}] = E[D^{(in)} D^{(out)}] / E[D^{(out)}].   (9.2.34)

Alternatively, ν = Σ_{k≥0} k p_k^{⋆(out)} = E[D^{(in)} D^{(out)}]/E[D^{(in)}] by (9.2.27). The main result
concerning the size of the giant is as follows:
Theorem 9.5 (Phase transition in DCMn (d)) Suppose that the out- and in-degrees in the
directed configuration model DCMn (d) satisfy (9.2.25) and (9.2.26).
(a) When ν > 1, ζ in (9.2.33) satisfies ζ ∈ (0, 1], and

    |C_max|/n →^P ζ,   (9.2.35)

while |C_(2)|/n →^P 0 and |E(C_(2))|/n →^P 0.

(b) When ν ≤ 1, ζ in (9.2.33) satisfies ζ = 0, so that |C_max|/n →^P 0 and |E(C_max)|/n →^P 0.
Theorem 9.5 is the adaptation to DCMn (d) of the existence of the giant for CMn (d)
in Theorem 4.9. In Exercise 9.19, the reader is asked to prove that the probability that |C_max|/n exceeds ζ + ε vanishes whenever a graph sequence converges locally in
probability in the marked forward–backward sense. In Exercise 9.20, this is used to prove
Theorem 9.5(b).
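The quantities entering Theorem 9.5 are easily computed numerically from a joint degree distribution. The sketch below is ours: the matrix p is an arbitrary illustrative choice, and the survival probabilities ζ^{(in)}, ζ^{(out)}, θ^{(in)}, θ^{(out)} must be supplied from the fixed-point equations that we have not restated here.

    import numpy as np

    # Hypothetical joint law p[k, l] = P(D_in = k, D_out = l) on {0,...,K}^2,
    # symmetrized so that E[D_in] = E[D_out], consistent with (9.2.24).
    K = 5
    rng = np.random.default_rng(1)
    p = rng.random((K + 1, K + 1))
    p = (p + p.T) / (2 * p.sum())

    k = np.arange(K + 1)
    E_in = (k[:, None] * p).sum()                   # E[D_in]
    E_out = (k[None, :] * p).sum()                  # E[D_out]
    E_prod = (k[:, None] * k[None, :] * p).sum()    # E[D_in * D_out]

    nu = E_prod / E_out                             # (9.2.34)
    assert np.isclose(nu, E_prod / E_in)            # alternative formula; E_in = E_out
    print("giant exists (Theorem 9.5):", nu > 1)

    def zeta(p, zeta_in, zeta_out, theta_in, theta_out):
        """(9.2.32)-(9.2.33); the zetas and thetas are the survival
        probabilities from fixed-point equations not restated here."""
        k = np.arange(p.shape[0])
        psi = (p * (1 - theta_in) ** k[None, :] * (1 - theta_out) ** k[:, None]).sum()
        return zeta_out + zeta_in - (1 - psi)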
Then, conditional on o_1 → o_2,

    dist_{DCM_n(d)}(o_1, o_2) / log n →^P 1/log ν.   (9.2.38)
Theorem 9.6 is the directed version of Theorem 7.1. The philosophy behind the proof is quite similar: a breadth-first exploration process shows that |∂B_r^{(out)}(o_1)| grows roughly like ν^r, so in order to "catch" o_2, one needs r ≈ log_ν n = log n/log ν (recall the discussion of the various directed neighborhoods below (9.2.1)). Of course, at this stage the branching-process approximation starts to fail, which is why one needs to grow the neighborhoods from two sides and to use that |∂B_r^{(in)}(o_2)| also grows roughly like ν^r.
It would be tempting to believe that (9.2.37) is more than what is needed; this is the con-
tent of Exercise 9.21. In Exercise 9.23, the reader is asked to prove that distDCMn (d) (o1 , o2 )
= oP (log n) when ν = ∞.
    c_1 x^{−(τ^{(out)}−2+δ)} ≤ (1/n) Σ_{v∈[n]} d_v^{(in)} 1_{{d_v^{(out)} > x}} ≤ c_2 x^{−(τ^{(out)}−2−δ)},   (9.2.39)

where the upper bound holds for every x ≥ 1, while the lower bound is required to hold only for 1 ≤ x ≤ n^β for some β > 1/2. The main result is then as follows:
Theorem 9.7 (Doubly logarithmic typical distances in DCMn (d)) Suppose that the out-
and in-degrees in the directed configuration model DCMn (d) satisfy (9.2.25), (9.2.26), and
(9.2.39). Then, conditional on o1 → o2 ,
    dist_{DCM_n(d)}(o_1, o_2) / log log n →^P 1/|log(τ^{(in)} − 2)| + 1/|log(τ^{(out)} − 2)|.   (9.2.40)
Theorem 9.7 is the directed version of Theorem 7.2.
(p_k^{⋆(out)})_{k≥0}, respectively. Write

    1/ν^{(in)} = (1/E[D^{(in)}]) ∂²f(s,t)/∂s∂t |_{s=1−θ^{(in)}, t=1},   (9.2.43)

and

    1/ν^{(out)} = (1/E[D^{(out)}]) ∂²f(s,t)/∂s∂t |_{s=1, t=1−θ^{(out)}}.   (9.2.44)
Then, the diameter in the directed configuration model DCMn (d) behaves as follows:
Theorem 9.8 (Logarithmic diameter in DCMn (d)) Suppose that the out- and in-degrees
in the directed configuration model DCMn (d) satisfy (9.2.25) and (9.2.26). Further, assume
that (9.2.37) holds. Then, if ν = E[D^{(in)} D^{(out)}]/E[D^{(out)}] > 1,

    diam(DCM_n(d)) / log n →^P 1/log ν + 1/log ν^{(in)} + 1/log ν^{(out)}.   (9.2.45)
Theorem 9.8 is the directed version of Theorem 7.19. The interpretation of the various terms is similar to that in Theorem 7.19: the terms involving ν^{(in)} and ν^{(out)} indicate the depths of the deepest traps, where a trap means that a neighborhood survives for a long time without gaining substantial mass, i.e., it remains thin. The term involving ν^{(in)} gives the depth of the largest in-trap, for which the in-neighborhood is thin, and that involving ν^{(out)} the depth of the largest out-trap, for which the out-neighborhood is thin. These numbers are determined by first taking r such that

    P(|∂B_r^{(in/out)}(o)| ∈ [1, K]) = Θ(1/n),   (9.2.46)

where ∂B_r^{(in/out)}(o) corresponds to the boundary of the backward r-neighborhood for ν^{(in)} and to that of the forward r-neighborhood for ν^{(out)}, while K is arbitrary and large. Owing to large deviations for supercritical branching processes, one can expect that

    P(|∂B_r^{(in/out)}(o)| ∈ [1, K]) ≈ (ν^{(in/out)})^{−r}.   (9.2.47)

Then we can identify r^{(in)} = log_{ν^{(in)}}(n) and r^{(out)} = log_{ν^{(out)}}(n). The solutions to (9.2.47) are given by (9.2.43) and (9.2.44). For those special vertices u, v for which |∂B_{r^{(in)}}^{(in)}(u)| ∈
temporal networks in which younger vertices can connect only to older vertices, such as in
citation networks (recall Section 9.1.1). The connectivity structure of such directed versions
is not particularly interesting. For example, the strongly connected component is always
small (see Exercise 9.24). Below, we discuss the PageRank of this model.
Theorem 9.9 (Power-law PageRank distribution of directed PAM) Let (R_v^{(G_n)})_{v∈V(G_n)} be the PageRank vector with damping factor α of the directed preferential attachment model G_n with δ ≥ 0 and m ≥ 1, where edges in the normal preferential attachment model are directed from young to old. Let R_∅ be the limiting distribution of the PageRank R_{o_n}^{(G_n)} of a uniform vertex, as derived in Theorem 9.1. Then there exist constants 0 < c_1 ≤ c_2 < ∞ such that, for any r ≥ 1,

    c_1 r^{−(τ^{(PR)}−1)} ≤ P_µ(R_∅ > r) ≤ c_2 r^{−(τ^{(PR)}−1)},   (9.2.48)

where µ is the law of the local limit of this directed preferential attachment model, and where the exponent τ^{(PR)} is identified below.
Theorem 9.9 implies that the PageRank power-law hypothesis, as explained in [V1, Section 1.5] and restated in Section 9.2, is false in general. Indeed, the PageRank distribution obeys a power law, as formulated above, with exponent τ^{(PR)} = 1 + (2 + δ/m)/(1 + (m + δ)α/m), while the in-degree obeys a power law with exponent τ = 3 + δ/m. Note that τ^{(PR)} → 1 + 1/α for δ → ∞, while τ^{(in)} = τ = 3 + δ/m → ∞ for δ → ∞. Thus, the power-law exponent of the directed preferential attachment PageRank remains uniformly bounded in δ, while that of the in-degree distribution grows infinitely large.
This suggests that the PageRank distribution could have power-law tails even for random
graphs with thin-tailed in-degree distributions.
Since the PageRank distribution obeys a power law, it is of interest to investigate the
maximal PageRank in a network of size n. The theorem below gives a result for the very
first vertex:
Theorem 9.10 (PageRank of first vertex in a directed preferential attachment tree) Let (R_v^{(G_n)})_{v∈V(G_n)} be the PageRank vector with damping factor α of the directed preferential attachment tree G_n with δ ≥ 0 and m = 1, defined above. Then there exists a limiting random variable R such that

    n^{−(1+(1+δ)α)/(2+δ)} R_1^{(G_n)} →^{a.s.} R.   (9.2.49)
Theorem 9.10 shows that the PageRank of vertex 1 has the same order of magnitude as the maximum of n random variables with power-law exponent τ^{(PR)} = 1 + (2 + δ/m)/(1 + (m + δ)α/m) would have. It would be of interest to extend Theorem 9.10 to other values of m, as well as to the maximal PageRank max_{v∈[n]} R_v^{(G_n)}.
Many real-world networks have communities that are global in size. For example, when
dividing science into its core fields, citation networks have just such a global community
structure, as discussed in Section 9.1.1. In Belgian telecommunication networks of who calls whom, the division into the French- and the Flemish-speaking parts is clearly visible (Blondel et al., 2008), while in US politics the division into Republicans and Democrats has a pronounced effect on the network structure of social interactions between politicians (Mucha et al., 2010).
In this section we discuss random graph models for networks with a global community
structure. The section is organized as follows. In Section 9.3.1 we discuss stochastic block
models, which are the models of choice for networks with community structures. In Section
9.3.2 we consider degree-corrected stochastic block models, which are similar to stochas-
tic block models but allow for more pronounced inhomogeneity in the degree structure. In
Sections 9.3.3 and 9.3.4, we study configuration models and preferential attachment models
with global communities, respectively. We introduce the models, state the most important
results in them, and also discuss the topic of community detection in such models, a topic
that has attracted considerable attention owing to its practical importance.
Exercise 3.11 then shows that the resulting random graph is graphical as in Definition 3.3(a),
so that the results in Chapters 3 and 6 apply. As a result, we will not spend much time on the
degree distribution and the giant and graph distances in this model, as they were addressed
there. Exercise 9.25 elaborates on the degree structure, while Exercise 9.26 investigates fur-
ther the conditions for a giant to exist.
Let us mention that, for the stochastic block model to be a good model for networks with
a global community structure, one would expect that the edge probabilities of the internal
edges between vertices of the same type are larger than those of the external edges between
vertices of different types. In terms of formulas, this means that κ(s, s) > κ(s, r) for all
s, r ∈ S = [t] with s 6= r. For example, the bipartite Erdős–Rényi random graph has a
structure that is quite opposite to a random graph with global communities (as vertices only
have neighbors of a different type).
where the maximum is over all possible permutations p from [t] to [t]. If such an algorithm does not exist, then we call the problem unsolvable.
The maximum over permutations of the types in (9.3.2) is due to the fact that the type la-
bels generally have no meaning in real-world networks, so that they can be permuted without
changing anything. Exercise 9.27 shows that (9.3.2) is indeed false for random guessing.
Community detection is most difficult when the degree distributions of vertices of all the different types are the same. This is not surprising, as otherwise one could aim to classify vertices on the basis of their degrees. As a result, from now on we assume that the expected degrees of all types of vertices are the same. Some ideas about how one can prove that the problem is solvable for unequal expected degrees can be obtained from Exercises 9.28 and 9.29.
We start by considering the case where there are just two types, so that we can take the
edge probability puv to be a/n for vertices of the same type, and b/n for vertices of opposite
types. Here we think of a > b. The question whether community detection is solvable is
answered in the following theorem:
Theorem 9.12 (Stochastic block model threshold) Take n to be even. Consider a stochastic
block model of two types, each having n/2 vertices, where the edge probability puv is a/n
for vertices of the same type, and b/n for vertices of opposite types, where a > b. Then, the
community detection problem is solvable as in Definition 9.11 when

    (a − b)² / (2(a + b)) > 1,   (9.3.3)

while it is unsolvable when

    (a − b)² / (2(a + b)) < 1.   (9.3.4)
Theorem 9.12 is quite surprising. Indeed, it shows that, in order to have a chance at community detection, not only should a > b hold, but (a − b)² should also be sufficiently large compared to a + b. Further, the transition in Theorem 9.12 is sharp, in the sense that (9.3.3) and (9.3.4) complement each other. It is unclear what happens in the critical case when (a − b)² = 2(a + b). The solvable case in (9.3.3) is sometimes called an "achievability result," the unsolvable case in (9.3.4) an "impossibility result." We do not give the full proof of Theorem 9.12, as it is quite involved. The proof of the solvable case also shows that the proportion of pairs of vertices that are correctly classified as being of the same type converges to 1 when (a − b)²/[2(a + b)] grows large.
In Exercise 9.30, the reader is asked to show that (9.3.3) implies that a − b > 2 (and thus
a + b > 2), and to conclude that a giant thus exists in this setting.
While the results for a general number of types t are less complete, there is an achievability result when p_{uv} = a/n for vertices of the same type and p_{uv} = b/n for all vertices of different types, in which (9.3.3) is replaced by

    (a − b)² / (t(a + (t − 1)b)) > 1,   (9.3.5)
which indeed reduces to (9.3.3) for t = 2. Also, there are many results on whether efficient algorithms for community detection exist. In general, this means that not only should a detection algorithm achieving (9.3.2) exist, but it should also run in reasonable time (say Θ(n log n) for fixed t). We refer to Section 9.6 for a more elaborate discussion of such results.
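As a quick illustration (ours, not the text's), the thresholds (9.3.3) and (9.3.5) can be evaluated directly:

    def detectable(a, b, t=2):
        """Achievability condition (9.3.5); reduces to (9.3.3) for t = 2."""
        return (a - b) ** 2 / (t * (a + (t - 1) * b)) > 1

    print(detectable(5.0, 1.0))   # True:  16 / 12 > 1
    print(detectable(3.0, 2.0))   # False:  1 / 10 < 1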
Let us continue this subsection by explaining how thresholds such as (9.3.3) and (9.3.5)
can be interpreted. Interestingly, there is a close connection with multi-type branching pro-
cesses. Consider a branching process with finitely many types. Kesten and Stigum (1966)
asked in this context when it would be possible to estimate the type of the root while ob-
serving the types of the vertices in generation k for very large k . In this case, the expected
offspring matrix equals Ms,r = κ(s, r)µ(r), which is a t × t matrix. Let λ1 > λ2 be the
two largest eigenvalues of M. Then, the Kesten–Stigum criterion is that estimation of the
root type is possible with probability strictly larger than 1/t when
    λ_2² / λ_1 > 1.   (9.3.6)
Next, consider a general finite-type inhomogeneous random graph, with limiting type
distribution µ(s) and expected offspring matrix Ms,r = κ(s, r)µ(r). Obviously, the local
limit of the stochastic block model is the above multi-type branching process, so a link
between the two detection problems can indeed be expected. Under the condition in (9.3.6),
it is believed that the community detection problem is solvable and even that communities
can be detected in polynomial time. For t = 2, this is sharp, as we have seen above. For
t ≥ 3, the picture is much more involved. It is believed that, for t ≥ 4, a double phase
transition occurs: detection should be possible in polynomial time when λ22 /λ1 > 1, much
harder but still possible (i.e., the best algorithms take a time that is exponentially long in the
size of the network) when λ22 /λ1 > c? for some 0 < c? < 1, and information-theoretically
impossible when λ22 /λ1 < c? . However, this is not yet known in the general case.
The way to get from a condition like (9.3.6) to an algorithm for community detection is
by using the two largest eigenvalues of the so-called non-backtracking matrix of the random graph, defined below, and then obtaining an estimate for the partition from the eigenvectors corresponding to these eigenvalues. The leading eigenvalue converges to λ_1 in probability, while the second is bounded by |λ_2|. This, together with a good approximation to the
corresponding eigenvectors, suggests a specific estimation procedure that we explain now.
Let B be the non-backtracking matrix of the graph G. This means that B is indexed by the oriented edges E⃗(G) = {(u,v) : {u,v} ∈ E(G)}, so that B = (B_{e,f})_{e,f∈E⃗(G)}. For an edge e ∈ E⃗(G), denote e = (e_1, e_2), and write

    B_{e,f} = 1_{{e_2 = f_1, e_1 ≠ f_2}},   (9.3.7)

which indicates that e ends in the vertex in which f starts, but e is not the reversal of f. The latter property explains the name non-backtracking matrix.
Now we come to the eigenvalues. We restrict ourselves to the case where t = 2, even though some results extend, with modifications, to higher values of t. Let λ_1(B) and λ_2(B)
denote the two leading eigenvalues of B. Then, for the stochastic block model,

    λ_1(B) →^P λ_1,   λ_2(B) →^P λ_2,   (9.3.8)

where we recall that λ_1 > λ_2 are the two largest eigenvalues of M, where M_{s,r} = κ(s,r)µ(r). It turns out that, for the Erdős–Rényi random graph with edge probability (a + b)/(2n) ≡ α/n, the first eigenvalue satisfies λ_1(B) →^P λ_1 = α, while the second eigenvalue satisfies λ_2(B) ≤ √α + o_P(1). Note that this does not follow from (9.3.8), since M is then a 1 × 1 matrix. For the stochastic block model with t = 2, instead, λ_2(B) →^P λ_2 = (a − b)/2. Thus, we can expect that the graph is a stochastic block model when

    λ_2(B)² / λ_1(B) > 1,   (9.3.9)
while if the reverse inequality holds then we are not even sure whether the model is an
Erdős–Rényi random graph or a stochastic block model. In the latter case the graph is so
random and homogeneously distributed that we are not able to make a good estimate of
the types of the vertices, which strongly suggests that this case is unsolvable. This at least
informally explains (9.3.6).
Finally, we explain how the above analysis of eigenvalues can be used to estimate the types. Assume that λ_2²/λ_1 > 1. Let ξ_2(B) : E⃗(G) → R denote the normalized eigenvector corresponding to λ_2(B). We fix a deterministic threshold θ > 0. Then, we estimate σ̂(v) = 1 when

    Σ_{e : e_2 = v} ξ_2(e) ≥ θ/√n,   (9.3.10)

and otherwise σ̂(v) = 2. This estimator can then be shown to achieve (9.3.2), owing to the sufficient separation of the eigenvalues.
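The following Python sketch assembles the non-backtracking matrix of (9.3.7) and applies the threshold rule (9.3.10). It is a brute-force illustration of ours (dense linear algebra, so suited only to small graphs), and its output labels are meaningful only up to the global permutation of types allowed in (9.3.2).

    import numpy as np

    def nb_partition(adj, theta=1.0):
        """Two-type estimate from the second eigenvector of B; `adj` maps each
        vertex of a simple undirected graph to the set of its neighbors."""
        edges = [(u, v) for u in adj for v in adj[u]]       # oriented edges
        idx = {e: i for i, e in enumerate(edges)}
        B = np.zeros((len(edges), len(edges)))
        for e1, e2 in edges:
            for f2 in adj[e2]:                              # f = (e2, f2) starts at e2
                if f2 != e1:                                # exclude the reversal of e
                    B[idx[(e1, e2)], idx[(e2, f2)]] = 1.0
        vals, vecs = np.linalg.eig(B)                       # B is not symmetric
        xi2 = np.real(vecs[:, np.argsort(-vals.real)[1]])
        xi2 /= np.linalg.norm(xi2)
        cut = theta / np.sqrt(len(adj))
        # sigma-hat(v) = 1 when the sum over edges e with e_2 = v exceeds the cut,
        # as in (9.3.10); the sign of an eigenvector is arbitrary, so the labels
        # are determined only up to a global swap.
        return {v: 1 if sum(xi2[idx[(u, v)]] for u in adj[v]) >= cut else 2
                for v in adj}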
we pair the half-edges incident to vertices of type 1 uar to those incident to vertices of type
2, without replacement. As a result, there are edges only between vertices of types 1 and 2,
and the total number of edges is given in (9.3.16).
Using the above definition, we let the edges between vertices of types s, r be given by a
bipartite configuration model between the vertices in {v : σ(v) = s} and {v : σ(v) = r},
Special cases of this model are the configuration model, for which t = 1, and the bipartite configuration model itself, for which t = 2 and d_v^{(r)} = 0 for every v with σ(v) = r.
Let µ_n(s) denote the proportion of vertices of type s. We again assume, as in (9.3.1), that the type distribution µ_n(s) = n_s/n satisfies, for all s ∈ [t],

    lim_{n→∞} µ_n(s) = lim_{n→∞} n_s/n = µ(s).   (9.3.18)
Also, in order to describe the local and global properties of the configuration model with
global communities, one should make assumptions similar to those for the original configu-
ration model in Condition 1.7 but now for the matrix of degree distributions. For example, it
is natural to assume that, for all s ∈ [t], the joint distribution function of all the type degrees satisfies

    F_n^{(s)}(x_1, ..., x_t) = (1/n_s) Σ_{v : σ(v)=s} 1_{{d_v^{(1)} ≤ x_1, ..., d_v^{(t)} ≤ x_t}} → F^{(s)}(x_1, ..., x_t),   (9.3.19)

for all x_1, ..., x_t ∈ R and some limiting joint distribution F^{(s)} : R^t → [0, 1]. Further, it is natural to assume that an adaptation of Condition 1.7(b) holds for all these degrees, such as that, for all s, r ∈ [t],

    (1/n_s) Σ_{v : σ(v)=s} d_v^{(r)} → E[D^{(s,r)}],   (9.3.20)

where D^{(s,r)} is the rth coordinate of the random vector whose distribution function is F^{(s)}, i.e.,

    P(D^{(s,r)} ≤ x_r) = lim_{x_1,...,x_{r−1},x_{r+1},...,x_t → ∞} F^{(s)}(x_1, ..., x_t).   (9.3.21)
While the configuration model, as well as its bipartite version, has attracted substantial attention, the above extension has not. Exercises 9.35–9.37 informally investigate some of the properties of this extension.
Let the graph Gn at time n be given. At time n + 1, let vertex n + 1 have a type σ(n + 1)
that is chosen in an iid way from [t], where
The probability distribution η ? that solves hs (η ? ) = 0 for all s ∈ [t] can be shown to be
unique. We next define the crucial parameters in the model.
For s, r ∈ [t], let

    θ⋆(s, r) = κ(s, r) / Σ_{r′∈[t]} κ(s, r′) η⋆(r′),   (9.3.25)

and write

    θ⋆(s) = Σ_{r∈[t]} µ(r) θ⋆(s, r).   (9.3.26)
We let n_s = #{v : σ(v) = s} denote the type count. Next, we study the degree distribution in the above preferential attachment model with global communities. For s ∈ [t], define

    P_k^{(s)} = (1/n_s) Σ_{v∈[n]} 1_{{D_v(n) = k, σ(v) = s}}   (9.3.27)

to be the degree distribution of the types in the model, where D_v(n) denotes the degree of vertex v at time n and n_s equals the number of vertices of type s ∈ [t]. The main result on
the degree distribution is as follows:
κ(s, s) = a > 1 for all s ∈ [t], and κ(s, r) = 1 for all s, r ∈ [t] with s 6= r, it is unclear
whether Err < (t − 1)/t.
Many more detailed results can be proved, for example that the probability that the label of vertex v is estimated wrongly converges to zero uniformly for all v ∈ [n] \ [δn], for any δ > 0. Also, there exists an algorithm that estimates σ(v) correctly whp provided that v = o(n). We refrain from discussing such results further.
In the previous section we investigated settings where the models have a finite number of communities, making the communities global. This setting is realistic when we would like to partition a network of choice into a finite number of parts, for example corresponding to the main scientific fields in citation or collaboration networks, or to the continents in the Internet. However, in many other settings this is not realistic. Indeed, most communities in social networks correspond to smaller entities, such as school classes, families, sports teams, etc. In most real-world settings, it is not even clear what communities look like. As a result, community detection has become an art.
The topic is relevant, since most models (including the models with global community structure from Section 9.3) have rather low clustering. For example, consider a general inhomogeneous random graph IRG_n(κ_n) with kernel κ_n, and assume that κ_n(x,y) ≤ n. Then, the expected number of triangles in IRG_n(κ_n) is close to

    E[# triangles in IRG_n(κ_n)] = (1/(6n³)) Σ_{i,j,k∈[n]} κ_n(x_i, x_j) κ_n(x_j, x_k) κ_n(x_k, x_i),   (9.4.1)
Figure 9.9 Clustering coefficients in the 727 networks of size larger than 10,000 from the KONECT data base.
realistically model the community structure in real-world networks. Below, we consider several
models that attempt to do so. In Section 9.4.1 we start by discussing inhomogeneous random
graphs with community structures. We continue in Section 9.4.2 by describing the hierar-
chical configuration model as well as some close cousins; this is followed by a discussion
of random intersection graphs in Section 9.4.3 and exponential random graphs in Section
9.4.4.
Model Introduction
We will repeatedly make use of notation from Chapter 3. Let F consist of one representative
of each isomorphism class of finite connected graphs, chosen so that if F ∈ F has r vertices
then V (F ) = [r] = {1, 2, . . . , r}. Simple examples of such an F are the complete graphs
on r vertices, but other examples are also possible. Recall that S denotes the type space.
Given F ∈ F with r vertices, let κ_F : S^r → [0, ∞) be a measurable function. The function κ_F is called the kernel corresponding to F. A sequence κ̃ = (κ_F)_{F∈F} is a kernel family. Let κ̃ be a particular kernel family and n an integer. We define a random graph IRG_n(κ̃)
with vertex set [n]. First let x_1, x_2, ..., x_n ∈ S be iid with distribution µ. Given x = (x_1, ..., x_n), construct IRG_n(κ̃) as follows, starting from the empty graph. For each r and each F ∈ F with |V(F)| = r, and for every r-tuple of distinct vertices (v_1, ..., v_r) ∈ [n]^r, add a copy of F on the vertices v_1, ..., v_r (with vertex i of F mapped to v_i) with probability

    p(v_1, ..., v_r; F) = (κ_F(x_{v_1}, ..., x_{v_r}) / n^{r−1}) ∧ 1,   (9.4.3)

all these choices being independent. We assume throughout that κ_F is invariant under permutations of the vertices of the graph F.
The reason for dividing by n^{r−1} in (9.4.3) is that we wish to consider sparse graphs. Indeed, our main interest is the case when IRG_n(κ̃) has O(n) edges. As it turns out, we can be slightly more general; however, when κ_F is integrable (which we assume), the expected number of added copies of each graph F is O(n). Below, all incompletely specified integrals are taken with respect to the appropriate r-fold product measure µ^r on S^r.
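A brute-force sampler of IRG_n(κ̃) following (9.4.3) might look as follows (a sketch of ours: the interface, kernels, and parameter values are illustrative, and the loop over r-tuples limits this to small n):

    import itertools, random

    def sample_irg(n, kernels, sample_type, seed=0):
        """kernels: list of (edges_of_F, r, kappa_F) with F on {0,...,r-1};
        sample_type(rng) draws one iid type from mu. Returns types and edges."""
        rng = random.Random(seed)
        x = [sample_type(rng) for _ in range(n)]
        E = set()
        for F_edges, r, kappa_F in kernels:
            for vs in itertools.permutations(range(n), r):   # distinct r-tuples
                if rng.random() < min(kappa_F(*(x[v] for v in vs)) / n ** (r - 1), 1.0):
                    for i, j in F_edges:                     # vertex i of F -> vs[i]
                        E.add(frozenset((vs[i], vs[j])))
        return x, E

    # Flat kernels for single edges (K_2) and triangles (K_3); types uniform on [0,1].
    kernels = [([(0, 1)], 2, lambda a, b: 1.0),
               ([(0, 1), (1, 2), (2, 0)], 3, lambda a, b, c: 0.5)]
    x, E = sample_irg(50, kernels, lambda rng: rng.random())
    print(len(E), "edges")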
In the special case where all κ_F are zero apart from κ_{K_2}, the kernel corresponding to an edge, we recover (essentially) a special case of the inhomogeneous random graph model discussed in Chapter 3. In this case, given x, two vertices i and j are joined with probability

    (κ_{K_2}(x_i, x_j) + κ_{K_2}(x_j, x_i))/n + O((κ_{K_2}(x_i, x_j) + κ_{K_2}(x_j, x_i))²/n²).   (9.4.4)
Figure 9.11 The relation between the community edge density 2e_in/(s(s−1)) and the community size s can be approximated by a power law. (a) Amazon co-purchasing network, (b) Gowalla social network, (c) English word relations, (d) Google web graph. Figures taken from Stegehuis et al. (2016b).
The correction term will never matter, so we may as well replace κ_{K_2} by its symmetrized version.

For any kernel family κ̃, let κ_e be the corresponding edge kernel, defined by

    κ_e(x, y) = Σ_F Σ_{(i,j) : {i,j}∈E(F)} ∫_{S^{|V(F)|−2}} κ_F(x_1, ..., x_{i−1}, x, x_{i+1}, ..., x_{j−1}, y, x_{j+1}, ..., x_{|V(F)|}),   (9.4.5)

where the second sum runs over all 2|E(F)| ordered pairs (i, j) with {i, j} ∈ E(F), and we integrate over all variables apart from x and y. Note that the sum need not always converge. Since every term is positive, this causes no problems: we simply allow κ_e(x, y) = ∞ for some x, y. Given x_i and x_j, the probability that i and j are joined in IRG_n(κ̃) is at most κ_e(x_i, x_j)/n. In other words, κ_e captures the edge probabilities in IRG_n(κ̃), but not the correlations.
Number of Edges
Before proceeding to deeper properties, let us note that the expected number of added copies of F is (1 + o(1)) n ∫_{S^{|V(F)|}} κ_F. Unsurprisingly, the actual number turns out to be concentrated.
Existence of a Giant
We next consider the emergence of the giant component. For this, the linear operator T_{κ_e}, defined by

    (T_{κ_e} f)(x) = ∫_S κ_e(x, y) f(y) µ(dy),   (9.4.9)
we write this as ξ(κ) < ∞. This means that the expected number of edges in IRG_n(κ̃) is O(n) (see Theorem 9.15), and thus the expected degree of a uniform vertex is bounded.
(c) We say that a symmetric edge kernel κ_e : S² → [0, ∞) is reducible if
by a local graph G_i = (V(G_i), E(G_i)). We assign each of the d_i half-edges incident to vertex i to a vertex in G_i in an arbitrary way. Thus, vertex i is replaced by a pair consisting of the community graph G_i and the inter-community degrees d^{(b)} = (d_u^{(b)})_{u∈V(G_i)}, satisfying Σ_{u∈V(G_i)} d_u^{(b)} = d_i. Naturally, the size of the graph becomes n = Σ_{v∈[N]} |V(G_v)|.
As a result, we obtain a graph with two levels of hierarchy; its local structure is described
by the local graphs (Gi )i∈[N ] whereas its global structure is described by the configuration
model CMN (d). This model is called the hierarchical configuration model. A natural as-
sumption is that the degree sequence d = (di )i∈[N ] satisfies Condition 1.7(a),(b) with n
replaced by N , while the empirical distribution of the graphs satisfies, as N → ∞,
    µ_n(H, d⃗) = (1/N) Σ_{i∈[N]} 1_{{G_i = H, (d_u^{(b)})_{u∈V(G_i)} = d⃗}} → µ(H, d⃗),   (9.4.15)

for every connected graph H and every degree vector d⃗ = (d_h)_{h∈V(H)}, and some probability distribution µ on graphs with integer marks associated with the vertices. We assume that µ_n(H, d⃗) = 0 for all H that are disconnected. Indeed, we think of the graphs (G_i)_{i∈[N]} as describing the local community structure of the graph, so it makes sense to assume that all (G_i)_{i∈[N]} are connected. In particular, (9.4.15) shows that a typical community has bounded size.
We often also make assumptions on the average size of the community of a random vertex. For this, it is necessary to impose that, with µ_n(H) = Σ_{d⃗} µ_n(H, d⃗) and µ(H) = Σ_{d⃗} µ(H, d⃗) the community distributions,

    Σ_H |V(H)| µ_n(H) = (1/N) Σ_{i∈[N]} |V(G_i)| → Σ_H |V(H)| µ(H) < ∞.   (9.4.16)
Equation (9.4.16) indicates that the community of a random vertex has a tight size, since
the community size of a random vertex has a size-biased community distribution (see Exer-
cise 9.43). The degree structure of the hierarchical configuration model is determined by the
model description. We next discuss the giant and the distances in the hierarchical configura-
tion model.
Theorem 9.19 (Giant in hierarchical configuration model) Assume that the inter-community
degree sequence d = (di )i∈[N ] satisfies Conditions 1.7(a),(b) with N replacing n and with
limit D, while the communities satisfy (9.4.15) and (9.4.16). Then, there exists ζ ∈ [0, 1]
such that
    (1/n)|C_max| →^P ζ,   (1/n)|C_(2)| →^P 0.   (9.4.17)

Write ν = E[D(D − 1)]/E[D]. Then, ζ > 0 precisely when ν > 1.
Since the communities (G_i)_{i∈[N]} are connected, the sizes of the clusters in the hierarchical configuration model are closely related to those in CM_N(d). Indeed, for v ∈ [n], let i_v
where C′(i) denotes the connected component of i in CM_N(d). This allows one to move back and forth between the hierarchical configuration model and the corresponding configuration model CM_N(d) that describes the inter-community connections.
It also allows us to identify the limit ζ. Let ξ ∈ [0, 1] be the extinction probability of the local limit of CM_N(d) of a vertex of degree 1, so that a vertex of degree d survives with probability 1 − ξ^d. Then,

    ζ = Σ_H Σ_{d⃗} |V(H)| µ(H, d⃗) [1 − ξ^d],   (9.4.19)

where d = Σ_{v∈V(H)} d_v. Further, ξ = 1 precisely when ν ≤ 1; see, e.g., Theorem 4.9. This explains the result in Theorem 9.19. In Exercise 9.45, the reader is asked to fill in the details.
Before moving to graph distances, we discuss an example of the hierarchical configura-
tion model that has attracted attention under the name configuration model with household
structure.
As a result, the degree distribution in the household model is the size-biased degree distribution of the configuration model that describes its inter-community structure. In particular, this implies that if the limiting degree distribution D in CM_N(d) obeys a power law with exponent τ′, then the limiting degree distribution in the household model obeys a power law with exponent τ = τ′ − 1. This is sometimes called a power-law shift, and it is clearly visible in Figure 9.12.
Figure 9.12 The degree distribution of the household model follows a power law with a smaller exponent than the community-size distribution and the outside-degree distribution.
u is being paired to that incident to v, dist_{HCM_n(G)}(u, v) = 2 dist_{CM_N(d)}(i_u, i_v) − 1, where we recall that i_v is such that v ∈ V(G_{i_v}). Thus, distances in HCM_n(G) are asymptotically
twice as large as those in CMN (d). The reason is that paths in the household model alternate
between intra-community edges and inter-community edges, because the inter-community
degrees are all equal to 1 so there is no way to jump to a vertex using an inter-community
edge and leave through an inter-community edge again. This is different from general hier-
archical configuration models.
d_v^{(tr)} "third-triangles" consisting of pairs of half-edges.

The graph is built by (a) recursively choosing two half-edges uar without replacement and pairing them into edges (as for CM_n(d)), and (b) choosing triples of third-triangles uar without replacement and drawing edges between the three vertices incident to the chosen third-triangles.

Let (D_n^{(si)}, D_n^{(tr)}) denote the numbers of simple edges and triangles incident to a uniform vertex in [n], and assume that (D_n^{(si)}, D_n^{(tr)}) →^d (D^{(si)}, D^{(tr)}) for some limiting distribution (D^{(si)}, D^{(tr)}). Newman (2009) performed a generating-function analysis to investigate when a giant component is expected to exist. The criterion that Newman found is that a giant exists
when

    (E[(D^{(si)})²]/E[D^{(si)}] − 2) (2E[(D^{(tr)})²]/E[D^{(tr)}] − 3) < 2E[D^{(si)} D^{(tr)}]² / (E[D^{(si)}] E[D^{(tr)}]).   (9.4.21)
When D^{(tr)} = 0 almost surely, so that there are no triangles, this reduces to

    E[(D^{(si)})²]/E[D^{(si)}] − 2 > 0,   (9.4.22)

which is equivalent to ν = E[D^{(si)}(D^{(si)} − 1)]/E[D^{(si)}] > 1 (recall Theorem 4.9).
It would be of interest to analyze this model mathematically. While the extra triangles do create extra clustering in the graph, in that the graph is no longer locally tree-like, the community structure of the graph is less clear. Of course, the above setting can be generalized to arbitrary cliques and possibly other community structures, but this would make the mathematical analysis substantially more involved. Exercises 9.46 and 9.47 investigate the local and global clustering coefficients, respectively.
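Under our reading of (9.4.21), the criterion can be evaluated directly from samples of (D^{(si)}, D^{(tr)}); the sketch below is ours, and its conclusion should be checked against the exercises.

    import numpy as np

    def newman_giant(d_si, d_tr):
        """Evaluate the giant criterion (9.4.21) from samples of (D_si, D_tr)."""
        d_si, d_tr = np.asarray(d_si, float), np.asarray(d_tr, float)
        lhs = ((d_si ** 2).mean() / d_si.mean() - 2) * \
              (2 * (d_tr ** 2).mean() / d_tr.mean() - 3)
        rhs = 2 * (d_si * d_tr).mean() ** 2 / (d_si.mean() * d_tr.mean())
        return lhs < rhs

    rng = np.random.default_rng(2)
    # Independent Poisson simple-edge and triangle degrees (our toy choice).
    print(newman_giant(rng.poisson(2.0, 10 ** 5), rng.poisson(0.5, 10 ** 5)))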
and making all the edges between vertices and groups conditionally independent given the weights (w_v)_{v∈[n]}. In the theorem below, we assume that (w_v)_{v∈[n]} is a sequence of iid random variables with finite mean:

Theorem 9.20 (Degrees in random intersection graph with iid vertex weights) Consider the above random intersection graph, with m = βn^α groups, vertex weights (w_v)_{v∈[n]} that are iid copies of W ∼ F with finite mean, and group-membership probabilities p_{va} = (γ w_v n^{−(1+α)/2}) ∧ 1. Then, for any v ∈ [n]:

(a) D_v →^P 0 when α < 1;
(b) D_v →^d Σ_{i=1}^X Y_i when α = 1, where (Y_i)_{i≥1} are iid Poi(γ) random variables and X ∼ Poi(βγW);
(c) D_v →^d X, where X ∼ Poi(βγ²W), when α > 1.
Theorem 9.20 can be understood as follows. The expected number of groups to which individual v belongs is roughly (βn^α) × (γ w_v n^{−(1+α)/2}) = βγ w_v n^{−(1−α)/2}. When α < 1, this is close to zero, so that D_v = 0 whp. For α = 1, it is close to Poisson with parameter βγw_v, and the number of other individuals in each of these groups is approximately Poi(γ) distributed. For α > 1, individual v belongs to a number of groups that tends to infinity as n → ∞, while each group has expected size n^{(1−α)/2}, which vanishes. The latter means that group sizes are generally 0 or 1, asymptotically independently, giving rise to the Poisson distribution specified in part (c). Part (b) is the most interesting and interpolates between the two extremes.
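A quick simulation of the α = 1 case in Theorem 9.20(b) (our own sanity check, with W taken Exp(1) and illustrative β, γ):

    import numpy as np

    rng = np.random.default_rng(0)
    n, beta, gamma = 20000, 1.0, 2.0
    m = int(beta * n)                       # m = beta * n^alpha with alpha = 1
    w = rng.exponential(1.0, size=n)        # iid weights with finite mean
    p = np.minimum(gamma * w / n, 1.0)      # p_va = min(gamma*w_v*n^{-(1+alpha)/2}, 1)

    def degree_of_0():
        g0 = rng.binomial(m, p[0])          # groups containing individual 0
        others = set()
        for _ in range(g0):                 # other members of each such group
            others.update(np.flatnonzero(rng.random(n - 1) < p[1:]))
        return len(others)

    degs = [degree_of_0() for _ in range(300)]
    # Conditionally on w_0, the mean degree should be close to beta*gamma^2*w_0.
    print(np.mean(degs), beta * gamma ** 2 * w[0])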
Here n is the number of individuals, while m is the number of groups. Naturally, in order for the model to be well defined, we require

    Σ_{v∈[n]} d_v^{(ve)} = Σ_{a∈[m]} d_a^{(gr)}.   (9.4.27)
• the number of groups in which the root participates has law D^{(ve)}, which is the limiting law of D_n^{(ve)} = d_o^{(ve)} with o ∈ [n] chosen uar;
• the number of groups in which every other vertex participates has law Y⋆ − 1, where Y⋆ is the size-biased version of D^{(ve)};
• the numbers of vertices per group are iid random variables with law X⋆, where X⋆ is the size-biased version of the limiting law of D_n^{(gr)} = d_V^{(gr)}, where now V ∈ [m] is a uniformly chosen group.
By Theorem 9.21, the degree distribution of the random intersection graph with prescribed groups is equal to

    D = Σ_{i=1}^{D^{(ve)}} (X_i⋆ − 1),   (9.4.28)
which can be compared with Theorem 9.20(b). The intuition behind Theorem 9.21 is that the random intersection graph can easily be obtained from the bipartite configuration model by making all group members direct neighbors. By construction, the local limit of the bipartite configuration model can be described by an alternating branching process of the size-biased vertex and group distributions. Note that the local limit in Theorem 9.21 is not a tree; vertices can be in multiple groups. However, Theorem 9.21 does imply that the probability that a uniform vertex has a neighbor with which it shares two group memberships vanishes; see Exercise 9.48. Thus, the overlap between groups is generally a single vertex.

Theorem 9.21 also has implications for the clustering coefficients. Indeed, by Theorem 2.23 the local clustering coefficient of the random intersection graph converges; by Theorem 2.22, the same holds for the global clustering coefficient under a finite second-moment condition on the degrees. See Exercises 9.49 and 9.50 for more details.
The exponential random graph is a way to leverage the randomness and still obtain a model that one can write down. Indeed, let F be a collection of subgraphs, and suppose that we observe that, in our favorite real-world network, the number of occurrences of subgraph F equals α_F for every F ∈ F. Let us now write down what this might mean. Let F be a graph on |V(F)| = m vertices. For a graph G on n vertices and vertices v_1, ..., v_m, let G|_{(v_i)_{i∈[m]}} be the subgraph spanned by (v_i)_{i∈[m]}. This means that the vertex set of G|_{(v_i)_{i∈[m]}} equals [m], while its edge set equals {{i,j} : {v_i, v_j} ∈ E(G)}. The number of occurrences of F in G can then be written as

    N_F(G) = Σ_{v_1,...,v_m∈V(G)} 1_{{G|_{(v_i)_{i∈[m]}} = F}}.   (9.4.31)

Here, it is convenient to recall that we may equivalently write G = ([n], (x_uv)_{1≤u<v≤n}), where x_uv ∈ {0,1} and x_uv = 1 if and only if {u,v} ∈ E(G). Then, we can write N_F(G) = N_F(x).
In order to define a measure, we can take a so-called exponential family of the form

    p_{β⃗}(x) = (1/Z_n(β⃗)) e^{Σ_{F∈F} β_F N_F(x)},   (9.4.32)

where Z_n(β⃗) is the normalization constant

    Z_n(β⃗) = Σ_x e^{Σ_{F∈F} β_F N_F(x)}.   (9.4.33)

In this case, E[N_F(X)] = α_F for all F ∈ F when P(X = x) = p_{β⃗}(x). Further, when conditioning on N_F(x) = q_F for some parameters (q_F)_{F∈F}, the conditional exponential random graph is uniform over the set of graphs with this property. This is a conditioning property of exponential random graphs.
We next discuss two examples that we know quite well, and that arise as exponential
random graphs with certain specific subgraph counts:
Example 9.23 (ER_n(λ/n) and edge subgraphs) Take N_F(x) = N_{K_2}(x) = Σ_{u,v∈[n]} x_uv = 2|E(G_n)|, so that we put a restriction on the expected number of edges, or complete graphs of size 2, in the graph. Then we see that, with G_n = ([n], (x_uv)_{1≤u<v≤n}),

    Z_n(β⃗) = Σ_x e^{2β|E(G_n)|} = (1 + e^{2β})^{n(n−1)/2},   (9.4.35)

and

    p_{β⃗}(x) = (1/Z_n(β⃗)) e^{2β|E(G_n)|} = Π_{1≤u<v≤n} e^{2β x_uv}/(1 + e^{2β}).   (9.4.36)
Thus, the different edges are independent, and an edge is present with probability e^{2β}/(1 + e^{2β}) and absent with probability 1/(1 + e^{2β}). In a sparse setting, we aim at

    E[|E(G_n)|] = (λ/2)(n − 1),   (9.4.37)

so that the average degree per vertex is precisely equal to λ. The constraint in (9.4.34) thus reduces to

    (n(n−1)/2) e^{2β}/(1 + e^{2β}) = (λ/2)(n − 1).   (9.4.38)

This leads to ER_n(λ/n), where

    e^{2β}/(1 + e^{2β}) = λ/n,   (9.4.39)

that is, e^{2β} = λ/(n − λ). This shows that ER_n(λ/n) is an example of an exponential random graph with a constraint on the expected number of edges in the graph. Further, by the conditioning property of exponential random graphs, conditional on |E(G_n)| = m, the distribution is uniform over all graphs with m edges.
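Both identities in this example are easy to verify numerically; the following check (ours) enumerates all graphs on four vertices:

    import itertools, math

    n, beta = 4, 0.3
    pairs = list(itertools.combinations(range(n), 2))
    # (9.4.35): the sum over all graphs of e^{2 beta |E|} factorizes over pairs.
    Z = sum(math.exp(2 * beta * sum(x))
            for x in itertools.product((0, 1), repeat=len(pairs)))
    assert abs(Z - (1 + math.exp(2 * beta)) ** len(pairs)) < 1e-8

    # (9.4.39): the tilt realizing ER_n(lambda/n) has e^{2 beta} = lambda/(n - lambda).
    lam = 1.5
    beta_er = 0.5 * math.log(lam / (n - lam))
    assert abs(math.exp(2 * beta_er) / (1 + math.exp(2 * beta_er)) - lam / n) < 1e-12
    print("checks passed")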
Example 9.24 (GRG_n(w) and vertex degrees) The second example arises when we fix the expected degrees of all the vertices. This occurs when we take N_v(x) = Σ_{u∈[n]} x_vu = d_v^{(G_n)} for every v ∈ [n]. In this case, with G_n = ([n], (x_uv)_{1≤u<v≤n}), we have

    Z_n(β⃗) = Σ_x e^{Σ_{v∈[n]} β_v d_v^{(G_n)}} = Σ_x e^{Σ_{1≤u<v≤n} (β_u + β_v) x_uv} = Π_{1≤u<v≤n} (1 + e^{β_u + β_v})   (9.4.40)

and

    p_{β⃗}(x) = (1/Z_n(β⃗)) e^{Σ_{v∈[n]} β_v d_v^{(G_n)}} = Π_{1≤u<v≤n} e^{(β_u + β_v) x_uv}/(1 + e^{β_u + β_v}).   (9.4.41)
Thus the different edges are still independent, edge {u,v} being present with probability e^{β_u + β_v}/(1 + e^{β_u + β_v}) and absent with probability 1/(1 + e^{β_u + β_v}). In a sparse setting, we aim for

    E[d_v^{(G_n)}] = α_v,   (9.4.42)

so that the average degree of vertex v is precisely equal to α_v. The constraint in (9.4.34) thus reduces to

    Σ_{u≠v} e^{β_v + β_u}/(1 + e^{β_v + β_u}) = α_v.   (9.4.43)
Thus, GRG_n(w) is an example of an exponential random graph with a constraint on the expected degrees of all the vertices. Further, by the conditioning property of exponential random graphs, conditional on d_v^{(G_n)} = α_v for all v ∈ [n], the distribution is uniform over all graphs with these degrees. This gives an alternative proof of Theorem 1.4.
We note that this does not exactly fit the format in (9.4.31), since we have fixed the expected vertex degrees rather than subgraph counts. However, the model where we use the numbers of k-stars for all k in (9.4.31) is closely related to the model where we fix the expected degrees of all the vertices.
Now that we have discussed two quite nice examples of exponential random graphs, let us discuss their intricacies. The above choices, in Examples 9.23 and 9.24, are quite special, in the sense that the exponent in (9.4.32) is linear in the edge occupation statuses (x_uv)_{1≤u<v≤n}. This gives rise to exponential random graphs that have independent edges. However, for more intricate subgraph counts, such as triangles, this linearity no longer holds. Indeed, the number of triangles is a cubic function of (x_uv)_{1≤u<v≤n}. In such cases the edges are no longer independent, making the exponential random graph very hard to study.

Indeed, the exponential form in (9.4.32) naturally leads to large deviations in random graphs, a topic that is much better understood in the dense setting, where the number of edges grows proportionally to n². In the sparse setting such problems are hard, and sometimes ill defined, for example because the model may have phase transitions (see, e.g., Häggström and Jonasson (1999)). Such phase transitions imply that the problem of estimating parameters β⃗ = (β_F)_{F∈F} such that the expected subgraph counts are exactly as intended may be ill defined. We refer to the notes and discussion in Section 9.6 for more background and references.
The models described so far do not incorporate geometry at all. Yet, geometry may be rele-
vant (see, e.g., Wong et al. (2006) and the references therein). In many networks the vertices
are located somewhere in space and their locations may indeed be relevant. People who live
closer to one another are more likely to know each other, even though we all know people
who live far away from us. This is a very direct link to the geometric properties of networks.
However, the geometry may also be much more indirect or latent. For example, people who
have similar interests are also more likely to know one another. Thus, when we are associat-
ing a whole bunch of attributes with vertices in the network, vertices with similar attributes
(age, interests, hobbies, profession, music preference, etc.) may be more likely to know each
other. In any case, we are rather directly led to studying networks where the vertices are em-
bedded in some general geometric space. These are what we refer to as spatial networks.
One further aspect of spatial random graphs deserves to be mentioned. Owing to the
fact that nearby vertices are more likely to be neighbors, conversely it is also true that two
neighbors of a vertex are more likely to be connected. Therefore, geometry rather naturally
leads to clustering.
where W^{(1)}, W^{(2)} are two independent exponential random variables with parameter 1. Alternatively, it can be seen that T = (G_1 + G_2 − G_3)/2, where G_1, G_2, G_3 are three independent Gumbel random variables having density f_G(x) = e^x e^{−e^x} on R.
Interestingly, the method of proof of Theorem 9.25 is quite close to that of Theorem 7.24.
Indeed, again the parts of the graph that can be reached in a distance at most t are analyzed.
Let P1 and P2 be two uniform points along the circle, so that Dn has the same distribution
as the distance between P1 and P2 . Denote by R(1) (t) and R(2) (t) the parts of the graph
that can be reached within a distance t from P1 and P2 , respectively. Then Dn = 2Tn ,
where T_n is the first time that R^{(1)}(t) and R^{(2)}(t) have a nonempty intersection. The proof
then consists of showing that, up to time Tn , the processes R(1) (t) and R(2) (t) are close to
certain continuous-time branching processes, primarily owing to the fact that the probability
that there are two intervals that overlap is quite small. The random variables W (1) and W (2)
can be viewed as appropriate martingale limits of these branching processes.
Comparing Theorem 9.25 with Theorem 7.24, we see that the rescaled distance Dn , after
subtraction of the correct multiple of log n, converges in distribution, while in Theorem 7.24,
convergence is at best along subsequences. This is due to the fact that Dn is a continuous
random variable, while graph distances are integer-valued. Therefore, graph distances suffer
from discretization effects. In the next paragraph, we will see that the graph distances in the
small-world model suffer from similar issues.
Further, the case where ρ = λn k → 0 has been studied, and there the behavior is closely
related to that in Theorem 9.25. See Section 9.6 for an extensive discussion. The parameter
ν arises as the largest eigenvalue of the offspring matrix of an appropriate two-type branch-
ing process that describes the local neighborhoods in the discrete small-world model. This
branching process has two types in that there is a difference between an interval starting
immediately after a shortcut, and intervals that have previously been found, owing to the
“hesitation” arising from the fact that a long-range edge now has length 1 rather than 0.
Figure 9.13 Examples of hyperbolic graphs for n = 250, with (a) τ = 2.5 and (b) τ = 3.5, and average degree approximately 5.
We deduce that the model is scale-free, meaning that the asymptotic degree distribution has infinite variance, precisely when α ∈ (1/2, 1); otherwise the degree distribution obeys a power law with a larger degree exponent. Let us informally explain the power law in Theorem 9.27. For a vertex v with radial coordinate r_v, we define its type t_v by

    t_v = e^{(R − r_v)/2}.   (9.5.11)

Then, the degree D_v of vertex v can be approximated by a Poisson random variable with mean t_v, so that D_v is of order t_v. Furthermore, the random variables (t_v)_{v≥1} are distributed as a power law with exponent τ = 2α + 1, so that the degrees have a power-law distribution as well, with the same exponent.
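The tail of the types can be checked by simulation. In the sketch below (ours) we sample radial coordinates from the quasi-uniform density proportional to e^{α(r−R)} on [0, R] — an approximation to the standard hyperbolic radial law that we adopt here as an assumption — and verify that P(t_v > x) ≈ x^{−2α}, i.e., τ = 2α + 1:

    import numpy as np

    rng = np.random.default_rng(3)
    alpha, R, N = 0.75, 12.0, 200000          # predicts tau = 2 * alpha + 1 = 2.5

    # Inverse-CDF sampling from the density proportional to e^{alpha (r - R)} on [0, R].
    u = rng.random(N)
    r = R + np.log(u + (1 - u) * np.exp(-alpha * R)) / alpha
    t = np.exp((R - r) / 2)                   # types as in (9.5.11)

    x = 10.0
    print((t > x).mean(), x ** (-2 * alpha))  # empirical vs predicted tail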
The exact form of p_k involves several special functions. Its identification is quite impressive, and its proof is rather involved. For the purposes of this book, however, the exact shape of p_k is not so relevant.
Much more is known about the local structure of the hyperbolic random graph, for exam-
ple, its local limit has been identified. We postpone this discussion to the next subsection, in
which we discuss the local limit in geometric inhomogeneous random graphs. It turns out
that we can interpret the hyperbolic random graph as a special case which is interesting in
its own right.
also have the necessary clustering to make them appropriate models. Of course, the question
of how to embed them precisely is highly relevant and also quite difficult.
The example that has attracted the most attention is the Internet. In Figure 9.15, you can see a hyperbolic embedding of the Internet, as performed by Boguñá et al. (2010). We see that the regions on the boundary of the outer circle can be grouped in a fairly natural way, where the countries in which the autonomous systems reside seem to be grouped according to their geography, with some exceptions (for example, it is not clear why Kenya is almost next to the Netherlands). This hyperbolic geometry is first of all quite interesting, but it could also be helpful in sustaining the ever-growing Internet traffic.
iid, for example as power-law random variables. We write (xv )v∈[n] and (wv )v∈[n] for the
realizations of (Xv )v∈[n] and (Wv )v∈[n] .
The edges are conditionally independent given (x_v)_{v∈[n]} and (w_v)_{v∈[n]}, where the conditional probability that the edge between u and v is present equals

    p_{u,v} = κ_n(‖x_u − x_v‖, w_u, w_v),   (9.5.13)

for some κ_n : [0, ∞)³ → [0, ∞). A prime example of such a GIRG is the so-called product GIRG, for which

    κ_n(t, w_u, w_v) = λ (1 ∧ (w_u w_v / Σ_{i∈[n]} w_i)^{max{α,1}} t^{−dα}),   (9.5.14)

where α, λ > 0 are appropriate parameters. Often, we assume that the vertex weights obey
a power law, i.e.,
    P(W > w) = L(w) w^{−(τ−1)}   (9.5.15)
for some slowly varying function L : [0, ∞) → (0, ∞). As is usual, the literature treats a
variety of models and settings, and we refer to the notes and discussion in Section 9.6 for
more details. To describe the local limit of GIRGs, we need to assume a more restrictive
setting:
Assumption 9.30 (Limiting connection probabilities exist) Assume the following:

(a) the vertex weights (w_v)_{v∈[n]} satisfy Condition 1.1(a) for some limiting random variable W;
(b) the vertex locations (x_v)_{v∈[n]} are a sequence of iid uniform locations on [−1/2, 1/2]^d that are independent of (w_v)_{v∈[n]};
(c) there exists a function κ : [0, ∞)³ → [0, ∞) such that κ_n(n^{−1/d} t, x_n, y_n) → κ(t, x, y) for all x_n → x and y_n → y, where κ satisfies that there exists α > 0 such that, for all t large enough,

    E[κ(t, W_1, W_2)] ≤ t^{−α},   (9.5.16)

with W_1, W_2 two copies of the limiting random variable W in part (a).
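For concreteness, here is a sketch (ours; the parameter values and the exact Pareto weight law are illustrative) that samples a product GIRG with the connection probabilities (9.5.13)–(9.5.14) on the unit torus:

    import numpy as np

    rng = np.random.default_rng(7)
    n, d, alpha, lam, tau = 1000, 2, 1.5, 1.0, 2.8
    x = rng.random((n, d))                             # locations on the unit torus
    w = rng.random(n) ** (-1.0 / (tau - 1))            # P(W > w) = w^{-(tau-1)}, w >= 1

    diff = np.abs(x[:, None, :] - x[None, :, :])
    t = np.linalg.norm(np.minimum(diff, 1.0 - diff), axis=2)   # torus distance
    np.fill_diagonal(t, np.inf)                        # excludes self-loops

    kern = lam * np.minimum(
        1.0, (np.outer(w, w) / w.sum()) ** max(alpha, 1.0) * t ** (-d * alpha))
    adj = np.triu(rng.random((n, n)) < kern, k=1)      # conditionally independent edges
    print(int(adj.sum()), "edges")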
In the limit T ↘ 0, this gives rise to the (hard) hyperbolic graph studied in the previous subsection. The identification is given in the following theorem:
Exercise 9.54 asks the reader to investigate the proof of Theorem 9.32.
For u, v ∈ V(G_n), we independently let {u,v} ∈ E(G_n) with probability p(u,v), and {u,v} ∉ E(G_n) otherwise.
Theorem 9.34 (Giant in GIRGs) Let G_n be the GIRG defined in (9.5.35) and (9.5.36), with p ≥ 0, α ∈ [0, ∞], σ ∈ [0, ∞), and β > 0. Then |C_max|/n →^P ζ, where ζ is the survival probability of the Poisson infinite GIRG with edge probabilities given by (9.5.35) and (9.5.36), while |C_(2)|/n →^P 0 for all α > 0.
Special cases of the GIRG defined in (9.5.35) and (9.5.36) are the product GIRG, as well as certain cases of the GIRG in Assumption 9.30. By the construction of the inhomogeneous Poisson process in (9.5.35), and conditional on |V(G_n)| = n_0, the variables ((x_i, w_i))_{i∈[n_0]} are iid, with the x_i uniform on [−n^{1/d}/2, n^{1/d}/2]^d and the w_i iid copies of a Pareto random variable W with P(W > w) = w^{−(τ−1)} for w ≥ 1.
Let us complete this discussion by considering the situation when d = 1 using the local
convergence statement in Theorem 9.33. Here, there is no infinite component in the local
limit for τ > 3, so that no giant exists in the pre-limit either by Corollary 2.27.
We call a vertex $u$ strongly isolated when there does not exist an occupied edge $\{v_1, v_2\} \in E(G_n)$ with $v_1 \le u \le v_2$ in the graph. In particular, this means that the connected component of $u$ is finite. We prove that the expected number of occupied edges $\{v_1, v_2\}$ with $v_1 \le u \le v_2$ is bounded. Indeed, for the local limit of the hard hyperbolic graph in $d = 1$, this expected number is equal to
\[
\mathbb{E}\Big[\sum_{v_1 \le u \le v_2} \frac{1}{1 + \big(\|v_1 - v_2\|/(c\, W_{v_1} W_{v_2})\big)^{\alpha}}\Big]
\le C \sum_{v_1 \le u \le v_2} \mathbb{E}\Big[\frac{(W_{v_1} W_{v_2})^{\alpha}}{(W_{v_1} W_{v_2})^{\alpha} + \|v_1 - v_2\|^{\alpha}}\Big]
\le C \sum_{k \ge 1} k\, \mathbb{E}\Big[\frac{X^{\alpha}}{X^{\alpha} + k^{\alpha}}\Big], \qquad (9.5.37)
\]
where $X = W_1 W_2$ is the product of two independent $W$ variables. Now, when $\mathbb{P}(W > w) = w^{-(\tau-1)}$, it is not hard to see that
\[
\mathbb{P}(X^{\alpha} > x) \le C \frac{\log x}{x^{(\tau-1)/\alpha}}. \qquad (9.5.38)
\]
In turn, this implies that
\[
\mathbb{E}\Big[\frac{X^{\alpha}}{X^{\alpha} + k^{\alpha}}\Big] \le C \frac{\log k}{k^{\tau-1}}. \qquad (9.5.39)
\]
We conclude that, when multiplied by $k$, this is summable when $\tau > 3$, so that the expected
number of occupied edges $\{v_1, v_2\}$ with $v_1 \le u \le v_2$ is bounded. In turn, this suggests
that this number equals zero with strictly positive probability (beware: these numbers are
not independent for different $u$), and hence that a positive proportion of vertices is strongly
isolated. However, when there is a positive proportion of strongly isolated vertices, there
cannot be an infinite component. This intuitively explains why the existence of the giant
component is restricted to $\tau \in (2, 3)$. Thus, the absence of a giant in hyperbolic graphs
with power-law exponent $\tau > 3$ is intimately related to the fact that this model is
inherently one-dimensional.
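The summability underlying this argument is easy to probe numerically. The following Python sketch estimates $k\,\mathbb{E}[X^{\alpha}/(X^{\alpha}+k^{\alpha})]$ by Monte Carlo for Pareto weights; it merely illustrates the decay predicted by (9.5.39), with arbitrarily chosen parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
tau, alpha = 3.5, 1.5              # tau > 3: the regime where the sum converges
n_samples = 10**6

# X = W1 * W2 with P(W > w) = w^{-(tau - 1)} for w >= 1, as in (9.5.15)
w1 = rng.uniform(size=n_samples) ** (-1.0 / (tau - 1))
w2 = rng.uniform(size=n_samples) ** (-1.0 / (tau - 1))
x_alpha = (w1 * w2) ** alpha

for k in [1, 10, 100, 1000]:
    term = np.mean(x_alpha / (x_alpha + float(k) ** alpha))
    # (9.5.39) bounds E[X^a / (X^a + k^a)] by C log(k) / k^{tau - 1}, so
    # k * term should decay roughly like k^{-(tau - 2)} log k, hence be summable
    print(k, k * term)
```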
where $D_v^{(n)}$ denotes the degree of vertex $v \in V(G_n)$ in $G_n$. Thus, $D_n(x_{n+1})$ denotes the total degree of all vertices located in $B_r(x_{n+1})$. The $m$ edges are connected to vertices $(y_1, \ldots, y_m)$ conditionally independently given $(G_n, x_{n+1})$. Thus, for all $v \in V(G_n) \cap B_r(x_{n+1})$,
\[
\mathbb{P}(y_i = v \mid G_n) = \frac{D_v^{(n)}}{\max(D_n(x_{n+1}), \alpha m A_r n)}, \qquad (9.5.42)
\]
while
\[
\mathbb{P}(y_i = x_{n+1} \mid G_n) = 1 - \frac{D_n(x_{n+1})}{\max(D_n(x_{n+1}), \alpha m A_r n)}, \qquad (9.5.43)
\]
where we denote Ar = µ(Br (u)); α ≥ 0 is a parameter, and r = rn is a radius to be chosen
appropriately. The parameter r may depend on the size of the graph. The degree sequence
of the model that arises is characterized in the following theorem:
Theorem 9.37 (Degrees in preferential attachment models with uniform locations on the sphere) Let $S$ be the surface of a three-dimensional unit ball. Take $r_n = n^{\beta - 1/2}\log n$, where $\beta \in (0, \frac12)$ is a constant. Finally, let $\alpha > 2$ and $m \ge 1$. In the above geometric preferential attachment model given by (9.5.41)–(9.5.43),
\[
P_k^{(n)} \xrightarrow{\;\mathbb{P}\;} p_k, \qquad (9.5.44)
\]
where $p_k = C k^{-(\alpha+1)}(1 + o(1))$ for $k$ large.
Theorem 9.37 allows for r = rn = o(1), so that vertices can make connections only to
vertices that are close by.
We next discuss a setting where $r_n$ remains fixed. Let us first introduce the model. We again assume that $S$ is a metric space, and $\mu$ is the uniform measure on $S$. Further, let $\alpha \colon \mathbb{R}_+ \to \mathbb{R}_+$ be an attractiveness function. The graph process is denoted by $(G_n)_{n \ge 0}$. Here $G_0$ is assumed to be a connected graph with $n_0$ vertices and $e_0$ edges. We let the spatial locations $(X_i)_{i \ge 1}$ be iid draws from $\mu$ on $S$. Each vertex enters the graph with $m$ edges to be connected to the vertices already in the graph. Denote the receiving vertices by $(V_i^{(n+1)})_{i \in [m]}$. Conditional on $G_n$ and $X_{n+1}$, we let the vertices $(V_i^{(n+1)})_{i \in [m]}$ be conditionally iid with
\[
\mathbb{P}(V_i^{(n+1)} = u \mid X_{n+1}, G_n) = \frac{(\deg_{G_n}(u) + \delta)\, \alpha(|u - X_{n+1}|)}{\sum_{i \in [n]} (\deg_{G_n}(X_i) + \delta)\, \alpha(|X_i - X_{n+1}|)}. \qquad (9.5.45)
\]
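In algorithmic form, one attachment step of (9.5.45) can be sketched as follows; the attractiveness function is passed in as a callable, and the seed graph, locations, and parameter values are illustrative assumptions.

```python
import numpy as np

def attach_new_vertex(degrees, locations, x_new, m, delta, attract, rng):
    """One step of the attachment rule (9.5.45): the new vertex at x_new picks
    m receivers, conditionally iid with probability proportional to
    (deg(u) + delta) * attract(|u - x_new|)."""
    dist = np.linalg.norm(locations - x_new, axis=1)
    scores = (degrees + delta) * attract(dist)
    probs = scores / scores.sum()
    # conditionally iid choices, so repeated receivers (multi-edges) may occur
    return rng.choice(len(degrees), size=m, p=probs)

rng = np.random.default_rng(0)
degrees = np.array([2.0, 1.0, 1.0])            # a small connected seed graph G_0
locations = rng.uniform(size=(3, 2))           # illustrative locations in the unit square
receivers = attach_new_vertex(degrees, locations, rng.uniform(size=2),
                              m=2, delta=0.5, attract=lambda r: np.exp(-r), rng=rng)
print(receivers)
```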
The main result about the degree distribution in the above model is as follows:
Theorem 9.38 (Degrees in preferential attachment models with uniform locations) Let $S$ be a general metric space, and let $\mu$ be the uniform measure on it. Let $m \ge 1$ and $\delta \ge 0$. For $m = 1$ assume that $I_1 < \infty$ and, for $m \ge 2$, that $I_2 < \infty$ in (9.5.46). Let $\alpha$ be continuous, and for $\delta = 0$ assume that $\alpha(r) \ge \alpha_0 > 0$. In the geometric preferential attachment model described in (9.5.45),
\[
P_k^{(n)} \xrightarrow{\;\mathbb{P}\;} p_k = (2 + \delta/m)\, \frac{\Gamma(k + \delta)\Gamma(m + 2 + \delta + \delta/m)}{\Gamma(m + \delta)\Gamma(k + 3 + \delta + \delta/m)}. \qquad (9.5.47)
\]
The asymptotic degree distribution in (9.5.47) is identical to that in the non-geometric
preferential attachment model; see, e.g., (1.3.60). This is true since we are working on a
fixed metric space, where vertices become more and more dense. Thus, locally, the behavior
is very similar to a normal preferential attachment model, i.e., the spatial effects are “washed
away.” Remarkably, δ ∈ (−m, 0) is not allowed even though that model is perfectly well
defined.
So far, we have discussed settings where $I_1 < \infty$, so that, in particular, the geometric
component is not very pronounced. We continue by studying a setting where $r \mapsto \alpha(r)$ is quite
large for small $r$, so that the proximity of the vertices becomes much more pronounced:
be the degree distribution of types in the model, where $D_v(n)$ denotes the degree of vertex $v$ at time $n$. The main result on the degree distribution is as follows:

Theorem 9.40 (Degrees in preferential attachment models with non-uniform locations) Let $S = [t]$ be a finite space, and let $\mu$ be a measure on it. Assume that $\alpha(x, y) \ge \alpha_0 > 0$. In the geometric preferential attachment model described in (9.5.45),
\[
P_{s,k}^{(n)} \xrightarrow{\;\mathbb{P}\;} p_{s,k} = \frac{2\mu_s}{\phi(s)}\, \frac{\Gamma(k)\Gamma(m + \phi(s)^{-1})}{\Gamma(m)\Gamma(k + 1 + \phi(s)^{-1})}, \qquad (9.5.50)
\]
where $\phi(s)$ satisfies
\[
\phi(s) = \sum_{r \in [t]} \frac{\alpha(s, r)}{\sum_{j \in [t]} \alpha(j, r)\,\nu(j)}\, \mu(r); \qquad (9.5.51)
\]
of influence in some metric space, for example the torus $[0,1]^d$ for some dimension $d$, for which the metric equals
\[
d(x, y) = \min\{\|x - y + u\|_\infty : u \in \{0, 1, -1\}^d\}. \qquad (9.5.52)
\]
When a new vertex arrives, it is uniformly located somewhere in the unit cube, and it con-
nects to each of the older vertices in whose region of influence it lands independently and
with fixed probability p. These regions of influence evolve as time proceeds, in such a way
that the volume of the influence region of the vertex $v$ at time $n$ is equal to
\[
R_v(n) = \frac{a_1 D_v(n) + a_2}{n + a_3}, \qquad (9.5.53)
\]
where now Dv (n) is the in-degree of vertex v at time n and a1 , a2 , a3 are parameters which
are chosen such that pa1 ≤ 1. One of the main results is that this model is a scale-free graph
process with limiting degree distribution (pk )k≥0 satisfying (1.1.9) with τ = 1 + 1/(pa1 ) ∈
[2, ∞):
Theorem 9.41 (Degrees in preferential attachment models with influence) In the above preferential attachment model with influence, where the volume of the influence region is given by (9.5.53), for $k \le (n^{1/8}/\log n)^{4pa_1/(2pa_1+1)}$,
\[
P_k^{(n)} = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{D_v(n) = k\}} \xrightarrow{\;\mathbb{P}\;} p_k, \qquad (9.5.54)
\]
where
\[
p_k = \frac{p^k}{1 + kpa_1 + pa_2} \prod_{j=0}^{k-1} \frac{ja_1 + a_2}{1 + jpa_1 + pa_2}. \qquad (9.5.55)
\]
In Exercise 9.53, the reader is asked to prove that $p_k = c k^{-(1+1/(pa_1))}(1 + o(1))$ for the $p_k$ in (9.5.55).
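As numerical evidence for this power law (not a proof), the following sketch evaluates $p_k$ from (9.5.55) in log-space and estimates the local slope of $\log p_k$ against $\log k$, which should approach $-(1 + 1/(pa_1))$; the parameter values are arbitrary subject to $pa_1 \le 1$.

```python
import numpy as np

p, a1, a2 = 0.5, 1.0, 1.0          # illustrative parameters with p * a1 <= 1
K = 20_000

j = np.arange(K)                   # product factors for j = 0, ..., K - 1
log_ratio = np.log(p) + np.log(j * a1 + a2) - np.log(1 + j * p * a1 + p * a2)
ks = np.arange(1, K + 1)
# log p_k = sum of the first k log-ratios minus log(1 + k p a1 + p a2)
log_pk = np.cumsum(log_ratio) - np.log(1 + ks * p * a1 + p * a2)

# local slope of log p_k vs log k; should approach -(1 + 1/(p * a1)) = -3 here
for k in [100, 1000, 10_000]:
    print(k, (log_pk[2 * k - 1] - log_pk[k - 1]) / np.log(2.0))
```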
Figure 9.16 Examples of scale-free percolation graphs, with (a) τ = 2.5 and (b) τ = 3.5.
For models on $\mathbb{Z}^d$, the definition of power-law degrees needs some adaptation. Indeed, we say that an infinite random graph has power-law degrees when
\[
p_k = \mathbb{P}(D_o = k), \qquad (9.5.56)
\]
where $D_x$ is the degree of vertex $x \in \mathbb{Z}^d$ and $o \in \mathbb{Z}^d$ is the origin, satisfies (1.4.4) for some $\tau > 1$. This is a reasonable definition. Indeed, let $B_r(o) = [-r, r]^d \cap \mathbb{Z}^d$ be a cube of width $2r$ around the origin $o \in \mathbb{Z}^d$, denote $n = (2r+1)^d$, and, for each $k \ge 0$, let
\[
P_k^{(n)} = \frac{1}{n} \sum_{x \in B_r(o)} \mathbb{1}_{\{D_x = k\}} \qquad (9.5.57)
\]
denote the empirical degree distribution in $B_r(o)$.
Scale-Free Percolation
Let each vertex $x \in \mathbb{Z}^d$ be equipped with an iid weight $W_x$. Conditional on the weights $(W_x)_{x \in \mathbb{Z}^d}$, the edges in the graph are independent, and the probability that there is an edge between $x$ and $y$ is
\[
p_{xy} = 1 - e^{-\lambda (W_x W_y)^{\alpha} / |x-y|^{\alpha d}}, \qquad (9.5.58)
\]
for α, λ ∈ (0, ∞). Here, the parameter α > 0 describes the long-range nature of the model,
while we think of λ > 0 as the percolation parameter. In terms of the weight distribution,
we are mainly interested in settings where the Wx have unbounded support in [0, ∞) and
particularly when they vary substantially, as in (9.5.15). The name scale-free percolation is
justified by the following theorem:
Theorem 9.42 (Power-law degrees for power-law weights) Fix d ≥ 1, consider scale-
free percolation as in (9.5.58), and assume that the vertex weights are iid random variables
satisfying (9.5.15).
Figure 9.18 Examples of a scale-free percolation graph on the plane, with (a) τ = 2.5 and (b) τ = 3.5.

Figure 9.19 Degree distributions of the scale-free percolation on the planar graphs in Figure 9.18.
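Pictures like Figures 9.18–9.19 are straightforward to reproduce in simulation. The sketch below samples scale-free percolation (9.5.58) on a finite box of $\mathbb{Z}^2$ and prints the empirical degree tail, which should decay roughly like $k^{-(\tau-1)}$; the box size and parameters are illustrative.

```python
import numpy as np

def scale_free_percolation_box(r, tau, alpha, lam, seed=0):
    """Sample scale-free percolation (9.5.58) on the box [-r, r]^2 of Z^2
    and return the degree of every vertex."""
    rng = np.random.default_rng(seed)
    xs = np.array([(i, j) for i in range(-r, r + 1) for j in range(-r, r + 1)])
    n = len(xs)
    w = rng.uniform(size=n) ** (-1.0 / (tau - 1))   # P(W > w) = w^{-(tau - 1)}
    deg = np.zeros(n, dtype=int)
    for u in range(n - 1):
        dist = np.linalg.norm(xs[u] - xs[u + 1:], axis=1)
        p = 1.0 - np.exp(-lam * (w[u] * w[u + 1:]) ** alpha / dist ** (alpha * 2))
        hit = rng.uniform(size=p.size) < p
        deg[u] += hit.sum()
        deg[u + 1:][hit] += 1
    return deg

deg = scale_free_percolation_box(r=30, tau=2.5, alpha=2.0, lam=1.0)
for k in [1, 2, 4, 8, 16, 32]:
    print(k, np.mean(deg > k))      # empirical tail, roughly k^{-(tau - 1)}
```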
similar phenomenon for rank-1 inhomogeneous random graphs, such as the Norros–Reittu
model and the configuration model, where the giant is robust (recall, e.g., Theorem 3.20).
We close our discussion on scale-free percolation by studying the graph distances. In
finite graphs, typical distances are obtained by choosing two vertices uar from the vertex set,
and studying how the graph distance between them evolves when the network size n → ∞.
For infinite models, however, we replace this by studying the graph distances between far-
away vertices, i.e., we study $\mathrm{dist}_G(x, y)$ for $|x - y|$ large. By translation invariance, this is
the same as studying $\mathrm{dist}_G(o, x)$ for $|x|$ large. Below is the main result:
Theorem 9.44 (Distances in scale-free percolation) Fix $d \ge 1$, consider scale-free percolation as in (9.5.58), with iid weights $(W_x)_{x \in \mathbb{Z}^d}$ satisfying (9.5.15), $\alpha > 1$, and $\tau > 2$, and let $\lambda > \lambda_c$. Then, conditional on $o \longleftrightarrow x$,

(a) for $\tau \in (2,3)$ and $\alpha > 1$,
\[
\frac{\mathrm{dist}_G(o,x)}{\log\log|x|} \xrightarrow{\;\mathbb{P}\;} \frac{2}{|\log(\tau-2)|}; \qquad (9.5.62)
\]
(b) for $\tau > 3$ and $\alpha \in (1,2)$, whp for every $\varepsilon > 0$,
\[
(\log|x|)^{\Delta'-\varepsilon} \le \mathrm{dist}_G(o,x) \le (\log|x|)^{\Delta+\varepsilon}, \qquad (9.5.63)
\]
[Figure (caption lost): phase diagram of graph distances in scale-free percolation in terms of α and τ, with the regimes dist_G(o,x) ≈ 2 log log|x|/|log(τ−2)|, dist_G(o,x) ≈ (log|x|)^∆, and dist_G(o,x) ≳ |x|, separated by the lines α = 1 and α = 2.]
\[
\mathbb{E}\Big[\sum_{x \in \mathbb{Z}^d} \mathbb{1}_{\{\{o,x\} \in E(G)\}}\, |x - o|\Big] < \infty. \qquad (9.5.65)
\]
Since it is hard even to construct the spatial configuration model, it may not come as a
surprise that few results are known for this particular model.
We close this section by discussing a particular matching for d = 1 that is rather natural.
Give each half-edge a direction uar, meaning that it points to the left or to the right with
equal probability, independently across the half-edges. The edges are then obtained by pairing
half-edges pointing to each other, first exhausting all possible connections between nearest
neighbors, then linking second-nearest neighbors, and so on.
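In simulation form, the pairing can be organized in rounds: in round $\ell$, every remaining right-pointing half-edge at $x$ is matched, as far as possible, with a remaining left-pointing half-edge at $x + \ell$. The following finite-box Python sketch (which ignores boundary effects and uses an illustrative Poisson degree distribution) implements this.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
degrees = rng.poisson(3, size=n)        # illustrative degrees with finite mean

# each half-edge independently points left or right with probability 1/2
right = rng.binomial(degrees, 0.5)      # right-pointing half-edges at each site
left = degrees - right                  # left-pointing half-edges at each site

edges = []
for ell in range(1, n):                 # round ell: pair half-edges at distance ell
    matched = np.minimum(right[: n - ell], left[ell:])
    for x in np.flatnonzero(matched):
        edges.extend([(x, x + ell)] * int(matched[x]))
    right[: n - ell] -= matched
    left[ell:] -= matched
    if right.sum() == 0 or left.sum() == 0:
        break

lengths = np.array([y - x for x, y in edges])
print("edges:", len(edges), "mean length:", lengths.mean(), "max:", lengths.max())
```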
Assuming that D has finite mean, it is known that this algorithm leads to a well-defined
configuration, but that the expected length of the longest edge attached to a given vertex is
infinite. Indeed, let $N$ be the length of the edge to the furthest neighbor of $o$. Then $N < \infty$
almost surely. If instead we assign the directions with a probability that is not equal to $\frac12$, then $N = \infty$
with positive probability. Further, let $N_1$ denote the length of the first edge incident to $o$.
Then, E[N1 ] = ∞ when E[D] < ∞. Thus, edges have finite (spatial) length, but their
lengths have infinite mean.
In this chapter we have given an extensive introduction to random graph models for networks that are
directed, and have community structures and geometry. Obviously, these additional features can also be
combined, but there is only a limited literature on that, and we refrain from discussing such models.
We have not been able to cover all the relevant models that have attracted attention in the literature.
Particularly for dynamic random graphs, we have not discussed some of the relevant models. Examples are
copying or duplication models; these are dynamic random graphs in which new vertices copy a portion of
the neighbors of an older vertex (Kumar et al. (2000)).
Another class of dynamic models that has attracted considerable attention consists of models aimed at delaying
or accelerating the birth of the giant component. Indeed, one can obtain the combinatorial Erdős–Rényi
random graph by adding edges uniformly one by one, until the desired number of edges is added; then the
distribution is the same as that of the Erdős–Rényi random graph with the same number of edges. For this
process there is a giant when the number of added edges is $m = cn$ with $c > \frac12$, while there is no giant for
$m = cn$ with $c < \frac12$.
This process can be modified using a "power of choice" by considering a pair of edges at each time step.
Achlioptas raised the question whether it is possible to select one of the two exposed edges at each stage
in such a way as to either delay or accelerate the birth of a giant. Spencer and Wormald (2007) aptly called this
birth control for giants. In general, one can select the edge for which the connected components on
either side are the smallest in some sense. Bohman and Frieze (2001, 2002) studied settings where the first
edge is taken when it connects two isolated vertices, but otherwise the second edge is chosen. They showed
that whp there is no giant yet when the number of added edges is $m = cn$ for some $c > 0.535$, so that this rule indeed
delays the birth of the giant. Spencer and Wormald (2007) narrowed down the birth of the giant as lying
between $m = 0.8293n$ and $m = 0.9823n$.
Intuitively, one may guess that the birth of the giant is delayed most when the chosen edge minimizes
the product of the connected component sizes of the vertices in the edge. This is the so-called product rule,
and is sometimes also called explosive percolation since the size of the giant grows very fast after the giant
is first formed. In fact, this led Achlioptas et al. (2009) to the conjecture that the limiting size of the giant
for m = cn might be discontinuous around the critical value. This, however, turns out not to be true, as
proved by Riordan and Warnke (2011). It is as yet unclear how the limiting proportion of vertices in the
giant grows slightly beyond the critical value, though.
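The delaying effect of such rules is easy to observe in a small simulation. The sketch below compares the uniform process with the Bohman–Frieze rule described above, tracking the largest component with a union–find structure; it is illustrative only, and the values of $c$ are arbitrary.

```python
import numpy as np

class UnionFind:
    """Disjoint sets with path halving and union by size."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]

def largest_component_fraction(n, m, rule, rng):
    """Add m edges by the given rule; return |C_max| / n."""
    uf = UnionFind(n)
    for _ in range(m):
        e1 = rng.integers(0, n, size=2)
        e2 = rng.integers(0, n, size=2)
        if rule == "uniform":
            u, v = e1
        else:   # Bohman-Frieze: take e1 iff it joins two isolated vertices
            isolated = (uf.size[uf.find(e1[0])] == 1
                        and uf.size[uf.find(e1[1])] == 1)
            u, v = e1 if isolated else e2
        uf.union(u, v)
    return max(uf.size[uf.find(x)] for x in range(n)) / n

rng = np.random.default_rng(3)
n = 100_000
for c in [0.45, 0.50, 0.55, 0.60]:
    print(c, largest_component_fraction(n, int(c * n), "uniform", rng),
          largest_component_fraction(n, int(c * n), "BF", rng))
```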
The notes to this chapter are substantially more extensive than those in other chapters, because many
models are being discussed, and we only have limited space. As before, the notes can be used to learn more
about the models and to get pointers to the literature.
For citation networks, there is a rich literature proposing models for them using preferential attachment
schemes and adaptations of these, mainly in the complex-networks literature. Aging effects, i.e., taking
account of the age of a paper in its likelihood of obtaining citations, have been extensively considered as
the starting point to investigate the dynamics of citation networks; see Wang et al. (2009, 2008); Hajra
and Sen (2005, 2006); Csárdi (2006). Here the idea is that old papers are less likely to be cited than new
papers. Such aging has been observed in many citation network data sets and makes PAMs with weight
functions depending only on the degree ill-suited for them. Indeed, PAMs could more aptly be called the-
old-get-richer models, i.e., in general old vertices have the highest degrees. In citation networks, however,
papers with many citations appear all the time. Wang et al. (2013) investigated a model that incorporates
these effects; see also Wang et al. (2014) for a comment on the methods in that paper. On the basis of
empirical data, they suggested a model where the aging function follows a log-normal distribution with
paper-dependent parameters, and the preferential attachment function is the identity. Wang et al. (2013)
estimated the fitness function rather than using the more classical approach where the latter is taken to be
an iid sample of random variables.
which would imply the limiting statement in (9.2.18). Lee and Olvera-Cravioto (2020) used the results in
Cao and Olvera-Cravioto (2020) to prove that the limiting PageRank of such directed generalized random
graphs exists and that the solution obeys the same recurrence relation as on a branching-process tree. In
particular, under certain independence assumptions, this implies that the PageRank power-law hypothesis
is valid for such models.
Directed configuration models were investigated by Cooper and Frieze (2004), who proved Theorem 9.5.
In fact, the results in Cooper and Frieze (2004) also prove detailed bounds on the strongly connected com-
ponent in the subcritical regime, as well as precise bounds on the number of vertices whose forward and
backward clusters are large and the asymptotic size of forward and backward clusters. A substantially sim-
pler proof was given by Cai and Perarnau (2021).
Both Cooper and Frieze (2004) as well as Cai and Perarnau (2021) made additional assumptions on the
degree distribution. In particular, they assumed that $\mathbb{E}[D_n^{(in)} D_n^{(out)}] \to \mathbb{E}[D^{(in)} D^{(out)}] < \infty$, which we
do not. Further, Cooper and Frieze (2004) assumed that $\boldsymbol{d}$ is proper, which is a technical requirement on
the degree sequence stating that (a) $\mathbb{E}[(D_n^{(in)})^2] = O(1)$ and $\mathbb{E}[(D_n^{(out)})^2] = O(1)$; (b) $\mathbb{E}[D_n^{(in)}(D_n^{(out)})^2] = o(n^{1/12}\log n)$. In view of the fact that such conditions do not appear in Theorem 4.9, these conditions can
be expected to be suboptimal for Theorem 9.5 to hold. We next explain how they can be avoided by a
suitable degree-truncation argument:
Assume that the out- and in-degrees in the directed configuration model $\mathrm{DCM}_n(\boldsymbol{d})$ satisfy (9.2.25) and
(9.2.26). By Exercise 9.19 below, $|\mathcal{C}_{\max}| \le n(\zeta + \varepsilon)$ whp for $n$ large and any $\varepsilon > 0$; Exercise 9.19 is
proved by an adaptation of the proof of Corollary 2.27 in the undirected setting. Thus, we need only show
that $|\mathcal{C}_{\max}| \ge n(\zeta - \varepsilon)$ whp for $n$ large and any $\varepsilon > 0$.
Fix b > 1 large. We now construct a lower bounding directed configuration model where all the degrees
are bounded above by b. This is similar to the degree-truncation argument for the undirected configuration
model discussed in Section 1.3.3 (recall Theorem 1.11). When $v$ is such that $d_v = d_v^{(out)} + d_v^{(in)} \ge b$, we
split $v$ into $n_v = \lceil d_v/b \rceil$ vertices, and we deterministically redistribute all out- and in-half-edges over the
$n_v$ vertices in an arbitrary way such that all the vertices that used to correspond
to $v$ now have both out- and in-degree bounded by $b$. Denote the corresponding random graph by $\mathrm{DCM}_n'$.
The resulting degree sequence again satisfies (9.2.25) and (9.2.26). Moreover, for b > 1 large and by
(9.2.25) and (9.2.26), the limits in (9.2.25) and (9.2.26) for the new degree sequence are quite close to the
original limits of the old degree sequence, while the degrees are now bounded. As a result, we can apply
the original result in Cooper and Frieze (2004) or Cai and Perarnau (2021) to the new setting.
Denote the size of the largest SCC in $\mathrm{DCM}_n'$ by $|\mathcal{C}_{\max}'|$. Obviously, since we split vertices, $|\mathcal{C}_{\max}'| \le |\mathcal{C}_{\max}| + \sum_{v \in [n]} (n_v - 1)$. Therefore, $|\mathcal{C}_{\max}| \ge |\mathcal{C}_{\max}'| - \sum_{v \in [n]} (n_v - 1)$. Take $b$ sufficiently large that $\sum_{v \in [n]} (n_v - 1) \le \varepsilon n/3$ and that $\zeta' \ge \zeta - \varepsilon/3$, where $\zeta'$ is the forward–backward survival probability of the limiting $\mathrm{DCM}_n'$ and $\zeta$ that of $\mathrm{DCM}_n(\boldsymbol{d})$. Finally, for every $\varepsilon > 0$, whp $|\mathcal{C}_{\max}'| \ge n(\zeta' - \varepsilon/3)$. As a result, we obtain that, again whp and as required,
\[
|\mathcal{C}_{\max}| \ge |\mathcal{C}_{\max}'| - n\varepsilon/3 \ge n(\zeta' - 2\varepsilon/3) \ge n(\zeta - \varepsilon). \qquad (9.6.3)
\]
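In algorithmic form, the vertex-splitting step can be sketched as follows; the block-wise redistribution of half-edges is one arbitrary valid choice among the many allowed by the argument above.

```python
def split_degrees(d_in, d_out, b):
    """Split every vertex v with total degree d_v >= b into ceil(d_v / b)
    vertices, deterministically redistributing its half-edges in blocks of
    size at most b. Returns the new lists of in- and out-degrees."""
    new_in, new_out = [], []
    for di, do in zip(d_in, d_out):
        dv = di + do
        parts = max(-(dv // -b), 1)          # ceil(dv / b), at least 1
        for k in range(parts):
            lo, hi = k * b, min((k + 1) * b, dv)
            n_in = max(0, min(hi, di) - lo)  # in-half-edges occupy slots [0, di)
            new_in.append(n_in)
            new_out.append((hi - lo) - n_in)
    return new_in, new_out

# example: a heavy vertex is split so that every total degree is at most b = 4
print(split_degrees([10, 1], [6, 2], b=4))
```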
Chen and Olvera-Cravioto (2013) studied a way to obtain nearly iid in- and out-degrees in the directed
configuration model. Here, the problem is that if $((d_v^{(in)}, d_v^{(out)}))_{v \in [n]}$ is an iid bivariate sequence with
equal means, then $\sum_{v \in [n]} (d_v^{(in)} - d_v^{(out)})$ has Gaussian fluctuations at best, so that it will likely not be zero.
Chen and Olvera-Cravioto (2013) indicated how the excess in- or out-half-edges can be removed so as to
keep the degrees close to iid. Further, they showed that the removal of self-loops and multiple directed edges
does not significantly change the degree distribution (so that, in particular, one would expect the local
limits to be the same, but Chen and Olvera-Cravioto (2013) considered only the degree distribution).
Theorem 9.6 was first proved by van der Hoorn and Olvera-Cravioto (2018) under stronger
assumptions, but then the claim proved is also much stronger. Indeed, van der Hoorn and Olvera-Cravioto
(2018) not only identified the first-order asymptotics, as in Theorem 9.6, but also fluctuations like
those stated for the undirected configuration model in Theorem 7.24. This proof is substantially harder
than that of (Cai and Perarnau, 2023, Proposition 7.7). Theorem 9.7 was proved by Colenbrander (2022).
Theorem 9.8 was proved by Cai and Perarnau (2023).
Directed preferential attachment models and their PageRank. The backward local limit of the preferential
attachment model with random out-degrees was identified by Banerjee et al. (2023), using a collapsing
procedure on a continuous-time branching process in the spirit of Garavaglia and van der Hofstad (2018).
Theorem 9.9 is (Banerjee and Olvera-Cravioto, 2022, Theorem 3.1), Theorem 9.10 is (Banerjee and Olvera-
Cravioto, 2022, Theorem 3.3).
Karoński et al. (1999); Stark (2004). Theorem 9.20 is (Deijfen and Kets, 2009, Theorem 1.1), where the
authors also proved that clustering can be tuned. The model was also investigated for more general distribu-
tions of groups per vertex by Godehardt and Jaworski (2003); Jaworski et al. (2006). Random intersection
graphs with prescribed degrees and groups are studied in a non-rigorous way in Newman (2003); Newman
and Park (2003). We refer to Bloznelis et al. (2015) for a survey of recent results.
Rybarczyk (2011) studied various properties of the random intersection graph when each vertex is in
precisely d groups that are all chosen uar from the collection of groups. In particular, Rybarczyk (2011)
proves results on the giant as in Theorem 9.22, as well as on the diameter of the graph, which is ΘP (log n)
when the model is sparse.
Bloznelis (2009, 2010a,b) studied a general random intersection model, where the sizes of groups are
iid random variables, and the sets of the vertices in them are chosen uar from the vertex set. His results
include distances (Bloznelis (2009)) and component sizes (Bloznelis (2010a,b)). Bloznelis (2013) studied
the degree and clustering structure in this setting.
Theorem 9.21 is proved in Kurauskas (2022); see also van der Hofstad et al. (2021). Both papers investi-
gate more general settings: Kurauskas (2022) also allows for settings with independent group memberships,
while van der Hofstad et al. (2021) also allows for more general group structures than the complete graph.
Theorem 9.22 is proved in van der Hofstad et al. (2022).
Random intersection graphs with communities. van der Hofstad et al. (2021) proposed a model that com-
bines the random intersection graph with more general communities than complete graphs. van der Hofstad
et al. (2021) identified the local limit, as well as the nature of the overlaps between different communities.
van der Hofstad et al. (2022) identified the giant component, also when percolation is being performed on
the model. See Vadon et al. (2019) for an informal description of the model, aimed at a broad audience.
Exponential random graphs. For a general introduction to exponential random graphs, we refer to Snijders
et al. (2006) and Wasserman and Pattison (1996). Frank and Strauss (1986) discussed the notion of Markov
graphs, for which the edges of the graph form a Markov field. The general exponential random graph is
only a Markov field when the subgraphs are restricted to edges, stars of any kind, and triangles. This is
exemplified by Example 9.24, where general degrees are used and give rise to a model with independent
edges. Kass and Wasserman (1996) discussed relations to Bayesian statistics.
For a discussion on the relation between statistical mechanics and exponential models, we refer to Jaynes
(1957). Let us now explain the relation between exponential random graphs and entropy maximization. Let
$(p_x)_{x \in \mathcal{X}}$ be a probability measure on a general discrete set $\mathcal{X}$. We define its entropy by
\[
H(p) = -\sum_{x \in \mathcal{X}} p_x \log p_x. \qquad (9.6.4)
\]
Entropy measures the amount of randomness in a system. Shannon (1948) proved that the entropy is
the unique quantity that is positive, increases with increasing uncertainty, and is additive for independent
sources of uncertainty, so it is a very natural quantity.
The relation to exponential random graphs is that they are the random graphs that, for fixed expected
values of the subgraph counts $N_F(G_n)$, optimize the entropy. Indeed, recall that $X = (X_{i,j})_{1 \le i < j \le n}$ are
the edge statuses of the graph, so that $G_n$ is uniquely characterized by $X$. Then maximize $H(p)$ over all
the $p$ such that $\sum_x N_F(x)\, p(x) = \alpha_F$ for some given $\alpha_F$ and all subgraphs $F \in \mathcal{F}$, where $\mathcal{F}$ is an appropriate
set of subgraphs. Then, using Lagrange multipliers, the optimization problem reduces to
\[
p_{\vec{\beta}}(x) = \frac{1}{Z}\, e^{\sum_{F \in \mathcal{F}} \beta_F N_F(x)}, \qquad (9.6.5)
\]
where $Z = Z_n(\vec{\beta})$ is the normalization constant given in (9.4.33), and $\vec{\beta} = (\beta_F)_{F \in \mathcal{F}}$ is chosen as the
solution to (9.4.34). This implies that, indeed, the exponential random graph model optimizes the entropy
under this subgraph constraint.
See also Kass and Wasserman (1996) for a discussion of maximum entropy and a reference to its long
history as well as a critique of the method.
An important question is how one can find the appropriate $\vec{\beta} = (\beta_F)_{F \in \mathcal{F}}$ such that (9.4.34) holds.
This is particularly difficult, since the computation of the normalization constant $Z_n(\vec{\beta})$ in (9.4.33) is quite
hard. Often, Markov chain Monte Carlo (MCMC) techniques are used to sample efficiently from $p_{\vec{\beta}}$. In this
case, such MCMC techniques perform a form of Glauber dynamics, for which (9.4.32) is the stationary
distribution. One can then try to solve (9.4.34) by keeping track of the value of $N_F$ in the simulation.
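As a concrete illustration, here is a minimal Glauber sampler for an edge–triangle exponential random graph on a small vertex set; the $\beta$ values are arbitrary, and no claim is made about how fast this chain mixes.

```python
import numpy as np

def glauber_ergm(n, beta_edge, beta_tri, steps, seed=0):
    """Glauber dynamics targeting p(x) proportional to
    exp(beta_edge * N_edges(x) + beta_tri * N_triangles(x)).

    Each step picks a pair {i, j} uar and resamples its edge status from the
    conditional law given the rest of the graph."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n), dtype=int)
    for _ in range(steps):
        i, j = rng.choice(n, size=2, replace=False)
        # triangles closed by {i, j} = number of common neighbors of i and j
        d_tri = int(A[i] @ A[j])
        h = beta_edge + beta_tri * d_tri      # log-odds of the edge being present
        p_on = 1.0 / (1.0 + np.exp(-h))
        A[i, j] = A[j, i] = int(rng.uniform() < p_on)
    return A

A = glauber_ergm(n=50, beta_edge=-2.0, beta_tri=0.1, steps=200_000)
print("edges:", A.sum() // 2, "triangles:", int(np.trace(A @ A @ A)) // 6)
```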
However, these methods can be very slow, as well as daunting, since the behavior of $N_F(X)$ under $p_{\vec{\beta}}$
may undergo a phase transition in $\vec{\beta}$, making $\sum_x N_F(x)\, p_{\vec{\beta}}(x)$ very sensitive to small changes of $\vec{\beta}$. See in
particular Chatterjee and Diaconis (2013) for a discussion on this topic.
Bhamidi et al. (2011) (see also Bhamidi et al. (2008)) studied the mixing time of the exponential random
graph when edges are changed dynamically in a Glauber way. The results were somewhat disappointing,
since either edges are close to being iid as in an Erdős–Rényi random graph or the mixing is very slow.
These results, however, apply only to dense settings where the number of edges grows quadratically with
the number of vertices. This problem is closely related to large deviations on dense Erdős–Rényi random
graphs. See also Chatterjee (2017) for background on such large deviations and Chatterjee and Varadhan
(2011) for the original paper.
Results on sparse exponential random graphs are limited. We refer to Chakraborty et al. (2021) for a
discussion of how sparse exponential random graphs with a linear number of triangles can be obtained.
For more background on sufficient statistics and their relation to symmetries, we refer to Diaconis (1992).
For a discussion of the relation between information theory and exponential models, we refer to Shore and
Johnson (1980).
and Abdullah et al. (2017) identified their ultra-small-world behavior in Theorem 9.29. Bläsius et al. (2018)
studied the size of the largest cliques in hyperbolic graphs.
Geometric inhomogeneous random graphs (GIRG)s. Product GIRGs were introduced by Bringmann et al.
(2019) and Bringmann et al. (2020). The relation between the hyperbolic random graph and the product
GIRG as described in Theorem 9.31 can be found in (Bringmann et al., 2017, Section 7), where limits
are derived up to constants. Theorem 9.31 was first proved in (Komjáthy and Lodewijks, 2020, Section
7) under conditions slightly different from Assumption 9.30. The current statement of Assumption 9.30
is Assumptions 1.5–1.7 in van der Hofstad et al. (2023). An adaptation of the proof of Theorem 9.31 can
be found in Section 2.1.3 in van der Hofstad et al. (2023). Komjáthy and Lodewijks (2020) also studied
its weighted distances focussing on the case where τ ∈ (2, 3). Theorem 9.34 was proved in Bringmann
et al. (2020) for product GIRGs with τ ∈ (2, 3), except for the law of large numbers of the giant. This
again follows by a “giant is almost local” proof combined with the bound on the second largest component,
proved in Bringmann et al. (2020), and the identification of the local limit in van der Hofstad et al. (2023).
The result for the model in (9.5.35) and (9.5.36) is (Jorritsma et al., 2023, Corollary 2.3). The GIRGs in
(9.5.35) and (9.5.36) are called interpolating kernel-based spatial random graphs by Jorritsma et al. (2023).
The main focus of Jorritsma et al. (2023) is the size of the second largest connected component |C(2) |, on
which the authors proved sharp polylogarithmic bounds with the correct exponent.
The local convergence in probability in Theorem 9.33 is proved in van der Hofstad et al. (2023) using
path-counting techniques. Local weak convergence for product GIRGs was proved under slightly different
assumptions by Komjáthy and Lodewijks (2020) (see (Komjáthy and Lodewijks, 2020, Assumption 2.5)).
In more detail, a coupling version of Theorem 9.33 is stated in (Komjáthy and Lodewijks, 2020, Claim
3.3), where a blown-up version of the product GIRG is bounded from below and above by the limiting
model with slightly smaller and larger intensities, respectively. Take a vertex uar in the product GIRG.
Then whp it is also present in the lower and upper bounding Poisson infinite GIRG. Similarly, whp none of
the edges within a ball of intrinsic radius r will be different in the three models, which proves local weak
convergence. Local convergence in probability would follow from a coupling of the neighborhoods of two
uniformly chosen vertices in the GIRG to two independent limiting copies. Such independence is argued
in (Komjáthy and Lodewijks, 2020, proof of Theorem 2.12), in particular in the text around (Komjáthy and
Lodewijks, 2020, (3.16)). It can be expected that a hyperbolic random graph in d dimensions can be mapped
to a product GIRG in d−1 dimensions. The one-dimensional nature of the model for d = 2 discussed below
Theorem 9.34 should thus not arise when d ≥ 3, and one can expect a giant to exist for τ > 3 also.
Local limits as arising in Theorem 9.33, in turn, were studied by Hirsch (2017). Fountoulakis (2015)
studied an early version of a geometric Chung–Lu model.
Spatial preferential attachment models. Our exposition follows Jordan (2010, 2013); Flaxman et al. (2006,
2007). Jordan (2010) treats the case of uniform locations of the vertices, a problem first suggested by
Flaxman et al. (2006, 2007). Jordan (2013) studies preferential attachment models where the vertices are
located in a general metric space with not-necessarily-uniform location of the vertices. This is more difficult,
as then the power-law degree exponent depends on the location of the vertices. Theorem 9.37 follows from
(Flaxman et al., 2006, Theorem 1(a)), which is quite a bit sharper, as it states detailed concentration results
as well. Further results involve the proof of connectivity of the resulting graph and an upper bound on
the diameter of order O(log (n/r)) when r ≥ n−1/2 log n, m ≥ K log n for some large enough K,
and α ≥ 0. Flaxman et al. (2007) generalized these results to the setting where, instead of a unit ball, a
smoother version is used, while the majority of points were still within a distance rn = o(1). Theorem
9.38 is (Jordan, 2010, Theorem 2.1). Theorem 9.39 is (Jordan and Wade, 2015, Theorem 2.4). (Jordan and
Wade, 2015, Theorem 2.2) shows that the degree distribution for $\alpha(r) = \exp\{(\log(1/r))^{\gamma}\}$ with $\gamma > 2/3$ is
the same as that of the so-called online nearest-neighbor graph, for which (Jordan and Wade, 2015, Theorem 2.1)
shows that the limiting degree distribution has exponential tails. Manna and Sen (2002) studied geometric
preferential attachment models from a simulation perspective.
Theorem 9.40 is (Jordan, 2013, Theorems 2.1 and 2.2). (Jordan, 2013, Theorem 2.3) contains partial
results for the setting where S is infinite. These results are slightly weaker, as they do not characterize the
degree power-law exponent exactly.
Aiello et al. (2008) gave an interpretation of spatial preferential attachment models in terms of influence
regions and proved Theorem 9.41 (see (Aiello et al., 2008, Theorem 1.1)). Further results involve the study
of maximal in-degrees and the total number of edges. See also Janssen et al. (2016) for a version with
non-uniform locations.
Jacob and Mörters (2015) studied the degree distribution and local clustering in a related geometric
preferential attachment model. Jacob and Mörters (2017) studied the robustness of the giant component in
that model, and also presented heuristics that distances are ultra-small in the case where the degrees have
infinite variance.
For a relation between preferential attachment graphs with so-called fertility and aging, and a geometric
competition-induced growth model for networks, we refer to Berger et al. (2004, 2005) and the references
therein. Zuev et al. (2015) studied how geometric preferential attachment models give rise to soft commu-
nities.
Complex network models on the hypercubic lattice. Below we give references to the literature.
Scale-free percolation was introduced by Deijfen et al. (2013). We have adapted the parameter choices,
so that the model is closer to the geometric inhomogeneous random graph. In particular, in Deijfen et al.
(2013), (9.5.58) is replaced with
\[
p_{xy} = 1 - e^{-\lambda W_x W_y / |x-y|^{\alpha}}, \qquad (9.6.6)
\]
and then the power-law exponent for the degrees is such that $\mathbb{P}(D_o > k) \approx k^{-\gamma}$, where $\gamma = \alpha(\tau-1)/d$
and $\tau$ is the weight power-law exponent as in (9.5.15). The current set-up has the advantage that the degree
power-law exponent agrees with that of the weight distribution.
The fact that λc < ∞ holds in most cases is (Deijfen et al., 2013, Theorem 3.1). Theorem 9.43(a) is
(Deijfen et al., 2013, Theorem 4.2), Theorem 9.43(b) is (Deijfen et al., 2013, Theorem 4.4). Deprez et al.
(2015) showed that the percolation function is continuous when α ∈ (d, 2d), i.e., θ(λc ) = 0. However, in
full generality, continuity of the percolation function at λ = λc when λc > 0 is unknown.
Theorem 9.44(a) was proved in Deijfen et al. (2013); van der Hofstad and Komjáthy (2017), see in
particular Corollary 1.4 in van der Hofstad and Komjáthy (2017). Theorem 9.44(b) was proved in Heyden-
reich et al. (2017); Hao and Heydenreich (2023), following up on similar results for long-range percolation
proved by Biskup (2004); Biskup and Lin (2019). In long-range percolation, edges are present independently,
and the probability that the edge $\{x,y\}$ is present equals $|x-y|^{-\alpha d + o(1)}$ for some $\alpha > 0$. In
this case, detailed results exist for the limiting behavior of $\mathrm{dist}_G(o,x)$ depending on the value of $\alpha$. For
example, in Benjamini et al. (2004), it is shown that the diameter of this infinite percolation model equals
$\lceil 1/(1-\alpha) \rceil$ almost surely when $\alpha \in (0,1)$. Theorem 9.44(c) is (Deprez et al., 2015, Theorem 8(b)).
Deprez and Wüthrich (2019) investigated graph distances in the continuum scale-free percolation model;
a related result was proved in the long-range percolation setting by Sönmez (2021), who also addressed
bounds on graph distances for α ∈ {1, 2}.
There is some follow-up work on scale-free percolation. Hirsch (2017) proposed a continuum model for
scale-free percolation. Deprez et al. (2015) argued that scale-free percolation can be used to model real-life
networks. Heydenreich et al. (2017) established recurrence and transience criteria for random walks on the
infinite connected component. For long-range percolation this was proved by Berger (2002).
Spatial configuration models on the lattice were introduced by Deijfen and Jonasson (2006); see also
Deijfen and Meester (2006). In our exposition, we follow Jonasson (2009), who studied more general un-
derlying graphs, such as trees or other infinite transitive graphs. Theorem 9.45 is (Jonasson, 2009, Theorem
3.1). (Jonasson, 2009, Theorem 3.2) extended Theorem 9.45 to settings where the degrees are not iid, but
rather translation invariant. In this case, it is still necessary for (9.5.65) that $\mathbb{E}[D^{(d+1)/d}] < \infty$, but this
may not be enough. Sharper conditions are restricted to the setting where $d = 1$ (where no condition
of the form $\mathbb{E}[D^k] < \infty$ suffices) and $d = 2$ (for which it suffices that $\mathbb{E}[D^{(d+1)/(d-1)+\alpha}] < \infty$ for some
$\alpha > 0$, but if $\mathbb{E}[D^{(d+1)/(d-1)-\alpha}] = \infty$ for some $\alpha > 0$, then there exist translation-invariant matchings
for which (9.5.65) fails).
We next discuss the properties of the model in $d = 1$ where the directions of the half-edges are chosen
independently. (Deijfen and Meester, 2006, Proposition 2.1) shows that, when the direction is chosen with
probability $p \ne \frac12$, the maximal edge length $N$ is infinite with positive probability. (Deijfen and Meester,
2006, Theorem 2.1) shows that $N < \infty$ almost surely when $p = \frac12$, while (Deijfen and Meester, 2006,
Theorem 4.1) implies that $\mathbb{E}[N] = \infty$ when $p = \frac12$.
Deijfen (2009) studied a related model where the vertices are a Poisson point process on Rd . This model
was further studied by Deijfen et al. (2012). In the latter paper, it is shown, surprisingly, that for
any sequence of iid degrees of the points of the Poisson process, there are translation-invariant matchings
that percolate, as well as matchings that do not. Further, such a matching can be a factor, where a translation-
invariant matching is called a factor if it is a deterministic function of the Poisson process and of the degrees
of the vertices in the Poisson process, that is, if it does not involve any additional randomness. See (Deijfen
et al., 2012, Theorem 1.1) for more details.
A threshold scale-free percolation model. We finally discuss the results by Yukich (2006) on another infinite
geometric random graph model. We start by taking an iid sequence $(W_x)_{x \in \mathbb{Z}^d}$ of random variables on
$[1, \infty)$ satisfying (9.5.15) with a constant slowly varying function. Fix $\delta \in (0,1]$. The edge $\{x, y\}$, for
$x, y \in \mathbb{Z}^d$, appears in the random graph precisely when
exists, so that the model has a power-law degree sequence with power-law exponent $\tau$ (recall (1.4.3)). The
intuitive explanation of (9.6.8) is as follows. Suppose we condition on the value of $W_o = w$. Then, the
conditional distribution of $D_o$ given that $W_o = w$ is equal to
\[
D_o = \sum_{x \in \mathbb{Z}^d} \mathbb{1}_{\{|x| \le \min\{W_o^{\tau/d},\, W_x^{\tau/d}\}\}} = \sum_{x \colon |x| \le w^{\tau/d}} \mathbb{1}_{\{|x| \le W_x^{\tau/d}\}}. \qquad (9.6.9)
\]
Note that the random variables $(\mathbb{1}_{\{|x| \le W_x^{\tau/d}\}})_{x \in \mathbb{Z}^d}$ are independent Bernoulli random variables with parameter
\[
\mathbb{P}(\mathbb{1}_{\{|x| \le W_x^{\tau/d}\}} = 1) = \mathbb{P}(W \ge |x|^{d/\tau}) = |x|^{-d(\tau-1)/\tau}. \qquad (9.6.10)
\]
In order for $D_o \ge k$ to occur for $k$ large, we must have that $W_o = w$ is quite large, and, in this case, a
central limit theorem should hold for $D_o$, with mean equal to
\[
\mathbb{E}[D_o \mid W_o = w] = \sum_{x \colon |x| \le w^{\tau/d}} |x|^{-d(\tau-1)/\tau} = c\,w(1 + o(1)), \qquad (9.6.11)
\]
for some explicit constant $c = c(\tau, d)$. Furthermore, the conditional variance of $D_o$ given that $W_o = w$ is
bounded above by its conditional expectation, so that the conditional distribution of $D_o$ given that $W_o = w$
is highly concentrated. We omit further details, and merely note that this heuristic can be made precise by
using standard concentration results. Assuming sufficient concentration, we obtain that the probability that
$D_o \ge k$ is asymptotically equal to the probability that $W > w_k$, where $w_k$ is determined by the equation
$c\,w_k = k$ (cf. (9.6.11)), so that
\[
\mathbb{P}(D_o > k) = \mathbb{P}(W > k/c)(1 + o(1)) = (k/c)^{-(\tau-1)}(1 + o(1)). \qquad (9.6.13)
\]
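The heuristic (9.6.11) can also be checked numerically: the following sketch evaluates $\sum_{x\colon |x| \le w^{\tau/d}} |x|^{-d(\tau-1)/\tau}$ in $d = 2$ and confirms that it grows linearly in $w$; the value of $\tau$ is an arbitrary illustration.

```python
import numpy as np

def conditional_mean_degree(w, tau, d=2):
    """Evaluate the sum over 0 < |x| <= w^{tau/d} of |x|^{-d(tau-1)/tau} on Z^2,
    i.e., the conditional mean degree in (9.6.11)."""
    radius = w ** (tau / d)
    r = int(np.ceil(radius))
    i, j = np.mgrid[-r:r + 1, -r:r + 1]
    dist = np.sqrt(i ** 2 + j ** 2)
    mask = (dist > 0) & (dist <= radius)
    return (dist[mask] ** (-d * (tau - 1) / tau)).sum()

tau = 2.5
for w in [10, 20, 40, 80]:
    # (9.6.11) predicts growth c * w, so this ratio should stabilize
    print(w, conditional_mean_degree(w, tau) / w)
```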
The result in Theorem 9.46 shows that distances in the model given by (9.6.7) are much smaller than
those in normal percolation models. Recall Meta Theorem B at the start of Part III. While Theorem 9.46
resembles the results in Meta Theorem B, the differences reside in the fact that distances are ultra-small
independently of the exact value of the degree power-law exponent.
Again, the result in Theorem 9.46 can be compared with similar results for long-range percolation (recall
the discussion of scale-free percolation).
Exercise 9.3 (Local convergence for randomly directed graphs) Let (Gn )n≥1 be a random graph se-
quence that converges locally in probability. Give each edge e a random orientation, by orienting e =
{u, v} as e = (u, v) with probability 12 and as e = (v, u) with probability 12 , independently across edges.
Show that the resulting digraph converges locally in probability in the marked forward, backward, and
forward–backward senses.
Exercise 9.4 (Local convergence for randomly directed graphs (cont.)) In the setting of Exercise 9.3,
assume instead that $(G_n)$ converges locally weakly. Conclude that the resulting digraph converges
locally weakly in the marked forward, backward, and forward–backward senses.
Exercise 9.5 (Local convergence for directed version of $\mathrm{PA}_n^{(m,\delta)}(d)$) Consider the edges in $\mathrm{PA}_n^{(m,\delta)}(d)$
to be oriented from young to old, so that the resulting digraph has out-degree $m$ and random in-degrees.
Use Theorems 5.8 and 5.21 to show that this digraph converges locally in probability in the marked forward,
backward, and forward–backward senses.
Exercise 9.6 (Power-law lower bound for PageRank on directed version of $\mathrm{PA}_n^{(m,\delta)}(d)$) Recall the directed version of $\mathrm{PA}_n^{(m,\delta)}(d)$ in Exercise 9.5. Use (9.2.10) to show that there exists a constant $c = c(\alpha, \delta, m) > 0$ such that
\[
\mathbb{P}(R_\emptyset > r) \ge c\, r^{-\tau}, \qquad \text{where } \tau = 3 + \frac{\delta}{m} \qquad (9.7.2)
\]
is the power-law exponent of $\mathrm{PA}_n^{(m,\delta)}(d)$. What does this say about the PageRank power-law hypothesis for the directed version of $\mathrm{PA}_n^{(m,\delta)}(d)$?
Exercise 9.7 (Power-law lower bound for PageRank on digraphs with bounded out-degrees) Let $(G_n)_{n\ge1}$
be a random digraph sequence that converges locally in probability in the marked backward sense to
$(D, \emptyset) \sim \mu$. Assume that there exist $0 < a, b < \infty$ such that $d_v^{(out)} \in [a, b]$ for all $v \in V(G_n)$. Assume
that $\mu(D_\emptyset^{(in)} > r) \ge c\, r^{-\gamma}$ for some $\gamma > 0$. Use (9.2.10) to show that there exists a constant $c' > 0$
such that $\mu(R_\emptyset > r) \ge c'\, r^{-\gamma}$.
Exercise 9.8 (Mean number of edges in $\mathrm{DGRG}_n(\boldsymbol{w})$) Consider the directed generalized random graph,
as formulated in (9.2.14) and (9.2.15). Assume that the weight-regularity condition in (9.2.16) holds. Let
$X_{ij}$ be the indicator that there is a directed edge from $i$ to $j$ (with $X_{ii} = 0$ for all $i \in [n]$ by convention).
Show that
\[
\frac{1}{n}\, \mathbb{E}\Big[\sum_{i,j \in [n]} X_{ij}\Big] \to \frac{\mathbb{E}[W^{(in)}]\,\mathbb{E}[W^{(out)}]}{\mathbb{E}[W^{(in)} + W^{(out)}]}. \qquad (9.7.3)
\]
Conclude that the limit equals $\frac12 \mathbb{E}[W^{(in)}] = \frac12 \mathbb{E}[W^{(out)}]$ when the symmetry condition in (9.2.18) holds.
Exercise 9.9 (Number of edges in $\mathrm{DGRG}_n(\boldsymbol{w})$) In the setting of (9.7.3) in Exercise 9.8, show that
\[
\frac{1}{n} \sum_{i,j \in [n]} X_{ij} \xrightarrow{\;\mathbb{P}\;} \frac{\mathbb{E}[W^{(in)}]\,\mathbb{E}[W^{(out)}]}{\mathbb{E}[W^{(in)} + W^{(out)}]}, \qquad (9.7.4)
\]
which equals $\frac12 \mathbb{E}[W^{(in)}] = \frac12 \mathbb{E}[W^{(out)}]$ when the symmetry condition in (9.2.18) holds.
Exercise 9.10 (Local limit of directed Erdős–Rényi random graph) Use Theorem 9.2 to describe the local
limit of the directed Erdős–Rényi random graph.
Exercise 9.11 (Local convergence for finite-type directed inhomogeneous random graphs) Adapt the proof
of Theorem 3.14 to prove Theorem 9.2 in the case of finite-type kernels. Here, we recall that a kernel κ is
called finite type when (s, r) 7→ κ(s, r) takes on finitely many values.
Exercise 9.12 (Local convergence for DGRGn (w)) Consider the directed generalized random graph
as formulated in (9.2.14) and (9.2.15). Assume that the weight-regularity condition in (9.2.16) holds. Use
Theorem 9.2 to determine the local limit in probability of DGRGn (w). Is the local limit of the forward and
backward neighborhoods a single- or a multi-type branching process?
Exercise 9.13 (Phase transition for directed Erdős–Rényi random graph) For the directed Erdős–Rényi
random graph, show that ζ in Theorem 9.3 satisfies ζ > 0 precisely when λ > 1.
Exercise 9.14 (Phase transition for directed generalized random graph) Consider the directed general-
ized random graph, as formulated in (9.2.14) and (9.2.15). Assume that the weight-regularity condition in
(9.2.16) holds. What is the condition on the asymptotic weight distribution (W (out) , W (in) ) in (9.2.16) that
is equivalent to ζ > 0 in Theorem 9.3?
Exercise 9.15 (Correlations of out- and in-degrees of a randomly directed graph) In an undirected graph
$G$, randomly direct each edge by orienting $e = \{u,v\}$ as $(u,v)$ with probability $\frac12$ and as $(v,u)$ with
probability $\frac12$, as in Exercise 9.3. Let $v \in V(G)$ be a vertex in $G$ of degree $d_v$. What is the correlation
coefficient between its out- and in-degrees in the randomly directed version of $G$? Note: The correlation
coefficient $\rho(X, Y)$ between two random variables $X$ and $Y$ equals $\mathrm{Cov}(X,Y)/\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}$.
Exercise 9.16 (Equivalence of convergence of in- and out-degrees in $\mathrm{DCM}_n(\boldsymbol{d})$) Show that (9.2.24)
implies that $\mathbb{E}[D^{(out)}] = \mathbb{E}[D^{(in)}]$ when $(D_n^{(in)}, D_n^{(out)}) \xrightarrow{\;d\;} (D^{(in)}, D^{(out)})$, $\mathbb{E}[D_n^{(in)}] \to \mathbb{E}[D^{(in)}]$, and
$\mathbb{E}[D_n^{(out)}] \to \mathbb{E}[D^{(out)}]$.
Exercise 9.17 (Self-loops and multiple edges in $\mathrm{DCM}_n(\boldsymbol{d})$) Argue that the proof of [V1, Proposition
7.13] can be adapted to show that the number of self-loops in $\mathrm{DCM}_n(\boldsymbol{d})$ converges to a Poisson random
variable with parameter $\mathbb{E}[D^{(in)} D^{(out)}]$ when $(D_n^{(in)}, D_n^{(out)}) \xrightarrow{\;d\;} (D^{(in)}, D^{(out)})$ and
\[
\mathbb{E}[D_n^{(in)} D_n^{(out)}] \to \mathbb{E}[D^{(in)} D^{(out)}]. \qquad (9.7.5)
\]
What can you say about the number of multiple edges in $\mathrm{DCM}_n(\boldsymbol{d})$? Note: No proof is expected; a reasonable argument suffices.
Exercise 9.18 (Local convergence for DCMn (d) in Theorem 9.4) Give a proof of the local limit result in
Theorem 9.4 by suitably adapting the proof of Theorem 4.1.
Exercise 9.19 (One-sided law of large numbers for SSC) Adapt the proof of Corollary 2.27 to show that
when Gn = ([n], E(Gn )) converges locally in probability in the forward–backward sense to (G, o) having
distribution µ, then the size of the largest strongly connected component |Cmax | satisfies that, for every
ε > 0 fixed,
P(|Cmax | ≤ n(ζ + ε)) → 1, (9.7.6)
where ζ = µ(|C (o)| = ∞) is the forward–backward survival probability of the limiting graph (G, o) (i.e.,
the probability that both the forward and the backward component of o have infinite size).
Exercise 9.20 (Subcritical directed configuration model) Let $\mathrm{DCM}_n(\boldsymbol{d})$ be a directed configuration
model that satisfies the degree-regularity conditions in (9.2.25) and (9.2.26). Let $\mathcal{C}_{\max}$ denote its largest
strongly connected component. Use Exercise 9.19 to show that $|\mathcal{C}_{\max}|/n \xrightarrow{\;\mathbb{P}\;} 0$ when $\zeta = 0$, where
ζ = µ(|C (o)| = ∞) is the forward–backward survival probability of the limiting graph (G, o). This
proves the subcritical result in Theorem 9.5(b).
Exercise 9.21 (Logarithmic growth of typical distances in the directed configuration model) Let DCMn (d)
be a directed configuration model that satisfies the degree-regularity conditions in (9.2.25) and (9.2.26). Ar-
gue heuristically why the logarithmic typical distance result in Theorem 9.6 remains valid when (9.2.37)
is replaced by the weaker condition that (Dn(in) Dn(out) )n≥1 is uniformly integrable. Also, give an example
where this uniform integrability is true, but (9.2.37) is not.
Exercise 9.22 (Logarithmic growth of typical distances in the directed configuration model (cont.)) Let
DCMn (d) be a directed configuration model that satisfies the degree-regularity conditions in (9.2.25) and
(9.2.26). Give a formal result of the claim in Exercise 9.21 by a suitable degree-truncation argument, as
explained above (9.6.3).
Exercise 9.23 (Ultra-small distances in the directed configuration model) Let DCMn (d) be a directed
configuration model that satisfies the degree-regularity conditions in (9.2.25) and (9.2.26). Use the degree-
truncation argument, as explained above (9.6.3), to show that distDCMn (d) (o1 , o2 ) = oP (log n) when
ν = ∞.
Exercise 9.24 (Strongly connected component in temporal networks) Let G be a temporal network, in
which vertices have a time label of their birth and edges are oriented from younger to older vertices. What
do the strongly connected components of G look like?
Exercise 9.25 (Degree structure in stochastic block models) Recall the definition of the stochastic block
model in Section 9.3.1, and assume that the type regularity condition in (9.3.1) holds. What is the asymptotic
expected degree of this model? When do all vertices have the same asymptotic expected degree?
Exercise 9.26 (Giant in stochastic block models) Recall the definition of the stochastic block model in
Section 9.3.1, and assume that the type regularity condition in (9.3.1) holds. When is there a giant compo-
nent?
Exercise 9.27 (Random guessing in stochastic block models) Consider a stochastic block model with t
types as introduced in Section 9.3.1 and assume that each of the types occurs equally often. Let σ̂(v) be a
random guess, so that (σ̂(v))v∈[n] is an iid vector, with σ̂(v) = s with probability 1/t for every s ∈ [t].
Show that (9.3.2) does not hold, i.e., show that the probability that
\[
\max_{p \colon [t] \to [t]} \frac{1}{n} \sum_{v \in [n]} \Big[\mathbb{1}_{\{\hat\sigma(v) = (p \circ \sigma)(v)\}} - \frac{1}{t}\Big] \ge \varepsilon
\]
vanishes.
Exercise 9.28 (Degree structure in stochastic block models with unequal expected degrees) Let $n$ be even.
Consider the stochastic block model with two types and $n/2$ vertices of each type. Let $p_{ij} = a_1/n$ when
$i, j$ have type 1, $p_{ij} = a_2/n$ when $i, j$ have type 2, and $p_{ij} = b/n$ when $i, j$ have different types. For
$i \in \{1,2\}$ and $k \in \mathbb{N}_0$, let $N_{i,k}(n)$ denote the number of vertices of type $i$ and degree $k$. Show that
\[
\frac{N_{i,k}(n)}{n/2} \xrightarrow{\;\mathbb{P}\;} \mathbb{P}(\mathrm{Poi}(\lambda_i) = k), \qquad (9.7.7)
\]
where $\lambda_i = (a_i + b)/2$.
Exercise 9.29 (Community detection in stochastic block models with unequal expected degrees) In the
setting of Exercise 9.28, assume that a1 > a2 . Consider the following greedy community detection algo-
rithm: let σ̂(v) = 1 for the n/2 vertices v ∈ [n] of highest degree, and σ̂(v) = 2 for the remaining vertices
(breaking ties randomly when necessary). Argue that this algorithm achieves the solvability condition in
(9.3.2).
Exercise 9.30 (Parameter conditions for solvable stochastic block models) Consider the stochastic block
model in the setting of Theorem 9.12, and assume that (a − b)2 > 2(a + b), so that the community detection
problem is solvable. Show that a − b > 2 and thus also a + b > 2. Conclude that this model has a giant.
Exercise 9.31 (Parameter conditions for solvable stochastic block models (cont.)) In the setting of Exer-
cise 9.30, show that also the vertices of type 1 only (resulting in an Erdős–Rényi random graph of size n/2
and edge probability a/n) have a giant. What are the conditions for the vertices of type 2 to have a giant?
Exercise 9.32 (Degree structure in degree-corrected stochastic block models) Recall the definition of the
degree-corrected stochastic block model in (9.3.11) in Section 9.3.2, and assume that the type regularity
condition in (9.3.1) holds. Assume further that $\mathbb{E}[X_u^p] < \infty$ for some $p > 1$. What is the asymptotic
expected degree of a vertex $v$ of weight $x_v$ in this model? What are the restrictions on $(\kappa(s,r))_{s,r \in \mathcal{S}}$ such
that the expected degree of vertex $v$ with weight $x_v$ is equal to $x_v(1 + o_{\mathbb{P}}(1))$?
Exercise 9.33 (Equal average degrees in degree-corrected stochastic block models) Recall the definition
of the degree-corrected stochastic block model in (9.3.11) in Section 9.3.2, and assume that (9.3.1) holds.
Let the number of types $t \ge 2$ be arbitrary and assume that $\kappa(s,r) = b$ for all $s \ne r$, while $\kappa(s,s) = a$.
Assume that µ(s) = 1/t for every s ∈ [t]. Compute the asymptotic average degree of a vertex of type s,
and show that it is independent of s.
Exercise 9.34 (Giant in the degree-corrected stochastic block models) Recall the definition of the degree-
corrected stochastic block model in Section 9.3.2, and assume that the type regularity condition in (9.3.1)
holds. When is there a giant component?
Exercise 9.35 (Degrees in configuration models with global communities) Recall the definition of the
configuration models with global communities in Section 9.3.3, and assume that the degree regularity con-
ditions in (9.3.18), (9.3.19), and (9.3.20) hold. What is the asymptotic average degree of this model? When
do all vertices have the same asymptotic expected degree?
Exercise 9.36 (Local limit in configuration models with global communities) Recall the definition of the
configuration models with global communities in Section 9.3.3, and assume that (9.3.18), (9.3.19), and
(9.3.20) hold. What is the local limit of this model? Note: No proof is expected; a reasonable argument
suffices.
Exercise 9.37 (Giant in configuration models with global communities) Recall the definition of the con-
figuration model with global communities in Section 9.3.3, and assume that (9.3.18), (9.3.19), and (9.3.20)
hold. When is there a giant component? Note: No proof is expected; a reasonable argument suffices.
Exercise 9.38 (Degree distribution in preferential attachment model with global communities) Show that
$(p_k(\theta))_{k \ge m}$ in (9.3.29) is a probability distribution for all $\theta$, i.e., show that $\sum_{k \ge m} p_k(\theta) = 1$ and $p_k(\theta) \ge 0$ for all $k \ge m$.
Exercise 9.39 (Degree distribution in preferential attachment model with global communities) In the
preferential attachment model with global communities studied in Theorem 9.14, show that also the global
degree distribution given by $P_k(n) = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{D_v(n) = k\}}$ converges almost surely.

Exercise 9.40 (Power-law degrees in preferential attachment model with global communities) In the preferential attachment models with global communities studied in Theorem 9.14, show that the global degree
distribution has a power-law tail with exponent $\tau = 1 + 1/\max_{s \in [r]} \theta^\star(s)$, provided that $\mu(s^\star) > 0$ for
at least one $s^\star \in [r]$ satisfying $\theta^\star(s^\star) = \max_{s \in [r]} \theta^\star(s)$.
Exercise 9.41 (Clustering in model with edges and triangles) Show that the global clustering coefficient in
the model where each pair of vertices is independently connected with probability $\lambda/n$, as for $\mathrm{ER}_n(\lambda/n)$,
and each triple forms a triangle with probability $\mu/n^2$, independently for all triplets and independently of
the status of the edges, converges to $\mu/((\lambda+\mu)^2 + \mu)$.
Exercise 9.42 (Local limit in inhomogeneous random graph with communities) Recall the definition of
the inhomogeneous random graph with communities in Section 9.4.1. What is the local limit of this model?
Note: No proof is expected; a reasonable argument suffices.
Exercise 9.43 (Size-biased community size distribution in HCM) In the hierarchical configuration model
introduced in Section 9.4.2, choose a vertex o uar from [n]. Let Go be the community containing o. Show
that (9.4.16) implies that |V (Go )| converges in distribution, and identify its limiting distribution.
Exercise 9.44 (Local limit in hierarchical configuration model) Recall the definition of the hierarchical
configuration model in Theorem 9.19. What is the local limit of this model? Note: No proof is expected; a
reasonable argument suffices.
Exercise 9.45 (Law of large numbers for |Cmax | in hierarchical configuration model) Use Theorem 4.9 to
prove the law of large numbers for the giant in the hierarchical configuration model in Theorem 9.19, and
prove that ζ is given by (9.4.19).
Exercise 9.46 (Local clustering for configuration model with clustering) Recall the configuration model
with clustering defined in Section 9.4.2. Let $(D_n^{(si)}, D_n^{(tr)})$ denote the number of simple edges and triangles
incident to a uniform vertex in $[n]$, and assume that $(D_n^{(si)}, D_n^{(tr)}) \xrightarrow{\;d\;} (D^{(si)}, D^{(tr)})$ for some limiting distribution $(D^{(si)}, D^{(tr)})$. Compute the local clustering coefficient of this model under the extra assumptions
that $\mathbb{E}[D_n^{(si)}] \to \mathbb{E}[D^{(si)}] < \infty$ and $\mathbb{E}[D_n^{(tr)}] \to \mathbb{E}[D^{(tr)}] < \infty$.

Exercise 9.47 (Global clustering for configuration model with clustering) In the setting of Exercise 9.46,
compute the global clustering coefficient of this model under the extra assumption that also $\mathbb{E}[(D_n^{(si)})^2] \to \mathbb{E}[(D^{(si)})^2] < \infty$ and $\mathbb{E}[(D_n^{(tr)})^2] \to \mathbb{E}[(D^{(tr)})^2] < \infty$.
Exercise 9.48 (Single overlap in random intersection graph) Consider the random intersection graph with
prescribed communities as defined in Section 9.4.3, under the conditions of Theorem 9.21. Show that it is
unlikely for a uniform vertex to have a neighbor with which it shares two groups.
Exercise 9.49 (Local clustering in the random intersection graph) Consider the random intersection graph
with prescribed communities as defined in Section 9.4.3, under the conditions of Theorem 9.21. Show that
the local clustering coefficient converges. When is this limit strictly positive?
Exercise 9.50 (Global clustering in the random intersection graph) Consider the random intersection
graph with prescribed communities as defined in Section 9.4.3, under the conditions of Theorem 9.21. What
are the conditions on the group membership and size distributions that imply that the convergence of the
global clustering coefficient in Theorem 2.22 follows? When is the limit of the global clustering coefficient
strictly positive?
Exercise 9.51 (Degree distribution in the discrete small-world model) Recall the discrete small-world
model in Section 9.5.1 as studied in Theorem 9.26, but now with λ, ρ > 0 and k fixed. What is the limit of
the probability that a uniform vertex has degree l for l ≥ 0?
Exercise 9.52 (Degree distribution in the geometric preferential attachment model with non-uniform locations) Recall that (9.5.50) in Theorem 9.40 identifies the degree distribution of the geometric preferential attachment model at each of the elements z_i ∈ S. Conclude what the degree distribution of the entire graph is. Does it obey a degree power law and, if so, what is the degree power-law exponent?
Exercise 9.53 (Power-law degrees for the spatial preferential attachment model with influence) Prove that, for p_k in (9.5.55) and for k large,
p_k = c k^{−(1+1/(p a_1))} (1 + o(1)),    (9.7.8)
so that the spatial preferential attachment model with influence indeed has a power-law degree distribution.
Exercise 9.54 (Degree distribution in the GIRG) Investigate the degree distribution in the GIRG in Theorem 9.32 using a second-moment method.
Exercise 9.55 (Degree moments in scale-free percolation (Deijfen et al. (2013))) Recall that D_o denotes the degree of the origin in the scale-free percolation model defined in (9.5.58). Show that E[D_o^p] < ∞ when p < τ − 1 and E[D_o^p] = ∞ when p > τ − 1. In particular, the variance of the degrees is finite precisely when τ > 3.
Exercise 9.56 (Positive correlation between edge statuses in scale-free percolation) Show that, for scale-free percolation, for all distinct x, y, z and for λ > 0,
P({x, y} and {x, z} occupied) ≥ P({x, y} occupied) P({x, z} occupied),    (9.7.9)
the inequality being strict when P(W_o = 0) < 1. In other words, the edge statuses are positively correlated.
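A hedged sketch of the mechanism behind (9.7.9): conditionally on the weights, the two edge statuses are independent. Writing p(u, v) for the conditional occupation probability of an edge whose endpoints carry weights u and v (non-decreasing in each argument), and f(w) = E[p(w, W_y)], g(w) = E[p(w, W_z)], both non-decreasing,
\[
P(\{x,y\} \text{ and } \{x,z\} \text{ occupied}) = E[f(W_x)\,g(W_x)] \ge E[f(W_x)]\,E[g(W_x)] = P(\{x,y\} \text{ occupied})\,P(\{x,z\} \text{ occupied}),
\]
by Chebyshev's association inequality for the single random variable W_x.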
Exercise 9.57 (Local convergence of PageRank) Assume that G_n converges locally in probability in the marked backward sense to (G, o) ∼ µ. Use (9.6.2) to show that, for every η, ε > 0, whp,
(1/n) Σ_{v∈V(G_n)} 1{R_v^(G_n) > r} ≤ µ(R_o^(G) > r − ε) + η,    (9.7.10)
and
(1/n) Σ_{v∈V(G_n)} 1{R_v^(G_n) > r} ≥ µ(R_o^(G) > r) − η.    (9.7.11)
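The empirical tails on the left-hand sides of (9.7.10)–(9.7.11) are straightforward to evaluate on a finite graph. A minimal sketch (our code, not the book's; it assumes that the normalization in (9.6.2) makes the PageRank scores average to one, i.e., that R_v is n times the standard stationary score):

```python
# Empirical PageRank tail (1/n) * #{v : R_v > r} on a finite digraph.
# Our sketch, not the book's code; we assume R_v is n times the score
# returned by networkx's pagerank, so that the R_v average to 1.
import networkx as nx

g = nx.gnp_random_graph(2000, 3 / 2000, seed=42, directed=True)
scores = nx.pagerank(g, alpha=0.85)        # standard scores, summing to 1
n = g.number_of_nodes()
R = {v: n * s for v, s in scores.items()}  # graph-normalized: mean 1

def empirical_tail(R, r):
    """(1/n) * #{v : R_v > r}, as on the left of (9.7.10)-(9.7.11)."""
    return sum(1 for x in R.values() if x > r) / len(R)

for r in (0.5, 1.0, 2.0, 4.0):
    print(r, empirical_tail(R, r))
```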
Exercise 9.58 (Local convergence of PageRank (cont.)) Use Exercise 9.57 to complete the proof of Theorem 9.1(ii).
Appendix A
Metric Space Structure of Rooted Graphs
Abstract
In this appendix we highlight some properties of and results about metric
spaces, including separable metric spaces and Borel measures on them, as used
throughout this book. We also present some missing details in the proof that
the space of rooted graphs is a separable metric space. Finally, we discuss what
compact sets look like in this topology and relate this to tightness criteria.
as in (2.2.2) and (2.2.3) for rooted graphs. Thus, the topology on rooted graphs can be seen as a special case
of a local topology. In a similar way, the metric on marked rooted graphs in (2.3.14) and (2.3.15) can be
viewed as a local topology. In this section we discuss local topologies in the general setting.
We next show that local topologies form a Polish space:
Theorem A.6 (Local topologies form a Polish space) Assume that {[x]r : x ∈ X } is Polish for every
r ≥ 1. Then the space (X , dloc ) is a Polish space, that is, (X , dloc ) is a metric, separable, and complete
space. Furthermore, a subset A ⊂ X is pre-compact (meaning that its closure is compact) if and only if the
sets {[x]r : x ∈ A} are pre-compact for every r ≥ 0.
Proof Let us first show that dloc is a distance. The symmetry and triangle inequality will then follow
directly. The fact that dloc (x, y) = 0 precisely when x = y is also easy, since if [x]r = [y]r for all r > 0
then x = y by (A.2.1).
We next show the separability of (X, dloc). For any x ∈ X, we have dloc(x, [x]r) ≤ 1/(r + 1), and we have assumed that the set {[x]r : x ∈ X} of all restrictions of elements in X to radius r is separable. Thus, the union over r of countable dense subsets of the sets {[x]r : x ∈ X} is a countable set that is dense in (X, dloc).
For the completeness of (X , dloc ), we let (xn )n≥1 be a Cauchy sequence for dloc . Then, for every r,
the restriction [xn ]r is again Cauchy and, by the completeness of {[x]r : x ∈ X }, [xn ]r thus converges for
dX to a certain element yr ∈ {[x]r : x ∈ X}. By the continuity of x ↦ [x]r we deduce that [ys]r = yr
for any s ≥ r, and so by the coherence property (A.2.1), we can define a unique element y ∈ X such that
yr = [y]r . It is then clear that xn → y for dloc .
We complete the proof by characterizing the compact sets. The condition in the theorem is clearly necessary for A to be pre-compact, for otherwise there exist r0 ≥ 0, some ε > 0, and a sequence (xn)n≥1 in A whose restrictions of radius r0 are all at distance at least ε from each other. Such a sequence cannot admit a convergent subsequence. Conversely, a subset A satisfying the condition of the theorem is easily seen to be pre-compact for dloc: just cover it with balls of radius 1/(r + 1) centered on a 1/(r + 1)-net for dX of the restrictions of A to radius r, to get a 1/(r + 1)-net for dloc.
We next proceed to discuss the convergence of random variables on (X, dloc). We first recall that a random variable X is a measurable function from the underlying probability space (Ω, F, P) with values in the Polish space (X, dloc) endowed with the Borel σ-field denoted by Bloc. Therefore, the natural notion of convergence in distribution states that the sequence of random variables (Xn)n≥0 converges in distribution (for the local topology) towards a random variable X, which we denote as Xn →ᵈ X, if, for any bounded continuous function h : X → R,
E[h(Xn)] → E[h(X)].    (A.2.3)
The main result of this section is the following theorem:
Theorem A.7 (Convergence of finite-dimensional distributions implies tightness) Assume that {[x]r : x ∈
X } is Polish for every r ≥ 1. The local topology satisfies the following properties:
(a) A family (Xi )i∈I of random variables with values in X is tight in the local topology if and only if the
family ([Xi ]r )i∈I is tight for every r ≥ 0.
(b) Let X1 and X2 be two random variables with values in (X, dloc) such that P([X1]r ∈ A) = P([X2]r ∈ A) for any A ∈ Bloc and any r ≥ 0. Then X1 and X2 are equal in distribution.
(c) Xn →ᵈ X in the local topology when, for every r ≥ 1 and all Borel sets A ∈ Bloc,
P([Xn]r ∈ A) → P([X]r ∈ A).    (A.2.4)
Theorem A.7 is remarkable, since the convergence of P([Xn ]r ∈ A) is equivalent to the convergence of
finite-dimensional distributions. Normally, one would expect to need this convergence to be combined with
tightness to obtain convergence in distribution. Owing to the special nature of the local topology introduced
in this section (recall also Remark A.5), however, this convergence, combined with the fact that the limit is
a probability measure, implies tightness.
Proof of Theorem A.7 Part (a) follows directly from the compactness statement in Theorem A.6.
For part (b), we consider the family of events
M = { {x ∈ X : [x]r ∈ A} : A ∈ Bloc, r ≥ 0 }.    (A.2.5)
It is easy to see that the family M generates the Borel σ-field on X and moreover that M is stable under
finite intersections. It follows from the monotone class theorem that two random variables X1 and X2
agreeing on M have the same law.
For part (c), we have already seen above that the sets {[x]r ∈ A} are stable under finite intersections, and it is easy to see that any open set of the local topology can be written as a countable union of such sets. The result then follows from (Billingsley, 1968, Theorem 2.2). In particular, we deduce that for a sequence of random variables (Xn)n≥1 to converge in distribution it is necessary and sufficient that (Xn)n≥1 be tight and that P([Xn]r ∈ A) converge for every r ≥ 0 and every A ∈ Bloc. The two conditions are necessary, since if P([Xn]r ∈ A) converges to a limit that does not have full mass, then Xn →ᵈ X does not hold. From this, we conclude that if P([Xn]r ∈ A) → µ([X]r ∈ A), where µ has full mass, then Xn →ᵈ X does indeed follow.
Let [G⋆] denote the set of equivalence classes in G⋆. This is the set on which the distance dG⋆ acts.
In this section we prove that ([G⋆], dG⋆) is a Polish space:
Theorem A.8 (Rooted graphs form a Polish space) dG⋆ is a well-defined metric on [G⋆]. Further, the metric space ([G⋆], dG⋆) is Polish.
We give an explicit proof of Theorem A.8, even though completeness and separability might also be concluded from Theorem A.6, together with the observation that {[G, o]r : (G, o) ∈ G⋆} is Polish for every r ≥ 1. This must be the case since completeness is obvious, while separability follows because dG⋆((Gn, on), (Gm, om)) ≤ ε implies that B_r^{(G_n)}(o_n) ≃ B_r^{(G_m)}(o_m) for all r ≤ 1/ε − 1.
The proof of Theorem A.8 is divided into several steps. These proof steps are a little involved, since we need to deal with isomorphism classes of rooted graphs, rather than with rooted graphs themselves. This requires us to show that statements hold irrespective of the representative rooted graph chosen. We start in Proposition A.10 below by showing that dG⋆ is an ultrametric, which is a slightly stronger property than being a metric, and which also implies that (G⋆, dG⋆) is a metric space. In Proposition A.12, we show that the metric space (G⋆, dG⋆) is complete, and in Proposition A.14, we show that it is separable. After that, we can complete the proof of Theorem A.8.
In the remainder of this section, we often work with the r-neighborhoods B_r^{(G)}(o) of o in G. We emphasize that we consider B_r^{(G)}(o) to be a rooted graph, with root o (recall (2.2.1)).
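To make the metric concrete, here is a small computational sketch (ours, not from the book) that evaluates dG⋆ between two finite rooted graphs by comparing their rooted r-neighborhoods; networkx is assumed, the root is encoded as a node mark so that isomorphisms must map root to root, and for graphs that agree to large depth the search is truncated at max_radius.

```python
# A small illustration (not the book's code) of the metric d_{G*}: two rooted
# graphs are at distance 1/(r*+1), where r* is the largest radius at which
# their rooted r-neighborhoods are isomorphic (with root mapped to root).
import networkx as nx
from networkx.algorithms import isomorphism

def rooted_ball(graph, root, radius):
    """The neighborhood B_r(o), kept rooted by marking the root node."""
    ball = nx.ego_graph(graph, root, radius=radius)
    nx.set_node_attributes(ball, False, "is_root")
    ball.nodes[root]["is_root"] = True
    return ball

def d_star(g1, o1, g2, o2, max_radius=30):
    """1/(r*+1), truncating the search at max_radius; B_0 always matches."""
    root_match = isomorphism.categorical_node_match("is_root", False)
    r_star = 0
    for r in range(1, max_radius + 1):
        if not nx.is_isomorphic(rooted_ball(g1, o1, r),
                                rooted_ball(g2, o2, r),
                                node_match=root_match):
            break
        r_star = r
    return 1 / (r_star + 1)

# A 6-cycle and a long path rooted at its middle agree up to radius 2 but
# not radius 3, so their distance is 1/(2+1) = 1/3:
print(d_star(nx.cycle_graph(6), 0, nx.path_graph(11), 5))
```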
Our aim is to use (ψr)r≥0 to construct an isomorphism between (G1, o1) and (G2, o2).
Set V_k^{(G_1)} = V(B_k^{(G_1)}(o_1)). Let ψr|_{V_0^{(G_1)}} be the restriction of ψr to V_0^{(G_1)} = {o1}. Then we know that ψr(v) = o2 for every v ∈ V_0^{(G_1)} and r ≥ 0. We next let ψr|_{V_1^{(G_1)}} be the restriction of ψr to V_1^{(G_1)}. Then ψr|_{V_1^{(G_1)}} is an isomorphism between B_1^{(G_1)}(o_1) and B_1^{(G_2)}(o_2) for every r. Since there are only finitely many such isomorphisms, the same isomorphism, say φ′1, needs to be repeated infinitely many times in the sequence (ψr|_{V_1^{(G_1)}})_{r≥1}. Let N1 denote the values of r for which ψr|_{V_1^{(G_1)}} = φ′1.
Now we extend this argument to k = 2. Let ψr|_{V_2^{(G_1)}} be the restriction of ψr to V_2^{(G_1)}. Again, ψr|_{V_2^{(G_1)}} is an isomorphism between B_2^{(G_1)}(o_1) and B_2^{(G_2)}(o_2) for every r. Since there are again only finitely many such isomorphisms, the same isomorphism, say φ′2, needs to be repeated infinitely many times in the sequence (ψr|_{V_2^{(G_1)}})_{r∈N1}. Let N2 denote the values of r ∈ N1 for which ψr|_{V_2^{(G_1)}} = φ′2.
We next generalize this argument to general k ≥ 2. Let ψr|_{V_k^{(G_1)}} be the restriction of ψr to V_k^{(G_1)}. Again, ψr|_{V_k^{(G_1)}} is an isomorphism between B_k^{(G_1)}(o_1) and B_k^{(G_2)}(o_2) for every r. Since there are again only finitely many such isomorphisms, the same isomorphism, say φ′k, needs to be repeated infinitely many times in the sequence (ψr|_{V_k^{(G_1)}})_{r∈N_{k−1}}. Let Nk denote the values of r ∈ N_{k−1} for which ψr|_{V_k^{(G_1)}} = φ′k. Writing U1 = V_1^{(G_1)} and Uk = V_k^{(G_1)} \ V_{k−1}^{(G_1)} for k ≥ 2, we define
ψ(v) = ψ∞(v) = Σ_{k≥1} φ′k(v) 1{v ∈ Uk}.    (A.3.7)
We claim that ψ is the desired isomorphism between (G1, o1) and (G2, o2). The map ψ is clearly bijective, since φ′k : Uk → φ′k(Uk) is bijective. Further, let u, v ∈ V(G1), and denote
k = max{distG1(o1, u), distG1(o1, v)}.
Then u, v ∈ V_k^{(G_1)}. Because φ′k is an isomorphism between B_k^{(G_1)}(o_1) and B_k^{(G_2)}(o_2), it follows that φ′k(u), φ′k(v) ∈ V(B_k^{(G_2)}(o_2)), and further that {φ′k(u), φ′k(v)} ∈ E(B_k^{(G_2)}(o_2)) precisely when {u, v} ∈ E(B_k^{(G_1)}(o_1)). Since ψ = φ′k on V_k^{(G_1)}, it then also follows that {ψ(u), ψ(v)} ∈ E(B_k^{(G_2)}(o_2)) precisely when {u, v} ∈ E(B_k^{(G_1)}(o_1)), as required. Finally, ψ(o1) = φ′1(o1) = o2, since φ′k(o1) = o2 for every k ≥ 1. This completes the proof.
Proof of Proposition A.9. We note that if (Ĝ1, ô1) ≃ (G1, o1) and (Ĝ2, ô2) ≃ (G2, o2), then B_r^{(G_1)}(o_1) ≃ B_r^{(G_2)}(o_2) if and only if B_r^{(Ĝ_1)}(ô_1) ≃ B_r^{(Ĝ_2)}(ô_2). Therefore dG⋆((G1, o1), (G2, o2)) is independent of the exact choice of representatives in the equivalence classes of (G1, o1) and (G2, o2). In particular, dG⋆((G1, o1), (G2, o2)) is constant on such equivalence classes. This makes dG⋆([G1, o1], [G2, o2]) well defined for [G1, o1], [G2, o2] ∈ [G⋆].
Proof of Proposition A.10. (a) Assume that dG⋆((G1, o1), (G2, o2)) = 0. Then B_r^{(G_1)}(o_1) ≃ B_r^{(G_2)}(o_2) for all r ≥ 0, so that, by Lemma A.11, we also have (G1, o1) ≃ (G2, o2), as required.
The proof of (b) is trivial and omitted.
For (c) and i, j ∈ [3], let
r_{ij} = sup{r : B_r^{(G_i)}(o_i) ≃ B_r^{(G_j)}(o_j)}.    (A.3.8)
Then B_r^{(G_1)}(o_1) ≃ B_r^{(G_3)}(o_3) for all r ≤ r_{13} and B_r^{(G_2)}(o_2) ≃ B_r^{(G_3)}(o_3) for all r ≤ r_{23}. We conclude that B_r^{(G_1)}(o_1) ≃ B_r^{(G_2)}(o_2) for all r ≤ min{r_{13}, r_{23}}, so that r_{12} ≥ min{r_{13}, r_{23}}. This implies that
1/(r_{12} + 1) ≤ max{1/(r_{13} + 1), 1/(r_{23} + 1)},
which in turn implies the claim (recall (2.2.2)).
We define φr : V(B_r^{(G_r)}(o_r)) → Vr recursively as follows. Let φ0 be the unique isomorphism from V(B_0^{(G_0)}(o_0)) = {o0} to V0 = {1}. Let ψr be an isomorphism between (G_{r−1}, o_{r−1}) and B_{r−1}^{(G_r)}(o_r), and let ηr be an arbitrary bijection from V(Gr) \ V(B_{r−1}^{(G_r)}(o_r)) to Vr \ V_{r−1}. Define
φr(v) = φ_{r−1}(ψ_r^{−1}(v)) for v ∈ V(B_{r−1}^{(G_r)}(o_r)), and φr(v) = ηr(v) for v ∈ V(Gr) \ V(B_{r−1}^{(G_r)}(o_r)).    (A.3.9)
Proof of Theorem A.8. The function dG⋆ is well defined on [G⋆] × [G⋆] by Proposition A.9. Proposition A.10 implies that dG⋆ is an (ultra)metric on [G⋆]. Finally, Proposition A.12 proves that ([G⋆], dG⋆) is complete, while Proposition A.14 proves that ([G⋆], dG⋆) is separable. Thus, ([G⋆], dG⋆) is a Polish space.
Thus, H⋆(r) contains those rooted graphs whose r-neighborhood is the same as that of a rooted graph in H⋆. Clearly, H⋆(r) ↓ H⋆ as r → ∞. Therefore, also µ(H⋆(r)) ↓ µ(H⋆) and µ′(H⋆(r)) ↓ µ′(H⋆). Finally, note that (G, o) ∈ H⋆(r) if and only if B_r^{(G)}(o) ∈ H⋆(r). Thus,
µ(H⋆(r)) = Σ_{H⋆∈H⋆(r)} µ(B_r^{(G)}(o) ≃ H⋆)    (A.3.16)
(where we realize that the fact that the sum is over equivalence classes makes the events {B_r^{(G)}(o) ≃ H⋆} disjoint). Since µ(B_r^{(G)}(o) ≃ H⋆) = µ′(B_r^{(G)}(o) ≃ H⋆), we conclude that
µ(H⋆(r)) = Σ_{H⋆∈H⋆(r)} µ(B_r^{(G)}(o) ≃ H⋆) = Σ_{H⋆∈H⋆(r)} µ′(B_r^{(G)}(o) ≃ H⋆) = µ′(H⋆(r)).    (A.3.17)
∆r(G, o) = max{d_v^{(G)} : v ∈ V(B_r^{(G)}(o))}, where d_v^{(G)} denotes the degree of v ∈ V(G). Then, a closed family of equivalence classes of rooted graphs [K] ⊆ [G⋆] is compact if and only if, for every r ≥ 1,
sup_{(G,o)∈K} ∆r(G, o) < ∞.    (A.3.19)
Proof Recall from (Rudin, 1991, Theorem A.4) that a closed subset K of a complete metric space is compact precisely when it is totally bounded, meaning that, for every ε > 0, the set K can be covered by finitely many balls of radius ε. As a result, for
every r ≥ 1, there must be rooted graphs (F1, o1), . . . , (Fℓ, oℓ) such that K is covered by the finitely many open sets
{(G, o) : B_r^{(G)}(o) ≃ B_r^{(F_i)}(o_i)}.    (A.3.20)
Equivalently, every (G, o) ∈ K satisfies B_r^{(G)}(o) ≃ B_r^{(F_i)}(o_i) for some i ∈ [ℓ]. In turn, this is equivalent to the statement that the set
Ar = {B_r^{(G)}(o) : (G, o) ∈ K}    (A.3.21)
is finite for every r ≥ 1.
We finally prove that Ar is finite for every r ≥ 1 precisely when (A.3.19) holds. Denote ∆r = sup_{(G,o)∈K} ∆r(G, o). If ∆r is finite for every r ≥ 1 then, because every (G, o) ∈ K is connected, the graphs B_r^{(G)}(o) can have at most 1 + ∆r + · · · + ∆r^r vertices, so that there are only finitely many possibilities for B_r^{(G)}(o) up to isomorphism, and Ar is indeed finite.
By assumption, limd→∞ f(d) = 0. Write m(G) = E[d_{o_G}^{(G)}]. Thus, 1 ≤ m(G) ≤ f(0) < ∞. Write µ⋆G for the degree-biased probability measure on {(G, v) : v ∈ V(G)}, that is,
µ⋆G[(G, v)] = (d_v^{(G)} / m(G)) × µG[(G, v)],    (A.3.23)
and oG for the corresponding root. Since µG ≤ m(G)µ⋆G ≤ f(0)µ⋆G, it suffices to show that {µ⋆G : G ∈ A} is tight. Note that {d_{o_G}^{(G)} : G ∈ A} is tight by assumption.
For r ∈ N, let F_r^M(v) be the event that there is some vertex at distance at most r from v whose degree is larger than M. Let X be a uniform random neighbor of oG. Because µ⋆G is a stationary measure for simple random walk, F_r^M(oG) and F_r^M(X) have the same probability. Also,
P(F_{r+1}^M(oG) | d_{o_G}^{(G)}) ≤ d_{o_G}^{(G)} P(F_r^M(X) | d_{o_G}^{(G)}).    (A.3.24)
We claim that, for all r ∈ N and ε > 0, there exists M < ∞ such that P(F_r^M(X)) ≤ ε for all G ∈ A. This clearly implies that {µ⋆G : G ∈ A} is tight. We prove the claim by induction on r.
The statement for r = 0 is trivial. Given that the property holds for r, let us now show it for r + 1. Given ε > 0, choose d sufficiently large that P(d_{o_G}^{(G)} > d) ≤ ε/2 for all G ∈ A. Also, choose M sufficiently large that P(F_r^M(oG)) ≤ ε/(2d) for all G ∈ A. Write F for the event that d_{o_G}^{(G)} > d. Then, by conditioning on d_{o_G}^{(G)} and using (A.3.24), we see that
P(F_{r+1}^M(oG)) ≤ ε/2 + E[1_{F^c} d_{o_G}^{(G)} P(F_r^M(X) | d_{o_G}^{(G)})] ≤ ε/2 + d P(F_r^M(X)) = ε/2 + d P(F_r^M(oG)) ≤ ε,
which advances the induction and proves the claim.
Abbe, E., and Sandon, C. 2018. Proof of the achievability conjectures for the general stochastic block
model. Comm. Pure Appl. Math., 71(7), 1334–1406.
Abdullah, M. A., Bode, M., and Fountoulakis, N. 2017. Typical distances in a geometric model for complex networks. Internet Math., 38 pp.
Achlioptas, D., D’Souza, R., and Spencer, J. 2009. Explosive percolation in random networks. Science,
323(5920), 1453–1455.
Aiello, W., Bonato, A., Cooper, C., Janssen, J., and Pralat, P. 2008. A spatial web graph model with local
influence regions. Internet Math., 5(1-2), 175–196.
Aldous, D. 1985. Exchangeability and related topics. Pages 1–198 of: École d’été de probabilités de
Saint-Flour, XIII–1983. Lecture Notes in Math., vol. 1117. Springer.
Aldous, D. 1991. Asymptotic fringe distributions for general families of random trees. Ann. Appl. Probab.,
1(2), 228–266.
Aldous, D., and Lyons, R. 2007. Processes on unimodular random networks. Electron. J. Probab., 12(54),
1454–1508.
Aldous, D., and Steele, J.M. 2004. The objective method: probabilistic combinatorial optimization and
local weak convergence. Pages 1–72 of: Probability on discrete structures. Encyclopaedia Math. Sci.,
vol. 110. Springer.
Anantharam, V., and Salez, J. 2016. The densest subgraph problem in sparse random graphs. Ann. Appl.
Probab., 26(1), 305–327.
Andreis, L., König, W., and Patterson, R. 2021. A large-deviations principle for all the cluster sizes of a
sparse Erdős–Rényi graph. Random Structures Algorithms, 59(4), 522–553.
Andreis, L., König, W., and Patterson, R. 2023. A large-deviations principle for all the components in a
sparse inhomogeneous random graph. Probab. Theory Rel. Fields, 186(1-2), 521–620.
Angel, O., and Schramm, O. 2003. Uniform infinite planar triangulations. Comm. Math. Phys., 241(2-3),
191–213.
Antunović, T., Mossel, E., and Rácz, M. 2016. Coexistence in preferential attachment networks. Combin.
Probab. Comput., 25(6), 797–822.
Arratia, R., Barbour, A. D., and Tavaré, S. 2003. Logarithmic combinatorial structures: a probabilistic approach. EMS Monographs in Mathematics. European Mathematical Society (EMS), Zürich.
Artico, I., Smolyarenko, I., Vinciotti, V., and Wit, E. C. 2020. How rare are power-law networks really? Proc. Roy. Soc. A, 476(2241), 20190742.
Athreya, K., and Ney, P. 1972. Branching processes. New York: Springer-Verlag. Die Grundlehren der
mathematischen Wissenschaften, Band 196.
Backhausz, Á., and Szegedy, B. 2022. Action convergence of operators and graphs. Canad. J. Math., 74(1),
72–121.
Ball, F., and Neal, P. 2002. A general model for stochastic SIR epidemics with two levels of mixing. Math.
Biosci., 180, 73–102. John A. Jacquez memorial volume.
Ball, F., and Neal, P. 2004. Poisson approximations for epidemics with two levels of mixing. Ann. Probab.,
32(1B), 1168–1200.
Ball, F., and Neal, P. 2008. Network epidemic models with two levels of mixing. Math. Biosci., 212(1),
69–87.
Ball, F., and Neal, P. 2017. The asymptotic variance of the giant component of configuration model random
graphs. Ann. Appl. Probab., 27(2), 1057–1092.
Ball, F., Mollison, D., and Scalia-Tomba, G. 1997. Epidemics with two levels of mixing. Ann. Appl.
Probab., 7(1), 46–89.
Ball, F., Sirl, D., and Trapman, P. 2009. Threshold behaviour and final outcome of an epidemic on a random
network with household structure. Adv. Appl. Probab., 41(3), 765–796.
Ball, F., Sirl, D., and Trapman, P. 2010. Analysis of a stochastic SIR epidemic on a random network
incorporating household structure. Math. Biosci., 224(2), 53–73.
Banerjee, S., and Olvera-Cravioto, M. 2022. PageRank asymptotics on directed preferential attachment
networks. Ann. Appl. Probab., 32(4), 3060–3084.
Banerjee, S., Deka, P., and Olvera-Cravioto, M. 2023. Local weak limits for collapsed branching processes with random out-degrees. arXiv:2302.00562 [math.PR].
Barabási, A.-L. 2002. Linked: The new science of networks. Perseus Publishing.
Barabási, A.-L. 2016. Network science. Cambridge University Press.
Barabási, A.-L. 2018. Love is all you need: Clauset’s fruitless search for scale-free networks. Blog post
available at www.barabasilab.com/post/love-is-all-you-need.
Barabási, A.-L., and Albert, R. 1999. Emergence of scaling in random networks. Science, 286(5439),
509–512.
Barbour, A. D., and Reinert, G. 2001. Small worlds. Random Structures Algorithms, 19(1), 54–74.
Barbour, A. D., and Reinert, G. 2004. Correction: “Small worlds” [Random Structures Algorithms 19(1)
(2001) 54–74; MR1848027]. Random Structures Algorithms, 25(1), 115.
Barbour, A. D., and Reinert, G. 2006. Discrete small world networks. Electron. J. Probab., 11(47), 1234–
1283 (electronic).
Barbour, A. D., and Röllin, A. 2019. Central limit theorems in the configuration model. Ann. Appl. Probab.,
29(2), 1046–1069.
Beirlant, J., Goegebeur, Y., Segers, J., and Teugels, J. 2006. Statistics of extremes: theory and applications.
John Wiley and Sons.
Bender, E. A., and Canfield, E. R. 1978. The asymptotic number of labelled graphs with given degree
sequences. J. Combin. Theory (A), 24, 296–307.
Benjamini, I., and Schramm, O. 2001. Recurrence of distributional limits of finite planar graphs. Electron.
J. Probab., 6(23), 13 pp. (electronic).
Benjamini, I., Kesten, H., Peres, Y., and Schramm, O. 2004. Geometry of the uniform spanning forest:
transitions in dimensions 4, 8, 12, ... Ann. Math. (2), 160(2), 465–491.
Benjamini, I., Lyons, R., and Schramm, O. 2015. Unimodular random trees. Ergodic Theory Dynam.
Systems, 35(2), 359–373.
Berger, N. 2002. Transience, recurrence and critical behavior for long-range percolation. Comm. Math.
Phys., 226(3), 531–558.
Berger, N., Borgs, C., Chayes, J. T., D’Souza, R. M., and Kleinberg, R. D. 2004. Competition-induced
preferential attachment. Pages 208–221 of: Automata, languages and programming. Lecture Notes in
Comput. Sci., vol. 3142. Springer.
Berger, N., Borgs, C., Chayes, J. T., D’Souza, R. M., and Kleinberg, R. D. 2005. Degree distribution of
competition-induced preferential attachment graphs. Combin. Probab. Comput., 14(5-6), 697–721.
Berger, N., Borgs, C., Chayes, J., and Saberi, A. 2014. Asymptotic behavior and distributional limits of
preferential attachment graphs. Ann. Probab., 42(1), 1–40.
Bhamidi, S., Bresler, G., and Sly, A. 2008. Mixing time of exponential random graphs. Pages 803–812 of:
FOCS ’08: Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science.
IEEE Computer Society.
Bhamidi, S., Bresler, G., and Sly, A. 2011. Mixing time of exponential random graphs. Ann. Appl. Probab.,
21(6), 2146–2170.
Bhamidi, S., Evans, S., and Sen, A. 2012. Spectra of large random trees. J. Theoret. Probab., 25(3),
613–654.
Bhattacharya, A., Chen, B., van der Hofstad, R., and Zwart, B. 2020. Consistency of the PLFit estimator
for power-law data. arXiv:2002.06870 [math.PR].
Billingsley, P. 1968. Convergence of probability measures. John Wiley and Sons.
Bingham, N. H., Goldie, C. M., and Teugels, J. L. 1989. Regular variation. Encyclopedia of Mathematics
and its Applications, vol. 27. Cambridge University Press.
Biskup, M. 2004. On the scaling of the chemical distance in long-range percolation models. Ann. Probab.,
32(4), 2938–2977.
Biskup, M., and Lin, J. 2019. Sharp asymptotic for the chemical distance in long-range percolation. Random
Structures Algorithms, 55(3), 560–583.
Bläsius, T., Friedrich, T., and Krohmer, A. 2018. Cliques in hyperbolic random graphs. Algorithmica,
80(8), 2324–2344.
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. 2008. Fast unfolding of communities in
large networks. J. Statis. Mech.: Theory and Experiment, 2008(10).
Bloznelis, M. 2009. A note on log log distances in a power law random intersection graph. arXiv:0911.5127
[math.PR].
Bloznelis, M. 2010a. Component evolution in general random intersection graphs. SIAM J. Discrete Math.,
24(2), 639–654.
Bloznelis, M. 2010b. The largest component in an inhomogeneous random intersection graph with cluster-
ing. Electron. J. Combin., 17(1), Research Paper 110, 17.
Bloznelis, M. 2013. Degree and clustering coefficient in sparse random intersection graphs. Ann. Appl.
Probab., 23(3), 1254–1289.
Bloznelis, M., Götze, F., and Jaworski, J. 2012. Birth of a strongly connected giant in an inhomogeneous
random digraph. J. Appl. Probab., 49(3), 601–611.
Bloznelis, M., Godehardt, E., Jaworski, J., Kurauskas, V., and Rybarczyk, K. 2015. Recent progress in com-
plex network analysis: models of random intersection graphs. Pages 69–78 of: Data science, learning
by latent structures, and knowledge discovery. Springer.
Bode, M., Fountoulakis, N., and Müller, T. 2015. On the largest component of a hyperbolic model of
complex networks. Electron. J. Combin., 22(3), Paper 3.24, 46.
Boguñá, M., Papadopoulos, F., and Krioukov, D. 2010. Sustaining the internet with hyperbolic mapping. Nature Commun., 1(1), 1–8.
Bohman, T., and Frieze, A. 2001. Avoiding a giant component. Random Structures Algorithms, 19(1),
75–85.
Bohman, T., and Frieze, A. 2002. Addendum to “Avoiding a giant component” [Random Structures Algo-
rithms 19(1) (2001), 75–85; MR1848028]. Random Structures Algorithms, 20(1), 126–130.
Boldi, P., and Vigna, S. 2004. The WebGraph Framework I: compression techniques. Pages 595–601 of:
Proc. 13th International World Wide Web Conference (WWW 2004). ACM Press.
Boldi, P., Rosa, M., Santini, M., and Vigna, S. 2011. Layered label propagation: a multiresolution
coordinate-free ordering for compressing social networks. Pages 587–596 of: Proceedings of the 20th
International Conference on the World Wide Web. ACM Press.
Bollobás, B. 1980. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs.
European J. Combin., 1(4), 311–316.
Bollobás, B. 2001. Random graphs. Second edn. Cambridge Studies in Advanced Mathematics, vol. 73.
Cambridge University Press.
Bollobás, B., and Fernandez de la Vega, W. 1982. The diameter of random regular graphs. Combinatorica,
2(2), 125–134.
Bollobás, B., and Riordan, O. 2004a. The diameter of a scale-free random graph. Combinatorica, 24(1),
5–34.
Bollobás, B., and Riordan, O. 2004b. Shortest paths and load scaling in scale-free trees. Phys. Rev. E, 69,
036114.
Bollobás, B., and Riordan, O. 2006. Percolation. Cambridge University Press.
Bollobás, B., and Riordan, O. 2015. An old approach to the giant component problem. J. Combin. Theory
Ser. B, 113, 236–260.
Bollobás, B., Riordan, O., Spencer, J., and Tusnády, G. 2001. The degree sequence of a scale-free random
graph process. Random Structures Algorithms, 18(3), 279–290.
Bollobás, B., Janson, S., and Riordan, O. 2005. The phase transition in the uniformly grown random graph
has infinite order. Random Structures Algorithms, 26(1-2), 1–36.
Bollobás, B., Janson, S., and Riordan, O. 2007. The phase transition in inhomogeneous random graphs.
Random Structures Algorithms, 31(1), 3–122.
Bollobás, B., Janson, S., and Riordan, O. 2011. Sparse random graphs with clustering. Random Structures
Algorithms, 38(3), 269–323.
Bordenave, C. 2016. Lecture notes on random graphs and probabilistic combinatorial optimization. Version April 8, 2016. Available at www.math.univ-toulouse.fr/~bordenave/coursRG.pdf.
Bordenave, C., and Caputo, P. 2015. Large deviations of empirical neighborhood distribution in sparse
random graphs. Probab. Theory Rel. Fields, 163(1-2), 149–222.
Bordenave, C., and Lelarge, M. 2010. Resolvent of large random graphs. Random Structures Algorithms,
37(3), 332–352.
Bordenave, C., Lelarge, M., and Salez, J. 2011. The rank of diluted random graphs. Ann. Probab., 39(3),
1097–1121.
Bordenave, C., Lelarge, M., and Salez, J. 2013. Matchings on infinite graphs. Probab. Theory Rel. Fields,
157(1-2), 183–208.
Bordenave, C., Lelarge, M., and Massoulié, L. 2018. Nonbacktracking spectrum of random graphs: com-
munity detection and nonregular Ramanujan graphs. Ann. Probab., 46(1), 1–71.
Box, G. E. P. 1976. Science and statistics. J. Amer. Statist. Assoc., 71(356), 791–799.
Box, G. E. P. 1979. Robustness in the strategy of scientific model building. Pages 201–236 of: Robustness
in statistics. Elsevier.
Bringmann, K., Keusch, R., and Lengler, J. 2017. Sampling geometric inhomogeneous random graphs in
linear time. In: Proceeding of the 25th Annual European Symposium on Algorithms (ESA 2017). Schloss
Dagstuhl-Leibniz-Zentrum fuer Informatik.
Bringmann, K., Keusch, R., and Lengler, J. 2019. Geometric inhomogeneous random graphs. Theoret.
Comput. Sci., 760, 35–54.
Bringmann, K., Keusch, R., and Lengler, J. 2020. Average distance in a general class of scale-free networks
with underlying geometry. arXiv: 1602.05712 [cs.DM].
Britton, T., Deijfen, M., and Martin-Löf, A. 2006. Generating simple random graphs with prescribed degree
distribution. J. Statist. Phys., 124(6), 1377–1397.
Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J.
2000. Graph structure in the Web. Computer Networks, 33, 309–320.
Broido, A., and Clauset, A. 2019. Scale-free networks are rare. Nature Commun., 10(1), 1017.
Cai, X. S., and Perarnau, G. 2021. The giant component of the directed configuration model revisited.
ALEA Lat. Am. J. Probab. Math. Statist., 18(2), 1517–1528.
Cai, X. S., and Perarnau, G. 2023. The diameter of the directed configuration model. Ann. Inst. Henri
Poincaré Probab. Stat., 59(1), 244–270.
Callaway, D. S., Hopcroft, J. E., Kleinberg, J. M., Newman, M. E. J., and Strogatz, S. H. 2001. Are randomly
grown graphs really random? Phys. Rev. E, 64, 041902.
Cao, J., and Olvera-Cravioto, M. 2020. Connectivity of a general class of inhomogeneous random digraphs.
Random Structures Algorithms, 56(3), 722–774.
Caravenna, F., Garavaglia, A., and van der Hofstad, R. 2019. Diameter in ultra-small scale-free random
graphs. Random Structures Algorithms, 54(3), 444–498.
Chakraborty, S., van der Hofstad, R., and den Hollander, F. 2021. Sparse random graphs with many trian-
gles. arXiv:2112.06526 [math.PR].
Chatterjee, S. 2017. Large deviations for random graphs. Lecture Notes in Mathematics, vol. 2197.
Springer. Lecture notes from the 45th Probability Summer School held in Saint-Flour, June 2015.
Chatterjee, S., and Diaconis, P. 2013. Estimating and understanding exponential random graph models.
Ann. Statist., 41(5), 2428–2461.
Chatterjee, S., and Durrett, R. 2009. Contact processes on random graphs with power law degree distribu-
tions have critical value 0. Ann. Probab., 37(6), 2332–2356.
Chatterjee, S., and Varadhan, S. R. S. 2011. The large deviation principle for the Erdős–Rényi random
graph. European J. Combin., 32(7), 1000–1017.
Chen, N., and Olvera-Cravioto, M. 2013. Directed random graphs with given degree distributions. Stoch.
Syst., 3(1), 147–186.
Chung, F., and Lu, L. 2001. The diameter of sparse random graphs. Adv. in Appl. Math., 26(4), 257–279.
Chung, F., and Lu, L. 2002a. The average distances in random graphs with given expected degrees. Proc.
Natl. Acad. Sci. USA, 99(25), 15879–15882 (electronic).
Chung, F., and Lu, L. 2002b. Connected components in random graphs with given expected degree se-
quences. Ann. Comb., 6(2), 125–145.
Chung, F., and Lu, L. 2003. The average distance in a random graph with given expected degrees. Internet
Math., 1(1), 91–113.
Chung, F., and Lu, L. 2004. Coupling online and offline analyses for random power law graphs. Internet
Math., 1(4), 409–461.
Chung, F., and Lu, L. 2006a. Complex graphs and networks. CBMS Regional Conference Series in Math-
ematics, vol. 107.
Chung, F., and Lu, L. 2006b. The volume of the giant component of a random graph with given expected
degrees. SIAM J. Discrete Math., 20, 395–411.
Clauset, A., Shalizi, C., and Newman, M. E. J. 2009. Power-law distributions in empirical data. SIAM
Review, 51(4), 661–703.
Cohen, R., and Havlin, S. 2003. Scale-free networks are ultrasmall. Phys. Rev. Lett., 90, 058701, 1–4.
Colenbrander, D. 2022. Ultra-small world phenomenon in the directed configuration model. M.Phil. thesis,
Eindhoven University of Technology.
Collevecchio, A., Cotar, C., and LiCalzi, M. 2013. On a preferential attachment and generalized Pólya’s
urn model. Ann. Appl. Probab., 23(3), 1219–1253.
Cooper, C., and Frieze, A. 2004. The size of the largest strongly connected component of a random digraph
with a given degree sequence. Combin. Probab. Comput., 13(3), 319–337.
Corten, R. 2012. Composition and structure of a large online social network in the Netherlands. PLOS
ONE, 7(4), 1–8.
Coscia, M. 2021. The atlas for the aspiring network scientist. arXiv:2101.00863 [cs.CY].
Csárdi, G. 2006. Dynamics of citation networks. Pages 698–709 of: Proceedings of the International
Conference on Artificial Neural Networks 2006. Lecture Notes in Computer Science, vol. 4131. Springer.
Curien, N. 2018. Random graphs: the local convergence perspective. Version October 17, 2018. Available at www.imo.universite-paris-saclay.fr/~curien/enseignement.html.
Danielsson, J., de Haan, L., Peng, L., and de Vries, C. G. 2001. Using a bootstrap method to choose the
sample fraction in tail index estimation. J. Multivariate Anal., 76(2), 226–248.
Darling, D. A. 1970. The Galton–Watson process with infinite mean. J. Appl. Probab., 7, 455–456.
Davies, P. L. 1978. The simple branching process: a note on convergence when the mean is infinite. J. Appl.
Probab., 15(3), 466–480.
Decelle, A., Krzakala, F., Moore, C., and Zdeborová, L. 2011. Asymptotic analysis of the stochastic block
model for modular networks and its algorithmic applications. Phys. Rev. E, 84(6), 066106.
Deijfen, M. 2009. Stationary random graphs with prescribed iid degrees on a spatial Poisson process.
Electron. Commun. Probab., 14, 81–89.
Deijfen, M., and Jonasson, J. 2006. Stationary random graphs on Z with prescribed iid degrees and finite
mean connections. Electron. Commun. Probab., 11, 336–346 (electronic).
Deijfen, M., and Kets, W. 2009. Random intersection graphs with tunable degree distribution and clustering.
Probab. Engrg. Inform. Sci., 23(4), 661–674.
Deijfen, M., and Meester, R. 2006. Generating stationary random graphs on Z with prescribed independent,
identically distributed degrees. Adv. Appl. Probab., 38(2), 287–298.
Deijfen, M., van den Esker, H., van der Hofstad, R., and Hooghiemstra, G. 2009. A preferential attachment
model with random initial degrees. Ark. Mat., 47(1), 41–72.
Deijfen, M., Häggström, O., and Holroyd, A. 2012. Percolation in invariant Poisson graphs with i.i.d.
degrees. Ark. Mat., 50(1), 41–58.
Deijfen, M., van der Hofstad, R., and Hooghiemstra, G. 2013. Scale-free percolation. Ann. Instit. Henri
Poincaré (B) Prob. Statist., 49(3), 817–838.
Dembo, A., and Montanari, A. 2010a. Gibbs measures and phase transitions on sparse random graphs.
Braz. J. Probab. Statist., 24(2), 137–211.
Dembo, A., and Montanari, A. 2010b. Ising models on locally tree-like graphs. Ann. Appl. Probab., 20(2),
565–592.
Deprez, P., and Wüthrich, M. 2019. Scale-free percolation in continuum space. Commun. Math. Statist.,
7(3), 269–308.
Deprez, P., Hazra, R., and Wüthrich, M. 2015. Inhomogeneous long-range percolation for real-life network
modeling. Risks, 3(1), 1–23.
Dereich, S., and Mörters, P. 2009. Random networks with sublinear preferential attachment: degree evolu-
tions. Electron. J. Probab., 14, 1222–1267.
Dereich, S., and Mörters, P. 2011. Random networks with concave preferential attachment rule. Jahresber.
Dtsch. Math.-Ver., 113(1), 21–40.
Dereich, S., and Mörters, P. 2013. Random networks with sublinear preferential attachment: the giant
component. Ann. Probab., 41(1), 329–384.
Dereich, S., Mönch, C., and Mörters, P. 2012. Typical distances in ultrasmall random networks. Adv. Appl.
Probab., 44(2), 583–601.
Dereich, S., Mönch, C., and Mörters, P. 2017. Distances in scale free networks at criticality. Electron. J.
Probab., 22, Paper No. 77, 38.
Diaconis, P. 1992. Sufficiency as statistical symmetry. Pages 15–26 of: American Mathematical Society
centennial publications, Vol. II.
Ding, J., Kim, J. H., Lubetzky, E., and Peres, Y. 2010. Diameters in supercritical random graphs via first
passage percolation. Combin. Probab. Comput., 19(5-6), 729–751.
Ding, J., Kim, J. H., Lubetzky, E., and Peres, Y. 2011. Anatomy of a young giant component in the random
graph. Random Structures Algorithms, 39(2), 139–178.
Dommers, S., van der Hofstad, R., and Hooghiemstra, G. 2010. Diameters in preferential attachment graphs.
J. Statist. Phys., 139, 72–107.
Dorogovtsev, S. N., Mendes, J. F. F., and Samukhin, A. N. 2000. Structure of growing networks with
preferential linking. Phys. Rev. Lett., 85(21), 4633–4636.
Dort, L., and Jacob, E. 2023. Local weak limit of dynamical inhomogeneous random graphs. arXiv:
2303.17437 [math.PR].
Draisma, G., de Haan, L., Peng, L., and Pereira, T. 1999. A bootstrap-based method to achieve optimality
in estimating the extreme-value index. Extremes, 2(4), 367–404.
Drees, H., Janßen, A., Resnick, S., and Wang, T. 2020. On a minimum distance procedure for threshold
selection in tail analysis. SIAM J. Math. Data Sci., 2(1), 75–102.
Drmota, M. 2009. Random trees: an interplay between combinatorics and probability. Springer.
Durrett, R. 2003. Rigorous result for the CHKNS random graph model. Pages 95–104 of: Discrete random walks (Paris, 2003). Association of Discrete Mathematics and Theoretical Computer Science.
Durrett, R. 2007. Random graph dynamics. Cambridge Series in Statistical and Probabilistic Mathematics.
Cambridge University Press.
Eckhoff, M., Goodman, J., van der Hofstad, R., and Nardi, F. R. 2013. Short paths for first passage perco-
lation on the complete graph. J. Statist. Phys., 151(6), 1056–1088.
Elek, G. 2007. On limits of finite graphs. Combinatorica, 27(4), 503–507.
Erdős, P., and Rényi, A. 1959. On random graphs. I. Publ. Math. Debrecen, 6, 290–297.
Erdős, P., and Rényi, A. 1960. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int.
Közl., 5, 17–61.
Erdős, P., and Rényi, A. 1961a. On the evolution of random graphs. Bull. Inst. Internat. Statist., 38, 343–
347.
Erdős, P., and Rényi, A. 1961b. On the strength of connectedness of a random graph. Acta Math. Acad. Sci.
Hungar., 12, 261–267.
Erdős, P., Greenhill, C., Mezei, T., Miklós, I., Soltész, D., and Soukup, L. 2022. The mixing time of switch
Markov chains: a unified approach. European J. Combin., 99, Paper No. 103421, 46.
van den Esker, H., van der Hofstad, R., Hooghiemstra, G., and Znamenski, D. 2006. Distances in random
graphs with infinite mean degrees. Extremes, 8, 111–140.
van den Esker, H., van der Hofstad, R., and Hooghiemstra, G. 2008. Universality for the distance in finite
variance random graphs. J. Statist. Phys., 133(1), 169–202.
Faloutsos, C., Faloutsos, P., and Faloutsos, M. 1999. On power-law relationships of the internet topology.
Computer Commun. Rev., 29, 251–262.
Federico, L. 2023. Almost-2-regular random graphs. Australas. J. Combin., 86, 76–96.
Federico, L., and van der Hofstad, R. 2017. Critical window for connectivity in the configuration model.
Combin. Probab. Comput., 26(5), 660–680.
Fernholz, D., and Ramachandran, V. 2007. The diameter of sparse random graphs. Random Structures
Algorithms, 31(4), 482–516.
Fienberg, S., and Wasserman, S. 1981. Categorical data analysis of single sociometric relations. Sociologi-
cal Methodology, 12, 156–192.
Fill, J., Scheinerman, E., and Singer-Cohen, K. 2000. Random intersection graphs when m = ω(n): an
equivalence theorem relating the evolution of the G(n, m, p) and G(n, p) models. Random Structures
Algorithms, 16(2), 156–176.
Flaxman, A., Frieze, A., and Vera, J. 2006. A geometric preferential attachment model of networks. Internet
Math., 3(2), 187–205.
Flaxman, A., Frieze, A., and Vera, J. 2007. A geometric preferential attachment model of networks II. In:
Proceedings of Workshop on Algorithms and Models for the Web Graph 2007.
Fountoulakis, N. 2015. On a geometrization of the Chung-Lu model for complex networks. J. Complex
Netw., 3(3), 361–387.
Fountoulakis, N., van der Hoorn, P., Müller, T., and Schepers, M. 2021. Clustering in a hyperbolic model of complex networks. Electron. J. Probab., 26, 1–132.
Frank, O., and Strauss, D. 1986. Markov graphs. J. Amer. Statist. Assoc., 81(395), 832–842.
Friedrich, T., and Krohmer, A. 2015. On the diameter of hyperbolic random graphs. Pages 614–625
of: Automata, languages, and programming. Part II. Lecture Notes in Computer Science, vol. 9135.
Springer.
Friedrich, T., and Krohmer, A. 2018. On the diameter of hyperbolic random graphs. SIAM J. Discrete
Math., 32(2), 1314–1334.
Fujita, Y., Kichikawa, Y., Fujiwara, Y., Souma, W., and Iyetomi, H. 2019. Local bow-tie structure of the
web. Applied Netw. Sci., 4(1), 1–15.
Gamarnik, D., Nowicki, T., and Swirszcz, G. 2006. Maximum weight independent sets and matchings
in sparse random graphs. Exact results using the local weak convergence method. Random Structures
Algorithms, 28(1), 76–106.
Gao, P., and Greenhill, C. 2021. Mixing time of the switch Markov chain and stable degree sequences.
Discrete Appl. Math., 291, 143–162.
Gao, P., and Wormald, N. 2016. Enumeration of graphs with a heavy-tailed degree sequence. Adv. Math.,
287, 412–450.
Gao, P., van der Hofstad, R., Southwell, A., and Stegehuis, C. 2020. Counting triangles in power-law
uniform random graphs. Electron. J. Combin., 27(3), Paper No. 3.19, 28.
Garavaglia, A., and van der Hofstad, R. 2018. From trees to graphs: collapsing continuous-time branching
processes. J. Appl. Probab., 55(3), 900–919.
Garavaglia, A., van der Hofstad, R., and Woeginger, G. 2017. The dynamics of power laws: fitness and
aging in preferential attachment trees. J. Statist. Phys., 168(6), 1137–1179.
Garavaglia, A., van der Hofstad, R., and Litvak, N. 2020. Local weak convergence for PageRank. Ann.
Appl. Probab., 30(1), 40–79.
Garavaglia, A., Hazra, R., van der Hofstad, R., and Ray, R. 2022. Universality of the local limit in prefer-
ential attachment models. arXiv:2212.05551 [math.PR].
Gilbert, E. N. 1959. Random graphs. Ann. Math. Statist., 30, 1141–1144.
Gleiser, P., and Danon, L. 2003. Community structure in jazz. Adv. Complex Systems, 06(04), 565–573.
Godehardt, E., and Jaworski, J. 2003. Two models of random intersection graphs for classification. Pages
67–81 of: Exploratory data analysis in empirical research. Stud. Classification Data Anal. Knowledge
Organ. Springer.
Goñi, J., Esteban, F., de Mendizábal, N., Sepulcre, J., Ardanza-Trevijano, S., Agirrezabal, I., and Villoslada,
P. 2008. A computational analysis of protein–protein interaction networks in neurodegenerative diseases.
BMC Systems Biology, 2(1), 52.
Grimmett, G. 1999. Percolation. 2nd edn. Springer.
Gugelmann, L., Panagiotou, K., and Peter, U. 2012. Random hyperbolic graphs: degree sequence and
clustering. Pages 573–585 of: Proceedings of the International Colloquium on Automata, Languages,
and Programming. Springer.
Gulikers, L., Lelarge, M., and Massoulié, L. 2017a. Non-backtracking spectrum of degree-corrected
stochastic block models. Pages 1–27 of: Proceedings of the 8th Innovations in Theoretical Computer
Science Conference. LIPIcs. Leibniz Int. Proc. Inform., vol. 67. Schloss Dagstuhl–Leibniz-Zentrum für
Informatik. Art. No. 44.
Gulikers, L., Lelarge, M., and Massoulié, L. 2017b. A spectral method for community detection in moder-
ately sparse degree-corrected stochastic block models. Adv. Appl. Probab., 49(3), 686–721.
Gulikers, L., Lelarge, M., and Massoulié, L. 2018. An impossibility result for reconstruction in the degree-
corrected stochastic block model. Ann. Appl. Probab., 28(5), 3002–3027.
Gut, A. 2005. Probability: a graduate course. Springer Texts in Statistics. Springer.
Häggström, O., and Jonasson, J. 1999. Phase transition in the random triangle model. J. Appl. Probab.,
36(4), 1101–1115.
Hajek, B. 1990. Performance of global load balancing by local adjustment. IEEE Trans. Inform. Theory,
36(6), 1398–1414.
Hajek, B. 1996. Balanced loads in infinite networks. Ann. Appl. Probab., 6(1), 48–75.
Hajek, B., and Sankagiri, S. 2019. Community recovery in a preferential attachment graph. IEEE Trans.
Inform. Theory, 65(11), 6853–6874.
Hajra, K.B., and Sen, P. 2005. Aging in citation networks. Physica A. Statist. Mech. Applic., 346(1-2),
44–48.
Hajra, K.B., and Sen, P. 2006. Modelling aging characteristics in citation networks. Physica A: Statist.
Mech. Applic., 368(2), 575–582.
Hall, P. 1981. Order of magnitude of moments of sums of random variables. J. London Math. Soc., 24(2),
562–568.
Hall, P., and Welsh, A. 1984. Best attainable rates of convergence for estimates of parameters of regular
variation. Ann. Statist., 12(3), 1079–1084.
Halmos, P. 1950. Measure theory. Van Nostrand.
Hao, N., and Heydenreich, M. 2023. Graph distances in scale-free percolation: the logarithmic case. J.
Appl. Probab., 60(1), 295–313.
Hardy, G. H., Littlewood, J. E., and Pólya, G. 1988. Inequalities. Cambridge Mathematical Library. Cam-
bridge University Press. Reprint of the 1952 edition.
Harris, T. 1963. The theory of branching processes. Die Grundlehren der Mathematischen Wissenschaften,
Band 119. Springer-Verlag.
Hatami, H., Lovász, L., and Szegedy, B. 2014. Limits of locally-globally convergent graph sequences.
Geom. Funct. Anal., 24(1), 269–296.
Heydenreich, M., Hulshof, T., and Jorritsma, J. 2017. Structures in supercritical scale-free percolation. Ann.
Appl. Probab., 27(4), 2569–2604.
Hill, B. M. 1975. A simple general approach to inference about the tail of a distribution. Ann. Statist., 3(5),
1163–1174.
Hirsch, C. 2017. From heavy-tailed Boolean models to scale-free Gilbert graphs. Braz. J. Probab. Statist.,
31(1), 111–143.
van der Hofstad, R. 2017. Random graphs and complex networks. Volume 1. Cambridge Series in Statistical
and Probabilistic Mathematics. Cambridge University Press.
van der Hofstad, R. 2021. The giant in random graphs is almost local. arXiv:2103.11733 [math.PR].
van der Hofstad, R., and Komjáthy, J. 2017. Explosion and distances in scale-free percolation.
arXiv:1706.02597 [math.PR].
van der Hofstad, R., and Komjáthy, J. 2017. When is a scale-free graph ultra-small? J. Statist. Phys., 169(2),
223–264.
van der Hofstad, R., and Litvak, N. 2014. Degree–degree dependencies in random graphs with heavy-tailed
degrees. Internet Math., 10(3-4), 287–334.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. 2005. Distances in random graphs with finite
variance degrees. Random Structures Algorithms, 27(1), 76–123.
van der Hofstad, R., Hooghiemstra, G., and Znamenski, D. 2007a. Distances in random graphs with finite
mean and infinite variance degrees. Electron. J. Probab., 12(25), 703–766 (electronic).
van der Hofstad, R., Hooghiemstra, G., and Znamenski, D. 2007b. A phase transition for the diameter of
the configuration model. Internet Math., 4(1), 113–128.
van der Hofstad, R., van Leeuwaarden, J. S. H., and Stegehuis, C. 2017. Hierarchical configuration model.
Internet Math. arXiv:1512.08397 [math.PR].
van der Hofstad, R., Komjáthy, J., and Vadon, V. 2021. Random intersection graphs with communities. Adv.
Appl. Probab., 53(4), 1061–1089.
van der Hofstad, R., Komjáthy, J., and Vadon, V. 2022. Phase transition in random intersection graphs with
communities. Random Structures Algorithms, 60(3), 406–461.
van der Hofstad, R., van der Hoorn, P., and Maitra, N. 2023. Local limits of spatial inhomogeneous random
graphs. Adv. Appl. Probab., 1–48.
Holland, P., Laskey, K., and Leinhardt, S. 1983. Stochastic blockmodels: first steps. Social Netw., 5(2),
109–137.
Holme, P. 2019. Rare and everywhere: perspectives on scale-free networks. Nature Commun., 10(1), 1016.
van der Hoorn, P., and Olvera-Cravioto, M. 2018. Typical distances in the directed configuration model.
Ann. Appl. Probab., 28(3), 1739–1792.
Howes, N. 1995. Modern analysis and topology. Universitext. Springer-Verlag.
Jacob, E., and Mörters, P. 2015. Spatial preferential attachment networks: power laws and clustering coef-
ficients. Ann. Appl. Probab., 25(2), 632–662.
Jacob, E., and Mörters, P. 2017. Robustness of scale-free spatial networks. Ann. Probab., 45(3), 1680–1722.
Janson, S. 2004. Functional limit theorems for multitype branching processes and generalized Pólya urns.
Stochastic Process. Appl., 110(2), 177–245.
Janson, S. 2008. The largest component in a subcritical random graph with a power law degree distribution.
Ann. Appl. Probab., 18(4), 1651–1668.
Janson, S. 2009. Standard representation of multivariate functions on a general probability space. Electron.
Commun. Probab., 14, 343–346.
Janson, S. 2010a. Asymptotic equivalence and contiguity of some random graphs. Random Structures
Algorithms, 36(1), 26–45.
Janson, S. 2010b. Susceptibility of random graphs with given vertex degrees. J. Combin., 1(3-4), 357–387.
Janson, S. 2011. Probability asymptotics: notes on notation. arXiv:1108.3924 [math.PR].
Janson, S. 2020a. Asymptotic normality in random graphs with given vertex degrees. Random Structures
Algorithms, 56(4), 1070–1116.
Janson, S. 2020b. Random graphs with given vertex degrees and switchings. Random Structures and
Algorithms, 57(1), 3–31.
Janson, S., and Luczak, M. 2007. A simple solution to the k-core problem. Random Structures Algorithms,
30(1-2), 50–62.
Janson, S., and Luczak, M. 2008. Asymptotic normality of the k-core in random graphs. Ann. Appl. Probab.,
18(3), 1085–1137.
Janson, S., and Luczak, M. 2009. A new approach to the giant component problem. Random Structures
Algorithms, 34(2), 197–216.
Janson, S., Łuczak, T., and Ruciński, A. 2000. Random graphs. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley-Interscience.
Janssen, J., Prałat, P., and Wilson, R. 2016. Nonuniform distribution of nodes in the spatial preferential
attachment model. Internet Math., 12(1-2), 121–144.
Jaworski, J., Karoński, M., and Stark, D. 2006. The degree of a typical vertex in generalized random
intersection graph models. Discrete Math., 306(18), 2152–2165.
Jaynes, E. T. 1957. Information theory and statistical mechanics. Phys. Rev., 106(2), 620–630.
Jonasson, J. 2009. Invariant random graphs with iid degrees in a general geography. Probab. Theory Rel.
Fields, 143(3-4), 643–656.
Jordan, J. 2010. Degree sequences of geometric preferential attachment graphs. Adv. Appl. Probab., 42(2),
319–330.
Jordan, J. 2013. Geometric preferential attachment in non-uniform metric spaces. Electron. J. Probab., 18,
no. 8, 15.
Jordan, J., and Wade, A. 2015. Phase transitions for random geometric preferential attachment graphs. Adv.
Appl. Probab., 47(2), 565–588.
Jorritsma, J., and Komjáthy, J. 2022. Distance evolutions in growing preferential attachment graphs. Ann.
Appl. Probab., 32(6), 4356–4397.
Jorritsma, J., Komjáthy, J., and Mitsche, D. 2023. Cluster-size decay in supercritical kernel-based spatial
random graphs. arXiv:2303.00724 [math.PR].
Kallenberg, O. 2002. Foundations of modern probability. Second edn. Springer.
Kallenberg, O. 2017. Random measures, theory and applications. Probability Theory and Stochastic Mod-
elling, vol. 77. Springer.
Karoński, M., Scheinerman, E., and Singer-Cohen, K. 1999. On random intersection graphs: the subgraph
problem. Combin. Probab. Comput., 8(1-2), 131–159.
Karp, R.M. 1990. The transitive closure of a random digraph. Random Structures Algorithms, 1(1), 73–93.
Karrer, B., and Newman, M. E. J. 2011. Stochastic blockmodels and community structure in networks.
Phys. Rev. E, 83(1), 016107.
Kass, R.E., and Wasserman, L. 1996. The selection of prior distributions by formal rules. J. Amer. Statist.
Assoc., 91(435), 1343–1370.
Kesten, H. 1982. Percolation theory for mathematicians. Progress in Probability and Statistics, vol. 2.
Birkhäuser.
Kesten, H., and Stigum, B. P. 1966. A limit theorem for multidimensional Galton-Watson processes. Ann.
Math. Statist., 37, 1211–1223.
Kingman, J. F. C. 1975. The first birth problem for an age-dependent branching process. Ann. Probab.,
3(5), 790–801.
Kiwi, M., and Mitsche, D. 2015. A bound for the diameter of random hyperbolic graphs. Pages 26–39
of: 2015 Proceedings of the 12th Workshop on Analytic Algorithmics and Combinatorics (ANALCO).
SIAM.
Kiwi, M., and Mitsche, D. 2019. On the second largest component of random hyperbolic graphs. SIAM J.
Discrete Math., 33(4), 2200–2217.
Komjáthy, J., and Lodewijks, B. 2020. Explosion in weighted hyperbolic random graphs and geometric inhomogeneous random graphs. Stochastic Process. Appl., 130(3), 1309–1367.
Krioukov, D., Papadopoulos, F., Kitsak, M., Vahdat, A., and Boguñá, M. 2010. Hyperbolic geometry of
complex networks. Phys. Rev. E, 82(3), 036106, 18.
Krioukov, D., Kitsak, M., Sinkovits, R., Rideout, D., Meyer, D., and Boguñá, M. 2012. Network cosmology.
Sci. Rep., 2.
Krzakala, F., Moore, C., Mossel, E., Neeman, J., Sly, A., Zdeborová, L., and Zhang, P. 2013. Spectral
redemption in clustering sparse networks. Proc. National Acad. Sci., 110(52), 20935–20940.
Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. 2000. Stochastic
models for the Web graph. Pages 57–65 of: Proceedings of the 42nd Annual IEEE Symposium on
Foundations of Computer Science.
Kunegis, J. 2013. KONECT: the Koblenz network collection. Pages 1343–1350 of: Proceedings of the
22nd International Conference on World Wide Web.
Kunegis, J. 2017. The Koblenz network collection.
Kurauskas, V. 2022. On local weak limit and subgraph counts for sparse random graphs. J. Appl. Probab.,
59(3), 755–776.
Last, G., and Penrose, M. 2018. Lectures on the Poisson process. Institute of Mathematical Statistics
Textbooks, vol. 7. Cambridge University Press.
Lee, J., and Olvera-Cravioto, M. 2020. PageRank on inhomogeneous random digraphs. Stochastic Process.
Appl., 130(4), 2312–2348.
Leskelä, L. 2019. Random graphs and network statistics. Available at http://math.aalto.fi/~lleskela/LectureNotes004.html.
Leskovec, J., and Krevl, A. 2014 (Jun). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.
Leskovec, J., Kleinberg, J., and Faloutsos, C. 2007. Graph evolution: densification and shrinking diameters.
ACM Trans. Knowledge Discovery from Data (TKDD), 1(1), 2.
Leskovec, J., Lang, K., Dasgupta, A., and Mahoney, M. 2009. Community structure in large networks:
natural cluster sizes and the absence of large well-defined clusters. Internet Math., 6(1), 29–123.
Litvak, N., and van der Hofstad, R. 2013. Uncovering disassortativity in large scale-free networks. Phys.
Rev. E, 87(2), 022801.
Lo, T. Y. Y. 2021. Weak local limit of preferential attachment random trees with additive fitness.
arXiv:2103.00900 [math.PR].
Lovász, L. 2012. Large networks and graph limits. American Mathematical Society Colloquium Publica-
tions, vol. 60. American Mathematical Society, Providence, RI.
Łuczak, T. 1992. Sparse random graphs with a given degree sequence. Pages 165–182 of: Random graphs,
Vol. 2 (Poznań, 1989). Wiley.
Lyons, R. 2005. Asymptotic enumeration of spanning trees. Combin. Probab. Comput., 14(4), 491–522.
Manna, S., and Sen, P. 2002. Modulated scale-free network in Euclidean space. Phys. Rev. E, 66(6), 066114.
Massoulié, L. 2014. Community detection thresholds and the weak Ramanujan property. Pages 694–703
of: Proceedings of the 2014 ACM Symposium on Theory of Computing. ACM.
McKay, B. D. 1981. Subgraphs of random graphs with specified degrees. Congressus Numerantium, 33,
213–223.
McKay, B. D. 2011. Subgraphs of random graphs with specified degrees. In: Proceedings of the Interna-
tional Congress of Mathematicians 2010. Hindustan Book Agency.
McKay, B. D., and Wormald, N. 1990. Asymptotic enumeration by degree sequence of graphs of high
degree. European J. Combin., 11(6), 565–580.
Meester, R., and Roy, R. 1996. Continuum percolation. Cambridge Tracts in Mathematics, vol. 119.
Cambridge University Press.
Milewska, M., van der Hofstad, R., and Zwart, B. 2023. Dynamic random intersection graph: Dynamic
local convergence and giant structure. arXiv: 2308.15629 [math.PR].
Molloy, M., and Reed, B. 1995. A critical point for random graphs with a given degree sequence. Random
Structures Algorithms, 6(2-3), 161–179.
Molloy, M., and Reed, B. 1998. The size of the giant component of a random graph with a given degree
sequence. Combin. Probab. Comput., 7(3), 295–305.
Molloy, M., Surya, E., and Warnke, L. 2022. The degree-restricted random process is far from uniform.
arXiv:2211.00835v1 [math.CO].
Moore, C., and Newman, M. E. J. 2000. Epidemics and percolation in small-world networks. Phys. Rev. E,
61, 5678–5682.
Mossel, E., Neeman, J., and Sly, A. 2015. Reconstruction and estimation in the planted partition model.
Probab. Theory Rel. Fields, 162(3-4), 431–461.
Mossel, E., Neeman, J., and Sly, A. 2016. Belief propagation, robust reconstruction and optimal recovery
of block models. Ann. Appl. Probab., 26(4), 2211–2256.
Mossel, E., Neeman, J., and Sly, A. 2018. A proof of the block model threshold conjecture. Combinatorica,
38(3), 665–708.
Mucha, P. J., Richardson, T., Macon, K., Porter, M. A., and Onnela, J.-P. 2010. Community structure in
time-dependent, multiscale, and multiplex networks. Science, 328(5980), 876–878.
Nair, J., Wierman, A., and Zwart, B. 2022. The fundamentals of heavy tails: properties, emergence, and
estimation. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
Newman, M. E. J. 2003. Properties of highly clustered networks. Phys. Rev. E, 68(2), 026121.
Newman, M. E. J. 2009. Random graphs with clustering. Phys. Rev. Lett., 103(Jul), 058701.
Newman, M. E. J. 2010. Networks: an introduction. Oxford University Press.
Newman, M. E. J., and Park, J. 2003. Why social networks are different from other types of networks.
Phys. Rev. E, 68(3), 036122.
Newman, M. E. J., and Watts, D. J. 1999. Scaling and percolation in the small-world network model. Phys.
Rev. E, 60, 7332–7344.
Newman, M. E. J., Moore, C., and Watts, D. J. 2000a. Mean-field solution of the small-world network
model. Phys. Rev. Lett., 84, 3201–3204.
Newman, M. E. J., Strogatz, S., and Watts, D. 2000b. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E, 64, 026118, 1–17.
Newman, M. E. J., Strogatz, S., and Watts, D. 2002. Random graph models of social networks. Proc.
National Acad. Sci., 99, 2566–2572.
Newman, M. E. J., Watts, D. J., and Barabási, A.-L. 2006. The structure and dynamics of networks. Princeton Studies in Complexity. Princeton University Press.
Norros, I., and Reittu, H. 2006. On a conditionally Poissonian graph process. Adv. Appl. Probab., 38(1),
59–75.
Noutsos, D. 2006. On Perron–Frobenius property of matrices having some negative entries. Linear Algebra
Appl., 412(2-3), 132–153.
O’Connell, N. 1998. Some large deviation results for sparse random graphs. Probab. Theory Rel. Fields,
110(3), 277–285.
Parthasarathy, K. R. 1967. Probability measures on metric spaces. Probability and Mathematical Statistics,
No. 3. Academic Press.
Pemantle, R. 2007. A survey of random processes with reinforcement. Probab. Surv., 4, 1–79 (electronic).
Pickands III, J. 1968. Moment convergence of sample extremes. Ann. Math. Statist., 39, 881–889.
Pittel, B. 1994. Note on the heights of random recursive trees and random m-ary search trees. Random
Structures Algorithms, 5(2), 337–347.
Price, D. J. de Solla. 1965. Networks of scientific papers. Science, 149, 510–515.
Price, D. J. de Solla. 1986. Little science, big science... and beyond. Columbia University Press.
Reittu, H., and Norros, I. 2004. On the power law random graph model of massive data networks. Performance Evaluation, 55(1-2), 3–23.
Resnick, S. 2007. Heavy-tail phenomena. Springer Series in Operations Research and Financial Engineering. Springer. (Probabilistic and statistical modeling).
Riordan, O., and Warnke, L. 2011. Explosive percolation is continuous. Science, 333(6040), 322–324.
Riordan, O., and Wormald, N. 2010. The diameter of sparse random graphs. Combin. Probab. Comput., 19(5-6), 835–926.
Ross, S. M. 1996. Stochastic processes. Second edn. Wiley Series in Probability and Statistics. John Wiley
and Sons.
Ruciński, A., and Wormald, N. 2002. Connectedness of graphs generated by a random d-process. J. Aust.
Math. Soc., 72(1), 67–85.
Rudas, A., Tóth, B., and Valkó, B. 2007. Random trees and general branching processes. Random Structures
Algorithms, 31(2), 186–202.
Rudin, W. 1987. Real and complex analysis. McGraw-Hill.
Rudin, W. 1991. Functional analysis. International Series in Pure and Applied Mathematics. McGraw-Hill.
Rybarczyk, K. 2011. Diameter, connectivity, and phase transition of the uniform random intersection graph.
Discrete Math., 311(17), 1998–2019.
Salez, J. 2013. Weighted enumeration of spanning subgraphs in locally tree-like graphs. Random Structures
Algorithms, 43(3), 377–397.
Schuh, H.-J., and Barbour, A. D. 1977. On the asymptotic behaviour of branching processes with infinite
mean. Adv. Appl. Probab., 9(4), 681–723.
Seneta, E. 1973. The simple branching process with infinite mean. I. J. Appl. Probab., 10, 206–212.
Seneta, E. 1974. Regularly varying functions in the theory of simple branching processes. Adv. Appl.
Probab., 6, 408–420.
Shannon, C. E. 1948. A mathematical theory of communication. Bell System Tech. J., 27, 379–423, 623–
656.
Shepp, L. A. 1989. Connectedness of certain random graphs. Israel J. Math., 67(1), 23–33.
Shore, J. E., and Johnson, R. W. 1980. Axiomatic derivation of the principle of maximum entropy and the
principle of minimum cross-entropy. IEEE Trans. Inform. Theory, 26(1), 26–37.
Simon, H. A. 1955. On a class of skew distribution functions. Biometrika, 42, 425–440.
Singer, K. 1996. Random intersection graphs. ProQuest LLC, Ann Arbor, MI. PhD Thesis, The Johns
Hopkins University.
Smythe, R., and Mahmoud, H. 1994. A survey of recursive trees. Teor. Ĭmovīr. Mat. Statist., 1–29.
Snijders, T. A., Pattison, P., Robins, G., and Handcock, M. 2006. New specifications for exponential random graph models. Sociological Methodology, 36(1), 99–153.
Söderberg, B. 2002. General formalism for inhomogeneous random graphs. Phys. Rev. E, 66(6), 066121,
6.
Söderberg, B. 2003a. Properties of random graphs with hidden color. Phys. Rev. E, 68(2), 026107, 12.
Söderberg, B. 2003b. Random graph models with hidden color. Acta Phys. Polonica B, 34, 5085–5102.
Söderberg, B. 2003c. Random graphs with hidden color. Phys. Rev. E, 68(1), 015102, 4.
Sönmez, E. 2021. Graph distances of continuum long-range percolation. Braz. J. Probab. Statist., 35(3),
609–624.
Spencer, J., and Wormald, N. 2007. Birth control for giants. Combinatorica, 27(5), 587–628.
Stark, D. 2004. The vertex degree distribution of random intersection graphs. Random Structures Algorithms, 24(3), 249–258.
Stegehuis, C., van der Hofstad, R., and van Leeuwaarden, J. S. H. 2016a. Epidemic spreading on complex
networks with community structures. Sci. Rep., 6, 29748.
Stegehuis, C., van der Hofstad, R., and van Leeuwaarden, J. S. H. 2016b. Power-law relations in random
networks with communities. Phys. Rev. E, 94, 012302.
Sundaresan, S., Fischhoff, I., Dushoff, J., and Rubenstein, D. 2007. Network metrics reveal differences in
social organization between two fission–fusion species, Grevy’s zebra and onager. Oecologia, 151(1),
140–149.
Turova, T. S. 2011. The largest component in subcritical inhomogeneous random graphs. Combin. Probab. Comput., 20(1), 131–154.
Turova, T. S., and Vallier, T. 2010. Merging percolation on Z^d and classical random graphs: phase transition.
Random Structures Algorithms, 36(2), 185–217.
Ugander, J., Karrer, B., Backstrom, L., and Marlow, C. 2011. The anatomy of the Facebook social graph.
arXiv:1111.4503 [cs.SI].
Vadon, V., Komjáthy, J., and van der Hofstad, R. 2019. A new model for overlapping communities with
arbitrary internal structure. Applied Network Science, 4(1), 42.
Voitalov, I., van der Hoorn, P., van der Hofstad, R., and Krioukov, D. 2019. Scale-free networks well done.
Phys. Rev. Res., 1(3), 033034.
Wang, D., Song, C., and Barabási, A. L. 2013. Quantifying long-term scientific impact. Science, 342(6154),
127–132.
Wang, J., Mei, Y., and Hicks, D. 2014. Comment on “Quantifying long-term scientific impact”. Science, 345(6193), 149.
Wang, M., Yu, G., and Yu, D. 2008. Measuring the preferential attachment mechanism in citation networks.
Physica A: Statist. Mech. Applic., 387(18), 4692–4698.
Wang, M., Yu, G., and Yu, D. 2009. Effect of the age of papers on the preferential attachment in citation
networks. Physica A: Statist. Mech. Applic., 388(19), 4273–4276.
Wasserman, S., and Pattison, P. 1996. Logit models and logistic regressions for social networks. Psychometrika, 61(3), 401–425.
Watts, D. J. 1999. Small worlds. The dynamics of networks between order and randomness. Princeton
Studies in Complexity. Princeton University Press.
Watts, D. J. 2003. Six degrees. The science of a connected age. W. W. Norton & Co.
Watts, D. J., and Strogatz, S. H. 1998. Collective dynamics of ‘small-world’ networks. Nature, 393, 440–
442.
Wong, L. H., Pattison, P., and Robins, G. 2006. A spatial model for social networks. Physica A: Statist.
Mech. Applic., 360(1), 99–120.
Wormald, N. 1981. The asymptotic connectivity of labelled regular graphs. J. Combin. Theory Ser. B,
31(2), 156–167.
Wormald, N. 1999. Models of random regular graphs. Pages 239–298 of: Surveys in combinatorics, 1999
(Canterbury). London Math. Soc. Lecture Note Series, vol. 267. Cambridge University Press.
Yukich, J. E. 2006. Ultra-small scale-free geometric networks. J. Appl. Probab., 43(3), 665–677.
Yule, G. U. 1925. A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S.
Phil. Trans. Roy. Soc. London, B, 213, 21–87.
Zhao, Y., Levina, E., and Zhu, J. 2012. Consistency of community detection in networks under degree-
corrected stochastic block models. Ann. Statist., 40(4), 2266–2292.
Zuev, K., Boguñá, M., Bianconi, G., and Krioukov, D. 2015. Emergence of soft communities from geometric preferential attachment. Sci. Rep., 5, 9421.
GLOSSARY
INDEX
  hub, 7
  log–log plot degree distribution, 7
  scale-free, 6, 7
  scale-free nature, 5
  “scale-free networks are rare”, 7
  small-world phenomenon, 11
  spatial structure, 14
  super-spreader, 7
Repeated configuration model, 25
Rooted graph
  isomorphism, 49
  metric, 49
  neighborhood, 48
Scale-free percolation, 444, 456
  degree, 445
  small-world nature, 447
  ultra-small-world nature, 447
Scale-free tree, 328
  diameter, 328
  height, 328
  typical distance, 328
Self-avoiding path, 251
Size-biased distribution, 121
Small world, 247
Small-world model, 429
  continuous circle model, 429
  small-world nature, 429, 430
Small-world nature
  Chung–Lu model, 247
  configuration model, 291
  Erdős–Rényi random graph, 86
  generalized random graph, 247
  inhomogeneous random graph, 246
  Norros–Reittu model, 247
  preferential attachment model, 336
  scale-free percolation, 447
  small-world model, 429, 430
Sparse network, 5
Spatial configuration model, 447
  matching, 448
Spatial preferential attachment model, 440
  degree distribution, 443
Spatial random graph, 428
  clustering, 428
Spatial structure, 14
Stochastic domination, 40, 121
Theorem
  de Finetti, 190
  Helly, 192
  Perron–Frobenius, 110
  Potter, 37
Tightness, 469
  general metric space, 469
Tree, 41
  exploration, 41
  height, 328
  ordered, 41
  rooted, 41
  Ulam–Harris labeling, 41
2-core, 313
Two-regular graph
  diameter, 324
  longest cycle, 324
Typical distance, 13
Ultra-small distance
  power-iteration for configuration model, 300, 303
Ultra-small world, 247, 248
Ultra-small-world nature
  Chung–Lu model, 248
  configuration model, 291
  generalized random graph, 248
  geometric inhomogeneous random graph, 439
  hyperbolic random graph, 433
  Norros–Reittu model, 248
  preferential attachment model, 337, 338
  scale-free percolation, 447
Uniform integrability, 67
Uniform random graph with prescribed degrees, 27
  edge probabilities, 28
  giant component, 171
  switching algorithm, 27
  using configuration model, 25
Uniform recursive trees, 378
Universality, 37
  typical distances, 338
With high probability, 40