
Econometrica, Vol. 88, No. 1 (January, 2020), 1–32

TESTING MODELS OF SOCIAL LEARNING ON NETWORKS:
EVIDENCE FROM TWO EXPERIMENTS

ARUN G. CHANDRASEKHAR
Department of Economics, Stanford University, NBER, and J-PAL

HORACIO LARREGUY
Department of Government, Harvard University

JUAN PABLO XANDRI
Department of Economics, Princeton University

We theoretically and empirically study an incomplete information model of social learning. Agents initially guess the binary state of the world after observing a private signal. In subsequent rounds, agents observe their network neighbors' previous guesses before guessing again. Agents are drawn from a mixture of learning types—Bayesian, who face incomplete information about others' types, and DeGroot, who average their neighbors' previous period guesses and follow the majority. We study (1) learning features of both types of agents in our incomplete information model; (2) what network structures lead to failures of asymptotic learning; (3) whether realistic networks exhibit such structures. We conducted lab experiments with 665 subjects in Indian villages and 350 students from ITAM in Mexico. We perform a reduced-form analysis and then structurally estimate the mixing parameter, finding the share of Bayesian agents to be 10% and 50% in the Indian-villager and Mexican-student samples, respectively.

KEYWORDS: Networks, social learning, Bayesian learning, DeGroot learning.

1. INTRODUCTION
INFORMATION AND OPINIONS ABOUT TECHNOLOGIES, job opportunities, products, and
political candidates, among other things, are largely transmitted through social networks.
However, the information individuals receive from others often contains noise that in-
dividuals need to filter. A priori, individuals are likely to differ in their sophistication
(Bayes-rationality) or naivete about how they engage in social learning. That is, they
might vary in the extent to which they are able to assess how much independent infor-
mation is contained among the social connections with whom they communicate, as well
as in whether they account for the naivete of those connections.
In many settings, individuals cannot or do not transmit their beliefs to an arbitrarily
fine degree, nor their information sets.1 Individuals are constrained to processing coarser

Arun G. Chandrasekhar: [email protected]


Horacio Larreguy: [email protected]
Juan Pablo Xandri: [email protected]
We are grateful to Daron Acemoglu, Abhijit Banerjee, Esther Duflo, Ben Golub, Matthew O. Jackson,
Markus Mobius, Omer Tamuz, Adam Szeidl, and Chris Udry for extremely helpful discussions. Essential feed-
back was provided by Kalyan Chatterjee, Juan Dubra, Rema Hanna, Ben Olken, Evan Sadler, Rob Townsend,
Xiao Yu Wang, Luis Zermeño, and participants at numerous seminars and conferences. We also thank Devika
Lakhote, Gowri Nagaraj, Adriana Paz, Mounu Prem, Alejandra Rogel, Diego Dominguez, José Ramón En-
ríquez, Piotr Evdokimov, Andrei Gomberg, and Juan Pablo Micozzi for their assistance. We thank the Russell
Sage Behavioral Economics Grant, the NSF GRFP (Chandrasekhar), and the Bank of Spain and Caja Madrid
(Larreguy) for financial support.
1. In other cases, communicating very fine information may be more feasible. The extent to which this is feasible is surely context specific.

© 2020 The Econometric Society https://doi.org/10.3982/ECTA14407



information from their network neighbors when engaging in learning. This constraint may
be for many reasons, including but not limited to operating in settings where learning
is through observations of others’ actions or the costs of communicating very complex
information are too high and therefore only summaries are transmitted.2
We study such a coarse communication environment. We focus on a setting in which
individuals receive signals about an unknown binary state of the world in the first period
and, in subsequent rounds, they communicate to their neighbors their best (binary) guess
about the state of the world.3 In this ubiquitous setting, if all agents are Bayesian, and
there is common knowledge of this, under mild assumptions, learning will be asymptoti-
cally efficient in large networks (see Gale and Kariv (2003) and Mossel and Tamuz (2010)
for a myopic learning environment and Mossel, Sly, and Tamuz (2015) for a strategic
learning environment). However, if all agents update their guess as the majority of their
neighbors’ prior guesses—as modeled by a coarse DeGroot model of learning, also known
as the majority voting model (Liggett (1985))—then it is possible that a non-trivial set of
agents will end up stuck making the wrong guess. In practice, it might be that there is a mix
of sophisticated (Bayesian) and naive (DeGroot) learners, and that Bayesians are aware
of this and incorporate it in their calculations. Such an incomplete information model, its
relevance, and implications for asymptotic learning have not been studied in our coarse
communication environment.
This paper develops an incomplete information model of social learning with coarse
communication on a network in which agents can potentially be Bayesian or DeGroot,
and agents have common knowledge of the distribution of Bayesian or DeGroot types
in the population. Bayesian agents then learn in an environment of incomplete infor-
mation. The model nests the two extreme cases—complete information all-Bayesian and
all-DeGroot—and is a hybrid that serves as an empirically relevant benchmark as, a pri-
ori, agents are likely to be heterogeneous in the degree of sophistication when engaging
in social learning.
We then study the data from two lab experiments we conducted, one in 2011 in the
field with 665 Indian villagers and another in 2017 with 350 university students of ITAM
in Mexico City, to examine whether subject learning behavior is consistent with DeGroot,
Bayesian, or a mixed population. In each experiment, we randomly placed seven subjects
into a connected network designed to maximize the ability to distinguish between the
learning models, and let agents anonymously interact with each other in a social learning
game. We study reduced patterns of learning, informed by the theory, and structurally
estimate the mixing parameter of the model via maximum likelihood estimation, which
we demonstrate delivers a consistent estimate. Conducting the experiment in two distinct
locations, with markedly different educational backgrounds, enables us to apply our methods broadly and also to consider whether learning behavior differs by context.

2. Examples of coarse information include observing whether neighbors intend to join microfinance, the kind of crop or fertilizer neighbors use, or the political billboards neighbors post outside their houses.
3. DeGroot (1974), DeMarzo, Vayanos, and Zwiebel (2003), Gale and Kariv (2003), Golub and Jackson (2010), Mossel and Tamuz (2010), Jadbabaie, Molavi, Sandroni, and Tahbaz-Salehi (2012), and Feldman, Immorlica, Lucier, and Weinberg (2014), among others, studied models where signals are endowed in the first period and then agents repeatedly pass on the averages of their neighbors' prior period averages to learn the state of the world. Under DeGroot learning, which we call continuous DeGroot because of the lack of coarseness, asymptotic convergence occurs via a law of large numbers argument, and linear algebra results give bounds on the speed of convergence. This model has been justified as the naive application of a one-period Bayes-rational procedure of averaging.

Our theoretical and empirical results are as follows. Beginning with the theory, we first
identify four learning patterns that distinguish Bayesian and DeGroot agents in the in-
complete information model with coarse communication.4 In particular, we identify a key
network feature that sets apart the learning types, which we denote a clan. This is a set of
individuals who each have more links among themselves than to the outside world. The
first pattern we demonstrate is that, if a clan is composed entirely of DeGroot agents, and
they ever agree with each other about the state of the world in a period, they will never
change their opinions in all future periods, even if they are wrong. We denote these agents
as stuck. The second pattern is that, in the complete information all-Bayesian model, if
agent i’s neighborhood is contained in agent j’s, i always copies j’s prior period guess.
The third pattern is that, even under incomplete information, any Bayesian agent who
at any point learns whether the majority of initial signals was 0 or 1 never changes her
guess. The fourth pattern is that no Bayesian j, even under incomplete information, ever
responds to an agent i whose neighborhood is contained in j’s.
Our second theoretical result is to contrast the incomplete information Bayesian model
with the conventional complete information all-Bayesian model (where there is coarse
communication) and the oft-studied continuous DeGroot model (where agents can pass
on their exact beliefs). For any sequence of growing networks with uniformly bounded
degree, the standard all-Bayesian model with coarse communication leads to asymptotic
learning, whether agents behave myopically (choosing the short-run best response) or
strategically (playing a Nash equilibrium on a repeated game where agents may have
incentives to experiment). Further, in the continuous DeGroot model where agents can
pass on arbitrarily fine information, they too all learn the state of the world in the long
run. In contrast, if the sequence has a non-vanishing share of finite clans in the limit, then
the incomplete information Bayesian model with coarse communication exhibits failure
of asymptotic learning and a non-vanishing share of agents become stuck guessing the
wrong state of the world forever. This is true even if incomplete information Bayesian
agents play strategically rather than myopically.
Third, we address whether realistic networks have clans, which lead to the failure of
asymptotic learning in our incomplete information Bayesian model. We study a mixture
of two canonical models—random geometric graphs and Erdős–Rényi graphs—to model
sparse (each agent has few links) and clustered (agents’ friends tend to be themselves
friends) networks, which are hallmarks of realistic network data (Penrose (2003), Erdős
and Rényi (1959)). We show that, if local linking is bounded at some positive rate for
any global linking rate tending to zero at the inverse of the number of nodes,5 the share
of clans is uniformly bounded from below and therefore learning under the incomplete
information Bayesian model leads to a failure of asymptotic learning.
Turning to our empirical results, we begin with a reduced-form analysis of the differ-
ing patterns of Bayesian and DeGroot learning that we derived. Subjects behave largely
consistent with DeGroot learning in the Indian village sample but exhibit mixed learn-
ing behavior in the Mexican college sample. Specifically, first, 94.6% of the time, subjects
in the Indian experimental sample who are in a clan that comes to consensus remain
stuck on the wrong guess when the all-Bayesian model would suggest that they change
their guess. However, in the Mexican experimental sample, this number is 30.3%. Sec-
ond, over 82.9% of the time, subjects in the Indian sample who have an information set

4. Henceforth, we omit the designation of coarse communication, which is to be assumed unless noted otherwise.
5. These technical choices are made to preserve the sparseness of the overall network.

that is dominated by a network neighbor fail to simply copy their neighbor, which is what
Bayesians would do in an all-Bayesian environment. In contrast, this failure occurs 54.5%
of the time in the Mexican data. Third, nearly 94.5% of the time, subjects in the Indian
experiment respond to changes in their neighbors’ guesses despite learning whether the
majority of initial signals in the network were 0 or 1, which Bayesian agents would never
do even in an incomplete information environment. This happens 60.0% of the time in
the Mexican sample. Fourth, 93.1% of the time, subjects in the Indian sample respond to
changes in the behavior of agents with a dominated information set, which again Bayesian
agents (even in an incomplete information environment) would never do. In contrast, this
happens 61.4% of the time in the Mexican sample.
We then turn to the structural estimation of the mixing parameter in the incomplete
information model. There are two perspectives that we can take with the data. The first
is that the unit being analyzed is the entirety of the dynamic of social learning at the
network level. The focus is not describing an individual’s behavior per se, but rather the
social learning process as a whole. The second is at the individual level, where we focus on
each agent in each period, given a history. We estimate the parameters of the model under
both perspectives, but we prefer the network-level perspective since we are interested in
explaining the overall social learning process.
We find similar results from both perspectives. The network-level estimator indicates
that the mixing parameter is 0.1 (with a standard error of 0.130) in the Indian network
data, whereas the mixing parameter is 0.5 (with a standard error of 0.184) in the Mex-
ican data. At the individual level, the mixing parameter is 0.1 (standard error 0.113) in
the Indian network data, while the parameter is 0.4 (standard error 0.268) in the Mexi-
can data. The individual-level analysis naturally runs into the problem of zero-probability
events happening since agents do not internalize the fact that others may make mistakes.
We deal with this by specifying that the likelihood that we maximize terminates once an
agent hits a zero-probability event. Another way we could have in principle dealt with this
issue would be by adding trembles and studying quantal response equilibria (QRE) as in
Choi, Gale, and Kariv (2012). We show in our setting that this is computationally infeasi-
ble.6 So, instead, we model agents that do not internalize mistakes and only consider the
experimental data before agents reach zero-probability events, which we show delivers a
consistent estimate of the mixing parameter.
Our results then, first, suggest that in settings where networks are not sufficiently ex-
pansive (i.e., clans are likely) and there is a significant number of individuals that are not
sophisticated in their learning behavior, we should observe individuals exhibiting clus-
tered, inefficient behavior. Returning to some of our motivating examples, we might then
see groups of farmers that do not adopt a more efficient technology, or pockets of voters
that continue to support candidates revealed to engage in malfeasance. Second, our re-
sults indicate that there might be complementarities between financial market completion
and information aggregation. Due to limited access to financial institutions and contract-
ing ability, to support informal exchanges, villagers may organize themselves into “quilts”
which are composed of numerous clans (Jackson, Rodriguez-Barraquer, and Tan (2012)).
To the extent that information links are built upon the same sorts of relationships, so-
cial learning might be vulnerable particularly when there are weak institutions. This then
suggests that even partial financial market completion (e.g., the introduction of micro-
credit) might contribute to social learning through the dissolution of clans by reducing

6. A back-of-the-envelope calculation shows that using QRE in our setting with seven agents would take 377,346,524 years in a case where the same computation with three agents would take 4.5 hours.

the need for quilts.7 Third, our results motivate a simple set of policies to mitigate such
type of behavior. Policy-makers should not only widely diffuse information about technologies, products, or incumbent politicians, but also encourage expansiveness by generating opportunities for engagement and conversations across groups of individuals, for example, via village meetings.
We contribute to active discussions in both theoretical and experimental literatures
studying social learning on networks. The theoretical works closest to ours are Feldman
et al. (2014), Mossel and Tamuz (2010), Mossel, Neeman, and Tamuz (2014a), and Mos-
sel, Sly, and Tamuz (2015). Mossel and Tamuz (2010) considered a sequence of growing
graphs and showed that, in our coarse communication setting, if all agents are Bayesian,
then in the limit there will be consensus on the true state. Mossel, Sly, and Tamuz (2015)
showed that the same result holds true if agents behave strategically, where agents follow
Nash equilibrium strategies of a repeated game. Turning to the coarse DeGroot model,
Feldman et al. (2014) studied a variation with asynchronous learning behavior, where in
each period a single node is chosen uniformly at random to update, and showed that,
as long as the network is sufficiently expansive, there will also be asymptotic learning.
Mossel, Neeman, and Tamuz (2014a) focused on a synchronous setting and studied when
networks are such that the majority reaches the right opinion, and studied unanimity in
the special case of d-regular graphs, and, like Feldman et al. (2014), showed that consen-
sus on the right opinion will be reached for sufficiently expansive graphs.
Our theoretical results extend these results. Studying the synchronous case with incom-
plete information on irregular graphs, the concept we identify—clans—allows us to relate
our result on asymptotic learning failures to graph conductance and a different notion
of expansiveness (Chung (1997)). We show that a large family of flexible network for-
mation models that reflect real-world data has non-trivial clan shares to characterize the
pervasiveness of asymptotic learning failures. Finally, our model examines the robust im-
plications of Bayesian agents’ behavior in the presence of incomplete information about
others’ types, which is entirely new to the literature.
Failures of learning on networks have also been well-studied outside of the repeated
communication setting. A segment of the literature sets up the problem as sequential
learning. They studied asymptotic learning in an environment consisting of a directed net-
work where agents move once and see a subset of their predecessors’ choices (Banerjee
(1992), Bikhchandani, Hirshleifer, and Welch (1992), Smith and Sorensen (2000), Ace-
moglu, Dahleh, Lobel, and Ozdaglar (2011), Lobel and Sadler (2015), Eyster and Rabin
(2014)).8 The environment is quite different from our repeated communication setting
since, as noted by Eyster and Rabin (2014), “(a) networks are undirected and (b) players
take actions infinitely often, learning from one another’s past actions.” Thus, the asymp-
totic learning failures identified by our main theoretical results are conceptually different.
We also contribute to the experimental literature that began by documenting stylized
facts about learning. Choi, Gale, and Kariv (2005) demonstrated that, in networks of
three nodes, the data are consistent with Bayesian behavior. Similarly, Corazzini, Pavesi,
Petrovich, and Stanca (2012) showed that, in networks of four nodes, where agents receive
signals and the goal is for them to estimate the average of the initial signals and agents

7. This dynamic of reduced triadic closure, even among information links and not just informal insurance links, due to the entry of microcredit, is documented in Banerjee, Chandrasekhar, Duflo, and Jackson (2018).
8. Eyster and Rabin (2014) showed that "social learning-rules that are strictly and boundedly increasing in private signals as well as neglect redundancy, in which no player anti-imitates any other have, with positive probability, that society converges to the action that corresponds to certain beliefs in the wrong state."

could pass on real numbers (arbitrarily fine information), the eventual averages reflected
double counting, consistent with DeGroot-like behavior. Meanwhile, Mobius, Phan, and
Szeidl (2015) conducted a field experiment to pit DeGroot learning against a Bayes-like—
though decidedly non-Bayesian—alternative where agents may “tag” information (pass
on information about the originator) to dampen double-counting. Finally, subsequent to
our work, Mueller-Frank and Neri (2013) and Mengel and Grimm (2012) also conducted
lab experiments to look at Bayesian versus DeGroot-like learning. Mueller-Frank and
Neri (2013) developed a general class of models where individuals perform what they
called Quasi-Bayesian updating and showed that long-run experimental outcomes are
consistent with this model. Crucial differences from our work include the fact that neither
paper allows for agents to be drawn from a mixed population and the latter (and initially
the former) did not reveal the network to their subjects, making the inference problem
more complicated.
Our contribution to the literature is to first directly address whether social learning
patterns reflect, for the most part, DeGroot-like or Bayesian-like behavior, or even the
behavior of a heterogeneous population in an incomplete information Bayesian environ-
ment. This framework nests all prior models that study cases with coarse DeGroot agents
or only complete information Bayesian agents at the extremes. We take seriously the idea
that, a priori, heterogeneity in the sophistication of learning types is likely. We identify
a number of distinguishing features between learning types, some of which are even ro-
bust to incomplete information. We demonstrate the relationship between social learning
failures and structures of the underlying network such as the presence of clans or informa-
tion dominating neighborhoods. Finally, by conducting our experiment in two contrasting
settings, we show that the mixing parameter might be context dependent, which has im-
portant implications for research and policy making.
The remainder of the paper is organized as follows. Section 2 develops the theoretical
framework and patterns of behavior by DeGroot and Bayesian agents in our incomplete
information model. Section 3 contains the experimental setup. In Section 4, we explore
the raw data and show reduced-form results from the perspective of the contrasting learn-
ing patterns developed in the theory section. Section 5 describes the structural estimation
procedure and the main results of such estimation. Section 6 concludes. All proofs are in
Appendix A.

2. THEORY
We develop an incomplete information model of social learning on networks in a coarse
learning environment where every agent in the network is drawn to be either a Bayesian
or a DeGroot type with independent and identically distributed probability π. This nests the pure DeGroot model (π = 0), the complete information all-Bayesian model (π = 1), and the incomplete information model with π ∈ (0, 1).

2.1. Setup
2.1.1. Environment
We consider an undirected, unweighted graph G = (V, E) with a vertex set V and an edge list E of n = |V| agents. In an abuse of notation, we use G as the adjacency matrix as well, with G_{ij} being an indicator of whether ij ∈ E. We let N_i = {j : G_{ij} = 1} be the neighborhood of i and let N̄_i := N_i ∪ {i}.

Every agent has a type η_i ∈ {D, B}—DeGroot (D) or Bayesian (B). We assume this
type is drawn independently and identically distributed by a Bernoulli with probability
π of ηi = B. This describes how each agent processes information, either according to
the DeGroot model or by using Bayesian updating. The process by which ηj are drawn
is commonly known, as is the structure of the entire network.9 Thus, it is an incomplete
information Bayesian model.
Individuals in the network attempt to learn about the underlying state of the world,
θ ∈ {0, 1}. Time is discrete with an infinite horizon, so t ∈ ℕ. Then, in every period t = 1, 2, …, every agent takes an action a_{it} ∈ {0, 1}, which is the guess about the underlying
state of the world. We can also interpret this as agents coarsely communicating by only
stating their guess about the state of the world.10
At t = 0, and only at t = 0, every agent receives an independent and identically distributed signal
$$
s_i = \begin{cases} \theta & \text{with probability } p, \\ 1 - \theta & \text{with probability } 1 - p, \end{cases}
$$
for some p ∈ (1/2, 1). Let s = (s_1, …, s_n) denote the initial signal configuration. Then, at the start of every period t > 1, every agent observes the history of play by each neighbor j. Let A^{t−1} denote the set of actions {a_{iτ}}_{i≤n, τ≤t−1}, so every i in period t observes A_i^{t−1}, the historical set of guesses by all neighbors of i.

2.1.2. Learning
Consider a DeGroot agent (ηi = D). This agent, in a coarse model, follows the majority
of her guess and her neighborhood’s guesses in the prior period. We assume that, for ties,
the agent simply follows her prior period’s guess.11 Formally, for t > 1,

$$
a_{it} =
\begin{cases}
1 & \text{if } \dfrac{\sum_{j=1}^{n} a_{j,t-1} \, G_{ij} + a_{i,t-1}}{|\bar{N}_i|} > \dfrac{1}{2}, \\[6pt]
0 & \text{if } \dfrac{\sum_{j=1}^{n} a_{j,t-1} \, G_{ij} + a_{i,t-1}}{|\bar{N}_i|} < \dfrac{1}{2}, \\[6pt]
a_{i,t-1} & \text{if } \dfrac{\sum_{j=1}^{n} a_{j,t-1} \, G_{ij} + a_{i,t-1}}{|\bar{N}_i|} = \dfrac{1}{2}.
\end{cases}
$$
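For concreteness, the update can be written in a few lines of code. Below is a minimal Python sketch of the coarse DeGroot dynamic (our illustration, not the authors' code); the tie rule and majority rule follow the definitions above, and the toy graph is a hypothetical example.

```python
import numpy as np

def degroot_step(G, a_prev):
    """One round of coarse DeGroot updating: each agent adopts the
    majority of her own and her neighbors' previous-period guesses,
    keeping her previous guess in case of a tie."""
    n = len(a_prev)
    a_next = np.empty(n, dtype=int)
    for i in range(n):
        nbhd = np.append(np.flatnonzero(G[i]), i)  # closed neighborhood N̄_i
        frac_ones = a_prev[nbhd].mean()            # share guessing 1
        if frac_ones > 0.5:
            a_next[i] = 1
        elif frac_ones < 0.5:
            a_next[i] = 0
        else:                                      # tie: keep prior guess
            a_next[i] = a_prev[i]
    return a_next

# Toy example: nodes 0-2 form a triangle (a clan); node 2 links to 3, 3 to 4.
G = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])
a = np.array([0, 0, 0, 1, 1])  # period-1 guesses (= initial signals)
for _ in range(6):
    a = degroot_step(G, a)
print(a)  # the clan {0, 1, 2} stays at 0 forever, as in Proposition 1 below
```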

9. As we make clear later, this feature is only relevant for Bayesian agents, as DeGroot agents only rely on the prior period actions of their neighbors.
10. In Supplemental Material, Online Appendix F (Chandrasekhar, Larreguy, and Xandri (2020)), we extend the definition of the coarse DeGroot model to the case where agents can communicate their beliefs about the state of the world up to an m degree of granularity rather than simply reporting one of {0, 1}. That is, agents can report beliefs in {0, 1/(m−1), …, (m−2)/(m−1), 1}, obtained by rounding the average of their neighbors' and own prior-round coarse beliefs. We show that the results of our main theorem on the failure of asymptotic learning in Theorem 1 extend to this case with the appropriate generalizations.
11. We designed the experiments in order to minimize the possibility of such ties. In particular, we selected Network 3 to have no possibility of ties under either the all-Bayesian model or the DeGroot model.

Next, consider a Bayesian agent (ηi = B). Since there is incomplete information about
the types of the other agents in the network, Bayesian individuals attempt to learn about
the types ηj of all other agents in the network while learning about the underlying state
of the world in order to make their most informed guess about it in every period.12 For-
mally, the relevant states for the Bayesian agent are not just the signal endowments but
also the types of players in the network. Thus, we take the state of the world to be
ω = (s, η) ∈ Ω := {0, 1}^n × {0, 1}^n. We formalize the model in Appendix B in the Supplemental Material (Chandrasekhar, Larreguy, and Xandri (2020)).
In what follows, when we say “incomplete information (Bayesian) model,” we are al-
ways referring to the model with incomplete information composed of Bayesian and De-
Groot agents that learn in a coarse communication environment. Bayesian agents use
Bayes’s rule as above and DeGroot agents average their neighbors’ and own prior guesses,
and both types of agents can only coarsely communicate their best guess in every period.

2.2. Patterns of Behavior by DeGroot and Bayesian Agents


We examine several distinguishing patterns of learning behavior by DeGroot and
Bayesian agents in our setting, which we make use of when analyzing the experimental
data. To that end, it is helpful to start by defining the concepts of stuckness and a clan.
We define a node to be stuck if, from some period on, the actions she chooses are the opposite of the optimal decision with full information (the majority of signals in s). That is, i is stuck if there exists some t > 0 such that a_{i,t+m} ≠ θ̂ for all m ∈ ℕ, where θ̂ denotes the majority of the signals in s.
Next, we define a clan as a set of nodes who are more connected among themselves
than to those outside the group. The formal definition follows after some additional no-
tation. First, given network G and a subset H ⊂ V, the induced subgraph is given by G(H) = (V(H), E(H)), consisting of only the links among the subgroup H. Let d_i(H) be the degree of node i ∈ H counted only among partners within H.
Formally, define a group C ⊂ V to be a clan if, for all i ∈ C, d_i(C) ≥ d_i(V\C), and |C| ≥ 2.¹³ An example of a clan with three nodes in Panel C of Figure 2 is that of nodes (2, 3, 6). The entire set of nodes comprises a clan. Nodes (1, 3, 6) do not constitute a clan.
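The clan condition is purely structural and easy to verify mechanically. A small Python sketch (helper names are ours, not the paper's):

```python
import numpy as np
from itertools import combinations

def is_clan(G, C):
    """C is a clan if |C| >= 2 and every i in C has at least as many links
    inside C as to the rest of the network: d_i(C) >= d_i(V \\ C)."""
    C = set(C)
    if len(C) < 2:
        return False
    for i in C:
        neighbors = set(np.flatnonzero(G[i]))
        if len(neighbors & C) < len(neighbors - C):
            return False
    return True

def all_clans(G, max_size=None):
    """Enumerate clans by brute force; exponential in n, but fine for
    the seven-node experimental networks."""
    n = len(G)
    max_size = max_size or n
    return [C for k in range(2, max_size + 1)
            for C in combinations(range(n), k) if is_clan(G, C)]
```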
With these concepts defined, our first result describes how a set of all-DeGroot agents
in a clan that reaches a consensus opinion at any point cannot ever change their mind for
any π < 1.

PROPOSITION 1: Assume all agents in a clan C are DeGroot and there exists t ≥ 1 and some fixed a such that, for all i ∈ C, a_{it} = a. Then a_{i,t+τ} = a for all i ∈ C and τ > 0.

PROOF: See Appendix A. Q.E.D.

This immediately implies that, among a set of DeGroot learners, if any clan agrees on
the wrong state of the world at any period, then all nodes in the clan are forever stuck
on that state irrespective of the other agents’ types. The result does not depend on π
and applies for π < 1 since DeGroot agents simply average their neighbors’ past period
behavior.14 See Figure 1 for an illustration.
Next, we turn to the all-Bayesian case (π = 1). Here, since all agents are Bayesian
and π = 1 is commonly known, this is the complete information Bayesian case. Consider

12. We do not consider the possibility that players engage in experimentation in early rounds. While such is a theoretical possibility, anecdotal evidence from participants suggests that it does not fit our experimental data. In addition, the theoretical and experimental literature assumes away experimentation (see, e.g., Choi, Gale, and Kariv (2012), Mossel, Sly, and Tamuz (2015)).

FIGURE 1.—Contrast between DeGroot learning, where agents 2, 3, 6, and 1 remain stuck forever, and
complete information Bayesian learning where, because all agents are Bayesian and this is commonly known,
all agents converge to the truth (yellow/light gray in this example).

nodes 3 and 2 in Panel C of Figure 2. Note that N̄_2 ⊂ N̄_3. In this case, the information set
of node 3 dominates that of node 2. Our second result illustrates that, when π = 1, any
agent who is informationally dominated by another always copies the dominating agent’s
prior period guess. In this example, node 2 always should copy node 3’s prior period guess.

PROPOSITION 2: Assume that all agents are Bayesian (π = 1). Consider any i and j such that N̄_i ⊂ N̄_j. Then a_{it} = a_{j,t−1} for all t > 2.

FIGURE 2.—Network structures chosen for the experiment.

13. In Supplemental Material Appendix F, we extend the definition of a clan to the case where m coarse messages can be passed.
14. With probability (1 − π)^{|C|}, a clan C consists of all-DeGroot agents.

PROOF: The proof is straightforward and therefore omitted. Q.E.D.
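Information dominance is likewise a structural condition that can be read off the adjacency matrix. A short sketch (ours) that lists the dominated pairs used in Proposition 2 and in the reduced-form analysis of Section 4:

```python
import numpy as np

def dominated_pairs(G):
    """Return pairs (i, j) with closed neighborhood N̄_i strictly contained
    in N̄_j; under the all-Bayesian model, i copies j's lagged guess."""
    n = len(G)
    closed = [set(np.flatnonzero(G[i])) | {i} for i in range(n)]
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and closed[i] < closed[j]]  # strict subset test
```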

We then turn to the intermediate case where π ∈ (0, 1), and thus there is incomplete
information for Bayesian agents. We consider two learning patterns of these agents. First,
if a Bayesian agent ever learns whether the majority of initial signals was 0 or 1, her guess
should never change, irrespective of π. Note that, in this case, every signal need not be
learned—only that a majority is either 1 or 0. Second, any agent i who is Bayesian will
never respond to the actions of any other agent j whose neighborhood is informationally
dominated by the neighborhood of i, irrespective of ηj and π, after period 2. In our above
example, this means that, if node 3 is Bayesian, she should never respond to any behavior
by node 2 after period 2.

PROPOSITION 3: Consider any π ∈ (0, 1) and suppose η_i = B.
1. If, at any period τ, agent i learns the majority of the initial signals, θ̂ := 1{(1/n) ∑_j s_j > 1/2}, then a_{it} = θ̂ for all t ≥ τ, irrespective of any future sequence of actions by i's neighbors.
2. If N̄_j ⊆ N̄_i and, for all ω ∈ Ω, a_{j1}(ω) = s_j, then for t > 2, s_j is sufficient for A_j^t when explaining a_{it}. That is, a_{it} = Function(s_j, (A_k^{t−1})_{k ∈ N̄_i \ {j}}).

PROOF: The proof is straightforward and therefore omitted. Q.E.D.

In both cases, the intuition is that if j’s action reveals no new information to i, then j’s
choice only matters through the initial signal: sj = aj1 . There is, therefore, a conditional
sufficiency requirement: conditional on the actions of neighbors other than j, ait should
be conditionally independent of ajt for all t > 2.
Below, when we present our raw data, we revisit these properties in our reduced-form
results. We check the extent to which clans get stuck, informationally dominated agents
copy the actions of dominating agents, and agents who necessarily learn whether the ma-
jority of initial signals was 0 or 1, or reach a time when they necessarily dominate other
agents’ available information, ignore the actions of other agents going forward.

2.3. Asymptotic Efficiency: Long-Run Behavior in Large Networks


We have studied the differences in behavior between DeGroot and Bayesian agents in our environment, for π ∈ [0, 1]: all DeGroot, all Bayesian, and incomplete information
intermediate cases. Now we assess how efficient learning is in the long run in a large
network under these models. We show that non-trivial misinformation traps should occur
if there are a non-trivial number of clans of bounded size in the network.
Then, as a point of contrast, we compare our results to the two most studied models
in the literature. The first point of comparison is the complete information all-Bayesian
case in this exact coarse environment (π = 1 in our notation). Prior work has shown that
exactly in the same setup as ours, if all agents were Bayesian, then large, sparse networks
should do quite well at information aggregation.
The second point of comparison is the continuous DeGroot model. Note that this goes
outside of our setting in the sense that it replaces coarse communication with the abil-
ity of an agent to communicate arbitrarily fine numbers.15 When studying naive learning

15. It is also worth noting that the analog to this in a Bayesian environment would be Bayesians being able to pass on their entire posterior about the state of the world, which we omit for the obvious reasons.

models, the literature has often turned to this model to develop intuitions. The core result
in this case is that in our setting, if naive agents can communicate with continuous bits of
information, then large networks will aggregate information.16
Again, these points of comparisons are to illustrate that in our general environment
with coarse learning, the intuitions from the literature are overturned. The main point
that we illustrate is that, for a sensible sequence of networks, both the complete infor-
mation all-Bayesian and continuous DeGroot models lead to asymptotic efficiency where
all but a vanishing share of agents converge to a right guess about the state of the world.
However, in the incomplete information model with a non-zero share of coarse DeGroot
agents, there is a non-vanishing share of agents that remain stuck with the wrong guess.
To make these comparisons precise, we need to nest the different models. Assume without loss of generality that every agent i receives a signal p_{i0} that is continuously distributed in [0, 1]. In the continuous DeGroot model, these p_{i0}'s and their subsequent averages can be communicated directly. That is,
$$
p_{it} = \frac{\sum_{j \in N_i} p_{j,t-1} + p_{i,t-1}}{|\bar{N}_i|}.
$$
The guess is then a_{it} = 1{p_{it} > 1/2}, and we denote the limit guess by a_{i∞} = lim_{t→∞} 1{p_{it} > 1/2}.
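As a point of contrast, the continuous DeGroot benchmark is a linear system and can be sketched directly (again our illustration; the averaging matrix follows the display above):

```python
import numpy as np

def continuous_degroot(G, p0, T=200):
    """Iterate p_t = W p_{t-1}, where W row-normalizes G plus self-links,
    so each belief is the average of own and neighbors' prior beliefs.
    W is row-stochastic, so beliefs converge to a weighted average of the
    initial signals p0 (on a connected graph); guesses are 1{p > 1/2}."""
    A = G + np.eye(len(G))                    # include own prior belief
    W = A / A.sum(axis=1, keepdims=True)
    p = np.asarray(p0, dtype=float)
    for _ in range(T):
        p = W @ p
    return p, (p > 0.5).astype(int)           # limit beliefs and guesses
```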
Meanwhile, in the Bayesian and coarse DeGroot models, we can think of p_{i0} = P(θ = 1 | s_i)—so the signal delivered at t = 0 equivalently generates a posterior. Information transmission and updating are as before, with a_{it} depending on A_i^{t−1} using either Bayesian updating or DeGroot updating via majority.
We consider a sequence of networks G_n = (V_n, E_n) where |V_n| = n, letting p_{it}^{(n)} be defined as above for the continuous DeGroot model and a_{it}^{(n)} analogously be defined for each model. We define a sequence as asymptotically efficient if, for all ε > 0,
$$
\lim_{n \to \infty} \max_{i \le n} P\Big(\lim_{t \to \infty} \big|a^{(n)}_{it} - \theta\big| \le \varepsilon\Big) = 1.
$$

Under very general conditions, we show that both the continuous DeGroot model and
the complete information all-Bayesian model achieve asymptotic efficiency, but that the
incomplete information model with coarse DeGroot agents may not.

THEOREM 1: Suppose G_n = (V_n, E_n) with |V_n| = n is such that (i) there is a uniform bound on degree: d_i(V_n) ≤ d̄ for all i, n; (ii) the posterior distributions P(θ | s_i) are non-atomic in s for θ ∈ {0, 1}; and (iii) signals are i.i.d. across agents. Then,
1. the continuous DeGroot model is asymptotically efficient,
2. the complete information all-Bayesian model is asymptotically efficient, and
3. the incomplete information Bayesian model with π < 1 Bayesian and 1 − π coarse DeGroot agents may not be asymptotically efficient. In particular, suppose there exists k < ∞ such that X_n := #{i : i is in a clan of size k}/n is positive in the limit. Then the model is not asymptotically efficient.

16. In Supplemental Material, Online Appendix F, we demonstrate that, if DeGroot agents can transmit beliefs that are coarse but not necessarily binary, there is still a failure of asymptotic learning if there is a non-trivial share of the appropriate type of clan present.

PROOF: See Appendix A. Q.E.D.

Note that stuckness requires clan membership. In Supplemental Material Appendix G, we show that clans are essential for stuckness, since a node must be in a clan that is itself entirely stuck in order to become stuck.
Note that the asymptotic efficiency result does not crucially depend on the non-atomicity of posteriors, but rather on the possibility of ties (when p_{it} = 1/2). In networks where ties do not occur (or occur with vanishing probability), the asymptotic efficiency result is also true for the case of binary signals (an implicit result in Mossel, Sly,
if the Bayesian agents play a Nash equilibrium in the normal form of the game instead
of the assumption of myopic behavior by Bayesian agents. Specifically, Mossel, Sly, and
Tamuz (2015) showed that the same asymptotic efficiency result remains true in a strategic
setting. Moreover, in the incomplete information setting, the result showing asymptotic
inefficiency relies on bounding the random number of clans formed only by agents with
DeGroot types, so while Bayesian types may be playing according to an equilibrium of the
normal-form game, DeGroot types do not.
The theorem illustrates an important discrepancy between these models in terms of
social learning and why differentiating between which model best describes agents is rel-
evant. If agents cannot communicate all the information they have observed, the coarse-
ness of these messages causes beliefs to get “stuck” on a particular action, not allowing
the flow of new information. This is particularly true of DeGroot agents in the incom-
plete information model, as seen in Proposition 1. Because π is fixed in n, the share of
finite-sized clans that are all-DeGroot does not vanish in n, and therefore the share of
nodes that get stuck is non-vanishing in n. In Supplemental Material, Online Appendix F,
we extend this result to the case where coarse DeGroot agents can pass on m coarse estimates of beliefs {0, 1/(m−1), …, (m−2)/(m−1), 1}, generated by rounding the prior round averages of their neighbors' and own coarse beliefs. For the appropriate extension of the notion of
clans, again if there is a non-vanishing share of nodes in clans, the incomplete informa-
tion model also fails asymptotic efficiency. This is prevented in the continuous DeGroot
model by allowing pit to take any number in [0 1], so arbitrarily small changes in beliefs
are effectively passed through to other agents.

2.4. What Kinds of Networks Exhibit Stuckness?


Having examined the properties of continuous DeGroot, complete information all-
Bayesian, and incomplete information Bayesian models, we address whether DeGroot-
like behavior may cause lack of asymptotic learning in practice. In particular, we study
whether realistic network structures would exhibit stuckness in a setting in which the share
of DeGroot agents is non-zero (π < 1). Specifically, we explore whether, as the number of nodes n → ∞, there is a non-vanishing share of clans. If so, the share
of all-DeGroot clans is bounded from below, and consequently, there is a non-vanishing
lower bound on the share of agents getting stuck.
We begin by examining a stochastic network formation model that mimics the structure
of real-world networks—the resulting graphs tend to be both sparse and clustered.17 This
model is very similar to a number of network formation models in the statistics and econo-
metrics literature (see, e.g., Fafchamps and Gubert (2007), Graham (2017), McCormick

17. See Supplemental Material Appendix C for simulations that demonstrate this on empirical network data.

and Zheng (2015)) and has a random utility interpretation. We then relate stuckness to a
concept from graph theory called conductance and related properties of expansion. This
allows checking an eigenvalue of a standard transformation of the adjacency matrix G to
evaluate whether stuckness, and consequently lack of asymptotic learning, is possible in
networks of interest.

2.4.1. A Hybrid Model of Random Geometric Graphs and Erdős–Rényi Graphs


We build a simple, but general, model to capture sparse and clustered network struc-
tures, which resemble those in the real world. Our model toggles between two starkly dif-
ferent canonical models of network formation: Random Geometric Graphs (henceforth
RGG) and Erdős–Rényi (henceforth ER) (Penrose (2003), Erdős and Rényi (1959)). The
basic idea is that, in an RGG, nodes have positions in some latent space and tend to be
linked when they are close enough, capturing latent homophily. Meanwhile, in an ER,
nodes are independently linked.
We demonstrate that, for such a mixture, the share of clans is non-vanishing. Practi-
cally, this means that, if π < 1, we should expect that, in networks with real-world–like
structures, there will remain pockets of individuals who become irreversibly convinced of
the wrong state of the world and are unable to shift away from this, if they behave in a
coarse DeGroot manner.
Let Ω = [0, k]² ⊂ ℝ² be a latent space.¹⁸ We say an RGG-ER mixture is (α, β)-mixed, where (α, β) ∈ [0, 1]², if the network is formed as follows. There exists a Poisson point process on Ω, which determines which points of the latent space will receive a node. n nodes are drawn according to this point process, with uniform intensity λ > 0. Note this means that, for any subset A ⊂ Ω,
$$
n_A \sim \mathrm{Poisson}(\nu_A), \quad \text{where } \nu_A := \int_A \lambda \, dy.
$$

Given the draw of n nodes, the network forms as follows:
1. RGG component:
• If d(i, j) ≤ r, then i and j are linked with probability α.
• If d(i, j) > r, then there is no link between i and j.
2. ER component:
• Every pair ij with d(i, j) > r is linked i.i.d. with probability β.
There is a simple random utility interpretation of this model, similar to that used in the literature. Let the latent utility to i of a link to j be, for θ > γ,
$$
u_i(j) = \theta \cdot \mathbf{1}\{d(i,j) \le r\} + \gamma \cdot \mathbf{1}\{d(i,j) > r\} - \epsilon_{ij},
$$
where ε_{ij} ∼ F(·), and F(·) is Type I extreme value. Assume ε_{ij} = ε_{ji} so that, if one wants the link, so does the other, and thus mutual consent is satisfied. Then
$$
P(G_{ij} = 1) = F(\theta) \cdot \mathbf{1}\{d \le r\} + F(\gamma) \cdot \mathbf{1}\{d > r\}.
$$
Setting α = F(θ) and β = F(γ) gives us the (α, β)-mixture of RGG and ER.
We assume α ≥ β in our applications. Note that, if α = 1 and β = 0, then this is just a
standard RGG. If α < 1 and β = 0, this is what is called a soft RGG. And if α = β > 0,
then this is a standard ER.
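A minimal generator for the (α, β)-mixture is below (our sketch; parameter names are ours). Paired with the clan checker from Section 2.2, it can be used to count the share of nodes in small clans as k grows.

```python
import numpy as np

def rgg_er_mixture(k, lam, r, alpha, beta, seed=0):
    """Draw nodes from a uniform-intensity Poisson process on [0, k]^2,
    link pairs within distance r with probability alpha (RGG component),
    and link pairs farther than r with probability beta (ER component)."""
    rng = np.random.default_rng(seed)
    n = rng.poisson(lam * k * k)                 # Poisson(λ · vol(Ω)) nodes
    pos = rng.uniform(0, k, size=(n, 2))
    G = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            close = np.linalg.norm(pos[i] - pos[j]) <= r
            if rng.random() < (alpha if close else beta):
                G[i, j] = G[j, i] = 1
    return G, pos

# Sparse regime: beta of order 1/vol(Ω) keeps the ER component sparse.
G, pos = rgg_er_mixture(k=10, lam=1.0, r=1.0, alpha=0.8, beta=1 / 100)
```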

18. The result is generalized to the case with Ω = [0, k]^h. Here we take h = 2.

Formally, we are interested in a sequence of networks (Ω_k, α_k, β_k) such that vol(Ω_k) → ∞.¹⁹ Since we study sparse graphs, we need to ensure that both components contribute a sparse number of links. The RGG component is sparse by definition for fixed r, and we have α_k either converging to a positive constant or fixed along the sequence. For the ER component, this requires β_k = O(1/vol(Ω_k)).
We derive lower and upper bounds on a particular class of clans, which we call local clans. A local clan is a group located in a ball of radius r/2, so that the probability of any two nodes in this group having a link is α_k. A local clan has an expected number of nodes within the ball that is constant along the growing sequence of nodes.

THEOREM 2: Consider any (α_k, β_k)-mixture of Random Geometric and Erdős–Rényi graphs, and a sequence {Ω_k = [0, k]²}_{k∈ℕ} with k → ∞, so vol(Ω_k) → ∞. If β_k = O(1/vol(Ω_k)) and α_k → α > 0, then the share of nodes belonging to local clans remains positive as k → ∞. If α_k → 0, the share of such nodes vanishes as k → ∞.

The proof is a direct corollary to Proposition A.2 in Appendix A.20 The proposition
illustrates that, in stochastic network formation models that mimic the structure of real-
world networks where graphs tend to be both sparse and clustered, if π < 1, a non-trivial
share of nodes is likely to get stuck on the wrong state of the world, and thus asymptotic
learning is infeasible.21

2.4.2. Assessing the Presence of Clans


A natural question to ask is whether the existence of clans is related to some well-
studied structural property of the network, such as expansiveness. We show that the ex-
istence of clans relates to a measure in graph theory called conductance and a related
spectral property of expansiveness. We show that any sequence of graphs that is suffi-
ciently expansive in a specific sense cannot have a non-vanishing share of clans.
The conductance (the Cheeger constant, in Chung's (1997) terminology) of a graph is
$$
\phi(G) := \min_{S : 0 < \mathrm{vol}(S) \le \frac{1}{2}\mathrm{vol}(V)} \frac{\partial S}{\mathrm{vol}(S)},
$$
where ∂S := ∑_{i∈S} d_i(V\S) is the number of links from within set S to V\S, and vol(S) = ∑_{i∈S} d_i(S) + ∑_{i∈S} d_i(V\S).

PROPOSITION 4: Any graph sequence (G_n) that has a non-vanishing share of clans of uniformly bounded size has lim_{n→∞} φ(G_n) < 1/2.

PROOF: See Appendix A. Q.E.D.

The conductance of a graph is well known to be difficult to compute (the problem is NP-complete). We appeal to bounds from the literature known as the Cheeger inequality.
19. For a general h, we would have vol(Ω) = k^h, and we take k → ∞.
20. We also derive closed-form bounds on the probability that a given node is part of a local clan in Proposition A.1 in Appendix A.
21. Notice that, in the particular case of a sparse Erdős–Rényi graph, the sequence would have α_k = β_k → 0, and so, as noted at the end of the proposition, the share belonging to such local clans would vanish.

The Cheeger inequality uses the spectrum of the Laplacian of G, L := I − D^{−1/2} G D^{−1/2}, where D = diag{d_1(G), …, d_n(G)}, to bound φ(G) by the second-smallest eigenvalue of the Laplacian:
$$
\frac{\lambda_2(L)}{\sqrt{2}} \le \phi(G) \le \sqrt{2\,\lambda_2(L)}.
$$
We say a graph is an h-Laplacian expander if λ2 (L) ≥ h.

COROLLARY 1: Consider a sequence of graphs G_n which are (√2/2)-Laplacian expanders. Then no graph in the sequence contains a clan.

This means that one can assess whether agent stuckness, or a lack of asymptotic learning, is a likely problem for a graph simply by looking at the values of the Laplacian's spectrum. Any graph that is a (√2/2)-expander cannot have clans. As such, a sufficient condition to rule out the possibility of stuckness is that the graph has sufficiently high expansion properties.
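A sketch of this diagnostic (ours, stated under the bound above): compute λ₂ of the normalized Laplacian and, for small graphs, check it against the exact conductance.

```python
import numpy as np
from itertools import combinations

def lambda2(G):
    """Second-smallest eigenvalue of L = I - D^{-1/2} G D^{-1/2}."""
    d = G.sum(axis=1).astype(float)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # assumes no isolated nodes
    L = np.eye(len(G)) - D_inv_sqrt @ G @ D_inv_sqrt
    return np.sort(np.linalg.eigvalsh(L))[1]

def conductance(G):
    """Exact φ(G) by enumerating all cuts (exponential; small graphs only)."""
    n, d = len(G), G.sum(axis=1)
    vol_V, best = d.sum(), np.inf
    for size in range(1, n):
        for S in combinations(range(n), size):
            S = list(S)
            vol_S = d[S].sum()
            if 0 < vol_S <= vol_V / 2:
                not_S = [i for i in range(n) if i not in S]
                boundary = G[np.ix_(S, not_S)].sum()   # ∂S
                best = min(best, boundary / vol_S)
    return best

# By the corollary, lambda2(G) >= sqrt(2)/2 certifies the absence of clans.
```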
In Appendix C, we study the expansiveness of the 75 Indian village networks collected in Banerjee, Chandrasekhar, Duflo, and Jackson (2019). We look at the household-level network, focusing on the graph of links through which information is diffused. First, we find that no graph has an expansiveness above 0.4, let alone √2/2. Second, after simulating a DeGroot model with p = 0.6, we show that the share stuck is high, with a maximum of over 45% and an average of roughly 20–25%. Third, if we consider nodes that began with an incorrect signal, we find that nearly 60% of those nodes get stuck in one village, and on average around 30% are stuck. Fourth, we find that the stuckness rate increases with the lack of expansiveness of the network. Altogether, this suggests that the structure of empirical networks in this setting has a considerable number of clans and may be susceptible to misinformation traps.
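The Appendix C exercise can be approximated in a few lines. The sketch below (ours; p = 0.6 as in the text, with a fixed horizon standing in for the long run) estimates, by Monte Carlo, the share of nodes whose coarse DeGroot guess ends up disagreeing with the majority of initial signals:

```python
import numpy as np

def share_stuck(G, p=0.6, T=30, sims=500, seed=1):
    """Monte Carlo share of nodes whose coarse DeGroot guess at horizon T
    disagrees with the majority of initial signals (the full-information
    optimum); T proxies for the long run since coarse dynamics can cycle."""
    rng = np.random.default_rng(seed)
    n = len(G)
    A = G + np.eye(n)                          # own plus neighbors' guesses
    deg = A.sum(axis=1)
    stuck = 0.0
    for _ in range(sims):
        s = (rng.random(n) < p).astype(float)  # signals; true state θ = 1
        theta_hat = 1.0 if s.mean() > 0.5 else 0.0
        a = s.copy()
        for _ in range(T):
            frac = (A @ a) / deg
            a = np.where(frac > 0.5, 1.0, np.where(frac < 0.5, 0.0, a))
        stuck += (a != theta_hat).mean() / sims
    return stuck
```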

3. TWO EXPERIMENTS
3.1. Setting
First, in 2011, we conducted 95 experimental sessions with a total of 665 subjects across
19 villages in Karnataka, India. The villages range from 1.5 to 3.5 hours’ drive from Ban-
galore. We initially chose the village setting because social learning through networks
is of the utmost importance in rural environments; information about new technologies
(Conley and Udry (2010)), microfinance (Banerjee, Chandrasekhar, Duflo, and Jackson
(2013)), politics (Alt, Jensen, Larreguy, Lassen, and Marshall (2019), Cruz, Labonne, and
Querubin (2018), Duarte, Finan, Larreguy, and Schechter (2019)), among other things,
propagates regularly through social networks.
Second, in 2017, we conducted 50 experimental sessions with a total of 350 subjects
from the pool of undergraduate students at ITAM in Mexico, most of whom were juniors studying economics or political science. We chose this second setting to study how learning
patterns among urban, well-educated individuals may differ from those of the rural poor.
ITAM is one of Mexico’s most highly-ranked institutions of higher education and system-
atically places its undergraduate students studying economics or political science in top
PhD programs in the United States and Europe.

3.2. Game Structure


In both settings, the game structure followed the setup of the model. Every set of seven
subjects anonymously played the learning game on three different network structures,

FIGURE 3.—Timeline.

displayed in Figure 2, which were designed to distinguish between Bayesian and DeGroot
learning behavior. Positions were randomized and order of the networks for each set of
seven subjects was also randomized.
As reflected in Figure 3, every subject received a signal about the binary state of the world that was correct with probability 5/7. Then agents in each period submitted their guess about
the state of the world. In every period, every agent could see the entire history of guesses
of their network neighbors, including their own, before making their subsequent guesses.
After each round, the game continued to the next round randomly and, on average, lasted
six rounds.
Subjects were paid for a randomly chosen round from a randomly chosen game. In
India, subjects were paid Rs. 100 if they guessed the state correctly, as well as a Rs. 20
participation fee—just under a day’s wage at the time in this setting. In Mexico, students
were paid $100 pesos if they guessed the state correctly and an additional $50 participation
fee, which amounts to slightly above the hourly rate for working as a research assistant.

3.3. Implementation
In every Indian village, we recruited an average of 35 individuals from a random set of
households from each village. We brought the individuals to a public space (e.g., marriage
hall, school, dairy, barn, clusters of households) where we conducted the experiment.
While individuals were recruited, the public space was divided into “stations.” Each sta-
tion had a single staff member to monitor the single participant assigned to the station at
random to ensure that participants could not observe each other or communicate. Often
stations would be across several buildings.
At ITAM, we recruited seven undergraduates for each experimental session through
emails to the economics and political science mailing lists. Students were congregated
in a spacious classroom and placed throughout the room also in stations so that they
were unable to see and communicate with other participants. In contrast to India, each
experimental session was run by two staff members since we could not afford the staff
to monitor each of the participants individually. However, we observed no instances of
students trying to talk to one another or to look at other participants' signals or guesses.
The experimental protocol was identical in both India and Mexico. At the beginning
of each game, all participants were shown two identical bags, one with five yellow balls
and two blue balls, and the other with five blue balls and two yellow balls. One of the two
bags was chosen at random to represent the state of the world. Since there was an equal
probability that either bag could be chosen, we induced priors of 1/2. As the selected bag
contained five balls reflecting the state of the world, participants anticipated receiving
independent signals that were correct with probability 5/7.

After an initial explanation of the experiment and payments, the bag for the first game
was randomly chosen in front of the participants. The participants were then assigned to
stations where each was shown a sheet of paper with the entire network structure of seven
individuals for that game, as well as her own location in the network.
Once in their stations, after receiving their signals in round zero, all participants si-
multaneously and independently made their best guesses about the underlying state of
the world. The game continued to the next round randomly and, on average, lasted six
rounds. If the game continued to the second round, at the beginning of this round, each
participant was shown the round-1 guesses of the other participants in her neighborhood
through sheets of paper that presented an image of the network and colored in their
neighbors’ guesses. Agents updated their beliefs about the state of the world and then
again made their best guesses about it. Once again, the game continued to the following
round randomly. This process repeated until the game came to an end.
Notice that, after the time-zero set of signals, no more signals were drawn during the
course of the game. Participants could only observe the historical decision of their neigh-
bors and update their own beliefs accordingly. Importantly, individuals kept the informa-
tion about the guesses of their neighbors in all previous rounds until the game concluded.
The reason was that we intended to test social learning, not the ability of participants to memorize past guesses.
After each game, participants were regrouped, the color of the randomly chosen bag
was shown, and if appropriate, a new bag was randomly chosen for the next game. Partic-
ipants were then sent back to their stations and the game continued as the previous one.
After all three games were played, individuals were paid the corresponding amount for a
randomly chosen round from a randomly chosen game, as well as their participation fee.
Participants then faced non-trivial incentives to submit a guess that reflected their belief
about the underlying state of the world.

4. REDUCED-FORM RESULTS
We assess whether the learning patterns described in Section 2.2 hold in our experi-
mental data. Table I presents the results for our experiment among Indian villagers and
Table II presents the results for our experiment in Mexico with undergraduate students
from ITAM.
Recall that we identified four key patterns. First, for any π < 1, if a clan is composed entirely of DeGroot agents, then once the clan comes to a consensus, all of its members remain stuck. To assess the prevalence of this feature in the experimental data, Panel A presents the share of times that clans remain stuck even though the Bayesian model would have predicted a change along the path. Here an all-DeGroot model would predict that the share is 1.
Second, when all agents are Bayesian and this is common knowledge, any
agent whose information set is dominated by that of another must copy the dominating agent's
previous-period action for all t > 2. Panel B shows the share of times that informationally
dominated agents fail to copy their informationally dominating neighbors, which the complete
information Bayesian case predicts to be 0.
Third, even in an incomplete information setup, if a Bayesian agent learns through
some history the majority of the initial signals, then she must play this guess in all future
periods. Panel C, column 2, presents the share of times a Bayesian agent would have

TABLE I
REDUCED-FORM PATTERNS: INDIA^a

Panel A: Stuckness
  (1) Share of clans that remain stuck on the wrong guess, given that the
      Bayesian model would have predicted a change along the path
      (DeGroot predicts 1): 0.946 (0.0303). Observations: 74.

Panel B: Information dominance
  (1) Share of times an information dominated agent fails to copy the
      dominating agent (Complete Information Bayesian predicts 0):
      0.829 (0.0380). Observations: 140.

Panel C: Information revelation
  (1) Share of times an agent responds inefficiently to neighbors' actions
      (Bayesian predicts 0): 0.931 (0.0203). Observations: 159.
  (2) Share of times an agent necessarily learns the majority of signals and
      yet changes guess along the path, given that the DeGroot assessment
      would have changed the guess (Bayesian predicts 0): 0.945 (0.0249).
      Observations: 73.

^a Standard errors are reported in parentheses. Panel A corresponds to the feature that, in DeGroot models, a clan that is stuck remains so until the end. Panel B is motivated by the fact that an agent should never respond to the behavior of someone whose information set is a subset of her own under a Bayesian model, which is robust to incomplete information (column 2). Similarly, a Bayesian agent in a complete information Bayesian world should only copy her information-dominating neighbor and do nothing else (column 1). Panel C reflects the feature that, irrespective of whether agents are Bayesian or DeGroot, in round 2 they will play the majority; therefore it is possible for Bayesian agents, even in an incomplete information world, to learn the majority of signals in certain cases, and they should then stick to this guess.

learned whether the majority of initial signals was 0 or 1 and yet changes guess along the
path in a manner consistent with DeGroot learning.22
Finally, even in an incomplete information setup, any Bayesian agent must never con-
dition her decision on the prior period action of an informationally dominated agent and

22 We compute this by enumerating all cases of the complete information Bayesian model in the networks we used and then calculating this share directly.
TABLE II
REDUCED-FORM PATTERNS: MEXICO^a

Panel A: Stuckness
  (1) Share of clans that remain stuck on the wrong guess, given that the
      Bayesian model would have predicted a change along the path
      (DeGroot predicts 1): 0.303 (0.144). Observations: 33.

Panel B: Information dominance
  (1) Share of times an information dominated agent fails to copy the
      dominating agent (Complete Information Bayesian predicts 0):
      0.545 (0.0660). Observations: 112.

Panel C: Information revelation
  (1) Share of times an agent responds inefficiently to neighbors' actions
      (Bayesian predicts 0): 0.614 (0.0862). Observations: 57.
  (2) Share of times an agent necessarily learns the majority of signals and
      yet changes guess along the path, given that the DeGroot assessment
      would have changed the guess (Bayesian predicts 0): 0.600 (0.117).
      Observations: 35.

^a Standard errors are reported in parentheses. Panel A corresponds to the feature that, in DeGroot models, a clan that is stuck remains so until the end. Panel B is motivated by the fact that an agent should never respond to the behavior of someone whose information set is a subset of her own under a Bayesian model, which is robust to incomplete information (column 2). Similarly, a Bayesian agent in a complete information Bayesian world should only copy her information-dominating neighbor and do nothing else (column 1). Panel C reflects the feature that, irrespective of whether agents are Bayesian or DeGroot, in round 2 they will play the majority; therefore it is possible for Bayesian agents, even in an incomplete information world, to learn the majority of signals in certain cases, and they should then stick to this guess.

instead should use only that agent's initial signal. In column 1 of Panel C,
we look at the share of times an agent fails at this, which should be 0% of the time for
Bayesian agents.23

23 Recall that information dominance means that an agent's information set contains another's information set; for instance, node 3 informationally dominates nodes 2 and 6 in Network 3 from the experiment. We calculate this share by enumerating all such cases.

In the Indian village data, we find evidence that is consistent with DeGroot behavior
and inconsistent with Bayesian behavior. In Panel A of Table I, we show that the share
of clans that remain stuck, conditioning on the cases where Bayesian agents would have
changed along the path, is 0.946. Then, in Panel B, we show that 82.9% of the time, an
agent fails to copy an informationally dominating agent. In Panel C, column 2, we find
that in 94.5% of the instances where an agent should have learned whether the majority
of initial signals was 0 or 1, the agent changed her opinion in the direction suggested by
DeGroot updating. Finally, in Panel C, column 1, we find that 93.1% of the time, agents
inefficiently respond to informationally dominated neighbors’ actions.
The data from the experiment with ITAM students in Mexico exhibit considerably dif-
ferent patterns. In fact, there is evidence consistent with both DeGroot and Bayesian
behavior, indicating that there is likely a more heterogeneous mix of agents from the per-
spective of our incomplete information model. Panel A shows that stuckness occurs only
30.3% of the time. In Panel B, we find that informationally dominated agents fail to copy
dominating agents 54.5% of the time. Column 2 of Panel C shows that, when agents learn
whether the majority of initial signals was 0 or 1, they change their guesses in the manner
that DeGroot learners would 60% of the time, and column 1 indicates that agents
inefficiently respond to informationally dominated neighbors' actions 61.4% of the time.
Taken together, we see that the Indian village population behaves consistently with
the (all) DeGroot model and inconsistently with any Bayesian model (or at least any
with π significantly different from 0). Meanwhile, the undergraduate student population
at ITAM, who are considerably more educated, behave in a manner reflecting
a possibly mixed population. The results demonstrate that context affects whether we
should consider a pure DeGroot model, a pure Bayesian model, or an incomplete in-
formation model. Furthermore, it suggests that the sorts of misinformation traps that
we have described—those that arise when networks have clans and some individuals ex-
hibit DeGroot-style learning behavior—might be much more of a problem for the village
population. The Indian villagers, therefore, are more vulnerable to misinformation traps,
whereas some set of agents in the more educated population may be able to overcome
them.

5. STRUCTURAL ESTIMATION
We now turn to our structural estimation. Our primary approach is to consider the
entirety of the social learning outcome as the object of study. We therefore take as the
object to be predicted the entire matrix of actions $A^T = (a_{i\tau})_{i=1,\tau=1}^{n,T}$ and study
which π best explains the data. Theory predicts a path of actions under the true model
for each individual in each period, given a network and a set of initial signals. This method
thus maintains that the predicted action under a given model is not path-dependent and
is fully determined by the network structure and the set of initial signals. We denote this
approach the network-level estimation.
An alternative approach is to perform an individual-level estimation. In this case, the
observational unit is each individual’s action. In contrast with the network-level estima-
tion, the action prescribed by theory is conditional on the information set available to i at
t − 1 and the ex ante probability that a given individual is a Bayesian learner as opposed
to some DeGroot learner.
We estimate the parameters of the model under both approaches, but we prefer the
network-level approach. Our aspiration is not simply to study how a given individual up-
dates but rather, when we focus on a community as a whole, how we should think about

the macro-dynamics of, and the model that governs, the social learning process. Nonethe-
less, both approaches yield similar results.
For both estimation approaches, we estimate the model parameters that maximize the
likelihood of observing the experimental data. In every period τ, there is an action taken
by i, $a_{i\tau}$. The type of the agent and the history determine the action. Given a history
$A^{t-1} = (a_{i\tau})_{i=1,\tau=1}^{n,t-1}$, there is a prescribed action under the model of behavior, which can
depend on the agent's type $\eta_i$, the history of observed play, and the prior probability that
an agent is Bayesian: $a^\star_{it}(A^{t-1}; \eta, \pi)$. Agents also make mistakes with probability ε, and
thus, given the prescribed action, the observed data for the econometrician are

$$a_{it} = \begin{cases} a^\star_{it} & \text{with probability } 1 - \varepsilon, \\ 1 - a^\star_{it} & \text{with probability } \varepsilon, \end{cases}$$

for any $t \ge 2$. The history of play is then that of the observed actions, which can differ from
the prescribed actions. As noted, for computational feasibility, we assume that mistakes,
and thus such differences, are not internalized by agents.
The matrix $A^T_v = [a_{itv}]$ is the data set for a given village v (or session in the case of the
Mexican sample). Suppressing v until it is needed, the likelihood is

$$L(\pi, \varepsilon; A^T) = P(A^T \mid \pi, \varepsilon) = P(a_T \mid A^{T-1}, \pi, \varepsilon) \cdot P(a_{T-1} \mid A^{T-2}, \pi, \varepsilon) \cdots P(a_1 \mid \pi, \varepsilon).$$

Notice that $P(a_1 \mid \pi, \varepsilon)$ and $P(a_2 \mid \pi, \varepsilon)$ are both independent of π, because they are independent
of η: in period 1, every agent plays her signal, and in period 2, every agent plays the
majority.
Let $x_{it} = \mathbf{1}\{a_{it} = a^\star_{it}(A^{t-1}; \eta, \pi)\}$, which indicates whether the observed action
matches the one prescribed by the model given the history, the type vector, and the parameter
value. As a result,

$$P(a_t \mid A^{t-1}, \pi, \varepsilon) = \sum_{\eta} \left[\prod_{i=1}^{n} (1-\varepsilon)^{x_{it}} \varepsilon^{1-x_{it}}\right] P(\eta \mid \pi).$$

Taking logs, we can write the estimator as

$$\hat{\pi} = \operatorname*{argmax}_{\pi \in [0,1]} \ell(\pi, \varepsilon; A^T) = \operatorname*{argmax}_{\pi \in [0,1]} \sum_{v=1}^{V} \sum_{t=3}^{T} \log \sum_{\eta} \prod_{i=1}^{n} (1-\varepsilon)^{x_{it}[A^{t-1}; \eta, \pi]}\, \varepsilon^{1-x_{it}[A^{t-1}; \eta, \pi]} \cdot P(\eta \mid \pi).$$

In Appendix D, we prove the consistency of the estimator and present simulations
indicating that our estimators are consistent. Specifically, under each
$\pi \in \{0, 0.1, \ldots, 0.9, 1\}$, we generate learning data from the model; then, using both
network-level and individual-level estimation, we show that our estimators recover the
true parameter used for data generation.
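To fix ideas, the following sketch outlines the network-level likelihood under our setup. It is an illustration, not our estimation code: `prescribed_actions` is a hypothetical helper standing in for the model's deterministic prediction of the action path, and all names are assumptions.

```python
import numpy as np
from itertools import product

def log_likelihood(pi, eps, observed, prescribed_actions, signals):
    """Network-level log-likelihood of one session (a sketch).
    observed: (T, n) 0/1 array of actions; rounds t >= 3 are scored.
    prescribed_actions(signals, eta, pi): hypothetical helper that
    returns the deterministic (T, n) prescribed path for types eta."""
    T, n = observed.shape
    types = list(product("BD", repeat=n))            # all 2^n type vectors
    paths = {eta: prescribed_actions(signals, eta, pi) for eta in types}
    ll = 0.0
    for t in range(2, T):                            # rounds 3, ..., T
        total = 0.0
        for eta in types:
            x = (observed[t] == paths[eta][t])       # matches prescription?
            p_act = np.prod(np.where(x, 1 - eps, eps))
            p_eta = np.prod([pi if e == "B" else 1 - pi for e in eta])
            total += p_act * p_eta
        ll += np.log(total)
    return ll

# pi_hat by grid search, mirroring the paper's {0, 0.1, ..., 1} grid:
# pi_hat = max(np.arange(0, 1.01, 0.1),
#              key=lambda p: sum(log_likelihood(p, eps_hat, obs, f, s)
#                                for obs, s in sessions))
```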
The intuition for identification is as follows. The maximum likelihood estimator sets the
score function (the derivative of the log-likelihood) to zero. It thus assesses all configurations
of the learning types of the seven nodes and weighs how steeply the likelihood
of each configuration changes as π changes marginally at the parameter value. While far from comprehensive
in capturing all the experimental variation used to identify π, the reduced-form
learning patterns that we describe above provide considerable contrast between
Bayesian and DeGroot learning and thus contribute to its estimation. For example, take

the setting of a clan where, if all the agents were Bayesian learners, they would realize
they do not need to respond to the changes in actions of their informationally dominated
neighbors. Here the likelihood of stuckness declines steeply as π increases, as in that
case it becomes increasingly likely that some agents are Bayesian and therefore become
unresponsive to the behavior of those agents.
In order to perform inference on the parameters, we compute standard errors over the
parameter estimates via a block bootstrap procedure that accounts for the dependence
in the data among the individuals playing the same game and session. Specifically, we
draw with replacement the same number of session-game blocks of observations that we
have in each of our experimental samples and compute the parameters that maximize the
corresponding likelihood.24
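A minimal sketch of this block bootstrap (illustrative names; `estimator` stands in for the maximum likelihood routine above, and the data layout is assumed):

```python
import numpy as np

def block_bootstrap_se(blocks, estimator, n_boot=1000, seed=0):
    """Block-bootstrap standard error: resample session-game blocks
    with replacement and re-estimate on each pseudo-sample.
    blocks: list of session-game data objects; estimator: callable
    mapping a list of blocks to a scalar parameter estimate."""
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(blocks), size=len(blocks))
        draws.append(estimator([blocks[i] for i in idx]))
    return np.std(draws, ddof=1)
```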
Before turning to our structural estimates of π, we estimate ε, which is common to both
the network-level and the individual-level estimation. Note that, for any node i in any
graph v, both the Bayesian and DeGroot models, irrespective of π, prescribe the majority
as the action in the second period. Therefore, recalling that $N_i^* = \{j : g_{ij} = 1\} \cup \{i\}$,

$$\hat{\varepsilon} := 1 - \frac{\sum_v \sum_i \mathbf{1}\big\{a_{i2} = \mathrm{majority}\big(\{a_{j1} : j \in N_i^*\}\big)\big\} \cdot \mathbf{1}\big\{\mathrm{unique\ majority}\big(\{a_{j1} : j \in N_i^*\}\big)\big\}}{\sum_v \sum_i \mathbf{1}\big\{\mathrm{unique\ majority}\big(\{a_{j1} : j \in N_i^*\}\big)\big\}}.$$

By standard arguments, this is a consistent and asymptotically normally distributed estimator,
since it is composed of a set of Bernoulli trials. Panel A of Table III shows that ε̂ is
similar in both samples: 0.1288 (standard error 0.007) in the Indian experimental sample
and 0.134 (standard error 0.013) in the Mexican experimental sample. This means that,
about 87% of the time, agents act in accordance with the prescribed action under the
incomplete information Bayesian model, irrespective of π.
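The following sketch illustrates how ε̂ can be computed from the data (an illustration under an assumed data layout—per-session action matrices and adjacency matrices—not our estimation code):

```python
import numpy as np

def estimate_eps(sessions):
    """eps-hat from round-2 play: among agents whose extended
    neighborhood N_i* has a unique round-1 majority, the share who
    deviate from that majority in round 2 (assumed data layout)."""
    deviations, eligible = 0, 0
    for actions, adjacency in sessions:  # actions: (T, n); adjacency: (n, n)
        n = adjacency.shape[0]
        for i in range(n):
            nbhd = list(np.flatnonzero(adjacency[i])) + [i]   # N_i*
            ones = actions[0, nbhd].sum()                     # round-1 guesses
            if 2 * ones == len(nbhd):
                continue                  # no unique majority: skip
            majority = int(2 * ones > len(nbhd))
            eligible += 1
            deviations += int(actions[1, i] != majority)      # round-2 guess
    return deviations / eligible
```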

5.1. Main Structural Results: Network-Level Estimation


Next, we turn to the estimation of π. Under the network-level approach, we treat the
entire path of actions by all agents in the network as a single observation. From the
network-level perspective, we take $a^\star_{it}(A^{t-1}; \eta, \pi) = a^\star_{it}((a_{i0})_{i=1}^{n}; \eta, \pi)$. This means that,
given a signal endowment, a type endowment, and a parameter π, the path of all agents'
prescribed actions in each period is deterministic. The observed actions are then just
independent and identically distributed perturbations of these prescribed actions.
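As an illustration of how such a prescribed path is computed in the all-DeGroot benchmark (π = 0), the following sketch iterates uniform-weight majority dynamics from a signal endowment; the stay-with-current-guess tie-break is one possible rule and is our assumption here:

```python
import numpy as np

def degroot_path(signals, adjacency, T):
    """Prescribed action path when all agents are DeGroot (pi = 0):
    each round, every agent adopts the majority of her own and her
    neighbors' previous guesses (ties: keep the current guess)."""
    n = len(signals)
    path = np.zeros((T, n), dtype=int)
    path[0] = signals
    for t in range(1, T):
        for i in range(n):
            nbhd = list(np.flatnonzero(adjacency[i])) + [i]
            ones = path[t - 1, nbhd].sum()
            if 2 * ones > len(nbhd):
                path[t, i] = 1
            elif 2 * ones < len(nbhd):
                path[t, i] = 0
            else:
                path[t, i] = path[t - 1, i]   # tie-break: stay put
    return path
```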
Panel B of Table III presents the results of our structural estimation from the network-level
perspective. In column 1, we present the results for the data from the experimental
sample of Indian villagers and, in column 2, for the data from the Mexican student experimental
sample. In the Indian sample, we find that the share of Bayesian agents is low—π̂
is 0.1 (standard error 0.130)—and we cannot reject that it is equal to zero.
This is consistent with the reduced-form results in Table I, which, for example, indicate
that over 90% of behavior, when the Bayesian and DeGroot models disagree on the guesses
that ought to be made, is consistent with DeGroot behavior.
Similarly, the structural estimates of π are consistent with our prior reduced-form re-
sults from Table II for the Mexican sample. There we find mixed behavior, where, for

24 This procedure is analogous to clustering and, therefore, is conservative by exploiting only variation at the block level.
TABLE III
STRUCTURAL ESTIMATES^a

Panel A: ε̂
               (1) India     (2) Mexico
  ε̂             0.1288        0.134
                (0.007)       (0.013)

Panel B: Network Level
               (1) π̂ India   (2) π̂ Mexico
  π̂              0.1           0.5
                (0.130)       (0.184)

Panel C: Individual Level
               (1) π̂ India   (2) π̂ Mexico
  π̂              0.1           0.4
                (0.113)       (0.268)

^a Block-bootstrapped standard errors, at the session level for π̂ and at the agent level for ε̂, are reported in parentheses.

example, agents that necessarily learned whether the majority of initial signals was 0 or 1
change their action consistently with DeGroot learning 60% of the time, and we estimate
π̂ = 0.5 (standard error 0.184).

5.2. Individual-Level Estimation


Having looked at the estimates of π from the network-level approach, we turn our
attention to the estimates from an individual-level approach. From this perspective,
$a^\star_{it}(A^{t-1}; \eta, \pi) = a^\star_{it}(A_i^{t-1}; \eta_i, \pi)$ depends on the entire history $A_i^{t-1}$ observed by agent i,
the agent's own type $\eta_i$, and the commonly known π.


Observe that, because there is an ε that is not internalized by the agents, it is possible
for them to reach a zero-probability event. We therefore define the model as proceeding
until any agent hits a zero-probability event. The terminal round $T'$, which is a function
of the signal and type endowments as well as of the sequence of shocks, is then endogenously
determined as the round prior to any agent hitting a zero-probability event. The model
is silent after this, and we therefore treat the data in the same way. This constitutes a
well-defined data-generating process with a well-defined likelihood. We elaborate on
this in Appendix D and demonstrate consistency. In practice, we use the data up to
T = 3, as 58% of the sessions had at least one agent hit a zero-probability information set
at T = 4.
We present our results in Panel C of Table III. In column 1, we present the results
for the data from the sample of Indian villagers and, in column 2, the data from the

Mexican student sample. In the Indian sample, we find that the share of Bayesian agents is
low—π̂ = 0.1 (standard error 0.113)—and again we cannot reject that it is equal to
zero. This is consistent with the reduced-form results in Table I. Similarly, the structural
estimate of the share of Bayesian agents—π̂ = 0.4 (standard error 0.268)—is consistent
with our prior reduced-form results from Table II for the Mexican student data.

5.2.1. Trembles and Quantal Response


In our model, agents can arrive at zero-probability events.25 Since agents do not internalize
that others can make errors, they may arrive at histories that are not rationalizable. Our
individual-level estimation circumvents this by defining the model to terminate at the first
zero-probability information set.

An a priori natural alternative way to eliminate the zero-probability information set
problem is to introduce disturbances (e.g., trembles, as in the quantal response equilibrium
analysis of Choi, Gale, and Kariv (2012)). Individuals make mistakes with some probability,
and Bayesian agents, knowing the distribution of these disturbances, incorporate
this into their updating. Unfortunately, this approach is computationally infeasible for networks
beyond a trivial size. To see this, consider the simpler case where π = 1, and thus there is common
knowledge of this, and compare the cases with and without trembles.

PROPOSITION 5: The algorithm for computing Bayesian learning with no disturbances is
$\Theta(T)$.26 Moreover, it is asymptotically tight; that is, any algorithm implementing Bayesian
learning must have a running time of at least $\Theta(T)$.

PROOF: See the computation in Appendix A. Q.E.D.

Specifically, the algorithm is $\Theta(n4^nT)$. If n were growing, this algorithm would run in
exponential time, but in our case n is constant. In Appendix A, we similarly show that the
algorithm for the incomplete information model is $\Theta(n4^{2n}T)$. We then show that the
extension of this algorithm to an environment with disturbances is computationally
intractable.

PROPOSITION 6: Implementing the Bayesian learning algorithm with disturbances has a
computational time complexity of $\Theta(n4^{n(T-1)})$. Moreover, the problem is NP-hard in (n, T).

PROOF: See the computation in Appendix A. Q.E.D.

To see the computational burden of introducing trembles, we compare these models to their
deterministic counterparts. For the π = 1 model with T = 6, the algorithm with trembles
involves $1.19 \times 10^{16}$ times more computations than the deterministic model. With the same T,
the incomplete information model ($\pi \in (0, 1)$) involves $8.65 \times 10^{32}$ times more calculations than
its deterministic counterpart. Suppose that the deterministic π = 1 model takes 1 second
to run. Then the deterministic incomplete information model (again, without trembles)
takes 4 1/2 hours. The trembling hand π = 1 model, however, takes approximately
377,346,524 years.
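These magnitudes can be reproduced by a quick back-of-the-envelope computation from the Θ expressions above (a sketch with n = 7 and T = 6):

```python
# Ratios of trembling-hand to deterministic computations implied by the
# Theta expressions: (1/T) * 4^(n(T-2)) for the pi = 1 model and
# (1/T) * 16^(n(T-2)) for the incomplete information model.
n, T = 7, 6
ratio_complete = 4 ** (n * (T - 2)) / T      # ~1.2e16 (text: 1.19e16)
ratio_incomplete = 16 ** (n * (T - 2)) / T   # ~8.65e32

# If the deterministic pi = 1 model takes 1 second, trembles imply:
seconds_per_year = 365.25 * 24 * 3600
print(f"{ratio_complete:.3g}x, {ratio_incomplete:.3g}x")
print(f"{ratio_complete / seconds_per_year:.3g} years")  # ~3.8e8 years
```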

25 At T = 4, 58% of sessions arrive at a zero-probability event, and this is considerably worse thereafter.

26 Recall that we say $f_1(n) \in \Theta(f_2(n))$ if $f_1$ is asymptotically bounded above and below by $f_2$, up to a multiplicative constant. Formally, if $\exists c_1, c_2 > 0$ and $\bar{n}$ such that $\forall n > \bar{n}$, $c_1 \cdot |f_2(n)| < |f_1(n)| < c_2 \cdot |f_2(n)|$.

6. CONCLUSION
We study an incomplete information model of learning on social networks in which
individuals aim to guess a binary state of the world over repeated rounds of coarse
communication, transmitting their guesses to their network neighbors each period.
Agents are heterogeneous in their sophistication of learning: they can be
each period. Agents are heterogeneous in their sophistication of learning: they can be
of either Bayesian or DeGroot type, where the former agents are cognizant of the dis-
tribution of Bayesian types in the population. This model nests the prior models in the
literature that study similar settings.
We identify key network patterns that separate DeGroot from Bayesian learning behavior
in a coarse learning environment with incomplete information. One such concept is
that of clans—subgraphs in which each node has more links within the group than
outside it. We show that realistic networks tend to have clans, which
lead to the failure of asymptotic learning—non-trivial shares of agents never learn
the truth—under the incomplete information model with coarse communication. This result
is robust to strategic behavior on the part of Bayesian agents.
Our empirical results demonstrate that the incomplete information model fits the data
well, but that the mixing parameter varies by context. In the Indian villager experimental
sample, estimates indicate that approximately 10% of agents are Bayesian, whereas
the Mexican student experimental sample is best explained by a share of
50% Bayesian agents. These contrasting results point to the importance of contextual
factors in understanding how individuals engage in social learning. It is possible, for instance,
that more vulnerable populations are more subject to DeGroot-type learning
behavior. Future work could systematically assess the relevance of various contextual
factors.
An interesting direction to take this line of inquiry is to think about the relation-
ship between the network-formation and social learning processes. As noted by Jackson,
Rodriguez-Barraquer, and Tan (2012), among others, relational contracting motives gen-
erate the need for triadic closure: friends of friends tend to be friends themselves. In
fact, the social quilts of Jackson, Rodriguez-Barraquer, and Tan (2012) consist of only
node-adjacent clans, of which our Network 3 is an example. If agents only communicate
with their favor-exchange networks,27 and the incomplete information learning setup de-
scribes agents’ behavior, asymptotic learning would fail rampantly in communities that
had to overcome contracting problems. Practically speaking, this suggests that vulnerable
communities, such as villages that need to organize themselves to share risk, would
precisely be those where we would expect misinformation traps.

27 A natural reason for this comes from multiplexing motives. Fixed costs may be required to maintain links, and it therefore makes sense to use links for multiple purposes, such as by layering financial, informational, and social links.

APPENDIX A: PROOFS
PROOF OF PROPOSITION 1: The proof is by induction. Without loss of generality, suppose
$a_{it} = 1$ for all $i \in C$. For $\tau = 0$, the result holds trivially. Suppose $a_{it+\tau-1} = 1$ for
all $i \in C$. Let $T^u(i, t+\tau) = \frac{1}{d_i+1}[a_{it+\tau-1} + \sum_{j \in N_i} a_{jt+\tau-1}]$ be the index that defines uniform
weighting, so that if $T^u(i, t+\tau) > 1/2$, then $a_{it+\tau} = 1$, independent of the particular tie-breaking
rule used. We show that $T^u(i, t+\tau) > 1/2$:

$$T^u(i, t+\tau) = \frac{\sum_{j \in (N_i \cup \{i\}) \cap C} a_{jt+\tau-1}}{d_i + 1} + \frac{\sum_{j \in N_i \cap C^c} a_{jt+\tau-1}}{d_i + 1} \overset{(i)}{=} \frac{d_i(C) + 1}{d_i + 1} + \frac{\sum_{j \in N_i \cap C^c} a_{jt+\tau-1}}{d_i + 1} \geq \frac{d_i(C) + 1}{d_i + 1},$$

using in (i) the fact that $a_{jt+\tau-1} = 1$ for all $j \in C$. Since $d_i = d_i(C) + d_i(V \setminus C)$ for any
set $C \ni i$, and $d_i(C) \geq d_i(V \setminus C)$, we then have that $\frac{d_i(C)+1}{d_i+1} > 1/2$, as we wanted to
show. Q.E.D.
PROOF OF THEOREM 1: For (1), Golub and Jackson (2010) studied a model where initial
beliefs (or signals) $p^{(n)}_{i0} \in [0,1]$ are independently distributed, with some common
mean $\mu = E[p^{(n)}_{i0}]$ and common finite variance, and agents update their beliefs according
to a DeGroot model with weighting matrix $T(n)$. They showed that, if $T(n)$ corresponds
to a uniform DeGroot weighting model (as the one we use), then $\operatorname{plim}_{n\to\infty} \max_{i\le n} |p^{(n)}_{i\infty} - \mu| = 0$, where $p^{(n)}_{i\infty} = \lim_{t\to\infty} p^{(n)}_{it}$ (see Corollary 2 in their paper). This corresponds to their
definition of wise sequences of DeGroot weighting matrices (their Definition 3). For our application,
assume (without loss of generality) that $\theta = 1$, and let $p_{i0} = P(\theta = 1 \mid s_i)$. Given
the state, the distribution of $p_{i0}$ is binary (taking values $p$ and $1-p$, with probabilities
$p$ and $1-p$, respectively), independent and identically distributed across agents, with
mean $\mu = E[p_{i0} \mid \theta = 1] = p^2 + (1-p)^2$. Therefore, if agents communicate their initial
posterior beliefs (after observing their original signals), then $\operatorname{plim}_{n\to\infty} \max_{i\le n} |p^{(n)}_{i\infty} - \mu| = 0$ (Golub and Jackson (2010)). Since $p > 1/2$, we also have $\mu > \frac{1}{2}$, and hence whenever
$p_{i\infty} \approx \mu$, then $a_{i\infty} = \theta$, implying that $\lim_{n\to\infty} P\{\max_{i\le n} |a^{(n)}_{i\infty} - \theta| \ge \epsilon\} = 0$ for all
$\epsilon > 0$, a stronger result than the one we use. An analogous result holds if $\theta = 0$ (with
$\mu = 2p(1-p)$).

Result (2) on Bayesian action models is the central theorem in Mossel, Sly, and Tamuz
(2014b).

For Result (3), the assumption that $X_n \to x > 0$ implies that, for large enough n, there
exists a number $h_n \in \mathbb{N}$ of disjoint clans of k members; that is, for every n, there
exist sets $\{C_j^n\}_{j=1}^{h_n} \subset V_n$ such that (a) $C_j^n \cap C_{j'}^n = \emptyset$ for all $j \ne j'$ and (b) $C_j^n$ is a clan with $|C_j^n| = k$
for all j. Moreover, $h_n \to \infty$ (since each $C_j^n$ has only $k < \infty$ members). Define $h =
\liminf_{n\to\infty} h_n / n$. In the incomplete information model, a clan $C \subseteq V$ of k members in which
every agent is a DeGroot type and every agent gets the wrong signal (i.e., $\eta_i = D$ and
$s_i = 1 - \theta$ for all $i \in C$) plays the wrong action forever, a corollary of Proposition 1. This
happens with probability $\alpha := (1-p)^k (1-\pi)^k > 0$. Therefore, in the limit, a fraction α
of all the disjoint clans satisfies this property, implying that at least a share $\alpha h k$ of agents
chooses $a_{it} = 1 - \theta$ at every $t \in \mathbb{N}$, thus showing the desired result. Q.E.D.

PROOF OF PROPOSITION 4: Consider any set S such that $0 < \mathrm{vol}(S) \le \frac{1}{2}\mathrm{vol}(V)$. Observe that

$$\frac{\partial S}{\mathrm{vol}(S)} = \frac{\dfrac{\sum_{i \in S} d_i(V \setminus S)}{\sum_{i \in S} d_i(S)}}{1 + \dfrac{\sum_{i \in S} d_i(V \setminus S)}{\sum_{i \in S} d_i(S)}}.$$

Consider a clan C satisfying the above requirement. Because it is a clan, $d_i(C) > d_i(V \setminus C)$ for every i. This immediately implies that

$$\frac{\dfrac{\sum_{i \in C} d_i(V \setminus C)}{\sum_{i \in C} d_i(C)}}{1 + \dfrac{\sum_{i \in C} d_i(V \setminus C)}{\sum_{i \in C} d_i(C)}} < \frac{1}{2}$$

and, therefore, since we are taking the minimum over all sets and C is just one such set,
$\phi(g) < \frac{1}{2}$. By assumption, the share of clans of bounded size is positive in the limit, and
thus such clans satisfy the requirement above along the sequence $g_n$. The result
follows. Q.E.D.
PROOF OF COROLLARY 1: By the Cheeger inequality, $\frac{\lambda_2(L)}{2} \le \phi(g) \le \sqrt{2\lambda_2(L)}$. Hence,
if $\frac{\lambda_2(L)}{2} > \frac{1}{2}$, that is, if $\lambda_2(L) > 1$, then $\phi(g) > \frac{1}{2}$, which proves the result. Q.E.D.
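For small graphs, both sides of the Cheeger inequality can be computed directly; the following brute-force sketch (ours, feasible only for small n) computes φ(g) and $\lambda_2(L)$:

```python
import numpy as np
from itertools import combinations

def conductance_and_lambda2(A):
    """Brute-force conductance phi(g) over all S with
    0 < vol(S) <= vol(V)/2, and lambda_2 of the normalized Laplacian
    L = I - D^(-1/2) A D^(-1/2). Assumes a symmetric 0/1 adjacency
    matrix with no isolated nodes; exponential in n."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    vol_V = deg.sum()
    phi = 1.0
    for k in range(1, n):
        for S in combinations(range(n), k):
            S = list(S)
            vol_S = deg[S].sum()
            if 0 < vol_S <= vol_V / 2:
                comp = [i for i in range(n) if i not in S]
                cut = A[np.ix_(S, comp)].sum()       # edges leaving S
                phi = min(phi, cut / vol_S)
    Dm12 = np.diag(1 / np.sqrt(deg))
    L = np.eye(n) - Dm12 @ A @ Dm12
    lam2 = np.sort(np.linalg.eigvalsh(L))[1]
    return phi, lam2

# One can verify the sandwich lambda_2/2 <= phi(g) <= sqrt(2*lambda_2)
# on, e.g., the clan example from the proof of Proposition 1.
```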

A.1. Proofs of Propositions 5 and 6


PROOF OF PROPOSITION 5: Let $\Omega_t$ be the set of states that agent i has to integrate over
at time t. The basic algorithm (in this general version) keeps track of two objects: the indicator
function of the set $P_{it}(\omega)$ for each $\omega \in \Omega_t$, and the action function $a_{it}(\omega)$. We define

$$\sigma_t(i, \omega, \omega') := \begin{cases} 1 & \text{if } \omega' \in P_{it}(\omega), \\ 0 & \text{otherwise}, \end{cases}$$

and

$$\alpha_t(i, \omega) := a_{it}(\omega)$$

to compute the objects $P_{it}(\omega)$ and $a_{it}(\omega)$ numerically, as in Appendix B.4. To compute
them, we then have to loop across $\#(\Omega_t) \times \#(\Omega_t)$ states for each (i, t) to update $\sigma_t$ to $\sigma_{t+1}$,
and across $\#(\Omega_t)$ states to update $\alpha_t$. The number of operations is then $\sum_t \sum_i \sum_{\omega \in \Omega_t} (k + \sum_{\hat{\omega} \in \Omega_t} k)$,

where k is the number of computations done in each step. In the deterministic complete
information model (without trembles), $\Omega_t = S = \{0,1\}^n$, and then

$$\text{Computations} = nT2^n(1 + 2^n)k = \Theta(nT4^n).$$

Similarly, in the deterministic incomplete information model,

$$\text{Computations} = nT4^n(1 + 4^n)k = \Theta(nT16^n).$$

The ratio between the incomplete and complete information deterministic models is
then

$$\frac{nT4^n(1 + 4^n)k}{nT2^n(1 + 2^n)k} = 2^n \frac{1 + 4^n}{1 + 2^n} \approx 4^n. \qquad Q.E.D.$$

So, for a network of n = 7, the relative complexity of the incomplete information model
is approximately 16,258.

PROOF OF PROPOSITION 6: The trembling hand, complete information model requires
agents to integrate over at least $2^{n(t-1)}$ states in each round: since there is no longer
a deterministic mapping between information sets and signal profiles, agent i needs to
integrate over the actions of the other agents. Although agent i does not actually observe
the information of $n - d_i$ agents, for rounds $t \ge 3$ we have to compute her beliefs about
those agents' information sets. The partitional model presented in Appendix B.4 does
not suffer from this problem, by computing beliefs on all states, which we do here as well.
Therefore, $\#(\Omega_t) = 2^{n(t-1)}$ and, summing over the updates from round t to t + 1,

$$\text{Computations} = k\sum_i \sum_{t=1}^{T-1} 2^{nt}\big(k + 2^{nt}\big) = k\sum_i \left(k\,\frac{2^{nT} - 2^{n}}{2^{n} - 1} + \frac{2^{2nT} - 2^{2n}}{2^{2n} - 1}\right) = \Theta\big(n4^{n(T-1)}\big).$$

Therefore, the ratio between the complete information model with and without trembles
is of order
 
Θ n4n(T −1) 1
 n
 = × 4n(T −2) 
Θ nT 4 T
and for the incomplete information model, the equivalent ratio is
 
Θ n16n(T −1) 1
 n
 = × 16n(T −2) 
Θ nT 16 T

The NP-hardness result is shown in Hazla, Jadbabaie, Mossel, and Rahimian (2017).
Q.E.D.

A.2. Proof of Theorem 2


A.2.1. Bounds for Finite Graphs
For the given metric space (Ω, d), we denote by B(i, r) the open ball centered
at $i \in \Omega$ with radius r > 0. The model is Euclidean if $\Omega \subseteq \mathbb{R}^h$ is an open set and
$d(i,j) := \sqrt{\sum_{k=1}^{h} (x_k - y_k)^2}$ for $i = (x_1, \ldots, x_h)$ and $j = (y_1, \ldots, y_h)$. The results in this section use a Euclidean model with h = 2
and uniform Poisson intensity, f(i) = 1 for all $i \in \Omega$. However, all results generalize easily
to any intensity function f and to non-Euclidean models (we clarify this
below) with higher dimensions. For any measurable $A \subseteq \Omega$, define the random variable
$n_A = \#\{\text{nodes } i \in A\}$. The Poisson point process assumption implies that
$n_A \sim \text{Poisson}(\lambda\mu(A))$, where μ(·) is the Borel measure over $\mathbb{R}^h$. For any potential node
$j \in \Omega$, define $d_j(A) := \#\{\text{links } j \text{ has with nodes } i \in A\}$; $d_j = d_j(\Omega)$ denotes the
total number of links j has (i.e., its degree).

Define $\nu := E\{n_A\}$ with A = B(i, r) as the expected number of nodes in a "local neighborhood,"
which is $\nu = \lambda\pi r^2$ in the Euclidean model with h = 2.28 Define also the
volume of Ω simply as its measure; that is, $\mathrm{vol}(\Omega) := \mu(\Omega)$. It is also useful to define
$\omega := \lambda\,\mathrm{vol}(\Omega)/\nu$, so that the expected number of nodes on the graph can be expressed as
$E[n_\Omega] = \lambda\,\mathrm{vol}(\Omega) = \nu \times \omega$.
A local clan is a non-trivial clan $C \subset V$ (i.e., with $\#C \ge 2$) in which the probability of a
link forming between any pair $\{i,j\} \subset C$ is α. A necessary condition for C to be a local
clan is that $C \subset L := B(i, \frac{r}{2})$ for some $i \in \Omega$. With the above definitions, $C \subseteq L$ is a local
clan if $\#C \ge 2$ and, for all $j \in C$, $d_j(L) \ge d_j(\Omega \setminus L)$. The goal of this section is to provide
lower and upper bounds for the probability of the event

$$B_L := \big\{g = (V, E) : C = V \cap L \text{ is a local clan}\big\} = \Big\{g = (V, E) : \#C \ge 2 \text{ and } \bigcap_{j \in C}\big\{d_j(L) \ge d_j(\Omega \setminus L)\big\}\Big\}.$$

PROPOSITION A.1: Suppose $\omega > \frac{9}{4}$ and take $i \in \Omega$ such that $B(i, \frac{3}{2}r) \subseteq \Omega$, and let $L = B(i, \frac{r}{2})$. Then,

$$P\big\{g = (V,E) : C = V \cap L \text{ is a local clan}\big\} \ge \sum_{n=2}^{\infty} F^*(n-1)^n \times \alpha^{n(n-1)/2}\, \frac{(\nu/4)^n e^{-\nu/4}}{n!} > 0, \tag{A.1}$$

where $F^*(\cdot)$ is the cdf of a Poisson random variable $d^*$ with expected value $E(d^*) = (2\alpha + (\omega - \frac{9}{4})\beta) \times \nu$. Moreover,

$$P\big\{g = (V,E) : C = V \cap L \text{ is a local clan}\big\} \le \sum_{d=1}^{\infty} \hat{F}(d)\, \frac{(\alpha\nu/4)^d e^{-\alpha\nu/4}}{d!}, \tag{A.2}$$

where $\hat{F}(\cdot)$ is the marginal cdf of $d_j(\Omega \setminus L)$ for any $j \in C$, a Poisson distribution with
$E[d_j(\Omega \setminus L)] = (\frac{3}{4}\alpha + (\omega - 1)\beta) \times \nu$.


28 If h > 2, $\nu := \lambda (r\sqrt{\pi})^h / \Gamma(1 + h/2)$.

PROOF: See Online Appendix E. Q.E.D.

From Proposition A.1, we get simpler upper and lower bounds, which are useful when
proving Theorem 2. Specifically, if $\alpha\nu < 4$, we can bound the probability of this event by

$$F^*(1)^2\, \alpha\, \frac{(\nu/4)^2 e^{-\nu/4}}{2} \le P(B_L) \le e^{-\alpha\nu/4}\, \frac{\alpha\nu}{4 - \alpha\nu}. \tag{A.3}$$

This implies that, if $\alpha \approx 0$, then $P(B_L) \approx 0$, which we use in the next subsection.

A.2.2. Sparsity and Asymptotics


For a given mixed model, the degree of any given node is the random variable
$d_j = d_j(\Omega)$. Since $d_j = d_j(B(j,r)) + d_j(\Omega \setminus B(j,r))$ is a sum of independent
Poisson random variables, $d_j$ is also Poisson, with expectation

$$E(d_j) = \alpha\lambda\mu\big(B(j,r)\big) + \beta\lambda\big[\mu(\Omega) - \mu\big(B(j,r)\big)\big] = \big[\alpha + (\omega - 1)\beta\big] \times \nu.$$
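A Monte Carlo sketch of this degree calculation in the mixed model (ours; parameter values are illustrative, and boundary effects make the match approximate):

```python
import numpy as np

rng = np.random.default_rng(0)

# Check E(d_j) = (alpha + (omega - 1)*beta) * nu: points from a Poisson
# process on a square Omega, links with prob. alpha within distance r
# and prob. beta otherwise.
lam, r, side = 50.0, 0.1, 1.0
alpha, beta = 0.8, 0.02
nu = lam * np.pi * r**2
omega = lam * side**2 / nu            # omega = lambda * vol(Omega) / nu

degs = []
for _ in range(200):
    N = rng.poisson(lam * side**2)
    pts = rng.uniform(0, side, size=(N, 2))
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    P = np.where(d < r, alpha, beta)
    np.fill_diagonal(P, 0)
    U = rng.random((N, N)) < P
    G = np.triu(U, 1)
    G = G | G.T                       # symmetrize the link draws
    degs.append(G.sum(axis=1).mean())

# Simulated mean degree vs. the formula (close up to boundary effects):
print(np.mean(degs), (alpha + (omega - 1) * beta) * nu)
```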

We now consider a sequence of models $(\Omega_k, \alpha_k, \beta_k)$ with $\omega_k \to \infty$. A sequence is sparse
if $E(d_j) \to d_\infty < \infty$ as $\omega_k \to \infty$. For that to be the case, we need that

$$\lim_{k\to\infty} \big[\alpha_k + (\omega_k - 1)\beta_k\big] \times \nu = d_\infty,$$

which can only happen if $\beta_k = O(\omega_k^{-1})$; that is, $\beta_k\omega_k \to \rho_\infty$. We also look only at
sequences with $\alpha_k \to \alpha_\infty$, so that $d_\infty = (\alpha_\infty + \rho_\infty) \times \nu$.
In the next proposition, we show the main result of this section, which has Theorem 2
as a direct corollary.

PROPOSITION A.2: Consider a sequence $(\alpha_k, \beta_k)$, where $\omega_k \to \infty$, $\beta_k\omega_k \to \rho_\infty > 0$,
and $\alpha_k \to \alpha_\infty$, and let $L := B(i, \frac{r}{2})$ with $B(i, \frac{3}{2}r) \subseteq \Omega_k$ for large enough k. Then

$$\lim_{k\to\infty} P\big\{g = (V,E) : \text{under } (\alpha_k, \beta_k),\ C = V \cap L \text{ is a local clan}\big\} \begin{cases} > 0 & \text{if } \alpha_\infty > 0, \\ = 0 & \text{if } \alpha_\infty = 0. \end{cases}$$

PROOF: Denote $d_k^* \sim \text{Poisson}[(2\alpha_k + (\omega_k - \frac{9}{4})\beta_k) \times \nu]$ with cdf $F_k^*(\cdot)$. Then $F_k^*(d) \to_{k\to\infty} F_\infty^*(d)$, the cdf of $d_\infty^* \sim \text{Poisson}[(2\alpha_\infty + \rho_\infty) \times \nu]$. Moreover, for large enough k so that
$L \subseteq \Omega_k$,

$$P\big\{g = (V,E) : \text{under } (\alpha_k, \beta_k),\ C = V \cap L \text{ is a local clan}\big\} \ge F_k^*(1)^2 \times \alpha_k\, \frac{(\nu/4)^2 e^{-\nu/4}}{2},$$

so that

$$\lim_{k\to\infty} P\big\{g = (V,E) : \text{under } (\alpha_k, \beta_k),\ C = V \cap L \text{ is a local clan}\big\} \ge F_\infty^*(1)^2 \times \alpha_\infty\, \frac{(\nu/4)^2 e^{-\nu/4}}{2}.$$

Since $F_\infty^*(1) = [1 + (2\alpha_\infty + \rho_\infty)\nu]e^{-(2\alpha_\infty + \rho_\infty)\nu} > 0$, this limit is strictly bigger than zero
when $\alpha_\infty > 0$.

When $\alpha_\infty = 0$, we need to show that the upper bound in (A.3) tends to 0, showing that no local
clans can appear in the limit. Expression (A.3) implies that, for any k with $\alpha_k\nu < 4$ (which
holds for large k, since $\alpha_k \to 0$),

$$\lim_{k\to\infty} P\big\{g = (V,E) : \text{under } (\alpha_k, \beta_k),\ C = V \cap L \text{ is a local clan}\big\} \le \lim_{k\to\infty} e^{-\alpha_k\nu/4} \times \frac{\alpha_k\nu}{4 - \alpha_k\nu} = 0. \qquad Q.E.D.$$

REFERENCES
ACEMOGLU, D., M. A. DAHLEH, I. LOBEL, AND A. OZDAGLAR (2011): “Bayesian Learning in Social Net-
works,” Review of Economic Studies, 78, 1201–1236. [5]
ALT, J. E., A. JENSEN, H. LARREGUY, D. D. LASSEN, AND J. MARSHALL (2019): “Contagious Political Con-
cerns: Identifying Unemployment Information Shock Transmission Using the Danish Population Network,”
Working Paper. [15]
AUMANN, R. J. (1976): “Agreeing to Disagree,” The Annals of Statistics, 4, 1236–1239.
BANERJEE, A. (1992): “A Simple Model of Herd Behavior,” The Quarterly Journal of Economics, 107, 797–817.
[5]
BANERJEE, A., A. G. CHANDRASEKHAR, E. DUFLO, AND M. O. JACKSON (2013): “The Diffusion of Microfi-
nance,” Science, 341, 1236498. [15]
(2018): “Changes in Social Network Structure in Response to Exposure to Formal Credit Markets,”
Available at SSRN 3245656. [5]
(2019): “Using Gossips to Spread Information: Theory and Evidence From Two Randomized Con-
trolled Trials,” The Review of Economic Studies. [15]
BIKHCHANDANI, S., D. HIRSHLEIFER, AND I. WELCH (1992): “A Theory of Fads, Fashion, Custom and Cultural
Change as Information Cascades,” Journal of Political Economy, 100, 992–1026. [5]
CHANDRASEKHAR, A. G., H. LARREGUY, AND J. P. XANDRI (2020): “Supplement to ‘Testing Models of Social
Learning on Networks: Evidence From Two Experiments’,” Econometrica Supplemental Material, 88, https:
//doi.org/10.3982/ECTA14407. [7,8]
CHOI, S., D. GALE, AND S. KARIV (2005): “Behavioral Aspects of Learning in Social Networks: An Experi-
mental Study,” Advances in Applied Microeconomics: A Research Annual, 13, 25–61. [5]
(2012): “Social Learning in Networks: A Quantal Response Equilibrium Analysis of Experimental
Data,” Review of Economic Design, 16, 93–118. [4,9,24]
CHUNG, F. R. K. (1997): Spectral Graph Theory, 92. American Mathematical Society. [5,14]
CONLEY, T., AND C. UDRY (2010): “Learning About a New Technology: Pineapple in Ghana,” The American
Economic Review, 100, 35–69. [15]
CORAZZINI, L., F. PAVESI, B. PETROVICH, AND L. STANCA (2012): “Influential Listeners: An Experiment on
Persuasion Bias in Social Networks,” European Economic Review, 56, 1276–1288. [5]
CRUZ, C., J. LABONNE, AND P. QUERUBIN (2018): “Politician Family Networks and Electoral Outcomes: Evi-
dence From the Philippines,” The American Economic Review, 107, 3006–3037. [15]
DEGROOT, M. H. (1974): “Reaching a Consensus,” Journal of the American Statistical Association, 69, 118–121.
[2]
DEMARZO, P., D. VAYANOS, AND J. ZWIEBEL (2003): “Persuasion Bias, Social Influence, and Unidimensional
Opinions,” The Quarterly Journal of Economics, 118, 909–968. [2]
DUARTE, R., F. FINAN, H. LARREGUY, AND L. SCHECHTER (2019): “Networks, Information and Vote Buying,”
Working Paper. [15]
ERDŐS, P., AND A. RÉNYI (1959): “On Random Graphs, I,” Publicationes Mathematicae (Debrecen), 6, 290–
297. [3,13]
EYSTER, E., AND M. RABIN (2014): “Extensive Imitation Is Irrational and Harmful,” The Quarterly Journal of
Economics, 129, 1861–1898. [5]
FAFCHAMPS, M., AND F. GUBERT (2007): “The Formation of Risk Sharing Networks,” Journal of Development
Economics, 83, 326–350. [12]
FELDMAN, M., N. IMMORLICA, B. LUCIER, AND S. M. WEINBERG (2014): “Reaching Consensus via Non-
Bayesian Asynchronous Learning in Social Networks,” in Approximation, Randomization, and Combinatorial
Optimization. Algorithms and Techniques, 192. [2,5]
GALE, D., AND S. KARIV (2003): “Bayesian Learning in Social Networks,” Games and Economic Behavior, 45,
329–346. [2]

GEANAKOPLOS, J. (1994): “Common Knowledge,” Handbook of Game Theory with Economic Applications, 2,
1437–1496.
GOLUB, B., AND M. JACKSON (2010): “Naive Learning in Social Networks and the Wisdom of Crowds,” Amer-
ican Economic Journal: Microeconomics, 2, 112–149. [2,26]
GRAHAM, B. S. (2017): “An Econometric Model of Network Formation With Degree Heterogeneity,” Econo-
metrica, 85, 1033–1063. [12]
HAZLA, J., A. JADBABAIE, E. MOSSEL, AND M. A. RAHIMIAN (2017): “Bayesian Decision Making in Groups
Is Hard,” arXiv:1705.04770. [28]
JACKSON, M. O., T. RODRIGUEZ-BARRAQUER, AND X. TAN (2012): “Social Capital and Social Quilts: Network
Patterns of Favor Exchange,” American Economic Review, 102, 1857–1897. [4,25]
JADBABAIE, A., P. MOLAVI, A. SANDRONI, AND A. TAHBAZ-SALEHI (2012): “Non-Bayesian Social Learning,”
Games and Economic Behavior, 76, 210–225. [2]
LIGGETT, T. M. (1985): Interacting Particle Systems. New York: Springer-Verlag. [2]
LOBEL, I., AND E. D. SADLER (2015): “Information Diffusion in Networks Through Social Learning,” Theo-
retical Economics, 10, 807–851. [5]
MCCORMICK, T. H., AND T. ZHENG (2015): “Latent Surface Models for Networks Using Aggregated Relational
Data,” Journal of the American Statistical Association, 110, 1684–1695. [12,13]
MENAGER, L. (2006): “Consensus, Communication and Knowledge: An Extension With Bayesian Agents,”
Mathematical Social Sciences, 51, 274–279. [12]
MENGEL, F., AND V. GRIMM (2012): “An Experiment on Learning in a Multiple Games Environment,” Journal
of Economic Theory, 147, 2220–2259. [6]
MOBIUS, M., T. PHAN, AND A. SZEIDL (2015): “Treasure Hunt: Social Learning in the Field,” Working Paper. [6]
MORRIS, S. (2000): “Contagion,” The Review of Economic Studies, 67, 57–78.
MOSSEL, E., AND O. TAMUZ (2010): “Efficient Bayesian Learning in Social Networks With Gaussian Estima-
tors,” arXiv:1002.0747. [2,5]
MOSSEL, E., J. NEEMAN, AND O. TAMUZ (2014a): “Majority Dynamics and Aggregation of Information in
Social Networks,” Autonomous Agents and Multi-Agent Systems, 28, 408–429. [5]
MOSSEL, E., A. SLY, AND O. TAMUZ (2014b): “Asymptotic Learning on Bayesian Social Networks,” Probability
Theory and Related Fields, 158, 127–157. [12,26]
(2015): “Strategic Learning and the Topology of Social Networks,” Econometrica, 83, 1755–1794. [2,
5,9,12]
MUELLER-FRANK, M., AND C. NERI (2013): “Social Learning in Networks: Theory and Experiments,” Working Paper. [6]
OSBORNE, M. J., AND A. RUBINSTEIN (1994): A Course in Game Theory. Cambridge, MA: MIT Press.
PENROSE, M. (2003): Random Geometric Graphs, 5. Oxford University Press. [3,13]
SMITH, L., AND P. SORENSEN (2000): “Pathological Outcomes of Observational Learning,” Econometrica, 68,
371–398. [5]

Co-editor Fabrizio Zilibotti handled this manuscript.


Manuscript received 16 May, 2016; final version accepted 13 May, 2019; available online 14 August, 2019.
