Graph Neural Networks for Friend Ranking in Large-scale Social Platforms
Aravind Sankar, Yozen Liu, Jun Yu, Neil Shah
ABSTRACT
Graph Neural Networks (GNNs) have recently enabled substantial advances in graph learning. Despite their rich representational capacity, GNNs remain under-explored for large-scale social modeling applications. One such industrially ubiquitous application is friend suggestion: recommending users other candidate users to befriend, to improve user connectivity, retention and engagement. However, modeling such user-user interactions on large-scale social platforms poses unique challenges: such graphs often have heavy-tailed degree distributions, where a significant fraction of users are inactive and have limited structural and engagement information. Moreover, users interact with different functionalities, communicate with diverse groups, and have multifaceted interaction patterns.

We study the application of GNNs for friend suggestion, providing the first investigation of GNN design for this task, to our knowledge. To leverage the rich knowledge of in-platform actions, we formulate friend suggestion as multi-faceted friend ranking with multi-modal user features and link communication features. We design a neural architecture GraFRank to learn expressive user representations from multiple feature modalities and user-user interactions. Specifically, GraFRank employs modality-specific neighbor aggregators and cross-modality attentions to learn multi-faceted user representations. We conduct experiments on two multi-million user datasets from Snapchat, a leading mobile social platform, where GraFRank outperforms several state-of-the-art approaches on candidate retrieval (by 30% MRR) and ranking (by 20% MRR) tasks. Moreover, our qualitative analysis indicates notable gains for critical populations of less-active and low-degree users.

CCS CONCEPTS
• Information systems → Social networking sites; Social networks; Social recommendation; • Human-centered computing → Social recommendation.

KEYWORDS
Graph Neural Network, Social Network, Recommendation System

ACM Reference Format:
Aravind Sankar, Yozen Liu, Jun Yu, and Neil Shah. 2021. Graph Neural Networks for Friend Ranking in Large-scale Social Platforms. In Proceedings of the Web Conference 2021 (WWW '21), April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3442381.3450120

1 INTRODUCTION
Learning latent user representations has become increasingly important in advancing user understanding, with widespread adoption in various industrial settings, e.g., video recommendations on YouTube [10], pin suggestions on Pinterest [50], etc. The user representations learned using deep models are effective at complementing or replacing conventional collaborative filtering methods [22], and are versatile, e.g., the embeddings can be used to suggest friendships and also infer profile attributes (e.g., age, gender) in social networks.

Learning latent representations of nodes in graphs has prominent applications in multiple academic settings, such as link prediction [53], community detection [8], and industrial recommender systems, including e-commerce [44, 45], content discovery [49, 50], and food delivery [24]. Graph Neural Networks (GNNs) [47] have emerged as a popular graph representation learning paradigm due to their ability to learn representations combining graph structure and node/link attributes, without relying on expensive feature engineering. GNNs can be formulated as a message passing framework where node representations are learned by propagating features from local graph neighborhoods via trainable neighbor aggregators. Recently, GNNs have demonstrated promising results in a few industrial systems designed for item recommendations in bipartite [50] or multipartite [49] user-to-item interaction graphs.

Despite their rich representational ability, GNNs have been relatively unexplored in large-scale user-user social interaction modeling applications, like friend suggestion. Recommending new potential friends to encourage users to expand their networks is a cornerstone of social networking, and plays an important role towards user retention and promoting engagement within the platform. Prior efforts typically formulate friend suggestion as link prediction (or matrix completion), with a rich literature of graph-based heuristics [31] to quantify user-user affinity, e.g., two users are more likely to connect if they have many common friends. A few GNN models target link prediction by learning aggregators over enclosing subgraphs around each candidate link [52–54]; such models do not scale to industry-scale social graphs with millions of nodes and billions of edges. Still, GNNs have enormous potential for learning expressive user representations in social networks, due to their intuitive message-passing paradigm that enables attention to social influence from friends in their ego-network.

Yet, designing GNNs for friend recommendations in large-scale social platforms poses unique challenges. First, social networks are characterized by heavy-tailed degree distributions, e.g., many networks approximately follow power-law distributions [3]. This poses a key challenge of limited structural information for a significant proportion of users with very few friends. A related challenge is activity sparsity, where a very small fraction of users actively form new friendships at any given time.
Second, modern social platforms offer a multitude of avenues for users to interact, e.g., users can communicate with friends either by directly exchanging messages and pictures, or through indirect social actions by liking and commenting on posts. Extracting knowledge from such heterogeneous in-platform user actions is challenging, yet extremely valuable to address sparsity challenges for a vast majority of inactive users.

Present Work: We overcome structural and interactional sparsity by exploiting the rich knowledge of heterogeneous in-platform actions. We formulate friend recommendation on social networks as multi-faceted friend ranking on an evolving friendship graph, with multi-modal user features and link communication features (Figure 1). We represent users with heterogeneous feature sets spanning multiple modalities, which include a collection of static profile attributes (e.g., demographic information) and time-sensitive in-platform activities (e.g., content interests and interactions). We also leverage pairwise link features on existing friendships, which capture recent communication activities across multiple direct (e.g., messages) and indirect (e.g., stories) channels within the platform.

To understand the complexity of user interactions and gain insights into various factors impacting friendship formation, we conduct an empirical analysis to investigate attribute homophily with respect to different user feature modalities. Our analysis reveals diverse homophily distributions across modalities and users, and indicates non-trivial cross-modality correlations. Motivated by these observations, we design an end-to-end GNN architecture, GraFRank (Graph Attentional Friend Ranker), for multi-faceted friend ranking. GraFRank learns user representations by modality-specific neighbor aggregation and cross-modality attention. We handle heterogeneity in modality homophily by learning modality-specific neighbor aggregators to compute a set of representations for each user; the aggregator is modeled by friendship attentions to capture the influence of individual features and pairwise communications. We introduce a cross-modality attention module to compute the final user representation by attending over the modality-specific representations of each user, thereby learning non-linear correlations across modalities. We summarize our key contributions below:
• Graph-Neural Friend Ranking: To our knowledge, ours is the first work to investigate graph neural network usage and design for social user-user interaction modeling applications. Unlike prior work that typically views friend recommendation as structural link prediction, we present a novel formulation with multi-modal user features and link features, to leverage knowledge of rich heterogeneous user activities in social networking platforms.
• GraFRank Model: Motivated by our empirical study that reveals heterogeneity in modality homophily and cross-modality correlations, we design a neural architecture, GraFRank, to learn multi-faceted user representations. Distinct from conventional GNNs operating on a single feature space, GraFRank learns from multiple feature modalities and user-user interactions.
• Robust Experimental Results: Our extensive experiments on two large-scale datasets from a popular social networking platform, Snapchat, indicate significant gains for GraFRank over state-of-the-art baselines on friend candidate retrieval (relative MRR gains of 30%) and ranking (relative MRR gains of 20%) tasks. Our qualitative analysis indicates stronger gains for a large, but especially crucial, population of less-active and low-degree users.

2 RELATED WORK
We briefly review a few related lines of work on friend recommendations, graph neural networks, and multi-modal learning.

Friend Recommendation: The earliest methods were carefully designed graph-based heuristics of user-user proximity in social networks [31], e.g., path-based Katz centrality [26] or common neighbor-based Adamic/Adar [1]. Supervised learning techniques exploited a collection of such pairwise features to train ranking models [12, 34]. However, extracting heuristic features on-the-fly for each potential link is infeasible in large-scale evolving networks.

More recent graph embedding methods learn latent node representations to capture the structural properties of a node and its neighborhoods [11]; e.g., popular embedding models like node2vec [16] and DeepWalk [35] learn unsupervised embeddings to maximize the likelihood of co-occurrence in fixed-length random walks, and have shown effective link prediction performance. Since graph embedding methods learn latent embeddings per node, the number of model parameters scales with the size of the graph, which is prohibitive for large-scale networks with millions of users.

A related direction is social recommendation [13, 29, 38], which utilizes the social network as an auxiliary data source to model user behavior in social platforms [39, 42, 48] and improve the quality of item recommendations to users. In contrast, our problem, friend suggestion, is a user-user recommendation task that is complementary to social recommendation, since it facilitates creating a better social network of users to benefit social recommendation.

Graph Neural Networks: GNNs learn node representations by recursively propagating features (i.e., message passing) from local neighborhoods through the use of aggregation and activation functions [36, 47]. Graph Convolutional Networks (GCNs) [28] learn degree-weighted aggregators by operating on the graph Laplacian. Many models generalize GCN with a variety of learnable aggregators, e.g., self-attentions [43], mean and max pooling functions [17, 18]; these approaches have consistently outperformed embedding techniques based upon random walks [16, 35]. In contrast to most GNN models that store the entire graph in GPU memory, GraphSAGE [17] is an inductive variant that reduces memory footprint by sampling a fixed-size set of neighbors in each GNN layer. A few scalable extensions include minibatch training with variance reduction [6, 23], subgraph sampling [51], and graph clustering [9].

Despite the successes of GNNs, very few industrial systems have developed large-scale GNN implementations. One recent system, PinSage [50], extends GraphSAGE to user-item bipartite graphs in Pinterest; MultiSage [49] extends PinSage to multipartite graphs. However, GNNs remain unexplored for large-scale user-user social modeling applications where users exhibit multifaceted behaviors by interacting with different functionalities on social platforms. In our work, we design GNNs for the important application of friend suggestion, through a novel multi-faceted friend ranking formulation with multi-modal user features and link communication features.

Multi-Modal Learning: Deep learning techniques have been explored for multi-modal feature fusion over diverse modalities such as text, images, video, and graphs [14, 25]. Specifically, multi-modal extensions of GNNs have recently been examined in micro-video recommendation [46] and urban computing [15] applications. Unlike prior work that regard modalities as largely independent
[Figure 2 plot residue: (a) overall homophily scores per user feature modality (y-axis: Modality Homophily); (b) homophily distributions of five representative user cluster centroids (Cluster 1 to Cluster 5).]

Figure 2: Users exhibit different extents of homophily across feature modalities. (a) Overall modality homophily scores, with 95% confidence interval bands. (b) Five representative cluster centroids identified by clustering users based on their homophily distributions over the K modalities.

…(Section 4.2), followed by model training details (Section 4.3).

We formally define the problem of multi-faceted friend ranking in large-scale social platforms, over a friendship graph G with multi-modal user features and pairwise link features, as follows:

Problem (Multi-Faceted Friend Ranking). Leverage multi-modal user features {x_v^s : v ∈ V, 1 ≤ s ≤ S}, link features {e_uv^s : s = T(t), (u, v, t) ∈ E}, and friendship graph G, to generate user representations {h_v(t) ∈ R^D : v ∈ V} at time t that facilitate the friend suggestion tasks of candidate retrieval and re-ranking.

We conduct an empirical analysis to understand the extent and variance of attribute homophily with respect to the different user feature modalities. We begin by analyzing users' ego-networks to characterize modality homophily, both overall and broken down across different user segments. The definition of modality homophily echoes the standard definition of attribute homophily [33], but generalized to include a modality of attributes, i.e., the tendency of users in a social graph to associate with others who are similar to them along attributes of a certain modality.

We define a homophily measure m_uv^k between a user u and her friend v on modality k by the standard cosine similarity [2], which is a normalized metric that accounts for heterogeneous activity across users. We compute a modality homophily score m_u^k for user u on modality k as the mean over all her neighbors, defined by:

$$m_u^k = \frac{1}{|N_u|} \sum_{v \in N_u} m_{uv}^k, \qquad m_{uv}^k = \cos\left(x_u^k, x_v^k\right) \qquad (2)$$
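For concreteness, Equation 2 is a plain cosine-similarity average over each user's ego-network; a minimal PyTorch-style sketch (the tensor layout and the `ego_networks` mapping are our illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def modality_homophily(x_k: torch.Tensor, ego_networks: dict) -> torch.Tensor:
    """Per-user homophily scores m_u^k on one modality k (Equation 2).

    x_k: [num_users, D_k] feature matrix for modality k.
    ego_networks: maps user id u -> LongTensor of friend ids N_u.
    """
    scores = torch.zeros(x_k.size(0))
    for u, friends in ego_networks.items():
        # m_uv^k = cos(x_u^k, x_v^k), averaged over all friends v in N_u
        sims = F.cosine_similarity(x_k[u].unsqueeze(0), x_k[friends], dim=-1)
        scores[u] = sims.mean()
    return scores
```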
[Figure plot residue: panels titled Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), Chat Rate, and Snap Rate.]

[Figure 5 diagram residue: modality-specific user features (content interests, friending activity, engagement activity) and link communication features e_uv feed per-modality aggregators; modality attention weights combine the K modality representations into the final representation h_u(t) via attention over user feature modalities.]

Figure 5: Overall framework of GraFRank: a set of K modality-specific neighbor aggregators (parameterized by individual modality-specific user features and link communication features) to compute K intermediate user representations; a cross-modality attention layer to compute final user representations by capturing discriminative facets of each modality.
From Figure 4, we find that a vast majority of users communicate with a small percentage (10-20%) of their friends; thus, we posit that friendship activeness is critical to precisely model user affinity. Towards this goal, we incorporate pairwise link communication features to parameterize both the attentional coefficients and the message aggregated from friends in the ego-network. Specifically, we formulate the message m_{u←v}^k ∈ R^D from friend v to user u at time t as a function of both the friend feature x_v^{s,k} and the link feature e_uv^s:

$$m_{u \leftarrow v}^k = W_2^k x_v^{s,k} + W_e^k e_{uv}^s + b, \qquad s = T(t) \qquad (6)$$

where W_2^k ∈ R^{D_k×D} and W_e^k ∈ R^{E×D} are trainable weight matrices operating on the user and link features respectively, and b ∈ R^D is a learnable bias vector. The attentional coefficient α^k(u, v, t) is then computed as a function of the user feature x_u^{s,k} and the message embedding m_{u←v}^k ∈ R^D from friend v to user u, defined by:

$$\alpha^k(u, v, t) = \sigma\left(a_k^T \left[ W_1^k x_u^{s,k} \,\|\, m_{u \leftarrow v}^k \right]\right), \qquad s = T(t) \qquad (7)$$

where σ is a non-linearity such as LeakyReLU. We similarly normalize the attentional coefficients α^k(u, v, t) using Equation 4, and compute the ego-network representation z_u^k(t, N_u(t)) for user u on modality k using Equation 5, with the message embedding m_{u←v}^k from Equation 6 conditioned on the link features.
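A hedged sketch of this communication-aware message passing (Equations 6-7, plus the softmax normalization and weighted pooling that the text attributes to Equations 4-5, which are not shown in this excerpt); the class layout and tensor shapes are our own illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommAwareAttention(nn.Module):
    """Link-aware messages and friendship attention for one modality k (Eqs. 6-7)."""

    def __init__(self, d_user: int, d_link: int, d: int):
        super().__init__()
        self.W1 = nn.Linear(d_user, d, bias=False)  # transforms self features (W_1^k)
        self.W2 = nn.Linear(d_user, d, bias=False)  # transforms friend features (W_2^k)
        self.We = nn.Linear(d_link, d, bias=True)   # transforms link features; bias plays the role of b
        self.a = nn.Linear(2 * d, 1, bias=False)    # attention vector a_k

    def forward(self, x_u, x_friends, e_links):
        """x_u: [d_user]; x_friends: [n, d_user]; e_links: [n, d_link] over N_u(t)."""
        # Eq. 6: message from friend v, conditioned on the link features e_uv
        msgs = self.W2(x_friends) + self.We(e_links)                    # [n, d]
        # Eq. 7: attention logits over the concatenation [W1 x_u || m_{u<-v}]
        self_emb = self.W1(x_u).expand(msgs.size(0), -1)                # [n, d]
        logits = F.leaky_relu(self.a(torch.cat([self_emb, msgs], -1)))  # [n, 1]
        # Eqs. 4-5 (referenced in the text): normalize and pool the messages
        alpha = torch.softmax(logits, dim=0)
        return (alpha * msgs).sum(dim=0)  # ego-network representation, [d]
```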
Distinct from conventional GNNs [17, 28, 43, 50] that only consider user x_u^{s,k} and friend x_v^{s,k} features to parameterize the neighbor aggregation, we additionally incorporate the link features e_uv^s into the attentional coefficient α^k(u, v, t) (Equation 7) and the message m_{u←v}^k (Equation 6) passed from friend v; this encourages the aggregation to be cognizant of pairwise communications with friends, e.g., passing more messages from the active friendships. Empirically, we observe a boost in friend ranking performance (Section 5.4) due to our communication-aware message passing strategy.

Message Aggregation: We refine the representation of user u by aggregating the messages propagated from friends in N_u(t). In addition, we consider self-connections m_{u←u}^k = W_1^k x_u^{s,k} to retain knowledge of the original features (W_1^k is the same transformation used in Equation 3). Specifically, we concatenate the ego-network and self-representations of user u, and further transform the concatenated embedding through a dense layer F_θ^k, defined by:

$$z_u^k(t) = F_\theta^k\left(m_{u \leftarrow u}^k,\; z_u^k(t, N_u(t))\right) \qquad (8)$$
$$\phantom{z_u^k(t)} = \sigma\left(W_a^k \left[ z_u^k(t, N_u(t)) \,\|\, m_{u \leftarrow u}^k \right] + b_a\right) \qquad (9)$$

where W_a^k ∈ R^{D×D} and b_a ∈ R^D are trainable parameters of the aggregator, and σ denotes the ELU activation function, which allows messages to encode both positive and small negative signals. Empirically, we observe significant improvements due to self-connections, compared to directly using the propagated ego-network representation from the neighborhood, as in GCN [28] and GAT [43].
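Equations 8-9 then fuse the ego-network and self representations; a sketch building on the `CommAwareAttention` class from the previous snippet (dimension choices are our own illustration; note the dense layer here takes the concatenated 2D-dimensional input):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAggregator(nn.Module):
    """Self-connection plus dense transform for one modality k (Eqs. 8-9)."""

    def __init__(self, d: int):
        super().__init__()
        self.attn = CommAwareAttention(d_user=d, d_link=d, d=d)  # sketch above
        self.Wa = nn.Linear(2 * d, d, bias=True)                 # W_a^k and b_a

    def forward(self, x_u, x_friends, e_links):
        m_self = self.attn.W1(x_u)                  # m_{u<-u}^k = W1 x_u (self-connection)
        z_ego = self.attn(x_u, x_friends, e_links)  # ego-network representation (Eq. 5)
        # Eq. 9: ELU over [ego-network rep || self rep], so the output can carry
        # both positive and small negative signals
        return F.elu(self.Wa(torch.cat([z_ego, m_self], dim=-1)))
```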
Higher-order Propagation: We stack multiple neighbor aggregation layers to model high-order connectivity information, i.e., propagate features from l-hop neighbors. The inputs to layer l depend on the user representations output from layer (l − 1), where the initial (i.e., "layer 0") representations are set to the input user features in modality k. By stacking l layers, we recursively formulate the representation z_{u,l}^k of user u at the end of layer l by:

$$z_{u,l}^k = F_{\theta,l}^k\left(m_{u \leftarrow u,\, l-1}^k,\; z_{u,l-1}^k(t, N_u(t))\right), \qquad m_{u \leftarrow u,\, l-1}^k = z_{u,l-1}^k \qquad (10)$$

where z_{u,l−1}^k is the representation of user u in modality k after (l − 1) layers, and z_{u,l−1}^k(t, N_u(t)) denotes the (l − 1)-ego-network representation of user u. We apply L neighbor aggregation layers to generate the layer-L representation z_{u,L}^k of user u in modality k.

4.2.2 Cross Modality Attention. To learn complex non-linear correlations between different feature modalities, we design a cross-modality attention mechanism. Specifically, we learn modality attention weights β_u^k(t) to distinguish the influence of each modality k using a two-layer Multi-Layer Perceptron, defined by:

$$\beta_u^k(t) = \frac{\exp\left(a_m^T \left( W_m z_{u,L}^k + b_m \right)\right)}{\sum_{k'=1}^{K} \exp\left(a_m^T \left( W_m z_{u,L}^{k'} + b_m \right)\right)} \qquad (11)$$
with weights W_m ∈ R^{D×D}, a_m ∈ R^D, and scalar bias b_m. The final representation h_u(t) ∈ R^D of user u is computed by a weighted aggregation of the layer-L modality-specific user representations {z_{u,L}^1, ..., z_{u,L}^K}, guided by the modality weights β_u^k(t), defined by:

$$h_u(t) = \sum_{k=1}^{K} \beta_u^k(t)\, W_m z_{u,L}^k \qquad (12)$$

Dataset                      Region 1    Region 2
# users                      3.1 M       17.1 M
# links                      286 M       2.36 B
# user features              79          79
# link features              6           6
# test set friend requests   46K         340K

Table 1: Dataset statistics
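A sketch of this cross-modality attention (Equations 11-12); the [K, d] input layout is assumed, and the scalar bias b_m is folded into the linear layer as a vector bias for simplicity:

```python
import torch
import torch.nn as nn

class CrossModalityAttention(nn.Module):
    """Fuse the K modality representations into h_u(t) (Eqs. 11-12)."""

    def __init__(self, d: int):
        super().__init__()
        self.Wm = nn.Linear(d, d, bias=True)   # W_m, with b_m folded into the bias
        self.am = nn.Linear(d, 1, bias=False)  # attention vector a_m

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        """z: [K, d] stack of the layer-L modality representations z_{u,L}^k."""
        # Eq. 11: modality weights beta_u^k(t), softmax-normalized over modalities
        beta = torch.softmax(self.am(self.Wm(z)), dim=0)  # [K, 1]
        # Eq. 12: weighted aggregation of the transformed modality representations
        return (beta * self.Wm(z)).sum(dim=0)             # h_u(t), [d]
```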
4.3 Temporal Model Training
We train GraFRank using a temporal pairwise ranking objective to differentiate positive and negative neighborhoods. We assume access to training data described by a set of timestamped links L created in a window (t_s, t_e), where (u, v, t) ∈ L is a bi-directional friendship link between source user u and target friend v formed at time t ∈ (t_s, t_e). To train the parameters of GraFRank, we define a triplet-like learning objective based on max-margin ranking.

4.3.1 Pairwise Ranking Objective. We define a time-sensitive ranking loss over the final user embeddings (h_u(t) for user u at time t) to rank the inner product of positive links (u, v, t) ∈ L higher than sampled negatives (u, n, t) by a margin factor Δ, as:

$$L = \sum_{(u,v,t) \in \mathcal{L}} \mathbb{E}_{n \sim P_n(u)} \max\left\{0,\; h_u(t) \cdot h_n(t) - h_u(t) \cdot h_v(t) + \Delta\right\} \qquad (13)$$

where Δ is a margin hyper-parameter and P_n(u) is the negative sampling distribution for user u. Here, we use a single forward pass to inductively compute a time-aware representation h_u(t) for each user u ∈ V at time t, based on the appropriate user and link features in temporal neighborhoods. Each minibatch of training examples is then optimized independently, which precludes the need to explicitly model temporal dependencies. This generic contrastive learning formulation enables usage of the same framework for different recommendation tasks, such as candidate retrieval and ranking, with different negative sampling distributions.
4.3.2 Candidate Retrieval and Ranking Tasks. We learn user embeddings towards two key use-cases in friend ranking: candidate retrieval and candidate ranking. Candidate retrieval aims to generate a list of top-N (e.g., N = 100) potential friend suggestions out of a very large candidate pool (over millions of users), while candidate ranking involves fine-grained re-ranking within a much smaller pool of the generated candidates, to determine the top-n (e.g., n = 10) suggestions shown to end users in the platform. We define different negative sampling distributions P_n(u) for each task, owing to their different ranking granularities, as follows:
• Candidate Retrieval: For the coarse-grained task of candidate retrieval, we uniformly sample five random negative users for each positive link from the entire user set V. Generating random negatives is efficient and effective at quickly training the model to identify potential friend candidates for each user. However, random negatives are often too easy to distinguish and may not provide the requisite resolution for the model to learn the fine-grained distinctions necessary for candidate friend ranking.
• Candidate Ranking: To enhance model resolution for candidate ranking, we also use hard negative examples; for each positive pair (u, v), we generate five hard negatives related to the source u, but not as relevant as the target friend v. For a 2-layer GNN (L = 2), we randomly choose users in the 3-4 hop neighborhood of u as hard negatives; 2-hop neighbors are excluded since such friends of friends are expected to be relevant suggestion candidates owing to triadic closure in social networks. In practice, we pre-compute hard negative examples to facilitate efficient model training.

We adopt a two-phase learning approach for candidate ranking. We pre-train the model on random negatives (as in candidate retrieval) to identify good model initialization points, followed by fine-tuning on hard negatives. Ranking hard negatives is more challenging, hence encouraging the model to progressively learn friend distinctions at a finer granularity. We empirically show notable gains on candidate ranking due to our two-phase strategy, compared to training individually on random or hard negatives.

4.3.3 Temporal Neighborhood Sampling. We learn a temporal user representation h_u(t) for user u at time t by selecting a fixed number of friends from N_u(t) for neighbor aggregation in each layer; this controls the memory footprint during training [17]. To efficiently identify and sample neighbors of u at any time t, we represent the time-evolving friendship graph G as a temporal adjacency list at its latest time t_s, where each user u has a list of (friend, time) tuples sorted by link creation times. This data representation enables O(log d) neighbor lookup at an arbitrary timestamp t via binary search, where d is the average user degree in the graph.
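A sketch of this temporal adjacency list lookup (the per-user parallel-list layout is our assumption; the binary-search cutoff is the O(log d) step described above):

```python
import bisect
import random

def sample_neighbors(adj, u, t, num_samples):
    """Sample friends of u that existed at time t via an O(log d) binary search.

    adj: user id -> (times, friends), two parallel lists where `times`
         holds link creation timestamps sorted in ascending order.
    """
    times, friends = adj[u]
    cutoff = bisect.bisect_right(times, t)  # friendships formed at or before t
    valid = friends[:cutoff]
    if len(valid) <= num_samples:
        return valid
    return random.sample(valid, num_samples)
```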
4.3.4 Multi-GPU Minibatch Training. We train GraFRank using minibatches of links from L over multiple GPUs on a single shared memory machine. The temporal adjacency list of G and the feature matrices X, E are placed in shared CPU memory to enable fast parallel neighborhood sampling and feature lookup. We adopt a producer-consumer framework [50] that alternates between CPUs and GPUs for model training. A CPU-bound producer constructs friend neighborhoods, looks up user and link features, and generates negative samples for the links of a minibatch. We then partition each minibatch across multiple GPUs to compute forward passes and gradients with a PyTorch model over dynamically constructed computation graphs. The gradients from different GPUs are synchronized using PyTorch's Distributed Data Parallel [30] construct.
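This setup maps onto PyTorch's standard DistributedDataParallel pattern; a skeletal sketch (the `build_grafrank_model` constructor is hypothetical, and launcher-provided environment variables are assumed):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# One training process per GPU; CPU-side DataLoader workers act as the
# producer that samples neighborhoods and looks up features in shared memory.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = build_grafrank_model().cuda(local_rank)  # hypothetical constructor
model = DDP(model, device_ids=[local_rank])      # synchronizes gradients across GPUs
```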
5 EXPERIMENTS
To analyze the quality of user representations learned by GraFRank, we propose five research questions to guide our experiments:
(RQ1) Can GraFRank outperform feature-based models and state-of-the-art GNNs on candidate retrieval and ranking tasks?
(RQ2) How does GraFRank compare to prior work under alternative metrics of reciprocated and communicated friendships?
(RQ3) How do the different architectural design choices and training strategies in GraFRank impact performance?
5.3 Experimental Results
We first present our main results comparing GraFRank with competing baselines on candidate retrieval and ranking tasks, followed by comparisons using alternative measures of friendship quality.

5.3.1 Friend Candidate Retrieval and Ranking (RQ1). We compare friend recommendation performance (based on add requests) of various approaches on retrieval and ranking in Tables 2 and 3 respectively. Interestingly, we find that SAGE variants outperform the popular GNN models GCN and GAT. A possible explanation is the impact of feature space heterogeneity in social networks and stochastic neighbor sampling; this results in noisy user representations for GNN models (GCN, GAT) that recursively aggregate neighbor features without emphasizing self-connections. Preserving knowledge of the original features by concatenating the self-embedding in each layer results in noticeable gains (SAGE variants).

GraFRank significantly outperforms state-of-the-art approaches with over 20-30% relative MRR gains. The performance gains of GraFRank over the best baseline are statistically significant with p < 0.01, judged by the paired t-test. In contrast to singular aggregation over the entire feature space by prior GNNs, GraFRank handles variance in homophily across different modalities through modality-specific communication-aware neighbor aggregation. Further, the final user representations are learnt by a correlation-aware attention layer to capture discriminative facets of each modality.

Task            Add              Reciprocate      Communicate
Metric          HR@50   MRR      HR@50   MRR      HR@50   MRR
LogReg          0.5262  0.1751   0.5582  0.2029   0.5495  0.1811
MLP             0.5726  0.1903   0.6006  0.2165   0.6001  0.1979
XGBoost         0.5957  0.2071   0.6286  0.2322   0.6407  0.2274
GCN             0.5154  0.1034   0.5329  0.1113   0.5273  0.1038
GAT             0.5352  0.0960   0.5596  0.1045   0.5654  0.0971
SAGE + Max      0.6409  0.1797   0.6653  0.2043   0.6670  0.1834
SAGE + Mean     0.6757  0.2333   0.6984  0.2609   0.7056  0.2446
GraFRank        0.7533  0.3035   0.7756  0.3367   0.7942  0.3152
Percent Gains   11.48%  30.09%   11.05%  29.05%   12.56%  28.86%

Table 4: Comparison of all models on add, reciprocate and communicate friendship retrieval tasks (reported on Region 1). GraFRank has consistent gains across all tasks.

5.4 Ablation Study (RQ3)
In this section, we present an ablation study to analyze the architectural modeling choices and training strategies in GraFRank.

5.4.1 Model Architecture. We design three variants to study the utilities of communication-aware and modality-specific aggregation.
• GraFRank_UM (User-Modality): We analyze the contribution of link features by parameterizing the k-th modality aggregator with just the k-th modality user features (Equation 3). Note that link features are excluded during neighbor aggregation.
• GraFRank_UL (User-Link): To study the effectiveness of learning modality-specific aggregators, we define a single modality-agnostic aggregator over user feature vectors obtained by concatenation across the K modalities, and link features.
• GraFRank_U (User): We remove link features from the aggregator in GraFRank_UL to further test the standalone benefits of link features in parameterizing a single neighbor aggregator.

Dataset            Retrieval        Ranking
Metric             HR@50   MRR      HR@10   MRR
(a) GraFRank_U     0.6968  0.2346   0.4255  0.2164
(b) GraFRank_UL    0.7069  0.2423   0.4450  0.2301
(c) GraFRank_UM    0.7239  0.2823   0.4887  0.2480
GraFRank           0.7533  0.3035   0.5051  0.2669

Table 5: Model architecture ablation study of GraFRank. Removing (c) link communication features, (b) modality-specific aggregation, or (a) both, hurts model performance.

The performance of all architectural variations is reported in Table 5. GraFRank_UL performs much worse than GraFRank, highlighting the benefits of learning multiple modality-specific aggregators to account for varying extents of modality homophily. Communication-aware neighbor aggregation is effective at identifying actively engaged friends during neighbor aggregation; this is evidenced by the gains of GraFRank over GraFRank_UM (modality-aware user feature aggregation). We find noticeable gains from parameterizing the aggregator with link features, even in the absence of modality-specific aggregation (GraFRank_UL over GraFRank_U).

5.4.2 Training Strategies. We train GraFRank with random negatives for candidate retrieval, but adopt two-phase hard negative fine-tuning (with random negative pretraining) for candidate ranking. To validate our choices, we examine three model training settings: (a) random negative training, (b) hard negative training, and (c) fine-tuning (after pretraining on random negatives), for two GNN models: SAGE + Mean and GraFRank, across both candidate retrieval and ranking tasks. Note that training with a combination of random and hard negatives, as proposed in [50], is excluded since it consistently performs worse than the above three strategies on both retrieval and ranking tasks.

Dataset                     Retrieval        Ranking
Metric                      HR@50   MRR      HR@10   MRR
SAGE + Mean (random)        0.6757  0.2333   0.3943  0.1923
SAGE + Mean (hard)          0.3275  0.0766   0.4330  0.2255
SAGE + Mean (fine-tune)     0.3978  0.0965   0.4561  0.2372
GraFRank (random)           0.7533  0.3035   0.4655  0.2254
GraFRank (hard)             0.4542  0.1461   0.4823  0.2594
GraFRank (fine-tune)        0.5283  0.1871   0.5051  0.2669

Table 6: Training strategy comparison of two GNNs across retrieval and ranking tasks. Random negative training achieves best results for retrieval. Random negative pretraining with hard negative fine-tuning benefits ranking.

We make three consistent observations from the performance comparison (Table 6) across all of the compared GNN models:
• Random negative training achieves the best results for retrieval, but performs poorly on ranking; such models lack the resolution to discriminate amongst potential candidates for re-ranking.
• Training on hard negatives improves candidate ranking as expected, yet results in poor retrieval performance. Learning fine profile-oriented distinctions among graph-based neighbors is actually detrimental to the coarse-grained task of retrieval.
• Random negative pretraining yields good parameter initialization points that are more conducive for effective fine-tuning on hard negatives. Fine-tuning improves results for all GNNs over direct hard negative training on ranking, but is ineffective for retrieval.

5.5 Training and Sensitivity Analysis (RQ4)
In this section, we quantitatively analyze model convergence and model sensitivity to sampled neighborhood sizes in GNN models.

5.5.1 Model Training Analysis. We investigate the relative abilities of different models to optimize the pairwise friend ranking objective (Equation 13). We compare the convergence rates of baselines MLP, SAGE + Mean, and our model GraFRank under both random and hard negative training settings, by examining the average training loss per epoch in Figures 6 (a) and (b) respectively.

[Figure 6 plot residue: pairwise ranking loss per epoch for MLP, SAGE + Mean, and GraFRank under (a) random negative training and (b) hard negative training.]

Figure 6: GraFRank converges faster to better optimization minima in random and hard negative settings, which translates to notable gains on both retrieval and ranking tasks.

As expected, all models converge to a lower training loss against random negatives (Figure 6 (a)) when compared to hard negatives (Figure 6 (b)). Interestingly, SAGE + Mean shows similar training convergence as MLP in Figure 6 (b), but achieves better test results; this indicates better generalization for GNNs over feature-based models. Compared to baselines, GraFRank converges to a better optimization minimum under both random and hard negative settings, which also generalizes to better test results (Tables 2 and 3).

5.5.2 Runtime and Sensitivity Analysis. A key trade-off in training scalable GNN models lies in choosing the size of sampled neighborhoods T in each message-passing layer. In our experiments, we train two-layer GNN models for friend ranking. Figure 8 shows the runtime and performance of SAGE + Mean and GraFRank for different sizes of sampled neighborhoods T from 5 to 20.

Model training time generally increases linearly with T, but has a greater slope after T = 15. We also observe diminishing returns in model performance (MRR) with increase in the size of the sampled neighborhood T after T = 15. Thus, we select a two-layer GNN model with a layer-wise neighborhood size of 15, to provide an effective trade-off between computational cost and performance.
Figure 7: Visualization of two-dimensional t-SNE transformed user representations from feature-based MLP, and GNN models:
SAGE + Mean, and GraFRank. Users with the same color belong to the same city. Compared to MLP and SAGE + Mean, the
friendship relationships learnt by GraFRank result in well-separated user clusters capturing geographical proximity.
Compared to SAGE + Mean, GraFRank has marginally higher training times, yet achieves significant performance gains (20% MRR), justifying the added cost of modality-specific aggregation.

[Figure 8 plot residue: Time per Epoch (Seconds) and MRR vs. # sampled neighbors per layer (5 to 20), for SAGE + Mean and GraFRank.]

Figure 8: We observe diminishing returns in MRR after T = 15. GraFRank achieves significant gains (20% MRR) over SAGE + Mean, with marginally higher training times.

5.6 User Cohort Analysis (RQ5)
In this section, we present multiple qualitative analyses to examine model performance across user segments with varied node degree and friending activity levels, and compare t-SNE visualizations of user representations learned by different neural models.

5.6.1 Impact of degree and activity. We examine friend recommendation performance across users with different node degrees and friending activities. Specifically, for each test user, degree is the number of friends, and activity is the number of friend requests sent/received in the past 30 days. We divide the test users into groups independently based on their degree and activity levels. We compare GraFRank with the feature-based models MLP and XGBoost, and the best GNN baseline SAGE + Mean. Figures 9(a) and (b) depict friend candidate retrieval performance (HR@50) across user segments with different degrees and activities respectively.

From Figure 9(a), overall model performance generally increases with node degree due to the availability of more structural information. GraFRank has significant improvements across all user segments, with notably higher gains for low-to-medium degree users (relative gains of 20%). GraFRank prioritizes active friendships by communication-aware message-passing, which compensates for the lack of sufficient local connectivities in the ego-network.

The performance variation across users with different activity levels in Figure 9(b) exhibits more distinctive trends, with clear gains for GNN models over the feature-based MLP and XGBoost for less-active users. Significantly, GraFRank has much stronger gains over SAGE + Mean in less-active user segments, owing to its multi-faceted modeling of heterogeneous in-platform user actions. GraFRank effectively overcomes sparsity concerns for less-active users, through modality-specific neighbor aggregation over multi-modal user features, to learn expressive user representations for friend ranking.

[Figure 9 plot residue: Hit Rate @ 50 for MLP, XGBoost, SAGE + Mean, and GraFRank across user groups ordered by increasing node degree (a) and increasing friending activity (b).]

Figure 9: GraFRank has significant improvements across all user segments, with notably larger gains for low-to-medium degree users (a), and less-active users (b).

5.6.2 Visualization. To analyze the versatility of learned user embeddings, we present a qualitative visualization to compare different models on their expressivity to capture geographical user proximity. We randomly select users from three different cities within Region 1 and use t-SNE [32] to transform their learned embeddings into two-dimensional vectors. Figure 7 compares the visualization results from different neural models. Evidently, the visualization learned by MLP does not capture geographical proximity, while the GNN models are capable of grouping users located within the same city. Compared to SAGE + Mean, GraFRank forms even more well-segmented groups with minimal inter-cluster overlap.

6 CONCLUSION
This paper investigates graph neural network design for friend suggestion in large-scale social platforms. We formulate friend suggestion as multi-faceted friend ranking with multi-modal user features and link communication features.
Motivated by our empirical insights on user feature modalities, we design a neural architecture GraFRank that handles heterogeneity in modality homophily via modality-specific neighbor aggregators, and learns non-linear modality correlations through cross-modality attention. Our experiments on two multi-million user datasets from Snapchat reveal significant improvements in friend candidate retrieval (30% MRR gains) and ranking (20% MRR gains), with stronger gains for the crucial population of less-active and low-degree users. Although our case studies are conducted on a single platform, Snapchat, we expect GraFRank to be directly applicable to popular bidirectional friending platforms (e.g., Facebook, LinkedIn), with minor adaptations for unidirectional scenarios (e.g., Twitter, Instagram).

REFERENCES
[1] Lada A Adamic and Eytan Adar. 2003. Friends and neighbors on the web. Social Networks 25, 3 (2003), 211–230.
[2] Luca Maria Aiello, Alain Barrat, Rossano Schifanella, Ciro Cattuto, Benjamin Markines, and Filippo Menczer. 2012. Friendship prediction and homophily in social media. ACM Transactions on the Web (TWEB) 6, 2 (2012), 1–33.
[3] Réka Albert and Albert-László Barabási. 2002. Statistical mechanics of complex networks. Reviews of Modern Physics 74, 1 (2002), 47.
[4] Enrique Amigó, Julio Gonzalo, Javier Artiles, and Felisa Verdejo. 2009. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval 12, 4 (2009), 461–486.
[5] Rianne van den Berg, Thomas N Kipf, and Max Welling. 2017. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263 (2017).
[6] Jianfei Chen, Jun Zhu, and Le Song. 2018. Stochastic Training of Graph Convolutional Networks with Variance Reduction. In ICML. 942–950.
[7] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In KDD. ACM, 785–794.
[8] Zhengdao Chen, Lisha Li, and Joan Bruna. 2019. Supervised Community Detection with Line Graph Neural Networks. In ICLR. OpenReview.net.
[9] Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh. 2019. Cluster-GCN: An efficient algorithm for training deep and large graph convolutional networks. In KDD. ACM, 257–266.
[10] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube recommendations. In RecSys. 191–198.
[11] Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. 2018. A survey on network embedding. IEEE TKDE 31, 5 (2018), 833–852.
[12] Daizong Ding, Mi Zhang, Shao-Yuan Li, Jie Tang, Xiaotie Chen, and Zhi-Hua Zhou. 2017. BayDNN: Friend recommendation with Bayesian personalized ranking deep neural network. In CIKM. 1479–1488.
[13] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph neural networks for social recommendation. In The World Wide Web Conference. 417–426.
[14] Golnoosh Farnadi, Jie Tang, Martine De Cock, and Marie-Francine Moens. 2018. User profiling through deep multimodal fusion. In WSDM. 171–179.
[15] Xu Geng, Xiyu Wu, Lingyu Zhang, Qiang Yang, Yan Liu, and Jieping Ye. 2019. Multi-modal graph interaction for multi-graph convolution network in urban spatiotemporal forecasting. arXiv preprint arXiv:1905.11395 (2019).
[16] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD. ACM, 855–864.
[17] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NIPS. 1024–1034.
[18] William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584 (2017).
[19] John A Hartigan and Manchek A Wong. 1979. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society 28, 1 (1979), 100–108.
[20] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media.
[21] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In WWW. 173–182.
[22] Jonathan L Herlocker, Joseph A Konstan, Loren G Terveen, and John T Riedl. 2004. Evaluating collaborative filtering recommender systems. ACM TOIS 22, 1 (2004), 5–53.
[23] Wenbing Huang, Tong Zhang, Yu Rong, and Junzhou Huang. 2018. Adaptive sampling towards fast graph representation learning. In Advances in Neural Information Processing Systems. 4558–4567.
[24] Ankit Jain, Isaac Liu, Ankur Sarda, and Piero Molino. 2019. Food Discovery with Uber Eats: Recommending for the Marketplace. (2019). https://eng.uber.com/uber-eats-graph-learning/
[25] Zhiwei Jin, Juan Cao, Han Guo, Yongdong Zhang, and Jiebo Luo. 2017. Multimodal fusion with recurrent neural networks for rumor detection on microblogs. In MM. 795–816.
[26] Leo Katz. 1953. A new status index derived from sociometric analysis. Psychometrika 18, 1 (1953), 39–43.
[27] Thomas N Kipf and Max Welling. 2016. Variational Graph Auto-Encoders. NIPS Workshop on Bayesian Deep Learning (2016).
[28] Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
[29] Adit Krishnan, Hari Cheruvu, Cheng Tao, and Hari Sundaram. 2019. A Modular Adversarial Approach to Social Recommendation. In CIKM. ACM, 1753–1762.
[30] Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, et al. 2020. PyTorch distributed: Experiences on accelerating data parallel training. arXiv preprint arXiv:2006.15704 (2020).
[31] David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for social networks. JASIST 58, 7 (2007), 1019–1031.
[32] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, Nov (2008), 2579–2605.
[33] Miller McPherson, Lynn Smith-Lovin, and James M Cook. 2001. Birds of a feather: Homophily in social networks. Annual Review of Sociology 27, 1 (2001), 415–444.
[34] Joshua O'Madadhain, Jon Hutchins, and Padhraic Smyth. 2005. Prediction and ranking algorithms for event-based network data. ACM SIGKDD Explorations Newsletter 7, 2 (2005), 23–30.
[35] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In KDD. ACM, 701–710.
[36] Aravind Sankar, Junting Wang, Adit Krishnan, and Hari Sundaram. 2020. Beyond Localized Graph Neural Networks: An Attributed Motif Regularization Framework. In ICDM.
[37] Aravind Sankar, Yanhong Wu, Liang Gou, Wei Zhang, and Hao Yang. 2020. DySAT: Deep neural representation learning on dynamic graphs via self-attention networks. In WSDM. 519–527.
[38] Aravind Sankar, Yanhong Wu, Yuhang Wu, Wei Zhang, Hao Yang, and Hari Sundaram. 2020. GroupIM: A Mutual Information Maximization Framework for Neural Group Recommendation. In SIGIR. 1279–1288.
[39] Aravind Sankar, Xinyang Zhang, Adit Krishnan, and Jiawei Han. 2020. Inf-VAE: A Variational Autoencoder Framework to Integrate Homophily and Influence in Diffusion Prediction. In WSDM. 510–518.
[40] Neil Shah, Danai Koutra, Tianmin Zou, Brian Gallagher, and Christos Faloutsos. 2015. TimeCrunch: Interpretable dynamic graph summarization. In KDD. ACM.
[41] Sucheta Soundarajan, Acar Tamersoy, Elias B Khalil, Tina Eliassi-Rad, Duen Horng Chau, Brian Gallagher, and Kevin Roundy. 2016. Generating graph snapshots from streaming edge data. In WWW. 109–110.
[42] Xianfeng Tang, Yozen Liu, Neil Shah, Xiaolin Shi, Prasenjit Mitra, and Suhang Wang. 2020. Knowing your FATE: Friendship, Action and Temporal Explanations for User Engagement Prediction on Social Apps. In KDD. ACM, 2269–2279.
[43] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. ICLR (2018).
[44] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale commodity embedding for e-commerce recommendation in Alibaba. In KDD. ACM, 839–848.
[45] Menghan Wang, Yujie Lin, Guli Lin, Keping Yang, and Xiao-Ming Wu. 2020. M2GRL: A Multi-task Multi-view Graph Representation Learning Framework for Web-scale Recommender Systems. In KDD. ACM, 2349–2358.
[46] Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, Richang Hong, and Tat-Seng Chua. 2019. MMGCN: Multi-modal graph convolution network for personalized recommendation of micro-video. In MM. 1437–1445.
[47] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. 2020. A comprehensive survey on graph neural networks. IEEE TNNLS (2020).
[48] Yuxin Xiao, Adit Krishnan, and Hari Sundaram. 2020. Discovering strategic behaviors for collaborative content-production in social networks. In WWW. 2078–2088.
[49] Carl Yang, Aditya Pal, Andrew Zhai, Nikil Pancha, Jiawei Han, Charles Rosenberg, and Jure Leskovec. 2020. MultiSage: Empowering GCN with Contextualized Multi-Embeddings on Web-Scale Multipartite Networks. In KDD. ACM, 2434–2443.
[50] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. In KDD. ACM, 974–983.
[51] Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor K. Prasanna. 2020. GraphSAINT: Graph Sampling Based Inductive Learning Method. In ICLR. OpenReview.net.
[52] Muhan Zhang and Yixin Chen. 2017. Weisfeiler-Lehman neural machine for link prediction. In KDD. ACM, 575–583.
[53] Muhan Zhang and Yixin Chen. 2018. Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems. 5165–5175.
[54] Muhan Zhang and Yixin Chen. 2020. Inductive Matrix Completion Based on Graph Neural Networks. In ICLR. OpenReview.net.