0% found this document useful (0 votes)
19 views12 pages

GraFRank WWW2021

Uploaded by

Văn Anh Chu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views12 pages

GraFRank WWW2021

Uploaded by

Văn Anh Chu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

Graph Neural Networks for Friend Ranking

in Large-scale Social Platforms


Aravind Sankar Yozen Liu, Jun Yu, Neil Shah
University of Illinois at Urbana-Champaign Snap Inc.
[email protected] {yliu2,jyu3,nshah}@snap.com

ABSTRACT 1 INTRODUCTION
Graph Neural Networks (GNNs) have recently enabled substantial Learning latent user representations has become increasingly im-
advances in graph learning. Despite their rich representational ca- portant in advancing user understanding, with widespread adop-
pacity, GNNs remain under-explored for large-scale social modeling tion in various industrial settings, e.g., video recommendations on
applications. One such industrially ubiquitous application is friend YouTube [10], pin suggestions on Pinterest [50] etc. The user repre-
suggestion: recommending users other candidate users to befriend, sentations learned using deep models are effective at complementing
to improve user connectivity, retention and engagement. However, or replacing conventional collaborative filtering methods [22], and
modeling such user-user interactions on large-scale social platforms are versatile, e.g., the embeddings can be used to suggest friendships
poses unique challenges: such graphs often have heavy-tailed de- and also infer profile attributes (e.g., age, gender) in social networks.
gree distributions, where a significant fraction of users are inactive Learning latent representations of nodes in graphs has promi-
and have limited structural and engagement information. More- nent applications in multiple academic settings, such as link pre-
over, users interact with different functionalities, communicate with diction [53], community detection [8], and industrial recommender
diverse groups, and have multifaceted interaction patterns. systems, including e-commerce [44, 45], content discovery [49, 50],
We study the application of GNNs for friend suggestion, pro- and food delivery [24]. Graph Neural Networks (GNNs) [47] have
viding the first investigation of GNN design for this task, to our emerged as a popular graph representation learning paradigm due
knowledge. To leverage the rich knowledge of in-platform actions, to their ability to learn representations combining graph structure
we formulate friend suggestion as multi-faceted friend ranking with and node/link attributes, without relying on expensive feature engi-
multi-modal user features and link communication features. We neering. GNNs can be formulated as a message passing framework
design a neural architecture GraFRank to learn expressive user rep- where node representations are learned by propagating features
resentations from multiple feature modalities and user-user interac- from local graph neighborhoods via trainable neighbor aggregators.
tions. Specifically, GraFRank employs modality-specific neighbor Recently, GNNs have demonstrated promising results in a few indus-
aggregators and cross-modality attentions to learn multi-faceted user trial systems designed for item recommendations in bipartite [50]
representations. We conduct experiments on two multi-million user or multipartite [49] user-to-item interaction graphs.
datasets from Snapchat, a leading mobile social platform, where Despite their rich representational ability, GNNs have been rela-
GraFRank outperforms several state-of-the-art approaches on can- tively unexplored in large-scale user-user social interaction modeling
didate retrieval (by 30% MRR) and ranking (by 20% MRR) tasks. applications, like friend suggestion. Recommending new potential
Moreover, our qualitative analysis indicates notable gains for criti- friends to encourage users to expand their networks, is a corner-
cal populations of less-active and low-degree users. stone of social networking, and plays an important role towards
user retention, and promoting engagement within the platform.
CCS CONCEPTS Prior efforts typically formulate friend suggestion as link predic-
• Information systems → Social networking sites; Social net- tion (or matrix completion) with a rich literature of graph-based
works; Social recommendation; Social networks; • Human- heuristics [31] to quantify user-user affinity, e.g., two users are more
centered computing → Social recommendation. likely to connect if they have many common friends. A few GNN
models target link prediction by learning aggregators over enclos-
KEYWORDS ing subgraphs around each candidate link [52–54]; such models
do not scale to industry-scale social graphs with over millions of
Graph Neural Network, Social Network, Recommendation System
nodes and billions of edges. Still, GNNs have enormous potential
ACM Reference Format: for learning expressive user representations in social networks, due
Aravind Sankar and Yozen Liu, Jun Yu, Neil Shah. 2021. Graph Neural to their intuitive message-passing paradigm that enables attention
Networks for Friend Ranking in Large-scale Social Platforms. In Proceedings to social influence from friends in their ego-network.
of the Web Conference 2021 (WWW ’21), April 19–23, 2021, Ljubljana, Slovenia.
Yet, designing GNNs for friend recommendations in large-scale
ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3442381.3450120
social platforms poses unique challenges. First, social networks are
This paper is published under the Creative Commons Attribution 4.0 International characterized by heavy-tailed degree distributions, e.g., many net-
(CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their works approximately follow power-law distributions [3]. This poses
personal and corporate Web sites with the appropriate attribution.
WWW ’21, April 19–23, 2021, Ljubljana, Slovenia
a key challenge of limited structural information for a significant
© 2021 IW3C2 (International World Wide Web Conference Committee), published proportion of users with very few friends. A related challenge is
under Creative Commons CC-BY 4.0 License. activity sparsity where a very small fraction of users actively form
ACM ISBN 978-1-4503-8312-7/21/04.
https://doi.org/10.1145/3442381.3450120
new friendships at any given time. Second, modern social platforms
WWW ’21, April 19–23, 2021, Ljubljana, Slovenia Sankar et al.

offer a multitude of avenues for users to interact, e.g., users can 2 RELATED WORK
communicate with friends either by directly exchanging messages We briefly review a few related lines of work on friend recommen-
and pictures, or through indirect social actions by liking and com- dations, graph neural networks, and multi-modal learning.
menting on posts. Extracting knowledge from such heterogeneous Friend Recommendation: The earliest methods were carefully
in-platform user actions is challenging, yet extremely valuable to designed graph-based heuristics of user-user proximity in social
address sparsity challenges for a vast majority of inactive users. networks [31], e.g., path-based Katz centrality [26] or common
Present Work: We overcome structural and interactional spar- neighbor-based Adamic/Adar [1]. Supervised learning techniques
sity by exploiting the rich knowledge of heterogeneous in-platform exploited a collection of such pairwise features to train ranking
actions. We formulate friend recommendation on social networks models [12, 34]. However, extracting heuristic features on-the-fly
as multi-faceted friend ranking on an evolving friendship graph, for each potential link is infeasible in large-scale evolving networks.
with multi-modal user features and link communication features Recently, graph embedding methods learn latent node represen-
(Figure 1). We represent users with heterogeneous feature sets tations to capture the structural properties of a node and its neigh-
spanning multiple modalities, that include a collection of static pro- borhoods [11], e.g., popular embedding models like node2vec [16]
file attributes (e.g., demographic information) and time-sensitive and Deepwalk [35] learn unsupervised embeddings to maximize
in-platform activities (e.g., content interests and interactions). We the likelihood of co-occurrence in fixed-length random walks, and
also leverage pairwise link features on existing friendships, which have shown effective link prediction performance. Since graph
capture recent communication activities across multiple direct (e.g., embedding methods learn latent embeddings per node, the num-
messages) and indirect (e.g., stories) channels within the platform. ber of model parameters scales with the size of the graph, which is
To understand the complexity of user interactions and gain in- prohibitive for large-scale networks with over multi-million users.
sights into various factors impacting friendship formation, we con- A related direction is social recommendation [13, 29, 38], which
duct an empirical analysis to investigate attribute homophily with utilizes the social network as an auxiliary data source to model
respect to different user feature modalities. Our analysis reveals user behavior in social platforms [39, 42, 48] and improve quality
diverse homophily distributions across modalities and users, and indi- of item recommendations to users. In contrast, our problem, friend
cates non-trivial cross-modality correlations. Motivated by these ob- suggestion is a user-user recommendation task that is complemen-
servations, we design an end-to-end GNN architecture, GraFRank tary to social recommendation, since it facilitates creating a better
(Graph Attentional Friend Ranker) for multi-faceted friend ranking. social network of users to benefit social recommendation.
GraFRank learns user representations by modality-specific neigh- Graph Neural Networks: GNNs learn node representations by
bor aggregation and cross-modality attention. We handle hetero- recursively propagating features (i.e., message passing) from local
geneity in modality homophily by learning modality-specific neigh- neighborhoods through the use of aggregation and activation func-
bor aggregators to compute a set of representations for each user; tions [36, 47]. Graph Convolutional Networks (GCNs) [28] learn
the aggregator is modeled by friendship attentions to capture the degree-weighted aggregators by operating on the graph Lapla-
influence of individual features and pairwise communications. We cian. Many models generalize GCN with a variety of learnable
introduce a cross-modality attention module to compute the final aggregators, e.g., self-attentions [43], mean and max pooling func-
user representation by attending over the modality-specific repre- tions [17, 18]; these approaches have consistently outperformed em-
sentations of each user, thereby learning non-linear correlations bedding techniques based upon random walks [16, 35]. In contrast
across modalities. We summarize our key contributions below: to most GNN models that store the entire graph in GPU memory,
• Graph-Neural Friend Ranking: To our knowledge, ours is the GraphSAGE [17] is an inductive variant that reduces memory foot-
first work to investigate graph neural network usage and design print by sampling a fixed-size set of neighbors in each GNN layer.
for social user-user interaction modeling applications. Unlike A few scalable extensions include minibatch training with variance
prior work that typically view friend recommendation as struc- reduction [6, 23], subgraph sampling [51], and graph clustering [9].
tural link prediction, we present a novel formulation with multi- Despite the successes of GNNs, very few industrial systems have
modal user features and link features, to leverage knowledge of developed large-scale GNN implementations. One recent system,
rich heterogeneous user activities in social networking platforms. PinSage [50] extends GraphSAGE to user-item bipartite graphs in
• GraFRank Model: Motivated by our empirical study that re- Pinterest; MultiSage [49] extends PinSage to multipartite graphs.
veals heterogeneity in modality homophily and cross-modality However, GNNs remain unexplored for large-scale user-user so-
correlations, we design a neural architecture, GraFRank, to learn cial modeling applications where users exhibit multifaceted behav-
multi-faceted user representations. Distinct from conventional iors by interacting with different functionalities on social platforms.
GNNs operating on a single feature space, GraFRank learns from In our work, we design GNNs for the important application of friend
multiple feature modalities and user-user interactions. suggestion, through a novel multi-faceted friend ranking formulation
• Robust Experimental Results: Our extensive experiments on with multi-modal user features and link communication features.
two large-scale datasets from a popular social networking plat- Multi-Modal Learning: Deep learning techniques have been
form Snapchat, indicate significant gains for GraFRank over explored for multi-modal feature fusion over diverse modalities
state-of-the-art baselines on friend candidate retrieval (relative such as text, images, video, and graphs [14, 25]. Specifically, multi-
MRR gains of 30%) and ranking (relative MRR gains of 20%) tasks. modal extensions of GNNs have recently been examined in micro-
Our qualitative analysis indicates stronger gains for a large, but video recommendation [46] and urban computing [15] applications.
especially crucial population of less-active and low-degree users. Unlike prior work that regard modalities as largely independent
Graph Neural Networks for Friend Ranking in Large-scale Social Platforms WWW ’21, April 19–23, 2021, Ljubljana, Slovenia

SOCIAL FRIENDSHIP GRAPH MULTI-MODAL USER FEATURES


data sources, user feature modalities in social networks tend to
be correlated. In this work, we model non-linear cross-modality Profile Attributes 𝒙"!

correlations to learn multi-faceted user representations. 𝑡!' 𝑡!(


Content Interests 𝒙#!
𝑡!&
𝑡!$ 𝑡$ 𝑡&

3 PRELIMINARIES 𝑡* 𝑡% 𝑡" 𝑡'


Friending Activity 𝒙$!
𝑡!% 𝑡!" 𝑡!
We first formulate the problem of multi-faceted friend ranking in u 𝑡(
𝑡!!
v Engagement Activity 𝒙%!
𝑡# 𝑡)
large-scale social platforms (Section 3.1) and then briefly introduce
𝑡!* LINK COMMUNICATION FEATURES
relevant background on graph neural networks (Section 3.2).
Pairwise Communications 𝒆 𝒖𝒗

3.1 Problem Formulation


In this section, we introduce the different information sources in a Figure 1: Desiderata for Multi-faceted Friend Ranking: tem-
social platform that are relevant to friend suggestion. Each individ- porally evolving friendship graph with multi-modal user
ual in the platform is denoted by a user 𝑢 or 𝑣 and a pair of users features and pairwise link communication features.
(𝑢, 𝑣) may be connected by a friendship, which is an undirected
exchanging messages with friends, or liking and sharing posts,
relationship, i.e., if 𝑢 is a friend of 𝑣, 𝑣 is also a friend of 𝑢. We
which are indicative of their stable traits and mutable interests. We
assume a set of users V introduced until our latest observation
extract user attributes spanning a total of 𝐾 = 4 modalities, which
time of the platform. The friendship graph G evolves when new
include profile attributes, in-app interests, friend creation activities,
friendships form and when new users join the platform. Here, we
and user engagement activities, described in detail below:
only consider the emergence of new users and friendships while
• Profile Attributes: a set of (mostly) static demographic features
leaving the removal of existing users and edges as future work.
describing the user, including age, gender, recent locations, lan-
Prior work typically represent a dynamic network as a sequence
guages, etc., that are listed or inferred from their profile.
of static snapshots, primarily due to scaling concerns. However,
graph snapshots are coarse approximations of the actual continuous- • Content Interests: a real-valued feature vector describing the tex-
time network and rely on a user-specified discrete time interval tual content (e.g., posts, stories) interacted by the user within the
for snapshot creation [40, 41]. We also assume multiple time-aware platform, e.g., topics of stories viewed by the user on Snapchat.
user-level features (across modalities) and link(edge)-level features • Friending Activity: aggregated number of sent/received friend
capturing pairwise user-user communications. In industrial settings, requests, reciprocated friendships, and viewed suggestions of the
such features are commonly extracted by routine batch jobs and user in different time ranges (e.g., daily, weekly, and monthly).
populated in an upstream database at regular time intervals (e.g., • Engagement Activity: aggregated number of in-app direct and
daily batch inference jobs), to facilitate efficient model training. indirect engagements for the user (e.g., text messages, snaps, and
Thus, we adopt a hybrid data model that achieves the best of both comments on posts) with other friends in different time ranges.
worlds. We formulate the friendship graph as a continuous-time The user feature modalities include a combination of static and
dynamic graph (CTDG) with the expressivity to record friendships time-sensitive features, i.e., the profile attributes are static while the
at the finest possible temporal granularity; and represent features rest of the modalities are time-sensitive and often evolve at different
as a sequence of daily snapshots where the time-sensitive features scales across users, e.g., a long-time active user may frequently
(e.g., engagement activity) are recorded at different time scales. communicate with a stable set of friends, while a new user is more
Friendship Graph: Let us consider an observation time window likely to quickly add new friends before communicating.
(𝑡𝑠 , 𝑡𝑒 ) such that friendships created in this window specify the The feature vector of a user 𝑢 ∈ V in snapshot 𝑠 ∈ [1, 𝑆] is
training data for the friend ranking model. We divide this window defined by x𝑠𝑣 = [x𝑠,1 𝑠,𝐾 𝑠,𝑘 𝐷
𝑣 , . . . , x𝑣 ] where x𝑣 ∈ R𝑘 is the 𝑘-th user
(𝑡𝑠 , 𝑡𝑒 ) into a sequence of 𝑆 daily snapshots, denoted by 1, 2, . . . , 𝑆. feature modality and [·] denotes row-wise concatenation.
Formally, we model the friendship graph G as a timed sequence of Pairwise Link Communication Features: Social networks
friend creation events over the entire time range (0, 𝑡𝑒 ), defined as: comprise two predominant types of communication channels: con-
versations and social actions. Conversations include exchanges of
Definition 3.1 (Friendship Graph). Given a graph G = (V, E, T ),
text messages and media content with friends, which indicate di-
let V be the set of users, E ⊆ V × V × R be the set of friendship
rect user-user communications. In contrast, social actions facilitate
links between users in G. At the finest granularity, each link 𝑒 =
indirect user-user interactions, e.g., posting a Snapchat Story or
(𝑢, 𝑣, 𝑡) ∈ E is assigned a unique timestamp 𝑡 ∈ R+ ; 0 < 𝑡 < 𝑡𝑒 that
liking a Facebook post results in a passive broadcast to friends.
denotes the link creation time, and T : R+ ↦→ [0, 𝑆] is a function
For conversational channels, we extract bidirectional link fea-
that maps each timestamp 𝑡 to a corresponding snapshot in [0, 𝑆].
tures reflecting the number of communications sent and received
Here, the window (𝑡𝑠 , 𝑡𝑒 ) corresponds to snapshots [1, 𝑆] and by each pair of users (who are friends). We capture indirect social
snapshot 0 is a placeholder for any 𝑡 < 𝑡𝑠 , while the graph G actions by recording the number of actions of each type per friend.
includes all friendships with time-stamped links in (0, 𝑡𝑒 ). The set Similar to user features, we extract link features per snapshot by
of temporal neighbors of user 𝑣 at time 𝑡 includes friends created aggregating communications at different time intervals. The link
′ ′
before 𝑡, defined as 𝑁𝑡 (𝑣) = {𝑤 : 𝑒 = (𝑣, 𝑤, 𝑡 ) ∈ E ∧ 𝑡 < 𝑡 }. feature vector for a pair of users (𝑢, 𝑣) at time 𝑡 (who became friends
before 𝑡), is denoted by e𝑢𝑣 𝑠 ∈ R𝐸 where 𝑠 = T (𝑡) is the snapshot
Multi-Modal Evolving User Features: In a social platform,
users typically use multiple functionalities, such as posting videos, associated with time 𝑡 and 𝐸 is the cardinality of link features.
WWW ’21, April 19–23, 2021, Ljubljana, Slovenia Sankar et al.

Profile Attributes Content Interests Friending Activity Engagement Activity


We first conduct an empirical analysis on user feature modalities,
1.0
0.85
0.9
to gain insights into various factors impacting friendship forma-
0.8
tion (Section 4.1). We then formulate the design choices of our
0.80
0.7
model GraFRank for friend ranking based on our acquired insights
Modality Homophily

Modality Homophily
0.75
0.6 (Section 4.2), followed by model training details (Section 4.3)
0.5

0.70 0.4 4.1 Motivating Insight: Modality Analysis


0.3
We conduct an empirical study that helps us formulate the design
0.65 0.2
choices in our model. We aim to validate the existence and under-
0.1

0.60 0.0
stand the extent and variance of attribute homophily with respect
User Feature Modality Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5
User Cluster Centroids to the different user feature modalities. We begin by analyzing
Figure 2: Users exhibit different extents of homophily across users’ ego-networks to characterize modality homophily, both over-
feature modalities. (a) Overall modality homophily scores, all and broken-down across different user segments. The definition
with 95% confidence interval bands (b) five representative of modality homophily echoes the standard definition of attribute
cluster centroids identified by clustering users based on homophily [33], but generalized to include a modality of attributes,
their homophily distributions over the 𝐾 modalities. i.e., the tendency of users in a social graph to associate with others
who are similar to them along attributes of a certain modality.
We formally define the problem of multi-faceted friend ranking We define a homophily measure m𝑢𝑣 𝑘 between a user 𝑢 and her
in large-scale social platforms, over friendship graph G with multi- friend 𝑣 on modality 𝑘 by the standard cosine similarity [2], which
modal user features and pairwise link features, as follows: is a normalized metric that accounts for heterogeneous activity
across users. We compute a modality homophily score m𝑢𝑘 for user
Problem (Multi-Faceted Friend Ranking). Leverage multi-
𝑢 on modality 𝑘 by the mean over all her neighbors, defined by:
modal user features {x𝑠𝑣 : 𝑣 ∈ V, 1 ≤ 𝑠 ≤ 𝑆 }, link features {e𝑢𝑣 𝑠 :
𝑠 = T (𝑡), (𝑢, 𝑣, 𝑡) ∈ E} and friendship graph G, to generate user
1 Õ 𝑘
representations {h𝑣 (𝑡) ∈ R𝐷 : 𝑣 ∈ V} at time 𝑡, that facilitate friend m𝑢𝑘 = m𝑢𝑣 𝑘
m𝑢𝑣 = 𝑐𝑜𝑠 (x𝑢𝑘 , x𝑘𝑣 ) (2)
suggestion tasks of candidate retrieval and re-ranking. |𝑁𝑢 |
𝑣 ∈𝑁𝑢

Note that we omit the snapshot 𝑠 above since the discussion


3.2 Background on GNNs is restricted to a single feature snapshot. Figure 2 (a) shows the
We briefly introduce a generic formulation of a graph neural net- overall modality homophily scores (averaged across all users), for
work layer with message-passing for neighborhood aggregation. each of the 𝐾 modalities. We observe differing extents of attribute
GNNs use multiple layers to learn node representations. At each homophily across modalities, with higher variance for the time-
layer 𝑙 > 0 (𝑙 = 0 is the input layer), GNNs compute a representation sensitive modalities (e.g., friending and engagement activities).
for node 𝑢 by aggregating features from its neighborhood, through We further extend our analysis to examine the homophily dis-
a learnable aggregator 𝐹𝜃,𝑙 per layer. Stacking 𝑘 layers allows the tribution over modalities at the granularity of individual users, to
𝑘-hop neighborhood of a node to influence its representation. understand if modality homophily varies across different users.
We first represent each user 𝑢 by a 𝐾-dimensional modality vector
m𝑢 = [m𝑢1 , . . . , m𝑢𝐾 ] that describes the homophily distribution over
 
h𝑢,𝑙 = 𝐹𝜃,𝑙 h𝑢,𝑙−1, {h𝑣,𝑙−1 } , 𝑣 ∈ 𝑁 (𝑢) (1)
𝐾 modalities. We then use 𝑘-means [19] to cluster the user set V
Equation (1) indicates that the embedding h𝑢,𝑙 ∈ R𝐷 for node 𝑢 based on their modality vectors. Figure 2 (b) shows the centroids
at the 𝑙-th layer is a non-linear aggregation of its embedding h𝑢,𝑙−1 of five representative clusters. We observe stark differences in the
from layer 𝑙 − 1 and the embeddings of its immediate neighbors modality vector centroids across the five clusters, indicating the
𝑣 ∈ N (𝑢). The function 𝐹𝜃,𝑙 defines the message-passing function existence of user segments with diverse homophily distributions
at layer 𝑙 and can be instantiated using a variety of aggregators, over the 𝐾 modalities. This motivates our first key observation:
including graph convolution [28], attention [43], and pooling [17].
The node representation for 𝑢 at the input layer is h𝑢,0 , where Observation 1 (Heterogeneity in Modality Homophily).
Users exhibit different extents of homophily across modalities, and the
h𝑢,0 = x𝑢 ∈ R𝐷 . The representation of node 𝑢 at the final GNN
homophily distribution over modalities varies across user segments.
layer is typically trained using a supervised learning objective.
The above formulation (Equation 1) operates under the assump- Each modality enables identification of a subset of friends that
tion of a static graph and a single static input feature vector per exhibit modality homophily. However, this poses a question: do the
node. In contrast, our setting involves a time-evolving friendship 𝐾 modalities induce the same (or disparate) subsets of homophilous
graph G with pairwise link features and multi-modal node features. friends, or are the friends that exhibit homophily in each modality,
correlated? We now investigate this relationship in this section.
4 GRAPH NEURAL FRIEND RANKING For every modality 𝑘, we cluster the ego-network (set of di-
In this section, we present our approach to inductively learn user rect friends) 𝑁 (𝑢) of each user 𝑢, which is represented as a set of
representations in a dynamic friendship graph with pairwise link modality-specific vectors {x𝑘𝑣 ∈ R𝐷𝑘 : 𝑣 ∈ 𝑁 (𝑢)}; this results in
features and time-sensitive multi-modal node features. ego-clustering assignments per modality. To quantify cross-modality
Graph Neural Networks for Friend Ranking in Large-scale Social Platforms WWW ’21, April 19–23, 2021, Ljubljana, Slovenia

Normalized Mutual Information (NMI) Adjusted Rand Index (ARI) Chat Rate Snap Rate
1.0 1.0 30 16

M0 M0 14
0.9 25

Numer of users (x 1000)

Numer of users (x 1000)


0.8 12
20
User Feature Modality

User Feature Modality


0.8 10
M1 M1
0.6
15 8
0.7
6
0.4 10
M2 M2
0.6 4
5
2
0.2
0.5
M3 M3 0 0
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
Percentage of friends Percentage of friends
M0 M1 M2 M3 M0 M1 M2 M3
User Feature Modality User Feature Modality
Figure 4: Friend communication rate in two direct chan-
Figure 3: Cross-modality Correlation Study: NMI (a) and ARI nels (Chat, Snap) on Snapchat. Most users communicate fre-
(b) metrics for each pair of modalities, quantifying pairwise quently only with a subset (≤ 20%) of their friends, making
correlation by consensus in ego-clustering assignments (ob- influence modeling critical during neighbor aggregation.
tained independently with respect to each modality). We ob-
serve substantial correlations across pairs of modalities. We begin by describing a single layer, which consists of two ma-
jor operations: message propagation and message aggregation. We
correlations, we compute a correlation score for each pair of modali- subsequently discuss generalization to multiple successive layers.
ties by the consensus between their ego-clustering assignments. Message Propagation: We define the message-passing mecha-
We use two standard measures: Normalized Mutual Informa- nism to aggregate information from the ego-network 𝑁𝑡 (𝑢) of user
tion (NMI) and Adjusted Rand Index (ARI) to evaluate consensus 𝑢 at time 𝑡. Specifically, the propagation step for modality 𝑘 aggre-
between clusterings [4]. NMI measures the statistical correlation be-
gates the 𝑘-th modality features {x𝑠,𝑘𝑣 : 𝑣 ∈ 𝑁𝑡 (𝑢), 𝑠 = T (𝑡)} from
tween two clustering assignments; however, NMI increases with the
the corresponding snapshot 𝑠 = T (𝑡) of temporal neighbors 𝑁𝑡 (𝑢).
number of distinct clusters. ARI measures the percentage of correct
To quantify the importance of each friend 𝑣 in the ego-network, we
pairwise assignments, and is chance-corrected with an expected
value of zero. Note that NMI and ARI are symmetric metrics. propose a friendship attention [37, 43] which takes embeddings x𝑢𝑠,𝑘
Figure 3 depicts average NMI and ARI scores for each pair of and x𝑠,𝑘 𝑘
𝑣 as input, and computes an attentional coefficient 𝛼 (𝑢, 𝑣, 𝑡)
feature modalities. We observe substantial correlation in cluster to control the influence of friend 𝑣 on user 𝑢 at time 𝑡, given by:
assignments across a few modalities (e.g., time-sensitive modalities  
𝑀1 and 𝑀3 ) while some (e.g., static modality 𝑀0 ) are quite distinct 𝛼 𝑘 (𝑢, 𝑣, 𝑡) = LeakyRELU a𝑇𝑘 𝑾1𝑘 x𝑢𝑠,𝑘 || 𝑾1𝑘 x𝑠,𝑘
𝑣 𝑠 = T (𝑡) (3)
from the rest. Our key takeaway regarding modality correlation is:
where 𝑾1𝑘 ∈ R𝐷𝑘 ×𝐷 is a shared linear transformation applied
Observation 2 (Cross-Modality Correlation). Non-trivial to each user, || is the concatenation operation, and the friendship
correlations exist between pairs of feature modalities, as indicated by attention is modeled as a single feed-forward layer parameterized
the consensus in their induced clusterings of ego-networks. by weight vector a𝑘 ∈ R2𝐷 followed by the LeakyReLU nonlinearity.
We then normalize the attentional coefficients across all friends
connected with 𝑢 at time 𝑡 by adopting the softmax function:
4.2 GraFRank: Multi-Faceted Friend Ranking
In this section, we first introduce the key components of our model

exp 𝛼 𝑘 (𝑢, 𝑣, 𝑡)
GraFRank (Graph Attention Friend Ranker) for friend ranking. Our 𝛼 𝑘 (𝑢, 𝑣, 𝑡) = Í  (4)
𝑤 ∈𝑁𝑡 (𝑢) exp 𝛼 𝑘 (𝑢, 𝑤, 𝑡)
modeling choices in designing a multi-modal GNN, directly follow 
from our observations. GraFRank has two modules (Figure 5): We now define an ego-network representation z𝑢𝑘 𝑡, 𝑁𝑢 (𝑡) ∈
• Modality-specific neighbor aggregation. R𝐷𝑘 for user 𝑢 in modality 𝑘 that captures messages propagated
• Cross-modality attention layer. from first-order neighbors in the ego-network 𝑁𝑢 (𝑡). The message
𝑘
m𝑢←𝑣 ∈ R𝐷 propagated from friend 𝑣 to user 𝑢 at time 𝑡 is defined
Below, we present a detailed description of each module.
as the transformed friend embedding, i.e., m𝑢←𝑣 𝑘 = 𝑾1𝑘 x𝑠,𝑘
𝑣 . We
𝑘 
4.2.1 Modality-specific Neighbor Aggregation. Since the dif- then compute z𝑢 𝑡, 𝑁𝑢 (𝑡) through a weighted average of message
ferent modalities vary in the extent of induced homophily (Obser- embeddings from each friend 𝑣 ∈ 𝑁𝑢 (𝑡) and guided by normalized
vation 1), we treat each modality individually as opposed to the friendship weights 𝛼 𝑘 (𝑢, 𝑣, 𝑡), which is defined as:
popular choice of combining all features by concatenation. Thus, Õ

we learn a modality-specific representation z𝑢𝑘 (𝑡) ∈ R𝐷 for each z𝑢𝑘 𝑡, 𝑁𝑢 (𝑡) = 𝛼 𝑘 (𝑢, 𝑣, 𝑡)m𝑢←𝑣
𝑘
(5)
user 𝑢 ∈ V at time 𝑡 ∈ R+ , that encapsulates information from 𝑣 ∈𝑁𝑢 (𝑡 )
modality 𝑘. Each user 𝑢 flexibly prioritizes different friends in her In the above equation, the friendship weights are learnt merely
temporal neighborhood 𝑁𝑡 (𝑢), thereby accounting for the variance from the connectivity of the ego-network. In reality, most users have
in homophily distribution across user segments. very few close friends, and users with many friends only frequently
We design a modality-specific neighbor aggregation module to communicate with a few of them. We empirically validate this by
compute 𝐾 representations {z𝑢1 (𝑡), . . . , z𝑢𝐾 (𝑡)}, z𝑢𝑘 (𝑡) ∈ R𝐷 for each examining friend communication rate, defined by the percentage of
user 𝑢 ∈ V at time 𝑡 ∈ R+ , where each z𝑢𝑘 (𝑡) is obtained using an friends that a user has communicated with at least once (directly
independent and unique message-passing function per modality. sent a Chat/Snap with on Snapchat) in a one-month window. From
WWW ’21, April 19–23, 2021, Ljubljana, Slovenia Sankar et al.

Profile Attributes ATTENTIONAL NEIGHBORHOOD GRAFRANK ATTENTIONAL NEIGHBORHOOD


AGGREGATION (MODALITY 1) AGGREGATION (MODALITY K)
𝒙"!

Content Interests

𝒙*!

Friending Activity 4
5

u
1 𝒙"!
… 4
5

u
1 𝒙$
!

2 " #
𝒙"" 𝑓$%% 𝒙"#
2
𝒙"$
𝑓$%% 𝒙#$
𝒙&! 3
… Modality Attention Weights 3

𝒆"!" … 𝒆"!# 𝒆$
!" … 𝒆$
!#

Engagement Activity

𝒙'! 𝒛"! (𝑡) 𝒛#


! (𝑡)
𝛼$ (𝑢, 1, 𝑡) 𝛽#$ (𝑡) 𝛽#" (𝑡 )
… 𝛼$ (𝑢, 5, 𝑡)
… 𝛼"
! (𝑢, 1, 𝑡 )
… 𝛼"
! (𝑢, 5, 𝑡 )
Pairwise Communications

𝒆 𝒖𝒗
ATTENTION OVER USER FEATURE MODALITIES

𝒉 !(𝑡)

Figure 5: Overall framework of GraFRank: a set of 𝐾 modality-specific neighbor aggregators (parameterized by individual
modality-specific user features and link communication features) to compute 𝐾 intermediate user representations; cross-
modality attention layer to compute final user representations by capturing discriminative facets of each modality.
Figure 4, we find that a vast majority of users communicate with and self-representations of user 𝑢, and further transform the con-
a small percentage (10-20%) of their friends; thus, we posit that catenated embedding through a dense layer 𝐹𝜃𝑘 , defined by:
friendship activeness is critical to precisely model user affinity.
Towards this goal, we incorporate pairwise link communication  
features to parameterize both the attentional coefficients and the z𝑢𝑘 (𝑡) = 𝐹𝜃𝑘 m𝑢←𝑢
𝑘
, z𝑢𝑘 𝑡, 𝑁𝑢 (𝑡) (8)
message aggregated from friends in the ego-network. Specifically,    
𝑘 𝑘 𝑘
we formulate the message m𝑢←𝑣 𝑘 ∈ R𝐷 from friend 𝑣 to user 𝑢 at = 𝜎 𝑾𝑎 z𝑢 (𝑡, 𝑁𝑢 (𝑡)) || m𝑢←𝑢 + b𝑎 (9)
time 𝑡 as a function of both friend feature x𝑠,𝑘 𝑠
𝑣 and link feature e𝑢𝑣 .
where 𝑾𝑎𝑘 ∈ R𝐷×𝐷 , b𝑎 ∈ R𝐷 are trainable parameters of the
𝑘
m𝑢←𝑣 = 𝑾2𝑘 x𝑠,𝑘
𝑣 + 𝑾𝑒𝑘 e𝑢𝑣
𝑠
+b 𝑠 = T (𝑡) (6) aggregator and 𝜎 denotes the ELU activation function which allows
messages to encode both positive and small negative signals. Empir-
where 𝑾2𝑘 ∈ R𝐷𝑘 ×𝐷 , 𝑾𝑒𝑘 ∈R 𝐸×𝐷 are trainable weight
matrices
ically, we observe significant improvements due to self-connections,
operating on the user and link features respectively, and b ∈ R𝐷 compared to directly using the propagated ego-network represen-
is a learnable bias vector. The attentional co-efficient 𝛼 𝑘 (𝑢, 𝑣, 𝑡) is tation from the neighborhood, as in GCN [28], and GAT [43].
then computed as a function of the user feature x𝑢𝑠,𝑘 and message Higher-order Propagation: We stack multiple neighbor aggre-
embedding m𝑢←𝑣 𝑘 ∈ R𝐷 from friend 𝑣 to user 𝑢, defined by: gation layers to model high-order connectivity information, i.e.,
propagate features from 𝑙-hop neighbors. The inputs to layer 𝑙 de-
   pend on the user representations output from layer (𝑙 −1) where the
𝛼 𝑘 (𝑢, 𝑣, 𝑡) = 𝜎 a𝑇𝑘 𝑾1𝑘 x𝑢𝑠,𝑘 || m𝑢←𝑣
𝑘
𝑠 = T (𝑡) (7)
initial (i.e., “layer 0”) representations are set to the input user fea-
tures in modality 𝑘. By stacking 𝑙 layers, we recursively formulate
where 𝜎 is a non-linearity such as LeakyRELU. We similarly 𝑘 of user 𝑢 at the end of layer 𝑙 by:
the representation z𝑢,𝑙
normalize the attentional coefficients 𝛼 𝑘 (𝑢, 𝑣, 𝑡) using Equation 4
and compute the ego-network representation z𝑢𝑘 (𝑡, 𝑁𝑢 (𝑡)) for user 
𝑢 on modality 𝑘 using Equation 5 with the message embedding 𝑘 𝑘 𝑘 𝑘  𝑘 𝑘
z𝑢,𝑙 = 𝐹𝜃,𝑙 m𝑢←𝑢,𝑙−1 , 𝑧𝑢,𝑙−1 𝑡, 𝑁𝑢 (𝑡) m𝑢←𝑢,𝑙−1 = 𝑧𝑢,𝑙−1 (10)
𝑘
m𝑢←𝑣 from Equation 6 conditioned on the link features.
Distinct from conventional GNNs [17, 28, 43, 50] that only con- 𝑘
where 𝑧𝑢,𝑙−1 is the representation of user 𝑢 in modality 𝑘 after
sider user x𝑢𝑠,𝑘 and friend x𝑠,𝑘
𝑣 features to parameterize the neighbor 𝑘
𝑠 into (𝑙 − 1) layers and 𝑧𝑢,𝑙−1 (𝑡, 𝑁 𝑣 (𝑡)) denotes the (𝑙 − 1)-ego-network
aggregation, we additionally incorporate the link features e𝑢𝑣
representation of user 𝑢. We apply 𝐿 neighbor aggregation layers
the attentional co-efficient 𝛼 𝑘 (𝑢, 𝑣, 𝑡) (Equation 7) and the message 𝑘 of user 𝑢 in modality 𝑘.
𝑘 to generate the layer-𝐿 representation z𝑢,𝐿
m𝑢←𝑣 (Equation 6) passed from friend 𝑣; this encourages the aggre-
gation to be cognizant of pairwise communications with friends, 4.2.2 Cross Modality Attention. To learn complex non-linear
e.g., passing more messages from the active friendships. Empirically, correlations between different feature modalities, we design a cross-
we observe a boost in friend ranking performance (Section 5.4) due modality attention mechanism. Specifically, we learn modality at-
to our communication-aware message passing strategy. tention weights 𝛽𝑢𝑘 (𝑡) to distinguish the influence of each modality
Message Aggregation: We refine the representation of user 𝑢 𝑘 using a two-layer Multi-Layer Perceptron, defined by:
by aggregating the messages propagated from friends in 𝑁𝑢 (𝑡). In
addition, we consider self-connections m𝑢←𝑢 𝑘 = 𝑾1𝑘 x𝑢𝑠,𝑘 to retain 𝑘 +𝑏 
exp a𝑇𝑚 𝑾𝑚 z𝑢,𝐿 𝑚
knowledge of the original features (𝑾1 is the same transformation 𝛽𝑢𝑘 (𝑡) = Í𝐾 (11)
𝑘′ + 𝑏 
exp a𝑇𝑚 𝑾𝑚 z𝑢,𝐿
used in Equation 3). Specifically, we concatenate the ego-network 𝑘 ′ =1 𝑚
Graph Neural Networks for Friend Ranking in Large-scale Social Platforms WWW ’21, April 19–23, 2021, Ljubljana, Slovenia

with weights 𝑾𝑚 ∈ R𝐷×𝐷 , a𝑚 ∈ R𝐷 and scalar bias 𝑏𝑚 . The Dataset Region 1 Region 2
final representation h𝑢 (𝑡) ∈ R𝐷 of user 𝑢 is computed by a weighted
# users 3.1 M 17.1 M
aggregation of the layer-𝐿 modality-specific user representations
1 , . . . , z𝐾 }, guided by modality weights 𝛽 𝑘 (𝑡), defined by: # links 286 M 2.36 B
{z𝑢,𝐿 𝑢,𝐿 𝑢 # user features 79 79
𝐾 # link features 6 6
Õ
h𝑢 (𝑡) = 𝛽𝑢𝑘 (𝑡) 𝑾𝑚 z𝑢,𝑙
𝑘
(12) # test set friend requests 46K 340K
𝑘=1 Table 1: Dataset statistics
4.3 Temporal Model Training but not as relevant as the target friend 𝑣. For a 2-layer GNN (𝐿 =
We train GraFRank using a temporal pairwise ranking objective 2), we randomly choose users in the 3-4 hop neighborhood of 𝑢 as
to differentiate positive and negative neighborhoods. We assume hard negatives; 2-hop neighbors are excluded since such friends of
access to training data described by a set of timestamped links L friends are expected to be relevant suggestion candidates owing
created in a window (𝑡𝑠 , 𝑡𝑒 ), where (𝑢, 𝑣, 𝑡) ∈ L is a bi-directional to triadic closure in social networks. In practice, we pre-compute
friendship link between source user 𝑢 and target friend 𝑣 formed at hard negative examples to facilitate efficient model training.
time 𝑡 ∈ (𝑡𝑠 , 𝑡𝑒 ). To train the parameters of GraFRank, we define a We adopt a two-phase learning approach for candidate ranking.
triplet-like learning objective based on max-margin ranking. We pre-train the model on random negatives (as in candidate
retrieval), to identify good model initialization points, followed
4.3.1 Pairwise Ranking Objective. We define a time-sensitive by fine-tuning on hard negatives. Ranking hard negatives is more
ranking loss over the final user embeddings (h𝑢 (𝑡) for user 𝑢 at challenging, hence encouraging the model to progressively learn
time 𝑡) to rank the inner product of positive links (𝑢, 𝑣, 𝑡) ∈ L, friend distinctions at a finer granularity. We empirically show
higher than sampled negatives (𝑢, 𝑛, 𝑡) by a margin factor Δ, as: notable gains on candidate ranking due to our two-phase strategy,
compared to training individually on random or hard negatives.
Õ
𝐿= E𝑛∼𝑃𝑛 (𝑢) max{0, h𝑢 (𝑡)·h𝑛 (𝑡)−h𝑢 (𝑡)·h𝑣 (𝑡)+Δ} (13) 4.3.3 Temporal Neighborhood Sampling. We learn a tempo-
(𝑢,𝑣,𝑡 ) ∈ L ral user representation h𝑢 (𝑡) for user 𝑢 at time 𝑡 by selecting a fixed
where Δ is a margin hyper-parameter and 𝑃𝑛 (𝑢) is the negative number of friends from 𝑁𝑢 (𝑡) for neighbor aggregation in each
sampling distribution for user 𝑢. Here, we use a single forward layer; this controls the memory footprint during training [17].
pass to inductively compute a time-aware representation h𝑢 (𝑡) for To efficiently identify and sample neighbors of 𝑢 at any time 𝑡,
each user 𝑢 ∈ V at time 𝑡 based on the appropriate user and link we represent the time-evolving friendship graph G as a temporal ad-
features in temporal neighborhoods. Each minibatch of training jacency list at its latest time 𝑡𝑠 where each user 𝑢 has a list of (friend,
examples is then optimized independently which precludes the need time) tuples sorted by link creation times. This data representation
to explicitly model temporal dependencies. This generic contrastive enables 𝑂 (log 𝑑) neighbor lookup at an arbitrary timestamp 𝑡 via
learning formulation enables usage of the same framework for binary search where 𝑑 is the average user degree in the graph.
different recommendation tasks such as candidate retrieval and
4.3.4 Multi-GPU Minibatch Training. We train GraFRank us-
ranking, with different negative sampling distributions.
ing minibatches of links from L over multiple GPUs on a single
4.3.2 Candidate Retrieval and Ranking Tasks. We learn user shared memory machine. The temporal adjacency list of G and
embeddings towards two key use-cases in friend ranking: candidate feature matrices 𝑿, 𝑬 are placed in shared CPU memory to enable
retrieval and candidate ranking. Candidate retrieval aims to generate fast parallel neighborhood sampling and feature lookup. We adopt
a list of top-𝑁 (e.g., 𝑁 = 100) potential friend suggestions out of a a producer-consumer framework [50] that alternates between CPUs
very large candidate pool (over millions of users), while candidate and GPUs for model training. A CPU-bound producer constructs
ranking involves fine-grained re-ranking within a much smaller friend neighborhoods, looks up user and link features, and gener-
pool of the generated candidates, to determine the top-𝑛 (e.g., 𝑛 = ates negative samples for the links of a minibatch. We then partition
10) suggestions shown to end users in the platform. We define each minibatch across multiple GPUs, to compute forward passes
different negative sampling distributions 𝑃𝑛 (𝑢) for each task owing and gradients with a PyTorch model over dynamically constructed
to their different ranking granularities, as follows: computation graphs. The gradients from different GPUs are syn-
• Candidate Retrieval: For the coarse-grained task of candidate chronized using PyTorch’s Distributed Data Parallel [30] construct.
retrieval, we uniformly sample five random negative users for
each positive link, from the entire user set V. Generating random 5 EXPERIMENTS
negatives is efficient and effective at quickly training the model To analyze the quality of user representations learned by GraFRank,
to identify potential friend candidates for each user. However, we propose five research questions to guide our experiments:
random negatives are often too easy to distinguish and may not (RQ1 ) Can GraFRank outperform feature-based models and state-
provide the requisite resolution for the model to learn fine-grained of-the-art GNNs on candidate retrieval and ranking tasks?
distinctions necessary for candidate friend ranking. (RQ2 ) How does GraFRank compare to prior work under alterna-
• Candidate Ranking: To enhance model resolution for candidate tive metrics of reciprocated and communicated friendships?
ranking, we also use hard negative examples; for each positive (RQ3 ) How do the different architectural design choices and train-
pair (𝑢, 𝑣), we generate five hard negatives related to the source 𝑢, ing strategies in GraFRank impact performance?
WWW ’21, April 19–23, 2021, Ljubljana, Slovenia Sankar et al.

Dataset Region 1 Region 2


Metric N@5 N@50 HR@5 HR@50 MRR N@5 N@50 HR@5 HR@50 MRR
LogReg 0.1752 0.2460 0.2452 0.5262 0.1751 0.0761 0.1367 0.1134 0.3654 0.0831
MLP 0.1923 0.2679 0.2721 0.5726 0.1903 0.0973 0.1720 0.1466 0.4541 0.1046
XGBoost 0.2099 0.2865 0.2932 0.5957 0.2071 0.1366 0.2097 0.1936 0.4921 0.1409
GCN 0.0934 0.1836 0.1490 0.5154 0.1034 0.1651 0.2634 0.2503 0.6427 0.1678
GAT 0.0851 0.1813 0.1424 0.5352 0.0960 0.1797 0.2794 0.2698 0.6663 0.1812
SAGE + Max 0.1790 0.2736 0.2695 0.6409 0.1797 0.1520 0.2505 0.2315 0.6269 0.1566
SAGE + Mean 0.2378 0.3240 0.3338 0.6757 0.2333 0.2870 0.3805 0.4005 0.7655 0.2790
GraFRank 0.3152 0.3983 0.4318 0.7533 0.3035 0.4166 0.4950 0.5386 0.8395 0.4012
Percentage Gains 32.55 % 22.93 % 29.36 % 11.48 % 30.09 % 45.16 % 30.09 % 34.48 % 9.67 % 43.8 %
Table 2: GraFRank outperforms feature-based models and GNNs (relative gains of 30-43 % MRR with respect to the best
baseline) on candidate retrieval in Regions 1 and 2. HR@K and N@K denote Hit-Rate@K and NDCG@K metrics for 𝐾 = 5, 50.
(RQ4 ) How do the training strategies and hyper-parameters impact the biases of the upstream retrieval system. Thus, we instead gen-
convergence and performance of GraFRank? erate 𝑁 = 500 hard negatives samples per test pair based on 𝐾-hop
(RQ5 ) How do the learned user embeddings in GraFRank perform neighborhoods (Section 4.3), to ensure an unbiased comparison.
across diverse user cohorts?
5.1.3 Training Details. We train GraFRank using 𝐿 = 2 mes-
sage passing layers per modality with a hidden dimension size of
5.1 Experiment Setup 64 and output embedding dimension 𝐷 = 32. In each layer, we
We now present our experimental setup with a brief description of sample 15 first-order neighbors and 15 second-order neighbors for
datasets, evaluation metrics, and model training details. each sampled first-order neighbor; each user receives messages
propagated from upto 225 friends. During model training, we apply
5.1.1 Datasets. We evaluate GraFRank on friend recommenda- dropout with rate of 0.3 in both layers. The model is trained for
tions using two large-scale datasets from Snapchat (Table 1). Each a maximum of 30 epochs with a batch size of 256 positive pairs
dataset is constructed from the interactions among users belonging (apart from 5 random/hard negatives per pair) and learning rate of
to a specific country (obscured for privacy reasons). We collect 79 0.001 using Adam optimizer. We benchmark our experiments using
user features spanning four modalities and six pairwise link fea- a machine with 32 cores, 200 GB shared CPU memory, and a single
tures, as described in Section 3.1. All features are standardized by Nvidia Tesla P100 GPU on the Linux platform. Our PyTorch [30]
zero-mean and unit-variance normalization before model training. implementation of GraFRank is publicly available1 .
In each dataset, the training set comprises timetamped friendships
created during a span of 7 contiguous days. Empirically, we find that 5.2 Baselines
7 days suffices to achieve good results (comparable to 1 month), thus
We compare GraFRank on friend ranking against strong feature-
significantly more efficient. To evaluate the quality of friendship
based ranking models and state-of-the-art GNN models.
suggestions, the test set comprises all friend add requests over the
• LogReg [20]: Logistic regression classifier for link prediction.
subsequent four days. We observe consistent results for different
The input feature for each pair of users, is a concatenation of
train-test splits across 5 time periods and 2 geographic regions. We
source and target features across the 𝐾 user feature modalities.
use 10% of the labeled examples for hyper-parameter tuning.
• XGBoost [7]: Tree boosting model for pairwise learning to rank,
5.1.2 Evaluation Metrics. We experiment on two friend sugges- trained using the same input features as LogReg. It is currently
tion tasks: candidate retrieval and candidate ranking (Section 4.3). deployed at Snapchat for quick-add friend suggestions.
To evaluate friend recommendation, we use ranking metrics Hit- • MLP [20]: Two-layer Multi-Layer Perceptron with fully-connected
Rate (HR@K), Normalized Discounted Cumulative Gain (NDCG@K) layers and ReLU activations to learn user representations. We
and Mean Reciprocal Rank (MRR). We adopt negative-sample eval- train MLP using the same ranking loss as our model (Equation 13).
uation [21] to generate 𝑁 negative samples per positive pair (𝑢, 𝑣) • GCN [28]: Scalable graph convolutional networks with degree-
in the test set (user 𝑢 has sent a friend request to user 𝑣). We then weighted aggregation and neighborhood sampling. We concate-
evaluate metrics for each test pair (𝑢, 𝑣) by ranking 𝑣 among the 𝑁 nate user features across the 𝐾 modalities into a single vector.
negative samples via inner products in the latent space. • GAT [43]: Graph attention networks with self-attentional aggre-
To evaluate candidate retrieval, we use 𝑁 = 10000 randomly sam- gation and neighborhood sampling for scalable training.
pled negative users for each positive test pair, to emulate retrieval • SAGE + Max [17]: Element-wise max pooling for neighbor ag-
from a large candidate pool. Ideally, candidate ranking should op- gregation and self-embedding concatenation at each layer.
erate over a shortlisted set of potential friends identified by the
retrieval system. However, we aim to provide a fair benchmark com-
parison of different models for candidate ranking that is agnostic to 1 https://github.com/aravindsankar28/GraFRank
Graph Neural Networks for Friend Ranking in Large-scale Social Platforms WWW ’21, April 19–23, 2021, Ljubljana, Slovenia

Dataset Region 1 Region 2


Metric N@5 N@10 HR@5 HR@10 MRR N@5 N@10 HR@5 HR@10 MRR
LogReg 0.1521 0.1795 0.2268 0.3116 0.1523 0.1398 0.1711 0.2136 0.3106 0.1449
MLP 0.1873 0.2190 0.2663 0.3644 0.1915 0.1927 0.2241 0.2721 0.3695 0.1967
XGBoost 0.1714 0.2002 0.2394 0.3287 0.1779 0.1844 0.2174 0.2605 0.363 0.1911
GCN 0.1345 0.1698 0.2039 0.3136 0.1462 0.1758 0.2147 0.2619 0.3826 0.1831
GAT 0.1416 0.1776 0.2197 0.3313 0.1503 0.2028 0.2445 0.2984 0.4276 0.2077
SAGE + Max 0.2063 0.2441 0.2980 0.4151 0.2094 0.2426 0.2818 0.3443 0.4654 0.2426
SAGE + Mean 0.2232 0.2607 0.3165 0.4330 0.2255 0.2766 0.3164 0.3835 0.5064 0.2744
GraFRank 0.2684 0.3098 0.3772 0.5051 0.2669 0.3342 0.3767 0.4529 0.5841 0.3282
Percentage Gains 20.25 % 18.83 % 19.18 % 16.65 % 18.36 % 20.82 % 19.06 % 18.1 % 15.34 % 19.61 %
Table 3: GraFRank achieves significant improvements (relative gains of 18-20% MRR with respect to the best baseline) over
both feature-based models and prior GNNs in all ranking metrics on friend candidate ranking in both Region 1 and Region 2.
• SAGE + Mean [17]: Same as SAGE + Max with element-wise 5.3.2 Alternative Friendship Quality Indicators (RQ2 ). In ad-
mean pooling function for neighbor aggregation. dition to evaluating friend suggestion based on friend addition re-
Note that neural graph autoencoders [27] and graph convolu- quests, we consider other metrics to quantify friendship quality,
tional matrix completion models [5] are not comparable because e.g., social platforms often incentivize friendships that result in
they cannot scale to our large-scale social network datasets. greater downstream engagement. We therefore define friendship
We train all baseline GNNs in a time-sensitive manner following reciprocation and future bi-directional communication as two alter-
the same training strategy as GraFRank with time-aware neigh- native measures of friendship quality. We evaluate reciprocated and
borhood sampling and feature lookups, to compute temporal user communicated friend suggestion results on retrieval in Table 4.
representations. For each baseline, we train separate models for We observe consistently high gains for GraFRank on the recip-
retrieval and ranking tasks; we use random negatives for retrieval rocated and communicated friend retrieval tasks; this also demon-
while resorting to hard negatives for ranking; empirically, training strates the generality of our pairwise friend ranking objective (Equa-
separate models is vastly superior to training a single model us- tion 13) in learning user representations that promote downstream
ing a mixture of random and hard negatives, or even a curriculum engagement. Designing multi-criteria ranking objectives to balance
training scheme [50]. Our experimental results are averaged over 5 different quality measures is worth exploring in the future.
independent runs with random initializations for all methods. Dataset Add Reciprocate Communicate

5.3 Experimental Results Metric HR@50 MRR HR@50 MRR HR@50 MRR

We first present our main results comparing GraFRank with com- LogReg 0.5262 0.1751 0.5582 0.2029 0.5495 0.1811
MLP 0.5726 0.1903 0.6006 0.2165 0.6001 0.1979
peting baselines on candidate retrieval and ranking tasks, followed
XGBoost 0.5957 0.2071 0.6286 0.2322 0.6407 0.2274
by comparisons using alternative measures of friendship quality.
GCN 0.5154 0.1034 0.5329 0.1113 0.5273 0.1038
5.3.1 Friend Candidate Retrieval and Ranking (RQ1 ). We GAT 0.5352 0.0960 0.5596 0.1045 0.5654 0.0971
compare friend recommendation performance (based on add re- SAGE + Max 0.6409 0.1797 0.6653 0.2043 0.6670 0.1834
quests) of various approaches on retrieval and ranking in Tables 2 SAGE + Mean 0.6757 0.2333 0.6984 0.2609 0.7056 0.2446
and 3 respectively. Interestingly, we find that SAGE variants outper- GraFrank 0.7533 0.3035 0.7756 0.3367 0.7942 0.3152
form popular GNN models GCN and GAT. A possible explanation is Percent Gains 11.48 % 30.09% 11.05% 29.05% 12.56% 28.86 %
the impact of feature space heterogeneity in social networks and sto- Table 4: Comparison of all models on add, reciprocate and
chastic neighbor sampling; this results in noisy user representations communicate friendship retrieval tasks (reported on Region
for GNN models (GCN, GAT) that recursively aggregate neighbor 1). GraFRank has consistent gains across all tasks.
features without emphasizing self-connections. Preserving knowl-
edge of the original features by concatenating the self-embedding
in each layer results in noticeable gains (SAGE variants). 5.4 Ablation Study (RQ3 )
GraFRank significantly outperforms state-of-the-art approaches In this section, we present an ablation study to analyze the archi-
with over 20-30% relative MRR gains. The performance gains of tectural modeling choices and training strategies in GraFRank.
GraFRank over the best baseline are statistically significant with
𝑝 < 0.01 judged by the paired t-test. In contrast to singular aggre- 5.4.1 Model Architecture. We design three variants to study the
gation over the entire feature space by prior GNNs, GraFRank utilities of communication-aware and modality-specific aggregation.
handles variance in homophily across different modalities through • GraFRank𝑈 𝑀 (User-Modality): We analyze the contribution
modality-specific communication-aware neighbor aggregation. Fur- of link features by parameterizing the 𝑘-th modality aggregator
ther, the final user representations are learnt by a correlation-aware with just the 𝑘-th modality user features (Equation 3). Note that
attention layer to capture discriminative facets of each modality. link features are excluded during neighbor aggregation.
WWW ’21, April 19–23, 2021, Ljubljana, Slovenia Sankar et al.

Dataset Retrieval Ranking We make three consistent observations from the performance
comparison (Table 6) across all of the compared GNN models:
Metric HR@50 MRR HR@10 MRR
• Random negative training achieves best results for retrieval, but
(a) GraFRank𝑈 0.6968 0.2346 0.4255 0.2164 performs poorly on ranking; such models lack the resolution to
(b) GraFRank𝑈 𝐿 0.7069 0.2423 0.4450 0.2301 discriminate amongst potential candidates for re-ranking.
(c) GraFRank𝑈 𝑀 0.7239 0.2823 0.4887 0.2480 • Training on hard negatives improves candidate ranking as ex-
GraFRank 0.7533 0.3035 0.5051 0.2669 pected, yet results in poor retrieval performance. Learning fine
Table 5: Model architecture ablation study of GraFRank. profile-oriented distinctions among graph-based neighbors is
Removing (c) link communication features, (b) modality- actually detrimental to the coarse-grained task of retrieval.
specific aggregation, or (a) both, hurt model performance. • Random negative pretraining yields good parameter initialization
points that are more conducive for effective fine-tuning on hard
• GraFRank𝑈 𝐿 (User-Link): To study the effectiveness of learn-
negatives. Fine-tuning improves results for all GNNs over direct
ing modality-specific aggregators, we define a single modality-
hard negative training on ranking, but is ineffective for retrieval.
agnostic aggregator over user feature vectors obtained by con-
catenation across the 𝐾 modalities and link features.
5.5 Training and Sensitivity Analysis (RQ4 )
• GraFRank𝑈 (User): We remove link features from the aggre-
gator in GraFRank𝑈 𝐿 to further test the standalone benefits of In this section, we quantitatively analyze model convergence and
link features in parameterizing a single neighbor aggregator. model sensitivity to sampled neighborhood sizes in GNN models.
The performance of all architectural variations are reported 5.5.1 Model Training Analysis. We investigate the relative abil-
in Table 5. GraFRank𝑈 𝐿 performs much worse than GraFRank ities of different models to optimize the pairwise friend ranking
highlighting the benefits of learning multiple modality-specific objective (Equation 13). We compare the convergence rates of base-
aggregators to account for varying extents of modality homophily. lines MLP, SAGE + Mean, and our model GraFRank under both
Communication-aware neighbor aggregation is effective at iden- random and hard negative training settings, by examining the av-
tifying actively engaged friends during neighbor aggregation; this is erage training loss per epoch in Figures 6 (a) and (b) respectively.
evidenced by the gains of GraFRank over GraFRank𝑈 𝑀 (modality- 0.12
Random Negative Training
0.12
Hard Negative Training

aware user feature aggregation). We find noticeable gains from pa- MLP
SAGE + Mean
MLP
SAGE + Mean
0.10 0.10
rameterizing the aggregator with link features, even in the absence GraFRank GraFRank
Pairwise Ranking Loss

Pairwise Ranking Loss


0.08 0.08
of modality-specific aggregation, from the comparison between
GraFRank𝑈 𝐿 and GraFRank𝑈 (single user feature aggregator). 0.06 0.06

0.04 0.04

5.4.2 Training Strategy. We examine different training strate- 0.02 0.02

gies to learn GNN models for friend recommendation. We train 0 5 10 15


Training Epoch
20 25 30 0 5 10 15
Training Epoch
20 25 30

GraFRank with random negatives for candidate retrieval, but adopt Figure 6: GraFRank converges faster to better optimization
two-phase hard negative fine-tuning (with random negative pre- minima in random and hard negative settings, which trans-
training) for candidate ranking. To validate our choices, we examine lates to notable gains on both retrieval and ranking tasks.
three model training settings: (a) random negative training, (b) hard
negative training, and (c) fine-tuning (after pretraining on random As expected, all models converge to a lower training loss against
negatives), for two GNN models: SAGE + Mean and GraFRank, random negatives (Figure 6 (a)) when compared to hard negatives
across both candidate retrieval and ranking tasks. Note that train- (Figure 6 (b)). Interestingly, SAGE + Mean shows similar training
ing with combination of random and hard negatives, as proposed convergence as MLP in Figure 6 (b), but achieves better test results;
in [50], is excluded since it consistently performs worse than the this indicates better generalization for GNNs over feature-based
above three strategies on both retrieval and ranking tasks. models. Compared to baselines, GraFRank converges to a better op-
timization minimum under both random and hard negative settings,
Dataset Retrieval Ranking which also generalizes to better test results (Tables 2 and 3).
Metric HR@50 MRR HR@10 MRR
5.5.2 Runtime and Sensitivity Analysis. A key trade-off in train-
SAGE + Mean (random) 0.6757 0.2333 0.3943 0.1923 ing scalable GNN models lies in choosing the size of sampled neigh-
SAGE + Mean (hard) 0.3275 0.0766 0.4330 0.2255 borhoods 𝑇 in each message-passing layer. In our experiments, we
SAGE + Mean (fine-tune) 0.3978 0.0965 0.4561 0.2372 train two-layer GNN models for friend ranking. Figure 8 shows
GraFRank (random) 0.7533 0.3035 0.4655 0.2254 the runtime and performance of SAGE + Mean and GraFRank for
GraFRank (hard) 0.4542 0.1461 0.4823 0.2594 different sizes of sampled neighborhoods 𝑇 from 5 to 20.
GraFRank (fine-tune) 0.5283 0.1871 0.5051 0.2669 Model training time generally increases linearly with 𝑇 , but
has a greater slope after 𝑇 = 15. We also observe diminishing
Table 6: Training strategy comparison of two GNNs across
returns in model performance (MRR) with increase in the size of
retrieval and ranking tasks. Random negative training
sampled neighborhood 𝑇 after 𝑇 = 15. Thus, we select a two-layer
achieves best results for retrieval. Random negative pre-
GNN model with layer-wise neighborhood size of 15, to provide an
training with hard negative fine-tuning benefits ranking.
effective trade-off between computational cost and performance.
Graph Neural Networks for Friend Ranking in Large-scale Social Platforms WWW ’21, April 19–23, 2021, Ljubljana, Slovenia

MLP SAGE + Mean GRaFRank

Figure 7: Visualization of two-dimensional t-SNE transformed user representations from feature-based MLP, and GNN models:
SAGE + Mean, and GraFRank. Users with the same color belong to the same city. Compared to MLP and SAGE + Mean, the
friendship relationships learnt by GraFRank result in well-separated user clusters capturing geographical proximity.
Compared to SAGE + Mean, GraFRank has marginally higher The performance variation across users with different activity
training times, yet achieves significant performance gains (20% levels in Figure 9(b), exhibits more distinctive trends with clear
MRR), justifying the added cost of modality-specific aggregation. gains for GNN models over feature-based MLP and XGBoost for less-
3500
SAGE + Mean
0.32
SAGE + Mean
active users. Significantly, GraFRank has much stronger gains over
GRaFRank GRaFRank
3000
0.30 SAGE + Mean, in less-active user segments, owing to its multi-faceted
Time per Epoch (Seconds)

2500
0.28 modeling of heterogeneous in-platform user actions. GraFRank ef-
2000
fectively overcomes sparsity concerns for less-active users, through
MRR

0.26
1500

0.24
modality-specific neighbor aggregation over multi-modal user fea-
1000

0.22
tures to learn expressive user representations for friend ranking.
500
1.0
0 0.20 0.8 MLP
5 10 15 20 5 10 15 20 XGBoost 0.9
# Sampled neighbors per layer # Sampled neighbors per layer
0.7 SAGE + MEAN
GraFRank
Figure 8: We observe diminishing returns in MRR after 0.8
Hit Rate @ 50

Hit Rate @ 50
0.6

neighborhood size 𝑇 = 15; GraFRank has significant gains 0.5


0.7

over SAGE + Mean, with marginally higher training times. 0.4 0.6 MLP
XGBoost
0.3 0.5 SAGE + MEAN
GraFRank
0.2 0.4
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
5.6 User Cohort Analysis (RQ5 ) User groups ordered by increasing node degree User groups ordered by increasing friending activity

In this section, we present multiple qualitative analyses to examine Figure 9: GraFRank has significant improvements across all
model performance across user segments with varied node degree user segments, with notably larger gains for low-to-medium
and friending activity levels, and compare t-SNE visualizations of degree users (a), and less-active users (b).
user representations learned by different neural models.
5.6.2 Visualization. To analyze the versatility of learned user em-
5.6.1 Impact of degree and activity. We examine friend recom- beddings, we present a qualitative visualization to compare different
mendation performance across users with different node degrees models on their expressivity to capture geographical user proxim-
and friending activities. Specifically, for each test user, degree is the ity. We randomly select users from three different cities within
number of friends, and activity is the number of friend requests Region 1 and use t-SNE [32] to transform their learned embeddings
sent/received in the past 30 days. We divide the test users into into two-dimensional vectors. Figure 7 compares the visualization
groups independently based on their degree and activity levels. results from different neural models. Evidently, the visualization
We compare GraFRank with feature-based models MLP, XGBoost, learned by MLP does not capture geographical proximity, while
and the best GNN baseline SAGE + Mean. Figures 9(a) and (b) de- the GNN models are capable of grouping users located within the
pict friend candidate retrieval performance HR@50 across user same city. Compared to SAGE + Mean, GraFRank forms even more
segments with different degrees and activities respectively. well-segmented groups with minimal inter-cluster overlap.
From Figure 9(a), overall model performance generally increases
with node degree due to the availability of more structural infor-
mation. GraFRank has significant improvements across all user 6 CONCLUSION
segments, with notably higher gains for low-to-medium degree users This paper investigates graph neural network design for friend
(relative gains of 20%). GraFRank prioritizes active friendships by suggestion in large-scale social platforms. We formulate friend sug-
communication-aware message-passing, which compensates for gestion as multi-faceted friend ranking with multi-modal user fea-
the lack of sufficient local connectivities in the ego-network. tures and link communication features. Motivated by our empirical
WWW ’21, April 19–23, 2021, Ljubljana, Slovenia Sankar et al.

insights on user feature modalities, we design a neural architec- [25] Zhiwei Jin, Juan Cao, Han Guo, Yongdong Zhang, and Jiebo Luo. 2017. Multimodal
ture GraFRank that handles heterogeneity in modality homophily fusion with recurrent neural networks for rumor detection on microblogs. In
MM. 795–816.
via modality-specific neighbor aggregators, and learns non-linear [26] Leo Katz. 1953. A new status index derived from sociometric analysis. Psychome-
modality correlations through cross-modality attention. Our exper- trika 18, 1 (1953), 39–43.
[27] Thomas N Kipf and Max Welling. 2016. Variational Graph Auto-Encoders. NIPS
iments on two multi-million user datasets from Snapchat reveal Workshop on Bayesian Deep Learning (2016).
significant improvements in friend candidate retrieval (30% MRR [28] Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph
gains) and ranking (20% MRR gains), with stronger gains for the convolutional networks. In ICLR.
[29] Adit Krishnan, Hari Cheruvu, Cheng Tao, and Hari Sundaram. 2019. A Modular
crucial population of less-active and low-degree users. Although Adversarial Approach to Social Recommendation. In CIKM. ACM, 1753–1762.
our case studies are conducted on a single platform Snapchat, we [30] Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li,
expect GraFRank to be directly applicable to popular bidirectional Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, et al. 2020. Pytorch
distributed: Experiences on accelerating data parallel training. arXiv preprint
friending platforms (e.g., Facebook, LinkedIn) with minor adapta- arXiv:2006.15704 (2020).
tions for unidirectional scenarios (e.g., Twitter, Instagram). [31] David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for
social networks. JASIST 58, 7 (2007), 1019–1031.
[32] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE.
Journal of machine learning research 9, Nov (2008), 2579–2605.
REFERENCES [33] Miller McPherson, Lynn Smith-Lovin, and James M Cook. 2001. Birds of a feather:
[1] Lada A Adamic and Eytan Adar. 2003. Friends and neighbors on the web. Social Homophily in social networks. Annual review of sociology 27, 1 (2001), 415–444.
networks 25, 3 (2003), 211–230. [34] Joshua O’Madadhain, Jon Hutchins, and Padhraic Smyth. 2005. Prediction and
[2] Luca Maria Aiello, Alain Barrat, Rossano Schifanella, Ciro Cattuto, Benjamin ranking algorithms for event-based network data. ACM SIGKDD explorations
Markines, and Filippo Menczer. 2012. Friendship prediction and homophily in newsletter 7, 2 (2005), 23–30.
social media. ACM Transactions on the Web (TWEB) 6, 2 (2012), 1–33. [35] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning
[3] Réka Albert and Albert-László Barabási. 2002. Statistical mechanics of complex of social representations. In KDD. ACM, 701–710.
networks. Reviews of modern physics 74, 1 (2002), 47. [36] Aravind Sankar, Junting Wang, Adit Krishnan, and Hari Sundaram. 2020. Be-
[4] Enrique Amigó, Julio Gonzalo, Javier Artiles, and Felisa Verdejo. 2009. A com- yond Localized Graph Neural Networks: An Attributed Motif Regularization
parison of extrinsic clustering evaluation metrics based on formal constraints. Framework. In ICDM.
Information retrieval 12, 4 (2009), 461–486. [37] Aravind Sankar, Yanhong Wu, Liang Gou, Wei Zhang, and Hao Yang. 2020.
[5] Rianne van den Berg, Thomas N Kipf, and Max Welling. 2017. Graph convolu- Dysat: Deep neural representation learning on dynamic graphs via self-attention
tional matrix completion. arXiv preprint arXiv:1706.02263 (2017). networks. In WSDM. 519–527.
[6] Jianfei Chen, Jun Zhu, and Le Song. 2018. Stochastic Training of Graph Convolu- [38] Aravind Sankar, Yanhong Wu, Yuhang Wu, Wei Zhang, Hao Yang, and Hari
tional Networks with Variance Reduction. In ICML. 942–950. Sundaram. 2020. GroupIM: A Mutual Information Maximization Framework for
[7] Tianqi Chen and Carlos Guestrin. 2016. Xgboost: A scalable tree boosting system. Neural Group Recommendation. In SIGIR. 1279–1288.
In KDD. ACM, 785–794. [39] Aravind Sankar, Xinyang Zhang, Adit Krishnan, and Jiawei Han. 2020. Inf-VAE:
[8] Zhengdao Chen, Lisha Li, and Joan Bruna. 2019. Supervised Community Detec- A Variational Autoencoder Framework to Integrate Homophily and Influence in
tion with Line Graph Neural Networks. In ICLR. OpenReview.net. Diffusion Prediction. In WSDM. 510–518.
[9] Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh. [40] Neil Shah, Danai Koutra, Tianmin Zou, Brian Gallagher, and Christos Faloutsos.
2019. Cluster-GCN: An efficient algorithm for training deep and large graph 2015. Timecrunch: Interpretable dynamic graph summarization. In KDD. ACM.
convolutional networks. In KDD. ACM, 257–266. [41] Sucheta Soundarajan, Acar Tamersoy, Elias B Khalil, Tina Eliassi-Rad,
[10] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for Duen Horng Chau, Brian Gallagher, and Kevin Roundy. 2016. Generating graph
youtube recommendations. In RecSys. 191–198. snapshots from streaming edge data. In WWW. 109–110.
[11] Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. 2018. A survey on network [42] Xianfeng Tang, Yozen Liu, Neil Shah, Xiaolin Shi, Prasenjit Mitra, and Suhang
embedding. IEEE TKDE 31, 5 (2018), 833–852. Wang. 2020. Knowing your FATE: Friendship, Action and Temporal Explanations
[12] Daizong Ding, Mi Zhang, Shao-Yuan Li, Jie Tang, Xiaotie Chen, and Zhi-Hua for User Engagement Prediction on Social Apps. In KDD. ACM, 2269–2279.
Zhou. 2017. Baydnn: Friend recommendation with bayesian personalized ranking [43] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro
deep neural network. In CIKM. 1479–1488. Liò, and Yoshua Bengio. 2018. Graph Attention Networks. ICLR (2018).
[13] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. [44] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun
2019. Graph neural networks for social recommendation. In The World Wide Web Lee. 2018. Billion-scale commodity embedding for e-commerce recommendation
Conference. 417–426. in alibaba. In KDD. ACM, 839–848.
[14] Golnoosh Farnadi, Jie Tang, Martine De Cock, and Marie-Francine Moens. 2018. [45] Menghan Wang, Yujie Lin, Guli Lin, Keping Yang, and Xiao-Ming Wu. 2020.
User profiling through deep multimodal fusion. In WSDM. 171–179. M2GRL: A Multi-task Multi-view Graph Representation Learning Framework
[15] Xu Geng, Xiyu Wu, Lingyu Zhang, Qiang Yang, Yan Liu, and Jieping Ye. 2019. for Web-scale Recommender Systems. In KDD. ACM, 2349–2358.
Multi-modal graph interaction for multi-graph convolution network in urban [46] Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, Richang Hong, and Tat-Seng
spatiotemporal forecasting. arXiv preprint arXiv:1905.11395 (2019). Chua. 2019. MMGCN: Multi-modal graph convolution network for personalized
[16] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for recommendation of micro-video. In MM. 1437–1445.
networks. In KDD. ACM, 855–864. [47] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and
[17] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation S Yu Philip. 2020. A comprehensive survey on graph neural networks. IEEE
learning on large graphs. In NIPS. 1024–1034. TNNLS (2020).
[18] William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation learning [48] Yuxin Xiao, Adit Krishnan, and Hari Sundaram. 2020. Discovering strategic
on graphs: Methods and applications. arXiv preprint arXiv:1709.05584 (2017). behaviors for collaborative content-production in social networks. In WWW.
[19] John A Hartigan and Manchek A Wong. 1979. Algorithm AS 136: A k-means 2078–2088.
clustering algorithm. Journal of the royal statistical society. 28, 1 (1979), 100–108. [49] Carl Yang, Aditya Pal, Andrew Zhai, Nikil Pancha, Jiawei Han, Charles Rosenberg,
[20] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The elements of and Jure Leskovec. 2020. MultiSage: Empowering GCN with Contextualized Multi-
statistical learning: data mining, inference, and prediction. Springer Science & Embeddings on Web-Scale Multipartite Networks. In KDD. ACM, 2434–2443.
Business Media. [50] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton,
[21] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale
Chua. 2017. Neural collaborative filtering. In WWW. 173–182. recommender systems. In KDD. ACM, 974–983.
[22] Jonathan L Herlocker, Joseph A Konstan, Loren G Terveen, and John T Riedl. [51] Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Vik-
2004. Evaluating collaborative filtering recommender systems. ACM TOIS 22, 1 tor K. Prasanna. 2020. GraphSAINT: Graph Sampling Based Inductive Learning
(2004), 5–53. Method. In ICLR. OpenReview.net.
[23] Wenbing Huang, Tong Zhang, Yu Rong, and Junzhou Huang. 2018. Adaptive [52] Muhan Zhang and Yixin Chen. 2017. Weisfeiler-lehman neural machine for link
sampling towards fast graph representation learning. In Advances in neural prediction. In KDD. ACM, 575–583.
information processing systems. 4558–4567. [53] Muhan Zhang and Yixin Chen. 2018. Link prediction based on graph neural
[24] Ankit Jain, Isaac Liu, Ankur Sarda, , and Piero Molino. 2019. Food Discovery networks. In Advances in Neural Information Processing Systems. 5165–5175.
with Uber Eats: Recommending for the Marketplace. (2019). https://eng.uber. [54] Muhan Zhang and Yixin Chen. 2020. Inductive Matrix Completion Based on
com/uber-eats-graph-learning/ Graph Neural Networks. In ICLR. OpenReview.net.

You might also like