Figure 1: (a) HetG examples: an academic graph and a review graph. (b) Challenges of graph neural network for HetG: C1 - sampling heterogeneous neighbors (for node a in this case, node colors denote different types); C2 - encoding heterogeneous contents; C3 - aggregating heterogeneous neighbors.

Graph neural networks such as GCN [12], GraphSAGE [7], and GAT [31] employ the convolutional operator, the LSTM architecture, and the self-attention mechanism, respectively, to aggregate feature information of neighboring nodes. The advances and applications of GNNs are largely concentrated on homogeneous graphs. Current state-of-the-art GNNs have not well solved the following challenges faced for HetG, which we address in this paper.

• (C1) Many nodes in HetG may not connect to all types of neighbors. In addition, the number of neighboring nodes varies from node to node. For example, in Figure 1(a), any author node has no direct connection to a venue node. Meanwhile, in Figure 1(b), node a has 5 direct neighbors while node c only has 2. Most existing GNNs only aggregate feature information of direct (first-order) neighboring nodes, and the feature propagation process may weaken the effect of farther neighbors. Moreover, the embedding generation of a "hub" node is impaired by weakly correlated ("noise") neighbors, and the embedding of a "cold-start" node is not sufficiently represented due to limited neighbor information. Thus challenge 1 is: how to sample heterogeneous neighbors that are strongly correlated to embedding generation for each node in HetG, as indicated by C1 in Figure 1(b)?

• (C2) A node in HetG can carry unstructured heterogeneous contents, e.g., attributes, text, or image. In addition, the content associated with different types of nodes can be different. For example, in Figure 1(b), type-1 nodes (e.g., b or c) contain attributes and text content, type-2 nodes (e.g., f or g) carry attributes and image, and type-k nodes (e.g., d or e) are associated with text and image. The direct concatenation operation or linear transformation used by current GNNs cannot model "deep" interactions among node heterogeneous contents. Moreover, it is not applicable to use the same feature transformation function for all node types, as their contents vary from each other. Thus challenge 2 is: how to design a node content encoder that addresses the content heterogeneity of different nodes in HetG, as indicated by C2 in Figure 1(b)?

• (C3) Different types of neighbors contribute differently to the node embeddings in HetG. For example, in the academic graph of Figure 1(a), author and paper neighbors should have more influence on the embedding of an author node, as a venue node contains diverse topics and thus has a more general embedding. Most current GNNs mainly focus on homogeneous graphs and do not consider node type impact. Thus challenge 3 is: how to aggregate feature information of heterogeneous neighbors by considering the impacts of different node types, as indicated by C3 in Figure 1(b)?

To solve these challenges, we propose HetGNN, a heterogeneous graph neural network model for representation learning in HetG. First, we design a random walk with restart based strategy to sample fixed-size, strongly correlated heterogeneous neighbors of each node in HetG and group them according to node types. Next, we design a heterogeneous graph neural network architecture with two modules to aggregate the feature information of the neighbors sampled in the previous step. The first module employs a recurrent neural network to encode "deep" feature interactions of heterogeneous contents and obtains the content embedding of each node. The second module utilizes another recurrent neural network to aggregate the content embeddings of different neighboring groups, which are further combined by an attention mechanism that measures the different impacts of heterogeneous node types and obtains the ultimate node embedding. Finally, we leverage a graph context loss and a mini-batch gradient descent procedure to train the model. To summarize, the main contributions of our work are:

• We formalize the problem of heterogeneous graph representation learning, which involves both graph structure heterogeneity and node content heterogeneity.

• We propose an innovative heterogeneous graph neural network model, i.e., HetGNN, for representation learning on HetG. HetGNN is able to capture both structure and content heterogeneity and is useful for both transductive and inductive tasks. Table 1 summarizes the key advantages of HetGNN compared to a number of recent models, including homogeneous, heterogeneous, and attributed graph models as well as graph neural network models.

• We conduct extensive experiments on several public datasets, and our results demonstrate the superior performance of HetGNN over state-of-the-art baselines on numerous graph mining tasks, including link prediction, recommendation, node classification & clustering, and inductive node classification & clustering.
2 PROBLEM DEFINITION

In this section, we introduce the concept of content-associated heterogeneous graphs that will be used in the paper and then formally define the problem of heterogeneous graph representation learning.

Definition 2.1. Content-associated Heterogeneous Graphs. A content-associated heterogeneous graph (C-HetG) is defined as a graph G = (V, E, O_V, R_E) with multiple types of nodes V and links E. O_V and R_E represent the set of object types and that of relation types, respectively. In addition, each node is associated with heterogeneous contents, e.g., attributes, text, or image.

The academic graph in Figure 1(a) is a C-HetG. The node types O_V include author, paper, and venue. The link types R_E include author-write-paper, paper-cite-paper, and paper-publish-venue. Besides, the author or venue node is associated with the paper abstracts written by the author or included in the venue, and the paper node contains abstract, references, as well as venue. The bipartite review graph in Figure 1(a) is also a C-HetG as |O_V| + |R_E| ≥ 3, where O_V includes user and item, and the relation R_E indicates review behavior. The user node is associated with the reviews written by the user, and the item node contains title, description, and picture.

Problem 1. Heterogeneous Graph Representation Learning. Given a C-HetG G = (V, E, O_V, R_E) with node content set C, the task is to design a model F_Θ with parameters Θ to learn d-dimensional embeddings E ∈ R^{|V|×d} (d ≪ |V|) that are able to encode both heterogeneous structural closeness and heterogeneous unstructured contents among them. The node embeddings can be utilized in various graph mining tasks, such as link prediction, recommendation, multi-label classification, and node clustering.
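To make Definition 2.1 and Problem 1 concrete, the sketch below shows one minimal way a C-HetG could be held in memory. The container name CHetG, its methods, and the toy nodes are illustrative assumptions rather than anything prescribed by the paper; the node and link types follow the academic graph of Figure 1(a).

```python
from collections import defaultdict

class CHetG:
    """Minimal container for a content-associated heterogeneous graph (Definition 2.1):
    typed nodes, typed links, and per-node heterogeneous content."""
    def __init__(self):
        self.node_type = {}               # node id -> object type in O_V
        self.content = {}                 # node id -> dict of content items (text, image, attributes, ...)
        self.adj = defaultdict(list)      # node id -> list of (relation type in R_E, neighbor id)

    def add_node(self, nid, ntype, content):
        self.node_type[nid] = ntype
        self.content[nid] = content

    def add_edge(self, src, dst, rtype):
        self.adj[src].append((rtype, dst))
        self.adj[dst].append((rtype, src))    # store links in both directions for neighbor access

# Tiny academic C-HetG in the spirit of Figure 1(a): author/paper/venue nodes, write/publish links.
g = CHetG()
g.add_node("a1", "author", {"text": "abstracts of a1's papers"})
g.add_node("p1", "paper", {"text": "abstract of p1"})
g.add_node("v1", "venue", {"text": "abstracts of papers published in v1"})
g.add_edge("a1", "p1", "write")
g.add_edge("p1", "v1", "publish")
```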
3 HetGNN

In this section, we formally present HetGNN to resolve the three challenges described in Section 1. HetGNN consists of four parts: (1) sampling heterogeneous neighbors; (2) encoding node heterogeneous contents; (3) aggregating heterogeneous neighbors; (4) formulating the objective and designing the model training procedure. Figure 2 illustrates the framework of HetGNN.

3.1 Sampling Heterogeneous Neighbors (C1)

The key idea of most graph neural networks (GNNs) is to aggregate feature information from a node's direct (first-order) neighbors, such as GraphSAGE [7] or GAT [31]. However, directly applying these approaches to heterogeneous graphs may raise several issues:

• They cannot directly capture feature information from different types of neighbors. For example, authors do not directly connect to other author or venue neighbors in Fig. 1(a), which could lead to insufficient representation.

• They are weakened by varying neighbor sizes. Some authors write many papers while others have only a few papers in the academic graph. Some items are reviewed by many users while others receive little feedback in the review graph. The embedding of a "hub" node could be impaired by weakly correlated neighbors, and a "cold-start" node embedding may not be sufficiently represented.

• They are not suitable for aggregating heterogeneous neighbors, which have different content features. Heterogeneous neighbors may require different feature transformations to deal with different feature types and dimensions.

In light of these issues, and to solve challenge C1, we design a heterogeneous neighbors sampling strategy based on random walk with restart (RWR). It contains two consecutive steps (a code sketch follows at the end of this subsection):

• Step-1: Sampling fixed length RWR. We start a random walk from node v ∈ V. The walk iteratively travels to the neighbors of the current node or returns to the starting node with a probability p. RWR runs until it successfully collects a fixed number of nodes, denoted as RWR(v). Note that the numbers of different types of nodes in RWR(v) are constrained to ensure that all node types are sampled for v.

• Step-2: Grouping different types of neighbors. For each node type t, we select the top k_t nodes from RWR(v) according to frequency and take them as the set of t-type correlated neighbors of node v.

This strategy is able to avoid the aforementioned issues because: (1) RWR collects all types of neighbors for each node; (2) the sampled neighbor size of each node is fixed and the most frequently visited neighbors are selected; (3) neighbors of the same type (having the same content features) are grouped such that type-based aggregation can be deployed. Next, we design a heterogeneous graph neural network architecture with two modules to aggregate the feature information of the sampled heterogeneous neighbors for each node.
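The two-step procedure above can be sketched as follows, reusing the illustrative CHetG container from Section 2. The restart probability p, the walk budget, and the per-type counts k_t are the quantities named in Step-1 and Step-2, but the concrete default values, and the omission of Step-1's per-type constraint, are simplifications for readability rather than the paper's exact implementation.

```python
import random
from collections import Counter, defaultdict

def sample_het_neighbors(g, v, walk_len=100, p=0.5, k_per_type=None):
    """Step-1: random walk with restart (RWR) from v until walk_len neighbor visits are collected.
    Step-2: group the visited nodes by type and keep the top-k_t most frequent ones per type."""
    k_per_type = k_per_type or {"author": 10, "paper": 10, "venue": 3}   # illustrative budgets
    if not g.adj[v]:
        return {t: [] for t in k_per_type}        # isolated node: nothing to sample
    cur, visits = v, []
    while len(visits) < walk_len:
        if random.random() < p or not g.adj[cur]:
            cur = v                               # return to the starting node with probability p
        else:
            cur = random.choice(g.adj[cur])[1]    # otherwise travel to a random neighbor
            if cur != v:
                visits.append(cur)                # record the visit (RWR(v))
    freq = defaultdict(Counter)                   # node type -> visit counts
    for u in visits:
        freq[g.node_type[u]][u] += 1
    return {t: [u for u, _ in freq[t].most_common(k)] for t, k in k_per_type.items()}

# e.g. sample_het_neighbors(g, "a1") -> {"author": [...], "paper": [...], "venue": [...]}
```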
3.2 Encoding Heterogeneous Contents (C2)

To solve challenge C2, we design a module to extract the heterogeneous contents C_v from node v ∈ V and encode them as a fixed-size embedding via a neural network f_1. Specifically, we denote the feature representation of the i-th content in C_v as x_i ∈ R^{d_f×1} (d_f: content feature dimension). Note that x_i can be pre-trained using different techniques w.r.t. different types of contents. For example, we can utilize Par2Vec [13] to pre-train text content or employ CNNs [17] to pre-train image content. Unlike previous models [7, 31] that concatenate different content features directly or linearly transform them into a unified vector, we design a new architecture based on the bi-directional LSTM (Bi-LSTM) [9] to capture "deep" feature interactions and obtain larger expressive capability. Formally, the content embedding of v is computed as follows:

    f_1(v) = \frac{\sum_{i \in C_v} \left[ \overrightarrow{\mathrm{LSTM}}\{\mathcal{FC}_{\theta_x}(x_i)\} \oplus \overleftarrow{\mathrm{LSTM}}\{\mathcal{FC}_{\theta_x}(x_i)\} \right]}{|C_v|}    (1)

where f_1(v) ∈ R^{d×1} (d: content embedding dimension), FC_{θ_x} denotes the feature transformer, which can be the identity (no transformation), a fully connected neural network with parameters θ_x, etc., and the ⊕ operator denotes concatenation. The LSTM is formulated as:

    z_i = \sigma\big(U_z \mathcal{FC}_{\theta_x}(x_i) + W_z h_{i-1} + b_z\big)
    f_i = \sigma\big(U_f \mathcal{FC}_{\theta_x}(x_i) + W_f h_{i-1} + b_f\big)
    o_i = \sigma\big(U_o \mathcal{FC}_{\theta_x}(x_i) + W_o h_{i-1} + b_o\big)    (2)
    \hat{c}_i = \tanh\big(U_c \mathcal{FC}_{\theta_x}(x_i) + W_c h_{i-1} + b_c\big)
    c_i = f_i \circ c_{i-1} + z_i \circ \hat{c}_i
    h_i = \tanh(c_i) \circ o_i
where h_i ∈ R^{(d/2)×1} is the output hidden state of the i-th content, ◦ denotes the Hadamard product, U_j ∈ R^{(d/2)×d_f}, W_j ∈ R^{(d/2)×(d/2)}, and b_j ∈ R^{(d/2)×1} (j ∈ {z, f, o, c}) are learnable parameters, and z_i, f_i, and o_i are the input gate vector, forget gate vector, and output gate vector of the i-th content feature, respectively. To be more specific, the above architecture first uses different FC layers to transform different content features, then employs the Bi-LSTM to capture "deep" feature interactions and accumulate the expression capability of all content features, and finally utilizes a mean pooling layer over all hidden states to obtain the general content embedding of v, as illustrated in Figure 2(b). Note that the Bi-LSTM operates on an unordered content set C_v, which is inspired by previous work [7] for aggregating unordered neighbors. Besides, we use different Bi-LSTMs to aggregate content features for different types of nodes, as their contents vary from each other.

Figure 2: (a) The overall architecture of HetGNN: it first samples fixed-size heterogeneous neighbors for each node (node a in this case), next encodes each node's content embedding via NN-1, then aggregates the content embeddings of the sampled heterogeneous neighbors through NN-2 and NN-3, and finally optimizes the model via a graph context loss; (b) NN-1: node heterogeneous contents encoder; (c) NN-2: type-based neighbors aggregator; (d) NN-3: heterogeneous types combination.

There are three main advantages of this encoding architecture: (1) it has a concise structure with relatively low complexity (fewer parameters), making model implementation and tuning relatively easy; (2) it is capable of fusing the heterogeneous content information, leading to strong expression capability; (3) it is flexible to add extra content features, making model extension convenient.
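A minimal PyTorch sketch of this encoder (NN-1, Eq. (1)) is given below: pre-trained content features go through an FC transformer, a Bi-LSTM over the unordered content set, and mean pooling. The single shared FC layer and the layer sizes are simplifying assumptions for brevity; as described above, the full model uses different FC transformers for different content features and different Bi-LSTMs for different node types.

```python
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """NN-1 sketch (Eq. (1)): transform each pre-trained content feature with an FC layer,
    run a Bi-LSTM over the (unordered) content set, and mean-pool the hidden states."""
    def __init__(self, d_f=128, d=128):
        super().__init__()
        self.fc = nn.Linear(d_f, d_f)                              # feature transformer FC_{theta_x}
        self.bilstm = nn.LSTM(d_f, d // 2, bidirectional=True, batch_first=True)

    def forward(self, x):                                          # x: [num_contents, d_f] for one node
        h, _ = self.bilstm(self.fc(x).unsqueeze(0))                # h: [1, num_contents, d]
        return h.mean(dim=1).squeeze(0)                            # f_1(v): [d]

# e.g. f1_v = ContentEncoder()(torch.randn(3, 128))   # three content features of one node
```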
3.3 Aggregating Heterogeneous Neighbors (C3)

To aggregate the content embeddings (obtained in Section 3.2) of heterogeneous neighbors for each node and solve challenge C3, we design another module, which is a type-based neural network. It includes two consecutive steps: (1) same type neighbors aggregation; (2) types combination.

3.3.1 Same Type Neighbors Aggregation.
In Section 3.1, we use the RWR-based strategy to sample fixed-size neighbor sets of different node types for each node. Accordingly, we denote the t-type sampled neighbor set of v ∈ V as N_t(v). Then, we employ a neural network f_2^t to aggregate the content embeddings of v′ ∈ N_t(v). Formally, the aggregated t-type neighbor embedding for v is formulated as follows:

    f_2^t(v) = \mathcal{AG}^t_{v' \in N_t(v)}\big[f_1(v')\big]    (3)

where f_2^t(v) ∈ R^{d×1} (d: aggregated content embedding dimension), f_1(v′) is the content embedding of v′ generated by the module in Section 3.2, and AG^t is the t-type neighbors aggregator, which can be a fully connected neural network, a convolutional neural network, a recurrent neural network, etc. In this work, we use the Bi-LSTM since it yields better performance in practice. Thus we re-formulate f_2^t(v) as follows:

    f_2^t(v) = \frac{\sum_{v' \in N_t(v)} \left[ \overrightarrow{\mathrm{LSTM}}\{f_1(v')\} \oplus \overleftarrow{\mathrm{LSTM}}\{f_1(v')\} \right]}{|N_t(v)|}    (4)

where the LSTM module has the same formulation as Eq. (2) except for its input and parameter set. That is, we employ a Bi-LSTM to aggregate the content embeddings of all t-type neighbors and use the average over all hidden states to represent the general aggregated embedding, as illustrated in Figure 2(c). We use different Bi-LSTMs to distinguish different node types for neighbor aggregation. Note that the Bi-LSTM operates on an unordered neighbor set, which is inspired by GraphSAGE [7].
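A matching sketch of the type-based neighbor aggregator (NN-2, Eq. (4)) follows; the class name and dimensions are illustrative, and one instance with its own parameters would be kept for each node type t.

```python
import torch
import torch.nn as nn

class TypeNeighborAggregator(nn.Module):
    """NN-2 sketch (Eq. (4)): Bi-LSTM over the content embeddings f_1(v') of the sampled
    t-type neighbors N_t(v), followed by mean pooling over all hidden states."""
    def __init__(self, d=128):
        super().__init__()
        self.bilstm = nn.LSTM(d, d // 2, bidirectional=True, batch_first=True)

    def forward(self, neighbor_embs):                    # [k_t, d] content embeddings of N_t(v)
        h, _ = self.bilstm(neighbor_embs.unsqueeze(0))   # [1, k_t, d]
        return h.mean(dim=1).squeeze(0)                  # f_2^t(v): [d]
```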
3.3.2 Types Combination.
The previous step generates |O_V| (O_V: set of node types in the graph) aggregated embeddings for node v. To combine these type-based neighbor embeddings with v's content embedding, we employ the attention mechanism [31]. The motivation is that different types of neighbors will make different contributions to the final representation of v. Thus the output embedding is formulated as:

    \mathcal{E}_v = \alpha^{v,v} f_1(v) + \sum_{t \in O_V} \alpha^{v,t} f_2^t(v)    (5)
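A sketch of this combination step (NN-3, Eq. (5)) is shown below. The excerpt does not include the formula for the attention weights α, so the scoring function used here (a linear layer with LeakyReLU over the concatenation of each candidate embedding with f_1(v), normalized by a softmax) is only a placeholder assumption in the spirit of [31].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TypesCombiner(nn.Module):
    """NN-3 sketch (Eq. (5)): mix the node's own content embedding f_1(v) with the |O_V|
    type-based neighbor embeddings f_2^t(v) using attention weights alpha that sum to 1."""
    def __init__(self, d=128):
        super().__init__()
        self.attn = nn.Linear(2 * d, 1)            # scores [candidate || f_1(v)]; placeholder form

    def forward(self, f1_v, f2_list):              # f1_v: [d]; f2_list: list of [d] tensors, one per type
        cands = torch.stack([f1_v] + f2_list)      # [|O_V| + 1, d]
        pair = torch.cat([cands, f1_v.expand_as(cands)], dim=1)
        alpha = torch.softmax(F.leaky_relu(self.attn(pair)), dim=0)   # [|O_V| + 1, 1]
        return (alpha * cands).sum(dim=0)          # E_v: [d]
```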
... noise distribution w.r.t. the t-type nodes. In this model, we set M = 1, as it makes little impact when M > 1. Thus Eq. (9) degenerates to the cross entropy loss:

    \log \sigma(\mathcal{E}_{v_c} \cdot \mathcal{E}_v) + \log \sigma(-\mathcal{E}_{v_{c'}} \cdot \mathcal{E}_v)    (10)

In other words, for each context node v_c of v, we sample a negative node v_{c'} according to P_t(v_{c'}). Therefore, we can reformulate the objective o_1 in Eq. (7) as follows:

    o_2 = \sum_{\langle v, v_c, v_{c'} \rangle \in T_{walk}} \log \sigma(\mathcal{E}_{v_c} \cdot \mathcal{E}_v) + \log \sigma(-\mathcal{E}_{v_{c'}} \cdot \mathcal{E}_v)    (11)

² http://jmcauley.ucsd.edu/data/amazon/index.html
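The objective in Eq. (11) can be sketched as a standard negative-sampling cross-entropy loss over the walk triples ⟨v, v_c, v_c′⟩ in T_walk. In the snippet below, E stands for the HetGNN output embeddings of the nodes in a mini-batch, indexed by integer ids; this tensor layout, and minimizing the negative of o_2 with a mini-batch optimizer, are assumptions made only to keep the example self-contained.

```python
import torch
import torch.nn.functional as F

def graph_context_loss(E, center, context, negative):
    """Negative of Eq. (11): for walk triples <v, v_c, v_c'>, pull the context embedding
    toward E_v and push the sampled negative away (log-sigmoid / cross-entropy form)."""
    e_v, e_c, e_neg = E[center], E[context], E[negative]   # each: [batch, d]
    pos = F.logsigmoid((e_c * e_v).sum(dim=1))             # log sigma(E_vc . E_v)
    neg = F.logsigmoid(-(e_neg * e_v).sum(dim=1))          # log sigma(-E_vc' . E_v)
    return -(pos + neg).mean()                             # minimize with mini-batch gradient descent

# e.g. loss = graph_context_loss(E, v_idx, vc_idx, vneg_idx); loss.backward()
```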
Table 2: Datasets used in this work.

Data                | Node                                              | Edge
Academic I (A-I)    | # author: 160,713; # paper: 111,409; # venue: 150 | # author-paper: 295,103; # paper-paper: 138,464; # paper-venue: 111,409
Academic II (A-II)  | # author: 28,646; # paper: 21,044; # venue: 18    | # author-paper: 69,311; # paper-paper: 46,931; # paper-venue: 21,044
Movies Review (R-I) | # user: 18,340; # item: 56,361                    | # user-item: 629,125
CDs Review (R-II)   | # user: 16,844; # item: 106,892                   | # user-item: 555,050

Table 3: Link prediction results. Split notation in data denotes train/test data split years or ratios.

Data split          | Metric | MP2V [4] | ASNE [15] | SHNE [34] | GSAGE [7] | GAT [31] | HetGNN
A-I 2003 (type-1)   | AUC    | 0.636    | 0.683     | 0.696     | 0.694     | 0.701    | 0.714
A-I 2003 (type-1)   | F1     | 0.435    | 0.584     | 0.597     | 0.586     | 0.606    | 0.620
A-I 2003 (type-2)   | AUC    | 0.790    | 0.794     | 0.781     | 0.790     | 0.821    | 0.837
A-I 2003 (type-2)   | F1     | 0.743    | 0.774     | 0.755     | 0.746     | 0.792    | 0.815
A-I 2002 (type-1)   | AUC    | 0.626    | 0.667     | 0.688     | 0.681     | 0.691    | 0.710
A-I 2002 (type-1)   | F1     | 0.412    | 0.554     | 0.590     | 0.567     | 0.589    | 0.615
A-I 2002 (type-2)   | AUC    | 0.808    | 0.782     | 0.795     | 0.806     | 0.837    | 0.851
A-I 2002 (type-2)   | F1     | 0.770    | 0.753     | 0.761     | 0.772     | 0.816    | 0.828
A-II 2013 (type-1)  | AUC    | 0.596    | 0.689     | 0.683     | 0.695     | 0.678    | 0.717
A-II 2013 (type-1)  | F1     | 0.348    | 0.643     | 0.639     | 0.615     | 0.613    | 0.669
A-II 2013 (type-2)  | AUC    | 0.712    | 0.721     | 0.695     | 0.714     | 0.732    | 0.767
A-II 2013 (type-2)  | F1     | 0.647    | 0.713     | 0.674     | 0.664     | 0.705    | 0.754
A-II 2012 (type-1)  | AUC    | 0.586    | 0.671     | 0.672     | 0.676     | 0.655    | 0.701
A-II 2012 (type-1)  | F1     | 0.318    | 0.615     | 0.612     | 0.573     | 0.560    | 0.642
A-II 2012 (type-2)  | AUC    | 0.724    | 0.726     | 0.706     | 0.739     | 0.750    | 0.775
A-II 2012 (type-2)  | F1     | 0.664    | 0.737     | 0.692     | 0.706     | 0.715    | 0.757
R-I 5:5             | AUC    | 0.634    | 0.623     | 0.651     | 0.661     | 0.683    | 0.749
R-I 5:5             | F1     | 0.445    | 0.551     | 0.586     | 0.542     | 0.665    | 0.735
R-I 7:3             | AUC    | 0.701    | 0.656     | 0.695     | 0.716     | 0.706    | 0.787
R-I 7:3             | F1     | 0.595    | 0.613     | 0.660     | 0.688     | 0.702    | 0.776
R-II 5:5            | AUC    | 0.678    | 0.655     | 0.685     | 0.677     | 0.712    | 0.736
R-II 5:5            | F1     | 0.541    | 0.582     | 0.593     | 0.565     | 0.659    | 0.701
R-II 7:3            | AUC    | 0.737    | 0.695     | 0.728     | 0.721     | 0.742    | 0.772
R-II 7:3            | F1     | 0.660    | 0.648     | 0.685     | 0.653     | 0.713    | 0.749

... Table 2 (see Section A.2 in the supplement for details of these datasets). Note that HetGNN is flexible enough to be applied to other HetG.

4.1.2 Baselines.
We use five baselines, including the heterogeneous graph embedding model metapath2vec [4] (denoted MP2V), the attributed graph models ASNE [15] and SHNE [34], as well as the graph neural network models GraphSAGE [7] (denoted GSAGE) and GAT [31] (see Section A.3 in the supplement for the detailed settings of these baseline methods).

4.1.3 Reproducibility.
For the proposed model, the embedding dimension is set to 128. The size of the sampled neighbor set (in Section 3.1) equals 23 (10, 10, 3 for the author, paper, and venue neighbor groups, respectively) in the academic data. This value equals 20 (10, 10 for the user and item neighbor groups, respectively) in the review data. We use Par2Vec [19] and a CNN [17] to pre-train text and image features, respectively. Besides, DeepWalk [20] is employed to pre-train node embeddings. The nodes in the academic data are associated with text (paper abstract) features and pre-trained node embeddings, while the nodes in the review data include text (item description) and image (item picture) features as well as pre-trained node embeddings. Section A.4 of the supplement contains more detailed settings. We employ PyTorch³ to implement HetGNN and conduct experiments on GPU. Code is available at: https://github.com/chuxuzhang/KDD2019_HetGNN.

³ https://pytorch.org/

4.2 Applications
4.2.1 Link Prediction (RQ1-1).
Which links will happen in the future? To answer RQ1-1, we design experiments to evaluate HetGNN on several link prediction tasks.

Setting. Unlike previous work [6] that randomly samples a portion of links for training and uses the remaining ones for evaluation, we consider a more practical setting that splits training and test data sequentially. Specifically, first, the graph of the training data is utilized to learn node embeddings, and the corresponding links are used to train a binary logistic classifier. Then, the test relations together with an equal number of random negative (non-connected) links are used to evaluate the trained classifier. In addition, only new links among nodes in the training data are considered, and duplicated links are removed from evaluation. The link embedding is formed by the element-wise multiplication of the embeddings of the two edge nodes. We use AUC and F1 scores as evaluation metrics. In the academic data, we consider two types of links: (type-1) collaboration between two authors and (type-2) citation between an author and a paper. The data before Ts (split year) is training data, otherwise test data. Ts of the A-I data is set to 2003 and 2002. The value for the A-II data is set to 2013 and 2012. In the review data, we consider user-item review links and divide training/test data sequentially. The train/test ratio (in terms of review number) is set to 7:3 and 5:5 for both the R-I and R-II data.
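A sketch of this evaluation protocol: link features are the element-wise product of the two node embeddings, a binary logistic classifier is trained on the training links, and AUC/F1 are computed on the test links (with their random negative counterparts). Function and array names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, f1_score

def evaluate_link_prediction(emb, train_pairs, train_y, test_pairs, test_y):
    """Link embedding = element-wise product of the two node embeddings; train a binary
    logistic classifier on training links and report AUC / F1 on test links."""
    def link_feats(pairs):
        return np.array([emb[u] * emb[v] for u, v in pairs])
    clf = LogisticRegression(max_iter=1000).fit(link_feats(train_pairs), train_y)
    prob = clf.predict_proba(link_feats(test_pairs))[:, 1]
    return roc_auc_score(test_y, prob), f1_score(test_y, (prob > 0.5).astype(int))
```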
Result. The performances of all models are reported in Table 3, where the best results are highlighted in bold. According to this table: (a) the best baselines in most cases are attributed graph embedding methods or graph neural network models, showing that incorporating node attributes or employing a deep neural network generates desirable node embeddings for link prediction; (b) HetGNN outperforms all baselines in all cases, especially on the review data. The relative improvements (%) over the best baselines range from 1.5% to 5.6% and from 3.4% to 10.5% for the academic data and review data, respectively. This demonstrates that the proposed heterogeneous graph neural network framework is effective and obtains better node embeddings (than the baselines) for link prediction.

4.2.2 Recommendation (RQ1-2).
Which nodes should be recommended to the target node? To answer RQ1-2, we design an experiment to evaluate HetGNN on the personalized node recommendation task.

Setting. The concept of node recommendation is similar to link prediction apart from the experimental settings and evaluation metrics. To distinguish it from the previous link prediction task, we evaluate venue recommendation (author-venue link) performance in the academic data. Specifically, the graph of the training data is utilized to learn node embeddings. The ground truth of recommendation is based on the author's appearance (having papers) in a venue of the test data. The preference score is defined as the inner product between the embeddings of the two nodes. We use Recall (Rec), Precision (Pre), and F1 scores of the top-k recommendation list as the evaluation metrics. In addition, duplicated author-venue pairs are removed from evaluation. The reported score is the average value over all evaluated authors. As in the link prediction task, the train/test split year Ts for the A-I data is set to 2003 and 2002. The value for the A-II data is set to 2013 and 2012. Besides, k is set to 5 and 3 for the two datasets, respectively.
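A minimal sketch of this protocol: preference scores are inner products of the author and venue embeddings, and Recall/Precision/F1 are computed over the resulting top-k list. Names and shapes are illustrative.

```python
import numpy as np

def recommend_topk(author_emb, venue_embs, venue_ids, k=5):
    """Rank candidate venues for one author by inner-product preference score."""
    scores = venue_embs @ author_emb            # [num_venues]
    top = np.argsort(-scores)[:k]
    return [venue_ids[i] for i in top]

def rec_pre_f1(recommended, ground_truth, k):
    """Recall / Precision / F1 of a top-k list against the author's true venues in the test data."""
    hits = len(set(recommended) & set(ground_truth))
    rec = hits / max(len(ground_truth), 1)
    pre = hits / k
    return rec, pre, (2 * rec * pre / (rec + pre) if rec + pre > 0 else 0.0)
```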
Table 4: Recommendation results.

Data split | Metric | MP2V [4] | ASNE [15] | SHNE [34] | GSAGE [7] | GAT [31] | HetGNN
A-I 2002   | Rec    | 0.144    | 0.152     | 0.279     | 0.231     | 0.274    | 0.293
A-I 2002   | Pre    | 0.046    | 0.050     | 0.086     | 0.073     | 0.087    | 0.093
A-I 2002   | F1     | 0.070    | 0.075     | 0.134     | 0.112     | 0.132    | 0.141
A-II 2013  | Rec    | 0.516    | 0.419     | 0.608     | 0.540     | 0.568    | 0.625
A-II 2013  | Pre    | 0.207    | 0.174     | 0.241     | 0.219     | 0.230    | 0.252
A-II 2013  | F1     | 0.295    | 0.333     | 0.345     | 0.312     | 0.327    | 0.359
A-II 2012  | Rec    | 0.468    | 0.382     | 0.552     | 0.512     | 0.518    | 0.606
A-II 2012  | Pre    | 0.204    | 0.171     | 0.233     | 0.224     | 0.227    | 0.264
A-II 2012  | F1     | 0.284    | 0.236     | 0.327     | 0.312     | 0.316    | 0.368

Result. The results of the different models are reported in Table 4. The best results are highlighted in bold. According to this table, the best baselines are attributed graph embedding methods or graph neural network models in most cases. In addition, HetGNN performs best in all cases. The relative improvements (%) over the best baseline range from 2.8% to 16.0%, showing that HetGNN is effective and can learn better node embeddings (than the baselines) for node recommendation.

4.2.3 Classification and Clustering (RQ1-3).
Which class/cluster does this node belong to? To answer RQ1-3, we design experiments to evaluate HetGNN on multi-label classification and node clustering tasks.

Setting. Similar to metapath2vec [4], we match authors in the A-II dataset with four selected research domains, i.e., Data Mining (DM), Computer Vision (CV), Natural Language Processing (NLP), and Database (DB). Specifically, we choose three top venues⁴ for each area. Each author is labeled with the area of the majority of his/her publications (authors without papers in these venues are excluded from evaluation). The node embeddings are learned from the full dataset. For the multi-label classification task, the learned node embeddings are used as the input to a logistic regression classifier. Besides, the size (ratio) of the training data is set to 10% and 30%, and the remaining nodes are used for testing. We use both Micro-F1 and Macro-F1 as evaluation metrics. For the node clustering task, the learned node embeddings are used as the input to a clustering model. Here we employ the k-means algorithm to cluster the data and evaluate the clustering performance in terms of normalized mutual information (NMI) and adjusted rand index (ARI).

⁴ DM: KDD, WSDM, ICDM. CV: CVPR, ICCV, ECCV. NLP: ACL, EMNLP, NAACL. DB: SIGMOD, VLDB, ICDE.
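A sketch of this evaluation pipeline with scikit-learn: a logistic regression classifier for Macro/Micro-F1 and k-means with NMI/ARI for clustering. The train/test split helper and the default solver settings are assumptions made for a self-contained example.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.metrics import f1_score, normalized_mutual_info_score, adjusted_rand_score
from sklearn.model_selection import train_test_split

def classify_and_cluster(emb, labels, train_ratio=0.1, n_clusters=4, seed=0):
    """Feed learned author embeddings to a logistic regression classifier (Macro/Micro-F1)
    and to k-means (NMI / ARI), mirroring the setting described above."""
    X_tr, X_te, y_tr, y_te = train_test_split(emb, labels, train_size=train_ratio, random_state=seed)
    pred = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te)
    macro = f1_score(y_te, pred, average="macro")
    micro = f1_score(y_te, pred, average="micro")
    clusters = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(emb)
    return macro, micro, normalized_mutual_info_score(labels, clusters), adjusted_rand_score(labels, clusters)
```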
Table 5: Multi-label classification (MC) and node clustering (NC) results. Percentage denotes the training data ratio.

Task     | Metric   | MP2V [4] | ASNE [15] | SHNE [34] | GSAGE [7] | GAT [31] | HetGNN
MC (10%) | Macro-F1 | 0.972    | 0.965     | 0.939     | 0.978     | 0.962    | 0.978
MC (10%) | Micro-F1 | 0.973    | 0.967     | 0.940     | 0.978     | 0.963    | 0.979
MC (30%) | Macro-F1 | 0.975    | 0.969     | 0.939     | 0.979     | 0.965    | 0.981
MC (30%) | Micro-F1 | 0.975    | 0.970     | 0.941     | 0.980     | 0.965    | 0.982
NC       | NMI      | 0.894    | 0.854     | 0.776     | 0.914     | 0.845    | 0.901
NC       | ARI      | 0.933    | 0.898     | 0.813     | 0.945     | 0.882    | 0.932

Result. Table 5 reports the results of all methods, where the best results are highlighted in bold. It can be seen that: (1) most models perform well on multi-label classification and obtain large Macro-F1 and Micro-F1 scores (over 0.95). This is reasonable since the authors of the four selected domains are quite different from each other; (2) despite (1), HetGNN achieves the best performance or is comparable to the best method on the multi-label classification and node clustering tasks, showing that HetGNN can learn effective node embeddings for these tasks.

Figure 3: Author embeddings visualization of four selected domains in academic data.

Furthermore, we employ the TensorFlow embedding projector to visualize the author embeddings of the four domains, as shown in Figure 3. For each area, we randomly sample 100 authors. It is easy to see that the embeddings of authors in the same class cluster closely and can be well distinguished from the others in both 2D and 3D visualizations, demonstrating the effectiveness of the learned node embeddings.

Table 6: Inductive multi-label classification (IMC) and node clustering (INC) results. Percentage is the training data ratio.

Task      | Metric   | GSAGE [7] | GAT [31] | HetGNN
IMC (10%) | Macro-F1 | 0.938     | 0.954    | 0.962
IMC (10%) | Micro-F1 | 0.945     | 0.958    | 0.965
IMC (30%) | Macro-F1 | 0.949     | 0.956    | 0.964
IMC (30%) | Micro-F1 | 0.955     | 0.960    | 0.968
INC       | NMI      | 0.714     | 0.765    | 0.840
INC       | ARI      | 0.764     | 0.803    | 0.894

4.2.4 Inductive Classification and Clustering (RQ2).
Which class/cluster does a new node belong to? To answer RQ2, we design experiments to evaluate HetGNN on inductive multi-label classification and inductive node clustering tasks.

Setting. The setting of this task is similar to the previous node classification and clustering tasks, except that we use the new node embeddings as the model input. Specifically, first, we use the training data (A-II dataset, train/test split year = 2013) to train the model.
Then, we employ the learned model to infer the embeddings of all new nodes in the test data. Finally, we use the inferred new node embeddings as the input to the classification and clustering models.

Result. Table 6 reports the performance of the graph neural network models, where the best results are highlighted in bold. According to this table: (1) all methods perform well on inductive multi-label classification, for the reason described in the previous task; however, HetGNN still achieves the best performance; (2) the result of HetGNN is better than the others for inductive node clustering. The average relative improvements (%) over GSAGE and GAT are 17.3% and 10.6%, respectively. This shows that the learned HetGNN model is effective for inferring new node embeddings.

Figure 6: Impact of sampled neighbor size (panels: Link Prediction (Type-1), Link Prediction (Type-2), Node Recommendation).
... utilized in various graph mining tasks. For example, inspired by word2vec [19], Perozzi et al. [20] developed the innovative DeepWalk, which introduces the node-context concept in graphs (in analogy to word-context) and feeds a set of random walks over the graph (in analogy to "sentences") to SkipGram so as to obtain node embeddings. Later, to address graph structure heterogeneity, Dong et al. [4] introduced metapath-guided walks and proposed metapath2vec for representation learning in HetG. Further, attributed graph embedding models [14, 15, 34] have been proposed to leverage both graph structure and node attributes for learning node embeddings. Besides those methods, many other approaches have been proposed [1, 18, 21, 28, 32], such as NetMF [21], which learns node embeddings via matrix factorization, and NetRA [32], which uses adversarially regularized autoencoders to learn node embeddings, and so on.

Graph neural networks. Recently, with the advent of deep learning, graph neural networks (GNNs) [5, 7, 12, 16, 24, 31] have gained a lot of attention. Unlike previous graph embedding models, the key idea behind GNNs is to aggregate feature information from a node's local neighbors via neural networks. For example, GraphSAGE [7] uses neural networks, e.g., LSTM, to aggregate neighbors' feature information. Besides, GAT [31] employs a self-attention mechanism to measure the impacts of different neighbors and combines them to obtain node embeddings. Moreover, some task-dependent approaches, e.g., GEM [16] for malicious account detection, have been proposed to obtain better node embeddings for specific tasks.

6 CONCLUSION

In this paper, we introduced the problem of heterogeneous graph representation learning and proposed a heterogeneous graph neural network model, i.e., HetGNN, to address this problem. HetGNN jointly considered node heterogeneous contents encoding, type-based neighbors aggregation, and heterogeneous types combination. In the training stage, a graph context loss and a mini-batch gradient descent procedure were employed to learn the model parameters. Extensive experiments on various graph mining tasks, i.e., link prediction, recommendation, node classification & clustering, and inductive node classification & clustering, demonstrated that HetGNN can outperform state-of-the-art methods.

ACKNOWLEDGMENTS

This work is supported by the CCDC Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053 (Network Science CTA) and the National Science Foundation (NSF) grant IIS-1447795.

REFERENCES
[1] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. 2015. Heterogeneous network embedding via deep architectures. In KDD. 119–128.
[2] Ting Chen and Yizhou Sun. 2017. Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification. In WSDM. 295–304.
[3] Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. 2018. A survey on network embedding. TKDE (2018).
[4] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In KDD. 135–144.
[5] Hongyang Gao, Zhengyang Wang, and Shuiwang Ji. 2018. Large-scale learnable graph convolutional networks. In KDD. 1416–1424.
[6] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD. 855–864.
[7] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NIPS. 1024–1034.
[8] Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In WWW. 507–517.
[9] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[10] Binbin Hu, Chuan Shi, Wayne Xin Zhao, and Philip S Yu. 2018. Leveraging meta-path based context for top-n recommendation with a neural co-attention model. In KDD. 1531–1540.
[11] Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[12] Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
[13] Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In ICML. 1188–1196.
[14] Jundong Li, Harsh Dani, Xia Hu, Jiliang Tang, Yi Chang, and Huan Liu. 2017. Attributed network embedding for learning in a dynamic environment. In CIKM. 387–396.
[15] Lizi Liao, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua. 2018. Attributed social network embedding. TKDE 30, 12 (2018), 2257–2270.
[16] Ziqi Liu, Chaochao Chen, Xinxing Yang, Jun Zhou, Xiaolong Li, and Le Song. 2018. Heterogeneous Graph Neural Networks for Malicious Account Detection. In CIKM. 2077–2085.
[17] Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In CVPR. 3431–3440.
[18] Jianxin Ma, Peng Cui, Xiao Wang, and Wenwu Zhu. 2018. Hierarchical Taxonomy Aware Network Embedding. In KDD. 1920–1929.
[19] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS. 3111–3119.
[20] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In KDD. 701–710.
[21] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM. 459–467.
[22] Meng Qu, Jian Tang, and Jiawei Han. 2018. Curriculum Learning for Heterogeneous Star Network Embedding via Deep Reinforcement Learning. In WSDM. 468–476.
[23] Xiang Ren, Jialu Liu, Xiao Yu, Urvashi Khandelwal, Quanquan Gu, Lidan Wang, and Jiawei Han. 2014. ClusCite: Effective citation recommendation by information network-based clustering. In KDD. 821–830.
[24] Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In ESWC. 593–607.
[25] Yizhou Sun, Jiawei Han, Charu C Aggarwal, and Nitesh V Chawla. 2012. When will it happen?: Relationship prediction in heterogeneous information networks. In WSDM. 663–672.
[26] Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S Yu, and Tianyi Wu. 2011. PathSim: Meta path-based top-k similarity search in heterogeneous information networks. VLDB 4, 11 (2011), 992–1003.
[27] Yizhou Sun, Brandon Norick, Jiawei Han, Xifeng Yan, Philip Yu, and Xiao Yu. 2012. PathSelClus: Integrating Meta-Path Selection with User-Guided Object Clustering in Heterogeneous Information Networks. In KDD. 1348–1356.
[28] Jian Tang, Meng Qu, and Qiaozhu Mei. 2015. PTE: Predictive text embedding through large-scale heterogeneous text networks. In KDD. 1165–1174.
[29] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In WWW. 1067–1077.
[30] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. ArnetMiner: Extraction and mining of academic social networks. In KDD. 990–998.
[31] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In ICLR.
[32] Wenchao Yu, Cheng Zheng, Wei Cheng, Charu C Aggarwal, Dongjin Song, Bo Zong, Haifeng Chen, and Wei Wang. 2018. Learning Deep Network Representations with Adversarially Regularized Autoencoders. In KDD. 2663–2671.
[33] Chuxu Zhang, Chao Huang, Lu Yu, Xiangliang Zhang, and Nitesh V Chawla. 2018. Camel: Content-Aware and Meta-path Augmented Metric Learning for Author Identification. In WWW. 709–718.
[34] Chuxu Zhang, Ananthram Swami, and Nitesh V Chawla. 2019. SHNE: Representation Learning for Semantic-Associated Heterogeneous Networks. In WSDM. 690–698.
[35] Chuxu Zhang, Lu Yu, Xiangliang Zhang, and Nitesh V Chawla. 2018. Task-Guided and Semantic-Aware Ranking for Academic Author-Paper Correlation Inference. In IJCAI. 3641–3647.
[36] Yizhou Zhang, Yun Xiong, Xiangnan Kong, Shanshan Li, Jinhong Mi, and Yangyong Zhu. 2018. Deep Collective Classification in Heterogeneous Information Networks. In WWW. 399–408.
... besides the "latent" feature, we use the same content features as HetGNN and concatenate them as general attribute features. For SHNE, we utilize the paper abstract and item description (text sequence length = 100) as the input for deep semantic encoding (i.e., LSTM) in the two datasets, respectively. Besides, the walk sampling setting is the same as for MP2V. For GraphSAGE and GAT, we use the same input features (concatenated as a general feature) and the same sampled neighbor set for each node as HetGNN.

• Software & Hardware. We employ PyTorch⁸ to implement HetGNN and run it on a server with GPU machines. Code is available at: https://github.com/chuxuzhang/KDD2019_HetGNN.

⁸ https://pytorch.org/

A.5 Model Variants Description

In Section 4.3.1, we propose three model variants to conduct ablation study experiments. These models are:

• No-Neigh. This variant does not consider neighbor influence and uses the heterogeneous contents encoding f_1(v) (Section 3.2) to represent the embedding of node v ∈ V. That is, it removes the heterogeneous neighbors aggregation module (Section 3.3) of HetGNN.

• Content-FC. This variant replaces the heterogeneous content encoder (Bi-LSTM) of HetGNN with a fully connected neural network (FC). That is, the concatenated content feature is fed to an FC layer to get the content embedding. The other modules are the same as in HetGNN.

• Type-FC. This variant replaces the types combination module (attention) of HetGNN with an FC. That is, the concatenated embedding of the different neighbor groups (types) is fed to an FC layer to get the aggregated embedding. The other modules are the same as in HetGNN.

Besides, the training procedures of all model variants are the same as for HetGNN.

A.6 Hyper-parameters Sensitivity Setup

In Section 4.3.2, we conduct experiments on the A-II dataset (train/test split year = 2013) to study the impacts of two hyper-parameters: the embedding dimension d and the sampled neighbor size for each node. We investigate a specific parameter by changing its value while fixing the others. Specifically, when fixing the sampled neighbor size (i.e., 23), we set different embedding dimensions d (i.e., 8, 16, 32, 64, 128, 256) for HetGNN and evaluate its performance for each dimension. Besides, when fixing the embedding dimension (i.e., 128), we set different sizes of the sampled neighbor set (i.e., 6, 12, 17, 23, 28, 34) for each node and evaluate HetGNN's performance for each size. The constitutions of the different neighbor groups (types) for the aforementioned sizes are: 6 = 2 (author) + 2 (paper) + 2 (venue), 12 = 5 (author) + 5 (paper) + 2 (venue), 17 = 7 (author) + 7 (paper) + 3 (venue), 23 = 10 (author) + 10 (paper) + 3 (venue), 28 = 12 (author) + 12 (paper) + 4 (venue), and 34 = 15 (author) + 15 (paper) + 4 (venue).
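For reference, the sweep described above can be written down as a small configuration; the constant names below are illustrative.

```python
# Neighbor-set sizes with their per-type constitutions (author + paper + venue), and the
# embedding dimensions tested while the other hyper-parameter is kept fixed (Section A.6).
NEIGHBOR_SIZE_SWEEP = {
    6:  {"author": 2,  "paper": 2,  "venue": 2},
    12: {"author": 5,  "paper": 5,  "venue": 2},
    17: {"author": 7,  "paper": 7,  "venue": 3},
    23: {"author": 10, "paper": 10, "venue": 3},   # default used in the main experiments
    28: {"author": 12, "paper": 12, "venue": 4},
    34: {"author": 15, "paper": 15, "venue": 4},
}
EMBED_DIM_SWEEP = [8, 16, 32, 64, 128, 256]        # with the neighbor size fixed at 23
```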