X-GOAL: Multiplex Heterogeneous Graph Prototypical Contrastive Learning

Baoyu Jing
[email protected]
University of Illinois at Urbana-Champaign

Shengyu Feng
[email protected]
Language Technology Institute, Carnegie Mellon University

Yuejia Xiang
[email protected]
Platform and Content Group, Tencent

Platform and Content Group, Tencent
Platform and Content Group, Tencent
University of Illinois at Urbana-Champaign
if K^v = K^{v'}, the cluster center vectors C^v and C^{v'} are distributed at different positions in the embedding space.

An illustration of the cluster-level alignment is presented in Figure 3. Given a node x_n, on the anchor layer v, we have the anchor cluster centers C^v, the anchor embedding h_n^v, and the anchor semantic distribution p_n^v. Next, we use the embedding h_n^{v'} from the layer v' ≠ v to obtain the recovered semantic distribution q_n^{v'} based on C^v via Equation (2). Then we align the semantics of h_n^v and h_n^{v'} by minimizing the KL-divergence of p_n^v and q_n^{v'}:

\mathcal{R}_C^v = \frac{1}{N(V-1)} \sum_{n=1}^{N} \sum_{v' \neq v} \mathrm{KL}\left(\mathbf{p}_n^v \,\|\, \mathbf{q}_n^{v'}\right)    (6)

where p_n^v is treated as the ground-truth and the gradients are not allowed to pass through p_n^v during training.

The expectation E is taken over all the N nodes and all the pairs of V layers, and thus we have:

I(H^v; H^{v'}) \geq \frac{1}{Z} \sum_{n=1}^{N} \sum_{v=1}^{V} \sum_{v' \neq v} \log \frac{e^{\cos(\mathbf{h}_n^v, \mathbf{h}_n^{v'})}}{e^{\cos(\mathbf{h}_n^v, \mathbf{h}_n^{v'})} + e^{\cos(\mathbf{h}_n^v, \mathbf{h}_n^{v-})}}    (12)

where Z = NV(V − 1) is the normalization factor, and the right-hand side is R_N in Equation (7). □

Theorem 3.2 (Maximization of MI between Embeddings and Semantic Cluster Assignments). Let C^v ∈ [1, · · · , K^v] be the random variable for cluster assignments for {h_n^v}_{n=1}^N of the anchor layer v, and H^{v'} ∈ {h_n^{v'}}_{n=1}^N be the random variable for node embeddings of the v'-th layer; then the cluster-level alignment maximizes the mutual information of C^v and H^{v'}: I(C^v; H^{v'}).
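To make the cluster-level alignment concrete, a minimal PyTorch-style sketch of Equation (6) is given below. It assumes the semantic distribution of a node is a temperature-scaled softmax over its similarities to the cluster centers (in the spirit of Equation (24) in the appendix); the helper name, tensor layouts, and default temperature are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def cluster_alignment_loss(h_layers, centers, tau=0.5):
    """Sketch of the cluster-level alignment R_C in Equation (6).

    h_layers: list of V tensors of shape [N, d] (layer-specific node embeddings h_n^v)
    centers:  list of V tensors of shape [K_v, d] (cluster centers C^v of each layer)
    tau:      softmax temperature (illustrative value)
    """
    V = len(h_layers)
    loss, num_pairs = 0.0, 0
    for v in range(V):                                        # anchor layer v
        # Anchor semantic distribution p_n^v; gradients are blocked (treated as ground truth).
        p = F.softmax(h_layers[v] @ centers[v].t() / tau, dim=1).detach()
        for vp in range(V):
            if vp == v:
                continue
            # Recovered distribution q_n^{v'}: layer-v' embeddings scored against the anchor centers C^v.
            log_q = F.log_softmax(h_layers[vp] @ centers[v].t() / tau, dim=1)
            # KL(p || q), averaged over the N nodes.
            loss = loss + F.kl_div(log_q, p, reduction="batchmean")
            num_pairs += 1
    return loss / num_pairs                                   # average over the V(V-1) layer pairs
```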
Table 3: Overall performance of X-GOAL on the unsupervised tasks: node clustering and similarity search.
means out-of-memory. Among all the baselines, HDMI has the best overall performance. The proposed X-GOAL further outperforms HDMI, with 0.023/0.019/0.041/0.021 average improvements over the second-best scores on Macro-F1/Micro-F1/NMI/Sim@5. For Macro-F1 and Micro-F1 in Table 2, X-GOAL improves the most on the Amazon dataset (0.050/0.044). For NMI and Sim@5 in Table 3, X-GOAL improves the most on the ACM (0.071) and Amazon (0.050) datasets, respectively. The superior overall performance of X-GOAL demonstrates that the proposed approach can effectively extract informative node embeddings for multiplex heterogeneous graphs.

GOAL on Homogeneous Graph Layers. We compare the proposed GOAL framework with recent infomax-based methods (DGI and HDI) and graph augmentation based methods (GraphCL and GCA). The experimental results for each single homogeneous graph layer are presented in Tables 4-5. It is evident that GOAL significantly outperforms the baseline methods on all single homogeneous graph layers. On average, GOAL has 0.137/0.129/0.151/0.119 improvements on Macro-F1/Micro-F1/NMI/Sim@5. For node classification in Table 4, GOAL improves the most on the PATAP layer of DBLP: 0.514/0.459 on Macro-F1/Micro-F1. For node clustering and similarity search in Table 5, GOAL improves the most on the IBI layer of Amazon: 0.391 on NMI and 0.378 on Sim@5.
Table 5: Overall performance of GOAL on each layer: node clustering and similarity search.

The superior performance of GOAL indicates that the proposed prototypical contrastive learning strategy is better than the infomax-based and graph augmentation based instance-wise contrastive learning strategies. We believe this is because prototypical contrastive learning could effectively reduce the semantic errors.

4.3 Ablation Study
Multiplex Heterogeneous Graph Level. In Table 6, we study the impact of the node-level and semantic-level alignments. The results in Table 6 indicate that both the node-level alignment (R_N) and the semantic-level alignment (R_C) can improve the performance.

Table 7: Ablation study of GOAL on the PAP layer of ACM.

                        MaF1    MiF1    NMI     Sim@5
GOAL                    0.908   0.908   0.735   0.917
w/o warm-up             0.863   0.865   0.721   0.903
w/o L_C                 0.865   0.867   0.693   0.899
w/o L_N                 0.878   0.880   0.678   0.881
1st-ord. GCN (relu)     0.865   0.866   0.559   0.859
GCN (tanh)              0.881   0.881   0.486   0.886
GCN (relu)              0.831   0.831   0.410   0.837
dropout → masking       0.888   0.890   0.716   0.903
w/o attribute drop      0.843   0.845   0.568   0.869
w/o adj. matrix drop    0.888   0.888   0.715   0.903
Homogeneous Graph Layer Level. The results for different con-
figurations of GOAL on the PAP layer of ACM are shown in Table 7.
First, all of the warm-up, the semantic-level loss L_C and the node-level loss L_N are critical. Second, comparing GOAL (1st-order GCN with tanh activation) with other GCN variants, (1) with the same activation function, the 1st-order GCN performs better than the
original GCN; (2) tanh is better than relu. We believe this is because
the 1st-order GCN has a better capability for capturing the attribute
information, and tanh provides a better normalization for the node
embeddings. Finally, for the configurations of graph transformation,
if we replace dropout with masking, the performance will drop. This
is because dropout re-scales the outputs by 1/(1 − p_drop), which improves the performance. Besides, dropout on both the attributes and the adjacency matrix is important.
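As a concrete illustration of the graph-transformation ablation above, the sketch below contrasts dropout, which re-scales the surviving entries by 1/(1 − p_drop), with plain masking on the attribute and adjacency matrices. The function names and drop rates are illustrative assumptions, not the exact configuration used in the paper.

```python
import torch
import torch.nn.functional as F

def dropout_transform(x, adj, p_attr=0.2, p_adj=0.2):
    """Dropout-based graph transformation: entries are zeroed at random and the
    surviving entries are re-scaled by 1/(1 - p), which is the behaviour contrasted
    with plain masking in Table 7. Drop rates are illustrative."""
    x_t = F.dropout(x, p=p_attr, training=True)      # attribute dropout
    adj_t = F.dropout(adj, p=p_adj, training=True)   # adjacency-matrix dropout
    return x_t, adj_t

def masking_transform(x, adj, p_attr=0.2, p_adj=0.2):
    """Plain masking: entries are zeroed at random but NOT re-scaled."""
    x_t = x * (torch.rand_like(x) >= p_attr).float()
    adj_t = adj * (torch.rand_like(adj) >= p_adj).float()
    return x_t, adj_t
```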
Figure 4: The number of clusters K on PSP and PAP of ACM. (a) Macro-F1 vs. K; (b) NMI vs. K.
4.4 Number of Clusters
Figure 4 shows the Macro-F1 and NMI scores on the PSP and PAP layers of ACM w.r.t. the number of clusters K ∈ [3, 4, 5, 10, 20, 30, 50]. For PSP and PAP, the best Macro-F1 and NMI scores are obtained when K = 30 and K = 5. The number of ground-truth classes for ACM is 3, and the results in Figure 4 indicate that over-clustering is beneficial. We believe this is because there are many sub-clusters in the embedding space, which is consistent with the prior findings on image data [27].
Figure 5: Visualization of the embeddings for the PAP and PSP layers of the ACM graph.
4.5 Visualization
Homogeneous Graph Layer Level. The t-SNE [32] visualizations of the embeddings for PSP and PAP of ACM are presented in Figure 5. L_N, L_C, R_N and R_C are the node-level loss, cluster-level loss, node-level alignment and cluster-level alignment. The embeddings extracted by the full GOAL framework (L_N + L_C) are better separated than those from the node-level loss L_N only. For GOAL, the numbers of clusters for PSP and PAP are 30 and 5, since they have the best performance as shown in Figure 4.

Multiplex Heterogeneous Graph Level. The visualizations for the combined embeddings are shown in Figure 6. Embeddings in Figures 6a-6b are the average pooling of the layer-specific embeddings in Figure 5. Figures 6c and 6d are X-GOAL w/o cluster-level alignment and the full X-GOAL. Generally, the full X-GOAL best separates different clusters.
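For reference, a visualization like Figure 5 can be produced along the following lines with scikit-learn's t-SNE and matplotlib; the function name, plotting parameters, and example inputs are illustrative assumptions rather than the exact script used for the figures.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embeddings(h, labels, title):
    """Project node embeddings to 2-D with t-SNE [32] and color them by class label.

    h:      numpy array of shape [N, d] with layer-specific node embeddings
    labels: length-N array of ground-truth classes, used only for coloring
    """
    z = TSNE(n_components=2, init="pca", random_state=0).fit_transform(h)
    plt.figure(figsize=(4, 4))
    plt.scatter(z[:, 0], z[:, 1], c=labels, s=3, cmap="tab10")
    plt.title(title)
    plt.axis("off")
    plt.show()

# e.g. plot_embeddings(h_pap, y, "GOAL embeddings on the PAP layer of ACM")
```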
5 RELATED WORK

5.1 Contrastive Learning for Graphs
The goal of CL is to pull similar nodes into close positions and push dissimilar nodes far apart in the embedding space. Inspired by word2vec [35], early methods such as DeepWalk [42] and node2vec [12] use random walks to sample positive pairs of nodes. LINE [50] and SDNE [56] determine the positive node pairs by their first- and second-order structural proximity. Recent methods leverage graph transformation to generate node pairs. DGI [53], GMI [41], HDI [18] and CommDGI [69] obtain negative samples by randomly shuffling the node attributes. MVGRL [14] transforms graphs via techniques such as graph diffusion [24]. The objective of the above methods is to maximize the mutual information of the positive embedding pairs. GraphCL [67] uses various graph augmentations to obtain positive nodes. GCA [76] generates positive and negative pairs based on their importance. gCool [25] introduces graph communal contrastive learning. Ariel [8, 9] proposes an information-regularized adversarial graph contrastive learning method. These methods use contrastive losses similar to InfoNCE [36].

For multiplex heterogeneous graphs, MNE [68], MVN2VEC [47] and GATNE [4] sample node pairs based on random walks. DMGI [39] and HDMI [18] use random attribute shuffling to sample negative nodes. HeCo [59] decides positive and negative pairs based on the connectivity between nodes. The above methods mainly rely on the topological structures to pair nodes, yet do not fully explore the semantic information, which could introduce semantic errors.

5.2 Deep Clustering and Contrastive Learning
Clustering algorithms [2, 62] can capture the semantic clusters of instances. DeepCluster [2] is one of the earliest works that use cluster assignments as "pseudo-labels" to update the parameters of the encoder. DEC [62] learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective. Inspired by these works, SwAV [3] and PCL [27] combine deep clustering with CL. SwAV compares the cluster assignments rather than the embeddings of two images. PCL is the closest to our work: it alternately performs clustering to obtain the latent prototypes and trains the encoder by contrasting positive and negative pairs of nodes and prototypes. However, PCL has some limitations compared with the proposed X-GOAL: it is designed for single-view image data; it heavily relies on data augmentations and momentum contrast [15]; and it has some complex assumptions over cluster distributions and embeddings.
5.3 Multiplex Heterogeneous Graph Neural Networks
The multiplex heterogeneous graph [4] considers multiple relations among nodes, and it is also known as the multiplex graph [18, 39], multi-view graph [46], multi-layer graph [26] and multi-dimension graph [30]. MVE [46] and HAN [58] use attention mechanisms to combine embeddings from different views. mGCN [31] models both within- and across-view interactions. VANE [11] uses adversarial training to improve the comprehensiveness and robustness of the embeddings. Multiplex graph neural networks have been used in many applications [7], such as time series [19], text summarization [21], temporal graphs [10], graph alignment [63], abstract reasoning [57], global poverty [22] and bipartite graphs [64].

5.4 Deep Graph Clustering
Graph clustering aims at discovering groups in graphs. SAE [51] and MGAE [55] first train a GNN, and then run a clustering algorithm over node embeddings to obtain the clusters. DAEGC [54] and SDCN [1] jointly optimize clustering algorithms and the graph reconstruction loss. AGC [70] adaptively finds the optimal order for graph filters based on the intrinsic clustering scores. M3S [49] uses clustering to enlarge the labeled data with pseudo labels. SDCN [1] proposes a structural deep clustering network to integrate the structural information into deep clustering. COIN [20] co-clusters two types of nodes in bipartite graphs. MvAGC [28] extends AGC [70] to multi-view settings. However, MvAGC is not a neural network based method, and thus might not exploit the attribute and non-linearity information. Recent methods combine CL with clustering to further improve the performance. SCAGC [61] treats nodes within the same cluster as positive pairs. MCGC [37] combines CL with MvAGC [28], which treats each node with its neighbors as positive pairs. Different from SCAGC and MCGC, the proposed GOAL and X-GOAL capture the semantic information by treating a node with its corresponding cluster center as a positive pair.

6 CONCLUSION
In this paper, we introduce a novel X-GOAL framework for multiplex heterogeneous graphs, which is comprised of a GOAL framework for each homogeneous graph layer and an alignment regularization to jointly model different layers. The GOAL framework captures both node-level and cluster-level information. The alignment regularization is a nimble technique to jointly model and propagate information across different layers, which could maximize the mutual information of different layers. The experimental results on real-world multiplex heterogeneous graphs demonstrate the effectiveness of the proposed X-GOAL framework.

A DERIVATION OF CLUSTER-LEVEL LOSS
The node-level contrastive loss is usually noisy, and could introduce semantic errors by treating two semantically similar nodes as a negative pair. To tackle this issue, we use a clustering algorithm C (e.g., K-means) to obtain the semantic clusters of nodes, and we use the EM algorithm to update the parameters of E to pull node embeddings closer to their assigned clusters (or prototypes). Following [27], we maximize the following log likelihood:

\sum_{n=1}^{N} \log p(\mathbf{h}_n | \Theta, C) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} p(\mathbf{h}_n, k | \Theta, C)    (19)

where h_n is the n-th row of H, Θ and C are the parameters of E and the K-means algorithm C, k ∈ [1, · · · , K] is the cluster index, and K is the number of clusters. Directly optimizing this objective is impracticable since the cluster index is a latent variable.

The Evidence Lower Bound (ELBO) of Equation (19) is given by:

\mathrm{ELBO} = \sum_{n=1}^{N} \sum_{k=1}^{K} Q(k | \mathbf{h}_n) \log \frac{p(\mathbf{h}_n, k | \Theta, C)}{Q(k | \mathbf{h}_n)}    (20)

where Q(k|h_n) = p(k|h_n, Θ, C) is the auxiliary function.

In the E-step, we fix Θ and estimate the cluster centers Ĉ and the cluster assignments Q̂(k|h_n) by running the K-means algorithm over the embeddings of the original graph H = E(G). If a node h_n belongs to the cluster k, then its auxiliary function is an indicator function satisfying Q̂(k|h_n) = 1 and Q̂(k'|h_n) = 0 for all k' ≠ k.

In the M-step, based on Ĉ and Q̂(k|h_n) obtained in the E-step, we update Θ by maximizing the ELBO:

\mathrm{ELBO} = \sum_{n=1}^{N} \sum_{k=1}^{K} \hat{Q}(k | \mathbf{h}_n) \log p(\mathbf{h}_n, k | \Theta, \hat{\mathbf{C}}) - \sum_{n=1}^{N} \sum_{k=1}^{K} \hat{Q}(k | \mathbf{h}_n) \log \hat{Q}(k | \mathbf{h}_n)    (21)

Dropping the second term of the above equation, which is a constant, we will minimize the following loss function:

\mathcal{L}_C = - \sum_{n=1}^{N} \sum_{k=1}^{K} \hat{Q}(k | \mathbf{h}_n) \log p(\mathbf{h}_n, k | \Theta, \hat{\mathbf{C}})    (22)

Assuming a uniform prior distribution over h_n, we have:

p(\mathbf{h}_n, k | \Theta, \hat{\mathbf{C}}) \propto p(k | \mathbf{h}_n, \Theta, \hat{\mathbf{C}})    (23)

We define p(k|h_n, Θ, Ĉ) by:

p(k | \mathbf{h}_n, \Theta, \hat{\mathbf{C}}) = \frac{e^{\hat{\mathbf{c}}_k^T \mathbf{h}_n / \tau}}{\sum_{k'=1}^{K} e^{\hat{\mathbf{c}}_{k'}^T \mathbf{h}_n / \tau}}    (24)

where h_n ∈ R^d is the embedding of the node x_n, ĉ_k ∈ R^d is the vector of the k-th cluster center, and τ is the temperature parameter. Let us use k_n to denote the cluster assignment of h_n and normalize the loss by 1/N; then Equation (22) can be rewritten as:

\mathcal{L}_C = - \frac{1}{N} \sum_{n=1}^{N} \log \frac{e^{\mathbf{c}_{k_n}^T \mathbf{h}_n / \tau}}{\sum_{k=1}^{K} e^{\mathbf{c}_k^T \mathbf{h}_n / \tau}}    (25)

The above loss function captures the semantic similarities between nodes by pulling nodes within the same cluster closer to their assigned cluster center.
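The derivation above reduces to a simple alternating procedure. A minimal sketch is given below, assuming a PyTorch encoder and scikit-learn K-means for a single layer; the helper names and temperature are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def e_step(h, num_clusters):
    """E-step: run K-means over the embeddings H = E(G) of the original graph to obtain
    the cluster centers (prototypes) and the hard assignments Q_hat."""
    km = KMeans(n_clusters=num_clusters, n_init=10).fit(h.detach().cpu().numpy())
    centers = torch.tensor(km.cluster_centers_, dtype=h.dtype, device=h.device)
    assignments = torch.tensor(km.labels_, dtype=torch.long, device=h.device)
    return centers, assignments

def cluster_level_loss(h, centers, assignments, tau=0.5):
    """M-step loss L_C of Equation (25): pull each node towards its assigned cluster
    center under a temperature-scaled softmax over all centers."""
    logits = h @ centers.t() / tau               # [N, K] entries c_k^T h_n / tau
    return F.cross_entropy(logits, assignments)  # = -1/N sum_n log softmax(logits)[n, k_n]
```

Training would then alternate this E-step with gradient updates on L_C (typically combined with the node-level loss L_N), re-estimating the prototypes from the current embeddings at each round.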
ACKNOWLEDGMENTS
BJ and HT are partially supported by NSF (1947135, 2134079 and 1939725), and NIFA (2020-67021-32799).
REFERENCES
[1] Deyu Bo, Xiao Wang, Chuan Shi, Meiqi Zhu, Emiao Lu, and Peng Cui. 2020. Structural deep clustering network. In Proceedings of The Web Conference 2020.
[2] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. 2018. Deep clustering for unsupervised learning of visual features. In ECCV.
[3] Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. 2020. Unsupervised learning of visual features by contrasting cluster assignments. NeurIPS (2020).
[4] Yukuo Cen, Xu Zou, Jianwei Zhang, Hongxia Yang, Jingren Zhou, and Jie Tang. 2019. Representation learning for attributed multiplex heterogeneous network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1358–1368.
[5] Xiaokai Chu, Xinxin Fan, Di Yao, Zhihua Zhu, Jianhui Huang, and Jingping Bi. 2019. Cross-network embedding for multi-network alignment. In The World Wide Web Conference. 273–284.
[6] Boxin Du, Changhe Yuan, Robert Barton, Tal Neiman, and Hanghang Tong. 2021. Hypergraph Pre-training with Graph Neural Networks. arXiv preprint arXiv:2105.10862 (2021).
[7] Boxin Du, Si Zhang, Yuchen Yan, and Hanghang Tong. 2021. New Frontiers of Multi-Network Mining: Recent Developments and Future Trend. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 4038–4039.
[8] Shengyu Feng, Baoyu Jing, Yada Zhu, and Hanghang Tong. 2022. Adversarial graph contrastive learning with information regularization. In Proceedings of the ACM Web Conference 2022. 1362–1371.
[9] Shengyu Feng, Baoyu Jing, Yada Zhu, and Hanghang Tong. 2022. ARIEL: Adversarial Graph Contrastive Learning. https://doi.org/10.48550/ARXIV.2208.06956
[10] Dongqi Fu, Liri Fang, Ross Maciejewski, Vetle I. Torvik, and Jingrui He. 2022. Meta-Learned Metrics over Multi-Evolution Temporal Graphs. In KDD 2022.
[11] Dongqi Fu, Zhe Xu, Bo Li, Hanghang Tong, and Jingrui He. 2020. A View-Adversarial Framework for Multi-View Network Embedding. In CIKM.
[12] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD. 855–864.
[13] William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584 (2017).
[14] Kaveh Hassani and Amir Hosein Khasahmadi. 2020. Contrastive Multi-View Representation Learning on Graphs. arXiv preprint arXiv:2006.05582 (2020).
[15] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In CVPR.
[16] Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, and Jure Leskovec. 2019. Strategies for pre-training graph neural networks. arXiv preprint arXiv:1905.12265 (2019).
[17] Yizhu Jiao, Yun Xiong, Jiawei Zhang, Yao Zhang, Tianqi Zhang, and Yangyong Zhu. 2020. Sub-Graph Contrast for Scalable Self-Supervised Graph Representation Learning. In 2020 IEEE International Conference on Data Mining (ICDM).
[18] Baoyu Jing, Chanyoung Park, and Hanghang Tong. 2021. HDMI: High-order deep multiplex infomax. In Proceedings of the Web Conference 2021. 2414–2424.
[19] Baoyu Jing, Hanghang Tong, and Yada Zhu. 2021. Network of Tensor Time Series. In The World Wide Web Conference. https://doi.org/10.1145/3442381.3449969
[20] Baoyu Jing, Yuchen Yan, Yada Zhu, and Hanghang Tong. 2022. COIN: Co-Cluster Infomax for Bipartite Graphs. arXiv preprint arXiv:2206.00006 (2022).
[21] Baoyu Jing, Zeyu You, Tao Yang, Wei Fan, and Hanghang Tong. 2021. Multiplex Graph Neural Network for Extractive Text Summarization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 133–139.
[22] Muhammad Raza Khan and Joshua E Blumenstock. 2019. Multi-gcn: Graph convolutional networks for multi-view networks, with applications to global poverty. In AAAI, Vol. 33. 606–613.
[23] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[24] Johannes Klicpera, Stefan Weißenberger, and Stephan Günnemann. 2019. Diffusion improves graph learning. NeurIPS (2019).
[25] Bolian Li, Baoyu Jing, and Hanghang Tong. 2022. Graph Communal Contrastive Learning. In Proceedings of the ACM Web Conference 2022. 1203–1213.
[26] Jundong Li, Chen Chen, Hanghang Tong, and Huan Liu. 2018. Multi-layered network embedding. In SDM. 684–692.
[27] Junnan Li, Pan Zhou, Caiming Xiong, and Steven CH Hoi. 2021. Prototypical contrastive learning of unsupervised representations. ICLR (2021).
[28] Zhiping Lin and Zhao Kang. 2021. Graph filter-based multi-view attributed graph clustering. In IJCAI. 19–26.
[29] Yixin Liu, Shirui Pan, Ming Jin, Chuan Zhou, Feng Xia, and Philip S Yu. 2021. Graph self-supervised learning: A survey. arXiv preprint arXiv:2103.00111 (2021).
[30] Yao Ma, Zhaochun Ren, Ziheng Jiang, Jiliang Tang, and Dawei Yin. 2018. Multi-dimensional network embedding with hierarchical structure. In WSDM. 387–395.
[31] Yao Ma, Suhang Wang, Chara C Aggarwal, Dawei Yin, and Jiliang Tang. 2019. Multi-dimensional graph convolutional networks. In SDM. 657–665.
[32] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research 9, Nov (2008), 2579–2605.
[33] David McAllester and Karl Stratos. 2020. Formal limitations on the measurement of mutual information. In International Conference on Artificial Intelligence and Statistics. PMLR, 875–884.
[34] Zaiqiao Meng, Shangsong Liang, Hongyan Bao, and Xiangliang Zhang. 2019. Co-embedding attributed networks. In WSDM.
[35] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111–3119.
[36] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018).
[37] Erlin Pan and Zhao Kang. 2021. Multi-view Contrastive Graph Clustering. Advances in Neural Information Processing Systems 34 (2021).
[38] Chanyoung Park, Jiawei Han, and Hwanjo Yu. 2020. Deep multiplex graph infomax: Attentive multiplex network embedding using global information. Knowledge-Based Systems 197 (2020), 105861.
[39] Chanyoung Park, Donghyun Kim, Jiawei Han, and Hwanjo Yu. 2020. Unsupervised Attributed Multiplex Network Embedding. In AAAI. 5371–5378.
[40] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019), 8026–8037.
[41] Zhen Peng, Wenbing Huang, Minnan Luo, Qinghua Zheng, Yu Rong, Tingyang Xu, and Junzhou Huang. 2020. Graph Representation Learning via Graphical Mutual Information Maximization. In Proceedings of The Web Conference 2020.
[42] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD. 701–710.
[43] Ben Poole, Sherjil Ozair, Aaron Van Den Oord, Alex Alemi, and George Tucker. 2019. On variational bounds of mutual information. In ICML.
[44] Zhenyue Qin, Dongwoo Kim, and Tom Gedeon. 2019. Rethinking softmax with cross-entropy: Neural network classifier as mutual information estimator. arXiv preprint arXiv:1911.10688 (2019).
[45] Jiezhong Qiu, Qibin Chen, Yuxiao Dong, Jing Zhang, Hongxia Yang, Ming Ding, Kuansan Wang, and Jie Tang. 2020. GCC: Graph contrastive coding for graph neural network pre-training. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1150–1160.
[46] Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, and Jiawei Han. 2017. An attention-based collaboration framework for multi-view network representation learning. In CIKM.
[47] Yu Shi, Fangqiu Han, Xinwei He, Xinran He, Carl Yang, Jie Luo, and Jiawei Han. 2018. mvn2vec: Preservation and collaboration in multi-view network embedding. arXiv preprint arXiv:1801.06597 (2018).
[48] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15, 1 (2014), 1929–1958.
[49] Ke Sun, Zhouchen Lin, and Zhanxing Zhu. 2020. Multi-stage self-supervised learning for graph convolutional networks on graphs with few labeled nodes. In AAAI, Vol. 34. 5892–5899.
[50] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of the 24th international conference on world wide web. 1067–1077.
[51] Fei Tian, Bin Gao, Qing Cui, Enhong Chen, and Tie-Yan Liu. 2014. Learning deep representations for graph clustering. In AAAI, Vol. 28.
[52] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. ICLR (2018).
[53] Petar Veličković, William Fedus, William L Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm. 2019. Deep graph infomax. ICLR (2019).
[54] Chun Wang, Shirui Pan, Ruiqi Hu, Guodong Long, Jing Jiang, and Chengqi Zhang. 2019. Attributed graph clustering: A deep attentional embedding approach. arXiv preprint arXiv:1906.06532 (2019).
[55] Chun Wang, Shirui Pan, Guodong Long, Xingquan Zhu, and Jing Jiang. 2017. MGAE: Marginalized graph autoencoder for graph clustering. In CIKM. 889–898.
[56] Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 1225–1234.
[57] Duo Wang, Mateja Jamnik, and Pietro Lio. 2020. Abstract Diagrammatic Reasoning with Multiplex Graph Networks. In ICLR.
[58] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S Yu. 2019. Heterogeneous graph attention network. In TheWebConf.
[59] Xiao Wang, Nian Liu, Hui Han, and Chuan Shi. 2021. Self-supervised heterogeneous graph neural network with co-contrastive learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 1726–1736.
[60] Lirong Wu, Haitao Lin, Zhangyang Gao, Cheng Tan, Stan Li, et al. 2021. Self-supervised Learning on Graphs: Contrastive, Generative, or Predictive. arXiv preprint arXiv:2105.07342 (2021).
[61] Wei Xia, Quanxue Gao, Ming Yang, and Xinbo Gao. 2021. Self-supervised Contrastive Attributed Graph Clustering. NeurIPS (2021).
[62] Junyuan Xie, Ross Girshick, and Ali Farhadi. 2016. Unsupervised deep embedding for clustering analysis. In ICML.
[63] Hao Xiong, Junchi Yan, and Li Pan. 2021. Contrastive Multi-View Multiplex Network Embedding with Applications to Robust Network Alignment. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 1913–1923.
[64] Hansheng Xue, Luwei Yang, Vaibhav Rajan, Wen Jiang, Yi Wei, and Yu Lin. 2021. Multiplex Bipartite Network Embedding using Dual Hypergraph Convolutional Networks. In Proceedings of the Web Conference 2021. 1649–1660.
[65] Yuchen Yan, Lihui Liu, Yikun Ban, Baoyu Jing, and Hanghang Tong. 2021. Dynamic Knowledge Alignment. In AAAI.
[66] Yuchen Yan, Si Zhang, and Hanghang Tong. 2021. Bright: A bridging algorithm for network alignment. In Proceedings of the Web Conference 2021. 3907–3917.
[67] Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. 2020. Graph contrastive learning with augmentations. Advances in Neural Information Processing Systems 33 (2020), 5812–5823.
[68] Hongming Zhang, Liwei Qiu, Lingling Yi, and Yangqiu Song. 2018. Scalable Multiplex Network Embedding. In IJCAI, Vol. 18. 3082–3088.
[69] Tianqi Zhang, Yun Xiong, Jiawei Zhang, Yao Zhang, Yizhu Jiao, and Yangyong Zhu. 2020. CommDGI: community detection oriented deep graph infomax. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1843–1852.
[70] Xiaotong Zhang, Han Liu, Qimai Li, and Xiao-Ming Wu. 2019. Attributed graph clustering via adaptive graph convolution. arXiv preprint arXiv:1906.01210 (2019).
[71] Zhen Zhang, Hongxia Yang, Jiajun Bu, Sheng Zhou, Pinggang Yu, Jianwei Zhang, Martin Ester, and Can Wang. 2018. ANRL: Attributed Network Representation Learning via Deep Neural Networks. In IJCAI, Vol. 18. 3155–3161.
[72] Lecheng Zheng, Dongqi Fu, and Jingrui He. 2021. Tackling oversmoothing of GNNs with contrastive learning. arXiv preprint arXiv:2110.13798 (2021).
[73] Lecheng Zheng, Yada Zhu, Jingrui He, and Jinjun Xiong. 2021. Heterogeneous Contrastive Learning. arXiv preprint arXiv:2105.09401 (2021).
[74] Dawei Zhou, Lecheng Zheng, Jiawei Han, and Jingrui He. 2020. A data-driven graph generative model for temporal interaction networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 401–411.
[75] Dawei Zhou, Lecheng Zheng, Jiejun Xu, and Jingrui He. 2019. Misc-GAN: A multi-scale generative model for graphs. Frontiers in big Data 2 (2019), 3.
[76] Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang. 2021. Graph contrastive learning with adaptive augmentation. In Proceedings of the Web Conference 2021. 2069–2080.
[77] Chenyi Zhuang and Qiang Ma. 2018. Dual graph convolutional networks for graph-based semi-supervised classification. In Proceedings of the 2018 World Wide Web Conference. 499–508.