Matching Knowledge Graphs in Entity Embedding Spaces: An Experimental Study
Abstract—Entity alignment (EA) identifies equivalent entities located in different knowledge graphs (KGs), and has attracted growing research interest over the last few years with the advancement of KG embedding techniques. Although a number of embedding-based EA frameworks have been developed, they mainly focus on improving the performance of entity representation learning, while largely overlooking the subsequent stage that matches KGs in entity embedding spaces. Nevertheless, accurately matching entities based on learned entity representations is crucial to the overall alignment performance, as it coordinates individual alignment decisions and determines the global matching result. Hence, it is essential to understand how well existing solutions for matching KGs in entity embedding spaces perform on present benchmarks, as well as their strengths and weaknesses. To this end, in this article we provide a comprehensive survey and evaluation of matching algorithms for KGs in entity embedding spaces in terms of effectiveness and efficiency on both classic settings and new scenarios that better mirror real-life challenges. Based on in-depth analysis, we provide useful insights into the design trade-offs and good paradigms of existing works, and suggest promising directions for future development.

Index Terms—Entity alignment, entity matching, knowledge graph, knowledge graph alignment.

I. INTRODUCTION

MATCHING data instances that refer to the same real-world entity is a long-standing problem. It establishes the connections among multiple data sources, and is critical to data integration and cleaning [39]. Therefore, the task has been actively studied; for instance, in the database community, various entity matching (EM) (and entity resolution (ER)) strategies have been proposed to train a (supervised) classifier to predict whether a pair of data records match [10], [39].

Recently, due to the emergence and proliferation of knowledge graphs (KGs), matching entities in KGs has drawn much attention from both academia and industry. Distinct from traditional data matching, it brings its own challenges. Particularly, it underlines the use of KGs' structures for matching, and manifests unique characteristics of data, e.g., imbalanced class distribution and little attributive textual information. In consequence, although viable, following the traditional EM pipeline makes it hard to train an effective classifier that can infer the equivalence between entities. Thus, much effort has been dedicated to specifically addressing the matching of entities in KGs, which is also referred to as entity alignment (EA).

Nevertheless, early solutions to EA are mainly unsupervised [25], [48], i.e., no labeled data is assumed. They utilize discriminative features of entities (e.g., entity descriptions and relational structures) to infer the equivalent entity pairs, which is, however, hampered by the heterogeneity of independently constructed KGs [50].

To mitigate this issue, recent solutions to EA employ a few labeled pairs as seeds to guide the learning and prediction [9], [16], [31], [43], [54]. In short, they embed the symbolic representations of KGs as low-dimensional vectors in a way such that the semantic relatedness of entities is captured by the geometrical structures of embedding spaces [4], where the seed pairs are leveraged to produce unified entity representations. In the testing stage, they match entities based on the unified entity embeddings. They are coined as embedding-based EA methods, which have exhibited state-of-the-art performance on existing benchmarks.

To be more specific, the embedding-based EA¹ pipeline can be roughly divided into two major stages, i.e., representation learning and matching KGs in entity embedding spaces (or embedding matching for short). While the former encodes the KG structures into low-dimensional vectors and establishes connections between independent KGs via the calibration or transformation of (seed) entity embeddings [50], the latter computes pairwise scores between source and target entities based on such embeddings and then makes alignment decisions according to the pairwise scores. Although this field has been actively explored, existing efforts are mainly devoted to the representation learning stage [19], [30], [70], while embedding matching has not received much attention until very recently [35], [62]. The majority of existing EA solutions adopt a simple algorithm to realize this

¹ In the rest of the paper, we use EA to refer to embedding-based EA solutions, and use conventional EA for the early solutions.
Fig. 2. The pipeline of embedding-based EA. Dashed lines denote the pre-annotated alignment links.
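To make the two-stage pipeline in Fig. 2 concrete, the following minimal Python sketch separates representation learning from embedding matching. The function names, the dot-product scoring, and the greedy decoder are illustrative assumptions for exposition, not the interface of any particular EA system surveyed here.

```python
import numpy as np

def embed_and_match(kg_src, kg_tgt, seed_pairs, encoder, matcher):
    """Two-stage embedding-based EA pipeline:
    (1) representation learning yields unified entity embeddings,
    (2) embedding matching turns pairwise scores into alignment decisions."""
    E_src, E_tgt = encoder(kg_src, kg_tgt, seed_pairs)  # stage 1, e.g., a GNN-based EA model
    S = E_src @ E_tgt.T                                 # pairwise scores between source and target entities
    return matcher(S)                                   # stage 2, e.g., the simple greedy decoder below

def greedy(S):
    """Each source entity independently takes its highest-scoring target entity."""
    return {i: int(np.argmax(S[i])) for i in range(S.shape[0])}

# Usage (hypothetical encoder): matches = embed_and_match(kg1, kg2, seeds, encoder=my_gnn, matcher=greedy)
```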
TABLE II
OVERVIEW AND COMPARISON OF STATE-OF-THE-ART ALGORITHMS FOR MATCHING KGS IN ENTITY EMBEDDING SPACES. NOTE THAT WE ESTIMATE THE ORDER
OF MAGNITUDE OF THE TIME AND SPACE COMPLEXITY
in the target KG, and φ(v) is defined similarly. The mean similarity scores of all source and target entities are denoted in vector form as φ_s and φ_t, respectively. To generate the matched entity pairs, it further applies Greedy on the CSLS matrix (i.e., S^CSLS). Algorithm 4 describes the detailed procedure of CSLS. Notably, Li et al. put forward Graph Interactive Divergence (GID) to compute the similarity score, which in essence works in the same way as CSLS according to its code implementation [28].

Complexity. The time and space complexities are both O(n^2). Practically, it requires more time and space than DInf, as it needs to generate the additional CSLS matrix.

Algorithm 6: Sink. (E_s, E_t, E, l).
Input: Source and target entity sets: E_s, E_t; Unified entity embeddings: E; Hyper-parameter: l
Output: Matched entity pairs: M
1: Derive similarity matrix S based on E;
2: S^sinkhorn = Sinkhorn_l(S) (cf. (3));
3: M ← Greedy(E_s, E_t, S^sinkhorn);
4: return M;
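For reference, the CSLS re-scaling described above and the Sinkhorn-based procedure of Algorithm 6 can be sketched in a few lines of NumPy. The top-k cutoff k, the temperature tau, and the exponentiation before the row/column normalization follow common practice (cf. [26], [37]) rather than the exact form of (3) in this paper, so the snippet is an illustrative approximation, not the authors' implementation.

```python
import numpy as np

def csls_matrix(S, k=10):
    """CSLS re-scaling of a similarity matrix S: penalize hub entities by subtracting
    the mean of each entity's top-k cross-KG similarity scores."""
    phi_s = np.sort(S, axis=1)[:, -k:].mean(axis=1)   # mean top-k score of each source entity
    phi_t = np.sort(S, axis=0)[-k:, :].mean(axis=0)   # mean top-k score of each target entity
    return 2 * S - phi_s[:, None] - phi_t[None, :]

def sinkhorn_match(S, l=10, tau=0.05):
    """Sketch of Algorithm 6 (Sink.): push the similarity matrix towards a doubly
    stochastic matrix with l rounds of normalization, then decode with Greedy."""
    P = np.exp(S / tau)
    for _ in range(l):
        P = P / P.sum(axis=1, keepdims=True)   # normalize rows
        P = P / P.sum(axis=0, keepdims=True)   # normalize columns
    # Greedy: each source entity takes its highest-scoring target (non 1-to-1 matches possible)
    return {i: int(np.argmax(P[i])) for i in range(P.shape[0])}
```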
D. Reciprocal Embedding Matching
TABLE III
DATASET STATISTICS
TABLE IV
THE F1 SCORES OF ONLY USING STRUCTURAL INFORMATION
extracted from DBpedia [1]: English to Chinese (D-Z), English to Japanese (D-J), and English to French (D-F); and (2) SRPRS, which is a sparser dataset that follows the real-life entity distribution, including two multilingual KG pairs extracted from DBpedia: English to French (S-F) and English to German (S-D), and two mono-lingual KG pairs: DBpedia to Wikidata [53] (S-W) and DBpedia to YAGO [49] (S-Y); and (3) DWY100K, a larger dataset consisting of two mono-lingual KG pairs: DBpedia to Wikidata (D-W) and DBpedia to YAGO (D-Y). The detailed statistics can be found in Table III, where the numbers of entities, relations, triples, gold links, and the average entity degree are reported. Regarding the gold alignment links, we adopted 70% as the test set, 20% for training, and 10% for validation.
Evaluation Metric. We utilized the F1 score as the evaluation metric, which is the harmonic mean of precision and recall, where the precision value is computed as the number of correct matches divided by the number of matches found by a method, and the recall value is computed as the number of correct matches found by a method divided by the number of gold matches. Note that recall is equivalent to the Hits@1 metric used in some previous works.
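As a worked illustration of these definitions (the variable names are ours), precision, recall, and F1 can be computed from the predicted and gold match sets as follows:

```python
def f1_score(predicted, gold):
    """predicted, gold: sets of (source_entity, target_entity) pairs."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0          # equals Hits@1 under the usual protocol
    return 2 * precision * recall / (precision + recall) if correct else 0.0
```

When every test source entity receives exactly one prediction and all of them are matchable, the number of predicted matches equals the number of gold matches, so precision, recall, and F1 coincide, which is the situation discussed below for the existing datasets.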
Similarity Metric. After obtaining the unified entity representations E, a similarity metric is required to produce pairwise scores and generate the similarity matrix S. Frequent choices include the cosine similarity [7], [36], [52], the Euclidean distance [8], [27], and the Manhattan distance [55], [58]. In this work, we followed mainstream works and adopted the cosine similarity.
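The following sketch shows how the similarity matrix S can be produced from the unified representations with any of the three metrics mentioned above; distances are negated so that a larger value always means more similar. The names and the dense broadcasting are illustrative and not tuned for large KGs.

```python
import numpy as np

def similarity_matrix(E_src, E_tgt, metric="cosine"):
    """Pairwise scores between all source and target entity embeddings."""
    if metric == "cosine":
        src = E_src / np.linalg.norm(E_src, axis=1, keepdims=True)
        tgt = E_tgt / np.linalg.norm(E_tgt, axis=1, keepdims=True)
        return src @ tgt.T
    diff = E_src[:, None, :] - E_tgt[None, :, :]   # shape: (n_src, n_tgt, dim); fine for a sketch, memory-heavy for large KGs
    if metric == "euclidean":
        return -np.linalg.norm(diff, axis=-1)
    if metric == "manhattan":
        return -np.abs(diff).sum(axis=-1)
    raise ValueError(f"unknown metric: {metric}")
```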
Notably, we omit more detailed experimental settings in the interest of space; they can be found in Appendix B, available online.

C. Main Results and Comparison

We first evaluate with only structural information and report the results in Table IV, where R- and G- refer to using RREA and GCN to generate the structural embeddings, respectively, and DBP and SRP denote DBP15K and SRPRS, respectively. Next, we supplement with name embeddings, and report the results in Table V, where N- and NR- refer to only using the name embeddings and fusing name embeddings with RREA structural representations, respectively. Note that, on existing datasets, all the entities in the test set can be matched, and all the algorithms are devised to find a target entity for each test source entity. Hence, the number of matches found by a method equals the number of gold matches, and consequently the precision value is equal to the recall value and the F1 score [65].

Overall Performance. First, we do not delve into the embedding matching algorithms and directly analyze the general results. Specifically, using RREA to learn structural representations brings better performance than using GCN, showcasing that representation learning strategies are crucial to the overall alignment performance. When introducing the entity name information, we observe that this auxiliary signal alone can already provide very accurate signals for alignment. This is because the equivalent entities in different KGs of current datasets share very similar or even identical names. After fusing the semantic and structural information, the alignment performance is further lifted, with most of the approaches hitting over 0.9 in terms of the F1 score.

Effectiveness Comparison of Embedding Matching Algorithms. From the tables, it is evident that:

(1) Overall, Hun. and Sink. attain much better results than the other strategies. Specifically, Hun. takes full account of the global matching constraints and strives to reach a globally optimal matching given the objective of maximizing the sum of pairwise similarity scores. Moreover, the 1-to-1 constraint it exerts aligns with the present evaluation setting where the source and target entities are 1-to-1 matched. Sink., on the other hand, implicitly implements the 1-to-1 constraint during pairwise score computation and still adopts Greedy to produce final results, where there might exist non 1-to-1 matches; (2) DInf attains
TABLE V
THE F1 SCORES OF USING AUXILIARY INFORMATION
Fig. 4. The statistics of pairwise similarity scores (i.e., Top-5 STD), where the name of the setting is abbreviated, e.g., R-D stands for R-DBP.
TABLE VII
F1 SCORES ON DBP15K+
might well fall short in some scenarios, as alignment on KGs possesses its own challenges, e.g., the matching is not necessarily 1-to-1 constrained, or the pairwise scores are inaccurate. Thus, it is suggested to take full account of the characteristics of the alignment settings when adapting other general matching algorithms to cope with matching KGs in entity embedding spaces.

(4) Scalability and efficiency should be brought to attention. Existing advanced embedding matching algorithms have poor scalability, due to the additional resource-consuming operations that contribute to the alignment performance, such as the ranking process in RInf and the 1-to-1 constraint exerted by Hun. and SMat. Besides, space efficiency is also a critical issue. As shown in Section IV-D, most of the approaches have rather high memory costs given large-scale datasets. Therefore, considering that in practice there are many more entities, the scalability and efficiency issues should be taken into account during algorithm design. A preliminary exploration has been conducted by [15].

(5) The practical evaluation settings are worth further investigation. Under the unmatchable and non 1-to-1 alignment settings, the performance of existing algorithms is not promising. A possible future direction is to introduce the notion of probability and leverage probabilistic reasoning frameworks [22], [45], which have higher flexibility, to produce the alignment results.

(6) Integrating relation embeddings might help. Two recent studies propose to use relation embeddings to help induce aligned entity pairs [33], [56]. Different from existing methods that regard EA as a matrix (second-order tensor) isomorphism problem, they express the isomorphism of KGs in the form of third-order tensors to better describe the structural information of KGs [33]. Thus, it might be interesting to study the matching between KGs in the joint entity and relation embedding space.

We also provide some actionable insights:

1) In 1-to-1 constrained scenarios, it is preferable to use the Hungarian algorithm or the Sinkhorn operation to conduct the matching, as they explicitly or implicitly implement the 1-to-1 constraint during execution, take full account of the global matching constraints, and strive to reach a globally optimal matching given the objective of maximizing the sum of pairwise similarity scores. Given large-scale datasets, using the Hungarian algorithm would be more time-efficient, as the Sinkhorn operation needs to run for multiple rounds to achieve convergence. Besides, while the Hungarian algorithm depends mainly on the CPU, the Sinkhorn operation relies on the GPU.

2) Given datasets with unmatchable entities, it is suggested to add dummy nodes to make the number of entities on both sides equal, and then use the Hungarian algorithm (see the sketch after this list). In this scenario, there is still much room for improvement.

3) Non 1-to-1 alignment is a realistic and frequently observed scenario that has not received much research attention. Among existing algorithms, RInf and CSLS are preferred, since they take into account the global influence on the local matching and meanwhile do not strictly enforce the 1-to-1 constraint. More practical solutions are to be put forward to effectively address non 1-to-1 alignment.

4) Currently, the most performant embedding matching algorithms are not scalable. Among them, the Hungarian algorithm requires approximately one hour on the DWY100K dataset. Hence, in this case, it might be better to utilize RInf and its variant algorithms, which save 2/3 of the time cost at the expense of a < 10% performance drop compared with the Hungarian algorithm.
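As a concrete companion to insights 1) and 2), the sketch below solves the same 1-to-1 assignment problem as the Hungarian algorithm [24] via SciPy, padding the similarity matrix with dummy rows and columns when the two entity sets differ in size. The dummy score of 0.0 and the function name are illustrative placeholders; in practice the unmatchable threshold would be tuned.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_one_match(S, dummy_score=0.0):
    """Globally optimal 1-to-1 matching that maximizes the sum of pairwise scores.
    Source entities assigned to a dummy column are treated as unmatchable."""
    n, m = S.shape
    size = max(n, m)
    padded = np.full((size, size), dummy_score)
    padded[:n, :m] = S                                   # real scores in the top-left block
    rows, cols = linear_sum_assignment(padded, maximize=True)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if r < n and c < m]
```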
VII. CONCLUSION

This paper conducts a comprehensive survey and evaluation of matching algorithms for KGs in entity embedding spaces. We evaluate seven state-of-the-art strategies in terms of effectiveness and efficiency on a wide range of datasets, including two experimental settings that better mirror real-life challenges. We identify the strengths and weaknesses of these algorithms under different settings. We hope the experimental results would be valuable for researchers to put forward more effective and scalable embedding matching algorithms.

REFERENCES

[1] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. G. Ives, “DBpedia: A nucleus for a web of open data,” in Proc. Int. Semantic Web Conf., 2007, pp. 722–735.
[2] M. Berrendorf, E. Faerman, V. Melnychuk, V. Tresp, and T. Seidl, “Knowledge graph entity alignment with graph convolutional networks: Lessons learned,” in Proc. Eur. Conf. IR Res., 2020, pp. 3–11.
[3] K. D. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor, “Freebase: A collaboratively created graph database for structuring human knowledge,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2008, pp. 1247–1250.
[4] A. Bordes, N. Usunier, A. García-Durán, J. Weston, and O. Yakhnenko, “Translating embeddings for modeling multi-relational data,” in Proc. Int. Conf. Neural Inf. Process. Syst., 2013, pp. 2787–2795.
[5] U. Brunner and K. Stockinger, “Entity matching with transformer architectures - A step forward in data integration,” in Proc. 23rd Int. Conf. Extending Database Technol., Copenhagen, Denmark, Mar. 30 - Apr. 02, 2020, pp. 463–473.
[6] W. Cai, W. Ma, J. Zhan, and Y. Jiang, “Entity alignment with reliable path reasoning and relation-aware heterogeneous graph transformer,” in Proc. 31st Int. Joint Conf. Artif. Intell., Vienna, Austria, Jul. 23–29, 2022, pp. 1930–1937.
[7] Y. Cao, Z. Liu, C. Li, Z. Liu, J. Li, and T. Chua, “Multi-channel graph neural network for entity alignment,” in Proc. Assoc. Comput. Linguistics, 2019, pp. 1452–1461.
[8] M. Chen, Y. Tian, K. Chang, S. Skiena, and C. Zaniolo, “Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment,” in Proc. Int. Joint Conf. Artif. Intell., 2018, pp. 3998–4004.
[9] M. Chen, Y. Tian, M. Yang, and C. Zaniolo, “Multilingual knowledge graph embeddings for cross-lingual knowledge alignment,” in Proc. Int. Joint Conf. Artif. Intell., 2017, pp. 1511–1517.
[10] V. Christophides, V. Efthymiou, T. Palpanas, G. Papadakis, and K. Stefanidis, “An overview of end-to-end entity resolution for Big Data,” ACM Comput. Surv., vol. 53, no. 6, pp. 127:1–127:42, 2021.
[11] A. Doan et al., “Magellan: Toward building ecosystems of entity matching solutions,” Commun. ACM, vol. 63, no. 8, pp. 83–91, 2020.
[12] J. Doerner, D. Evans, and A. Shelat, “Secure stable matching at scale,” in Proc. SIGSAC Conf. Comput. Commun. Secur., 2016, pp. 1602–1613.
[13] M. Fey, J. E. Lenssen, C. Morris, J. Masci, and N. M. Kriege, “Deep graph matching consensus,” in Proc. Int. Conf. Learn. Representations, 2020.
[14] D. Gale and L. S. Shapley, “College admissions and the stability of marriage,” The Amer. Math. Monthly, vol. 69, no. 1, pp. 9–15, 1962.
[15] Y. Gao, X. Liu, J. Wu, T. Li, P. Wang, and L. Chen, “ClusterEA: Scalable entity alignment with stochastic training and normalized mini-batch similarities,” in Proc. 28th ACM SIGKDD Conf. Knowl. Discov. Data Mining, Washington, DC, USA, Aug. 14–18, 2022, pp. 421–431.
[16] C. Ge, X. Liu, L. Chen, B. Zheng, and Y. Gao, “LargeEA: Aligning entities for large-scale knowledge graphs,” 2021, arXiv:2108.05211.
[17] C. Ge, X. Liu, L. Chen, B. Zheng, and Y. Gao, “Make it easy: An effective end-to-end entity alignment framework,” in Proc. Int. ACM SIGIR Conf. Res. Develop. Informat. Retrieval, 2021, pp. 777–786.
[18] C. Ge, P. Wang, L. Chen, X. Liu, B. Zheng, and Y. Gao, “CollaborEM: A self-supervised entity matching framework using multi-features collaboration,” IEEE Trans. Knowl. Data Eng., early access, Dec. 13, 2021, doi: 10.1109/TKDE.2021.3134806.
[19] L. Guo, Q. Zhang, Z. Sun, M. Chen, W. Hu, and H. Chen, “Understanding and improving knowledge graph embedding for entity alignment,” in Proc. Int. Conf. Mach. Learn., Baltimore, MD, USA, Jul. 17–23, 2022, pp. 8145–8156.
[20] E. Jiménez-Ruiz and B. C. Grau, “LogMap: Logic-based and scalable ontology matching,” in Proc. Int. Semantic Web Conf., Springer, 2011, pp. 273–288.
[21] R. Jonker and A. Volgenant, “A shortest augmenting path algorithm for dense and sparse linear assignment problems,” Computing, vol. 38, no. 4, pp. 325–340, 1987.
[22] A. Kimmig, A. Memory, R. J. Miller, and L. Getoor, “A collective, probabilistic approach to schema mapping,” in Proc. IEEE Int. Conf. Data Eng., 2017, pp. 921–932.
[23] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proc. Int. Conf. Learn. Representations, 2017.
[24] H. W. Kuhn, “The Hungarian method for the assignment problem,” Nav. Res. Logistics Quart., vol. 2, no. 1/2, pp. 83–97, 1955.
[25] S. Lacoste-Julien, K. Palla, A. Davies, G. Kasneci, T. Graepel, and Z. Ghahramani, “SiGMa: Simple greedy matching for aligning large knowledge bases,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2013, pp. 572–580.
[26] G. Lample, A. Conneau, M. Ranzato, L. Denoyer, and H. Jégou, “Word translation without parallel data,” in Proc. Int. Conf. Learn. Representations, 2018.
[27] C. Li, Y. Cao, L. Hou, J. Shi, J. Li, and T. Chua, “Semi-supervised entity alignment via joint knowledge embedding model and cross-graph model,” in Proc. Conf. Empir. Methods Natural Lang. Process., 2019, pp. 2723–2732.
[28] J. Li and D. Song, “Uncertainty-aware pseudo label refinery for entity alignment,” in Proc. ACM Web Conf., Virtual Event, Lyon, France, Apr. 25–29, 2022, pp. 829–837.
[29] Y. Li, J. Li, Y. Suhara, A. Doan, and W. Tan, “Deep entity matching with pre-trained language models,” Proc. VLDB Endow., vol. 14, no. 1, pp. 50–60, 2020.
[30] X. Lin, H. Yang, J. Wu, C. Zhou, and B. Wang, “Guiding cross-lingual entity alignment via adversarial knowledge embedding,” in Proc. IEEE Int. Conf. Data Mining, 2019, pp. 429–438.
[31] B. Liu, H. Scells, G. Zuccon, W. Hua, and G. Zhao, “ActiveEA: Active learning for neural entity alignment,” in Proc. Conf. Empir. Methods Natural Lang. Process., 2021, pp. 3364–3374.
[32] X. Liu et al., “SelfKG: Self-supervised entity alignment in knowledge graphs,” in Proc. ACM Web Conf., Virtual Event, Lyon, France, Apr. 25–29, 2022, pp. 860–870.
[33] X. Mao et al., “An effective and efficient entity alignment decoding algorithm via third-order tensor isomorphism,” in Proc. 60th Annu. Meeting Assoc. Comput. Linguistics, Dublin, Ireland, May 22–27, 2022, pp. 5888–5898.
[34] X. Mao, W. Wang, Y. Wu, and M. Lan, “Boosting the speed of entity alignment 10×: Dual attention matching network with normalized hard sample mining,” in Proc. Int. World Wide Web Conf., 2021, pp. 821–832.
[35] X. Mao, W. Wang, Y. Wu, and M. Lan, “From alignment to assignment: Frustratingly simple unsupervised entity alignment,” in Proc. Conf. Empir. Methods Natural Lang. Process., 2021, pp. 2843–2853.
[36] X. Mao, W. Wang, H. Xu, Y. Wu, and M. Lan, “Relational reflection entity alignment,” in Proc. Conf. Inf. Knowl. Manage., 2020, pp. 1095–1104.
[37] G. E. Mena, D. Belanger, S. W. Linderman, and J. Snoek, “Learning latent permutations with Gumbel-Sinkhorn networks,” in Proc. Int. Conf. Learn. Representations, 2018.
[38] V. Mnih et al., “Asynchronous methods for deep reinforcement learning,” in Proc. Int. Conf. Mach. Learn., 2016, pp. 1928–1937.
[39] S. Mudgal et al., “Deep learning for entity matching: A design space exploration,” in Proc. Int. Conf. Manage. Data, Houston, TX, USA, Jun. 10–15, 2018, pp. 19–34.
[40] T. T. Nguyen et al., “Entity alignment for knowledge graphs with multi-order convolutional networks,” IEEE Trans. Knowl. Data Eng., vol. 34, no. 9, pp. 4201–4214, Sep. 2022.
[41] G. Papadakis, D. Skoutas, E. Thanos, and T. Palpanas, “Blocking and filtering techniques for entity resolution: A survey,” ACM Comput. Surv., vol. 53, no. 2, pp. 31:1–31:42, 2020.
[42] H. Paulheim, “Knowledge graph refinement: A survey of approaches and evaluation methods,” Semantic Web, vol. 8, no. 3, pp. 489–508, 2017.
[43] S. Pei, L. Yu, and X. Zhang, “Improving cross-lingual entity alignment via optimal transport,” in Proc. Int. Joint Conf. Artif. Intell., 2019, pp. 3231–3237.
[44] L. A. S. Pizzato, T. Rej, T. Chung, I. Koprinska, and J. Kay, “RECON: A reciprocal recommender for online dating,” in Proc. ACM Conf. Recommender Syst., 2010, pp. 207–214.
[45] J. Pujara, H. Miao, L. Getoor, and W. W. Cohen, “Large-scale knowledge graph identification using PSL,” in Proc. Conf. Assoc. Advance. Artif. Intell., 2013.
[46] A. E. Roth, “Deferred acceptance algorithms: History, theory, practice, and open questions,” Int. J. Game Theory, vol. 36, no. 3/4, pp. 537–569, 2008.
[47] P. Shvaiko and J. Euzenat, “Ontology matching: State of the art and future challenges,” IEEE Trans. Knowl. Data Eng., vol. 25, no. 1, pp. 158–176, Jan. 2013.
[48] F. M. Suchanek, S. Abiteboul, and P. Senellart, “PARIS: Probabilistic alignment of relations, instances, and schema,” Proc. VLDB Endow., vol. 5, no. 3, pp. 157–168, 2011.
[49] F. M. Suchanek, G. Kasneci, and G. Weikum, “Yago: A core of semantic knowledge,” in Proc. Int. World Wide Web Conf., 2007, pp. 697–706.
[50] Z. Sun et al., “A benchmarking study of embedding-based entity alignment for knowledge graphs,” Proc. VLDB Endow., vol. 13, no. 11, pp. 2326–2340, 2020.
[51] A. Surisetty et al., “RePS: Relation, position and structure aware entity alignment,” in Proc. Web Conf., Virtual Event / Lyon, France, Apr. 25–29, 2022, pp. 1083–1091.
[52] B. D. Trisedya, J. Qi, and R. Zhang, “Entity alignment between knowledge graphs using attribute embeddings,” in Proc. Conf. Assoc. Advance. Artif. Intell., 2019, pp. 297–304.
[53] D. Vrandecic and M. Krötzsch, “Wikidata: A free collaborative knowledgebase,” Commun. ACM, vol. 57, no. 10, pp. 78–85, 2014.
[54] Z. Wang, Q. Lv, X. Lan, and Y. Zhang, “Cross-lingual knowledge graph alignment via graph convolutional networks,” in Proc. Conf. Empir. Methods Natural Lang. Process., 2018, pp. 349–357.
[55] Y. Wu, X. Liu, Y. Feng, Z. Wang, R. Yan, and D. Zhao, “Relation-aware entity alignment for heterogeneous knowledge graphs,” in Proc. Int. Joint Conf. Artif. Intell., 2019, pp. 5278–5284.
[56] K. Xin, Z. Sun, W. Hua, W. Hu, and X. Zhou, “Informed multi-context entity alignment,” in Proc. ACM Int. Conf. Web Search Data Mining, Tempe, AZ, USA, Feb. 21–25, 2022, pp. 1197–1205.
[57] K. Xu, L. Song, Y. Feng, Y. Song, and D. Yu, “Coordinated reasoning for cross-lingual knowledge graph alignment,” in Proc. Conf. Assoc. Advance. Artif. Intell., 2020, pp. 9354–9361.
[58] H. Yang, Y. Zou, P. Shi, W. Lu, J. Lin, and X. Sun, “Aligning cross-lingual entities with multi-aspect information,” in Proc. Conf. Empir. Methods Natural Lang. Process., 2019, pp. 4430–4440.
[59] J. Yang et al., “Entity and relation matching consensus for entity alignment,” in Proc. Conf. Inf. Knowl. Manage., 2021, pp. 2331–2341.
[60] K. Zeng et al., “Interactive contrastive learning for self-supervised entity alignment,” in Proc. 31st ACM Int. Conf. Inf. Knowl. Manage., Atlanta, GA, USA, Oct. 17–21, 2022, pp. 2465–2475.
[61] K. Zeng, C. Li, L. Hou, J. Li, and L. Feng, “A comprehensive survey of entity alignment for knowledge graphs,” AI Open, vol. 2, pp. 1–13, 2021.
[62] W. Zeng, X. Zhao, X. Li, J. Tang, and W. Wang, “On entity alignment at scale,” VLDB J., vol. 31, pp. 1009–1033, 2021.
[63] W. Zeng, X. Zhao, J. Tang, X. Li, M. Luo, and Q. Zheng, “Towards entity alignment in the open world: An unsupervised approach,” in Proc. 26th Int. Conf. Database Syst. Adv. Appl., 2021, pp. 272–289.
[64] W. Zeng, X. Zhao, J. Tang, and X. Lin, “Collective entity alignment via adaptive features,” in Proc. IEEE Int. Conf. Data Eng., 2020, pp. 1870–1873.
[65] W. Zeng, X. Zhao, J. Tang, X. Lin, and P. Groth, “Reinforcement learning-based collective entity alignment with adaptive features,” ACM Trans. Inf. Syst., vol. 39, no. 3, pp. 26:1–26:31, 2021.
[66] R. Zhang, B. D. Trisedya, M. Li, Y. Jiang, and J. Qi, “A benchmark and comprehensive survey on knowledge graph entity alignment via representation learning,” VLDB J., vol. 31, no. 5, pp. 1143–1168, 2022.
[67] Z. Zhang et al., “An industry evaluation of embedding-based entity alignment,” in Proc. 28th Int. Conf. Comput. Linguistics, 2020, pp. 179–189.
[68] X. Zhao, W. Zeng, J. Tang, W. Wang, and F. Suchanek, “An experimental study of state-of-the-art entity alignment approaches,” IEEE Trans. Knowl. Data Eng., vol. 34, no. 6, pp. 2610–2625, Jun. 2022.
[69] R. Zhu, M. Ma, and P. Wang, “RAGA: Relation-aware graph attention networks for global entity alignment,” in Proc. Pacific-Asia Conf. Adv. Knowl. Discov. Data Mining, 2021, pp. 501–513.
[70] Y. Zhu, H. Liu, Z. Wu, and Y. Du, “Relation-aware neighborhood matching model for entity alignment,” in Proc. Conf. Assoc. Advance. Artif. Intell., 2021, pp. 4749–4756.

Xiang Zhao received the PhD degree from The University of New South Wales, Australia, in 2013. He is currently a professor with the National University of Defense Technology, China. His research interests include graph data management and mining, with a special focus on knowledge graphs.

Zhen Tan received the PhD degree from National University of Defense Technology (NUDT), China, in 2018. He is currently an associate professor with NUDT. His research interests include knowledge graphs and advanced data analytics.

Xueqi Cheng (Senior Member, IEEE) is currently a professor with the Institute of Computing Technology, Chinese Academy of Sciences. His research interests include network science, web search and data mining, Big Data processing, and distributed computing architecture.