Encoding Conceptual Models for Machine Learning: A Systematic Review
Abstract—Conceptual models are essential in Software and Information Systems Engineering to meet many purposes since they explicitly represent the subject domains. Machine Learning (ML) approaches have recently been used in conceptual modeling to realize, among others, intelligent modeling assistance, model transformation, and metamodel classification. These works encode models in various ways, making the encoded models suitable for applying ML algorithms. The encodings capture the models’ structure and/or semantics, making this information available to the ML model during training. Therefore, the choice of the encoding for any ML-driven task is crucial for the ML model to learn the relevant contextual information. In this paper, we report findings from a systematic literature review which yields insights into the current research in machine learning for conceptual modeling (ML4CM). The review focuses on the various encodings used in existing ML4CM solutions and provides insights into i) which information sources are used, ii) how the conceptual model’s structure and/or semantics are encoded, iii) why the model is encoded, i.e., for which conceptual modeling task, and iv) which ML algorithms are applied. The results aim to structure the state of the art in encoding conceptual models for ML.

Index Terms—Machine learning, Model-driven engineering, Model Encoding, Systematic Literature Review

I. INTRODUCTION

Conceptual modeling (CM) explicitly captures (descriptive and/or prescriptive) domain knowledge where a domain, in an enterprise and information systems engineering context, is anything that is being modeled, including—but not limited to—business processes, information structures, business transactions, and value exchanges, enabling domain understanding and communication among stakeholders [1]. Model-driven engineering (MDE) is a software development approach that emphasizes the use of models¹ as the primary artifacts throughout the entire software development lifecycle. These models can be automatically transformed and refined to generate executable code, documentation, and other artifacts [2].

Applying Machine Learning (ML) techniques, i.e., Deep Learning (DL) and Natural Language Processing (NLP), on data provided by conceptual models has gained much attention in supporting various conceptual modeling tasks such as intelligent modeling assistants [3], model completion [4], model transformation [5], metamodel repository management, and model domain classification [6], [7]. Furthermore, there is a potential to apply ML to publicly available sources of high-quality (F.A.I.R. principles [8]) models to enable reuse, adaptation, and (collaborative) learning, as well as empirical modeling research.

ML on conceptual models aims to “learn” generalized patterns that capture the explicit mapping between the conceptual model’s elements and the domain concepts represented by them. In other words, the trained ML model should be able to answer what the conceptual model represents in terms of the “meaning” of the domain concepts and model elements. ML-based solutions for conceptual modeling follow a specific pattern of first encoding the conceptual model’s semantics in a representation suitable for training ML models. Then, the ML models are trained to learn the knowledge encoded in conceptual models to support CM tasks like metamodel element prediction and domain classification. ML models typically aim to learn generalized patterns from an input dataset by utilizing a certain encoding of the knowledge represented by the conceptual models. Therefore, the encoding constrains what an ML model can learn from the available knowledge in the model. The contextual information that captures representative semantics of the data needs to be accessible to the ML model during training for the ML model to learn semantically rich patterns. Current ML-based CM solutions primarily rely on the lexical terms (i.e., names) used as labels on modeling language primitives (e.g., classes, relations, attributes) to capture the models’ contextual semantics. This leads to a situation where the natural language (NL) semantics of the primitives are encoded. However, additional sources of semantics such as structural semantics, the metamodel semantics, and the CM elements’ ontological semantics are left implicit.

Therefore, various issues arise depending upon the requirements that need to be addressed before applying ML to CM tasks. Firstly, the sources of relevant information need

¹Throughout the paper, we will use the term ‘model’ to refer to a conceptual model and ‘ML model’ to refer to machine learning models.
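The encode-then-train pattern sketched above can be made concrete with a minimal example of our own (it is not taken from any surveyed work); the toy models, domain labels, and the choice of TF-IDF with a k-NN classifier are purely illustrative assumptions:

    # Minimal sketch of the typical ML4CM pipeline: encode the model, then train.
    # The toy models, labels, and algorithm choices are illustrative assumptions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import KNeighborsClassifier

    # Step 1: encode each conceptual model as the bag of its lexical terms
    # (class/attribute names flattened into one "document" per model).
    models = [
        "Customer Order OrderLine placeOrder totalPrice",   # e-commerce domain
        "Patient Doctor Appointment diagnose prescribe",    # healthcare domain
    ]
    labels = ["e-commerce", "healthcare"]

    encoder = TfidfVectorizer()
    X = encoder.fit_transform(models)            # models -> TF-IDF vectors

    # Step 2: train an ML model on the encoded representation.
    clf = KNeighborsClassifier(n_neighbors=1).fit(X, labels)

    # The trained model can now classify the domain of an unseen model.
    unseen = encoder.transform(["Invoice Customer shipOrder"])
    print(clf.predict(unseen))                   # -> ['e-commerce']

Note how the encoding bounds what the classifier can learn here: only the lexical terms are visible to it, while the models’ structure remains implicit.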
the following logically structured search query in eq. (1). Instead of starting from scratch and developing a separate query, we filtered the documents relevant to our study from the SMS. We understand that this approach can be seen as a limitation. However, we chose this alternative for two reasons: i) our query in the SMS is very inclusive (see eq. (1)), and ii) we had already performed a detailed review and were able to easily identify the papers whose contribution goes in the direction of AI towards CM (AI4CM). Note that, due to the nature of our query, we do not include works that do not apply ML in their approach. This implies that works that, e.g., propose non-ML-based similarity metrics based on the structural features of the model graph or on the semantics of the model elements are not within the scope of this work. We aim to conduct a broader review to cover such cases in the future.
TABLE I: Classification scheme keywords description

Attribute | Description | Values
Model Structure | Whether the model’s graph structure is encoded. | Explicit, Implicit, Not Used
Structural Encoding | The model structure encoding type. | Raw Graph, Tree-based, Graph Kernel, Bag of Paths, Axiomatic, N-grams, Manual Metrics
Semantic Data | Whether the semantic data in the model is encoded. | Explicit, Implicit, Not Used
Metamodel Semantics | Whether the metamodel semantics are captured in the encoding. | Yes, No
Ontological Semantics | Whether the model terms are annotated with ontological semantics and further used in model encoding. | Yes, No
Semantic Encoding | The model semantics encoding type. | BoW Word Embeddings, BoW TF-IDF, Raw BoW, One-hot, Raw String, Manual Metrics
Modeling Purpose | The ML-based application for which the model is encoded. | Analysis, Classification, Completion, Refactoring, Repair, Transformation
ML Model | The ML model trained on the encoded models’ data. | Classical Machine Learning, Deep Learning without Graph, Deep Learning with Graph, Reinforcement Learning
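For readers who wish to apply the scheme, Table I can also be rendered as a small data structure against which each surveyed paper is coded; the following sketch uses field names of our own choosing and is not part of the review protocol itself:

    # The classification scheme of Table I rendered as a Python structure,
    # so that each surveyed paper can be coded as one record.
    # Field names are our own illustrative choice.
    CLASSIFICATION_SCHEME = {
        "model_structure":       {"Explicit", "Implicit", "Not Used"},
        "structural_encoding":   {"Raw Graph", "Tree-based", "Graph Kernel",
                                  "Bag of Paths", "Axiomatic", "N-grams",
                                  "Manual Metrics"},
        "semantic_data":         {"Explicit", "Implicit", "Not Used"},
        "metamodel_semantics":   {"Yes", "No"},
        "ontological_semantics": {"Yes", "No"},
        "semantic_encoding":     {"BoW Word Embeddings", "BoW TF-IDF", "Raw BoW",
                                  "One-hot", "Raw String", "Manual Metrics"},
        "modeling_purpose":      {"Analysis", "Classification", "Completion",
                                  "Refactoring", "Repair", "Transformation"},
        "ml_model":              {"Classical Machine Learning",
                                  "Deep Learning without Graph",
                                  "Deep Learning with Graph",
                                  "Reinforcement Learning"},
    }

    def validate(record: dict) -> None:
        """Check that a coded paper only uses values allowed by the scheme."""
        for attribute, value in record.items():
            assert value in CLASSIFICATION_SCHEME[attribute], (attribute, value)

    # Example: a hypothetical paper using raw-graph structure + word embeddings.
    validate({"structural_encoding": "Raw Graph",
              "semantic_encoding": "BoW Word Embeddings"})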
IV. FINDINGS
Fig. 4: Visualization of different model encodings: (a) Structural Encodings, (b) Semantic Encodings

corresponding to the lexical term, where the size of the vector is the total size of the vocabulary; iii) Raw Bag-of-Words (BoW), where the model is represented as a vector containing all lexical model terms; iv) Term Frequency–Inverse Document Frequency (TF-IDF) vector, where the term frequency along with the inverse document frequency of the lexical terms is calculated and the model is then represented by the TF-IDF value of each lexical term in the model; v) BoW embeddings, where each word is represented as a fixed-size vector whose values are produced by a language model pre-trained on a general or domain-specific data corpus; vi) manual metrics, which use specific keywords (e.g., the keywords “set” and “get” in the model serialization) to implicitly capture the model semantics; and vii) Axiomatic representation, which is the same as in the case of structural encoding.

3) Encodings’ usage analysis: We show the analysis of the different encoding pairs used in both the structural and semantic dimensions in Table II. We note several key things from the table. Firstly, the “No Encodings” column for the model structure has the most papers. This is consistent with the fact that 15 out of 37 papers did not include structural encodings (see Fig. 3) and only used semantic data encoding, with TF-IDF as the most common encoding. Secondly, BoW word embeddings and TF-IDF are vector-embeddings-based encodings and are the most common choice to embed the semantic data (19 out of 37). This choice seems logical: if one needs to capture the correlations between the lexical terms of the model, its metamodel, and any ontological semantics associated with the model, then techniques like TF-IDF and pre-trained language models (LMs) capture these correlations more effectively, as they learn (in the case of LMs) or compute (in the case of TF-IDF) word representations over a large vocabulary, thereby providing a more contextual encoding. Several works apply NLP normalization techniques like stop word removal, stemming, and lemmatization before encoding the lexical terms with TF-IDF or LMs. Interestingly, we found that some works do not apply NLP techniques at all when using lexical terms. There are different reasons for this, such as i) using string comparison metrics like the Levenshtein distance⁴; ii) defining own metrics for calculating the similarity between labels of elements [27], [28], which does not require any normalization using NLP; iii) renaming all tokens that are not keywords to a closed set of words (for instance, classes’ names are A, B, C, ..., attribute names are x, y, z, ..., etc.) [5]; or iv) using only specific keywords as lexical terms where the NL semantics of the terms are not relevant [16], [27]. Word embeddings from large language models (LLMs) (GPT⁵, BERT⁶) can encode lexical terms into nuanced, contextualized word embeddings. Therefore, several recent works use LLM-based word embeddings.

Thirdly, the model encoded as a raw graph is the most common encoding technique for capturing the model’s structural information. However, the overall number of such works is significantly lower (8 out of 37). Moreover, we see that raw graph as the structural encoding combined with BoW embeddings as the semantic encoding is a frequent combination. Further analysis showed that cases that use this combination benefit from capturing both the structural and the semantic information, e.g., learning a vector representation of a model [44] and characterizing a model generator [45]. Other path-based encodings, such as N-grams and Bag-of-Paths (BoP), which encode the model as a set of paths, are not frequent. This seems to be due to these encodings’ limitation in sufficiently capturing the model structure. Finally, several works have used manually selected metrics and axiomatic representation to encode the model’s structure and semantics. In these cases, the authors, instead of using the ML model, design their task-specific metrics without applying any encoding.

C. Response to RQ3 – How does the ML purpose correlate with the used encoding and modeling language?

In the following, we elaborate on the different purposes of model encodings for ML4CM.

⁴https://en.wikipedia.org/wiki/Levenshtein_distance
⁵https://openai.com/research/gpt-4
⁶https://huggingface.co/docs/transformers/model_doc/bert
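A minimal sketch may help to make the semantic encodings iii) to v) enumerated above concrete; the toy corpus is our assumption, and random vectors stand in for embeddings that would, in practice, come from a pre-trained language model:

    # Contrast of three semantic encodings over a model's lexical terms:
    # raw BoW counts, TF-IDF weights, and averaged word embeddings.
    # The toy corpus and the random "pre-trained" vectors are assumptions.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    # Lexical terms of two toy models, already normalized (lowercased,
    # camelCase split, stop words removed), as several surveyed works do.
    corpus = ["customer order order line price",
              "patient doctor appointment"]

    # iii) Raw BoW: one count per vocabulary term.
    bow = CountVectorizer().fit(corpus)
    print(bow.transform(corpus).toarray())

    # iv) TF-IDF: counts re-weighted by inverse document frequency.
    tfidf = TfidfVectorizer().fit(corpus)
    print(tfidf.transform(corpus).toarray().round(2))

    # v) BoW embeddings: each term mapped to a fixed-size vector from a
    # pre-trained language model; here random vectors stand in for real
    # ones, and a model is represented by the mean of its term vectors.
    rng = np.random.default_rng(0)
    embedding = {t: rng.normal(size=8) for t in bow.get_feature_names_out()}
    model_vec = np.mean([embedding[t] for t in corpus[0].split()], axis=0)
    print(model_vec.shape)   # (8,)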
TABLE II: Structural and Semantic Encodings across all relevant papers

Semantic Encoding | No Encodings | Manual Metrics | Axiomatic | N-grams | BoP | Graph Kernel | Tree-based | Raw Graph | Total
Axiomatic | | | [16] | | | | | | 1
...
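Since Table II is a co-occurrence matrix over the coded papers, its layout can be reproduced with a simple cross-tabulation; the sketch below codes only three papers whose cells are grounded in the table and text ([16], [44], [45]) and is otherwise illustrative:

    # Building a Table II-style co-occurrence matrix of semantic vs.
    # structural encodings from coded paper records. Only the three
    # records grounded in the text are shown; the rest are omitted.
    import pandas as pd

    papers = pd.DataFrame([
        {"ref": "[16]", "semantic": "Axiomatic",           "structural": "Axiomatic"},
        {"ref": "[44]", "semantic": "BoW Word Embeddings", "structural": "Raw Graph"},
        {"ref": "[45]", "semantic": "BoW Word Embeddings", "structural": "Raw Graph"},
    ])

    table = pd.crosstab(papers["semantic"], papers["structural"],
                        margins=True, margins_name="Total")
    print(table)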
Fig. 6: Structural (left) and semantic (right) encodings with ML models
to encode the model structure. The plot shows that GNN and FFNN are the most used models to capture the model structure. Moreover, GNNs use a raw graph as the model encoding, making GNNs suitable for learning the structural and semantic information from the conceptual model’s graph encoding. Furthermore, tree-based encodings are used to serialize the model as a sequence of tokens [4], [5] to make the model encoding suitable for DL models like Transformers and LSTMs (which do not explicitly capture the model structure) for model completion and transformation. However, tree-based encodings in the current works do not capture longer dependencies, i.e., those exceeding an element’s direct neighbors. In contrast, a raw graph with GNNs allows capturing such information to larger depths, which explains the higher frequency of the combination of raw graphs with GNNs. Multiple models are used with Graph Kernel encoding, where Graph Kernels transform the model into a set of features that are used to apply graph similarity metrics with ML models like SVM, Naive Bayes, and Random Forest. BoP encoding stores the model as a collection of paths such that the paths (or parts of models) allow model similarity comparisons using different ML approaches like KNN, Apriori association rules, or even complex DL models like Transformers, as shown in Fig. 6. Note that N-grams are quite similar to BoP in capturing sequences of vertices that represent relationships, but they do not capture complex relationships compared to BoP [46]. However, it is interesting to note that the N-grams encoding is also used with GaAN, where GaAN compensates for the limitation of N-grams of not capturing complex relationships by capturing longer graph structural dependencies.

Fig. 6 shows that KNN and GNNs are frequently used for semantic encoding. KNN seems to be a common choice due to its simplicity of finding similarity measures of models, where the nearest neighbor of a model is considered a similar model based on the model encoding. The similarity measure enables efficient model comparison and, thereby, classification. Furthermore, KNN is most frequently used with BoW TF-IDF encoding. The common encodings used with GNNs are BoW word embeddings. This shows that GNNs can capture structural semantics by encoding the graph’s structural aspects and semantic data using generalized, semantically rich word embeddings. Moreover, FFNN and KNN are frequently used as general-purpose ML models to encode semantic data with different kinds of encodings. Transformers have been used only with BoW word embeddings because of the Transformer architecture’s capability of fine-tuning generalized word embeddings for a given context. Therefore, the papers that use Transformers first use the generalized word embeddings from pre-trained language models like BERT and then fine-tune these embeddings for their task [4]. Finally, user- and content-based collaborative filtering (UBCF and CBCF) use a one-hot encoding for model element recommendation [30], and K-Means is used with BoW TF-IDF encoding for model classification [36]. There are other ML approaches, such as Random Forest (RF), Naive Bayes (NB), and Inductive Logic Programming, that are rarely used in ML4CM research.
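To make the path-based structural encodings discussed above concrete, the following minimal sketch derives a Bag-of-Paths and its n-grams from a small class-diagram-like graph; the graph, the node names, and the root-to-leaf traversal policy are our illustrative assumptions:

    # Deriving the path-based structural encodings (Bag-of-Paths and
    # n-grams) from a toy class-diagram graph (an assumed example).
    graph = {
        "Customer":  ["Order"],
        "Order":     ["OrderLine", "Invoice"],
        "OrderLine": [],
        "Invoice":   [],
    }

    def paths_from(node, graph, prefix=()):
        """Enumerate all root-to-leaf paths: the Bag-of-Paths (BoP) encoding."""
        prefix = prefix + (node,)
        successors = graph.get(node, [])
        if not successors:
            yield prefix
        for succ in successors:
            yield from paths_from(succ, graph, prefix)

    def ngrams(path, n=2):
        """Slide a window of size n over a path: the n-gram encoding."""
        return [path[i:i + n] for i in range(len(path) - n + 1)]

    bop = list(paths_from("Customer", graph))
    print(bop)   # [('Customer', 'Order', 'OrderLine'), ('Customer', 'Order', 'Invoice')]
    print([g for p in bop for g in ngrams(p)])
    # Bigrams only retain direct-neighbor pairs, which illustrates why
    # n-grams capture less structure than full paths or a raw graph fed
    # to a GNN.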
V. DISCUSSION

This section summarizes our findings, discusses insights, and reflects on the remaining research gaps we observed.

We see in Fig. 3 that there is a lack of metamodel and ontological semantics contributing to the “meaning” of model elements. We consider this lack a first research gap. The relationship of the model elements with the domain is not sufficiently captured by only the lexical terms represented as BoW, TF-IDF, or word embeddings. Using only the NL semantics of words leads to missing out on the contextual semantics provided by the model’s metamodel, which captures model elements’ types, and the ontological semantics, which capture domain concepts. Moreover, providing only type-level information, i.e., that an element is a Class or a Relationship, but not the relationship between the types on the metamodel level, also hides information. Without encoding metamodel or ontological semantics, the ML model misses out on learning type-level semantics, the relationships between types, the properties of types (why is a class abstract, when does a class need to be abstract), common software design patterns, and what kind of foundational ontological stereotype a class should have—all of which is important information that makes the conceptual model a semantically rich artifact.

We further see in Fig. 3 and Table II that in many cases structural encodings are not used. We consider this a second research gap. Moreover, graph encoding techniques like Graph Kernels, which capture local and global neighborhood structures, are underrepresented and can be used to add more structural information to the encoding.

We acknowledge that in our SLR, we have not provided a performance analysis of each of the encodings related to different purposes. However, comparative performance evaluation is difficult because of the lack of standardized datasets for specific purposes and specific modeling languages. There are further no baselines to test the performance of different encodings systematically. In our analysis, we found that out of all the works that make their dataset public, all the datasets are different except for [47], which shows a lack of standardized datasets. Recently, López et al. [7] performed a comparative analysis of ML encodings for the domain classification task. However, there is a need for similar studies for other ML4CM tasks because, as we see from our analysis, the choice of encoding is task-dependent.

VI. CONCLUSION

In this paper, we provided an SLR-based detailed analysis of the various encodings used in the context of machine learning for conceptual modeling (ML4CM), i.e., using ML methods to support CM tasks. We zoomed into what information from the model is encoded, i.e., its semantics and/or structure. We then analyzed how the information is encoded, thereby identifying 14 different encodings for structural and semantic aspects. Then we analyzed why the model information is encoded, i.e., to solve what task. Finally, we analyzed the relationship between the ML models and the proposed encodings, as well as the purposes reported in the literature. Based on the findings, as part of our future work, we plan to conduct a systematic comparative study of different encodings for various ML4CM purposes and use a specific dataset to produce benchmarks for other researchers to use.

REFERENCES

[1] H. A. Proper and G. Guizzardi, “Modeling for enterprises; let’s go to rome via rime,” hand, vol. 1, p. 3, 2022.
[2] M. Brambilla, J. Cabot, and M. Wimmer, “Model-driven software engineering in practice,” Synthesis Lectures on Software Engineering, vol. 3, no. 1, pp. 1–207, 2017.
[3] R. Saini, G. Mussbacher, J. L. Guo, and J. Kienzle, “Domobot: a bot for automated and interactive domain modelling,” in Proceedings of the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings, 2020, pp. 1–10.
[4] M. Weyssow, H. Sahraoui, and E. Syriani, “Recommending metamodel concepts during modeling activities with pre-trained language models,” Softw. Syst. Model., vol. 21, no. 3, pp. 1071–1089, 2022.
[5] L. Burgueño, J. Cabot, S. Li, and S. Gérard, “A generic lstm neural network architecture to infer heterogeneous model transformations,” Softw. Syst. Model., vol. 21, no. 1, pp. 139–156, 2022.
[6] P. T. Nguyen, J. Di Rocco, D. Di Ruscio, A. Pierantonio, and L. Iovino, “Automated classification of metamodel repositories: a machine learning approach,” in 22nd International Conference on Model Driven Engineering Languages and Systems, 2019, pp. 272–282.
[7] J. A. H. López, R. Rubei, J. S. Cuadrado, and D. Di Ruscio, “Machine learning methods for model classification: a comparative study,” in Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems, 2022, pp. 165–175.
[8] A. Jacobsen, R. de Miranda Azevedo, N. Juty, D. Batista, S. Coles, R. Cornet, M. Courtot, M. Crosas, M. Dumontier, C. T. Evelo et al., “Fair principles: interpretations and implementation considerations,” pp. 10–29, 2020.
[9] D. Bork, S. J. Ali, and B. Roelens, “Conceptual modeling and artificial intelligence: A systematic mapping study,” arXiv preprint arXiv:2303.06758, 2023.
[10] R. Clarisó and J. Cabot, “Applying graph kernels to model-driven engineering problems,” in 1st International Workshop on Machine Learning and Software Engineering in Symbiosis, 2018, pp. 1–5.
[11] B. Kitchenham, S. Charters et al., “Guidelines for performing systematic literature reviews in software engineering,” 2007.
[12] Ö. Babur and L. Cleophas, “Using n-grams for the automated clustering of structural models,” in 43rd Int. Conf. on Current Trends in Theory and Practice of Computer Science, 2017, pp. 510–524.
[13] B. K. Sidhu, K. Singh, and N. Sharma, “A machine learning approach to software model refactoring,” International Journal of Computers and Applications, vol. 44, no. 2, pp. 166–177, 2022.
[14] G. Lin, X. Kang, K. Liao, F. Zhao, and Y. Chen, “Deep graph learning for semi-supervised classification,” Pattern Recognition, vol. 118, p. 108039, 2021.
[15] J. A. H. López and J. S. Cuadrado, “An efficient and scalable search engine for models,” Softw. Syst. Model., vol. 21, no. 5, pp. 1715–1737, 2022.
[16] M. Fumagalli, T. P. Sales, and G. Guizzardi, “Towards automated support for conceptual model diagnosis and repair,” in Advances in Conceptual Modeling: ER 2020 Workshops. Springer, 2020, pp. 15–25.
[17] A. Khalilipour, F. Bozyigit, C. Utku, and M. Challenger, “Categorization of the models based on structural information extraction and machine learning,” in Proceedings of the INFUS 2022 Conference, Volume 2, 2022, pp. 173–181.
[18] A. D. P. Lino and A. Rocha, “Automatic evaluation of erd in e-learning environments,” in 2018 13th Iberian Conference on Information Systems and Technologies (CISTI). IEEE, 2018, pp. 1–5.
[19] M. Essaidi, A. Osmani, and C. Rouveirol, “Model-driven data warehouse automation: A dependent-concept learning approach,” in Advances and Applications in Model-Driven Engineering, 2014, pp. 240–267.
[20] M. Fumagalli, T. P. Sales, and G. Guizzardi, “Pattern discovery in conceptual models using frequent itemset mining,” in 41st International Conference on Conceptual Modeling, 2022, pp. 52–62.
[21] J. Yu, M. Gao, Y. Li, Z. Zhang, W. H. Ip, and K. L. Yung, “Workflow performance prediction based on graph structure aware deep attention neural network,” J. Ind. Inf. Integr., vol. 27, p. 100337, 2022.
[22] D. Bork, S. J. Ali, and G. M. Dinev, “Ai-enhanced hybrid decision management,” Bus. Inf. Syst. Eng., vol. 65, no. 2, pp. 1–21, 2023.
[23] M. H. Osman, M. R. Chaudron, and P. Van Der Putten, “An analysis of machine learning algorithms for condensing reverse engineered class diagrams,” in 2013 IEEE International Conference on Software Maintenance. IEEE, 2013, pp. 140–149.
[24] A. Burattin, P. Soffer, D. Fahland, J. Mendling, H. A. Reijers, I. Vanderfeesten, M. Weidlich, and B. Weber, “Who is behind the model? classifying modelers based on pragmatic model features,” in 16th Int. Conference on Business Process Management, 2018, pp. 322–338.
[25] A. Barriga, R. Heldal, L. Iovino, M. Marthinsen, and A. Rutle, “An extensible framework for customizable model repair,” in Proceedings of the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, 2020, pp. 24–34.
[26] F. Basciani, J. Di Rocco, D. Di Ruscio, L. Iovino, and A. Pierantonio, “Automated clustering of metamodel repositories,” in 28th Int. Conf. on Advanced Information Systems Engineering, 2016, pp. 342–358.
[27] A. Adamu, S. M. Abdulrahman, W. M. N. W. Zainoon, and A. Zakari, “Model matching: Prediction of the influence of uml class diagram parameters during similarity assessment using artificial neural network,” Deep Learning Approaches for Spoken and Natural Language Processing, pp. 97–109, 2021.
[28] A. Elkamel, M. Gzara, and H. Ben-Abdallah, “An uml class recommender system for software design,” in 13th International Conference of Computer Systems and Applications. IEEE, 2016, pp. 1–8.
[29] X. Dolques, M. Huchard, C. Nebut, and P. Reitz, “Learning transformation rules from transformation examples: An approach based on relational concept analysis,” in 14th IEEE Int. Enterprise Distributed Object Computing Conference Workshops. IEEE, 2010, pp. 27–32.
[30] L. Almonte, S. Pérez-Soler, E. Guerra, I. Cantador, and J. de Lara,
“Automating the synthesis of recommender systems for modelling
languages,” in Proceedings of the 14th ACM SIGPLAN International
Conference on Software Language Engineering, 2021, pp. 22–35.
[31] M. Eisenberg, H.-P. Pichler, A. Garmendia, and M. Wimmer, “Towards
reinforcement learning for in-place model transformations,” in 2021
ACM/IEEE 24th International Conference on Model Driven Engineering
Languages and Systems (MODELS). IEEE, 2021, pp. 82–88.
[32] J. Di Rocco, C. Di Sipio, D. Di Ruscio, and P. T. Nguyen, “A gnn-
based recommender system to assist the specification of metamodels
and models,” in ACM/IEEE 24th International Conference on Model
Driven Engineering Languages and Systems. IEEE, 2021, pp. 70–81.
[33] P. T. Nguyen, D. Di Ruscio, A. Pierantonio, J. Di Rocco, and L. Iovino,
“Convolutional neural networks for enhanced classification mechanisms
of metamodels,” J. Syst. Softw., vol. 172, p. 110860, 2021.
[34] R. Rubei, J. Di Rocco, D. Di Ruscio, P. T. Nguyen, and A. Pierantonio,
“A lightweight approach for the automated classification and clustering
of metamodels,” in ACM/IEEE Int. Conf. on Model Driven Engineering
Languages and Systems Companion (MODELS-C), 2021, pp. 477–482.
[35] P. T. Nguyen, J. Di Rocco, L. Iovino, D. Di Ruscio, and A. Pierantonio,
“Evaluation of a machine learning classifier for metamodels,” Softw.
Syst. Model., vol. 20, no. 6, pp. 1797–1821, 2021.
[36] Ö. Babur, L. Cleophas, T. Verhoeff, and M. van den Brand, “Towards
statistical comparison and analysis of models,” in 2016 4th International
Conference on Model-Driven Engineering and Software Development
(MODELSWARD). IEEE, 2016, pp. 361–367.
[37] Ö. Babur, L. Cleophas, and M. van den Brand, “Hierarchical clustering
of metamodels for comparative analysis and visualization,” in European
Conference on Modelling Foundations and Applications, 2016, pp. 3–18.
[38] ——, “Metamodel clone detection with samos,” Journal of Computer
Languages, vol. 51, pp. 57–74, 2019.
[39] V. Borozanov, S. Hacks, and N. Silva, “Using machine learning tech-
niques for evaluating the similarity of enterprise architecture models,”
in Int. Conf. on Advanced Information Systems Engineering, 2019, pp.
563–578.
[40] P. Danenas and T. Skersys, “Exploring natural language processing in model-to-model transformations,” IEEE Access, vol. 10, pp. 116942–116958, 2022.
[41] L. Burgueño, R. Clarisó, S. Gérard, S. Li, and J. Cabot, “An nlp-
based architecture for the autocompletion of partial domain models,”
in Advanced Information Systems Engineering: 33rd International Con-
ference, CAiSE 2021. Springer, 2021, pp. 91–106.
[42] M. Goldstein and C. González-Álvarez, “Augmenting modelers with se-
mantic autocompletion of processes,” in Business Process Management
Forum: BPM Forum 2021. Springer, 2021, pp. 20–36.
[43] G. M. Lahijany, M. Ohrndorf, J. Zenkert, M. Fathi, and U. Kelter, “Identibug: Model-driven visualization of bug reports by extracting class diagram excerpts,” in 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2021, pp. 3317–3323.
[44] S. J. Ali, G. Guizzardi, and D. Bork, “Enabling representation learning
in ontology-driven conceptual modeling using graph neural networks,”
in Int. Conf. on Advanced Information Systems Engineering, 2023.
[45] J. A. H. López and J. S. Cuadrado, “Towards the characterization
of realistic model generators using graph neural networks,” in 2021
ACM/IEEE 24th International Conference on Model Driven Engineering
Languages and Systems (MODELS). IEEE, 2021, pp. 58–69.
[46] B. Li, T. Liu, Z. Zhao, P. Wang, and X. Du, “Neural bag-of-ngrams,”
in AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017.
[47] J. A. H. López, J. L. Cánovas Izquierdo, and J. S. Cuadrado, “Modelset:
a dataset for machine learning in model-driven engineering,” Softw. Syst.
Model., pp. 1–20, 2022.