
Received 28 September 2022, accepted 12 October 2022, date of publication 4 November 2022, date of current version 9 November 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3219455

Exploring Natural Language Processing in Model-To-Model Transformations

PAULIUS DANENAS¹ AND TOMAS SKERSYS¹,²
¹Center of Information Systems Design Technologies, Kaunas University of Technology, 51423 Kaunas, Lithuania
²Department of Information Systems, Kaunas University of Technology, 51368 Kaunas, Lithuania

Corresponding author: Paulius Danenas ([email protected])

ABSTRACT In this paper, we explore the possibility of applying natural language processing in visual model-to-model (M2M) transformations. We present our research results on information extraction from text labels in process models modeled using Business Process Modeling Notation (BPMN) and use case models depicted in Unified Modeling Language (UML), using the most recent developments in natural language processing (NLP). Here, we focus on three relevant tasks, namely, the extraction of verb/noun phrases that would be used to form relations, the parsing of conjunctive/disjunctive statements, and the detection of abbreviations and acronyms. Techniques combining state-of-the-art NLP language models with formal regular-expression grammar-based structure detection were implemented to solve the relation extraction task. To achieve these goals, we benchmark the most recent state-of-the-art NLP tools (CoreNLP, Stanford Stanza, Flair, Spacy, AllenNLP, BERT, ELECTRA), as well as custom BERT-BiLSTM-CRF and ELMo-BiLSTM-CRF implementations, trained with certain data augmentations to improve performance on the most ambiguous cases; these tools are further used to extract noun and verb phrases from short text labels generally used in UML and BPMN models. Furthermore, we describe our attempts to improve these extractors by solving the abbreviation/acronym detection problem using machine learning-based detection, as well as by processing conjunctive and disjunctive statements, due to their relevance to performing advanced text normalization. The obtained results show that the best phrase extraction and conjunctive phrase processing performance was obtained using the Stanza-based implementation, yet our trained BERT-BiLSTM-CRF outperformed it for the verb phrase detection task. While this work was inspired by our ongoing research on partial model-to-model transformations, we believe it to be applicable in other areas requiring similar text processing capabilities as well.

INDEX TERMS Information extraction, relation extraction, acronym detection, process models, use-case
models, natural language processing, model-to-model transformation.

I. INTRODUCTION

As one of the most established topics in natural language processing (NLP), information extraction is focused on extracting various structures of interest from unstructured textual information. Recent advances in the deep learning and NLP fields enable the development of high-performing models by using large amounts of data and wide contexts to automatically extract relevant features, which can be transferred and reused in other related tasks. Such techniques enable complex context-driven detection of grammatical and semantic inconsistencies [1], extraction of relations, aspects, or entities [2], [3], tagging entities of interest in the text [4], deduplication, identifying similarities or synonymous forms [5], and solving other similar problems. Moreover, successful implementation of such tasks requires fundamental knowledge about multiple techniques at the intersection of information retrieval, computational linguistics, ontology engineering, and machine learning.

This work is inspired by our previous research on NLP-enhanced information extraction in model-to-model transformations [6], [7]. However, the need for similar solutions was also identified in other areas involving visual modeling, such as business process modeling [8], [9], [10].

The associate editor coordinating the review of this manuscript and approving it for publication was Arianna Dulizia.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
116942 VOLUME 10, 2022
P. Danenas, T. Skersys: Exploring Natural Language Processing in Model-To-Model Transformations

In this paper, we address the issue of relation extraction from graphical models focused on the detection of semantic relationships within the given text. More specifically, we aim to extract subject-verb relations which can be easily extended to triplets (subject, verb, object) using associative or compositional relationships from the source model (for instance, a Use Case element is usually associated with one or more Actors using an Association relationship). Therefore, such relationships will be defined between two or more entities and represent certain connections between them. Many recent papers address relation detection between entities of predefined types (such as PERSON, ORGANIZATION, LOCATION) and their semantic relations using supervised learning [11], while we aim to perform more generalized extraction by extracting all available verb and noun pairs. This is not a trivial task, although it has been previously addressed in document processing using pattern-based analysis [12], distant supervision [13], [14] and rule-based extraction systems [15]. In addition to the extraction of verb/noun phrases from the text labels, in this paper, we also study the problem of identifying and properly interpreting abbreviations and acronyms, which is a very relevant topic in model-driven systems development, especially in the field of automated model transformations. While it may be handled using external sources, like acronym databases, dictionaries, or thesauri, real-world cases may be more complex to interpret due to ambiguities, contextual dependency, or simply the lack of proper text formatting (for instance, acronyms may be written in lowercase if a less formal communication or discourse context is considered, such as chatbots or tweets). Finally, we address the problem of processing conjunctive/disjunctive statements by parsing them into multiple "subject-verb" relations. In the context of our research, they can later be combined with related elements to form valid associative relations (triplets). This is also a sophisticated problem due to the natural language ambiguities or inconsistency in the underlying NLP technology. All the above-mentioned issues are discussed in more detail in Section III.

The main objective of this paper is to evaluate the capabilities of the most recent developments in NLP for processing text labels in graphical models and to validate their suitability by performing the extraction of noun/verb phrases from the names of model elements under certain real-world conditions and constraints which are usually not addressed in more generalized NLP-related research. To solve these problems, we first identify and enumerate multiple anti-patterns for naming model elements extracted from a real-world dataset which complicate this task and should be handled separately by using additional techniques. Further, we apply deep learning-based sequence tagging models, pretrained with augmented data to address some of these ambiguities, and combine them with predefined formal grammar-based extraction. In this paper, we specifically consider the processing of text labels in graphical models created using two prominent visual modeling standards, namely, Business Process Model Notation (BPMN) [16] and Unified Modeling Language (UML) [17]. To our knowledge, this research is one of the first attempts to apply novel deep-learning-driven techniques for the extraction of information from such models. Additionally, we provide evaluations of two related tasks, namely, conjunctive/disjunctive statement processing and acronym detection, which may significantly enhance the performance of our developed relation extractors in this context. We consider our findings to be also applicable to other NLP topics that involve the processing of similar texts, such as process mining, aspect-based sentiment analysis, or conversational intelligence.

Further in this paper, Section II gives a short introduction to model-to-model transformations with their reliance upon NLP functionality and provides a concise review of NLP techniques that we consider to be relevant to our research and model-to-model transformations in general. Section III summarizes the main challenges, which must be addressed when solving similar problems, and provides a structured list of element naming anti-patterns, which provide additional noise during automated text processing and illustrate the complexity of this problem. Further, solutions for three inter-related tasks are discussed: Section IV describes the verb/noun extraction task and the experimental results on this subject; Section V deals with the processing of conjunctive and disjunctive statements; similarly, Section VI presents abbreviation and acronym detection challenges together with the corresponding experimental results. Section VII provides a discussion of our experimental findings, the identified issues, and possible improvements. Finally, the paper is concluded with Section VIII providing certain insights on the future work and conclusions.

II. INTRODUCING NLP TO MODEL-TO-MODEL (M2M) TRANSFORMATIONS
Let us assume that a system analyst has a valid UML use case model, created either by himself or obtained from external parties, which he intends to use as a part of some system specification. Therefore, he wants to use it as a source of knowledge to develop a conceptual data model for that business domain in the form of a UML class model. Model-to-model transformations enable direct reuse of the input model without the need to manually develop the target model; they also provide the benefits of transferring and reusing the whole logic of model transformations for other instances. Unfortunately, existing solutions provide only complete model transformations which are quite rigid due to their solid formal foundations and are very limited for integrations with complementary functionality, such as natural language processing [18], [19]. Therefore, in this section we will rely on our own development [7], [20] to demonstrate use cases for NLP-based transformations, as our solution provides the ability for the user to use intuitive drag and drop actions on certain model source elements, as well as provides relevant extension points to integrate required functionality. These actions trigger selective transformation actions to generate a set of one or more related target model




elements and represent those elements in the opened target model diagram.

FIGURE 1. Illustrative example of a partial M2M transformation using drag and drop action.

In our example, we use the UML use case model as the source model, and the UML class model as the target model. Furthermore, we present the situation where it is necessary to apply more advanced processing to produce a semantically valid fragment of a target model. We assume that the user dragged Actor element Customer from the UML use case model onto the opened UML class diagram Order Management (Fig. 1, tag 1), which triggered a transformation action to execute the specific transformation specification. This specification is visually designed and is specified to be executed particularly after an action, dragging an Actor element from the use case model onto the UML class diagram, is triggered. The transformation specification instructs the transformation engine to select the Customer element together with instances of Use Case elements associated with this Actor and transform them into UML Class elements and a set of UML Associations connecting those classes. Now, we assume that in the exemplary use case model, Customer is associated with two Use Case elements, particularly, Return back item and Fill-in complaint form. This results in the generation of a UML class diagram fragment as presented in Fig. 1, tag 2.

While at first sight this would seem like a straightforward and simple transformation, this particular example illustrates a situation where certain NLP processing is already required to acquire a correct result. The reason behind this is that the conditions defining the extraction of multi-word verb and/or noun phrases are non-trivial. In our case, the association between the two classes Customer and Item is named as the two-word phrase return back, which is extracted from the name of the source element, particularly Use Case element Return back item. Moreover, actual verb phrases are not limited to one or two words, like phrasal-prepositional verbs containing both particle and preposition (come up with) or even distributed in the whole phrase, e.g. when the particle is after the object (associate the object with), although such cases are observed less frequently in the formal language used in modeling practice.

The above-mentioned examples are just sample cases where straightforward text chunking is not sufficient and certain involvement of NLP technology is required to obtain correct transformation results. Further, we provide more examples which may require additional steps for linguistic preprocessing:
• Hierarchical relations created after one element is dragged onto another if text labels of these elements match some form of the semantic relationships (such as generalization, synonymy, hyponymy, hypernymy or holonymy)
• Entity deduplication when multiple entries have the same meaning but different expressions. In some cases they are not considered synonyms; for instance, acronym and abbreviation resolution does not result in synonymous entries but rather in duplicate representations
• Processing of more complex phrasal structures like conjunctions/disjunctions, or combinations of the above (e.g. create invoice and send it to the manager). This may also include mining of ternary associations or relationships, as well as identifying possible co-references
• Text normalization, such as having two sets of elements that differ only in syntactic structures. For instance, consider two sets of associated elements in the source model, Actor Administrator and Use Case Monitors instance, and Actor Administrator and Use Case Monitor instance. The only difference here lies in the present tense form of the verb monitors, where normalization to the infinitive form monitor would result in deduplication of output elements, and hence, a more clarified and concise output model. While this is a very straightforward and less likely scenario, more sophisticated cases may involve disambiguation of acronyms, or detection of missing words as well as grammatical errors.

Furthermore, we list the main NLP fields which could be applicable in this context in Table 1, together with our insights on their further applications in improving the quality of model-to-model transformations. Most of them will not be considered in this research, yet they are proposed as additional extension points for improving the final pipeline. Moreover, this table is also supplemented with core techniques used to solve these problems; it is clearly indicated that deep learning techniques are the most widely researched and applied to solve these problems. For more extensive reviews of the techniques, as well as more discussions on their weaknesses or future prospects, we refer to recent survey NLP papers such as [76], [77], [78], [79], and [80]. Additionally, their performance can be significantly boosted after applying transfer learning with pretrained language models, such as BERT [39], ELMo [41], RoBERTa [81], ELECTRA [82], XLNet [83], T5 [84] or Microsoft's DeBERTa [85]. Therefore, from the technological point of view, one would need to consider the integration of deep learning based techniques that require certain technological constraints to be satisfied. This is the first work which tries to bridge these two fields by performing a thorough evaluation of the existing NLP implementations for processing short text labels, which is required in the context of model-to-model transformations.
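To make the transformation scenario concrete, the following minimal sketch mimics the selective transformation from Fig. 1. The `transform_actor` and `Association` names and the toy phrase extractor are hypothetical illustrations of the idea, not part of the authors' tool:

```python
from dataclasses import dataclass

@dataclass
class Association:
    source: str
    target: str
    name: str

def transform_actor(actor, use_case_labels, extract):
    """Map a dragged Actor and its Use Cases to target classes and associations.

    extract(label) -> (verb_phrase, noun_phrase) stands in for the NLP
    pipeline discussed in this paper.
    """
    classes, associations = {actor}, []
    for label in use_case_labels:
        verb_phrase, noun_phrase = extract(label)
        target = noun_phrase.title()  # the noun phrase becomes a target Class
        classes.add(target)
        associations.append(Association(actor, target, verb_phrase))
    return classes, associations

# Toy extractor with hand-made outputs for the Fig. 1 example
toy = {"Return back item": ("return back", "item"),
       "Fill-in complaint form": ("fill-in", "complaint form")}
classes, assocs = transform_actor("Customer", toy, toy.get)
```

With the two Use Cases from Fig. 1, this yields classes Customer, Item, and Complaint Form, and associations named return back and fill-in.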




TABLE 1. Applicability of NLP techniques in M2M transformations.

III. RELATION EXTRACTION-RELATED TEXT LABELING ANTI-PATTERNS
In this section, we enumerate a set of modeling and element naming issues, which make the automated processing of labels in graphical models rather intricate. While certain modeling best practices are generally considered in modeling [86], [87], actual real-world cases tend to contain various issues (such as linguistic or modeling ambiguities) making it very difficult to be dealt with using automated tools. Hence, if the processing of text labels created following




TABLE 2. Anti-patterns for naming activity-like model elements.

best modeling practices could be considered as a relatively uncomplicated task (assuming that the tagging bias of the underlying implementation is not considered), significant deviations might easily complicate it.

To identify the most common text labeling issues in graphical models, we used a large dataset provided by the BPM Academic Initiative (BPMAI) [88], which contained over 4100 real-world process models presented in BPMN notation. We excluded instances that did not meet certain requirements (e.g., all the elements in the models were named using single letters without any semantic meaning, or the text labels were not in English). Labels from the BPMN Task elements were extracted from the remaining models as one of the main objects of interest in our research. After analyzing the extracted labels, a set of naming anti-patterns for activity-like Task elements was formed (Table 2) together with examples and some heuristic rules for detecting these anti-patterns; in our opinion, the latter could be applied for the initial screening and filtering tasks in other types of graphical models as well.

The detection rules are not formal in any way but can be used as guidelines to identify the cases of anti-patterns. Also, morphosyntactic analysis might have to be carried out to properly detect sophisticated cases of element naming anti-patterns in graphical models. Moreover, other elements representing subjects or entities (such as BPMN Lane, Pool elements) may contain invalid names as well, including multiple subjects, phrases, or some of the anti-patterns from Table 2.

It is worth noting that some of the observed naming cases indicated invalid modeling practices, for instance, naming activity elements as conditions or decision points (e.g., Available, Yes, Check if available). Naming activities as whole triplets <actor-relationship-activity> is yet another quite common bad modeling practice used in modeling processes. The latter should be transformed into a combination of a BPMN Lane or Pool element with an activity-like element in it. One may also identify cases that combine multiple anti-patterns; for instance, the name of an activity may contain conjunctive/disjunctive clauses relating multiple verb phrases into one rambling text (e.g., Mark the invoice as invalid and return to customer), which increases the complexity of NLP tasks to a whole new level. Even though resolving conjunctive/disjunctive clauses is a challenging task, it can still be processed by using dependency parsing-based extraction, which is further addressed in Section V.

IV. PHRASE EXTRACTION EXPERIMENT
In this section, we evaluate the capabilities of the existing NLP tools to properly extract noun/verb phrases from the given text labels. This task is closely related to the relation extraction task, given its goal to extract tuples (verb phrase, noun phrase) from the given chunk of text that can further be used to construct semantic associative relations after combining with semantics from the source models (e.g., associative relationships between UML Use Case and Actor elements). Moreover, this task is important for successful model-to-model transformations because the extracted tuples are used to generate sets of elements for various target models or augment the existing models with additional elements.
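The heuristic screening rules mentioned in Section III can be approximated with simple lexical checks. The rules below are illustrative simplifications in the spirit of the anti-patterns discussed above; the function and rule names are ours, not taken from Table 2:

```python
import re

def label_issues(label):
    """Flag a few activity-naming anti-patterns with rough heuristics."""
    issues = []
    if len(label.strip()) <= 1:
        # e.g. elements named with single letters, without semantic meaning
        issues.append("single-letter or empty label")
    if re.search(r"\b(and|or)\b", label, re.IGNORECASE):
        # e.g. "Mark the invoice as invalid and return to customer"
        issues.append("conjunctive/disjunctive clause")
    if re.match(r"\s*(yes|no|available)\b", label, re.IGNORECASE) or label.rstrip().endswith("?"):
        # e.g. "Available", "Yes" used as activity names
        issues.append("condition or decision point used as activity name")
    return issues
```

Such checks only serve for initial screening; as noted above, properly detecting sophisticated cases would still require morphosyntactic analysis.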




Further, we present the basic aspects of our experimentation on extracting noun/verb phrases from the text labels extracted from the real-world dataset which contains BPMN process models and UML use case models; both types of these models contain activity-like elements which are subjects for specific processing. Section IV-A describes the preliminaries and setup of the experiment, while Section IV-B presents the evaluation methodology; in its turn, Section IV-C elaborates on the main findings in this experiment.

A. EXPERIMENT SETUP
Information extraction (and more specifically, relation extraction) is widely supported by multiple commercial and academic engineering efforts that provided multiple options for selecting the initial starting point for our research. While new techniques emerge frequently, they are based on the generally available text corpora that do not provide the flexibility and specificity required to fulfill our goals. More specifically, our initial testing of such tools helped us to recognize the possibility of confusion in verb/noun recognition if the infinitive verb form is used – this is not handled correctly by generic POS tagger tools. On the other hand, the development of specialized datasets is usually challenging and time-demanding.

Therefore, given the lack of specialized resources required for successful implementation, we chose to adopt and test existing tools by complementing them with additional extraction functionality and applying certain enhancements to the existing ones. Moreover, some of these toolkits provide implementations for a wide array of related problems, such as tokenization, POS tagging, lemmatization, syntactic analysis, dependency parsing, co-reference resolution, or relation extraction, which may significantly enhance the required pipelines. Additionally, some libraries provide other interesting tools; for instance, Stanford CoreNLP [89] also provides a natural logic annotator that enables quantifier detection and annotation, as well as CRF-based true case recognition, which is also important for knowledge base acquisition and normalization and relates to the problems addressed in this work; while quantifier detection is not among such issues, it can be tested and integrated into the future pipelines as well.

Further, we list the set of implementations selected for our experimental implementation and evaluation¹:
• Stanford CoreNLP toolkit [89] which relies on conditional random field (CRF) implementations for performing both part-of-speech tagging and NER-related tasks.
• Spacy [23] framework, which applies convolutional neural networks.
• Stanford Stanza [90] which uses Bi-LSTM to implement components and pipelines for multiple NLP tasks such as tokenization, lemmatization, POS tagging, and dependency/constituency parsing.
• Flair [24] toolkit by Zalando Research, which applies pooled contextualized embeddings together with deep recurrent neural networks, as well as provides its pretrained language models.
• AllenNLP [25] which relies on deeply contextualized ELMo embeddings based on combined character-level CNN and Bi-LSTM architecture.
• BERT [39], one of the most dominant techniques in NLP at the moment of writing this paper, based on transformer architecture and masked language modeling.
• ELECTRA [82] which is an improvement over BERT that applies token replacements with plausible alternatives sampled from a generative network during model training, instead of using masked tokens. The main goal of the model is to predict whether the corrupted input was replaced with a generator sample. The ELECTRA authors show that this task is more efficient than BERT and the final model is capable of substantially outperforming the BERT model in terms of model size, amount of computing and scalability [82].

The fact that these tools use different machine learning or deep learning approaches to solve NLP tasks has also motivated us to test their performance in the context of our approach. In this work, we use the BERT² and ELECTRA³ implementations from the Hugging Face repository, which are already fine-tuned for part-of-speech tagging tasks. Additionally, we developed our own taggers that were biased towards the recognition of conflicting verb forms by performing augmentations of the original text inputs with their copies containing infinitive verb forms as replacements for the original ones; a similar approach was successfully applied in our previous work to improve performance for a base CRF-based tagger [6]. The OntoNotes corpus [91] was used as the base data source due to its resemblance to the communication cases observed in graphical process and system models.

For the reference implementation, we selected the Bi-LSTM-CRF architecture [37] which has been proven to be the best performing one at the time of writing. It consists of a single input embeddings layer, a bidirectional LSTM hidden layer to process both past and future features, and a CRF layer at the output, which helps to improve tagging accuracy by learning and applying constraints over the sentence level to simultaneously optimize the labeling output and ensure its validity. For our experimental purposes, we implemented two versions of our customized taggers:
• BERT-BiLSTM-CRF that uses original pretrained BERT embeddings at the input layer,
• ELMo-BiLSTM-CRF that relies on ELMo embeddings at the input layer.

For training these models, CRF (also known as Viterbi) loss, based on the maximization of the conditional probability, was used; for more details on its derivation, we refer

¹ The final datasets, experimental code and results are available at https://github.com/paudan/m2m-nlp-experiment
² https://huggingface.co/vblagoje/bert-english-uncased-finetuned-pos
³ https://huggingface.co/danielvasic/en_acnl_electra_pipeline
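The verb-form augmentation idea described above can be illustrated as follows. This is a minimal sketch: the tiny inflection table stands in for SimpleNLG-style normalization, and the function name is ours:

```python
# Each tagged sentence is duplicated with its verbs replaced by infinitive
# forms, so a tagger trained on the result keeps the VERB tag for
# imperative-style labels such as "Monitor instance".
INFINITIVE = {"monitors": "monitor", "creates": "create", "sends": "send"}

def augment(tagged_sentence):
    """tagged_sentence: list of (token, POS tag) pairs."""
    augmented = [tagged_sentence]
    infinitive_copy = [
        (INFINITIVE.get(token.lower(), token), tag) if tag == "VERB" else (token, tag)
        for token, tag in tagged_sentence
    ]
    if infinitive_copy != tagged_sentence:
        augmented.append(infinitive_copy)
    return augmented
```

The augmented copies are added to the training data alongside the originals, biasing the tagger towards recognizing infinitive verb forms that generic POS taggers often confuse with nouns.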




Listing 1. Formal grammar for extraction used with Spacy, Flair and Stanza.

experiment (this is later addressed in Section V). However,


having a single verb phrase with multiple noun phrases
in conjunctive or disjunctive form could be considered
processable and would result in multiple valid tuples of target
Listing 2. Formal grammar for extraction used with CoreNLP and
transformation outputs.
ELECTRA implementations. After performing the aforementioned steps, we obtained
a dataset of 4044 valid entries that were then used to
manually extract verb phrase and noun phrase pairs. The
to [37] and [92]. Moreover, learning rate was set to 0.1, whole extraction procedure was performed by the authors of
the hidden layer size was set to 128, and the early stopping this paper. These pairs were set as a ‘‘golden standard’’ to
parameter for termination, if no convergence is further validate the outputs acquired from the automatic extraction
observed, was set to 10. SimpleNLG library [93] was used using selected extractors. Hence, the final dataset included
to normalize tense for verb phrases, while NLTK [22] toolkit 328 instances having no verb phrases, and 3716 instances
was used to implement text chunking with POS tags obtained as an output from the above-mentioned tools.

Listing 1 represents the formal grammar, based on regular expressions (regex) over part-of-speech tags, which was used for noun/verb phrase extraction. It relies upon the Universal Dependencies scheme [94], which is used by the Spacy, Flair, and Stanza tools. Here, NP defines a noun phrase, VP defines a verb phrase, and PNP defines a proper noun phrase. As the Stanford Core NLP and ELECTRA pretrained implementations use Penn Treebank notation for their POS tagger output, the grammar is adjusted for these cases (Listing 2); here, additionally, ADP defines an adposition, and ANP a partial noun phrase, which is further used as a block in NP extraction.

The datasets used during the experiments were obtained after pre-processing a relatively large number of BPMN process and UML use case models obtained from various sources. The final experimentation set of such models consisted of:
• 32 BPMN process models and 25 UML models that were collected freely from the Internet;
• a large sample of preprocessed and cleansed BPMN process models, which were selected from a large set of Signavio BPMN models provided by the BPM Academic Initiative [88].

The acquired final set of models was processed, and the names of Task elements (for BPMN process models) and Use Case elements (for UML use case models) were extracted for experimentation. It was expected that Task and Use Case elements would contain at least one verb or verb phrase, and one noun or noun phrase. The extracted elements were cleaned from semantic inconsistencies, grammatical errors, invalid names, and common modeling errors, as well as filtered to exclude the invalid practices listed in Table 2. At this stage, we also excluded entries containing multiple verb phrases in their names (e.g., conjunctive/disjunctive clauses), as the recognition of such structures was not a part of this experiment; the retained entries thus contained both verb and noun phrases.

B. EVALUATION METHODOLOGY
The developed extractors were evaluated in terms of accuracy, precision, recall, and F-measure, which measured the ability to match the acquired outputs to the "golden standard" outputs. In our experiment, two different aspects were taken into consideration:
• whether the extractor successfully determined that the entry contained one or more noun/verb phrases that had to be extracted (in case no particular phrase was found, the output would be empty);
• whether the extractor successfully extracted the required verb phrases or noun phrases. Note that it was required to evaluate whether both verb phrases and noun phrases were successfully extracted. In cases where multiple phrases were marked as an output, strictly all of them had to be present in the output for it to be marked as correct.

Extraction accuracy is defined as the ratio of correctly extracted verb/noun phrase instances (together with empty outputs when such instances were absent) to the total number of entries:

accuracy = number of correctly extracted instances / number of total instances   (1)

Precision is defined as the ratio of correctly extracted concepts to the number of total extracted concepts, whereas recall is the ratio of correctly extracted concepts to the number of correct concepts:

precision = concepts correctly identified / total concepts identified   (2)

recall = concepts correctly identified / gold standard concepts   (3)
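In terms of implementation, measures (1)-(4) can be computed per entry as in the following sketch; the function and data layout are illustrative (not taken from our implementation), with each entry's gold-standard and extracted phrases treated as sets:

```python
def evaluate(gold, extracted):
    """Accuracy, precision, recall and F1 (eqs. (1)-(4), beta = 1).

    `gold` and `extracted` are lists of per-entry phrase sets; an
    empty set means "no phrase expected/found" for that entry.
    """
    # (1): an entry counts as correct only on an exact match,
    # empty sets included.
    accuracy = sum(g == e for g, e in zip(gold, extracted)) / len(gold)

    # (2), (3): concept-level counts across all entries.
    tp = sum(len(g & e) for g, e in zip(gold, extracted))
    n_extracted = sum(len(e) for e in extracted)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_extracted if n_extracted else 0.0
    recall = tp / n_gold if n_gold else 0.0

    # (4): harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```

Note that accuracy is entry-level (strict, all-or-nothing per entry), while precision and recall are concept-level, which is why the two can diverge considerably on the same outputs.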
116948 VOLUME 10, 2022
P. Danenas, T. Skersys: Exploring Natural Language Processing in Model-To-Model Transformations
TABLE 3. Results of the extraction of noun/verb phrases from the names of activity-like elements.
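To illustrate the idea behind the grammars described above (Listings 1 and 2), the following sketch matches regular expressions over a POS tag sequence; the NP/VP patterns shown here are reduced stand-ins for the actual listings:

```python
import re

# Illustrative stand-in for the Listing 1 grammar: regexes over
# Universal Dependencies POS tags (the actual grammar in the paper
# is richer). Tags are joined with trailing spaces so that each
# regex alternative matches whole tags only.
NP = r"(?:DET )?(?:ADJ )*(?:NOUN |PROPN )+"  # noun phrase
VP = r"(?:VERB |AUX )+(?:ADP )?" + NP        # verb phrase

def chunk(tags, pattern):
    """Return (start, end) token-index spans whose POS tags match."""
    tag_str = " ".join(tags) + " "
    spans = []
    for m in re.finditer(pattern, tag_str):
        start = tag_str[:m.start()].count(" ")  # char offset -> token index
        end = start + m.group().strip().count(" ") + 1
        spans.append((start, end))
    return spans

# "check the open invoices" tagged VERB DET ADJ NOUN
vp_spans = chunk(["VERB", "DET", "ADJ", "NOUN"], VP)
# → [(0, 4)]: the whole label is one verb phrase
```

For Penn Treebank output (Stanford Core NLP, ELECTRA), the tag alternatives would be swapped accordingly (e.g., NN/NNS/NNP for nouns), which is the adjustment Listing 2 makes.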
F1-measure (also referred to as F1-score) is defined as the harmonic mean of these two measures:

F_β = ((1 + β²) × (precision × recall)) / (precision + recall), β = 1   (4)

C. EXPERIMENT RESULTS
The results of the experimental extraction of verb phrases and noun phrases from the names of activity-like elements are presented in Table 3. It depicts both the results of detecting whether the given entry had particular types of phrases and the performance of extracting these phrases from the respective entries.

The obtained results indicate that the extractor based on the RNN-based Stanza tagger outperformed the CNN-based and CRF-based tools (Spacy and CoreNLP, respectively) in solving our problem. Extraction using Stanza's Bi-LSTM-based tagger showed the best performance in two tasks, while the Flair tagger came second-best. The extractor based on our custom BERT-BiLSTM-CRF tagger outperformed the other implementations in detecting verb phrase presence and in verb phrase extraction. Moreover, both custom taggers also showed improvements over their generic versions, i.e., ELMo-BiLSTM-CRF performed better than the original AllenNLP ELMo, and BERT-BiLSTM-CRF proved to perform better than the BERT-based POS tagger. This is quite optimistic, considering the size and specificity of the dataset. However, some caution should be taken while interpreting these results, given that our custom-trained tagger was biased towards the identification of infinitive forms of conflicting verbs. This implies that in some other cases it could fail to correctly tag words that were handled correctly by the tagger trained using conventional corpora; this was initially confirmed in our previous research applying similar principles to train custom POS taggers [6]. Therefore, more attention should be given to improving and tuning the custom taggers applied in the research, as well as to finding an optimal balance between an increase in verb detection performance and a possible decrease in other tasks that are performed better by generic POS taggers.

Nevertheless, the results of the leading extractor (based on the Stanford Stanza toolkit) are quite encouraging: the achieved F1-Score was more than 0.8 in most of the performed evaluation tasks, especially given the limitations and the level of unavoidable ambiguity in the testing dataset. One of the main challenges in this particular case is the fact that corpora currently available for training, like OntoNotes [91] or the English Web Treebank [95], are better suited to working with whole documents than to the analysis of short text and, therefore, do not represent the specificity addressed in this paper. We tried to mitigate this issue with additional augmentations of the input text, which resulted in certain performance improvements; developing text corpora better adjusted for this specific task would certainly help to improve performance even further.

V. PARSING CONJUNCTIVE/DISJUNCTIVE STATEMENTS
The techniques described in Section IV proved their efficiency during the extraction of verb phrases and noun phrases; however, the tools we experimented with in the phrase extraction
task are not capable of processing more complex examples discussed in Section III when applied directly; the conjunctive/disjunctive statements are a good example of that. The complexity can be illustrated with the following examples, which depict multiple cases of conjunctive statements (disjunctive statements may be formulated almost identically):
• check dates and suggest modifications – the statement includes the conjunction "and".
• consult project, check progress – the statement does not include a direct conjunction, but it is inferred.
• receive invoice, packing slip, and shipment from supplier – multiple nominal subjects are related to the single verb receive.
• calculate and send price offer – contains a single nominal subject that has dependencies on multiple verbs.

Obviously, the presented examples are not the most sophisticated text labels one could find in real-world models. This is not surprising due to the well-known fact that natural language is one of the most complex objects for automated machine processing. It is worth noting that the topic of processing conjunctive/disjunctive statements is not widely researched, although it has received some attention from researchers working on sentence simplification [96] or on detecting the boundaries of the whole conjunction span [97]. Also, many works on sentence simplification rely upon parse trees [15], [98], [99], which is in line with our research.

In Section V-A, we provide an algorithm based on dependency parsing, which is used to extract pairs of noun/verb phrases from conjunctive/disjunctive statements. Section V-B describes an experimental setup using a real-world dataset consisting of conjunctive/disjunctive phrases that are then processed using the proposed solution. Finally, Section V-C provides the evaluation results, a discussion, and some ideas for our future research.

A. ALGORITHM FOR EXTRACTING NOUN/VERB PHRASES FROM CONJUNCTIVE/DISJUNCTIVE PHRASES
Further, we briefly describe a dependency parsing-based algorithm for extracting pairs of noun phrases and verb phrases from conjunctive/disjunctive phrases (see Algorithm 1). The input is the parsed and tagged document D; hence, it requires a part-of-speech tagger and a dependency parser as part of its processing pipeline. We define DTOK as the set of tokens that constitute document D, together with the parsing and tagging output. Further, this document is also processed to create noun phrase spans (further denoted as SNP) and verb phrase spans (denoted as SVP) by using predefined grammars (such as the ones presented in Section IV-A). Later, we use correspondence indexes IndNP and IndVP to map each token in the document to a corresponding noun phrase or verb phrase. These indexes enable traversing dependency relationships at a phrase level and, at the same time, reduce the ambiguity observed when using different dependency parsers. We denote the head of the dependency relationship from the token tok as DepHead(tok), and its end as DepEnd(tok). Finally, we denote by GET the operation that retrieves an entry from an index, given its indexing value. The syntactic dependencies are expected to be labeled using the Universal Dependencies format [94], particularly DOBJ as the direct object, OBJ as the object, POBJ as the object of the preposition, and CONJ as the conjunction.

The output of this algorithm is a collection of tuples of verb phrases and noun phrases. It is expected that the input contains both nouns and verbs; otherwise, tuples with empty values instead of the verb or noun phrases can be returned as a result.

B. EXPERIMENT SETUP
To evaluate our approach, we extracted a dataset of 410 entries acquired from the same set of process models that was used in our phrase extraction experiment. The final dataset comprised only those text labels that included at least one conjunctive or disjunctive clause. Then we manually extracted all available verb/noun phrase parts to create a "gold standard" dataset to be used as a reference point for our evaluation.

The algorithm presented in Section V-A was implemented as a separate module without any text normalization capability. To perform comparative testing, we implemented the module in Python, using Spacy, and extended it to use Stanford Stanza, due to its flexible integration with the Spacy framework, to enable comparison of the dependency parsing capabilities of these toolkits.

Again, for the evaluation, we used metrics like the ones described in Section IV-B, that is, accuracy, precision, recall, and F1-Score. Here, accuracy is defined as the ratio of the entries processed correctly to the total number of entries. Note that this is a very strict measure, as it considers an extraction valid only if all noun/verb phrase pairs were extracted correctly. However, the technique is capable of generating a larger or smaller number of entries compared to the actual outputs. To address this issue and provide an evaluation of partially correct outputs, we defined two additional metrics to evaluate the performance in terms of the number of generated output instances:
• the mean deviation between the number of extracted outputs and the benchmark output results:

MeanDiff = (1/n) × Σᵢ₌₁ⁿ |#ᵢactual − #ᵢextracted| / #ᵢactual   (5)

• the mean Sørensen–Dice coefficient, which is used to evaluate the average similarity between the actual and extracted instance sets:

MeanSDC = (1/n) × Σᵢ₌₁ⁿ 2 × |Oᵢactual ∩ Oᵢextracted| / (|Oᵢactual| + |Oᵢextracted|)   (6)

Here, n is the total number of processed entries in the dataset; Oᵢactual is the benchmark set of verb phrase/noun
Algorithm 1 Processing of Conjunctive/Disjunctive Statements

Require: parsed sentence DTOK, mapping indexes IndNP and IndVP
1:  results ← ∅
2:  for all tok ∈ DTOK do
3:    if DepEnd(tok) ∈ {DOBJ, OBJ, POBJ} then
4:      results ← results ∪ {(GET(IndVP, DepStart(tok)), GET(IndNP, DepEnd(tok)))}
5:    else if DepEnd(tok) = CONJ then
6:      IndPOS ← index of POS tags and tokens for conjuncts in DepEnd(tok)
        ▷ Assume pattern <VERB>, <VERB> and <VERB> <NOUN>
7:      if |GET(IndPOS, NOUN)| = 1 and |GET(IndPOS, VERB)| > 1 then
8:        noun ← GET(IndPOS, NOUN)
9:        for all verb ∈ GET(IndPOS, VERB) do
10:         results ← results ∪ {(GET(IndVP, verb), GET(IndNP, noun))}
        ▷ Assume pattern <NOUN> <VERB>, <VERB> and <VERB>
11:     else if |GET(IndPOS, VERB)| = 1 then
12:       verb ← GET(IndPOS, VERB)
13:       for all noun ∈ GET(IndPOS, NOUN) do
14:         results ← results ∪ {(GET(IndVP, verb), GET(IndNP, noun))}
15:     else if |GET(IndPOS, NOUN)| > 1 then
16:       for all noun ∈ GET(IndPOS, NOUN) do
17:         results ← results ∪ {(GET(IndVP, LeftmostVerb(noun)), GET(IndNP, noun))}
Output: the set of (verb phrase, noun phrase) tuples results
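The core of Algorithm 1 can be sketched over a pre-parsed token list as follows; the Token structure and the lowercase dependency labels mirror spaCy-style parser output, and only the object and noun-conjunction branches are shown (conjoined verbs are handled analogously in the full algorithm):

```python
from collections import namedtuple

# Hypothetical pre-parsed token (spaCy/Stanza would produce the
# equivalent): index, text, UD POS tag, dependency label, head index.
# Note that spaCy reports labels in lowercase ("dobj", "conj", ...).
Token = namedtuple("Token", "i text pos dep head")

def extract_pairs(tokens):
    """Simplified core of Algorithm 1: pair each (direct/prepositional)
    object with its governing verb, and distribute the shared verb over
    conjoined nouns."""
    pairs = []
    for tok in tokens:
        if tok.dep in ("dobj", "obj", "pobj"):
            pairs.append((tokens[tok.head].text, tok.text))
        elif tok.dep == "conj" and tok.pos == "NOUN":
            # In UD-style parses, later conjuncts attach to the first
            # conjunct, e.g. "receive invoice, packing slip and shipment".
            first = tokens[tok.head]
            if first.dep in ("dobj", "obj", "pobj"):
                pairs.append((tokens[first.head].text, tok.text))
    return pairs

tokens = [
    Token(0, "receive", "VERB", "root", 0),
    Token(1, "invoice", "NOUN", "dobj", 0),
    Token(2, "slip", "NOUN", "conj", 1),
    Token(3, "shipment", "NOUN", "conj", 1),
]
pairs = extract_pairs(tokens)
# → [('receive', 'invoice'), ('receive', 'slip'), ('receive', 'shipment')]
```

In the full algorithm the second element of each tuple is the whole phrase span retrieved through IndNP/IndVP rather than a single token; single tokens are used here to keep the sketch short.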

phrase pairs extracted for the i-th dataset entry; Oᵢextracted is the set of output elements extracted for the i-th dataset entry; #ᵢactual and #ᵢextracted represent the number of elements in Oᵢactual and Oᵢextracted, respectively.

TABLE 4. Performance of processing the conjunction/disjunction statements.
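Metrics (5) and (6) can be computed directly over the per-entry pair sets, as in the following sketch (illustrative names; the Sørensen–Dice denominator is the standard |A| + |B|):

```python
def mean_diff(actual, extracted):
    """Eq. (5): mean relative deviation between the number of gold
    pairs and the number of extracted pairs, averaged over entries."""
    return sum(abs(len(a) - len(e)) / len(a)
               for a, e in zip(actual, extracted)) / len(actual)

def mean_sdc(actual, extracted):
    """Eq. (6): mean Sørensen-Dice coefficient between the gold and
    extracted pair sets, i.e. 2|A ∩ B| / (|A| + |B|) per entry."""
    return sum(2 * len(a & e) / (len(a) + len(e))
               for a, e in zip(actual, extracted)) / len(actual)

actual = [{("receive", "invoice"), ("receive", "slip")},
          {("check", "dates")}]
extracted = [{("receive", "invoice")},
             {("check", "dates")}]
# mean_diff → 0.25, mean_sdc → (2/3 + 1) / 2 ≈ 0.833
```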
C. EXPERIMENT RESULTS
The results of the experiment are presented in Table 4. They summarize the performance of both the Spacy and Stanza models. The obtained results prove the influence of the underlying dependency parser: the implementation based on the Stanza toolkit significantly outperformed the Spacy-based implementation. Unfortunately, the extraction accuracy score for both implementations was very low, proving that those implementations failed to extract all the expected verb/noun phrase pairs from each given input text; this is also reflected in the relatively high values of MeanDiff and MeanSDC. Moreover, precision, recall, and F1-Score, which are calculated at a macro level, show that the results at that level are not disappointing; yet, both the implementations of the algorithm and the underlying technology could still be improved in the future.

Here, the experimental implementation resulted in an F-Score of 0.631, although we must also take into consideration the influence of a sample bias. The significance of the underlying parse model was also obvious, as the Stanza-based processor significantly outperformed the implementation based on Spacy. Again, all the mandatory pipeline steps (text tagging, text chunking into noun/verb phrases, and dependency parsing) have proven to be crucial to the overall quality of phrase processing. A failure in any of these steps inevitably translates into errors in the further steps of the developed pipeline. Therefore, we can safely conclude that the dependency parser plays the most important role of all. This was extremely well visible in the experimental cases when insignificant changes to the input entry (e.g., adding an adjective to one of the nouns) resulted in completely different parse trees compared to the initial ones; this complicated the analysis significantly or even resulted in cases not covered by the used formal grammar. This indicates the need for more extensive research and improvements in both the extraction and dependency parsing areas. We believe that this could be achieved by integrating and testing recent developments in dependency parsing based on neural techniques, as described in [51] and [52], among others.

VI. ACRONYM/ABBREVIATION DETECTION
Acronym/abbreviation detection is a text normalization task that deals with multiple issues and ambiguities in detecting whether a given word in the text is an abbreviation or an acronym. While many cases can be handled by simply performing a search for a particular
candidate's expanded form in the text or performing a search in dictionaries and word lists, this is not trivial when it comes to widely used acronyms. The first issue is that these acronyms/abbreviations might be present in dictionaries and at the same time overlap with some general words (e.g., the acronym IT overlaps with the pronoun it); another common issue is the omission of the expanded form of an acronym/abbreviation due to its widespread use, which makes it almost impossible to automatically identify it as an acronym/abbreviation of some particular phrase with simple backtracking in the input (the aforementioned acronym IT can be seen as an example in this context as well). Acronym/abbreviation expansion is yet another similar task, aiming to solve the problem when a given abbreviation or acronym should be replaced by the expanded form that is the most appropriate in the given context. This task is not a trivial one either; for instance, EM could refer to entity matching; however, it could also be expectation maximization or entity model, with all these expanded forms coming from a single computer science domain. Unfortunately, current research tends to focus on long text passages, which highly reduces its applicability in the context of our research.

In model-to-model transformation, as well as in other relevant topics, the acronym/abbreviation (A/A) detection task helps one to properly match full concept names with their abbreviated forms, thus adding to the greater consistency of the models being developed. The A/A detection task itself comprises two interrelated subtasks:
• A/A detection seeks to detect candidate A/A that must be expanded (what must be replaced?);
• A/A expansion is focused on finding the right expansion for the given A/A (what is the replacement?).

A/A expansion is often considered a simple expansion of entries that are identified as A/A due to their writing style or absence in relevant sources, like thesauri or dictionaries. While simple A/A mapping lists are generally applied for common text normalization tasks, they may not always provide the correct result, unless they are restricted to having single meanings in specific or even multiple contexts. Therefore, real-world use cases may easily complicate this seemingly uncomplicated task. The complexity of the task may rise depending on the diversity of the corpus or data required to properly train one's implementation to resolve models. The expansion problem will not be further addressed in this paper due to certain limitations of the dataset.

While recent developments in acronym detection tend to apply state-of-the-art deep learning techniques (as stated in Table 1), they are not applicable in our context due to the relatively short text input. Therefore, we will model this problem in a more traditional yet efficient way by applying context-based classification techniques within a space of contextual, morphological, and linguistic features. While a similar approach was successfully tested in [56] and [100], we propose using a different set of features, preferred due to data limitations. The target variable of the classifier is simply an indicator of whether the particular word represents an acronym or abbreviation.

Further in this section, we provide an empirical evaluation of A/A detection in BPMN element names. To make it more consistent with the other experiments presented in this paper, we use the same initial set of BPMN process and UML use case models as in the experiment presented in Section IV. Hence, Section VI-A describes the preliminaries and setup of the experiment, while Section VI-B presents and briefly discusses the results obtained during that experiment.

A. EXPERIMENT SETUP
The initial dataset of process models was used as the source for developing the feature dataset for our A/A detection experiment. The feature dataset was created from all the available words in the extracted text by applying simple heuristic rules:
• An acronym or abbreviation must contain at most 5 characters. It can be observed that the longer the word is, the smaller the probability of it being an acronym. Therefore, words with more than the predefined number of characters are not considered to be acronyms and are excluded from further analysis.
• The word representing an acronym or abbreviation is not available in the dictionary. Since WordNet does not contain all English words and their forms, we used the Enchant⁴ library, which is generally used for grammatical error correction, to check for word existence.

The first rule helped to identify the candidate entries for the feature dataset, and the entries longer than the predefined length threshold were not considered as candidates for acronyms and abbreviations. The second rule helped to perform the primary labeling. After the automated generation of the dataset, some manual adjustments were performed to fix automated labeling errors and ambiguities, remove redundant and duplicate entries, and identify situations that were not covered by the above-listed heuristics and could not be handled automatically; all this was done to make the feature dataset more consistent and suitable for the development of our detection classifier. The feature dataset examination also helped to identify that most of the acronyms were written in uppercase, which also helped to simplify the semi-automated labeling task. To avoid feature leakage, we removed the uppercase-word feature, as it would otherwise serve as a proxy for the label (in practical applications, it might serve as a very strong indicator of acronym presence). To perform the POS tagging required for POS-based feature generation, we used the Stanford Stanza tagger, which showed the best performance in our previous experiment presented in Section IV-C.

After performing the feature generation procedure, a feature dataset with a total of 16579 entries was created. Each entry in the dataset was a vector of 16 features extracted

⁴ https://github.com/pyenchant/pyenchant
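The two screening rules described in this setup can be sketched as follows; the lexicon here is a toy stand-in for the Enchant word list used in the experiment:

```python
def aa_candidate(word, lexicon, max_len=5):
    """Apply the two heuristic rules: a candidate acronym/abbreviation
    has at most `max_len` characters and is absent from the reference
    dictionary. `lexicon` stands in for a spell-checker word list."""
    return len(word) <= max_len and word.lower() not in lexicon

lexicon = {"check", "invoice", "order", "it"}  # toy dictionary
labels = ["check", "VAT", "invoice", "qty"]
candidates = [w for w in labels if aa_candidate(w, lexicon)]
# → ['VAT', 'qty']
```

The lowercase membership test also reproduces the IT/it ambiguity discussed above: aa_candidate("IT", lexicon) is False because the pronoun "it" is in the dictionary, so such overlapping cases are exactly the ones left for the classifier's contextual features.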
TABLE 5. Features used for the A/A detection.

FIGURE 2. A/A detection performance and feature importance.

TABLE 6. Feature importance computed with CatBoost.
from the text labels in the BPMN process and UML use case models, together with the label indicating whether a word represents an acronym or an abbreviation. The full set of features is presented in Table 5. The features has.special and long.char.seq were excluded from further analysis, as the final dataset did not contain any such entries. Nonetheless, these features could be useful in further research with more extensive datasets and/or contexts, and thus they are included in Table 5 along with the other features as a reference for future consideration. This left us with 14 features that were further used as the inputs for the classifier.

For the development of the acronym detection classifier, the following techniques were considered:
• CatBoost [101] is a high-performing gradient boosting classifier. One of its most exceptional features is the ability to work efficiently and directly with categorical feature variables, which helps to improve performance when numerous categorical features are used.
• XGBoost [102] is one of the best performing gradient boosting-based ensemble classifiers, widely used to solve various classification tasks.
• Random Forest [103], [104] is a widely used decision tree ensemble technique based on bagging and random feature selection.

To handle the high level of class distribution imbalance in the input dataset, weighted classification was applied to improve detection performance. Also, grid search was used to optimize the performance of CatBoost and XGBoost by selecting their optimal hyperparameters. The Random Forest classifier was run with default parameters, but using 200 estimators. All the classifiers were implemented in Python using the scikit-learn, catboost, and xgboost libraries. Similar to the experiments presented in Section IV-B, the measures of accuracy, precision, recall, and F1-score were used for performance measurement.

B. EXPERIMENT RESULTS
Figure 2 presents the results obtained using the classifiers described in Section VI-A. They show that CatBoost significantly outperformed the Random Forest classifier, and slightly outperformed the XGBoost classifier, in terms of precision and F1-Score. This is not surprising, due to the design of the CatBoost tool and its ability to work directly with categorical variables. Its superiority over the XGBoost classifier was also confirmed by McNemar's test, which resulted in p < 0.05 (p = 0.029).

Table 6 also provides insight into the feature importance obtained using the CatBoost classifier. The results indicate that the morphological features of tokens next to the target word were identified as the most important, whereas the presence of a particular word in an English dictionary or similar referential source played a less influential role, as expected. One of the reasons for this is the fact that abbreviations are usually created by the people who create models and write documentation (e.g., business/system analysts). And so, those people create various acronyms and abbreviations by themselves, or they use already established A/A to make the text more compact (compact text labels are particularly relevant in visual modeling). Contextual part-of-speech features seem to play an important role as well, because they capture acronym
usage patterns in spoken or written language; this is also proved by the high importance of the features of preceding tokens, as well as of more distant contextual features. This prompts for testing wider context features (like prev.pos3, next.pos4, etc.); however, such features are not considered in this paper due to the limited size of the processed text phrases.

Alternatively, one might consider sequence-tagging models (such as Markov models or recurrent neural networks) that directly apply such context, yet their training would require larger datasets and the inclusion of an even greater number of additional features (lexical and morphological). Emerging deep learning approaches, such as [58] or similar, seem to be a viable solution as well, although their training might require a significant amount of labeled data, and their applicability to the given problem must be verified.

VII. DISCUSSION
With the experiments described in this paper, we explored the capabilities of advanced NLP tools to process short text fragments (text labels), which is required to enable advanced capabilities in processing our model-to-model transformations. While this work was inspired by our previous research [6], [7], we believe that the presented research results could be applicable in other relevant fields as well. Similar text normalization is required in practical process mining, where the names of the composing elements need to be unified across multiple data sources while reducing the number of duplicates to a minimum. It is also applicable in conversational intelligence, where intent processing is required to identify the responsive action for an inquiry. The experiments prove that the recent developments in the fields of NLP and deep learning could provide the tools needed to solve such and other similar problems.

Overall, the experiments presented in this paper revealed several issues, which should be addressed and might need to be handled separately:
• Bad modeling (in particular, element naming) practices were not considered in the extraction activities. During the initial dataset screening, we observed many such cases, which are summarized in Table 2. Detecting the most common bad modeling practices and introducing automated resolution of such cases into the developed solution could provide even better automated processing results.
• A more thorough analysis of the outputs showed that some tagging tools, like Spacy, were quite sensitive to letter casing, which is also significant for the practical application of NLP technology in model-to-model transformations as well as in other relevant fields. While this is less relevant when processing long text passages or whole documents, its importance increases when more specific text processing is considered. This is stipulated by the different modeling styles used by practitioner modelers, who prefer starting each word with a capital letter while naming model elements such as activities, tasks, use cases, etc. (this is verified by the analysis of the BPMAI dataset used in our research, as well as by our personal experience), and some tools may fail to tag such labels correctly. For example, return invoice could be tagged as <VERB><NOUN>; however, Return Invoice might as well become <NOUN><NOUN>, which would be an incorrect tagging result. Again, in our related experiments, we reverted all text labels to lowercase to mitigate this problem. Unfortunately, such normalization might remove relevant features that could be used to detect abbreviations.
• The previous issue is also relevant for other related problems. While such cases could be normalized to lowercase, doing so increases the risk of failure in other tasks like named entity recognition, where capital letters play a crucial role. Moreover, NLP tools may face difficulties detecting named entities within fully lowercase entries (e.g., United States was identified as LOCATION, while united states was not).
• Detection performance can be negatively affected by the presence of non-alphanumeric symbols (e.g., dashes, commas, apostrophes) within words. It is advisable to remove such symbols from the model element names wherever possible. This issue might be mitigated using more advanced tokenizers capable of handling most of these cases, but the risk of failing to handle them properly still exists.
• Generally, using conjunctive/disjunctive clauses in activity-like element names indicates a bad modeling practice, as such instances should be refactored into two or more atomic elements. As stated previously, processing such statements appeared to be a very challenging task requiring the support of several advanced NLP techniques, such as dependency or constituency parsing. In its turn, this would bring in other kinds of errors from the underlying parser model.
• In our experimentation, we observed general ambiguity in detecting abbreviations. The A/A detection experiment confirmed the applicability of a machine learning-based approach to handling this problem. Yet, A/A expansion is a more complicated task, as the full forms of concepts designated by A/A might not be present in the models under scope, especially if those A/A are well-known and heavily used (e.g., IT, USA). External sources, such as domain vocabularies and linked data, can be applied by matching them contextually to each model instance containing cases of acronyms and abbreviations. Again, this requires additional sources of input data, together with a more extensive dataset, and could be considered as one of the directions for our future research.

VIII. CONCLUSION
The NLP discipline has seen impressive advancements and improvements during the last several years, with the number of NLP applications increasing dramatically. Also, the progress in deep learning has resulted in a significant increase
in the performance of solving different linguistic tasks. In this paper, research on applying these recent developments to the processing of short text phrases is discussed. While the need for this research originated from our recent work on model-to-model transformations [6], [7], we may identify several other areas that could benefit from similar text processing capabilities, such as process mining, aspect-based sentiment analysis, or conversational interfaces with command-like short text processing capability. At the same time, all these areas share the same NLP-related issues that have to be dealt with to ensure satisfactory performance of the underlying NLP technology (e.g., identical representation of verbs and nouns, or the lack of context required for automated processing).

In this paper, we addressed the problem of extracting relation tuples from process and system requirements models containing elements expressing activity-like statements. As stated in Section III, this is not an easily solved problem due to multiple ambiguities, applied modeling practices, and many other issues that are not addressed by common NLP toolkits. Among such issues, one may emphasize the processing of disjunctive or conjunctive statements (which is considered a bad modeling practice) and the presence of shortened forms, such as acronyms or abbreviations.

To solve the issues addressed in this paper, we evaluated several current state-of-the-art implementations from the perspective of our research, combining them under our custom formal grammar-based extraction to derive prototype implementations. Additionally, we implemented and tested our custom tagging tools, based on input corpora augmentations and a bidirectional LSTM-CRF architecture with BERT and ELMo embeddings at the input layer.

In the first experiment, the Stanza-based implementation showed the best performance in noun/verb extraction tasks. Yet, we showed that the implementation based on our custom BERT-BiLSTM-CRF tagger improved the detection of verb phrase presence and verb phrase extraction compared to the generic tagger implementations, including the generic BERT-based tagger. This was expected, as a bias towards proper tagging of verbs could reduce the ability to correctly tag nouns in short text statements. Hence, balancing between biased and unbiased tagging still requires further research.

Our second experiment, on processing disjunctive and conjunctive statements, showed this task to be more challenging than expected, due to the dependence of our implementation on the performance of the underlying dependency parser toolkits. Unfortunately, while such statements are also considered bad modeling practice, they are widely used in real-world cases (this is also verified by the initial analysis of the BPMAI dataset) and need to be addressed carefully. This is an important topic for multiple information extraction and other NLP-related areas, such as relation extraction or aspect-based sentiment analysis, and it has proven to be a complicated task due to the generally unstructured nature of natural language texts. Handling of these issues is also discussed in this paper, providing additional insights for further improvements in this area. Results obtained after applying our technique described in Section V-C indicate that there is still much potential for further improvement. While at this stage we did not consider training custom parsers, we hope to achieve more progress in the future after carrying out more extensive studies and taking advantage of improvements in dependency parsing, constituency parsing, and general relation extraction algorithms.

Finally, in the third experiment, we tested a machine learning-based approach to acronym/abbreviation detection. While this issue is widely discussed in multiple papers (see Table 1 for more details), those works tend to focus on processing longer text statements or even whole documents, which is not suitable for our particular case. Due to the limitations discussed in previous sections, we approached this issue by applying context-based classification using token-level and text label-level features. We found that our trained classifier obtained a precision of 0.78 and an F1-score of 0.73, which we consider a rather positive result given the multiple constraints and limitations. In the future, we may also test the developed solution in other settings by expanding our dataset to include more specific cases. The results are expected to improve after applying the classifier to a more extensive and comprehensive dataset, which would allow exploiting additional token-level, phrase-level, or even whole-model-level features; this is still subject to our further research. In this paper, we did not consider acronym/abbreviation expansion, due to certain limitations and requirements discussed in Section VI. Yet, it is an interesting challenge that will be addressed in our future developments.

While our research presents a certain contribution to text processing for the system modeling domain, there is still much space for future research. In this paper, we experimented with text labels of activity-like elements acquired from BPMN process models and UML use case models. However, other models, such as UML activity models and state machines (or other kinds of statechart models), could also be successfully tested. Moreover, applying these techniques to larger and more elaborate datasets might reveal other cases that could be addressed by tuning the formal grammars or processing algorithms discussed in this paper. Additionally, one could resort to creating specialized datasets or text corpora, which would enable the development of even more specialized extraction tools. Complementarily, several technological constraints should be addressed, particularly the optimization of the final models for deployment, given the significant amount of resources needed to run larger deep learning models. This may require the investigation of model reduction techniques such as distillation or quantization.

Finally, it is safe to state that in model-to-model transformation (as well as in other areas involving the processing of graphical models), one could also benefit from other existing


NLP capabilities, such as the extraction of semantic relationships (synonymy, hyponymy, hypernymy, etc.) and the analysis and correction of grammatical errors. Indeed, fully automated processing requires significant input and capabilities from multiple fields of linguistic processing to ensure the high performance of the developed NLP applications, as discussed in Table 1. This paves the road for our next near-future developments and experimentation.

REFERENCES

[1] C. Ru, J. Tang, S. Li, S. Xie, and T. Wang, "Using semantic similarity to reduce wrong labels in distant supervision for relation extraction," Inf. Process. Manage., vol. 54, no. 4, pp. 593–608, Jul. 2018.
[2] H. Fei, Y. Ren, and D. Ji, "Boundaries and edges rethinking: An end-to-end neural model for overlapping entity relation extraction," Inf. Process. Manage., vol. 57, no. 6, 2020, Art. no. 102311.
[3] D. T. Vo and E. Bagheri, "Self-training on refined clause patterns for relation extraction," Inf. Process. Manage., vol. 54, no. 4, pp. 686–706, Jul. 2017.
[4] D. Nozza, P. Manchanda, E. Fersini, M. Palmonari, and E. Messina, "LearningToAdapt with word embeddings: Domain adaptation of named entity recognition systems," Inf. Process. Manage., vol. 58, no. 3, May 2021, Art. no. 102537.
[5] Y. Jiang, W. Bai, X. Zhang, and J. Hu, "Wikipedia-based information content and semantic similarity computation," Inf. Process. Manage., vol. 53, no. 1, pp. 248–265, Jan. 2017.
[6] P. Danenas, T. Skersys, and R. Butleris, "Natural language processing-enhanced extraction of SBVR business vocabularies and business rules from UML use case diagrams," Data Knowl. Eng., vol. 128, Jul. 2020, Art. no. 101822.
[7] P. Danenas, T. Skersys, and R. Butleris, "Extending drag-and-drop actions-based model-to-model transformations with natural language processing," Appl. Sci., vol. 10, no. 19, p. 6835, Sep. 2020.
[8] H. Leopold, F. Pittke, and J. Mendling, "Ensuring the canonicity of process models," Data Knowl. Eng., vol. 111, pp. 22–38, Sep. 2017.
[9] H. Leopold, R.-H. Eid-Sabbagh, J. Mendling, L. G. Azevedo, and F. A. Baião, "Detection of naming convention violations in process models for different languages," Decis. Support Syst., vol. 56, pp. 310–325, Dec. 2013.
[10] F. Pittke, H. Leopold, and J. Mendling, "When language meets language: Anti patterns resulting from mixing natural and modeling language," in Proc. Bus. Process Manage. Workshops, F. Fournier and J. Mendling, Eds. Cham, Switzerland: Springer, 2015, pp. 118–129.
[11] S. Kumar, "A survey of deep learning methods for relation extraction," 2017, arXiv:1705.03645.
[12] E. Agichtein and L. Gravano, "Snowball: Extracting relations from large plain-text collections," in Proc. 5th ACM Conf. Digit. Libraries, New York, NY, USA, 2000, pp. 85–94.
[13] M. Mintz, S. Bills, R. Snow, and D. Jurafsky, "Distant supervision for relation extraction without labeled data," in Proc. AFNLP, Singapore, Aug. 2009, pp. 1003–1011.
[14] V.-T. Phi, J. Santoso, V.-H. Tran, H. Shindo, M. Shimbo, and Y. Matsumoto, "Distant supervision for relation extraction via piecewise attention and bag-level contextual inference," IEEE Access, vol. 7, pp. 103570–103582, 2019.
[15] A. S. White, D. Reisinger, K. Sakaguchi, T. Vieira, S. Zhang, R. Rudinger, K. Rawlins, and B. Van Durme, "Universal decompositional semantics on universal dependencies," in Proc. Conf. Empirical Methods Natural Lang. Process., Austin, TX, USA, 2016, pp. 1713–1723.
[16] Business Process Model and Notation (BPMN), Version 2.0.2, Object Management Group (OMG), Needham, MA, USA, Dec. 2013.
[17] Unified Modeling Language (UML), Version 2.5.1, Object Management Group (OMG), Needham, MA, USA, Dec. 2017.
[18] E. Jakumeit, S. Buchwald, D. Wagelaar, L. Dan, Á. Hegedüs, M. Herrmannsdörfer, T. Horn, E. Kalnina, C. Krause, K. Lano, M. Lepper, A. Rensink, L. Rose, S. Wätzoldt, and S. Mazanek, "A survey and comparison of transformation tools based on the transformation tool contest," Sci. Comput. Program., vol. 85, pp. 41–99, Jun. 2014.
[19] N. Kahani, M. Bagherzadeh, J. R. Cordy, J. Dingel, and D. Varró, "Survey and classification of model transformation tools," Softw. Syst. Model., vol. 18, no. 4, pp. 2361–2397, Aug. 2019.
[20] T. Skersys, P. Danenas, and R. Butleris, "Model-based M2M transformations based on drag-and-drop actions: Approach and implementation," J. Syst. Softw., vol. 122, pp. 327–341, Dec. 2016.
[21] M. A. Hearst, "TextTiling: Segmenting text into multi-paragraph subtopic passages," Comput. Linguistics, vol. 23, no. 1, pp. 33–64, 1997.
[22] S. Bird, E. Klein, and E. Loper, Natural Language Processing With Python. Sebastopol, CA, USA: O'Reilly, 2009.
[23] (2021). Spacy.io. Accessed: Aug. 15, 2022. [Online]. Available: https://github.com/explosion/spaCy
[24] A. Akbik, T. Bergmann, D. Blythe, K. Rasul, S. Schweter, and R. Vollgraf, "FLAIR: An easy-to-use framework for state-of-the-art NLP," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Minneapolis, MN, USA, Jun. 2019, pp. 54–59.
[25] M. Gardner, J. Grus, M. Neumann, O. Tafjord, P. Dasigi, N. F. Liu, M. Peters, M. Schmitz, and L. Zettlemoyer, "AllenNLP: A deep semantic natural language processing platform," in Proc. Workshop NLP Open Source Softw. (NLP-OSS), Melbourne, VIC, Australia, Jul. 2018, pp. 1–6.
[26] G. Chrupala, G. Dinu, and J. van Genabith, "Learning morphology with Morfette," in Proc. Int. Conf. Lang. Resour. Eval., Marrakech, Morocco, Jun. 2008, pp. 1–6.
[27] A. Chakrabarty, O. A. Pandit, and U. Garain, "Context sensitive lemmatization using two successive bidirectional gated recurrent networks," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, R. Barzilay and M. Kan, Eds. Vancouver, BC, Canada, 2017, pp. 1481–1491.
[28] T. Bergmanis and S. Goldwater, "Context sensitive neural lemmatization with Lematus," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Human Lang. Technol., M. A. Walker, H. Ji, and A. Stent, Eds. New Orleans, LA, USA, 2018, pp. 1391–1400.
[29] M. Arehart, "Indexing methods for faster and more effective person name search," in Proc. 7th Int. Conf. Lang. Resour. Eval., Valletta, Malta, May 2010, pp. 1–15.
[30] A. Rozovskaya and D. Roth, "Grammatical error correction: Machine translation and classifiers," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, Berlin, Germany, 2016, pp. 2205–2215.
[31] M. Junczys-Dowmunt, R. Grundkiewicz, S. Guha, and K. Heafield, "Approaching neural grammatical error correction as a low-resource machine translation task," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Human Lang. Technol., New Orleans, LA, USA, 2018, pp. 595–606.
[32] S. Kiyono, J. Suzuki, M. Mita, T. Mizumoto, and K. Inui, "An empirical study of incorporating pseudo data into grammatical error correction," in Proc. Conf. Empirical Methods Natural Lang. Process. 9th Int. Joint Conf. Natural Lang. Process. (EMNLP-IJCNLP), Hong Kong, 2019, pp. 1236–1242.
[33] A. Ratnaparkhi, "A maximum entropy model for part-of-speech tagging," in Proc. Conf. Empirical Methods Natural Lang. Process., 1996, pp. 1–10.
[34] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, "Feature-rich part-of-speech tagging with a cyclic dependency network," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics Hum. Lang. Technol., 2003, pp. 252–259.
[35] M. Silfverberg, T. Ruokolainen, K. Lindén, and M. Kurimo, "Part-of-speech tagging using conditional random fields: Exploiting sub-label dependencies for improved accuracy," in Proc. 52nd Annu. Meeting Assoc. Comput. Linguistics, Baltimore, MD, USA, 2014, pp. 259–264.
[36] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch," J. Mach. Learn. Res., vol. 12, pp. 2493–2537, Aug. 2011.
[37] Z. Huang, W. Xu, and K. Yu, "Bidirectional LSTM-CRF models for sequence tagging," 2015, arXiv:1508.01991.
[38] G. A. Miller, "WordNet: A lexical database for English," Commun. ACM, vol. 38, no. 11, pp. 39–41, 1995.
[39] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., vol. 1. Minneapolis, MN, USA, Jun. 2019, pp. 4171–4186.
[40] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proc. 26th Int. Conf. Neural Inf. Process. Syst., vol. 2. Red Hook, NY, USA, 2013, pp. 3111–3119.
[41] M. E. Peters, M. Neumann, M. Iyyer, and M. Gardner, "Deep contextualized word representations," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., vol. 1. New Orleans, LA, USA: ACL, Jun. 2018, pp. 2227–2237.


[42] M. A. Hearst, "Automatic acquisition of hyponyms from large text corpora," in Proc. 14th Conf. Comput. Linguistics, 1992, pp. 1–7.
[43] R. Snow, D. Jurafsky, and A. Y. Ng, "Learning syntactic patterns for automatic hypernym discovery," in Proc. 17th Int. Conf. Neural Inf. Process. Syst., Cambridge, MA, USA: MIT Press, 2004, pp. 1297–1304.
[44] M. Onofrei, I. Hulub, D. Trandabat, and D. Gifu, "Apollo at SemEval-2018 task 9: Detecting hypernymy relations using syntactic dependencies," in Proc. 12th Int. Workshop Semantic Eval., New Orleans, LA, USA, 2018, pp. 898–902.
[45] T. Kawaumra, M. Sekine, and K. Matsumura, "Hyponym/hypernym detection in science and technology thesauri from bibliographic datasets," in Proc. IEEE 11th Int. Conf. Semantic Comput. (ICSC), 2017, pp. 180–187.
[46] Z. Zhang, J. Li, H. Zhao, and B. Tang, "SJTU-NLP at SemEval-2018 task 9: Neural hypernym discovery with term embeddings," in Proc. 12th Int. Workshop Semantic Eval., New Orleans, LA, USA, 2018, pp. 903–908.
[47] A. Z. Hassan, M. S. Vallabhajosyula, and T. Pedersen, "UMDuluth-CS8761 at SemEval-2018 task 9: Hypernym discovery using Hearst patterns, co-occurrence frequencies and word embeddings," in Proc. 12th Int. Workshop Semantic Eval., New Orleans, LA, USA, 2018, pp. 914–918.
[48] T. Dozat and C. D. Manning, "Simpler but more accurate semantic dependency parsing," in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics, Melbourne, VIC, Australia, 2018, pp. 484–490.
[49] T. Kim, B. Li, and S.-G. Lee, "Multilingual chart-based constituency parse extraction from pre-trained language models," in Proc. Findings Assoc. Comput. Linguistics, EMNLP, Punta Cana, Dominican Republic, 2021, pp. 454–463.
[50] S. Petrov, L. Barrett, R. Thibaux, and D. Klein, "Learning accurate, compact, and interpretable tree annotation," in Proc. 21st Int. Conf. COLING-ACL, Sydney, NSW, Australia, Jul. 2006, pp. 433–440.
[51] Y. Zhang, Z. Li, and M. Zhang, "Efficient second-order TreeCRF for neural dependency parsing," in Proc. 58th Annu. Meeting Assoc. Comput. Linguistics, 2020, pp. 3295–3305.
[52] T. Ji, Y. Wu, and M. Lan, "Graph-based dependency parsing with graph neural networks," in Proc. 57th Annu. Meeting Assoc. Comput. Linguistics, Florence, Italy, 2019, pp. 2475–2485.
[53] W. Wang and B. Chang, "Graph-based dependency parsing with bidirectional LSTM," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, Berlin, Germany, 2016, pp. 2306–2315.
[54] A. S. Schwartz and M. A. Hearst, "A simple algorithm for identifying abbreviation definitions in biomedical text," in Proc. 8th Pacific Symp. Biocomput., R. B. Altman, A. K. Dunker, L. Hunter, and T. E. Klein, Eds. Lihue, HI, USA, Dec. 2002, pp. 451–462.
[55] S. Sohn, D. C. Comeau, W. Kim, and W. J. Wilbur, "Abbreviation definition identification based on automatic precision estimates," BMC Bioinform., vol. 9, no. 1, p. 402, 2008.
[56] Y. Wu, S. T. Rosenbloom, J. C. Denny, R. A. Miller, S. Mani, D. A. Giuse, and H. Xu, "Detecting abbreviations in discharge summaries using machine learning methods," in Proc. AMIA Annu. Symp., 2011, pp. 1541–1549.
[57] M. Oleynik, M. Kreuzthaler, and S. Schulz, "Unsupervised abbreviation expansion in clinical narratives," in Proc. 16th World Congr. Med. Health Inform., vol. 245, A. V. Gundlapalli, M. Jaulent, and D. Zhao, Eds. Hangzhou, China: IOS Press, Aug. 2017, pp. 539–543.
[58] L. Heryawan, O. Sugiyama, G. Yamamoto, P. H. Khotimah, L. H. O. Santos, K. Okamoto, and T. Kuroda, "A detection of informal abbreviations from free text medical notes using deep learning," Eur. J. Biomed. Informat., vol. 16, no. 1, pp. 29–37, 2020.
[59] X. Huang, E. Zhang, and Y. S. Koh, "Supervised clinical abbreviations detection and normalisation approach," in PRICAI 2019: Trends in Artificial Intelligence, A. C. Nayak and A. Sharma, Eds. Cham, Switzerland: Springer, 2019, pp. 691–703.
[60] Q. Jin, J. Liu, and X. Lu, "Deep contextualized biomedical abbreviation expansion," in Proc. 18th BioNLP Workshop Shared Task, Florence, Italy, Aug. 2019, pp. 88–96.
[61] I. Li, M. Yasunaga, M. Y. Nuzumlali, C. Caraballo, S. Mahajan, H. M. Krumholz, and D. R. Radev, "A neural topic-attention model for medical term abbreviation disambiguation," 2019, arXiv:1910.14076.
[62] V. Joopudi, B. Dandala, and M. Devarakonda, "A convolutional route to abbreviation disambiguation in clinical text," J. Biomed. Informat., vol. 86, pp. 71–78, Oct. 2018.
[63] J. P. C. Chiu and E. Nichols, "Named entity recognition with bidirectional LSTM-CNNs," Trans. Assoc. Comput. Linguistics, vol. 4, pp. 357–370, Dec. 2016.
[64] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, "Neural architectures for named entity recognition," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, San Diego, CA, USA, Jun. 2016, pp. 260–270.
[65] A. Ghaddar and P. Langlais, "Robust lexical features for improved neural network named-entity recognition," in Proc. 27th Int. Conf. Comput. Linguistics, Santa Fe, NM, USA, Aug. 2018, pp. 1896–1907.
[66] L. Liu, J. Shang, X. Ren, F. F. Xu, H. Gui, J. Peng, and J. Han, "Empower sequence labeling with task-aware neural language model," in Proc. 32nd AAAI Conf. Artif. Intell. 30th Innov. Appl. Artif. Intell. Conf. 8th AAAI Symp. Educ. Adv. Artif. Intell., 2018, pp. 1–15.
[67] Y. Shao, J. C.-W. Lin, G. Srivastava, A. Jolfaei, D. Guo, and Y. Hu, "Self-attention-based conditional random fields latent variables model for sequence labeling," Pattern Recognit. Lett., vol. 145, pp. 157–164, May 2021.
[68] I. O. Mulang', K. Singh, C. Prabhu, A. Nadgeri, J. Hoffart, and J. Lehmann, "Evaluating the impact of knowledge graph context on entity disambiguation models," in Proc. 29th ACM Int. Conf. Inf. Knowl. Manage., New York, NY, USA, Oct. 2020, pp. 2157–2160.
[69] M. P. K. Ravi, K. Singh, I. O. Mulang', S. Shekarpour, J. Hoffart, and J. Lehmann, "CHOLAN: A modular approach for neural entity linking on Wikipedia and Wikidata," in Proc. 16th Conf. Eur. Chapter Assoc. Comput. Linguistics, 2021, pp. 504–514.
[70] Y. Zhang, H. Lin, Z. Yang, J. Wang, Y. Sun, B. Xu, and Z. Zhao, "Neural network-based approaches for biomedical relation classification: A review," J. Biomed. Informat., vol. 99, Nov. 2019, Art. no. 103294.
[71] L. He, K. Lee, M. Lewis, and L. Zettlemoyer, "Deep semantic role labeling: What works and what's next," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, Vancouver, BC, Canada, 2017, pp. 473–483.
[72] J. Zhou and W. Xu, "End-to-end learning of semantic role labeling using recurrent neural networks," in Proc. 53rd Annu. Meeting Assoc. Comput. Linguistics 7th Int. Joint Conf. Natural Lang. Process., Beijing, China, 2015, pp. 1127–1137.
[73] Z. Tan, M. Wang, J. Xie, Y. Chen, and X. Shi, "Deep semantic role labeling with self-attention," in Proc. 32nd AAAI Conf. Artif. Intell. 30th Innov. Appl. Artif. Intell. Conf. 8th AAAI Symp. Educ. Adv. Artif. Intell., 2018, pp. 1–8.
[74] D. Marcheggiani and I. Titov, "Encoding sentences with graph convolutional networks for semantic role labeling," in Proc. EMNLP, Copenhagen, Denmark, Sep. 2017, pp. 1506–1515.
[75] H. Fei, M. Zhang, B. Li, and D. Ji, "End-to-end semantic role labeling with neural transition-based model," in Proc. AAAI Conf. Artif. Intell., vol. 35, May 2021, pp. 12803–12811.
[76] M. Zhang, "A survey of syntactic-semantic parsing based on constituent and dependency structures," Sci. China Technol. Sci., vol. 63, no. 10, pp. 1898–1920, Oct. 2020.
[77] W. Han, Y. Jiang, H. T. Ng, and K. Tu, "A survey of unsupervised dependency parsing," in Proc. 28th Int. Conf. Comput. Linguistics, Barcelona, Spain, 2020, pp. 2522–2533.
[78] J. Li, A. Sun, J. Han, and C. Li, "A survey on deep learning for named entity recognition," IEEE Trans. Knowl. Data Eng., vol. 34, no. 1, pp. 50–70, Jan. 2022.
[79] K. Liu, "A survey on neural relation extraction," Sci. China Technol. Sci., vol. 63, no. 10, pp. 1971–1989, Oct. 2020.
[80] Ö. Sevgili, A. Shelmanov, M. Arkhipov, A. Panchenko, and C. Biemann, "Neural entity linking: A survey of models based on deep learning," Semantic Web, vol. 13, no. 3, pp. 527–570, Apr. 2022.
[81] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A robustly optimized BERT pretraining approach," 2019, arXiv:1907.11692.
[82] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, "ELECTRA: Pre-training text encoders as discriminators rather than generators," 2020, arXiv:2003.10555.
[83] Z. Yang, Z. Dai, Y. Yang, J. G. Carbonell, R. Salakhutdinov, and Q. V. Le, "XLNet: Generalized autoregressive pretraining for language understanding," 2019, arXiv:1906.08237.
[84] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, "Exploring the limits of transfer learning with a unified text-to-text transformer," J. Mach. Learn. Res., vol. 21, no. 140, pp. 1–67, 2020.
[85] P. He, X. Liu, J. Gao, and W. Chen, "DeBERTa: Decoding-enhanced BERT with disentangled attention," 2020, arXiv:2006.03654.
[86] S. Adolph, A. Cockburn, and P. Bramble, Patterns for Effective Use Cases. Boston, MA, USA: Addison-Wesley, 2002.


[87] S. W. Ambler, The Elements of UML(TM) 2.0 Style. Cambridge, U.K.: Cambridge Univ. Press, 2005.
[88] M. Weske, G. Decker, M. Dumas, M. La Rosa, J. Mendling, and H. A. Reijers, "Model collection of the business process management academic initiative," Version BPMAI-29-10-2019, Zenodo, 2020.
[89] C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard, and D. McClosky, "The Stanford CoreNLP natural language processing toolkit," in Proc. 52nd Annu. Meeting Assoc. Comput. Linguistics, Syst. Demonstrations, Baltimore, MD, USA, 2014, pp. 55–60.
[90] P. Qi, Y. Zhang, Y. Zhang, J. Bolton, and C. D. Manning, "Stanza: A Python natural language processing toolkit for many human languages," in Proc. 58th Annu. Meeting Assoc. Comput. Linguistics, Syst. Demonstrations, 2020, pp. 101–108.
[91] R. Weischedel, M. Palmer, M. Marcus, E. Hovy, S. Pradhan, L. Ramshaw, N. Xue, A. Taylor, J. Kaufman, M. Franchini, M. El-Bachouti, R. Belvin, and A. Houston, "OntoNotes release 5.0," LDC2013T19, Linguistic Data Consortium, Philadelphia, PA, USA, 2013.
[92] R. Panchendrarajan and A. Amaresan, "Bidirectional LSTM-CRF for named entity recognition," in Proc. 32nd Pacific Asia Conf. Lang., Inf. Comput., Hong Kong, Dec. 2018, pp. 1–10.
[93] A. Gatt and E. Reiter, "SimpleNLG: A realisation engine for practical applications," in Proc. 12th Eur. Workshop Natural Lang. Gener., Athens, Greece, 2009, pp. 90–93.
[94] J. Nivre, M.-C. de Marneffe, F. Ginter, J. Hajič, C. D. Manning, S. Pyysalo, S. Schuster, F. Tyers, and D. Zeman, "Universal dependencies V2: An evergrowing multilingual treebank collection," in Proc. 12th Lang. Resour. Eval. Conf., Marseille, France, May 2020, pp. 4034–4043.
[95] A. Bies, J. Mott, C. Warner, and S. Kulick, "English web Treebank LDC2012T13," Linguistic Data Consortium, Philadelphia, PA, USA, 2012.
[96] S. Saha, "Open information extraction from conjunctive sentences," in Proc. 27th Int. Conf. Comput. Linguistics, Santa Fe, NM, USA, Aug. 2018, pp. 2288–2299.
[97] J. Ficler and Y. Goldberg, "A neural network for coordination boundary prediction," in Proc. Conf. Empirical Methods Natural Lang. Process., Austin, TX, USA, 2016, pp. 23–32.
[98] M. Miwa, R. Sætre, Y. Miyao, and J. Tsujii, "Entity-focused sentence simplification for relation extraction," in Proc. 23rd Int. Conf. Comput. Linguistics, Beijing, China, Aug. 2010, pp. 788–796.
[99] D. Vickrey and D. Koller, "Sentence simplification for semantic role labeling," in Proc. ACL, HLT, Columbus, OH, USA, Jun. 2008, pp. 344–352.
[100] T. N. C. Vo, T. H. Cao, and T. B. Ho, "Abbreviation identification in clinical notes with level-wise feature engineering and supervised learning," in Knowledge Management and Acquisition for Intelligent Systems, H. Ohwada and K. Yoshida, Eds. Cham, Switzerland: Springer, 2016, pp. 3–17.
[101] A. V. Dorogush, V. Ershov, and A. Gulin, "CatBoost: Gradient boosting with categorical features support," 2018, arXiv:1810.11363.
[102] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2016, pp. 785–794.
[103] T. K. Ho, "Random decision forests," in Proc. 3rd Int. Conf. Document Anal. Recognit., vol. 1, Aug. 1995, pp. 278–282.
[104] L. Breiman, "Random forests," Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001.

PAULIUS DANENAS received the Ph.D. degree in informatics from Vilnius University, Vilnius, Lithuania, in 2013. He is currently a Researcher at the Centre of Information Systems Design Technologies, Kaunas University of Technology, Kaunas, Lithuania. He is a coauthor of multiple papers in highly-rated academic journals and proceedings of international conferences. His research interests include artificial intelligence, machine learning, natural language processing, data science, software engineering, model-driven development, and decision support systems (including business and financial domains). He has served as a reviewer for a number of highly-ranked academic journals, including ones published by Springer, Elsevier, Wiley, Taylor & Francis, IEEE, and others.

TOMAS SKERSYS is a Scientific Researcher at the Center of Information Systems Design Technologies and a Professor at the Department of Information Systems, Kaunas University of Technology. His research interests and practical experience cover various aspects of business process management and model-driven information systems development. On these topics, he has published several articles in highly-rated academic journals and in the proceedings of a number of international conferences. He is also a co-editor of three international conference volumes published by Springer-Verlag.
