Large Language Models For Business Process Management
M. Vidgof · S. Bachhofner · J. Mendling (Humboldt-Universität zu Berlin)
Abstract. Large language models (LLMs) are deep learning models with a large
number of parameters. These models have made noticeable progress on a wide
range of tasks and can consequently serve as valuable and versatile tools for
a diverse range of applications. Their capabilities also offer opportunities
for business process management (BPM); however, these opportunities have not
yet been systematically investigated. In this paper, we address this research
problem by foregrounding various management tasks of the BPM lifecycle. We
investigate six research directions highlighting problems that need to be
addressed when using large language models, including usage guidelines for
practitioners.
1 Introduction
2 Background
The advent of LLM applications paves the way towards a plethora of new BPM-
related applications. So far, BPM has adopted natural language processing [1],
artificial intelligence [7], and knowledge graphs [14] to support various applica-
tion scenarios. In this section, we discuss the foundations of DL (Section 2.1)
and LLMs (Section 2.2). In this way, we aim to clarify their specific capabilities.
Recent LLM applications build on machine learning and deep learning models,
such as recurrent neural networks (RNNs) and transformer networks. Machine
Learning (ML) studies algorithms that are "capable of learning to improve their
performance of a task on the basis of their own previous experience" [15]. In
essence, ML techniques follow one of three paradigms: supervised learning,
unsupervised learning, or reinforcement learning. Several of them are relevant for LLMs.
In supervised learning, the ML algorithm receives as input a collection
of pairs, where each pair consists of features representing a concept, along with
a label. Importantly, this label is task-specific and encodes what the algorithm
should learn about the concepts. Such labels can be, for instance, spam and no
spam for a spam classifier, or annotated bounding boxes for an image.
There are two cases of supervised learning that are relevant for LLMs: few-shot
and zero-shot learning.
LLM for BPM: Opportunities and Challenges 3
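The supervised setting described above can be sketched with a minimal Naive Bayes spam classifier. This is an illustrative toy, not any method from the paper; the training messages and smoothing choices are invented:

```python
from collections import Counter

# Toy training set: (features, label) pairs, where features are word lists.
# All example messages are invented for illustration.
train = [
    ("win money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch tomorrow?", "ham"),
]

def fit(pairs):
    """Count word frequencies per label (a minimal Naive Bayes fit)."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in pairs:
        words = text.split()
        counts[label].update(words)
        totals[label] += len(words)
    return counts, totals

def predict(text, counts, totals):
    """Score each label by multiplying smoothed word likelihoods."""
    scores = {}
    vocab = sum(totals.values())
    for label in counts:
        score = 1.0
        for w in text.split():
            # Laplace smoothing so unseen words do not zero out the score.
            score *= (counts[label][w] + 1) / (totals[label] + vocab)
        scores[label] = score
    return max(scores, key=scores.get)

counts, totals = fit(train)
print(predict("claim your free money", counts, totals))  # spam
```

The label ("spam"/"ham") is exactly the task-specific annotation the paragraph describes: it tells the algorithm what to learn about each message.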
In recent years, BPM research has integrated the capabilities of deep learning
to a large extent for process prediction. For an overview, see [16]. There are also
recent applications for automatic process discovery [28], for generating process
models from hand-written sketches [27], and for anomaly detection [17].
4 M. Vidgof et al.
LLMs are DL models trained on vast amounts of text data to perform various
natural language processing tasks. These models, which typically range from
hundreds of millions to billions of parameters, are designed to capture the com-
plexities and nuances of human language. The largest models, such as GPT-3
and GPT-4, are capable of generating human-like text, answering questions,
translating languages, and writing computer code. The training process of these models
involves processing massive amounts of text data, which is used to learn pat-
terns and relationships between words and phrases. The models then use this
information to predict the likelihood of a given token, or sequence of tokens,
in a specific context. This allows them to generate coherent and contextually
relevant text or perform other language-related tasks. The rise of large language
models has resulted in significant advancements in the field of NLP, and they
are widely used in various applications, including chatbots, virtual assistants,
and text generation systems. One of their strengths is their ability to perform
few-shot and zero-shot learning with prompt-based learning [11].
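The next-token objective described above can be illustrated with a deliberately tiny bigram model, a toy stand-in for a transformer-based LLM; the corpus is invented:

```python
from collections import Counter, defaultdict

# A tiny corpus standing in for the "massive amounts of text data".
corpus = "the process starts when the order arrives and the process ends".split()

# Count which token follows which (bigram statistics).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_token_probs(prev):
    """Probability of each candidate next token, given the previous one."""
    total = sum(follows[prev].values())
    return {tok: c / total for tok, c in follows[prev].items()}

print(next_token_probs("the"))  # 'process' is twice as likely as 'order'
```

An LLM does the same kind of prediction, but conditions on a long context rather than a single previous token, and learns the conditional distribution with a neural network instead of raw counts.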
In 2018, Radford et al. introduced GPT-1 (also sometimes called simply
GPT) in their paper on "Improving language understanding by generative pre-
training" [23]. Generative Pre-trained Transformer (GPT)-1 refers to the largest
model the authors trained (110 million parameters). In the paper, the au-
thors studied the ability of transformer networks trained in two phases for lan-
guage understanding. In the first phase, they trained a transformer network to
predict the next token given the tokens that appeared before (also called un-
supervised pre-training, generative pre-training, or, in statistics, auto-regressive modeling).
In the second phase, the transformer network was fine-tuned on tasks with
supervised learning (also called discriminative fine-tuning). In summary, their
major finding is that combining task-agnostic unsupervised learning in the first
phase with supervised fine-tuning on tasks in the second phase can lead to
performance gains, ranging from 1.5% on textual entailment to 8.9% on
commonsense reasoning.
In 2019, Radford et al. introduced GPT-2 in their paper "Language Models
are Unsupervised Multitask Learners" [24]. Again, GPT-2 refers to the largest
model they trained. GPT-2 is hence a scaled-up version of GPT-1 in model
size (1.5 billion parameters), and also in training data size. In particular, GPT-
2 has roughly more than ten times the number of parameters of GPT-1,
and is trained on roughly more than ten times the amount of training data.
They report two major findings. First, the unsupervised GPT-2 can outperform
language models that are trained on task-specific data sets, without these data
sets being part of GPT-2's training data. Second, GPT-2 seems to learn
tasks (for example, question answering) from unlabeled text data. In both cases,
however, the performance did not reach the state of the art. In summary, their
major finding is that LLMs can learn tasks without the need to train them on
these tasks, given that they have sufficient unlabeled training data.
In 2020, Brown et al. introduced GPT-3 with the paper "Language Models
are Few-Shot Learners" [5]. Unlike the above two cases, GPT-3 refers to all the
models the authors trained, i.e., it refers to a family of models. The largest
model is GPT-3 175B, a model with 175 billion param-
eters. In their paper, the authors showed that language models like GPT-3 can
learn tasks with only a few examples, hence the title "Few-Shot Learn-
ers". The authors demonstrated this ability by prompting GPT-3 with a small
number of examples for various tasks, including question answering and language
translation, without updating the model's weights.
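Few-shot prompting amounts to prepending a handful of worked examples to the query. A minimal sketch of such prompt construction follows; the classification task, example tasks, and labels are all invented for illustration:

```python
# Hypothetical few-shot examples: (task name, label) pairs.
examples = [
    ("Check invoice", "manual"),
    ("Send confirmation email", "automated"),
]

def build_few_shot_prompt(examples, query):
    """Assemble instruction, worked examples, and the open query."""
    lines = ["Classify each task as manual or automated."]
    for task, label in examples:
        lines.append(f"Task: {task}\nLabel: {label}")
    # The prompt ends mid-pattern; the model is expected to continue it.
    lines.append(f"Task: {query}\nLabel:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(examples, "Archive contract")
print(prompt)
```

No gradient update happens: the examples live only in the prompt, which is precisely what "few-shot learner" means in the GPT-3 paper.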
In 2023, OpenAI introduced GPT-4 [21]. Contrary to previous versions of
GPT, this version is a multimodal model, as it can process text and images as
input to produce text. This model is a major step forward as it improves
on numerous benchmarks; however, like previous GPT models, it suffers from
reliability issues, a limited context window, and an inability to learn from
experience. This release, however, diverges from previous GPT models as OpenAI is
secretive about "the architecture (including model size), hardware, training com-
pute, dataset construction, training method, or similar". We only know that the
model is a transformer-style model, pre-trained on predicting the next
token on publicly available data and undisclosed licensed data, and then fine-tuned
with Reinforcement Learning from Human Feedback (RLHF). Notwithstanding
this departure, the authors include in their report findings on predicting model
scalability. In particular, they report on predicting the loss as a function of com-
pute, and the mean log pass rate (a measure of how many code samples pass a
unit test) as a function of compute, given a training methodology. In both cases,
they find that they could predict the respective measure with high accuracy
based on data generated with significantly less compute (1,000 to 10,000 times less).
They also report results on the inverse scaling prize tasks, where the performance
on a task first decreases as a function of model size and then increases after a
particular model size.
In 2022, OpenAI introduced a conversational LLM called ChatGPT [19].
As a model, the first version of ChatGPT was based on GPT-3.5 and is a
sibling of InstructGPT. GPT-3.5 is a GPT-3 model trained on a data
set that contains text and software code up to the fourth quarter of 2021 [18].
InstructGPT was introduced in "Training language models to follow instructions
with human feedback" [22], and is a GPT-3 model fine-tuned with supervised
learning in a first step, and with reinforcement learning from human feedback [29]
in a second step. ChatGPT is hence a GPT-3.5 model fine-tuned for
conversational interaction with the user. In other words, the user interacts with
the model via a sequence of text (the conversation) to accomplish a task. For
example, we can copy and paste a text into ChatGPT's input field and ask it
to summarize it. We can even be more specific: we can say that the summary
should be 10 sentences long and written in a preferred style. Importantly, if
we are unhappy with the result, we can ask ChatGPT to refine its own summary
without copying and pasting the text again. At the time of
this writing, ChatGPT can be used with GPT-4 as the backend LLM.
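The summarize-then-refine interaction can be sketched as the message structure used by conversational LLM APIs. The role convention below follows the widely used chat format; no actual API call is made, and the assistant turn is a placeholder:

```python
# The text the user pastes in; a placeholder stands in for real content.
document = "<text to be summarized>"

conversation = [
    {"role": "user",
     "content": f"Summarize the following text in 10 sentences:\n{document}"},
    # Placeholder for the model's first answer.
    {"role": "assistant",
     "content": "<summary produced by the model>"},
    # The refinement request needs no re-paste: the conversation history
    # already carries the original text as context.
    {"role": "user",
     "content": "Make the summary shorter and use a more formal style."},
]

print(len(conversation))  # 3
```

The key point the paragraph makes is visible in the structure: the second user turn refers back to earlier turns instead of repeating the document.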
There are also other large language models. In 2022, Zhang et al. introduced
Open Pre-trained Transformer (OPT) with the paper "OPT: Open Pre-trained
Transformer Language Models" [34].
Above, we briefly discussed LLMs, focusing particularly on the GPT model
family as these are, we hypothesize, the most popular LLMs. It is important to
recognize the transition from GPT-3 to GPT-4, as it brought a massive increase
on a variety of benchmarks, particularly on academic and professional exams [21].
These performance increases in NLP tasks are a result of natural language under-
standing and have, as we argue, massive implications for what can be automated:
the automation frontier. This frontier is arguably shifted further when natural
language understanding is combined with plugin software components. In fact,
at the time of writing, the company behind GPT is experimenting with Chat-
GPT plugins. Among the currently offered plugins are Klarna, Wolfram, the
integration with vector databases for information retrieval, and an embedded
code interpreter for Python [20]. This has an impact on Robotic Process Au-
tomation (RPA), more broadly on business process automation including
Business Process Management Systems (BPMSs), and more generally on how
work is carried out.
3.1 Identification
The BPM lifecycle starts with Identification. Normally, at this stage there is not
much structured process knowledge available in the company, and relevant in-
formation has to be extracted from heterogeneous internal documentation. This
is exactly where LLMs shine, as they can quickly scan and summarize large vol-
umes of text, highlighting important documents or directly outputting the required
information.
4 See for example the OpenAI Cookbook GitHub repository, which provides code examples for the OpenAI API.
Identifying processes from documentation The idea is to give the LLM all relevant
documentation existing in the organization as input. This can include legal doc-
uments, job descriptions, advertisements, internal knowledge bases, and hand-
books. The LLM is then tasked to identify which processes are taking place in
the organization. It can further be instructed to classify the input documents
according to the processes they describe. Multimodal LLMs can improve the results
even further, as charts, presentations, and photos can also be used directly as
information sources.
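A minimal sketch of assembling such documentation into an identification prompt follows. The document names and contents are invented, and the actual LLM call is omitted; only the input-preparation step is shown:

```python
# Hypothetical internal documents, keyed by file name.
documents = {
    "handbook.txt": "New hires submit form HR-1, which HR reviews ...",
    "job_ad.txt": "You will handle incoming purchase orders ...",
}

def build_identification_prompt(documents):
    """Concatenate heterogeneous documents behind a single instruction."""
    parts = [
        "Which business processes do these documents describe?",
        "Classify each document by the process it belongs to.",
        "",
    ]
    for name, text in documents.items():
        parts.append(f"--- {name} ---\n{text}")
    return "\n".join(parts)

prompt = build_identification_prompt(documents)
print(prompt)
```

In practice the documentation would exceed any context window, so a real pipeline would chunk, summarize, and aggregate; the sketch only shows the shape of the task.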
3.2 Discovery
The second stage of the BPM lifecycle is Process Discovery. At this stage, one or a
combination of process discovery methods is selected to produce process models.
When one speaks of automated process discovery, one usually means process
mining: a technique for extracting process models and other relevant data from
the event logs left by the information systems supporting the execution of a process.
However, with LLMs, other discovery techniques can also benefit from (at least
partial) automation.
Process discovery from communication logs Another information source that can
be used in evidence-based discovery is communication logs, i.e., e-mails and chats
between process participants: internal employees, but also external partners and
customers. LLMs can extract patterns from these communication logs, which can
be seen as the various steps in a process. Then, they can similarly produce process
descriptions or models.
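Assuming the LLM has already extracted an ordered list of activities per case from the e-mails (the extraction itself is not shown, and the log below is invented), the step from patterns to a process can be sketched as computing the directly-follows relation that many discovery algorithms build on:

```python
from collections import Counter

# Toy output of a hypothetical LLM extraction step: activities per case.
extracted = {
    "order-1": ["receive order", "check stock", "ship goods"],
    "order-2": ["receive order", "check stock", "reject order"],
}

# Count how often activity a is directly followed by activity b.
df = Counter()
for steps in extracted.values():
    for a, b in zip(steps, steps[1:]):
        df[(a, b)] += 1

print(df[("receive order", "check stock")])  # 2
```

The resulting counts form a directly-follows graph, from which a process model or textual description can then be derived.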
Interview chat bot Possible applications of LLMs in process discovery can also go
beyond evidence-based discovery. Another common discovery method is inter-
views with domain experts. In these interviews, a process analyst asks questions
about the process and produces a process model based on several interviews.
Typically, several separate interviews with different domain experts are required
to produce the first version of the process model. Afterwards, additional rounds of
interviews are conducted in order to collect and incorporate feedback and to perform
validation. In the worst case, domain experts might have conflicting perceptions
of the process; resolving such conflicts then becomes a very difficult and time-
consuming task for both the process analyst and the domain experts.
LLMs can solve parts of this problem by providing a chat bot interface for
domain experts. In this way, the domain experts answer questions in the chat.
This brings several advantages. First, the domain experts do not have to
allocate lengthy time slots for interviews, but instead talk with the chat bot at
their desired pace. Second, the feedback loop gets shorter, as the LLM can produce process
models directly after or even during the conversation with the domain expert and
also update the model, so validation can happen simultaneously with
model creation. Finally, the benefits only grow if multiple domain experts
interact with the chat bot simultaneously (and independently) and the chat bot
can use all of this input in the conversations. The latter option is, however, more
difficult to implement.
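The interview pattern can be sketched minimally as follows. A fixed question list stands in for the LLM's adaptive questioning, and a canned answer function simulates the domain expert; both are invented for illustration:

```python
# Questions a process analyst (or an interviewing LLM) might ask.
questions = [
    "What triggers the process?",
    "What happens next?",
    "How does the process end?",
]

def interview(answer_fn):
    """Ask each question and collect (question, answer) pairs as raw notes."""
    return [(q, answer_fn(q)) for q in questions]

# Simulated expert answers; a real deployment would read them from the chat.
canned = {
    "What triggers the process?": "A customer order arrives.",
    "What happens next?": "We check stock and pick the goods.",
    "How does the process end?": "The goods are shipped.",
}
notes = interview(lambda q: canned[q])
print(len(notes))  # 3
```

An LLM-based interviewer would additionally generate follow-up questions from previous answers and turn the collected notes into a draft model during the conversation.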
Combined process discovery All process discovery methods have their advantages
and drawbacks. Often, a combination of these methods is used to achieve the best
results. However, this combination is limited by the resources that are allocated
for the process discovery task. The discovery methods presented above give valuable out-
put while requiring much fewer resources. Thus, it is possible to apply more of them
simultaneously for an even better result. The combination of these methods can be
used in addition to traditional process mining or "manual" process discovery,
which provides the richest insights. While the results of different methods may
have some inconsistencies that have to be fixed, fixing them can also be done in
a (semi-)automated manner.
3.3 Analysis
The next stage is Process Analysis. At this stage, the discovered processes are
analyzed to find problems and bottlenecks. While this is a cognitively demanding
task, LLMs can be used to help human analysts in several regards.
Issue discovery If an issue exists in a process, chances are high that somebody has
already complained about it. Depending on the company, product, and process,
this can be a customer, partner, or employee, and it can happen on different
platforms, including social media, support services, or internal communication
tools. LLMs are good at summarizing large volumes of unstructured text as well
as finding patterns, and this capability can be used for this task. It is as easy as
scraping the text from these platforms and giving it as input to the LLM
with a simple prompt like "find all things customers have complained about".
Issue spotting After an issue in the process is found, the next step is to spot the
part of the process that creates this issue. In some cases, this can be a difficult task,
especially in a complex process. The idea here is to give the LLM all process models
(or only the models of the relevant process, in case it is known exactly which
process causes the issue) together with the spotted problems. The
task of the LLM is, by analyzing task names and descriptions, to make suggestions
as to which tasks may be responsible for the issue. In advanced cases, the LLM might
even be capable of suggesting fixes. It might be something as simple as
suggesting to automate a manual task that takes too long, but it might also
be a more complex process redesign suggestion, as long as the LLM is given
redesign methods as additional input or is trained on redesign methods as well.
3.4 Redesign
The fourth phase of the BPM lifecycle is Process Redesign. In this stage, process
improvement suggestions are developed based on discovered issues and general
process improvement methods. These suggestions are evaluated, and a to-be
process model is developed at the end of this stage.
Business process improvement An obvious yet very promising use case is to simply
ask the LLM to redesign the process. As already mentioned, simple issues arising
from just one activity can be fixed by the LLM. However, it does not stop there;
theoretically, the result depends only on the quality of the input given to the LLM.
3.5 Implementation
The next phase of the BPM lifecycle is Process Implementation. It covers organi-
zational and technical changes required to change the way of working of process
participants as well as IT support for the to-be process.
BPMN model explanations in plain text As mentioned, LLMs can work with
BPMN models serialized in XML. We have already discussed how LLMs can
manipulate process models in order to increase quality as well as suggest or
incorporate redesign ideas. To close the circle, LLMs can produce textual expla-
nations of BPMN models. What is more, one can also control the level
of detail. So, depending on the target audience, the LLM can produce a tex-
tual overview but also detailed descriptions of the models. It can even transform
a model into requirements for software developers, if enough details are contained in the
BPMN model itself.
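The raw material for such an explanation can be obtained directly from the XML serialization. A minimal sketch follows, assuming a heavily simplified BPMN fragment (invented here); a real model carries gateways, events, and flows, and the verbalization step would be done by the LLM:

```python
import xml.etree.ElementTree as ET

# A minimal, invented BPMN fragment with two tasks.
bpmn = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="order">
    <task id="t1" name="Receive order"/>
    <task id="t2" name="Ship goods"/>
  </process>
</definitions>"""

# BPMN elements live in the official BPMN model namespace.
NS = "{http://www.omg.org/spec/BPMN/20100524/MODEL}"
root = ET.fromstring(bpmn)
tasks = [t.get("name") for t in root.iter(f"{NS}task")]

# The extracted task names are what an LLM would turn into prose.
print("The process consists of the tasks: " + ", ".join(tasks) + ".")
```

Feeding the serialized model (or such an extracted skeleton) to the LLM together with a target audience then yields overviews or detailed descriptions, as discussed above.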
BPMN model chatbot Building on top of the previous use case, the model description
can also be tailored to each specific user. This way, given a model or, better,
a model repository with additional documentation, the LLM can prepare specific
descriptions for, e.g., the process owner, but also for individual participants, for
whom all the specific tasks they are responsible for are described and explained in
detail. Furthermore, in this use case one can add interaction between the user
and the LLM. This way, users may ask for clarification of parts they did not understand,
or generally ask for more details whenever guidance is required.
Process orchestrator LLMs can be accessed via APIs and can at the same time
access APIs themselves, opening a huge variety of opportunities. While the
former means they can be used for automated tasks and be called by the orchestra-
tor, the latter means that an LLM could theoretically be an orchestrator itself: given
an executable process model and additional constraints as context, as well as the
3.6 Monitoring
The last phase of the BPM lifecycle is Process Monitoring. At this stage, the already
implemented processes are executed, and their performance is monitored. The
observations collected in this phase are used for operational management and
also serve as input for further iterations of the lifecycle.
Process dashboard chatbot Dashboards are a powerful tool that provides an overview
of the most important KPIs of a process on a single screen. However, their ultimate
goal is to tell the viewer whether the status of the process is good or
not; the numbers and colors are mostly an intermediary medium.
LLMs can take away this intermediate step and allow the user to directly learn
the status of the processes.
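The step from raw KPI readings to a direct verdict can be sketched as follows. The KPI names, values, and thresholds are invented for illustration; in the use case above, the LLM would phrase the verdict conversationally instead of using a template:

```python
# Invented KPI readings: value, target limit, and which direction is good.
kpis = {
    "avg_cycle_time_h": {"value": 30.0, "limit": 48.0, "better": "lower"},
    "first_pass_yield": {"value": 0.97, "limit": 0.95, "better": "higher"},
}

def process_status(kpis):
    """Return a one-sentence status instead of a screen full of numbers."""
    issues = []
    for name, k in kpis.items():
        off_target = (k["value"] > k["limit"] if k["better"] == "lower"
                      else k["value"] < k["limit"])
        if off_target:
            issues.append(name)
    if not issues:
        return "The process is healthy: all KPIs are within target."
    return "Attention needed: " + ", ".join(issues) + " out of target."

print(process_status(kpis))  # all KPIs within target in this toy data
```

A chatbot on top of this would also answer follow-up questions ("why is cycle time high?") by drilling into the underlying data.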
4 Research directions
In this section, we propose six research directions. We categorize the research
directions into three groups. The first group studies the use of LLM, and their
applications, in practice. This includes the use within BPM projects in compa-
nies or as part of an Information System (IS) (Section 4.1), the development
of usage guidelines for practitioners and researchers (Section 4.2), and also the
derivation of BPM tasks (Section 4.3) and their corresponding data sets (Sec-
tion 4.4). The second group studies how LLM can be combined with existing
BPM tools, and more generally BPM technologies, to increase user experience
(Section 4.5). Crucially, this group draws from findings in the first group. The
third and final group develops large language models specifically for business
process management, so these models can understand the context and language
of business processes and support various tasks, such as process discovery, mon-
itoring, analysis, and optimization (Section 4.6). Again, this group builds upon
the findings of the first group.
to study is whether we always need the largest, and hence most accurate, model
for each task. We hypothesize that this might not be the case. Finally, and most
importantly, the next big question to answer is how LLMs will change how work
is carried out within BPM projects, and within processes that are actively man-
aged. For example, we hypothesize that conversational LLMs might take the spot
of the duck in the famous rubber duck approach5. This is a socio-technical
systems question, and we hence strongly believe that the BPM community, and
the information systems community more broadly, is especially well equipped to
contribute to it.
The second research direction builds usage guidelines for BPM researchers and
practitioners. One question such guidelines have to answer is: given an organiza-
tional context, the lifecycle phase, and the process context of a task, which LLM
achieves the expected value? In addition, such guidelines should systematically
collect best practices for creating prompts. For example, for the BPM lifecycle
phases of process implementation and of monitoring and controlling, a company might
consider using an LLM within a managed process. Let us assume this company
is a bank and wants to automate the task of replying to customer inquiries with
an LLM. Then the guideline proposes a specific LLM for the process implementation,
with the number of parameters it has, gives examples of how to create a
prompt template, how to fill the template with customer background information, and
finally how to integrate the customer inquiry into the prompt template.
For process monitoring and controlling, the guidelines might propose a different
model for analyzing different inquiry clusters, as the lifecycle phase context is
different. As an example, consider here that the LLM first categorizes each in-
quiry as having positive or negative sentiment, and then lists the top five
inquiry reasons for both. This research direction builds upon the first research direction,
as the first direction, among others, determines the tasks for which LLMs
can be used in principle.
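The categorize-then-aggregate example can be sketched as follows. The inquiry data is invented and arrives already tagged with sentiment (the step the guideline assigns to the LLM); only the aggregation of top inquiry reasons is shown:

```python
from collections import Counter

# Invented inquiries as (sentiment, reason) pairs, as if tagged by an LLM.
inquiries = [
    ("negative", "card blocked"), ("negative", "card blocked"),
    ("negative", "fees unclear"), ("positive", "fast support"),
]

def top_reasons(inquiries, sentiment, n=5):
    """List the n most frequent inquiry reasons for one sentiment class."""
    reasons = Counter(reason for s, reason in inquiries if s == sentiment)
    return [reason for reason, _ in reasons.most_common(n)]

print(top_reasons(inquiries, "negative"))  # ['card blocked', 'fees unclear']
```

A guideline of the kind described above would then specify which model performs the tagging step, and how the prompt template for it is built.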
This research direction builds and maintains two different task lists. The first
list maps general NLP tasks to tasks within BPM. As an example, consider
the general NLP task of text summarization. Within BPM, text summarization can
relate to summarizing a set of process descriptions or task descriptions. We can
think of this list as a one-to-many mapping between NLP tasks on the one hand
and BPM tasks on the other. The second list enumerates tasks that are unique
to BPM. This research direction uses the findings from the directions presented
in Section 4.1 and Section 4.2.
5 Rubber duck debugging
language models that are fine-tuned on the BPM domain. An important aspect
of this direction is to open-source the created LLMs, as was done for OPT [34]. This
is important for research, as researchers can then use these models in their studies,
and for practice, as companies can use them free of charge for their use cases.
5 Discussion
In this section we discuss the challenges of LLM, the power of combination and
inflated expectations, and end with an outlook and future work.
Challenges The use of LLMs entails opportunities and challenges. For exam-
ple, they can help to understand difficult research, but they also carry over
deficiencies (including factual errors) in the training data set to the texts they
generate [32]. In a systematic study of these errors, Borji analyzes and
categorizes the errors of ChatGPT; the author further outlines and discusses the
risks, limitations, and societal implications of such models6 [4]. The failure cate-
gories identified by the author include reasoning, factual errors, math, and coding. A
similar deficiency study was done in [2], but these authors focus on LLMs in
general. A news feature in Nature discusses these issues and the risks of using LLMs [9].
One consequence for education might be that essays as an assignment should be
reconsidered [30].
Outlook and future work LLMs are used, and will be used, in commercial products
with huge numbers of users. We speculate that this will have an effect on research,
as funding agencies might increase the number of grants for this research field.
An ever-increasing user base that interacts with LLMs (directly or indirectly) is
therefore, in our view, inevitable. For future work, we plan to develop
research directions that are beyond the scope of this paper. We expect that LLMs
will have an effect on how work is carried out (see Section 2.3 and Section 4.1).
But this may have far greater impacts than what we cover here, for example on
the BPM capabilities, which are strategy, governance, information technology,
people, and culture [25].

6 See the ChatGPT failure archive (GitHub) for an up-to-date list
7 https://openai.com/blog/chatgpt-plugins
6 Conclusion
In this paper, we present six research directions for studying and building LLMs
for BPM. We use the BPM lifecycle to propose applications of LLMs that showcase
the impact of these models.
References
1. Van der Aa, H., Carmona Vargas, J., Leopold, H., Mendling, J., Padró, L.: Chal-
lenges and opportunities of applying natural language processing in business pro-
cess management. In: COLING 2018: The 27th International Conference on Com-
putational Linguistics: Proceedings of the Conference: August 20-26, 2018 Santa
Fe, New Mexico, USA. pp. 2791–2801. Association for Computational Linguistics
(2018)
2. Bender, E.M., Gebru, T., McMillan-Major, A., Shmitchell, S.: On the dan-
gers of stochastic parrots: Can language models be too big? In: Proceedings
of the 2021 ACM Conference on Fairness, Accountability, and Transparency.
p. 610–623. FAccT ’21, Association for Computing Machinery, New York, NY,
USA (2021). https://doi.org/10.1145/3442188.3445922, https://doi.org/10.1145/
3442188.3445922
3. Blagec, K., Kraiger, J., Frühwirt, W., Samwald, M.: Benchmark datasets driving
artificial intelligence development fail to capture the needs of medical professionals.
Journal of Biomedical Informatics p. 104274 (2022)
4. Borji, A.: A categorical archive of ChatGPT failures (2023). https://doi.org/10.
48550/ARXIV.2302.03494, https://arxiv.org/abs/2302.03494
5. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Nee-
lakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A.,
Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C.,
Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner,
C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language models are
few-shot learners. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin,
H. (eds.) Advances in Neural Information Processing Systems. vol. 33, pp. 1877–
1901. Curran Associates, Inc. (2020), https://proceedings.neurips.cc/paper/2020/
file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
6. van Dis, E.A.M., Bollen, J., Zuidema, W., van Rooij, R., Bockting, C.L.: ChatGPT:
five priorities for research. Nature 614(7947), 224–226 (Feb 2023)
7. Dumas, M., Fournier, F., Limonad, L., Marrella, A., Montali, M., Rehse, J.R.,
Accorsi, R., Calvanese, D., De Giacomo, G., Fahland, D., et al.: AI-augmented
business process management systems: a research manifesto. ACM Transactions
on Management Information Systems 14(1), 1–19 (2023)
8. Dumas, M., La Rosa, M., Mendling, J., Reijers, H.A.: Fundamentals of Business
Process Management, vol. 2. Springer (2018)
9. Hutson, M.: Robo-writers: the rise and risks of language-generating AI. Nature
591(7848), 22–25 (Mar 2021)
10. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444
(2015)
11. Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., Neubig, G.: Pre-train, prompt, and
predict: A systematic survey of prompting methods in natural language processing.
ACM Computing Surveys 55(9), 1–35 (2023)
12. Loizos, C.: StrictlyVC in conversation with Sam Altman, part two (OpenAI).
https://www.youtube.com/watch?v=ebjkD1Om4uw (January 2023), YouTube
channel of Connie Loizos
13. Malinova, M., Mendling, J.: Identifying do’s and don’ts using the integrated
business process management framework. Business Process Management Journal
(2018)
14. Miller, J.A., Mahmud, R.: Research directions in process modeling and mining
using knowledge graphs and machine learning. In: Qingyang, W., Zhang, L.J. (eds.)
Services Computing – SCC 2022. pp. 86–100. Springer Nature Switzerland, Cham
(2022)
15. Mjolsness, E., DeCoste, D.: Machine learning for science: state of the art and future
prospects. Science 293(5537), 2051–2055 (2001)
16. Neu, D.A., Lahann, J., Fettke, P.: A systematic literature review on state-of-the-
art deep learning methods for process prediction. Artificial Intelligence Review pp.
1–27 (2022)
17. Nolle, T., Seeliger, A., Thoma, N., Mühlhäuser, M.: Deepalign: alignment-based
process anomaly correction using recurrent neural networks. In: Advanced Informa-
tion Systems Engineering: 32nd International Conference, CAiSE 2020, Grenoble,
France, June 8–12, 2020, Proceedings. pp. 319–333. Springer (2020)
18. OpenAI: Model index for researchers. https://platform.openai.com/docs/model-
index-for-researchers/model-index-for-researchers
19. OpenAI: Chatgpt: Optimizing language models for dialogue.
https://openai.com/blog/chatgpt/ (November 2022)
20. OpenAI: ChatGPT plugins. https://openai.com/blog/chatgpt-plugins (March
2023)
21. OpenAI: GPT-4 technical report. https://cdn.openai.com/papers/gpt-4.pdf
(March 2023)
22. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C.L., Mishkin, P., Zhang,
C., Agarwal, S., Slama, K., Ray, A., et al.: Training language models to follow
instructions with human feedback. arXiv preprint arXiv:2203.02155 (2022)
23. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., et al.:
Improving language understanding by generative pre-training.
https://openai.com/blog/language-unsupervised/ (2018)
24. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language
models are unsupervised multitask learners. https://openai.com/blog/better-
language-models/ (2019)
25. Rosemann, M., vom Brocke, J.: The six core elements of business process man-
agement. In: Handbook on business process management 1, pp. 105–122. Springer
(2015)
26. Scao, T.L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Castagné, R.,
Luccioni, A.S., Yvon, F., Gallé, M., et al.: Bloom: A 176b-parameter open-access
multilingual language model. arXiv preprint arXiv:2211.05100 (2022)
27. Schäfer, B., Van der Aa, H., Leopold, H., Stuckenschmidt, H.: Sketch2process: End-
to-end bpmn sketch recognition based on neural networks. IEEE Transactions on
Software Engineering (2022)
28. Sommers, D., Menkovski, V., Fahland, D.: Supervised learning of process discovery
techniques using graph neural networks. Information Systems p. 102209 (2023)
29. Stiennon, N., Ouyang, L., Wu, J., Ziegler, D., Lowe, R., Voss, C., Radford,
A., Amodei, D., Christiano, P.F.: Learning to summarize with human feed-
back. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.)
Advances in Neural Information Processing Systems. vol. 33, pp. 3008–3021.
Curran Associates, Inc. (2020), https://proceedings.neurips.cc/paper/2020/file/
1f89885d556929e98d3ef9b86448f951-Paper.pdf
30. Stokel-Walker, C.: AI bot ChatGPT writes smart essays-should academics worry?
Nature (December 2022)
31. Teubner, T., Flath, C.M., Weinhardt, C., van der Aalst, W., Hinz, O.: Welcome
to the era of ChatGPT et al.: The prospects of large language models. Business &
Information Systems Engineering pp. 1–7 (2023)
32. Van Noorden, R.: How language-generation AIs could transform science. Nature
605(7908), 21 (2022)
33. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser,
Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information
Processing Systems. pp. 5998–6008 (2017)
34. Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., Dewan, C., Diab,
M., Li, X., Lin, X.V., et al.: OPT: Open pre-trained transformer language models.
arXiv preprint: https://arxiv.org/abs/2205.01068 (2022)