
Received 24 May 2023, accepted 14 June 2023, date of publication 12 July 2023, date of current version 19 July 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3294840

Requirements Engineering in Machine Learning Projects
ANA GJORGJEVIKJ , KOSTADIN MISHEV , LJUPCHO ANTOVSKI ,
AND DIMITAR TRAJANOV , (Member, IEEE)
Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, 1000 Skopje, North Macedonia
Corresponding author: Ana Gjorgjevikj ([email protected])

ABSTRACT Over the last decade, machine learning methods have revolutionized a large number of domains
and provided solutions to many problems that people could hardly solve in the past. The availability
of large amounts of data, powerful processing architectures, and easy-to-use software frameworks have
made machine learning a popular, readily available, and affordable option in many different domains and
contexts. However, the development and maintenance of production-level machine learning systems have
proven to be quite challenging, as these activities require an engineering approach and solid best practices.
Software engineering offers a mature development process and best practices for conventional software
systems, but some of them are not directly applicable to the new programming paradigm imposed by
machine learning. The same applies to the requirements engineering best practices. Therefore, this article
provides an overview of the requirements engineering challenges in the development of machine learning
systems that have been reported in the research literature, along with their proposed solutions. Furthermore,
it presents our approach to overcoming those challenges in the form of a case study. Through this mixed-
method study, the article tries to identify the necessary adjustments to (1) the best practices for conventional
requirements engineering and (2) the conventional understanding of certain types of requirements to better
fit the specifics of machine learning. Moreover, the article tries to emphasize the relevance of properly
conducted requirements engineering activities in addressing the complexity of machine learning systems,
as well as to motivate further discussion on the requirements engineering best practices in developing such
systems.

INDEX TERMS Machine learning, requirements engineering, software engineering, software requirements.

I. INTRODUCTION
Artificial intelligence (AI) and its sub-field machine learning (ML) have had significant research activity and commercial use for decades, but over the last decade, they have become significantly more popular and accessible to the wider community. To a large extent, that has happened as a result of the significant progress made in the ML sub-field known as deep learning (DL) [1], which relies on deep neural networks to learn meaningful representations from raw data and bypasses the need for manual feature engineering. The significant achievements that DL has made possible in many fields (e.g., [2], [3]) have been mainly a result of the improvements in the techniques used to train deep neural networks, the availability of larger datasets and more powerful computers, as well as the significantly reduced training time [4]. This progress has gradually made ML algorithms ubiquitous in many areas of our society and everyday activities.

ML methods introduce a different approach to software programming in which, instead of writing problem-solving instructions in software code, learning algorithms learn solutions to problems through data. This new approach generally consists of specifying a goal of the program behavior, e.g., by collecting relevant data, limiting the solution search space through a rough skeleton of code, and letting the learning algorithm find the best solution [5].

The associate editor coordinating the review of this manuscript and approving it for publication was Vicente Alarcon-Aquino.
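The contrast described above (hand-written instructions versus a goal specified through data and a rough code skeleton) can be illustrated with a deliberately tiny sketch. The function names and toy data below are our own hypothetical illustration, not from the cited works; real systems search far larger solution spaces with learning algorithms rather than exhaustive threshold search:

```python
# Conventional programming: a human writes the problem-solving logic.
def is_spam_rule_based(message: str) -> bool:
    # Explicit, hand-authored rules.
    return "free money" in message.lower() or message.isupper()


# ML-style programming: only the goal (maximize accuracy on labeled data)
# and a rough solution skeleton (a threshold classifier) are specified;
# the "logic" (the threshold itself) is found by searching over the data.
def fit_threshold(examples):
    """examples: (suspicious_word_count, is_spam) pairs."""
    best_threshold, best_accuracy = 0, 0.0
    for t in range(10):  # the constrained solution search space
        accuracy = sum(
            (count >= t) == label for count, label in examples
        ) / len(examples)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold
```

For instance, `fit_threshold([(0, False), (1, False), (3, True), (5, True)])` yields a decision rule that was never written by hand, which is precisely why the requirements for such components depend on the available data as much as on stakeholder intent.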

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
72186 VOLUME 11, 2023

Although ML systems typically require a significant amount of conventional software code to support the ML models which are at their core [6], the new approach to software programming introduced by ML challenges the established software development process and best practices. The ML software development process is characterized by its data-centricity, non-linearity, and multiple feedback loops between stages, which can become even more complex in systems with multiple ML components that interact in complex ways [7]. Previous experience has shown that while developing and deploying ML systems can sometimes be a relatively fast and inexpensive process, maintaining such systems over time can be challenging and costly, mainly because ML systems are prone to accumulating hidden technical debt [6]. In addition to engineering challenges, AI systems introduce a new set of challenges related to predicting their exact behavior in different situations, predicting their effect on individuals or society, and ensuring their trustworthiness. Sometimes it can be challenging to predict the behavior and outcomes of AI systems precisely because of their complexity, susceptibility to imperfections of the data they learn from, the difficulty in interpreting the functional processes that generate their output, as well as any new behavior arising from their interactions with the world or changes in their environment [8]. In that context, the research literature has reported an example of bias in a commercial ML system that had been discovered only after the system had been released for use, and negative user experiences had been reported [9]. Developing an appropriate solution to a real problem through ML is a complex process that requires meticulous analysis of the system capabilities, behavior, risks, limitations, qualities, and intended/unintended use cases. It also requires analysis of the potential trade-offs between the stakeholders' (sometimes too high) expectations and their feasibility constrained by the available data and resources, between the aspiration for higher model accuracy (often leading to higher model complexity) and the compliance with quality, ethical, and legal constraints, and between the time spent on experimenting and the expected time to delivery of an initial value to the stakeholders, to name just a few.

The analysis of ML systems' feasibility, the formulation of their important quality, ethical, and legal attributes, their limitations, constraints, and risks, the decisions on the acceptable trade-offs, and the choice of system validation strategies in agreement with their stakeholders are all activities that belong to the requirements engineering (RE) stage in the conventional software development process. This leads to the conclusion that RE activities are as crucial to the ML development process as they are to the conventional one. However, when ML components replace conventionally programmed ones, the software requirements should correspond to the different development process and the ML specifics. Otherwise, the consequences of incorrectly engineered or missing requirements may be even greater in the case of ML systems, given the effects these systems may have on individuals, the control mechanisms they require, and the ethical or legal requirements they are subjected to. Nevertheless, not much is available on RE for ML systems, nor have the RE activities from the related domain processes, such as Cross-Industry Standard Process for Data Mining (CRISP-DM) and Knowledge Discovery in Databases (KDD), been detailed sufficiently by the RE and ML communities [10]. All of the above was a motivation for this article, which tries to answer the following research questions:
1) Are conventional RE activities relevant to the ML development process, what challenges does this process bring them, and what are their necessary adjustments to better fit into this process?
2) What types of requirements are particularly important in addressing the ML systems specifics, do these specifics affect the conventional understanding of the requirements, and how should this understanding be adjusted?
The article answers the research questions through a mixed-method study, i.e., (1) a review of previously published literature in the fields of ML and RE, and (2) a case study involving a research project of the authors of this article [11]. The mixed-method study was primarily motivated by the lack of practical examples of (1) RE activities in ML projects and (2) requirements specifications for ML systems reported in the literature. The case study gave us the opportunity to share the challenges we faced during the RE activities in a research project involving ML, our approach to dealing with those challenges, and excerpts from the requirements specification for the developed ML system.
The objectives of this article are the following:
1) Emphasizing the importance of RE activities in dealing with the complexity of ML systems.
2) Analyzing the aspects of conventional RE that need to be adjusted to the ML specifics.
3) Giving an overview and sharing our experiences on this relatively unexplored to date, but, in our opinion, important topic.
The rest of the article is organized as follows. First, an overview of the related work on RE for AI/ML systems is presented. Next, a description of the methodology used to identify relevant articles for the research questions is given. Two sections dedicated to answering the research questions follow. The article concludes with a discussion of the most important findings and a conclusion.

II. RELATED WORK
The number of research articles dedicated to RE for AI and ML systems is relatively small to date, as noted in [10] and [12] also. Furthermore, in their review Martínez-Fernández et al. [13] have identified only one article, i.e., [10], that covers the whole RE process. For these reasons, this section includes articles that are not entirely dedicated to RE for AI or ML systems but which mention this process as part of the broader software engineering process they analyze.
Belani et al. [14] discuss the RE challenges in the development of systems which the authors call AI-based complex


systems. The article presents the RE4AI taxonomy of challenges related to recognized elements of AI (data, model, and system), which are aligned with the typical RE activities. Vogelsang and Borg [10] present the findings from interviews with four data scientists on their experience with RE activities in the development of ML systems. Among other findings, the authors conclude that requirements engineers should be aware of the new quality requirements and integrate the ML specifics in the established RE process. Chuprina et al. [15] describe their ongoing work on an artifact-based RE method for data-centric systems, i.e., systems that include both AI and ML systems. The authors state that these systems require a new approach to RE and define a conceptual model of artifacts, contents, and relations that should guide the RE process. Heyn et al. [16] identify four challenging RE areas in the development of systems which the authors call AI-intense systems, i.e., systems that fundamentally depend on AI functionalities. The four areas include (1) defining requirements for the context in which the system would operate, (2) defining quality attributes and data requirements, (3) defining performance metrics and monitoring whether the system has the guaranteed behavior, and (4) gaining an understanding of the human factors that influence the user acceptance and trust.
Studer et al. [17] extend the CRISP-DM data mining process model to address the specifics of the ML development process. This new process model consists of six phases, i.e., (1) business and data understanding, (2) data preparation, (3) modeling, (4) evaluation, (5) deployment, and (6) monitoring and maintenance. The authors provide a description of the RE activities throughout the phases. While Kästner and Kang [12] describe a course on software engineering for systems which the authors call AI-enabled systems, they also mention software requirements as one of the stages in the software engineering life cycle. In that context, the authors emphasize the lack of specification for AI components, the importance of identifying and measuring quality requirements beyond model accuracy, the importance of defining safety and security requirements, as well as the importance of properly planned error handling. Zhang et al. [18] have surveyed 195 DL practitioners to identify software engineering challenges in DL application development. The authors present 13 findings that reveal the challenges in different development phases, and 7 improvement recommendations. Requirement analysis, integration testing, acceptance testing, and problem definition are identified as the most labor-consuming tasks throughout the process. Requirement analysis is recognized as a more difficult task in DL applications than in conventional ones. Kuwajima et al. [19] study the open problems in engineering safety-critical ML systems, particularly in terms of ML model/system requirements, design, and verification. The authors conclude that ML models are characterized by a lack of requirements specification, design specification, interpretability, and robustness. Through gap analysis of standard quality models and ML model characteristics, they conclude that the lack of requirements specification and robustness have the greatest impact on those models. Rahman et al. [20] present a project which uses ML in detection and correction of transaction errors. In terms of RE, the authors emphasize the need for iterative refinement of the requirements since they evolve frequently. They further emphasize the importance of properly conducted feasibility analysis of ML systems in relation to the available data, as well as the importance of identifying the data requirements before any large data acquisition. Wan et al. [21] present analyses of information obtained from 14 interviewees and 342 survey respondents from 26 countries. The authors' analyses reveal significant differences between the ML and non-ML software development process at different stages (e.g., requirements, design, and testing). Some of the differences related to RE include the need for preliminary experiments while collecting the requirements for ML systems, the greater uncertainty of the requirements, and the need to anticipate any potential performance degradation. Giray [22] presents an overview of research articles on software engineering for ML systems. In terms of RE, the author points to the challenges with proper management of customer expectations, with the requirements elicitation, analysis, and specification, with the new quality attributes, and with the new types of requirements such as data requirements. The author suggests that future research should focus on improving the alignment of performance metrics with business objectives, proper integration of the requirements for ML and non-ML components, risk assessment frameworks, and data privacy regulations. Martínez-Fernández et al. [13] provide a review of 248 articles on software engineering for AI systems, of which 17 are dedicated to software requirements. The authors conclude that many of the latter focus on quality attributes, several deal with specification approaches, and only one offers a holistic view of the RE process. They point to the software requirements as one of the underrepresented areas in the entire set of articles, with great potential for further research. In terms of quality, they emphasize that standards developed for conventional software systems should be updated. Serban and Visser [23] analyze software architectures that enable robust integration of ML components through a systematic literature review, interviews, and a survey. They identify RE challenges such as (1) the difficulty in understanding the project and estimating the effort in advance, (2) the difficulty in defining functional requirements for ML components, and (3) the potential regulatory restrictions. Pereira and Thomas [24] analyze the safety challenges in the development of ML-based cyber-physical systems. In terms of RE, the authors indicate that while high-level requirements can be defined explicitly, the low-level requirements are defined implicitly through the dataset, making the requirements traceability inapplicable. They suggest specifying requirements for data management, model development, model testing/verification, and model


deployment. The potential risks include incomplete data definition, incorrect loss function, wrong performance metrics, incompleteness of the testing process, and inadequacy of the safe operation values. Lorenzoni et al. [25] summarize the software engineering practices and challenges in developing ML models. Through their review of research articles, the authors have found an evident lack of techniques related to RE for ML models. Lwakatare et al. [26] present a taxonomy of engineering challenges related to commercial systems containing ML components. Through several case studies, the authors have identified five stages in the evolution of ML components, from an experimental stage to autonomous functioning. Some of the presented challenges, which, in our opinion, are related to RE, are those associated with the problem formulation and the desired outcome specification in the experimental stage, as well as with the failure to evaluate models with business-centric metrics in the noncritical deployment stage. Maass and Storey [27] analyze whether ML could benefit from conceptual modeling. Additionally, the authors outline specification languages useful in specifying various types of requirements for ML systems. Villamizar et al. [28] present a catalog of 45 concerns related to ML systems that should help requirements engineers in defining requirements for such systems. The concerns cover five perspectives, i.e., objectives, user experience, infrastructure, model, and data. In a second research article, Villamizar et al. [29] propose an approach for analysis and specification of the five perspectives of ML systems outlined in [28]. The authors provide a diagram of ML tasks and concerns, as well as a specification template. Pei et al. [30] review research articles published from 2016 to 2022 on RE-related collaboration challenges occurring between the different roles involved in ML development. The authors summarize the solutions proposed in the reviewed literature and give an example from the industry. Ahmad et al. [31] present a systematic mapping study of 43 primary studies on RE for AI. The authors analyze (1) the methodologies used in specifying requirements for AI-based software, (2) their limitations, (3) the evaluation method which the primary studies use, and (4) the application domains. The authors also provide recommendations for future research. Ahmad et al. [32] also analyze human-centered approaches in RE for AI software. Their (1) analysis of industry guidelines for AI software and (2) survey of industry practitioners have revealed the current practices and gaps. Jahic et al. [33] propose a textual domain-specific language that facilitates the specification of data requirements and necessary "recognition skills" the neural networks should acquire through their training. Through an example, the authors show the benefits of the proposed approach. Through a literature review, De Hond et al. [34] outline guidelines and quality criteria for development and evaluation of AI models for healthcare. The guidelines include many aspects relevant to RE, such as understanding the problem and its context, quality requirements, risk management planning, and similar.
Several research articles focus on non-functional requirements (NFRs) and quality-related aspects of AI/ML systems. Pons and Ozkaya [35] summarize the unique characteristics of several quality attributes of AI systems that are used by the public sector, i.e., security, privacy, data-centricity, sustainability, and explainability. Horkoff [36] outlines a set of challenges associated with NFRs for ML systems, as well as research directions to solve them. The author states that the current knowledge of NFRs should be at least partially rethought in the context of ML, because although many techniques related to NFRs for non-ML systems are still valid, some need adjustment or complete renewal. Kuwajima and Ishikawa [37] analyze the quality attributes relevant to AI systems. The authors try to identify what needs to be modified or added to quality standards for them to be adapted to the ML specifics and the Ethics Guidelines for Trustworthy AI from the European Commission [38]. Siebert et al. [39] present a process for constructing quality models for ML systems, describe the elements of the process, and present a use case from the industry. The authors conclude that some of the existing quality attributes relevant to conventional software systems should be redefined, and new ones relevant to ML systems should be added. Nakamichi et al. [40] propose a requirements-driven method for deriving quality attributes for ML systems. They extend conventional quality attributes with those relevant to ML systems and describe a method that allows deriving quality attributes and measurements dependent on ML systems' goals. Habibullah and Horkoff [41] present findings from interviews with ML industry practitioners regarding ML-relevant NFRs, their measurement, and challenges. The authors conclude that the NFRs for ML systems are neither well structured nor well documented, their measurement is challenging, and, although important, their consideration in ML systems is still at an early stage. In a journal article, Habibullah et al. [42] extend these findings by analyzing the importance of different NFRs, their associated challenges, and the different perception of NFRs that exists between practitioners from industry and academia. Habibullah et al. [43] present an exploratory study on the definitions of NFRs relevant to ML systems, their shared characteristics, and past research interest in each NFR. The authors conclude that the research interest in different NFRs differs significantly, and they manage to identify six clusters of NFRs sharing similar properties and purpose. Hu et al. [44] address reliability requirements for machine vision components by defining relevant image transformations, classes of reliability requirements, a method for instantiating requirements of each class of reliability requirements using human performance data, and, finally, a method to verify that components satisfy such requirements. The requirements are defined as a tolerated range of visual changes which should not affect the component behavior.
As mentioned at the beginning of this section, a small number of research articles cover the challenges imposed by ML specifics throughout the whole RE process, as it is


done in this article. Compared to [10], which analyzes RE challenges through interviews with data scientists, our article does so through a literature review and a case study. Several research articles [18], [21], [23], [25] cover the challenges associated with the various software engineering activities during ML software development. However, in our opinion, the challenges related to RE activities are not covered as extensively as in our article. Two research articles [13], [22] provide an in-depth overview of the challenges associated with the different stages of ML software development, including those related to RE. Compared to the referenced articles, our article (1) summarizes the challenges not only in terms of the conventional RE activities but also in terms of a variety of conventional types of requirements, (2) presents insights into the reasons for the relevance of those RE activities and types of requirements for ML systems, (3) provides a brief overview of the most relevant definitions and trade-offs for a set of ML-specific quality attributes, and (4) shares our experiences in dealing with those challenges in a real ML project, along with excerpts from its requirements specification. Therefore, in our opinion, it gives a broader overview of the topic. Our article also differs from [31] in the research questions it answers and the method it uses to answer them. Namely, our article (1) focuses particularly on the challenges introduced in the conventional RE process by the ML specifics and on the ways to address them, (2) systematizes them by conventional RE activities and a large set of conventional requirement types, (3) reviews research articles which may not be explicitly devoted to RE for ML but are implicitly related to a RE activity or requirement type (e.g., articles related to risks, limitations, success metrics, assumptions, constraints, and various quality attributes of ML systems), and, finally, (4) shares our practical experience in dealing with those challenges through a case study.

III. METHOD
This article answers the research questions through (1) a review of literature in the fields of RE and ML and (2) a case study. The article is organized according to the conventional RE activities and software requirements. Most sections begin with a short definition of the RE activity or software requirement to which they are dedicated, continue with a brief review of the ML domain literature relevant to the activity/requirement, and end with experiences from the case study. The following two sections describe the methodology and its limitations.

A. RESEARCH ARTICLES REVIEW
The conventional RE activities and software requirements were analyzed through well-known publications from the RE domain (e.g., [45]). The impact on the conventional RE activities and software requirements in ML projects was analyzed through a review of previously published research articles that were identified using Google Scholar (https://scholar.google.com/) through the search criteria given in Table 1, in the period of June-July 2021. The article search was repeated in April 2023 to find relevant articles published after the initial search. A description of the process follows.

TABLE 1. Search queries used in identifying relevant research articles.

The initial attempt to identify previously published research articles relevant to our research questions was based on search criteria 1 and 2 in Table 1. However, the query results mainly consisted of articles devoted to the use of ML methods to facilitate RE, which is irrelevant to this article. One of the reasons for such results could be the small number of research articles devoted to RE for ML at the time of searching. Another reason could be the use of inconsistent terminology for certain RE activities or software requirements, such as (1) the use of synonyms for the term "requirement," (2) the disagreement over the naming of certain RE activities, e.g., "requirements validation" over "requirements verification", further discussed in [45], or (3) the disagreement over the nature, terminology, and definition of the non-functional requirements, further discussed in [46]. A third reason could be the significant difference between the conventional and the ML software development process, leading to a potential terminological inconsistency of the second one with the first. The potentially relevant articles were initially selected based on their title and abstract, taking into account only journal, conference, and conference workshop articles, and preprints (available on arXiv, https://arxiv.org/), all written in English. These initially selected articles were then analyzed more thoroughly by us. The articles in the final selection were not necessarily dedicated to RE for ML or AI systems in their entirety but contained findings on the subject. Since the number of selected articles was again small, the references of, and the articles citing, those articles entirely dedicated to RE for ML or AI (e.g., [10], [36]) were analyzed in the same manner to identify other relevant articles. Finally, the selected articles were used to extract and synthesize the answers to the research questions. This process is illustrated in Figure 1.
The search criterion 3 in Table 1 allowed us to identify influential articles in specific sub-fields of ML. Although not explicitly dedicated to RE for ML, some of these articles

FIGURE 1. General flowchart of the literature review process (search criteria 1 and 2).

contain findings that, in our opinion, should be considered during RE in ML projects. These findings include new types of implementation, ethical, or trustworthiness risks, new types of success metrics, assumptions, limitations, and similar. Due to the broadness of the sub-fields this search criterion covers, the articles were selected based on our estimation of their usefulness in answering the research questions. A thorough review of these sub-fields is out of the article's scope, and therefore, throughout this article, we only briefly summarized the findings we considered important. This process is illustrated in Figure 2.
A widely accepted classification of quality attributes relevant to ML systems does not exist at the time of writing, although certain research articles address this challenge (e.g., [37]). Therefore, the research articles dedicated to ML-specific quality attributes were identified through search criterion 4 in Table 1, but this list of quality attributes should not be considered a complete one. Section V-D summarizes the findings we found relevant to RE from a selected set of articles dedicated to each quality attribute, regardless of their mentioning of RE-related terminology, since the RE literature indicates that elicitation, prioritization, and specification of quality requirements in a specific, measurable, attainable, relevant, and time-sensitive manner falls in the domain of RE [45]. More recent review articles with a large number of citing articles were prioritized in our selection process. Their references were used to find articles that provide definitions of and insights into the relevant quality attributes as well. Furthermore, for each quality attribute, a brief summary of some of its trade-offs with other quality attributes was compiled. This way, we tried to emphasize the importance of those quality attributes to ML systems, emphasize the consequences of giving them insufficient attention during the RE activities, and provide the reader with valuable references for further reading. This process is illustrated in Figure 2.
Finally, despite our efforts to identify and include in our review as many of the previously published research articles relevant to RE for ML systems as possible, due to the aforementioned challenges and the volume of articles in certain ML sub-fields (e.g., certain quality attributes), relevant articles may still be missing.

B. CASE STUDY
The object of our case study is an ML system, Academic Disciplines Detector (ADD), which detects concepts defined as academic disciplines by the community editing Wikipedia, based on textual excerpts from their Wikipedia articles and their similarity to the academic disciplines that are part of expert-created classification systems [11]. As an example of an integrative ML system, incorporating several custom-trained and third-party ML models in its core functionalities while attempting to solve a real-world challenge,

VOLUME 11, 2023 72191


A. Gjorgjevikj et al.: Requirements Engineering in Machine Learning Projects

we believe that ADD is a suitable object of our case study. Although the inclusion of a single case study may be considered a limitation, we believe that our experience can still be helpful in analyzing the RE challenges in ML projects.

FIGURE 2. General flowchart of the literature review process (search criteria 3 and 4).

IV. REQUIREMENTS ENGINEERING ACTIVITIES IN MACHINE LEARNING PROJECTS
Requirements engineering covers the activities related to (1) requirements development (requirements elicitation, analysis, specification, and validation) and (2) requirements management, which are inevitable activities in any project regardless of its approach to software development (e.g., waterfall or agile) because they give reassurance that the problem is properly understood and resolved [45]. This section analyzes the research questions related to the relevance of conventional RE activities to the ML development process, the challenges this process brings to the activities, and their necessary adjustments to better fit into this process.

A. REQUIREMENTS ELICITATION AND ANALYSIS
1) LITERATURE REVIEW
As with all other software systems, the success of production-level ML systems depends primarily on their fulfillment of specific business goals or end-user needs. Goodfellow et al. [4] indicate that the definition of goals and performance metrics, as a first step towards successful practical application of ML, should always be guided by the problem to be solved. In that sense, any software development project begins with activities that provide a proper understanding of the problem to be solved, the factors that have motivated the project, and the context in which the system would be used. In conventional RE, the requirements elicitation is the process through which the stakeholders’ needs and constraints are identified, and it is intertwined with the requirements analysis and requirements specification activities [45].

Identification of all relevant stakeholders is inevitable for a successful elicitation of the requirements. However, in ML systems, the requirements may depend not only on the stakeholders’ needs but on the available data as well. In that sense, data scientists assess the feasibility of the stakeholders’ requirements through analysis and experiments, so, as Vogelsang and Borg [10] indicate, they are important stakeholders to be consulted during the requirements elicitation. Certain stakeholders may have unrealistic expectations of the ML systems’ performance, adoption process, or functionality, so they should be helped in making their targets more reasonable, as well as in accepting the uncertainty of the time and cost estimates [13]. Stakeholders should be aware that despite its enormous potential, ML introduces nontrivial challenges to the software development process, which can sometimes make it a less suitable (e.g., in terms of interpretability) or a more expensive option (e.g., in terms of time/resources) than other available options. For example, while DL stands out in solving closed-end classification problems with sufficiently large training datasets and test datasets that closely resemble those from training, any deviation from these assumptions or misunderstanding of DL limitations can be a source of problems [47]. Supervised DL algorithms may require at least 10 million labeled examples to achieve or exceed human performance [4], which can hardly be obtained in certain domains. Furthermore, Martínez-Fernández et al. [13] bring attention to the applicability of research results in practice because sometimes they can oversimplify reality and be inapplicable in real conditions. In short, the decision to implement an ML-based solution to a problem should be based primarily on the outcome of the problem-specific analyses.

In addition to the stakeholders’ requirements related to the system functionality, it is essential to understand their requirements related to the system quality attributes. For example, these include their interpretability requirements, and when less interpretable classes of models are taken into consideration, their requirements for the system output explainability, as further elaborated in Sections V-B and V-D1. Furthermore, it is important to properly collect the stakeholders’ security, privacy, and safety requirements, as well as to identify potential sources of bias that may lead to a discriminatory outcome for a particular group of individuals. Therefore, legal experts are another important group of stakeholders to be consulted during requirements elicitation [10].
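The feasibility analyses mentioned above are often empirical. As an illustration only (not taken from the ADD project or the cited works), the following Python sketch shows one way a data scientist might probe, during elicitation, whether the amount of available labeled data plausibly supports a stakeholder’s performance target, by plotting a learning curve with scikit-learn; the dataset, model, and metric below are hypothetical stand-ins.

```python
# Hedged sketch: estimate whether more labeled data is likely to help,
# before committing to a stakeholder's performance target.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for the project's labeled dataset (assumption:
# a binary text/feature classification task with ~2000 examples).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Cross-validated F1 score at increasing fractions of the training data.
sizes, _, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="f1",
)

for n, scores in zip(sizes, test_scores):
    print(f"{n:5d} training examples -> mean CV F1 = {scores.mean():.3f}")
```

A curve that has already flattened suggests that collecting more data alone will not close the gap to the target, which is useful evidence when helping stakeholders make their expectations more reasonable.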


D’Amour et al. [48] also highlight that real ML systems typically have behavioral requirements that go beyond generalization to an independent and identically distributed test dataset (e.g., requirements on interpretability, fairness). When those relevant requirements are not well specified and enforced on the ML pipeline, i.e., are ‘‘underspecified’’, many near-optimal solutions that fit such incomplete specification but behave differently in different dimensions (e.g., the previously mentioned interpretability or fairness) may exist and be selected over the desired one, which can be a cause of failure in applied ML [48]. In addition, deep neural networks are sometimes prone to learning undesirable ‘‘shortcut’’ solutions to problems, i.e., decision rules that perform well on independent and identically distributed test data but fail on out-of-distribution data (data that may be closer to real data) [49]. Avoiding such solutions, therefore, requires a thorough understanding of what makes a particular solution easy to be learned in a given context, the impact of the various factors throughout the ML pipeline, and their interactions [49].

2) CASE STUDY
The ADD project was motivated by the importance of the established disciplinary system to society and the challenges in tracking its changes over time. Our previous work [50], [51] had made us aware of these challenges, so we hypothesized that integrating different data sources into a data-driven methodology could be helpful in addressing them. Additionally, we identified a gap in the field related to the use of Wikipedia and the new ML breakthroughs. Neither a detailed study of Wikipedia’s potential in this field nor a study of the potential of those ML breakthroughs when applied to sufficiently large domain-specific datasets was available. Therefore, we hypothesized that if used appropriately, Wikipedia could provide large amounts of data to maximize the capabilities of ML algorithms in studying the disciplines, their relations, and evolution over time [11]. All of the above made ADD conceptually and methodologically different from the similar methodologies proposed in research articles (for more details, see [11]).

Due to the research nature of the ADD project, the requirements elicitation was mainly done through individual activities, e.g., analysis of available classification systems of academic disciplines, reading related literature, and analysis of Wikipedia’s policies. To better understand how to make ADD as useful as possible to the communities for which it was intended, the characteristics of its stakeholders were identified first. The stakeholders were classified into four classes: (1) team member, (2) research community member (Knowledge Organization (KO) and related fields), (3) research community member (ML and Natural Language Processing (NLP)), and (4) data consumer. Given the fact that we did not have direct representatives of some of the stakeholder classes, studying their characteristics through imaginary personas [45] helped us better define their (hypothesized) needs and requirements. For example, the research communities for which ADD was intended were divided into two classes due to the hypothesized differences in their interests and level of knowledge in the fields that ADD covered. While we expected that the KO research community members have extensive knowledge of the achievements related to academic disciplines detection, we did not expect this to be the case with the members of the ML and NLP communities. On the other hand, the latter communities were expected to have greater familiarity with the ML and NLP methods used in ADD, which, among other things, implied different presentation clarity expectations by the different communities. Furthermore, we expected that the ML and NLP community members would be mainly interested in the comparison of the state-of-the-art text encoders based on deep neural networks to conventional text analysis methods on a new domain-specific dataset, while the other communities in the comparison of the detected academic disciplines to previously published results. A separate stakeholder class interested in the end results but with limited knowledge of the technical aspects of ADD was called data consumers.

The analysis of some of the available academic discipline classification systems and Wikipedia’s policies resulted in a number of insights that were later incorporated into the functional and non-functional requirements. Several examples include Wikipedia’s policies on article titles, lead sections, and life cycle, as well as the ML domain recommendations for imbalanced dataset evaluation metrics (for more details, see [11]). The initial requirements were further refined in the data analysis and experimentation phases, e.g., through analysis of Wikipedia’s article titles, article interlinks, category graph, and similar. In addition to refining the already identified requirements, new requirements were discovered in these phases, such as requirements related to data preprocessing.

B. REQUIREMENTS SPECIFICATION
1) LITERATURE REVIEW
In conventional software development, the approach to formal specification of the requirements largely depends on the selected approach to software development, with best practices already in place for each. We are not aware of best practices defined for ML systems specifically. Maass and Storey [27] indicate that the field of conceptual modeling already has proven specification languages for functional, non-functional, and business requirements. Data requirements should use those already available for database systems and linked data, whereas the approaches to specifying performance, ethical, interpretability, and resilience requirements require further refinement [27]. Adaptation of formal methods has been proposed as another possible direction for designing AI systems with provable correctness against mathematically specified requirements [52]. Model-driven engineering principles have also been used in specifying requirements for neural networks [33]. Through a literature review, Ahmad et al. [31] have found that the most


commonly used modeling notations or languages in specifying requirements for AI systems are (1) Unified Modeling Language (UML), (2) Goal-Oriented RE (GORE) and (3) Domain Specific Models, but the authors conclude that they still have limitations when it comes to their use for this purpose.

2) CASE STUDY
The requirements specification approach used in the ADD project was mainly based on best practices from conventional RE, tailored to ML specifics when necessary. A combination of text and visual models was used. Excerpts from the ADD requirements specification are given in Section V.

C. REQUIREMENTS VALIDATION
1) LITERATURE REVIEW
The requirements validation ensures that the right requirements which meet the needs of the stakeholders are captured, and it is performed through activities such as requirements reviews, development of conceptual tests, definition of acceptance criteria and similar [45]. The use of ML affects this RE phase as well, especially in terms of testing approaches. Riccio et al. [53] point to a limitation in the effectiveness of conventional testing approaches when applied to ML systems, primarily because of the program logic dependence on training data and the stochastic nature of the learning process. The authors emphasize the need for novel techniques that address the specifics of ML systems. Since the testing of ML systems is a rapidly progressing research field at the moment, readers are directed to Riccio et al. [53] and Zhang et al. [54] for a more thorough review of the challenges and novel methods.

When it comes to measuring DL models’ performance, Geirhos et al. [49] show that measuring performance only on an independent and identically distributed test dataset can sometimes be misleading if the assumption that the data generation and sampling mechanisms are the same is not justified. The authors suggest that testing on out-of-distribution data should become a standard practice in order to distinguish desired solutions from ‘‘shortcut’’ solutions [49]. Furthermore, the results presented in [48] indicate that models should be explicitly tested for any required behavior that is not guaranteed by the independent and identically distributed test dataset, as some required behavior will almost certainly be underspecified. These tests should be application specific and based on the requirements [48].

2) CASE STUDY
The validation of the requirements for ADD was done through reviews, development of test cases for conventionally programmed software components, planning the ML-based components testing, and defining criteria that the components and the system had to meet. The planning of the ML-based components testing included defining criteria for collecting representative training/test datasets, identifying types of potentially ambiguous examples that may cause incorrect predictions (e.g., scientific terminology or notable people related to a particular academic discipline), identifying exceptions (e.g., articles on academic disciplines that do not comply with Wikipedia’s policies), defining approaches to test model performance changes over time, and similar [11].

V. REQUIREMENTS FOR MACHINE LEARNING SYSTEMS
This section analyzes the research questions related to the relevance of different types of software requirements in addressing the ML specifics, the impact ML specifics have on their conventional understanding, and the necessary adjustments of this understanding. In conventional software systems, the requirements can be organized into multiple levels of abstraction, where the lower levels refine the higher ones. For example, Wiegers and Beatty [45] suggest a three-level model of requirements consisting of business, user, and functional requirements, accompanied by non-functional and data requirements. This section analyzes some of the well-known types of requirements in the context of ML systems, and uses this model to a limited extent in organizing its subsections (for the exact three-level model of requirements, the readers are referred to the referenced publication). Nevertheless, the analyzed requirements in this section are only a subset of the broader set of requirements and relevant information suggested by the RE literature to date. Furthermore, through the organization of the subsections, we do not attempt to suggest a particular approach to organizing the requirements in requirements specifications, so we direct the readers to publications and standards written for that particular purpose.

A. HIGH-LEVEL (BUSINESS) REQUIREMENTS
The business requirements, usually coming from stakeholders familiar with the reasons for undertaking the project, refer to the needs that initiated the project and the desired outcome [45]. A description of some of the information that may be part of these requirements follows, analyzed in the context of ML systems. It is supplemented with excerpts from the ADD requirements, given in Table 2.

1) OBJECTIVES
a: LITERATURE REVIEW
Defining the objectives to be achieved through the use of a particular software system is an essential factor for the success of the project, regardless of whether it involves ML or not. The specificity of ML systems in terms of their potential impact on individuals, groups, and even society, requires defining objectives aligned with the already recognized ethical principles for this type of systems by the community. In the ethical guidelines and principles for AI systems published recently by the public (e.g., Ethics Guidelines for Trustworthy AI [38]) and private sector (e.g., Google AI Principles3), convergence towards several ethical principles has been observed, i.e., transparency, justice and fairness,

3 https://ai.google/responsibility/principles/


non-maleficence, responsibility, and privacy, together with beneficence, freedom and autonomy, trust, dignity, sustainability, and solidarity [55]. Although some divergence of the ethical principle definitions and uncertainty about their implementation in practice has been highlighted in [55], we firmly believe that the aspiration for development of ethical ML systems based on recognized principles has to be clearly and unambiguously stated in the high-level requirements that guide the project. Consequently, these principles have to be included in the low-level requirements, incorporated in the development practices, implemented in the system, validated, and monitored.

b: CASE STUDY
An excerpt from the objectives of ADD is given in Table 2. Some of the listed objectives exactly refer to the development of a thoroughly evaluated system, transparent in terms of its methodology.

2) SUCCESS/PERFORMANCE METRICS
a: LITERATURE REVIEW
The ML community has defined various performance metrics appropriate for different types of ML problems, such as accuracy, precision, recall, f-measure, mean squared error, and others. Performance is usually measured on a dataset unseen during the model training stage to ensure proper functioning of the model on unseen data in a real-world setting. Nevertheless, in certain tasks it may be challenging to find an ML performance metric that corresponds to the desired system behavior, or measuring that behavior may be impractical [4]. In addition, a preferred and realistic level of performance that makes the ML system worthwhile, safe, and useful has to be determined [4]. It is recommended to document the approaches to uncertainty and variability, e.g., k-fold cross-validation, as well [9]. Some articles [10] indicate that it is the requirements engineer’s job to translate the customer expectations to appropriate metrics.

b: CASE STUDY
ADD success metrics were defined at two levels of abstraction, i.e., (1) high-level success metrics that refer to the system’s overall success in achieving its objectives and (2) ML performance metrics specified for each ML component separately, together with the expected performance. An excerpt from the high-level success metrics for ADD is given in Table 2. The high-level success metrics indicate that the number of detected academic disciplines should be similar to that in expert-created classification systems. They also require processing of multiple Wikipedia exports over a period of four years, in order to demonstrate the low variability of the test performance, and the high overlap of the detected disciplines in adjacent processed exports. Examples of ML performance metrics, along with reasons for selecting them over others, are given in Table 3 and Section V-C.

3) LIMITATIONS
a: LITERATURE REVIEW
ML models are trained and tested under certain assumptions and conditions. Therefore, it should not be assumed that they work equally well in other settings. As the ‘‘no free lunch’’ theorem for ML [56] states, no algorithm is universally better than any other, including random guessing, when averaged over all possible tasks [4]. For example, the limitations of DL models, summarized in [47], indicate that they have poor performance when their training data is limited, when their test data differs significantly from their training data, as well as in broad example spaces filled with novelties. The quality attribute definitions have their own limitations as well [57]. Given these facts on the one hand, and ML systems’ potential effects on individuals (or even autonomy in some cases) on the other, it is essential that the limitations of ML systems are clearly stated, communicated to their stakeholders, and agreed upon. Communicating clearly what the system outputs mean and what they do not, and what the intended and unintended use cases are, helps in avoiding misinterpretations or inappropriate use. As Jacovi et al. [58] highlight, vaguely specifying the expected behavior of an AI system, which users should trust to be upheld (called a ‘‘contract’’ by the authors), can lead to unwarranted trust in the system and its misuse, as users may implicitly assume ‘‘contracts’’ that during the development of the system have not been considered to be upheld.

b: CASE STUDY
An excerpt from the limitations of ADD is given in Table 2. The limitations in the excerpt primarily refer to the interpretation of the system output, i.e., what the output means and what it does not, and its proper use, i.e., intended and unintended use cases.

4) RISKS
a: LITERATURE REVIEW
Risks are conditions that should be identified, evaluated, and controlled, because they can negatively affect the success of a project in terms of user acceptance, implementation, competition, and similar [45]. ML systems face risks that are not inherent in conventional software systems, like specific ethical, moral, legal, security, and other similar risks. While AI algorithms have the potential to augment human well-being, at the same time, they can sometimes exhibit behavior with unintended and unanticipated consequences by their creators, both positive and negative [8]. As their properties and operating environments become too complex to allow an analytical formalization of some of their behaviors, predicting their effects on individuals and the society becomes challenging as well [8]. In that sense, anticipating any potential risks from the influence these systems have on people and the other way round, although absolutely necessary, can be rather challenging. This section summarizes some of the risks specific to ML systems, like the implementation challenges that may turn into risks. An additional discussion on the


risks associated with various quality attributes is available in Section V-D.

Engineering robust ML systems has specific challenges that are not inherent in other types of software systems. ML models are highly sensitive to changes in their input data distribution and learning hyperparameters, and such changes may lead to model retraining, further affecting all of its dependent models in a way that cannot always be predicted [6]. They may depend on data from external systems or models, changes of which may be beyond our control, and be sensitive to changes in the environment which they interact with [6]. Inadequate model update frequency in frequently changing environments can be a risk factor to its performance, as can failing to evaluate the model performance on an important data slice, especially if it differs from the overall performance [59]. Reproducibility, debuggability, and auditability are important aspects that require version control of the model specifications [59] and tracking of the data on which the model was trained, but proper data management and versioning are more complex than doing the same for software code [7]. ML systems face specific security, privacy, and safety risks that must be adequately addressed because of their potential consequences. The lack of interpretability or explainability is another risk factor to the stakeholders’ trust and acceptance of the system. In the context of DL, many cases of failure can be attributed to so-called ‘‘shortcut’’ solutions [49]. Underspecifying relevant behavior to be learned by the ML pipeline can lead to such ‘‘shortcut’’ or otherwise undesirable solutions, because the ML pipeline can choose one such solution over another which has the same test performance and much greater compliance with the ‘‘unspecified’’ but desired behavioral requirements [48], [49].

b: CASE STUDY
An excerpt from the risks of the ADD project is given in Table 2. The risks in the excerpt refer to the potential nonacceptance of the system by the users (due to its significant differences from previously published systems and methodologies), as well as to the difficulty of precisely defining the ground truth in the evaluation process (due to imprecisely defined domain-specific terminology and the nonexistence of a widely accepted finite set of academic disciplines).

TABLE 2. An excerpt from the high-level requirements for ADD. The implementation of the requirements is described in [11].

B. USER REQUIREMENTS
1) LITERATURE REVIEW
Since ML systems may have user interactions that affect users and their acceptance of the system, collecting the user expectations from such systems can reveal useful interactions and quality requirements. For example, studies have shown that the perception of an ML system’s interpretability depends on the audience to which the explanations are presented and the task [60], [61]. Although different types of post-hoc explanations may be appropriate for different end users in different tasks (e.g., textual explanations, visual explanations, local explanations, explanations by example, etc., further discussed in [60]), it is essential that they are aligned with the user mental models, needs, and use cases [62]. Amershi et al. [63] state that in many ML systems, their users have been able to come up with new possibilities for explanations, other than the ones they have received. Heyn et al. [16] emphasize the importance of understanding user needs and interactions with the system during RE, in order to provide users with functionalities they would accept, trust, and use properly.

Nevertheless, in the more general context of business analytics projects, Wiegers and Beatty [45] state that elicitation of the user expectations from such systems is insufficient to reveal the complex knowledge needed to develop them. The same is true for ML systems. Moreover, features in ML systems are introduced not only as a result of user needs, but for other reasons as well, like the availability of certain data, the need to collect additional data through user interactions,


and similar [64]. These systems may provide outcomes based on user data, and their behavior may evolve as they collect more data, so in this context, Yang et al. [65] distinguish four types of AI systems based on two factors, i.e., their capability uncertainty and output complexity. While the first type has bounded capabilities and a fixed set of outputs, the fourth has evolving capabilities and adaptive open-ended outputs, making it difficult to predict what the fourth type of AI systems can reliably do, when they can fail or how likely the failures are, in order to plan appropriate interactions [65].

2) CASE STUDY
Due to the experimental nature of ADD and the characteristics of its users, it does not have a user interface, but users interact with the application by running its modules with a set of required parameters, after providing them with the necessary files [11]. Users receive the results in local files generated by the modules. In this sense, we were able to identify most of the usage scenarios that involve different classes of users. However, due to the inherent uncertainty of ML models, it is still possible to have model outcomes or failures that have not been anticipated.

C. FUNCTIONAL REQUIREMENTS
1) LITERATURE REVIEW
In general, functional requirements describe what a software system should be capable of doing. Typically, the expected behavior of conventional software components is precisely specified in the functional requirements. This is not the case with ML models, which learn how to relate the input data to the expected outcome through a training process. Nevertheless, ML systems commonly consist of both conventionally programmed functionalities and functionalities implemented by ML models. In that sense, certain functional requirements are defined conventionally, by explicitly specifying the rules that relate inputs to outputs. At the same time, those functionalities that require training an ML model are described through the function that the model is expected to learn and the expected performance. Kuwajima and Ishikawa [37] indicate that while conventional software can be decomposed into smaller functions that have separate requirements, design, and implementation, the functions implemented by ML models are usually large and fuzzy, sometimes accompanied by large datasets. They suggest dividing these large functions into smaller ones by specifying relevant domain-specific conditions/contexts through training/test dataset partitions and then evaluating the models on each of them [37]. Kuwajima et al. [19] suggest that model requirements are specified through the expected operational data distribution, which can then be agreed upon and enable the collection of test data that reflects the real operational conditions. The authors further suggest that in such a case, the training data can be designed to allow the achievement of that requirements specification [19]. In a similar context, Mitchell et al. [9] discuss why measuring the overall performance on the entire dataset may be insufficient and why its disaggregation across different data subsets is needed. They suggest identifying factors related to variable performance, like categories of data instances with similar characteristics, instrumentation, environmental conditions etc., and measuring the performance across these factors when possible [9]. Defining the expected performance across the relevant factors and their combinations is essential, as some data subsets may be more critical than others in the context in which the system is used [66], so measuring performance changes over individual factors or their combination becomes possible [9]. An example of environment requirements specification through a data distribution matrix, as well as an example of performance requirements for each environment through a confusion matrix, is given in [19].

2) CASE STUDY
ADD consists of several ML models supported by conventional software components. Therefore, the behavior of the conventional components was fully specified in the functional requirements. In contrast, only the desired behavior, performance expectations, assumptions, constraints, dependencies and similar, were specified for the ML-based components. An excerpt from the functional requirements for the text classification component is given in Table 3. The test dataset was sampled from the operational data according to the expected data distribution across the two classes, as detailed in [11]. Due to the highly imbalanced data distribution, the f-measure was selected over the accuracy, with the expected level of performance defined by class [11].

D. QUALITY REQUIREMENTS
A quality attribute can be defined as a measurable and testable property of a system that shows how well the needs of the stakeholders are met, i.e., the quality requirements are qualifications of certain functional requirements, or qualifications of the whole system [67]. Examples of quality attributes are reliability, efficiency, robustness, usability, scalability, and many others. Different quality attributes can be of different importance to different categories of systems. For example, the specifics of ML systems require paying particular attention to quality attributes related to ethics and trust. At the same time, these systems face new types of challenges that do not occur in conventional software systems (e.g., in security and privacy), so adaptation of some of the conventional quality attribute definitions, or even defining new attributes, may be necessary [36], [37].

In complex systems, quality attributes can hardly be achieved in isolation, without affecting other attributes, so designing a system that meets its predefined quality requirements is partly about making the right trade-offs [67]. The same is true for ML systems, in which, while optimizing an explicitly specified objective, the learning algorithm may neglect some other which it was not explicitly instructed to optimize. Therefore, the quality attributes relevant to an
VOLUME 11, 2023 72197
A. Gjorgjevikj et al.: Requirements Engineering in Machine Learning Projects
TABLE 3. An excerpt from the functional requirements, assumptions, constraints, and dependencies of the text classification component. The
implementation of the requirements is described in [11].
ML system should be identified in cooperation with its stakeholders, formally defined, incorporated into the data and learning algorithm, and evaluated through appropriate data and metrics, while addressing any potential trade-offs. This section briefly reviews quality attributes with specific meaning and relevance to the ML domain. Because of the vague boundaries between certain quality attributes and the rapid progress of ML, the list should not be considered exhaustive or complete. For a more thorough overview of each quality attribute, the reader is referred to the referenced articles. An excerpt from the quality requirements for the ADD system as a whole is given in Table 4. While some requirements refer to the conventional aspects of software development, such as the requirements for system scalability or usability, some particularly address ML specifics, such as the requirements for the ML models' interpretability or their robustness to noisy input data.

1) INTERPRETABILITY
a: LITERATURE REVIEW
In the context of ML systems, interpretability can be defined as an ability to explain or present in a comprehensible way to a person [68]. It is related to the barriers to optimization and evaluation that arise from the problem formulation incompleteness in the ML domain, like the discrepancy between the real objective and the one that is actually optimized, the inability to define and evaluate all edge cases, the difficulties in defining ethics or trust requirements, and similar [68]. There are many other terms that are often associated or equated with the term ''interpretability,'' such as ''explainability,'' ''transparency,'' or ''understandability,'' among others. Therefore, this section attempts to summarize their similarities and differences as reported in the literature.

While some authors make a distinction between the terms ''interpretability'' and ''explainability,'' others use them interchangeably [69]. However, several research articles have found that the ML community uses the term ''interpretable'' more often than the term ''explainable'' [69], [70]. Lipton [71] points out that interpretability is associated with different notions, i.e., transparency (understanding how the model works at the level of the entire model, its components, or training algorithm) and post-hoc interpretability (giving an explanation of the model decision, which does not necessarily explain how the model came to that decision). Transparent models are understandable to a certain degree by themselves, i.e., simulatable if a person can reason about them as a whole, decomposable if all their parts are understandable to a person without additional tools, and algorithmically transparent if a person can follow the process of producing an output from an input [60], [71]. ML models that lack transparency need a different level of post-hoc explanations, which may even apply to transparent models, based on the audience and their level of complexity [60]. For example, while linear/logistic regression, decision trees, rule-based models, or k-nearest neighbors are considered transparent models, support vector machines or various types of deep neural networks are considered models that lack transparency [60]; high-dimensional linear models, rule-based models with a large number of rules, or deep
decision trees tend to become less interpretable [71]. Nevertheless, quantification measures of model interpretability have yet to be formalized by the community [60], [70].

TABLE 4. An excerpt from the quality requirements for the ADD system as a whole. The implementation of the requirements is described in [11].
Carvalho et al. [69] state that interpretability is essentially
a subjective concept, so accordingly, when it is defined and
addressed, the domain of the problem, the use case, and
the needs of the audience asking questions about the model
decisions should be considered. For interpretability to be
implemented in the right way, it is important to analyze
what makes an explanation understandable, reasonable, and
human-friendly to its recipients in the specific context [69].
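To make this concrete, a post-hoc, model-agnostic explanation can be produced even for an opaque model; the following minimal Python sketch uses scikit-learn's permutation importance, with a synthetic dataset and model choice that are illustrative placeholders rather than artifacts of the reviewed works:

```python
# Sketch of a post-hoc, model-agnostic explanation: permutation importance
# shuffles one feature at a time and measures how much the score drops.
# The synthetic dataset and the model choice are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=5,
                           n_informative=2, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"feature {i}: mean importance {importance:.3f}")
```

Whether such scores constitute an understandable explanation for a given audience remains the requirements question discussed above.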
According to Miller [72], while most of the work in the domain relies on the researchers' intuition of what constitutes a good explanation, it may be useful to look at the findings from psychology, philosophy, or cognitive science on how people give explanations to each other.
The trade-off between interpretability and performance is
frequently discussed, because complex models that usually
have better performance tend to be less interpretable. How-
ever, such a trade-off may not exist in some cases when the
data is well structured and the features are of high quality, but
even when it exists, the development of sophisticated explain-
ability methods can help overcome it [60]. Herm et al. [73]
have shown empirically that this trade-off is less gradual
than assumed, when analyzed from the end-user perspective.
They have further shown that rather than being a curve, the
trade-off exhibits a grouped structure and is context depen-
dent (e.g., on the data complexity) [73].
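The trade-off discussed above can be probed empirically on a concrete dataset; a minimal sketch, assuming scikit-learn is available and using a bundled public dataset as a stand-in for real project data, compares a transparent model with a less interpretable ensemble:

```python
# Compare a shallow (transparent) decision tree against a gradient-boosted
# ensemble on held-out data; a small accuracy gap would support preferring
# the interpretable model. The dataset is an illustrative stand-in.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
boost = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

acc_tree = tree.score(X_test, y_test)
acc_boost = boost.score(X_test, y_test)
print(f"shallow tree: {acc_tree:.3f}, gradient boosting: {acc_boost:.3f}")
```

In line with the grouped, context-dependent structure reported in [73], the observed gap will vary with the dataset, so such a comparison is a per-project measurement rather than a general rule.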
b: CASE STUDY
Table 4 contains an excerpt from the interpretability requirements for the ADD system. It includes requirements for the consideration of inherently interpretable ML models in the experimentation phase, a preference for such models when they perform similarly to the less interpretable ones, visualization of the models' input/output, and outputting supplementary data that allows further result analysis.
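The stated preference for inherently interpretable models when performance is comparable can be expressed as a simple selection rule; the candidate names, scores, and tolerance below are hypothetical illustrations, not ADD's actual values:

```python
# Prefer the most interpretable candidate whose score is within a tolerance
# of the best score. Candidates are ordered from most to least interpretable;
# all names, scores, and the tolerance are hypothetical.
def select_model(candidates, tolerance=0.03):
    """candidates: list of (name, score) pairs, most interpretable first."""
    best_score = max(score for _, score in candidates)
    for name, score in candidates:
        if best_score - score <= tolerance:
            return name

chosen = select_model([("logistic_regression", 0.90),
                       ("decision_tree", 0.91),
                       ("deep_network", 0.92)])
print(chosen)  # logistic_regression: within 0.03 of the best score
```

Making such a rule explicit in the requirements turns the preference into a decision that can be reviewed and tested, rather than an ad-hoc choice.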
2) FAIRNESS
a: LITERATURE REVIEW
With the increased use of ML algorithms in making deci-
sions about individuals, ensuring an outcome that is fair
and non-discriminatory in relation to sensitive characteris-
tics (e.g., gender, race) requires serious attention from the
ML practitioners. Fairness can be defined as an absence
of prejudice or favoritism towards individuals or groups
based on certain inherent or acquired characteristics they possess [74]. The ML community has proposed a number of different formal definitions of fairness. Some target individual fairness, i.e., similarly treating similar individuals, while others target group fairness, i.e., treating different groups equally. However, fairness definitions have their limitations too, as discussed in [57] and [75]. The most basic definition, known as Fairness Through Unawareness, requires protected attributes not to be explicitly used in decision-making processes, but it still has shortcomings, as other features may contain discriminatory information analogous to that in the protected attributes [75]. Demographic Parity, also known as Statistical Parity [76], requires membership in a protected class not to be correlated with the decision. Equalized Odds [77] requires protected and unprotected classes to have equal true-positive and false-positive rates, while Equal Opportunity [77] is a weaker notion than Equalized Odds and requires non-discrimination only over the ''advantaged'' outcome. While the previous three definitions fall in the category of group fairness, the following two belong to the
category of individual fairness. The Individual Fairness [76] definition requires similar individuals to get similar predictions by an algorithm under some carefully chosen similarity metric. The Counterfactual Fairness [75] definition indicates that protected attributes should not be a cause of the predictor in any individual instance.

Bias in ML can be addressed in three stages, i.e., (1) by removing it from the data through pre-processing, (2) by modifying the learning algorithm, and (3) by reassigning the model predictions, when the model is treated as a black box [74]. One source of bias in ML is the data itself, in which bias can take a variety of forms, like historical bias already existing in the real world and therefore existing in the data, representation bias when the data lacks diversity by a particular criterion, bias in measuring a particular feature, aggregation bias, and many other types [74]. One example of bias found in ML results, as well as a discussion of the risk of its inheritance and even amplification by other dependent models, is presented in [78], where a set of widely used word vectors, the distances of which represent relationships between words, was found to contain a gender bias. Since bias can be inadvertently introduced into ML systems in a number of ways and at various stages of their development, its identification and addressing should begin as early as possible. Properly defined fairness requirements are the right place to start.

3) ROBUSTNESS
a: LITERATURE REVIEW
In line with the general robustness definition, ML algorithms should be capable of learning robust models even in the presence of noisy training data, and the models should remain robust at operation time. This makes robustness a rather broad attribute, closely related to many of the others described in this section.

ML systems' ability to stay robust at operation time, when faced with input different from that seen during training, is essential, because non-robust ML systems may not only show poor performance but may wrongly assume good performance and confidently take a wrong action [79]. The robustness of ML models has been studied extensively in the context of adversarial examples: inputs designed to force a model to produce erroneous outputs, most commonly through small perturbations which make the new input close to the original one according to a domain-specific distance metric, but misclassified by the model [80]. Evaluating model robustness is important for several reasons, i.e., (1) to prevent models from misbehaving due to adversaries, (2) to use their good worst-case robustness as evidence that they will not misbehave in the real world due to unforeseen randomness, and (3) to compare models with human abilities [80]. To address ML models' robustness properly, their performance expectations should be defined rigorously, deviations from such expectations should be prevented, and methods to identify/correct such deviations should be defined, all of which leads to accountability in the ML field [81].

A commonly discussed trade-off is that of models' robustness and accuracy. Recent works have shown that larger and more complex datasets are needed for robust learning than for standard learning [82], as well as a trade-off between the accuracy of models trained for adversarial robustness and their standard accuracy achieved when trained on unperturbed inputs [83]. They have further shown that the learned feature representations differ in the two settings, but models that encode a prior about human perception seem invariant to perturbations to which humans are invariant [83]. Recent works also study methods to mitigate the robustness-accuracy trade-off [84].

b: CASE STUDY
Table 4 contains an excerpt from the robustness requirements for the ADD system. It includes requirements for proper input file format validation, default input values, and robustness to exceptions during the processing of large input files. It also includes ML-specific robustness requirements, i.e., requirements for ML model robustness to ambiguous or non-standard input examples.

4) SECURITY
a: LITERATURE REVIEW
The growing use of ML in many different domains, including safety-critical ones, requires an understanding of the new security vulnerabilities that are not present in other types of systems, and strengthening the robustness against them. Nevertheless, Carlini et al. [80] indicate that while the studies of adversarial examples in new domains are advancing rapidly, the design of systems robust to such examples is slower.

To analyze the security of a system, it is necessary to identify (1) security goals, i.e., requirements that, if violated, result in a compromise of an asset, and (2) a threat model [85]. This model defines the conditions under which a defense is designed to be secure and the security guarantees it provides [80]. Some of the models proposed in the literature consider the adversary's (1) goal/incentives, i.e., accessing system assets or denying normal operation, and (2) capability, i.e., its knowledge of the system and the constraints on its capability [85]. Others consider the adversary's (1) goal, (2) knowledge, i.e., complete knowledge of the model or a varying degree of black-box access, and (3) capability [80]. In the context of supervised learning, the violations can be classified across three dimensions, i.e., (1) influence (causative or exploratory), (2) security violation (integrity or availability), and (3) specificity of the adversary's intention (targeted or indiscriminate) [85]. Different examples of learning in adversarial environments have been described in the literature, e.g., in [86] and [87].

Barreno et al. [85] have found that improving the worst-case robustness of an algorithm can make it less effective on average. Based on their analysis of the most common shortcomings of adversarial example defenses, Carlini et al. [80] have defined a set of guidelines for defense
evaluation, emphasizing the extreme caution and skepticism that this process requires.

5) PRIVACY
a: LITERATURE REVIEW
There are certain situations in which the exposure of a model, its parameters, or its training data should be prevented due to confidentiality or privacy. Still, ML models' capacity to memorize elements of the training data makes it challenging to provide guarantees that participation in a training set does not affect the privacy of the individuals [86]. The adversaries usually aim at recovering the training data or the model, like recovering partially known inputs with the most probable values or extracting the training data using the outputs [86]. Several methodologies for addressing privacy concerns in the ML domain are differential privacy, federated learning, and data encryption.

Differential privacy represents a mathematically rigorous definition of privacy, which ensures that the output of a database analysis is distributed very similarly to the output of the analysis of another database that differs from the first in one row only, while bounding the maximum divergence between the two distributions by a privacy loss parameter [88]. Federated learning refers to a setting where many clients collaboratively train a model while being orchestrated by a central server/service provider and while keeping their training data decentralized [89]. The learning objective is achieved through updates that contain the minimum necessary information for the learning task and which are suitable for immediate aggregation [89]. Another way to preserve data privacy is to train a model or make inferences on encrypted data using methods like homomorphic encryption or secure multi-party computation. Several examples of their use in the ML domain include customizing ML algorithms to use homomorphic encryption in the training and inference stages [90], making predictions with neural networks on encrypted data using homomorphic encryption [91], and others.

In the context of trade-offs that come from the use of privacy-preserving methods, Brundage et al. [92] point to trade-offs between the privacy benefits, the model quality, the developers' experience, and the costs in computation, communication, or energy consumption. Papernot et al. [86] point to a fundamental tension between the security/privacy and the precision in ML systems with a finite capacity. In terms of neural networks and homomorphic encryption, Gilad-Bachrach et al. [91] indicate that adding encryption makes the training process slower, at the same time preventing the data scientists from inspecting the data or tuning the model during training.

6) SAFETY
a: LITERATURE REVIEW
In the context of ML, Varshney [93] defines safety as minimization of the risk and uncertainty associated with harmful events, i.e., events related to a sufficiently high cost in some human sense. The author identifies several sources of risk in ML systems, i.e., (1) the assumption that the training data comes from the operational data distribution, (2) low probability density of the operational data distribution in certain regions, (3) uncertainty coming from the way the test set was instantiated, and (4) dependence of the loss function on the predicted and actual values only [93]. Several approaches to mitigate these risks include ensuring an inherently safe design, adding safety factors or margins, adding additional procedural safeguards beyond those designed in the core functionality, and ensuring a safe fail [93]. In the context of supervised and reinforcement learning, Amodei et al. [79] have identified several sources of safety risks, i.e., (1) a loss function that inadvertently ignores aspects of complex environments that could be harmful if changed at operation time, (2) a loss function minimized by an easy solution during training which was not the designer's true intention, (3) substituting the correct loss function with another one because the former is too expensive for frequent evaluation, and (4) failure to ensure safe actions when the system encounters unseen input. Furthermore, Jacovi et al. [58] indicate that adequate verification of the existence of a certain risk (an undesirable but possible event) from the use of an AI system is a prerequisite for verification of the existence of Human-AI trust.

E. OTHER REQUIREMENTS
1) ASSUMPTIONS
a: LITERATURE REVIEW
Assumptions are an almost inevitable aspect of ML system development. For example, assumptions are made when certain aspects of the problem to be solved or its data are not observable. Based on those assumptions, the real problem is translated to an ML problem, and an appropriate class of models is selected to solve it. Other examples of assumptions are those related to the data distribution across different classes in the real dataset, assumptions that the statistical properties are similar across the entire dataset [94], and similar. Assumptions are made about quality attributes as well. Deviations from the assumptions on which a particular class of models is based can be a source of problems, as summarized for DL models in [47]. Furthermore, in DL, the assumptions made about the neural network architecture, training data, loss function, and optimization algorithm not only constrain the problem solutions that can be learned but also determine how easily a particular solution can be learned and, therefore, may inadvertently create opportunities to learn an undesirable ''shortcut'' solution to a problem that does not work well in real-world settings [49]. Unclearly defined or omitted assumptions affect accountability in AI systems, as they leave room for avoiding responsibility for any errors resulting from wrong assumptions, by blaming unavoidable and inexplicable software ''bugs'' [81]. Therefore, identifying and documenting the assumptions prevents stakeholders from neglecting or misinterpreting them in the development process and allows for appropriate addressing of their effects.
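One way to keep documented assumptions from being neglected is to turn them into executable checks wherever possible; the following minimal Python sketch validates an assumed class distribution against a labeled data sample (the class names, shares, and deviation threshold are hypothetical, not ADD's actual figures):

```python
# Turn a documented data-distribution assumption into an executable check:
# compare the assumed class shares with those observed in a labeled sample.
# Class names, shares, and the allowed deviation are hypothetical.
from collections import Counter

ASSUMED_CLASS_SHARES = {"relevant": 0.1, "irrelevant": 0.9}

def check_distribution_assumption(labels, assumed_shares, max_deviation=0.05):
    total = len(labels)
    counts = Counter(labels)
    return all(abs(counts.get(cls, 0) / total - share) <= max_deviation
               for cls, share in assumed_shares.items())

sample = ["relevant"] * 12 + ["irrelevant"] * 88
print(check_distribution_assumption(sample, ASSUMED_CLASS_SHARES))  # True
```

Running such a check on new data samples makes a violated assumption visible early, instead of surfacing later as unexplained model misbehavior.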
TABLE 5. Brief summary of the relevance of conventional RE activities to the ML development process, the challenges this process brings them, and the
necessary adjustments to these activities to better fit into the process.
b: CASE STUDY
The assumptions for ADD were defined at two levels of abstraction, i.e., high-level assumptions and assumptions related to specific functionalities. An excerpt from the first type is given in Table 2. It includes our assumptions related to the definitions and subordination of the domain-specific thematic structure detected by ADD and its related ones. These statements are considered assumptions due to the lack of precise definitions and widely accepted understanding of their subordination in the domain. By clearly documenting them, we ensured that all stakeholders of ADD share the same understanding. The second type of assumptions includes those related to ML models, e.g., the expected data distribution, relevant features, or the appropriate class of models. An excerpt from the assumptions related to the text classification component is given in Table 3.

2) DEPENDENCIES
a: LITERATURE REVIEW
Dependencies are external factors or components on which a project or system depends but which are beyond its control, so they can turn into risks if left undefined or inappropriately monitored [45]. As already mentioned in Section V-A4, ML systems are inevitably dependent on data, and they often depend on external models or external software libraries, so the team implementing the system may not always have full control over them. For those reasons, proper documentation of the dependencies and monitoring of their effects on the ML system are crucial. Breck et al. [59] have summarized a set of best practices for preventing potential risks arising from dependencies, which can sometimes lead to model misbehavior even without outputs strange enough to trigger monitoring mechanisms.

b: CASE STUDY
Table 3 contains an excerpt from the ADD dependencies defined for the text classification component. They primarily refer to the component's dependence on third-party pre-trained text encoders, software libraries for ML and NLP functionalities, and the Wikipedia XML export files which ADD takes as input.

3) CONSTRAINTS
a: LITERATURE REVIEW
Constraints are restrictions on the design and implementation choices that the developers can make about a solution, which can result from decisions made by management,
TABLE 6. Brief summary of the importance of different types of requirements (high-level, user, and functional) in addressing the ML specifics, the
challenges that ML specifics bring to their conventional understanding, and the necessary adjustments of this understanding.
requirements from external stakeholders, requirements for compliance with standards or agreements, and a variety of other reasons [45]. In the context of ML systems, examples include policy constraints that may enforce certain requirements, e.g., on privacy [23]. Other examples include data constraints which describe meaningful feature ranges, feature dependencies, or invariants, ensuring the data validity after its transformation [27]. Constraints may encode certain prior knowledge or a preference towards a simpler class of models [4], or in other ways guide the ML pipeline in learning models that satisfy a broader set of behavioral requirements that are sometimes not covered by the standard ML testing process (e.g., requirements related to interpretability or fairness) [48]. For ML systems that continuously learn and change their behavior, hard-coding rules for system behavior that prevent the system from learning behavior that does not conform to the relevant standards agreed upon among stakeholders has also been suggested [95].

b: CASE STUDY
Table 3 contains an excerpt from the constraints of the ADD project. They refer to certain experimental choices imposed on the developers of the text classification component in order to meet the high-level objectives of ADD related to the comparison of state-of-the-art and conventional text representation methods.

VI. DISCUSSION
This section offers a summary of all findings previously stated in the article, related to the importance of the RE activities in the development of ML systems, the importance of certain types of requirements, the challenges associated
TABLE 7. Brief summary of the importance of different types of quality requirements in addressing the ML specifics, the challenges that ML specifics
bring to their conventional understanding, and the necessary adjustments of this understanding.
with RE activities, and those associated with the conventional understanding of the requirements.

ML systems have become ubiquitous in many segments of our lives due to the numerous benefits from their use. Nevertheless, ML systems are complex systems which learn their behavior from data. Since data can be imperfect or reflect historical human biases, ML systems are at risk of acquiring these imperfections through the learning process. Furthermore, ML systems can implement complex decision functions, which depend on many factors and which may lead to outcomes that cannot always be predicted with certainty. Therefore, the importance of identifying, analyzing, documenting, and validating the expected behavior of an ML system, the intended and unintended use cases, the risks, limitations, assumptions, the performance and quality expectations, or the required compliance with
TABLE 8. Brief summary of the importance of different types of requirements (assumptions, dependencies, and constraints) in addressing the ML
specifics, the challenges that ML specifics bring to their conventional understanding, and the necessary adjustments of this understanding.
ethical/legal constraints should not be underestimated. On the contrary, these RE activities should be given attention as early as possible in the ML development process. The literature review provided in this article confirms that carefully conducted RE activities can add value to the rather complex ML development process, in the same way that they add value to the conventional software development process.

Nevertheless, the ML development process has its own specifics that affect the already established RE best practices. The results of the literature review and the case study are consistent in terms of the significant impact that the ML development process has on the conventional, well-established RE activities, but they also highlight the benefits of these activities in dealing with the complexity of the process. ML introduces new activities through which requirements are identified and refined (e.g., data analysis and experimentation), introduces non-trivial challenges to be anticipated in the RE phase, and makes some of the established RE best practices inapplicable. However, at the time of writing, RE best practices for ML systems do not exist and have yet to be defined by the community. Table 5 briefly summarizes the findings related to (1) the relevance of conventional RE activities to the ML development process, (2) the challenges that this process brings to them, and (3) their necessary adjustments to better fit into this process, all of which are presented in the previous sections of this article.

The literature review also confirms the importance of each of the requirement types considered in this article to ML systems. However, the conventional understanding of some of them (e.g., functional requirements or certain quality attributes) may require adjustment in the context of ML. Table 6, Table 7, and Table 8 briefly summarize the findings related to (1) the importance of the different types of requirements in addressing the ML specifics, (2) the challenges that ML specifics bring to the conventional understanding of these requirements, and (3) the necessary adjustments of this understanding, all of which are presented in the previous sections of this article.

Finally, given the current prevalence of ML in software development, we believe that the number of research articles on this topic will continue to grow in the coming years, offering experiences from real ML projects, as well as new or adjusted methodologies that better fit the ML development process. However, until widely accepted RE best practices for ML systems are available, we believe that the already established RE models, applied with awareness of the ML specifics, provide a solid foundation for a thorough and shared understanding of what needs to be implemented in and what is expected from an ML system, while minimizing the risk of neglecting important requirements.
VII. CONCLUSION
Machine learning has become a common choice in modern software development across many domains. Nevertheless, while it can provide data-driven solutions to many problems that people find difficult to solve, at the same time it challenges the well-established software development best practices. Furthermore, machine learning introduces new technical and ethical challenges of which the stakeholders must be fully aware even before the project begins. Since the requirements engineering activities provide a proper understanding of the problem and ensure the implementation of an appropriate solution, these are the right activities where solving machine learning challenges should begin. As the requirements engineering activities are also influenced by the machine learning specifics, but best practices do not exist yet, this article aims to analyze the impact that machine learning has on conventional requirements engineering activities and types of requirements, to emphasize the importance of proper requirements engineering in machine learning projects, and to share our experience through a case study. Most importantly, the purpose of this article is to motivate further discussion and sharing of practical experiences on this important topic.

REFERENCES
[6] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, and D. Dennison, ''Hidden technical debt in machine learning systems,'' in Proc. Adv. Neural Inf. Process. Syst., vol. 28, 2015, pp. 2503–2511.
[7] S. Amershi, A. Begel, C. Bird, R. DeLine, H. Gall, E. Kamar, N. Nagappan, B. Nushi, and T. Zimmermann, ''Software engineering for machine learning: A case study,'' in Proc. IEEE/ACM 41st Int. Conf. Softw. Eng., Softw. Eng. Pract. (ICSE-SEIP), May 2019, pp. 291–300.
[8] I. Rahwan et al., ''Machine behaviour,'' Nature, vol. 568, no. 7753, pp. 477–486, 2019.
[9] M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, and T. Gebru, ''Model cards for model reporting,'' in Proc. Conf. Fairness, Accountability, Transparency. New York, NY, USA: Association for Computing Machinery, Jan. 2019, pp. 220–229.
[10] A. Vogelsang and M. Borg, ''Requirements engineering for machine learning: Perspectives from data scientists,'' in Proc. IEEE 27th Int. Requirements Eng. Conf. Workshops (REW), Sep. 2019, pp. 245–251.
[11] A. Gjorgjevikj, K. Mishev, and D. Trajanov, ''ADD: Academic disciplines detector based on Wikipedia,'' IEEE Access, vol. 8, pp. 7005–7019, 2020.
[12] C. Kästner and E. Kang, ''Teaching software engineering for AI-enabled systems,'' 2020, arXiv:2001.06691.
[13] S. Martínez-Fernández, J. Bogner, X. Franch, M. Oriol, J. Siebert, A. Trendowicz, A. M. Vollmer, and S. Wagner, ''Software engineering for AI-based systems: A survey,'' 2021, arXiv:2105.01984.
[14] H. Belani, M. Vukovic, and Ž. Car, ''Requirements engineering challenges in building AI-based complex systems,'' in Proc. IEEE 27th Int. Requirements Eng. Conf. Workshops (REW), Sep. 2019, pp. 252–255.
[15] T. Chuprina, D. Mendez, and K. Wnuk, ''Towards artefact-based requirements engineering for data-centric systems,'' 2021, arXiv:2103.05233.
ments engineering for data-centric systems,’’ 2021, arXiv:2103.05233.
because, in the future, machine learning systems will become
[16] H.-M. Heyn, E. Knauss, A. P. Muhammad, O. Eriksson, J. Linder,
even more present in our daily lives. P. Subbiah, S. K. Pradhan, and S. Tungal, ‘‘Requirement engineering chal-
The presented literature review and case study findings lenges for AI-intense systems development,’’ 2021, arXiv:2103.10270.
confirm that the machine learning development process [17] S. Studer, T. Binh Bui, C. Drescher, A. Hanuschkin, L. Winkler, S. Peters,
and K.-R. Mueller, ‘‘Towards CRISP-ML(Q): A machine learning process
affects the conventional, well-established requirements engi- model with quality assurance methodology,’’ 2020, arXiv:2003.05155.
neering activities, but they also confirm the relevance of [18] X. Zhang, Y. Yang, Y. Feng, and Z. Chen, ‘‘Software engineering
these activities to the process. Furthermore, the findings practice in the development of deep learning applications,’’ 2019,
arXiv:1910.03156.
confirm the relevance of the different requirement types con-
[19] H. Kuwajima, H. Yasuoka, and T. Nakae, ‘‘Engineering problems in
sidered in this article to machine learning systems, as well machine learning systems,’’ Mach. Learn., vol. 109, no. 5, pp. 1103–1126,
as the necessary adjustment of the conventional understand- May 2020.
ing of some of them in the context of machine learning [20] M. Saidur Rahman, E. Rivera, F. Khomh, Y.-G. Guéhéneuc, and
B. Lehnert, ‘‘Machine learning software engineering in practice: An indus-
(e.g., functional requirements or certain quality attributes). trial case study,’’ 2019, arXiv:1906.07154.
Therefore, we believe that the future research should con- [21] Z. Wan, X. Xia, D. Lo, and G. C. Murphy, ‘‘How does machine learning
tinue focusing on adjusting (1) the requirements engineering change software development practices?’’ IEEE Trans. Softw. Eng., vol. 47,
no. 9, pp. 1857–1871, Sep. 2021.
activities and (2) the understanding of the different require-
[22] G. Giray, ‘‘A software engineering perspective on engineering machine
ment types so they fit even better into the machine learning learning systems: State of the art and challenges,’’ J. Syst. Softw., vol. 180,
development process, as well as on presenting require- Oct. 2021, Art. no. 111031.
ments engineering experiences from real machine learning [23] A. Serban and J. Visser, ‘‘Adapting software architectures to machine
learning challenges,’’ 2021, arXiv:2105.12422.
projects. [24] A. Pereira and C. Thomas, ‘‘Challenges of machine learning applied to
safety-critical cyber-physical systems,’’ Mach. Learn. Knowl. Extraction,
vol. 2, no. 4, pp. 579–602, Nov. 2020.
REFERENCES
[25] G. Lorenzoni, P. Alencar, N. Nascimento, and D. Cowan, ‘‘Machine
[1] Y. LeCun, Y. Bengio, and G. Hinton, ‘‘Deep learning,’’ Nature, vol. 521, learning model development from a software engineering perspective:
no. 7553, pp. 436–444, 2015. A systematic literature review,’’ 2021, arXiv:2102.07574.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification [26] L. E. Lwakatare, A. Raj, J. Bosch, H. H. Olsson, and I. Crnkovic, ‘‘A tax-
with deep convolutional neural networks,’’ in Proc. Adv. Neural Inf. Pro- onomy of software engineering challenges for machine learning systems:
cess. Syst. (NIPS), vol. 25, Dec. 2012, pp. 1097–1105. An empirical investigation,’’ in Proc. Int. Conf. Agile Softw. Develop.
[3] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, Cham, Switzerland: Springer, 2019, pp. 227–243.
G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, [27] W. Maass and V. C. Storey, ‘‘Pairing conceptual modeling with machine
M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, learning,’’ Data Knowl. Eng., vol. 134, Jul. 2021, Art. no. 101909.
I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and [28] H. Villamizar, M. Kalinowski, and H. Lopes, ‘‘A catalogue of concerns for
D. Hassabis, ‘‘Mastering the game of go with deep neural networks and specifying machine learning-enabled systems,’’ 2022, arXiv:2204.07662.
tree search,’’ Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016. [29] H. Villamizar, M. Kalinowski, and H. Lopes, ‘‘Towards perspective-
[4] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. based specification of machine learning-enabled systems,’’ 2022,
Cambridge, MA, USA: MIT Press, 2016. [Online]. Available: http://www. arXiv:2206.09760.
deeplearningbook.org [30] Z. Pei, L. Liu, C. Wang, and J. Wang, ‘‘Requirements engineering for
[5] A. Karpathy. (2017). Software 2.0. Accessed: May 15, 2023. [Online]. machine learning: A review and reflection,’’ in Proc. IEEE 30th Int.
Available: https://karpathy.medium.com/software-2-0-a64152b37c35 Requirements Eng. Conf. Workshops (REW), Aug. 2022, pp. 166–175.

ANA GJORGJEVIKJ received the bachelor's degree in computer science and engineering and the master's degree in computer networks and e-technologies from Ss. Cyril and Methodius University in Skopje, in 2010 and 2014, respectively, where she is currently pursuing the Ph.D. degree in computer science and engineering, with a particular focus on deep learning and natural language processing. She has more than ten years of experience as a software engineer. Her research interests include data science, machine learning, and natural language processing.

KOSTADIN MISHEV received the master's degree in computer networks and e-technologies and the Ph.D. degree in computer science and engineering from Ss. Cyril and Methodius University in Skopje, in 2016 and 2023, respectively. He is currently an Assistant Professor with the Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje. His research interests include data science, semantic web, natural language processing, enterprise application architectures, web technologies, and computer networks.

LJUPCHO ANTOVSKI is currently a Professor in software engineering with the Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, teaching courses related to project management, architecture of computers, mobile applications and platforms, design and architecture of software, and software requirements engineering. He has more than 20 years of experience in the IT area, with vast consultancy experience in projects related to the application of technology in elections, electronic and mobile government, public and corporate information systems, software project management, and secure IT systems.

DIMITAR TRAJANOV (Member, IEEE) received the Ph.D. degree in computer science. From March 2011 until September 2015, he was the founding Dean of the Faculty of Computer Science and Engineering. He is currently a Full Professor with the Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, and a Visiting Research Professor with Boston University. He is also the Leader of the Regional Social Innovation Hub, established in 2013 as a cooperation between UNDP and the Faculty of Computer Science and Engineering. He is the author of more than 190 journal and conference papers and seven books. He has been involved in more than 70 research and industry projects, serving as a project leader in more than 40 of them. His research interests include data science, machine learning, natural language processing, FinTech, semantic web, open data, social innovation, e-commerce, technology for development, and climate change.
