
Volume 1153

Studies in Computational Intelligence

Series Editor
Janusz Kacprzyk
Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Computational Intelligence” (SCI) publishes new
developments and advances in the various areas of computational
intelligence—quickly and with a high quality. The intent is to cover the
theory, applications, and design methods of computational intelligence, as
embedded in the fields of engineering, computer science, physics and life
sciences, as well as the methodologies behind them. The series contains
monographs, lecture notes and edited volumes in computational intelligence
spanning the areas of neural networks, connectionist systems, genetic
algorithms, evolutionary computation, artificial intelligence, cellular
automata, self-organizing systems, soft computing, fuzzy systems, and
hybrid intelligent systems. Of particular value to both the contributors and
the readership are the short publication timeframe and the world-wide
distribution, which enable both wide and rapid dissemination of research
output.
Indexed by SCOPUS, DBLP, WTI AG (Switzerland), zbMATH,
SCImago.
All books published in the series are submitted for consideration in Web
of Science.
Editor
Roger Lee

Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing
Volume 17
Editor
Roger Lee
Computer Science Department, Software Engineering and Information
Technology Institute, Mt Pleasant, MI, USA

ISSN 1860-949X e-ISSN 1860-9503


Studies in Computational Intelligence
ISBN 978-3-031-56387-4 e-ISBN 978-3-031-56388-1
https://doi.org/10.1007/978-3-031-56388-1

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2024

This work is subject to copyright. All rights are solely and exclusively
licensed by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any
other physical way, and transmission or information storage and retrieval,
electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service
marks, etc. in this publication does not imply, even in the absence of a
specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice
and information in this book are believed to be true and accurate at the date
of publication. Neither the publisher nor the authors or the editors give a
warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The
publisher remains neutral with regard to jurisdictional claims in published
maps and institutional affiliations.
This Springer imprint is published by the registered company Springer
Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham,
Switzerland
Foreword
The main purpose of this book is to seek peer-reviewed original research
papers on the foundations and new developments in Networking and
Parallel/Distributed Computing Systems. The focus will also be on
publishing, in a timely manner, the results of applying new and emerging
technologies originating from research in Networking and
Parallel/Distributed Computing Systems. The findings of this book can be
applied to a variety of areas, and applications can range across many fields.
The papers in this book were chosen based on review scores submitted
by members of the editorial review board and underwent rigorous rounds of
review.
We would like to thank all contributors including all reviewers, and all
editorial board members of this book for their cooperation in helping to
publish this book.
It is our sincere hope that this book provides stimulation and inspiration,
and that it will be used as a foundation for works to come.
Qiuxiang Yang
Simon Xu
Lizhen Fu
July 2023
Editorial Review Board
Kiumi Akingbehin, University of Michigan, USA
Yasmine Arafa, University of Greenwich, UK
Jongmoon Baik, Korea Advanced Institute of Science and Technology,
South Korea
Ala Barzinji, University of Greenwich, UK
Radhakrishna Bhat, Manipal Institute of Technology, India
Victor Chan, Macao Polytechnic Institute, Macao
Morshed Chowdhury, Deakin University, Australia
Alfredo Cuzzocrea, University of Calabria, Italy
Hongbin Dong, Harbin Engineering University, China
Yucong Duan, Hainan University, China
Zongming Fei, University of Kentucky, USA
Honghao Gao, Shanghai University, China
Cigdem Gencel Ambrosini, Ankara Medipol University, Turkey
Gwangyong Gim, Soongsil University, South Korea
Takaaki Goto, Toyo University, Japan
Gongzhu Hu, Central Michigan University, USA
Wen-Chen Hu, University of North Dakota, USA
Naohiro Ishii, Advanced Institute of Industrial Technology, Japan
Motoi Iwashita, Chiba Institute of Technology, Japan
Kazunori Iwata, Aichi University, Japan
Keiichi Kaneko, Tokyo University of Agriculture and Technology, Japan
Jong-Bae Kim, Soongsil University, South Korea
Jongyeop Kim, Georgia Southern University, USA
Hidetsugu Kohzaki, Kyoto University, Japan
Cyril S. Ku, William Paterson University, USA
Joonhee Kwon, Kyonggi University, South Korea
Sungtaek Lee, Yongin University, South Korea
Weimin Li, Shanghai University, China
Jay Ligatti, University of South Florida, USA
Chuan-Ming Liu, National Taipei University of Technology, Taiwan
Man Fung Lo, The University of Hong Kong, Hong Kong
Chaoying Ma, Greenwich University, UK
Prabhat Mahanti, University of New Brunswick, Canada
Tokuro Matsuo, Advanced Institute of Industrial Technology, Japan
Mohamed Arezki Mellal, M’Hamed Bougara University, Algeria
Jose M. Molina, Universidad Carlos III de Madrid, Spain
Kazuya Odagiri, Sugiyama Jogakuen University, Japan
Takanobu Otsuka, Nagoya Institute of Technology, Japan
Anupam Panwar, Apple Inc., USA
Kyungeun Park, Towson University, USA
Chang-Shyh Peng, California Lutheran University, USA
Taoxin Peng, Edinburgh Napier University, UK
Isidoros Perikos, University of Patras, Greece
Laxmisha Rai, Shandong University of Science and Technology, China
Fenghui Ren, University of Wollongong, Australia
Kyung-Hyune Rhee, Pukyong National University, South Korea
Abdel-Badeeh Salem, Ain Shams University, Egypt
Toramatsu Shintani, Nagoya Institute of Technology, Japan
Junping Sun, Nova Southeastern University, USA
Haruaki Tamada, Kyoto Sangyo University, Japan
Takao Terano, Tokyo Institute of Technology, Japan
Kar-Ann Toh, Yonsei University, South Korea
Masateru Tsunoda, Kindai University, Japan
Trong Van Hung, Vietnam Korea University of Information and
Communications Tech., Viet Nam
Shang Wenqian, Communication University of China, China
John Z. Zhang, University of Lethbridge, Canada
Rei Zhg, Tongji University, China
Contents
Develop a System to Analyze Logs of a Given System Using Machine
Learning
Md. Tarek Hasan, Farzana Sadia, Mahady Hasan and
M. Rokonuzzaman
Study on Locality, Fairness, and Optimal Resource Allocation in
Cluster Scheduling
Cherindranath Reddy Vanguru
Digital Word-of-Mouth and Purchase Intention. An Empirical Study in
Millennial Female Consumers
Melissa del Pilar Usurin-Flores, Miguel Humberto Panez-Bendezú and
Jorge Alberto Vargas-Merino
Alignment of Business Process and Information System Models
Through Explicit Traceability
Aljia Bouzidi, Nahla Zaaaboub Haddar and Kais Haddar
An Assessment of Fintech for Open Banking: Data Security and
Privacy Strategies from the Perspective of Fintech Users
Amila Munasinghe, Srimannarayana Grandhi and Tasadduq Imam
Using Key Point Detection to Extract Three Dimensional Phenotypes
of Corn
Yuliang Gao, Zhen Li, Seiichi Serikawa, Bin Li and Lifeng Zhang
Optic Cup Segmentation from Fundus Image Using Swin-Unet
Xiaozhong Xue, Linni Wang, Ayaka Ehiro, Yahui Peng and Weiwei Du
From Above and Beyond: Decoding Urban Aesthetics with the Visual
Pollution Index
Advait Gupta, Manan Padsala, Devesh Jani, Tanmay Bisen,
Aastha Shayla and Susham Biswas
Subcellular Protein Patterns Classification Using Extreme Gradient
Boosting with Deep Transfer Learning as Feature Extractor
Manop Phankokkruad and Sirirat Wacharawichanant
Building a Shapley FinBERTopic System to Interpret Topics and
Articles Affecting Stock Prices
Yoshihiro Nishi and Takahashi Hiroshi
Can a Large Language Model Generate Plausible Business Cases from
Agent-Based Simulation Results?
Takamasa Kikuchi, Yuji Tanaka, Masaaki Kunigami, Hiroshi Takahashi
and Takao Terano
Analyzing the Growth Patterns of GitHub Projects to Construct Best
Practices for Project Managements
Kaide Kaito and Haruaki Tamada

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed
Computing , Studies in Computational Intelligence 1153
https://doi.org/10.1007/978-3-031-56388-1_1

Develop a System to Analyze Logs of a Given System Using Machine Learning
Md. Tarek Hasan1 , Farzana Sadia1 , Mahady Hasan1 and
M. Rokonuzzaman2
(1) Independent University, Bangladesh, Plot 16 Block B, Bashundhara
R/A, Dhaka, Bangladesh
(2) North South University, Plot 15 Block B, Bashundhara R/A, Dhaka,
Bangladesh

Md. Tarek Hasan (Corresponding author)


Email: [email protected]

Farzana Sadia
Email: [email protected]

Mahady Hasan
Email: [email protected]

M. Rokonuzzaman
Email: [email protected]

Abstract
Software error detection is a critical aspect of software development.
However, due to the lack of time, budget, and workforce, testing
applications can be challenging, and in some cases, bug reports may not
make it to the final stage. Additionally, a lack of product domain knowledge
can lead to misinterpretation of calculations, resulting in errors. To address
these challenges, early bug prediction is necessary to develop error-free and
efficient applications. In this study, the author proposed a system that uses
machine learning to analyze system error logs and detect errors in real time.
The proposed system leverages imbalanced data sets from live servers
running applications developed using PHP and Codeigniter. The system
uses classification algorithms to identify errors and suggests steps to
overcome them, thus improving the software’s quality, reliability, and
efficiency. Our approach addresses the challenges associated with large and
complex software where it can be difficult to identify bugs in the early
stages. By analyzing system logs, we demonstrate how machine learning
classification algorithms can be used to detect errors and improve system
performance. Our work contributes to a better understanding of how
machine learning can be used in real-world applications and highlights the
practical benefits of early bug prediction in software development.

1 Introduction
In today’s data-driven world, data plays a significant role in almost every
aspect of our lives. As more and more businesses rely on data analysis to
drive decision-making, the importance of accurate and error-free data has
become increasingly apparent [1]. Large-scale data sets are collected and
evaluated across various industries, and the problem of errors within those
data sets has become more pressing. One common source of data for
software applications is system logs, which often contain valuable
information and error messages. However, many researchers and analysts
overlook the importance of proper data cleaning and preparation before
analysis [1].
This paper focuses on analyzing system logs to identify and address
errors using machine learning techniques. The goal is to provide guidelines
and solutions for optimizing software performance and reliability. By
leveraging machine learning algorithms, this study aims to predict errors
and provide solutions to prevent instability or inconsistency in the software
during the development life cycle.
To achieve this goal, the proposed framework includes a search-based
testing approach using deep neural networks. The framework incorporates
strategies for code embedding, refactoring policy, mutation testing, and
evaluating test cases. The system logs are preprocessed and analyzed using
machine learning algorithms to identify patterns and predict errors. The
resulting guidelines and solutions can be applied to any application logs,
and the proposed data mining export code can be easily adapted for use in
other settings.
Ultimately, the goal of this study is to contribute to the development of
error-free software by providing a set of cleansed data for further
investigation and analysis. By focusing on data quality and leveraging
machine learning techniques, we hope to improve software performance,
reliability, and efficiency.
The problem of the paper is related to the issues faced by companies
due to the delivery of software without proper testing [15] and quality
checking, resulting in increased development costs. Additionally, there are
difficulties in creating unique or unfamiliar business transactions in the
testing environment. Furthermore, even if the software is running well,
errors in logs may arise due to not following proper standards or unknown
process cycles. These errors may not stop the application’s execution, but
they produce a considerable amount of logs on the production server,
which affects performance and stability. Missing data integrity can also
create errors and faults in transactions. Therefore, the paper aims to deal
with these log issues and provide guidelines for their resolution without
human intervention. The authors propose a unified algorithm for data
cleansing, and the focus is on analyzing the logs of systems running in the
production environment. The authors suggest machine learning algorithms
to detect errors and propose solutions to optimize them to create an error-
free application. The authors aim to create a framework for analyzing big
data to improve fault detection and problem identification.
The authors aim to develop a unified algorithm that can resolve data
quality issues in unclean logs without the need for human intervention or
master data [2]. The focus of their study is on analyzing the logs of running
systems in the production environment to provide guidelines and solutions
for optimizing errors and creating error-free applications. They run machine
learning algorithms to identify errors in the logs and suggest solutions to
overcome them. The authors also emphasize the need for a framework for
analyzing big data to improve fault detection and problem identification
during data preprocessing [1].
2 Literature Review
In recent years, the use of machine learning for log analysis has gained
significant attention in the field of software engineering [3]. Researchers
have proposed various techniques and models for log analysis to improve
software reliability, performance, and maintainability.
One approach is to use clustering algorithms to group log messages
based on their similarity, which can help to identify common patterns and
anomalies in the log data [4]. Another approach is to use classification
algorithms to detect and categorize different types of log messages, such as
errors, warnings, and informational messages [5].
Researchers have also explored the use of natural language processing
(NLP) techniques for log analysis, such as topic modeling and sentiment
analysis, to gain insights into the causes of log messages and to identify
potential areas for improvement [6].
In addition to machine learning, researchers have also proposed other
approaches for log analysis, such as rule-based systems and pattern-
matching techniques [3]. Rule-based systems use predefined rules to detect
specific types of log messages, while pattern-matching techniques search
for specific patterns or sequences of log messages that may indicate a
problem.
Despite the various approaches proposed for log analysis, there are still
challenges and limitations in this field. One challenge is the complexity and
variability of log data, which can make it difficult to develop accurate
models and algorithms [4]. Another challenge is the lack of labeled data for
training and testing machine learning models, which can limit their
effectiveness [5].
Despite these challenges, the potential benefits of log analysis using
machine learning are significant, including improved software quality,
reduced downtime, and increased productivity [6].

3 Research Design
The aim of this research is to detect and classify system errors by using a
Random Forest feature selection algorithm, to minimize the error rate and
predict possible error solutions based on current errors.
3.1 Data Collection
The system log files will be collected from a live-running application that is
privately available. The data will be selected from a specific time period to
ensure consistency in the data. The dataset will be cleaned using the random
sampling technique to reduce the size of the dataset and make it more
manageable. The applications in question generate a huge number of log
files [2] in which the authors observed many error tags. Because the logs are
huge and no big-data handling machine was available, the authors picked
one week of data from 2021, amounting to about 921 MB.

3.2 Data Pre-processing


Data pre-processing is an essential step to ensure data quality and prepare
the dataset for analysis. The data will be translated into a structured format
and validated to handle missing values correctly. Data cleaning techniques
will be used to remove null values and incomplete data. To achieve high
accuracy, it is also necessary to handle missing values correctly [1]. To
programmatically prepare those files, one must first convert the raw log file
to a CSV file [7]. An algorithm has been developed to mine the dataset and
extract only error logs.
An example of such a program was implemented in PHP for the
CodeIgniter application; a sketch of the approach follows.
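The sketch below is written in Python rather than the original PHP, and the log-line layout it parses (tag, timestamp, severity, message, file, line) is a hypothetical CodeIgniter-style format, not necessarily the authors' exact one:

```python
import csv
import re

# Hypothetical CodeIgniter-style log line (one entry per line), e.g.:
# ERROR - 2021-03-01 10:15:02 --> Severity: Notice --> Undefined property: payment /controller/billing.php 192
LOG_PATTERN = re.compile(
    r"^(?P<tag>ERROR|INFO)\s+-\s+(?P<timestamp>[\d\- :]+?)\s*-->\s*"
    r"Severity:\s*(?P<type>\w+)\s*-->\s*(?P<message>.+?)\s+"
    r"(?P<file>\S+\.php)\s+(?P<line>\d+)$"
)


def mine_error_logs(log_path: str, csv_path: str) -> int:
    """Extract ERROR-tagged entries from a raw log file into a CSV file."""
    written = 0
    with open(log_path, encoding="utf-8", errors="ignore") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["Title", "Type", "Message", "Line", "File"])
        for raw_line in src:
            match = LOG_PATTERN.match(raw_line.strip())
            if match and match.group("tag") == "ERROR":
                writer.writerow([match.group("tag"), match.group("type"),
                                 match.group("message"), match.group("line"),
                                 match.group("file")])
                written += 1
    return written
```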
3.3 Feature Selection
Feature selection is a crucial step in the data mining process, which
involves identifying and selecting the most relevant features for analysis.
Random Forest feature selection will be used to select the most relevant
features for analysis. The algorithm will be trained on the dataset, and the
most important features will be selected for analysis.
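A minimal sketch of such Random Forest based feature selection with scikit-learn, assuming the mined log entries have already been converted into a numeric feature matrix X and a label vector y (function and parameter names are illustrative):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def rank_features(X: pd.DataFrame, y: pd.Series, top_k: int = 10):
    """Rank features by Random Forest importance and return the top_k strongest."""
    forest = RandomForestClassifier(n_estimators=100, random_state=42)
    forest.fit(X, y)

    # Impurity-based importance of each column, highest first.
    ranking = pd.Series(forest.feature_importances_, index=X.columns)
    ranking = ranking.sort_values(ascending=False)
    return ranking, list(ranking.index[:top_k])
```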

3.4 Error Detection and Classification


The selected features will be used to detect and classify system errors. The
algorithm will be trained to predict possible error solutions based on current
errors. The error logs will be categorized into groups based on the feature
selection method to identify patterns and relationships between errors.

3.5 Evaluation
The performance of the algorithm will be evaluated based on accuracy and
precision. The results will be compared to existing methods for detecting
and classifying system errors. The data was not organized because it was
gathered in its raw form from the source systems. It contains null values,
incomplete data, missing values, and an inconsistent date format.
Thus, the data first had to be analyzed with a suitable language and
algorithm to assess how dirty it was; a well-structured dataset could then
be generated using data cleaning techniques, algorithms, and procedures,
and used for analysis or visualization [8].
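One possible realization of these cleaning steps is a short pandas routine like the following (the column names and file path are illustrative, not the authors' actual schema):

```python
import pandas as pd


def clean_log_frame(csv_path: str) -> pd.DataFrame:
    """Load the mined CSV and apply basic cleaning: drop nulls and duplicates,
    normalize the inconsistent date format, and strip stray whitespace."""
    df = pd.read_csv(csv_path)

    # Remove rows with null or missing values and exact duplicates.
    df = df.dropna().drop_duplicates()

    # Normalize an inconsistent date column, if present (bad values become NaT).
    if "Date" in df.columns:
        df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
        df = df.dropna(subset=["Date"])

    # Strip stray whitespace from the remaining text columns.
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].str.strip()
    return df
```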
The research will provide insights into the causes of system errors and
offer possible solutions to minimize the error rate. The Random Forest
feature selection algorithm will be useful in identifying the most relevant
features for analysis, and the error classification system will provide a
framework for predicting possible error solutions. The results of this
research will contribute to the field of data mining and error analysis.
The study will only consider a specific time period and may not provide
a comprehensive analysis of system errors. The dataset is based on a live-
running application, and the results may not be generalizable to other
applications. The study may also face computational limitations due to the
large size of the dataset (see Fig. 1).
Fig. 1 Data analysis model

4 Findings
To achieve a clean, error-free data set for analysis in machine learning, pre-
processing raw data is a critical step, as outlined by Haider et al. [11]. In
this study, customized algorithms were employed to clean the data; Python
or R is recommended for this task due to their built-in libraries for
statistical analysis and interpretation, as noted by Hossen and Sayeed [1].
The data analyzed in this study covers a one-week period, during which a
total error count of 4,042,852 was observed.
Upon analysis of the cleaned data set, it was found that only warning
and notice-type errors occurred, which did not cause the application to stop
executing any script. These types of errors are non-fatal and do not halt
script execution. The cleaned data set contains various columns related to
the data, such as Title (log type), Type (error type), the affected variable and
line number, error, and filename. Table 1 provides an overview of our
cleaned data.
Table 1 Mined data set
Title Type Variable, line File
ERROR Notice payment, 192 /controller/./billing.php
ERROR Notice courses, 119 /controller/./billing.php
ERROR Notice semester, 120 /controller/./billing.php
ERROR Warning LastName, 63 /controller/./landing.php
Table 2 illustrates the error frequencies, generated using Anaconda, a
big-data handling tool. The first column gives the error type, the second
column the location of the affected files, and the third column how many
errors occurred in those files.
Table 2 Error type and frequency

Error type Occurred in Frequency

Notice Controller and view files 3,640,423
Warning View file 399,861

According to the information the authors gathered, the most frequently
discovered errors are listed in Table 3.
Table 3 Most occurred error

1 Invalid argument supplied for foreach()


2 Trying to get property of non-object
3 Undefined property

To present insights from the analyzed data, visualization software such as
Tableau, Power BI, or RapidMiner could be used [8]. As the authors are
familiar with RapidMiner, it was used to analyze the data.
Table 4 Occurred errors

Errors Frequency Fraction


Undefined property 2494821 0.62
Get property of non-object 1038676 0.21
Invalid argument foreach() 358041 0.09
Table 4 lists the different error types together with their absolute counts
and the fraction of all occurrences they account for. Three error types are
listed: “Undefined property,” “Get property of non-object,” and “Invalid
argument foreach().” For each error type, the table provides the number of
occurrences (absolute count) and the fraction of all occurrences in the
system. In total, the three error types account for 4,005,538 occurrences.
Table 5 Most occurred error

Title Errors type


ERROR Notice
ERROR Notice
ERROR Warning
INFO Some text
INFO Some text
ERROR Warning
ERROR Notice

Table 5 was used to determine the system’s total error and non-error
counts, which are shown in Fig. 2.
Fig. 2 Error rate
Raw data is often incomplete, inconsistent, and redundant, making it
unsuitable for direct data mining. Therefore, advanced analysis techniques
are required to process the data [11]. In this study, the authors decided to
use one day of acquired data and apply machine-learning techniques [1].
The data had a shape of (475680, 5), but since the CSV file did not provide
numeric data, a method for transforming nominal data into numeric features
was used [13]. To detect whether each row of the data represents an error or
not, the raw data had to be converted into a numerical format [14]. Excel
was used to generate the desired pattern, which can be seen more clearly in
Fig. 3 and Table 6; a sketch of such an encoding is given after Table 6.
Table 6 Error pattern mining

Log tag Type – – –


ERROR Notice 1 1 1
INFO Text 0 4 0
INFO Text 0 4 0
ERROR Notice 1 1 1
ERROR Warning 1 2 1
ERROR Warning 1 2 1
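The nominal-to-numeric transformation could be implemented along these lines (the code tables below simply mirror the pattern visible in Table 6; the authors' actual mapping is not given):

```python
import pandas as pd

# Illustrative code tables mirroring Table 6:
# ERROR -> 1, INFO -> 0 for the log tag; Notice -> 1, Warning -> 2, Text -> 4 for the type.
TAG_CODES = {"ERROR": 1, "INFO": 0}
TYPE_CODES = {"Notice": 1, "Warning": 2, "Text": 4}


def encode_log_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Turn the nominal log columns into numeric features for model training."""
    encoded = pd.DataFrame()
    encoded["is_error"] = df["Title"].map(TAG_CODES).fillna(0).astype(int)
    encoded["type_code"] = df["Type"].map(TYPE_CODES).fillna(0).astype(int)
    return encoded
```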

The inclusion of the affected file name and line number in the cleaned
data set mentioned in Table 1 is expected to be beneficial for future
research. With this information, an AI system can locate faulty files and
affected variables mentioned there, and suggest solutions for the identified
issues. In addition, a new approach called Deep Check can be proposed for
testing Deep Neural Networks (DNN) using symbolic execution and
program analysis. Deep Check uses a white-box technique to facilitate
symbolic execution and identify critical elements in the DNN [9].
In order to utilize the data for analysis, it was transformed into a
structured format. To ensure its reliability, the information was verified and
checked for any instances of missing data [1].
Our proposed methodology requires the use of supervised learning, with
logistic regression being the preferred model due to our data’s pattern.
Given the large input data, it is more convenient to create a prediction
model. To ensure the accuracy of the preprocessing, mining patterns, and
analysis, we selected a single day’s data to train and test the logistic
regression model. The system must differentiate between errors and non-
errors in the selected data. We split the data into training and test sets, with
20% used for testing and 80% for training, resulting in a total dataset of
(475680, 2), with (380544, 2) for training and (95136, 2) for testing.
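A minimal sketch of the described 80/20 split and logistic regression training with scikit-learn, assuming X holds the encoded features and y the error/non-error label:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_error_predictor(X, y):
    """80/20 train/test split followed by logistic regression, as described above."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, random_state=42, stratify=y
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out 20%
    return model, accuracy
```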

4.1 Model Training


In order to proceed with the training phase, it is important to select a
suitable data source and identify key features that are critical to the process.
The accuracy of the trained model must then be evaluated to ensure that it
produces reliable results. Subsequently, the ML model can be utilized to
make predictions based on data analysis [1].

Fig. 3 Training process

It is suggested to use a technique such as oversampling or
undersampling to balance the distribution of the imbalanced data during
model training. Oversampling involves increasing the number of instances
of the minority class, while undersampling involves decreasing the number
of instances of the majority class. Both techniques can help to ensure that
important information is not lost during the elimination of examples.
Additionally, the authors wanted to consider using a more sophisticated
sampling technique such as SMOTE (Synthetic Minority Over-sampling
Technique) or ADASYN (Adaptive Synthetic Sampling) to create synthetic
instances of the minority class, which can help to improve the balance of
the dataset. By incorporating these techniques, the authors can improve the
quality of their model and ensure that it produces accurate results [10].
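As a sketch, SMOTE-based oversampling with the imbalanced-learn library could look as follows (applied to the training split only, so that no synthetic samples leak into the test set):

```python
from imblearn.over_sampling import SMOTE


def balance_training_data(X_train, y_train):
    """Generate synthetic minority-class samples so both classes are equally represented."""
    smote = SMOTE(random_state=42)
    X_balanced, y_balanced = smote.fit_resample(X_train, y_train)
    return X_balanced, y_balanced
```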

4.2 Model Evaluation


Precise data is a crucial factor in the execution of data analytics for both
detection and prediction. The authors observed a training-data accuracy of
0.83 after removing all outliers, supporting their hypothesis. The high
accuracy can be attributed to the thorough preprocessing and mining of the
data sets. Eliminating the outliers resulted in a higher accuracy of 87% on
the clean data sets. The Random Forest classifier was used to train and
construct a model to recognize data quality from the dataset [1]. After the
removal of all outliers and the evaluation of Random Forest, the application
achieved an accuracy of 83%.
This high accuracy was possible due to the high quality of the processed
data. Therefore, the conclusion is drawn that to achieve high accuracy,
reliable data is necessary. A distributed algorithm like mining outliers can
be used to make data sets more reliable.
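This evaluation step can be sketched with scikit-learn as follows, reporting accuracy and precision for a Random Forest trained on the cleaned data (the hyperparameters are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score


def evaluate_classifier(X_train, y_train, X_test, y_test):
    """Train a Random Forest on the cleaned data and report accuracy and precision."""
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_train, y_train)
    predictions = clf.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision_score(y_test, predictions, zero_division=0),
    }
```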
The results of the ML analysis showed that errors in the log file can
indicate errors in the system. Therefore, by predicting errors in the log files,
errors in the system can be detected.

5 Proposed Model
The proposed solution suggested by the authors includes building a data-
cleaning function using machine learning algorithms to predict potential
errors in data. To further improve the application’s quality, the authors
suggest creating loosely coupled modules and classes with global variable
declarations and high cohesion. The actual code-writing process should
involve array or variable declarations, type declarations of variables, isset
checks, empty value checks, and type checks before using any variables.
Finally, a Unit test is recommended to optimize unwanted errors.
To incorporate this proposed solution, developers can follow these
guidelines during the development process of an application or feature to
reduce the chances of errors and bugs. By creating loosely coupled modules
and classes with global variable declarations, developers can make sure that
their code is easy to maintain and update. Moreover, by following the
suggested coding practices, such as array or variable declarations, type
declarations of variables, isset checks, empty value checks, and type checks
before using any variables, developers can ensure that their code is error-
free and more reliable. Finally, by conducting a Unit test, developers can
identify and optimize any unwanted errors, further improving the
application’s overall quality.
Overall, incorporating these guidelines into the development process
can optimize the production cost and time, make the application scalable,
reliable, and faster, and ultimately, ensure an error-free log or output (see
Fig. 4).
Fig. 4 Proposed model

6 Conclusions
The purpose of this study was to create a data cleaning function that can
improve data quality by identifying and predicting errors. The research
question focused on whether machine learning techniques can be used to
improve data quality in software applications. The literature review
highlighted the importance of data cleaning in machine learning and the use
of various techniques such as classification models to identify and correct
errors in software applications.
The study analyzed one week’s worth of data and found a total of
4,042,852 errors, which were mainly warning and notice-type errors. The
data cleaning function developed by the authors was able to identify and
correct these errors, leading to improved data quality. The authors also
provided guidelines for developing error-free software applications,
including loosely coupled module creation, global variable declaration, and
unit testing.
Future work could focus on expanding the scope of the study to include
data from multiple sources and different time periods. The authors could
also explore the use of other machine learning techniques, such as
clustering, to identify and correct errors in software applications. Also,
more data structuring recommendations will be included. Integrity
constraints (IC), such as Functional Dependencies (FD), can be used in
conjunction with Machine Learning to classify the type of error to be
captured in the event of a data set with an inaccurate value [1]. To fully
appreciate the data’s potential for bringing significant benefits to a variety
of businesses, it is necessary to learn from it [11]. The “context of source
code processing” will also be considered, where mutations in the context act
as refactorings of the source code, and 1-time and K-time mutations will
play an important role in the concept [12]. In that future study, the authors
will focus on a few research questions and address them one by one,
following best practices.
In conclusion, this study demonstrated the importance of data cleaning
in machine learning and provided a solution for improving data quality in
software applications. By using machine learning techniques and following
the guidelines provided by the authors, software developers can create
error-free and efficient applications that are scalable, reliable, and faster.
Future research in this area could lead to further improvements in data
quality and software development practices.

References
1. Hossen J, Sayeed S (Sep 2018) Modifying cleaning method in big data analytics process using
random forest classifier. In: 2018 7th international conference on computer and communication
engineering (ICCCE). IEEE, pp 208–213
2.
Al-Janabi S, Janicki R (Jul 2016) A density-based data cleaning approach for deduplication with
data consistency and accuracy. In: 2016 SAI computing conference (SAI). IEEE, pp 492–501
3.
Mehra S, Verma R (2019) An approach towards log analysis of large-scale systems. Int J Comput
Appl 181(41):18–22
4.
Li Q, Zhao C, He X (2016) A clustering-based log analysis approach for improving software
reliability. J Syst Softw 118:197–212
5.
Xu X, Li Y, Wang Y, Li X, Li B (2018) Log classification based on multi-view feature learning.
Inf Softw Technol 98:126–139
6.
Zhou M, Zhang Z, Li Y, Zhang H (2019) Log analysis using natural language processing
techniques: a survey. J Syst Softw 151:99–115
7.
Dimov T, Orozova D (Jun 2020) Software for data cleaning and forecasting. In: 2020 21st
international symposium on electrical apparatus & technologies (SIELA). IEEE, pp 1–4
8.
Kumar V, Khosla C (Jan 2018) Data cleaning-a thorough analysis and survey on unstructured
data. In: 2018 8th international conference on cloud computing, data science & engineering
(confluence). IEEE, pp 305–309
9.
Wardat M, Le W, Rajan H (May 2021) DeepLocalize: fault localization for deep neural
networks. In: 2021 IEEE/ACM 43rd international conference on software engineering (ICSE).
IEEE, pp 251–262
10.
Gao K, Khoshgoftaar TM, Napolitano A (Dec 2012) A hybrid approach to coping with high
dimensionality and class imbalance for software defect prediction. In: 2012 11th international
conference on machine learning and applications, vol 2. IEEE, pp 281–288
11.
Haider SN, Zhao Q, Meran BK (Jul 2020) Automated data cleaning for data centers: a case
study. In: 2020 39th Chinese control conference (CCC). IEEE, pp 3227–3232
12.
Vahdat Pour M, Li Z, Ma L, Hemmati H (2021) A search-based testing framework for deep
neural networks of source code embedding. Calgary, Canada. https://doi.org/10.1145/1188913.1188915
13.
Zdravevski E, Lameski P, Kulakov A, Kalajdziski S (Sep 2015) Transformation of nominal
features into numeric in supervised multi-class problems based on the weight of evidence
parameter. In: 2015 federated conference on computer science and information systems
(FedCSIS). IEEE, pp 169–179
14.
Liu H, Ashwin KTK, Thomas JP, Hou X (2016) Cleaning framework for BigData
15.
Hasan MT, Mahal SN, Bakar NMA, Hasan MM, Islam N, Sadia F, Hasan M (2022) An unified
testing process framework for small and medium size tech enterprise with incorporation of
CMMI-SVC to improve maturity of software testing process. In: ENASE, pp 327–334

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed
Computing , Studies in Computational Intelligence 1153
https://doi.org/10.1007/978-3-031-56388-1_2

Study on Locality, Fairness, and Optimal Resource Allocation in Cluster Scheduling
Cherindranath Reddy Vanguru1
(1) CSE (Computer Science Engineering), Bangalore Institute of
Technology, VTU (Visvesvaraya Technological University),
Bengaluru, Karnataka, India

Cherindranath Reddy Vanguru


Email: [email protected]

Abstract
In the field of computing, evaluating system performance is critical. The
need for new scheduling paradigms, particularly in the context of large
cluster computing systems (Lu et al. in IEEE Trans Parallel Distrib Syst
34:1145–1158, 2023), is paramount. The authors will examine the
feasibility of multiple techniques that strive to impart a fair distribution of
resources across systems, with the endgame being to minimize latency.
Moreover, the authors will explore the feasibility of devising solutions with
varying degrees of enhanced data locality, which has consistently been the
backbone of efficient scheduling. Another direction of the study will
investigate the advantages and disadvantages of two contrasting
approaches: the first of these will look into the efficacy of erasing specific
jobs and freeing up resources for the next set of tasks, whereas the
alternative will attempt to finish pending jobs before tackling the remaining
assignments. Finally, the authors conclude that they abided by professional
norms in extracting and analyzing the results of their experiments
(Wennmann et al. in Investig Radiol 58(4):253–264, 2023).
Keywords Cluster scheduling – Latency – Distribution function – Locality
– Fair sharing

1 Introduction
Facebook is one of the most prominent data collectors on the internet [3].
Facebook stored its data in a data warehouse that was over 2 PB in size and
grew by 15 TB each day. Due to the company’s nature, many
researchers and developers execute various data processing tasks
simultaneously. In the example of Facebook, these jobs can be short ad-hoc
queries through Hive [4] or large multi-hour machine learning jobs.
Therefore, it is crucial to implement a scheduling system that provides
fair sharing, i.e. that all resources are distributed fairly, thus decreasing
latency. With this goal in mind, M. Zaharia et al. developed the Hadoop
Fair Scheduler (HFS) [5]. The authors also found a way of increasing data
locality which improved the system’s overall throughput.
This analysis will have a detailed look into the experiments that the
authors ran. We will also reproduce one of the experiments.

1.1 Delay Scheduling


In their paper [6], Zaharia et al. explain their process of development of
“Delay Scheduling”—the theoretical background of HFS—and what
experiments were run to test its performance.
The authors observed that, instead of killing jobs to free up resources,
simply waiting for jobs to finish can be sufficient to have enough available
resources in a large cluster. Many jobs
finish every second, always freeing up some space that was not available
when the job that needs to be scheduled was enqueued. In combination with
this, Zaharia et al. found that it can be faster to wait for available resources
on a node that can supply data locality, rather than a node with available
computing resources but where data has to be transferred.
An important point that is stressed is Hierarchical Scheduling.
Hierarchical scheduling ensures that urgent and production jobs have a
predictable running time. These jobs should therefore be treated with a
higher priority by the scheduler than, for instance, an experimental machine
learning jobs.
2 Experiments
In this section, we will take a closer look at what kind of experiments were
conducted in the scope of [6]. We will look at the goal of each experiment,
what kind of infrastructure was utilised and at what scale. Furthermore, we
will look at how many times the experiments were repeated and what kind
of statistical methods were applied.
First, we will look at an experiment conducted to determine the relation
of a variable and not to evaluate the paper’s results. Then our focus goes to
the evaluation section of the paper, where multiple experiments have been
conducted.

2.1 Analysis of the Influence of the Maximum Skipcount D


The authors presented a straightforward algorithm called Delay Scheduling
that effectively enforces fair sharing: if a job has unlaunched tasks but no
data on the local node, it is skipped until its skip count reaches D; at that
point it is launched on that node anyway. With this condition, the authors
want to improve the locality of jobs, since a job’s execution may be delayed
up to D times unless data for the job is present on that node.
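A schematic sketch of that decision rule (not the actual HFS code; the job attributes and methods below are assumed for illustration):

```python
def assign_task(node, sorted_jobs, max_skipcount_d):
    """Pick a task for a free slot on `node`, preferring data-local tasks.

    A job without local data on the node is skipped, but once it has been
    skipped max_skipcount_d times it is launched on the node anyway.
    """
    for job in sorted_jobs:                 # jobs ordered by fair-share priority
        if not job.unlaunched_tasks:
            continue
        if job.has_local_data(node):        # local task available: launch it
            job.skipcount = 0
            return job.launch_local_task(node)
        if job.skipcount >= max_skipcount_d:
            job.skipcount = 0               # waited long enough: give up locality
            return job.launch_nonlocal_task(node)
        job.skipcount += 1                  # skip and hope for a local slot later
    return None                             # no runnable task for this slot
```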
In their initial experiment, the authors manipulated the variable D to
find the best balance between having a job wait and achieving locality and
executing the job even if there is no local data available. Based on their
findings, the authors inferred that non-locality reduces at an exponential
rate with increasing values of D. The authors also discovered that the
waiting time necessary to attain a specific level of locality is proportional to
the average duration of a task and decreases in a linear fashion as the
number of slots per node—denoted as L—increases. This experiment
implies that waiting for resources to clear up is often more efficient than
seeking resources on other nodes, confirming their theory.
The authors did not elaborate on the infrastructure used or how many
times they executed the experiment. We find it likely that they used a
simulation for this setup using the traces and logs of their production
system. Fortunately, Zaharia et al. disclosed more information about the
used infrastructure in future experiments, giving us reason to believe that
they used the same infrastructure for this experiment.
2.2 Evaluation of HFS and Delay Scheduling
To evaluate the results of their research, the authors ran all the following
experiments on Amazon EC2 with 100 nodes. Each node had four 2 GHz
cores, four disks and 15 GB RAM. The EC2 nodes had four map and two
reduce slots per node.
All job sizes were quantised into nine bins, ranging from jobs with a single
map task in bin 1 to jobs with over 1500 map tasks in bin 9. The authors
matched the proportion of jobs drawn from each bin to the actual proportion
observed at Facebook. For example, 14% of all the jobs at Facebook in a
week in October 2009 came from bin 3, with three to twenty maps, so 14%
of all jobs in the benchmark are also from bin 3.
The authors separated the experiments into two categories, Macro- and
Microbenchmarks. While macrobenchmarks focus more on regular user
activity, microbenchmarks aim to test specific stress points of delay
scheduling or scheduling in general.

2.3 Macrobenchmarks
In October 2009, the authors measured the caseload distribution on
Facebook. The authors used this information to sample 100 jobs to simulate
realistic inter-arrival times and input sizes for a multi-user workload on a
regular day. This distribution can be seen in Table 1. Three different
workloads have been created to test different scenarios:
Table 1 How job sizes are distributed along map tasks at Facebook

Bin Maps Jobs at Facebook (%)


1 1 39
2 2 16
3 3–20 14
4 21–60 9
5 61–150 6
6 150–300 6
7 301–500 4
8 501–1500 4
9 1501 3
An intensive Input–Output caseload
An intensive Central Processing Unit caseload
A combined caseload which includes all jobs of the benchmark
The following sections will give an overview of how they created the
data to ensure that the jobs were IO and CPU heavy, respectively.
(1) IO-heavy workload: As an IO-heavy workload, the authors ran a task
looking for a rare pattern in a large dataset, so the jobs are almost entirely
bound to disk IO (Figs. 1 and 2).

Fig. 1 The cumulative distribution functions of job running times in different bin ranges indicate
that in the IO-heavy workload, fair sharing enhances performance for smaller jobs but may cause
slower processing of larger jobs. However, the introduction of delay scheduling can further enhance
performance, particularly for medium-sized jobs. From the original paper [6]

Fig. 2 Our results for the experiment shown in Fig. 1. The bin numbering is from the original
experiment’s distribution

Zaharia et al. provided no information on whether they repeated the
experiments multiple times or not, but the error bars in Fig. 3 suggest
that multiple runs were conducted.

Fig. 3 The mean enhancement in speed obtained by implementing delay scheduling rather than the
simplistic fair sharing approach for jobs in each bin of the IO-intensive workload is illustrated, while
the black lines represent the respective standard deviations. From the original paper [6]
As mentioned before, a statistical measure used in this experiment was the
standard deviation.
(2) CPU-heavy workload: In order to create a task that is predominantly
CPU-bound, the authors passed each input record through a costly user-defined
function, resulting in significantly slower job execution. Only a small
fraction of the records, specifically 0.01%, was produced as output.
With this experiment the authors want to ensure that even under a CPU-
heavy load, such as Machine Learning or an in-depth analysis of the data
like clustering, delay scheduling manages to outperform the FIFO
scheduler, which was in place at that time.
(3) Mixed workload: The mixed workload experiment aims at
presenting a realistic high workload for the scheduling system. The jobs that
are submitted during this experiment contain both CPU-heavy and IO-
heavy workloads. Furthermore, the job pool also contains a variety of short
and long jobs.

2.4 Microbenchmarks
Microbenchmarks try to stress test Delay scheduling in more specific cases,
where locality is hard to achieve. These experiments try to test the quality
of the introduced scheduling method in a more controlled manner.
(1) Hierarchical scheduling: In their paper, the authors introduced a
hierarchical scheduling policy, which prioritises production jobs that must
run for customers, such as queries. These jobs require a higher
priority than experimental machine learning tasks. This experiment aimed at
evaluating this scheduling policy, the Hadoop Fair Scheduler (HFS). It
attempts to assess the speed at which new tasks receive resources depending
on their level of importance.
(2) Delay scheduling with small jobs: Small jobs can pose a challenge to
scheduling systems due to the high amount of throughput they require. In
this experiment, the authors show how small jobs that are handled by a
system utilising delay scheduling compare to those that do not use delay
scheduling. They created three different filter jobs, one with three, one with
ten and one with 100 map tasks.
This experiment was not run on an Amazon EC2, but rather on a private
cluster. The private cluster also had 100 nodes, but 8 cores and 4 disks per
node. Contrary to the EC2 cluster it had 6 instead of 4 map slots and 4
instead of 2 reduce slots per node.
We are unsure why the authors picked a private cluster to run this
experiment. The private cluster has been defined before this experiment but
it was never mentioned as the utilised resource.
(3) Delay Scheduling with Sticky Slots: Earlier in the paper, the authors
introduced so called Sticky Slots. These Sticky Slots occur when a task gets
repeatedly submitted to the same slot. This happens when a job never leaves
its original slot. Sticky slots can have a negative impact on locality of a job.
In their experiment, Zaharia et al. reproduced this locality problem by
creating a 180-GB dataset, which was spread in 2 GB chunks over all 100
nodes in the EC2 Cluster. Then 5 and 50 jobs were submitted which caused
the sticky slot phenomenon to occur.

3 Reproduction of Two Macrobenchmarks


Our plan is to replicate two macrobenchmarks that involve an IO-intensive
workload on the Distributed ASCI Supercomputer 5 (DAS-5), which is a
distributed system spanning six clusters and was specifically designed by
the Advanced School for Computing and Imaging.
This section will go into more detail for the two experiments we chose
to reproduce. Furthermore, we will review our reproduction steps,
challenges encountered, and the results of the experiments.

3.1 IO-Heavy Workload—Comparing Running Times


We chose to reproduce the experiments where the authors compared the
running time’s cumulative distribution function (CDF) of a FIFO scheduler,
a scheduler utilising naïve fair scheduling and a scheduler using both fair
and delay scheduling. The results the authors achieved in their paper can be
seen in Fig. 1. It is apparent that the scheduler using FIFO fares
significantly worse in the lower 8 bins than both the scheduler utilising
naïve fair scheduling and the scheduler using both fair and delay
scheduling. However, the very long running tasks (in bin 9) are hindered by
the presence of fairness.
We decided to focus on this experiment in particular since it did not
show there to be a significant difference between the naïve fair scheduler
and the one using delay scheduling. We wanted to double-check their
results to see if the difference is this slim and possibly due to different
circumstances since Zaharia et al. presumably did not repeat it multiple
times.

3.2 Obstacles
One of the first challenges was to figure out the exact experiments the
original authors ran. They refer to the benchmarks described by Zheng Shao
[7]. However, the purpose of this suite is to compare three different data
processing systems, including Hadoop and Hive. Since Hive is built on top
of Hadoop and many of the authors are from Facebook solving a problem
affecting their company,1 it is safe to assume that they used the Hive
benchmarks for evaluation purposes.
Building these benchmarks as Java programs was also the cause of great
frustration. This was probably due to the sparse documentation and the
12 years that have passed since the publication of the original code.
However, after figuring out the exact library versions to use, environment
variables to set, and the Perl programming language, we have grown to like
the benchmark program.

3.3 Our Setup


We created a Python script for allocating a given number of nodes, and
downloading and installing Hadoop and Hive on them. A master node is
selected on which Derby is also installed and the Hadoop NameNode,
YARN, history server, Hive, and Derby are configured. On all the nodes,
HDFS [8] is also configured. The script continuously maintains an SSH
connection with the master node, on the failure of this connection, or an
interrupt, it cleans up all the nodes used: the processes are stopped and the
data is destroyed.
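A simplified sketch of the allocate-install-cleanup pattern that script follows (the install, start-up, and cleanup commands are placeholders for the DAS-5-specific ones; only the overall structure is intended to be representative):

```python
import subprocess
import time


def ssh(node: str, command: str) -> None:
    """Run a shell command on a cluster node over SSH (raises on failure)."""
    subprocess.run(["ssh", node, command], check=True)


def run_cluster(nodes, master, install_cmd, start_cmd, cleanup_cmd):
    """Install Hadoop/Hive on every node, watch the master over SSH,
    and always clean up when the connection fails or the user interrupts."""
    try:
        for node in nodes:
            ssh(node, install_cmd)       # download and install Hadoop/Hive (placeholder)
        ssh(master, start_cmd)           # configure and start NameNode, YARN, Hive, Derby
        while True:                      # keep the SSH connection alive
            ssh(master, "true")          # cheap liveness check on the master node
            time.sleep(30)
    except (subprocess.CalledProcessError, KeyboardInterrupt):
        pass                             # connection lost or interrupted: fall through
    finally:
        for node in nodes:
            ssh(node, cleanup_cmd)       # stop the processes and destroy the data
```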
Another script is responsible for running the Hive version of the grep
benchmark from [7] many times. The length of the datasets is each time
determined by a distribution similar to the original experiment’s input
distribution. Nevertheless, we had far fewer resources at our disposal; hence,
we took the liberty of scaling down the sizes, which can be seen in Table 2.
Although we could have used more, we chose to only utilise 12 nodes for
the experiments. Since we were not the only group working on this
assignment, we refrained from hogging all the available nodes for extended
periods of time.
Table 2 Distribution of job sizes

Bin Maps Jobs


0 1 19
1 2 8
2 10 7
3 50 4
4 100 3
5 200 2
6 400 2
7 1200 2
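The per-bin counts in Table 2 can be expanded into a concrete, randomly ordered job list for the benchmark driver with a sketch like this (the map and job counts come directly from Table 2; the shuffling step is illustrative):

```python
import random

# (maps per job, number of jobs) for each bin, taken from Table 2.
JOB_BINS = [(1, 19), (2, 8), (10, 7), (50, 4), (100, 3), (200, 2), (400, 2), (1200, 2)]


def build_job_list(seed: int = 0):
    """Expand the bin distribution into an individual, randomly ordered job list."""
    jobs = [maps for maps, count in JOB_BINS for _ in range(count)]
    random.Random(seed).shuffle(jobs)
    return jobs
```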

The aforementioned script saves the statistics of each map task into a
separate output file. These are then converted to JSON documents that
only contain the relevant information and can be subsequently plotted using
the rest of our Python scripts.
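A sketch of how the per-task statistics can be turned into the empirical CDFs plotted later (the JSON field name is illustrative):

```python
import json

import matplotlib.pyplot as plt
import numpy as np


def plot_running_time_cdf(json_paths, label):
    """Plot the empirical CDF of the job running times stored in the JSON logs."""
    times = []
    for path in json_paths:
        with open(path) as fh:
            record = json.load(fh)
        times.append(record["running_time_seconds"])   # illustrative field name
    times = np.sort(np.asarray(times, dtype=float))
    cdf = np.arange(1, len(times) + 1) / len(times)
    plt.step(times, cdf, where="post", label=label)
    plt.xlabel("Job running time (s)")
    plt.ylabel("CDF")
    plt.legend()
```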

3.4 Our Results


In Fig. 2, we can see a surprisingly close resemblance to Fig. 1. We
conclude that we and Zaharia et al. [6] must have done very similar steps.2
There are some noticeable differences though. We ran the experiment three
times and used all tasks running times for the CDF, making our charts look
smoother. Secondly, when it comes to the medium sized jobs, delay
scheduling outperformed its naive counterpart3 more noticeably.

3.5 IO-Heavy Workload—Comparing Speedup


Using the logs of the aforementioned run, we did some further analysis. We
were curious about the speedup gained from opting for a different
scheduler. The authors’ original analysis can be seen in Figs. 3 and 4. The
results of our experiment are shown in Figs. 5 and 6. The similarity between
Figs. 4 and 5 is convincing: the values are between around 1 and 1.5, and
both shapes have 2 peaks (at bin 5 and 8, and at bin 2 and 5).

Fig. 4 The mean acceleration achieved by using delay scheduling compared to the basic fair sharing
method for jobs in each bin of the IO-intensive workload is depicted, with the black lines indicating
the corresponding standard deviations. From the original paper [6]
Fig. 5 Average speedup of delay scheduling over naïve fair sharing for jobs in each bin in the IO-
heavy workload

Fig. 6 Average speedup of delay scheduling over FIFO scheduling for jobs in each bin in the IO-
heavy workload
However if we take a closer look at Fig. 3, the standard deviations are
suspiciously large. In our evaluation, the outlier deviations are gone which
might be the result of the multiple repetitions we had.
Overall, we gained more from abandoning the FIFO scheduler than
Zaharia et al. did. We theorise that this might be due to our smaller cluster,
where scheduling opportunities are rarer compared with their
100-node cluster.

4 Conclusion
Even though the authors did not disclose the number of repetitions of their
experiments, we found that this paper is of very high quality. We also have
to note that the authors provided no information about the time of day of the
experiments or any other external factors that could have affected them,
especially when running on AWS hardware.
Zaharia et al. described the experiments with a high level of detail, and
they also explained their outcomes and consequences formidably well. The
graphs created from the experiments understandably showed the results,
and the authors made good use of all charts and tables pointing out essential
features. We reproduced a macrobenchmark with an IO-heavy workload.
The naive fair scheduler was not significantly different from the fair
scheduler with delay scheduling in their experiment.
In our repeated experiments, we came to almost the same conclusions as
the authors did when conducting the experiments giving us reason to
believe that the results in the paper are reproducible.
Overall, we conclude that the authors did an excellent job presenting
delay scheduling and demonstrating it through a range of realistic and
intentionally demanding experiments. These evaluations showed
how their introduced scheduling method fares in real-world applications
compared with the system in place at that time. The conducted experiments,
their consequences, and interpretations were explained and presented in an
in-depth but understandable manner.

References
1. Lu R, Zhang W, Wang Y, Li Q, Zhong X, Yang H, Wang D (2023) Auction-based cluster federated learning in mobile edge computing systems. IEEE Trans Parallel Distrib Syst 34(4):1145–1158
2. Wennmann M, Bauer F, Klein A, Chmelik J, Grözinger M, Rotkopf LT, Neher P, Gnirs R, Kurz FT, Nonnenmacher T et al (2023) In vivo repeatability and multiscanner reproducibility of MRI radiomics features in patients with monoclonal plasma cell disorders: a prospective bi-institutional study. Investig Radiol 58(4):253–264
3. Dwyer T (2016) Convergent media and privacy, ser. Palgrave global media policy and business. Palgrave Macmillan
4. Thusoo A, Sarma JS, Jain N, Shao Z, Chakka P, Anthony S, Liu H, Wyckoff P, Murthy R (2009) Hive: a warehousing solution over a map-reduce framework. Proc VLDB Endow 2(2):1626–1629
5. Sharma S, Bharti RK (2023) New efficient Hadoop scheduler: generalized particle swarm optimization and simulated annealing-dominant resource fairness. Concurr Comput Pract Exp 35(4):e7528
6. Zaharia M, Borthakur D, Sen Sarma J, Elmeleegy K, Shenker S, Stoica I (2010) Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In: Proceedings of the 5th European conference on computer systems, ser. EuroSys'10. Association for Computing Machinery, New York, NY, USA, pp 265–278. [Online]. https://doi.org/10.1145/1755913.1755940
7. Shao Z (2009) Hive performance benchmarks. [Online]. https://issues.apache.org/jira/browse/HIVE-396
8. Borthakur D (2007) The Hadoop distributed file system: architecture and design. Hadoop Project Website 11(2007):21
Footnotes
1 The company responsible for Hive and also utilising it to a great extent.

2 With the caveat that our cluster setup, similarly to the original one, is not representative of real-life
networking conditions: each node is located in the same rack.

3 We used the Hadoop Fair Scheduler with its thresholds set to zero to implement this.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, Studies in Computational Intelligence 1153
https://doi.org/10.1007/978-3-031-56388-1_3

Digital Word-of-Mouth and Purchase Intention. An Empirical Study in Millennial Female Consumers
Melissa del Pilar Usurin-Flores1 , Miguel Humberto Panez-Bendezú2 and
Jorge Alberto Vargas-Merino2
(1) Faculty of Business, Universidad Privada del Norte, Lima, Peru
(2) Departments of Research, Innovation and Social Responsibility, Universidad Privada del Norte, Lima, Peru

Melissa del Pilar Usurin-Flores


Email: [email protected]

Miguel Humberto Panez-Bendezú


Email: [email protected]

Jorge Alberto Vargas-Merino (Corresponding author)


Email: [email protected]

Abstract
eWOM serves as a source of information for current and potential customers, as it plays a
significant role in purchase intention. The objective of this study was to identify the impact
of digital word-of-mouth on the purchase intention of Peruvian millennial female
consumers. A probabilistic sample of 355 women between 26 and 41 years old was
determined, and two valid and reliable questionnaires were applied. It was concluded that
digital word-of-mouth indeed has a significant impact on the purchase intention of
millennial female consumers, as demonstrated by χ2 tests and ordinal logistic regression
(χ2 test, p = 0.000 < 0.05; likelihood-ratio χ2 = 164.019, p = 0.000 < 0.05; Nagelkerke
R2 = 0.425; Wald coefficient = 52.203, p = 0.000 < 0.05), confirming that purchase intention
is explained by the actions of digital word-of-mouth. This work highlights the importance of
managing the credibility and value of our electronic media and, within action plans, of
adopting measures to improve the quality and quantity of comments as antecedents of
purchase and repurchase intention.

Keywords Digital word-of-mouth – Consumers – Millennials – Purchase intention – Regression
1 Introduction
Currently, owing to the global advancement of technologies that allow information to be
accessed, created, processed, and communicated, both among consumers and between
companies and consumers, digital word-of-mouth (eWOM) has emerged, enabling consumers
to exchange opinions and experiences related to the product or service under purchase
evaluation [1].
Likewise, eWOM serves as a source of information for current and potential customers, as it
plays a significant and far-reaching role in purchase intention, especially among millennial
consumers, the users belonging to generation Y, who were born at a time of great
technological innovation. There is, however, no agreement on the age range of this
generation: for INCAE [2], it comprises those born between 1981 and 1996, while for
IPSOS [3] it comprises those born between 1981 and 1995; for DATUM Internacional [4],
those born between 1980 and 2000 belong to this generation; and the Compañía Peruana de
Estudios de Mercados y Opinión Pública (CPI) [5], in turn, considers that this generation is
made up of those born between 1980 and 1995.
Despite belonging to a digital and hyperconnected generation, millennials distrust or
hesitate to make online purchases. Therefore, before making an online purchase, they review
information about brands, relying heavily on online recommendations found in apps and on
websites, as well as on people they know [6].
The digital era is becoming increasingly pervasive in the daily lives of consumers, and
companies on digital platforms are looking for effective and efficient marketing strategies
that can influence the purchase intention of their customers. Their main target is the
segment belonging to the millennial generation, who like the visual content that brands post,
although their own form of communication is mainly through text. Millennials also spend
more time on digital platforms than other segments and give great importance to the time
factor, so online purchasing is their best option. As they have purchasing power, they tend to
spend more money on virtual platforms than other generations.
Despite this, this group of consumers in Metropolitan Lima does not participate heavily
in the purchase of online products compared to other generations. Many millennials feel
some fear at the time of an online purchase, due to the possibility of buying a product that
differs from the one seen on the digital platform or of being victims of scams; they have
therefore developed a strong capacity to search for online information that allows them to
gain greater confidence during the purchase process.
Therefore, Gen Y consumers, who actively engage with digital platforms in decision-
making, are more likely to share their opinions and participate in eWOM, either positive or
negative. So, if they experience good service at a restaurant, hotel, or retail store, they
generally engage in positive eWOM within their online communities through social
networks, mobile technologies, or other communication methods. On the other hand, if they
experience poor service, they will spread negative eWOM to their friends or family to
express their complaints or emotional reactions, such as anger, unhappiness, and
dissatisfaction.
Given the above, this paper seeks to answer the following question: How does digital
word-of-mouth affect the purchase intention of millennial consumers in Metropolitan Lima
in 2021? Accordingly, the objective of this paper is to identify the incidence of digital
word-of-mouth on the purchase intention of millennial consumers in Metropolitan Lima in
2021. The general hypothesis is that digital word-of-mouth has a significant impact on the
purchase intention of millennial consumers. This is divided into four specific hypotheses
concerning the credibility, quality, quantity, and value of eWOM and their significant impact
on the purchase intention of millennial consumers.
This research work is supported by the analysis of results and the review of different
theories to identify whether eWOM impacts the purchase intention of millennial consumers,
thereby contributing positively and significantly to scientific knowledge. From a
methodological perspective, research instruments that can be replicated and validated in
different contexts are used, allowing new research to build on them. The results and
findings obtained, as well as the recommendations raised, will support the community and
companies on digital platforms by providing suggestions and viable solution alternatives,
which constitutes the practical contribution of this research.

Background

In research conducted in Peru to determine whether there is a relationship between
eWOM on social networks and the intention to purchase urban women’s fashion, it was
concluded that such a relationship exists. This relationship is driven by the value of the
information contained in comments, whether from family and/or close friends. Likewise, it
was found that users take into account the product information shared by their
acquaintances on social networks to encourage purchase intention [7].
In a study conducted in Peru to determine the relationship between eWOM and purchase
intention in delivery applications in Metropolitan Lima, the variables Quality, Credibility,
Need for information, and Attitude toward the brand were evaluated. The results of this
research show that all the variables have a positive influence on purchase intention, except
for the credibility of the information. In this sense, the most relevant independent variables
are the Attitude toward the brand and the Quality of the information [8].
Social networking sites (SNSs) facilitate eWOM communication between consumers
from different cultures. Based on contact theory and the theory of planned behaviour, a
conceptual framework is proposed that integrates cross-cultural factors as predictors of
minority consumers’ engagement with eWOM communicated by and to individuals from the
dominant culture on social networking sites. A partial least squares (PLS) analysis on data
collected from the Arab–Israeli minority shows that cross-cultural factors (i.e. acculturation,
social interaction and language proficiency) are antecedents of minority consumers’
engagement with eWOM. However, this relationship is mediated by consumers’ beliefs
(attitudes, subjective norms and perceived behavioral control) regarding this behavior and
moderated by the cultural distance between minority and dominant culture consumers [9].
Review platforms are full of negativity, both from real customers with bad experiences
and fake reviews created by competitors. These negative reviews have been shown to
influence the purchasing behaviour of future consumers. Crafting a response to a poor
review that appeals to future consumers can mitigate some of the negative outcomes
associated with that review. A simulation-based experiment was used to test the influence of
three elements of a review response on purchase intent (i.e., an apology, an explanation, and
a commitment to correct the problem identified in the review); the data show that purchase
intent increases only when a response contains all three elements [10].
The monitoring and management of eWOM is one of the main challenges facing
organizations today, due to the sheer volume and frequency of content. In this paper,
sentiment analysis is proposed as an alternative method for analyzing emotions and
behavioral intentions in real-time data. Sentiment analysis is performed on reviews of
women’s electronic clothing. This study applied artificial neural network techniques to
determine the polarity of the data in terms of positive or negative. Sentiment analysis was
performed using two artificial neural networks, Convolutional Neural Network (CNN) and
Long Short-Term Memory (LSTM). Based on the results of this study, the LSTM technique
is highly recommended for sentiment analysis of unstructured text-based user-generated
content [11].
In the research conducted in Vietnam to explore, measure, and analyze the relationship
between online convenience, online customer satisfaction, purchase intention, and electronic
word-of-mouth in customers in Ho Chi Minh City, it was concluded that customer
satisfaction influences online purchase intention and eWOM. Based on this, the solutions
needed to improve purchase intention and incentivize customers to perform eWOM are
recommended [12].
In the research conducted in Pakistan to determine the relationship between electronic
word-of-mouth and consumer purchase intention of Pakistani consumers in Lahore city, the
results showed that all hypotheses are supported since there is a great impact of eWOM on
consumer purchase intention with a mediating effect of brand image. In this context, social
networks play an important role in consumer purchase intentions as they persuade users to
visit the stores and, subsequently, to make the purchase [13].
In research conducted in Portugal to evaluate the effect of eWOM on online purchase
intention, a research model was proposed in which the relationship between these variables
can be both direct and indirect, through service quality and online trust. The results of the
empirical study corroborate all the proposed hypotheses. It was concluded that eWOM is a
powerful catalyst for online purchase intention [14].
In a research study conducted in Serbia to find out the role of online customer reviews
and key elements of the review page as well as property features on customers’ decision to
book a hotel online, the results show a significant impact of online review factors, such as
filters, quality, quantity, topicality, valence, and property features on customers’ booking
decision. The study also confirms the moderating effects of Gender and Trip Purpose on
some of the relationships proposed in the conceptual model [15].
In a research study conducted in Romania to build a model that assesses the influence of
affective commitment, high sacrifice commitment, and satisfaction on customers’ word-of-
mouth regarding an online retailer, the results show that satisfaction and high sacrifice
commitment have a significant impact on both the volume and valence of word-of-mouth,
while affective commitment only influences the valence of word-of-mouth [16].
A. Theoretical Framework
Digital Word-of-mouth (eWOM)

Regarding the eWOM variable, Ismagilova et al. [17] and Kasabov [18] state that it
comprises any statement made on digital platforms in which potential, current, or former
consumers share positive, neutral, or negative experiences about a product, service, brand,
or company.
Cuervo et al. [19], on the other hand, define eWOM as a form of digital communication
that allows understanding consumer behavior and the main motivations that lead them to
seek information online, which, in turn, affects their purchase decision or use of goods and
services. Meanwhile, Babić et al. [20] indicate that the variable in question is the action in
which consumers provide information about goods, services, or brands to other consumers
through virtual environments, and that this represents one of the most revealing and
explanatory developments in contemporary consumer behavior.

Credibility

Vallejo et al. [21] mention that the credibility of information is formed by the set of beliefs
of the consumer, based on the comments and opinions read in digital media, so its role is
decisive in the degree of influence that recommendations from other consumers can have on
the information receiver.
In addition, Abd-Elaziz et al. [22] explain that it is necessary to differentiate the
credibility of the information included in online recommendations from the credibility of the
online seller, since the former, referring to the credibility of eWOM, is based on information
obtained from customer experiences, with a non-commercial purpose. Likewise, Filieri [23]
argues that the credibility of a review is defined as the trust that readers have in what they
are reading. In this sense, consumers do not take into account opinions that are perceived as
unreliable.

Quality

Erkan and Evans [24] define the quality of eWOM as the persuasive strength of the
arguments expressing the information contained in a message or comment issued on digital
platforms. In this sense, Tsao and Hsieh [25] explain that it should be accurate, objective,
and complete, as well as reliable and useful for consumers.
In this framework, Abrego et al. [26] consider that the quality of electronic word of
mouth (eWOM) is a multidimensional construct composed of 4 factors: relevance, accuracy,
comprehensibility, and timeliness.

Quantity

Regarding the amount of eWOM, Alabdullatif and Akram [15] define it as the number of
comments about a product or service, and the greater the number of comments and/or
reviews, the more useful the information a user obtains to recommend or rate a service
becomes. Similarly, Ismagilova et al. [27] report that the quantity of eWOM messages makes
the information more visible to users interested in it, demonstrating the popularity of the
product or service.
Value

Tata et al. [28] explain that, regarding this dimension, distributors or sellers provide filters to
facilitate the search for reviews, allowing customers to filter between positive, negative, or
recent reviews. Therefore, while a set of positive opinions can improve the purchase
decision, a set of negative opinions may provoke a rejection response or another type of
response depending on the overall sentiment of the set of opinions.
In this sense, Purnawirawan et al. [29] mention that if a set of reviews is predominantly
positive or negative, it has a greater influence on perceived usefulness, and when reviews
refer to products, they have a greater influence on attitudes towards unknown brands.

Purchase Intention

Jang and Shin [30] define it as the willingness of customers to engage in some type of
online transaction to satisfy their purchasing needs; both utilitarian and hedonic values
can influence consumer purchase intention. Likewise, Lee et al. [31] and Nuseir [32] assert
that this dimension determines the consumer’s willingness to buy and is expressed when
the consumer weighs various conditions and grounds before ultimately making a purchase.
In addition, Dehghani and Tumer [33] explain that this variable is a possibility that lies in
the hands of customers, since it depends greatly on the value of the product and the
recommendations shared by consumers.

Perceived Usefulness

Zappala [34] states that this dimension refers to the degree to which consumers believe that
using the Internet will improve their performance or productivity, thus enhancing the
outcome of their shopping experience. Similarly, Taherdoost [35] defines it as an individual’s
perception of how the use of new technology will increase or improve their performance,
meaning it refers to consumers’ perceptions of the outcome of their experience.
In addition, Cho and Sagynov [36] mention that there is a similarity between the
concepts of perceived usefulness and relative advantage in technology, as electronic
commerce constitutes an innovation within distribution channels. Therefore, the utility
provided to the consumer will be closely linked to its advantages as a sales system.

Confidence in the Seller

Tavera and Londoño [37] argue that in e-commerce there is a certain level of uncertainty in
the purchase process, as consumers do not have control over the actions of the online seller
or information about the intentions of the other party in the transaction, which causes people
to show hesitation towards developing virtual purchasing behaviors. Therefore, trust in the
seller is important for making a purchase.

Subjective Standard

Subjective standards are determined by the perceived pressure from others to behave in
certain ways and by the individual’s motivation to comply with the views of these people [38].
Alalwan [39] indicates that the subjective norm represents the degree to which a user
perceives the expectations of others regarding a certain behavior.

2 Materials and Methods


This research used the hypothetical-deductive method under a quantitative approach and is
understood as a basic research study.
The level of the research is explanatory/correlational/causal. Likewise, it is necessary to
emphasize that this research has a non-experimental design. In this study, the population is
finite and comprises millennial (women) consumers. According to edition 003 of the 2021
population statistics published by the Compañía Peruana de Estudios de Mercados y
Opinión Pública [5] on the generational population of Metropolitan Lima, 22.9% of the
inhabitants belong to the millennial generation. Accordingly, the population of millennial
consumers in Metropolitan Lima is 2,885.3 thousand people, of whom 1,438.9 thousand are
millennial men. Therefore, the population of millennial women consumers in Metropolitan
Lima is 1,446.4 thousand.
The sample consisted of 355 millennial female consumers from the districts that
constitute Metropolitan Lima and who have also made online purchases during the last
12 months. Simple random sampling was used, and the survey technique was subsequently
applied, collecting the information through two original questionnaires applied to the
research sample, which were valid and reliable.
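For reference only, a rough back-calculation of the sample size (the paper does not state the margin of error or the exact formula used, so the parameters below, z = 1.96, p = 0.5, and the margin e, are assumptions): with the usual Cochran formula for proportions and a finite-population correction for N = 1,446,400,

$$
n_0=\frac{z^{2}\,p\,(1-p)}{e^{2}},\qquad n=\frac{n_0}{1+\frac{n_0-1}{N}},
$$

a sample of about 355 corresponds to a margin of error of roughly 5.2% at a 95% confidence level, since $n_0 = 1.96^2 \times 0.25 / 0.052^2 \approx 355$ and the finite-population correction is negligible at this population size.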
A. Background

The hypothesis tests were performed using ordinal logistic regression analysis. This made it
possible to explain the incidence of the independent variable, digital word-of-mouth
(eWOM), on the dependent variable, purchase intention, since both variables are ordinal.
The model contrast was performed at a significance level of 5% (α = 0.05) and a confidence
level of 95%, deciding on the basis of the statistical significance value (Sig.) obtained in
each test and analyzing the following statistics.
Goodness-of-fit measures: Likelihood ratio Chi-square, to examine whether the predictor
variable included in the model has a significant relationship with the response variable. In
this sense, if the p-value obtained is lower than the significance level (p < 0.05), the
assumption that the variables are significantly associated is accepted. On the other hand, if
the p-value obtained is higher than the significance level (p > 0.05), the assumption of
significant association is rejected.
Pseudo-R2: This indicator explains the level of influence of the independent variable X on
the dependent variable Y and is useful for assessing the fit of the model to the data. The
estimation provides the Cox and Snell, Nagelkerke, and McFadden coefficients, but there is
no agreement on which pseudo-R2 statistic is better.
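For reference (these standard definitions are not given in the original text), the three pseudo-R2 statistics are conventionally defined as follows, writing $L_0$ for the likelihood of the thresholds-only model, $L_1$ for the likelihood of the fitted model, and $n$ for the sample size:

$$
R^{2}_{\text{CS}} = 1-\left(\frac{L_0}{L_1}\right)^{2/n},\qquad
R^{2}_{\text{N}} = \frac{R^{2}_{\text{CS}}}{1-L_0^{\,2/n}},\qquad
R^{2}_{\text{McF}} = 1-\frac{\ln L_1}{\ln L_0}.
$$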
Measures of association and predictive efficiency: Wald test, to estimate the significance
of the probability that the coefficient is different from zero when projecting the dependent
variable in the regression model. In this sense, if the p-value obtained is lower than the
contrast level (p < 0.05), the assumption that the coefficient is equal to zero is rejected,
understanding that the independent variable contributes to the prediction of the dependent
variable. On the contrary, if the p-value obtained in the test is higher than the contrast level
(p > 0.05), the assumption that the coefficient is equal to zero is accepted, understanding that
the independent variable does not contribute to the prediction of the dependent variable.
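As a minimal, hypothetical sketch of this testing workflow (this is not the authors' code or data; the synthetic data, the variable names, and the use of statsmodels' OrderedModel, available in statsmodels 0.13+, are assumptions made purely for illustration), the likelihood-ratio chi-square, the pseudo-R2 measures, and the Wald statistic for a single ordinal predictor could be computed roughly as follows:

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Hypothetical data: ordinal eWOM level and purchase-intention level (1 = low, 3 = high)
rng = np.random.default_rng(42)
ewom = rng.integers(1, 4, size=355)
intention = np.clip(ewom + rng.integers(-1, 2, size=355), 1, 3)
data = pd.DataFrame({"ewom": ewom})
y = pd.Series(pd.Categorical(intention, categories=[1, 2, 3], ordered=True))

# Ordinal logistic regression (logit link): intention ~ ewom
full = OrderedModel(y, data[["ewom"]], distr="logit").fit(method="bfgs", disp=False)

# Thresholds-only (null) log-likelihood from the marginal category proportions
counts = pd.Series(intention).value_counts()
ll_null = float((counts * np.log(counts / counts.sum())).sum())
ll_full = full.llf

# Likelihood-ratio chi-square (cf. the goodness-of-fit table); df = number of predictors
lr_chi2 = 2.0 * (ll_full - ll_null)
p_value = stats.chi2.sf(lr_chi2, df=1)

# Pseudo-R2 measures (Cox & Snell, Nagelkerke, McFadden)
n = len(y)
cox_snell = 1.0 - np.exp(2.0 / n * (ll_null - ll_full))
nagelkerke = cox_snell / (1.0 - np.exp(2.0 / n * ll_null))
mcfadden = 1.0 - ll_full / ll_null

# Wald statistic for the eWOM coefficient: (coefficient / standard error)^2
wald = (full.params["ewom"] / full.bse["ewom"]) ** 2

print(f"LR chi2 = {lr_chi2:.3f}, p = {p_value:.4f}")
print(f"Cox & Snell = {cox_snell:.3f}, Nagelkerke = {nagelkerke:.3f}, McFadden = {mcfadden:.3f}")
print(f"Wald = {wald:.3f}")
```

The sketch only mirrors the shape of the reported analysis; on the synthetic data the numerical values will of course differ from those in the tables below.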

3 Results
3.1 Hypothesis Testing
Table 1 shows that the significance is less than 0.05; therefore, Ha is accepted, i.e.,
eWOM does have a significant influence on purchase intention. In this sense, when eWOM
exists, it will generate purchase intention; therefore, purchase intention is explained by the
possible actions or influence of eWOM.
Table 1 Chi-square test of eWOM and purchase intention

  Model                          Value      df   Asymptotic significance (bilateral)
  Pearson's Chi-square           166.729a   4    0.000
  Likelihood ratio               151.387    4    0.000
  Linear-by-linear association   118.985    1    0.000
  Number of valid cases          355

  a 0 cells (0.0%) have an expected count lower than 5. The minimum expected count is 15.82
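As a hedged illustration of how such a chi-square test of independence can be reproduced (the contingency table below is hypothetical and is not the authors' data; only the test procedure mirrors the one reported in Table 1), scipy's chi2_contingency returns the Pearson statistic, the p-value, the degrees of freedom, and the expected counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 3x3 contingency table: rows = eWOM level (low/medium/high),
# columns = purchase-intention level (low/medium/high); counts sum to 355.
observed = np.array([
    [55, 30, 10],
    [25, 70, 35],
    [10, 40, 80],
])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"Pearson chi-square = {chi2:.3f}, df = {dof}, p = {p_value:.4f}")
print("Minimum expected count:", expected.min().round(2))
```

With a 3 × 3 table the test has (3 − 1)(3 − 1) = 4 degrees of freedom, matching the df reported in Tables 1 and 2.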

Table 2 shows that the significance level is less than 0.05, so all the specific
hypotheses are accepted. In other words, the credibility, quality, quantity, and value of
eWOM influence the purchase intention of millennial consumers, i.e., purchase intention is
explained by the possible actions or influence of eWOM.
Table 2 Chi-square tests of specific hypotheses (unified)

                                 Credibility of eWOM/     eWOM quality/            eWOM quantity/           Value of eWOM/
                                 purchase intention       purchase intention       purchase intention       purchase intention
                                 Value     df  Sig.       Value     df  Sig.       Value     df  Sig.       Value     df  Sig.
  Pearson's Chi-square           77.037a   4   0.000      141.674a  4   0.000      122.766a  4   0.000      121.970a  4   0.000
  Likelihood ratio               70.696    4   0.000      130.380   4   0.000      111.908   4   0.000      120.099   4   0.000
  Linear-by-linear association   58.528    1   0.000      97.260    1   0.000      90.589    1   0.000      102.467   1   0.000
  No. of valid cases             355                      355                      355                      355

  Sig. = asymptotic significance (bilateral)

3.2 Ordinal Logistic Regression Test

In Table 3, a likelihood-ratio chi-square of χ2 = 164.019 with 4 degrees of freedom and a
p-value = 0.000 was obtained; since this is below the established significance level
(p < 0.05), the assumption that digital word-of-mouth is associated with purchase intention
is accepted, indicating a good fit of the model for explaining the incidence of the
independent variable on the dependent variable.
Table 3 Goodness-of-fit test of digital word-of-mouth on the purchase intention of millennial female consumers

  Model            −2 Log likelihood   Chi-square   df   Sig.
  Intercept only   353.331
  Final            189.312             164.019      4    0.000

  Link function: Logit

In Table 4, the R2 coefficient of determination with the highest value corresponds to
Nagelkerke (0.425), estimating that digital word-of-mouth has a 42.5% impact on the
purchase intention of millennial consumers.
Table 4 Pseudo coefficient of determination of digital word-of-mouth in the purchase intention of millennial female consumers

  Digital word-of-mouth in:   Cox and Snell   Nagelkerke   McFadden
  Purchase intention          0.370           0.425        0.226

  Link function: Logit

In Table 5, a Wald coefficient = 52.203 associated with a p-value = 0.000, lower than the
contrast level (p < 0.05), was obtained; therefore, H0 is rejected and Ha is accepted,
estimating at a 95% confidence level that as digital word-of-mouth levels decrease, the
probability of higher levels of purchase intention decreases. It is concluded that digital
word-of-mouth has a significant impact on the purchase intention of millennial female
consumers.
Table 5 Measures of association and predictive effectiveness of digital word-of-mouth in the purchase intention of millennial consumers

                                Estimate   Std. error   Wald      df   Sig.    95% Confidence interval
                                                                               Lower bound   Upper bound
  Threshold   [INTENTION = 1]   2.633      0.364        52.203    1    0.000   1.919         3.348
              [INTENTION = 2]   5.829      0.468        155.056   1    0.000   4.912         6.747
  Location    EWOM              2.150      0.194        122.324   1    0.000   1.769         2.531

  Link function: Logit

In Table 6, Wald Chi-Square coefficients associated with a p-value = 0.000 lower than
the contrast level (p < 0.05) were determined for the levels of quality, quantity, and value of
the word-of-mouth. Therefore, it is concluded that as the levels of these dimensions
associated with digital word-of-mouth decrease, the probability of the levels of purchase
intention decreases. Thus, the quality, quantity, and value of digital word-of-mouth
significantly influence the purchase intention of millennial female consumers.
Table 6 Tests of model effects

  Source              Type III
                      Wald chi-square   df   Sig.
  Credibility level   0.104             1    0.747
  Quality level       15.138            1    0.000
  Quantity level      8.964             1    0.003
  Value level         17.452            1    0.000

  Dependent variable: purchase intention (grouped)
  Model: (threshold), credibility level, quality level, quantity level, value level

4 Discussion and Conclusion


According to the results presented in this research, regarding the general hypothesis, it was
possible to determine that there is a significant incidence of digital word-of-mouth on
purchase intention (χ2 test, p = 0.000 < 0.05; likelihood-ratio χ2 = 164.019, p = 0.000 < 0.05;
Nagelkerke's R2 = 0.425; Wald coefficient = 52.203, p = 0.000). These results are similar to
those obtained by Rossell [8], who found that eWOM on social networks influences the
purchase intention of consumers in the category of delivery applications (Sig. = 0.000). In
that study, the indicator with the highest valuation was: "I will use a delivery application
when I need it."
Likewise, Fortes and Santos [14] obtained similar results in a study on the influence of
eWOM on online purchase intention, whose population consisted of Portuguese Internet
users, highlighting the power of eWOM over online purchase intention (Sig. = 0.000). From
the analysis of explained variance, online purchase intention is explained by the quality of
online service, online trust, and eWOM to the extent of 58.7%. Similarly, Khan and Ali [13],
in a study on the impact of eWOM on consumer purchase intention in the footwear industry
of Pakistan, found that consumer purchase intention is determined by eWOM (Sig. = 0.000),
showing that there is a relationship between eWOM and consumer purchase intention.
The impact of eWOM credibility on the purchase intention of millennial consumers was
identified (in the χ2 test, Sig. = 0.000), proving that this dimension of eWOM does have a
significant impact on the purchase intention of millennial consumers (however, in the ordinal
regression, this impact was not proven). In this sense, when consumers consider there is
credibility in eWOM, a high purchase intention will be generated.
Carbajal and Chocaca [7] obtained similar results in their research, where they evaluated
the relationship between eWOM in social networks and whether it influences purchase
intention in the category of urban women’s clothing, evidencing that the hypothesis of a
relationship between the credibility of the information (eWOM) and the purchase intention
variable is accepted (Sig. = 0.000). In this study, the authors conclude that users consider the
product information shared by family and acquaintances on social networks to encourage
purchase intention.
The influence of eWOM quality on the purchase intention of millennial consumers was
identified, proving that this dimension of eWOM does have a significant impact on the
purchase intention of millennial consumers (Sig. = 0.000). In this sense, when there is
quality in the eWOM, a high purchase intention will be generated. Therefore, it could be said
that the companies present in the various digital platforms are efficiently managing eWOM
strategies and actions so that consumers from the millennial generation are aware that there
is a factor that generates certainty for them.
Likewise, Carbajal and Chocaca [7] obtained similar results in their research, where it
was possible to determine the relationship between the quality of the information in the
adoption of eWOM in social networks and the intention to purchase urban women’s fashion.
Similar results were obtained in a study by Rossell [8], who determined that information
quality directly influences purchase intention (Sig. = 0.000).
The impact of eWOM quantity on the purchase intention of millennial consumers was
identified, proving that this dimension of eWOM does have a significant impact on the
purchase intention of millennial consumers (Sig. = 0.000). In this sense, the purchase
intention of millennial consumers is explained by the influence of eWOM quantity, i.e.,
when there is quantity in the eWOM, a high purchase intention will be generated. In this
regard, similar results to those by Alabdullatif and Akram [15] were obtained. They
evaluated the role of online customer reviews and the key elements of review websites,
where, unlike this research, the authors used the SmartPLS 3.2 Bootstrapping procedure in
the proposed conceptual model, concluding that the independent variables influence the
purchase decision significantly (Sig. < 0.05).
The incidence of eWOM value on the purchase intention of millennial female consumers
was identified, proving that this eWOM dimension does have a significant impact on the
purchase intention of millennial female consumers (significance = 0.000 < 0.05). In this
sense, when there is value in the eWOM, a high purchase intention will be generated; it
could therefore be said that the companies present on the various digital platforms are
efficiently managing eWOM strategies and actions, so that millennial consumers perceive a
factor that generates conviction, which leads them to an intention and a possible purchase in
digital media. When considering this result, a similarity was found with the results of
Anastasiei and Dospinescu
result, it was found that there is a similarity with the results by Anastasiei and Dospinescu
[16] who, when evaluating eWOM volume and value in Romanian students, determined that
the value of eWOM affects the purchase intention (Sig. < 0.05), and mixed value opinions do
not influence the purchase intention.
It is important to note that one of the main limitations of this study is that the sample
consisted only of millennial consumers; although they are a good target for studies like this
(because they are very active on social media and digital platforms and tend to buy mostly
online), this could affect the generalization of the results to some extent. However, as shown,
the results are generalizable to other contexts, since similar results have been obtained, even
internationally.
On the other hand, the findings of this study have significant implications. Firstly, they
reveal the levels of acceptance of online shopping by millennials as potential customers, as
well as the factors related to eWOM that motivate or discourage them from making
purchases online. Secondly, the study highlights the importance of eWOM in the business
environment, as well as the management of credibility and the value of implementing
strategies to improve their presence on digital platforms. Thirdly, these findings allow us to
predict the behavioral intentions of consumers in the context of e-commerce, which is still
growing.
Finally, it is recommended to conduct further studies that delve deeper into digital
word-of-mouth and purchase intention, exploring the constant changes and influences on
consumers and considering a broader range of factors for both variables. It would be
beneficial to extend this study to less explored research settings, such as the public sector
and other countries, and to integrate it with other topics, such as customer relationship
management, through the application of new models based on advanced technologies.
Similarly, future research could examine the role of gender in purchase intentions in relation
to eWOM, to understand how it could affect the results, as well as expand the population and
consolidate the statistical analysis with additional structural models.

References
1. Tien D, Amaya A, Liao Y (2019) Examining the influence of customer-to-customer electronic word-of-mouth on purchase intention in social networking sites. Asia Pac Manag Rev 24(3):238–249
2. INCAE. Mujeres Millennial: Profesionales, Trabajadoras, Urbanas
3. IPSOS (2018) Global Trends 2018
4. DATUM Internacional (2019) ¿En qué se diferencian los Millennial del Perú?
5. CPI (2021) CPI Perú: Población 2021
6. IPSOS (2018) IPSOS Millennials: Más X, Que Z
7. Carbajal N, Chocaca C (2020) EWOM en redes sociales en relación a la intención de compra en la categoría de jeans urbano femenino en el nivel socioeconómico A y B de los distritos de la zona 8 de Lima Metropolitana. Universidad Peruana de Ciencias Aplicadas, Lima
8. Rossell DL (2020) Influencia del EWOM en la intención de compra en los aplicativos de delivery en Lima Metropolitana. Universidad Peruana de Ciencias Aplicadas, Lima
9. Levy S, Gvili Y, Hino H (2021) Engagement of ethnic-minority consumers with electronic word of mouth (EWOM) on social media: the pivotal role of intercultural factors. J Theor Appl Electron Commer Res 16(7):2608–2632
10. Zinko R, Patrick A, Furner C, Gaines S, Kim M, Negri M, Orellana E, Torres S, Villarreal C (2021) Responding to negative electronic word of mouth to improve purchase intention. J Theor Appl Electron Commer Res 16:1945–1959
11. Nawaz Z, Zhao C, Nawaz F, Safeer A, Irshad W (2021) Role of artificial neural networks techniques in development of market intelligence: a study of sentiment analysis of EWOM of a women’s clothing company. J Theor Appl Electron Commer Res 16(5):1862–1876
12. Le-Hoang P (2020) The relationship between online convenience, online customer satisfaction, buying intention and electronic word-of-mouth. Indep J Manag Prod 11(7):2943–2966
13. Khan K, Ali M (2017) Impact of electronic word of mouth on consumer purchase intention in footwear industry of Pakistan. Kuwait Chap Arab J Bus Manag Rev 6:52–63
14. Fortes N, Santos A (2020) A Influência Do EWOM Na Intenção de Compra Online. Revista Ibérica de Sistemas e Tecnologias de Informação E34:408–420
15. Alabdullatif AA, Akram MS (2018) Exploring the impact of electronic word of mouth and property characteristics on customers’ online booking decision. TEM J 7(2):411–420
16. Anastasiei B, Dospinescu N (2019) Electronic word-of-mouth for online retailers: predictors of volume and valence. Sustainability 11(3):1–19
17. Ismagilova E, Dwivedi YK, Slade E, Williams MD (2017) Electronic word of mouth (EWOM) in the marketing context. SpringerBriefs in Business; Springer International Publishing, Cham
18. Kasabov E (2016) Unknown, surprising, and economically significant: the realities of electronic word of mouth in Chinese social networking sites. J Bus Res 69(2):642–652
19. Cuervo S, Salcedo N, Gutiérrez K, Joaquín M, Ramírez K, Tumbalobos C (2016) Rendimiento del tráfico web en la elección de un programa de posgrado. Universidad ESAN, Lima
20. Babić A, Sotgiu F, Valck K, Bijmolt THA (2015) The effect of electronic word of mouth on sales: a meta-analytic review of platform, product, and metric factors. J Mark Res 53(3):297–318
21. Matute J, Polo Y, Utrillas A (2015) Las características del boca - oído electrónico y su influencia en la intención de recompra online. Revista Europea de Dirección y Economía de la Empresa 24(1):61–75
22. Abd-Elaziz ME, Aziz WM, Khalifa GS, Abdel-Aleem M (2015) Determinants of electronic word of mouth (EWOM) influence on hotel customers’ purchasing decision. Int J Herit Tour Hosp 9(2/2):194–223
23. Filieri R (2016) What makes an online consumer review trustworthy? Ann Tour Res 58:46–64
24. Erkan I, Evans C (2016) The influence of EWOM in social media on consumers’ purchase intentions: an extended approach to information adoption. Comput Hum Behav 61:47–55
25. Tsao W-C, Hsieh M-T (2015) EWOM persuasiveness: do EWOM platforms and product type matter? Electron Commer Res 15(4):509–541
26. Abrego D, Sánchez Y, Medina JM (2017) Influencia de los Sistemas de Información en los Resultados Organizacionales. Contaduria y Administración 62:303–320
27. Ismagilova E, Slade EL, Rana NP, Dwivedi YK (2020) The effect of electronic word of mouth communications on intention to buy: a meta-analysis. Inf Syst Front 22:1203–1226
28. Tata SV, Prashar S, Gupta S (2020) An examination of the role of review valence and review source in varying consumption contexts on purchase decision. J Retail Consum Serv 52:101734
29. Purnawirawan N, Eisend M, De Pelsmacker P, Dens NA (2015) Meta-analytic investigation of the role of valence in online reviews. J Interact Mark 31(1):17–27
30. Jang S-H, Shin J-I (2016) The influence of contextual offer, utilitarian, and hedonic value on purchase intention in mobile location-based services. Int J Bus Policy Strat Manag 3(1):7–12
31. Lee WI, Cheng SY, Shih YT (2017) Effects among product attributes, involvement, word-of-mouth, and purchase intention in online shopping. Asia Pac Manag Rev 22(4):223–229
32. Nuseir MT (2019) The impact of electronic word of mouth (e-WOM) on the online purchase intention of consumers in the Islamic countries—a case of (UAE). J Islam Mark 10:759–767
33. Dehghani M, Tumer M (2015) A research on effectiveness of Facebook advertising on enhancing purchase intention of consumers. Comput Hum Behav 49:597–600
34. Zappala S (2017) Impact of e-commerce on consumers and small firms. In: Zappalà S, Gray C (eds) Routledge
35. Taherdoost H (2018) A review of technology acceptance and adoption models and theories. Procedia Manuf 22:960–967
36. Cho YC, Sagynov E (2015) Exploring factors that affect usefulness, ease of use, trust, and purchase intention in the online environment. Int J Manag Inf Syst 19(1):21–36
37. Tavera JF, Londoño BE (2014) Determining factors in technology acceptance of e-commerce in developing countries (Fatores Determinantes Da Aceitação Tecnológica Do E-Commerce Em Países Emergentes). Revista Ciencias Estratégicas 22(31):101–119
38. Ham M, Jeger M, Ivković AF (2015) The role of subjective norms in forming the intention to purchase green food. Econ Res-Ekonomska istraživanja 28(1):738–748
39. Alalwan AA (2018) Investigating the impact of social media advertising features on customer purchase intention. Int J Inf Manag 42:65–77

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, Studies in Computational Intelligence 1153
https://doi.org/10.1007/978-3-031-56388-1_4

Alignment of Business Process and Information System Models Through Explicit Traceability
Aljia Bouzidi1 , Nahla Zaaaboub Haddar2 and Kais Haddar3
(1) Monastir University, ISIMM, Monastir, Tunisia
(2) Sfax University, FSEGS, Sfax, Tunisia
(3) Sfax University, FSS, Sfax, Tunisia

Aljia Bouzidi
Email: [email protected]

Abstract
In the software development lifecycle, business process models (BPMs)
turn out to play an ever more pivotal role in the development and continued
management of information systems (ISs). However, BPMs and IS models
(ISMs) are traditionally expressed separately. This separation causes drift
between them, impedes their interoperability, and thus builds up misaligned
models. Traceability in software development has proven its ability to link
together related artifacts from different sources within a project (for
example, business modelling, requirements, and design models) and to
improve project outcomes by assisting designers and other stakeholders
with common tasks such as impact analysis. In this paper, we propose an
improvement and an extension of an existing requirement traceability
method in order to tackle the traceability between design, requirement, and
BPMs. The extension consists in adding the UML class diagram concepts,
structured according to the Model View Controller (MVC) design
pattern, to be traced with the BPMN and UML use case models in a single
unified model. This method is based on the integration mechanism, acts at
the model and the meta-model levels, and can be used to develop a new IS
and/or to examine the misalignment of the existing ISMs and the BPMs
after BPM/ISM evolution.

Keywords Alignment – Class diagram – MVC design pattern – BPMN model – Use case diagram

1 Introduction
Traceability in software development has proven its ability to link together
related artifacts from different sources within a project (for example,
business modelling, requirements, use cases, and design models) and
improves project outcomes by assisting designers and other stakeholders
with common tasks such as impact analysis. Although there is no
standalone guideline for creating an explicit traceability model, doing so
brings huge benefits in terms of consistency, quality, and automation. Even
though its creation is not a trivial task, an explicit traceability model
remains a reference for a consistent definition of typed traceability links
between heterogeneous model concepts, which helps to ensure their
alignment and their coevolution.
In this context, the authors of [1] adopt the integration mechanism to
propose an accurate requirement engineering method that acts at the meta-
model and model levels and establishes traceability between BPMs and
ISMs to bridge the gap between business and requirement modelling in a
straightforward way. Indeed, the authors of [1] first define an integrated
trace meta-model that ensures a straightforward integration of the business
and software worlds by representing the BPMN [9] and UML use case [10]
models in the form of a single unified meta-model. It also defines
traceability links between interrelated concepts, correlating overlapping
concepts as new modelling concepts. Then, they define an integrated model
as an instantiation of the proposed integrated trace meta-model. They draw
it as a new diagram baptized BPSUC (Business Process Supported Use
Cases).
The research conducted in this paper enhances and extends the work
presented in [1]. The enhancement consists in adding class diagram
concepts structured according to the MVC pattern. Our intervention
considers both the meta-model and the model levels. Hence, in the
integrated trace meta-model proposed by [1], we add new modelling
concepts to express trace links between the class diagram, the use case
diagram, and the BPMN concepts. Class diagram concepts that have no
corresponding concepts are also added to the integrated trace meta-model.
The proposed traceability concepts and class diagram concepts are
instantiated in the BPSUC diagram. Accordingly, BPSUC is now able to
represent class diagram elements and the proposed traceability concepts
combined with their corresponding BPMN and use case diagram artifacts.
The remainder of this paper is structured as follows: Sect. 2 gives an
overview of the method presented in [1]. In Sect. 3, we explain our
proposal. Section 4 is devoted to discussing related works. In Sect. 5, we
show the feasibility of our contributions in practice. Finally, Sect. 6
concludes the current paper and outlooks future works.

2 Background of the Traceability Method of [1]


The main goal of the traceability method proposed in [1] is to overcome the
alignment issues between the business and the requirement engineering
process.
To achieve their objectives, the authors of [1] exploit the advantages of
defining an integrated traceability model to establish and maintain
traceability, as well as synchronization, between the BPMN and the UML
use case models. This approach acts at both the meta-model and model
levels, and is composed mainly of three steps. In the first step, they define
an integrated trace meta-model that is a specification of traceability
between the existing artifacts, keeping them unchanged and independent.
In contrast to most existing approaches, which use dedicated traceability
meta-models only for traceability content, this meta-model incorporates all
the BPMN and UML use case models together with meta-classes and
relationships describing traceability links. The integrated meta-model
favors simplicity and uniformity, since all source meta-model concepts and
traceability information conform to one unified meta-model.
In this integrated trace meta-model, the authors of [1] defined the
following new meta-classes for representing the trace links between
overlapped concepts:
Organization unit actor: This meta-class inherits the properties of an actor
of the UML meta-model and of an empty lane (a lane that does not
contain other lanes) of the BPMN meta-model. In this way, this concept
allows combining the use of both an empty lane and a UML Actor
without affecting their semantics, and it preserves the consistency of the
original meta-models.
Organization Unit Package: a traceability concept defined as a
specialization both of a BPMN lane/pool that contains lane sets and of a
use case package.
Fragment: defined by [1] as the longest sequence of BPMN elements that
handle the same business object (data object, data store, input data,
output data) and which are performed by the same participant.
Use case supporting Fragment (UCsF): the most important traceability
concept. It is represented as a container of the BPMN concepts that
participate in the realization of a use case.
In the second step, the authors of [1] instantiate the integrated trace
meta-model at the model level. They draw it as a new diagram baptized
Business Process Supported Use Case (BPSUC). This diagram also
incorporates the BPMN and UML use case elements together with
traceability links, and allows designing BPMN and use case diagram
elements jointly. In addition, visualizations of and queries on traced
elements become straightforward, because business analysts and software
designers are now able to work together on one integrated model. BPSUC
may also be used to validate changes before propagating them to the source
models.
In the third step, they define a bidirectional set of transformation rules
between the origin models, namely the BPMN and use case models, and the
BPSUC diagram, to ensure the coevolution of the origin models when a
change occurs.

3 Our Contribution
The research work conducted in this paper is an improvement and extension
of the method of [1]. The extension consists in improving the integrated
trace meta-model and the BPSUC diagram to include the artifacts of the
UML class diagram structured according to the MVC design pattern. In this
section, we explain how we add the UML class diagram concepts to the
integrated trace meta-model and the BPSUC diagram, and the rectifications
made to them.

3.1 Integrated Trace Meta-Model Improvement


Our first improvement to the integrated trace meta-model consists in
defining an adequate strategy for deriving the integrated meta-model
concepts and their relationships. Indeed, we propose a construction process
that contains two main steps: (1) definition of a relevant mapping between
BPMN and UML concepts; (2) after identifying the interrelated elements of
the BPMN and UML meta-models, application of an adequate methodology
to bind them without changing their semantics. Accordingly, we propose to
keep the mapped concepts and to connect them with either a new
meta-class or a new association expressing the relationship between the
overlapping concepts. Afterwards, we connect each pair of overlapping
artifacts and the new meta-class representing them through an inheritance
relationship. This relationship makes it possible to inherit the characteristics
of both separate concepts and to combine their usage without changing
their semantics.
Table 1 Mapping of the BPMN concepts and the class diagram concepts structured according to the
MVC pattern

  BPMN concept → Class diagram concept
  Data input/Data output/Data object/Data state/Data store → Entity class; Association
  Empty lane → Entity class; View class; Control class; Association
  Fragment → View class; Control class
  Exception event → Exception class; Operation
  Signal event → Signal class; Operation
  Automated task t (User task, Send task, Receive task, Service task, Business rule task, Script task) within a fragment → Operation; Association
  Item aware element type (Single, Collection)/Gateway/Loop task/Rollback sequence flow → Cardinality of an association
  Item aware element attached to an automated task t within a fragment f → Operation parameters
  Conditional sequence flow → Attribute
Our second optimization consists in applying the strategy defined in the
first meta-model improvement to integrate the class diagram meta-model
artifacts into the integrated trace meta-model. By applying our meta-model
construction process, we need to identify adequate mappings between
BPM and UML class diagram concepts. Mappings between BPMN and
UML class diagram concepts have been widely studied in the literature.
The most complete and consistent one is defined by [13]. That research
work defined a transformation model from BPMN into a UML class
diagram structured according to the MVC design pattern, and into a use
case diagram. These transformation models are based on semantic
mappings between a considerable number of BPMN and UML artifacts.
For example, each BPMN empty lane (a lane that does not contain child
lanes) is mapped to a UML actor in the use case model and to a class in the
class diagram.
In Table 1, we summarize the semantic mapping between the class diagram
meta-model and the BPMN meta-model concepts of [13]. In this table, the
class diagram meta-model concepts are structured according to the MVC
design pattern.
We reuse these semantic mappings in this work to add the class diagram
meta-model to the definition of the integrated trace meta-model.
In contrast to the mapped concepts of the BPMN and use case meta-
models, the relationships between the interrelated concepts of the BPMN
and UML class diagram meta-models are not one-to-one, as shown in
Table 1. Indeed, one UML concept may be represented by many BPMN
concepts and vice versa. This is due mainly to the high degree of
heterogeneity between the BPMN and the class diagram artifacts.
Thus, our mapping is limited to defining new associations instead of new
traceability meta-classes, since we aim to facilitate the readability of the
meta-model and to keep it consistent without complicating it.
trace meta-classes defined by [1] in order to define BPMN and class
diagram concept traceability. The enhanced part of the integrated trace
meta-model is depicted in Fig. 1. For readability reasons, we present only
the core concepts of BPMN and the class diagram meta-model and the
extended traceability meta-classes. White meta-classes denote the BPMN
concepts, orange meta-classes represent the UML class diagram meta-
model concepts, khaki meta-classes represent the UML class diagram
concepts for structuring the class diagram according to the MVC design
pattern, whereas the reused traceability meta-classes are represented with
dark grey meta-classes.
The blue associations denote the new proposed trace links, while the
black ones denote the existing associations.
It is important to point out that all the use case concepts, BPMN
concepts, traceability links, and existing associations defined in the previous
version of the integrated trace meta-model and not present in this extract
remain valid. In the extract of Fig. 1, each BPMN concept is associated with
its corresponding concept of the class diagram meta-model. For example,
we define a trace link called trace between the data object and the entity
class to trace the link between them. The multiplicity of this association is
“1..*” to point out that each item aware element should represent at least
one entity class. Moreover, we define a trace link between the gateway and
the property meta-classes, as gateways may be indicators of association
cardinalities. The multiplicity of this association is “0.”
Fig. 1 Proposed integrated trace meta-model
On the other hand, UCsF is associated with the following meta-classes:
Class, ClassDIPackage, and Association, through composition associations.
This means that an UCsF may incorporate classes, associations, and
packages. These associations mean that an UCsF is a use case that
encapsulates the class diagram elements it supports, which represent
elements of the supported fragment. The cardinality of the composition
association UCsF-ClassDIPackage is “3..*” to indicate that an UCsF should
incorporate at least three packages, Model, View, and Control, which
represent the three parts of the MVC design pattern.
Further, we define an association between OUActor and Class to express
that actors in the integrated trace meta-model are represented as classes in
the class diagram meta-model. Furthermore, in our integrated trace meta-
model, a generalization relationship is created between the meta-classes
OUPackage and ClassDIPackage to point out that this trace meta-class
inherits all the characteristics of the meta-class Package of the class diagram
meta-model.
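As a minimal, illustrative sketch of these composition constraints (this is not the authors' actual meta-model implementation; the class names, fields, validation logic, and the example instance, loosely inspired by the "Archive purchase order" UCsF of Fig. 2 in Sect. 3.2, are simplifying assumptions), the UCsF trace concept and its "3..*" relationship to the MVC packages could be encoded as follows:

```python
from dataclasses import dataclass, field
from typing import List

# --- Simplified BPMN side ------------------------------------------------
@dataclass
class BPMNElement:
    name: str                     # e.g. a service task, data output or event

@dataclass
class Fragment:
    # Sequence of BPMN elements handling the same business object and
    # performed by the same participant (cf. the Fragment concept of [1]).
    elements: List[BPMNElement] = field(default_factory=list)

# --- Simplified UML class-diagram side -----------------------------------
@dataclass
class UMLClass:
    name: str
    stereotype: str               # "entity", "boundary" or "control" (MVC roles)

@dataclass
class ClassDIPackage:
    name: str                     # "Model", "View" or "Control"
    classes: List[UMLClass] = field(default_factory=list)

# --- Trace concept: use case supporting a fragment (UCsF) ----------------
@dataclass
class UCsF:
    name: str
    fragment: Fragment                      # BPMN compartment
    packages: List[ClassDIPackage]          # class-diagram compartment

    def __post_init__(self) -> None:
        # Enforce the "3..*" cardinality: at least the Model, View and
        # Control packages must be present.
        if len(self.packages) < 3:
            raise ValueError("A UCsF must contain at least three packages (MVC)")

# Hypothetical instance, loosely inspired by the example of Fig. 2
ucsf = UCsF(
    name="Archive purchase order",
    fragment=Fragment([BPMNElement("Archive purchase order (service task)"),
                       BPMNElement("Archived order (data output)"),
                       BPMNElement("End event")]),
    packages=[
        ClassDIPackage("Model", [UMLClass("PurchaseOState", "entity")]),
        ClassDIPackage("View", [UMLClass("VArchiveO", "boundary")]),
        ClassDIPackage("Control", [UMLClass("CArchiveO", "control")]),
    ],
)
print(ucsf.name, "-", [p.name for p in ucsf.packages])
```

In the actual approach these concepts live at the meta-model level (e.g. within an Eclipse-based modelling tool, as described in Sect. 4); the sketch only makes the containment and cardinality rules of the UCsF concept concrete.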

3.2 BPSUC Diagram Extension


In the research work of [1], the features of the BPSUC diagram are limited
to designing the BPMN and use case diagram artifacts jointly, combined
with their traceability links, which is already reflected in its name.
In this paper, we aim to enhance this diagram's capability to design class
diagram elements combined with BPMN and use case diagram elements.
The first thing we do is update the name of BPSUC to be in harmony with
its newly supported features. The new designation we choose is
BPMNTraceISM (Business Process Model and Notation Traces Information
System Models). BPMNTraceISM is an instantiation of our proposed
integrated trace meta-model, and forms one single unified model that
combines the usage of BPMN and UML elements (use case diagram and
class diagram). Thus, this diagram is now able to design elements and
traceability relationships of both BPMN and of the UML use case and class
diagrams concurrently. In addition, it specifies the traceability information
of the interrelated artifacts.
Each artifact in the BPMNTraceISM diagram has its specific notation. Some
of them retain their original notation (BPMN or UML notation); the others
have a new representation, which does not differ greatly from the BPMN
and UML notations.
Some of the artifacts keep their original BPMN/UML notation: (1) those that are not included in the mappings on which we base our integrated trace meta-model. This is due to the fact that some BPMN artifacts have no corresponding UML artifact, and vice versa. For example, the semantic mapping we rely on does not define any UML concept overlapping with a BPMN start event.
Further, (2) in the mapping we rely on, many UML class diagram elements are mapped to one BPMN element and vice versa. For example, a data store in the BPMN diagram is mapped to (i) an association, (ii) an entity class, or (iii) an operation of a class in the class diagram. In this situation, it is very difficult to represent the mapped elements by one unifying element. Accordingly, these mapped elements conserve their original graphic notation.
OUPackage and OUActor are new meta-classes defined in [1] to represent traceability links between BPMN and UML use case diagram elements. In the integrated trace meta-model, we did not reuse these meta-classes to define new associations. Thus, the instantiations of these meta-classes keep the notations provided in [1].
UCsF Notation: In the previous version of the BPMNTraceISM diagram (the BPSUC diagram), [1] stated that a UCsF is a specialization of a use case and inherits its characteristics. Therefore, the graphic notation of UCsFs extends the graphic notation of a UML use case. Moreover, a UCsF has a composition relationship to a BPMN fragment.
To represent this trace link graphically, the authors of [1] define a compartment that incorporates the corresponding BPMN fragment. In our integrated trace meta-model, we have defined composition relationships from UCsF to some UML class diagram artifacts (see Fig. 1). Indeed, a UCsF should encapsulate the classes, associations, and packages that correspond to its supported fragment. Accordingly, we propose to update the graphic notation of the UCsF. Thus, a UCsF should act as a complex symbol that describes, concurrently, BPMN elements and UML class diagram elements. In order to represent explicitly the different elements incorporated by a UCsF, the use case notation needs to be extended. Therefore, we adjust the UCsF notation by adding another compartment (see Fig. 1) to encapsulate the class diagram elements representing the components (classes, associations, and packages) of the supported fragment. To limit the complexity of this element, the designer can choose to hide or to show each compartment.

Fig. 2 Example of UCsF


Figure 2 presents an example of a UCsF called "Archive purchase order". This UCsF contains, in the BPMN compartment, a service task called "Archive purchase order", a data output, and an end event. The compartment of the class diagram elements contains a boundary class (stereotyped "boundary") called "VArchiveO", a control class called "CArchiveO", and three entity classes called, respectively, "Packaged", "Archived", and "PurchaseOState". These classes contain attributes and operations and are associated with each other.

4 Implementation
To put our approach into practice, we have developed a visual modelling tool that supports the proposed integrated trace meta-model and the BPMNTraceISM diagram.
The tool acts as an internal plugin within the Eclipse framework. Deploying our modelling tool in the form of a plug-in increases its reusability and availability on any Eclipse platform, without dependency on the system runtime or on the workspace containing the modelling tool. In addition, our editor is a fully functional graphical tool that allows business engineers and software designers to work together with a single integrated graphical user interface (GUI) that incorporates BPMN, use case, and class diagram elements.
The construction of this editor begins with the Ecore meta-modelling language, which we use to implement the improved integrated meta-model. We then implement a toolbox for designing instances of the integrated trace meta-model classes, as sketched below.
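Although the editor itself is built with the Eclipse Modeling Framework, the same declaration style can be illustrated with pyecore, a Python implementation of EMF. The snippet below is only a sketch of how one extended meta-class and its composition could be declared and instantiated; the feature names are ours rather than the tool's, and the 3..* lower bound is noted in a comment rather than encoded.

```python
from pyecore.ecore import EClass, EAttribute, EReference, EString

# Illustrative fragment of the integrated trace meta-model
ClassDIPackage = EClass('ClassDIPackage')
ClassDIPackage.eStructuralFeatures.append(EAttribute('name', EString))

UCsF = EClass('UCsF')
UCsF.eStructuralFeatures.append(EAttribute('name', EString))
# Composition UCsF -> ClassDIPackage; the meta-model's 3..* lower bound
# would additionally be enforced as a validation constraint
UCsF.eStructuralFeatures.append(
    EReference('packages', ClassDIPackage, containment=True, upper=-1))

# Dynamic instantiation, as the toolbox does when a designer drops elements
ucsf = UCsF()
ucsf.name = 'Archive purchase order'
for part in ('View', 'Control', 'Model'):
    pkg = ClassDIPackage()
    pkg.name = part
    ucsf.packages.append(pkg)
```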
The BPTraceISM environment is composed of four main parts: the project explorer, containing an EMF project that includes BPMNTraceISM diagrams; the modelling space; the toolbox, which contains the graphical elements of a BPMNTraceISM diagram; and the properties tab, used to edit the properties of the element selected in the modelling space.
Figure 3 outlines a simple example of a BPMNTraceISM diagram modelled using the editor. The modelling space contains an OUActor called Supplier associated with a UCsF called Manage purchase order. In the business compartment of the UCsF "Manage purchase order", there is a user task called Accept purchase order. In the class diagram compartment, there are four classes linked via undirected associations. Each class has a name and a stereotype. The boundary class Manage purchase order contains an operation called "acceptPurhaseOrder()".
Fig. 3 Example of a BPMNTraceISM diagram designed by using the BPtraceISM editor

5 Related Works
In this section, we focus on the approaches that define explicit traceability models separated from the source models.
These works include approaches that propose guidelines for creating traceability models. For instance, a guideline is proposed in [3] for establishing traceability between software requirements and UML design models. This guideline includes two main concepts: (1) a meta-model that represents relationships between requirements and the UML diagrams, and (2) a process that is applied according to specific steps. However, this guideline focuses on establishing traceability at the meta-model level only.
Other model-based studies aim to maintain traceability. For example, the research in [5] proposes a change propagation-based coevolution of transformations. Its premise is that knowledge of the meta-model evolution can be propagated by means of resolutions to drive the transformation coevolution. To deal with particular cases, the authors introduce composition-based techniques that permit developers to compose resolutions meeting their needs. With the same purpose, the authors of [7] use machine learning techniques to introduce an approach called TRAIL (TRAceability lInk cLassifier). This study uses the histories of existing traceability links between pairs of artifacts to train a machine learning classifier able to classify the link between any new or existing pair of artifacts as related or unrelated. Some other approaches define traceability models for eliciting the requirements of complex systems, for example [2, 6]. Furthermore, the authors of [8] propose a traceability model that traces the model elements at different levels of the enterprise architecture. It uses general concepts for representing the different artifacts used to model traceability, such as "traceability links", "Aspect", "Element", "requirement", etc. Likewise, the authors of [4] rely on deep learning and propose a neural network architecture that uses word embeddings and a Recurrent Neural Network (RNN) to generate trace links automatically. The final output of the RNN is a vector that represents the semantic information of the artifact. The tracing network then compares the semantic vectors of two artifacts and outputs the probability that they are linked. Yet, managing the meta-models of all levels in a single traceability model may yield complex models. Other works are dedicated to specific languages. For example, the approach of [11] proposes a traceability meta-model in the form of an extension of the BPMN meta-model and then defines trace links between some of its elements. Further, the authors of [12] rely on Natural Language Processing techniques to define an enhanced framework for software artifact traceability management. To illustrate their approach, a tool supporting the traceability between requirements, UML class diagrams, and the corresponding Java code is implemented. Overall, existing works that define an explicit traceability model mostly focus on the meta-model level only and ignore the model level. Moreover, existing explicit traceability models trace either between UML diagrams at the same or different abstraction levels or between business model concepts. However, none of the existing approaches achieves reliable results when dealing with traceability between BPMN models, UML use case models, and UML class diagrams.

6 Conclusion
In this paper, we improved and extended an existing traceability method between BPMN and UML use case models that establishes traceability at both the meta-model and the model levels.
The extension consists in enhancing the integrated traceability meta-model by adding UML class diagram concepts, structured according to the MVC design pattern, and trace links between the class diagram concepts and the existing concepts of the integrated trace meta-model. Traceability links are created based on a semantic mapping between the BPMN, use case, and class diagram meta-models.
We have also improved the instantiation of the integrated trace meta-model by creating new icons for representing UML class diagram elements combined with UML use case model and BPMN diagram elements in one unified diagram that we call BPMNTraceISM.
In future work, we plan to define heuristics on the compliance of a new diagram with constraints specified dynamically by the developer. These heuristics should be able to automatically detect the changes made to the BPMN, UML use case, and UML class diagrams, and to indicate the elements that will be affected by the changes.

References
1. Bouzidi A, Haddar N, Haddar K (2019) Traceability and synchronization between BPMN and
UML use case models. Ingénierie des Systèmes d Inf 24(2):215–228
2.
de Carvalho EA, Gomes JO, Jatobá A, da Silva MF, de Carvalho PVR (2021) Employing
resilience engineering in eliciting software requirements for complex systems: experiments with
the functional resonance analysis method (FRAM). Cogn Technol Work 23:65–83
[Crossref]
3.
Eyl M, Reichmann C, Müller-Glaser K (2017) Traceability in a fine grained software
configuration management system. In: Software quality. Complexity and challenges of software
engineering in emerging technologies: 9th international conference, SWQD 2017. Vienna,
Austria, Proceedings 9. Springer International Publishing, pp 15–29
4.
Guo J, Cheng J, Cleland-Huang J (2017) Semantically enhanced software traceability using deep
learning techniques. In: 2017 IEEE/ACM 39th international conference on software engineering
(ICSE) IEEE, pp 3–14
5.
Khelladi DE, Kretschmer R, Egyed A (2018) Change propagation-based and composition-based
co-evolution of transformations with evolving metamodels. In: Proceedings of the 21th
ACM/IEEE international conference on model driven engineering languages and systems, pp
404–414
6.
Lopez-Arredondo LP, Perez CB, Villavicencio-Navarro J, Mercado KE, Encinas M, Inzunza-
Mejia P (2020) Reengineering of the software development process in a technology services
company. Bus Process Manag J 26(2):655–674
[Crossref]
7.
Mills C, Escobar-Avila J, Haiduc S (2018) Automatic traceability maintenance via machine
learning classification. In: 2018 IEEE international conference on software maintenance and
evolution (ICSME). IEEE, pp 369–380
8.
Moreira JRP, Maciel RSP (2017) Towards a models traceability and synchronization approach of
an enterprise architecture. SEKE, pp 24–29
9.
OMG BPMN specification: Business Process Model and Notation. http://www.bpmn.org/. Accessed 31 Feb 2023
10.
OMG UML Specification (2017) Unified modeling language (omg uml), superstructure,
version2. In: Object management group, p 70
11.
Pavalkis S, Nemuraite L, Milevičienė E (2011) Towards traceability metamodel for business
process modeling notation. In: Building the e-world ecosystem: 11th IFIP WG 6.11 conference
on e-business, e-services, and e-society, I3E 2011. Kaunas, Lithuania. Revised Selected Papers
11. Springer, Berlin, pp 177–188
12.
Swathine K, Sumathi N, Nadu T (2017) Study on requirement engineering and traceability
techniques in software artefacts. Int J Innovat Res Comput Commun Eng 5(1)
13.
Bouzidi A, Haddar NZ, Ben-Abdallah M, Haddar K (2020) Toward the alignment and
traceability between business process and software models. In ICEIS 2:701–708

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed
Computing , Studies in Computational Intelligence 1153
https://doi.org/10.1007/978-3-031-56388-1_5

An Assessment of Fintech for Open Banking: Data Security and Privacy Strategies from the Perspective of Fintech Users
Amila Munasinghe1 , Srimannarayana Grandhi1 and Tasadduq Imam1

(1) Central Queensland University, Melbourne, Australia

Amila Munasinghe (Corresponding author)


Email: [email protected]

Srimannarayana Grandhi
Email: [email protected]

Tasadduq Imam
Email: [email protected]

Abstract
The use of Fintech for open banking has gradually increased, as it allows consumers to share their data with third-party providers and access their products and services. Due to a lack of trust in data security and privacy, there is a reluctance to consent to sharing personal data and a fear of losing control over the data hoarded during open banking. Despite significant research, there is still no research from the users' cognitive and behavioural perspectives that demonstrates the challenges associated with Fintech use. This study presents a research model for assessing the position of Fintech in the open banking context, especially from the customers' perspectives on data security and privacy issues. The article lays out the protocols and constructs relevant to the model. Using a quantitative approach, this study will collect data through online survey questionnaires and employ the structural equation modelling technique to test the model. This study is expected to help develop appropriate policies to enhance the use of Fintech by consumers in the Australian banking sector.

Keywords Fintech – Data security and privacy – Customer trust – Open banking – Cognitive and behavioural perspectives – Australia

1 Introduction
Fintech has created new ecosystems with a massive influx of financial data. This data is considered a salient asset in the modern finance world, and possessing data of high quality and volume gives competitive advantages to financial organisations [1]. Governance of data quality is a paramount prerequisite to data management due to the mammoth exchange of information in financial organisations [2, 3]. Because of the power inherent in data, the governance of such power is crucial in the finance sector. Governance of power is a global phenomenon, though it is managed by country-specific governments. Governments recognise the resulting information asymmetries as disadvantages in creating viable business opportunities, from both the growth perspective of the evolving Fintech industry and the fair-use perspective of users [1, 4]. Hence the emergence of an innovative solution, 'open banking', to liberalise the data held by banks [1, 5, 6]. Whilst banks have frowned upon this move because of the marginal returns they receive in exchange for their data, the mechanism has boosted the ability of infant Fintech firms to penetrate the competitive lending industry [7].
Despite the many benefits of Fintech open banking, challenges persist in managing and preserving the core of sound data management, in which users' positive adoption perceptions are stimulated whilst data security and privacy remain unaffected [8]. As one measure, blockchain is used in Fintech as a safer strategy for assuring the security and privacy of the open banking application programming interface (API) [9]. However, it is unrealistic to expect complete freedom from risk when exposing data for system consumption [10]: systems are human-operated, and technology itself advances to beat its own capabilities [11], exposing APIs to cyberattacks, malware, undue access, and fraudulent penetration attempts [2, 8, 11–13], apart from the likely presence of Fintech brokers who trade data to big technology firms [4].
This study identifies three research gaps. First, users place great trust in Fintech providers to keep personal and sensitive financial data secure and private [14], and providers are expected to uphold that trust in safeguarding sensitive data [15]. Whilst there is ample literature studying Fintech adoption criteria using various models [14, 16–18], there is still a gap concerning customer expectations of data security and privacy, which may hinder Fintech open banking adoption in Australia, and concerning ways to enhance that adoption; both require further assessment.
Second, consumers have very little control over the data they leave behind in the process, as the responsibility for security and privacy arrangements seems to alter its form with each change of hands [8, 16, 19]. Whilst customers expect that the mandated physical binds and perceived moral binds of data protection laws are obeyed by Fintech [1, 3, 6], studies explaining customers' rights over such hoarded data, beyond the primary objective of its release and the regulatory aspects, seem to be lacking. The duty-of-care phenomenon has often been conveniently neglected and unsupported [8], which needs further review.
Third, Australia has been an early adopter of open banking, amongst the very few countries in which the government supports the emerging industry by introducing regulatory sandboxes and legislative platforms, especially toward promoting uptake. There is, however, little research on the impact of such steps on customer trust and adoption intentions.
To address the identified research gaps, this study aims to review open banking customers' perspectives on data security and privacy, the interconnectedness of data security and privacy strategies, and their importance in shaping Fintech use.
Based on previous technology adoption studies, this study presents a research model to pinpoint the underlying behavioural factors affecting Fintech adoption and use by the consumers of Australian financial organisations. In particular, it focuses on how the Fintech data security and privacy elements embedded in open banking operations enhance or hinder the use of Fintech for open banking, and it addresses the following research questions.
RQ1: What data security and privacy challenges are customers of Australian financial organisations facing while using Fintech for open banking?

RQ2: How do behavioural factors influence customers to use Fintech for open banking?
The remainder of this paper unfolds as follows. Section 2 comprises the systematic literature review on Fintech for open banking in Australia, the importance of data security and privacy, and the challenges that affect customers' trust in Fintech adoption for open banking. Section 3 presents the theoretical basis for the proposed research model. Section 4 proposes the research model and propositions. Section 5 develops the research design. Section 6 presents the concluding comments, with the limitations, future research opportunities, and the novel elements of this paper.

2 Literature Review
In this section, a condensed summary of the reviewed articles is provided to highlight the open banking concept, its benefits, and the importance of data security and privacy, government regulatory governance, and customer trust factors in the Australian context for enhancing the use of Fintech by customers for open banking. The technology adoption models that can characterise salient features, such as establishing customer trust, data security and privacy assurance, social influence, facilitating conditions, and the role of government regulations, are also presented toward developing the proposed framework and the criteria linked to the propositions.

2.1 Fintech for Open Banking in Australia


Open banking is defined as a legislation-backed, online Fintech invention that uses an API, in the form of a software-as-a-service (SaaS) user interface, through which previously inaccessible, limited, and siloed data can now be shared with Fintech with the customer's explicit consent [1, 6, 13, 19]. Users of Fintech business models such as payments, lending, and borrowing achieve many benefits, such as secure, faster, hassle-free, and cost-effective services [1, 5, 6]. Open banking empowers consumers to take ownership of sharing their data with financial organisations, expect switching-cost reductions, shop for lenders, and bargain for better rates [1, 16].
Prior studies have established the Fintech open banking concept, the regulatory framework, ecosystems and business models, advantages, and the factors affecting Fintech adoption decisions in various countries [6, 7, 13, 16, 19]. EY's Global Fintech Adoption Index 2019 indicates that Australian consumer Fintech adoption is 58%, compared with the highest reported rate of 87% in China; the same report also reveals that the Australian Fintech customer adoption rate is below the overall average of 64% across the twenty-seven markets surveyed [20]. The regulatory technology embedded in Fintech to comply with stringent legislation relating to data security and privacy, as part of mitigating and controlling internal and external data breaches, can boost customers' adoption intention [12]. Further, transparency concerning data security and privacy can improve the legitimacy of regulations, which is important for Fintech companies in maintaining a positive public perception among customers [18, 21]. Due to its novelty, Australian customers perceive open banking as a concept rather than as an actual adoption experience [16], thus requiring further research.

2.2 Importance of Data Security and Privacy


Privacy is about protecting one's personal data. Everyone has a right to privacy and a belief that they are protected, with a presumption that there are laws that Fintech organisations are required to follow, that there is a duty of care, and that no harm is done using their personal data [14]. Data privacy refers to the data usage aspects of how 'publicly available the data should be', while security performs the task of protecting access to such data by mitigating the risks of misuse, breaches, and attacks on those data privacy borders, apart from the duty of care vested in organisation executives [22]. Technological attributes such as data security, privacy, and responsiveness affect customer trust and the intention to use Fintech [18].

2.3 Importance of Customer Trust


Trust is considered a key factor in Fintech adoption, brand loyalty, and business continuity [16]. It is a 'state of mind' that involves psychological, interpersonal, experiential, and cultural features in trusting another [23]. It is defined as the belief in the customer's mind about a product or service and a positive perspective drawn towards its quality, security and privacy, and system soundness; it brings a bonding effect towards the organisation, which creates customer loyalty in the long run [24]. Due to the lockdowns imposed during the Covid-19 pandemic, financial organisations are offering all-virtual online platform-based services and moving away from face-to-face interaction to serve their customers. However, the use of Fintech by customers for open banking was reported to be much lower in Australia than in many developing countries, due to data security and privacy concerns and a lack of trust in technologies. Therefore, trust is considered to play a crucial role in attracting customers to use Fintech [14, 16, 23, 25].
Further, where trust in primary financial relationships (PRFs) is concerned, the UK reports the lowest level (72%) and Australia the second lowest (73%), whilst China leads at 92% and the global average of the 14 countries tested is 82% (EY's NextWave Global Consumer Banking Survey 2021, Ch. 2). These figures emphasise the need to further explore the challenges faced by Fintech users in the Australian context.

3 Theoretical Basis for a Research Model


Overall, the literature review identifies various constructs that can influence
the adoption of Fintech, especially in the open banking context. To
highlight how Fintech data security and privacy elements embedded in open
banking operations enhance or hinder the use of Fintech for open banking,
this study proposes a research model and relevant propositions, as detailed
in the next section. This section presents the theoretical basis underpinning
the model’s development.

3.1 Technology Adoption Models


There is a spectrum from innovators to laggards in every industry [26], and in the Australian finance industry in particular, the reported below-average uptake is a concern in this study. Studies have revealed that, despite the ubiquity of Fintech offers, the low uptake could relate to a lack of security, privacy, transparency, availability, speed, and regulations at both the organisational and the customer level, as well as a lack of trust, education, knowledge, awareness, and risk-taking tendencies, apart from other variants such as geographical location, age, gender, and socio-physical and mental behavioural aspects [21].
This study intends to observe Fintech customers' perspectives on both the perceived and the actual usage of Fintech for open banking, emphasising factors impacting data security and privacy. Ample studies assess the behavioural intention to use Fintech or the continuous-usage intent [16, 27, 28]. However, there seems to be limited research assessing the actual usage of Fintech [18, 29], and our literature search strategy found none depicting the actual usage of Fintech open banking in the Australian context.
The existing open banking technologies literature in the Australian context provides conceptual frameworks that mainly focus on the behavioural intention to adopt new technologies; as such, the actual usage of technology with real customer experiences is not tested [16]. Although initial trust with structural assurance is considered an important construct, trust built and loyalty formed through experienced innovation are missing in the current literature.
A similar gap concerns the strengthening role required of government regulations in technology adoption. After reviewing some of the most renowned technology adoption models, such as the Technology Acceptance Model (TAM), Theory of Reasoned Action (TRA), Diffusion of Innovation theory (DOI), Theory of Planned Behaviour (TPB), Technology-Organisation-Environment (TOE), Task-Technology Fit Theory (TTF), Stimulus–Organism–Response Theoretical Framework (S–O–R), Unified Theory of Acceptance and Use of Technology (UTAUT), Trust Theory, and Social Cognitive Theory (SCT), it is noted that none of the models by itself covers all the aspects under review in this study. In the absence of a single comprehensive model [18, 29] that covers individuals' cognitive and behavioural aspects of actual Fintech adoption in a holistic construct that also includes trust, data security, privacy, and government regulatory governance, the proposed conceptual model may better support the individual customer's perspective; for this reason, the TAM model is considered as the basis for the proposed research.

3.2 Users’ Cognitive and Behavioural Perspectives


Users' cognitive and behavioural perspectives can be described as human factors and, as such, are considered equally important when adopting novel technologies. The S–O–R framework has been used by many researchers from different perspectives, for instance for the cognitive analysis of customers' emotional states, how customers encountered a service, and their perceptions of and responses to such technology assessments [30]. From the stimulus–organism–response point of view, this is an interesting basis for propositions about the customer trust factor: how it connects with data security and privacy strategies, how these translate into system features and capabilities, what perceptions users form that motivate them to use such systems, and thus how these perceptions could transform into users' intent to use the systems and into the actual usage of the technology. Therefore, a combination of constructs from various technology adoption models is selected to complement the proposed model and include the cognitive and behavioural perspectives, as previous studies ignored the human factors in the actual usage decision. This study investigates the effects of customer trust in data security and privacy, social influence, facilitating conditions, and government regulations, and assesses users' cognitive and behavioural perspectives on adoption intent as well as on the actual use of Fintech for open banking.

4 Research Model and Proposition


To address the research questions of this study, this section presents a research model that conceptualizes the position of Fintech for open banking and the data security and privacy strategies that can influence its use from the perspective of Fintech users. The following propositions are derived by synthesizing the constructs and the theoretical basis discussed in the previous sections. Figure 1 illustrates the proposed framework.

Fig. 1 Proposed framework—(extracted from UTAUT, TTF, S–O–R and trust theory)

4.1 Data Security and Privacy


One of the selected constructs (in Fig. 1), and a key influencer of the open banking adoption decision, is data security and privacy. As reflected in Sect. 2 of this paper, users of open banking may hesitate to share their data with third parties, especially since there may be concerns about how the customers' data and privacy are protected by the parties gaining such access. With the above contextualized view, the proposition below is derived concerning the use of Fintech for open banking.

H1: Data security and privacy have a positive influence on customers’ trust.

4.2 Government Regulations


Government intervention in bringing fair play and economies of scale to the financial and business world is an integral aspect of maintaining a country's socioeconomic and political stability [6]. It is also important in developing and supporting business longevity, logistics, and the technological infrastructure for sustainable development [27]. In open banking, the legislative regulations and procedures largely need to be embedded in the technology applications, so that compliance with them is at least pronounced through the small print of the consent documentation to which users are exposed at the time of their initial interactions [5, 6, 13, 19].

H2: Government regulations have a positive influence on customer trust.

4.3 Social Influence


Individuals living in a society tend to observe, adjust, adapt, and imitate others' behaviours to suit the social environment. This behavioural element is defined as social influence [31], which expresses how others' attitudes, beliefs, or behaviours suggest that an individual should follow a certain way. Venkatesh and Bala [32] developed TAM3 by combining TAM2 of Venkatesh and Davis [33] with Venkatesh's model of the base elements of perceived ease of use [34]; they adopt social influence as a factor that stimulates technology adoption intent, explaining the degree to which an individual perceives that others believe they should use innovative technology [34].
Social influence is known to be a critical influencer in building trust among family members, friends, and peers, and in users' decisions to use internet banking [23]. Studies empirically prove that social influence has a positive relation to the behavioural intention to use technology [28]. Venkatesh et al. indicate that social influence may not be significant in the case of voluntary adoption [32, 35]. Although a study on internet banking technology usage empirically proves a significant negative influence on actual use [18], this study does not expect to synthesize comparable results; thereby, the negative influence will not be tested.

H3: Social influence has a positive influence on customer trust.

4.4 Facilitating Conditions


The facilitating conditions construct is adapted from the concept of perceived behavioural control in Ajzen's [37] TPB model. It refers to consumers' perceptions of the organizational and technological infrastructure resources available to support the use of the system, including the training and education needed to perform, and it has a significant direct impact on use behaviour [32]. It has also been shown that facilitating conditions are strengthened by the support of government regulations when the two are considered together in Fintech users' adoption decisions [27]. Facilitating conditions have a positive effect on the customer trust element in using Fintech [38].

H4: Facilitating conditions have a positive influence on customer trust.

H5: Facilitating conditions have a positive influence on the customer's actual use of Fintech for open banking.

4.5 Users’ Trust


Trust is paramount in the banking industry, especially with the move from traditional brick-and-mortar banking services to more Fintech-oriented, modern, 'internet-only' banking [23, 25, 28, 39]. This change, however, can be an uphill task: since the services may be internet-only or web-based from their inception, customers' adoption trust arises either from their own instinct or from social influences, and thus winning customers' 'initial trust' is critical [23]. Hence, the proposed conceptual framework includes a trust theorem to evaluate the predisposition of trust in Fintech open banking adoption.

H6: Customer trust positively influences customers' intention to adopt Fintech for open banking.
4.6 Customer Use Experience
This study focuses not only on investigating customers' behavioural intention to use Fintech but also on their use behaviour. Customers' actual technology usage experience and the influence of customers who have already adopted the technology have an impact on the adoption decision [18]. The relationship between the user's behavioural intention and the actual use behaviour, from the consumer perspective, needs to be tested for open banking in this conceptual framework, to re-establish the previously tested hypotheses for Fintech services that were examined using the adapted technology acceptance model [18]. Hence, the proposition below is drawn based on previous studies:

H7: Customer's behavioural intention positively influences the actual usage of Fintech for open banking.

5 Research Design
Having conceived the research model, it is worth reflecting on the research design and approach relevant to addressing the research questions based on the proposed model. In this regard, this study takes an objectivist epistemological view and is concerned with causes and outcomes [40], especially intending to investigate the role of behavioural factors in enabling trust in Fintech and promoting Fintech adoption and use. Thus, this study will adopt a quantitative research approach to address this goal. A questionnaire will be developed for data collection, to measure the constructs and the relationships among the constructs presented in the research model, which combines multiple research models [41, 42]. Using an online data collection tool, the online survey questionnaire will be distributed to the customers of Australian financial organizations who use Fintech for open banking. This will help understand their views on the role of behavioural factors in promoting trust in technologies and their intent to accept and use Fintech for open banking. Upon collection, the data will be analyzed and validated using the Statistical Package for the Social Sciences (SPSS). Then, the survey data will be used to test the proposed research model and its nominated propositions by applying the Structural Equation Modelling (SEM) technique. The results from the survey data will be further compared with existing findings, and recommendations will be made accordingly.
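As an illustration of how the propositions could be encoded for SEM, the sketch below uses the Python package semopy with a lavaan-style model description. The construct and indicator names (tr1, ds1, and so on) are placeholders rather than the final survey items, and the study itself plans SPSS-based tooling, so this snippet is only one possible way to specify the structural part of the model.

```python
import pandas as pd
from semopy import Model

# Placeholder measurement and structural model covering H1-H7
MODEL_DESC = """
Trust =~ tr1 + tr2 + tr3
DataSecurityPrivacy =~ ds1 + ds2 + ds3
GovernmentRegulation =~ gr1 + gr2 + gr3
SocialInfluence =~ si1 + si2 + si3
FacilitatingConditions =~ fc1 + fc2 + fc3
Intention =~ bi1 + bi2 + bi3
ActualUse =~ au1 + au2

Trust ~ DataSecurityPrivacy + GovernmentRegulation + SocialInfluence + FacilitatingConditions
Intention ~ Trust
ActualUse ~ Intention + FacilitatingConditions
"""

responses = pd.read_csv('survey_responses.csv')   # hypothetical cleaned survey data
model = Model(MODEL_DESC)
model.fit(responses)                              # estimates the path coefficients
print(model.inspect())                            # parameter estimates and p-values
```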

6 Conclusion
This study has reviewed previous literature relating to Fintech data security and privacy strategies and to the cognitive and behavioural challenges customers face concerning the adoption of Fintech for open banking. The literature revealed the importance of human factors in technology adoption decisions and identified research gaps. It can be seen from the literature that there are limited studies on the challenges faced by the customers of Australian financial organizations. Even though Australia has been an early adopter of the regulatory sandbox in Fintech open banking compared to other countries such as the UK, Europe, and the USA, there appears to be a consumer adoption lag in Australian Fintech open banking. A research model is proposed in this paper to assess the Fintech adoption intention for open banking and the actual adoption criteria of Australian Fintech users by examining the impact of data security and privacy strategies and government regulations on customer trust. Further, the model also expects to predict the impact of social influence and facilitating conditions on customer trust and thereby on the actual adoption decision. This study intends to use a quantitative method, with online survey questionnaires for data collection. The propositions derived from the proposed model will be tested using the structural equation modelling technique to establish the impact of each variable.
This study has some limitations. First, the study focuses on the Australian context and collects data from the customers of Australian financial organizations; the findings may not be relevant to other countries. Even so, the research is important given the lack of similar research in the Australian context, and testing the research findings across contexts is a potential follow-up for future research. Second, the proposed study adopts a quantitative method, which may result in offering inconclusive evidence. Adopting a mixed method, such as data collection through interviews, could enable the researchers to seek in-depth answers from the participants; this is another possible future undertaking. However, with a solid theoretical basis dictating the research model and the use of SEM, which may offer important information on the relationships among the factors presented in the research model, this research is expected to highlight consumers' different cognitive behaviours concerning Fintech in open banking and thereby help reshape the data security and privacy strategies of the banking sector. The outcomes may also guide the enactment of government regulations that cater to different values on Fintech adoption in open banking in Australia.

References
1. Fracassi C, Magnuson W (2021) Data autonomy. Vanderbilt Law Rev 74(2):327–383
2.
Brown E, Piroska D (2022) Governing Fintech and Fintech as governance: the regulatory
sandbox, riskwashing, and disruptive social classification. New Polit Econ 27(1):19–32. https://​
doi.​org/​10.​1080/​13563467.​2021.​1910645
3.
Solove DJ (2006) A taxonomy of privacy. Univ Pa Law Rev 154(3):477–564
[Crossref]
4.
Di Porto F, Ghidini G (2020) “I access your data, you access mine”: requiring data reciprocity in
payment services. IIC Int Rev Intellect Prop Compet Law 51:307–329. https://​doi.​org/​10.​1007/​
s40319-020-00914-1
5.
Aytaş B, Öztaner SM, Şener E (2021) Open banking: opening up the walled gardens. J Paym
Strat Syst 15(4):419–431
6.
Palmieri A, Nazeraj B (2021) Open banking and competition: an intricate relationship. In: EU
and comparative law issues and challenges series, Osijek, JJ, Strossmayer University of Osijek,
pp 217–237
7.
Barr MS, DeHart A, Kang A (2019) Consumer autonomy and pathways to portability in banking
and financial services. University of Michigan Center on Finance, Law & Policy Working Paper,
University of Michigan Law & Economics Research Paper, no. 19-022. https://​doi.​org/​10.​2139/​
ssrn.​3483757
8.
Borgogno O, Colangelo G (2020) Consumer inertia and competition-sensitive data governance:
the case of open banking. J Eur Consum Mark Law 9(4):143–150. https://​doi.​org/​10.​2139/​ssrn.​
3513514
[Crossref]
9.
Wang H, Ma S, Dai HN, Imran M, Wang T (2020) Blockchain-based data privacy management
with nudge theory in open banking. Futur Gener Comput Syst 110:812–823. https://​doi.​org/​10.​
1016/​j.​future.​2019.​09.​010
[Crossref]
10.
PwC (2020) Financial services technology 2020 and beyond: embracing disruption. 48
11.
Wang JS (2021) Exploring biometric identification in FinTech applications based on the
modified TAM. Financ Innov 7(42). https://​doi.​org/​10.​1186/​s40854-021-00260-2
12.
Arner D, Buckley R, Charamba K, Sergeev A, Zetzsche D (2022) Governing Fintech 4.0:
bigtech, platform finance, and sustainable development. Fordham J Corp Financ Law 27(1):1–71
13.
Jenga L (2022) Open banking. Oxford University Press, Oxford, United States: Incorporated.
http://​ebookcentral.​proquest.​com/​lib/​cqu/​detail.​action?​do-cID=​7034203. Accessed 7 Aug 2022
14.
Stewart H, Jürjens J (2018) Data security and consumer trust in FinTech innovation in Germany.
Inf Comput Secur 26:109–128. https://​doi.​org/​10.​1108/​ICS-06-2017-0039
[Crossref]
15.
Barbu CM, Florea DL, Dabija DC, Barbu MC (2021) Customer experience in Fintech. J Theor
Appl Electron Commer Res 16(5):1415–1433. https://​doi.​org/​10.​3390/​jtaer16050080
16.
Chan R, Troshani I, Rao HS, Hoffmann A (2022) Towards an understanding of consumers’
FinTech adoption: the case of Open Banking. Int J Bank Mark 40:886–917. https://​doi.​org/​10.​
1108/​IJBM-08-2021-0397
[Crossref]
17.
Li G, Dai JS, Park EM, Park ST (2017) A study on the service and trend of Fintech security
based on text-mining: focused on the data of Korean online news. J Comput Virol Hacking Tech
13(4):249–255. https://​doi.​org/​10.​1007/​s11416-016-0288-9
[Crossref]
18.
Singh S, Sahni MM, Kovid RK (2020) What drives FinTech adoption? A multi-method
evaluation using an adapted technology acceptance model. Manag Decis 58:1675–1697. https://​
doi.​org/​10.​1108/​MD-09-2019-1318
[Crossref]
19.
Kassab M, Laplante PA (2022) Open banking: what it is, where it’s at, and where it’s going.
Computer 55:53–63. https://​doi.​org/​10.​1109/​MC.​2021.​3108402
[Crossref]
20.
Ernst & Young Global Limited (2019) Global FinTech Adoption Index 2019. Ernst & Young
Global Limited, London, UK. https://​assets.​ey.​com/​content/​dam/​ey-sites/​ey-com/​en_​gl/​topics/​
banking-and-capital-markets/​ey-global-fintechadoption-index.​pdf. Accessed 7 Aug 2022
21.
Gomber P, Robert J, Kauffman C, Bruce WW (2018) On the Fintech revolution: Interpreting the
forces of innovation, disruption, and transformation in financial services. J Manag Inf Syst
35(1):220–265. https://​doi.​org/​10.​1080/​07421222.​2018.​1440766
22.
Ng AW, Kwok BK (2017) Emergence of Fintech and cybersecurity in a global financial centre. J
Financ Regul Compliance 25:422–434. http://​dx.​doi.​org.​ezproxy.​cqu.​edu.​au/​10.​1108/​JFRC-01-
2017-00132
23.
Kaabachi S, Ben MS, O’Leary B (2019) Consumer’s initial trust formation in IOB’s acceptance:
the role of social influence and perceived compatibility. Int J Bank Mark 37:507–530. https://​doi.​
org/​10.​1108/​IJBM-12-2017-0270
[Crossref]
24.
Li Z, Li W, Wen Q, Chen J, Yin W, Liang K (2019) An efficient blind filter: location privacy
protection and the access control in FinTech. Futur Gener Comput Syst 100:797–810. https://​doi.​
org/​10.​1016/​j.​future.​2019.​04.​026
[Crossref]
25.
Lee JM, Kim HJ (2020) Determinants of adoption and continuance intentions toward internet-
only banks. Int J Bank Mark 38:843–865. https://​doi.​org/​10.​1108/​IJBM-07-2019-0269
[Crossref]
26.
Rogers EM (1995) Diffusion of innovations (4th ed.) New York: Free Press
27.
Kurniasari F, Tajul Urus S, Utomo P, Abd Hamid N, Jimmy SY, Othman IW (2022) Determinant
factors of adoption of Fintech payment services in Indonesia using the UTAUT approach. Asia-
Pac Manag Account J 17:97–125
28.
Peong KK, Peong KP, Tan K, Tan Y (2021) Behavioural intention of commercial banks’
customers towards financial technology services. J Financ Bank Rev JFBR 5:10–27. https://​doi.​
org/​10.​35609/​jfbr.​2021.​5.​4(2)
29.
Turner M, Kitchenham B, Brereton P, Charters S, Budgen D (2010) Does the technology
acceptance model predict actual use? A systematic literature review. Inf Softw Technol 52:463–
479. https://​doi.​org/​10.​1016/​j.​infsof.​2009.​11.​005
[Crossref]
30.
Roberts-Lombard M, Petzer DJ (2021) Relationship marketing: an S–O–R perspective
emphasising the importance of trust in retail banking. Int J Bank Mark 39:725–750
[Crossref]
31.
Kelman HC (1958) Compliance, identification, and internalization: three processes of attitude
change. J Conflict Resolut 2(March):51–60
[Crossref]
32.
Venkatesh V, Bala H (2008) Technology acceptance model 3 and a research agenda on
interventions. Decis. Sci. 39(2):273–315. https://​doi.​org/​10.​1111/​j.​1540-5915.​2008.​00192.​x
33.
Venkatesh V, Davis FD (2000) A theoretical extension of the technology acceptance model: four
longitudinal field studies. Manage Sci 46(2):186–204
34.
Venkatesh V (2000) Determinants of perceived ease of use: integrating control, intrinsic
motivation, and emotion into the technology acceptance model. Inf Syst Res 11(4):342–365
35.
Venkatesh V, Morris MG, Davis GB, Davis FD (2003) User acceptance of information
technology: toward a unified view. MIS Quarterly 27:425–478
36.
Zhang Q, Li Y, Wang R, Liu L, Tan Y, Hu J (2021) Data security sharing model based on privacy
protection for blockchain-enabled industrial internet of things. Int J Intell Syst 36:94–111
[Crossref]
37.
Ajzen I (1985) From intentions to actions: A theory of planned behavior. In J. Kuhl & J.
Beckman (Eds.) Action-control. From cognition to behavior Heidelberg: Springer 11–39
38.
Lai PC (2017) The literature review of technology adoption models and theories for the novelty
technology. J Inf Syst Technol Manag 14:21–38. https://​doi.​org/​10.​4301/​S1807-
1775201700010000​2
[Crossref]
39.
Kassab M, Laplante P (2022) Trust considerations in open banking. IT Prof 24:70–73. https://​
doi.​org/​10.​1109/​MITP.​2021.​3136015
[Crossref]
40.
Yilmaz K (2013) Comparison of quantitative and qualitative research traditions: epistemological,
theoretical, and methodological differences. Eur J Educ 48(2):311–325. https://​doi.​org/​10.​1111/​
ejed.​12014
[Crossref]
41.
O’Leary Z (2021) The essential guide to doing your research project, 4th edn. SAGE
Publications Ltd
42.
Wahyuni D (2012) The research design maze: understanding paradigms, cases, methods and
methodologies. J Appl Manag Account Res 10(1):69–80. https://​ssrn.​com/​abstract=​210308

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed
Computing , Studies in Computational Intelligence 1153
https://doi.org/10.1007/978-3-031-56388-1_6

Using Key Point Detection to Extract Three Dimensional Phenotypes of Corn
Yuliang Gao1 , Zhen Li1 , Seiichi Serikawa1 , Bin Li2 and
Lifeng Zhang1
(1) Kyushu Institute of Technology, Kitakyushu, Japan
(2) Yangzhou University, Yangzhou, China

Yuliang Gao (Corresponding author)


Email: [email protected]

Zhen Li
Email: [email protected]

Seiichi Serikawa
Email: [email protected]

Bin Li
Email: [email protected]

Lifeng Zhang
Email: [email protected]

Abstract
Corn 3D phenotype extraction faces several problems, including low precision, excessive manual involvement, long processing times, and the requirement for complex equipment. To address these issues, we propose a novel key point detection deep learning model called V7POSE-GSConv, which operates on RGB-D data. This method is built upon the YOLOv7-POSE key point detection model, allowing us to directly capture the key points of corn plants using RGB data as input. Leveraging the corresponding RGB-D data, this work can derive the comprehensive structure of corn and extract parameters such as leaf length, leaf angle, plant height, and ear phenotype. In our experiments, the YOLOv7-POSE model achieved a recognition accuracy of 99.45% in the training stage. To further optimize the model for efficiency while maintaining accuracy, we introduced GSConv. The results demonstrate a 2% reduction in the number of model parameters in V7POSE-GSConv, with no loss in accuracy.

Keywords 3D Phenotype – Key point detection – YOLOv7-POSE

1 Introduction
Corn is one of the most vital staple foods globally, and its products and derivatives find application in various fields. Increasing corn yield through breeding has always been a focus of research. By studying phenotypic parameter differences among corn plants, this line of research enables targeted crop improvement, demand-driven breeding, and increased crop yields. Rapidly acquiring a substantial number of plant parameters is a significant goal in plant phenomics. In the era of information technology, plant phenomics plays a pivotal role in breeding [1].
Traditional methods for obtaining phenotypic parameters in corn breeding rely heavily on manual measurements using rulers and protractors. These methods suffer from inefficiency, measurement errors, and damage to the plants. A more advanced approach involves obtaining high-precision plant point clouds through 3D reconstruction. However, 3D reconstruction comes with disadvantages, including expensive equipment, specific site requirements, susceptibility to weather conditions, and complex procedures. With the evolution of deep learning, key point detection has emerged as a prominent task. End-to-end key point detection models can directly use RGB data as input and output the key points of corn plants. Utilizing the corresponding RGB-D data, we can directly output corresponding 3D key points. These 3D key points facilitate the rapid extraction of relevant plant phenotypes. Key point detection-based methods have advantages such as cost-effectiveness, fast extraction speed, and robustness. Consequently, the development of corn key point identification technology is crucial for the swift and convenient extraction of corn plant phenotypes.
In the realm of 2D key point detection, Toshev et al. [2] pioneered the application of Convolutional Neural Networks (CNNs) for human key point detection and introduced the DeepPose algorithm for deep pose estimation. A Deep Neural Network (DNN) [3] is first employed for rough human key point detection, using DNN regression to predict key point positions in the human body. Under consistent image sizes and unchanged computational parameters, cascaded regressors then refine the human body key points with higher accuracy. DeepPose represents a pivotal shift from traditional to deep learning methods for human key point detection. Wei et al. [4] introduced the Convolutional Pose Machine (CPM), which derives image features and spatial context directly from the study data. CPM initially approximates human body key point locations through multi-stage estimation and progressively refines these positions at each stage. By combining multiple convolutional networks, CPM addresses the problem of gradients diminishing as the layer count rises. Newell et al. [5] introduced the Stacked Hourglass Network (SHN), which stacks multiple hourglass networks in a cascaded manner to leverage multi-scale features in human body key point detection. Lifshitz et al. [6] introduced Deep Consensus Voting, which utilizes high-density multi-object voting instead of sparse key point location sets for detecting human body key points. It not only offers effective detection but also computes the probability of joint key points related to the detection result. Zhang et al. [7] proposed a fast pose estimation model called Fast Pose Distillation, a lightweight human key point detection model utilizing four hourglass models, suitable for low-compute environments. This network effectively reduces the network complexity associated with human key point detection. Thus far, research in two-dimensional single-person key point detection based on deep learning methods is comprehensive, with the primary focus on improving detection accuracy and precision.
Classical networks such as YOLOv5 [8] and YOLOv7 [9] also include structures specifically designed for key point detection tasks, and they all achieve outstanding results.

2 Data Collection and Processing


The corn photos used in this study were obtained from the experimental field (Fig. 1) at Yangzhou University (Fig. 2), and the data were captured using the Stereolab Zed2i binocular camera (Fig. 3). The Zed2i is a specialized binocular stereo camera known for its lightweight, flexible design and strong reliability. It is well suited for outdoor use in varying light conditions and provides images whose pixel counts and resolutions meet the requirements for algorithm development.
These photos were taken during the spring of 2023 in the experimental corn field, specifically featuring the Surunuo No. 1 corn variety. They were captured at the Yangzhou University experimental field, located in Yangzhou, Jiangsu, China. The photos predominantly show a frontal view of the corn plants, allowing a clear and detailed observation of each leaf and the entire corn stalk.
In total, 200 sets of corn photographs were collected for this study. Each set includes a left view, a right view, and a depth map, as shown in Fig. 4. It is worth noting that the internal camera parameters for each set of photos are consistent within the margin of error, ensuring the quality and reliability of the collected data.
Fig. 1 The experimental field

Fig. 2 Location of Yangzhou University experimental place

Fig. 3 Stereolab Zed2i binocular camera


Fig. 4 Depth map

2.1 Data Annotation and Enhancement


Each corn plant was marked with 7 key points, as shown in Fig. 5:
(1) Root point: the point where the corn plant's root connects to the ground.

(2) Top point: the highest point of the corn stalk.

(3) Leaf connection point: the point where a leaf is attached to the main stem of the corn plant.

(4) Highest point of a leaf: the uppermost point on a leaf.

(5) Angle point: the point located one quarter of the distance from the base to the apex of the leaf.

(6) Tip point: the end of a leaf blade.

(7) Stalk point: the point where the root of a corn ear connects to the main stem of the corn.

Fig. 5 Key point example


In this paper, the main phenotypes measured are as follows:
(1) Plant height: the vertical distance between the root point and the top point of the plant, as shown in Fig. 6.

(2) Leaf angle: the degree of inclination between the midrib of a leaf blade and the main stem of the corn plant. It plays a critical role in shaping the canopy structure and can have a direct impact on crop yield [11]. As shown in Fig. 7, we calculate it as the angle formed by three points: the leaf connection point, the angle point, and the top point.

(3) Leaf length: the length of the leaf. Since the blade is curved in its normal state, it is measured manually by straightening the blade. As shown in Fig. 8, we calculate the blade length from the Euclidean distances between the leaf connection point, the angle point, the highest point of the leaf, and the tip point.

(4) Ear position: the specific section or segment of the corn plant where the ear grows and develops. As shown in Fig. 9, we calculate it from the root point and the stalk point.

Fig. 6 Calculation of plant height


Fig. 7 Calculation of leaf angle
Fig. 8 Calculation of leaf length
Fig. 9 Calculation of ear position
These phenotypes define the parameters used to characterize and analyze corn plants, aiding a more precise understanding of their growth and characteristics. A minimal computation sketch for these phenotypes is given below.
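The sketch assumes 3D key points in a camera-frame coordinate system in which the second coordinate is (approximately) the vertical axis, and takes the leaf angle at the leaf connection point between the stem direction and the leaf direction; these conventions are our assumptions, not prescribed by the paper.

```python
import numpy as np

def _dist(p, q):
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(q, float)))

def plant_height(root, top):
    # vertical distance between the root point and the top point
    # (assumes the second coordinate is the vertical axis)
    return abs(float(top[1]) - float(root[1]))

def leaf_angle(connection, angle_point, top):
    # angle in degrees at the leaf connection point between the stem
    # direction (towards the top point) and the leaf direction
    # (towards the angle point)
    v_stem = np.asarray(top, float) - np.asarray(connection, float)
    v_leaf = np.asarray(angle_point, float) - np.asarray(connection, float)
    cos_a = np.dot(v_stem, v_leaf) / (np.linalg.norm(v_stem) * np.linalg.norm(v_leaf))
    return float(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0))))

def leaf_length(connection, angle_point, highest, tip):
    # piecewise-linear approximation of the curved blade
    return _dist(connection, angle_point) + _dist(angle_point, highest) + _dist(highest, tip)

def ear_position(root, stalk):
    # distance from the root point to the ear insertion (stalk point)
    return _dist(root, stalk)
```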
The data are labeled according to the YOLO-POSE format. To enhance the dataset, each photo is flipped left and right, and we train our model using both the original and the flipped images. The training process therefore takes into account the flipped images and their adjusted key point coordinates, as sketched after this paragraph.
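The horizontal flip and the corresponding key point adjustment can be sketched as follows; the label layout assumed here (normalised box followed by x, y, visibility triplets per key point) follows the usual YOLO-pose convention, and the exact format used in practice may differ.

```python
import cv2

def flip_sample(image, labels):
    """Flip an image left-right and mirror its normalised pose labels.

    Each label row is assumed to be
    [cls, xc, yc, w, h, kx1, ky1, v1, kx2, ky2, v2, ...],
    with all coordinates normalised to [0, 1].
    """
    flipped = cv2.flip(image, 1)          # flipCode=1: horizontal flip
    mirrored = []
    for row in labels:
        cls, xc, yc, w, h, *kpts = row
        new_kpts = []
        for i in range(0, len(kpts), 3):
            kx, ky, v = kpts[i:i + 3]
            new_kpts += [1.0 - kx if v > 0 else kx, ky, v]
        mirrored.append([cls, 1.0 - xc, yc, w, h] + new_kpts)
    return flipped, mirrored
```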

3 V7POSE-GSConv for Key Point Detection


3.1 YOLOv7-POSE
We take YOLOv7-POSE (Fig. 10) as the backbone for key point detection. The YOLOv7-POSE network structure consists of two parts: the backbone network and the head network.
(1) Backbone Network: The backbone network is responsible for processing the input image and extracting features at different scales. It consists of several CBS modules, each comprising a convolutional layer (Conv), Batch Normalization (BN), and the SiLU (Sigmoid-Weighted Linear Unit) activation function. As shown in Fig. 11, the CBS module, consisting of Conv, BN, and SiLU, is used for feature extraction and plays a crucial role in capturing features from the input image (a minimal sketch of this block is given after Fig. 13).

(2) Head Network: The head network is where the key point detections are output. It involves the concatenation of feature maps and the ELAN module (Fig. 12) to extract feature maps at different scales, and then uses the concat and ELAN-H modules (Fig. 13) to extract the final feature map for regression.
Fig. 10 YOLOv7-POSE model

Fig. 11 CBS module

Fig. 12 ELAN module


Fig. 13 ELAN-H module

3.2 GS-ConV
GSConv [10] (Fig. 14) is a module employed to improve the trade-off between model accuracy and computational efficiency. It optimizes the convolutional operations within a neural network through a multi-step process; a sketch of the module is given after Fig. 14.
First, GSConv applies a standard convolutional layer. It then applies DWConv (depthwise convolution), a convolution designed to process feature maps efficiently.
To combine the benefits of both convolution types, GSConv concatenates the outputs of these two convolutional layers to capture the essential features.
Finally, GSConv applies a shuffle operation that rearranges the channel order of the concatenated feature map so that corresponding channels from the two preceding convolutions are placed adjacent to each other, enhancing information flow and enabling more efficient computation.

Fig. 14 GS-ConV module
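Below is a minimal PyTorch sketch of a GSConv-style block consistent with the description above. Channel counts, kernel sizes, and the exact shuffle implementation are assumptions based on common implementations and may differ from the official GSConv or from the authors' re-implementation.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """GSConv-style block: standard conv + depthwise conv, concatenation, channel shuffle."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        # Standard convolution producing half of the output channels
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )
        # Depthwise convolution applied to the output of the first branch
        self.dwconv = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )

    def forward(self, x):
        x1 = self.conv(x)
        x2 = self.dwconv(x1)
        y = torch.cat((x1, x2), dim=1)  # (B, c_out, H, W)
        # Channel shuffle: interleave channels coming from the two branches
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```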

3.3 V7POSE-GSConV
To address the requirement for model miniaturization, we substituted the CBS module with the GSConv module in the re-implemented YOLOv7-POSE model, as shown in Fig. 15. This replacement reduced the model's parameter count by 2% without compromising accuracy.

Fig. 15 V7POSE-GSConV model

4 Experiments and Results Analysis


The experimental results in Table 1 show that the YOLOv7-POSE model, with 37 million parameters, achieved key point precision and recall of 99.40% and 99.15%, respectively. The V7POSE-GSConv model, with 36.2 million parameters, performed comparably, with key point precision and recall of 99.45% and 99.10%, respectively. The mean average precision (mAP) of both models reached 99.56%.
The high mAP indicates that the models attain an excellent level of accuracy in object detection, and the high precision of key point detection shows that they identify key points accurately.

4.1 Key Point Detection Result

4.2 3D Phenotype Result


Based on the 2D coordinates obtained from key point detection, 3D key point coordinates can be extracted from the RGB-D data; a sketch of this back-projection is given below. This 3D coordinate information is the basis for calculating the phenotypic data of corn plants. The error between the 3D phenotypes obtained from the detected key points and the annotated values is shown in Table 2. The results indicate that the key point detection model recovers the 3D phenotypes from the RGB-D data well.
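As a rough illustration, the back-projection from a detected 2D key point to a camera-frame 3D coordinate can be sketched as follows. It assumes the depth map is aligned with the RGB image and expressed in metres, and that the camera intrinsics (fx, fy, cx, cy) come from the ZED 2i calibration; handling of invalid depth values is omitted.

```python
import numpy as np

def keypoint_to_3d(u, v, depth_map, fx, fy, cx, cy):
    # Depth (metres) at the detected pixel; rows index v, columns index u
    z = float(depth_map[int(round(v)), int(round(u))])
    # Pinhole back-projection using the camera intrinsics
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```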
Table 1 Key point detection result

Model Precision (%) Recall (%) mAP (%) Parameters
YOLOv7-POSE 99.40 99.15 99.56 37.0 M
V7POSE-GSConv 99.45 99.10 99.56 36.2 M

Table 2 3D phenotype result

Phenotype result
Generated value Annotated value
Plant height 1.56 m 1.62 m
Leaf angle
Leaf length 0.691 m 0.732 m
Ear position 0.619 m 0.588 m
5 Conclusion
In this paper, we present a corn key point detection model that achieves an
impressive precision rate of 99.4%. With the corresponding RGB-D data,
this model enables the rapid acquisition of three-dimensional corn
phenotypic data. Compared to traditional manual calculations, our proposed
method offers convenience and speed. In contrast to point cloud-based
methods, our approach demonstrates superior robustness and applicability
in outdoor environments. Additionally, we have addressed the issue of
excessive parameters to accommodate the miniaturization of the model by
incorporating the GS-ConV module, resulting in a 2% reduction in the parameter count while maintaining accuracy. However, our work is ongoing; further efforts are underway to obtain more accurate 3D phenotypes and to adapt the method to more scenarios.

References
1. Zermas D et al (2020) 3D model processing for high throughput phenotype extraction. The case
of corn. Comput Electron Agric 172(2020):105047
2.
Toshev A, Szegedy C et al (2014) Deeppose: human pose estimation via deep neural networks.
In: 2014 IEEE conference on computer vision and pattern recognition. Institute of Electrical and
Electronic Engineers. IEEE, Columbus, US, pp 1653–1660
3.
Krizhevsky A, Sutskever I, Hinton GE et al (2017) ImageNet classification with deep
convolutional neural networks. Commun ACM 60(6):84–90
4.
Wei SE, Ramakrishna V, Kanade T et al (2016) Convolutional pose machines. In: Proceedings of
the 2016 IEEE conference on computer vision and pattern recognition. Institute of Electrical and
Electronics Engineers. IEEE, Washington, pp 4724–4732
5.
Newell A, Yang KY, Deng J et al (2016) Stacked hourglass networks for human pose estimation.
In: Proceedings of the 14th European conference on computer vision. Springer, Amsterdam,
Berlin, pp 483–499
6.
Lifshitz I, Fetaya E, Ullman S et al (2016) Human pose estimation using deep consensus voting.
In: European conference on computer vision Amsterdam. ECCV, Berlin, Germany, pp 246–260
7.
Zhang F et al (2019) Fast human pose estimation. In: Proceedings of the 2019 IEEE/CVF
conference on computer vision and pattern recognition. Institute of Electrical and Electronics
Engineers. IEEE Computer Soc, USA, pp 3517–3526
8.
Wu Wentong et al (2021) Application of local fully convolutional neural network combined with
YOLO v5 algorithm in small target detection of remote sensing image. PloS One
16(10):e0259283
9.
Wang C-Y, Bochkovskiy A, Liao H-YM (2022) YOLOv7: trainable bag-of-freebies sets new
state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition, 2023
10.
Li H et al (2022) Slim-neck by GSConv: a better design paradigm of detector architectures for
autonomous vehicles. arXiv:​2206.​02424
11.
Liu K, Cao J, Yu K et al (2019) Wheat TaSPL8 modulates leaf angle through auxin and
brassinosteroid signaling. Plant Physiol 181(1)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed
Computing , Studies in Computational Intelligence 1153
https://doi.org/10.1007/978-3-031-56388-1_7

Optic Cup Segmentation from Fundus


Image Using Swin-Unet
Xiaozhong Xue1 , Linni Wang2 , Ayaka Ehiro1 , Yahui Peng3 and
Weiwei Du1
(1) Information Science, Kyoto Institute of Technology, Kyoto, Japan
(2) Retina and Neuron-Ophthalmology, Tianjin Medical University Eye
Hospital, Tianjin, China
(3) Electronic and Information Engineering, Beijing Jiaotong University,
Beijing, China

Xiaozhong Xue
Email: [email protected]

Linni Wang
Email: [email protected]

Ayaka Ehiro
Email: [email protected]

Yahui Peng
Email: [email protected]

Weiwei Du (Corresponding author)


Email: [email protected]

Abstract
Glaucoma is one of the main causes of blindness; it is characterized by an increase in the area ratio of the optic cup (OC) to the optic disc (OD). OC segmentation is therefore an important basis for computer-aided diagnosis of glaucoma. However, the accuracy of OC segmentation still needs to be improved because of the low contrast, occlusion by blood vessels, and varying scales of the OC in fundus images. This study applies swin-unet to OC segmentation. Swin-unet is composed of a U-shaped structure and swin-transformer blocks. The U-shaped structure extracts features at different scales and fuses a large number of features, addressing the difficulties of low contrast and varying scales. In addition, the swin-transformer block extracts features of the relationships between patches, which addresses the difficulty of occlusion by blood vessels. In this study, swin-unet obtained a higher mean intersection over union (IoU) and mean DICE than U-Net and swin-transformer, demonstrating its effectiveness for OC segmentation.

Keywords Optic cup segmentation – Swin-unet – Glaucoma

1 Introduction
A fundus image is a photograph of the retina. In clinical practice, ophthalmologists generally use fundus images to diagnose various fundus diseases, such as glaucoma [1]. Figure 1 shows an example of a fundus image, in which various lesions and organs of the retina can be observed. The optic disc (OD) is a bright, oval, yellowish region in fundus images, and the optic cup (OC) is a brighter part within the OD. Physiologically, the OD and OC are the regions where blood vessels and optic nerves pass through the retina [10]. Blood vessels appear as red lines scattered throughout the retina. The fovea appears as a darker area on the retina and is the most sensitive area of vision [11].
Fig. 1 Fundus image
Glaucoma is one of the main causes of vision loss in the world. Because the damage that leads to blindness is incremental and insidious, glaucoma is also called the "silent thief of vision" [20]. However, if glaucoma is found and treated in time, blindness can be effectively prevented. One symptom of glaucoma is increased intraocular pressure [2]. Because the OC is a concave structure on the retina, the degree of OC concavity increases as intraocular pressure increases. Therefore, as shown in Fig. 2, in the fundus image of a glaucoma patient the area ratio of OC to OD increases [23], which is an important basis for computer-aided diagnosis of glaucoma. Furthermore, segmentation of the OD and OC is a prerequisite for calculating their area ratio.

Fig. 2 The comparison of normal and glaucoma fundus images, the blue curve and red curve are
boundaries of OD and OC, respectively (a is an example of normal, b is an example of glaucoma)

Many researchers have proposed and improved approaches for OD segmentation. For example, Zhang et al. [25] proposed a hybrid level set method, evaluated by mean intersection over union (IoU) on the DRISHTI-GS [19] dataset. However, as shown in Figs. 3 and 4, because of the difficulties of low contrast (as shown in Fig. 3a, the boundary between OD and OC is quite blurred), coverage by blood vessels (as shown in Fig. 3b, the boundary of the OC is extensively obstructed by blood vessels), and multiple scales (as shown in Fig. 4, there are significant differences in the scale of the OC across fundus images), existing methods cannot generate ideal results for OC segmentation. Some approaches have been proposed to address these difficulties in a targeted manner. For example, Raj et al. [17] use the kinks of blood vessels to detect the boundary of the OC. This method addresses the difficulties of low contrast and of the OC being covered by blood vessels. However, it needs to segment the blood vessels first, and not every fundus image has obvious kinks. Many different deep learning models have been applied to OC segmentation, such as NENet [15], PKRT-Net [13], FCN [18], and U-Net [16]. However, these papers do not explicitly discuss whether the three difficulties are resolved.
The swin-unet [4] can address the above three difficulties in a targeted manner. Swin-unet consists of a U-shaped structure and swin-transformer blocks. The swin-transformer block can extract features of the relationship between patches. These features can effectively distinguish whether each section of a blood vessel belongs to the OC region or a non-OC region; therefore, the swin-transformer block addresses the difficulty of coverage by blood vessels. In the process of image size transformation, the U-shaped structure extracts features at different scales, which helps address the difficulty of the varying scale of the OC. At the same time, the skip connection mechanism of the U-shaped structure combines numerous features, which is effective for addressing the difficulty of low contrast. The swin-unet achieved segmentation accuracy, in terms of mean IoU and mean DICE, that is improved compared with U-Net and swin-transformer, which demonstrates its effectiveness for OC segmentation.
In addition, the lack of a large amount of high-quality data is one of the constraints preventing deep learning models from being fully and effectively trained, and OC segmentation is no exception. In this study, five publicly available datasets, DRISHTI-GS, ORIGA, RIM-ONE R3, REFUGE, and G1020, are integrated. Low-quality data are removed through pre-processing and all data are normalized. Finally, a dataset containing 2778 high-quality fundus images is obtained.
Fig. 3 The difficulties of OC segmentation (a is a low contrast case, b is a case of covered by blood
vessels. The red curves are boundaries of OC)

Fig. 4 The different scales of OC (The red curves are boundaries of OC)
The main contributions of this paper are as follows:
Swin-unet is applied to OC segmentation and its effectiveness is demonstrated.
The three difficulties in OC segmentation, namely low contrast, coverage by blood vessels, and multiple scales, are addressed with targeted solutions.
This paper is organized as follows: Sect. 2 briefly introduces related work on OC segmentation; Sect. 3 explains in detail the swin-unet applied to OC segmentation; Sect. 4 describes the experimental process and presents the experimental results; Sect. 5 draws conclusions.
2 Related Work
At present, OC segmentation algorithms are mainly divided into four categories: vessel kinking-based [6, 24], active contour model-based [14], clustering-based [5], and deep learning-based [12, 22] methods.
The OD and OC are depressed parts of the retina, and the slope of the OC depression is steeper than that of the OD. A kink appears where blood vessels pass through the boundary of OD and OC. Damon et al. [24] and Wong et al. [6] locate the OC boundary from these kinks. This approach addresses the difficulty of the OC being covered by blood vessels. However, because fundus images are taken at different angles, not every fundus image contains kinks. In addition, blood vessels must be segmented before detecting the kinks, and this pre-processing introduces additional errors. Mishra et al. [14] use an active contour model-based method and Chandrika and Nirmala [5] use a clustering-based method. However, these methods fail to segment OCs with low contrast. Furthermore, they need post-processing, such as ellipse fitting, to compensate for under-segmentation of the OC caused by vessel occlusion, and this post-processing also reduces the OC segmentation accuracy.
With the rapid development of deep learning, it has also been widely used in OC segmentation. Among the various deep learning models, those based on U-Net [12] and GAN [22] are the most common. Owing to its U-shaped structure, U-Net can extract features at different scales, and the skip connection mechanism increases the amount of extracted features. However, it cannot address the difficulty of the OC being covered by blood vessels. A GAN model using semi-supervised learning may help with vascular occlusion, but it does not perform well on fundus images with low contrast.
In summary, the three difficulties in OC segmentation (low contrast, multiple scales, and coverage by blood vessels) are still not solved.

3 OC Segmentation Using Swin-Unet


The structure of swin-unet is shown in Fig. 5; it can be seen directly that swin-unet is an improvement built on U-Net and swin-transformer. Therefore, swin-unet inherits the advantages of both models: multi-scale feature extraction and feature fusion (from U-Net) and the extraction of relationship features between patches (from swin-transformer). Swin-unet consists of patch partition, linear embedding, swin-transformer blocks, patch merging, patch expanding, and linear projection. Swin-unet has a U-shaped structure, and its features are mainly extracted by the swin-transformer blocks. The patch partition splits the image into patches. Patch merging and patch expanding down-sample and up-sample the feature maps, respectively. In Fig. 5, the formulas beside each block give the output size of the feature maps. The blocks of swin-unet are described in detail below. In the following, the notation "n×" means that a certain dimension of the feature map is changed to n times its original value; for example, "making the resolution 2×" means that the resolution of the feature map is doubled.
Fig. 5 The structure of swin-unet

Fig. 6 Two types of self-attention mechanisms (a is conventional self attention, while b is window
self attention from swin-transformer. Red squares: the range of implementing self attention)
Self-Attention: Self-attention was first proposed in the field of natural language processing [21] and later applied in computer vision [7]. In natural language processing, self-attention is usually computed over words, while in computer vision it is usually computed over small patches. The calculation of self-attention can be summarized in the following three steps (a numerical sketch is given after this list):
Encode each word or patch as a vector, so that a sentence or image is encoded as a matrix.
Multiply the encoded matrix by three trainable weight matrices W_Q, W_K, and W_V to obtain the matrices Q, K, and V.
Finally, compute the new feature map as Attention(Q, K, V) = softmax(QK^T / √d_k) V, where d_k is the dimension of K.
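A minimal NumPy sketch of these three steps, using the usual scaled dot-product formulation, is shown below; the weight matrices would normally be learned rather than supplied by hand.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of encoded patches.
    X: (N, d) matrix of N patch embeddings; Wq, Wk, Wv: (d, d_k) trainable weights."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # pairwise patch affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # new feature map
```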

Swin-Transformer Block: As shown in Fig. 6, the conventional vision transformer (ViT) splits the whole image into patches and uses all of them for self-attention. Unlike the conventional ViT, the swin-transformer block computes self-attention within a window, using only the patches contained in that window. However, the relationships between patches in different windows would then not be extracted, losing a large amount of information. To avoid losing the relationships between windows, the windows are shifted and self-attention is computed again within the shifted windows. Therefore, as shown in Fig. 7, a basic block is composed of two consecutive swin-transformer blocks. Each swin-transformer block is composed of layer normalization (LN), window multi-head self-attention (W-MSA) or shifted window multi-head self-attention (SW-MSA), a multi-layer perceptron (MLP), and residual connections. The calculation process of a basic block can be expressed as follows:

ẑ^l = W-MSA(LN(z^{l-1})) + z^{l-1}    (1)

z^l = MLP(LN(ẑ^l)) + ẑ^l    (2)

ẑ^{l+1} = SW-MSA(LN(z^l)) + z^l    (3)

z^{l+1} = MLP(LN(ẑ^{l+1})) + ẑ^{l+1}    (4)

Fig. 7 The structure of swin-transformer block


Patch Partition and Linear Embedding: Patch partition splits the image into non-overlapping patches of a fixed size. Each patch of the 3-channel color fundus image is then flattened, and linear embedding linearly maps the flattened patches to a dimension C, where C is a hyper-parameter.
Patch Merging: Patch merging is the process of down-sampling the feature map and is similar to pooling. As shown in Fig. 8, assume the input is a feature map. First, the feature map is split into four patches of the same size, and the pixels at the same location of each patch are grouped together to obtain four small feature maps. Second, these four feature maps are concatenated. Finally, layer normalization is applied and a linear mapping adjusts the concatenated dimension. After patch merging, the resolution of the feature map is halved, which completes the down-sampling; a sketch of this operation follows Fig. 8.
Fig. 8 Patch merging
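A minimal PyTorch sketch of this operation, following the standard Swin implementation, is given below; the output dimension of the linear reduction (2C here) is an assumption consistent with common Swin-Unet code rather than a detail stated in the text.

```python
import torch

def patch_merging(x, norm, reduction):
    """Swin-style patch merging: (B, H, W, C) -> (B, H/2, W/2, 2C).
    norm: nn.LayerNorm(4 * C); reduction: nn.Linear(4 * C, 2 * C, bias=False)."""
    x0 = x[:, 0::2, 0::2, :]   # pixels at the same location of each 2x2 group
    x1 = x[:, 1::2, 0::2, :]
    x2 = x[:, 0::2, 1::2, :]
    x3 = x[:, 1::2, 1::2, :]
    x = torch.cat([x0, x1, x2, x3], dim=-1)   # concatenate the four small feature maps
    x = norm(x)                                # layer normalization
    return reduction(x)                        # linear mapping over the concatenated channels
```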
Patch Expanding and Final Patch Expanding: Patch expanding is the process of up-sampling the feature map. First, a linear mapping enlarges the channel dimension of the input feature map; the feature map is then rearranged directly so that its resolution increases while its channel dimension decreases, and the feature map is thereby up-sampled. The final patch expanding is also an up-sampling process and works in the same way: the channel dimension is first enlarged by a linear mapping, and the feature map is then rearranged into a higher resolution and a lower channel dimension.
Linear Projection: Linear projection is a process that first increases and then reduces the dimension of the feature map. Because swin-unet splits the image into non-overlapping patches, the boundaries of the segmentation results are not smooth enough. The purpose of this process is to let the model learn detailed boundary features and make the segmentation result smoother. Linear projection contains three convolution layers: the first increases the dimension, and the following two reduce it.
Skip Connection: The skip connection combines the feature maps of the down-sampling (encoding) and up-sampling (decoding) processes. Taking one skip connection as an example, the down-sampling and up-sampling feature maps are concatenated directly, which doubles the channel dimension of the feature map, and a linear mapping then reduces the dimension back.

Fig. 9 The example of blood vessels patches

As shown in Fig. 9, the small squares are examples of patches. In this figure, the blue square and the red square are patches of blood vessels, and the green squares and black squares are patches of OD areas and OC areas, respectively. The blue curve is the boundary of the OD, while the red curve is the boundary of the OC. Swin-unet can learn the features of the relationship between the blue square and the green squares, as well as between the red square and the black squares. These features may be useful for distinguishing whether blood vessels belong to OD areas or OC areas, which means the difficulty of the OC being covered by blood vessels can be addressed by swin-unet. Before the swin-transformer blocks there are patch merging blocks, which rescale the dimension and resolution of the feature maps. This enables the swin-transformer to extract features at different scales, which helps address the difficulty of the varying scale of the OC. The left side of swin-unet is the encoding process, while the right side is the decoding process. During the forward pass, some features are lost. The skip connection mechanism integrates feature maps from the encoding and decoding processes, which avoids feature loss. This means that swin-unet can extract more features, thereby addressing the difficulty of low contrast in OC segmentation.

4 Experiment
4.1 Dataset
Five public fundus image datasets: DRISHTI-GS, ORIGA, RIM-ONE R3,
REFUGE, and G1020 are used in this study.
The DRISHTI-GS [19] dataset contains 101 fundus images. All images were taken centered on the OD with a field-of-view (FOV) of 30 degrees. The ground-truth OD and OC segmentations are annotated by 4 experts.
The ORIGA [26] dataset contains 650 fundus images. The resolution of
each image is about . All images are tagged with manually
segmented OD and OC by trained professionals from Singapore Eye
Research Institute.
Unlike other datasets, there are 159 stereo fundus images in RIM-ONE
R3 [9] dataset. The ground-truth of OD and OC segmentation results are
annotated by 2 experts.
The REFUGE [8] dataset is from Retina Fundus Glaucoma Challenge,
which contains 1200 fundus images. These images are divided into training
set, testing set, and validation set. For each image, both OD and OC
segmentation results are manually marked.
The G1020 [3] dataset contains 1020 fundus images. All images were taken with a FOV of 45 degrees. As with the above datasets, ground-truth OD and OC segmentations are annotated.
The details of each dataset are listed in Table 1.
Table 1 The detail of each dataset used in this study

Dataset Number FOV Resolution GT


DRISHTI-GS 101 OD, OC, Glaucoma
ORIGA 650 OD, OC
RIM-ONE R3 159 – OD, OC, Glaucoma
REFUGE 1200 OD, OC
G1020 1020 OD, OC

4.2 Pre-processing
The pre-processing applied in this study is the extraction of regions of interest (ROI). The OC occupies only a very small part of a fundus image, so it is necessary to extract the ROI. This not only reduces the amount of computation but also improves segmentation accuracy. Moreover, as described in Sect. 4.1, the fundus images in different datasets have different resolutions and FOVs, so it is necessary to normalize the scale of the OC. As shown in Fig. 10, this study directly uses the ground-truth to find the center and radius of the OD, which are used to crop the ROI image. ROI extraction consists of two steps, center and radius detection and removal of invalid images; a sketch of this procedure is given after the two steps below.
Center and radius detection: the ground-truth of the OD is a binary image, and the center of the OD can be detected from the connected area in this binary image. The radius of the OD is then searched along four directions from the center, and the maximum value over these four directions is treated as the radius of the OD. As shown in Fig. 11, the red circle and the green line are the detected center and radius, respectively.
Removal of invalid images: as shown in Fig. 12, images with an incomplete OD are discarded in this process, as are images for which the center of the OD cannot be found. Finally, the ROI image is cropped according to the center and radius of the OD; the side length of the ROI image is 4 times the radius of the OD.
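A minimal NumPy sketch of the ROI extraction is shown below. It treats all foreground pixels of the OD ground-truth as a single connected region and does not implement the check for ODs cut off at the image border, so it illustrates the procedure rather than reproducing the authors' implementation.

```python
import numpy as np

def extract_roi(image, od_mask):
    """Crop the ROI around the optic disc from the image, given the binary OD ground-truth.
    Returns None for invalid images where no OD pixels are found."""
    ys, xs = np.nonzero(od_mask)
    if len(xs) == 0:                                  # center of OD cannot be found: discard
        return None
    cy, cx = int(ys.mean()), int(xs.mean())           # OD center from the foreground region
    radius = max(cy - ys.min(), ys.max() - cy,        # largest extent in four directions
                 cx - xs.min(), xs.max() - cx)
    half = 2 * radius                                 # ROI side length = 4 * radius
    y0, y1 = max(cy - half, 0), min(cy + half, od_mask.shape[0])
    x0, x1 = max(cx - half, 0), min(cx + half, od_mask.shape[1])
    return image[y0:y1, x0:x1]
```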
Fig. 10 Flowchart of pre-processing

Fig. 11 The detected center and radius of OD (Red circle: center, blue lines: candidate radius, green
line: selected radius)

Fig. 12 The discarded images


Examples of extracted ROI images are shown in Fig. 13; the first row shows the ROI images, and the second row shows the ground-truth of the OC. As shown in this figure, the proportion of the OC in the ROI images is similar, which also helps address the difficulty of the varying scale of the OC. Table 2 shows the number of fundus images used in this study. This study uses all images in the DRISHTI-GS and RIM-ONE R3 datasets, 752 images in G1020, 649 images in ORIGA, and 1117 images in REFUGE, for a total of 2778 fundus images.
Fig. 13 The examples of ROI images (First line: ROI images, Second line: ground-truth of OC)

Table 2 The used fundus images

Dataset Total number Used number


DRISHTI-GS 101 101
G1020 1020 752
ORIGA 650 649
REFUGE 1200 1117
RIM-ONE R3 159 159
All 3130 2778

4.3 Compared Models


As described in Sect. 3, swin-unet is built on U-Net and swin-transformer, so this study uses these two models as baselines for comparing OC segmentation results.
U-Net: As shown in Fig. 14, U-Net consists of a down-sampling encoder and an up-sampling decoder, and the feature maps from these two processes are combined through the skip connection mechanism. U-Net can extract multi-scale features and increases the amount of extracted features through the skip connections.
Fig. 14 The schematic diagram of U-Net
Swin-Transformer: As shown in Fig. 15, the segmentation model used in this study is composed of a segmentor (UperNet) and a swin-transformer backbone. Swin-transformer can also extract multi-scale features, and its self-attention mechanism extracts features of the relationship between patches.
Fig. 15 The schematic diagram of swin-transformer

4.4 Results
In this study, the 2778 fundus images are divided into training and testing sets at ratios of 9:1, 7:3, and 5:5. For each ratio, the data are randomly divided 10 times, and the final result is obtained by averaging the results of the individual runs. IoU and DICE are used to measure the OC segmentation results and are defined as follows:

IoU = |GT ∩ Result| / |GT ∪ Result|    (5)

DICE = 2 |GT ∩ Result| / (|GT| + |Result|)    (6)

where GT and Result are the ground-truth and the segmentation result, respectively.
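For a single pair of binary masks, these two metrics can be computed as in the following sketch; the paper reports the mean over the test set.

```python
import numpy as np

def iou_dice(gt, result):
    """IoU and DICE for one pair of binary masks (ground-truth, segmentation result)."""
    gt = np.asarray(gt, bool)
    result = np.asarray(result, bool)
    inter = np.logical_and(gt, result).sum()
    union = np.logical_or(gt, result).sum()
    total = gt.sum() + result.sum()
    iou = inter / union if union else 1.0     # both masks empty counts as a perfect match
    dice = 2 * inter / total if total else 1.0
    return iou, dice
```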
The segmentation results of each model are shown in Table 3, and the
examples of segmentation results by swin-unet are shown in Fig. 16.
Table 3 OC segmentation results

Ratio Model Mean IoU (%) Mean DICE (%)


5:5 U-Net 75.96 85.72
Swin-transformer 74.41 84.59
Swin-unet
7:3 U-Net 76.60 86.12
Swin-transformer 75.45 85.31
Swin-unet
9:1 U-Net 76.98 86.43
Swin-transformer 74.81 84.82
Swin-unet

Fig. 16 The OC segmentation results by swin-unet (Red curve: ground-truth, green curve:
segmentation result)

4.5 Discussion
As shown in Table 3, compared with U-Net and swin-transformer, swin-unet achieves the highest mean IoU and mean DICE in OC segmentation. For the different ratios of training and testing sets, swin-unet obtained mean IoU values of 77.39 and 77.60 and mean DICE values of 86.68 and 86.79, respectively. These experimental results demonstrate the effectiveness of swin-unet for OC segmentation.

Fig. 17 Comparison of the OC segmentation results by 3 models: U-Net, swin-transformer, and


swin-unet (Red curve: ground-truth, green curve: segmentation result)
Figure 17 shows a comparison of the OC segmentation results of the three models: U-Net, swin-transformer, and swin-unet. The green curve is the segmentation result, while the red curve is the ground-truth. In case 1, the segmentation result of U-Net is affected by blood vessels, while swin-unet segments the OC well even though the OC is occluded by blood vessels. This case shows that the difficulty of blood vessel occlusion can be addressed. Cases 2 and 3 are two cases with low contrast. Satisfactory results are obtained by swin-unet, whereas U-Net and swin-transformer show different degrees of over-segmentation and under-segmentation. These two results verify that swin-unet can address the difficulty of low contrast. At the same time, the scale of the OC in cases 1 and 2 is different, and swin-unet is not affected by the difference in scale. Therefore, the difficulty of varying scale has also been addressed by swin-unet.

5 Conclusion
In this study, swin-unet is applied to OC segmentation for the first time, and its segmentation accuracy, in terms of mean IoU and mean DICE, is improved compared with U-Net and swin-transformer. The experimental results show that swin-unet is effective for OC segmentation and can effectively address its three difficulties: 1. the low contrast between the OC and other areas; 2. the occlusion of the OC by blood vessels; 3. the varying scale of the OC across images.

References
1. Ahmed A, Ritambhar B, Kaamran R, Vasudevan L (2015) Optic disc and optic cup segmentation
methodologies for glaucoma image detection: a survey. J Ophthalmol 2015:1–28
2.
Alward W, Feldman F, Cashwell L, Wilensky J, Geijssen H, Greeve E, Quigley H, Skuta G,
Lichter P, Blondeau P, Collaborative Normal-Tension Glaucoma Study Group (1998) The
effectiveness of intraocular pressure reduction in the treatment of normal-tension glaucoma. Am
J Ophthalmol 126(4):498–505
3.
Bajwa MN, Singh GAP, Neumeier W, Malik MI, Dengel A, Ahmed S (2020) G1020: a
benchmark retinal fundus image dataset for computer-aided glaucoma detection. In: 2020
International joint conference on neural networks (IJCNN). IEEE, pp 1–7
4.
Cao H, Wang Y, Chen J, Jiang D, Zhang X, Tian Q, Wang M (2023) Swin-unet: Unet-like pure
transformer for medical image segmentation. In: Computer vision–ECCV 2022 workshops: Tel
Aviv, Israel, October 23–27, 2022, proceedings, Part III. Springer, pp 205–218
5.
Chandrika SM, Nirmala K (2013) Analysis of CDR detection for glaucoma diagnosis
6.
Damon WWK, Liu J, Meng TN, Fengshou Y, Yin WT (2012) Automatic detection of the optic
cup using vessel kinking in digital retinal fundus images. In: 2012 9th IEEE international
symposium on biomedical imaging (ISBI). IEEE, pp 1647–1650
7.
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M,
Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: transformers for
image recognition at scale. arXiv:​2010.​11929
8.
Fu H, Li F, Orlando JI, Bogunović H, Sun X, Liao J, Xu Y, Zhang S, Zhang X (2019) Refuge:
retinal fundus glaucoma challenge. https://​doi.​org/​10.​21227/​tz6e-r977
9.
Fumero F, Sigut J, Alayón S, González-Hernández M, González de la Rosa M (2015) Interactive
tool and database for optic disc and cup segmentation of stereo and monocular retinal fundus
images
10.
Hayreh SS (1969) Blood supply of the optic nerve head and its role in optic atrophy, glaucoma,
and oedema of the optic disc. Br J Ophthalmol 53(11):721. https://​doi.​org/​10.​1136/​bjo.​53.​11.​721
11.
He H, Lin L, Cai Z, Tang X (2022) Joined: prior guided multi-task learning for joint optic
disc/cup segmentation and fovea detection. In: International conference on medical imaging with
deep learning. PMLR, pp 477–492
12.
Kuruvilla J, Sukumaran D, Sankar A, Joy SP (2016) A review on image processing and image
segmentation. In: 2016 International conference on data mining and advanced computing
(SAPIENCE), pp 198–203
13.
Lu S, Zhao H, Liu H, Li H, Wang N (2023) Pkrt-net: prior knowledge-based relation transformer
network for optic cup and disc segmentation. Neurocomputing 538:126,183
14.
Mishra M, Nath MK, Dandapat S (2011) Glaucoma detection from color fundus images. Int J
Comput Commun Technol (IJCCT) 2(6):7–10
15.
Pachade S, Porwal P, Kokare M, Giancardo L, Meriaudeau F (2021) NENet: Nested efficientNet
and adversarial learning for joint optic disc and cup segmentation. Med Image Anal 74:102,253
16.
Prastyo PH, Sumi AS, Nuraini A (2020) Optic cup segmentation using u-net architecture on
retinal fundus image. JITCE (J Inf Technol Comput Eng) 4(02):105–109
17.
Raj PK, Kumar JH, Jois S, Harsha S, Seelamantula CS (2019) A structure tensor based Voronoi
decomposition technique for optic cup segmentation. In: 2019 IEEE international conference on
image processing (ICIP). IEEE, pp 829–833
18.
Shankaranarayana SM, Ram K, Mitra K, Sivaprakasam M (2019) Fully convolutional networks
for monocular retinal depth estimation and optic disc-cup segmentation. IEEE J Biomed Health
Inform 23(4):1417–1426
19.
Sivaswamy J, Krishnadas S, Joshi GD, Jain M, Tabish AUS (2014) Drishti-gs: retinal image
dataset for optic nerve head (onh) segmentation. In: 2014 IEEE 11th international symposium on
biomedical imaging (ISBI). IEEE, pp 53–56
20.
Turkoski BB (2012) Glaucoma and glaucoma medications. Orthopaed Nurs 31(1):37–41
21.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I
(2017) Attention is all you need. In: Advances in neural information processing systems, vol 30
22.
Wang S, Yu L, Yang X, Fu CW, Heng PA (2019) Patch-based output space adversarial learning
for joint optic disc and cup segmentation. IEEE Trans Med Imaging 38(11):2485–2495
23.
Weinreb RN, Aung T, Medeiros FA (2014) The pathophysiology and treatment of glaucoma: a
review. JAMA 311(18):1901–1911
24.
Wong D, Liu J, Lim J, Li H, Wong T (2009) Automated detection of kinks from blood vessels for
optic cup segmentation in retinal images. In: Medical imaging 2009: computer-aided diagnosis,
vol 7260. SPIE, pp 459–466
25.
Xue X, Wang L, Du W, Fujiwara Y, Peng Y (2022) Multiple preprocessing hybrid level set model
for optic disc segmentation in fundus images. Sensors 22(18):6899
26.
Zhang Z, Yin FS, Liu J, Wong WK, Tan NM, Lee BH, Cheng J, Wong TY (2010) Origa-light: an
online retinal fundus image database for glaucoma analysis and research. In: 2010 Annual
international conference of the IEEE engineering in medicine and biology, pp 3065–3068.
https://​doi.​org/​10.​1109/​IEMBS.​2010.​5626137

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed
Computing , Studies in Computational Intelligence 1153
https://doi.org/10.1007/978-3-031-56388-1_8

From Above and Beyond: Decoding Urban


Aesthetics with the Visual Pollution Index
Advait Gupta1 , Manan Padsala1 , Devesh Jani2 , Tanmay Bisen1 ,
Aastha Shayla1 and Susham Biswas1
(1) Department of Computer Science, Rajiv Gandhi Institute of Petroleum
Technology, Jais, UP, India
(2) Department of Electronics and Communication, Chandubhai S. Patel
Institute of Technology, Charotar University of Science and
Technology, Anand, Gujarat, India

Advait Gupta (Corresponding author)


Email: [email protected]

Manan Padsala
Email: [email protected]

Devesh Jani
Email: [email protected]

Tanmay Bisen
Email: [email protected]

Aastha Shayla
Email: [email protected]

Susham Biswas
Email: [email protected]

Abstract
Urban landscapes, emblematic of modernization and growth, are
increasingly faced with the intricate challenge of visual pollution. This
nuanced form of pollution, often overshadowed in environmental discussions,
profoundly influences the aesthetic harmony and mental well-being of
urban inhabitants. In this research, we present an innovative methodology
to detect visual pollution using drone-captured imagery. Our distinctive
dataset captures a spectrum of visual pollutants, from graffiti, faded
signage, and potholes to more complex issues like cluttered sidewalks and
unkempt facades. Leveraging this dataset, we fine-tuned pre-trained object
detection models, specifically YOLOv6, achieving remarkable accuracy in
detecting these visual pollutants from images. Central to our study is the
introduction of the Visual Pollution Index (VPI), a metric formulated
through the multiplicative integration of the Counting Categories Ratio
(CCR) and the Severity-Weighted Score (SWS). To provide a spatial
representation of visual pollution levels, we further introduce heatmap
visualizations. These heatmaps, overlaid on urban maps, offer a vivid
depiction of pollution hotspots, enabling city planners and stakeholders to
pinpoint areas of concern. Grounded in real-world perceptions, our
approach offers a comprehensive lens to assess, visualize, and address
visual pollution in urban environments.

Keywords Visual pollution index – VPI – Urban aesthetics – Drone


imagery – Object detection – YOLOv6 – Visual pollutants – Heatmap
visualization – Urban planning – Environmental impact – Counting
categories ratio – CCR – Severity-weighted score – SWS – Urban well-
being – Geospatial analysis – Automated detection – Urban environmental
management

1 Introduction
Urban environments, as the epicenters of human activity and innovation,
have witnessed unprecedented growth over the past few decades. While this
growth has brought about numerous advancements and opportunities, it has
also introduced a myriad of challenges, one of which is visual pollution.
Visual pollution, a term that encompasses unsightly and out-of-place man-
made objects within public and private spaces, has become a growing
concern for urban planners, environmentalists, and city dwellers alike [1].
The concept of visual pollution is not new; however, its significance has
grown in tandem with rapid urbanization. Visual disturbances, ranging from
graffiti, faded signage, and potholes to cluttered sidewalks and unkempt
facades, can degrade the aesthetic appeal of urban areas, impacting not only
the visual harmony but also the psychological well-being of residents [2].
Such disturbances can lead to decreased property values, reduced tourist
interest, and even adverse health effects due to stress and mental fatigue [3].
With the advent of technology, particularly in the realms of drone
imagery and machine learning, there exists an opportunity to address this
issue in a more systematic and data-driven manner. Drones, with their
ability to capture high-resolution images from vantage points previously
inaccessible, offer a unique perspective on urban landscapes [4]. When
combined with advanced object detection algorithms, such as YOLOv6,
these images can be analyzed to detect and quantify visual pollutants with
remarkable accuracy [5].
This paper introduces a novel approach to quantify visual pollution
using the Visual Pollution Index (VPI), a metric derived from drone-
captured imagery and object detection techniques. Furthermore, we present
heatmap visualizations to spatially represent visual pollution levels,
providing a tool for urban planners and stakeholders to make informed
decisions.

2 Literature Survey
Urban environments are increasingly being scrutinized for their aesthetic
appeal, given the rapid urbanization and the subsequent challenges it brings.
Visual pollution, an often-overlooked aspect, plays a crucial role in
determining the aesthetic quality of a place. The term “visual pollution”
refers to the entire set of unsightly and visually unpleasing elements in an
environment. This can range from graffiti, billboards, overhead power lines,
and even the architectural design of buildings.
For instance, a 2016 study by Chmielewski et al. delves into the
commercialization of public space by outdoor advertising and its potential
negative impact on the quality of life and enjoyment of public spaces [6].
The research illustrates that visual pollution can be quantified by correlating
public opinion with the number of visible advertisements. Using a 2.5D
outdoor advertisement dataset from a busy urban street in Lublin, Poland,
the study translates visibility into visual pollution. The findings suggest that
streetscape views with more than seven visible advertisements result in
visual pollution in the studied context. Our study extends this methodology
by incorporating a more comprehensive set of visual pollutants and
applying advanced deep learning techniques for detection.
Building on this, a 2019 study by Ahmed et al. delves deeper into the
realm of visual pollution [7]. The researchers propose a novel approach to
detect visual pollutants using deep learning techniques. More importantly,
they suggest the potential of creating a “Visual Pollution Index” (VPI) in
the future. This index would serve as a tool for urban planners and
professionals in urban environmental management, allowing them to
evaluate and compare the visual aestheticism of different geographic
regions. While they only proposed the idea, our research has taken the
initiative to design and implement such an index.
A distinguishing feature of our research is the utilization of a unique and
novel high-quality dataset, which stands in contrast to the commonly used
publicly available datasets of lower quality. This innovative dataset
enhances the accuracy, detection capabilities, and Intersection over Union
(IoU) metrics in our study. The superior quality of our dataset ensures more
reliable results, setting our research apart from previous studies in the
domain.
Further, a 2019 study by Wakil et al. titled “A Hybrid Tool for Visual
Pollution Assessment in Urban Environments” provides a systematic
approach for the development of a robust Visual Pollution Assessment
(VPA) tool [8]. The research introduces a methodology that integrates both
expert and public perspectives to rank Visual Pollution Objects (VPOs).
Using empirical decision-making techniques, the VPA tool produces a
point-based visual pollution scorecard. After extensive testing in Pakistan,
the tool offers regulators a consistent method for visual pollution
assessment and equips policy makers with a foundation for evidence-based
strategies. Our research builds on this, refining methods to evaluate visual
pollutants across urban settings.
Another 2022 study by Alharbi and Rangel-Buitrago focuses on the
visible deterioration and negative aesthetic quality of the landscape in
coastal environments [9]. Factors such as erosion, marine wrack, litter,
sewage, and beach driving are identified as contributors to visual pollution,
particularly in the Rabigh coastal area of the Kingdom of Saudi Arabia. The
research employs the Coastal Scenery Evaluation System (CSES) to assess
the scenic quality of 31 coastal sites.
In addition, a 2016 study by Madleňák and Hudák discusses the concept
of “visual smog,” which has emerged as a social concern in recent decades
[10]. This refers to the contamination of public spaces by aggressive and
often illegally placed advertisements that are not proportionate in size. The
study aims to measure the level of visual smog on selected road
communications, taking into account the number of ads and billboards near
roads, the distance between billboards and roads, and the density of these
billboards. The research also incorporates an analysis of traffic accidents on
the chosen road communications.
Lastly, a study by Yilmaz titled “In the Context of Visual Pollution:
Effects to Trabzon City Center Silhouette” examines the influence of visual
pollution on city silhouettes [11]. Silhouettes, indicative of a city’s history
and structure, are becoming markers of visual pollution. The research
underscores that many cities now feature buildings that lack harmony and
environmental consideration, leading to uniform concrete façades that
eclipse their historical essence. Using Trabzon as a case study, a coastal city
renowned for its historical richness, the study contrasts old and new city
images to assess the aesthetic shifts, highlighting the interventions that have
reshaped its distinctive silhouette.
In conclusion, the aforementioned studies collectively highlight the
growing importance of addressing visual pollution in urban settings. Our
research builds on these foundational studies, offering a more
comprehensive and practical solution to the problem. By designing the
Visual Pollution Index, we aim to provide urban planners and
environmentalists with a robust tool to assess and mitigate visual pollution
in our cities.

3 Problem Statement
As urban environments undergo rapid transformation, the issue of visual
pollution has become increasingly prominent, impacting the aesthetic and
overall quality of life in cities. While various studies have attempted to
address this concern, there is a distinct lack of a standardized, universally
applicable metric like the Visual Pollution Index (VPI) for quantifying and
addressing visual pollutants comprehensively. The development and
refinement of such an index, backed by advanced technological
methodologies, is crucial for enabling urban planners and environmentalists
to make informed decisions and implement effective strategies to combat
visual pollution.

4 Methodology
The entire methodology adopted for this study, including data collection,
processing, and analysis, is depicted in a comprehensive flowchart
presented in Fig. 1.

Fig. 1 Detailed methodology of our proposed research

4.1 Data Collection


The backbone of our study is the caliber and pertinence of its data. We
employed a dedicated drone application to systematically collect aerial
photos of urban areas. This tool was thoughtfully created to snap photos
during the drone’s journey, sending them instantly to our exclusive servers.
This approach safeguarded data accuracy and immediate accessibility for
study.

4.1.1 Area of Focus


Primarily, our data encompasses the urban zones of Newtown, Kolkata. The
choice of this location was influenced by its urban expansion and the
variety of visual contaminants, making it an apt subject for our study [12].

4.1.2 Timing
For the best lighting and image clarity, drone operations were planned
between 10 AM and 4 PM. This slot was selected after initial assessments
highlighted the difficulties of capturing under variable light conditions.

4.1.3 Height Considerations


Image capture altitude varied with the desired view. For top-down visuals,
drones hovered between 30 and 60 m. For other angles, a uniform 10-m
height was maintained, ensuring crisp and detailed photos.

4.1.4 Weather Concerns


Recognizing weather’s role in image quality, we chose clear days for data
collection. Previous studies have emphasized the importance of weather
factors in aerial photography, backing our choice.

4.1.5 Image Annotations


After collection, each photo underwent a thorough manual annotation
procedure using QGIS software [13]. This step was crucial to ensure data
accuracy, especially for object spotting and further study.

4.1.6 Resolution Details


Image clarity was determined by the drone’s height. Photos from 30 to 60 m
had clarity ranging from 2.5 to 10 cm per pixel. In contrast, those from
10 m displayed a sharper clarity of under 1 cm per pixel, giving detailed
views of visual contaminants.

4.1.7 Gear Information


The drones selected for our research were picked for their superior image
capture and dependability. We utilized the Phantom 4 PRO, Mavic Mini,
and Phantom 4 Pro V2.0, all known for their exceptional image quality in
various investigations. In summary, our meticulous data collection method
guaranteed the gathering of top-notch, relevant data, establishing a solid
base for our subsequent assessments.
4.1.8 Dataset Details
Post data gathering and tagging, our dataset was completed. It comprises
high-definition photos of different urban visual pollutants, each being 700
× 700 pixels in size. The dataset is divided into specific categories: 600
images of Graffiti, 600 of Faded Signage, 1200 of Potholes, 1250 of
Garbage, 600 of Under-Construction Roads, 600 of Damaged Signage, 600
of Malfunctioning Street Lights, 650 of Poorly-Maintained Billboards, 600
of Sand Patches on Roads, 600 of Overcrowded Sidewalks, and 600 of
Neglected Building Facades (Fig. 2).

Fig. 2 Sample images from the dataset

4.2 Data Preprocessing


To bolster the variety and resilience of our dataset, we incorporated data
enhancement methods. Data augmentation serves as a tactic to synthetically
increase the training dataset’s volume by generating altered renditions of
pre-existing images. Given the diverse visual characteristics of pollutants
and the model’s requirement to identify them in assorted conditions and
angles, this approach holds significant pertinence for our research [14].
The augmentation methods we adopted encompass:
Rotation: Images underwent rotation at multiple angles, mirroring the
myriad orientations construction materials might be captured in.
Zooming: Certain images experienced zoom-in or zoom-out effects,
emulating the varying distances from which construction materials could
be photographed.
Flipping: We flipped images both horizontally and vertically to introduce
additional variability.
Brightness and Contrast Adjustments: These modifications were made
to mimic a range of lighting conditions, ensuring the model’s adaptability
to diverse environmental settings
It is imperative to emphasize that with each image enhancement, the
relevant annotation file was refreshed. This meticulous step ensures that the
bounding box details are a true reflection of the object’s position and scale
in the enhanced images. Automating the task of refreshing these details
guaranteed uniformity and precision across the enhanced dataset. By
adopting a judicious data enhancement strategy, we not merely expanded
our dataset’s size but also endowed it with an array of conditions, paving
the way for a model that is both steadfast and versatile.
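As an illustration of the annotation update, the sketch below applies a horizontal flip and adjusts YOLO-format bounding boxes accordingly; rotations and brightness adjustments would need analogous (and, for rotation, more involved) updates. The helper function is hypothetical and not part of the authors' pipeline.

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    """Horizontally flip an (H, W, C) image and update YOLO-format boxes.
    boxes: array of rows [class_id, x_center, y_center, width, height], all normalized."""
    flipped = image[:, ::-1].copy()        # mirror the width axis
    boxes = boxes.copy()
    boxes[:, 1] = 1.0 - boxes[:, 1]        # only the x-center changes under a horizontal flip
    return flipped, boxes
```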

4.3 Object Detection and Visual Pollutant Classification


For the complex endeavor of pinpointing visual pollutants in overhead
photos, we harnessed the prowess of the YOLO v6 framework [5].
Recognized for its instantaneous object detection ability, this architecture
was diligently adapted to our specialized dataset brimming with hand-
labeled images of various visual pollutants.

4.3.1 YOLO v6 Structure


YOLO, standing for “You Only Look Once,” represents a cutting-edge
technique for real-time object identification. Contrasting conventional
methods which first generate region suggestions followed by classification,
YOLO embodies a distinct methodology. It operates through a unified
neural network that scrutinizes the whole image in one go. This network
segments the image into several areas, deducing the position and likelihood
of objects within these zones. The significance of each inferred location is
gauged by its tied probability. The YOLO v6, a subsequent version in the
YOLO lineage, introduces further refinements. It’s crafted to discern
objects within visuals and categorize them on-the-fly, making it particularly
suitable for tasks like ours that demand swift and sharp detection. The
structure is lauded for its rapidity and precision, achieved through an efficient backbone design and carefully tuned network layers.
Training Details: The model was trained for 50 epochs with a batch size of 64, using the following hyperparameters (a sketch of the stated loss combination follows this list):
Learning Rate: 0.001 (complemented by a step decay)
Momentum: 0.9
Weight Decay: 0.0005
Loss Function: A combination of Mean Squared Error for bounding box
regression and Cross-Entropy for class prediction.
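The stated combination of losses can be sketched as below; this is only an illustration of the weighting idea, since the actual YOLOv6 training objective is considerably more involved.

```python
import torch.nn.functional as F

def detection_loss(pred_boxes, true_boxes, pred_logits, true_labels,
                   box_weight=1.0, cls_weight=1.0):
    """Combined objective as described above: MSE for bounding-box regression
    plus cross-entropy for class prediction, summed with tunable weights."""
    box_loss = F.mse_loss(pred_boxes, true_boxes)          # regression term
    cls_loss = F.cross_entropy(pred_logits, true_labels)   # classification term
    return box_weight * box_loss + cls_weight * cls_loss
```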

4.4 Visual Pollution Index (VPI) Calculation


The Visual Pollution Index (VPI) is a pioneering metric introduced in this
research to quantify the extent and intensity of visual pollution in urban
landscapes using aerial images. The VPI is meticulously crafted to offer a
comprehensive perspective on visual pollution, factoring in both the
diversity of visual pollutants and their respective severities. For a group of
images belonging to a particular area, we detect and store the coordinates
and labels of each item detected from all the images. Based on this
aggregated data, we calculate the VPI value for that specific area, ensuring
a more holistic representation of the visual pollution present. The
formulation of VPI is rooted in two primary components: the Counting
Categories Ratio (CCR) and the Severity-Weighted Score (SWS).

4.4.1 Counting Categories Ratio (CCR)


The CCR is a measure of the diversity of visual pollutants present in an
image. It calculates the ratio of the number of distinct visual pollution
categories detected to the total number of possible categories. This ensures
that the CCR value lies between 0 (indicating no visual pollutants detected)
and 1 (indicating that all possible visual pollutant categories are detected). It is calculated as:

CCR = (number of distinct visual pollutant categories detected) / (total number of possible categories)

4.4.2 Mapping of Visual Pollutants to Severity Categories


Initially, we categorized visual pollutants into five distinct severity levels:
Very Low Severity, Low Severity, Medium Severity, High Severity, and
Very High Severity. Each severity level was then associated with a
numerical value: 0.2 for Very Low Severity, 0.4 for Low Severity, 0.6 for
Medium Severity, 0.8 for High Severity, and 1 for Very High Severity.
Using these values, each of the visual pollutants was mapped to one of the
severity categories based on established literature and empirical
observations. This mapping has been illustrated in Table 1.
Table 1 Severity categorization of visual pollutants
Severity Pollutants
Very low Graffiti, sand on road
Low Faded signage, cluttered sidewalk
Medium Bad streetlight, broken signage
High Construction road, bad billboard, unkempt facade
Very high Potholes, garbage
This mapping is based on established literature and empirical
observations. For instance, graffiti, often perceived as a form of urban art,
can sometimes be seen as a sign of neglect or decay, especially when it’s
unsanctioned or in inappropriate places [15, 16]. Sand on roads, while
causing minor visual disruption, is generally a temporary issue often
resulting from nearby construction or natural causes [17]. Faded signage,
while not as intrusive, can give an impression of negligence and can be a
safety concern in certain contexts [18]. Cluttered sidewalks, indicative of
disorganization, can impede pedestrian movement and give a sense of
disorder [19]. Bad street lights not only affect the aesthetics of an area but
also raise concerns about safety during nighttime [20]. Broken signage can
be confusing for drivers and pedestrians, leading to potential safety risks.
Construction roads, indicative of ongoing development, can sometimes be
visually disruptive and indicate major infrastructural changes. Bad
billboards, especially those that are oversized or have inappropriate content,
can dominate the visual landscape and detract from the natural or
architectural beauty of an area [21]. An unkempt facade can be a sign of
neglect and can significantly downgrade the visual appeal of a building or
structure [22]. Potholes are not just visually jarring but also pose significant
safety risks, especially in high-traffic areas [23]. Accumulated garbage is a
direct indicator of poor sanitation, environmental neglect, and can have
health implications [24].
By mapping each visual pollutant to a severity category, we aim to
provide a more nuanced understanding of the visual quality of urban
landscapes. This ensures that the VPI captures the essence of real-world
perceptions and concerns related to visual pollution.

4.4.3 Severity Weight Score


The SWS is designed to quantify the severity of visual pollution by taking
into account the severity value assigned to each category and the number of
instances of that category in the image. The score is then normalized to
ensure its value ranges between 0 (indicating minimal severity) and 1
(indicating maximum severity). It is calculated as:

Here, the Severity Value corresponds to the value of severity against which
a particular visual pollutant is mapped, e.g., 0.2 for Very Low Severity. It’s
worth noting that this severity value can be replaced with custom weights
for each category based on specific user needs, allowing for a more tailored
assessment of visual pollution in different contexts.

4.4.4 Visual Pollution Index (VPI) Calculation


The VPI is derived by integrating both the CCR and the SWS, emphasizing
the combined importance of pollutant diversity and their respective
severities. The formula for VPI is:

$$\mathrm{VPI} = \mathrm{CCR} \times \mathrm{SWS}$$

This results in a VPI value ranging from 0 (indicating minimal or no visual pollution) to 1 (indicating maximum visual pollution).
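
For illustration, the following minimal Python sketch computes CCR, SWS, and VPI for the aggregated detections of one area. The severity weights follow Table 1, while the function name and the SWS normalization shown here (an instance-weighted mean of severity values) are assumptions for illustration rather than the paper's exact formula.

```python
# Minimal sketch of the VPI computation; detections are the aggregated category
# labels for one area. The SWS normalization used here (instance-weighted mean
# of severity values) is an illustrative assumption.

SEVERITY = {  # severity weights from Table 1
    "graffiti": 0.2, "sand on road": 0.2,
    "faded signage": 0.4, "cluttered sidewalk": 0.4,
    "bad streetlight": 0.6, "broken signage": 0.6,
    "construction road": 0.8, "bad billboard": 0.8, "unkempt facade": 0.8,
    "potholes": 1.0, "garbage": 1.0,
}

def compute_vpi(detections):
    """Return the Visual Pollution Index for the aggregated detections of an area."""
    if not detections:
        return 0.0
    # Counting Categories Ratio: distinct categories detected / all possible categories.
    ccr = len(set(detections)) / len(SEVERITY)
    # Severity-Weighted Score: severity of each detected instance, scaled to [0, 1].
    sws = sum(SEVERITY[d] for d in detections) / len(detections)
    # Multiplicative combination, as motivated in Sect. 4.5.
    return ccr * sws

print(compute_vpi(["potholes", "garbage", "graffiti", "garbage"]))  # approx. 0.218
```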

4.5 Rationale Behind the Multiplicative Approach


The decision to calculate the Visual Pollution Index (VPI) using a
multiplicative approach (CCR × SWS) stems from the following
considerations:
Interplay of Diversity and Severity: The multiplicative approach
ensures that both the diversity of visual pollutants (CCR) and their
severity (SWS) are equally emphasized. An area with a singular, yet
highly severe pollutant might not be perceived as negatively as an area
with multiple pollutants of moderate severity. This approach captures that
nuance.
Real-World Perception of Pollution: In real-world scenarios, the
presence of multiple issues, even if individually less severe, can often be
perceived as indicative of broader systemic inefficiencies or negligence.
Conversely, a singular but severe issue might be seen as an isolated
incident. Our multiplicative approach captures this perception effectively.
When there’s a combination of diverse pollutants, even if they have a
slightly lower severity, the overall perceived pollution can be higher than
an area with a singular, more severe pollutant.
Comprehensive Assessment: By considering both the variety and
intensity of visual pollutants, the VPI offers a more holistic view of
visual pollution. This ensures that areas with diverse pollutants, even if
individually less severe, are given due attention, reflecting the broader
public perception and concerns.
The VPI offers a comprehensive perspective on visual pollution, making
it an invaluable tool for urban planners, environmentalists, and
policymakers. By considering both the variety and severity of visual
pollutants, the VPI provides a more nuanced understanding of the visual
quality of urban landscapes. In our study, the VPI was computed for a set of
aerial images, and these VPI values were then used to generate heatmaps,
offering a visual representation of visual pollution intensity across different
urban regions.

4.5.1 Heatmap Generation


The creation of a heatmap to visualize the Visual Pollution Index (VPI)
across a designated region is an intricate procedure that seamlessly merges
the outcomes of object detection with geospatial information. The following
elucidates the step-by-step process involved in the heatmap generation:
(1) Grouped Image Input and Coordinates: For a set of aerial images
corresponding to a specific area, we input their collective geographical
coordinates (latitude and longitude). This ensures that the VPI,
calculated based on the combined data from these images, aligns
accurately with its real-world geographical location.

(2) Object Detection and VPI Computation: Each image within the
group undergoes processing by the object detection model to identify
and classify visual pollutants. Leveraging the detected pollutants and
their respective severities, the VPI for the group of images is
determined. This VPI value, in conjunction with the group’s
coordinates, is archived for the subsequent heatmap creation.

(3) Data Point Aggregation: As multiple groups of images undergo
processing, a repository of VPI values paired with their corresponding
coordinates is established. This collective data forms the backbone for
crafting a detailed heatmap spanning a vast geographical expanse.
(4) Map and Boundary Formulation: With the aid of geospatial tools,
such as GeoPandas [25], a map of the targeted region (be it a city,
district, or a custom-defined area) is crafted based on delineated
boundaries or coordinates. This map acts as the foundational layer
upon which the heatmap is overlaid.

(5) Coordinate Integration: For the constructed map, the corner
coordinates are ascertained, facilitating the precise overlay of VPI data
points onto the map.

(6) Heatmap Coloring: Each VPI data point is depicted on the map using a
color spectrum, transitioning from light green (indicating a VPI close to 0)
to deep maroon (indicating a VPI nearing 1). The gradation in color
intensity offers a visual cue about the severity of visual pollution in
different regions. When the VPI is visualized for a limited set of images
covering a smaller area, the resultant heatmap might appear more localized
and detailed as depicted in Fig. 3. Conversely, with a denser array of data
points covering a broader region, the heatmap manifests as a more
continuous and expansive visual representation as depicted in Fig. 4.
Fig. 3 Pixelated heatmap of VPI for a limited dataset (Color bar represents VPI values)

Fig. 4 Artist’s illustrations of potential VPI heatmaps for India and Bengaluru
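
A compact sketch of steps (1)–(6) is given below, assuming GeoPandas and Matplotlib; the rectangular boundary, the sample coordinates, and the exact colour ramp are placeholders rather than the implementation used in this study.

```python
# Illustrative VPI heatmap overlay: base map layer plus colour-coded VPI points.
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from shapely.geometry import box

# Placeholder boundary (a simple rectangle) and (lat, lon, VPI) tuples per image group;
# in practice the boundary would come from a city or district shapefile.
region = gpd.GeoDataFrame(geometry=[box(77.55, 12.90, 77.70, 13.05)], crs="EPSG:4326")
points = [(12.97, 77.59, 0.72), (12.98, 77.60, 0.15), (13.01, 77.65, 0.41)]

ax = region.plot(color="white", edgecolor="black", figsize=(8, 8))  # foundational layer
cmap = LinearSegmentedColormap.from_list("vpi", ["lightgreen", "maroon"])
lats, lons, vpis = zip(*points)
sc = ax.scatter(lons, lats, c=vpis, cmap=cmap, vmin=0.0, vmax=1.0, s=120)
plt.colorbar(sc, ax=ax, label="VPI")
plt.title("VPI heatmap (illustrative)")
plt.show()
```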

5 Experimental Analysis and On-Site Validation


5.1 Object Detection Techniques and Evaluation
In this research, we implemented cutting-edge object recognition methods
to detect and categorize visual pollutants from images taken by drones.
Specifically, we made use of the YOLOv6 [5] framework for detecting all
pollutant categories. This model has shown outstanding results in object
recognition across multiple fields.

Evaluation Criterion—Intersection Over Union (IoU)

Intersection over Union (IoU) is a renowned standard for gauging the
precision of object identification algorithms [26]. This metric evaluates the
congruence between the model's predicted area (P) and the actual object's
ground truth area (G). The equation to compute IoU is:

$$\mathrm{IoU} = \frac{P \cap G}{P \cup G}$$

where
– (P ∩ G) represents the overlapping area between the predicted and
ground truth bounding boxes.
– (P ∪ G) represents the combined area of both bounding boxes.
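
For reference, a minimal implementation of this metric for axis-aligned bounding boxes might look as follows; the coordinate convention (x1, y1, x2, y2) and the sample boxes are illustrative assumptions.

```python
# Minimal IoU computation for axis-aligned boxes given as (x1, y1, x2, y2).
def iou(p, g):
    """Intersection over Union between a predicted box p and a ground-truth box g."""
    ix1, iy1 = max(p[0], g[0]), max(p[1], g[1])
    ix2, iy2 = min(p[2], g[2]), min(p[3], g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # overlapping area
    area_p = (p[2] - p[0]) * (p[3] - p[1])
    area_g = (g[2] - g[0]) * (g[3] - g[1])
    union = area_p + area_g - inter                     # combined area
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```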
Elevated IoU scores suggest superior model accuracy. A score of 1
denotes an ideal match, while a score of 0 signifies no correspondence. We
have achieved IoU values of 0.83 for Graffiti, 0.78 for Faded Signage,
0.88 for Potholes, 0.92 for Garbage, 0.82 for Construction Road, 0.71 for
Broken Signage, 0.79 for Bad streetlight, 0.77 for Bad Billboard, 0.89
for Sand on Road, 0.78 for Cluttered Sidewalk, and 0.79 for Unkempt
Facade. Overall, our models posted an average IoU rating of 0.81, which
suggests a praiseworthy degree of precision in identifying visual pollutants
in the imagery.

5.2 Physical Survey of VPI


Elevated Pollution Zones: Upon analyzing several drone-acquired images
of a specific area, Fig. 5 presents a subset that represents the zone. The
overall VPI for the entire area, based on all images including those not
depicted in Fig. 5, exceeded 0.7, signaling pronounced visual pollution. On-
site evaluations and discussions with residents confirmed the visual
disturbances, with many lamenting the area’s “displeasing aesthetic.”

Fig. 5 Aerial images of an elevated pollution zone chosen for physical survey

Minimal Pollution Zones: In a contrasting scenario, we identified only a few
instances of visual pollution in a specific area, resulting in a VPI below 0.2,
as depicted in Fig. 6. To substantiate our model’s observations, an on-
ground survey was executed. Beyond the physical manifestations,
interactions with local residents provided an insightful perspective. Most
inhabitants expressed a sense of pride in their surroundings, often
describing the area as “aesthetically pleasing” and “refreshingly clear.”
Such feedback underscores the low visual disruption observed in our VPI
findings.

Fig. 6 Aerial images of some minimal pollution zones chosen for physical survey

6 Comparative Analysis
The quantification and visualization of visual pollution in urban landscapes
using aerial imagery and computational methods have garnered attention in
the urban planning and environmental aesthetics domain. Several studies
have delved into understanding the impact of visual pollutants on urban
aesthetics and the well-being of residents. Here, we present a comparative
analysis of our work with notable contributions in the literature:
1. Chmielewski, S. (2020): Chmielewski’s research titled “Chaos in
Motion: Measuring Visual Pollution with Tangential View Landscape
Metrics” delved into the concept of visual pollution (VP) in the form of
outdoor advertisements (OA) as a threat to landscape physiognomy.
The study proposed a methodological framework for measuring VP
using tangential view landscape metrics, backed by statistically
significant proofs. The research utilized raster products derived from
aerial laser scanning data to characterize areas in Lublin, East Poland.
The study highlighted the lack of consensus on the definition of VP and
the need for a quantified approach to address this challenge [27].
2. Zaeimdar, M., Khalilnezhad Sarab, F., and Rafati, M. (2019): This
study titled “Investigation of the relation between visual pollution and
citizenry health in the city of Tehran” focused on the impact of visual
contamination on the health of citizens in two urban areas of Tehran.
The research revealed a significant relationship between visual
contamination and various health indicators of citizens, including
physical signs, social function, anxiety, insomnia, and depression [28].

3. Nahian Ahmed, M. Nazmul Islam, Ahmad Saraf Tuba, M. R. C.
Mahdy, Mohammad Sujauddin (2019): In their study titled “Solving
visual pollution with deep learning: A new nexus in environmental
management,” published in the Journal of Environmental Management,
the authors introduced methods for detecting visual pollution using
deep learning [7]. They emphasized the potential of their study to be
used in the future to design a visual pollution index or metric. Their
research highlighted the need for automated visual pollutant
classification and showcased the applicability of deep learning in
achieving this. While they proposed the potential to develop such a
metric, they did not actualize it. Our study has taken this forward by
introducing the Visual Pollution Index (VPI), thereby realizing the
potential they identified and providing a tangible metric for quantifying
visual pollution in urban landscapes.

Our Contribution

Our work distinguishes itself in several key aspects:


We have introduced a comprehensive Visual Pollution Index (VPI) that
not only detects visual pollution but also quantifies it, filling the gap
identified in Chmielewski’s study.
Our approach is holistic, considering both the aesthetic and health
impacts of visual pollution on urban populations, building on the findings
of Zaeimdar and colleagues.
Our unique dataset, combined with advanced computational methods,
allows for a more granular and accurate assessment of visual pollution in
various urban settings.
In conclusion, while several studies have delved into the concept and
impacts of visual pollution, our approach offers a comprehensive,
quantifiable, and actionable perspective. By integrating diverse
methodologies and insights from previous research, our study promises to
significantly advance the field of visual pollution assessment and
mitigation.

7 Use Cases
1. Urban Aesthetics and Revitalization:
– Heatmap Insights: Urban designers can harness the power of the
generated heatmaps to discern visual pollution intensity across
varied locales. This empowers them to pinpoint areas needing
aesthetic enhancements or rejuvenation.
– Beautification Initiatives: Recognizing regions with pronounced
visual pollution can guide city planners in orchestrating targeted
beautification drives, green initiatives, and public art installations.

2. Governmental Supervision and Policy Formulation:


– Regulation and Enforcement: Authorities can leverage the VPI to
ensure adherence to urban aesthetic standards, initiating corrective
measures in areas with high visual pollution.
– Public Awareness and Campaigns: Governments can use VPI data
to launch awareness campaigns, educating citizens about the
importance of maintaining visual aesthetics and the role they can
play.

3. Real-Time Urban Monitoring for Community Stakeholders:


– Community Engagement: Local communities can utilize the VPI to
monitor the visual health of their neighborhoods, rallying together
for cleanup drives or community beautification projects.
– Feedback Mechanism: With the VPI as a reference, residents can
provide feedback to municipal bodies about specific areas of
concern, ensuring a collaborative approach to urban aesthetics.
4. Data-Informed Strategies for Investors and Businesses:
– Location Decisions: Entrepreneurs and investors can consult the
VPI and associated heatmaps to determine suitable locations for new
ventures, especially in sectors like hospitality or real estate, where
aesthetics play a pivotal role.
– Market Analysis: Businesses can use VPI data to gauge the visual
appeal of areas, helping them tailor marketing strategies or product
placements in regions that align with their brand image.

8 Conclusion
In this research, we have unveiled a pioneering methodology to assess
visual pollution in urban environments by harnessing aerial imagery and
cutting-edge object detection mechanisms. Our Visual Pollution Index
(VPI) emerges as a holistic metric, encapsulating both the variety and
gravity of visual pollutants. With the support of our distinctive dataset and
severity categorization, our results highlight the transformative potential of
technology in reshaping our understanding of urban aesthetics. The
congruence between our system’s evaluations and real-world perceptions
attests to the precision and dependability of our methodology. As cities
worldwide grapple with the challenges of urbanization, instruments like
ours will be instrumental in guiding efforts towards creating visually
harmonious urban landscapes. The horizon looks promising, with
opportunities for future research to refine, enhance, and broaden the scope
of our approach, weaving in real-time data streams and more intricate
analytical tools.

9 Future Scope
Integration with Smart City Infrastructure: As urban landscapes evolve
into smart cities, there’s potential to integrate the VPI with sensors and
cameras placed throughout the city. This would facilitate real-time
monitoring of visual pollution, enabling swift interventions and
continuous urban beautification efforts.
Predictive Analysis for Urban Aesthetics: By harnessing the power of
artificial intelligence and machine learning, future versions of this system
could anticipate areas prone to visual pollution. This predictive capability
could be based on urban growth patterns, historical data, and socio-
economic factors, allowing for proactive measures.
Virtual Reality and Augmented Reality Enhancements: The next frontier
could involve the use of Virtual Reality (VR) to simulate the visual
experience of different urban areas based on their VPI scores.
Additionally, Augmented Reality (AR) can be employed to superimpose
potential solutions or improvements on existing urban landscapes,
providing stakeholders with a futuristic vision of possible enhancements.
Adaptive Urban Design Frameworks: With the insights derived from
VPI, urban designers can develop adaptive design frameworks. These
would be dynamic urban design strategies that evolve based on real-time
VPI data, ensuring cities remain visually appealing amidst rapid
urbanization.

Acknowledgements
We extend our heartfelt gratitude to Kesowa Infinite Ventures Pvt Ltd for
their diligent efforts in collecting the invaluable dataset used in this study.
Their commitment to quality and precision has been instrumental in the
success of our research. We are deeply appreciative of their generosity in
granting us exclusive access to this data and allowing its use for our study.

References
1. Lynch K (1984) The image of the city. MIT Press
2.
Nasar JL (1994) Urban design aesthetics: the evaluative qualities of building exteriors. Environ
Behav 26(3):377–401
[Crossref]
3.
Ulrich RS (1984) View through a window may influence recovery from surgery. Science
224(4647):420–421
[Crossref]
4.
Anderson K, Gaston KJ (2013) Lightweight unmanned aerial vehicles will revolutionize spatial
ecology. Front Ecol Environ 11(3):138–146
[Crossref]
5.
Li C, Li L, Jiang H, Weng K, Geng Y, Li L, Ke Z, Li Q, Cheng M, Nie W, Li Y, Zhang B, Liang
Y, Zhou L, Xu X, Chu X, Wei X, Wei X (2022) YOLOv6: a single-stage object detection
framework for industrial applications. arXiv:​2209.​02976
6.
Chmielewski S, Lee DJ, Tompalski P, Chmielewski TJ, Wężyk P (2016) Measuring visual
pollution by outdoor advertisements in an urban street using intervisibilty analysis and public
surveys. Int J Geogr Inf Sci 30(4):801–818
[Crossref]
7.
Ahmed N, Islam MN, Tuba AS, Mahdy MRC, Sujauddin M (2019) Solving visual pollution with
deep learning: A new nexus in environmental management. J Environ Manag 248
8.
Wakil K, Naeem MA, Anjum GA, Waheed A, Thaheem MJ, Hussnain MQ, Nawaz R (2019) A
hybrid tool for visual pollution assessment in urban environments. Sustainability 11
9.
Alharbi OA, Rangel-Buitrago N (eds) Scenery evaluation as a tool for the determination of
visual pollution in coastal environments: the Rabigh coastline, Kingdom of Saudi Arabia as a
study case. Mar Pollut Bull 181
10.
Madleňák R, Hudák M (2016) The Research of Visual Pollution of Road Infrastructure in
Slovakia. In: Mikulski J (ed) Challenge of transport telematics. TST 2016. Communications in
computer and information science, vol 640. Springer, Cham
11.
Yilmaz D, Sagsöz A (2011) In the context of visual pollution: effects to Trabzon city Center
Silhoutte. Asian Soc Sci 7(5):98
[Crossref]
12.
Mitra D, Banerji S (2018) Urbanisation and changing waterscapes: a case study of New Town,
Kolkata, West Bengal, India. Appl Geogr 97:109–118
[Crossref]
13.
https://www.qgis.org/en/site/about/index.html
14.
Shorten C, Khoshgoftaar TM (2019) A survey on image data augmentation for deep learning. J
Big Data 6(1):60
[Crossref]
15.
Armstrong JS (2004) The Graffiti problem. General Economics and Teaching, University Library
of Munich, Germany
16.
Wardhana N, Ellisa E (2023) Youth tactics of urban space appropriation: case study of
skateboarding and graffiti. J Asian Arch Build Eng
17.
Stabnikov V, Chu J, Myo AN et al (2013) Immobilization of sand dust and associated pollutants
using bioaggregation. Water Air Soil Pollut 224
18.
Oladumiye EB (2013) Urban environmental graphics: impact, problems and visual pollution of
signs and billboards in Nigeria. Int J Educ Res 1:89
19.
Sarkar S (2003) Qualitative evaluation of comfort needs in urban walkways in major activity
centers. Transp Q 57(4):39–59
20.
Ściężor T (2021) Effect of street lighting on the urban and rural night-time radiance and the
brightness of the night sky. Remote Sens 13(9):1654. https://​doi.​org/​10.​3390/​rs13091654
[Crossref]
21.
Andjarsari S, Subadyo AT, Bonifacius N (2022) Safe construction and visual pollution of
billboards along main street. In: IOP conference series: earth and environmental science, vol 999,
no 1, p 012015
22.
Lafta ZQ, Al-Shamry MMH (2022) The spatial variation of the aspects of environmental
pollution in Al-Hilla city and its environmental effects. Baltic J Law Polit 15(2)
23.
Naveen N, Mallesh Yadav S, Sontosh Kumar A (2018) A study on potholes and its effects on
vehicular traffic. Int J Creat Res Thoughts 6(1):2320–2882
24.
Latypova A, Lenkevich A, Kolesnikova D, Ocheretyany K (2022) Study of visual garbage as
visual ecology perspective. gmd 4(2):153–172
25.
https://geopandas.org/en/stable/
26.
Everingham M et al (2010) The PASCAL visual object classes (VOC) challenge. Int J Comput
Vision 88(2):303–338
[Crossref]
27.
Chmielewski S (2020) Chaos in motion: measuring visual pollution with tangential view
landscape metrics. Land 9(12):515. https://​doi.​org/​10.​3390/​land9120515
[Crossref]
28.
Zaeimdar M, Khalilnezhad Sarab F, Rafati M (2019) Investigation of the relation between visual
pollution and citizenry health in the city of Tehran (case study: municipality districts No.1 & 12
of Tehran). Anthropog Pollut 3(1):1–10

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed
Computing , Studies in Computational Intelligence 1153
https://doi.org/10.1007/978-3-031-56388-1_9

Subcellular Protein Patterns Classification


Using Extreme Gradient Boosting
with Deep Transfer Learning as Feature
Extractor
Manop Phankokkruad1 and Sirirat Wacharawichanant2
(1) School of Information Technology, King Mongkut’s Institute of
Technology Ladkrabang, Bangkok, Thailand
(2) Department of Chemical Engineering, Silpakorn University, Nakhon
Pathom, Thailand

Manop Phankokkruad (Corresponding author)


Email: [email protected]

Sirirat Wacharawichanant
Email: [email protected]

Abstract
Proteins are essential structural and functional components of human cells.
Understanding and identifying proteins can provide valuable insights into
their structure, function, and role in the human body. Subcellular proteins
provide the expression that characterizes the many proteins and their
conditions across cell types. This work proposed a classification model for
subcellular protein patterns using XGBoost with transfer learning of CNN
as the feature extractor. In the model training process, we used ResNet50,
VGG16, Xception, and MobileNet as the pre-trained models based on the
transfer learning technique to extract different features. The proposed model
was used to classify subcellular proteins into 28 patterns. The XGBoost
with ResNet50, VGG16, MobileNet, and Xception models achieved
accuracy levels of 92.20%, 92.77%, 91.63%, and 91.44%, respectively. The
XGBoost with ResNet50, VGG16, MobileNet, and Xception models
obtained F1 scores of 0.9179, 0.9233, 0.9131, and 0.9152, respectively.
Considering the F1 scores, all XGBoost with transfer learning of CNN
models gave a high score. Therefore, all evaluation parameters clearly
demonstrate the high performance of the subcellular protein pattern
classification model.

Keywords Human protein atlas – Subcellular protein – Transfer learning –


XGBoost – ResNet50 – VGG16 – MobileNet – Xception – Deep Learning –
Classification

1 Introduction
Proteins are essential structural and functional components of human cells.
Understanding and identifying proteins can provide valuable insights into
their structure, function, and role in the human body. The human protein atlas
(HPA) is a large source of proteomic data that needs to be annotated to
characterize the many proteins and their conditions across cell types. The
subcellular protein section of the HPA provides the expression and
spatiotemporal distribution of the encoded proteins. It is made up of a
database of proteomic microscopy images across many different cell lines and
tissues.
Because of the enormous amount of image data, it is difficult for a
machine to identify the data of these cells. Therefore, powerful tools are
needed to annotate and characterize the proteins and their conditions. For
instance, Newberg et al. [1] presented an automated method for processing
and classifying major subcellular patterns in the Atlas images. They used
support vector machines and random forests as the classification frameworks
for determining subcellular locations. In a similar way, Shwetha et al. [2]
used a conventional approach to extract various features from the images.
Then the features are fed into a classifier. Moreover, they used a
convolutional neural network (CNN) to extract features and classify the
images into 15 classes. Although the human protein classification problem
has been successfully solved by deep learning methods, the extraction of
discriminant features from images is a very hard and time-consuming
process. Extreme Gradient Boosting (XGBoost) is a highly efficient
algorithm that pushes the limits of computational resources. An interesting
study on XGBoost was proposed by Sugiharti et al. [3]. They combined a
CNN with XGBoost to enhance the accuracy of early detection of
breast cancer on mammogram images. In a like manner, Punuri et al. [4]
used CNN in conjunction with XGBoost technique in facial emotion
recognition. Additionally, Jiang et al. [5] used transfer learning with an
XGBoost method to predict G protein-coupled receptor interactions. They used
XGBoost as a weak classifier together with the TrAdaBoost algorithm based on
Jensen-Shannon divergence. After that, the transfer learning of CNN was
used for model training. Khan et al. [6] proposed CNN-XGBoost method
for recognizing three dimensions of emotion. They applied several feature
extraction techniques such as the fast Fourier transform and the discrete
cosine transform.
In summary, combining XGBoost with transfer learning of CNNs is a
challenging yet highly effective way to classify subcellular protein pattern
images. Another challenge is creating a classification model under multiclass
and imbalanced data conditions. In this work, we use transfer learning of
CNNs to train the model on the HPA image dataset and retrieve the features.
We then apply an XGBoost classifier to the features extracted by the transfer
learning technique to classify the subcellular proteins into 28 patterns.
Furthermore, we perform data pre-processing and augmentation on the
dataset. The hybrid model of transfer learning of CNN and XGBoost is
fine-tuned with optimal parameters to provide the best performance.
The paper is organized as follows. Section 2 describes the research
methods, techniques, and evaluation metrics used in this study. Section 3
describes the experimental processes, including data preparation, model
creation, and parameter tuning. Section 4 presents the results of the
experiments and their evaluation. Finally, Sect. 5 concludes the paper.

2 Methodology
In this section, we will describe the research methodology and techniques
used to confirm that the proposed classification method is the most
accurate. These techniques include convolutional neural networks, transfer
learning, extreme gradient boosting, and evaluation metrics.
2.1 Transfer Learning of Convolutional Neural Network
A convolutional neural network (CNN) is a powerful type of artificial
neural network that can be used for image recognition and processing.
CNNs have a basic architecture that consists of multiple layers, each of
which is responsible for recognizing and translating different types of
information from images. Transfer learning is a technique that uses an
existing model as a starting point for training a new model. This can save
time and improve performance, as the new model can learn from the
knowledge that the existing model has already learned. In this work, we
used the ResNet50, VGG16, Xception, and MobileNet as pre-trained
models. These models were selected because they are lightweight,
frequently used, low-resource, and high-efficiency models.
Residual Network (ResNet) is a popular deep learning model for
computer vision applications. It was first introduced in 2015 by He et
al. [7]. ResNets have different layer-based architectures, depending on their
variants. The original ResNet architecture has 34 layers and uses skip
connections to connect different layers. These skip connections allow the
network to learn residual mappings, which are functions that map the input
to the output. ResNets are made by stacking residual blocks together. Each
residual block consists of two 3×3 convolutional layers with a ReLU
activation function. In this study, we used ResNet50, which has 50 layers.
ResNet50 is a popular choice for computer vision tasks because it is
accurate and efficient.
VGG [8] is a classic CNN architecture that improves on AlexNet by
using multiple 3×3 convolutional filters instead of larger filters. The VGG
architecture consists of the following layers. The input layer receives a
224×224 image. The convolutional layers use multiple 3×3 convolutional
filters. The rectified linear unit (ReLU) activation function is used to reduce
training time, and the hidden layers use ReLU instead of Local Response
Normalization to improve overall accuracy. Pooling layers are used to reduce
the dimensionality and the number of parameters of the feature maps. VGG
uses three fully connected layers. In this work, we used VGG16, which has 16 layers.
Xception [9] is an improvement on the Inception architecture (GoogLeNet)
that uses a modified depthwise separable convolution. The architecture of
Xception consists of depth-wise separable convolution layers and their
residual connections. The depth-wise separable convolution can be viewed as
an Inception module with a large number of layers, and this modification
gives greater efficiency in terms of computation time. The depth-wise
separable convolutions consist of two major processes: depth-wise
convolution and point-wise convolution. The Xception model is composed of
36 convolutional layers forming the feature extraction base and has more
than 22 million parameters.
MobileNet [10] is a CNN designed for mobile and embedded vision
applications. MobileNet is widely used in object detection, fine-grained
classification, face attribute recognition, etc. MobileNet is an efficient and
lightweight neural network that uses depth-wise separable convolutions to
construct lightweight deep convolutional neural networks. The architecture
of MobileNet is based on depth-wise separable filters, which are comprised
of two layers: depth-wise convolution filters and point-wise convolution
filters. Depth-wise convolution filters are applied to each input channel
independently, unlike standard convolution filters, which are applied to all
input channels at once. The point-wise convolution filter then combines the
output of the depth-wise convolution with 1×1 convolutions to create new
features.
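
As a brief illustration of how these four backbones can be instantiated as feature extractors, the following sketch assumes a TensorFlow/Keras environment with the standard keras.applications interfaces; the input size and pooling choice are assumptions, not the exact configuration used in this work.

```python
# Loading the four pre-trained backbones as frozen feature extractors (illustrative).
from tensorflow.keras.applications import ResNet50, VGG16, Xception, MobileNet

backbones = {
    "resnet50": ResNet50(weights="imagenet", include_top=False, pooling="avg",
                         input_shape=(224, 224, 3)),
    "vgg16": VGG16(weights="imagenet", include_top=False, pooling="avg",
                   input_shape=(224, 224, 3)),
    "xception": Xception(weights="imagenet", include_top=False, pooling="avg",
                         input_shape=(224, 224, 3)),
    "mobilenet": MobileNet(weights="imagenet", include_top=False, pooling="avg",
                           input_shape=(224, 224, 3)),
}

for name, model in backbones.items():
    model.trainable = False  # keep ImageNet weights; selected layers can be unfrozen later
    print(name, model.output_shape)
```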

2.2 Extreme Gradient Boosting


The extreme gradient boosting (XGBoost) [11] algorithm is a scalable machine
learning algorithm based on gradient boosting. An ordinary gradient boosting
algorithm works by using only the first derivative $g_i$ of the loss function,
whereas XGBoost uses both the first derivative $g_i$ and the second derivative
$h_i$. This makes XGBoost more accurate when creating a prediction model. The
following equations are used in XGBoost:

$$\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n}\Big[g_i f_t(\mathbf{x}_i) + \tfrac{1}{2}\, h_i f_t^{2}(\mathbf{x}_i)\Big] + \Omega(f_t) \quad (1)$$

$$\tilde{\mathcal{L}}^{(t)}(q) = -\frac{1}{2}\sum_{j=1}^{T}\frac{\big(\sum_{i\in I_j} g_i\big)^{2}}{\sum_{i\in I_j} h_i + \lambda} + \gamma T \quad (2)$$

Equation (2) implies the scoring function that is used to measure the
quality of the tree structure q. XGBoost achieves fast and accurate
prediction by iteratively merging weak learners into a strong learner to
reach the highest classification accuracy rate [12]. In general, XGBoost
employs the weak-predictor principle of an ensemble of decision trees.
Performance is achieved by optimizing the loss function value in the
boosting process. The
XGBoost model for classification problems is called XGBClassifier. We
have used XGBoost in the classification step. After applying CNN to
extract the features, the retrieved features are used by XGBoost to classify
all of the subcellular protein patterns. The advantage of XGBoost is its
scalability, fast speed, and high accuracy.

Fig. 1 An architecture of the proposed XGBoost model based on transfer learning as feature
extractor

2.3 Proposed Model


The proposed model architecture is depicted in Fig. 1. This architecture
consists of three modules which include the transfer learning of the CNN
module, the fully connected layer, and the XGBoost classifier. For the transfer
learning of the CNN, we used ResNet50, VGG16, MobileNet, and Xception to
train on the subcellular protein image data. The learning process starts with the input images fed
into the first layer. The next layer identifies complex features such as
texture or shape. This process continues until all complex features have
been identified. We used the pre-trained weights and did fine-tune some
parameters.
Usually, the fully connected layer consists of three kinds of layers
including the pooling layer, the dropout layer, and the dense layer. The
pooling layers are used to generate feature sets consistent with classification
levels. When training on the data using the transfer learning model, the
model might overfit: it performs well on the training data but not on the
test data. To solve this problem, the dropout layer
is applied to prevent over-fitting. This layer forces the network to learn
more robust features which are not specific to the training data. Finally, we
replaced the CNN's original classification head with a new configuration: a
transfer learning base for feature extraction, followed by fully connected
layers for fine-tuning, and finally an XGBoost classifier.
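
The following condensed sketch illustrates this three-module arrangement for one backbone (ResNet50); the layer sizes, dropout rate, and placeholder data are assumptions made only to show how the extracted features are handed to the XGBoost classifier.

```python
# Illustrative sketch of the proposed pipeline: transfer-learning feature
# extractor followed by an XGBoost classifier. Layer sizes, dropout rate, and
# the random placeholder data are assumptions, not this work's exact settings.
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50
from xgboost import XGBClassifier

base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                input_shape=(224, 224, 3))
extractor = models.Sequential([
    base,
    layers.Dropout(0.3),                     # guards against over-fitting
    layers.Dense(512, activation="relu"),    # fully connected fine-tuning layer
])

# Placeholder images and labels: 56 samples covering the 28 protein patterns.
x_train = np.random.rand(56, 224, 224, 3).astype("float32")
y_train = np.arange(56) % 28

features = extractor.predict(x_train)        # deep features per image
clf = XGBClassifier(objective="multi:softprob")
clf.fit(features, y_train)                   # XGBoost performs the final classification
print(clf.predict(features[:3]))
```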

2.4 Model Evaluation


To assess the model's performance, we analyzed the statistical values
calculated from the confusion matrix. A confusion matrix is a table used to
evaluate the performance of a classification model by comparing the actual
values with the predicted values. The confusion matrix shown in Fig. 2 is
used to compute the accuracy, precision, sensitivity, and F1 score. For
N-class classification, the confusion matrix is an N×N table. The terms in
the confusion matrix are True Positive (TP), True Negative (TN), False
Positive (FP), and False Negative (FN). A TP is a value predicted to be
positive that is actually positive. A TN is a value predicted to be negative
that is actually negative. A FP is a value predicted to be positive that is
actually negative. A FN is a value predicted to be negative that is actually
positive. The performance metrics that can be calculated from the confusion
matrix are Accuracy, Precision, Sensitivity, and F1 score. These metrics can
be used to evaluate the level of confidence and performance of a
classification model.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (3)$$

where Accuracy is the percentage of data points that are correctly classified.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (4)$$

where Precision is the percentage of predicted positive data points that are
actually positive.

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \quad (5)$$

where Sensitivity (also known as Recall) is the percentage of positive data
points that are correctly classified.

$$\mathrm{F1\ score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}} \quad (6)$$

where the F1 score is the harmonic mean of Precision and Sensitivity.

Fig. 2 Confusion matrix
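
As an aside, these quantities can be computed directly from the predicted and true labels; the short sketch below assumes scikit-learn and uses placeholder labels.

```python
# Computing the metrics in Eqs. (3)-(6) from predicted and true labels (illustrative).
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

y_true = [0, 1, 2, 2, 1, 0]   # placeholder labels
y_pred = [0, 1, 2, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)                   # N x N table of actual vs. predicted
acc = accuracy_score(y_true, y_pred)                    # Eq. (3)
prec, sens, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)   # Eqs. (4)-(6), macro-averaged
print(cm, acc, prec, sens, f1)
```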

3 Experiments
The subcellular protein pattern classification process is designed as
multiple steps, which are illustrated in Fig. 1.
Fig. 3 Number of images in each class
Fig. 4 An example of images in dataset

3.1 Data Preparation


This work collected data from the subcellular protein image dataset, which
is a section of the HPA datasets. The subcellular protein localization data
comprise 4,690 images belonging to 28 different organelles and subcellular
structures. All image samples are represented by four filters, which include
the protein of interest, nucleus, microtubules, and endoplasmic reticulum in
different colors. The protein-of-interest filter should hence be used to
predict the label, and the other filters are used as references. The
distribution of the dataset is depicted in Fig. 3, and example images are
illustrated in Fig. 4. During this phase, we reshaped the images to a size
suitable for processing. As depicted in Fig. 3, the number of images in each
class is not equal; this is referred to as class imbalance. We solved this
problem by adding newly generated images to the smaller classes, using data
augmentation to increase the size of some classes through operations such as
zoom, width shift, brightness adjustment, and horizontal flip.
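
A minimal sketch of these augmentation operations, assuming the Keras ImageDataGenerator API, is shown below; the parameter values and array shapes are illustrative.

```python
# Illustrative augmentation of a minority class with the operations listed above.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    zoom_range=0.2,               # zoom
    width_shift_range=0.1,        # width shift
    brightness_range=(0.8, 1.2),  # range of brightness
    horizontal_flip=True,         # horizontal flip
)

minority_images = np.random.rand(16, 224, 224, 3)  # placeholder minority-class images
new_batch = next(augmenter.flow(minority_images, batch_size=16, shuffle=True))
print(new_batch.shape)
```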

3.2 Creating Model and Tuning


Figure 1 depicts the overall model architecture. We created four different
transfer learning models by using ResNet50, VGG16, MobileNet, and
Xception. This part is used to extract features. Then, we combined it
with the dropout layer and dense layer. Finally, the transfer learning CNN
model is combined with the XGBoost model.
For model tuning, we conducted the experiments by defining the
significant parameters, assigning the optimal parameters, and performing
fine-tuning using a hyperparameter optimization technique.
In step 1, we tuned the transfer learning model in the training process by
assigning the essential parameters. The purpose of this step is to extract
features from the subcellular protein images. We reused the existing weights
and unfroze some layers during the training step. We then used the following
parameters during model training: the optimizer is Adam, the batch size is 32
with 30 epochs, the learning rate is 0.001, and the loss function is
categorical cross-entropy.
In step 2, we tuned the parameters for the hybrid XGBoost with transfer
learning model using the same hyperparameter technique as in the training
step. The XGBoost parameters are set as follows: max depth is 6, learning
rate is 0.1, number of classes is 28, the objective is softprob, and the
evaluation metric is merror (multiclass classification error rate).
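
The two tuning steps can be summarized in code as follows; the tiny stand-in network and random data are placeholders, while the optimizer, batch size, epochs, loss, and XGBoost settings mirror the values stated above. The scikit-learn wrapper of XGBoost infers the number of classes (28) from the training labels.

```python
# Sketch of the two tuning steps; the stand-in network and random data are
# placeholders for the fine-tuned transfer-learning model and the HPA features.
import numpy as np
import tensorflow as tf
from xgboost import XGBClassifier

num_classes = 28

# Step 1: training configuration (Adam, learning rate 0.001, batch size 32,
# 30 epochs, categorical cross-entropy loss).
head = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(512,)),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
head.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
             loss="categorical_crossentropy", metrics=["accuracy"])
features = np.random.rand(64, 512).astype("float32")
labels = np.arange(64) % num_classes            # placeholder labels covering all classes
head.fit(features, tf.keras.utils.to_categorical(labels, num_classes),
         batch_size=32, epochs=30, verbose=0)

# Step 2: XGBoost settings (max depth 6, learning rate 0.1, softprob objective,
# merror evaluation metric); the class count is inferred from the labels.
xgb = XGBClassifier(max_depth=6, learning_rate=0.1,
                    objective="multi:softprob", eval_metric="merror")
xgb.fit(features, labels)
```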
Furthermore, we evaluated the models using k-fold cross-validation with
k set to 5. Additionally, we used a confusion matrix to determine the
model performance by calculating an accuracy, sensitivity, precision, and
F1 score to measure the performance in classification problems. The
mathematical expressions are presented in (3)–(6).
Fig. 5 Learning curve of accuracy at 30 epochs using a ResNet50, b VGG16, c MobileNet, and d
Xception
Fig. 6 Learning curve of loss at 30 epochs using a ResNet50, b VGG16, c MobileNet, and d
Xception
Fig. 7 The comparison of accuracy was achieved by four pre-trained models ResNet50, VGG16,
MobileNet, and Xception
Fig. 8 The confusion matrix obtained by a XGBoost with ResNet50, b XGBoost with VGG16, c
XGBoost with MobileNet, and d XGBoost with Xception

4 Results and Evaluations


Figure 5 depicts the training and validation accuracy of the proposed model
at 30 epochs. XGBoost with ResNet50 achieved an average accuracy of
92.20%, XGBoost with VGG16 of 92.77%, XGBoost with MobileNet of
91.63%, and XGBoost with Xception of 91.44%. Figure 6 shows the
training and validation loss curves of XGBoost with ResNet50, VGG16,
MobileNet, and Xception at 30 epochs, respectively. The comparison of the
accuracy plots is depicted in Fig. 7. Figure 8 shows the confusion matrices
of XGBoost with ResNet50, VGG16, MobileNet, and Xception, respectively.
Furthermore, we evaluated the models' performance by measuring the
precision, sensitivity, and F1 score, which are shown in Table 1. For
ResNet50, the study achieved an average precision, sensitivity, and F1 score
of 0.9192, 0.9166, and 0.9179, respectively. Likewise, for VGG16, we
obtained an average precision, sensitivity, and F1 score of 0.9238, 0.9228,
and 0.9233, respectively. In the case of the Xception model, we obtained an
average precision, sensitivity, and F1 score of 0.9238, 0.9148, and 0.9152,
respectively. Finally, for MobileNet, the model gave an average precision,
sensitivity, and F1 score of 0.9235, 0.9149, and 0.9131, respectively.
Considering the F1 scores, all XGBoost with transfer learning models gave
high scores, which means the proposed model is a high-performance model.
It is clearly observed from Fig. 5 and Table 1 that all evaluation metrics
reflect the classification performance of the model.
Table 1 The model evaluation of performance of accuracy, precision, sensitivity, and F1 score by
ResNet50, VGG16, MobileNet and Xception as feature extractor

Parameters ResNet50 VGG16 MobileNet Xception


Accuracy (%) 92.20 92.77 91.63 91.44
Precision 0.9192 0.9238 0.9235 0.9238
Sensitivity 0.9166 0.9228 0.9149 0.9148
F1 score 0.9179 0.9233 0.9131 0.9152

5 Conclusion
This work proposed a classification model for subcellular protein patterns
using XGBoost with transfer learning of CNN as the feature extractor. In
the model training process, we used ResNet50, VGG16, MobileNet and
Xception as the pre-trained models based on the transfer learning technique
to extract different features. The proposed model was used to classify
subcellular proteins into 28 patterns. The XGBoost with ResNet50, VGG16,
MobileNet, and Xception models achieved accuracy levels of 92.20%,
92.77%, 91.63%, and 91.44%, respectively. The XGBoost with ResNet50,
VGG16, MobileNet, and Xception models obtained F1 scores of 0.9179,
0.9233, 0.9131, and 0.9152, respectively. Considering the F1 scores, all
XGBoost with transfer learning of CNN models gave a high score.
Therefore, all evaluation parameters clearly demonstrate the high
performance of the subcellular protein pattern classification model.

References
1. Li J, Newberg JY, Uhlén M, Lundberg E, Murphy RF (2012) Automated analysis and
reannotation of subcellular locations in confocal images from the human protein atlas. PLOS
ONE 7(11):1–10, 11
2.
Shwetha TR, Thomas SA, Kamath V, Niranjana KB (2019) Hybrid xception model for human
protein atlas image classification. In: 2019 IEEE 16th India council international conference
(INDICON), pp 1–4
3.
Sugiharti E, Arifudin R, Wiyanti DT, Susilo AB (2021) Convolutional neural network-xgboost
for accuracy enhancement of breast cancer detection. J Phys Conf Ser 1918(4):042016
4.
Punuri SB, Kuanar SK, Kolhar M, Mishra TK, Alameen A, Mohapatra H, Mishra SR (2023)
Efficient net-xgboost: an implementation for facial emotion recognition using transfer learning.
Mathematics 11(3)
5.
Tengsheng J, Yuhui C, Shixuan G, Hu Z, Lu W, Fu Q, Yijie D, Haiou L, Wu H (2022) G protein-
coupled receptor interaction prediction based on deep transfer learning. IEEE/ACM Trans
Comput Biol Bioinform 19(6):3126–3134
6.
Khan MS, Salsabil N, Alam MGR, Dewan MAA, Uddin MZ (2022) Cnn-xgboost fusion-based
affective state recognition using EEG spectrogram image analysis. Sci Rep 12(1):14122
7.
He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. CoRR
abs/1512.03385
8.
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image
recognition. In: Bengio Y, LeCun Y (eds) 3rd international conference on learning
representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, conference track proceedings
9.
Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE
conference on computer vision and pattern recognition (CVPR). IEEE Computer Society, Los
Alamitos, CA, USA, pp 1800–1807
10.
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017)
MobileNets: efficient convolutional neural networks for mobile vision applications.
arXiv:1704.04861
11.
Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: Proceedings of the
22Nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD
’16. ACM, New York, NY, USA, pp 785–794
12.
Friedman JH (2000) Greedy function approximation: a gradient boosting machine. Ann Stat
29(5):1189–1232
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed
Computing , Studies in Computational Intelligence 1153
https://doi.org/10.1007/978-3-031-56388-1_10

Building a Shapley FinBERTopic System


to Interpret Topics and Articles Affecting
Stock Prices
Yoshihiro Nishi1 and Takahashi Hiroshi1
(1) Graduate School of Business Administration, Keio University,
Kanagawa-Ken, Japan

Yoshihiro Nishi
Email: [email protected]

Abstract
Financial sentiment analysis can help investors make more efficient
decisions. It involves analyzing text data to extract sentiments and assists
market participants in making informed investment decisions. In recent
years, there have been many efforts to improve analysis accuracy using
machine learning and deep learning, particularly FinBERT, which is
pre-trained on financial datasets and excels in analyzing financial
documents. However, the default output may not provide enough
information for decision-making, and the difficulty of explaining the output
results makes it hard to use for critical decisions. This study applies
BERTopic and SHAP to financial sentiment analysis with FinBERT. It
proposes a Shapley FinBERTopic system, which contributes to better
investor decision-making by explicitly showing the impact of news article
topics and individual words on stock prices. The results of experiments
with this system showed increased interpretability compared to using
FinBERT alone. A more detailed analysis will be conducted in the future.
Keywords Decision support system – Text analysis – Financial markets –
Explainable AI – SHAP – BERTopic – FinBERT

1 Introduction
Sentiment analysis using natural language processing is gaining attention in
the financial sector for analyzing information about stock prices and market
trends [1–3]. Sentiment analysis can analyze text data to extract positive,
negative, or neutral information. Market participants can make more
accurate investment decisions by understanding sentiment analysis results.
The adoption of deep learning models to sentiment analysis in the financial
sector is a topic that has received much attention in recent research.
Previous research has used deep learning models such as LSTM (Long
Short-Term Memory) [4], CNN (Convolutional Neural Network), and
BERT (Bidirectional Encoder Representations from Transformers) [3, 5].
Among them, FinBERT has been pre-trained on a financial-specific dataset
and is highly accurate in the analysis of financial documents [6]. These
methods have the potential to provide higher forecasting accuracy.
However, challenges remain in the explainability and interpretability of deep
learning models. Deep learning-based analysis is becoming increasingly
valuable, but it has the disadvantage of being difficult to use for critical
decision-making because its output results are hard to explain.
Topic Modeling is a method used in text mining and natural language
processing to discover abstract topics in a collection of documents. The
primary aim is to identify hidden thematic structures within the text body. A
common technique employed for this purpose is Latent Dirichlet Allocation
(LDA), which bases the discovery of topics on word frequency and co-
occurrence [7]. This method has been widely used for various purposes like
document clustering, organizing large blocks of textual data, information
retrieval, and feature selection [8]. BERTopic is a modern take on topic
modeling that utilizes BERT embeddings. Unlike traditional methods like
LDA, BERTopic explores contextual word embeddings to apprehend
semantic relationships among words and documents [9]. One of the features
of BERTopic is its semantic understanding, which is achieved through
BERT's contextual embeddings. This feature enables BERTopic to generate
more coherent and interpretable topics than traditional methods. Another
feature is the hierarchical topic reduction, which provides a more granular
understanding of topics. BERTopic also offers built-in visualization support
to help understand and interpret the topic distribution across the corpus. It is
customizable to a reasonable extent, supporting different transformer
models and offering parameters to tweak the topic modeling process as per
requirements. Additionally, BERTopic is known for creating stable topics
even with small datasets, addressing a common issue with traditional topic
modeling techniques.
Researchers are working to develop XAI (Explainable Artificial
Intelligence), as the importance of transparency and accountability in AI has
been increasing recently [10]. XAI technology makes decision-making
more understandable to humans [11, 12]. SHAP (Shapley Additive
Explanations) have received particular attention among XAI techniques.
SHAP can evaluate the importance of each feature to the predictions of a
machine learning model [13]. SHAP is a technique for improving the
explanatory power of predictions in machine learning models and applies to
natural language processing analysis. SHAP allows one to explain how the
model is making predictions, thereby improving the explanatory nature of
the model [14].
This study applies the BERTopic and SHAP to financial sentiment
analysis by FinBERT. It proposes a Shapley FinBERTopic system, which
contributes to better investor decision-making by explicitly showing the
impact of individual news article topics and individual words on stock
prices. A Shapley FinBERTopic uses investors as a representative example
to analyze model-based decision-making by contributing to a more efficient
understanding of the model's features by those involved in the decision-
making process, thereby supporting better decision-making. In addition, the
effectiveness of the proposed method is also experimented in conjunction
with the proposed method.

2 Related Works
The financial industry is actively using natural language processing
techniques for analysis [15–19]. Also, Financial sentiment analysis using
natural language processing techniques plays an essential role in the
financial sector and has been the subject of many studies [1–3]. Financial
natural language processing sentiment analysis was used in one study to
examine the relationship between news articles and stock prices, and the
results compared well against existing quant funds [20]. Another study used
natural language processing sentiment analysis to predict stock
prices [21]. As described above, financial natural language processing
sentiment analysis plays an essential role in the financial field, and its
application is expected to expand.
FinBERT is a BERT model specific to the financial sector and is used
for sentiment analysis of stock markets and financial instruments [22]. In a
study that evaluated the construction and performance of FinBERT,
researchers showed that FinBERT is more suitable for text mining in the
financial field than the general BERT model [6]. FinBERT is expected to be
a useful deep-learning model for sentiment analysis in financial markets.
However, the default output from FinBERT sentiment analysis is only a
single label and score. The interpretability of the model's output is limited,
as with many deep learning models.
Topic Modeling embodies a fundamental approach within text mining
and natural language processing, facilitating the unraveling of abstract
topics dispersed across a corpus of documents. The principal objective is to
discern concealed thematic architectures embedded within the textual body.
A frequently employed technique for realizing this objective is the LDA,
which predicates the discovery of topics on word frequency and co-occurrence
metrics [8]. This method has garnered widespread application across
diverse realms, such as document clustering, orchestration of voluminous
textual data, information retrieval, and feature selection, attesting to its
utility and robustness [23–25]. BERTopic emerges as a nuanced rendition of
topic modeling, harnessing the prowess of BERT embeddings to fathom the
semantic landscapes within texts [9]. Also, BERTopic is gradually being
used in the financial domain, and there are reports that it is more flexible
than LDA and LSA (Latent Semantic Analysis), can capture semantic
relationships that LDA and LSA cannot, and provides meaningful and
diverse topics [26]. In this study, the operational essence of BERTopic is to
ameliorate the interpretability lacuna that encumbers the sentiment analysis
utilization from FinBERT, hence rendering these outputs more intelligible
and actionable. Incorporating BERTopic engenders an auxiliary stratum of
topic-centric interpretability, situating itself as a vital instrument to amplify
the comprehension of sentiment trajectories in financial textual data. This
augmentation harbors the potential to catalyze more enlightened decision-
making paradigms within the financial sector, underscoring the substantive
value of augmenting interpretability in machine-driven sentiment analysis.
The symbiotic amalgamation of BERTopic and FinBERT not merely
endows the system with an enhanced cognizance of the predominant
sentiment in financial dialogues but also enriches the understanding of the
contextual themes intricately woven within the sentiment indices.
Further, efforts to improve the interpretability of analytical models are
progressing in the field of XAI. Among them, SHAP is a method for
explaining the predictions of machine learning models. This method
evaluates the extent to which individual features contribute to predictions.
Specifically, it evaluates how the predictions change when individual
features are removed from the model and calculates the importance of the
features based on the degree of change. SHAP uses the Shapley value, a
method used in game theory to quantify how multiple players distribute the
benefits of cooperation. The method accurately assesses how much an
individual feature contributes to the model's predictions. SHAP can be
applied to a wide variety of machine learning models. In addition,
visualization of SHAP values provides an intuitive understanding of the
importance of features [13], and numerous studies have shown their
effectiveness. For example, SHAP has been proposed as an interpretation
method for tree-based models from local explanations to global
understanding [27]. There are other examples where SHAP has been
applied to make the explanatory properties of machine learning models
more robust and easier to interpret [28]. Some usefulness of SHAP has also
been reported in efforts to apply SHAP to classification and analysis models
in the financial field [29–31]. There has also been an attempt to explain
agent behavior in financial stock trading by employing SHAP in Deep Q
Network (DQN), a deep reinforcement learning architecture [32]. There are
also active efforts to apply SHAP to natural language processing, and some
have applied SHAP to specific models [14]. These studies show that SHAP
is a promising method for explaining predictions in machine learning and
deep learning models using natural language processing.
As described above, natural language processing financial sentiment
analysis is a promising field that is expected to develop in the future.
Pretrained models such as FinBERT, specialized in the financial field, have
been released, making natural language processing financial sentiment
analysis via FinBERT possible. However, the interpretability of the model's
default output is still an issue, and there is room for progress. Incorporating
BERTopic alongside FinBERT could potentially address the interpretability
issue by uncovering thematic structures within financial texts, thereby
providing a layer of topic-based interpretability to the sentiment analysis
output generated by FinBERT, thus contributing to the ongoing
advancements in Explainable AI (XAI) within the domain of financial
sentiment analysis. Further, efforts to improve the interpretability of
analytical models in the field of XAI are progressing, and models such as
SHAP have emerged. SHAP has been reported as a promising method for
explaining predictions in machine learning and deep learning models using
natural language processing. Combining them to build new analytical
models has the potential to improve the interpretability of the models
efficiently.

3 Shapley FinBERTopic System


FinBERT, trained on a dataset specifically tailored towards financial
discourse, manifests a high degree of accuracy in scrutinizing financial
documents. A particular instantiation of FinBERT, denoted as FinBERT-
Sentiment, when fine-tuned to a corpus of 10,000 manually annotated
sentences extracted from analyst reports of S&P 500 companies, yields
analysis results of commendable accuracy. Nonetheless, a conspicuous
limitation arises in the default output generated by FinBERT. Figure 1
compares an example of the FinBERT default output with an example of the
Shapley FinBERTopic System output. The default output of the original
FinBERT model is confined to three categorizations—Positive, Neutral, and
Negative—accompanied by a score delineating the extent to which the
scrutinized sentences can be attributed to the designated label. We adapted
BERTopic and SHAP to the FinBERT framework to transcend this
limitation, constructing an integrative system. The Shapley FinBERTopic
System can provide a variety of outputs in addition to the original ones.
Figure 2 shows the main components of Shapley FinBERTopic. The textual
corpus destined for analysis is tokenized via the finbert-tone mechanism,
and the ensuing tokens are channeled to inputs interfaced with the FinBERT
model, thus engaging in sentiment-centric classification analysis. After this,
BERTopic uses the text and labels funneled into FinBERT to execute topic
modeling, thereby unveiling the textual corpus's thematic essence.
Concurrently, upon receiving the inputs, the model configuration, the
designated label, and the f(inputs) score, SHAP deciphers and elucidates the
impact exerted by each input on the f(inputs) score. Intrinsically, SHAP
does not natively accommodate categorical outputs such as Positive, Neutral, and
Negative, thereby necessitating a reconceptualization whereby the impact of
the Positive categorization is harnessed to discern whether an analyte
imparts a positive influence. The output generated encapsulates an
assessment of whether the text under examination exerts a positive or
negative impact on the designated label. A harmonization of the scoring
metric is maintained between FinBERT and SHAP, facilitating
decomposition and disclosure of the impact of inputs on the resultant
output.
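As a rough illustration of this flow, the following Python sketch runs a FinBERT-style sentiment pipeline and decomposes its score with SHAP's text explainer. It is not the authors' implementation: the checkpoint name yiyanghkust/finbert-tone, the sample headline, and the label names are assumptions for illustration, and the topic-modeling step is omitted here.

# Minimal sketch of the FinBERT + SHAP part of the system (not the authors' code).
# Assumes the transformers and shap packages and a FinBERT checkpoint are installed.
import shap
from transformers import pipeline

# Sentiment classifier; "yiyanghkust/finbert-tone" is an assumed checkpoint name.
classifier = pipeline("text-classification",
                      model="yiyanghkust/finbert-tone",
                      top_k=None)  # return scores for all labels, not only the best one

texts = ["Shares posted their biggest intraday gain since 2016."]  # hypothetical headline

print(classifier(texts))  # FinBERT-style default output: one label and score per class

# SHAP decomposes the label scores into per-token contributions.
explainer = shap.Explainer(classifier)
shap_values = explainer(texts)

# Visualize each token's impact on the Positive score (label names depend on the checkpoint).
shap.plots.text(shap_values[:, :, "Positive"])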
Fig. 1 Example of FinBERT default and Shapley FinBERTopic system output

Fig. 2 Main components of a Shapley FinBERTopic system


Furthermore, the integration of SHAP amplifies the interpretability of
individual news articles, bestowing upon the system a refined capacity to
extrapolate insightful deductions from financial narratives. This synthesis
enhances and emboldens the ability to dissect and comprehend the intricate
interplay of thematic elements and sentiment orientations within the
financial textual domain. Through this integrative approach, a Shapley
FinBERTopic System endeavors to obviate the interpretability bottleneck,
fostering a more sophisticated understanding of financial documents and the
underpinning sentiment dynamics.

4 Experiments
In this study, 337,430 news articles distributed by Thomson Reuters in the
financial markets in 2017 were used as data for analysis. Figures 3 and 4
show the results of tokenizing the obtained news articles by finbert-tone and
analyzing them by FinBERT. The vertical axis indicates the number of news
articles, and the horizontal axis shows the FinBERT score, that is, the
certainty with which the target news article can be assigned the output
label. Of these articles, 37,803 were classified as Positive, 297,521 as
Neutral, and 2,106 as Negative. For all labels, a high percentage of the
articles were predicted with high scores of 0.9–1.0.

Fig. 3 FinBERT results in all labels

Fig. 4 FinBERT results for positive and negative labels


The data passed to FinBERT were then passed to BERTopic for topic
modeling. When making investment decisions, it is essential to know which
factors are considered positive and which are deemed negative. The topic
modeling was performed on news articles classified as having a positive or
negative impact on stock prices. Figure 5 shows the BERTopic results for
news articles classified as positively affecting stock prices. Figure 6 shows
the BERTopic results for news articles classified as having a negative effect
on stock prices. Note that the top 10 topics are extracted for each label. The
vertical axis indicates the number of news articles, and the horizontal axis
shows the FinBERT score of each news article. The most common topics for
both positive and negative news articles combined quantitative indicators,
such as company performance, with words that could be inferred to be good
or bad. Positive news articles used words such as stock, since, pct, as, much,
biggest, intraday, highest, gain, and posts.
Furthermore, pct, as, stock, since, shares, much, biggest, lowest, and index
are representative words for negative news articles.
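The topic-modeling step can be reproduced in outline with the BERTopic library. The sketch below is only an illustration under assumed settings; positive_docs stands for the articles FinBERT labeled Positive, and the parameters do not reproduce the exact configuration of this study.

# Illustrative BERTopic usage on positively labeled articles (assumed settings, not the study's code).
from bertopic import BERTopic

# Placeholder: in practice this should be the full corpus of articles labeled Positive by FinBERT.
positive_docs = ["Company X posts biggest intraday gain since 2016.",
                 "Index climbs as shares rally on strong results."]

topic_model = BERTopic(language="english", nr_topics=10)  # reduce to roughly the top topics
topics, probs = topic_model.fit_transform(positive_docs)

print(topic_model.get_topic_info())  # topic sizes; topic -1 collects outlier documents
print(topic_model.get_topic(0))      # representative words of the largest topic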

Fig. 5 BERTopic results for positive labels in a Shapley FinBERTopic system


Fig. 6 BERTopic results for negative labels in a Shapley FinBERTopic system
Figure 7 shows the SHAP results in the Shapley FinBERTopic System.
The labels and scores displayed at the top are the output of FinBERT. The
middle and lower parts of the figure show which words had an impact on the
score and how large that impact was. In the middle part, the words that
significantly impact the score are marked in a darker color, quantitatively
indicating the impact of each word or phrase in the sentence. The graph at
the bottom shows the impact of each word in the sentence on the score as a
bar chart. It can be seen that the word “highest” has the most significant
impact on the classification, apart from “other.” The clustering cutoffs
described in the bottom right-hand corner of the figure come from a
hierarchical clustering of features obtained by training an XGBoost model
to predict the outcome from each pair of input features. For a typical tabular
dataset, this provides a much more accurate measure of feature redundancy
than can be obtained with unsupervised methods such as correlation. Once
such a clustering is computed, it can be passed to a bar chart to visualize
both the feature redundancy structure and feature importance
simultaneously. Instead of displaying the entire clustering structure, only the
portions of the clustering where the distance is less than 0.5 are shown.
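The clustering cutoff shown in the bottom right of Fig. 7 follows SHAP's generic feature-clustering support. The fragment below is a generic illustration on a small tabular dataset bundled with SHAP, not the code behind Fig. 7; the dataset and model are placeholders.

# Generic illustration of SHAP's supervised feature clustering in a bar plot (placeholder data).
import shap
import xgboost

X, y = shap.datasets.adult()  # small tabular dataset shipped with shap
model = xgboost.XGBClassifier(n_estimators=100, max_depth=2).fit(X, y)

explainer = shap.Explainer(model, X)
shap_values = explainer(X[:200])

# Supervised clustering: feature redundancy measured by training XGBoost models
# to predict the outcome from pairs of input features.
clustering = shap.utils.hclust(X, y)

# Bar plot of feature importance; only clustering structure below a distance of 0.5 is drawn.
shap.plots.bar(shap_values, clustering=clustering, clustering_cutoff=0.5)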
Fig. 7 SHAP results in a Shapley FinBERTopic system

5 Conclusions and Future Works


We constructed a Shapley FinBERTopic System in this study and tested its
effectiveness. We found that a Shapley FinBERTopic System can output
more interpretable results supporting investor decision-making than
traditional FinBERT results. The output of a Shapley FinBERTopic system
can include topics that influenced the stock price, the contribution of each
word to the score, and so on. These outputs help better interpret and explain
the output of FinBERT when making investment decisions in practice.
Future work will compare the results and accuracy of multiple XAI
methods and pursue the adoption and construction of more human-centric models.

Acknowledgements
This research was partially supported by Telecommunications Advancement
Foundation, JSPS KAKENHI Grant Number JP20K01751, and Keio
University Academic Development Funds.

References
1. Kearney C, Liu S (2013) Textual sentiment analysis in finance: a survey of methods and models.
SSRN Electron J.
2.
Kearney C, Liu S (2014) Textual sentiment in finance: a survey of methods and models. Int Rev
Financ Anal 33:171–185
[Crossref]
3.
Man X, Luo T, Lin J (2019) Financial sentiment analysis (FSA): a survey. In: 2019 IEEE
international conference on industrial cyber physical systems (ICPS). pp 617–622. https://​doi.​
org/​10.​1109/​ICPHYS.​2019.​8780312
4.
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
[Crossref]
5.
Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional
transformers for language understanding. arXiv:​1810.​04805
6.
Araci D (2019) FinBERT: financial sentiment analysis with pre-trained language models. arXiv:​
1908.​10063
7.
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
8.
Blei DM, Lafferty JD (2007) A correlated topic model of science. Ann Appl Stat 1(1):17–35.
https://​doi.​org/​10.​1214/​07-AOAS114
[MathSciNet][Crossref]
9.
Grootendorst M (2022) Bertopic: neural topic modeling with a class-based TF-IDF procedure.
arXiv:​2203.​05794
10.
Adadi A, Berrada M (2018) Peeking inside the black-box: a survey on explainable artificial
intelligence (XAI). IEEE Access 6:52138–52160
[Crossref]
11.
Molnar C (2022) Interpretable machine learning. Github.io. https://​christophm.​github.​io/​
interpretable-ml-book/​
12.
Doshi-Velez F, Kim B (2018) Towards a rigorous science of interpretable machine learning.
arXiv:​1702.​08608
13.
Lundberg SM, Lee S (2017) A unified approach to interpreting model predictions. In:
Proceedings of the 31st international conference on neural information processing systems
(NIPS’17). pp 4768–4777
14.
Mosca E, Szigeti F, Tragianni S, Gallagher D, Groh G (2022) SHAP-based explanation methods:
a review for NLP interpretability. In: Proceedings of the 29th international conference on
computational linguistics. pp 4593–4603
15.
Xing FZ, Cambria E, Welsch RE (2017) Natural language based financial forecasting: a survey.
Artif Intell Rev 50:49–73
[Crossref]
16.
Nishi Y, Suge A, Takahashi H (2021) Construction of a news article evaluation model utilizing
high-frequency data and a large-scale language generation model. SN Bus & Econ 1
17.
Nishi Y, Suge A, Takahashi H (2020) News articles evaluation analysis in automotive industry
using GPT-2 and co-occurrence network. In: New frontiers in artificial intelligence. pp 103–114
18.
Nishi Y, Suge A, Takahashi H (2020) Construction of news article evaluation system using
language generation model. Agents Multi-Agent Syst: Technol Appl 2020:313–320
19.
Nishi Y, Suge A, Takahashi H (2019) Text analysis on the stock market in the automotive
industry through fake news generated by GPT-2. In: Proceedings of the artificial intelligence of
and for business
20.
Schumaker RP, Chen H (2010) A discrete stock price prediction engine based on financial news.
Computer 43(1):51–56. https://​doi.​org/​10.​1109/​mc.​2010.​2
[Crossref]
21.
Attanasio G, Cagliero L, Garza P, Baralis E (2019) Combining news sentiment and technical
analysis to predict stock trend reversal. In: Proceedings of the 2019 International Conference on
Data Mining Workshops (ICDMW). https://​doi.​org/​10.​1109/​icdmw.​2019.​00079.
22.
Huang AH, Wang H, Yang Y (2022) FinBERT: a large language model for extracting information
from financial text. Contemp Account Res
23.
Mahajan A, Dey L, Haque SM (2008) Mining financial news for major events and their impacts
on the market. In: 2008 IEEE/WIC/ACM international conference on web intelligence and
intelligent agent technology. https://​doi.​org/​10.​1109/​wiiat.​2008.​309
24.
Hagen L (2018) Content analysis of e-petitions with topic modeling: how to train and evaluate
LDA models? Inf Process Manag 54(6):1292–1307. https://​doi.​org/​10.​1016/​j.​ipm.​2018.​05.​006
[Crossref]
25.
Bastani K, Namavari H, Shaffer J (2019) Latent dirichlet allocation (LDA) for topic modeling of
the CFPB consumer complaints. Expert Syst with Appl 127:256–271. https://​doi.​org/​10.​1016/​j.​
eswa.​2019.​03.​001
[Crossref]
26.
Raju SV, Bolla BK, Nayak DK, Kh J (2022) Topic modelling on consumer financial protection
bureau data: an approach using BERT-based embeddings. In: 2022 IEEE 7th international
conference for convergence in technology (I2CT). pp 1–6
27.
Lundberg SM et al (2020) From local explanations to global understanding with explainable AI
for trees. Nat Mach Intell 2(1):56–67. https://​doi.​org/​10.​1038/​s42256-019-0138-9
[Crossref]
28.
Parsa AB, Movahedi A, Taghipour H, Derrible S, Mohammadian A (2020) Toward safer
highways, application of XGBoost and SHAP for real-time accident detection and feature
analysis. Accid Anal Prev 136:105405. https://doi.org/10.1016/j.aap.2019.105405
[Crossref]
29.
Ohana JJ, Ohana S, Benhamou E, Saltiel D, Guez B (2021) Explainable AI (XAI) models
applied to the multi-agent environment of financial markets. In: Calvaresi D, Najjar A, Winikoff
M, Främling K (eds) Explainable and transparent AI and multi-agent systems. EXTRAAMAS
2021. Lecture notes in computer science, vol 12688. Springer, Cham. https://​doi.​org/​10.​1007/​
978-3-030-82017-6_​12
30.
Xia X, Zhang X, Wang Y (2019) A Comparison of feature selection methodology for solving
classification problems in finance. J Phys: Conf Ser 1284(1):012026. https://​doi.​org/​10.​1088/​
1742-6596/​1284/​1/​012026
[Crossref]
31.
Xiaomao X, Xudong Z, Yuanfang W (2019) A comparison of feature selection methodology for
solving classification problems in finance. J Phys: Conf Ser 1284:012026. https://​doi.​org/​10.​
1088/​1742-6596/1284/1/012026.
32.
Kumar S, Vishal M, Ravi V (2022) Explainable reinforcement learning on financial stock trading
using SHAP. arXiv:​2208.​08790

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed
Computing , Studies in Computational Intelligence 1153
https://doi.org/10.1007/978-3-031-56388-1_11

Can a Large Language Model Generate Plausible Business Cases from Agent-Based Simulation Results?
Takamasa Kikuchi1 , Yuji Tanaka2 , Masaaki Kunigami3 ,
Hiroshi Takahashi4 and Takao Terano5
(1) Chiba University of Commerce, 1-3-1, Konodai, Ichikawa Chiba, 272-
8512, Japan
(2) ASKUL Corporation, Toyosu Cubic Garden, 3-2-3, Koto-Ku,
Tokyo 135-0061, Japan
(3) Tokyo Institute of Technology, 2-12-1, Ookayama, Meguro-Ku,
Tokyo 152-8550, Japan
(4) Graduate School of Business Administration, Keio University, 4-1-1,
Hiyoshi, Kohoku-Ku, Yokohama 223-8526, Kanagawa, Japan
(5) Chiba University of Commerce, 1-3-1, Konodai, Ichikawa 272-8512,
Chiba, Japan

Takamasa Kikuchi (Corresponding author)


Email: [email protected]

Yuji Tanaka
Email: [email protected]

Masaaki Kunigami
Email: [email protected]

Hiroshi Takahashi
Email: [email protected]
Takao Terano
Email: [email protected]

Abstract
This paper describes new applications of a Large Language Model (LLM)
for business domains. So far, we have conducted research on agent-based
simulation models to uncover complex socio-technical systems. However,
to let ordinary business people understand the models and their
consequences, conventional validation or visualization methods are not
enough. We must explain the plausible results through cases with natural
languages. In our previous studies, we have reported a method for
describing simulation results in natural language and grounding them with
actual business cases. Building on those results, we utilize a Large Language
Model for the generation. From this study we have achieved the following
results: (1) simulation results are comprehensively analyzed and
systematically classified, (2) the classification results are used as
prompts for an LLM (ChatGPT), and (3) the LLM generates plausible
business cases in natural language. We have confirmed that the generated
cases coincide with previously manually generated explanations and are
easy for ordinary business people to understand.

Keywords Agent-based simulation – Business case – Case method – Large


language model

1 Introduction
A socio-technical system is a large-scale complex system that serves as a
social infrastructure, such as transportation, information and
communication, finance, and electric power [1]. These systems comprise
technical subsystems, as well as social subsystems that stem from the
individuals and organizations overseeing their functioning, along with their
intricate interactions. To address such systems, we must consider technical,
social, and communication issues [2].
In the realm of designing and implementing socio-technical systems, the
need for clear boundaries becomes even more pronounced as the scope and
complexity of these systems increase. However, a challenge arises due to
the shifting positions and perspectives of stakeholders, which can change in
response to fluctuations in system boundaries (Fig. 1) [3].

Fig. 1 Socio-technical systems: difficulties in their design and implementation [3]


A constructive approach is effective for such systems [3]. This
methodology involves creating models of the systems and running
simulations to better understand their nature. It complements reductionism,
which aims to describe objects deductively by breaking them down into
their constituent elements. For examining social phenomena, agent
simulation techniques are valuable [4, 5].
In order to utilize the results of agent simulation analysis in the design
and implementation of socio-technical systems, it is necessary not only to
evaluate the validity in conventional ways, but also to build a consensus and
approval among individual stakeholders, or ordinary businesspeople. To
achieve this, it is crucial to establish a framework that allows stakeholders
with diverse backgrounds to share and comprehend the model's structure
and overall simulation results.
In our previous study, we have proposed to describe simulation results
in natural language and ground them with actual business cases [6].
However, there are limitations in describing cases, such as: (a) the
description is limited to a part of the simulation trials, (b) there are
restrictions on expression due to the use of predefined templates, and (c)
substantial time and human resource costs.
In this paper, we propose a new method that uses a Large Language
Model (LLM), ChatGPT, to address these limitations and improve the methodology.
Specifically, we (1) analyze and classify simulation results systematically
and comprehensively, and (2) write down the classification results in natural
language using a LLM. This approach enables us to gain a bird's-eye view
of the overall structure of the simulation trial and identify could-be and
plausible scenarios, thus enabling the semi-automatic generation of virtual
cases without relying on predefined templates.

2 Related Work
2.1 Understanding the Content of the Agent Model
Various methods have been proposed in the literature for analyzing and
understanding the contents of agent models and their simulation results. In
previous studies, (1) a methodology for describing simulation results in
natural language [6] (Sect. 2.3), (2) a methodology for describing models in
a description language such as Unified Modeling Language (UML) [7], and
(3) a methodology for describing simulation results in a formal language [8,
9] were proposed. While formal language description has an advantage in
terms of the comparability of results, natural language description is easier
to understand and facilitates stakeholder feedback. The two
methodologies are complementary to each other. In this paper, we focus on
a method that uses an LLM to produce descriptions in natural language,
since our goal is to effectively engage with various stakeholders.

2.2 Business Case


A business case serves as a documentation of management practice and a
narrative description of a managerial event [10]. A case is an interpretation
of a scenario selected by the case creator (observer) for analysis, rather than
a historical episode itself [11]. Therefore, it might contain diverse
information on actual problems and to facilitate a variety of analytical
perspectives. Major applications include case analysis in business
administration (case study), corporate education in business schools and
corporate training (case method) [12–14], as well as understanding the
phenomena through organizational simulation and system design [10].

2.3 Grounding of Agent Simulation Results with Actual


Business Cases
Kobayashi et al. developed a unified agent model representing
improvements and deviations in a business organization, using the corporate
misconduct of Japanese confectionery manufacturer Akafuku as a case
study of deviations in a production line [6]. They then proposed a method to
generate a virtual business case from the logs of individual simulation trials
(Fig. 2). The outline of Kobayashi et al.’s method is summarized as follows: (1)
Create a case design template based on a simplified model of agent
simulation; (2) Generate a virtual case while verifying the consistency of
the simulation results using this template; (3) Confirm the explanatory
scope of the model by comparing the virtual case with the actual business
case.

Fig. 2 Methodology for grounding agent simulation results with actual business cases [6]

This approach enhances the explanatory power of simulations by


evaluating the behavior of a complex system within the simulation and then
grounding it within a real-world business case. However, it is important to
acknowledge certain limitations associated with creating a business case,
which include: (a) the described results are limited to a subset of the simulation
trials, (b) restrictions on expression due to the use of predefined templates,
and (c) significant human and time costs involved, making it a feasibility
check rather than a comprehensive framework.

3 Proposed Method
3.1 Outline
In this paper, we propose a method to enhance the utility of describing the
results of agent simulations in natural language to create business cases.
The concrete procedure is as follows (Fig. 3):

Fig. 3 Conceptual diagram of the proposed methodology

(Step 1) Perform a comprehensive and systematic analysis of the


simulation results and logs (Sect. 3.2).
(Step 2) Write down the classification results in the form of natural
language using a large language model (Sect. 3.3).

3.2 Systematic Analysis of Simulation Results


The volume of the output of agent simulations is generally large; thus, it is
challenging to extract useful knowledge from it. Our approach is to use
clustering methods to systematically analyze the logs and grasp the entire
set of could-be and plausible outcomes [15–17]. The method allows for a
bird's-eye view of the overall structure of the simulation trials, facilitating
the description of the corresponding cases. By doing so, we resolve the
limitation of the previous study [6] described in Chap. 1 and Sect. 2.3,
namely that it is limited to describing only a part of the simulation trials'
results.

3.3 Virtual Case Generation with Large Language Model


There have been widespread attempts to support the generation of so-called
“stories” or scenarios using an LLM (e.g., [18, 19]). In contrast, our
approach is characterized by (α) generating stories based on the results of
agent simulation logs and their clustering, and (β) facilitating the
generation of virtual business cases, which is a specific field of interest.
Using an LLM, we expect that business cases can be written in natural
language without a special template. We also expect to reduce the labor and
time costs of writing virtual cases using natural language.

4 Experiment
In this chapter, we apply the proposed method to an agent model of
financial chain failures in our previous study. We then generate virtual
business cases that represent the propagation of the crisis among financial
institutions based on the simulation results.

4.1 Agent Model Used in This Paper


The agent model used in this paper analyzes the impact of interbank
financial markets on the stability of the banking system [20, 21]. According
to a systematic review of this field [22], our model is positioned to focus on
the number and size of interbank network exposures as a stability factor.
Agents are municipal financial institutions with balance sheets and
financial indicators (e.g., capital adequacy ratio). Securities are one of the
asset items on the balance sheet whose market value varies due to
fluctuations in market prices. The network includes an interbank network
for short-term operations and financial institution procurement. Each
financial institution also manages its own ALM, i.e., (1) investment and
lending activities (decisions related to changes in asset items), and (2) cash
management activities (decisions related to balance sheet funding gap
adjustment through the interbank network) (Fig. 4).
Fig. 4 Conceptual diagram of the agent model used in this paper [20]
We deal with the following two types of financial institution failures:
(1) simultaneous failures (cases in which a financial institution fails due to
deterioration in its financial and credit conditions and capital adequacy ratio
caused by price fluctuations of marketable assets held by the institution,
leading to a deterioration in cash flows and an inability to raise funds), and
(2) chain failures (cases in which a financial institution that directly lends
funds to a failed institution suffers a capital loss because those funds become
uncollectible).

4.2 Simulation Settings


In the simulation, the number of financial institutions was set to 20, and they
were assumed to form a fully connected network. The initial attributes of
each financial institution (e.g., securities balance, funding supply, funding
requirement, and capital adequacy ratio) were randomly generated within a
certain range. Additionally, the prices of exogenously provided securities
were assumed to be in a phase of substantial decline. The balance of
securities held by a particular financial institution (agent #0) was varied (10
patterns), and each trial was repeated 500 times. The number of failed
agents, the reason for their failure, and the number of failure steps were
observed for the simulation results.

4.3 Analysis of Simulation Logs [23]


In this section, we systematically analyze the logs of the simulation results
in Sect. 4.2 using a clustering method. The analysis conforms to the
authors’ previous study [23].
First, the information in the simulation logs was condensed by
converting the above-mentioned observation items into strings for each
trial. Here, we focused on the chain of failures centered on agent #0 and
classified the simulation results. Figure 5 illustrates an example of the
conversion rule by replacing the following three points with strings: (1) the
reason for the failure of agent #0, (2) the reason for the failure of agents
other than agent #0, and (3) the interval of the number of steps in which the
failure occurred.
Fig. 5 Rules for converting simulation logs to strings [23]
Next, a clustering method was applied to the created strings. In a
previous study, a hierarchical clustering method (Ward’s method) and the
Levenshtein distance between strings were employed [23]. We applied the
above method to all simulation logs and classified them into three clusters.
The classification results were plotted on two axes: the number of
bankruptcy steps for agent #0 and the initial balance of securities held
(Fig. 6).
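A minimal sketch of this log-clustering step is shown below. The encoded trial strings are hypothetical stand-ins for the output of the conversion rule in Fig. 5, and applying Ward's method to Levenshtein distances simply follows the setup of the previous study; this is an illustration, not the authors' implementation.

# Sketch of clustering string-encoded simulation logs (illustrative; strings are hypothetical).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def levenshtein(s: str, t: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

# Hypothetical encodings of trials (failure reasons and step intervals mapped to characters).
logs = ["A1B2B2", "A1B2C3", "A1B3C3", "B3C3C3", "B3C2C3", "A1A1B2"]

n = len(logs)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = levenshtein(logs[i], logs[j])

# Ward's method on the condensed Levenshtein distance matrix, cut into three clusters.
Z = linkage(squareform(dist), method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)  # cluster label (1-3) of each trial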

Fig. 6 Clustering results of simulation logs [23]


Of the three clusters, cluster #0 has the least number of securities held
by agent #0 and the slowest number of bankruptcy steps for that agent.
Cluster #1 is a log cluster similar in nature to cluster #0, but it has relatively
more securities holdings and a faster bankruptcy step. Cluster #2 is different
from the other clusters in that it has the most securities holdings and the
fastest number of bankruptcy steps.
Thus, by comprehensively analyzing and systematically classifying the
simulation results, it is possible to obtain an overview of the entire
simulation trial and identify “possible” logs. Furthermore, representative
simulation logs could be analyzed in depth by extracting logs that are close
to the center of each cluster.

4.4 Virtual Case Generation with Large Language Model


This section extracts representative logs from Fig. 6 and presents an
example of creating virtual business cases with a large language model
based on the results. We will re-examine the previous results using an LLM.
(1) Large language model utilized in this paper

This paper utilizes the GPT-4 API provided by OpenAI [24]; the experiments
were run in September 2023. The hyperparameters were set to temperature = 1.0 or 0.9,
and the maximum length was set to be large enough to ensure uninterrupted
output.
(2) Prompts addressed in this paper

Following the extraction of a representative log from cluster #1 in Fig. 6,


the initial agent attributes, simulation rules, and an overview of the
simulation log were entered in the log as prompts (upper part of Fig. 7) to
generate virtual cases. Then, based on the generated case, prompts were
input to generate a question-and-answer session that simulates a group
discussion using the case method (lower part of Fig. 7). The prompts were
input in Japanese, and the output was converted to English.
Fig. 7 Prompts provided in this paper: for virtual case generation (top), for question-and-answer
generation (bottom)
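A minimal sketch of this generation step, using the current openai Python client, is given below. The prompt string is only a placeholder that follows the structure of Fig. 7; the original experiments in September 2023 may have used an earlier client version, and the full prompt text is not reproduced here.

# Sketch of generating a virtual business case from a representative log (placeholder prompt).
# Assumes the openai package (>= 1.0) and the OPENAI_API_KEY environment variable are set.
from openai import OpenAI

client = OpenAI()

# Placeholder: in the experiments the prompt contained the initial agent attributes,
# the simulation rules, and an overview of the representative log from cluster #1.
prompt = (
    "Based on the following agent attributes, simulation rules, and simulation log, "
    "write a business case describing how a crisis propagates among financial "
    "institutions.\n\n<initial attributes, rules, and log overview go here>"
)

response = client.chat.completions.create(
    model="gpt-4",
    temperature=1.0,  # 1.0 or 0.9, as in Sect. 4.4 (1)
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)  # the generated virtual business case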

(3) Examples of virtual business case outputs

Two examples of outputs for the inputs described in (2) are depicted in
Fig. 8a, b. The outputs are displayed in correspondence with the results of
the simulation log analysis described in Sect. 4.3. The simulation settings
and the results/facts of the simulation logs are presented in the figures and
are indicated by <italics>. Appropriate assumptions and conclusions based
on the log facts are <underlined>, while items that cannot be directly read
from the logs or assumptions are <shaded>.
Fig. 8 a Example of business case output. b Examples of business case output and related questions
and answers (questions and answers are extracted and aggregated from multiple output results)
Both virtual cases generally describe the facts of the logs accurately; for
example, from the second paragraph to the fifth paragraph in Fig. 8a and
from the second paragraph to the fourth paragraph in Fig. 8b. Furthermore,
appropriate conclusions and summaries are generated under proper
inferences; for instance, in the final paragraph in Fig. 8a and in the first and
fifth paragraphs in Fig. 8b.
In contrast, the generated cases also describe matters and inferences that are
not directly readable from the logs themselves. These are the statements in the
last paragraph of Fig. 8a and “Question 3” in Fig. 8b regarding the bank’s
risk management and the role of the supervisory authority. As described in
Sect. 4.1, the supervisory authority is not included in this model or regarded
as an agent in the simulations in this paper. They are also absent from the
text provided as prompts. This point will be further discussed in Sect. 5.2.

5 Discussion
5.1 Potential Utilization in the Case Method
The results in Sect. 4.4 (3) not only describe the contents of the simulation
logs in a factual manner, but also generate conclusions and questions and
answers based on appropriate assumptions. Particularly, in the case depicted
in Fig. 8b, the case description in the first part is followed by a brief
summary and related questions and answers. This case study can be used as
a review document or as a starting point for facilitation in the case method
[12–14] described in Sect. 2.2.

5.2 Implications for Agent-Model Extension


From the results of Sect. 4.4 (3), there were some items and guesses that
were not directly readable from the logs or prompts. This could be due to a
form of “hallucination” [25, 26]. However, this may also be taken as
feedback and suggestions for building the model. While the KISS principle
of “the simpler things are, the better” is required in agent models [27], the
importance of exceeding this limit is also emphasized [28]. When extending
and upgrading the model, changes in the range of targeted agents and in the
boundary conditions of the model are considered, and such output should
be used positively as reference information. In the case depicted in Fig. 8b,
the role of the supervisory authority was suggested, and in fact, the authors
have proposed model extensions that explicitly deal with the behavior of the
regulator and central bank in a series of previous studies [29].

5.3 Expansion of the Application Area of the Proposed Method


In this paper, we proposed a methodology for describing agent simulation
logs in natural language and grounding them with business cases using a
large language model. Alternatively, Kunigami et al. proposed a
methodology to amalgamate agent simulation and gaming simulation using
actual and virtual business cases [3] (Fig. 9). The proposed methodology in
this paper improves the practicality of writing down actual and virtual
business cases in natural language from the agent simulation (constructive
approach) side. Additionally, we expect similar progress from the gaming
simulation (participatory approach) side.

Fig. 9 Methodology for amalgamating agent simulation and gaming simulation [3]

6 Concluding Remarks
In this paper, we propose a new application of an LLM for describing
simulation results in natural language and grounding them in actual
business cases. Specifically, (1) simulation results are comprehensively
analyzed and systematically classified, and (2) the classified results are
written down in natural language using a large language model. From the
results, we state that the answer to the title question is ‘YES’. The results
indicate that it is possible to identify “possible” logs based on an overview
of the overall structure of the simulation trials. We also found that could-be
and plausible business cases can be generated using an LLM without the
templates used in our previous studies.
The contents of the generated cases were generally consistent with those
of the simulation results. In addition to various summaries and conclusions,
simulation-based questions and answers were also understandable. They
could also be used as reference information when upgrading the simulation
model.
In the future, we would like to consider a framework that further
integrates and deepens both the constructive and participatory approaches
described in Sect. 5.3, fostering a sense of conviction among various
stakeholders in socio-technical systems and facilitating the acquisition of
their approval.

Acknowledgements
The authors would like to thank Enago (www.​enago.​jp) for the English
language proofing. The authors also would like to thank DeepL and
ChatGPT for the English translation.

Notes All claims expressed in this article are solely those of the authors
and do not necessarily represent those of their affiliated organizations, or
those of the publisher, the editors, and the reviewers.

References
1. Hollnagel E, Woods DD, Leveson N (2006) Resilience engineering: concepts and precepts. CRC
Press
2.
Abbas R, Michael K (2023) Socio-technical theory: a review. In: Papagiannidis S (eds) Theory
hub book, http://​open.​ncl.​ac.​uk/​. ISBN: 9781739604400
3.
Kunigami M, Terano T (2021) Amalgamating agent and gaming simulation to understand social-
technical systems. In: Kaneda T, Hamada R, Kumazawa T (eds) Simulation and gaming for
social design. Translational systems sciences, vol 25. Springer, Singapore. https://​doi.​org/​10.​
1007/​978-981-16-2011-9_​11
4.
Terano T (2010) Why agent-based modeling in social system analysis? Oukan (J Transdiscipl
Fed Sci Technol) 4(2):56–62 (in Japanese) https://​doi.​org/​10.​11487/​trafst.​4.​2_​56
5.
Terano T (2013) State-of-the-art of social system research 1—world and Japanese research—
cutting-edge social simulation works between computer and social sciences. J Soc Instrum
Control Eng 52(7):568–573 (in Japanese). https://​doi.​org/​10.​11499/​sicejl.​52.​568
6.
Kobayashi T, Takahashi S, Kunigami M, Yoshikawa A, Terano T (2012) Analyzing
organizational innovation and deviation phenomena by agent based simulation and case design.
In: proceedings of 9th int. conf. innovation & management. Wuhan University of Technology
Press, pp 778–791
7.
Goto Y, Sugimoto A, Takizawa Y, Takahashi S (2014) Methodology for facilitating
understandings of complex agent-based models by gaming. Trans Inst Syst, Control, Inf Eng
27(7):290–298 (2014) (in Japanese)
8.
Kunigami M, Kikuchi T, Terano T (2022) A formal model for the business innovation case
description. J Syst, Cybern Inform 20(1):296–318. https://​doi.​org/​10.​54808/​JSCI.​20.​01.​296
9.
Kikuchi T, Kunigami M, Terano T (2023) Agent modeling, gaming simulation, and their formal
description. In: Kaihara T, Kita H, Takahashi S, Funabashi M (eds) Innovative systems approach
for facilitating smarter world. Design Science and Innovation, Springer, Singapore. https://​doi.​
org/​10.​1007/​978-981-19-7776-3_​9
10.
Kikuchi T, Kunigami M, Takahashi H, Toriyama M, Terano T (2019) Description of decision
making process in actual business cases and virtual cases from organizational agent model using
managerial decision-making description model. J Inf Process 60(10):1704–1718 (in Japanese).
https://​cir.​nii.​ac.​jp/​crid/​1050282813791883​520
11.
George AL, Bennett A (2005) Case studies and theory development in the social sciences. In:
BCSIA studies in international security, Cambridge, Mass., MIT Press
12.
Barnes LB, Christensen CR, Hansen AJ (1994) Teaching and the case method: text, cases, and
readings. Harvard Business Review Press
13.
Takeuchi S, Takagi H (2010) Introduction to case method teaching. Keio University Press (in
Japanese)
14.
Gill TG (2011) Informing with the case method: a guide to case method research. Informing
Science Press, Writing and Facilitation
15.
Tanaka Y, Kunigami M, Terano T (2018) What can be learned from the systematic analysis of the
log cluster of agent simulation. Stud Simul Gaming 27(1):31–41 (in Japanese). https://​doi.​org/​10.​
32165/​jasag.​27.​1_​31
16.
Kikuchi T, Kunigami M, Takahashi H, Toriyama M, Terano T (2020) Classification of simulation
results for formal description of business case. J Jpn Soc Manag Inf Res Note 29(3):199–214
(2020) (in Japanese). https://​doi.​org/​10.​11497/​jjasmin.​29.​3_​199
17.
Goto Y (2021) Hierarchical classification and visualization method of social simulation logs
reflecting multiple analytical interests. Trans Soc Instrum Control Eng 56(10):463–474 (in
Japanese)
18.
Chung JJY, Kim W, Yoo KM, Lee H, Adar E, Chang M (2022) TaleBrush: sketching stories with
generative pretrained language models. In: Proceedings of the 2022 CHI conference on human
factors in computing systems (CHI ’22) (2022). https://​doi.​org/​10.​1145/​3491102.​3501819
19.
Yuan A, Coenen A, Reif E, Ippolito D (2022) Wordcraft: story writing with large language
models. In: 27th international conference on intelligent user interfaces, ACM, pp 841–852
20.
Kikuchi T, Kunigami M, Yamada T, Takahashi H, Terano T (2016) Agent-based simulation
analysis on investment behaviors of financial firms related to bankruptcy propagations and
financial regulations. transactions of the Japanese Society for Artificial Intelligence 31(6), AG-
G_1-11 (2016) (in Japanese). https://​doi.​org/​10.​1527/​tjsai.​AG-G
21.
Kikuchi T, Kunigami M, Yamada T, Takahashi H, Terano T (2016) Analysis of the influences of
central bank financing on operative collapses of financial institutions using agent-based
simulation. In: Proceedings of the 40th annual computer software and applications conference
(COMPSAC), 2, The Institute of Electrical and Electronics Engineers, Inc. (IEEE), pp 95–104.
https://​ieeexplore.​ieee.​org/​document/​7552186
22.
Alaeddini M, Madiès P, Reaidy PJ, Dugdale J (2022) Interbank money market concerns and
actors’ strategies – a systematic review of 21 century literature. J Econ Surv:1–81. https://​doi.​
org/​10.​1111/​joes.​12495
23.
Tanaka Y, Kikuchi T, Kunigami M, Yamada T, Takahashi H, Terano T (2017) Classification of
simulation results using log clusters in agent simulation. JSAI Special Interest Group on
Business Informatics (SIG-BI #7) (in Japanese)
24.
OpenAI, GPT-4 Technical Report (2023) https://​arxiv.​org/​pdf/​2303.​08774
25.
Wolfram S (2023) What is ChatGPT doing and why does it work? Wolfram Media Inc
26.
Ji Z, Lee N, Frieske R, Yu T, Su D, Xu Y, Ishii E, Bang YJ, Madotto A, Fung P (2023) Survey of
hallucination in natural language generation. ACM Comput Surv 55(12):1–38
27.
Axelrod R (1997) The complexity of cooperation agent-based models of competition and
collaboration. Princeton University Press
28.
Terano T (2013) Beyond KISS principle: agent-based modeling toward complex adaptive
systems. J Jpn Soc Artif Intell (Spec Issue: Complex Syst Collect Intell) 18(6):710–715 (in
Japanese)
29.
Kikuchi T, Kunigami M, Yamada T, Takahashi H, Terano T (2018) Agent-based simulation of
financial institution investment strategy under easing monetary policy for operative collapses. J
Adv Comput Intell Intell Inform 22(7):1026–1036 (in Japanese). https://​doi.​org/​10.​20965/​jaciii.​
2018.​p1026

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
R. Lee (ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed
Computing , Studies in Computational Intelligence 1153
https://doi.org/10.1007/978-3-031-56388-1_12

Analyzing the Growth Patterns of GitHub Projects to Construct Best Practices for Project Managements
Kaide Kaito1 and Haruaki Tamada2
(1) Graduate School of Kyoto Sangyo University, Motoyama, Kamigamo,
Kita-ku, Kyoto-shi, Kyoto, Japan
(2) Kyoto Sangyo University, Kyoto, Japan

Kaide Kaito (Corresponding author)


Email: [email protected]

Haruaki Tamada
Email: [email protected]

Abstract
The Git platforms, such as GitHub, are big data providers that store various
development histories. Then, many researchers analyze data from such
platforms from various aspects. Recently, AI-based systems have also been
constructed using such data. However, there are no studies that measure the
quality of the projects, and the ideal images of the projects have yet to be
defined. This paper aims to find the ideal images of projects in OSS
(Open Source Software). For this, we extract time-series project metrics
from famous OSS projects and categorize them to detect patterns of project
growth. Our approach tries to explain the patterns and to judge whether each
pattern is good or not. The time-series metrics from projects include the
number of stargazers, forks, commits, etc. The number of stargazers should
increase as time passes, and the number of forks tends to decrease. The
stargazer pattern indicates that many developers watch the repository
because it is managed well. We conducted a case study to analyze the
time-series data from 10 repositories in GitHub. As a result, we found that
the transitions of the number of issues typically form a sawtooth wave. This
pattern suggests that new issues are reported continuously and inventoried
periodically. Therefore, projects with this pattern are managed well and
attract the attention of many developers.

1 Introduction
Recently, distributed pull-based development on platforms such as GitHub
has spread widely [9]. Many project managers struggle to
manage their projects based on agile practice [8] and scrum.1 Unfortunately,
no best practices for project management have been established.
Our research group proposed a method to characterize projects based on
the time-series data of the project metrics [12, 13]. Also, we introduced the
concept of Project as a City (PaaC) by aligning the obtained features with
the smart city. In PaaC, we assume that the current state of the city (project)
is measured, the future state of the city is defined, and actions are taken to
fill the difference to approach the ideal image. Then, we developed a tool to
analyze the time-series data of the project metrics based on the given
GraphQL [5]. However, the tool, named Argo, just fetches the specified
time-series data of a repository in GitHub through the GitHub GraphQL API.2
This alone is not enough to find the ideal image of the project.
This paper aims to find the ideal image of the project by analyzing the
time-series data of the project metrics. If the ideal image is clear, the
developers can work toward the goal, evaluate the progress, and correct the
project's course. For this, we find and categorize the patterns in the
time-series data from the projects. Then, we judge, from the professionals'
view, whether each pattern is a best practice or an anti-pattern.
This paper is organized as follows. Section 2 describes the proposed
method. Next, Sect. 3 shows case studies of our proposed method to find
the ideal image of the projects. Then, we introduce related works in Sect. 5.
Finally, we summarize the paper and discuss future work in Sect. 6.

2 Proposed Method
2.1 Time-Series Data
The purpose of this paper is to find the ideal image of the projects from the
projects’ time-series data. First, we extract time-series metrics from
projects. Then, we depict line charts of the metrics to identify the
characteristics of the data. Next, invited professionals investigate the
charts and describe the patterns. Finally, we and the professionals
try to find the ideal image of the projects from the patterns through
discussion.
The ideal image of a project generally differs according to the project's
scale, characteristics, developers, and other aspects. Therefore, we assume
that the ideal image is not unique. However, if the proposed method finds
similar patterns in most projects, the patterns might be effective for those
projects. Then, the similar patterns may become the ideal image of the
projects.
We use the time-series data from the projects to identify the patterns.
The time-series data reflect the developers’ behaviors in the projects.
Therefore, we obtain certain time-series data of the project repository.
Examples of time-series data are the numbers of stargazers, forks,
commits, issues, and pull requests. We do not use qualitative data such
as the source code or the content of issues and pull requests. Such
qualitative data also reflect the project behaviors; however, it is difficult to
identify patterns from them. Hence, our approach employs only quantitative
data and their transitions.

2.2 Analysis
To analyze a project p from the obtained time-series data, we extract the
data in a period and divide the period into n sub-periods. In other words, we
compute the frequencies of the data in the period. Then, a conversion
method f maps the time-series data into D = (d_1, d_2, ..., d_n), where each
datum d_i represents the sum of the metric values in one sub-period. Now, a
moving average over m consecutive items of D is computed to obtain
A = (a_1, a_2, ..., a_{n-m+1}), where a_i = avg(d_i, d_{i+1}, ..., d_{i+m-1})
and avg is a function that calculates the average value of the given data.
Next, we extract 3-grams from A and obtain
G = ((a_1, a_2, a_3), (a_2, a_3, a_4), ..., (a_{n-m-1}, a_{n-m}, a_{n-m+1})). By
focusing on each 3-gram (a_i, a_{i+1}, a_{i+2}), we categorize the patterns into the following five types:
Rising: a_i < a_{i+1} < a_{i+2}.
Falling: a_i > a_{i+1} > a_{i+2}.
V-shaped: a_i > a_{i+1} and a_{i+1} < a_{i+2}.
Inverted V-shaped: a_i < a_{i+1} and a_{i+1} > a_{i+2}.
Flatten: a_i = a_{i+1} = a_{i+2}.
Figure 1 depicts the above patterns. The dots in each pattern show a
3-gram (a_i, a_{i+1}, a_{i+2}), and the vertical location of each dot indicates
the value of the corresponding a_j. Note that the values in Fig. 1 are just
examples. If the above categorizing rules were applied strictly, almost no
patterns would be categorized as Flatten, since its rule is quite sensitive.
Therefore, we introduce a buffer ε that tolerates small gaps in the Flatten
rule, i.e., |a_i − a_{i+1}| ≤ ε and |a_{i+1} − a_{i+2}| ≤ ε.

Fig. 1 The patterns of a 3-gram (a_i, a_{i+1}, a_{i+2})

Finally, we categorize the patterns based on multiple consecutive 3-grams
into four types: Rising, Falling, ZigZag-shaped, and Flatten. The ZigZag
type is a sequence of V-shaped and inverted V-shaped types. The
professionals and we discuss the meanings of the shapes and the times at
which they occur.
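A small Python sketch of this categorization is shown below; the metric values, the window size m, and the buffer value are arbitrary examples, and the handling of boundary ties is a simplification, so the sketch does not reproduce the exact rules used in Argo.

# Sketch of the 3-gram pattern categorization (example values; not the Argo implementation).
def moving_average(d, m):
    """Moving average over m consecutive per-period sums d."""
    return [sum(d[i:i + m]) / m for i in range(len(d) - m + 1)]

def classify_3gram(a1, a2, a3, eps=0.5):
    """Categorize one 3-gram of moving averages into the five basic types."""
    if abs(a1 - a2) <= eps and abs(a2 - a3) <= eps:
        return "Flatten"  # equality relaxed by the buffer eps
    if a1 < a2 < a3:
        return "Rising"
    if a1 > a2 > a3:
        return "Falling"
    if a1 > a2 < a3:
        return "V-shaped"
    if a1 < a2 > a3:
        return "Inverted V-shaped"
    return "Flatten"  # remaining boundary ties are treated as Flatten in this sketch

# Example: monthly issue counts of a project (hypothetical) with a 3-month moving average.
d = [4, 6, 9, 7, 10, 8, 8, 8, 12, 15]
a = moving_average(d, m=3)
grams = [classify_3gram(a[i], a[i + 1], a[i + 2]) for i in range(len(a) - 2)]
print(grams)  # runs of V-shaped / Inverted V-shaped correspond to the ZigZag type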

2.3 The Pattern Analysis for Issue Metrics


This subsection describes the patterns introduced in the previous
subsection for the issue metrics. Understanding each pattern requires the
background of the project activities in the period. Therefore, we and the
professionals discuss in order to understand the background of the patterns.
The professional has more than 20 years of development experience with
Java, Go, and JavaScript in industry and OSS activities. Additionally, he is
a software engineering researcher and lectures on software engineering and
Java programming to university students. We limit the discussion to the
issue metrics in the rest of this paper; other metrics would require a
different discussion.

2.3.1 Rising
This pattern indicates that some developers post new issues to the project.
When many issues are posted, the metric values rise sharply.

2.3.2 Falling
If the metric values follow this falling pattern, we infer that the developers
in the project tried to solve the issues. In a popular project, many
developers and users post issues one after another. Therefore, it is
important to clean up unnecessary issues repeatedly. At such times, the
metric values form a sharp falling pattern.

2.3.3 ZigZag
This pattern indicates that the metric values repeat the V-shaped and
inverted V-shaped patterns. We infer that a project with this pattern works
well, since some problems were solved by the efforts of the project team
and the product was used enough to find new problems.

2.3.4 Flatten
This pattern indicates that the metric values are almost constant and do not
change. There are two possible causes of this pattern. One is that the project
activities are stagnant. The other is that the numbers of posted and closed
issues happen to be almost equal. Therefore, we should consider the commit
counts and the opened/closed issue counts in the period of this pattern in
order to distinguish between these reasons.
Fig. 2 The chart of the remaining issue count per month of babylonjs/babylon.js

3 Case Studies
We demonstrate the proposed method through three case studies. The first
case study is a preliminary study to understand the four patterns. In this
study, we chose one project and illustrated the graph of the issue count
transitions. The second case study identifies the typical transition
patterns from actual OSS projects. We investigated the issue count
transitions of 514 repositories and found the ratio of each pattern. The third
case study is the evaluation of the four patterns from the 514 repositories.

3.1 Preliminary Study


First, we identify the four patterns introduced in the previous section in an
actual OSS project. As an example, Fig. 2 shows the remaining issue count
per month of babylonjs/babylon.js from the beginning of the
project to today. The solid line in Fig. 2 shows the issue count in each
month, and the dashed line shows the 3-month moving average.
From Fig. 2, this project started with a ZigZag pattern until Sept. 2015.
Then, it changed to a Flatten pattern until Feb. 2017. Next, the metric
values grew dramatically until May 2018. From May 2018 to Aug. 2019,
the values fell. After that, the values formed a ZigZag pattern again with a
large amplitude.
We showed the transition patterns of the issues for an actual project in
terms of the four patterns introduced in Sect. 2.3. Next, we investigate the
ratio of each pattern in many projects.

3.2 Experimental Setup


In our case study, we found 514 repositories managed on GitHub with
GitHub Search3 [3] (GHS). The search criteria for GHS are shown in
Table 1. We downloaded the list of repositories from GHS on June 23,
2023. Next, we fetched the numbers of stargazers and issues of the
repositories with our tool Argo4 [5]. Finally, Argo generated the line charts
for each metric, and we manually identified the patterns from the line charts
through discussion with the professionals.
Table 1 Search criteria for GitHub search

Search criteria Value


# of Commits Over 15,000
# of Issues Over 1,000
# of Stargazers Over 500
# of Forks Over 250
# of Pull requests Over 500

Note that we ran Argo to fetch data from the 514 repositories in GitHub on
an iMac Retina 4K (Intel Core i5 3 GHz 6-core, 2019) with macOS Ventura
13.4.1 and 16 GB RAM. The total fetching time and data size are
summarized in Table 2.
Table 2 Fetched data size and fetching time

Data Data size Time


Issues 4,050,745 07:20:57
Stargazers 7,174,757 13:42:09
Commits 19,786,915 53:39:30.34
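For reference, the kind of per-issue data Argo collects can be obtained from the GitHub GraphQL API roughly as sketched below; this is a simplified query (100 issues per page, createdAt/closedAt only) and not Argo's actual implementation.

# Simplified sketch of fetching issue timestamps via the GitHub GraphQL API (not Argo's query).
import requests

QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: 100, after: $cursor, orderBy: {field: CREATED_AT, direction: ASC}) {
      nodes { createdAt closedAt }
      pageInfo { hasNextPage endCursor }
    }
  }
}
"""

def fetch_issue_dates(owner, name, token):
    """Return (createdAt, closedAt) pairs for all issues of the repository owner/name."""
    headers = {"Authorization": f"bearer {token}"}
    cursor, dates = None, []
    while True:
        variables = {"owner": owner, "name": name, "cursor": cursor}
        resp = requests.post("https://api.github.com/graphql",
                             json={"query": QUERY, "variables": variables},
                             headers=headers)
        issues = resp.json()["data"]["repository"]["issues"]
        dates += [(n["createdAt"], n["closedAt"]) for n in issues["nodes"]]
        if not issues["pageInfo"]["hasNextPage"]:
            return dates
        cursor = issues["pageInfo"]["endCursor"]

# Example: counting the createdAt timestamps per month gives the transitions shown in Fig. 3.
# dates = fetch_issue_dates("golang", "go", token="<personal access token>")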

3.3 Typical Patterns of Issue Count Transitions


The objective of this case study is to identify the typical patterns of issue
count transitions. For this, we investigated the collected 514 repositories
and illustrated the line charts of the transitions. Then, we manually
identified the patterns from the line charts through discussion.
Figure 3 shows the line charts of the issue count transitions for typical
repositories. The horizontal axis of Fig. 3 is the months from the beginning
of the project. The vertical axis of Fig. 3 is the total posted issue count in
the corresponding month.

Fig. 3 Transitions of issue counts

From Fig. 3a, the project started in 2013 with a Flatten pattern, and the
pattern changed to Rising from Oct. 2017. Then, it changed to a ZigZag
pattern from Jan. 2018 via a Falling pattern.
Similarly, Fig. 3b indicates that the project started with ZigZag and
Rising patterns. After that, the patterns changed to Falling and ZigZag
patterns until Apr. 2018. Finally, the patterns in the project turned into a
Flatten pattern. Figures 3a and b show that a project may be active or
stagnant at different times.
The repositories golang/go and apple/swift are both prominent
projects. Figure 3c and d are the charts of these projects. Although
golang/go has one large fall, both projects increase gradually with
ZigZag patterns. The repository apple/swift appears to resolve
unimportant issues at the beginning of each year, and a slightly larger fall
appears at that time.

3.4 Ratios of Each Pattern


Finally, we investigated the ratio of each pattern in the project activities.
Figure 4 shows a cumulative bar chart as the result of this case study. The
vertical axis is the projects, the horizontal axis is the ratio, and each bar
shows the occupied ratio of each pattern for the corresponding project. From
Fig. 4, the occupied ratio of the ZigZag pattern is almost always the greatest.
Table 3 shows the repository count of each pattern in the corresponding
range of ratios. For the Falling and Flatten patterns, the largest numbers of
repositories had a ratio of less than 0.1, namely 361 and 388, respectively.
In fact, 88.72% of the repositories have 50% or more of the ZigZag pattern.
Also, 90.27% of the repositories have 20% or less of the Flatten pattern, and
54.47% of the repositories have at least 1% of it. Additionally, the occupied
ratio of the ZigZag pattern was negatively correlated with that of the Flatten
pattern.
The collected 514 repositories are expected to be active projects because
of the search criteria shown in Table 1. Therefore, the ZigZag pattern is the
most common pattern in active projects. Moreover, we found that the more
active a repository is, the less it shows the Falling and Flatten patterns.
Fig. 4 The cumulative bar chart of pattern ratios in each project

Table 3 Repository count of corresponding range

Ratios ZigZag Rising Falling Flatten

0.0–0.1 0 16 361 388
0.1–0.2 0 135 137 75
0.2–0.3 1 215 16 31
0.3–0.4 9 119 0 15
0.4–0.5 48 26 0 3
0.5–0.6 174 3 0 1
0.6–0.7 199 0 0 1
0.7–0.8 78 0 0 0
0.8–0.9 5 0 0 0
0.9–1.0 0 0 0 0

4 Discussion
4.1 Ideal Pattern
This section discusses the ideal patterns of issues in OSS projects based on
the results of case studies. The ideal patterns of issues in OSS projects are
considered to be different depending on the project’s situation and goals.
A huge project that attracts attention from the beginning, such as
microsoft/vscode, shows a sawtooth pattern of large amplitude and the
Rising pattern. Figure 5 shows the issue count transition of
microsoft/vscode. Indeed, projects with more stars tend to have more
issues; a positive correlation was observed between the latest active issue
count and the latest stargazer count.
Projects with a high number of stargazers indicate that many developers
pay attention to them. Thus, such projects receive a lot of feedback and
issue reports. These projects are considered to be growing stably and to be
regularly resolving issues and pull requests.

Fig. 5 Issue count transition of microsoft/vscode

By contrast, a project like babylonjs/babylon.js, whose initial
developers were few and which attracted little attention, exhibits a sawtooth
pattern of small increases and decreases. At this stage, we guessed that the
developers led the project according to their own ideas, because a Flatten
pattern appeared after the sawtooth pattern. This period did not have any
new issues or commits. Presumably, the developers had temporarily left the
project for some reason. However, after that, the issue count transitions
turned from the Flatten pattern into the Rising pattern. Once this happened,
the number of developers and the commit counts similarly increased, and
the stargazer count also increased a few months later. Then, the issue count
transitions in the project showed a large sawtooth pattern similar to that of
microsoft/vscode. This means that one ideal pattern of a middle-scale
project starts with a small zigzag pattern and then gradually grows into a
zigzag pattern with large amplitude. Since the Flatten pattern means
stagnation, we exclude this pattern from the ideal patterns.

4.2 Threats to Validity


The targeted projects in this paper were relatively large OSS projects hosted
on GitHub. Therefore, non-OSS projects may exhibit different patterns. Also,
the ideal pattern may differ for smaller projects.
Additionally, the ideal pattern presented in this paper is not the only one;
other ideal patterns may exist. Moreover, the ideal pattern will differ
depending on the situation and goals of the project. Resolving these issues
requires investigating many more projects, including non-OSS projects.

5 Related Works
Kudrjavets et al. surveyed the transitions of code velocity from issue
activities in famous repositories [6]. Their work found that code velocity did
not slow down in the target repositories. This encourages our work, since code
velocity falling for some reason could be regarded as an anti-pattern.
Hasabnis presented GitRank, a tool to rank GitHub repositories by source
code quality, popularity, and maintainability [4]. Their work and ours differ
in the exploited metrics and the purpose of the results. Our work focuses on
repository management and therefore does not consider source code quality.
Developers can use both results for more effective repository management and
development.
There are many other works mining repositories on GitHub. Steinmacher et al.
conducted a quantitative and qualitative analysis of pull requests from
non-core members [10]. Constantinou et al. proposed a method for summarizing
Ruby projects on a timeline by the number of commits, lines, and projects [2].
Pinto et al. analyzed the behavior of contributors in GitHub as influenced by
the development language [7]. Borges et al. analyzed the characteristics of
projects that earned many stars on GitHub [1]. Tian et al. tried to identify
developers with geek talent from their activities across GitHub and Stack
Overflow [11]. Our work is related to the above works analyzing projects on
GitHub. However, we focus on the factors contributing to spontaneous software
evolution for ideal project management, and we introduce the new concept of
the project as a city.

6 Conclusion
The goal of this paper is to investigate ideal management methods for projects
based on the histories of OSS projects hosted on GitHub. To this end, we
focused on the number of issues in each project. We collected the issue count
transitions of 514 active repositories and classified the transition patterns.
As a result, the famous projects showed a sawtooth pattern of large amplitude.
Projects that started with a few developers show a sawtooth pattern of small
amplitude; however, the amplitude of the sawtooth pattern gradually increased
as the project gained attention.
The following tasks remain as future work. The first is to analyze more
repositories; our analysis could benefit from an increased sample size
encompassing a broader range of repositories. The second is to analyze other
metrics, for example, issue comments and pull requests. The third is to delve
deeper into the underlying behaviors of contributors in order to clarify the
specific actions that can be categorized as best practices or anti-patterns,
which would offer actionable insights for effective project management.

References
1. Borges H, Hora A, Valente MT (2016) Understanding the factors that impact the popularity of GitHub repositories. In: 2016 IEEE international conference on software maintenance and evolution (ICSME), pp 334–344. https://doi.org/10.1109/ICSME.2016.31
2. Constantinou E, Mens T (2017) Socio-technical evolution of the Ruby ecosystem in GitHub. In: 2017 IEEE 24th international conference on software analysis, evolution and reengineering (SANER), pp 34–44. https://doi.org/10.1109/SANER.2017.7884607
3. Dabic O, Aghajani E, Bavota G (2021) Sampling projects in GitHub for MSR studies. In: Proceedings of the 2021 IEEE/ACM 18th international conference on mining software repositories (MSR), pp 560–564. https://doi.org/10.1109/MSR52588.2021.00074
4. Hasabnis N (2022) GitRank: a framework to rank GitHub repositories. In: Proceedings of the 2022 IEEE/ACM 19th international conference on mining software repositories (MSR), pp 729–731. https://doi.org/10.1145/3524842.3528519
5. Kaide K, Tamada H (2022) Argo: projects' time-series data fetching and visualizing tool for GitHub. In: Proceedings of the 23rd ACIS international conference on software engineering, artificial intelligence, networking and parallel/distributed computing (SNPD 2022 Summer)
6. Kudrjavets G, Nagappan N, Rastogi A (2023) Are we speeding up or slowing down? On temporal aspects of code velocity. In: Proceedings of the 2023 IEEE/ACM 20th international conference on mining software repositories (MSR), pp 267–271. https://doi.org/10.1109/MSR59073.2023.00046
7. Pinto G, Steinmacher I, Gerosa MA (2016) More common than you think: an in-depth study of casual contributors. In: 2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), pp 112–123. https://doi.org/10.1109/SANER.2016.68
8. Project Management Institute (2017) Agile practice guide. Project Management Institute
9. Saito Y, Fujiwara K, Igaki H, Yoshida N, Iida H (2016) How do GitHub users feel with pull-based development? In: 2016 7th international workshop on empirical software engineering in practice (IWESEP), pp 7–11. https://doi.org/10.1109/IWESEP.2016.19
10. Steinmacher I, Pinto G, Wiese IS, Gerosa MA (2018) Almost there: a study on quasi-contributors in open-source software projects. In: 2018 IEEE/ACM 40th international conference on software engineering (ICSE), pp 256–266. https://doi.org/10.1145/3180155.3180208
11. Tian Y, Ng W, Cao J, McIntosh S (2019) Geek talents: who are the top experts on GitHub and Stack Overflow? Comput Mater Continua 61(2):465–479
12. Toda K, Tamada H, Nakamura M, Matsumoto K (2019) Characterizing project evolution on a social coding platform. In: Proceedings of the 20th ACIS international conference on software engineering, artificial intelligence, networking and parallel/distributed computing (SNPD 2019), pp 525–532
13. Toda K, Tamada H, Nakamura M, Matsumoto K (2020) Capturing spontaneous software evolution in a social coding platform with project-as-a-city concept. Int J Softw Innov 8(3):35–50
Footnotes
1 https://www.scrum.org/resources/what-scrum-module.

2 https://docs.github.com/graphql.

3 https://seart-ghs.si.usi.ch.

4 https://github.com/tamadalab/argo.
