2024.Advances in Intelligent Data Analysis and Its Applications
Edited by
Chao Zhang, Wentao Li, Huiyan Zhang and Tao Zhan
mdpi.com/journal/electronics
Advances in Intelligent Data Analysis
and Its Applications
Editors
Chao Zhang
Wentao Li
Huiyan Zhang
Tao Zhan
Tao Zhan
Southwest University,
Chongqing, China
Editorial Office
MDPI
St. Alban-Anlage 66
4052 Basel, Switzerland
This is a reprint of articles from the Special Issue published online in the open access journal
Electronics (ISSN 2079-9292) (available at: https://www.mdpi.com/journal/electronics/special_issues/771L15O65G).
For citation purposes, cite each article independently as indicated on the article page online and as
indicated below:
Lastname, A.A.; Lastname, B.B. Article Title. Journal Name Year, Volume Number, Page Range.
© 2024 by the authors. Articles in this book are Open Access and distributed under the Creative
Commons Attribution (CC BY) license. The book as a whole is distributed by MDPI under the terms
and conditions of the Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)
license.
Contents
Jeyabharathy Sadaiyandi, Padmapriya Arumugam, Arun Kumar Sangaiah and Chao Zhang
Stratified Sampling-Based Deep Learning Approach to Increase Prediction Accuracy of
Unbalanced Dataset
Reprinted from: Electronics 2023, 12, 4423, doi:10.3390/electronics12214423 . . . . . . . . . . . . . 71
Jianxing Zheng, Tengyue Jing, Feng Cao, Yonghong Kang, Qian Chen and Yanhong Li
A Multiscale Neighbor-Aware Attention Network for Collaborative Filtering
Reprinted from: Electronics 2023, 12, 4372, doi:10.3390/electronics12204372 . . . . . . . . . . . . . 109
Jingqi Zhang, Xin Zhang, Zhaojun Liu, Fa Fu, Yihan Jiao and Fei Xu
A Network Intrusion Detection Model Based on BiLSTM with Multi-Head Attention
Mechanism
Reprinted from: Electronics 2023, 12, 4170, doi:10.3390/electronics12194170 . . . . . . . . . . . . . 143
Xiaohui Cui, Yu Li, Zheng Xie, Hanzhang Liu, Shijie Yang and Chao Mou
ADQE: Obtain Better Deep Learning Models by Evaluating the Augmented Data Quality Using
Information Entropy
Reprinted from: Electronics 2023, 12, 4077, doi:10.3390/electronics12194077 . . . . . . . . . . . . . 161
Ziyang Guo, Xingguang Geng, Fei Yao, Liyuan Liu, Chaohong Zhang, Yitao Zhang and
Yunfeng Wang
An Improved Spatio-Temporally Smoothed Coherence Factor Combined with Delay Multiply
and Sum Beamformer
Reprinted from: Electronics 2023, 12, 3902, doi:10.3390/electronics12183902 . . . . . . . . . . . . . 187
Can Wang, Chensheng Cheng, Dianyu Yang, Guang Pan and Feihu Zhang
Underwater AUV Navigation Dataset in Natural Scenarios
Reprinted from: Electronics 2023, 12, 3788, doi:10.3390/electronics12183788 . . . . . . . . . . . . . 203
Yong Tao, Haitao Liu, Shuo Chen, Jiangbo Lan, Qi Qi and Wenlei Xiao
An Off-Line Error Compensation Method for Absolute Positioning Accuracy of Industrial
Robots Based on Differential Evolution and Deep Belief Networks
Reprinted from: Electronics 2023, 12, 3718, doi:10.3390/electronics12173718 . . . . . . . . . . . . . 233
Zicheng Zuo, Zhenfang Zhu, Wenqing Wu, Wenling Wang, Jiangtao Qi and Linghui Zhong
Improving Question Answering over Knowledge Graphs with a Chunked Learning Network
Reprinted from: Electronics 2023, 12, 3363, doi:10.3390/electronics12153363 . . . . . . . . . . . . . 273
Yajun Chen, Junxiang Wang, Tao Yang, Qinru Li and Nahian Alom Nijhum
An Enhancement Method in Few-Shot Scenarios for Intrusion Detection in Smart Home
Environments
Reprinted from: Electronics 2023, 12, 3304, doi:10.3390/electronics12153304 . . . . . . . . . . . . . 311
Jie Wang, Ying Jia, Arun Kumar Sangaiah and Yunsheng Song
A Network Clustering Algorithm for Protein Complex Detection Fused with Power-Law
Distribution Characteristic
Reprinted from: Electronics 2023, 12, 3007, doi:10.3390/electronics12143007 . . . . . . . . . . . . . 335
Chenggong Zhang, Daren Zha, Lei Wang, Nan Mu, Chengwei Yang, Bin Wang and
Fuyong Xu
Graph Convolution Network over Dependency Structure Improve Knowledge Base Question
Answering
Reprinted from: Electronics 2023, 12, 2675, doi:10.3390/electronics12122675 . . . . . . . . . . . . . 351
Qiang Wang, Guowei Li, Weitong Jin, Shurui Zhang and Weixing Sheng
A Variable Structure Multiple-Model Estimation Algorithm Aided by Center Scaling
Reprinted from: Electronics 2023, 12, 2257, doi:10.3390/electronics12102257 . . . . . . . . . . . . . 383
Jingyi Qu, Bo Chen, Chang Liu and Jinfeng Wang
Flight Delay Prediction Model Based on Lightweight Network ECA-MobileNetV3
Reprinted from: Electronics 2023, 12, 1434, doi:10.3390/electronics12061434 . . . . . . . . . . . . . 417
Jie Yang, Xiaodan Qin, Guoyin Wang, Xiaoxia Zhang and Baoli Wang
Relative Knowledge Distance Measure of Intuitionistic Fuzzy Concept
Reprinted from: Electronics 2022, 11, 3373, doi:10.3390/electronics11203373 . . . . . . . . . . . . . 507
Editorial
Recent Advances in Intelligent Data Analysis and
Its Applications
Chao Zhang 1,*, Wentao Li 2, Huiyan Zhang 3 and Tao Zhan 4
1 Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education,
School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
2 College of Artificial Intelligence, Southwest University, Chongqing 400715, China; [email protected]
3 National Research Base of Intelligent Manufacturing Service, Chongqing Technology and Business University,
Chongqing 400067, China; [email protected]
4 School of Mathematics and Statistics, Southwest University, Chongqing 400715, China; [email protected]
* Correspondence: [email protected]
1. Introduction
In the current rapidly evolving technological landscape, marked by transformative
advancements such as cloud computing, the Internet of Things (IoT), and industrial in-
ternet, the complexity of data analysis tasks is escalating across the socio-economic spec-
trum. Within this dynamic environment, the challenges faced by current problem-solving
programs when handling big data primarily revolve around the effective management,
modeling, and processing of extensive datasets.
This surge in data intricacy necessitates a proactive approach towards researching and
developing intelligent models and methods for efficient data analysis and its application. It
is crucial to explore innovative solutions that can navigate the intricacies of large datasets
while ensuring not only the accuracy of analyses but also the timely extraction of valuable
insights. Such research endeavors have become indispensable in addressing the growing
demand for robust data processing capabilities in diverse sectors.
Moreover, as the technological landscape continues to evolve, the importance of
staying at the forefront of data analysis methodologies becomes evident. This involves not
only adapting to existing challenges but also anticipating future complexities. By delving
into research on intelligent data models and methods, we pave the way for advancements
that are not only responsive to current demands but also resilient in the face of emerging
technologies and data-related challenges in our ever-changing socio-economic landscape.
Presently, the domain of intelligent data analysis [1] has experienced a rise in the
number of scholars and professionals working within it. Innovative methods have been
proposed from diverse perspectives, including data mining, machine learning (ML), natural
language processing, granularity computation, social networks, machine vision, cognitive
computing, and more. These approaches are intricately woven into the fabric of intelligent
data analysis, presenting expansive and profound application scenarios for the field of
data mining.
Citation: Zhang, C.; Li, W.; Zhang, H.; Zhan, T. Recent Advances in Intelligent Data Analysis and Its Applications. Electronics 2024, 13, 226. https://doi.org/10.3390/electronics13010226
Received: 2 January 2024; Accepted: 3 January 2024; Published: 4 January 2024
Data mining technology [2] plays a crucial role in dealing with large-scale data by
extracting valuable information from massive datasets. It provides essential training data
for ML algorithms, enabling the construction of more accurate models. Simultaneously,
the development of natural language processing [3] allows machines to better understand
and parse human language, imparting more practical meaning to the results of data
analysis. Advancements in granularity computing [4] have improved the effectiveness
of data analysis by simplifying information into fundamental concepts, facilitating swift
and in-depth analysis. Social network analysis [5] uncovers patterns in interpersonal
relationships and group behavior, offering substantial groundwork for the development
of marketing strategies and policy formulation. The progression of machine vision [6]
broadens the horizons of data analysis to encompass image and video processing, providing
strong support for applications such as intelligent surveillance and autonomous driving.
Concurrently, the integration of cognitive computing [7] emulates the functions of the
human brain, enhancing the innovation and intelligence of data analysis.
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
These intelligent data analysis methods have broadened the comprehension of intricate
data processing at the theoretical research level, concurrently yielding positive effects on
socio-economic development. Especially within the era of big data [8], these methods have
shown considerable importance in tackling practical challenges across diverse domains,
presenting fresh perspectives and innovative solutions for the complexities posed by
intricate data. They not only make data analysis more intelligent and efficient but also drive
the development of socio-economics, providing more comprehensive and viable strategies
for solving practical issues. The research on these intelligent data analysis methods is
becoming a crucial engine for advancing the integration of technology and society.
By conducting in-depth research and widely applying these methods, one can better
address the challenges posed by the increasingly vast and diverse data streams, further pro-
pelling technological innovation. Not only is this innovation exhilarating, it is also playing
an increasingly crucial role in solving practical problems. Further in-depth research and
widespread application of newly emerging models and methods in the field of intelligent
data analysis are anticipated to drive continuous progress in societal digital transformation
and innovation.
To advance research in the field of computer science and engineering, new methods for
intelligent data analysis and their applications must be persistently explored. Throughout
this explorative process, the focus will be on the practicality, reliability, and effectiveness
of innovative technologies and methods, ensuring their maximum impact in real-world
applications. By closely integrating theoretical research with practical applications, there
is the potential to advance the forefront of the intelligent data analysis field, contributing
more beneficial insights to the development of a data-driven society in the future.
Overall, research on intelligent data analysis [9] and its applications holds significant
value in the era of big data [10]. Through interdisciplinary approaches and technological
innovations, it is possible to better address the challenges posed by complex data in the real
world, further advancing the field of computer science and engineering. In the ongoing
exploration in this field, attention is directed towards enhancing the practical applicability
of intelligent data analysis methods to address real-world challenges. This endeavor
aims to provide more reliable and innovative solutions for technological progress and
societal development by resolving issues in practical scenarios. Through these efforts,
there will be a continual contribution of greater depth and breadth of knowledge to propel
the development of the field of data science, continuously pushing the boundaries of
technological innovation.
One of the core tasks of intelligent data analysis is to effectively handle vast amounts
of data and extract insightful information that informs decision making [11]. The essence
of this article is to delve into the latest developments in the field of intelligent data analysis
and explore how these technological innovations can be applied to address real-world
challenges in the realms of society, economy, and science. By comprehensively understand-
ing the latest developments in this field, one can better grasp the trends in technological
advancement. This knowledge enables a more flexible application of these innovative
technologies in practical scenarios.
Proactively exploring and implementing forward-looking approaches is pivotal for
advancing intelligent and efficient data processing methods across diverse fields. This
adaptability is indispensable for navigating the ever-evolving landscape of emerging
complex challenges. Immersing oneself in the dynamic realm of intelligent data analy-
sis facilitates not only better adaptation but also leadership in the unfolding trends of
data science.
This proactive stance plays a crucial role in fostering innovation and formulating
practical solutions that make significant contributions to the sustainable development of
society, the economy, and the scientific domain. Delving deeper into the intricacies of
intelligent data analysis not only enhances our capacity to address current issues but also
positions us at the forefront of anticipating and responding to future challenges.
In this context, keeping abreast of emerging technologies and methodologies is
paramount, allowing us to harness the full potential of data-driven insights. Embrac-
ing a forward-thinking mindset empowers us to not only meet present demands but also
to shape and propel the future of data science. This proactive engagement acts as a catalyst
for developing and implementing innovative solutions with far-reaching implications for
the betterment of our global community.
Within this Special Issue, twenty-eight papers are published, encompassing diverse
aspects of decision making, recommendation systems, intrusion detection, question an-
swering, as well as topics in ML and deep learning (DL).
2. Overview of Contributions
For diverse domain requirements, numerous intelligent granular computing models
have been established. The utilization of knowledge distance serves to quantify distinctions
between granular spaces, representing an uncertainty metric with robust discriminative
capabilities in rough set theory. However, the existing knowledge distance metric falls short
when considering the relative disparities between granular spaces within the context of
uncertain concepts. To address this gap, Yang et al. (Contribution 1) explored the concept
of relative knowledge distance for intuitionistic fuzzy concepts.
Air pollution poses a significant environmental threat that could have potential con-
sequences for human health. The emergence of IoT devices enables instantaneous and
ongoing surveillance of atmospheric contaminants in metropolitan regions. However, the
presence of uncertainty and inaccuracy in IoT sensor data presents challenges in the effective
utilization and fusion of these data. Additionally, divergent opinions among decision-
makers regarding air quality evaluation (AQE) can impact final decisions. Addressing
these issues, Li et al. (Contribution 2) systematically investigated a method utilizing
hesitant trapezoidal fuzzy information, examining its application in AQE.
The multigranulation rough set (MGRS) model, extending the Pawlak rough set, de-
scribes uncertain concepts using optimistic and pessimistic upper/lower approximate
boundaries. However, existing information granules in MGRS lacked sufficient approx-
imate descriptions of uncertain concepts. In response, Yang et al. (Contribution 3) in-
troduced the cost-sensitive multigranulation approximation of rough sets, encompassing
optimistic and pessimistic approximations, grounded in approximation set theory. The
associated properties of these approximations are scrutinized. Additionally, a cost-sensitive
selection algorithm is proposed for optimizing the multigranulation approximation.
A myriad of research endeavors have extensively explored diverse facets within the
field. In this context, Liu and his colleagues (Contribution 4) investigated the utilization
of contextual information and users’ interest preferences within location-based social
networks to propose the subsequent point-of-interest for users in the IoT environment.
Their study demonstrated that their model, named CGTS-HAN, could more accurately
capture the contextual features of users’ POI compared to alternative models.
Addressing the tendency of recommender systems to overlook diverse neighbor views
in collaborative filtering, Zheng et al. (Contribution 5) proposed a multiscale neighbor-
aware attention network. This approach integrates overarching semantics from various
neighbor types with significant local embeddings of multiscale neighbors. The collabo-
rative signals for predicting user ratings of items are derived from a range of neighbors,
encompassing both attribute views and interaction views.
Modeling users’ dynamic preferences is a challenging yet crucial task in recommen-
dation systems. Hu et al. (Contribution 6) systematically addressed this challenge by
considering both local fluctuations in user interests and the need for global stability.
Coping with vast amounts of data requires sophisticated methodologies. Variations in
procedures and protocols across healthcare services and facilities have resulted in the incom-
knowledge base. This approach employed graph convolutional networks, facilitating the
effective pooling of information across diverse dependency structures. The result was a
heightened efficacy in the representation of sequence vectors.
Amidst efforts to control healthcare expenses and adapt to changing regulations,
pharmaceutical laboratories aim to prolong the longevity of crucial equipment, particularly
fluid bed dryers crucial for drug manufacturing. Barriga et al. (Contribution 16) proposed a
pioneering solution that incorporates exploration data analysis and a Catboost ML model to
tackle challenges associated with older dryers lacking real-time temperature optimization
sensors. The integration of the Catboost algorithm resulted in a noteworthy decrease in
initial heating time, leading to substantial energy conservation. The ongoing surveillance
of essential parameters signified a departure from traditional fixed-time models, indicating
a paradigm shift in the industry.
Recognizing orphan genes (OGs) can be a labor-intensive process. To address this
challenge, Gao et al. (Contribution 17) introduced XGBoost-A2OGs, an automated predictor
specifically designed for the identification of OGs in seven angiosperm species. The
methodology involves the utilization of hybrid features and XGBoost.
Accurately classifying imbalanced data classes poses a formidable challenge due to
the inherent uneven distribution in datasets. To tackle this obstacle, the incorporation of
sampling procedures into ML and DL algorithms has underscored its indispensability. In
this context, Sadaiyandi et al. (Contribution 18) conducted a study that employed sampling-based
ML and DL approaches to automate the identification of deteriorating trees within a
forest dataset.
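The core idea of stratified sampling can be sketched briefly: split a dataset so that every class keeps its original proportion in both partitions, which guarantees minority classes are represented during training and evaluation. The sketch below is a generic illustration of the technique, not the authors' pipeline; the labels and the `stratified_split` helper are hypothetical.

```python
import random
from collections import Counter

def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices so each class keeps its proportion in both parts
    (generic stratified sampling sketch, not the authors' implementation)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, test = [], []
    for y, idxs in by_class.items():
        rng.shuffle(idxs)
        k = max(1, round(len(idxs) * test_frac))  # at least one test sample per class
        test.extend(idxs[:k])
        train.extend(idxs[k:])
    return sorted(train), sorted(test)

# Imbalanced toy labels: 90 "healthy" trees vs 10 "deteriorating" ones.
labels = ["healthy"] * 90 + ["deteriorating"] * 10
train, test = stratified_split(labels)
print(Counter(labels[i] for i in test))  # both classes kept in proportion
```

A plain random split of the same data could easily leave the 10% minority class out of the test partition entirely, which is exactly the failure mode stratification avoids.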
In the process of feature learning, conventional models for abnormal state detection
frequently neglect the variation in position and orientation system data within the frequency
domain. This neglect results in the forfeiture of vital feature details, hindering the possibility
for additional improvements in detection capability. To overcome this limitation and with
the goal of improving UAV flight safety, Yang et al. (Contribution 19) introduced a technique
for detecting abnormal UAV states.
Autonomous underwater vehicles (AUVs) encounter challenges in underwater naviga-
tion due to the considerable costs associated with inertial navigation devices and Doppler
velocity logs, which impede the acquisition of essential navigation data. In addressing
this issue, methodologies such as underwater simultaneous localization and mapping are
employed. These approaches, coupled with navigation methods reliant on perceptual
sensors like vision and sonar, aim to enhance self-positioning precision. In the field of
machine learning (ML), extensive datasets play a crucial role in improving algorithmic
performance. Wang et al. (Contribution 20) introduced an underwater navigation dataset
derived from controllable AUVs.
A network intrusion detection (NID) tool grapples with network data characterized
by high feature dimensionality and an imbalanced distribution across categories. Presently,
certain detection models exhibit suboptimal accuracy in practical detection scenarios. In
response to these challenges, Zhang et al. (Contribution 21) introduced an NID model
leveraging multi-head attention and bidirectional long short-term memory.
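The attention component of such a model can be illustrated with its basic building block, scaled dot-product attention: score a query against every key, normalize the scores with softmax, and return the weighted average of the values. This is a generic single-head sketch in plain Python, not the authors' model, which stacks multiple heads on top of a BiLSTM.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector:
    dot the query with each key, scale by sqrt(d), softmax the
    scores, and blend the value vectors with those weights."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# A query aligned with the first key draws most of its output from the first value.
out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
```

A multi-head variant simply runs several such attentions over different learned projections of the same input and concatenates the results, letting each head focus on a different aspect of the traffic features.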
To address the accuracy limitations of the traditional interacting multiple-model (IMM)
algorithm in target tracking, Wang et al. (Contribution 22) proposed an innovative algorithm
named VSIMM-CS. This algorithm adopts a variable structure interacting multiple-model
approach. The real-time construction of its model ensemble is based on the initial set,
considering both the error characteristics of a linear system and the inherent symmetry in
the structure of the model set.
Semi-supervised classification stands as a fundamental approach for addressing incom-
plete tag information without manual intervention. Nevertheless, prevailing algorithms
necessitate the storage of all unlabeled instances, leading to iterative processes with po-
tential drawbacks, such as slow execution speed and substantial memory requirements,
particularly for large datasets. While previous solutions have primarily concentrated
on supervised classification, Song et al. (Contribution 23) presented a novel approach
aimed at reducing the size of the unlabeled instance set in the context of semi-supervised
classification algorithms.
To enhance scatter quality without a notable reduction in the lateral resolution of the
delay multiply and sum (DMAS) beamforming coherence factor, Guo et al. (Contribution 24)
introduced an adaptive, spatio-temporally smoothed coherence factor combined with DMAS.
In this research, the generalized coherence factor was applied to identify local coherence
and dynamically ascertain the subarray length for spatial smoothing. Incorporating this
parameter to assess the results improved scatter quality without a substantial compromise
in lateral precision, making it particularly advantageous in intricate clinical environments.
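For orientation, the conventional coherence factor that the contribution improves upon is the ratio of coherent power to total power across the receive channels. The snippet below implements only that standard baseline definition, not the authors' spatio-temporally smoothed variant.

```python
def coherence_factor(channel_samples):
    """Classic coherence factor: coherent power over total power across
    N receive channels. Returns 1.0 when all channel samples are aligned
    and 0.0 when they cancel out completely."""
    n = len(channel_samples)
    coherent = sum(channel_samples) ** 2        # power of the coherent sum
    total = n * sum(s * s for s in channel_samples)  # N times incoherent power
    return coherent / total if total else 0.0

print(coherence_factor([1.0, 1.0, 1.0, 1.0]))    # aligned echoes -> 1.0
print(coherence_factor([1.0, -1.0, 1.0, -1.0]))  # destructive -> 0.0
```

Weighting beamformed pixels by this factor suppresses off-axis clutter, which is why combining a smoothed version of it with DMAS can improve scatter quality.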
In the field of intelligent manufacturing, the proficient use of industrial robots faces a
hurdle due to the issue of low absolute positioning accuracy. Tao et al. (Contribution 25)
presented an algorithm for precise compensation in the absolute positioning of industrial
robots, leveraging deep belief networks through an offline compensation approach. They
employed deep belief networks through an offline compensation approach, optimizing
these networks using a differential evolution algorithm. Additionally, they introduced a
position error mapping model incorporating evidence theory. The aim is to streamline
the process of precision compensation, specifically targeting the enhancement of absolute
positioning accuracy in industrial robots.
The detection of blind spot obstacles in intelligent wheelchairs holds significance,
particularly within semi-enclosed environments of elderly communities. Current solutions
relying on LiDAR and 3D point clouds are costly, difficult to implement, and demand
substantial computing resources and time. Du et al. (Contribution 26) introduced an
optimized lightweight obstacle detection model called GC-YOLO, based on YOLOv5
architecture.
While sentiment analysis has been extensively researched, the majority of studies have
concentrated on analyzing individual corpora. Yang et al. (Contribution 27) introduced a
pioneering framework, CNEC, tailored for conducting sentiment analysis on bilingual text
that includes emojis, commonly found on social media platforms.
Knowledge graph question answering supports users without mandating data structure
comprehension, addressing challenges such as semantic understanding, retrieval
errors, word abbreviation, object complement, and entity ambiguity. To tackle these issues,
Zuo et al. (Contribution 28) presented the innovative Chunked Learning Network method. The
model incorporated vector representations of entities and predicates into the question, fully
leveraging embeddings derived from the knowledge graph. Adapted for diverse scenarios,
the model utilizes a variety of approaches to acquire vector representations for the subject
entities and relationships within the question.
Author Contributions: C.Z., W.L., H.Z. and T.Z. worked together in the whole editorial process
of the Special Issue, “Advances in Intelligent Data Analysis and Its Applications”, published by
the journal Electronics. H.Z. and T.Z. drafted this editorial introduction. C.Z. and W.L. reviewed,
edited, and finalized the manuscript. All authors have read and agreed to the published version of
the manuscript.
Funding: This editorial was supported in part by the Natural Science Foundation of Chongqing
(No. CSTB2023NSCQ-MSX0152), the Special Fund for Science and Technology Innovation Teams of
Shanxi (202204051001015), the Science and Technology Research Program of Chongqing Education
Commission (Nos. KJZD-K202300807, KJQN202300202, KJQN202100206), the Training Program
for Young Scientific Researchers of Higher Education Institutions in Shanxi, the Cultivate Scientific
Research Excellence Programs of Higher Education Institutions in Shanxi (CSREP) (2019SK036), and
the China Postdoctoral Science Foundation (No. 2023T160401).
Conflicts of Interest: The authors declare no conflicts of interest.
List of Contributions
1. Yang, J.; Qin, X.; Wang, G.; Zhang, X.; Wang, B. Relative Knowledge Distance Measure
of Intuitionistic Fuzzy Concept. Electronics 2022, 11, 3373.
2. Li, W.; Zhang, C.; Cui, Y.; Shi, J. A Collaborative Multi-Granularity Architecture for
Multi-Source IoT Sensor Data in Air Quality Evaluations. Electronics 2023, 12, 2380.
3. Yang, J.; Kuang, J.; Liu, Q.; Liu, Y. Cost-Sensitive Multigranulation Approximation in
Decision-Making Applications. Electronics 2022, 11, 3801.
4. Liu, X.; Guo, J.; Qiao, P. A Context Awareness Hierarchical Attention Network for
Next POI Recommendation in IoT Environment. Electronics 2022, 11, 3977.
5. Zheng, J.; Jing, T.; Cao, F.; Kang, Y.; Chen, Q.; Li, Y. A Multiscale Neighbor-Aware
Attention Network for Collaborative Filtering. Electronics 2023, 12, 4372.
6. Hu, J.; Liu, Q.; Zhao, F. Local-Aware Hierarchical Attention for Sequential Recommen-
dation. Electronics 2023, 12, 3742.
7. Wilcox, C.; Giagos, V.; Djahel, S. A Neighborhood-Similarity-Based Imputation Algo-
rithm for Healthcare Data Sets: A Comparative Study. Electronics 2023, 12, 4809.
8. Wang, J.; Jia, Y.; Sangaiah, A.K.; Song, Y. A Network Clustering Algorithm for Protein
Complex Detection Fused with Power-Law Distribution Characteristic. Electronics
2023, 12, 3007.
9. Park, C.; Han, E.; Kim, I.; Shin, D. A Study on the High Reliability Audio Target
Frequency Generator for Electronics Industry. Electronics 2023, 12, 4918.
10. Song, D.; Zhao, Y. A Data-Driven Approach Using Enhanced Bayesian-LSTM Deep
Neural Networks for Picks Wear State Recognition. Electronics 2023, 12, 3593.
11. Chen, Y.; Wang, J.; Yang, T.; Li, Q.; Nijhum, N.A. An Enhancement Method in Few-
Shot Scenarios for Intrusion Detection in Smart Home Environments. Electronics 2023,
12, 3304.
12. Cui, X.; Li, Y.; Xie, Z.; Liu, H.; Yang, S.; Mou, C. ADQE: Obtain Better Deep Learning
Models by Evaluating the Augmented Data Quality Using Information Entropy.
Electronics 2023, 12, 4077.
13. Bieliński, A.; Rojek, I.; Mikołajewski, D. Comparison of Selected Machine Learning
Algorithms in the Analysis of Mental Health Indicators. Electronics 2023, 12, 4407.
14. Qu, J.; Chen, B.; Liu, C.; Wang, J. Flight Delay Prediction Model Based on Lightweight
Network ECA-MobileNetV3. Electronics 2023, 12, 1434.
15. Zhang, C.; Zha, D.; Wang, L.; Mu, N.; Yang, C.; Wang, B.; Xu, F. Graph Convolution
Network over Dependency Structure Improve Knowledge Base Question Answering.
Electronics 2023, 12, 2675.
16. Barriga, R.; Romero, M.; Hassan, H. Machine Learning for Energy-Efficient Fluid Bed
Dryer Pharmaceutical Machines. Electronics 2023, 12, 4325.
17. Gao, Q.; Zhang, X.; Yan, H.; Jin, X. Machine Learning-Based Prediction of Orphan
Genes and Analysis of Different Hybrid Features of Monocot and Eudicot Plants.
Electronics 2023, 12, 1433.
18. Sadaiyandi, J.; Arumugam, P.; Sangaiah, A.K.; Zhang, C. Stratified Sampling-Based
Deep Learning Approach to Increase Prediction Accuracy of Unbalanced Dataset.
Electronics 2023, 12, 4423.
19. Yang, T.; Chen, J.; Deng, H.; Lu, Y. UAV Abnormal State Detection Model Based on
Timestamp Slice and Multi-Separable CNN. Electronics 2023, 12, 1299.
20. Wang, C.; Cheng, C.; Yang, D.; Pan, G.; Zhang, F. Underwater AUV Navigation
Dataset in Natural Scenarios. Electronics 2023, 12, 3788.
21. Zhang, J.; Zhang, X.; Liu, Z.; Fu, F.; Jiao, Y.; Xu, F. A Network Intrusion Detection
Model Based on BiLSTM with Multi-Head Attention Mechanism. Electronics 2023,
12, 4170.
22. Wang, Q.; Li, G.; Jin, W.; Zhang, S.; Sheng, W. A Variable Structure Multiple-Model
Estimation Algorithm Aided by Center Scaling. Electronics 2023, 12, 2257.
23. Song, Y.; Zhang, J.; Zhao, X.; Wang, J. An Accelerator for Semi-Supervised Classifica-
tion with Granulation Selection. Electronics 2023, 12, 2239.
24. Guo, Z.; Geng, X.; Yao, F.; Liu, L.; Zhang, C.; Zhang, Y.; Wang, Y. An Improved
Spatio-Temporally Smoothed Coherence Factor Combined with Delay Multiply and
Sum Beamformer. Electronics 2023, 12, 3902.
25. Tao, Y.; Liu, H.; Chen, S.; Lan, J.; Qi, Q.; Xiao, W. An Off-Line Error Compensation
Method for Absolute Positioning Accuracy of Industrial Robots Based on Differential
Evolution and Deep Belief Networks. Electronics 2023, 12, 3718.
26. Du, J.; Zhao, S.; Shang, C.; Chen, Y. Applying Image Analysis to Build a Lightweight
System for Blind Obstacles Detecting of Intelligent Wheelchairs. Electronics 2023,
12, 4472.
27. Yang, T.; Liu, Z.; Lu, Y.; Zhang, J. Centrifugal Navigation-Based Emotion Computation
Framework of Bilingual Short Texts with Emoji Symbols. Electronics 2023, 12, 3332.
28. Zuo, Z.; Zhu, Z.; Wu, W.; Wang, W.; Qi, J.; Zhong, L. Improving Question Answering
over Knowledge Graphs with a Chunked Learning Network. Electronics 2023, 12, 3363.
References
1. Chen, Y.H.; Yao, Y.Y. A multiview approach for intelligent data analysis based on data operators. Inf. Sci. 2008, 178, 1–20.
[CrossRef]
2. Yang, J.; Li, Y.; Liu, Q.; Li, L.; Feng, A.; Wang, T.; Zheng, S.; Xu, A.; Lyu, J. Brief introduction of medical database and data mining
technology in big data era. J. Evid.-Based Med. 2020, 13, 57–69. [CrossRef] [PubMed]
3. Young, T.; Hazarika, D.; Poria, S.; Cambria, E. Recent trends in deep learning based natural language processing. IEEE Comput.
Intell. Mag. 2017, 13, 55–75. [CrossRef]
4. Lin, T.Y. Granular computing: From rough sets and neighborhood systems to information granulation and computing with words.
In Proceedings of the European Congress on Intelligent Techniques and Soft Computing, Aachen, Germany, 8–11 September 1997;
pp. 1602–1606.
5. Abkenar, S.B.; Kashani, M.H.; Mahdipour, E.; Jameii, S.M. Big data analytics meets social media: A systematic review of
techniques, open issues, and future directions. Telemat. Inform. 2020, 57, 101517. [CrossRef] [PubMed]
6. Kaur, H.; Pannu, H.S.; Malhi, A.K. A systematic review on imbalanced data challenges in machine learning. ACM Comput. Surv.
2019, 52, 1–36. [CrossRef]
7. Gupta, S.; Kar, A.K.; Baabdullah, A.M.; Al-Khowaiter, W. Big data with cognitive computing: A review for the future. Int. J. Inf.
Manag. 2018, 42, 78–89. [CrossRef]
8. Buxton, B.; Goldston, D.; Doctorow, C.; Waldrop, M. Big data: Science in the petabyte era. Nature 2008, 455, 8–9. [PubMed]
9. Zhang, C.; Li, D.Y.; Liang, J.Y. Multi-granularity three-way decisions with adjustable hesitant fuzzy linguistic multigranulation
decision-theoretic rough sets over two universes. Inf. Sci. 2020, 507, 665–683. [CrossRef]
10. Chen, G.Q.; Li, Y.L.; Wei, Q. Big data driven management and decision sciences: A NSFC grand research plan. Fundam. Res. 2021,
1, 504–507. [CrossRef]
11. Lei, Y.; Jia, F.; Lin, J.; Xing, S.; Ding, S.X. An intelligent fault diagnosis method using unsupervised feature learning towards
mechanical big data. IEEE Trans. Ind. Electron. 2016, 63, 3137–3147. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
A Study on the High Reliability Audio Target Frequency
Generator for Electronics Industry
Changsik Park 1,2 , Euntack Han 1,3 , Ikjae Kim 1,4 and Dongkyoo Shin 1,5, *
Abstract: A frequency synthesizer performs the simple function of generating a desired frequency by manipulating a reference frequency signal, but stable and precise frequency generation is essential for reliable operation of equipment in the communication, control, surveillance, medical, and commercial fields. Frequency synthesis, which is commonly used in various contexts, has been implemented with analog, digital, or hybrid methods. In the field of communication in particular, a precise frequency synthesizer is required for each frequency band, from very low audio frequencies (AF) to high-frequency microwaves. The purpose of this paper is to design and implement a highly reliable frequency synthesizer for railway track circuit systems operating at AF, using only the logic circuits of an FPGA (field-programmable gate array) without a microprocessor. We first review the development of analog, digital, and hybrid frequency synthesizers and then propose a method for precise frequency generation based on the digital approach. The frequency generated by the proposed digital frequency synthesizer, using an ultra-precision algorithm refined through many trials, achieves the target frequency with an accuracy of more than 99.999% and a resolution at the mHz level, much finer than the 5 Hz resolution of a previous study. This highly precise AF-class frequency synthesizer contributes greatly to the safe operation of braking and signaling systems when used in transportation equipment such as railways and subways.
Keywords: frequency synthesizer; direct frequency synthesizer; indirect frequency synthesizer; railway track circuit
Citation: Park, C.; Han, E.; Kim, I.; Shin, D. A Study on the High Reliability Audio Target Frequency Generator for Electronics Industry. Electronics 2023, 12, 4918. https://doi.org/10.3390/electronics12244918
Academic Editors: Chao Zhang, Wentao Li, Huiyan Zhang and Tao Zhan
which maintains a stable output without fluctuation of the generated frequency, such as phase noise. For fast synthesis of a desired frequency, a DDFS (direct digital frequency synthesizer), which is faster than a PLL, is used [2]. However, although a digital frequency synthesizer can synthesize the desired frequency quickly, the program-driven microprocessor typically used in it has a fatal flaw: it can malfunction or stop operating because of external factors and environmental variables. For that reason, analog methods are intentionally retained in industrial or highly stable special applications, as opposed to general commercial or personal use. Nevertheless, because of the many convenient characteristics of digital frequency synthesis, digital frequency synthesizers that can operate stably in disturbance environments (e.g., surges) are sometimes implemented using only pure logic circuits without a microprocessor. Therefore, in this paper, to generate the target frequency used in the railway track circuit, the target frequency is generated using the pure logic of an FPGA to ensure the convenience, excellent performance, and safety of the digital frequency synthesizer. FPGA-based frequency synthesizers have been studied, as shown in [8], but most of them deal with relatively high frequencies, and it is rare to use very low frequencies such as the audio-frequency bands used in railway track circuits. In this paper, we review the technical development stages of frequency synthesis and its theoretical structure, and we design, fabricate, and simulate a frequency synthesizer with mHz deviation, without a processor, using only FPGA logic on the traditional DDFS structure.
to (a), in which four oscillation frequencies are first produced by four separate generators and one of them is then selected by switches.
In contrast, the most widely known and widely used indirect frequency synthesizer is the PLL. A PLL is a technique that compares the output signal of a VCO (voltage-controlled oscillator) with an input signal and adjusts the frequency of the VCO so as to maintain a constant phase difference between the VCO output and the input signal. In an indirect frequency synthesizer, a PLL is used to generate the desired frequency, and the components of the PLL determine the performance of the synthesizer.
The PLL system proposed by Yoon Kwang-sup et al. in [1] includes the components of an integer-N PLL. The reference clock generated by the reference divider is compared with the VCO output signal in a PFD (phase-frequency detector) to generate an up/down signal, and a CP (charge pump) converts the up/down signal into a current and transmits it to an LF (loop filter). The LF converts the current to a voltage that controls the frequency of the VCO; the 1/N divider in Figure 2 divides the VCO output signal by N to finally produce the desired frequency. Fractional-N dividers, which combine an integer and a fractional part, are also used to enable finer frequency control.
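The finer control a fractional-N divider provides can be sketched numerically; the snippet below is an illustrative model only (the reference frequency, N, and accumulator width are assumed values, not taken from [1]):

```python
# Fractional-N PLL: the effective divide ratio is N + F / 2**k,
# so the output frequency steps are f_ref / 2**k instead of f_ref.
def pll_output_hz(f_ref_hz: float, n_int: int, frac: int, k_bits: int) -> float:
    """Output frequency of a fractional-N PLL (illustrative model)."""
    return f_ref_hz * (n_int + frac / 2**k_bits)

# Hypothetical example: 10 MHz reference, N = 100, 16-bit fractional word.
f0 = pll_output_hz(10e6, 100, 0, 16)   # 1 GHz
f1 = pll_output_hz(10e6, 100, 1, 16)   # one fractional step higher
step_hz = f1 - f0                      # 10e6 / 2**16, roughly 152.6 Hz
```

An integer-N PLL with the same reference could only step in 10 MHz increments; the fractional word shrinks the step by a factor of 2^k.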
The advantage of this PLL system is that spurious signal levels are reduced by the LF, and it is simpler than the direct analog frequency synthesizer. Its disadvantages are a longer frequency switching time and higher phase noise than the direct analog method. The phase noise of the frequency synthesizer within the LF bandwidth can be expressed as λ = λ_PFD + 10 log N, where λ_PFD is the accumulated phase noise of the reference frequency, phase detector, LF, and feedback 1/N divider at the input of the phase detector.
Yuchen Wang, Xuguang Bao, and Wei Hua applied a PLL to accurately determine the rotor position of a permanent-magnet synchronous motor (PMSM), exploiting the excellent phase-locking capability of the PLL [10]. Phase analysis of a three-phase signal is generally based on a synchronous reference frame, and the synchronous-reference-frame PLL (SRF-PLL) is the most widely used technique for extracting phase, frequency, and amplitude in a three-phase system. In that work, a phase-shift PLL is used to map an asymmetric phase-shift signal to a two-phase fixed coordinate system. In the study of Kim Sang-woo et al. [11], applied to the design of a low-power frequency synthesizer for a GPS receiver, a frequency synthesizer using a traditional fractional-N divider was developed. Figure 3 shows the block diagram of the frequency synthesizer studied in [11]: PFD as a
phase detector, CP as a charge pump, active low-pass filter, VCO, fractional-N divider, and
sigma-delta modulator.
Figure 4 is a block diagram of a DLL-based frequency synthesizer, which is very similar to a PLL except that it has a VCDL (voltage-controlled delay line) in place of the PLL's VCO; some researchers define it as a class of PLL. The DLL was designed to correct delay-related errors that inevitably occur as the clock signal of a system passes through several stages. Despite the advantages of low noise and no phase accumulation, DLL systems are generally not recommended for frequency synthesis applications because they are not programmable, offer limited multiplication factors, and have high power consumption during operation [6].
In Figure 5, the frequency control word is added to the current value of the phase accumulator; the addition is performed by the bit adder, and the result is stored back into the accumulator register. The accumulator value is then sent as a sample address to the phase-to-amplitude conversion circuit, which outputs the waveform data corresponding to that address. The waveform data is converted to an analog waveform through the D/A converter and an LPF.
The biggest advantage of this DDS is that, thanks to the fine frequency resolution provided by the phase accumulator, output frequencies can be generated at the hertz (Hz) level; its disadvantages are a limited usable bandwidth and spurious performance. The highest possible output frequency is limited by the Nyquist theorem to less than half of the clock frequency, and spurious noise is higher than in analog frequency synthesis because of quantization and DAC conversion errors.
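The phase-accumulator mechanism described above can be modeled in a few lines of software. This is an illustrative sketch (the accumulator width, LUT depth, and tuning word are assumed values), not the paper's FPGA design: each clock cycle, the tuning word is added to an L-bit accumulator whose top bits index a sine look-up table.

```python
import math

def dds_samples(w: int, l_bits: int, lut_bits: int, n: int) -> list[int]:
    """Generate n DDS output samples.

    w        : frequency control (tuning) word
    l_bits   : phase accumulator width L
    lut_bits : address width of the sine LUT (top bits of the accumulator)
    """
    # Sine LUT quantized to 8-bit unsigned samples, as a DAC would see them.
    lut = [round(127.5 + 127.5 * math.sin(2 * math.pi * i / 2**lut_bits))
           for i in range(2**lut_bits)]
    acc, out = 0, []
    for _ in range(n):
        acc = (acc + w) % 2**l_bits                   # accumulator wraps mod 2**L
        out.append(lut[acc >> (l_bits - lut_bits)])   # top bits address the LUT
    return out

# Assumed example: 24-bit accumulator, 256-entry LUT; the output frequency
# is f_clk * W / 2**24, here 1/256 of the clock rate.
samples = dds_samples(w=2**16, l_bits=24, lut_bits=8, n=1024)
```

Truncating the accumulator to the LUT address width is exactly the source of the quantization spurs mentioned above.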
A.A. Alsharef et al. implemented the typical DDS of Figure 6 on an FPGA (field-programmable gate array) in [12]. An FPGA is a device composed of unit blocks called CLBs (configurable logic blocks) rather than individual logic devices. Its inputs and outputs can be configured as desired by the user, which reduces the complexity of hardware circuits and increases reliability. The DDS on the FPGA is written in Verilog code, composed of a PA (phase accumulator), LUT, and D/A stage, and simulated with an RTL (register-transfer level) model.
Matt Bergeron and Alan N. Willson, Jr. studied a 1 GHz DDS on an FPGA in [13]. Their fast quadrature DDS is based on a new multiplier-based angle-rotation algorithm that does not distort the magnitude of the sine and cosine outputs and is designed to map well onto the DSP slices present in the FPGA. Implemented on a Xilinx Virtex-7 device, it consumes 54.9 mW at 1 GHz, a performance previously achieved only in ASIC designs.
Another FPGA-based study, the quadrature DDS of [14] by M.S. Saber, M. Elmasry, and M.E. Abo-Elsoud, proposed a frequency synthesizer with a frequency resolution of 1.5 kHz, a power consumption of 3.96 mW, and a spurious performance of 59 dBc.
In that study, a ROM is not used, in order to achieve low power during operation on the FPGA. A simple approach that compensates for the shortcomings of the phase-to-amplitude converter in the structure of a typical DDS, as shown in Figure 6, is to use a ROM functioning as a LUT; however, as shown in the following formula:

f_out = (W / 2^L) × F_clk (1)
truncating the output of the phase accumulator leads to spurious noise, but this approach is commonly used because the fine frequency resolution it achieves requires a very large value of L.
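Equation (1) can be exercised numerically. The sketch below (clock, accumulator width, and target frequency are assumed example values, not the paper's) computes the tuning word W for a target frequency and the frequency actually realized after W is rounded to an integer:

```python
def tuning_word(f_target_hz: float, f_clk_hz: float, l_bits: int) -> int:
    """Integer tuning word W such that f_out = W / 2**L * f_clk is near f_target."""
    return round(f_target_hz * 2**l_bits / f_clk_hz)

def f_out_hz(w: int, f_clk_hz: float, l_bits: int) -> float:
    """Equation (1): realized DDS output frequency."""
    return w * f_clk_hz / 2**l_bits

# Assumed example: 50 MHz clock, 32-bit accumulator, 1982 Hz target.
w = tuning_word(1982.0, 50e6, 32)
f_real = f_out_hz(w, 50e6, 32)   # within half a frequency step of the target
```

With these assumed numbers the residual error is a few mHz, i.e., at most half of the 50e6 / 2**32 step size.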
To reduce the memory size in LUT-based frequency synthesizers, various angular decomposition methods have been proposed. They typically divide the ROM into several small units, each of which processes part of the truncated phase accumulator output; the data retrieved from each lower-rank ROM are added to produce the sine-curve approximation. In the structure proposed in [14], the sine function is divided into linear segments, each segment is described by a linear equation, and the value of this equation is computed by additional hardware.
Wenjun Chen et al. [15] studied how to improve DDS performance with the CORDIC (coordinate rotation digital computer) algorithm. Using a Xilinx FPGA, they reduced the output delay by iteratively merging the computation into a small amount of ROM, realizing a sinusoidal wave with an SFDR of 86.76 dB at a high frequency of 350 MHz. Yixiong Yang et al. proposed the LUT-ROT (rotation) architecture for traditional DDS in [16]; to optimize the speed and area of a 2 GHz DDS, a performance of 11.7 mW/GHz was achieved in an area of 0.016 mm² with a pipelined LUT.
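For background, the rotation-mode CORDIC iteration underlying such designs can be sketched as follows. This is the generic textbook algorithm, not the specific variant of [15]; in hardware, the multiplications by 2^-i become pure shifts, which is the point of the method:

```python
import math

def cordic_sin_cos(angle_rad: float, n_iter: int = 32) -> tuple[float, float]:
    """Rotation-mode CORDIC: returns (sin, cos) for |angle| <= pi/2."""
    # Precomputed rotation angles atan(2^-i) and the cumulative gain K.
    atans = [math.atan(2**-i) for i in range(n_iter)]
    k = 1.0
    for i in range(n_iter):
        k *= 1 / math.sqrt(1 + 2**(-2 * i))
    x, y, z = k, 0.0, angle_rad   # start from (K, 0) so the gain cancels out
    for i in range(n_iter):
        d = 1.0 if z >= 0 else -1.0          # rotate toward the residual angle
        x, y = x - d * y * 2**-i, y + d * x * 2**-i
        z -= d * atans[i]
    return y, x                               # (sin, cos)

s, c = cordic_sin_cos(math.pi / 6)   # s near 0.5, c near sqrt(3)/2
```

Each iteration adds roughly one bit of accuracy, using only shifts, adds, and a small table of arctangents instead of a full sine ROM.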
binary weighting control, and the simulation results show that the ADFLL can operate in the frequency range between 50 MHz and 500 MHz.
The frequency resolution becomes finer as the reference clock frequency decreases and as the number of phase accumulator (PA) bits increases, as shown in the following equation. The frequency resolution of a DDS is defined as the reference clock frequency divided by 2^L, where L is the number of accumulator bits:

ΔF = F_clk / 2^L (3)
Therefore, a large value of L is needed for fine tuning and precise frequency generation, while the size of the ROM must be properly limited.
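Equation (3) and this trade-off can be illustrated numerically (the 50 MHz reference clock here is an assumed example value, not the paper's):

```python
def dds_resolution_hz(f_clk_hz: float, l_bits: int) -> float:
    """DDS frequency resolution per Equation (3): F_clk / 2**L."""
    return f_clk_hz / 2**l_bits

# Assumed 50 MHz reference clock: resolution halves with each extra bit.
for l in (16, 24, 32):
    print(l, dds_resolution_hz(50e6, l))
# A 32-bit accumulator reaches roughly 0.012 Hz steps at 50 MHz, while
# truncating its output to, e.g., 8 LUT address bits keeps the ROM at
# 256 entries regardless of L.
```

This is why a large L does not force a large ROM: only the truncated top bits address the table, at the cost of the truncation spurs discussed earlier.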
The frequency generated in this way can be applied to various fields; in this study, it is the frequency used in a railway AF track circuit. The AF track circuit transmitter modulates the frequency with FSK (frequency-shift keying) and transmits it, and the receiver demodulates it to detect and analyze the transmitted frequency, thereby determining whether a train is present in the corresponding track circuit section. That is, when the frequency is detected, it is judged that there is no train in the section; when it is not detected, it is judged that a train is present. FSK is a modulation method in which the data select different frequencies for 0 and 1 while the amplitude remains constant.
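A constant-amplitude binary FSK signal of this kind can be sketched as follows; the carrier, shift, and sample rates are assumed example values, not the track circuit's actual parameters:

```python
import math

def fsk_samples(bits: list[int], f_c: float, delta_f: float,
                f_s: float, samples_per_bit: int) -> list[float]:
    """Constant-amplitude binary FSK: bit 0 -> f_c - delta_f, bit 1 -> f_c + delta_f."""
    out, phase = [], 0.0
    for b in bits:
        f = f_c + delta_f if b else f_c - delta_f
        for _ in range(samples_per_bit):
            phase += 2 * math.pi * f / f_s   # continuous-phase frequency switch
            out.append(math.cos(phase))
    return out

# Assumed example: 1700 Hz carrier, +/-17 Hz shift, 48 kHz sampling.
sig = fsk_samples([1, 0, 1, 1], f_c=1700.0, delta_f=17.0, f_s=48e3,
                  samples_per_bit=400)
```

The receiver's task, as described above, is simply to detect which of the two tones is present and hence whether the section is occupied.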
If the frequency shift is A, the FSK-modulated signal is given by the following equation.
[Figure: Detailed schematic of the output stage, showing the M2S010 FPGA (TQG144 package) with USB interface, the frequency-select switch inputs (FS1/FS2/FS4/FS8) with pull-up resistor networks, the TC7541ABS D/A converter, and MC6482 operational amplifiers driving the frequency outputs OUT1/OUT2.]
B. Component configuration
The components of the internal logic consist of PCSFR, Value_filter, ADC_A and clock
buffer, and the components except for the clock buffer are configured as follows:
In the entire compile block of Figure 11, terminals for input and output are connected
to the PCSFRGEN block. The inside of the PCSFRGEN block is configured as shown in
Figure 12 below.
Figure 13 shows the value-filter logic block, created to suppress chattering and protect the switch inputs of the input unit. Figures 14 and 15 show the FRGEN (frequency generator) logic block and the ADC_A_OUT logic block, which contains an 8-bit look-up table for the sine wave output. Each of the configured compile blocks is shown below. Figure 15 is further detailed in Figures 16 and 17.
The FRGEN block may be regarded as the set of initial logic for frequency generation (Figure 14). The logic configuration and the compiled circuit capture of each of its parts are as follows:
The upper 8 bits of the output are fed to the DAC circuit (AD7541) and converted into an analog sine wave; the converted signal is output as an audio-frequency signal, producing a complete sinusoidal frequency output.
The FPGA chip of the proposed system is the SmartFusion2 SoC M2S010 [20] from Microchip. This highly integrated system-on-chip IC provides up to 12,084 usable logic elements. The M2S010 is designed for low power consumption and provides excellent reliability and security for multipurpose applications such as video/image processing, I/O expansion and conversion, and Gigabit Ethernet. An ARM-series MPU is also built in, but it is not used in the proposed system, for reasons of reliability. Figure 25 shows the internal block diagram of the M2S010.
were measured and recorded by selecting the frequency with the octal frequency change
switch, and the output frequency was described in detail in the experimental results.
4. Experimental Results
After repeated trial and error in finding the optimal method of generating the target frequency with the FPGA logic blocks, the following equation was verified:

Y = (1 / CLOCK) × 2^13 × 2^26 × Frequency (7)
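Equation (7) can be read as computing a control value Y from the system clock (in Hz) and the target frequency. The sketch below assumes a hypothetical 50 MHz clock, since the clock value is not restated here, and uses the 1682 Hz track frequency from Table 2:

```python
def control_value_y(frequency_hz: float, clock_hz: float) -> int:
    """Equation (7): Y = (1 / CLOCK) * 2**13 * 2**26 * Frequency, rounded."""
    return round(2**13 * 2**26 * frequency_hz / clock_hz)

# Hypothetical 50 MHz clock; 1682 Hz is track frequency A from Table 2.
y = control_value_y(1682.0, 50e6)
# Inverting the relation recovers the realized frequency (2**13 * 2**26 = 2**39):
f_realized = y * 50e6 / 2**39   # within one count of 1682 Hz
```

The large 2^39 scaling factor is what yields the mHz-level frequency deviations reported in Tables 2 and 3.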
By implementing with an FPGA a practical structure and algorithm that precisely generates the AF-band DDFS frequencies used in railway track circuits, we show that the 16 frequencies currently used in Europe and Korea are produced with a precision of 99.9980%~99.9996%, as demonstrated by the simulation results.
The simulation confirmed a stable and accurate frequency output with a deviation better than the target error range. This is far better than both the 5 Hz deviation over the 0–160 kHz range in a previous FPGA study [8] and the 1~2 Hz deviation of the Bombardier product specification [19]. The results can be seen in Tables 2 and 3.
Table 2. Simulation result.
As shown in Figure 27, the board was connected and operated.
The output results observed while turning the TWS (thumb-wheel switch) for frequency selection on the right side of the test board are shown in Figures 28 and 29:
(1) Track frequency A test results and waveforms
The signal waveform observed at the AD7541 D/A converter output gives 100 − (100 × (1682.0007 − 1682)/1682) = 99.9999% relative to the designed lower frequency of 1682 Hz; the accuracy of the upper frequency is shown in Table 2.
The simulation results and waveforms for track circuit frequencies B to H can be found in Appendix A.
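The accuracy figures quoted here and in Appendix A all follow the same computation; a small helper reproduces it (the measured values below are those quoted in the text):

```python
def frequency_accuracy_pct(measured_hz: float, target_hz: float) -> float:
    """Accuracy in percent: 100 - 100 * |measured - target| / target."""
    return 100.0 - 100.0 * abs(measured_hz - target_hz) / target_hz

# Measured values quoted in the paper for track frequencies A and B.
a = frequency_accuracy_pct(1682.0007, 1682)   # about 99.99996%
b = frequency_accuracy_pct(2279.0043, 2279)   # about 99.9998%
```

Note that the percentage compresses the error: a 0.7 mHz deviation at 1682 Hz already corresponds to better than 99.9999% accuracy.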
5. Conclusions
In this paper, we proposed a method to implement the AF frequencies for railway track circuits in a DDS using a Microchip FPGA. The frequency generator is composed of pure logic circuits without a general-purpose CPU, minimizing the causes of malfunction and suggesting a way to increase safety in key industries. By proposing a practical structure and algorithm that precisely generates DDFS output in the AF band, the 16 frequencies currently used in railway track circuits were implemented with a precision of 99.9980–99.9996%, as shown in the simulation results. This performance is superior to the 5 Hz deviation of the previous study [8].
The system generated a very stable and accurate frequency output, and we judge that it will enable highly reliable precision frequency generators in key industries such as railways. These results are expected to enhance the safety and user convenience of control systems in such industries. In the future, extending this study to a multi-frequency AF DDFS that generates several frequencies simultaneously is expected to be practically useful in various industries.
Author Contributions: Conceptualization, C.P. and E.H.; methodology, E.H.; software, E.H.; vali-
dation, C.P. and E.H.; formal analysis, C.P.; investigation, C.P.; resources, C.P.; data curation, C.P.;
writing—original draft preparation, C.P.; writing—review and editing, I.K.; visualization, D.S.; super-
vision, D.S.; project administration, D.S. All authors have read and agreed to the published version of
the manuscript.
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant
funded by the Korea government (MSIT) (No. 2022R1F1A1074773).
Data Availability Statement: Data are contained within the article.
Conflicts of Interest: The authors declare no conflict of interest. The companies had no role in
the design of the study; in the collection, analyses, or interpretation of data; in the writing of the
manuscript; or in the decision to publish the results.
Appendix A
In this section, measurement results from Group B to Group H among the simulation
results in Table 2 are described.
(1) Track frequency B test results and waveforms
The signal waveform observed at the AD7541 D/A conversion circuit output gives 100 − (100 × (2279.0043 − 2279)/2279) = 99.9998% relative to the designed lower frequency of 2279 Hz when the TWS (thumb-wheel switch) is set to 1; the accuracy of the upper frequency is shown in Table 2.
(2) Track frequency C test results and waveforms
The signal waveform observed at the AD7541 D/A conversion circuit output gives 100 − {100 × (1979.0036 − 1979)/1979} = 99.9998% relative to the designed lower frequency of 1979 Hz at the corresponding TWS position; the accuracy of the upper frequency is shown in Table 2.
(3) Track frequency D test results and waveforms
The signal waveform observed at the AD7541 D/A conversion circuit output gives 100 − {100 × (2576.0030 − 2576)/2576} = 99.9998% relative to the designed lower frequency of 2576 Hz at the corresponding TWS position; the accuracy of the upper frequency is shown in Table 2.
(4) Track frequency E test results and waveforms
The signal waveform observed at the AD7541 D/A conversion circuit output gives 100 − {100 × (1532.0026 − 1532)/1532} = 99.9998% relative to the designed lower frequency of 1532 Hz at the corresponding TWS position; the accuracy of the upper frequency is shown in Table 2.
(5) Track frequency F test results and waveform
The signal waveform observed at the AD7541 D/A conversion circuit output gives 100 − {100 × (2129.0021 − 2129)/2129} = 99.9999% relative to the designed lower frequency of 2129 Hz at the corresponding TWS position; the accuracy of the upper frequency is shown in Table 2.
(6) Track frequency G test results and waveforms
The signal waveform observed at the AD7541 D/A conversion circuit output gives 100 − {100 × (1831.0037 − 1831)/1831} = 99.9998% relative to the designed lower frequency of 1831 Hz at the corresponding TWS position; the accuracy of the upper frequency is shown in Table 2.
(7) Track frequency H test results and waveforms
The signal waveform observed at the AD7541 D/A conversion circuit output gives 100 − {100 × (2428.0057 − 2428)/2428} = 99.9997% relative to the designed lower frequency of 2428 Hz at the corresponding TWS position; the accuracy of the upper frequency is shown in Table 2.
References
1. Yoon, K.; Song, M.; Noh, J.; Lee, K. Design of Data Converters and PLL; Hongneung Science Publishing House: Daejeon, Republic of
Korea, 2013; pp. 299–324. [CrossRef]
2. Tierney, J.; Rader, C.M.; Gold, B. A Digital Frequency Synthesizer. IEEE Trans. Audio Electroacoust. 1971, AU-19, 49–50. Available online: https://ieeexplore.ieee.org/abstract/document/1162151 (accessed on 27 November 2023).
3. Ryu, H.G.; Lee, H.S. Analysis and Minimization of Phase Noise of The Digital Hybrid PLL Frequency Synthesizer. IEEE Trans.
Consum. Electron. 2002, 48, 305–306. Available online: https://ieeexplore.ieee.org/abstract/document/1010136 (accessed on 27
November 2023).
4. Kim, D.C.; Chi, Y.E.; Park, J. High-Resolution Digital Beamforming Receiver Using DDS–PLL Signal Generator for 5G Mobile
Communication. IEEE Trans. Antennas Propag. 2022, 70, 1429–1430. Available online: https://ieeexplore.ieee.org/abstract/
document/9539071 (accessed on 27 November 2023). [CrossRef]
5. Queiroz, E.D.; Ota, J.I.Y.; Pomilio, J.A. State-Space Representation Model of Phase-Lock Loop Systems for Stability Analysis of
Grid-connected Converters. In Proceedings of the 14th IEEE International Conference on Industry Applications 2021, São Paulo,
Brazil, 15–18 August 2021; pp. 388–389. Available online: https://ieeexplore.ieee.org/abstract/document/9529609 (accessed on
27 November 2023).
6. Akurwy, S.H. A Novel ROM Design for High Speed Direct Digital Frequency Synthesizer; Lap Lambert Academic Publishing:
Saarbrücken, Germany, 2014; pp. 6–15.
7. Gao, S.; Barnes, M. Phase-locked loops for grid-tied inverters: Comparison and testing. In Proceedings of the 8th IET International
Conference on Power Electronics, Machines and Drives (PEMD 2016), Glasgow, UK, 19–21 April 2016. Available online:
https://digital-library.theiet.org/content/conferences/10.1049/cp.2016.0304 (accessed on 27 November 2023).
8. Shan, C.; Chen, Z.; Yuab, H.; Hu, W. Design and Implementation of a FPGA-based Direct Digital Synthesizer. In Proceedings of
the 2011 International Conference on Electrical and Control Engineering, Yichang, China, 16–18 September 2011; pp. 614–615.
Available online: https://ieeexplore.ieee.org/abstract/document/6057152 (accessed on 27 November 2023).
9. Rokita, A. Direct Analog Synthesis Modules for an X-Band Frequency Source; Telecommunications Research Institute: Daejeon,
Republic of Korea, 1997; pp. 63–64. Available online: https://ieeexplore.ieee.org/abstract/document/737920 (accessed on 27
November 2023).
10. Wang, Y.; Bao, X.; Hua, W. Implementation of Embedded Magnetic Encoder for Rotor Position Detection Based on Arbitrary
Phase Shift Phase Lock Loop. IEEE Trans. Ind. Electron. 2002, 69, 2035–2037. Available online: https://ieeexplore.ieee.org/
abstract/document/9369043 (accessed on 27 November 2023). [CrossRef]
11. Kim, S.; Kim, J.; Oh, H.; Cheon, J.; Park, G.; Go, S.; Lee, K. Design of Low Power Frequency Synthesizer for GPS Receiver. Korea
Inst. Intell. Transp. Syst. 2008, 11a, 165–168.
12. Alsharef, A.A.; Ali, M.A.M.; Sanusi, H. Direct Digital Frequency Synthesizer Design and Implementation on FPGA. Res. J. Appl. Sci. 2012, 7, 387–390. [CrossRef]
13. Bergeron, M.; Willson, A.N. A 1-GHz Direct Digital Frequency Synthesizer in an FPGA. In Proceedings of the 2014 IEEE
International Symposium on Circuits and Systems (ISCAS), Melbourne, VIC, Australia, 1–5 June 2014; pp. 329–332. Available
online: https://ieeexplore.ieee.org/abstract/document/6865132 (accessed on 27 November 2023).
14. Saber, M.S.; Elmasry, M.; Abo-Elsoud, M.E. Quadrature Direct Digital Frequency Synthesizer Using FPGA. In Proceedings of the
2006 International Conference on Computer Engineering and Systems, Cairo, Egypt, 5–7 November 2006; pp. 14–15. Available
online: https://ieeexplore.ieee.org/abstract/document/4115478 (accessed on 27 November 2023).
15. Chen, W.; Wu, T.; Tang, W.; Jin, K.; Huang, G. Implementation Method of CORDIC Algorithm to Improve DDFS Performance.
In Proceedings of the IEEE 3rd International Conference on Electronics Technology 2020, Chengdu, China, 8–12 May 2020;
pp. 58–61. Available online: https://ieeexplore.ieee.org/abstract/document/9119621 (accessed on 27 November 2023).
16. Yang, Y.; Wang, Z.; Yang, P.; Chang, M.F.; Ho, M.S.; Yang, H.; Liu, Y. A 2-GHz Direct Digital Frequency Synthesizer Based on LUT
and Rotation. In Proceedings of the 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 27–30
May 2018; pp. 1–3. Available online: https://ieeexplore.ieee.org/abstract/document/8351207 (accessed on 27 November 2023).
17. Kim, D.; Lee, H.; Kim, J.; Kim, S. Design and Modeling of a DDS Driven Offset PLL with DAC. Korea Internet Broadcast. Commun.
Soc. 2012, 12, 1–9. [CrossRef]
18. Gothandaraman, A.; Islam, K.S. An All-Digital Frequency Locked Loop (ADFLL) with a Pulse Output Direct Digital Frequency Synthesizer (DDFS) and an Adaptive Phase Estimator. In Proceedings of the IEEE Radio Frequency Integrated Circuits Symposium 2003, Philadelphia, PA, USA, 9–10 June 2003; pp. 303–305. Available online: https://ieeexplore.ieee.org/abstract/document/1213949 (accessed on 27 November 2023).
19. Bombardier. EBI Track 200 TI21 Audio Frequency Track Circuit Technical Manual; Bombardier: Montreal, QC, Canada, 2019. Available
online: https://docplayer.net/28867426-Ebi-track-200-ti21-audio-frequency-track-circuit.html (accessed on 27 November 2023).
20. Microchip. FPGA and SoC Product Families; Microchip Technology Inc.: Chandler, AZ, USA, 2019; pp. 3–5. Available online:
http://ww1.microchip.com/downloads/en/DeviceDoc/00002871B.pdf (accessed on 27 November 2023).
21. Transport RailCorp. TI21 Track Circuit Test and Investigation Guideline; Transport RailCorp: Sydney, NSW, Australia, 2016;
pp. 13–14. Available online: https://www.transport.nsw.gov.au/industry/asset-standards-authority/find-a-standard/ti21-
track-circuit-test-and-investigation (accessed on 27 November 2023).
electronics
Article
A Neighborhood-Similarity-Based Imputation Algorithm for
Healthcare Data Sets: A Comparative Study
Colin Wilcox 1 , Vasileios Giagos 2 and Soufiene Djahel 3, *
1 Department of Computing and Mathematics, Manchester Metropolitan University, Manchester M15 6BH, UK;
[email protected]
2 Department of Mathematical Sciences, University of Essex, Colchester CO4 3SQ, UK; [email protected]
3 Centre for Future Transport and Cities, Coventry University, Priory Street, Coventry CV1 5FB, UK
* Correspondence: [email protected]
Abstract: The increasing computerisation of medical services has highlighted inconsistencies in the
way in which patients’ historic medical data were recorded. Differences in process and practice
between medical services and facilities have led to many incomplete and inaccurate medical histories
being recorded. To create a single point of truth going forward, it is necessary to correct these
inconsistencies. A common way to do this has been to use imputation techniques to predict missing
data values based on the known values in the data set. In this paper, we propose a neighborhood
similarity measure-based imputation technique and analyze its achieved prediction accuracy in
comparison with a number of traditional imputation methods using both an incomplete anonymized
diabetes medical data set and a number of simulated data sets as the sources of our data. The aim
is to determine whether any improvement could be made in the accuracy of predicting a diabetes
diagnosis using the known outcomes of the diabetes patients’ data set. The obtained results have
proven the effectiveness of our proposed approach compared to other state-of-the-art single-pass
imputation techniques.
replacement to more complex statistical approaches. They can be broadly split into several
types of approaches [6]:
• Normal imputation: When the data are numerical, simple techniques, such as the mean value of a feature, can be used to fill in the missing data. For categorical data (i.e., features with a defined and limited range of possible values), the most frequently occurring (modal) value for the feature can be used.
• Class-based imputation: Instead of replacing missing data with a value calculated over all existing feature values, the replacement is based on some internal classification: the replacement value is determined from the values of a restricted subclass of known feature values.
• Model-based imputation: A hybrid approach in which the missing feature is treated as the prediction target (the class), and all the remaining features are used to train a model for predicting its value.
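To make the first category concrete, a minimal sketch of mean and modal imputation in Go (the language of the source code accompanying this paper) might look as follows; the function names and the use of a sentinel value to mark missing entries are our own illustrative choices, not part of the published implementation:

```go
package main

import "fmt"

// meanImpute replaces missing numeric values (marked by a sentinel)
// with the mean of the observed values for that feature.
func meanImpute(values []float64, missing float64) []float64 {
	sum, n := 0.0, 0
	for _, v := range values {
		if v != missing {
			sum += v
			n++
		}
	}
	mean := sum / float64(n)
	out := make([]float64, len(values))
	for i, v := range values {
		if v == missing {
			out[i] = mean
		} else {
			out[i] = v
		}
	}
	return out
}

// modeImpute replaces missing categorical values with the most
// frequently occurring observed category.
func modeImpute(values []string, missing string) []string {
	counts := map[string]int{}
	for _, v := range values {
		if v != missing {
			counts[v]++
		}
	}
	mode, best := "", -1
	for v, c := range counts {
		if c > best {
			mode, best = v, c
		}
	}
	out := make([]string, len(values))
	for i, v := range values {
		if v == missing {
			out[i] = mode
		} else {
			out[i] = v
		}
	}
	return out
}

func main() {
	// Observed values 2 and 4 give a mean of 3 for the missing entry.
	fmt.Println(meanImpute([]float64{2, 0, 4}, 0))
	fmt.Println(modeImpute([]string{"A", "?", "A", "B"}, "?"))
}
```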
The problem we aim to address concerns the rapidly growing amount of incomplete personal medical data. The rapid increase in the volume and complexity of these data has highlighted the problems caused by our current reliance on incomplete or inaccurate information: such unqualified use may lead to a loss or misinterpretation of critical medical information [6]. This problem is not limited to the medical domain and applies equally to any domain that uses incomplete personal information in a technology-driven environment. The focus of this paper is on a medical context, but the solution should be readily generalizable to other problem domains. The increasing amount and variety of stored data about individuals in the smart healthcare era only emphasizes the urgency of finding solutions to this problem [7]. Our approach selects imputed data values in a more localized manner, applying a more intelligent selection of candidate values than the more simplistic, and widely used, imputation methods.
In this paper, we propose a neighborhood-based imputation algorithm that uses the
idea of feature value similarity in similar data records to predict missing feature values
in incomplete records. This subset of candidate records is specific to a single incomplete
record and so is recreated for each incomplete record found in a data set. This differs from
other imputation techniques, which may consider all records in the data set and give a
more general and less localized result, or other approaches, which determine neighborhood
values based on other criteria such as using weighted average or variance estimation
techniques [7].
Our algorithm aims to improve on some of the limitations of existing imputation
algorithms, especially kNNs, by providing a fast, yet accurate imputation process suitable
for use on, initially, medical data, but also on more generic incomplete data sets from
other similar problem domains. The main contributions of this work can be summarized
as follows:
• Reducing the speed degradation of the algorithm as the size of the data set increases.
• The way imputed values are selected is more localized rather than potentially using
all similar values in the data set.
• Reducing the negative impact of outlying values by making imputed values selection
more localized.
• Providing a solution that can be extended for use with textual and categorical data, as
well as numeric data.
The remainder of this paper is organized as follows. In Section 2, we present the
background to understanding the problem being studied in this paper. Section 3 presents
our proposed algorithm to improve prediction accuracy, and Section 4 evaluates the perfor-
mance of our proposed technique in comparison with other imputation methods. Section 5
discusses our conclusions and findings during this work, and, finally, Section 6 indicates
some directions for future work.
Electronics 2023, 12, 4809
• The kNNs does not work well with imbalanced data. Given two potential classification choices, the algorithm will naturally be biased towards a result taken from the larger data subset, leading to potentially more misclassifications.
• The kNNs is sensitive to outlying values, as the choice of closest neighbors is based on an absolute measure of distance.
Our algorithm aims to improve on these drawbacks, especially in the areas of outlier
sensitivity, thereby reducing the likelihood of misclassification and the choice of imputed
feature values. Since the kNNs uses the mean of the k-nearest feature values, this could lead
to a value being calculated that does not appear in any of the actual complete records; our
algorithm removes this scenario by only choosing imputed feature values from a pool of
candidate values taken from the actual feature values of the most similar complete records.
The class of nearest neighbour predictive algorithms can make accurate predictions without requiring a human-readable model [16]. The quality of these predictions depends on the measure of the distance between the data values [17]. This class of algorithms has several advantages, including robustness to noisy data and the ability to be tuned quite easily. However, the kNNs has some drawbacks, such as the need to consider all the feature values when imputing any missing value. This was a motivation and opportunity to use a more localized approach for determining missing data values [16].
single-pass imputation algorithms, which either replace the missing feature value with the
mean (MAV) and modal (MDAV) values of the known feature values or just remove all
incomplete records from the processed data set.
Using single values carries with it a level of uncertainty about which values to impute.
Multiple imputation reduces this uncertainty by calculating several different possible
values (“imputations”). Several versions of the incomplete data sets are created, which
are then combined to make the “best” value selections. Such an approach has several
advantages such as reducing bias and minimizing the likelihood of errors being introduced
to the rebuilt data sets, thus improving the validity of the data and increasing the precision
or closeness between two or more imputed values, which makes the data set more resistant
to outlying values [21,22].
The second stage is to use common statistical methods to fit the model of interest to each of the imputed data sets. The estimated associations in each imputed data set will differ because of the variation introduced in the imputation of the missing values; they are only useful when averaged together to give overall estimated associations. Valid inferences are obtained because we are averaging over the distribution of the missing data given the observed data [23,24].
Other data-focused approaches using machine learning and deep data analysis tech-
niques are being used as a means of predicting medical events from incomplete medical
data sets. The use of such automated tools in the identification and prediction of medi-
cal conditions is becoming increasingly important due to the shortage of skilled medical
professionals, as well as their ability to increase the prediction accuracy, thus reducing the
burden on medical staff [25,26].
3. Proposed Algorithm
In this section, we outline our approach to improving the effectiveness of predicting binary outcomes based on a series of numerical feature values. We used a suitably anonymized diabetes diagnosis data set, which identifies whether a patient with diabetes has been correctly diagnosed as positive (true positive) or whether a patient without diabetes has been correctly diagnosed as negative (true negative).
∀r ∈ D, r = (f_0, f_1, f_2, . . . , f_{i−1}, f_{i+1}, . . .)    (1)
• Use the k-fold (with k = 10) [27,28] technique to partition D into non-intersecting subsets. In turn, each subset (fold) is considered to be the test fold, and the remaining folds are used as training folds. For each record in the test fold, we apply a comparison function F(), in our case the cosine similarity, to obtain a numerical measure of how similar the test record is to the current record in the training folds. An ordered similarity table, S, is maintained, storing each training record and how similar it is to the current test record. This is repeated until the test record has been compared against all the records in all the training folds. After each change to the contents of S, it is sorted so that the most similar training record appears as the first item in the list. This could be more complicated depending on the comparison function used, but in our case, the sort order is merely used to maintain the n-closest items (defining the neighborhood) in S in decreasing
cosine similarity order. The contents of S must be cleared once all the training set
records have been compared and are ready for subsequent cycles.
Folds containing a large number of records can increase the time needed to compare all
the combinations of these records against a given test record. This could result in a relatively
large similarity table. To address this issue of similarity table size, our proposed algorithm
introduces the concept of a neighborhood containing the most similar n records in the training
set. The size of this neighborhood limits the maximum size of the similarity table and is
used as a means of calculating the new replacement value for a missing attribute.
Considering St to be the set of test records and Str to be the set of training records for a given cycle, such that t ∈ St and tr ∈ Str, the similarity table is maintained as follows. If there are fewer than n records in the similarity table, then the current training record, tr, is added at the next freely available position p. If the similarity table already contains n records and the current training record, tr, is more similar to the test record t than the last record in the similarity table (at position n − 1 for zero-based arrays), then we replace the last entry in the similarity table with tr. This can be shown with the pseudocode below.
clear SimilarityTable, S
FOR EACH t IN testFold DO
    p <- 0
    FOR EACH tr IN trainingFolds DO
        size <- count(S)
        IF size < n THEN
            S[p] <- F(t, tr)
            p <- p + 1
        ELSE IF F(t, tr) > S[n - 1] THEN
            S[n - 1] <- F(t, tr)
        END IF
    END FOR
END FOR
Each time the contents of the similarity table are changed, they should be immediately
sorted based on decreasing similarity value to maintain a list of the most similar training
records for the current test record. In order to build a complete data set D, we need to
calculate each of the missing data values across all the records in D. This is achieved by
comparing each row that contains missing values against all the complete rows that exist
in D. By doing this, we build up a similarity table containing the most similar complete
records from which the candidate values for the missing data values may be selected. Once
all the complete records in the data set have been compared against the current incomplete
record, we are in a position to impute the missing values for the current record in order
to make it complete. This record can then be used as a candidate record for matching the
other incomplete records in later cycles of the process. The end result will be a completely
imputed data set, which can then be used for comparison purposes with the different
imputation techniques.
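The neighborhood construction described above can be sketched as follows, assuming cosine similarity over numeric feature vectors; the names cosine and nearestN are illustrative and not taken from the published implementation:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// entry pairs a complete record with its similarity to the current test record.
type entry struct {
	record []float64
	sim    float64
}

// cosine returns the cosine similarity between two feature vectors of equal length.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// nearestN compares one record against all complete records and keeps only the
// n most similar, sorted in decreasing similarity order (most similar first).
func nearestN(target []float64, complete [][]float64, n int) []entry {
	table := make([]entry, 0, n+1)
	for _, r := range complete {
		table = append(table, entry{r, cosine(target, r)})
		sort.Slice(table, func(i, j int) bool { return table[i].sim > table[j].sim })
		if len(table) > n {
			table = table[:n] // drop the least similar entry
		}
	}
	return table
}

func main() {
	complete := [][]float64{{1, 2, 3}, {2, 4, 6}, {9, 1, 0}}
	for _, e := range nearestN([]float64{1, 2, 3}, complete, 2) {
		fmt.Printf("%v sim=%.3f\n", e.record, e.sim)
	}
}
```

Capping the table at n entries after each comparison is what bounds the similarity table size, as described above.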
training data set and the test data set. The splitting of the source data ensures that the
number of records in the test data set is a fixed proportion of the total number of records
according to the supplied parameters.
Each record in the test data subset is compared in turn with each record in the training
data subset. A comparison of each pair of records is made using the concept of cosine
similarity to obtain a measure of how similar the corresponding pairs of field attributes are
with each other, thus yielding a numeric measure of their similarity. During this process, a
similarity table is built giving a similarity measure of each training record in the training
set against a single test record. This table is maintained such that the record with the highest similarity value (i.e., the most similar) is the first record in the table. The rationale is that the training set records considered a close match to the test record, and in particular the best-matching training record, will have very similar values for their input arguments; as such, they are the best candidates for determining whether the outcome given by the closest-matching record is in fact valid.
Finally, a replacement value for the missing attribute, f i , is determined by applying a
prioritized set of rules to choose the most appropriate value from the candidate value set
C. This approach may be extended to include ‘categorical variables’, which describe features that take a value from a limited set of possible values. Since the feature value set C, used as the pool of possible replacement values, is constructed from known feature values of the most similar records, the selection rules are equally applicable and will select a suitable replacement value from C.
Considering the process diagram shown in Figure 1, the similarity modeling process is split into two main subflows; the colors are used only for highlighting. The blue flow describes the processing steps of loading external data and standardizing it into a form that can be used by the second (green) flow, through the application of the k-fold technique to split the source data set into folds. The green flow
indicates the application of the N-Similarity algorithm. The key points of the algorithm
flow are to take each fold as a test record in turn and apply cross correlation against each of
the remaining training folds to generate the similarity table of the most similar training
records for each record in the test fold. This is repeated for each training record until all
comparison combinations have been performed. For each incomplete record, the missing
feature value is determined by considering the properties of the closest records in the
similarity table, and a candidate is selected based on a number of rules and criteria. The
results of these comparisons are shown in Table 1.
Table 1. Relative prediction accuracy of our N-Similarity algorithm compared to the average predic-
tion accuracy across all selected single imputation techniques for different neighbourhood sizes N.
The colour coding scheme used in Table 1 reflects how, for different neighbourhood
sizes, the prediction accuracy of our N-Similarity algorithm compares to the average
prediction accuracy of the other imputation algorithms under consideration. The green
values indicate those measures where our algorithm performs better than the average of
the other imputation algorithms, red values indicate those measures where our algorithm
performs worse, and the blue values indicate those measures where there is marginal
difference between the algorithms.
θ̂_m = αȲ + (1 − α)Y*_m,    α = s²_y / (s²_y + (τ̂²_{Y|X})₊),    (3)

where s²_y is the sample variance of y = (y_1, . . . , y_l) for the l most similar observations (comparing X_m to X_obs), and (τ̂²_{Y|X})₊ is an approximation of the Empirical Bayes estimate of [30]:

(τ̂²_{Y|X})₊ = max(0, λ × s²_Y − s²_y).    (4)
Since 0 ≤ α ≤ 1, (3) is a weighted average between Y*_m and Ȳ that shrinks the proposal towards the mean Ȳ; the amount of shrinkage is determined by α. When α = 0, (3) gives a direct imputation with Y*_m, whereas α = 1 gives an imputation using Ȳ. Generally, our candidate imputed value shrinks towards Ȳ when the variance associated with Y*_m exceeds the sample variance of Y.
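A direct transcription of Equation (3) shows how the shrinkage behaves at the two extremes of α; the function name shrink and the chosen inputs are illustrative assumptions, not part of the published implementation:

```go
package main

import "fmt"

// shrink applies the empirical Bayes correction of Equation (3): the imputed
// value is a weighted average of the candidate yStar and the overall mean
// yBar, with weight alpha = s2y / (s2y + tau2Plus).
func shrink(yStar, yBar, s2y, tau2Plus float64) (theta, alpha float64) {
	alpha = s2y / (s2y + tau2Plus)
	theta = alpha*yBar + (1-alpha)*yStar
	return
}

func main() {
	// With tau2Plus = 0, alpha = 1 and the proposal collapses to the mean;
	// a large tau2Plus (alpha near 0) keeps the similarity-based candidate.
	theta, alpha := shrink(5, 3, 1, 1)
	fmt.Println(theta, alpha) // 4 0.5
}
```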
Motivation
We motivate (3) by considering an empirical Bayes approach to our hierarchical model. We introduce two types of random variables: one expressing the missing values Y_m and one, θ_{m|X}, expressing the neighborhood-similarity-based guesses (which can also be thought of as model-based guesses) that rely on a relation between Y and X. For each missing value Y_m, we assume that it is a normal random variable with mean θ_{m|X} and a variance σ²_{m|X}. This allows us to express the “true” missing value in relation to our similarity-based guesses: for ms with small variances (σ²_{m|X}), the similarity-based guesses are informative, and for large variances, they are not.
For each θ_{m|X}, we again assume a normal distribution with a common mean and variance (μ_{Y|X}, τ²_{Y|X}):

Y_m | X, θ_{m|X}, σ²_{m|X} ∼ N(θ_{m|X}, σ²_{m|X}),    (5)

θ_{m|X} | X, μ_{Y|X}, τ²_{Y|X} ∼ N(μ_{Y|X}, τ²_{Y|X}),    (6)
which expresses the overall relation of Y given X as a normal distribution with its mean and variance varying according to X. In other words, instead of considering the similarity-based guess of the missing value as a single point, we introduce a normally distributed kernel centered around it, which depends on the fully observed X. Our two-level hierarchical model uses (5) locally to express the distribution of Y_m and (6) to express the associated mean θ_{m|X} using a global model between X and Y. Given a candidate value Y*_m, we can impute Y_m with the posterior empirical Bayes mean θ̂_{m|X} [30], which is a point estimate of θ_{m|X}:
θ̂_m = αμ_{Y|X} + (1 − α)Y*_m,

where α = σ²_{m|X} / (σ²_{m|X} + τ²_{Y|X}). Linear and nonlinear regression models have been used for the conditional mean μ_{Y|X} in a Bayesian setting [31], whereas [32] used a nonparametric kernel regression; in our performance evaluations, we also considered the weighted sample mean and sample variance, e.g., s²_y = Σ_i w_i (y_i − ȳ)², with weights approximated by a Gaussian kernel, with a minimal RMSE improvement. The empirical Bayes estimate of [30] for τ²_{Y|X} is based on sample estimates for σ²_{m|X} and τ²_{Y|X}.
If we consider the case where Y and X are independent, any similarity between X_obs and X_m provides no information about the missing Y_m. This also implies that μ_{Y|X} and σ²_{Y|X} become the marginal μ_Y and σ²_Y, respectively. Furthermore, the y sample becomes a random sample of Y, with ȳ and s²_y being unbiased estimates of μ_Y and σ²_Y, respectively. Therefore, we can use Ȳ, s²_Y, and s²_y as approximations for μ_{Y|X}, τ̂²_{Y|X}, and σ̂²_{Y|X}, respectively, which, under independence, set α towards one and can serve as a warning for noninformative imputation. Finally, if Y and X are not independent, y will be a conditional sample from Y | X_m, and we expect var(Y) ≥ E[var(y)] to lead to smaller shrinkage (α < 1) towards Ȳ.
4. Performance Evaluation
In this section, we evaluate the performance of our similarity-based approach, using
the sample diabetes data set, in comparison with a number of other imputation techniques.
Let Sc denote the set of complete records and Si the set of incomplete records, so that Sc ∪ Si = D and Sc ∩ Si = ∅.
Considering the corresponding feature values of the n-most-similar complete records
in the similarity table created by the stage above, the algorithm creates a set of candidate
values, C, that will be used to replace the current missing feature value. The algorithm uses
a number of simple rules, applied in strict order, to determine which of these candidate
values is the most likely to be used as the replacement value for the missing feature in the
current incomplete record.
∀k ∈ Si, f_{k,i} = Sc(j), 0 ≤ j < n,

where j is the index of the best candidate value in C.
The set of rules applied to C in determining a predicted value are derived from both
an evaluation of the corresponding feature values in the most similar diabetes records
together with the nature of the values in the candidate set C. The rules are applied in order,
with the most specific selection criteria applied first and moving down to the most general
selection criteria applied last. For the candidate value set, C, apply the following rules in
order of decreasing priority:
1. If there is a unique modal value in C, then use this value as the imputed feature value.
2. For those modal values which occur in C with equal highest frequency, if one of these
modal values has the same feature value as the actual feature value of the most similar
complete record in Sc , then select this modal value as the new imputed feature value
for the current incomplete record.
3. Determine whether one of the values in C lies closer to the median value of the
candidate set than the others. If such a value is found, select this as the imputed
feature value.
4. If none of the previous rules have been satisfied, then select the mean value of C.
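Assuming numeric candidate values, the four prioritized rules can be sketched as a single selection function. This is our simplified reading of the rules (ties in Rule 3 fall through to Rule 4), not the published implementation; the name pick is illustrative:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// pick applies the prioritized selection rules to the candidate set c, which
// is ordered from most to least similar record, so c[0] is the feature value
// of the single most similar complete record.
func pick(c []float64) float64 {
	// Rules 1 and 2: frequency analysis of the candidate values.
	counts := map[float64]int{}
	for _, v := range c {
		counts[v]++
	}
	best := 0
	for _, n := range counts {
		if n > best {
			best = n
		}
	}
	var modes []float64
	for v, n := range counts {
		if n == best {
			modes = append(modes, v)
		}
	}
	if best > 1 && len(modes) == 1 {
		return modes[0] // Rule 1: a unique modal value exists
	}
	if best > 1 {
		for _, m := range modes {
			if m == c[0] {
				return m // Rule 2: a tied mode matches the most similar record
			}
		}
	}
	// Rule 3: a unique value lying closest to the median of the candidates.
	s := append([]float64(nil), c...)
	sort.Float64s(s)
	median := s[len(s)/2]
	if len(s)%2 == 0 {
		median = (s[len(s)/2-1] + s[len(s)/2]) / 2
	}
	bestV, bestD, ties := 0.0, math.MaxFloat64, 0
	for _, v := range c {
		d := math.Abs(v - median)
		if d < bestD {
			bestV, bestD, ties = v, d, 1
		} else if d == bestD && v != bestV {
			ties++
		}
	}
	if ties == 1 {
		return bestV
	}
	// Rule 4: fall back to the mean of the candidate set.
	sum := 0.0
	for _, v := range c {
		sum += v
	}
	return sum / float64(len(c))
}

func main() {
	fmt.Println(pick([]float64{5, 3, 5, 7})) // unique mode: 5
	fmt.Println(pick([]float64{1, 2, 3, 4})) // no mode, tied distances to median: mean 2.5
}
```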
By comparing the prediction accuracy of the algorithm on the training data set (training folds) with that on the test data set (test fold), we can determine that the results are not noticeably different, and therefore we can ascertain that the algorithm does not overfit the diabetes data set.
This is repeated for each missing feature in the current partial record Si (k), after which
the now complete record is moved from Si to Sc to become a potential candidate for the
completion of the next incomplete record in Si .
z_1, z_2, z_3 ∼ Normal(0, 1)
x_1 ∼ Poisson(1)
x_2 ∼ Uniform(18, 83)
x_3 ∼ Exponential(1/30)    (7)
x_4 = z_1 × x_3 + 3
x_5 = √x_4 × 3 + z_2 × 10
x_6 = exp(−x_2 × 0.2 + z_3)
Table 2 shows the imputation RMSE of the three methods assuming 1, 50, and 100 missing observations (M) per simulated data set. Overall, the RMSE for NSIM-EB was consistently lower than the rest. For x_1, as M increased, the RMSE increased for all the methods, which is expected, as x_1 is independent of the rest. Generally, the RMSE of NSIM was similar to, if not slightly lower than, the RMSE of the kNNs. Both similarity-based methods were faster (NSIM ran in 126 s and NSIM-EB in 167 s; both were implemented in R) than the kNNs (215 s) using the implementation (with Mahalanobis distance) of the yaImpute package [34].
Table 2. Imputation RMSEs for simulated data using similarity (NSIM), similarity with empirical Bayes correction (NSIM-EB), and k-nearest neighbors (kNNs) methods.

M     Method    x_1    x_2    x_3    x_4    x_5    x_6
1     NSIM      1.382  1.402  1.414  1.378  1.345  1.333
      NSIM-EB   0.996  1.054  1.068  1.052  1.052  1.035
      kNNs      1.399  1.422  1.508  1.453  1.456  1.386
50    NSIM      1.421  1.401  1.398  1.402  1.401  1.380
      NSIM-EB   1.047  1.035  1.041  1.042  1.042  1.027
      kNNs      1.417  1.413  1.407  1.409  1.405  1.399
100   NSIM      1.420  1.396  1.385  1.386  1.385  1.373
      NSIM-EB   1.046  1.031  1.034  1.038  1.038  1.014
      kNNs      1.413  1.418  1.411  1.415  1.410  1.417
standard missing value mechanism (e.g., Section 4.4.3), or we adapted our implementation
(e.g., in Section 4.4.2, the similarity calculations are based only on valid feature values).
Out of the total number of records, 336 were complete with no missing feature values (43.75%), and 763 feature values were missing across the data set out of a total of 6144 feature values (12.42%).
Feature                        Data Type         Value Range (Zero Indicates Missing Value)
Number of Times Pregnant       Positive Integer  0. . . 17
Plasma Glucose Concentration   Real              0. . . 199
Diastolic Blood Pressure       Real              0. . . 122
Triceps Skinfold Thickness     Real              0. . . 99
Serum Insulin Levels           Real              0. . . 846
Body Mass Index                Real              0. . . 67.1
Diabetes Pedigree Function     Real              0.078. . . 2.42
Age                            Positive Integer  21. . . 81
Classification                 Binary            1 = positive diagnosis, 0 = negative diagnosis
Cross validation is a sampling procedure used to evaluate models on a limited data sample. The procedure has a single parameter, k, that refers to the number of equal-sized groups (or folds) into which the data sample will be divided. The procedure is often called k-fold cross validation and is used to estimate the ability of a machine learning model to make predictions on unseen data; it uses a limited sample to estimate how the model is expected to perform in general when used to make predictions on data not used during the training of the model.
The average cross validation over n folds is given by

(1/n) Σ_{k=1}^{n} Similarity_k,
where Similarityk is the measure of similarity between the current test and the training
folds for the session run k.
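The fold construction and the average above can be sketched as follows; the round-robin assignment of records to folds is an illustrative choice of ours, not necessarily the authors':

```go
package main

import "fmt"

// kfold partitions record indices into k non-intersecting folds by
// round-robin assignment.
func kfold(nRecords, k int) [][]int {
	folds := make([][]int, k)
	for i := 0; i < nRecords; i++ {
		folds[i%k] = append(folds[i%k], i)
	}
	return folds
}

// averageSimilarity is the mean of the per-fold similarity scores,
// matching the average cross validation expression above.
func averageSimilarity(scores []float64) float64 {
	sum := 0.0
	for _, s := range scores {
		sum += s
	}
	return sum / float64(len(scores))
}

func main() {
	folds := kfold(10, 3)
	fmt.Println(len(folds[0]), len(folds[1]), len(folds[2])) // 4 3 3
	fmt.Println(averageSimilarity([]float64{1, 2, 3}))       // 2
}
```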
Table 4. Performance of our proposed N-Similarity algorithm compared against other single imputa-
tion techniques.
Table 5. Imputation RMSEs for simulated data using our similarity method (NSIM), our similarity
method with empirical Bayes correction (NSIM-EB), and the k-nearest neighbors (kNNs) method.
Our algorithm performed better when the source data set had a small percentage of
missing data values, due to our blind random selection of data values across all the folds.
The larger the number of missing data values, the higher the likelihood would be that
some of the folds would be more sparsely populated. The choice of the number of data
partitions in the k-fold step needs to be carefully selected; otherwise, we risk the possibility
of introducing bias into the selection of data values put in any given fold. We settled on
k = 10, as much of the academic literature indicated that this was a commonly used value.
One way of limiting the impact of this problem is to use a stratified approach as mentioned
above. We left this direction as a line of potential future work. The choice of the size of the
neighborhood, N, and, as a direct result, the number of candidates in the set of values for
selecting imputed values, was also sensitive. We spent considerable trial-and-error effort
looking for the best selection for this parameter against the PIMA data set; we tried a much
wider range of potential values for N than are shown. The results for these higher values
were negligibly different in our case.
Table 1 shows that N = 4 was the best choice in our case, although this could vary for
different data sets. Further research is required to determine whether the choice of value
for N could be automated by looking at all the possible potential values for N and whether
this approach would even be practical for large data sets in terms of processing time and
improvements in the results.
5. Conclusions
Our neighborhood-based algorithm was able to provide noticeably improved results
when compared against other techniques, but the degree of this improvement was sensitive
to the size of the neighborhood, with some features being more readily improved than
others for smaller neighborhood sizes and other metrics being noticeably less well predicted
as the size of the neighborhood increased. This paper proposes a technique to provide a
more accurate prognosis of possible patient diabetes based on a number of key patient
characteristics. Our approach creates a similarity neighborhood using the most similar
diagnosed patient records and uses the feature set values of these patients to help with
the diagnosis of undiagnosed patients. By comparing our N-Similarity algorithm against
several widely used single-pass imputation techniques using the same collection of data
sets, both real-world and simulated, we found that it produces better results against
several of our performance metrics (Table 4). However, we observed that the size of the
neighborhood had an impact on the performance of our algorithm. We also noticed that
the limited data set sizes and degrees of missingness of the initial source data could impact
the results, and more extensive work would be necessary using a wider range of different
data sets in order to see how these measures are related. The empirical Bayes correction
of the neighborhood-based algorithm offered consistently smaller RMSEs over the simple
algorithm and the k-nearest neighbors imputation, with minimal computational overhead.
In addition to the performance advantages, we recommend it as a general method, since the shrinkage parameter α indicates a degree of certainty about the imputed value relative to the sample mean (with zero indicating confidence in the similarity-based value and one indicating the most uncertainty).
6. Future Work
The main limitation of our current work is that the PIMA data set contains only
numeric feature values. Future work could include support for both categorical and textual
data. Both types of information are widely found in medical data sets and would help to
support the usefulness of our algorithm in this domain, as well as in other similar domains.
The implementation of our algorithm has been deliberately developed to be loosely coupled
to the source data to allow for different file formats and structures in the source data to be
supported with minimal effort, thus allowing for generalization of the code for different
future uses.
To aid with future development of this algorithm, we have provided the full source
code to the software we used to generate the presented results. The source code, written in
the Go programming language, can be freely used and modified, and it has been designed
to be modular and loosely coupled to any data set, thereby making it easier to extend
as required.
Author Contributions: Conceptualization, C.W., V.G. and S.D.; Methodology, C.W., V.G. and S.D.;
Software, C.W.; Formal analysis, V.G.; Investigation, C.W.; Data curation, C.W.; Writing—original
draft, C.W.; Writing—review & editing, V.G. and S.D.; Supervision, V.G. and S.D.; Project administra-
tion, S.D. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The main Github repository can be found here: https://github.com/
ColinWilcox1967/PhD-DataSetAnalysis-Diabetes (accessed on 22 November 2023); An example of
how this code may be used with other data sets is given here: https://github.com/ColinWilcox1967/
PHD-DataSetAnalysis-Traffic (accessed on 22 November 2023).
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
Applying Image Analysis to Build a Lightweight System
for Blind Obstacles Detecting of Intelligent Wheelchairs
Jiachen Du 1 , Shenghui Zhao 2, *, Cuijuan Shang 2 and Yinong Chen 3
1 School of Computer Science and Engineering, Anhui University of Science and Technology,
Huainan 232000, China
2 School of Computer and Information Engineering, Chuzhou University, Chuzhou 233100, China
3 Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA
* Correspondence: [email protected]
Abstract: Intelligent wheelchair blind spot obstacle detection is an important issue in semi-enclosed special environments such as elderly communities. However, LiDAR- and 3D-point-cloud-based solutions are expensive, complex to deploy, and require significant computing resources and time. This paper proposes GC-YOLO, an improved lightweight obstacle detection model based on YOLOv5, and builds an obstacle dataset consisting of incomplete target images captured in the blind spot view of a smart wheelchair. The feature extraction operations are simplified in the backbone and neck sections of GC-YOLO. The backbone network uses GhostConv from GhostNet to replace the ordinary convolution in the original feature extraction network, reducing the model size. Meanwhile, CoordAttention is applied to reduce the loss of location information caused by GhostConv. Further, the neck stem section uses a combination of the lighter SE attention module and the GhostConv module to enhance the feature extraction capability. The experimental results show that the proposed GC-YOLO outperforms YOLOv5 in terms of model parameters, GFLOPS and F1. Compared with YOLOv5, the number of model parameters and GFLOPS are reduced by 38% and 49.7%, respectively, and the F1 of GC-YOLO is improved by 10% on the PASCAL VOC dataset. Moreover, GC-YOLO achieves an mAP of 90% on the custom dataset.
an important task. In target detection across various fields, diverse sensor-based methods, including lidar, millimeter-wave radar, and ultrasonic radar sensors, are employed to address different scenarios. Detection based on the 3D point cloud encoding of lidar [6] is characterized by high computational complexity and data sparsity and has limitations on small mobile devices. 3D target detection from LiDAR data has shown good performance in 3D vision tasks such as autonomous driving [7], but its high cost and deployment complexity on lightweight mobile devices can be a challenge for real-time applications or resource-limited devices. With regard to lightweight model design [8], a guide dog robot realizes traffic light and moving target detection for actual scene requirements using the MobileNet algorithm; the algorithm's lightweight advantage is effectively exploited, highlighting the importance of lightness on mobile devices. The deep learning target detection algorithm used in this paper provides an effective way to detect safety hazards in a wheelchair's blind field of view. The algorithm analyzes real-time visual information around wheelchair users, helping the elderly avoid accidentally hitting dangerous obstacles in the blind zones on either side of the wheelchair, such as dogs, cats, potholes, and human bodies that are incompletely represented at low angles (e.g., feet, legs, and wheels). These targets were gathered to construct a custom dataset. Collaborative annotation and video data management tools can be used for curation [9]; in this paper, video data are processed frame by frame, resulting in an image dataset. When used in these areas, the following issues must be addressed.
• Target specificity. The targets visible on both sides of the wheelchair are incomplete. For large targets, only part of the target's feature map is captured, such as feet, legs, and wheels.
• Model lightweighting. The model must adapt to resource-constrained environments and meet the needs of resource-constrained devices.
• Performance loss caused by lightweighting. Preserving model performance while making the model lightweight is difficult to balance.
To address the above problems, this paper proceeds in three parts. First, target information is collected at low viewing angles to form a unique dataset. Second, within the model, sparse feature maps are obtained through the Ghost [10] module while CoordAtt [11] attention captures channel and position information, and the two sets of features are then integrated; in the neck part, residual block adjustment with SE [12] attention enhances the channel information of the feature map, capturing more features to compensate for the feature loss caused by the convolution in the GhostNet idea, and a richer feature output is obtained through the residual connection. Finally, when trained on the PASCAL VOC dataset, the model has nearly 3/5 of the original number of parameters and roughly 1/2 of the original GFLOPS, with almost the same detection time, while the overall accuracy and F1 value are significantly improved.
2. Related Work
Target detection models generally have a complex network structure and a large number of parameters, resulting in slow operation, large memory occupancy, and high power consumption when deployed on low-end mobile devices. To solve these problems, research on lightweight target detection models in recent years has focused on two aspects: lightweight models based on the network structure, and special techniques that reduce the computational and parametric quantities of a model so that it can operate efficiently on low-end devices.
Electronics 2023, 12, 4472
Model Quantification
To reduce the model parameters and computation with respect to network depth and width, the model is quantified using two metrics: GFLOPS (the model's floating point operations, denoting the billions of floating point operations required by the model to perform inference) and Parameters (the total number of trainable parameters in the model). The improved Backbone raises the efficiency of residual feature extraction in the C3 module, reducing computational complexity and the number of parameters. Assuming the GFLOPS of the original Backbone with the C3 module is F_backbone and its number of parameters is P_backbone, and α (0 < α < 1) is a scaling factor for the reduced computational complexity and parameter count, the GFLOPS and Parameters of the improved Backbone module are α × F_backbone and α × P_backbone, respectively. Similarly, the original GFLOPS of the Neck part is F_neck and its number of parameters is P_neck; its computational overhead is reduced by a scaling factor β (0 < β < 1), so the quantized GFLOPS and Parameters are β × F_neck and β × P_neck, respectively. In summary, the GFLOPS and Parameters of the model before and after the improvement are

F = F_backbone + F_neck,  P = P_backbone + P_neck    (before)

F′ = α × F_backbone + β × F_neck,  P′ = α × P_backbone + β × P_neck    (after)
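As a numeric sanity check of this quantification, the scaled totals can be computed directly; the α, β and baseline GFLOPS/parameter values below are hypothetical placeholders, not figures from the paper:

```python
# Scaled model cost, following the quantification above:
#   F_model = alpha * F_backbone + beta * F_neck
#   P_model = alpha * P_backbone + beta * P_neck
# All numeric values here are illustrative placeholders.

def model_cost(f_backbone, p_backbone, f_neck, p_neck, alpha=1.0, beta=1.0):
    """Return (GFLOPS, parameter count) after scaling Backbone and Neck."""
    gflops = alpha * f_backbone + beta * f_neck
    params = alpha * p_backbone + beta * p_neck
    return gflops, params

# Original model (alpha = beta = 1):
f_before, p_before = model_cost(10.0, 4.0e6, 7.0, 3.0e6)
# Improved model with hypothetical alpha = 0.6, beta = 0.4:
f_after, p_after = model_cost(10.0, 4.0e6, 7.0, 3.0e6, alpha=0.6, beta=0.4)
print(round(f_before, 2), round(f_after, 2))  # 17.0 8.8
```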
Y = X × F_{1×1}    (3)
In the Ghost module, only half of the output features are intrinsic (essential) features, and these are smaller than the original output features, so some of the captured spatial and position information is lost. To account for this loss, this paper uses an attention module to enhance the spatial and position features.
4. Model Structure
4.1. YOLOv5 Algorithm Principle
The YOLOv5 network structure consists of four main parts: Input, Backbone, Neck
and Head. The four parts, respectively, perform data input processing, feature learning,
feature enhancement processing, and target detection and classification.
Input performs Mosaic operations on the input data, mainly cutting, splicing, and resizing the input image data, and computes the anchor boxes. Mosaic data augmentation increases the diversity of the dataset, thus increasing the robustness and generalizability of the model.
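As a toy illustration of the Mosaic idea (not the actual YOLOv5 implementation, which also randomizes the crop center and remaps the bounding boxes), four equally sized images can be stitched into one composite:

```python
# Toy Mosaic: stitch four H x W "images" (2-D lists of pixel values)
# into one 2H x 2W composite; bounding-box remapping is omitted.

def mosaic(imgs):
    a, b, c, d = imgs                        # top-left, top-right, bottom-left, bottom-right
    top = [ra + rb for ra, rb in zip(a, b)]  # rows of a and b placed side by side
    bottom = [rc + rd for rc, rd in zip(c, d)]
    return top + bottom

def make_img(value, size=2):
    """A size x size 'image' filled with a constant pixel value."""
    return [[value] * size for _ in range(size)]

m = mosaic([make_img(0), make_img(1), make_img(2), make_img(3)])
print(m[0], m[3])  # [0, 0, 1, 1] [2, 2, 3, 3]
```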
Backbone is mainly used for feature learning, and the main constituent modules are
C3 and SPPF (Spatial Pyramid Pooling—Fast).
The C3 module is similar to the original CSP (Cross-Stage Partial Network) structure,
which is mainly used to simplify the network structure, reduce the number of convolutional
layers and channels, and maintain the performance, and the SPPF module is the fusion of
deep and shallow information to improve the feature extraction ability of the network.
The Neck structure uses a PANet structure to achieve feature enhancement through
multi-layer feature fusion of top-down and bottom-up deep and shallow features, thereby
increasing the robustness of the model and improving the accuracy of the target detection.
The Head structure obtains the position of the prediction frame target in the input
image as well as the category information by designing three detection heads for detecting
targets of different scales, each of which acquires feature information of different scale sizes
from different layers of the Neck.
Figure 1. Improved GC-YOLO model diagram. Compared to the native YOLOv5 model, the enhanced GC-YOLO model replaces the original C3 module in the Backbone section with the CAGhost module and the original C3 module in the Neck section with the GhostSE module.
4.3. CA-GhostBottleneck
CA-GhostBottleneck (shown in Figure 2), the key network module in the backbone network, adopts ideas from GhostNetV2 [28]. The CA-GhostBottleneck in this paper takes into account the fact that only half of the Ghost module's output channels are intrinsic features, which are smaller than the original output features: when features are extracted from the input feature map X ∈ R^{H×W×C} to obtain the output Y ∈ R^{H×W×C_out}, Y loses both channel information and position information. In this paper, the input X is processed in two stages. First, the sparse feature map Y is obtained by the Ghost module; second, channel information and position information are obtained by the CoordAttention module; finally, the two sets of features are integrated to obtain a new output. The benefits of using CA-GhostBottleneck are as follows:
• Reduced parameter count: the Ghost module uses sparse convolution to obtain the intrinsic features, improving the lightweighting effect.
• Improved model expressiveness: CoordAttention captures channel and position information, allowing more flexible access to global feature information and improving model expressiveness.
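A quick parameter count illustrates the first benefit above: generating half of the output channels with a cheap depthwise convolution (the Ghost idea with ratio 2) is much lighter than an ordinary convolution. The layer and kernel sizes below are illustrative, not taken from GC-YOLO:

```python
# Parameter count (biases ignored) for an ordinary k x k convolution
# versus a Ghost-style module: half the output channels are intrinsic
# features from a normal convolution, the other half are "ghost"
# features produced by a d x d depthwise convolution over the intrinsic ones.

def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k, d=3):
    intrinsic = c_out // 2
    primary = c_in * intrinsic * k * k   # ordinary conv for intrinsic features
    cheap = intrinsic * d * d            # depthwise conv for ghost features
    return primary + cheap

c_in, c_out, k = 128, 256, 3
print(conv_params(c_in, c_out, k), ghost_params(c_in, c_out, k))  # 294912 148608
```

For large input channel counts the saving approaches a factor of two, which is consistent with the roughly halved GFLOPS reported for GC-YOLO.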
Figure 2. CA-GhostBottleneck with stride 1 on the left, CA-GhostBottleneck with stride 2 on the right.
Y = X × F_{1×1}    (5)

Y ∈ R^{H×W×C_out} contains the intrinsic features, whose sizes are usually smaller than the original output features; the lack of original channel and position information is compensated by the stronger feature information obtained from CoordAttention compared with depthwise convolution.
4.4. GhostSE
In this paper, the GhostSE structure is used in the Neck part (shown in Figure 3). The intrinsic features obtained by 1 × 1 convolution have fewer output features than those obtained by ordinary convolution. SE attention is used to improve access to the channel information of the feature maps, capturing more features to compensate for the feature loss caused by convolution in the Ghost idea; residual connection is then used to obtain richer feature output. Finally, residual connection is performed using the GhostConvSE module and GhostBottleneck, reducing the number of parameters and floating point calculations while keeping as much feature-rich information as possible.
Given an input feature X ∈ R^{H×W×C} with height H, width W, and number of channels C: X outputs Y′ via a 1 × 1 Conv, Y′ passes through SE attention to output Y″, Y″ is fed into GhostSE, Z′ is output after the Add operation, and finally the feature map Z is output, i.e.,

Y′ = X × F_{1×1}    (8)

Y″ = Concat(Y′, Y′ × F_SE)    (9)
Z′ = Concat(Y″, Y″ × F_GhostBottleneck)    (10)

Z = Z′ × F_GhostConv    (11)
Figure 3. The left image shows GhostConvSE, which uses SE attention to obtain more feature
information; the right image shows GhostSE.
5. Experiment
5.1. Experimental Environment
The experiment uses the PASCAL VOC dataset, commonly used for target detection, for training. It mainly covers the four major classes of vehicle, household, animal and person, and the detection target samples are relatively abundant. The computer configuration for the experiment is GPU: RTX 3060, CPU: i5-10400, 16 GB RAM; the training environment is Python 3.9 with CUDA 12.1.
6. Recall: indicates the proportion of actual positive samples that are correctly detected as positive.
R = TP/( TP + FN ) (13)
8. F1 score: combines Precision and Recall to evaluate the performance of the model; it is defined as the harmonic mean of Precision and Recall.
F1 = 2 × P × R/( P + R) (16)
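Equations (13) and (16), together with the standard precision definition P = TP/(TP + FP), translate directly into code; the counts below are made-up values for illustration:

```python
# Precision, Recall (Eq. 13) and F1 (Eq. 16) from raw detection counts.

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(p, r):
    return 2 * p * r / (p + r)   # harmonic mean of precision and recall

tp, fp, fn = 72, 18, 28          # hypothetical counts for one class
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 2), round(r, 2), round(f1_score(p, r), 2))  # 0.8 0.72 0.76
```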
By balancing the lightness, detection accuracy, and detection speed of the model, this paper improves the model design. By calculating the Efficient value, the model M that best balances detection efficiency and speed is finally obtained.
5.3. Experiment
To verify the overall improvement of the designed GC-YOLO model, this paper sets up several comparative experiments against typical lightweight networks. The PASCAL VOC dataset is used, divided into a training set and a validation set at a ratio of 9:1; the image size is 640 × 640, the training batch size is set to 32, and all reference models are trained for 300 epochs with these parameters. The experiments compare the number of model parameters, GFLOPS, mean average precision (mAP@0.5), and harmonic mean F1.
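The 9:1 split described above might be implemented along these lines (the file names and seed are placeholders; the paper's actual data pipeline is not shown):

```python
# Deterministic 9:1 train/validation split of a list of image paths.
import random

def split_dataset(samples, train_ratio=0.9, seed=42):
    items = list(samples)
    random.Random(seed).shuffle(items)   # fixed seed for reproducibility
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]

images = [f"img_{i:04d}.jpg" for i in range(1000)]  # placeholder file names
train, val = split_dataset(images)
print(len(train), len(val))  # 900 100
```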
As shown in Table 1, the original YOLOv5s has 7.28 M parameters and 17.16 GFLOPS. With CA-GhostBottleneck and GhostSE, GC-YOLO has 2.8 M fewer parameters and 8.53 G fewer GFLOPS, with a slight increase in mAP and F1. The results show that the model's feature extraction capability is significantly improved while the number of parameters is reduced.
The partial detection results of the GC-YOLO model are shown in the figures. Figure 4 shows the harmonic mean F1 after training the model on the VOC dataset with the threshold set to 0.5. The F1 value combines the precision and completeness of the model and is particularly useful when dealing with category imbalance or when both precision and recall must be improved. Higher F1 values indicate better performance in detecting positive samples and excluding negative samples. Of the twenty categories in the figure, twelve have F1 values above the average of 0.72, with only a few major fluctuations, which shows that the model achieves relatively balanced performance across categories, generalizes well to each category, and has an advantage in multicategory problems. Figure 5 shows the average precision (mAP) of the model on each category, combining prediction precision and recall across the different target categories to measure overall performance. As shown in the figure, the data are tightly clustered, with 13 categories exceeding 84.19%, five categories surpassing 90%, and two categories falling below 70%. This suggests that the model accurately localizes and identifies target objects across multiple categories, without focusing excessively on certain categories while disregarding others, and with strong overall performance. Figure 6 shows the leakage (miss) rate of the model, which is especially relevant in the security and surveillance area. It reflects the proportion of targets missed during detection; a lower leakage rate indicates that the model captures targets more comprehensively. From the figure, the leakage rate is mainly below 0.3, with five categories exceeding this threshold; however, the highest leakage rate is only 0.56,
indicating that the model has a high recall rate and can detect most target objects. It is also robust to the size and location of different targets, maintaining a consistently low leakage rate.
To test GC-YOLO on images outside the VOC dataset, an image downloaded from the Internet was also used for detection; the comparative experiment is shown in Figure 7. Compared with (b), (a) improves target recognition accuracy by approximately 0.05 for unobstructed targets and approximately 0.1 for obstructed targets. The model's overall accuracy in recognizing categories is enhanced, including recognition of the yellow car.
Figure 4. Harmonic mean F1 values of the GC-YOLO model for the VOC dataset.
Figure 5. Average accuracy of the GC-YOLO model on the VOC dataset (mAP = 84.19%).
Figure 7. (a) GC-YOLO model detection results; (b) original model detection results. Blue boxes indicate people, while green boxes represent cars. Compared to panel (b), panel (a) shows superior accuracy in identifying individuals, successfully detects the concealed yellow car, and demonstrates increased confidence in identifying the black car. Furthermore, it is less likely to misidentify a person as a vehicle.
Comparison experiments were also performed on the real-time detection FPS of the GC-YOLO model and YOLOv5, as shown in Figure 8. The model introduces attention to improve feature extraction while preserving the real-time performance of the lightweight model, and its FPS remains relatively smooth.
Figure 8. (a) FPS detection performance of the GC-YOLO model; (b) FPS detection performance of the original model. Comparing the detection rates of the two algorithms, the modified algorithm consistently maintains high performance without any reduction in detection rate.
The trained model is also tested in real scenarios, with the results shown in Figure 11. For intelligent wheelchair obstacle detection in the blind zones on both sides of the wheelchair in a senior living community environment, side safety is judged mainly from incompletely displayed targets. The four images show wheels, legs, feet, and potholes at low viewing angles; the first three judge obstacle targets from incompletely displayed human targets at low viewing angles.
Figure 9. Average accuracy of the GC-YOLO model on the custom set (mAP = 90.34%).
Figure 10. F1 of the GC-YOLO model on the custom dataset (score threshold = 0.5).
(a) Identifying the target through specific regions, including the wheels. (b) Indoor experimental trials. (c) Data within the elderly community, illustrating the algorithm's detection efficacy. (d) From a low angle, the algorithm's efficacy in detecting potholes.
Figure 11. The four panels show the detection performance of the model in real scenes. Distinct colors distinguish the categories: green boxes denote legs, purple boxes wheels, yellow boxes feet, and red boxes potholes.
The above experimental results show that, compared with YOLOv5s, YOLOv4-MobileNetV3 and other lightweight algorithms, the GC-YOLO model is as stable in real-time performance as the native YOLOv5s while improving on the number of parameters, GFLOPS, mAP, and F1 value; it also performs very well on the custom dataset for safety supervision of the blind zones in intelligent wheelchair detection.
6. Conclusions
In this paper, we propose GC-YOLO, a lightweight target detection algorithm based on YOLOv5. Through the network improvements, the model achieves good detection performance while remaining lightweight, balancing lightness against detection performance. Applied to intelligent wheelchairs in elderly communities, the model shows good detection performance for blind spot obstacles, helping users avoid potential safety threats. In future work, the algorithm will be deployed on an Nvidia Jetson Nano, and cameras will be installed on both sides of the wheelchair to detect each side independently. Subsequent experiments will aim to further improve and optimize the system. However, limitations may arise during
the experimental process, as well as during the maintenance and retraining of the model
after deployment on the mobile terminal. When major environmental changes occur, the
model’s performance may diminish. In our future studies, we will explore the integration of
multi-modal or unsupervised learning approaches to improve the model’s responsiveness
to environmental fluctuations and continue our research in this area.
Author Contributions: Conceptualization, S.Z. and J.D.; methodology, Y.C. and J.D.; validation,
S.Z. and C.S.; formal analysis, J.D. and S.Z.; investigation, S.Z.; data curation, S.Z., C.S. and J.D.;
writing—original draft preparation, J.D.; writing—review and editing, S.Z., Y.C., C.S. and J.D. All
authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Chuzhou University. This work was supported by Anhui
Higher Education Research Program Project under Grant 2022AH010067 (Title: Smart Elderly Care
and Health Engineering Scientific Research Innovation Team).
Data Availability Statement: Data sharing not applicable. Further research is needed.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Ahmadi, A.; Argany, M.; Neysani Samany, N.; Rasooli, M. Urban Vision Development in Order To Monitor Wheelchair Users
Based on The Yolo Algorithm. Int. Arch. Photogramm. Remote. Sens. Spat. Inf. Sci. 2019, XLII-4/W18, 25–27. [CrossRef]
2. Okuhama, M.; Higa, S.; Yamada, K.; Kamisato, S. Improved Visual Intention Estimation Model with Object Detection Using YOLO;
IEICE Technical Report; IEICE Tech: Tokyo, Japan, 2023; Volume 122, pp. 1–2.
3. Chatzidimitriadis, S.; Bafti, S.M.; Sirlantzis, K. Non-Intrusive Head Movement Control for Powered Wheelchairs: A Vision-Based
Approach. IEEE Access 2023, 11, 65663–65674. [CrossRef]
4. Hashizume, S.; Suzuki, I.; Takazawa, K. Telewheelchair: A demonstration of the intelligent electric wheelchair system towards
human-machine. In Proceedings of the SIGGRAPH Asia 2017 Emerging Technologies, Bangkok, Thailand, 27–30 November 2017;
p. 1.
5. Suzuki, I.; Hashizume, S.; Takazawa, K.; Sasaki, R.; Hashimoto, Y.; Ochiai, Y. Telewheelchair: The intelligent electric wheelchair
system towards human-machine combined environmental supports. In Proceedings of the ACM SIGGRAPH 2017 Posters,
Los Angeles, CA, USA, 30 July–3 August 2017; p. 1.
6. Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. Pointpillars: Fast encoders for object detection from point clouds.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June
2019; pp. 12697–12705.
7. Meyer, G.P.; Laddha, A.; Kee, E.; Vallespi-Gonzalez, C.; Wellington, C.K. Lasernet: An efficient probabilistic 3D object detector for
autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach,
CA, USA, 15–20 June 2019; pp. 12677–12686.
8. Chen, Q.; Chen, Y.; Zhu, J.; De Luca, G.; Zhang, M.; Guo, Y. Traffic light and moving object detection for a guide-dog robot. J.
Eng. 2020, 13, 675–678. [CrossRef]
9. Ferretti, S.; Mirri, S.; Roccetti, M.; Salomoni, P. Notes for a collaboration: On the design of a wiki-type educational video lecture
annotation system. In Proceedings of the IEEE International Conference on Semantic Computing (ICSC 2007), Irvine, CA, USA,
17–19 September 2007; pp. 651–656.
10. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
11. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–21 June 2021; pp. 13713–13722.
12. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
13. Lee, H. J.; Ullah, I.; Wan, W.; Gao, Y.; Fang, Z. Real-time vehicle make and model recognition with the residual SqueezeNet
architecture. Sensors 2019, 19, 982. [CrossRef] [PubMed]
14. Sheng, T.; Feng, C.; Zhuo, S.; Zhang, X.; Shen, L. A quantization-friendly separable convolution for mobilenets. In Proceedings
of the IEEE 2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications
(EMC2), Williamsburg, VA, USA, 25 March 2018.
15. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018;
pp. 4510–4520.
16. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching
for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27
October–2 November 2019; pp. 1314–1324.
68
Electronics 2023, 12, 4472
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
Stratified Sampling-Based Deep Learning Approach to Increase
Prediction Accuracy of Unbalanced Dataset
Jeyabharathy Sadaiyandi 1 , Padmapriya Arumugam 1, *, Arun Kumar Sangaiah 2,3 and Chao Zhang 4, *
Abstract: Due to the imbalanced nature of datasets, classifying unbalanced data classes and drawing
accurate predictions is still a challenging task. Sampling procedures, along with machine learning
and deep learning algorithms, are a boon for solving this kind of challenging task. This study’s
objective is to use sampling-based machine learning and deep learning approaches to automate
the recognition of rotting trees from a forest dataset. Method/Approach: The proposed approach
successfully predicted the dead tree in the forest. Seven of the twenty-one features are computed
using the wrapper approach. This research work presents a novel method for determining the state
of decay of the tree. The process of classifying the tree’s state of decay is connected to the issue
of unequal class distribution. When classes to be predicted are uneven, this frequently hides poor
performance in minority classes. Using stratified sampling procedures, the required samples for
precise categorization are prepared. Stratified sampling approaches are employed to generate the
necessary samples for accurate prediction, and the precise samples with computed features are input
into a deep learning neural network. Finding: The multi-layer feed-forward classifier produces the
greatest results in terms of classification accuracy (91%). Novelty/Improvement: Correct samples are
necessary for correct classification in machine learning approaches. In the present study, stratified
samples were considered while deciding which samples to use as deep neural network input. It
suggests that the proposed algorithm could accurately determine whether the tree has decayed or not.
Keywords: machine learning; deep learning; imbalanced datasets; stratified sampling; prediction;
classification; accuracy; wrapper classes
Citation: Sadaiyandi, J.; Arumugam, P.; Sangaiah, A.K.; Zhang, C. Stratified Sampling-Based Deep
Learning Approach to Increase Prediction Accuracy of Unbalanced Dataset. Electronics 2023, 12, 4423.
https://doi.org/10.3390/electronics12214423
amounts of carbon from the atmosphere. The carbon stored in forest biomass is a crucial
element of healthy forest ecosystems and the global carbon cycle.
Forests store carbon in various forms that can be challenging to accurately quantify.
The estimation of carbon storage in forests depends on several factors, including the
density of tree wood, decay class, and density reduction factors. Accurate estimations of
carbon storage in forests are essential for effective carbon flux monitoring. Moreover, the
classification of forest data is critical in determining the health and productivity of forest
ecosystems. Forest classification algorithms can help identify various features of forests,
such as tree species, forest density, and biomass, which are essential in monitoring changes
in forest structure and function.
Forest-based accurate classification can also help to predict the occurrence and spread
of forest disturbances like wildfires, insect infestations, and diseases. Such disturbances can
cause significant losses of carbon from forests, negatively impacting the planet’s ecological
balance. Therefore, the development of accurate and robust classification algorithms for
forest datasets is critical for maintaining healthy forest ecosystems and mitigating the
impact of natural disasters on the environment. In the realm of predicting tree decay rates
in forests, past research has mainly focused on using regression techniques. However, these
methods may not be suitable for distinguishing individual dead trees within a forest.
In this study, a deep neural network (DNN) architecture is aimed at detecting individual dead trees
within the forest more accurately. To that end, this research work proposes a novel
approach to deal with imbalanced datasets using sampling techniques. The imbalanced
nature of forest datasets can make predictions less accurate, particularly when most data
points belong to a single class (e.g., living trees). Therefore, by employing sampling
techniques, we balanced the dataset, which improved the accuracy of predictions for both
dead and living trees. This ultimately improves the accuracy of predictions made with
unbalanced forest datasets. The organization of this research work is as follows. The dataset
used for this research work is described first. Then, we employ a DNN with sampling
techniques to forecast both dead and living trees. This method was then compared to other
techniques for its efficacy. Finally, we present our findings and future directions.
Overall, the development of DNN architecture for predicting individual dead trees
in forests, coupled with sampling techniques to handle imbalanced datasets, can raise
prediction accuracy and contribute to better forest management. It enables forest managers
to conserve and protect the forest ecosystem by making informed decisions.
2. Literature Review
In general, the process of classifying unbalanced datasets consists of three steps:
selecting features, fitting the data distribution, and training a model. The review of the
literature is presented below in Table 1.
Table 1. Cont.
The goal of feature selection is to identify subsets of features that are most suited
for classifying the unbalanced data while considering the feature class imbalance. This
contributes to the development of a more efficient classifier [10–13]. To limit the impact
of class imbalances on the classifier, most data preparation procedures, such as various
resampling techniques, are used to adjust the data distribution [14–17]. These techniques
significantly balance the datasets.
Model training to accommodate unequal data distribution requires primarily adding
an enforcement algorithm to an existing classification approach or applying ensemble
learning. Standard cost-sensitive learning is an example of the latter [18–20]; it improves
minority class classification accuracy by increasing the weights of the class samples. Clas-
sification accuracy can be achieved via ensemble learning techniques like boosting and
bagging [21–23].
Distribution-level data resampling will resolve the class imbalance. The most signifi-
cant advantage of this methodology is that the sampling method and the classifier training
procedure are independent of one another. Typically, the sample distribution of the training
set is changed at the data preprocessing stage to decrease or eliminate class imbalance. The
representative methods consist of a few resampling strategies, with the two main categories
being oversampling and undersampling.
Oversampling entails adding appropriately created new points to increase the sample
points in a minority class to attain sample balance. The synthetic minority oversampling
method (SMOTE) and several of its variants, as well as ROS, are examples of prevalent
algorithms [24]. SMOTE generates synthetic samples and inserts them between a given
sample and its neighbors, whereas random oversampling (ROS) balances datasets by
replicating minority sample points at random.
X_new = X_j + rand(0, 1) · (X_i − X_j)
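The interpolation above can be sketched in plain Python. Note that `smote_like` and its single-nearest-neighbor choice are a simplified illustrative stand-in for the full SMOTE algorithm, not the authors' implementation:

```python
import random

def smote_like(minority, n_new, seed=0):
    """Generate synthetic minority samples by interpolating between a sample
    and its nearest minority neighbor: x_new = x_j + r * (x_i - x_j)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x_j = rng.choice(minority)
        # nearest neighbor of x_j among the other minority points (Euclidean)
        x_i = min((p for p in minority if p is not x_j),
                  key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x_j)))
        r = rng.random()  # rand(0, 1)
        synthetic.append(tuple(xj + r * (xi - xj) for xj, xi in zip(x_j, x_i)))
    return synthetic
```

Because each synthetic point lies on the segment between two existing minority points, it stays inside the region the minority class already occupies.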
A representative undersampling approach is RUS [24], which discards majority class samples at random.
Another undersampling strategy selects suitable majority class samples so that the number
of majority class samples matches that of the minority class. This method makes the
training set more evenly distributed and improves the classification accuracy of minority
class samples. The disadvantage is that a sizable portion of the majority class sample
characteristics could be lost, so the model might not fully learn the majority class
properties. As a result, it is crucial to set up the learning process so that most of the
information carried by the majority class is retained.
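A random undersampling (RUS) step of the kind just described can be sketched as follows; this is a minimal illustration rather than the paper's code:

```python
import random

def random_undersample(majority, minority, seed=0):
    """RUS: randomly discard majority class samples until both classes
    are the same size, yielding a balanced training set."""
    rng = random.Random(seed)
    kept = rng.sample(majority, k=len(minority))  # keep only |minority| majority samples
    return kept + list(minority)
```

The discarded majority samples are lost to training, which is exactly the drawback the paragraph above points out.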
Figure 1. Stratified sampling-based deep neural network (SSDNN) approach for predicting decay
class of forest trees.
Attribute: Description
Log num: Log number
Species: Four categories of trees in this region
Time: The tree’s age in years
Year: Year of the tree
Subtype: Hard, soft, and other tree types
Rad pos: The location of the measurement
D1: Tree circumference
D2: Tree’s circumference in various positions
D3: Tree’s circumference in various positions
D4: Tree’s circumference in various positions
VOL1: Tree’s volume
VOL2: Tree’s volume
Wet Wt: Weight of the water content in the tree
DRYWT: The dried weight of the tree
MOIST: Wood’s moisture content
Decay: The tree’s level of decay
WDENSITY: The tree’s wood density with respect to VOL1
Den2: The tree’s wood density with respect to VOL2
Knot Vol: The wood’s volume at a knot
Sample Date: Date the sample was collected
Comments: Other features of the tree
The model is iteratively trained on several subsets of features using the wrapper
technique, and the best subset of features is chosen. The choice of the feature subset
is based on inferences from the model. A feature selection strategy called
backward elimination starts with a model that incorporates all the available features and
gradually eliminates the least significant ones until a stopping requirement is met. This
strategy, also known as a wrapper, is typically combined with statistical models to choose a
subset of important features. By repeatedly removing the features that are least significant
at the selected significance level, backward elimination helps identify the
most pertinent characteristics. Table 3 shows the features extracted using these feature selection
methods for further processing. Before assessing the feature subsets, these strategies train
and test the model using a variety of feature combinations. This reduces overfitting
and eliminates irrelevant or redundant features, enhancing the model’s performance
and interpretability.
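The backward elimination loop described above can be sketched generically. Here `score` is a placeholder for any model evaluation routine (e.g., cross-validated accuracy or a p-value-based criterion); the function is an illustration of the wrapper idea, not the authors' exact procedure:

```python
def backward_eliminate(features, score, min_features=1):
    """Wrapper-style backward elimination: start from all features and
    repeatedly drop the one whose removal gives the best score, stopping
    when no removal keeps the score from dropping (the stopping criterion)."""
    current = list(features)
    best = score(current)
    while len(current) > min_features:
        # every subset obtained by dropping exactly one remaining feature
        candidates = [[f for f in current if f != drop] for drop in current]
        subset = max(candidates, key=score)
        if score(subset) < best:
            break  # removing any feature hurts the model: stop
        current, best = subset, score(subset)
    return current
```

With a score that rewards two relevant features and slightly penalizes extras, the loop strips the irrelevant ones and keeps the relevant pair.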
Attribute: Description
Species: Four categories of trees in this region
Year: Tree’s age
D1: Tree’s circumference
VOL1: Tree’s volume
DRYWT: The dried weight of the tree
WDENSITY: Tree’s wood density based on VOL1
In the experimental dataset, the explanatory variables Species, Diameter, Volume, Wet
Weight, Dry Weight, and Decay are considered for multiple linear regression, and the target
variable is Wood density Wi of the tree.
The prediction equation is given below.
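As an illustration of how such a model is fitted, a multiple linear regression of the general form W = b0 + b1·x1 + ... + bk·xk can be estimated by ordinary least squares. The numbers below are hypothetical toy values, not the study's data, and only two explanatory columns are shown for brevity:

```python
import numpy as np

# Hypothetical toy data: each row holds explanatory variables for one tree
# (e.g., species code and diameter); y is the target wood density W_i.
X = np.array([[1.0, 12.0],
              [2.0, 15.0],
              [1.0, 20.0],
              [3.0, 25.0]])
y = np.array([0.45, 0.50, 0.42, 0.38])

A = np.column_stack([np.ones(len(X)), X])     # prepend an intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares estimate [b0, b1, b2]
y_hat = A @ beta                              # predicted wood densities
```

The fitted coefficient vector `beta` plays the role of the regression coefficients in the prediction equation.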
DNNs can handle both linear and nonlinear problems by propagating probabilities through the
network layer by layer with an appropriate activation function. In essence, DNNs are
fully connected neural networks. A deep neural network is sometimes known as a multi-layer
perceptron (MLP). The hidden layers transform the input feature vectors, which eventually
arrive at the output layer, where the binary classification result is obtained.
Environmental scientists have been interested in determining functional links between carbon
storage and the uncertainty of plant wood density, for which an appropriate technique is required.
Developing empirical models to forecast the DECAY CLASS of the tree is the focus of
this research. A deep neural network, a subset of expert systems, predicts the DECAY
CLASS of the tree more accurately than standard models. Because the DNN imposes no constraints
on model construction, its predictions are more accurate than those of the
ensemble model. The loss decreased on the training data, as shown for the
topology of the model, implying that there was no overfitting.
The suggested work’s learning model has four layers: one input layer, two hidden
layers, and one output layer, as shown in Figure 4. The ReLU
activation function was utilized in the hidden layers, and the sigmoid function was used at the
output layer. The binary cross-entropy loss between the predicted and true labels was used to
establish the objective function,
which should be minimized in the NN. Adam optimization was chosen over other
existing optimization techniques because it was more efficient. To create a model, each
dataset was first randomly divided into two parts: a 75% training set and a 25% test set.
The training set is examined for skewness and, if necessary, balanced using a stratified
sampling procedure. The balanced training set is then used to develop and train DNN models,
while the test sets are used to evaluate the performance of the predictive
models. We used the following simple method to choose the best threshold: the curve of
balanced accuracy as a function of the prediction threshold is first plotted, and the best
threshold is the one at which the DNN achieves the highest balanced accuracy. The
imbalanced-learn library for Python was then used to apply each data-balancing technique to each
training batch. The model was tried with mini-batch sizes of 10, 25, 50,
and 100, with 100 determined to be the best choice, and with epoch counts of 10, 25, and 50.
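The 75%/25% split with per-class stratification described above can be sketched in plain Python. This is a simplified illustration, independent of the libraries the authors used:

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, train_frac=0.75, seed=0):
    """Split into train/test so each class (stratum) keeps the same
    proportion in both parts, mirroring the 75%/25% split described above."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    train, test = [], []
    for y, group in by_class.items():
        rng.shuffle(group)                       # randomize within each stratum
        cut = int(len(group) * train_frac)
        train += [(s, y) for s in group[:cut]]
        test += [(s, y) for s in group[cut:]]
    return train, test
```

Because each class is split separately, a rare "dead tree" class cannot be accidentally concentrated in either the training or the test partition.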
Hyperparameter: Value/Type
Hidden Layers: 2
Neurons: 400
Optimizer: Adam
Hidden Layer Activation: ReLU
Output Layer Activation: Softmax
Epochs: 10, 25, 50
Batch size: 100
When the number of epochs increases, the accuracy of the proposed method also
increases, and we obtain maximum accuracy when the epoch is closer to 100. The built
model is compared with the existing models, and the performance is analyzed in the results
and discussion section.
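To make the layer structure concrete, a forward pass through the described topology (two ReLU hidden layers of 400 neurons each and a sigmoid output for the binary decay decision, with seven input features following the wrapper-selected subset) can be sketched with NumPy. This is an untrained, randomly initialized illustration, not the trained SSDNN:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init(n_in, n_hidden=400):
    """Random weights and zero biases for input -> hidden -> hidden -> output."""
    sizes = [(n_in, n_hidden), (n_hidden, n_hidden), (n_hidden, 1)]
    return [(rng.normal(0.0, 0.05, s), np.zeros(s[1])) for s in sizes]

def forward(x, params):
    """Two ReLU hidden layers, then a sigmoid unit giving P(decayed)."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3)

params = init(n_in=7)
p = forward(np.ones((3, 7)), params)  # batch of 3 seven-feature vectors
```

Training would then minimize the binary cross-entropy between these sigmoid outputs and the dead/live labels, as described in the text.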
The performance of the proposed SSDNN method with different existing sampling
techniques is shown in Figure 6.
The DNN, DNN + oversampling, DNN + undersampling, DNN + SMOTE, and
DNN + stratified sampling yield test accuracies of 80%, 76%, 69%, 78%, and 91%, respec-
tively. First, the DNN model was created and tested on the prepared dataset, yielding low
accuracy. The DNN model was analyzed to find the reason for the low accuracy, and it was
found that the dataset was unbalanced. The imbalanced dataset was subsequently handled
using a stratified sampling technique, which divided the training dataset into
distinct strata for each class. The data from each stratum were distributed uniformly to the
deep neural network, resulting in good accuracy, precision, recall, and F1 score. Several
tests using the tree dataset were carried out to determine the optimal deep neural network.
The training and testing accuracy and loss of the proposed SSDNN are visualized
in Figure 6. From the figure, during the initial epochs, the accuracy is not appreciable, while
the loss is highly noticeable; in the subsequent epochs, the results are
more promising. The same parameters are analyzed for the testing phase, which shows the
same trends in model accuracy and model loss. To observe the
variations more clearly, the chart is prepared up to 25 epochs.
The training/testing accuracy and loss of the proposed method are also shown in
Figure 7. The proposed DNN + stratified sampling results in an accuracy of 91% with higher
efficiency. The proposed model was compared to the ensemble SVM kernel algorithm
used in prior work, and the results show that the proposed DNN + stratified model is
more efficient. The proposed method is robust compared to the traditional methods due to
hyperparameter tuning, a low false positive rate, and high recall.
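The precision, recall, F1 score, and balanced accuracy cited in this comparison can all be computed from confusion-matrix counts; a minimal sketch with made-up counts follows:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, F1, and balanced accuracy from the
    confusion-matrix counts of a binary (dead/live) classifier."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # true positive rate (sensitivity)
    specificity = tn / (tn + fp)       # true negative rate
    f1 = 2 * precision * recall / (precision + recall)
    balanced_accuracy = (recall + specificity) / 2
    return {"precision": precision, "recall": recall,
            "f1": f1, "balanced_accuracy": balanced_accuracy}

# Made-up example counts: 45 dead trees found, 5 false alarms,
# 40 live trees correctly kept, 10 dead trees missed.
m = classification_metrics(tp=45, fp=5, tn=40, fn=10)
```

Balanced accuracy averages the per-class rates, which is why it, rather than plain accuracy, is the sensible threshold-selection criterion on an imbalanced dataset.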
Figure 7. Performance in terms of training/testing accuracy, as well as loss of the proposed SSDNN.
5. Conclusions
In this research, we experimented to find the best model to classify the forest tree
as a dead or live tree. For predicting the decay class of a tree, the classification models
DNN, DNN+ oversampling, DNN+ undersampling, DNN+ SMOTE, and DNN+ stratified
sampling were applied to the dataset. The results show that DNN+ stratified sampling
offers better performance with high accuracy.
The proposed method correctly classifies a tree as either dead or alive compared to
other models, and it is suitable for handling any imbalanced dataset for classification.
In deep learning, classification accuracy often increases with the amount of training data;
thus, using a larger dataset for training is a good research direction for continuing to
improve forest tree classification accuracy. This paper suggests that identifying decaying
trees earlier will help forest managers remove them before they begin to emit carbon back
into the atmosphere.
This research promotes reforestation by planting a new tree after a dead tree is removed,
reducing pollution and forest fires. In the case of stratified sampling, the research gap
discovered is that the number of records in the two classes is not equal; hence, deficit records
occur when training the model. To address this issue, the deficit class is oversampled,
the strata are shuffled, and the model is retrained to increase model efficiency. In future work,
the proposed method can be applied to smart forest management. Since there may be
uneven or irrelevant data during data collection, IoT-based RFID tags can be attached to
each tree to automate data collection and to indicate its level of decay and
carbon absorption.
Author Contributions: Conceptualization, P.A.; methodology, J.S. and P.A.; validation, A.K.S. and
C.Z.; writing—original draft preparation, P.A. and J.S.; writing—review and editing, A.K.S. and C.Z.
All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by the Rashtriya Uchchatar Shiksha Abhiyan (RUSA) Phase 2.0
[grant sanctioned vide Letter No.F.24-51/2014-U, Policy (TNMulti-Gen), Department of Education,
Government of India, Date 9 October 2018].
Data Availability Statement: https://andrewsforest.oregonstate.edu/data (accessed on 11 Septem-
ber 2023).
Conflicts of Interest: The authors declare that they have no conflict of interest.
References
1. Briechle, S.; Krzystek, P.; Vosselman, G. Silvi-Net—A dual-CNN approach for combined classification of tree species and standing
dead trees from remote sensing data. Int. J. Appl. Earth Obs. Geoinf. 2021, 98, 102292. [CrossRef]
2. Karatas, G.; Demir, O.; Sahingoz, O.K. Increasing the performance of machine learning-based IDSs on an imbalanced and
up-to-date dataset. IEEE Access 2020, 8, 32150–32162. [CrossRef]
3. Cao, L.; Shen, H. CSS: Handling imbalanced data by improved clustering with stratified sampling. Concurr. Comput. Pr. Exp.
2020, 34, e6071. [CrossRef]
4. Li, K.; Chen, X.; Zhang, R.; Pickwell-MacPherson, E. Classification for Glucose and Lactose Terahertz Spectrums Based on SVM
and DNN Methods. IEEE Trans. Terahertz Sci. Technol. 2020, 10, 617–623. [CrossRef]
5. Mînăstireanu, E.-A.; Meșniță, G. Methods of Handling Unbalanced Datasets in Credit Card Fraud Detection. BRAIN. Broad Res.
Artif. Intell. Neurosci. 2020, 11, 131–143. [CrossRef]
6. Shoohi, L.M.; Saud, J.H. DCGAN for Handling Imbalanced Malaria Dataset based on Over-Sampling Technique and using CNN.
Medico-Legal Update 2020, 20, 1079–1085.
7. Sheikh, T.S.; Khan, A.; Fahim, M.; Ahmad, M. Synthesizing data using variational autoencoders for handling class imbalanced
deep learning. In Proceedings of the International Conference on Analysis of Images, Social Networks and Texts, Kazan, Russia,
17–19 July 2019; pp. 270–281.
8. Elreedy, D.; Atiya, A.F. A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for handling class
imbalance. Inf. Sci. 2019, 505, 32–64. [CrossRef]
9. Oberle, B.; Ogle, K.; Zanne, A.E.; Woodall, C.W. When a tree falls: Controls on wood decay predict standing dead tree fall and
new risks in changing forests. PLoS ONE 2018, 13, e0196712. [CrossRef]
10. Tallo, T.E.; Musdholifah, A. The Implementation of Genetic Algorithm in Smote (Synthetic Minority Oversampling Technique)
for Handling Imbalanced Dataset Problem. In Proceedings of the 2018 4th International Conference on Science and Technology
(ICST), Yogyakarta, Indonesia, 7–8 August 2018; pp. 1–4. [CrossRef]
11. Moayedikia, A.; Ong, K.-L.; Boo, Y.L.; Yeoh, W.G.; Jensen, R. Feature selection for high dimensional imbalanced class data using
harmony search. Eng. Appl. Artif. Intell. 2017, 57, 38–49. [CrossRef]
12. Maldonado, S.; López, J. Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM
classification. Appl. Soft Comput. 2018, 67, 94–105. [CrossRef]
13. Maldonado, S.; Weber, R.; Famili, F. Feature selection for high-dimensional class-imbalanced data sets using Support Vector
Machines. Inf. Sci. 2014, 286, 228–246. [CrossRef]
14. Ng, W.W.; Hu, J.; Yeung, D.S.; Yin, S.; Roli, F. Diversified sensitivity-based under-sampling for imbalance classification problems.
IEEE Trans. Cybern. 2014, 45, 2402–2412. [CrossRef] [PubMed]
15. Sáez, J.A.; Krawczyk, B.; Woźniak, M. Analyzing the oversampling of different classes and types of examples in multi-class
imbalanced datasets. Pattern Recogn. 2016, 57, 164–178. [CrossRef]
16. González, S.; García, S.; Lázaro, M.; Figueiras-Vidal, A.R.; Herrera, F. Class Switching according to Nearest Enemy Distance for
learning from highly imbalanced data-sets. Pattern Recognit. 2017, 70, 12–24. [CrossRef]
17. Cao, L.; Shen, H. Imbalanced data classification using improved clustering algorithm and under-sampling method. In Proceedings
of the 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, Gold Coast, Australia,
5–7 December 2019.
18. Cheng, F.; Zhang, J.; Wen, C.; Liu, Z.; Li, Z. Large cost-sensitive margin distribution machine for imbalanced data classification.
Neurocomputing 2016, 224, 45–57. [CrossRef]
19. Cao, C.; Wang, Z. IMCStacking: Cost-sensitive stacking learning with feature inverse mapping for imbalanced problems.
Knowl.-Based Syst. 2018, 150, 27–37. [CrossRef]
20. Ohsaki, M.; Wang, P.; Matsuda, K.; Katagiri, S.; Watanabe, H.; Ralescu, A. Confusion-Matrix-Based Kernel Logistic Regression for
Imbalanced Data Classification. IEEE Trans. Knowl. Data Eng. 2017, 29, 1806–1819. [CrossRef]
21. Sun, Z.; Song, Q.; Zhu, X.; Sun, H.; Xu, B.; Zhou, Y. A novel ensemble method for classifying imbalanced data. Pattern Recognit.
2015, 48, 1623–1637. [CrossRef]
22. Feng, W.; Huang, W.; Ren, J. Class Imbalance Ensemble Learning Based on the Margin Theory. Appl. Sci. 2018, 8, 815. [CrossRef]
23. Chen, Z.; Lin, T.; Xia, X.; Xu, H.; Ding, S. A synthetic neighborhood generation based ensemble learning for the imbalanced data
classification. Appl. Intell. 2018, 48, 2441–2457. [CrossRef]
24. Japkowicz, N. The class imbalance problem: Significance and strategies. In Proceedings of the 2000 International Conference on
Artificial Intelligence (IC-AI’2000), Las Vegas, NV, USA, 26–29 June 2000.
25. Zhao, X.; Liang, J.; Dang, C. A stratified sampling based clustering algorithm for large-scale data. Knowl.-Based Syst. 2019, 163,
416–428. [CrossRef]
26. Available online: https://www.nal.usda.gov/data/find-data-repository (accessed on 10 October 2023).
27. Wang, W.; Zhao, Y.; Zhang, T.; Wang, R.; Wei, Z.; Sun, Q.; Wu, J. Regional soil thickness mapping based on stratified sampling of
optimally selected covariates. Geoderma 2021, 400, 115092. [CrossRef]
28. Alogogianni, E.; Virvou, M. Handling Class Imbalance and Class Overlap in Machine Learning Applications for Undeclared
Work Prediction. Electronics 2023, 12, 913. [CrossRef]
29. Wu, Z.; Wang, Z.; Chen, J.; You, H.; Yan, M.; Wang, L. Stratified random sampling for neural network test input selection. Inf.
Softw. Technol. 2023, 165, 107331. [CrossRef]
electronics
Article
Comparison of Selected Machine Learning Algorithms in the
Analysis of Mental Health Indicators
Adrian Bieliński, Izabela Rojek * and Dariusz Mikołajewski
Abstract: Machine learning is increasingly being used to solve clinical problems in diagnosis, therapy
and care. Aim: the main aim of the study was to investigate how the selected machine learning
algorithms deal with the problem of determining a virtual mental health index. Material and Methods:
a number of machine learning models based on Stochastic Dual Coordinate Ascent, limited-memory
Broyden–Fletcher–Goldfarb–Shanno, Online Gradient Descent, etc., were built based on a clinical
dataset and compared based on criteria in the form of learning time, running time during use and
regression accuracy. Results: the algorithm with the highest accuracy was Stochastic Dual Coordinate
Ascent, but although its performance was high, it had significantly longer training and prediction
times. In terms of learning and prediction time, the fastest algorithm, though slightly less accurate,
was the limited-memory Broyden–Fletcher–Goldfarb–Shanno. The same dataset was also analyzed
automatically using ML.NET. Findings from the study can be used to build larger systems that
automate early mental health diagnosis and help differentiate the use of individual algorithms
depending on the purpose of the system.
Keywords: computer science; artificial intelligence; machine learning; burnout; clinical reasoning
Figure 1. Number of scientific publications: (a) concerning clinical applications of machine learning
(total number of publications: 103,017), (b) with keywords “machine learning” and “clinical problem
solving” (total number of publications: 113), (c) with keywords “machine learning” and “diagnosis”
(total number of publications: 37,242), (d) with keywords “machine learning” and “prediction”
(total number of publications: 50,619), and (e) with keywords “machine learning” and “mental
health” (total number of publications: 2332).
88
Electronics 2023, 12, 4407
Related Publications
There are many articles in the literature on the virtual mental health index. Each of
them stands out from the others, approaching the topic from a different point of view.
One article addresses the topic of e-health and modern technologies used in mental health
care [8,9]. The stated aim of that article is to present issues related to e-health and its elements
used in the diagnosis and treatment of patients with mental disorders. The article points out
that there is a lot of enthusiasm for e-health around the world, which may be related to its
transformative potential for the healthcare system [8,9]. The
article points out that e-health solutions have been shown to be effective in preventing,
diagnosing and treating patients with a variety of illnesses, both physical and mental [9],
including substance abuse, depression, bipolar disorder, anxiety, stress and/or suicidal
thoughts. This article adopts the World Health Organisation’s (WHO) definition of e-health.
In addition, differences between the original and the newer definition are pointed out, as
the newer definition describes it as the use of electronic means of communicating health-
related information, resources and services, whereas the original definition presented the
concept as the use of information technology, locally and remotely, in support of health and
related fields. The newer definition according to the WHO also includes electronic health
records, mobile health and health analytics. An important change was also indicated in the
context of the patient–professional relationship, i.e., the patient participates as a partner
in the diagnosis and treatment process, rather than being merely a passive figure. An
increase in patients’ responsibility for their own treatment, an increase in their involvement
in treatment decisions or a tendency to use strengthening and improvement exercises
were also noted. It was also mentioned that inviting the patient into the e-health system
does not imply patient involvement. The studies mentioned in this article identified three
different types of involvement: active, partner and submissive [8,9]. Mobile apps used
in practice were also identified, including for practicing stress management skills, in the
diagnosis and treatment of depression, and as an aid to screening. The cited authors
indicated that apps could be used to monitor mental status and mood, as well as bipolar
affective disorder [8,9]. This article presents modern technology as an opportunity for the
development of medicine, including in the context of mental health. The article draws
on a number of sources, indicating that these are not isolated, exceptional situations. It
is noteworthy that it was written before the onset of the problems associated with the
COVID-19 pandemic. This article provides an interesting insight into the applications of
technology not only in treatment but also in prevention. In contrast, another article [10]
deals with the use of ML techniques to predict stress in active workers. As an introduction,
the prevalence of mental disorders among the working class was highlighted, with a clear
upward trend when looking at the percentage of employees who experience depressive and
anxious states. It was concluded that the greatest emphasis must be placed on maintaining a
stress-free atmosphere in order to achieve better productivity and well-being of employees.
The authors [10] used the results of a survey of technology employees in 2017, with which
they trained various models for their analyses. The original data consisted of 750 responses
from people from different technical departments in the form of 68 attributes related to
private life and work. A data cleaning exercise was carried out, which left 14 parameters, in
addition to which a one-hot encoding (1 of n) was used to represent some fields as numeric.
In addition, the text responses ‘Yes’ were given a value of 1, ‘No’ a value of 0, and ‘Maybe’ a
value of 0.5. NaN values were replaced by 0, and nominal data were converted to numeric
using a label encoder. The authors chose models for training that had already been tested
in classification problems, implementing them in Python using the Scikit-learn library:
• Logistic regression;
• K-nearest-neighbor method;
• Decision trees;
• Random forest;
• Boosting (increasing the effectiveness of existing models);
• Bagging.
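The preprocessing steps described above (mapping the text answers 'Yes'/'No'/'Maybe' to 1/0/0.5, replacing NaN values with 0, and 1-of-n encoding of nominal fields) can be sketched in plain Python. This is an illustrative reconstruction of the stated mapping rules, not the code of [10]:

```python
def encode_answer(value):
    """Map survey text answers to numbers, as described in [10]:
    'Yes' -> 1.0, 'No' -> 0.0, 'Maybe' -> 0.5; missing (None/NaN) -> 0.0."""
    if value is None or value != value:      # value != value catches float('nan')
        return 0.0
    return {"Yes": 1.0, "No": 0.0, "Maybe": 0.5}.get(value, 0.0)

def one_hot(value, categories):
    """1-of-n (one-hot) encoding for a nominal field."""
    return [1.0 if value == c else 0.0 for c in categories]
```

In [10] the equivalent steps were performed with standard Scikit-learn utilities (e.g., a label encoder for nominal data); the sketch above only mirrors the mapping rules stated in the text.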
• Misfit;
• Moves;
• Myfitnesspal;
• Strava [11].
Based on the data collected, it was discovered that:
• The daily activity time recorded by wearable devices was greater than that
derived from the mobile phone app;
• Of the 43 participants from whom at least three daily activity observations were ob-
tained, 11 of them had at least 20% missing data between the first and last observation,
but this did not show a relationship with DASS-21 scores;
• For the remaining 32 participants, entropy techniques were used, which initially
showed no significant relationship between data and DASS-21 scale scores. It was not
until splitting into two equal groups in relation to the amount of data that a significant,
positive correlation was detected between the DASS-21 anxiety subscale and entropy
in those with more data [11].
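The article [11] does not specify which entropy measure was applied to the activity data. As a hedged illustration, one common choice is the Shannon entropy of a binned daily-activity series, which quantifies how irregular the activity pattern is:

```python
from math import log2

def shannon_entropy(values, n_bins=8):
    """Discretise a daily-activity series into equal-width bins and compute
    the Shannon entropy of the bin frequencies. Constant series give 0;
    more irregular series give higher values (up to log2(n_bins))."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0          # avoid zero width for flat series
    counts = [0] * n_bins
    for v in values:
        b = min(int((v - lo) / width), n_bins - 1)
        counts[b] += 1
    total = len(values)
    return -sum(c / total * log2(c / total) for c in counts if c)
```

The bin count and binning scheme here are illustrative assumptions, not parameters reported in [11].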
The authors [11] point to the lack of standardized systems for continuous mental
health monitoring, which, together with continued monitoring in specific time windows,
has contributed to the escalation of the problem. They note that people with mental health
conditions are generally willing to share information from their mobile phones to help
with research into these conditions, including serious illnesses. The authors present their
work as a proof of concept for continuous mental health monitoring, but note the
challenges of privacy, assessment and clinical integration and inclusion that
would need to be addressed before it is more widely accepted. Another article [12], which
deals with the determination of a voice-based mental health indicator using a mind-state
observation system, explores the validity of such an approach. It draws attention to the huge
cost of mental illness in developed countries and the need for early detection technology
for depression and stress. Light is also shed on the current state of screening methods
in the context of mental illness, including general health questionnaires such as the
General Health Questionnaire (GHQ) and the Beck Depression Inventory (BDI).
The effectiveness of such approaches in assessing disease conditions in the early stages
was highlighted, and the problems of reporting bias, i.e., the effect of consciously or
unconsciously under- or overestimating a patient’s self-report, as well as the problem of
reduced detection rates of mental illness in organizations with established hierarchies, were
also noted. The authors of [12] report on their active research and work on voice-based
mental health estimation. They list additional advantages of this approach:
• Ease of application;
• Possibility to monitor day by day, which conventional methods do not allow.
They have developed a software development kit (SDK) called MIMOSYS
(https://medical-pst.com/en/products/mimosys/, accessed on 11 September 2023), whose
features include:
• Recording a voice from a microphone;
• Analyzing this voice;
• Determining a health indicator based on this.
To enable daily monitoring, the authors developed a mobile app using MIMOSYS.
The aim of the study was to compare the indicator defined in the app with the BDI indicator.
The study was carried out with the support of the local authority, which provided mobile
phones with the mobile app installed for 50 company employees. The test participants had
to record their voices by reading out ready-made phrases and talking using the device they
were given. In addition, a BDI test was conducted at the beginning of the experiment. The
voice analysis was based on the fact that people with mental illness show changes in the
expression of emotions and changes in the proportions of the components of the voice. The
four components hidden in the voice—anger, sadness, joy and calmness—were calculated
from the characteristics of the recorded voice. In addition, the degree of excitement of the
respondent was determined. Taking these values into account, a short-term and a medium-
term index of psychological well-being was determined, the latter based on short-term
indices collected over a two-week period. As a result of the experiment, the correlation
was determined to be negative, with an absolute value of 0.208 for the short-term index and
0.285 for the medium-term index. An even lower correlation coefficient, below 0.2, was obtained
for telephone calls [12]. For the optimal cut-off, the following values of sensitivity,
specificity and accuracy were obtained when analyzing the ROC curve:
• 0.795; 0.643; 0.660 for the short-term indicator;
• 1.000; 0.605; 0.646 for the medium-term indicator [12].
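The sensitivity/specificity/accuracy triples above correspond to operating points on a ROC curve. A minimal sketch of how an optimal cut-off can be selected follows, using Youden's J statistic and hypothetical scores; [12] does not disclose which selection criterion was actually used:

```python
def roc_operating_points(scores, labels):
    """For each candidate cutoff, classify score >= cutoff as 'positive' and
    report (cutoff, sensitivity, specificity, accuracy).
    Assumes both classes are present in `labels` (1 = positive, 0 = negative)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        sens = tp / pos
        spec = (neg - fp) / neg
        acc = (tp + (neg - fp)) / (pos + neg)
        points.append((t, sens, spec, acc))
    return points

def youden_optimal(points):
    """Pick the operating point maximizing Youden's J = sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1)
```

Note that for an index where *lower* values indicate worse mental health (as in [12]), the comparison direction would simply be reversed.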
In the context of this research, the weak negative correlation between the indices from
the app and the BDI was understandable, as a lower mental health index was associated
with a higher rate of depression. Finally, the performance of the method in distinguishing
between individuals with a high BDI was shown to confirm the appropriateness of the
method. The efficiency of data accumulation was also noted, and furthermore, the results
indicated that such a system could complement routine screening. However, the authors
have set their sights on the commercialization of the product, as they do not disclose details
in the form of the algorithms used or the scheme of operation of the system. Furthermore,
it is not possible to download this toolkit without first contacting them via a form, which
presumably means that it is made available for a fee. In addition, the library (Sensibility
Technology) underpinning this software is also unavailable.
In [13], mental health before and during the COVID-19 pandemic was compared using
a large probability sample from the UK population. The coronavirus and the measures
used to slow its spread had a serious impact on people’s livelihoods, incomes and debts,
and were associated with serious concerns about an uncertain future. The authors of
this publication [13] drew attention to the limited research on mental health during the
pandemic, due to problems such as:
• Use of incomplete samples;
• Use of unverified or modified assessment tools;
• Lack of comparable pre-pandemic data to measure change.
Their study [13] was based on a large-scale survey conducted since 2009, including
people aged 16 years and older. In addition, invitations to participate in the COVID-19
online survey were sent to participants in the last two series of surveys via emails, text
messages and even letters. The pre-pandemic health assessment was based on data collected
since 2014, and the data included results from the GHQ-12 questionnaire (a valid tool for
assessing general mental health problems in the past two weeks, particularly effective in
large-scale surveys). This scale was scored in two ways, the first based on a mean value
and the second based on a binary threshold above which individuals were judged to have
a significant level of mental health problems. The rating scale of this questionnaire for each
question ranged from 0 to 3 (from no deviation to significant deviation). The authors [13]
also carried out analyses by gender, age ranges, geographical location, or looking at the
data from an ethnic perspective. Estimates of total annual income, employment status,
living with a partner, age of the youngest child in the family were also analyzed, and a
group of people at risk and those involved in COVID-19 was identified. Years with a small
number of observations were excluded from the study, which may have led to less accurate
estimates. Changes in mental health were also assessed using regression [13]. These models
only included people for whom data from both the COVID-19 survey and at least one
pre-pandemic data set were available, therefore 16- and 17-year-olds were excluded from
this section. The value of the GHQ-12 index was constructed during the pandemic and
placed in a time-variable model where average scores were used as the baseline, instead
of using a binary index, as this would affect the statistical power of the results and their
generalization. The final model included the following factors:
• Age;
• Sex;
• Family income;
• Employment status;
• Living with a partner;
• Presence of risk factors [13].
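The two GHQ-12 scoring schemes mentioned earlier, an item mean and a binary caseness threshold, can be sketched as follows; the default threshold value here is purely illustrative, since [13] does not state the one used:

```python
def ghq12_scores(item_responses, caseness_threshold=12):
    """item_responses: 12 integers in 0..3 (0 = no deviation, 3 = significant
    deviation). Returns (mean_score, probable_case), where probable_case flags
    a summed total above the chosen threshold (threshold value illustrative)."""
    assert len(item_responses) == 12 and all(0 <= r <= 3 for r in item_responses)
    total = sum(item_responses)
    return total / 12, total > caseness_threshold
```

The first value corresponds to the mean-based scoring used as the baseline in the regression models; the second to the binary indicator the authors avoided in those models for statistical-power reasons.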
Various patterns related to variables have been detected, including [13]:
• Higher GHQ-12 scores in women;
• Higher scores in younger age groups;
• Slight differences in ethnicity (apart from the difference between Asians and white
British—Asians scored higher);
• Slightly lower results were recorded outside cities;
• Higher scores in low-income families;
• Unemployed and professionally inactive people scored higher than employed and
retired people;
• People without a partner and with young children had higher scores, as did the
risk groups;
• Significant increase in average scores was noticed comparing the state before and
during the pandemic [13].
The authors present their publication as one of the first in their country to measure
the impact of the pandemic on the mental health of the population. The increase in mental
health problems was not evenly distributed among the designated groups. However, towards the end,
they conclude that the increase was not significant, but point out the need for further
studies spread over time, even postponed by half a year. They note that although GHQ-12
is a screening tool, it is not a clinical diagnosis. In the publication [14], it was mentioned that
in the coming years a radical change will be needed, consisting of attaching a mental health
profile to the patient’s record in order to provide better treatment and faster recovery.
It was also noted that there has already been discussion about how medical predictive
analytics could revolutionize healthcare globally. Factors affecting mental health include:
• Globalization;
• Pressures in the workplace;
• Competition [14].
The authors of [14] claim that the K-nearest neighbors method, the naive Bayes
classifier, or regression can be used to build the model. In their approach to identifying
mental health, they used classification and clustering algorithms. They note the need for
early diagnosis of deviations in mental health. The WHO report urged the nations of the
world to harness the power of knowledge and technology to tackle mental health. They list
some of the mental health assessment tools:
• Questionnaires;
• Sensors of wearable devices;
• Biological signals [14].
They also mention work on statistical relationships between mental health and other
parameters, including:
• Educational achievements;
• Socioeconomic achievements;
• Satisfaction with life;
• Quality of interpersonal relations.
They also list various assessment methods [14] appearing in other works:
• Regression analysis;
• K-nearest neighbors method;
• Decision trees;
• Support vector method;
• Fuzzy logic;
• K-means method [14].
In their work [14], they started the analysis with clustering in order to better understand
the data, obtaining certain groups, though without attaching any interpretation. They list
and describe commonly used clustering methods:
• K-means;
• Hierarchical;
• Based on density;
• And their variants [14].
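Of the clustering methods listed above, K-means is the simplest to illustrate. A minimal pure-Python sketch follows (an illustration of the technique, not the implementation used in [14]):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: repeatedly assign each point to its nearest centroid
    (squared Euclidean distance), then recompute each centroid as the mean
    of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # initialise from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):
            if cl:                               # keep old centroid if cluster empties
                centroids[j] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centroids, clusters
```

On two well-separated groups of points the centroids converge to the group means, which is the behaviour the validity indicators mentioned below are designed to quantify.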
In addition, they presented frequently used indicators for validating clustering and
applied the concept of the Mean Opinion Score (MOS) scale, used for subjective quality
assessment. Their questionnaire consisted of 20 questions, posed to two populations: the
first included 300 people aged 18 to 21, and the second 356 people aged 22 to 26. The rating
scale for each question was five-point, from 1 (almost never) to 5 (almost always). The
division into a set of training and test data was in the ratio of 80:20. In terms of validity, the
best of all models were: bagging and random forest (0.90), slightly worse support vectors
and K-nearest neighbors (0.89), and even worse logistic regression (0.84) and decision tree
(0.81). The worst result was achieved by the naive Bayes classifier (0.73). It should be noted
that the bagging algorithm uses multiple decision trees, each trained on a subset of the
data drawn by sampling with replacement. The remaining, undrawn (out-of-bag) data becomes
the testing set. For the already-built tree models, voting is used to obtain the final answer. The authors [14]
pointed out that the quality of the features affects the reliability of the produced models,
and they also propose the use of a feature subset selection strategy to shorten the learning
time, or fuzzy logic when the number of classes is increased. In addition, they propose
recursive neural networks as a possible option for larger data sets, also ensuring high
accuracy. The authors of the publication [15], on the other hand, note the lack of a global
definition for positive mental health, presenting various approaches to this issue. They
mention the observation that definitions of good mental health are, and should be, to
some extent context-dependent. The Public Health Agency of Canada, mentioned by the
authors of [15], refers to positive mental health as the ability to feel, think and act in a way
that strengthens the ability to enjoy life and cope with the problems encountered. Keyes
describes it in a slightly different way, suggesting a definition of the syndrome of signs of
positive feelings and positive functioning in life. The authors [15] note that a positive state
of mental health is not synonymous with the absence of mental illness. This is the short
version of the Mental Health Continuum (MHC), based on the concept of two related but
distinguishable dimensions. The authors cite successful tests of this scale in countries such
as Poland, Italy, Brazil and the United States. Many indicators of positive mental health
have been identified in populations, including aspects such as general health, physical
activity, sleep, substance use, violence or discrimination. For young people, factors such
as relationships with peers or support from teachers are particularly important. Similarly,
income, employment and place of residence were positively associated with good mental
health. In their study, the authors [15] examined 5399 students from grades 8 and 10.
All of them were willing to answer questions, and 92% of students answered all of them.
The questionnaire used in the study was based on the Swedish version of the Survey of
Adolescent Life in Vestmanland, which also included a short version of the MHC and
other questions related to general health, substance abuse, exposure to technology, school
life and socioeconomic background. The wording of several questions was changed to better
fit the Chinese context. The data obtained were analyzed using SPSS 22 software, using
multivariate logistic regression, likelihood ratios and 95% confidence intervals for the
analysis of variables related to positive mental health as a dependent variable. In the
beginning, the collinearity of the variables was checked by Spearman’s correlation analysis.
Further, insignificant indicators were dropped until the model was statistically significant.
Nagelkerke’s Pseudo-R2 statistic and model fit were also calculated. Their research [15]
extends knowledge about the prevalence of positive mental health among Chinese minors,
as well as about the indicators of positive mental health. As a result, information was
obtained that the surveyed group of Chinese people was significantly healthier in terms
of mental health than in similar studies in other countries. The authors acknowledge that
their study covered only one city in China, so further research in different regions will be
needed. On the other hand, the authors of the publication [16] on economic difficulties
and reported mental health problems during the COVID-19 epidemic point to the problem
of isolation increasing the risk of loneliness, and the need to assess the links between the
labor market and mental health, also in order to understand the impact of the pandemic on
existing socioeconomic inequalities. Their considerations [16] include factors related to
changes in workload, income decline and job loss, as well as three mental health issues:
• Depression;
• Loneliness;
• Fear for one’s health [16].
The data came from employee surveys in Italy, Spain, the Czech Republic, Slovakia,
the Netherlands and Germany from March and April 2020. The research also took into
account the International Socio-Economic Index (ISEI). It expresses the relative position of
the profession in the labor market, on a scale of 10 to 89 points. During the analyses [16], it
was noted that occupations with an ISEI index below 30 points were characterized by a
much higher risk of economic difficulties—about twice as high as medium and high-rated
occupations (ISEI up to about 80 points). In addition, freelance and self-employment
increased the likelihood of a reduction in workload by more than 32 percentage points, a
decrease in income by 42 percentage points, and a loss of a job by just under 20 percentage
points, compared to typical workers. Similarly, in the comparison between employees and
employers, reductions in workload and income were more pronounced in the first group.
In the final part of the work [16], they point out that the indicators used by them are not
clinically confirmed, which makes it impossible to compare them on an equal basis, but
they are an assessment of feelings about mental health. In addition, they consist of single
questions, which makes them a non-detailed assessment of mental health. The authors
explain that this is due to the data in the questionnaires not being designed to capture
mental health, so researchers have had to rely on crude indicators. On the other hand, in
the paper [17] attention was drawn to incomplete or partial evidence of the connection
between mental illnesses and work. Therefore, the authors assumed that the mental health
of an individual depends on characteristics such as:
• Personality;
• Sex;
• Own results at work;
• Loss of a job by a family member [17].
They developed [17] two models, one for the issue of the impact of job loss by a
partner on the spouse, and the other describing the effects of parental job loss on underage
children. They also sought to limit biasing effects in their study, based on data from around
7700 Australian households. The data consisted of responses to the Household, Income
and Labor Dynamics in Australia (HILDA) survey. In order to develop two models, two
separate data samples were created [17]—one for married couples, the other for parent–
child pairs. Part of the data included answers to the Self-Completion Questionnaire (SCQ),
which the researchers used in both the first data sample and the second. The MHI-5
(MHI—Mental Health Inventory) was used as the output variable [17], consisting of five
questions on a 6-point scale. These questions were as follows:
• Were you a nervous person?
• Have you felt so down that nothing could cheer you up?
• Did you feel calm and composed?
• Have you felt depressed?
• Were you a happy person? [17].
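A common way to turn five such MHI-5 items into the 0–100 score described below is to reverse the negatively worded items and rescale the sum linearly. In the sketch that follows, the item polarity and the exact rescaling are assumptions following the usual MHI-5 convention, not details given in [17]:

```python
# Assumed negatively worded items (0-based positions in the question list above):
# "Were you a nervous person?", "Have you felt so down that nothing could
# cheer you up?", "Have you felt depressed?"
NEGATIVE_ITEMS = {0, 1, 3}

def mhi5_score(responses):
    """responses: five integers in 1..6 (6-point frequency scale).
    Reverses the negative items so that higher always means better,
    then rescales the 5..30 raw sum to 0..100 (higher = better health)."""
    assert len(responses) == 5 and all(1 <= r <= 6 for r in responses)
    adjusted = [7 - r if i in NEGATIVE_ITEMS else r
                for i, r in enumerate(responses)]
    raw = sum(adjusted)              # ranges 5 .. 30
    return (raw - 5) * 100 / 25      # ranges 0 .. 100
```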
The scores on this scale ranged from 0 to 100, where the lower the value, the worse
the mental health. As a result of these studies [17], it turned out that a wife’s job loss
had no major effect on her husband’s score, while wives whose spouses lost their jobs had
scores between 2 and 2.7 points lower than women whose husbands still had jobs. However, the
authors, taking into account other factors, indicate that this is not a statistically significant
result. It was only when differentiating between groups with persistent unemployment,
financial stress and dissatisfaction with relationships that a significant effect of losing a job
by husbands was found. They found that continued unemployment caused a significant
decline in mental health between studies and that the financial stress situation did not
significantly contribute to worse mental health, while both women and men experienced
worse mental health as dissatisfaction with their partner increased compared to previous
answers. However, looking at the results [17] regarding the mental health of children after
the loss of a job by one of the parents, the job loss generally did not have a significant impact on their mental health.
A drop of 6.6 points was recorded when the mother was unemployed between examinations,
a much larger effect than was observed for other variables. Comparing the
mental state of boys and girls, it was shown that the deterioration of mental health was
greater in girls, especially when the mother was unemployed. However, in the work [18],
the mental health of minors affected by natural disasters is compared with that of their
peers who have not experienced such events. Their study uses data on students from two
Canadian cities located in the same province (Fort McMurray and Red Deer). In the surveys
conducted in these cities, six questionnaires common to both studies were used, including:
• Patient Health Questionnaire, Adolescent version (PHQ-A);
• Hospital Anxiety and Depression Scale (HADS);
• CRAFFT questionnaire;
• Tobacco Use Questionnaire;
• Rosenberg’s self-esteem scale;
• Kidscreen questionnaire [18].
The authors [18] performed a statistical analysis based on these questionnaires, and
also compared the percentage odds of:
• Depression;
• Suicidal thoughts;
• Medication use;
• Alcohol/stimulant use;
• Tobacco use;
• Any of the following: depression, anxiety, or alcohol/stimulant use.
An additional limitation was the use of only complete answers for each measure, i.e.,
without omitted questions. A comparison [18] of indicators between the two regions found
significant differences in 8 out of 12 measures of mental health status. The rates of possible
depression were significantly higher in the city that experienced a natural disaster, as were
those for suicidal thoughts and tobacco use. On the other hand, the self-esteem and quality
of life scales (Rosenberg and Kidscreen, respectively) were much lower, but this is related to
the nature of their questions. The conclusions [18] include the observation that this research
reinforces the need for policies and programs to care for mental health among minors,
especially after natural disasters, in order to reduce their vulnerability and build a positive
state of mental health. They also note that it would be useful to compare these studies
with data for post-traumatic stress symptoms from both cities, as the authors did not have
such data from the city of Red Deer. They also indicate that minors are very vulnerable
to the adverse impact of natural disasters. Summing up the studied literature, it can be
noted that these are extremely diverse studies: they address many aspects related to mental
health indicators, covering both positive and negative mental health. In addition, a
variety of approaches were used, including voice data analysis, surveys based on
many different questionnaires, random forests, the bagging algorithm, the support vector
method, the K-nearest neighbors method, and statistical analysis. However, one should bear
in mind the need to expand research in the search for more effective algorithms that can be
used in this area.
The proposed solution can be used in a prototype preventive mental health medicine
system (Figure 2) for healthy people to monitor and detect the first symptoms of chronic
stress and burnout as early as possible, based on a combination of a generic standard and a
dynamic standard generated directly from the data set. Given the second opinion offered
by the ML system, it will support the activities of primary care physicians and psychology
and psychiatry specialists in their daily efforts to provide early diagnosis and treatment of
this group of conditions and will allow the selection and application of prevention and, if
necessary, minimize the duration of potential therapy and reduce its cost [19].
The novelty and contribution lie in the application and matching of ML methods to the
form and characteristics of test data describing chronic stress and job burnout. The
pre-selection of methods and their initial, facilitated matching to the presumed criteria
is key, as it will support the development of preventive mental health medicine systems.
The research aims to determine a virtual indicator of mental health using selected ML
algorithms, as well as to determine their effectiveness in this task by checking the learning
time, operation time and accuracy. In addition, the following research hypotheses will be verified:
• Choice of the ML method affects the regression accuracy, learning time and running time;
• Differences in accuracy are relatively small—up to about 10 percentage points differ-
ence between methods.
Mental well-being data were used, including people’s gender, age, length of service
and their responses to the three questionnaires: Perceived Stress Scale (PSS), Maslach
Burnout Inventory (MBI) and Satisfaction with Life Scale (SWLS).
The subject of the study was data from a set of 99 people, information about which
was divided into 4 subgroups, each in a separate MS Excel sheet: “Patient data”, “PSS10”,
“MBI”, and “SWLS”. The first of the above sheets includes the patient’s gender, age and
work experience. The second sheet contains answers to 10 questions from the PSS set, on
a scale of 0 to 4, where 0 corresponds to “never”, 1—“almost never”, 2—“sometimes”,
3—“quite often”, and 4—“very often”. The third sheet contains answers to 22 questions
from the MBI set, on a scale of 0 to 6, where 0 corresponds to “never”, 1—“several times a
year”, 2—“once a month”, 3—“several times a month”, 4—“once a week”, 5—“several times
a week”, and 6—“every day”. The fourth sheet contains answers to 5 questions from the
SWLS set, on a scale of 1 to 7, where 1 corresponds to “strongly disagree”, 2—“disagree”,
3—“slightly disagree”, 4—“neither agree nor disagree”, 5—“agree slightly”, 6—“agree”,
and 7—“strongly agree”. Based on these four sheets, a CSV (Comma Separated Values) file
was created and used in the application, since an Excel file with the .xls extension cannot
be loaded directly with the available NuGet packages, which are otherwise satisfactorily
documented for use in the project. This CSV file uses a semicolon (;) as the
delimiter, which has been included in the app as the default delimiter value. The total is
based on all answers from PSS, MBI and SWLS sets. All but the first column of the CSV file
contain numeric values, while the first column can only contain two options: M (Male) or
F (Female).
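Although the application itself is written in C#, the CSV layout just described can be illustrated in a language-neutral way in Python. The column order and the hypothetical sample row below are assumptions based on the sheet descriptions above, not the authors' actual file:

```python
import csv
import io

# Hypothetical row: sex; age; work experience; 10 PSS items; 22 MBI items; 5 SWLS items.
SAMPLE = ("M;34;10;" + ";".join(["2"] * 10) + ";"
          + ";".join(["3"] * 22) + ";" + ";".join(["5"] * 5))

def parse_row(line):
    """Parse one semicolon-delimited record into numeric features and the
    questionnaire total (sum of all PSS, MBI and SWLS answers)."""
    sex, age, seniority, *answers = next(csv.reader(io.StringIO(line), delimiter=";"))
    pss, mbi, swls = answers[:10], answers[10:32], answers[32:37]
    return {
        "sex": 1.0 if sex == "M" else 0.0,   # simple binary encoding of M/F
        "age": float(age),
        "seniority": float(seniority),
        "total": sum(map(int, pss)) + sum(map(int, mbi)) + sum(map(int, swls)),
    }
```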
The study was approved by the Bioethics Committee No. KB 391/2018 at the Ludwik
Rydygier Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń.
Each participant in the study gave informed consent.
2.2. Methods
Two languages were used to develop the application: C# in .NET and Extensible
Application Markup Language (XAML), whereby:
• C# language was used to describe the actions performed by the program;
• XAML was used to develop the layout of the user interface in a Universal Windows
Platform (UWP) application, along with the naming of elements (which allows them
to be used in C# as variables) and the binding of events to specific functions in the
code behind the interface (code-behind).
A number of ML models based on Stochastic Dual Coordinate Ascent (SDCA), limited-
memory Broyden–Fletcher–Goldfarb–Shanno, Online Gradient Descent, etc., were built
based on a clinical dataset (PSS, MBI and SWLS) and compared based on criteria in the
form of learning time, running time during use and regression accuracy. The rationale
for choosing these particular algorithms lies in their popularity and in the authors’
previous experience and research on measuring long-term stress and burnout using the
aforementioned group of tests and AI [19–22]. Knowledge in the area of matching AI/ML
tools for the analysis, inference and prediction of stress and burnout measurements is still
nascent and no computational or theoretical basis can be cited as yet.
The predicted value was a virtual mental health index.
The data set was divided into a training set (70% of the samples) and a test set (30%
of the samples).
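A minimal sketch of such a 70/30 split (Python, with an assumed fixed shuffle seed for reproducibility; the authors' actual split procedure is not specified):

```python
import random

def split_70_30(samples, seed=0):
    """Shuffle and split samples into 70% training and 30% test subsets."""
    rng = random.Random(seed)  # fixed seed: an assumption for reproducibility
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

train, test = split_70_30(list(range(100)))
```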
The SDCA algorithm is a linear algorithm, meaning that it generates a model that calculates
results based on a linear combination of the input data and a set of weights. The model
weights are those parameters that are determined during training. In the general case, linear
algorithms are scalable, fast and have a low cost during training and during prediction.
This class of algorithms makes multiple passes through the training dataset [23]. It has
no parameters requiring manual tuning and has a clearly defined stopping criterion. It
performs well with its default settings and combines several desirable features, such as:
• The possibility of streaming learning, i.e., operating on data without having to load it
all into memory at once;
• Achieving satisfactory results with a small number of passes through the entire
data set;
• Not wasting computing power on zeros in sparse datasets [24].
It should be borne in mind that the results obtained with this algorithm are dependent
on the order of the training data, but the solutions obtained can be treated as equally good
between different executions of the algorithm [25]. This algorithm is a stochastic version of
Dual Coordinate Ascent (DCA). The basic version of the algorithm (DCA) optimizes a single
variable in each iteration without affecting the others. The SDCA version of the algorithm
pseudo-randomly selects the dual coordinate to optimize, based on a uniform probability
distribution [26].
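For concreteness, the closed-form dual update that SDCA uses for the (smooth) squared loss can be sketched as follows. This is a didactic Python stand-in under the standard ridge-regression formulation, not the ML.NET trainer the study used:

```python
import numpy as np

# Minimal SDCA sketch for ridge regression with the 0.5*(x.w - y)^2 loss:
# each step picks a dual coordinate, applies its closed-form update, and
# keeps the primal weights w in sync with the dual variables alpha.
def sdca_ridge(X, y, lam=0.1, epochs=50, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    alpha = np.zeros(n)            # one dual variable per training example
    w = np.zeros(d)                # primal weights, w = X.T @ alpha / (lam*n)
    sq_norms = (X ** 2).sum(axis=1)
    for _ in range(epochs):
        for i in rng.permutation(n):       # pseudo-random coordinate choice
            residual = y[i] - X[i] @ w - alpha[i]
            delta = residual / (1.0 + sq_norms[i] / (lam * n))
            alpha[i] += delta
            w += (delta / (lam * n)) * X[i]
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w_hat = sdca_ridge(X, y, lam=0.01, epochs=200)
# closed-form ridge solution for the same objective scaling, for comparison
w_ref = np.linalg.solve(X.T @ X + X.shape[0] * 0.01 * np.eye(3), X.T @ y)
```

The update formula is the standard closed-form step for the squared loss from the SDCA literature [26]; the data, regularization strength and epoch count are illustrative.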
LBFGS is an abbreviation for limited-memory Broyden–Fletcher–Goldfarb–Shanno,
an optimization algorithm based on BFGS but using limited Random Access Memory
(RAM) [27,28]: it does not store a matrix approximating the inverse of the Hessian
∇²f(x), instead using an intermediate approximation [28,29]. The calculation is based on an
initial approximation and an update rule that models local curvature information [27,28].
The original Broyden–Fletcher–Goldfarb–Shanno method, full BFGS, proposed by these
four authors in 1970, keeps the aforementioned matrix in memory, and the computational
cost of updating it is high, of the order of O(n²) [28,29]. As for the convergence of the
BFGS method, if the function is strongly convex and has a continuous second derivative,
the sequence of successive iterates x_k tends towards the global minimizer; furthermore,
when the Hessian is assumed to satisfy a Lipschitz condition, the rate of convergence is
superlinear [28,29], i.e., faster than linear. The convergence of the
LBFGS algorithm depends on the quality of the Hessian approximation, which is difficult
to achieve, and it has been observed in numerical observations that an appropriate guess of
the initial Hessian has a significant impact on the search direction and convergence [27,28].
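The memory saving described above comes from the two-loop recursion, which applies the implicit inverse-Hessian approximation using only the last m curvature pairs together with an initial Hessian guess. A compact Python sketch (illustrative, not the library implementation used in the study):

```python
import numpy as np

# L-BFGS sketch: instead of a dense inverse-Hessian (full BFGS, O(n^2) per
# update), keep only the last m (s, y) curvature pairs and apply the implicit
# approximation to the gradient via the two-loop recursion.
def lbfgs(f, grad, x0, m=5, iters=100, tol=1e-8):
    x, g = x0.astype(float), grad(x0)
    pairs = []                                   # stored (s, y, rho) triples
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        q, alphas = g.copy(), []
        for s, yv, rho in reversed(pairs):       # first loop: newest to oldest
            a = rho * (s @ q)
            alphas.append(a)
            q -= a * yv
        if pairs:                                # initial Hessian guess gamma*I
            s, yv, _ = pairs[-1]
            q *= (s @ yv) / (yv @ yv)
        for (s, yv, rho), a in zip(pairs, reversed(alphas)):  # oldest to newest
            q += s * (a - rho * (yv @ q))
        d = -q
        t, fx = 1.0, f(x)                        # backtracking (Armijo) search
        while f(x + t * d) > fx + 1e-4 * t * (g @ d):
            t *= 0.5
        x_new = x + t * d
        g_new = grad(x_new)
        s, yv = x_new - x, g_new - g
        if s @ yv > 1e-12:                       # keep only valid curvature pairs
            pairs.append((s, yv, 1.0 / (s @ yv)))
            pairs = pairs[-m:]
        x, g = x_new, g_new
    return x

# strongly convex quadratic test problem (illustrative)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_min = lbfgs(f, grad, np.zeros(2))
```

Scaling the first search direction by γ = sᵀy/yᵀy is one common choice of initial Hessian approximation; as noted above, this choice noticeably affects the search direction and convergence.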
The Online Gradient Descent (OGD) algorithm is a variation of the Stochastic Gradient
Descent (SGD) method used for online training, i.e., learning concepts incrementally by
processing examples from the training set one at a time: after each update, the algorithm
does not retain the last example but moves on to the next sample [29,30]. SGD uses an
iterative technique based on error gradients and can also update the weight vector using
the average of the observed data vectors as the algorithm progresses [31]. SGD is popular
for its simplicity, computational efficiency and convergence behavior, and the performance
of DL methods depends heavily on this algorithm. However, it is susceptible to the effects
of noisy data, which is especially noticeable in robotics, where robots cannot collect
enough data to negate these effects [32].
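A minimal Python sketch of the online scheme described above, where each example is seen once and discarded after the weight update (toy linear-regression stream; the learning rate and data are illustrative):

```python
import numpy as np

# Online Gradient Descent sketch for linear regression: each example is
# processed as it "arrives" and discarded after the weight update, so the
# whole stream never has to fit in memory at once.
def ogd_fit(stream, dim, lr=0.1):
    w = np.zeros(dim)
    for x, y in stream:                 # one pass, one example at a time
        err = x @ w - y                 # prediction error on this example
        w -= lr * err * x               # gradient step for the 0.5*err^2 loss
    return w

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
stream = [(x, x @ w_true) for x in rng.normal(size=(500, 2))]
w_hat = ogd_fit(stream, dim=2)
```

With noisy examples the same loop would jitter around the solution instead of settling, which is the noise sensitivity mentioned above.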
Three questionnaires were used to determine a virtual mental health index: PSS, MBI
and SWLS. Data from these questionnaires were used in the application to train the models
and determine metrics and statistics.
PSS is a scale developed by Cohen, Kamarck and Mermelstein in 1983, aimed at
respondents' self-assessment of the unpredictability of their lives, their lack of control
over them and the overload they feel. The original version has fourteen general questions
on a four-point scale, and the final score is obtained by reversing the scale for positively
valenced questions and then adding up the scores for all questions. In addition, two shorter
versions of the scale have been developed, the ten-question scale used in this work, as
well as a four-question scale [33]. Research on this instrument has been carried out
extensively around the world, including in China, Ethiopia, Iran and Greece, and the
results indicate that the scale can be reliably used in these countries. To validate the scale, Cohen
studied the responses of people of different ages, both genders and a variety of racial
backgrounds [34]. Similar information is presented by the authors of a Czech study, where
they briefly describe that all versions of the scale had previously been compared in a variety
of cultural and linguistic contexts and that these researchers agreed that the ten-question
scale was at least comparable to or better than the original version in terms of internal
consistency while noting a significant decrease in reliability of the four-question version,
which was attributed to it simply being too short [33].
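The scoring rule described above (reverse-score the positively worded items, then sum) can be sketched as follows. The reversed item positions (4, 5, 7 and 8) and the 0–4 response scale follow the commonly published PSS-10 key and should be treated as an assumption, not the authors' exact coding:

```python
# PSS-10 scoring sketch: positively worded items are reverse-scored, then all
# item scores are summed. Item positions 4, 5, 7, 8 and the 0-4 scale are an
# assumption based on the commonly published key.
REVERSED = {4, 5, 7, 8}             # 1-based positions of positive items

def pss10_total(answers):
    """answers: list of ten responses on the 0-4 scale, in item order."""
    total = 0
    for pos, a in enumerate(answers, start=1):
        total += (4 - a) if pos in REVERSED else a
    return total

score = pss10_total([2] * 10)
```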
The MBI (Maslach Burnout Inventory) was developed by Christina Maslach and
her team. In her article, she explains the concept of professional burnout: a syndrome of
emotional exhaustion and cynicism often found in people who work with other people, with
the increased sense of emotional exhaustion as the key component. She indicates that, as
their emotional resources become depleted, employees begin
to feel that they are not able to give their best; furthermore, they develop negative, even
cynical attitudes and feelings about their clients. The two aspects seem to be linked, and a
tendency to evaluate oneself negatively, especially in relation to one’s work, not feeling
satisfied with one’s achievements, is mentioned as a third effect related to professional
burnout [34]. Occupational burnout is characterized by high levels of emotional exhaustion,
dehumanization and low feelings of personal fulfillment. In addition, the authors point out
that occupational burnout and depressive states are related but not identical concepts, i.e.,
their characteristics do not fully overlap and the terms cannot be used interchangeably [35].
The version used in this study consists of three groups of questions covering these issues.
3. Results
The algorithm with the highest accuracy was Stochastic Dual Coordinate Ascent; although
its accuracy was high, it had significantly longer training and prediction times
(Figure 3a).
Figure 3. (a) General comparison of metrics, training times and predictions, (b) comparison of
selected metrics and (c) assessment of models (the three columns on the right side).
The fastest algorithm in terms of learning and prediction time, though slightly less accurate,
was the limited-memory Broyden–Fletcher–Goldfarb–Shanno (Figure 3b).
The first criterion considered was the model learning time, expressed in milliseconds.
The average, minimum and maximum values were taken into account. All three of these
times were the longest for the SDCA model and the shortest for the LBFGS model. This
means that the SDCA model performed the worst in this ranking,
and the LBFGS model performed best. It should be noted that while for the LBFGS and
OGD models, the difference between their maximum and minimum values was relatively
small (about 4% of the average, both for LBFGS and OGD), for the SDCA model it was
about 64% of the average. Another criterion was the prediction time for the entire data set,
expressed in milliseconds. The average, minimum and maximum values were taken into
account. This time was the lowest for the OGD model, but it differed only slightly from
the LBFGS model; both models reached a time slightly above 1 ms. On the other hand, the
average time for the SDCA model was about 44 times longer than for the OGD model, and
again there were larger differences between the maximum and minimum values for the
SDCA model (approximately 18% of the mean value). The average absolute error was the
lowest for the SDCA model and amounted to about 0.216, while it was the highest for the
OGD model, amounting to about 0.481 (more than twice as much). For the LBFGS model, in
turn, it was around 0.320, which corresponds to an increase of about 48% over the SDCA
result. For this criterion, as well as for the mean squared error and the root mean squared
error, the best results
were achieved by the SDCA model, and the worst by the OGD model. The ranking for the
coefficient of determination, for which a value closer to 1 indicates a better model, looks
similar. The last lines of the comparison show the number of occurrences for which
the absolute value of the difference between the rounded prediction and the value from
the dataset was 0, 1 or 2, respectively. Looking at the difference equal to 0, the best result
was obtained by the SDCA model, and the worst by the OGD model. For a difference of
1, the best result was obtained by the SDCA model (6 occurrences), and the worst by the
OGD model (41 occurrences). However, for the difference equal to 2, there was one such
occurrence for the LBFGS model (Figure 3c).
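The comparison above can be reproduced on toy numbers with a few lines of Python; the helper below computes MAE, RMSE, the coefficient of determination, and the counts of predictions whose rounded value differs from the true index by 0, 1 or 2 (the values are illustrative, not the study's data):

```python
import math

# Evaluation sketch matching the criteria described above: MAE, RMSE,
# R^2, and counts of |round(prediction) - actual| in {0, 1, 2}.
def evaluate(y_true, y_pred):
    n = len(y_true)
    errs = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errs)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                  # closer to 1 is better
    diffs = [abs(round(p) - t) for t, p in zip(y_true, y_pred)]
    counts = {d: diffs.count(d) for d in (0, 1, 2)}
    return mae, rmse, r2, counts

# toy true index values and predictions, for illustration only
mae, rmse, r2, counts = evaluate([3, 5, 4, 6], [3.2, 4.6, 4.1, 6.9])
```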
In this particular problem of determining a virtual mental health index, all three
models considered achieved comparable final results. Based on the criterion of model
learning time, and considering other factors (e.g., prediction time), the LBFGS model
would be the best choice. On the other hand, looking at metrics in the form of, among
other things, mean absolute error or coefficient of determination, the SDCA model, whose
biggest drawbacks are learning time and prediction time, would prove to be the best choice.
Although the OGD model achieved the best prediction time, it achieved the worst of the
results when looking at the metrics.
Looking at the results obtained, the differences between the ML methods used are
clearly visible, especially for learning time and the metrics. Furthermore, bearing in mind
that model accuracy increases as the coefficient of determination approaches one, the
differences between the methods amounted to a maximum of around 1.4 percentage points,
taking the difference between the maximum and minimum values relative to the maximum
possible value of 1 (which can be understood as 100%).
We compared the aforementioned results with those of an automated ML.NET analysis
(249 models checked, Tables 2 and 3).
Although the data lends itself to both prediction and classification, no single algorithm
proved good at everything; a thoughtful combination of different algorithms must be used
in automated analysis.
4. Discussion
A comparison of the three ML algorithms showed small differences in regression accuracy
(about 1.4 percentage points; by the criterion adopted in this work, less than 10 percentage
points). Compared with work [10], which dealt with a classification problem and revealed
accuracy differences of about 5.5 percentage points between six different methods, this
probably means that the chosen method has a small impact on regression or classification
accuracy.
The results of paper [14] are similar: all the algorithms used, except for the naive Bayes
classifier (the simplest one used, which probably fit this problem poorly), obtained
accuracy differences of at most 9 percentage points. Following the proposal in that paper,
continued research would need a feature-subset selection strategy so that the solution is
based on the highest-quality features. Applying such an approach successfully would
reduce learning time and potentially increase model reliability. In addition, the inclusion
of the patient's mental profile mentioned there can be considered accomplished, as the
data contain answers to a set of questions assessing the patient's mental health status.
When comparing with studies [13,16–18], it is important to note the lack of analysis of
the impact of individual factors on the virtual mental health index, considering particular
attributes such as age, gender, or length of service. This leaves an opportunity for further
research to establish trends, for example among different age groups, as in
the article [6]. In addition, further data would have to be collected, not only more numerous
but possibly also including the ISEI index, which expresses the relative position of the
occupation in the labor market, as in the study [16]. Regarding the study [17], the dataset
could be extended to include information on the dynamics of employment, or also the
household of the person surveyed. Looking at the study [18], it would be valuable to assess
the risk of problems such as depression, anxiety or the use of stimulants, which could be
baseline variables for the trained models.
Referring to the work [11], which addresses the problems of assessing health status
in discrete moments in time, mainly in terms of not being able to assess the impact of
the environment on the patient in real time, one could use data from apps and activity
monitoring devices of potential volunteers to derive models based on measured data. On
the one hand, this would make it possible to assess mental health on a continuous basis,
and on the other hand, it would make it independent of the patient’s self-assessment.
As for the voice-based mental health determination described in paper [12], while it
appears to be a promising solution, the authors did not present the ML methods used,
which, combined with the commercialization of the developed library and system, does not
allow these analyses to be extended. The idea is nevertheless intriguing, but realizing it
would require an appropriate selection of libraries and ML methods, as well as access to
voice data together with the corresponding determinations of the patients' mental
health status.
On the other hand, the article [15] observes that a positive mental health status does not
imply the absence of mental illness; this was taken into account in the Mental Health
Continuum scale, which has been tested successfully in various countries.
This is something to bear in mind, as mental illnesses can be hidden, both consciously
and unconsciously. It is also important to consider factors that are often
indicative of a patient’s mental state, such as their physical activity, sleep, use of stimulants,
and relationships with peers in the case of adolescents or relationships with co-workers
among adults.
In the study, learning time and prediction time serve as evaluation criteria. The tasks
performed do not require real-time operation, but with large databases and many
simultaneous system users, this parameter can become very important.
It is noteworthy that a variety of tools have been used in these papers: questionnaires,
such as the Depression Anxiety Stress Scale-21 and the Beck Depression Inventory;
algorithms (logistic regression, the k-nearest-neighbors method, decision trees, bagging,
support vector machines); and technology, including the Python language, the Scikit-learn
library, physical-activity-tracking mobile apps and wearable devices. In addition, many
of these papers did not state the programming language used, making it impossible to
base a technology choice on them.
Key findings in the area of ML-supported human mental health analysis have shown
that, despite the variety of tools that have been used in these papers, one leading approach
is lacking, both in the selection of tests and in the selection of ML-based aggregation
and analysis methods. This makes it difficult both to compare different approaches and
to extract the best ones (based on common criteria) for further development and use in
both simple predictive systems within preventive medicine and complex diagnostic and
monitoring systems within more complex specialized studies. This constitutes the unique
contribution of the current study compared with the existing literature: how to aggregate
test results into a virtual mental health index and how to select optimal ML methods for
its further use, providing a basis for further research, including by other groups of
clinicians and researchers. Our experience to date shows that this element of
technological support is lacking in clinical practice, hence interdisciplinary teams are
needed for further research.
Table 4. Directions of research on the virtual index of mental health with the use of ML algorithms [43–45].
This research can be a long and complicated process, but it can have significant benefits
in diagnosing, monitoring and managing patients’ mental health [46,47].
5. Conclusions
The ability of ML to identify burnout using passively collected electronic health record
(EHR) data and to predict future health status with an accuracy of more than 70% (for some
traits, more than 90%) demonstrates the usefulness of this group of methods in daily
clinical practice, which is worth developing.
The algorithms did not differ significantly from each other in terms of accuracy (about
1.4 percentage points) but differed more strongly in other parameters. The algorithm with
the highest accuracy was Stochastic Dual Coordinate Ascent; although its accuracy was
high, it had significantly longer training and prediction times. In contrast, the fastest
algorithm in terms of learning and prediction time, though slightly less accurate, was the
limited-memory Broyden–Fletcher–Goldfarb–Shanno.
Findings from the study can be used to build larger systems that automate early
mental health diagnosis and help differentiate the use of individual algorithms depending
on the purpose of the system.
Author Contributions: Conceptualization, A.B., I.R. and D.M.; methodology, A.B., I.R. and D.M.;
software, A.B., I.R. and D.M.; validation, A.B., I.R. and D.M.; formal analysis, A.B., I.R. and D.M.;
investigation, A.B., I.R. and D.M.; resources, A.B., I.R. and D.M.; data curation, A.B., I.R. and D.M.;
writing—original draft preparation, A.B., I.R. and D.M.; writing—review and editing, A.B., I.R.
and D.M.; visualization, A.B., I.R. and D.M.; supervision, I.R.; project administration, I.R.; funding
acquisition, I.R. and D.M. All authors have read and agreed to the published version of the manuscript.
Funding: The work presented in the paper has been financed under a grant to maintain the research
potential of Kazimierz Wielki University.
Data Availability Statement: Data are unavailable due to privacy and cybersecurity concerns.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Asatryan, B.; Bleijendaal, H.; Wilde, A.A.M. Toward advanced diagnosis and management of inherited arrhythmia syndromes:
Harnessing the capabilities of artificial intelligence and machine learning. Heart Rhythm. 2023, 20, 1399–1407. [CrossRef]
2. Kannampallil, T.; Dai, R.; Lv, N.; Xiao, L.; Lu, C.; Ajilore, O.A.; Snowden, M.B.; Venditti, E.M.; Williams, L.M.; Kringle, E.A.; et al.
Cross-trial prediction of depression remission using problem-solving therapy: A machine learning approach. J. Affect. Disord.
2022, 308, 89–97. [CrossRef] [PubMed]
3. Hong, N.; Liu, C.; Gao, J.; Han, L.; Chang, F.; Gong, M.; Su, L. State of the Art of Machine Learning-Enabled Clinical Decision
Support in Intensive Care Units: Literature Review. JMIR Med. Inform. 2022, 10, e28781. [CrossRef]
4. Lopez-Jimenez, F.; Attia, Z.; Arruda-Olson, A.M.; Carter, R.; Chareonthaitawee, P.; Jouni, H.; Kapa, S.; Lerman, A.; Luong, C.;
Medina-Inojosa, J.R.; et al. Artificial Intelligence in Cardiology: Present and Future. Mayo Clin. Proc. 2020, 95, 1015–1039.
[CrossRef]
5. Reid, J.E.; Eaton, E. Artificial intelligence for pediatric ophthalmology. Curr. Opin. Ophthalmol. 2019, 30, 337–346. [CrossRef]
[PubMed]
6. Mentis, A.A.; Lee, D.; Roussos, P. Applications of artificial intelligence–machine learning for detection of stress: A critical overview.
Mol. Psychiatry 2023, 1–13. [CrossRef]
7. Galatzer-Levy, I.R.; Onnela, J.P. Machine Learning and the Digital Measurement of Psychological Health. Annu. Rev. Clin. Psychol.
2023, 19, 133–154. [CrossRef] [PubMed]
8. Sutrisno, S.; Khairina, N.; Syah, R.B.Y.; Eftekhari-Zadeh, E.; Amiri, S. Improved Artificial Neural Network with High Precision
for Predicting Burnout among Managers and Employees of Start-Ups during COVID-19 Pandemic. Electronics 2023, 12, 1109.
[CrossRef]
9. Adapa, K.; Pillai, M.; Foster, M.; Charguia, N.; Mazur, L. Using Explainable Supervised Machine Learning to Predict Burnout in
Healthcare Professionals. Stud. Health Technol. Inform. 2022, 294, 58–62. [CrossRef]
10. Srinivasulu Reddy, U.; Thota, A.; Dharun, A. Machine Learning Techniques for Stress Prediction in Working Employees. In
Proceedings of the 2018 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Madurai,
India, 13–15 December 2018; pp. 1–4.
11. Knight, A.; Bidargaddi, N. Commonly available activity tracker apps and wearables as a mental health outcome indicator: A
prospective observational cohort study among young adults with psychological distress. J. Affect. Disord. 2018, 236, 31–36.
[CrossRef]
12. Hagiwara, N. Validity of Mind Monitoring System as a Mental Health Indicator using Voice. Adv. Sci. Technol. Eng. Syst. J. 2017,
2, 338–344. [CrossRef]
13. Pierce, M. Mental health before and during the COVID-19 pandemic: A longitudinal probability sample survey of the UK
population. Lancet Psychiatry 2020, 7, 883–892. [CrossRef] [PubMed]
14. Srividya, M.; Mohanavalli, S.; Bhalaji, N. Behavioral modeling for mental health using machine learning algorithms. J. Med. Syst.
2018, 42, 88. [CrossRef] [PubMed]
15. Guo, C.; Tomson, G.; Keller, C.; Söderqvist, F. Prevalence and correlates of positive mental health in Chinese adolescents. BMC
Public Health 2018, 18, 263. [CrossRef] [PubMed]
16. Witteveen, D.; Velthorst, E. Economic hardship and mental health complaints during COVID-19. Proc. Natl. Acad. Sci. USA 2020,
117, 27277–27284. [CrossRef]
17. Bubonya, M.; Cobb-Clark, D.A.; Wooden, M. Job loss and the mental health of spouses and adolescent children. IZA J. Labor Econ.
2017, 6, 6.
18. Brown, M.R.G. After the Fort McMurray wildfire there are significant increases in mental health symptoms in grade 7–12 students
compared to controls. BMC Psychiatry 2019, 19, 18.
19. Pal, S.; Xu, T.; Yang, T.; Rajasekaran, S.; Bi, J. Hybrid-DCA: A double asynchronous approach for stochastic dual coordinate ascent.
J. Parallel Distrib. Comput. 2020, 143, 47–66. [CrossRef]
20. Spiridonoff, A.; Olshevsky, A.; Paschalidis, I.C. Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and
Network-Independent Performance for Strongly Convex Functions. J. Mach. Learn. Res. 2020, 21, 58.
21. Pu, S.; Olshevsky, A.; Paschalidis, I.C. A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent. IEEE
Trans. Automat. Contr. 2022, 67, 5900–5915. [CrossRef]
22. Pu, S.; Olshevsky, A.; Paschalidis, I.C. Asymptotic Network Independence in Distributed Stochastic Optimization for Machine
Learning. IEEE Signal Process. Mag. 2020, 37, 114–122. [CrossRef]
23. Mohsen, F.; Al-Saadi, B.; Abdi, N.; Khan, S.; Shah, Z. Artificial Intelligence-Based Methods for Precision Cardiovascular Medicine.
J. Pers. Med. 2023, 13, 1268. [CrossRef]
24. Price, M.J. Hello, C#! Welcome, .NET! In C# 8.0 and .NET Core 3.0—Modern Cross-Platform Development, 4th ed.; Packt Publishing
Ltd.: Birmingham, UK, 2019; pp. 1–69.
25. Perkins, B.; Hammer, J.V.; Reid, J.D. Introducing C#. In Beginning C# 7 Programming with Visual Studio 2017; Wiley: Hoboken, NJ,
USA, 2018; pp. 3–13.
26. Shalev-Shwartz, S.; Tong, Z. Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization. arXiv 2013,
arXiv:1209.1873.
27. Lu, X.; Yang, C.; Wu, Q.; Wang, J.; Wei, Y.; Zhang, L.; Li, D.; Zhao, L. Improved Reconstruction Algorithm of Wireless Sensor
Network Based on BFGS Quasi-Newton Method. Electronics 2023, 12, 1267. [CrossRef]
28. Aggrawal, H.O.; Modersitzki, J. Hessian Initialization Strategies for L-BFGS Solving Non-linear Inverse Problems. arXiv 2021,
arXiv:2103.10010.
29. Asl, A.; Overton, M.L. Behavior of limited memory BFGS when applied to nonsmooth functions and their Nesterov smoothings.
arXiv 2020, arXiv:2006.11336.
30. Bousbaa, Z.; Sanchez-Medina, J.; Bencharef, O. Financial Time Series Forecasting: A Data Stream Mining-Based System. Electronics
2023, 12, 2039. [CrossRef]
31. Benczúr, A.A.; Kocsis, L.; Pálovics, R. Online Machine Learning in Big Data Streams. arXiv 2018, arXiv:1802.05872.
32. Ilboudo, W.E.L.; Kobayashi, T.; Sugimoto, K. Robust stochastic gradient descent with student-t distribution based first-order
momentum. IEEE Trans. Neural Netw. Learn. Syst. 2020, 33, 1324–1337. [CrossRef]
33. Figalová, N.; Charvat, M. The Perceived Stress Scale: Reliability and validity study in the Czech Republic. Ceskoslovenská Psychol.
2021, 65, 46–59. [CrossRef]
34. Prasetya, A.; Purnama, D.; Prasetyo, F. Validity and Reliability of The Perceived Stress Scale with RASCH Model.
PSIKOPEDAGOGIA J. Bimbing. Konseling 2020, 8, 48–51. [CrossRef]
35. Maslach, C.; Jackson, S.E. The measurement of experienced burnout. J. Occup. Behav. 1981, 2, 99–113. [CrossRef]
36. Schaufeli, W.B.; Bakker, A.B.; Hoogduin, K.; Kladler, A.; Schaap, C. On the clinical validity of the Maslach Burnout Inventory and
the Burnout Measure. Psychol. Health 2001, 16, 565–582. [CrossRef] [PubMed]
37. Checa, I.; Perales, J.; Espejo, B. Measurement invariance of the Satisfaction with Life Scale by gender, age, marital status and
educational level. Qual. Life Res. Int. J. Qual. Life Asp. Treat. Care Rehabil. 2019, 28, 963–968. [CrossRef]
38. Diener, E.; Emmons, R.A.; Larsen, R.J.; Griffin, S. The Satisfaction with Life Scale. J. Personal. Assess. 1985, 49, 71–75. [CrossRef]
39. Prokopowicz, P.; Mikołajewski, D.; Mikołajewska, E. Intelligent System for Detecting Deterioration of Life Satisfaction as Tool for
Remote Mental-Health Monitoring. Sensors 2022, 22, 9214. [CrossRef]
40. Rojek, I. Neural networks as prediction models for water intake in water supply system. In Artificial Intelligence and Soft
Computing—ICAISC 2008. Lecture Notes in Computer Science, 5097; Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M.,
Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 1109–1119. Available online: https://link.springer.com/chapter/10.1007/
978-3-540-69731-2_104 (accessed on 31 August 2023).
41. Spoor, J.M.; Weber, J. Evaluation of process planning in manufacturing by a neural network based on an energy definition of
Hopfield nets. J. Intell. Manuf. 2023, 1–19. [CrossRef]
42. Teixeira, I.; Morais, R.; Sousa, J.J.; Cunha, A. Deep Learning Models for the Classification of Crops in Aerial Imagery: A Review.
Agriculture 2023, 13, 965. [CrossRef]
43. Rojek, I.; Mikołajewski, D.; Macko, M.; Szczepański, Z.; Dostatni, E. Optimization of Extrusion-Based 3D Printing Process Using
Neural Networks for Sustainable Development. Materials 2021, 14, 2737. [CrossRef]
44. Rojek, I.; Mikołajewski, D.; Kotlarz, P.; Macko, M.; Kopowski, J. Intelligent system supporting technological process planning for
machining and 3D printing. Bull. Pol. Acad. Sci. Tech. Sci. 2021, 69, e136722.
45. Mohammadi, E.K.; Talaie, H.R.; Azizi, M. A healthcare service quality assessment model using a fuzzy best–worst method with
application to hospitals' in-patient services. Healthc. Anal. 2023, 4, 100241. [CrossRef]
46. Gajos, A.; Wójcik, G.M. Independent component analysis of EEG data for EGI system. Bio-Algorithms Med-Syst. 2016, 12, 67–72.
[CrossRef]
47. Kawala-Janik, A.; Podpora, M.; Pelc, M.; Piatek, P.; Baranowski, J. Implementation of an inexpensive EEG headset for the pattern
recognition purpose. In Proceedings of the 2013 IEEE 7th International Conference on Intelligent Data Acquisition and Advanced
Computing Systems (IDAACS), Berlin, Germany, 12–14 September 2013; Volume 1, pp. 399–403.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
A Multiscale Neighbor-Aware Attention Network for
Collaborative Filtering
Jianxing Zheng 1, Tengyue Jing 2, Feng Cao 3,*, Yonghong Kang 3, Qian Chen 3 and Yanhong Li 3
1 Institute of Intelligent Information Processing, Shanxi University, Taiyuan 030006, China; [email protected]
2 North Automatic Control Technology Institute, Taiyuan 030006, China; [email protected]
3 School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
[email protected] (Y.K.); [email protected] (Q.C.); [email protected] (Y.L.)
* Correspondence: [email protected]
Abstract: Most recommender systems rely on user and item attributes or their interaction records to
find similar neighbors for collaborative filtering. Existing methods focus on developing collaborative
signals from only one type of neighbors and ignore the unique contributions of different types of
neighbor views. This paper proposes a multiscale neighbor-aware attention network for collaborative
filtering (MSNAN). First, attribute-view neighbor embedding is modeled to extract the features of
different types of neighbors with co-occurrence attributes, and interaction-view neighbor embedding
is leveraged to describe the fine-grained neighborhood behaviors of ratings. Then, a matched attention
network is used to identify different contributions of multiscale neighbors and capture multiple
types of collaborative signals to overcome sparse recommendation. Finally, we make the rating
prediction through joint learning with a multi-task loss and verify the positive effect of the proposed
MSNAN on three datasets. Compared with traditional methods, the proposed MSNAN not only
improves accuracy in the MAE and RMSE indexes, but also mitigates the poor performance of
recommendation in sparse data scenarios.
different rating neighbors to explore the fine-grained preference motivation. Thus, how to
deal with collaborative filtering by combining multiscale attribute neighbors with rating
neighbors is an important task. Multiscale node embedding describes the fine-grained
semantic representations from multiple perspectives and can effectively mitigate the poor
performance of sparse recommendation, which is of great significance for
industrial applications.
In e-commerce recommender systems, multiscale attribute combinations can
produce single-attribute-view neighbors and multi-attribute-view neighbors. For example,
in movie recommender systems, users with the same gender and age have more similar
interest behaviors than users of the same gender. As a result, different types of attribute-
view neighbors can be constructed in various attribute combination spaces. In addition,
different interaction-view neighbors can be obtained according to the types of interaction
ratings, such as 1–5. We model the interaction-view neighbor embedding of nodes on
various interactive views. The attention mechanism [5] is used to focus on specific input
features, analyze the importance of all aspects of input features, and improve the expression
ability of the model, which has been widely applied in the fields of natural language
processing and image processing. Inspired by [6], this paper captures a different type
of attribute neighbor embedding and interactive neighbor embedding and mine their
collaborative signals with the attention mechanism to model multiscale node embedding.
The multiscale node embedding can effectively capture diverse semantics of nodes from
different types of neighbors to enhance sparse recommendation.
To summarize, the main contributions of this paper are as follows.
• Various neighbor graphs of attribute tag and rating tag are designed to learn attribute
neighbor embedding and interaction neighbor embedding, which capture embedding
signals of various neighbors at different levels.
• An attention network is developed to refine the collaborative semantics of multiscale
neighbors, which is utilized to filter out the irrelevant signals of various types
of neighbors.
• A joint learning of multiscale neighbor embedding is proposed for rating prediction,
which solves the problem of poor accuracy in the context of sparse recommendation.
The rest of this paper is structured as follows. Section 2 outlines related work, including
the state-of-the-art of neighbor-based recommendation and attention mechanism. In Section 3,
we present the framework of the proposed multiscale neighbor-aware attention network.
Section 4 provides the methodology of the MSNAN recommendation. Section 5 describes
the experimental setup and evaluation. Section 6 gives the experimental results and analysis.
Finally, Section 7 concludes this work.
2. Related Work
In this section, the related work on neighbor-based collaborative filtering recommendation
and attention mechanisms is briefly reviewed.
Electronics 2023, 12, 4372
placement. In recent years, matrix decomposition models based on deep learning have
been studied [16,17]. DeepFM learns low-order and high-order interaction features of
compressed interaction neighbors through a neural network [18]. Cai et al. [19] leveraged
multi-grained sentiment features and the latent factors of matrix factorization to obtain
sufficient representations of users and items for rating prediction. Although these
models can handle the sparse recommendation problem, they rely on raw neighbors to
learn high-order interaction features and are limited in modeling the different
contributions of fine-grained neighbors.
Xiao et al. [36] designed a social explorative attention network for personal interest
recommendation. Ye et al. [37] utilized both an influence graph and a preference graph
to fuse different user and item embeddings for rating prediction. However, most
attention models consider the role of only one type of neighbor, which limits the
discriminative contributions of various neighbors to the user's overall decision making.
[Figure 1. The overall framework of MSNAN. Attribute-view neighbor node embeddings (View 1, ..., View m, each followed by pooling) and interaction-view neighbor node embeddings (View 1, ..., View t, each followed by pooling) are combined by attentional multiscale neighbor node embedding for rating prediction.]
In a nutshell, the framework works as follows. For the attribute-view neighbor node
embedding, we first construct an attribute-view neighbor graph according to the association
of a node on an attribute set such as {a} or {a,b}. Attribute sets of different scales induce
multiple types of attribute neighbors. Based on various attribute-view neighbors, graph
neural networks are used to obtain attribute-view neighbor node embedding.
For the interaction-view neighbor node embedding, we divide different rating tag
spaces according to different rating grades. Under each rating tag space, we form
similar neighbors with the corresponding rating behaviors. Then, graph neural networks are
leveraged to obtain the interaction-view neighbor node embeddings.
4. Methodology
4.1. Attribute-View Neighbor Embedding
In e-commerce networks, attribute descriptions characterize users or products, which
helps discover various types of similar users or similar items.
In this subsection, we utilize various types of attribute sets to calculate similar neighbors of
different scales. Then, we learn the nodes’ attribute-view neighbor embedding in terms of
different-scale neighbor graphs.
Usually, users have various kinds of attributes, such as gender, age, and occupation,
which reflect users' interest preferences to a certain extent. For example, users with the
same gender can form a neighbor graph with a coarse-grained perspective, while users
with the same gender and age can build a neighbor graph with a fine-grained attribute
space. A coarse-grained neighbor can provide robust interest preferences for cold-start
recommendation. A fine-grained neighbor helps discover refined similar preferences and
model accurate collaborative recommendation. Based on this assumption, we can construct
different views of attribute neighbor graphs to incorporate signals of various neighbors for
modeling the embedding of nodes.
Given an attribute a, we can define the neighbor set $N_u^a$ of user u on attribute a
as follows:

$$N_u^a = \{u' \mid f_a(u) = f_a(u')\} \qquad (1)$$

where $f_a(u)$ is the attribute value of user u on attribute a. $N_u^a$ describes the collaborative
neighbors that share the same attribute value as user u. Considering all attribute value types in
the set A, we can construct the user-attribute relation matrix $M_{U\times A}$. Then, based on
the user distribution over the set A, we can establish the neighbor relationship matrix of
users as $MM^T$, denoted $M_{U\times U}$. Here, various attribute-view neighbors reflect multiscale
collaborative preferences, which can affect the decision-making tendency of the target user.
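To make the construction concrete, the following sketch builds the user-attribute relation matrix $M_{U\times A}$ and the neighbor relationship matrix $MM^T$; the matrix values and attribute semantics below are invented toy data, not taken from the paper's datasets:

```python
import numpy as np

# Hypothetical toy data: 4 users described by binary indicators over
# 3 attribute values (e.g., gender=F, age-group=18-24, occupation=student).
# M plays the role of the user-attribute relation matrix M_{U x A}.
M = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
])

# Neighbor relationship matrix M_{U x U} = M M^T: entry (u, u') counts
# how many attribute values users u and u' share.
M_uu = M @ M.T

# Users sharing at least one attribute value are attribute-view neighbors.
neighbors_of_0 = {u for u in range(M.shape[0]) if u != 0 and M_uu[0, u] > 0}
print(M_uu)
print(neighbors_of_0)  # {1, 2}
```

Restricting M to a single attribute column (or a multi-attribute combination of columns) yields the single-attribute-view and multi-attribute-view neighbor graphs described earlier.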
Based on the idea in [33,38], we can define the collaborative signal of a first-order
neighbor u' for user u on the attribute a as follows:

$$m^{a}_{u\leftarrow u'} = \frac{e_{u'} + (e_{u} \odot e_{u'})}{\sqrt{|N_u^a|\,|N_{u'}^a|}} \qquad (2)$$
Further, the attribute a-view recursive collaborative signal for user u at layer k can be defined as
$e_u^{a(k)} = \sum_{u' \in N_u^a} m^{a(k)}_{u\leftarrow u'}$. We adopt average pooling to obtain the attribute a-view neighbor-aware
node embedding for user u as $e_u^{Att\_a} = agg(e_u^{a(1)}, \cdots, e_u^{a(k)})$. Here, various aggregation
strategies can be used to fuse the different orders of neighbor embedding.
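The layer-wise propagation and pooling described above can be sketched as follows; the adjacency list, embedding size, and the exact message form are illustrative assumptions modeled on Equation (2), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embeddings for 4 users in the attribute-a neighbor graph; the
# adjacency below is invented for illustration.
e = rng.normal(size=(4, 8))
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}

def message(e_u, e_v, deg_u, deg_v):
    # Eq. (2)-style collaborative signal: (e_v + e_u * e_v) / sqrt(|N_u||N_v|)
    return (e_v + e_u * e_v) / np.sqrt(deg_u * deg_v)

def propagate(e):
    # One propagation layer: each user aggregates messages from its neighbors.
    out = np.zeros_like(e)
    for u, nbrs in adj.items():
        for v in nbrs:
            out[u] += message(e[u], e[v], len(adj[u]), len(adj[v]))
    return out

# Stack k layers and average-pool them: e_u^{Att_a} = avg(e^{(1)}, ..., e^{(k)})
layers, h = [], e
for _ in range(3):
    h = propagate(h)
    layers.append(h)
e_att_a = np.mean(layers, axis=0)
print(e_att_a.shape)  # (4, 8)
```

Average pooling is only one choice of the `agg` operator; concatenation or weighted sums over the layer outputs fit the same skeleton.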
Different types of attributes can induce different similar neighbors. Considering
various types of neighbors in other attribute views, we can obtain the attribute A-view neighbor-aware
user embedding as $e_u^{Att\_A} = avg(e_u^{A(1)}, \cdots, e_u^{A(k)})$. Similarly, given an item v,
we adopt different types of item neighbors to obtain the attribute-view neighbor-aware
item embedding.
Different users have particular behavioral perceptions of rating labels, which can be
used to explore users' behavioral preferences at a fine granularity. Thus, according to the
types of rating labels, we divide different interaction-view spaces by rating label
and construct various rating interaction graphs for learning user and item embeddings.
For example, we can regard item groups with the same rating as neighbors of a user
with the same scale preference. Based on different rating labels, we model interaction-
view neighbor-aware embedding with different rating neighbors, defined as
$\{e_u^{Int\_1}, \cdots, e_u^{Int\_r}\}$. Then, the interaction-view neighbor-aware item embedding of an item v can
be defined as $\{e_v^{Int\_1}, \cdots, e_v^{Int\_r}\}$.
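Partitioning interactions by rating level can be illustrated as follows; the (user, item, rating) records below are toy assumptions, not the paper's data:

```python
from collections import defaultdict

# Hypothetical (user, item, rating) records on a 1-5 scale.
ratings = [(0, 'v1', 5), (0, 'v2', 5), (1, 'v1', 5), (1, 'v3', 2), (2, 'v2', 2)]

# One interaction view per rating level: within view r, a user's
# neighborhood consists of the items the user rated exactly r
# (and, transitively, other users who rated those items r).
views = defaultdict(lambda: defaultdict(set))
for u, v, r in ratings:
    views[r][u].add(v)

# In the rating-5 view, users 0 and 1 become neighbors through item v1.
print(sorted(views[5][0]), sorted(views[5][1]))
```

Each per-rating view induces its own interaction graph, on which the same kind of neighbor aggregation as in the attribute views can be run.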
$$q_i = h_i^{T}\, e_u^{Att\_i} \qquad (4)$$
Equation (4) describes the influence of the user embedding in the attribute-i space on the
global attribute neighbor-aware user embedding, where $h_i$ is a parameter vector. Considering
the user embeddings of the m view types, the normalized weights are defined in
Equation (5).
$$\alpha_i = \frac{e^{q_i}}{\sum_{s\in\{1,\dots,m\}} e^{q_s}} \qquad (5)$$
Then, the global attribute neighbor-aware node embedding for user u is defined
as follows:

$$e_u^{Att\_g} = \sum_{i\in\{1,\dots,m\}} \alpha_i\, e_u^{Att\_i} \qquad (6)$$
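Equations (4)-(6) amount to a softmax-weighted combination of the per-view embeddings. A minimal sketch, with randomly initialized embeddings and parameter vectors standing in for learned ones:

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 3, 8  # m attribute views, embedding size d (illustrative values)

# e_views[i] is the user's embedding in attribute view i, e_u^{Att_i}.
e_views = rng.normal(size=(m, d))
h = rng.normal(size=(m, d))  # per-view parameter vectors h_i

# Eq. (4): unnormalized score q_i = h_i^T e_u^{Att_i}
q = np.einsum('id,id->i', h, e_views)

# Eq. (5): softmax normalization alpha_i = exp(q_i) / sum_s exp(q_s)
# (subtracting the max is a standard numerical-stability trick)
alpha = np.exp(q - q.max())
alpha /= alpha.sum()

# Eq. (6): global attribute neighbor-aware embedding e_u^{Att_g}
e_att_g = (alpha[:, None] * e_views).sum(axis=0)
print(alpha, e_att_g.shape)
```

The attention weights sum to one, so views carrying little signal for this user are smoothly down-weighted rather than discarded.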
Here, Equation (7) defines the preferential influence of user embedding in attribute i’s
view on the user’s rating decision. Then, we normalize this influence using Equation (8).
$$\gamma_i = \frac{e^{\beta_i}}{\sum_{s\in\{1,\dots,m\}} e^{\beta_s}} \qquad (8)$$
The normalized weight $\gamma_i$ reflects the influence of the multiscale attribute neighbor
embedding on the user's interaction rating. Furthermore, the attribute embedding
incorporating the user's rating behavior preference can be updated as in Equation (9).
The multi-type matched signals based on attribute neighbors and interaction neighbors
can enhance the embedding representation of nodes. Similarly, we can calculate the
matched neighbor embedding of the interaction view as follows:

$$w_j = (e_u^{Att\_g})^{T}\, e_u^{Int\_j} \qquad (10)$$
Here, $w_j$ is the dependency influence of the user embedding in rating tag j's view on the user's
attributes. For different rating levels, this dependence effect is normalized
using Equation (11).
$$g_j = \frac{e^{w_j}}{\sum_{p\in\{1,\dots,t\}} e^{w_p}} \qquad (11)$$
$$e_u^{Int} = \sum_{j\in\{1,\dots,t\}} g_j\, e_u^{Int\_j} \qquad (12)$$
In Equation (12), $e_u^{Int}$ represents the user's interaction embedding incorporating the
dependency on user attributes. Considering the matched neighbor embedding in the
attribute view and the neighbor embedding in the interaction view, we can model the fused
multiscale neighbor embedding with the concatenation operator as follows.
$$\hat{y} = e_u \cdot e_v + b_g + b_u + b_v \qquad (14)$$
Here, the parameters $b_g$, $b_u$, and $b_v$ are the global bias, user bias, and item bias,
respectively. To preserve the preference information of user attributes over item attributes, we define the rating
prediction of user u for item v with their attribute-view neighbor embeddings as follows:
Similarly, we can compute the rating with the interaction-view neighbor embeddings
in interactive space, which is shown as below.
The joint loss function considers the global and local important neighbors to predict
the user’s rating of the item, which not only captures the user’s attribute preference for the
item from various attribute-view neighbors, but also retains the behavioral preference of
collaborative fine-grained interaction neighbors.
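The full multi-task loss is not spelled out in this excerpt, so the sketch below assumes a simple weighted sum of squared rating errors over the fused, attribute-view, and interaction-view predictions of the Equation (14) form; the embeddings, bias values, and the weights `lambdas` are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
e_u, e_v = rng.normal(size=d), rng.normal(size=d)          # fused embeddings
e_u_att, e_v_att = rng.normal(size=d), rng.normal(size=d)  # attribute-view
e_u_int, e_v_int = rng.normal(size=d), rng.normal(size=d)  # interaction-view
b_g, b_u, b_v = 3.5, 0.1, -0.2  # global, user, and item biases (toy values)

def predict(eu, ev):
    # Eq. (14)-style prediction: dot product plus bias terms.
    return eu @ ev + b_g + b_u + b_v

def joint_loss(r_true, lambdas=(1.0, 0.5, 0.5)):
    # Assumed multi-task form: weighted sum of squared errors of the
    # fused, attribute-view, and interaction-view predictions.
    preds = [predict(e_u, e_v),
             predict(e_u_att, e_v_att),
             predict(e_u_int, e_v_int)]
    return sum(l * (r_true - p) ** 2 for l, p in zip(lambdas, preds))

loss = joint_loss(4.0)
print(float(loss))
```

Keeping the view-specific prediction terms in the objective is what lets the global and local neighbor signals regularize each other, as the paragraph above describes.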
5. Experiments
In this section, we verify the performance of the proposed model with the aim of
answering the following three questions:
• RQ1: How does MSNAN perform compared with state-of-the-art neighbor-based
collaborative filtering methods?
• RQ2: How does the multiscale neighbor node embedding perform for sparse
recommendation?
• RQ3: How do different types of neighbor embedding affect the performance of
the model?
5.1. Dataset
We ran the proposed MSNAN model and the baselines on three public datasets: MovieLens-
100kr (ML-100kr) (https://grouplens.org/datasets/movielens/ (accessed on 24 September
2023)), Book-Crossing-10core (BK-10C) (http://bookcrossing.com (accessed on 24 September
2023)), and Douban (https://movie.douban.com/ (accessed on 24 September 2023))
to verify its effectiveness. The ML-100kr dataset contains the interaction ratings of
943 users on 1682 movies. Users have 5 attributes, and movies have 19 attributes. The rat-
ing score is on the scale 1–5. For the BK-10C dataset, we selected users who had rated
at least 10 books and books that have been rated by at least 10 users, which involved
1820 users and 2030 books. The rating score uses the range of 1–10. The Douban dataset
contains ratings of 6971 movies from 3022 users with rating values of 1–5 [39]. On all
datasets, the higher the rating, the more the user likes the movie/book. Statistical
information of the experimental datasets is shown in Table 1. During the experiments, we use the
MAE and RMSE metrics for performance evaluation; smaller MAE and RMSE values
indicate better performance. All the datasets were divided into training, validation,
and testing sets with a proportion of 8:1:1. The rating prediction performance of the model
is evaluated on the testing set.
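This evaluation protocol (8:1:1 split, MAE and RMSE) can be reproduced as follows, with synthetic ratings standing in for real model predictions:

```python
import numpy as np

rng = np.random.default_rng(3)

# 8:1:1 train/validation/test split over interaction records.
n = 1000
idx = rng.permutation(n)
train, val, test = np.split(idx, [int(0.8 * n), int(0.9 * n)])

# MAE and RMSE on toy predictions; smaller values are better.
y_true = rng.integers(1, 6, size=test.size).astype(float)
y_pred = y_true + rng.normal(scale=0.5, size=test.size)
mae = np.mean(np.abs(y_true - y_pred))
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(len(train), len(val), len(test))
```

Note that MAE can never exceed RMSE on the same errors, which is a useful sanity check when reporting both metrics.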
(3) NGCF [33]. A collaborative filtering method based on a graph neural network learns
the embedding representations of users and items with a user–item interaction graph.
(4) GCN [40]. A graph convolutional neural network leverages the information of multi-
order neighbors by superimposing several convolutional layers for recommendation.
(5) LightGCN [38]. The embedding representations of user and item are learned by
aggregating linear neighbor information of nodes.
(6) GAT [41]. A graph neural network with an attention mechanism learns weighted
node embedding representations for recommendation.
(7) ACCM [3]. The attention mechanism is used to integrate a content-based method
with collaborative filtering for rating prediction.
(8) AFM [5]. An attentional network factorization method learns the interactive impor-
tance of different features for prediction.
(9) TANP [42]. A task-adaptive neural process is constructed to learn the relevance of
different tasks for user cold-start recommendation.
6. Experimental Results
In this section, we compare the proposed MSNAN model with the benchmark models
in terms of the MAE and RMSE metrics on three datasets. The experimental results are shown
in Table 2, from which we make the following observations.
The comparison results show that MSNAN achieves notable improvements on all three
datasets. For example, compared with the ACCM collaborative filtering method, the MAE
and RMSE values of the MSNAN + LightGCN method improve by 2.74% and 2.35%,
8.18% and 5.27%, and 4.70% and 4.08% on the three datasets, respectively. In addition, the MAE
and RMSE of the MSNAN + NGCF model achieve improvements of 3.29% and 2.05%, 1.09% and
1.26%, and 1.90% and 0.94% over the best baseline on the three datasets, respectively. Meanwhile,
the improvements of the MSNAN + LightGCN model to the best baseline are 2.74% and
2.35% for MAE and RMSE on the ML-100kr dataset, 1.33% and 1.41% on the BK-10C dataset,
and 2.23% and 1.86% on the Douban dataset, respectively. This shows that the proposed
MSNAN model can better learn the embedding representation of users and items and
effectively improve the accuracy of rating prediction, which verifies the significance of
collaborative semantics of multiscale neighbors.
Compared with these graph neural network baselines, the proposed MSNAN based
on multiscale neighbors achieves competitive improvements on all datasets. For example,
on the ML-100kr dataset, the MSNAN + NGCF, MSNAN + GCN, MSNAN + LightGCN,
and MSNAN + GAT methods improve by 5.32%, 3.83%; 3.38%, 3.14%; 4.28%, 4.00%; and
3.91%, 3.10% over the NGCF, GCN, LightGCN, and GAT baselines on MAE and RMSE
metrics. In addition, on the BK-10C dataset, the MSNAN + NGCF, MSNAN + GCN,
MSNAN + LightGCN, and MSNAN + GAT methods also improve MAE and RMSE values
by 1.89%, 1.29%; 2.60%, 1.18%; 1.46%, 1.56%; and 1.80%, 1.39% over the corresponding
graph model baselines, respectively. This is because the node embedding of MSNAN
combines the collaborative semantics of multiscale neighbors, which distills the important
information of similar neighbors for rating prediction. However, the node representation
quality differs across graph neural network models, and so does the superposition effect
of MSNAN. LightGCN simplifies the transformation matrices and activation functions,
which yields the highest-quality node embeddings and further improves the performance
of our method, whereas the plain GCN method lags behind. In addition, compared with the ACCM
and AFM methods, the MSNAN approach can achieve a smaller error in the rating pre-
diction scenario, which indicates that the proposed model can better learn the embedding
representation of users and items with the collaborative signal of multiscale neighbors.
Figure 2. MAE results of different methods on the ML-100kr dataset with different sparsity proportions.
Figure 3. RMSE results of different methods on the ML-100kr dataset with different sparsity proportions.
Compared with graph neural network methods, the proposed MSNAN model con-
sistently yields the best MAE and RMSE performance on the three datasets under
sparsity ratios of different scales. For each graph neural network model, blending in
the MSNAN model generalizes better than the original model. As shown
in Figures 4–7, when there is a low drop ratio, LightGCN can obtain better performance
compared with other graph neural network methods due to its better node representation
quality. This is because LightGCN itself effectively learns the embedding representation
of nodes by simplifying the nonlinear structure and reducing complexity. In large-scale
e-commerce platforms, the sparsity of user–item interactions can be high, making it
difficult to find collaborative neighbors based on the similarity of interaction behaviors. Although the
user–item matrix loses a part of interaction records, the proposed model fuses various
neighbor information from attribute views and rating views, which fully learns the rep-
resentations of users and items. According to the attributes of users or items, we can
find neighbors of different scales and conduct collaborative filtering recommendation.
Moreover, the model takes advantage of high-quality collaborative signals from multiscale
neighbors for improving the quality of embedding representation, which is suitable for the
practice of large-scale e-commerce and alleviates the performance impact of data sparsity
to some extent.
Figure 4. MAE results of different methods on the BK-10C dataset with different sparsity proportions.
Figure 5. RMSE results of different methods on the BK-10C dataset with different sparsity proportions.
Figure 6. MAE results of different methods on the Douban dataset with different sparsity proportions.
Figure 7. RMSE results of different methods on the Douban dataset with different sparsity proportions.
embedding makes the greatest contribution to modeling the user's preference. For
the Douban dataset, the two attribute-view neighbor embeddings are important. These results
indicate that multiscale neighbors in different views have a collaborative effect on the user's
preference decisions.
Figure 11. Attention weight of multiscale neighbor embedding for 10 users on the ML-100kr dataset.
Figure 12. Attention weight of multiscale neighbor embedding for 10 users on the BK-10C dataset.
Figure 13. Attention weight of multiscale neighbor embedding for 10 users on the Douban dataset.
8. Conclusions
In this paper, we propose a multiscale neighbor-aware attention network for collab-
orative filtering recommendation. The proposed strategy fuses the global semantics of
various types of neighbors and important local embedding of multiscale neighbors. Multi-
ple attribute-view neighbors and interaction-view neighbors provide collaborative signals
to predict the user’s rating of items. Experiments verify the effectiveness of collaborative
contributions of multiscale neighbors for learning user and item representation. The key
finding is that the combination of multiscale attribute neighbors and interactive neigh-
bors can improve the accuracy of recommendation, and alleviate the poor performance of
recommendation in the case of sparse data. One limitation is that computing multiscale
neighbors requires different graph structures to learn node representations; the platform
can construct these graph structures offline and precompute the multiscale neighbors,
which reduces online resource requirements. Moreover, in the e-commerce scenario,
the proposed method can realize targeted personalized recommendation according to
the different attribute neighbors of users. In particular, for cold-start users who have no
interaction behaviors, the method can select neighbors with similar attributes for the
target user according to the user's social attribute set and then conduct collaborative filtering
recommendation. Products and services can be recommended to target users based on
the similar attribute preferences of their neighbors. In addition, by combining the
behavioral preferences of group users, we can make rating predictions and recommend
popular products to target users.
In future work, by investigating the semantic differences among various attribute and
interaction behavior views, we will focus on studying the consistency of node representations
across different behavior views to improve the accuracy and interpretability of the
recommender system. In addition, heterogeneous types of semantic information from different
types of user behaviors such as evaluation, clicking, and buying can describe the ordered
semantic interests of the user. We will distinguish the types of multiple interaction behav-
iors to learn heterogeneous semantic representations and model the sequential relations
between different behaviors.
Author Contributions: Methodology, J.Z., F.C. and Q.C.; Software, T.J.; Investigation, J.Z. and Y.L.;
Writing—original draft, J.Z.; Writing—review & editing, Y.K. and Q.C. All authors have read and
agreed to the published version of the manuscript.
Funding: This work was partially supported by the National Natural Science Foundation of China
(nos. 62272286, 62072291), the Natural Science Foundation of Shanxi Province (nos. 20210302123468,
202203021221021, 202203021221001).
Data Availability Statement: Data used in this manuscript consist of publicly available standard
benchmark datasets.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Li, Z.; Cui, Z.; Wu, S.; Zhang, X.; Wang, L. Fi-gnn: Modeling feature interactions via graph neural networks for ctr prediction.
In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–10
November 2019; pp. 539–548.
2. Naghiaei, M.; Rahmani, H.; Deldjoo, Y. CPFair: Personalized Consumer and Producer Fairness Re-ranking for Recommender
Systems. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval,
Madrid, Spain, 11–15 July 2022; pp. 770–779.
3. Shi, S.; Zhang, M.; Liu, Y.; Ma, S. Attention-based adaptive model to unify warm and cold starts recommendation. In Proceedings
of the 27th ACM International Conference on Information and Knowledge Management, Turin, Italy, 22–26 October 2018;
pp. 127–136.
4. Ge, Y.; Tan, J.; Zhu, Y.; Xia, Y.; Luo, J.; Liu, S.; Fu, Z.; Geng, S.; Li, Z.; Zhang, Y. Explainable Fairness for Feature-aware
Recommender Systems. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in
Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 1–11.
5. Xiao, J.; Ye, H.; He, X.; Zhang, H.; Wu, F.; Chua, T. Attentional factorization machines: Learning the weight of feature interactions
via attention networks. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia,
19–25 August 2017; pp. 3119–3125.
6. Wang, X.; He, X.; Cao, Y.; Liu, M.; Chua, T. Kgat: Knowledge graph attention network for recommendation. In Proceedings of the
25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019;
pp. 950–958.
7. Koren, Y.; Bell, R.; Volinsky, C. Matrix Factorization Techniques for Recommender Systems. Computer 2009, 42, 30–37. [CrossRef]
8. Salakhutdinov, R.; Mnih, A. Probabilistic matrix factorization. In Proceedings of the 20th International Conference on Neural
Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; pp. 849–858.
9. Koren, Y. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008;
pp. 426–434.
10. Lee, J.; Kim, S.; Lebanon, G.; Singer, Y. Local low-rank matrix approximation. In Proceedings of the 30th International Conference
on International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 82–90.
11. Ling, G.; Lyu, M.; King, I. Ratings meet reviews, a combined approach to recommend. In Proceedings of the 8th ACM Conference
on Recommender Systems, Foster City, CA, USA, 6–10 October 2014; pp. 105–112.
12. Rendle, S.; Freudenthaler, C. Improving pairwise learning for item recommendation from implicit feedback. In Proceedings of the
7th ACM International Conference on Web Search and Data Mining, New York, NY, USA, 24–28 February 2014; pp. 273–282.
13. Ning, X.; Karypis, G. Sparse linear methods with side information for top-n recommendations. In Proceedings of the 6th ACM
Conference on Recommender Systems, Dublin, Ireland, 9–13 September 2012; pp. 155–162.
14. Sedhain, S.; Menon, A.; Sanner, S.; Xie, L. Autorec: Autoencoders meet collaborative filtering. In Proceedings of the 24th
International Conference on World Wide Web, Florence, Italy, 18–22 May 2015; pp. 111–112.
15. Park, J.; Nam, K. Group recommender system for store product placement. Data Min. Knowl. Disc. 2019, 33, 204–229. [CrossRef]
16. Zheng, J.; Liu, J.; Shi, C.; Zhuang, F.; Li, J.; Wu, B. Dual Similarity Regularization for Recommendation. In Proceedings of the 2016
Pacific-Asia Conference on Knowledge Discovery and Data Mining, Auckland, New Zealand, 19–22 April 2016; pp. 542–554.
17. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T. Neural collaborative filtering. In Proceedings of the 26th International
Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182.
18. Lian, J.; Zhou, X.; Zhang, F.; Chen, Z.; Xie, X. DeepFM: Combining explicit and implicit feature interactions for recommender
systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London,
UK, 19–23 August 2018; pp. 1754–1763.
19. Cai, Y.; Ke, W.; Cui, E.; Yu, F. A deep recommendation model of cross-grained sentiments of user reviews and ratings. Inf. Process
Manag. 2022, 59, 102842. [CrossRef]
20. Cheng, H.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M.; et al. Wide
& deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems,
Boston, MA, USA, 15–19 September 2016; pp. 7–10.
21. He, X.; Chua, T. Neural factorization machines for sparse predictive analytics. In Proceedings of the 40th International ACM
SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 355–364.
22. Wang, X.; Wang, R.; Shi, C.; Song, G.; Li, Q. Multi-component graph convolutional collaborative filtering. In Proceedings of the
AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 6267–6274.
23. Magron, P.; Fevotte, C. Neural content-aware collaborative filtering for cold-start music recommendation. Data Min. Knowl. Disc.
2022, 36, 1971–2005. [CrossRef]
24. Jin, B.; Gao, C.; He, X.; Jin, D.; Li, Y. Multi-behavior recommendation with graph convolutional networks. In Proceedings of the
43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Xi’an, China, 25–30 July 2020;
pp. 659–668.
25. Su, Z.; Dou, Z.; Zhu, Y.; Qin, X.; Wen, J. Modeling Intent Graph for Search Result Diversification. In Proceedings of the
44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Online, 11–15 July 2021;
pp. 736–746.
26. Li, H.; Chen, Z.; Li, C.; Xiao, R.; Deng, H.; Zhang, P.; Liu, Y.; Tang, H. Path-based Deep Network for Candidate Item Matching in
Recommenders. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information
Retrieval, Online, 11–15 July 2021; pp. 1493–1502.
27. Tai, C.; Huang, L.; Huang, C.; Ku, L. User-Centric Path Reasoning towards Explainable Recommendation. In Proceedings of
the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Online, 11–15 July 2021;
pp. 879–889.
28. Duan, H.; Zhu, Y.; Liang, X.; Zhu, Z.; Liu, P. Multi-feature fused collaborative attention network for sequential recommendation
with semantic-enriched contrastive learning. Inf. Process Manag. 2023, 60, 103416. [CrossRef]
29. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention is all you need.
In Proceedings of the 31st Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017;
pp. 5998–6008.
30. Wang, X.; Jin, H.; Zhang, A.; He, X.; Xu, T.; Chua, T. Disentangled graph collaborative filtering. In Proceedings of the 43rd
International ACM SIGIR Conference on Research and Development in Information Retrieval, Xi’an, China, 25–30 July 2020;
pp. 1001–1010.
31. Chen, J.; Zhang, H.; He, X.; Nie, L.; Liu, W.; Chua, T. Attentive collaborative filtering: Multimedia recommendation with item-and
component-level attention. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in
Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 335–344.
32. Tang, X.; Wang, T.; Yang, H.; Song, H. AKUPM: Attention-enhanced knowledge-aware user preference model for recommendation.
In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK,
USA, 4–8 August 2019; pp. 1891–1899.
33. Wang, X.; He, X.; Wang, M.; Feng, F.; Chua, T. Neural graph collaborative filtering. In Proceedings of the 42nd International ACM
SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 165–174.
34. Wang, Z.; Lin, G.; Tan, H.; Chen, Q.; Liu, X. CKAN: Collaborative knowledge-aware attentive network for recommender systems.
In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Xi’an,
China, 25–30 July 2020; pp. 219–228.
35. Niu, G.; Li, Y.; Tang, C.; Geng, R.; Dai, J.; Liu, Q.; Wang, H.; Sun, J.; Huang, F.; Si, L. Relational Learning with Gated and
Attentive Neighbor Aggregator for Few-Shot Knowledge Graph Completion. In Proceedings of the 44th International ACM
SIGIR Conference on Research and Development in Information Retrieval, Online, 11–15 July 2021; pp. 213–222.
36. Xiao, W.; Zhao, H.; Pan, H.; Song, Y.; Zheng, V.; Yang, Q. Social explorative attention based recommendation for content
distribution platforms. Data Min. Knowl. Disc. 2021, 35, 533–567. [CrossRef]
37. Ye, H.; Song, Y.; Li, M.; Cao, F. A new deep graph attention approach with influence and preference relationship reconstruction
for rate prediction recommendation. Inf. Process Manag. 2023, 60, 103439. [CrossRef]
38. He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. Lightgcn: Simplifying and powering graph convolution network for
recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information
Retrieval, Xi’an, China, 25–30 July 2020; pp. 639–648.
39. Zheng, Y.; Tang, B.; Ding, W.; Zhou, H. A Neural Autoregressive Approach to Collaborative Filtering. In Proceedings of the 33rd
International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 764–773.
40. Kipf, T.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
41. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the 6th
International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
42. Liu, X.; Wu, J.; Zhou, C.; Pan, S.; Cao, Y.; Wang, B. Task-adaptive neural process for user cold-start recommendation. In
Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 1306–1316.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
Machine Learning for Energy-Efficient Fluid Bed Dryer
Pharmaceutical Machines
Roberto Barriga 1 , Miquel Romero 2 and Houcine Hassan 1, *
Abstract: The pharmaceutical industry is facing significant economic challenges due to measures
aimed at containing healthcare costs and evolving healthcare regulations. In this context, pharmaceu-
tical laboratories seek to extend the lifespan of their machinery, particularly fluid bed dryers, which
play a crucial role in the drug production process. Older fluid bed dryers, lacking advanced sensors
for real-time temperature optimization, rely on fixed-time deterministic approaches controlled by
operators. To address these limitations, a groundbreaking approach based on Exploratory
Data Analysis (EDA) and a Catboost machine-learning model is presented. This research aims to
analyze and enhance a drug production process on a large scale, showcasing how AI algorithms
can revolutionize the manufacturing industry. The Catboost model effectively reduces preheating
phase time, resulting in significant energy savings. By continuously monitoring critical parameters,
a paradigm shift from the conventional fixed-time models is achieved. It has been shown that the
model is able to predict on average a reduction of 50.45% of the preheating process duration and up
to 59.68% in some cases. Likewise, the energy consumption of the fluid bed dryer for the preheating
process could be reduced on average by 50.48% and up to 59.76%, which would result on average in
around 3120 kWh of energy consumption savings per year.
Keywords: energy consumption; IoT-based power control systems; machine learning; optimization
using sensor data; predictive control; pharmaceutical technology; process modeling; exploratory
data analysis
Citation: Barriga, R.; Romero, M.; Hassan, H. Machine Learning for Energy-Efficient Fluid Bed Dryer Pharmaceutical Machines. Electronics 2023, 12, 4325. https://doi.org/10.3390/electronics12204325
Academic Editor: Adel M. Sharaf
Received: 20 September 2023; Revised: 11 October 2023; Accepted: 17 October 2023; Published: 18 October 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
The entire pharmaceutical manufacturing process comprises multiple stages, including dispensing, granulation, drying, compression, and coating [1], as depicted in Figure 1.
Fluid bed drying technology is widely employed in pharmaceutical manufacturing due to its high efficiency in drying granules obtained through wet granulation [2]. However, the primary challenge associated with using a fluid bed dryer lies in the time and energy it consumes to complete the process. The drying process entails three phases: (i) preheating the machine without introducing any product, (ii) drying the product, and (iii) cooling the machine and the product. Costs are incurred in all three phases, encompassing the time taken by the machines and the energy required for heating and air circulation. Additionally, the budget is impacted by the number of operators involved in handling the machine [3].
The fluid bed drying of wet granules obtained through high shear granulation involves a combination of moisture diffusion from the solid material, facilitated by hot air, and the entrainment of this moisture through forced convection. The success of this process relies on the uniform fluidization of the granules by hot air, ensuring efficient mass and energy transfer. The drying time can be reduced by increasing the temperature and intake airflow. However, each parameter must be carefully tailored for the specific granule type. The inlet air temperature is adjusted based on the temperature signal recorded by the air sensor in
contact with the fluidized product, ensuring it does not exceed the critical temperature for
pharmaceutical stability. Inlet air humidity is kept within a narrow dew-point range to
achieve batch-to-batch reproducibility. Thus, under optimized conditions of temperature,
humidity, and airflow entering the machine, drying takes less time and generates a high-
quality product. Temperature, pressure, and flow sensors monitor the changes throughout
the process [4,5].
2. Related Work
The most significant hurdle in employing a fluid bed dryer lies in mitigating the
substantial time and energy consumption associated with completing the process. Follow-
ing the electric energy crises of the 1970s [7], electricity consumption became a topic of
discussion. Furthermore, it has been established that global electric energy use is quickly
expanding [8], specifically in the pharmaceutical industry, which is a growing field nowa-
days. As a result, every pharmaceutical company seeks to utilize as little electric energy
as possible in many sectors, such as manufacturing fields, packing industrial processes,
and transportation to different hospitals or medical stores [9]. Utilizing advanced analytics
techniques, such as machine learning, enables us to anticipate the electricity consumption
in diverse pharmaceutical manufacturing processes, allowing us to tailor strategies to
specific domains [10]. The accurate prediction of electricity usage holds paramount impor-
tance for decision makers and policymakers within the pharmaceutical industry, given the
energy-intensive nature of its machinery. In the context of increasingly dynamic electricity
markets, where prices are subject to fluctuation, understanding and forecasting electricity
usage becomes even more critical. The ability to predict electricity costs can significantly
impact the bottom line for pharmaceutical manufacturers. Comprehending the expected
electric energy consumption empowers us to envision enhancements in pharmaceutical
manufacturing processes, aiming to reduce electricity usage. This predictive capability,
whether in the short or long term, equips us with insights into energy-saving opportunities
and strategies for optimizing current energy consumption, thus mitigating the potential
impacts of rising electricity prices. With many variables, estimating energy usage is a
problematic manufacturing task [11]. Machine learning models are now employed across many fields because of the benefits they bring: in effect, a machine learning model learns a function that maps input data to outputs. Such models can give high-accuracy predictions for energy usage in pharmaceutical processes, such as the heating stage of manufacturing. As a result, pharmaceutical companies can use them to
enact energy-saving initiatives in different manufacturing domains. For example, machine
learning algorithms can forecast how much electric energy is utilized in a dryer machine in
manufacturing [12]. They can also be used to forecast future energy consumption, such as electricity or gas [13]. Numerous studies have showcased the wide applicability
of machine learning techniques in the pharmaceutical industry [14–18]. For instance, [19]
conducted a comprehensive investigation into the implementation of Artificial Neural
Networks (ANNs) for the development and formulation of pharmaceutical products us-
ing a Quality by Design approach for tablet formulations. By leveraging historical data,
the researchers were able to gain valuable insights into the intricate interactions between
formulation variables and drug specifications. The study’s conclusions emphasized the
efficiency of neural networks and genetic algorithms in optimizing formulations, ultimately
leading to reduced energy consumption.
3. Proposed Methodology
Figure 2, from left to right, shows the overall approach for data modeling and simulat-
ing. First, a business need and objective have to be clearly agreed—in the present work, the
modeling and optimization of the drying process—due to the high energetic cost and the
evaluation that significant savings can be obtained. Next, the right data have to be captured
in order to satisfy the business objective. This is followed by data exploration/processing,
modeling, and finally evaluation of the results [20]. Note that this can, in practice, become a cyclic process, iterating back from the result-evaluation phase to the data-collection phase, or even further back to re-evaluate the business need.
• Define business problem: The initial phase of the machine learning workflow in-
volves defining the business problem. The duration of this step varies, ranging from
several days to a few weeks, depending on the complexity of the problem and its
specific application. During this stage, data scientists collaborate with subject matter
experts (SMEs) to gain a comprehensive understanding of the problem. This involves
conducting interviews with key stakeholders, gathering pertinent information, and
establishing overall project goals. In the case at hand, our objective is to minimize the
energy consumption in the fluid bed dryer.
• Obtain the data: Once the problem is understood, the next step is to obtain the data identified as relevant and available for solving the business problem. In our case, the data obtained from the fluid bed dryer will be used directly.
• Explore the data: The next step in the process is exploratory data analysis (EDA),
which involves analyzing the raw data. The primary objective of EDA is to delve into
the data, evaluate its quality, identify any missing values, examine feature distributions,
assess correlations, and so on.
• Create the model: Model creation encompasses various tasks, including dividing
the data into training and testing sets, handling missing values, training multiple
models, fine-tuning hyperparameters, consolidating models, evaluating performance
metrics, and ultimately selecting the optimal model for deployment to forecast our
target variable. In our specific scenario, we aimed to predict the duration required
for the preheating process in order to minimize energy consumption. In this paper, a Catboost machine-learning model is used to optimize the fluid bed dryer's energy consumption.
A gradient-boosted model such as Catboost is built from decision trees, each of which partitions the feature space into J disjoint regions R_j and predicts a constant value b_j within each region:

h(x) = ∑_{j=1}^{J} b_j · 1{x ∈ R_j}
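Each tree in a gradient-boosted ensemble such as Catboost predicts in exactly this region-indicator form. A minimal pure-Python sketch, with hypothetical regions and leaf values that are not taken from the paper:

```python
def tree_predict(x, regions, leaf_values):
    """Evaluate h(x) = sum_j b_j * 1{x in R_j} for disjoint regions R_j.

    Each region is a tuple of (low, high) bounds, one pair per feature;
    leaf_values[j] is the constant b_j predicted inside region j.
    """
    total = 0.0
    for bounds, b_j in zip(regions, leaf_values):
        if all(lo <= xi < hi for xi, (lo, hi) in zip(x, bounds)):
            total += b_j  # at most one of the disjoint regions matches
    return total

# Hypothetical one-feature tree: split on inlet-air temperature at 60 degrees
regions = [((0.0, 60.0),), ((60.0, 200.0),)]
leaf_values = [120.0, 55.0]  # predicted preheating minutes in each leaf
print(tree_predict((45.0,), regions, leaf_values))  # 120.0
print(tree_predict((75.0,), regions, leaf_values))  # 55.0
```

Boosting sums many such trees, each one fitted to the residual errors of the ensemble built so far.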
product to initiate the drying phase. During drying, operators take samples to analyze
various chemical parameters. After drying, the cooling process commences. Once all
three phases are finished, the fluid bed dryer undergoes cleaning before a new batch is
processed. Overall, continuous monitoring of the outlet air temperature through SCADA is
crucial to ensure the preheating process is controlled effectively and prevent unnecessary
energy consumption.
Figure 3. Fluid bed dryer Fielder Aeromatic.
56 columns, containing the recorded information from the various sensors and parameters
of the fluid bed dryer. This comprehensive dataset was the basis for our further exploration
and optimization of the fluid bed drying process.
Item TagName (Symbol) Description Min Max Units PMA TSG CIP
1 FS3_GEA_EIS1200_ME Impeller power [kW] 0 300 kW X
2 FS3_GEA_EOP_GP Current EOP in GP 0 1000 None X
3 FS3_GEA_EOP_MP Current EOP in MP 0 1000 None X
4 FS3_GEA_FIC1217_ME Liquid flow rate in GP [cl/min] 0 833 cl/min X
5 FS3_GEA_FIC1217_XS Liquid flow setpoint in GP [cl/min] 0 833 cl/min X
6 FS3_GEA_FIC200_ME Air flow [m3/h] 0 4500 m3/h X
7 FS3_GEA_FIC200_XS Air flow setpoint [m3/h] 0 4500 m3/h X
8 FS3_GEA_FIC701_ME Spray liquid flow in MP [cl/min] 0 667 cl/min X
9 FS3_GEA_FIC701_XS Spray liquid flow rate setpoint in MP [cl/min] 0 667 cl/min X
10 FS3_GEA_LI940_ME Cleaning water tank level [L] 0 500 L X
11 FS3_GEA_MIS213_ME Inlet air humidity [g/kg] 0 250 g/kg X
12 FS3_GEA_NFGP No. current phase in execution in GP 0 1000 None X
13 FS3_GEA_NFMP No. current phase in execution in MP 0 1000 None X
14 FS3_GEA_NFW No. current cleaning phase in execution 0 1000 None X
15 FS3_GEA_NW No. current cleaning in execution 0 1000 None X
On the SCADA screen, the status of the station can be seen in detail, including the values of the sensors and valves (for example, the temperature or pressure); the upper right shows the state of the fluid bed dryer, which process it is carrying out, and the state of each process (granulating, drying, or cleaning). For example, steam is added to the fluid bed dryer to control the humidity of the air introduced into the dryer: if the humidity is very low, more steam is added to increase it. The air introduced into the dryer thus allows both its temperature and its humidity to be controlled. The pressure of the dryer is indicative of the clogging of the filters: if there is a large difference between the internal pressure and the output pressure, the filters are dirty and need to be cleaned. The SCADA system records and monitors the operating status of the fluid bed dryer across its operating modes and states, the duration of each, and the values of the analog parameters involved. Taking into account the drying process and how the fluid bed machine works, four sensors were selected for the exploratory analysis. The signals recorded by the different sensors in the fluid bed dryer were as follows:
• Fan motor: this signal indicates whether the fluid bed dryer is currently running (ON)
or turned off (OFF).
• Air flow: This signal represents the quantity of air flowing into the fluid bed dryer,
measured in cubic meters per hour (m3 /h). The machine operator configures this
parameter. Monitoring the air flow helps distinguish between the preheating and
drying phases, as both processes require air to be completed.
• Inlet air temperature: this signal indicates the initial temperature of the air entering
the fluid bed dryer, and it is set by the machine operator at the start of the process.
• Outlet air temperature: this signal indicates the temperature of the air leaving the fluid
bed dryer.
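The filter-clogging rule mentioned in the SCADA discussion above reduces to a simple pressure-drop comparison; the threshold and units here are hypothetical, not taken from the plant's configuration:

```python
def filters_need_cleaning(internal_pressure, outlet_pressure, max_drop=1.5):
    """Flag clogged filters when the pressure drop across them exceeds
    `max_drop` (hypothetical threshold and units, not the plant's setting)."""
    return (internal_pressure - outlet_pressure) > max_drop

print(filters_need_cleaning(4.8, 4.5))  # False: small drop, filters are clean
print(filters_need_cleaning(6.0, 3.2))  # True: large drop, clean the filters
```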
During the operation of the fluid bed dryer and the commencement of the hot air inlet
process, it is essential to consider the heat absorbed by the machine to reach the preheating
temperature. The temperature difference between the outlet air temperature and the inlet
air temperature helps determine the amount of heat absorbed by the fluid bed dryer. When
the machine reaches a point where it cannot absorb more heat, the inlet air temperature
will become similar to the outlet air temperature. To better understand the behavior of the
process, the temperature difference of the air inlet and outlet of the machine was utilized,
denoted as TA_D, which is defined in Equation (1):

TA_D = TA_s − TA_e, (1)

where TA_s represents the outlet air temperature, TA_e represents the inlet air temperature, and TA_D represents the temperature difference.
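Using this definition, the end of the effective preheating phase can be detected numerically: once TA_D approaches zero, the machine body is absorbing no more heat. A sketch with made-up readings and a hypothetical 2-degree threshold:

```python
def temp_difference(ta_s, ta_e):
    """TA_D = TA_s - TA_e, as in Equation (1)."""
    return ta_s - ta_e

def preheat_complete_index(outlet, inlet, threshold=2.0):
    """Return the first sample index at which the outlet/inlet difference
    falls below `threshold` degrees, i.e. the dryer body is heat-saturated.
    Returns None if the temperatures never converge."""
    for i, (ta_s, ta_e) in enumerate(zip(outlet, inlet)):
        if abs(temp_difference(ta_s, ta_e)) < threshold:
            return i
    return None

# Made-up minute-by-minute readings (degrees C): outlet catches up with inlet
inlet = [60, 60, 60, 60, 60, 60]
outlet = [25, 38, 48, 55, 59, 60]
print(preheat_complete_index(outlet, inlet))  # 4
```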
5. Experimental Results
5.1. Exploratory Data Analysis
The dataset used for the machine learning analyses is the same as that used for the exploratory data analysis. It includes various parameters related to the fluid bed dryer's
operation, such as the inlet and outlet air temperatures, airflow rate, and the phase number
the machine is in (preheating, drying, or cooling). Information from over 200 batches of
dried drug product, covering a span of 18 months of production, was also accessible for
analysis. The variables used in the current study were the following:
• The phase indicator takes values 1, 2, or 3, representing the current phase of the fluid
bed dryer. Phase 1 indicates preheating, Phase 2 is the drying phase, and Phase 3
indicates cooling after the drying process.
• The inlet air temperature sensor represents the temperature at which the air enters the
machine during any of the three phases (preheating, drying, or cooling).
• The outlet air temperature signal corresponds to the temperature at which the air
leaves the machine.
• The inlet airflow sensor indicates the volume of air supplied by the machine’s fan.
• The fan motor signal is useful for determining when the machine is active during any
of the three phases, indicating the fan motor’s movement.
In the next step of the analysis, random days will be selected to observe the behavior
of the machine signals during the preheating, drying, and cooling processes for each batch
of pharmaceutical product processed. The primary goal of this exploration is to identify
trends and gain a better understanding of fluid bed dryer processes, with the objective
of identifying opportunities for improvement. Figure 5 visually depicts the behavior of
the signals on different days, representing a full day of fluid bed dryer operation. The
x-axis represents the elapsed time for one day of fluid bed dryer operation (1440 min,
corresponding to 24 h), while the y-axis indicates the difference in temperature between
the machine’s inlet and outlet air. The blue dots indicate the preheating process, the orange
dots represent the drying process, and the green dots represent the cooling process.
Figure 6 shows a sample of four different days taken at random, where it can be observed that on some days the fluid bed dryer processed one batch and on others two, with an average of around 350 min per batch. On 2 December 2019, two batches were processed; looking at the blue dots, the preheating process lasted much longer for both batches than it did, for example, on 7 October 2018, where the blue-dot segments were much shorter and the temperature difference (y-axis) did not exceed 10 degrees. It can also be observed that the duration of the drying process (orange dots) was more or less homogeneous, lasting approximately the same for all days and all batches (x-axis), with approximately similar temperature differences (y-axis). In conclusion, it is
evident from the data that the duration of the preheating process exhibits variability: some batches take considerably longer to preheat the machine than others, with the consequent unnecessary consumption of energy.
Figure 6. Example of 4 different days of batch drying. Above each figure is plotted the date of the
batch (1.0 Preheating, 2.0 Drying, 3.0 Cooling).
Batch_E = Batch_t × C_pm,

where Batch_t is the time consumed by the fluid bed dryer for preheating the batch, Batch_E is the resulting preheating energy consumption, and C_pm corresponds to the fluid bed dryer energy consumption per minute. The fluid bed dryer currently consumes 18.5 kWh per hour during the preheating process; this means that for each
minute it consumes approximately 0.31 kWh (18.5 kWh/60 min ≈ 0.31 kWh). If the preheating process
may take between 50.1 and 180.3 min, the fluid bed dryer consumes between 15.5 kWh and
55.8 kWh for preheating the machine to dry one batch of drug product.
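This arithmetic can be packaged directly; the 18.5 kWh-per-hour rating is the figure given in the text, while the batch durations below are the observed extremes:

```python
DRYER_POWER_KW = 18.5                  # preheating power draw given in the text
KWH_PER_MINUTE = DRYER_POWER_KW / 60   # about 0.308 kWh consumed per minute

def preheat_energy_kwh(minutes):
    """Energy consumed by the fluid bed dryer while preheating for `minutes`."""
    return minutes * KWH_PER_MINUTE

print(round(preheat_energy_kwh(50.1), 1))   # fastest batches: about 15.4 kWh
print(round(preheat_energy_kwh(180.3), 1))  # slowest batches: about 55.6 kWh
```

The small differences from the 15.5 kWh and 55.8 kWh quoted in the text come from rounding the per-minute rate to 0.31 kWh before multiplying.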
Figure 7. Fluid bed dryer preheating duration in minutes. Brown line indicates average duration.
Figure 8. Fluid bed dryer preheating energy consumption (kWh). Brown line indicates average consumption.
It can be observed that some batches needed 55.8 kWh while others needed less than 15.5 kWh, which means around 72.2% less energy consumption in some cases. The brown line indicates the average consumption for the 200 batches, around 30.9 kWh. This indicates important potential energy savings if the preheating process in the fluid bed dryer is optimized. To calculate the potential energy savings of the fluid bed dryer during the preheating process for each batch, a machine learning model was implemented, as discussed in the next section, to predict the right time to stop the process and, therefore, consume just the energy needed for preheating the fluid bed dryer.
to obtain different combinations of partitions). The results of the evaluation of the most relevant algorithms are shown in Table 2.
Based on Table 2, the Catboost Regressor has the lowest MAE of 83.507 and the lowest
RMSE of 118.781, indicating that it has the best predictive accuracy compared to the other
models. It also has the highest R2 value of 0.6806, indicating that it can explain about
68.06% of the variance in the target variable. The Light Gradient Boosting Machine has the
second-best performance, with slightly higher MAE and RMSE values than the Catboost
model, and an R2 value of 0.67. The Extreme Gradient Boosting, Random Forest Regressor,
and Gradient Boosting Regressor models have higher MAE, MSE, and RMSE values and
lower R2 values than the Catboost and Light Gradient Boosting models, indicating that they
may not perform as well on this specific dataset, as is also the case for the remaining models. To select the best metric for the Catboost algorithm, the nature of the problem and the evaluation criteria were considered. R2 was the most suitable metric, as it measures the proportion of variance in the target variable that can be explained by the model. MAE was discarded because it only minimizes the average absolute difference between predicted and actual values, while MSE and RMSE penalize larger errors more heavily than smaller ones.
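All three metrics can be computed without any machine-learning library; the sketch below uses toy durations rather than the paper's (private) data:

```python
import math

def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root mean squared error: squaring penalizes large errors more."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def r2(y, yhat):
    """Proportion of target variance explained by the model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Toy preheating durations (minutes): actual vs. predicted
y = [60.0, 90.0, 120.0, 150.0]
yhat = [65.0, 85.0, 130.0, 140.0]
print(round(mae(y, yhat), 2), round(rmse(y, yhat), 2), round(r2(y, yhat), 3))
# 7.5 7.91 0.944
```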
Figure 9. Time duration for preheating process comparing real duration with Catboost prediction.
Figure 10. Energy consumption for preheating process comparing real energy with Catboost prediction.
6. Conclusions
This paper introduced an exploratory data analysis methodology tailored for the analysis and optimization of a large-scale drug production process, and a Catboost machine
learning model implementation, specifically focusing on the preheating stage of pharma-
ceutical granules using a fluid bed dryer. As a conclusion drawn from the exploratory data
analysis of the signals, it can be stated that the preheating phase lasts longer than necessary.
Some batches need less than 50.1 min to complete the preheating process; however, there
are batches that take up to 180.3 min. In terms of energy consumption, this means that for
some batches, the fluid bed dryer consumes 15.5 kWh, and for others it consumes 55.8 kWh,
which could represent savings, in some cases, of 72.2% of energy. In addition, the most
suitable model for the fluid bed dryer prediction process was selected based on the current
dataset obtained from the activity of the fluid bed dryer process in the production plant.
First, several models, including Catboost, Elastic net, Random Forest or Linear Regression,
were compared. Catboost was selected because it provided the lowest error and, at the
same time, the highest R2, as it has been described in previous sections. Once the model
was selected, the analysis of the historical dataset, with 200 batches from 18 months of
production, was performed. It has been shown that the model is able to predict on average
a reduction of 50.45% of the preheating process duration and up to 59.68% in some cases.
Likewise, the energy consumption of the fluid bed dryer for the preheating process could
be reduced on average by 50.48% and up to 59.76%, which results on average in around
3120 kWh of energy consumption savings per year.
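The annual savings figure can be reproduced with simple arithmetic; the batch count per year here is an assumption (the dataset itself covers 200 batches over 18 months):

```python
AVG_PREHEAT_KWH = 30.9        # average preheating energy per batch (from the text)
AVG_SAVING_FRACTION = 0.5048  # average predicted reduction (from the text)
BATCHES_PER_YEAR = 200        # assumed annual throughput

saving_per_batch = AVG_PREHEAT_KWH * AVG_SAVING_FRACTION  # about 15.6 kWh
annual_saving = saving_per_batch * BATCHES_PER_YEAR       # about 3120 kWh
print(round(saving_per_batch, 1), round(annual_saving))   # 15.6 3120
```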
Author Contributions: Conceptualization, R.B. and M.R.; methodology, R.B.; software, R.B.; val-
idation, R.B. and M.R.; formal analysis, R.B.; investigation, R.B.; resources, R.B.; data curation,
R.B.; writing—original draft preparation, R.B.; writing—review and editing, all authors; visualization, all authors; supervision, H.H. All authors have read and agreed to the published version of
the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Data is unavailable due to privacy restrictions.
References
1. Pharmaguide. Available online: https://www.pharmaguideline.com/2021/10/tablet-manufacturing-process-overview.html
(accessed on 1 September 2023).
2. Parikh, D. How to Optimize Fluid Bed Processing Technology: Part of the Expertise in Pharmaceutical Process Technology Series; Academic
Press: Cambridge, MA, USA, 2017.
3. Lourenço, V.; Lochmann, D.; Reich, G.; Menezes, J.; Herdling, T.; Schewitz, J. A quality by design study applied to an industrial
pharmaceutical fluid bed granulation. Eur. J. Pharm. Biopharm. 2012, 81, 438–447. [CrossRef] [PubMed]
4. Burggraeve, A.; Monteyne, T.; Vervaet, C.; Remon, J.P.; De Beer, T. Process analytical tools for monitoring, understanding, and
control of pharmaceutical fluidized bed granulation: A review. Eur. J. Pharm. Biopharm. 2013, 83, 2–15. [CrossRef] [PubMed]
5. Yüzgeç, U.; Becerikli, Y.; Türker, M. Dynamic neural-network-based model-predictive control of an industrial baker’s yeast
drying process. IEEE Trans. Neural Netw. 2008, 19, 1231–1242. [CrossRef]
6. Price, W.N. Making do in making drugs: Innovation policy and pharmaceutical manufacturing. Boston Coll. Law Rev. 2013,
55, 2013. [CrossRef]
7. Lifset, R.D. A new understanding of the American energy crisis of the 1970s. Hist. Soc. Res. Hist. Sozialforschung 2014, 39, 22–42.
8. Boyd, G.A. Development of a Performance-based Industrial Energy Efficiency Indicator for Pharmaceutical Manufacturing Plants; Duke
University: Durham, NC, USA, 2013. [CrossRef]
9. Thomas, P. Will Pharma Wear the Energy Star. Pharma Manufacturing, 6 March 2006.
10. Pazhayattil, A.B.; Konyu-Fogel, G. An empirical study to accelerate machine learning and artificial intelligence adoption in pharmaceutical manufacturing organizations. J. Generic Med. 2023, 19, 17411343221151109.
11. Mujumdar, A.S. Research and development in drying: Recent trends and future prospects. Dry. Technol. 2004, 22, 1–26. [CrossRef]
12. Aghbashlo, M.; Mobli, H.; Rafiee, S.; Madadlou, A. The use of artificial neural network to predict exergetic performance of spray
drying process: A preliminary study. Dry. Technol. 2012, 88, 32–43. [CrossRef]
13. Lai, J.-P.; Chang, Y.-M.; Chen, C.-H.; Pai, P.-F. A survey of machine learning models in renewable energy predictions. Appl. Sci.
2020, 10, 5975. [CrossRef]
14. Diaz, L.P.; Brown, C.J.; Ojo, E.; Mustoe, C.; Florence, A.J. Machine learning approaches to the prediction of powder flow behaviour
of pharmaceutical materials from physical properties. Digit. Discov. 2023, 2, 692–701. [CrossRef]
15. Sciuto, G.L.; Susi, G.; Cammarata, G.; Capizzi, G. A spiking neural network-based model for anaerobic digestion process. In
Proceedings of the 2016 International Symposium on Power Electronics, Electrical Drives, Automation and Motion (SPEEDAM),
Capri, Italy, 22–24 June 2016; pp. 996–1003.
16. Kim, D.; Kim, M.; Kim, W. Wafer Edge Yield Prediction Using a Combined Long Short-Term Memory and Feed- Forward Neural
Network Model for Semiconductor Manufacturing. IEEE Access 2020, 8, 215125–215132. [CrossRef]
17. Wang, J.; Zhang, J.; Wang, X. A Data Driven Cycle Time Prediction with Feature Selection in a Semiconductor Wafer Fabrication
System. IEEE Trans. Semicond. Manuf. 2018, 31, 173–182. [CrossRef]
18. Aksu, B.; Matas, M.D.; Cevher, E.; Özsoy, Y.; Güneri, T.; York, P. Quality by design approach for tablet formulations containing
spray coated ramipril by using artificial intelligence techniques. Int. J. Drug Deliv. 2012, 4, 59.
19. Peterson, J.J.; Snee, R.D.; McAllister, P.R.; Schofield, T.L.; Carella, A.J. Statistics in pharmaceutical development and manufacturing. J. Qual. Technol. 2009, 41, 111–134. [CrossRef]
20. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018.
21. Liu, Z. Using neural network to establish manufacture production performance forecasting in IOT environment. J. Supercomput.
2022, 78, 9595–9618. [CrossRef]
22. Markarian, J. Modernizing pharma manufacturing. Pharm. Technol. 2018, 42, 20–25.
23. Nettleton, D.F.; Wasiak, C.; Dorissen, J.; Gillen, D.; Tretyak, A.; Bugnicourt, E.; Rosales, A. Data Modeling and Calibration of
In-Line Pultrusion and Laser Ablation Machine Processes. In Proceedings of the International Conference on Advanced Data
Mining and Applications (ICADMA), Barcelona, Spain, 20–21 August 2018.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
A Network Intrusion Detection Model Based on BiLSTM with
Multi-Head Attention Mechanism
Jingqi Zhang 1 , Xin Zhang 1 , Zhaojun Liu 1 , Fa Fu 1, *, Yihan Jiao 1 and Fei Xu 2
1 College of Computer Science and Technology, Hainan University, Haikou 570228, China;
[email protected] (J.Z.); [email protected] (X.Z.); [email protected] (Z.L.);
[email protected] (Y.J.)
2 College of Civil and Architecture Engineering, Hainan University, Haikou 570228, China;
[email protected]
* Correspondence: [email protected]
Abstract: A network intrusion detection tool can identify and detect potential malicious activities
or attacks by monitoring network traffic and system logs. The data in intrusion detection networks are characterized by high feature dimensionality and an unbalanced distribution across categories. Currently, the actual detection accuracy of some detection models is
relatively low. To solve these problems, we propose a network intrusion detection model based on
multi-head attention and BiLSTM (Bidirectional Long Short-Term Memory), which introduces different attention weights for each element of the feature vector, strengthening the relationship between particular features and the attack type being detected. The model also utilizes the advantage that
BiLSTM can capture long-distance dependency relationships to obtain a higher detection accuracy.
The model combines the advantages of the two components, with a dropout layer added between them to improve detection accuracy while preventing overfitting during training. Through
experimental analysis, the network intrusion detection model that utilizes multi-head attention and
BiLSTM achieved an accuracy of 98.29%, 95.19%, and 99.08% on the KDDCUP99, NSLKDD, and
CICIDS2017 datasets, respectively.
Keywords: intrusion detection; deep learning; multi-head attention; BiLSTM

Citation: Zhang, J.; Zhang, X.; Liu, Z.; Fu, F.; Jiao, Y.; Xu, F. A Network Intrusion Detection Model Based on BiLSTM with Multi-Head Attention Mechanism. Electronics 2023, 12, 4170. https://doi.org/10.3390/electronics12194170

Academic Editors: Chao Zhang, Wentao Li, Huiyan Zhang and Tao Zhan

Received: 31 August 2023; Revised: 29 September 2023; Accepted: 6 October 2023; Published: 8 October 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

In recent years, network intrusion has grown steadily, becoming a major attack platform and resulting in the theft of personal privacy [1]. Intrusion detection, one of the most important network security protection tools after firewalls, plays an increasingly important role in network security defense systems [2]. It can be defined as "network security devices that monitor network traffic to find unexpected patterns" [3]. Intrusion detection is the process of monitoring network traffic or system activity for unauthorized access, policy violations, and other malicious activities. It aims to identify potential security breaches and alert security personnel so they can take appropriate action to prevent further damage. Various IDS (intrusion detection systems) are in use, including host-based IDS and network-based IDS: a host-based IDS monitors activity on a single device or server, while a network-based IDS monitors all traffic on a network segment. Intrusion detection is a significant component of a comprehensive cybersecurity strategy. By monitoring the status and activities of the protected system, it can detect intrusions effectively and discover unauthorized or abnormal network behaviors.

As mentioned above, intrusion detection has three types: host-based detection, network-based detection, and collaborative detection [4]. HIDS (host-based intrusion detection) resides in a software component of the monitored system and mainly monitors activities within the host, such as system or shell program logs. Based on the detection technique, intrusion detection divides into misuse detection [5] and anomaly detection [6]. Misuse detection, also known as signature-based detection, applies signature matching to identify intrusions; it can effectively detect known attacks and has a low rate of false alarms.
However, some machine learning techniques have shortcomings, such as excessively long training
times on large training sets and oversensitivity to irrelevant attributes [7], so researchers are
working to adopt deep learning technology to solve these problems. Currently, artificial intelligence
technology is developing rapidly, and numerous machine learning and deep learning methods have
been applied to intrusion detection systems [8]. Machine learning methods perform better than
classical intrusion detection methods: they can learn from a quantity of intrusion data to build an
intrusion detection model that distinguishes whether an intrusion has occurred. But they still have
some problems, such as the need for plentiful training samples, long training times, and reliance
on feature selection.
Deep learning is usually a modification of artificial neural networks for feature extraction,
perception, and learning. It is now applied in many fields, such as speech recognition, autonomous
vehicles, image recognition and classification, natural language processing, and bioinformatics.
There are various neural network models built on deep learning technology. In this paper, we
propose an intrusion detection model based on multi-head attention with BiLSTM. The model
performs feature selection and extraction with the multi-head attention mechanism and captures
longer-distance dependencies between vectors through the BiLSTM model, thus improving the
accuracy and efficiency of identifying network intrusions.
2. Related Research
The following works use BiLSTM for intrusion detection.
Sivamohan et al. [9] proposed a university-network intrusion detection method based on
RNN-BiLSTM (Bidirectional Long Short-Term Memory Recurrent Neural Network), which
uses a two-step mechanism. The experimental results show that BiLSTM outperforms all other
RNN (Recurrent Neural Network) architectures in classification accuracy, reaching a prediction
accuracy of 98.48% on the CICIDS2017 dataset, though the authors do not specify whether this is
binary or multi-class classification.
Nelly Elsayed et al. [10] produced an intrusion detection model by using BiLSTM and
CNN (Convolutional Neural Network). The BiLSTM recursive behavior is used to save
the information used for intrusion detection, while the CNN perfectly extracts the data
features. It can be implemented and applied to many smart-home network gateways.
Huang Chi et al. [11] created a network intrusion detection method, which uses CNN
and BiLSTM. The former extracts local parallel features, solving the problem of incomplete
local feature extraction. The latter is used to extract long-distance related features, taking
into account the influence of attributes before and after each data point in the sequence
data, which can improve accuracy.
Liangkang Zhang et al. [12] produced a new model based on mean control, CNN and
BiLSTM. During data preprocessing, the data standardization of mean control is used to
standardize the original data, and then the CNN-BiLSTM algorithm is combined to predict.
The following works use an attention mechanism for intrusion detection. Jingyi Wang
et al. [13] proposed an intrusion detection model that uses an attention mechanism: an SSAE
(Stacked Sparse Autoencoder) was constructed to extract high-level feature representations of the
related information, and a double-layer BiGRU (Bidirectional Gated Recurrent Unit) network with
an attention mechanism was used to classify the data.
Haixia Hou et al. [14] proposed a method that uses HLSTM (Hierarchical LSTM) and
an attention mechanism. First of all, in order to extract sequence features across multiple
hierarchical structures on network record sequences, researchers used HLSTM. Then, the
attention layer’s function is to capture the correlation between features, redistribute the
weight of features, and adaptively map the importance of each feature to different network
attack categories.
Yalong Song et al. [15] proposed a mechanism using BiGRU and a multi-head attention
mechanism. It can manage the data and capture the correlation between data and features.
The above articles all use artificial intelligence for intrusion detection but do not make a
detailed classification prediction of the intrusion type: some [13] use binary classification,
some [11,12,14] distinguish only the main categories of network intrusion, and some [9,10,15]
do not explain the classification clearly. To resolve this situation and improve classification
granularity and precision, we propose an intrusion detection model based on BiLSTM and a
multi-head attention mechanism.
3. Model Methodology
To suit current NIDS (Network Intrusion Detection Systems) and their characteristics, we
propose a new detection method that uses multi-head attention and BiLSTM. The whole model
consists of two phases: a training phase and a prediction phase. In the training phase, the model
learns the original vector features of the network intrusion data, and the network's weight
parameters are adjusted through ground-truth comparison and loss function calculation. In the
prediction phase, the test data are fed into the model to obtain the final prediction results, and
the relevant performance metrics are calculated. The overall training and evaluation structure is
shown in Figure 1. A more detailed network model structure is shown in Figure 2.
[Figure 1 shows the pipeline: data preprocessing of the raw data (cleaning, transformation, numerical standardization, one-hot encoding, and splitting), the training/validation/testing sets, the network intrusion detection model (multi-head attention + BiLSTM, multi-class classification), and the evaluation metrics (accuracy, precision, recall, F1-score).]
Figure 1. Overall structure of the model based on multi-head attention and BiLSTM.
[Figure 2 shows the layer stack of the model: Input → Embedding → Dropout → Scaled Dot-Product Attention (head outputs concatenated) → Dropout → Dense → Dense → Output.]
3.1. Embedding
In Figure 2, we detail the model structure. First, we take the processed data and use the
embedding layer to transform each feature of the intrusion detection data into a vector.

The embedding layer raises the dimension of the input: x_i denotes each feature value in the
original vector (i ranges from 1 to 41 for the KDDCUP99 and NSLKDD datasets, and from 1 to 15
for the CICIDS2017 dataset), and a_i = Embedding(x_i), where a_i is the one-dimensional vector
of length 32 corresponding to each feature. The original vector is thus transformed into a
two-dimensional vector (taking the NSLKDD dataset as an example, the data are transformed from
one-dimensional data of length 41 to two-dimensional data of shape [41, 32], where 41 is the number
of features and 32 is the embedding dimension). We enlarge the features in this way so that the
model can learn more characteristics of network intrusion activities. After the embedding layer is
a dropout layer, which improves the generalization ability of the proposed model. Without it, the
model is prone to overfitting, which lowers prediction accuracy on the test set. The dropout layer
therefore prevents overfitting to a certain extent: a_i = Dropout(a_i).
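The transformation a_i = Dropout(Embedding(x_i)) can be sketched in NumPy. The vocabulary size of 256 and the random table values are illustrative assumptions: in the actual model the embedding table is learned during training, and the framework applies inverted dropout as shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, embed_dim, vocab = 41, 32, 256   # 41 NSLKDD features; vocab size assumed

E = rng.normal(size=(vocab, embed_dim))      # embedding table (learned in practice)
x = rng.integers(0, vocab, size=n_features)  # one integer-encoded record
a = E[x]                                     # embedded record, shape (41, 32)

keep = 1.0 - 0.8                             # dropout rate 0.8 (see Section 4)
mask = rng.random(a.shape) < keep            # inverted dropout at training time
a_drop = np.where(mask, a / keep, 0.0)
print(a.shape, a_drop.shape)                 # (41, 32) (41, 32)
```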
3.2. Multi-Head Attention

For each head j (j ranges from 1 to 3), the embedded data S are projected into query, key, and
value matrices: q_i = SW_q^j, k_i = SW_k^j, and v_i = SW_v^j. The weight matrices W_q, W_k,
and W_v are trained continuously through learning, so the model's fitting ability can be further
improved. The similarity matrix of different features is obtained by multiplying Q and K^T,
yielding the similarity relationships between features. The similarity is then normalized by the
Softmax function, which also reduces the amount of calculation to a certain extent. Finally, the
obtained result is multiplied by V to obtain data with the same dimension as the input, according to
Equations (1) and (2), where d_k represents the dimension of the K matrix. Finally, we can obtain
the final result according to Equation (3). The structure is shown schematically in Figure 3.

head_j(Q, K, V) = Softmax(QK^T / √d_k) V    (1)

Softmax(z_i) = exp(z_i) / ∑_j exp(z_j)    (2)
[Figure 3 shows the multi-head attention structure, with scaled dot-product attention computed in parallel for head 1, head 2, and so on, and the results combined.]
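Equations (1) and (2) can be sketched with three heads in NumPy. Concatenating the head outputs (the Concat block in Figure 2) follows the standard multi-head formulation [16]; the 32-dimensional projections and random weights are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):                      # Equation (2), numerically stabilized
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def head(S, Wq, Wk, Wv):                      # Equation (1) for one head
    Q, K, V = S @ Wq, S @ Wk, S @ Wv
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

rng = np.random.default_rng(0)
S = rng.normal(size=(41, 32))                 # embedded record from Section 3.1
weights = [[rng.normal(size=(32, 32)) for _ in range(3)] for _ in range(3)]
T = np.concatenate([head(S, *w) for w in weights], axis=-1)
print(T.shape)                                # (41, 96): three 32-dim heads concatenated
```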
3.3. BiLSTM
After that, the data weighted by the attention mechanism are fed into the BiLSTM model.
LSTM is a kind of RNN that can learn and remember long-term dependencies, capture the
relationships between different features in the feature vector, and avoid the problems of vanishing
or exploding gradients [18]. Graves et al. [19] reported an important improvement in classification
accuracy when using LSTM in a bidirectional architecture. Although the feature vector of intrusion
detection is not time-series data, the model can still analyze the relationships between distant
features, associate different features, and then make predictions. In this case, the output is a
one-dimensional vector of length 128.
In the previous section, we obtained the data T generated by the multi-head attention
mechanism. T is a two-dimensional vector composed of multiple one-dimensional vectors, which
we denote as T = (m_1, m_2, . . . , m_i).
We believe that, for detecting certain relationships in the data, LSTM can capture these
longer-distance dependencies while avoiding problems such as gradient vanishing and gradient
explosion. However, LSTM alone cannot encode information from back to front. Therefore, we use
BiLSTM to improve the ability to capture bidirectional features.
BiLSTM is composed of repeated basic units, each consisting of four layers: the input layer,
forward propagation layer, backward propagation layer, and output layer. The forward propagation
layer extracts the forward features of the vector from front to back, while the backward propagation
layer extracts the reverse features of the input sequence from back to front. The output layer
integrates the outputs of the forward and backward propagation layers. Since we want to extract
the forward-backward correlation of the vectors, the output formula of BiLSTM is shown in
Equation (4).
the output formula of BiLSTM is shown in Equation (4).
−
→ ← −
hi = [ hi ⊕ hi ] (4)
−
→
where ⊕ denotes the summation calculation of the corresponding elements. hi denotes
←
−
the forward output, hi denotes the backward output. Finally, hi denotes the result of the
summation of the corresponding elements.
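Equation (4) can be illustrated in a few lines of NumPy. Note that the text defines ⊕ as elementwise summation, while the reported output length of 128 matches concatenation of two 64-dimensional LSTM outputs, so both variants are shown; the constant vectors are illustrative.

```python
import numpy as np

h_fwd = np.full(64, 1.0)                 # forward LSTM output at step i (illustrative)
h_bwd = np.full(64, 2.0)                 # backward LSTM output at step i

h_sum = h_fwd + h_bwd                    # Equation (4) with elementwise summation
h_cat = np.concatenate([h_fwd, h_bwd])   # concatenation variant, giving length 128
print(h_sum.shape, h_cat.shape)          # (64,) (128,)
```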
The BiLSTM network is built from many single LSTM units; an individual LSTM structure is
shown in Figure 4. LSTM adds three gating structures in the hidden layer, namely the forget gate,
input gate, and output gate, and it also adds a new hidden cell state. In Figure 4, f(t), i(t), and
o(t) represent the forget gate, input gate, and output gate at time t, and a(t) represents the candidate
state extracted from h(t − 1) and m_t at time t. All formulas are shown in Equations (5)–(8).
f(t) = σ(W_f h_{t−1} + U_f m_t + b_f)    (5)

i(t) = σ(W_i h_{t−1} + U_i m_t + b_i)    (6)

a(t) = tanh(W_a h_{t−1} + U_a m_t + b_a)    (7)

o(t) = σ(W_o h_{t−1} + U_o m_t + b_o)    (8)
where m_t represents the input at time t and h_{t−1} represents the hidden-layer state at time
t − 1. W_f, W_i, and W_o represent the weight parameters of h_{t−1} in the feature extraction of
the forget gate, input gate, and output gate; U_f, U_i, and U_o represent the weight parameters
of m_t in the same gates; and b_f, b_i, and b_o represent the corresponding bias values. The
activation functions are shown in Equation (9) [20] and Equation (10) [21]:
tanh(x) = (1 − e^{−2x}) / (1 + e^{−2x})    (9)

σ(x) = 1 / (1 + e^{−x})    (10)
The results of the forget gate and input gate act on c(t − 1) and a(t) to form the cell state c(t)
at moment t, denoted as Equation (11). The final hidden state h(t) at moment t is derived from the
output gate o(t) as well as the cell state c(t), denoted as Equation (12), where ⊙ represents the
Hadamard product.

c(t) = c(t − 1) ⊙ f(t) + i(t) ⊙ a(t)    (11)

h(t) = o(t) ⊙ tanh(c(t))    (12)
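A self-contained NumPy sketch of Equations (5)-(12) for a single LSTM step follows. The hidden size of 64 and input size of 96 are illustrative assumptions, and Equation (12) is taken as the standard h(t) = o(t) ⊙ tanh(c(t)).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # Equation (10)

def lstm_step(m_t, h_prev, c_prev, W, U, b):
    f = sigmoid(W['f'] @ h_prev + U['f'] @ m_t + b['f'])   # Equation (5)
    i = sigmoid(W['i'] @ h_prev + U['i'] @ m_t + b['i'])   # Equation (6)
    a = np.tanh(W['a'] @ h_prev + U['a'] @ m_t + b['a'])   # Equation (7)
    o = sigmoid(W['o'] @ h_prev + U['o'] @ m_t + b['o'])   # Equation (8)
    c = c_prev * f + i * a                   # Equation (11); * is the Hadamard product
    h = o * np.tanh(c)                       # Equation (12)
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 96, 64                           # input and hidden sizes (assumed)
W = {k: rng.normal(scale=0.1, size=(d_h, d_h)) for k in 'fiao'}
U = {k: rng.normal(scale=0.1, size=(d_h, d_in)) for k in 'fiao'}
b = {k: np.zeros(d_h) for k in 'fiao'}

h, c = np.zeros(d_h), np.zeros(d_h)
for m_t in rng.normal(size=(41, d_in)):      # scan the 41 feature vectors of T
    h, c = lstm_step(m_t, h, c, W, U, b)
print(h.shape)                               # (64,)
```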
The second dense layer uses the Softmax activation function (shown in Equation (2)), which is
commonly used in multi-class classification problems and is therefore well suited to our detection
model. The output dimension of this dense layer corresponds to the number of attack types; since
our model predicts all the attack types mentioned in the dataset, the Softmax activation function
is essential.
Combining multi-head attention with BiLSTM has several advantages:
• Improved sequence modeling: BiLSTM is a type of RNN that can effectively model
sequential data in both forward and backward directions. Combined with multi-head attention,
it can further capture long-range dependencies and improve the quality of sequence modeling.
• Increased interpretability: the multi-head attention mechanism allows the model to attend
selectively to distinct parts of the input sequence, lending more transparency and interpretability
to the model's decision-making process. This is particularly useful in detection tasks such as
network intrusion detection.
• Robustness to noise and variations: by attending to multiple parts of the input sequence,
the model becomes more robust to variations and noise in the data.
• Scalability: the combination of multi-head attention with BiLSTM allows the model to
scale to larger datasets and more complex tasks without compromising performance or accuracy,
making it an effective approach for large-scale network intrusion detection tasks.
4. Implementation Details
In our experiments, the hardware environment is as follows: the CPU model is Intel
Core i7-10750H, the GPU model is NVIDIA GeForce RTX2060 with Max-Q Design, the
memory on the GPU card is 6 GB and the RAM on the computer is 32 GB.
The language and platform (software) environment are as follows: the operating
system used in the experiment is Windows 11, and the programming environment is
Python 3.9. The Keras deep learning framework and Scikit-learn framework are used to
help us build the model and process the data.
We divided the dataset into three parts, each with a different function: the training set
accounts for 64%, the validation set for 16%, and the test set for 20%. A total of 60 rounds of
model training were performed; the batch size was 512, the random number seed was 0, and the
number of heads of the multi-head attention mechanism was 3. The Adam algorithm [23] is used
as the optimizer, with a learning rate of 0.0003. Adam adaptively adjusts the learning rate based
on gradient information and adjusts the momentum to avoid falling into a local minimum too
early. We add two dense layers, one after the BiLSTM and one in the output part, with distinct
activation functions: ReLU and Softmax, respectively. Meanwhile, dropout layers are added after
the embedding layer and between the two dense layers, with rates of 0.8 and 0.3, respectively. The
whole training time of the model is 55 min on the KDDCUP99 dataset, 4 min 45 s on the NSLKDD
dataset, and 33 min on the CICIDS2017 dataset.
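The 64/16/20 split with random seed 0 can be sketched as follows; the record count of 1000 is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)               # random seed 0, as in the experiments
n = 1000                                     # hypothetical number of records
idx = rng.permutation(n)

n_test = int(0.20 * n)                       # 20% test
n_val = int(0.16 * n)                        # 16% validation; the rest (64%) is training
test = idx[:n_test]
val = idx[n_test:n_test + n_val]
train = idx[n_test + n_val:]
print(len(train), len(val), len(test))       # 640 160 200
```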
The core code of the model we designed is shown in Figure 5. The code is written in
Python and built under the Keras deep learning framework.
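Figure 5 is not reproduced in this reprint, so the following Keras sketch reconstructs the model from the hyperparameters stated above (embedding dimension 32, 3 attention heads, dropout rates 0.8 and 0.3, BiLSTM output length 128, Adam with learning rate 0.0003). The vocabulary size, the dense width of 64, and the use of keras.layers.MultiHeadAttention are our assumptions, not the authors' exact code.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(n_features=41, vocab=256, n_classes=5):
    inp = layers.Input(shape=(n_features,))
    x = layers.Embedding(vocab, 32)(inp)                   # (batch, 41, 32)
    x = layers.Dropout(0.8)(x)
    x = layers.MultiHeadAttention(num_heads=3, key_dim=32)(x, x)
    x = layers.Bidirectional(layers.LSTM(64))(x)           # (batch, 128)
    x = layers.Dense(64, activation='relu')(x)             # width 64 assumed
    x = layers.Dropout(0.3)(x)
    out = layers.Dense(n_classes, activation='softmax')(x)
    model = models.Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = build_model()
```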
DOS (back, land, neptune, pod, smurf, teardrop): a Denial-of-Service (DoS) attack is a type of
cyber attack where a perpetrator attempts to make a website or network resource unavailable to
its intended users by overwhelming it with traffic or other types of data.
U2R (buffer_overflow, loadmodule, perl, rootkit): a User-to-Root (U2R) attack is a type of
cyber attack where an attacker with limited privileges on a system attempts to gain root-level
access.
The CICIDS2017 dataset [26], also known as the Canadian Institute for Cybersecurity
Intrusion Detection System (CIC-IDS2017), is a comprehensive dataset designed for eval-
uating NIDS. Its authors are researchers at the University of New Brunswick in Canada.
This dataset consists of various network traffic features extracted from different types of
network traffic, including normal traffic and several types of attacks. It also simulates a
real-world network environment to provide a realistic representation of network traffic. A
short description of its files is shown in Table 2. Descriptions of all the datasets are shown in
Table 3 (Table 3 gives the dataset sizes after balancing with the SMOTE algorithm; the relevant
content is presented in Section 5.2.4).
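SMOTE [27] creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbors. A compact NumPy sketch of the idea follows; it is not the exact implementation used in the experiments, and the sizes are illustrative.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating between
    a random minority sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(dist)[1:k + 1]   # k nearest, skipping the sample itself
        j = rng.choice(neighbors)
        lam = rng.random()                      # interpolation factor in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synth)

rng = np.random.default_rng(1)
X_minority = rng.normal(size=(20, 4))           # 20 minority-class records, 4 features
X_synthetic = smote(X_minority, n_new=30)
print(X_synthetic.shape)                        # (30, 4)
```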
m̂_i = (m_i − m̄) / σ    (14)

We use m_i and m̂_i to represent the value of a data sample before and after normalization,
m̄ represents the mean value of the feature before normalization, and σ represents the standard
deviation of the feature.
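Assuming the denominator in Equation (14) is the feature's standard deviation (i.e., z-score standardization, consistent with the numerical standardization step in Figure 1), the normalization can be sketched as follows with toy values:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [3.0, 400.0],
              [5.0, 600.0]])                # toy feature matrix (rows = samples)

mean = X.mean(axis=0)                       # per-feature mean before normalization
std = X.std(axis=0)                         # assumed denominator of Equation (14)
X_norm = (X - mean) / std                   # each feature now has mean 0 and std 1
print(X_norm.mean(axis=0).round(6), X_norm.std(axis=0).round(6))
```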
Table 5. Cont.
[Bar chart: accuracy, precision, recall, and F1-score of the proposed model on the KDDCUP99, NSLKDD, and CICIDS2017 datasets; all four metrics lie between 0.95 and 1.00 on each dataset.]
Figure 7. Accuracy and loss variation of model training on the KDDCUP99 dataset.
Figure 8. Accuracy and loss variation of model training on the NSLKDD dataset.
Figure 9. Accuracy and loss variation of model training on the CICIDS2017 dataset.
[Bar chart, "Accuracy of different datasets": the proposed model reaches 0.9829 on KDDCUP99, 0.9519 on NSLKDD, and 0.9908 on CICIDS2017; the compared models range from 0.8425 to 0.98.]
6. Conclusions
In this paper, we propose an intrusion detection model based on a multi-head attention
mechanism and BiLSTM. The embedding layer converts sparse high-dimensional feature vectors
into low-dimensional feature vectors, fusing a large amount of valuable information. The attention
mechanism then introduces different attention weights for each vector in the feature vector, not
only strengthening the relationship between certain vectors and the detected attack types but also
improving detection accuracy; using multiple attention heads avoids focusing too much attention
on particular elements of the vector. Finally, we apply the BiLSTM network to detect relationships
that exist in the data: it captures long-distance dependencies while avoiding problems such as
gradient vanishing and gradient explosion. The experimental comparison shows that our proposed
model achieves better accuracy and F1-score on the KDDCUP99, NSLKDD, and CICIDS2017
datasets than other models, and it is more accurate for multiple types of intrusion detection than
binary intrusion detection models.
Of course, our model still has some shortcomings. Our model is a multi-classification model
intended to make detailed and accurate predictions for different network intrusions; if an intrusion
is of a new type, the model cannot name the specific intrusion method, but the traffic can still be
classified as an intrusion rather than normal network activity, for professionals to study. In
addition, the normal samples of the KDDCUP99 and CICIDS2017 datasets are too numerous. To
ensure the availability of the detection model, we use oversampling and undersampling to balance
the data, but the sampling process is highly random and may delete important information from
the majority classes. In future work, we will try to solve these problems.
Author Contributions: Conceptualization, J.Z., X.Z., Z.L. and F.F.; methodology, J.Z.; software, J.Z.
and Y.J.; validation, J.Z. and Y.J.; formal analysis, J.Z.; investigation, X.Z.; resources, X.Z.; data
curation, J.Z.; writing—Original draft preparation, J.Z. and X.Z.; writing—Review and editing, Z.L.,
F.F. and F.X.; supervision, F.F.; project administration, J.Z. and X.Z. All authors have read and agreed
to the published version of the manuscript.
Funding: This work was funded by Hainan Province Science and Technology Special Fund (Grant
No. ZDYF2021GXJS006), Haikou Science and Technology Plan Project (Grant No. 2022-007) and Key
Laboratory of PK System Technologies Research of Hainan, China.
Data Availability Statement: The datasets analyzed during this study are available from the corre-
sponding author upon reasonable request.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Manzoor, I.; Kumar, N. A feature reduced intrusion detection system using ANN classifier. Expert Syst. Appl. 2017, 88, 249–257.
2. Thapa, S.; Mailewa, A. The role of intrusion detection/prevention systems in modern computer networks: A review. In
Proceedings of the Midwest Instruction and Computing Symposium (MICS), Online, 3–4 April 2020; Volume 53, pp. 1–14.
3. Patgiri, R.; Varshney, U.; Akutota, T.; Kunde, R. An investigation on intrusion detection system using machine learning. In
Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India, 18–21 November 2018;
pp. 1684–1691.
4. Liu, M.; Xue, Z.; Xu, X.; Zhong, C.; Chen, J. Host-based intrusion detection system with system calls: Review and future trends.
ACM Comput. Surv. (CSUR) 2018, 51, 1–36. [CrossRef]
5. Pu, G.; Wang, L.; Shen, J.; Dong, F. A hybrid unsupervised clustering-based anomaly detection method. Tsinghua Sci. Technol.
2020, 26, 146–153. [CrossRef]
6. Buczak, A.L.; Guven, E. A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE
Commun. Surv. Tutor. 2015, 18, 1153–1176. [CrossRef]
7. Momand, A.; Jan, S.U.; Ramzan, N. A Systematic and Comprehensive Survey of Recent Advances in Intrusion Detection Systems
Using Machine Learning: Deep Learning, Datasets, and Attack Taxonomy. J. Sens. 2023, 2023, 6048087. [CrossRef]
8. Liu, H.; Lang, B. Machine learning and deep learning methods for intrusion detection systems: A survey. Appl. Sci. 2019, 9, 4396.
[CrossRef]
9. Sivamohan, S.; Sridhar, S.; Krishnaveni, S. An effective recurrent neural network (RNN) based intrusion detection via
bidirectional long short-term memory. In Proceedings of the 2021 International Conference on Intelligent Technologies (CONIT),
Hubli, India, 25–27 June 2021; pp. 1–5.
10. Elsayed, N.; Zaghloul, Z.S.; Azumah, S.W.; Li, C. Intrusion detection system in smart home network using bidirectional LSTM
and convolutional neural networks hybrid model. In Proceedings of the 2021 IEEE International Midwest Symposium on Circuits
and Systems (MWSCAS), Lansing, MI, USA, 9–11 August 2021; pp. 55–58.
11. Chi, H.; Lin, C. Industrial Intrusion Detection System Based on CNN-Attention-BILSTM Network. In Proceedings of the 2022
International Conference on Blockchain Technology and Information Security (ICBCTIS), Huaihua City, China, 15–17 July 2022;
pp. 32–39.
12. Zhang, L.; Huang, J.; Zhang, Y.; Zhang, G. Intrusion detection model of CNN-BiLSTM algorithm based on mean control. In
Proceedings of the 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China,
16–18 October 2020; pp. 22–27.
13. Wang, J.; Chen, N.; Yu, J.; Jin, Y.; Li, Y. An efficient intrusion detection model combined bidirectional gated recurrent units with
attention mechanism. In Proceedings of the 2020 7th International Conference on Behavioural and Social Computing (BESC),
Bournemouth, UK, 5–7 November 2020; pp. 1–6.
14. Hou, H.; Di, Z.; Zhang, M.; Yuan, D. An Intrusion Detection Method for Cyber Monitoring Using Attention-based Hierarchical
LSTM. In Proceedings of the 2022 IEEE 8th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference
on High Performance and Smart Computing (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS), Jinan,
China, 6–8 May 2022; pp. 125–130.
15. Song, Y.; Zhang, D.; Li, Y.; Shi, S.; Duan, P.; Wei, J. Intrusion Detection for Internet of Things Networks using Attention Mechanism
and BiGRU. In Proceedings of the 2023 5th International Conference on Electronic Engineering and Informatics (EEI), Wuhan,
China, 30 June–2 July 2023; pp. 227–230.
16. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In
Proceedings of the NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long
Beach, CA, USA, 4–9 December 2017.
17. Liu, C.; Liu, Y.; Yan, Y.; Wang, J. An intrusion detection model with hierarchical attention mechanism. IEEE Access 2020,
8, 67542–67554. [CrossRef]
18. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
19. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures.
Neural Netw. 2005, 18, 602–610. [CrossRef]
20. Medsker, L.R.; Jain, L. Recurrent neural networks. Des. Appl. 2001, 5, 2.
21. Schtickzelle, M. Pierre-François Verhulst (1804–1849). La première découverte de la fonction logistique. Population 1981, 3,
541–556. [CrossRef]
22. Sudjianto, A.; Knauth, W.; Singh, R.; Yang, Z.; Zhang, A. Unwrapping the black box of deep relu networks: Interpretability,
diagnostics, and simplification. arXiv 2020, arXiv:2011.04041.
23. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
24. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009
IEEE symposium on computational intelligence for security and defense applications, Ottawa, ON, Canada, 8–10 July 2009;
pp. 1–6.
25. Revathi, S.; Malathi, A. A detailed analysis on NSL-KDD dataset using various machine learning techniques for intrusion
detection. Int. J. Eng. Res. Technol. 2013, 2, 1848–1853.
26. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic
characterization. ICISSp 2018, 1, 108–116.
27. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell.
Res. 2002, 16, 321–357. [CrossRef]
28. Andresini, G.; Appice, A.; Malerba, D. Nearest cluster-based intrusion detection through convolutional neural networks.
Knowl.-Based Syst. 2021, 216, 106798. [CrossRef]
29. Luo, J.; Zhang, Y.; Wu, Y.; Xu, Y.; Guo, X.; Shang, B. A Multi-Channel Contrastive Learning Network Based Intrusion Detection
Method. Electronics 2023, 12, 949. [CrossRef]
30. Zhang, L.; Yan, H.; Zhu, Q. An Improved LSTM Network Intrusion Detection Method. In Proceedings of the 2020 IEEE 6th
International conference on Computer and Communications (ICCC), Chengdu, China, 11–14 December 2020; pp. 1765–1769.
31. Su, T.; Sun, H.; Zhu, J.; Wang, S.; Li, Y. BAT: Deep Learning Methods on Network Intrusion Detection Using NSL-KDD Dataset.
IEEE Access 2020, 8, 29575–29585. [CrossRef]
32. Yang, Y.; Zheng, K.; Wu, C.; Yang, Y. Improving the Classification Effectiveness of Intrusion Detection by Using Improved
Conditional Variational AutoEncoder and Deep Neural Network. Sensors 2019, 19, 2528. [CrossRef]
33. Ieracitano, C.; Adeel, A.; Morabito, F.C.; Hussain, A. A novel statistical analysis and autoencoder driven intelligent intrusion
detection approach. Neurocomputing 2020, 387, 51–62. [CrossRef]
34. Wang, Z.; Zeng, Y.; Liu, Y.; Li, D. Deep belief network integrating improved kernel-based extreme learning machine for network
intrusion detection. IEEE Access 2021, 9, 16062–16091. [CrossRef]
35. Mendonça, R.V.; Teodoro, A.A.; Rosa, R.L.; Saadi, M.; Melgarejo, D.C.; Nardelli, P.H.; Rodríguez, D.Z. Intrusion detection system
based on fast hierarchical deep convolutional neural network. IEEE Access 2021, 9, 61024–61034. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
ADQE: Obtain Better Deep Learning Models by Evaluating the
Augmented Data Quality Using Information Entropy
Xiaohui Cui 1,2 , Yu Li 1,2 , Zheng Xie 1,2 , Hanzhang Liu 1 , Shijie Yang 1 and Chao Mou 1,2, *
1 School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China;
[email protected] (X.C.)
2 Engineering Research Center for Forestry-Oriented Intelligent Information Processing of National Forestry
and Grassland Administration, Beijing 100083, China
* Correspondence: [email protected]
Abstract: Data augmentation, as a common technique in deep learning training, is primarily used
to mitigate overfitting problems, especially with small-scale datasets. However, it is difficult
to evaluate whether the augmented dataset truly benefits the performance of the model. If model
training is relied upon in each case to validate the quality of the data augmentation and
the dataset, it takes a lot of time and resources. This article proposes a simple and practical
approach to evaluate the quality of data augmentation for image classification tasks, enriching the
theoretical research on data augmentation quality evaluation. Based on the information entropy,
multidimensional metrics for data augmentation quality are established, including diversity, class
balance, and task relevance. Additionally, a comprehensive data augmentation quality fusion metric
is proposed. Experimental results on the CIFAR-10 and CUB-200 datasets show that our method
maintains optimal performance in a variety of scenarios. The cosine similarity between the score of
our method and the precision of the model is up to 99.9%. A rigorous evaluation of data augmentation
quality is necessary to guide the improvement of DL model performance. The quality standards and
evaluation defined in this article can be utilized by researchers to train high-performance DL models
in situations where data are limited.
Keywords: data augmentation; deep learning; data quality; big data; data mining
theoretical and comprehensive. By visualizing and analyzing the augmented data, we can
evaluate the effectiveness of data augmentation, observe whether the changes in data are
reasonable, and cover different categories and tasks. This intuitive and simple method
can efficiently evaluate the effects of data augmentation and quickly find suitable data
augmentation methods, which can help obtain the most suitable dataset for model training
in advance.
Studies [23,24] have shown that data quality is a multidimensional concept. Data
quality has different meanings in different contexts. For example, data quality can be about
measuring defective or outlier data in a general context [25–27], or describing whether the
data meet the expected purpose in a specific context [28]. In this paper, we define data
quality as a measure of data suitability for constructing a DL training set. Existing data
quality assessments consider both intrinsic data quality and contextual quality [29], but
the definitions of contextual quality vary. The most common idea is to divide contextual
quality into two parts based on the process of DL: diversity within the training set and
similarity between the training and testing sets. The main idea is to make the training
set complex enough to encompass all the features and be similar to the real distribution
represented by the testing set so that the DL model can learn adequately from this dataset.
However, they overlook the fact that the performance of deep learning models is not only
influenced by the problem space covered by the data. For instance, imbalanced classes
in the dataset may lead to model bias [30,31], and these imbalances can occur in terms of
quantity, features, or colors. Creating more dimensions based on the task and data features
can better describe the quality of the data and its value for deep learning models. We
hope to construct a universal, robust, and highly generalizable multidimensional quality
evaluation method by refining and differentiating the definition of quality metrics, which
can provide strong support for the quality evaluation of data augmentation.
In addition, due to the curse of dimensionality, such as in the case of image and text
data, there arise computational and statistical challenges, with computational complexity
growing exponentially. Hence, many works have used average similarity and minmax
similarity between samples to calculate these two dimensions [29,32]. Although average or
minmax similarity between samples can quickly assess the quality of a dataset, they cannot
accurately approximate the precision of models trained on that dataset. Information en-
tropy [33] can provide a comprehensive evaluation of data distribution, considering global
characteristics such as sample diversity, rather than just focusing on average differences
in the data [34]. Its feature as a non-linear measure based on probability distribution can
better capture non-linear relationships in data distribution, with less sensitivity to noise
and stronger interpretability [35]. Because of its low noise sensitivity, it is suitable for improving
computational efficiency with dimensionality reduction technology, so the computational
problem caused by the curse of dimensionality can be avoided. In summary, information entropy,
as a metric for evaluating the quality of data augmentation, possesses more comprehensive,
robust, and interpretable characteristics, making it more suitable for approximating the
precision of models.
Therefore, this paper proposes an information entropy-based method for evaluating
the quality of data augmentation. By attempting to deconstruct the dimensions of the data,
we assess the quality of the dataset and data augmentation. In our approach, the augmented
dataset is initially broken down into three dimensions, including diversity, class balance,
and task relevance. Furthermore, taking image data as an example, for each dimension,
numerous sub-dimensions are derived based on the task and data characteristics. Finally,
by considering the correlations between the metrics, we calculate the ultimate composite
metric score, providing insights into the impact of the current augmentation strategy on
model performance.
• In this paper, we design and implement a data augmentation quality evaluation
method, which can optimize and generate large-scale, high-quality datasets by
disassembling and balancing the quality dimensions of datasets.
Electronics 2023, 12, 4077
• This paper discusses the choice of mathematical tools for statistical analysis of data
dimensions, and determines that information entropy is more suitable than other
methods for evaluating the information content of data.
• This paper extensively evaluates the proposed method on various data augmentation
techniques, datasets, and models. There is a strong correlation between the experi-
mental results of the deep learning model and the evaluation results of the method,
which shows that the method can improve the performance of the model on related
tasks by evaluating the data augmentation quality.
2. Methods
In this work, we aim to explore the effectiveness of data augmentation in enhancing
datasets, with the hope of replacing expensive model training with more comprehensive
statistical metrics to evaluate the quality of augmented datasets. The primary goal of
data augmentation is to generate a diverse and balanced dataset that is highly relevant to
the task.
2.1. Preliminaries
Before presenting the details of our method, we give a brief overview of deep learning
and data augmentation, which provides the theoretical basis of our algorithm design. For
better illustration, some notations are summarized in Table 1.
Notation | Mathematical Description
X, Y | The data and labels of the dataset, i.e., the input and output spaces of the model. X and Y denote the original training dataset, X′ and Y′ denote the augmented training dataset, and Xt and Yt denote the test dataset.
x, y | An input sample x and its label y. x and y belong to the original training dataset, x′ and y′ to the augmented training dataset, and xt and yt to the test dataset.
P, p | Both represent a probability function that describes the distribution of the sample space.
R | The risk function; its subscripts denote the computational idea used, empirical and expected, respectively.
Q | A collection of data augmentation quality metrics; each Qi ∈ Q measures a different dimension of the data.
pixel | A pixel of the image data.
D | The dataset. D denotes the original training dataset, D′ the augmented training dataset, and Dt the test dataset.
C, ci | C is the number of classes in the dataset and ci is the number of samples in the i-th class of the training dataset.
N | The number of samples in the dataset. N refers to the original training dataset, N′ to the augmented training dataset, and Nt to the test dataset.
and the data distribution P(X, Y) to train the model. Therefore, the expected loss of the
model f(x, θ) with respect to the joint distribution P(X, Y) is expressed as

R_exp(θ) = E_{(x,y)∼P(X,Y)} [ L(y, f(x, θ)) ],   (1)

where L(y, f(x, θ)) represents the loss function, which quantifies the difference between
individual input and output instances. However, in reality, the data distribution P( X, Y ) is
often unknown, and we only have knowledge of the distribution of samples in the training
set. Therefore, to deal with this situation, in DL, the approach is to minimize the expected
loss on the training set. As shown in Equation (2), the empirical distribution P̂( X, Y )
based on the training set is used instead of the true distribution P( X, Y ) to calculate the
empirical loss Remp . This way, during the training process, the model performs parameter
optimization based on the sample distribution in the training set, aiming to approximate
the performance of the true distribution as closely as possible.
R_emp(θ) = (1/N) Σ_{n=1}^{N} L(y_n, f(x_n, θ)).   (2)
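As a sketch, the empirical risk of Equation (2) is a plain average of per-sample losses; the linear model and squared-error loss below are illustrative stand-ins, not the paper's setup.

```python
def empirical_risk(xs, ys, model, loss):
    """Average per-sample loss over the training set (cf. Equation (2))."""
    return sum(loss(y, model(x)) for x, y in zip(xs, ys)) / len(xs)

# Illustrative stand-ins (not from the paper): a linear model and squared error.
model = lambda x: 2.0 * x
squared_error = lambda y, y_hat: (y - y_hat) ** 2
```

For the toy pairs (1, 2) and (2, 4) the risk is zero, since the model fits them exactly.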
Lemma 1 (Chebyshev's inequality). Let t_1, ..., t_n be random variables with finite expected value μ. Then, for any small positive number ε,

lim_{n→∞} P( | (Σ_i t_i)/n − μ | < ε ) = 1.   (4)
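A quick numerical illustration of the limit in Equation (4): the empirical mean of n uniform draws approaches its expectation μ = 0.5 as n grows (a toy simulation, not part of the paper).

```python
import random

def sample_mean(n):
    """Empirical mean of n draws from Uniform(0, 1); the expectation is 0.5."""
    return sum(random.random() for _ in range(n)) / n
```

The deviation |sample_mean(n) − 0.5| shrinks roughly like 1/√n.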
According to the law of large numbers, Equations (2) and (4), as the sample size N
becomes sufficiently large, the empirical risk Remp tends to the expected risk Rexp . However,
when it comes to DL datasets, considering only the distribution of samples and labels is
insufficient. For instance, simply resampling data can lead to more severe overfitting of
the model. We also need to ensure that the features in the data align closely with the true
distribution, which is a key problem addressed by data augmentation. In general, data
augmentation is achieved by modifying the original data based on prior knowledge to
expand the dataset. The generated data may have the same labels as the original data,
but the features extracted by the model are different. This enables the model to more
easily recognize the critical features relevant to the task at hand. The parameters for data
augmentation need to satisfy the following expression:
The ideal scenario is that the dataset remains consistent with the true distribution for
all features P( Xt , Yt ). However, this is an ideal situation and the true distribution is still
unknown. Therefore, we use the test set instead of the true distribution for the estimation.
Theorem 1. The expectation and variance of the original dataset are μ and σ, and the expectation and variance of the augmented dataset are μ′ and σ′. Assuming that the expectation and variance of the true distribution are μt and σt, Equation (5) can be expressed as
However, due to the randomness of data augmentation, the generated data may
not necessarily be more in line with the true distribution compared to the original data.
Therefore, we need to perform quality estimation on it.
Lemma 2. Expectation and variance are not mutually independent unless the distribution follows a normal distribution.
which is the product of the statistical values of expectation Qμ and variance Qσ. The
expectation of the dataset is primarily influenced by the target task, and the value obtained
from the expectation is also referred to as task relevance.
Unlike expectation values, data can have different distributions based on different
feature selections, leading to varying variances. These variances can be mainly divided
into two categories. One is the distribution of semantic features that approximate a normal
distribution, and the other is the distribution of categories, which is mostly a uniform
distribution. The variances of these two distributions are independent of each other, so
statistical values of variances can be obtained using addition.
P(label, semantic) = Σ_{i}^{C} P(label) × P(semantic_i) = Σ_{i}^{C} P(semantic_i).   (8)
Features are classified according to whether they have a relationship with the category,
and the same is true for data quality. So we have
primarily used to evaluate the diversity and class balance metric. (3) Result Statistics: The
scores of each augmented dataset will be ranked, and the datasets with higher quality will
be selected for model training and validation. The implementation code is available at:
https://github.com/ForestryIIP/ADQE (accessed on 20 August 2023).
2.3.2. Clustering
After data augmentation, we can proceed with the calculation of various metrics.
For diversity computation, calculating low-dimensional feature diversity simply requires
measuring the individual feature values of each image, which can be performed in O(n) time
complexity. However, in the case of high-dimensional features, the complexity increases
as we need to calculate the similarity between each pair of samples and compute the
eigenvalues of the n × n similarity matrix, resulting in a time complexity of O(n³). Similarly,
for task relevance, we need to compute the similarity between sample pairs from the
training set and the test set, resulting in a high time complexity of O(n × m). To address
these challenges, this study adopts the FINCH clustering algorithm [37], which employs
pooling and sampling techniques to reduce the dimensionality and scale of the dataset,
thereby mitigating the computational cost and time overhead. In the calculation of class
balance, the clustering algorithm can directly compute the desired metrics.
This matrix is derived from a user-defined similarity function applied to the samples
under evaluation for diversity:
Q1 = (1/C) Σ_{i=1}^{C} exp( − Σ_{j=1}^{c_i} λ_j log λ_j ),   (12)
where λ_j represents the eigenvalues of the similarity matrix K. K is a positive-definite
matrix obtained from a set of samples x and the similarity function; for all x,
similarity(x, x) = 1. This method, as shown in Algorithm 2, primarily quantifies the
effective number of distinct elements in the data. For example, after extracting features
from an image using a neural network, you typically obtain a 2048-dimensional vector. This
vector stores features across different dimensions of the image. Assuming the similarity
function is a cosine similarity, this metric measures whether the directions of two vectors
align. If the feature dimensions forming these directions are more similar, the metric value
is higher. Consequently, a similarity matrix K can be computed. Eigenvalues generally
represent inherent structural properties and patterns within the data [39]. Each eigenvalue
corresponds to a mode of variation or structure in the data. In the context of image similarity,
they can indicate different similarity patterns or clusters among images. The magnitude of
eigenvalues also reflects the proportion of corresponding patterns in the data. Therefore,
computing the entropy of eigenvalues is equivalent to quantifying the richness of patterns
in the data. If a single pattern dominates the data, the style and content of the images
are highly certain, resulting in low information uncertainty. Conversely, when multiple
similar patterns share similar proportions, the style and content of the images become
less certain, leading to higher information uncertainty. In summary, the eigenvalues of
similarity matrix K and the entropy of these eigenvalues provide valuable insights into the
data’s structure and diversity. High entropy indicates complexity and diversity, whereas
low entropy suggests simplicity or uniformity in similarity patterns within the data. They
can guide overall data analysis in the context of diversity metrics.
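A minimal sketch of the Q1 computation in Equation (12) for a single class, assuming cosine similarity as the user-defined similarity function and eigenvalues normalized to sum to one so that the entropy is well defined (the authors' official implementation is in the linked repository):

```python
import numpy as np

def effective_diversity(features):
    """Entropy of the similarity-matrix eigenvalues, exponentiated to give an
    effective number of distinct samples (cf. Equation (12), one class)."""
    # L2-normalize rows so the Gram matrix holds cosine similarities
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    K = f @ f.T                         # similarity matrix; diagonal is 1
    lam = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    lam = lam / lam.sum()               # normalize so the entropy is well defined
    lam = lam[lam > 0]
    return float(np.exp(-(lam * np.log(lam)).sum()))
```

Four mutually orthogonal feature vectors yield a diversity of 4, while five identical vectors collapse to 1, matching the "effective number of distinct elements" reading.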
Ignoring the time of the feature extraction and clustering parts, the computation of Q1 is
mainly divided into two steps: the computation of the similarity adjacency matrix and of
its eigenvalues, respectively. For each pair of vectors, a similarity computation such as
cosine similarity requires O(K) time, where K is the dimension size of the image vector
output by the model. Every pair of elements in the dataset needs to be computed, so
computing the adjacency matrix requires O(KN²) time. The next step is to solve for the
eigenvalues, which usually takes O(N³). Generally, K ≪ N, so the time complexity is
O(N³). Although the time complexity is high, the data size is reduced by clustering in
advance, so the actual computation time is still within acceptable limits.
Furthermore, analyzing the diversity of data solely based on high-dimensional fea-
tures provides an abstract understanding of diversity, but it may not provide an intuitive
and clear understanding. Therefore, it is still necessary to define the diversity of data in
low-dimensional features. For example, in image data, models often learn texture fea-
tures extensively from the dataset [40]. To address this, we can calculate the occurrence
probabilities of different textures in all the data:
Q2 = exp( − Σ_{i=1}^{N_Texture} p_{Texture_i,j} log p_{Texture_i,j} ),   (13)
where N_Texture is the number of distinct textures and p_{Texture_i,j} represents the probability
that the value of Texture_i,j occurs. The texture is defined as a combination of pixel_i,j
and its adjacent pixels:
Texture_{i,j} = ⎛ pixel_{i−1,j+1}  pixel_{i,j+1}  pixel_{i+1,j+1} ⎞
                ⎜ pixel_{i−1,j}    pixel_{i,j}    pixel_{i+1,j}   ⎟ .   (14)
                ⎝ pixel_{i−1,j−1}  pixel_{i,j−1}  pixel_{i+1,j−1} ⎠
Due to computational complexity and memory limitations, this paper calculates the
information entropy of the average pixel values within a 3 × 3 window instead of the
information entropy of pixel combinations. Lastly, considering that brightness can have an
impact on the model [41], this paper also calculates the probability of brightness for each
pixel in the entire dataset and computes its entropy value:
Q3 = (1/3) Σ_{RGB=1}^{3} exp( − Σ_{level=1}^{256} p_level log p_level ),   (15)
where RGB indexes the three color channels of an image and level represents the intensity
level of the channel, up to a maximum of 256. Texture and brightness are computed by
counting the pixels of each image, so the time complexity is O(PN), where P is the average
number of pixels per image. As shown in Algorithm 3, the variables for Q2 and Q3 are
similar and can be computed together in the feature extraction phase.
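A sketch of the brightness term Q3 from Equation (15), assuming uint8 images of shape (N, H, W, 3) and the minus-sign entropy convention, so that exp(entropy) reads as the effective number of intensity levels per channel:

```python
import numpy as np

def brightness_entropy(images):
    """Q3-style score: mean per-channel exp(entropy) of pixel intensities.
    `images` is assumed to be a uint8 array of shape (N, H, W, 3)."""
    score = 0.0
    for c in range(3):
        counts = np.bincount(images[..., c].ravel(), minlength=256)
        p = counts / counts.sum()
        p = p[p > 0]                     # drop empty levels before log
        score += np.exp(-(p * np.log(p)).sum())
    return score / 3.0
```

An all-black dataset scores 1 (one effective level); a channel that uses all 256 levels uniformly scores 256.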
in each class is equal, which is the most fundamental metric. The formula for the balance of
the number of samples per class is as follows:
Q4 = 1 − (1/C) Σ_{i=1}^{C} |c_i − c̄|,   (16)
i =1
where c̄ represents the average count of samples across classes. The variance of
the classes is computed with a time complexity of O(C). The algorithm is illustrated in
Algorithm 4.
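A sketch of the class-number balance Q4 from Equation (16); normalizing the absolute deviation by the mean count is an assumption made here so the score stays scale-free in [0, 1] for typical class distributions:

```python
import numpy as np

def class_number_balance(counts):
    """Q4 sketch: 1 minus the mean absolute deviation of per-class sample
    counts, normalized by the mean count (normalization is an assumption)."""
    c = np.asarray(counts, dtype=float)
    return float(1.0 - np.mean(np.abs(c - c.mean())) / c.mean())
```

Perfectly balanced classes score 1; a two-class split of (0, 20) scores 0.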
H(X) = − Σ_X p(x) log p(x),   (17)
where H ( X ) represents the entropy of the classes. The basic definition of mutual informa-
tion is
I(X; Y) = Σ_X Σ_Y p(x, y) log [ p(x, y) / ( p(x) p(y) ) ],   (18)
where I ( X; Y ) represents the mutual information and p( x, y) is the joint distribution be-
tween the clustering results and the ground truth. The p( x ) and p(y) are the marginal
distributions of the ground truth labels and clustering results, respectively. Although mu-
tual information can also measure the degree of similarity between two clustering results,
its value is strongly influenced by the sample size. Normalized mutual information (NMI),
on the other hand, can better measure the degree of similarity between two clustering
results by normalizing the mutual information values to the same range of values. The
NMI is defined as
Q5 = 2 I(X; Y) / ( H(X) + H(Y) ).   (19)
The details are presented in Algorithm 5. Ignoring the time of feature extraction and
clustering part, the computational time complexity of the mutual information of the label
distributions before and after clustering is only O( N ).
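The NMI of Equations (17)–(19) can be sketched directly from the definitions; labels are assumed to be non-negative integers:

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings (Eqs. (17)-(19))."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    n = len(a)

    def entropy(x):
        p = np.bincount(x) / len(x)
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    pa, pb = np.bincount(a) / n, np.bincount(b) / n
    # joint distribution p(x, y) from co-occurrence counts
    joint = {}
    for x, y in zip(a, b):
        joint[(int(x), int(y))] = joint.get((int(x), int(y)), 0) + 1
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * np.log(pxy / (pa[x] * pb[y]))
    return float(2 * mi / (entropy(a) + entropy(b)))
```

Identical labelings give NMI = 1, and statistically independent labelings give NMI = 0, which is the normalization property motivating Equation (19).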
Q6 = (1/(Nt × N)) Σ_{i=1}^{Nt} Σ_{j=1}^{N} similarity(i, j),   (20)
i =1 j =1
where similarity(i, j) denotes the similarity function used to measure the similarity between
samples. Based on Equation (20), we can obtain Algorithm 6.
Ignoring the time of the feature extraction and clustering parts, the computation of Q6 is
mainly divided into two steps: the computation of the similarity matrix and of its average,
respectively. The similarity of all unordered pairs between the dataset and the test set
needs to be computed; assuming that the dataset size is N and the test set size is M, the
total number of pairs is N × M. The similarity matrix therefore takes O(KMN) time,
where K is the dimension size of the image vector output by the model, and calculating
the mean only requires O(NM). So the total time complexity is O(KMN).
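A sketch of the task-relevance metric Q6 in Equation (20), assuming cosine similarity as the similarity function:

```python
import numpy as np

def task_relevance(train_feats, test_feats):
    """Q6 sketch: average pairwise cosine similarity between training-set and
    test-set feature vectors (cf. Equation (20))."""
    a = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    b = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    return float((b @ a.T).mean())   # (Nt x N) similarities, averaged
```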
Q = (Q′_6 / Q_6) × ( Σ_{i=1}^{5} w_i (Q′_i / Q_i) ) / ( Σ_{i=1}^{5} w_i ),   (21)
where Q′_i represents the quality metric of the augmented dataset, Q_i represents the
quality metric of the original dataset, and w_i represents the weight of each metric in the final
fusion metric Q. By assigning appropriate weight coefficients to the metrics of different
parts, the balance of influence of different factors in the metrics can be ensured. For example,
when the task has high precision requirements, task relevance metrics are
more critical and need to be assigned a higher weight.
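The fusion rule of Equation (21) can be sketched as follows; listing the six metric values with Q6 last is an ordering assumption of this sketch:

```python
def fuse_quality(q_aug, q_orig, weights):
    """Sketch of Equation (21): the weighted mean of the metric ratios
    Q'_i/Q_i (i = 1..5), scaled by the task-relevance ratio Q'_6/Q_6.
    q_aug and q_orig list the six metric values with Q6 last."""
    ratios = [qa / qo for qa, qo in zip(q_aug[:5], q_orig[:5])]
    weighted_mean = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return (q_aug[5] / q_orig[5]) * weighted_mean
```

When every augmented metric doubles its original counterpart, the fused score is 2 × 2 = 4, since the task-relevance ratio multiplies the weighted mean.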
3. Experimental
3.1. Datasets and Data Augmentation
In order to validate the effective evaluation of data augmentation quality, this study
selected different augmentation strategies, datasets at different granularities, and several
specific tasks as the objects of method evaluation. The CIFAR-10 and CUB-200 [46] datasets
were chosen as the experimental datasets for image classification tasks. The CIFAR-10 and
CUB-200 represent different areas of computer vision problems. CIFAR-10 is an image
classification dataset containing 10 different categories of common objects such as aircraft,
dogs, cars, and so on. The CUB-200, which focuses on bird identification, contains images of
200 different species of birds. The multi-domain coverage of these two datasets allowed us
to explore the impact of diversity and task relevance in different application contexts. They
also exhibit varying levels of image diversity and class balance. CIFAR-10 includes diverse
scenes, lighting conditions, angles, and variations. In contrast, CUB-200 exhibits limited
image diversity, with predominantly consistent backgrounds. Furthermore, both datasets
represent numerous practical application scenarios, such as image classification, object
detection, and object recognition. By conducting experiments on CIFAR-10 and CUB-200,
we gain a better understanding of the performance and applicability of our methods.
In the context of image data applications, data augmentation methods add noise to
the original data to simulate other real-world scenarios, thus creating augmented images
for model training. To evaluate the improvement in data quality brought by different
data augmentation methods, this study employs RandAugment’s data augmentation
search strategy [47]. Unlike RandAugment, which applies random transformations to
images during training, this study scales up the dataset by employing the RandAugment
strategy prior to training. To generate data diversity, this study selects n transformations
from a set of k = 16 data augmentation transformations with uniform probability, where
the augmentation magnitude for each transformation is set to m. By varying these two
parameters, the strategy can express a total of m × 2n potential augmentation policies,
where n represents the strategy for selecting data enhancement and m represents the
intensity of data augmentation. Each parameter of the transformations is scaled using
the same linear scale, ranging from 0 to 30, where 30 represents the maximum scale for a
given transformation, and then mapped to the parameter range of each transformation.
Subsequently, during the expansion and generation of the augmented dataset, we uniformly
sample dataset samples with probability c for transformation. All generated images are
then merged with the original dataset to create the augmented dataset. However, image
data alone is insufficient for calculating quality metrics. Therefore, the image data needs
to be abstracted into 2048-dimensional feature vectors. In this study, a pre-trained model,
ResNet101, is utilized to extract features from the generated augmented dataset.
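The pre-training expansion procedure described above can be sketched as follows; the transform names and the way a sampled policy is attached to a sample are placeholders, not the paper's actual operations:

```python
import random

# Hypothetical pool of k = 16 transform names; the real operations (rotate,
# shear, color jitter, ...) are not enumerated in this sketch.
TRANSFORMS = [f"op_{i}" for i in range(16)]

def sample_policy(n, m):
    """Pick n distinct transforms with uniform probability; each is applied at
    magnitude m on the 0-30 linear scale, mapped here to [0, 1]."""
    ops = random.sample(TRANSFORMS, n)
    return [(op, m / 30.0) for op in ops]

def augment_dataset(dataset, n, m, c):
    """Expand the dataset before training: each sample is selected with
    probability c, paired with a freshly sampled policy (a placeholder for
    actually transforming it), and merged with the originals."""
    augmented = list(dataset)
    for x in dataset:
        if random.random() < c:
            augmented.append((x, sample_policy(n, m)))
    return augmented
```

With c = 1 every sample is augmented once, doubling the dataset, which mirrors the "merge generated images with the original dataset" step.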
3.2. Baseline
In this paper, we will present the results of our proposed method on CIFAR-10 and
CUB-200 datasets to demonstrate how our approach captures the intuitive notion of data
augmentation and can be applied to assess the quality of data augmentation in DL. The
existing data augmentation evaluation work can be divided into two categories: model vali-
dation for assessing effectiveness and statistical analysis for improving data quality. Model
validation aims for the highest level of accuracy, as seen in approaches like AutoAugment.
However, due to the immense computational demands, it may not be practically feasible
in real-world applications. On the other hand, work in the field of statistical analysis for
enhancing data quality tends to focus on calculating dataset quality at a finer granularity
within a specific dimension [38,48]. Quality is multidimensional, and solely relying on a
single dimension for analysis can provide a limited perspective. These two articles are both
based on the intrinsic attributes of the data and the contextual tasks for analysis [29,32].
However, the mathematical tools chosen in statistical analysis cannot achieve the goal of
correctly assessing data quality. This paper divides data quality into multiple dimensions,
allowing researchers to comprehensively assess the quality of data and identify areas in
need of improvement. Since it focuses on enhancing data to improve model performance,
it is essential to clearly define their definitions and relationships, while having effective
methods for utilizing these dimensions to evaluate the effectiveness of data augmentation.
So, we will compare our method with two baseline approaches: diversity and task inde-
pendence calculated using the mean and min-max criteria. Since the criteria for quality
fusion differ among these methods, we will employ our proposed quality fusion method to
calculate the final scores for all the approaches. Baseline’s formula is shown in Table 3.
In this section, this study conducted two sets of experiments to validate the effective-
ness of the proposed method. The first set of experiments evaluated the correlation between
the model precision on the test set and the quality evaluation results of the generated aug-
mented dataset using different parameters for data augmentation. Due to the enormous
search space of data augmentation strategies, it was challenging to define it precisely.
Therefore, this study randomly selected 7 sets of parameters as the comparative parameters
for the experiment. The second set of experiments involved generating augmented datasets
of different sizes using the same set of parameters for data augmentation. The performance
of the model on these datasets was observed to see if it aligns with the algorithm results.
Dataset | Model | Training Hyperparameters
CIFAR10 | densenet161 | Epochs = 90, initial lr = 0.1 (divided by 5 at the 40th, 60th, and 80th epochs), batch size = 256, weight decay = 5 × 10⁻⁴, momentum = 0.9
CUB200 | NtsNet | Epochs = 50, lr = 0.001, batch size = 16, weight decay = 1 × 10⁻⁴, momentum = 0.9
Due to the different dimensions between the scores of the method and the baseline and
the accuracy of the model, we first divide all the model accuracies by the accuracy of the
model on the original training set, similar to how the scores are calculated. We refer to this
as the actual score of the augmented training set. Then, we consider using Cosine Similarity
(CS) to calculate the similarity between the estimated scores and the actual scores. We sort
the scores into a vector according to certain rules, such as enhancement magnitude or scale,
and then calculate the cosine value of the angle between them. This cosine value reflects
the similarity of the changing trends between the estimated scores and the actual scores.
However, we still need to know the absolute distance between the two scores. Therefore,
we also select Mean Squared Error (MSE) as our second metric. By combining these two
metrics, we can comprehensively analyze the strengths and weaknesses of the algorithm.
A larger CS and a smaller MSE indicate better results.
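The two comparison metrics can be sketched as:

```python
import numpy as np

def compare_scores(estimated, actual):
    """Cosine similarity (trend agreement) and MSE (absolute gap) between the
    estimated quality scores and the actual normalized-accuracy scores."""
    e = np.asarray(estimated, dtype=float)
    a = np.asarray(actual, dtype=float)
    cs = float(e @ a / (np.linalg.norm(e) * np.linalg.norm(a)))
    mse = float(np.mean((e - a) ** 2))
    return cs, mse
```

Identical score vectors give CS = 1 and MSE = 0, the best case under both criteria.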
We have chosen datasets from significantly different domains—images and text, which
represent two major data types. We also ensure that the datasets are large enough to
facilitate meaningful augmentation and analysis. By calculating the score differences after
augmentation using ADQE, we evaluate and visualize the differences between the two
datasets with the largest score differences. EuroSAT, being an image dataset, undergoes
the same augmentation and evaluation methods as described in this paper. For the text
dataset IMDB, we apply data augmentation methods such as Optical Character Recognition
(OCR), semantic augmentation, and summarization. Since the data type is different, we
cannot directly use ADQE to calculate evaluation scores, and adjustments need to be
made to the methods. The text dataset also involves extracting feature vectors to calculate
diversity and task relevance, but text is composed of words rather than pixels. Q2 and Q3
need to be recalculated using words and characters. We combine these two metrics and
redefine them as the information entropy calculated from frequency of the top 10,000 most
frequent words.
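The text adaptation of the entropy metric can be sketched as follows; whitespace tokenization is a simplifying assumption of this sketch:

```python
from collections import Counter
import math

def word_entropy_score(corpus, top_k=10_000):
    """Entropy-based diversity over the top_k most frequent words, i.e. the
    text-side replacement for Q2/Q3 described above."""
    counts = Counter(w for doc in corpus for w in doc.lower().split())
    top = counts.most_common(top_k)
    total = sum(c for _, c in top)
    probs = [c / total for _, c in top]
    return math.exp(-sum(p * math.log(p) for p in probs))
```

As with the image metrics, the exponentiated entropy reads as an effective vocabulary size: two equally frequent words score 2, a single repeated word scores 1.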
Figure 2. Under the same scale and different data augmentation scenarios, our method achieves the
best quality evaluation results. (a) Evaluation of three algorithms and the model on the augmented
dataset using the CIFAR-10 dataset. (b) Evaluation of three algorithms and the model on the
augmented dataset using the CUB-200 dataset. (c) Partial indicator scores of the augmented dataset
by three algorithms on the CIFAR-10 dataset. (d) Performance evaluation of the three algorithms
using CS. (e) Performance evaluation of the three algorithms using MSE. (f) Partial indicator scores
of the augmented dataset by three algorithms on the CUB-200 dataset. In (a–c,f), the x-axis represents
the data augmentation strategies, including the selection probability of data augmentation and the
magnitude of data augmentation. The selection of these strategies is randomly chosen within the
given range. The word “mine” represents the methodology of this paper.
In our data augmentation quality evaluation metric calculation, we partition the over-
all evaluation metric based on the mean and variance of the data augmentation quality.
The mean measures the distance between the augmented data and the target data distribu-
tion. The variance measures the distribution uniformity and diversity of the augmented
data. From the Q6 curves in Figure 2c,d, for all methods, their mean calculation results
are relatively consistent. They accurately calculate the distance between all augmented
training data and the test set distributions. However, only the entropy method meets the
criteria for variance estimation. Since semantic information has multiple dimensions and is
not entirely related to labels, it is divided into class balance and diversity. Even through
augmentation, class-related semantics will not be changed or lost. By using clustering
dimensionality reduction, we can clearly understand the spatial distribution of the data,
which remains relatively unchanged before and after augmentation. However, within the
class, due to color changes, deformations, inversions, and other operations, these noises are
substantially supplemented. We need to assess the distribution balance of data label-related
features, and so on. The minmax criteria emphasizes the farthest distance of the data
distribution. Although the mean criteria is more balanced, it is also affected by extreme
values. Moreover, data augmentation methods are likely to inject many extreme values
due to their randomness, thus affecting the evaluation. Entropy can smooth these extreme
values, categorizing them together, and calculate the overall diversity evaluation value
by statistically counting the effective number of categories under different dimensions.
Experimental results demonstrate that our framework can effectively evaluate data aug-
mentation quality by incorporating entropy, visually and comprehensively showcasing the
improvements brought by data augmentation to the dataset.
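The smoothing effect of the entropy criterion can be illustrated with a toy experiment; the fixed binning scheme and the function name below are our own illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def effective_diversity(values, edges):
    """Entropy criterion: discretize values into categories, then take
    exp(entropy) of the occupancy, i.e. the 'effective number of
    categories'. A single extreme value only occupies one sparse
    category, so it barely moves the score."""
    counts = np.bincount(np.digitize(values, edges))
    p = counts[counts > 0] / counts.sum()
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, 1000)
spiked = np.append(clean, 50.0)        # one extreme augmented sample
edges = np.linspace(-4.0, 4.0, 17)     # fixed category boundaries

minmax_clean = clean.max() - clean.min()
minmax_spiked = spiked.max() - spiked.min()          # explodes
entropy_clean = effective_diversity(clean, edges)
entropy_spiked = effective_diversity(spiked, edges)  # barely changes
```

The minmax criterion is dominated by the single injected outlier, while the effective-number-of-categories score moves only marginally.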
From another perspective, Figure 2a–d illustrate that model accuracy is enhanced
primarily through increased data diversity. However, as diversity increases, a bottleneck
appears and task relevance decreases, eventually resulting in poorer performance than the
initial state. This is primarily related to the data augmentation techniques chosen in this
paper. Most of the changes involve alterations in color space, shape, and orientation,
which enhance data diversity and model robustness. Fewer enhancements focus on image
quality or denoising, reducing the degree of association between samples in the dataset
and the task objectives and increasing the dimensions of the data that need to be
analyzed; the target variables become less understandable and predictable. The
experimental results therefore inevitably show a decrease in task relevance alongside an
increase in diversity. Beyond a certain intensity, augmentation yields diminishing benefits
and a greater loss of data quality. By clearly defining dimensions, researchers can better
guide the selection and implementation of data augmentation strategies. Using data
quality dimensions to evaluate the effectiveness of data augmentation in experiments
helps provide empirical evidence, allowing researchers to quantify improvements and
demonstrate that the measures taken have indeed enhanced model performance.
The results of the second group of experiments are shown in Figure 3. The evaluation
trends of the three methods are mostly consistent, showing an improvement in the quality
fusion score as the scale increases, although the rate of improvement decreases. The
minmax principle still exaggerates the quality of data augmentation. The model precision
also increases as the scale expands, but gradually starts to decline after reaching a four-fold
scale. This indicates that the repetition of data augmentation-generated images can have a
negative impact on model performance, leading to overfitting. It demonstrates that more
data are not always better. According to Figure 3b, our algorithm and the averaging method
have similar performance, which is superior to the minmax values.
Figure 3. Under the same data augmentation with different scales, our algorithm and the averaging
method exhibit similar performance. (a) Evaluation of the three algorithms and the model on the
augmented dataset of the CUB-200 dataset. (b) Performance evaluation of the three algorithms
using CS and MSE. The word “mine” denotes the method proposed in this paper.
Correlation matrix of the evaluation metrics on the CIFAR-10 dataset:

Metrics    Q1        Q2       Q3       Q4      Q5      Q6
Q1          1
Q2         −0.05      1
Q3         −0.29     0.76 *    1
Q4          0.81 *   0.10    −0.10      1
Q5         −0.19     0.62 *   0.81 *  −0.14     1
Q6         −0.92 **  0.02     0.24    −0.57    0.07     1
* p < 0.05, ** p < 0.01.

Correlation matrix of the evaluation metrics on the CUB-200 dataset:

Metrics    Q1        Q2       Q3       Q4      Q5      Q6
Q1          1
Q2         −0.23      1
Q3         −0.71 *   0.73 *    1
Q4          0.76 *   0.11    −0.24      1
Q5         −0.71 *   0.79 *   0.95 ** −0.19     1
Q6         −0.97 **  0.36     0.79 *  −0.66    0.8 *    1
* p < 0.05, ** p < 0.01.
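Correlation tables such as the ones above can be reproduced from per-experiment metric values; `scipy.stats.pearsonr` supplies both the coefficient and the p-value for the significance stars (the function and array names here are illustrative):

```python
import numpy as np
from scipy.stats import pearsonr

def corr_with_stars(scores):
    """Lower-triangular Pearson correlation table for metric columns
    (e.g. Q1..Q6), annotated * for p < 0.05 and ** for p < 0.01.

    scores : (n_experiments, n_metrics) array of metric values."""
    n = scores.shape[1]
    table = [["" for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            r, p = pearsonr(scores[:, i], scores[:, j])
            star = "**" if p < 0.01 else "*" if p < 0.05 else ""
            table[i][j] = f"{r:.2f}{star}"
    return table
```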
However, there are slight differences in the experimental results between the two
datasets. For example, in CIFAR-10, Q6 is strongly correlated only with Q1. But in
CUB-200, Q6 is strongly correlated not only with Q1 but also with Q3 and Q5. In
particular, the correlation with Q5 increased from 0.07 to 0.8, showing two extremes.
From the perspective of dataset properties, CIFAR-10 is a large-scale dataset with data
from various scenarios and weather conditions; data augmentation has a limited impact
on its basic image feature distribution, and the distribution of each class changes little.
In this dataset, augmentation mainly affects semantic information: for example, inverting
an object does not affect its recognition. Therefore, only Q1 and Q6 show correlation.
However, in the small dataset
CUB-200, most images have a single color and similar backgrounds. Image features have
a greater impact on the final results, which is also reflected in the correlation of indicator
results. Therefore, when using our framework to evaluate data augmentation quality,
analyzing the correlations of various indicators can help understand the strengths and
weaknesses of the dataset.
Figure 4 illustrates the impact of clustering algorithms on the framework results,
where only Q1 and Q6 are accelerated by clustering in the framework. Figure 4a,b show the
accuracy of the results before and after clustering. The original algorithm uses brute force
to compute the distance between pairwise image vectors to obtain evaluation results. In
Figure 4a, the clustered results show some differences compared to the original results, with
a similar trend but a decrease in evaluation scores after clustering. However, as Figure 4b
indicates, both indicators improve after clustering, proving that clustering actually helps
improve the accuracy of the evaluation results. Figure 4c shows the running time saved
by clustering. The x-axis represents the product of the number of classes and the
number of samples in each class in the dataset. It is evident that when the number of classes
is large and the number of samples per class is small, clustering only saves about half of the
time. However, when the number of samples per class is much larger than the number of
classes, clustering can save over 90% of the time. This is because Q1 partitions the dataset
by class, and the time complexity within each class is O(n²). After fast clustering, the
number of clusters is roughly constant, so the per-class cost can be treated as a constant
factor. Therefore, the fewer the classes and the more samples per class, the more time
can be saved. In Figure 5, it can be observed that replacing different pre-trained models
does not significantly affect the evaluation results of this method. The more sufficient
the pre-training, the more accurate the grasp of image features, and the generated image
feature vectors also have certain discriminative power. The two pre-trained models selected
in this study, both trained on ImageNet, did not show significant differences.
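The clustering speed-up described above can be sketched as follows: the exact within-class diversity term compares all O(n²) pairs, while the clustered variant runs a few Lloyd iterations and then compares only the O(k²) centroid pairs weighted by cluster sizes (intra-cluster pairs are approximated as zero, which is consistent with the slight drop in scores after clustering). The function names, the plain k-means, and the deterministic `init` argument are our own illustrative choices:

```python
import numpy as np

def mean_pairwise_bruteforce(x):
    """Exact O(n^2) mean Euclidean distance over all sample pairs."""
    n, total = len(x), 0.0
    for i in range(n):
        total += np.linalg.norm(x[i + 1:] - x[i], axis=1).sum()
    return total / (n * (n - 1) / 2)

def mean_pairwise_clustered(x, k, iters=10, init=None):
    """Approximate the same mean after a fast k-means step: intra-cluster
    pairs contribute 0, inter-cluster pairs use centroid distances, so
    the distance pass drops from O(n^2) to O(k^2) plus the clustering."""
    c = (x[:k] if init is None else init).astype(float).copy()
    for _ in range(iters):                     # plain Lloyd iterations
        lbl = np.argmin(((x[:, None, :] - c[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = x[lbl == j]
            if len(members):
                c[j] = members.mean(axis=0)
    sizes = np.bincount(lbl, minlength=k)
    total = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            total += sizes[i] * sizes[j] * np.linalg.norm(c[i] - c[j])
    n = len(x)
    return total / (n * (n - 1) / 2)
```

On well-separated classes the approximation stays within a few percent of the exact value while avoiding the quadratic pass over samples.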
Figure 4. Clustering not only reduces the computation of the algorithm but also improves its
performance. (a) The running times of the unoptimized and optimized algorithms are
shown for different numbers of categories and samples within each category. The x-axis represents
the product of the number of categories and the number of samples within each category in the
dataset. (b) Partial scores of the original algorithm and the optimized algorithm are presented
for the CUB-200 dataset. The x-axis represents the total number of samples in the augmented
training set. (c) Performance evaluation of the original algorithm and the optimized algorithm in the
CUB-200 dataset.
Figure 5. Replacing different pre-trained models does not significantly affect the evaluation results
of our method. (a) Evaluation of the three algorithms and the model on the augmented dataset of
the CUB-200 dataset. (b) Performance evaluation of the three algorithms using CS and MSE.
Figure 6. Display of the EuroSAT dataset; each row is a different class. (a) Original dataset.
(b) The augmented dataset with the highest score.
Figure 7. The t-SNE algorithm reduces the dimensionality of all features in the IMDB dataset and
visualizes them. (a) Original dataset. (b) The augmented dataset with the highest score.
The evaluation method cannot well balance intrinsic data attributes against the intended
use of the data, which limits its applicability. For example, remote sensing images
inherently suffer from low contrast, noise, and blurriness. They also often contain
imprecisely shaped objects, unlike medical imaging, which requires precise target
recognition. Remote sensing images are therefore not highly sensitive to task relevance,
and optimizing intrinsic data attributes for quality improvement incurs a quality loss that
outweighs the decrease in task relevance. In contrast, medical imaging already has
sufficiently high image quality; attempting to enhance diversity causes a loss in data
quality and task relevance, ultimately resulting in an overall decline in quality. Different
tasks impose different quality requirements on datasets, and during the calculation,
parameter variations in three areas (diversity, class balance, and task relevance) need to
be considered. This is also a direction we need to focus on in the future. When our
method is not tuned to suitable parameters and therefore performs poorly, better-performing
parameters can be fitted by combining the evaluation results of multiple augmented
datasets with a small number of training results; new augmented datasets can then be
evaluated until the parameters agree with most of the training results.
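The parameter-fitting loop suggested above can be sketched as a small grid search that keeps the fusion weights whose combined score best tracks a handful of short training runs; the function name, the grid resolution, and the use of Pearson correlation are our own illustrative choices:

```python
import itertools
import numpy as np

def fit_fusion_weights(subscores, accuracies, step=0.1):
    """Grid-search weights for (diversity, class balance, task
    relevance) so that the weighted quality score correlates best with
    the observed accuracies of a few augmented datasets.

    subscores  : (n_datasets, 3) per-dimension quality scores
    accuracies : (n_datasets,) accuracy from short training runs
    """
    best_w, best_r = None, -np.inf
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    for w1, w2 in itertools.product(grid, grid):
        if w1 + w2 > 1.0 + 1e-9:
            continue                      # weights must sum to 1
        w = np.array([w1, w2, 1.0 - w1 - w2])
        score = subscores @ w
        if score.std() == 0.0:
            continue                      # degenerate weighting
        r = np.corrcoef(score, accuracies)[0, 1]
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r
```

New augmented datasets can then be scored with the fitted weights until their ranking agrees with the available training evidence.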
Different downstream tasks have different priorities, so the evaluation method may
succeed for the assumed task but fail for the target task. For example, the optimal
augmented dataset obtained using the method described in this paper may be suboptimal
for object recognition, because image similarity metrics are better suited to classification
than to object recognition, where the goal is to detect similarity between small regions.
Task relevance metrics can compensate for this by calculating the similarity between
annotated regions and other regions.
Data can exhibit significant differences in structure, format, complexity, size, noise
levels, and more. Evaluation methods tailored to one data type may fail when applied to
another. For image datasets, we calculate statistics on pixels and textures; text datasets,
however, are composed of characters, words, and sentences, so measuring the intrinsic
quality of different data types requires rules established by domain experts. Beyond this,
the method in this paper shows high applicability and robustness with respect to data
diversity. The main reason is that, regardless of the data type, neural networks can
extract multi-dimensional feature vectors, and most of the indicators in this paper are
computed from such feature vectors; the method can therefore adapt, to a certain extent,
to most fields.
In most real-world datasets, a certain degree of anomalies, noise, errors, and outliers
can be expected. Truly clean and pristine data are a rare find. This is particularly true for
sensor data, remote sensing data, and internet-derived data, which often exhibit higher
variability and a higher incidence of anomalies. The process of data augmentation can
introduce new error patterns. In this paper, we split the dataset into three attributes and
utilize techniques such as information entropy to effectively quantify the number of patterns
and task relevance. Excessive errors can lead to a decrease in task relevance, which can
manifest in the final results. However, because this paper employs unsupervised algorithms,
it is challenging to distinguish true anomalies from acceptable variations, let alone optimize for them.
The diversity and complexity of real-world data make the development of universally
applicable data quality assessment methods inherently challenging. Thoughtful method
design, extensive evaluation, avoiding overfitting, and relaxing assumptions can enhance
applicability. In practical scenarios, the methods outlined in this paper can guide the con-
struction of datasets. If the collected sample dataset is insufficient for training deep learning
models, you can evaluate the effectiveness of data augmentation using quantitative metrics
defined based on the three dataset attributes and the ultimate goal, such as improving
classification accuracy, reducing error rates, or enhancing signal-to-noise ratios. Utilizing
data augmentation and assessment methods can address the long-tail problem in data,
improve data consistency, and reduce the labeling workload. As more real-world data
become available, it is essential to continuously reassess and enhance data augmentation.
The demand for augmentation may evolve over time. In summary, the methods described
in this paper contribute to the development of well-generalized models from limited and
imperfect real-world data.
Furthermore, the methods outlined in this paper are easy to implement and can be
seamlessly integrated into the deep learning workflow using popular frameworks like
TensorFlow or PyTorch. Leveraging the data pipelines within these frameworks and the
data monitoring integrated into them, it becomes straightforward to quickly compute
intrinsic data attribute metrics and feature vectors generated by pre-trained models after
generating multiple augmented datasets from the input data. Once the evaluation has
identified the optimal dataset, you can proceed directly to training your model.
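A framework-agnostic sketch of that workflow; `augment` and `evaluate` stand in for the user's augmentation stage (e.g. a torchvision transform pipeline or a tf.data map) and the quality metrics of this paper, and all names are illustrative:

```python
def select_best_augmentation(base_data, strategies, augment, evaluate):
    """Generate one candidate dataset per augmentation strategy, score
    each candidate with the data-quality metrics, and return the
    best-scoring dataset with its strategy and score."""
    scored = []
    for strategy in strategies:
        candidate = augment(base_data, strategy)
        scored.append((evaluate(candidate), strategy, candidate))
    best_score, best_strategy, best_data = max(scored, key=lambda t: t[0])
    return best_data, best_strategy, best_score
```

The returned dataset can then be fed directly into the training loop of either framework.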
5. Conclusions
This study aims to enhance model performance by improving data quality. By
categorizing data quality into three key dimensions (diversity, class balance, and task
relevance) and evaluating the effectiveness of data augmentation within these dimensions,
we have reached the following key outcomes and conclusions.
Firstly, we have successfully deconstructed the complexity of data quality into three
essential dimensions. This aids in providing a more comprehensive understanding of data
quality. Diversity ensures the inclusion of various sample types in the dataset, class balance
helps address imbalances in class distribution, and task relevance ensures the alignment of
data with the actual task at hand.
Secondly, by assessing the impact of data augmentation methods across these three
dimensions, we can quantitatively measure the influence of different enhancement strate-
gies on data quality. Our experimental results demonstrate that, with reasonable selection
and adjustment of augmentation strategies, significant improvements can be made in data
diversity and class balance while maintaining a high degree of relevance to the task.
Most importantly, our work holds significant practical implications. In the modern
fields of machine learning and artificial intelligence, data serves as the foundation of
successful models. By elevating data quality, we can enhance model generalization, mitigate
overfitting risks, improve model robustness in real-world scenarios, and provide more
accurate predictive and decision-making support across various application domains.
This impact extends to critical areas such as medical diagnostics, financial risk analysis,
autonomous driving, and beyond.
In the future, we aim to further explore the relationships among data quality dimen-
sions and strive towards the automatic selection of parameters tailored to specific domains
and tasks. The most notable improvement will be in terms of efficiency, as manual pa-
rameter selection and adjustment will no longer be necessary. Additionally, this approach
will reduce subjectivity and bias, ensuring the replicability and comparability of experi-
mental results. Most importantly, it will simplify experimentation, making this method
accessible to a broader range of researchers interested in understanding the principles of
data construction.
In summary, our research provides a systematic approach to enhancing data quality,
enabling researchers to better comprehend, evaluate, and enhance data for improved
machine learning model performance. This work offers robust guidance for future research
and applications, with the potential to make a positive impact in data-driven fields.
Author Contributions: Conceptualization, X.C., Y.L. and C.M.; methodology, X.C. and Y.L.; software,
X.C., Y.L. and C.M.; validation, X.C., Y.L. and C.M.; formal analysis, X.C. and Y.L.; investigation,
Y.L.; resources, X.C., Y.L., Z.X. and C.M.; data curation, X.C., Y.L., H.L. and S.Y.; writing—original
draft preparation, X.C. and Y.L.; visualization, X.C. and Y.L.; supervision, X.C., Z.X. and C.M.; project
administration, X.C. and C.M.; funding acquisition, X.C. and C.M.; X.C., Y.L. and C.M. made
significant contributions to the manuscript. All authors have read and agreed to the published
version of the manuscript.
Funding: This research was supported by the Outstanding Youth Team Project of Central
Universities (QNTD202308) and the National Key R&D Program of China (2022YFF1302700).
Data Availability Statement: Not applicable.
Acknowledgments: The authors thank the anonymous reviewers for their valuable comments.
Conflicts of Interest: The authors declare no conflict of interest.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
An Improved Spatio-Temporally Smoothed Coherence Factor
Combined with Delay Multiply and Sum Beamformer
Ziyang Guo 1,2, Xingguang Geng 1, Fei Yao 1, Liyuan Liu 1,2, Chaohong Zhang 1,2,
Yitao Zhang 1,2,* and Yunfeng Wang 1,2
Abstract: Delay multiply and sum beamforming (DMAS) is a non-linear method used in ultrasound
imaging which offers superior performance to conventional delay and sum beamforming (DAS).
The combination of DMAS and the coherence factor (CF) can further improve lateral
resolution in single plane-wave imaging: by using CF to weight the DMAS output, the main
lobe width can be reduced and aberration effects suppressed, mitigating the low lateral
resolution inherent to single plane-wave imaging. However, in low signal-to-noise ratio (SNR) environments,
the speckle variance of the image increases, and there are black area artifacts around high echo objects.
To improve the quality of the scatter without significantly reducing the lateral resolution of the
DMAS-CF, this paper proposes an adaptive spatio-temporally smoothed coherence factor (GSTS-CF)
combined with delay multiply and sum beamformer (DMAS + GSTS-CF), which uses the generalized
coherence factor (GCF) as a local coherence detection tool to adaptively determine the subarray length
to obtain an improved adaptive spatio-temporally smoothed factor, and uses this factor to weight the
output of DMAS. The simulation and experimental data show that the proposed method improves
lateral resolution (20 mm depth) by 86.87% compared to DAS, 52.13% compared to DMAS, 15.84%
compared to DMAS + STS-CF, and has a full width at half maximum (FWHM) similar to that of DMAS-CF.
The proposed method improves the speckle signal-to-noise ratio (sSNR) by 87.85% (simulation) and
77.84% (in carotid) compared to DMAS-CF, 20.37% (simulation) and 40.74% (in carotid) compared to
Citation: Guo, Z.; Geng, X.; Yao, F.; DMAS, 15.03% (simulation) and 13.46% (in carotid) compared to DMAS + STS-CF, and has sSNR and
Liu, L.; Zhang, C.; Zhang, Y.; Wang, Y. scatter variance similar to DAS. This indicates that the method improves scatter quality (lower scatter
An Improved Spatio-Temporally
variance and higher sSNR) without significantly reducing lateral resolution.
Smoothed Coherence Factor
Combined with Delay Multiply and
Keywords: ultrasound imaging; plane-wave; beamforming; coherence factor; adaptive; spatio-
Sum Beamformer. Electronics 2023, 12,
temporally smoothed; delay multiply and sum beamforming
3902. https://doi.org/10.3390/
electronics12183902
lateral resolution and scattering retention performance, we combined the new coherence factor
with the DMAS method to further improve lateral resolution.
One of the most common beamformers is the delay and sum beamforming (DAS),
but its ability to improve image resolution and suppress clutter interference is limited.
Matrone et al. introduced DMAS based on receive-aperture autocorrelation; unlike DAS,
it is a non-linear algorithm in which the delayed signals are combinatorially coupled and
multiplied in pairs before summing [5]. This means that a correlation operation is conducted
on the echoes. Since DMAS multiplies echoes of almost the same frequency, DC and
second harmonic components appear in the output spectrum. Therefore, a band-pass
filter is added after the DMAS output to filter out DC components and higher harmonic
components, while the signal centered at 2f0 remains unchanged (f0 is the central frequency
of the echo), and finally the output of the filtered delay multiply and sum (F-DMAS)
is obtained [6]. Compared to DAS, DMAS better suppresses spurious and noise via a
correlation operation, brings the measure of backward scattering signal coherence into
the beamforming process, and increases the number of “artificial apertures” (because the
autocorrelation function has 2N − 1 coefficients, where N is the number of receive
apertures), thus reducing the f-number and resulting in an improvement in lateral resolution [7].
However, this imaging method requires further suppression of the side lobes to improve
the imaging quality.
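The pairwise correlation at the core of DMAS can be written down directly; this sketch uses the usual signed-square-root formulation, omits the 2f0 band-pass filter of F-DMAS, and assumes the delay compensation has already been applied (array layout and names are illustrative):

```python
import numpy as np

def dmas(delayed):
    """Delay multiply and sum over one image line.

    delayed : (N, T) array of delay-compensated channel signals.
    Every distinct channel pair is multiplied (the signed square root
    restores the amplitude dimensionality) and the N*(N-1)/2 pair
    products are summed per time sample."""
    n_ch = delayed.shape[0]
    y = np.zeros(delayed.shape[1])
    for i in range(n_ch - 1):
        prod = delayed[i] * delayed[i + 1:]      # pairwise products
        y += (np.sign(prod) * np.sqrt(np.abs(prod))).sum(axis=0)
    return y
```

With N channels the sum runs over N(N − 1)/2 pairs, which is the correlation operation on the echoes described above.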
A number of scholars have proposed correcting the output of DMAS with CF-like methods,
which can be applied to side-lobe suppression, clutter reduction, and aberration
correction. By adaptively weighting the beamsum, these methods enhance image contrast
without sacrificing spatial resolution. In addition, they have low computational complexity
and are easy to implement. The most representative CF-like method is the coherence
factor (CF) [8,9], which is defined as the ratio between the coherent energy of the aperture
received signal and the total (non-coherent) energy. By using CF to weight the output of
beamforming, the side lobes and aberration effects can be suppressed, but this darkens
the image and can even introduce reconstruction errors. P.-C. Li and M.-L. Li proposed
the generalized coherence factor (GCF), a spatial frequency domain version of CF, which
adds low-frequency signal components that differ little from the axial fundamental frequency
to the numerator of CF and improves the preservation of scatter [10]. Camacho
et al. [11] designed the phase coherence factor (PCF) and the sign coherence factor (SCF).
The principle is to replace the amplitude information with the phase information, and a
linear or exponential relationship curve is added to regulate the suppression of off-axis
signals and the retention of background scatter. This suppresses the side lobes and improves
the lateral resolution. The implementation of this technique is simple and practical.
Although the CF-like method combined with the DMAS beamformer has advantages,
the images may suffer some undesirable effects in a low-SNR environment. They include
overall image brightness reduction, increased speckle variance, underestimation of the
size of the point target, black area artifacts in the region around the high echo reflector,
and even removal of the speckle pattern. To solve the above problems, we introduced
and improved the spatio-temporally smoothed coherence factor (STS-CF) proposed by
MengLing Xu et al. [12], the essence of which is to measure the coherence between the split
subarrays. It uses spatial smoothing (i.e., sub-aperture averaging) to create overlapping
subarrays and temporal smoothing across multiple time samples to calculate the energy
of the coherence sum [13]. This method introduces a tunable factor, achieving a balance
between image quality and algorithmic robustness. However, the value of the subarray
length L is determined empirically, and the most appropriate L differs between
environments, so the method performs only moderately well in clinical applications.
To solve the above problem, we propose the GSTS-CF. In this study, we use GCF to
detect local coherence and adaptively determine the subarray length for spatial
smoothing [14]. Using this factor to weight the DMAS output can improve the scatter quality
without significantly reducing the lateral resolution. This is more applicable in complex
clinical settings. Section 2 briefly introduces the framework background of GCF, STS-CF,
and DMAS, and then we describe the proposed method. Section 3 describes the simulation
setup and experimental steps and provides some metrics for evaluating the different
beamformers. Section 4 shows the obtained images and discusses the results. The performance of
these methods and the possibilities for further improvements are discussed in Section 5.
Section 6 provides a conclusion of the proposed method of this paper.
where L is the length of the subarrays, and xm ( p + k ) is the delayed signal received by the
m-th element at time index p + k. The spatially smoothed technique divides the received
array into N − L + 1 overlapping subarrays containing L elements each and uses the subarrays
instead of single elements to measure the coherence of the signal. L, as an adjustable
parameter, is able to balance performance against algorithmic robustness: when K = 0
and L = 1, STS-CF reduces to CF; when K = 0 and L = M, STS-CF ≡ 1, which means no
correction is applied to the beamformer output.
where h( f , p) is the discrete Fourier transform of the delay-compensated aperture data for
imaging point p, and the low-frequency region (LFR) is determined by the cutoff frequency
M0 , which determines the performance of the GCF. When M0 = 0, GCF becomes CF.
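As a concrete illustration, the GCF for one imaging point can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the delay-compensated aperture data are assumed to arrive as a 1-D NumPy array, and the function name and layout are hypothetical. With M0 = 0 it reduces to the conventional CF.

```python
import numpy as np

def gcf(aperture_data, M0):
    """Generalized coherence factor (GCF) for one imaging point.

    aperture_data : 1-D array of delay-compensated samples across the
                    M receive elements (hypothetical layout).
    M0            : cutoff index of the low-frequency region (LFR);
                    assumed to satisfy M0 < M/2 so LFR bins are distinct.
    Returns the ratio of LFR spectral energy to total spectral energy.
    """
    h = np.fft.fft(aperture_data)      # spectrum across the aperture
    energy = np.abs(h) ** 2
    total = energy.sum()
    if total == 0:
        return 0.0
    # LFR: the DC bin plus M0 bins on either side (spectrum is periodic)
    lfr = energy[0]
    for k in range(1, M0 + 1):
        lfr += energy[k] + energy[-k]
    return lfr / total
```

For fully coherent aperture data (all elements equal), Parseval's theorem makes this ratio equal 1 at M0 = 0, i.e., the CF of a perfectly coherent wavefront, which is the limiting case noted in the text.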
is summed and band-pass filtered (BP). If the receiving aperture is N transducers, then
there will be N(N − 1)/2 combinations. The expression of DMAS is:

y_{\mathrm{DMAS}}(t) = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat{s}_{ij}(t) = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \operatorname{sign}\big(s_i(t)\,s_j(t)\big) \sqrt{\left| s_i(t)\,s_j(t) \right|} \quad (4)
where si (t) and s j (t) are the delayed RF signals received by the i-th and j-th transducer
elements, and sign( x ) is the sign function. Due to the multiplication of signals with similar
frequencies, a DC and a second harmonic component appear in the spectrum of the DMAS.
Therefore, band-pass filtering is further introduced to filter the DC and higher frequency
components, and the filtered DMAS is called F-DMAS.
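The pairwise combination described above can be sketched as follows. This is a minimal Python illustration under assumed names and array layout, not the authors' code; the signed square root keeps the output dimensionally consistent with the input, and the band-pass filtering step of F-DMAS is omitted.

```python
import numpy as np

def dmas(s):
    """Delay-multiply-and-sum over pre-delayed channel signals.

    s : (N, T) array of N delayed RF channel signals over T time
        samples (hypothetical layout).
    Implements y(t) = sum_{i<j} sign(s_i s_j) * sqrt(|s_i s_j|);
    the F-DMAS band-pass filter is not applied here.
    """
    N, T = s.shape
    y = np.zeros(T)
    for i in range(N - 1):
        for j in range(i + 1, N):      # N(N-1)/2 pair combinations
            prod = s[i] * s[j]
            y += np.sign(prod) * np.sqrt(np.abs(prod))
    return y
```

Note how the products rectify the signal: two in-phase channels always give a positive pair product, which is the origin of the DC and second-harmonic components that the subsequent band-pass filter must remove.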
The flow of the algorithm is to weigh the output of the DMAS with the GSTS-CF
factor. The key point is to calculate the GSTS-CF factor. The calculation process of GSTS-CF
is divided into two steps. The first step is to improve the CF using a spatio-temporal
smoothing method to obtain the STS-CF factor. The second step is to detect local coherence
with the GCF and map the GCF onto the subarray length L so that its value varies adaptively.
So first, we divide the array into N − L + 1 subarrays, each containing L elements. This is
called the spatial smoothed method, and the diagram is shown in Figure 2.
Figure 2 shows the ultrasonic probe consisting of N array elements. The spatial
smoothing method is used to process the data by dividing the N array elements into
N − L + 1 subarrays: the first subarray contains the 1st to the L-th array element, the second
subarray contains the 2nd to the (L + 1)-th array element, and so on, until the (N − L + 1)-th
subarray contains the (N − L + 1)-th to the N-th array element. The spatially smoothed method
computes the coherence of the subarray beamsums instead of computing the coherence
with a single element. Equation (1) calculates coherence using a single subarray, while
Equation (2) is a spatial smoothed method based on Equation (1), which uses a subarray
of L array elements instead of a single array element to calculate the coherence between
the arrays. Further, we measure the coherence of the subarrays at 2K + 1 neighbouring
time samples instead of a single time sample, which improves the array gain in SNR and
lowers the side lobe levels. This is called the spatio-temporally smoothed method. L, as an
adjustable parameter, can only be determined empirically. In order to make L adaptively
changeable, we use the GCF to detect local coherence and map the GCF to the subarray
length L.
In general, as the subarray length increases, the STS-CF approaches 1, thus enhancing
robustness at the expense of lateral resolution. Therefore, to maintain the scatter pattern,
the L value should be larger, while for echo-free cysts and highly echogenic reflectors, the
L value should be smaller to obtain satisfactory image resolution and contrast [16].
Considering the performance of the GCF, the GCF values are small in non-coherent
scattering targets (i.e., in echo-free cysts), large in strongly coherent scattering targets
(i.e., high-echo reflectors), and tend to be moderate in low-coherent scattering targets (i.e.,
scattering spots).
According to the analysis above, when the value of GCF is large or small, we want the
corresponding L to be small, while when the value of GCF tends to be medium, we want
the value of L to be large.
In order to further determine the mapping relationship between GCF and L, we select
some points in the incoherent region, strongly coherent region, and low-coherent region,
respectively, to calculate the GCF value, and determine the appropriate L value at that point.
The evaluation criterion of L-optimal solutions is that in the low coherence region, the
scattering variance is the smallest, and in the strongly coherent region and the incoherent
region, the lateral resolution is the best. Different echo targets correspond to different GCF
values, and we selected 50 different targets whose GCF values were uniformly distributed
between 0 and 1. We also calculated the optimal L values corresponding to these 50 targets
and plotted a scatter plot, as shown in Figure 3a, using the GCF value of the point as the
horizontal coordinate and the L value as the vertical coordinate. The scatter plot is plotted
in MATLAB, and the geom_smooth() function is then used to fit the scattered points; the Gaussian
function model is selected as the optimal model by comparing AIC values. Thus, the
mapping of GCF to L is shown in Equation (5), and the fitting curve is shown in Figure 3b.
L(p) = \operatorname{fix}\!\left( N \cdot e^{-\left( \frac{GCF(p) - 0.5}{\alpha} \right)^{2}} \right) \quad (5)
Figure 3. (a) The scatter plot of the appropriate L for each GCF value (GCF values from 0 to 1 on the
horizontal axis; L from 0 to 140 on the vertical axis). (b) Fitting curve for the scattered points.
The fix(·) operator truncates its argument toward zero to give an integer, and GCF(p) is
derived from Equation (3), which takes values from 0 to 1, so the range of L(p) is 0 to N.
The value of the parameter α ranges from 0 to 2. The smaller α is, the faster L(p) changes,
and the more sensitive the algorithm is to the detection of the target, but the less robust it
becomes [17]. In this paper, α = 0.2. The cutoff frequency M0 of the low-frequency region
in GCF is selected empirically, and different selections will affect the results. In this
paper, M0 = 10.
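Equation (5) can be written as a small helper function (a sketch with hypothetical names; N is the number of receive elements, and α defaults to the paper's value of 0.2):

```python
import numpy as np

def subarray_length(gcf_value, N, alpha=0.2):
    """Map a local GCF value to an adaptive subarray length L(p).

    Implements L(p) = fix(N * exp(-((GCF(p) - 0.5) / alpha)^2)),
    where fix truncates toward zero. Moderate GCF values (low-coherence
    speckle) give a large L; very small or very large GCF values
    (anechoic cysts, strong reflectors) give a small L.
    """
    L = N * np.exp(-(((gcf_value - 0.5) / alpha) ** 2))
    return int(np.fix(L))
```

The Gaussian peaks at GCF = 0.5, so the mapping realizes exactly the behaviour motivated above: large L (robust, smooth speckle) for moderately coherent targets and small L (sharp resolution) at both coherence extremes.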
Then, substituting L(p) into Equation (2) and replacing the fixed L with the adaptive L(p)
yields the expression of GSTS-CF at point p:
\mathrm{GSTS\text{-}CF}(p) = \frac{ \sum_{k=-K}^{K} \left| \sum_{l=1}^{M-L(p)+1} \sum_{m=l}^{L(p)+l-1} x_m(p+k) \right|^{2} }{ \left( M - L(p) + 1 \right) \sum_{k=-K}^{K} \sum_{l=1}^{M-L(p)+1} \left| \sum_{m=l}^{L(p)+l-1} x_m(p+k) \right|^{2} } \quad (6)
The method adaptively changes the subarray length, which gives it a stronger scatter
retention capability compared with the conventional CF, but the noise reduction capability
is not sufficient on its own; we therefore combine it with DMAS. Weighting the output of
DMAS with the GSTS-CF gives the output expression of DMAS + GSTS-CF:
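This weighting step can be sketched as follows. The sketch assumes the delayed samples around one imaging point are arranged as an M × (2K + 1) NumPy array; the layout and names are hypothetical, and the subarray length L is assumed to have been estimated already via the GCF mapping.

```python
import numpy as np

def gsts_cf(x, L):
    """Spatio-temporally smoothed coherence factor for one point.

    x : (M, 2K+1) array of delayed samples, M receive elements by
        2K+1 time samples centred on the imaging point.
    L : adaptive subarray length for this point.
    Returns the ratio of coherent to incoherent subarray-beamsum energy.
    """
    M, _ = x.shape
    n_sub = M - L + 1
    # subarray beamsums: row l holds the sum of elements l .. l+L-1
    sub = np.array([x[l:l + L].sum(axis=0) for l in range(n_sub)])
    num = (np.abs(sub.sum(axis=0)) ** 2).sum()   # coherent energy
    den = n_sub * (np.abs(sub) ** 2).sum()       # incoherent energy
    return num / den if den > 0 else 0.0
```

The weighted beamformer output is then simply `gsts_cf(x, L) * y_dmas` for each pixel. With L = 1 and a single time sample this function reduces to the conventional CF, and with L = M it returns 1 (no correction), matching the limiting behaviour of the smoothed coherence factor.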
is sufficiently sharp, and sSNR is used to evaluate the quality of background scatter [20].
The CR, CNR, and sSNR are defined as follows:
\mathrm{CR} = 20 \log_{10}\!\left( \frac{\mu_i}{\mu_b} \right) \quad (8)

\mathrm{CNR} = \frac{\left| \mu_i - \mu_b \right|}{\sqrt{\sigma_b^2 + \sigma_i^2}} \quad (9)

\mathrm{sSNR} = \frac{\mu_b}{\sigma_b} \quad (10)
where μi and μb are the average image intensities (before logarithmic compression) of the
cyst and background within a region, while σi2 and σb2 are the corresponding variances.
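These metrics can be computed directly from two regions of interest. The sketch below is a hedged illustration: the region handling and the CNR/sSNR denominators follow the standard definitions, which is an assumption where the reprint's equations are garbled, and the function name is hypothetical.

```python
import numpy as np

def image_metrics(cyst, background):
    """CR, CNR, and sSNR from two pixel regions.

    cyst, background : arrays of image intensities (before logarithmic
    compression) from the cyst and background ROIs. Assumes
    CNR = |mu_i - mu_b| / sqrt(sigma_b^2 + sigma_i^2) and
    sSNR = mu_b / sigma_b (standard definitions).
    """
    mu_i, mu_b = np.mean(cyst), np.mean(background)
    var_i, var_b = np.var(cyst), np.var(background)
    cr = 20 * np.log10(mu_i / mu_b)                 # contrast ratio (dB)
    cnr = abs(mu_i - mu_b) / np.sqrt(var_b + var_i)  # contrast-to-noise
    ssnr = mu_b / np.sqrt(var_b)                     # speckle SNR
    return cr, cnr, ssnr
```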
4. Results
4.1. Simulated Point Target Results
From Figure 4a,b, we can see that the artifacts of DMAS are much smaller than
those of DAS, which is due to the fact that the correlation operation brings the metric of
backscattered signal coherence into the beamforming process, achieving better spurious
and noise rejection, but the image intensity decays as the depth increases [21–23]. The
weighted image has a better contrast and resolution, which can be seen in Figure 4c–f.
Among the four weighted images, it can be found that the CF-weighted image has the sharpest
resolution of the point pairs and the fewest surrounding artifacts. The method proposed in this
paper (GSTS-CF) is the next best, followed by the STS-CF [14,24].
Figure 4. Simulated single plane-wave imaging with a dynamic range of 60 dB. (a) DAS image,
(b) DMAS image, (c) DMAS weighted by the CF, (d) DAS weighted by the CF, (e) DMAS weighted
by the STS-CF, (f) DMAS weighted by the GSTS-CF. All images are shown in a 60 dB dynamic range.
To further compare the lateral resolution of the different beamformers, we plot the
lateral projections of Figure 4 in Figure 5, while calculating the FWHM of the different
beamformers based on the lateral projections [25]. From Figure 5, it can be found that the
combination of CF and DMAS has the narrowest main lobe width. Weighting DMAS with
STS-CF also improves the lateral resolution, but compared to DMAS-CF, there is still a gap.
In point imaging, the combination of DMAS and CF gives the best resolution because of
less clutter interference and a high echo SNR, but in complex environments with a low
SNR, the method will produce a large number of artifacts. The proposed method, which
combines the improved GSTS-CF with DMAS, has a slightly worse main lobe width and
side lobe amplitude than DMAS-CF, which weakens the suppression effect of CF and
makes a compromise between imaging clarity and algorithmic robustness [26]. It can also
be seen from Figure 5 that the GSTS-CF has a much-improved lateral resolution compared
to the STS-CF, and its FWHM converges to that of the CF-weighted beamformer. Although
CF-weighted beamformers have the best lateral resolution, they produce artefacts and
uneven background scatter in complex environments, as will be seen in the following
experiments.
Figure 5. Lateral projections of the single plane-wave images in Figure 4. The point pairs were located
at (a) 20 mm and (b) 40 mm. The corresponding zoomed-in figures are shown in (c,d).
From Table 1, we can see the variation in the lateral resolution of different beamformers.
The CF-weighted beamformer has a narrower main lobe and higher lateral resolution due
to the fact that CF works well in a simple scattering environment with a high signal-to-noise
ratio [27]. The combination of GSTS-CF and DMAS achieves a lateral resolution similar to
that of the CF-weighted beamformer, which is significantly better than DMAS as well as
DMAS + STS-CF.
Table 1. FWHM (mm) of the different beamformers at 20 mm and 40 mm depths.

Method            FWHM at 20 mm    FWHM at 40 mm
DAS               3.762            3.831
DMAS              1.032            1.157
DAS-CF            0.525            0.543
DMAS-CF           0.463            0.584
DMAS + STS-CF     0.587            1.034
DMAS + GSTS-CF    0.494            0.612
Figure 6. Single plane-wave images of the computer-generated cyst phantom reconstructed using
(a) DAS, (b) DMAS, (c) DAS+CF, (d) DMAS + CF, (e) DMAS + STS-CF, (f) DMAS + GSTS-CF. All
images are shown in a 60 dB dynamic range.
Lateral cross-sections through the cyst target in the simulated images are shown in
Figure 7. It can be seen that DMAS-CF and DMAS + GSTS-CF have the lowest average
grey values, indicating that the internal clutter of the cyst is effectively removed [30]. The
greatest variation is seen at the cyst demarcation line, indicating that the two methods have
the clearest boundaries. Table 2 presents the results of the evaluation of the parameters
σb2 , CR, CNR, and sSNR, and the areas used to calculate these indicators are shown in
the rectangle in Figure 6 [31]. It can be seen that DMAS-CF has the highest CR and the
lowest CNR and sSNR, while DAS has the lowest CR and the highest CNR and sSNR.
This indicates that the normal DAS has the best ability to preserve scatter quality despite
its low CR; however, neither method has the best CR, CNR, and sSNR at the same time.
The DMAS + GSTS-CF is a compromise between the CR and CNR and sSNR. Its CR is
comparable to the DMAS-CF, and CNR and sSNR are comparable to the DAS, while it can
be seen that the GSTS-CF outperforms the STS-CF. For σb2, DMAS + GSTS-CF is also low,
just above DAS. This indicates a more uniform background scatter.
Figure 7. Lateral cross-sections through the cyst target in the simulated images.
Table 2. The cyst average intensity, background average intensity, CR, CNR, and sSNR of cysts for
different methods.
(a)ȱ (b)ȱ
Figure 8. (a) Location of the collected carotid artery data. (b) Ultrasound signal acquisition device.
1 RF receiver and transmitter circuits, 2 line array probe, 3 signal generator.
Figure 9. Carotid artery model maps and plane-wave imaging results. The green and red boxes
indicate the areas where data μi and μb were collected. (a) Carotid artery model maps, (b) DAS,
(c) DMAS, (d) DMAS + CF, (e) DMAS + STS-CF, (f) DMAS + GSTS-CF.
Compared with the simulation, this experimental object has a more complex structure
and more noise disturbances. In addition to the coherent noise of the echoes considered in
the simulation, there are a series of conditions affecting the signal quality, such as phase
distortion caused by the different transmission media and signal distortion caused by the
limited performance of the acquisition device. Therefore, to improve the imaging quality
of the experiment, we need a more precise signal acquisition device with the assistance
of interpolation processing and ultrasonic image denoising technology [32]. Although all
of this experimental imaging has some speckle noise, the analysis of the performance of
different algorithms is not affected.
It can be seen from Figure 9 that DMAS brings about a higher contrast ratio, but
at the same time it results in artifacts in the background, and the overall image
becomes darker. DMAS-CF suppresses the signal excessively, leading to image
reconstruction errors, while DMAS + GSTS-CF has a more uniform background area, and
the demarcation line between the vessel wall and the lumen can be seen more clearly. At
the same time, there are fewer artifacts in the echoless region inside the lumen. Compared
to the over-suppression of DMAS + CF, DMAS + GSTS-CF provides higher algorithmic
robustness, suppresses artifacts within the vessel lumen, and results in more uniform tissue
in the perivascular region. In contrast, although DMAS + STS-CF had similar effects, its
performance was lower than that of DMAS + GSTS-CF.
The μlumen, CR, CNR, and sSNR values obtained by the different methods are given in Table 3,
and the regions used to estimate these metrics are marked with rectangular boxes in
Figure 9 (μlumen is the average value of the vascular lumen region, which reflects the ability
of clutter suppression in the vascular lumen). In a complex scattering environment, the
CF-weighted beamformer has a very low CNR and sSNR, while the DAS has the lowest
CR. The GSTS-CF well balances the CR, CNR, and sSNR, suggesting that the GSTS-CF can
preserve scatter patterns well and achieve a high contrast.
Table 3. Average intensity of the carotid lumen and sSNR of the scattered area.
5. Discussion
In this paper, we use a coherence factor to improve the performance of DMAS, and
our motivation is to achieve a trade-off between lateral resolution and scattering retention
performance. Compared with DMAS + CF, the proposed method better preserves the
scatter pattern without significantly reducing the lateral resolution. Compared with DAS,
the proposed method greatly improves the lateral resolution and contrast while having an
approximate background pattern. GSTS-CF is essentially an improved STS-CF that uses
the GCF as a local coherence detection tool and adaptively selects the appropriate subarray
length to conduct spatial smoothing. The combination of GSTS-CF and DMAS further
improves the image quality.
Tables 1–3 show that the proposed method improves lateral resolution (20 mm depth)
by 86.87% compared to DAS, 52.13% compared to DMAS, 15.84% compared to DMAS +
STS-CF, and has a full width at half maximum (FWHM) similar to that of DMAS-CF. The proposed
method improves the speckle signal-to-noise ratio (sSNR) by 87.85% (simulation) and
77.84% (in carotid) compared to DMAS-CF, 20.37% (simulation) and 40.74% (in carotid)
compared to DMAS, 15.03% (simulation) and 13.46% (in carotid) compared to DMAS +
STS-CF, and has sSNR and scatter variance similar to DAS.
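As a quick worked check, the resolution-improvement percentages quoted above follow from the FWHM values in Table 1 as relative reductions (helper name is illustrative):

```python
def improvement(baseline, proposed):
    """Relative FWHM reduction of the proposed method, in percent."""
    return 100.0 * (baseline - proposed) / baseline

# FWHM values at 20 mm depth from Table 1 (mm); proposed method: 0.494
print(round(improvement(3.762, 0.494), 2))  # vs. DAS
print(round(improvement(1.032, 0.494), 2))  # vs. DMAS
print(round(improvement(0.587, 0.494), 2))  # vs. DMAS + STS-CF
```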
Because the subarray length L for GSTS-CF is estimated by GCF, the performance of
the proposed method is influenced by M0 (the cutoff frequency defining the numerator of the GCF).
As can be seen from Equation (3), the GCF increases with the increase in M0, leading to a
change in L(p) in Equation (5). From Figure 3, it can be seen that in an incoherent target, L(p)
increases with the increase in GCF, which leads to a slight decrease in lateral resolution
but a more uniform scattering area. In a strongly coherent target, L(p) decreases with the
increase in GCF, leading to an improvement in the lateral resolution but a decrease in the
scattering quality. In low-coherent targets, the change in GCF has little effect on the results
because the curve of GCF values mapping to the length of subarray L( p) changes slowly in
this region. In clinical applications, we can change the value of M0 according to different
environments, and the default of M0 in this paper is 10.
During carotid data acquisition, the SNR of the signal is much lower than during
cyst simulation, especially when receiving echoes from deeper regions with greater signal
attenuation. In Figure 9, the images generated by DMAS + CF may not be suitable for
clinical applications because of the significantly lower amplitude levels in the background
and the loss of texture information. This is due to the fact that each channel signal has
different amounts of correlated noise and interference, and they have different SNRs.
Therefore, the weighting factor varies widely, and these artifacts may appear.
DMAS + STS-CF uses the spatio-temporally smoothed method, which enhances the
robustness to noise interference and side lobe interference in coherent measurements. As
a result, there is a lower scattering variance and higher sSNR, but the lateral resolution
is reduced. However, the artifacts of DMAS + STS-CF are more severe in the carotid
experiment (Figure 9) than in the cyst simulation (Figure 6). Because the choice of L is fixed,
it is necessary to choose a different L for different scenarios in clinical applications, which
affects the performance of the algorithm.
In Figure 9f, DMAS + GSTS-CF has higher robustness in complex environments.
Compared to DMAS + STS-CF, the method estimates a value of L( p) with GCF at each
imaging point, which significantly removes these artifacts by improving the scatter pattern
(lower scatter variance and higher sSNR) while maintaining the clutter rejection capability
(lower mean value (μlumen ) in the vessel lumen).
Since the adaptive subarray length is estimated by the GCF, the performance of
the proposed method is affected by the cut-off frequency M0, and therefore in clinical
applications there may be drawbacks, such as noise, clutter, or other types of artefacts, if the
parameters are not set correctly. Nevertheless, the proposed method also has potential in
clinical applications. This is because the proposed method allows for a flexible selection of
the subarray length L according to the echo target and thus improves the lateral resolution
along with speckle protection. Thus, it may have potential for applications in the heart,
carotid artery, thyroid, tumors, etc. [33–36]. Using the GCF to estimate the subarray length
L( p) leads to a significant increase in computational effort because of the large number of
Fourier transforms in the GCF. However, graphics processing unit calculations have been
used to accelerate these beamformers for real-time imaging and thus can be used in the
method proposed in this paper to improve computational efficiency.
6. Conclusions
To improve the speckle quality (lower scatter variance and higher sSNR) without
significantly reducing the lateral resolution, we propose an adaptive spatio-temporally
smoothed coherence factor called GSTS-CF and combine it with DMAS. The simulation
and experimental results show that the method can obtain better background scattering
without affecting the lateral resolution. The algorithm is more robust and more suitable for
clinical applications.
Author Contributions: In this paper, Z.G. is responsible for methodology and software. X.G. is
responsible for data curation and investigation. F.Y. is responsible for visualization, writing—original
draft preparation, and validation. Y.Z. is responsible for writing—reviewing and conceptualization.
L.L., C.Z. and Y.W. are responsible for supervision and editing. All team members participated in the
article work, and there were no conflicts of interest between the teams. All authors have read and
agreed to the published version of the manuscript.
Funding: This research was funded by the Sichuan Science and Technology Major Project (No. 2022ZDZX0033)
and the Key Research Program of the Chinese Academy of Sciences (No. ZDRW-ZS-2021-1).
Data Availability Statement: Data sharing is not applicable to this article.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Wells, P.N.T. Ultrasonics in medicine and biology. Phys. Med. Biol. 1977, 22, 629–669. [CrossRef] [PubMed]
2. Tanter, M.; Fink, M. Ultrafast imaging in biomedical ultrasound. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2014, 61, 102–119.
[CrossRef] [PubMed]
3. Montaldo, G.; Tanter, M.; Bercoff, J.; Benech, N.; Fink, M. Coherent plane-wave compounding for very high frame rate ultrasonography and transient elastography. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2009, 56, 489–506. [CrossRef] [PubMed]
4. Matrone, G.; Savoia, A.S.; Caliano, G.; Magenes, G. The delay multiply and sum beamforming algorithm in ultrasound B-mode
medical imaging. IEEE Trans. Med. Imaging 2015, 34, 940–949. [CrossRef] [PubMed]
5. Matrone, G.; Ramalli, A.; Tortoli, P.; Magenes, G. Experimental evaluation of ultrasound higher order harmonic imaging with
filtered delay multiply and sum (F-DMAS) non-linear beamforming. Ultrasonics 2018, 86, 59–68. [CrossRef] [PubMed]
6. Synnevåg, J.F.; Austeng, A.; Holm, S. Adaptive beamforming applied to medical ultrasound imaging. IEEE Trans. Ultrason.
Ferroelectr. Freq. Control 2007, 54, 1606–1613. [CrossRef]
7. Synnevåg, J.F.; Austeng, A.; Holm, S. A low-complexity data-dependent beamformer. IEEE Trans. Ultrason. Ferroelectr. Freq.
Control 2010, 57, 281–289.
8. Hollman, K.W.; Rigby, K.W.; O’donnell, M. Coherence factor of speckle from a multi-row probe. In Proceedings of the 1999 IEEE
Ultrasonics Symposium, Tahoe, NV, USA, 17–20 October 1999; pp. 1257–1260.
9. Nilsen, C.I.C.; Holm, S. Wiener beamforming and the coherence factor in ultrasound imaging. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2010, 57, 1329–1346. [CrossRef]
10. Li, P.C.; Li, M.L. Adaptive imaging using the generalized coherence factor. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2003, 50, 128–141.
11. Xu, M.; Yang, X.; Ding, M.; Yuchi, M. Spatio-temporally Smoothed Coherence Factor for Ultrasound Imaging. IEEE Trans. Ultrason.
Ferroelectr. Freq. Control 2014, 61, 182–190. [CrossRef]
12. Shan, T.J.; Wax, M.; Kailath, T. On spatial smoothing for direction-of-arrival estimation of coherent signals. IEEE Trans. Acoust.
Speech Signal Process. 1985, 33, 806–811. [CrossRef]
13. Lan, Z.; Jin, L.; Feng, S. Joint Generalized Coherence Factor and Minimum Variance Beamformer for Synthetic Aperture. IEEE
Trans. Ultrason. Ferroelectr. Freq. Control 2021, 68, 1167–1183.
14. Varray, F.; Kalkhoran, M.A.; Vray, D. Adaptive minimum variance coupled with sign and phase coherence factors in IQ domain
for plane wave beamforming. In Proceedings of the International Ultrasonic Symposium (IUS), Tours, France, 18–21 September
2016; pp. 1–4.
15. Behar, V.; Adam, D.; Friedman, Z. A new method of spatial compounding imaging. Ultrasonics 2003, 41, 377–384. [CrossRef]
[PubMed]
16. Wang, Y.; Li, P. SNR-Dependent Coherence-Based Adaptive Imaging for High-Frame-Rate Ultrasonic and Photoacoustic Imaging.
IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2014, 61, 1419–1432. [CrossRef] [PubMed]
17. Wu, X.; Gao, Q.; Lu, M. An improved spatio-temporally smoothed coherence factor combined with eigenspace-based minimum
variance beamformer for plane-wave imaging in medical ultrasound. In Proceedings of the 2017 IEEE International Ultrasonics
Symposium (IUS), Washington, DC, USA, 6–9 September 2017.
18. Wagner, R.F.; Insana, M.F.; Smith, S.W. Fundamental correlation lengths of coherent speckle in medical ultrasonic images. IEEE
Trans. Ultrason. Ferroelectr. Freq. Control 1988, 35, 34–44. [CrossRef] [PubMed]
19. Synnevåg, J.F.; Nilsen, C.I.C.; Holm, S. P2B-13 Speckle statistics in adaptive beamforming. In Proceedings of the 2007 IEEE
Ultrasonics Symposium, New York, NY, USA, 28–31 October 2007; pp. 1545–1548.
20. Matrone, G.; Ramalli, A.; D'hooge, J. Spatial Coherence Based Beamforming in Multi-Line Transmit Echocardiography. In
Proceedings of the 2018 IEEE International Ultrasonics Symposium (IUS), Kobe, Japan, 22–25 October 2018. [CrossRef]
21. Synnevåg, J.-F.; Austeng, A.; Holm, S. A low-complexity data dependent beamformer. IEEE Trans. Ultrason. Ferroelectr. Freq.
Control 2011, 58, 281–289. [CrossRef] [PubMed]
22. Synnevag, J.-F.; Austeng, A.; Holm, S. Benefits of minimum variance beamforming in medical ultrasound imaging. IEEE Trans.
Ultrason. Ferroelectr. Freq. Control 2009, 56, 1868–1879. [CrossRef]
23. Zimbico, J.; Granado, D.W.; Schneider, F.K.; Maia, J.M.; Assef, A.A.; Schiefler, N., Jr.; Costa, E.T. Eigenspace generalized sidelobe
canceller combined with SNR dependent coherence factor for plane wave imaging. Biomed. Eng. Online 2018, 17, 109. [CrossRef]
24. Asl, B.M.; Mahloojifar, A. Minimum variance beamforming combined with adaptive coherence weighting applied to medical
ultrasound imaging. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2009, 56, 1923–1931. [CrossRef]
25. Zhao, J.; Wang, Y.; Yu, J.; Guo, W.; Li, T.; Zheng, Y.-P. Subarray coherence based postfilter for eigenspace based minimum variance
beamformer in ultrasound plane-wave imaging. Ultrasonics 2016, 65, 23–33. [CrossRef]
26. Deylami, A.M.; Jensen, J.A.; Asl, B.M. An improved minimum variance beamforming applied to plane-wave imaging in medical
ultrasound. In Proceedings of the 2016 IEEE International Ultrasonics Symposium (IUS), Tours, France, 18–21 September 2016;
pp. 1–4.
27. Qi, Y.; Wang, Y.; Yu, J.; Guo, Y. 2-D Minimum Variance Based Plane Wave Compounding with Generalized Coherence Factor in
Ultrafast Ultrasound Imaging. Sensors 2018, 18, 4099. [CrossRef] [PubMed]
28. Asl, B.M.; Mahloojifar, A. Contrast enhancement and robustness improvement of adaptive ultrasound imaging using forward-
backward minimum variance beamforming. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2011, 58, 858–867. [CrossRef]
[PubMed]
29. Zhang, C.; Geng, X.; Yao, F.; Liu, L.; Guo, Z.; Zhang, Y.; Wang, Y. The Ultrasound Signal Processing Based on High-Performance
CORDIC Algorithm and Radial Artery Imaging Implementation. Appl. Sci. 2023, 13, 5664. [CrossRef]
30. Ali, I.; Saleem, M.T. Spatiotemporal Dynamics of Reaction–Diffusion System and Its Application to Turing Pattern Formation in a
Gray–Scott Model. Mathematics 2023, 11, 1459. [CrossRef]
31. Kaddoura, T.; Zemp, R.J. Hadamard Aperiodic Interval Codes for Parallel-Transmission 2D and 3D Synthetic Aperture Ultrasound
Imaging. Appl. Sci. 2022, 12, 4917. [CrossRef]
32. Khan, S.U.; Ali, I. Application of Legendre spectral-collocation method to delay differential and stochastic delay differential
equation. AIP Adv. 2018, 8, 035301. [CrossRef]
33. Rindal, O.M.H.; Aakhus, S.; Holm, S.; Austeng, A. Hypothesis of improved visualization of microstructures in the interventricular
septum with ultrasound and adaptive beamforming. Ultrasound Med. Biol. 2017, 43, 2494–2499. Available online: http://www.
sciencedirect.com/science/article/pii/S0301562917302466 (accessed on 5 May 2018). [CrossRef]
34. Nguyen, N.Q.; Prager, R.W. Minimum variance approaches to ultrasound pixel-based beamforming. IEEE Trans. Med. Imaging
2017, 36, 374–384. [CrossRef]
35. Qi, Y.; Wang, Y.; Guo, W. Joint subarray coherence and minimum variance beamformer for multitransmission ultrasound imaging
modalities. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2018, 65, 1600–1617. [CrossRef]
36. Szasz, T.; Basarab, A.; Kouame, D. Beamforming through regularized inverse problems in ultrasound medical imaging. IEEE
Trans. Ultrason. Ferroelectr. Freq. Control 2016, 63, 2031–2044. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
201
Article
Underwater AUV Navigation Dataset in Natural Scenarios
Can Wang, Chensheng Cheng, Dianyu Yang, Guang Pan and Feihu Zhang *
School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China;
[email protected] (C.W.); [email protected] (C.C.);
[email protected] (D.Y.); [email protected] (G.P.)
* Correspondence: [email protected]; Tel.: +86-029-88492611
Abstract: Autonomous underwater vehicles (AUVs) are extensively utilized in various autonomous
underwater missions, encompassing ocean environment monitoring, underwater searching, and
geological exploration. Owing to their profound underwater capabilities and robust autonomy, AUVs
have emerged as indispensable instruments. Nevertheless, AUVs encounter several constraints in
the domain of underwater navigation, primarily stemming from the cost-intensive nature of inertial
navigation devices and Doppler velocity logs, which impede the acquisition of navigation data.
Underwater simultaneous localization and mapping (SLAM) techniques, along with other navigation
approaches reliant on perceptual sensors like vision and sonar, are employed to augment the precision
of self-positioning. Particularly within the realm of machine learning, the utilization of extensive
datasets for training purposes plays a pivotal role in enhancing algorithmic performance. However,
it is common for data obtained exclusively from inertial sensors, a Doppler Velocity Log (DVL),
and depth sensors in underwater environments to not be publicly accessible. This research paper
introduces an underwater navigation dataset derived from a controllable AUV that is equipped with
high-precision fiber-optic inertial sensors, a DVL, and depth sensors. The dataset underwent rigorous
testing through numerical calculations and optimization-based algorithms, with the evaluation of
various algorithms being based on both the actual surfacing position and the calculated position.
underwater environments [8]. Concurrently, the high cost of high-precision inertial sensors
and sound velocity measurement devices restricts the application and data collection of
small AUVs [9]. In particular, the cost of common fiber-optic inertial navigation systems
can reach tens of thousands of dollars, making them impractical for small teams. In such
cases, the availability of underwater high-precision navigation data allows researchers
to analyze the generation and propagation of AUV navigation errors more profoundly
and devise strategies to mitigate any potential errors or limitations. This will provide a
research foundation for various fields, including marine biology, geology, environmental
monitoring, and defense operations [10].
Currently, the collection of AUV navigation data in natural underwater environments
is concentrated in authoritative experimental institutions, such as the Naval Surface Warfare
Center (NSWC) [11], the Naval Undersea Warfare Center (NUWC) [12], the European
Research Agency [13], and the Australian military, etc. [14]. The collection of pertinent
data necessitates the utilization of shore-based platforms or motherships, and in intricate
environments, human divers are also employed to facilitate navigation, thereby resulting
in substantial costs. The range of underwater sensors is limited by the environment,
especially in visible light, where active light sources are subject to forward and backward
scattering [15]. Acoustic-based forward-looking sonar and side-scan sonar (SSS) are widely
used for underwater environment sensing and terrain-matching navigation [16]. Therefore,
the collection and organization of underwater datasets is a costly, complex task, and very few
publicly available datasets exist, which underscores the significance of this work.
In this paper, we present a novel dataset to expand research in underwater navigation.
The uniqueness lies in the data sourced from multiple sensors, including high-quality
inertial navigation systems (INS) and Differential Global Positioning Systems (DGPS),
as well as synchronized DVLs and depth sounder data. In particular, all data are collected in
the natural environment, including lakes, reservoirs, and offshore areas. Moreover, the data
are generated during the autonomous navigation of the AUV, meaning that the navigation
data conform to the kinematics of the vehicle. To the best of our knowledge, this is the
first publicly released AUV lake/ocean navigation dataset based on high-precision sensors.
The dataset is accessible via the following link: https://github.com/nature1949/AUV_
navigation_dataset (accessed on 8 June 2023).
In summary, the main contributions of this article are as follows:
• Presentation of a substantial amount of underwater high-precision navigation data,
covering approximately 147 km;
• Collection of data from real scenarios in three different regions, encompassing diverse
trajectories and time spans;
• Introduction of navigation challenges in underwater environments and the proposed
methods based on dead reckoning and collaborative localization, evaluated against
our benchmark.
The paper is structured as follows: Section 2 describes the research foundation and
current status of underwater navigation, as well as the characteristics and limitations of
publicly available datasets for underwater navigation. Section 3 describes the platforms
and sensors used for data acquisition, as well as the acquisition process. Section 4 describes
the dataset structure and typical trajectories and tests the dataset by common methods.
A discussion of the results and data is carried out in Section 5 and finally summarized
in Section 6.
2. Related Work
2.1. Underwater Navigation Methods
Typically, AUVs employ inertial navigation combined with acoustics for collabora-
tive navigation, while ROVs, due to their limited mobility, additionally use visually and
acoustically aided navigation. Positioning algorithms often apply filtering or optimization
methods, including traditional EKF, UKF, and the latest SLAM techniques, among others.
Electronics 2023, 12, 3788
For instance, Harris et al. developed an AUV position estimation algorithm using the
ensemble Kalman filter (EnKF) and fuzzy Kalman filter (FKF), which avoids linearization of
the AUV’s dynamics model [5]. Jin et al. proposed a single-source assisted passive localiza-
tion method that combines acoustic positioning with inertial navigation and concluded that
time difference of arrival (TDOA) + AOA yields better results [17]. This method utilizes
fixed sound sources to periodically emit sound pulses underwater and locate the source
using a TDOA positioning technique.
Jorgensen et al. based their approach on the XKF principle and constructed an observer
for estimating position, velocity, attitude, underwater wave speed, and rate-sensor and
accelerometer biases; it demonstrated stability and achieved near noise-optimal
performance [18]. Wang et al. integrated depth information into two-dimensional visual
images and proposed an online fusion method based on tightly coupled nonlinear opti-
mization to achieve continuous and robust localization in rapidly changing underwater
environments [19]. Manderson et al. presented an efficient end-to-end learning approach for
training navigation strategies using visual data and demonstrated autonomous visual navi-
gation over a distance of more than one kilometer [20]. Machine learning-based approaches
require massive amounts of training data, which highlights the importance of collecting
underwater navigation data to enhance the performance of navigation algorithms.
An increasing number of research teams have released datasets related to AUV autonomous
navigation, with a focus on easily obtainable visual information. The establishment of these
datasets has facilitated the development of AUV technologies, particularly in underwater
target recognition and underwater SLAM techniques. For instance, Cheng et al. provided
data collected in inland waterways using a stereo camera, LiDAR system, global positioning
system antenna, and inertial measurement unit [30]. Song et al. obtained a millimeter-
precision underwater visual-inertial dataset through a motion capture system, but the data
were acquired in a laboratory setting [31]. Tomasz et al. introduced an underwater visual
navigation SLAM dataset that includes ground truth tracking of vehicle positions obtained
through underwater motion capture [32].
Martin et al. offered canoe attitude and stereo camera data collected in natural river
environments [33]. Panetta et al. presented the Underwater Object Tracking (UOT100)
benchmark dataset, which comprises 104 underwater video sequences and over 74,000 an-
notated frames from natural and artificial underwater videos with various distortions [34].
Angelos et al. provided data collected by AUVs in complex underwater cave systems,
particularly equipped with two mechanically scanned imaging sonars [35]. Notably, Kristo-
pher et al. simulated AUV data with advanced sensors by equipping a ground vehicle with
two multibeam sonars and a set of navigation sensors [36]. More recently, Maxime et al. col-
lected ROS data for underwater SLAM using a monocular camera, an inertial measurement
unit (IMU), and other sensors in harbors and archaeological sites [37]. Li et al. presented an
adaptive AUV-assisted ocean current data collection strategy, formulating an optimization
problem to maximize the VoI energy ratio, thereby reducing AUV energy consumption and
ensuring timely data acquisition [38].
However, the existing datasets primarily concentrate on underwater vision and are
obtained from natural environments or created through data augmentation. These datasets
are primarily utilized for various applications, including underwater image recognition, un-
derwater 3D reconstruction, and visual/visual-inertial SLAM. However, the availability of
independent datasets specifically focused on underwater inertial/DVL navigation remains
limited. Therefore, this paper aims to address this gap by compiling AUV navigation data
gathered from diverse natural scenarios and presenting it in an enhanced KITTI format,
facilitating the extraction of algorithmic data for general purposes.
3. Data Acquisition
3.1. Platform
We used a 325 mm diameter AUV as the acquisition platform and collected data
through different trajectories at different times and locations to achieve a diversified data
type and a more representative sample set. The platform was equipped with high-precision
inertial navigation, differential RTK, DVL, depth finder, and other sensors. The computing
platform used a customized motherboard, which allowed different devices to be connected and
provided high-speed computing power. The platform structure and sensor layout are
shown in Figure 1. The perception and navigation sensors were rigidly mounted on the vehicle,
so their measurements can be related by rigid-body transformations; the provided data, however,
are expressed in each sensor's own frame after rotational transformation.
3.2. Sensor
This section introduces the hardware and software used for data collection, including
navigation sensors, DVL, depth sensors, and other payloads. These components work in
harmony to capture comprehensive and accurate underwater navigation data. The high-
precision fiber-optic inertial navigation system performs inertial measurements, provides
six-axis angular velocity and linear acceleration, and has the internal potential for satellite
and Doppler fusion. The DVL features a four-phase beam array, allowing it to calculate the
vehicle’s velocity relative to the water independently. This additional velocity information
contributes to a more comprehensive understanding of the AUV’s motion and aids in
precise positioning during underwater navigation. Table 1 lists the complete hardware.
3.4. Synchronization
Each sensor first records the timestamp of its captured frames. Through the central
processing platform, these timestamps are fused with GPS time and computer time, giving an
event-recording accuracy of 100 ms. For asynchronous inertial and DVL measurements, data are
recorded at the lower of the two rates. The sampling period is not fixed, owing to the
inconsistent frequencies and clocks of the different sensors.
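To make the pairing concrete, here is a minimal sketch of nearest-timestamp matching against the lower-rate stream; the function name and the `(timestamp, payload)` tuple layout are our own assumptions, not the dataset's API:

```python
from bisect import bisect_left

def align_to_slowest(imu, dvl):
    """Pair each sample of the lower-rate stream (DVL) with the
    nearest-in-time sample of the higher-rate stream (IMU).
    imu, dvl: time-sorted lists of (timestamp_s, payload) tuples."""
    imu_times = [t for t, _ in imu]
    pairs = []
    for t, v in dvl:
        i = bisect_left(imu_times, t)
        # candidate neighbours: the sample at/after t and the one before it
        cands = [c for c in (i - 1, i) if 0 <= c < len(imu)]
        j = min(cands, key=lambda c: abs(imu_times[c] - t))
        pairs.append(((t, v), imu[j]))
    return pairs
```

Because the timestamps are sorted, each lookup is logarithmic, which matters for long trajectories recorded at high rates.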
4. Dataset
4.1. Data Structures
Existing public underwater datasets adopt non-standard, sensor-dependent data structures,
which makes them difficult for researchers to interpret. To unify them,
this dataset is based on the data structure of the commonly used KITTI car dataset [39] and
adds additional data including DVL velocity and depth information. It is finally provided
in CSV format, together with Python tools that can be directly transferred to ROS. The file
directory structure is shown in Figure 2.
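For illustration, a sequence CSV can be loaded with the standard library alone. The column names below are hypothetical; consult data_format.xlsx in the repository for the authoritative layout:

```python
import csv
import io

# Hypothetical columns; the real layout is given by data_format.xlsx.
SAMPLE = """timestamp,lat,lon,depth,vx,vy,vz,roll,pitch,yaw
0.00,34.0001,108.0001,0.5,1.2,0.0,0.0,0.1,-0.2,90.0
0.10,34.0001,108.0002,0.6,1.2,0.1,0.0,0.1,-0.2,90.1
"""

def load_sequence(fp):
    """Read one trajectory CSV into a list of dicts with float fields."""
    return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(fp)]

frames = load_sequence(io.StringIO(SAMPLE))
print(len(frames), frames[0]["depth"])  # prints: 2 0.5
```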
Several typical vehicle trajectories are shown in Figure 3. Note that the dataset contains
structured underwater and surface maneuvering trajectories and that switching between
surface and underwater sailing also occurs in one segment of the trajectory. We labeled
the surface and underwater navigation sections to differentiate between them, while the
most significant feature of underwater navigation is the absence of GPS signals, resulting
in constant latitude and longitude received from GPS.
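Since the reported GPS fix freezes while the vehicle is submerged, underwater segments can be recovered programmatically. The sketch below is one straightforward approach; the function name and the `eps` threshold are our own illustrative choices:

```python
def underwater_segments(lat, lon, eps=1e-9):
    """Return (start, end) index pairs (end exclusive) of runs where the
    GPS fix does not change, indicating submerged travel per the text.
    The threshold eps is an illustrative choice."""
    segments, start = [], None
    for i in range(1, len(lat)):
        frozen = abs(lat[i] - lat[i - 1]) < eps and abs(lon[i] - lon[i - 1]) < eps
        if frozen and start is None:
            start = i - 1          # run of frozen fixes begins
        elif not frozen and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:          # trajectory ends while submerged
        segments.append((start, len(lat)))
    return segments
```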
Compared with other underwater datasets, most of this dataset was collected during fully
autonomous driving of the AUV rather than with manual assistance, which is conducive to
analyzing its kinematic behavior. At the same time, underwater
navigation based on inertial units inevitably leads to error accumulation, which is fatal to
the task. In the dataset, precise global positioning results measured by differential RTK
are provided, with RTK base stations deployed on land. During underwater cruising, GPS
measurements are unavailable until the AUV surfaces. This helps to determine the accuracy
of underwater navigation. The latitude and longitude derived based on waypoints are the
initial results of onboard calculations and do not indicate navigation error performance.
AUV_navigation_dataset/
    date&time/
        serial_number:1/
            date&time_serial_number.csv
        serial_number:2/
            date&time_serial_number.csv
    data_format.xlsx
    README
Figure 2. Directory structure of the file set.
Figure 3. Various types of AUV tracks in different regions. (a) Scenario 1. (b) Scenario 2. (c) Scenario 3.
(d) Scenario 4. (e) Scenario 5. (f) Scenario 6.
4.2. Testing
The mathematical model and engineering implementation of fiber-optic inertial
navigation systems are mature. In this paper, the performance of the data is
initially evaluated by solving the inertial navigation part underwater and comparing it
with the initial navigation estimation results. The evaluation uses data acquired from
gyroscopes, accelerometers, depth sensors, magnetometers, etc., and performs position
estimation after calibration, filtering, and time synchronization. Methods for compensating
for errors due to underwater waves, currents, etc., are not the focus of this paper and will
be investigated further.
To begin the evaluation process, we utilize the position and attitude of the AUV at
the entry point as the initial state. The initial longitude, latitude, and height values are
recorded as λ0 , L0 , and h0 , respectively. The initial velocity is obtained from either the
DVL’s effective state or the inertial navigation system. Additionally, the initial strapdown
attitude matrix is expressed in (1), providing a reference for the AUV’s initial orientation.
By comparing the results of the inertial navigation solution with those of the initial
navigation estimate, we can assess the accuracy and reliability of the fiber-optic inertial
navigation system in the underwater environment and verify its ability to provide accurate
navigation information to the AUV throughout the underwater mission. Accumulated
errors will occur during the solving process, and can be resolved through periodic error
correction and position calibration using known landmarks or reference points for accurate
navigation [40].
$$
T_b^n(0)=\begin{bmatrix}T_{11}&T_{12}&T_{13}\\ T_{21}&T_{22}&T_{23}\\ T_{31}&T_{32}&T_{33}\end{bmatrix}
=\begin{bmatrix}
\cos\gamma_0\cos\psi_{g0}-\sin\gamma_0\sin\theta_0\sin\psi_{g0} & -\cos\theta_0\sin\psi_{g0} & \sin\gamma_0\cos\psi_{g0}+\cos\gamma_0\sin\theta_0\sin\psi_{g0}\\
\cos\gamma_0\sin\psi_{g0}+\sin\gamma_0\sin\theta_0\cos\psi_{g0} & \cos\theta_0\cos\psi_{g0} & \sin\gamma_0\sin\psi_{g0}-\cos\gamma_0\sin\theta_0\cos\psi_{g0}\\
-\sin\gamma_0\cos\theta_0 & \sin\theta_0 & \cos\gamma_0\cos\theta_0
\end{bmatrix}, \tag{1}
$$
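The initial attitude matrix of Equation (1) can be checked numerically. The sketch below builds the matrix from roll, pitch, and heading and verifies that it is orthonormal; the angle conventions are inferred from the matrix entries and are an assumption on our part:

```python
import math

def tbn0(gamma, theta, psi):
    """Initial strapdown attitude matrix T_b^n of Equation (1), built from
    roll gamma, pitch theta and heading psi_g0 (radians)."""
    cg, sg = math.cos(gamma), math.sin(gamma)
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(psi), math.sin(psi)
    return [
        [cg * cp - sg * st * sp, -ct * sp, sg * cp + cg * st * sp],
        [cg * sp + sg * st * cp,  ct * cp, sg * sp - cg * st * cp],
        [-sg * ct,                st,      cg * ct],
    ]

# A direction-cosine matrix must be orthonormal: T T^T = I.
T = tbn0(0.02, -0.05, 1.3)
for i in range(3):
    for j in range(3):
        dot = sum(T[i][k] * T[j][k] for k in range(3))
        assert abs(dot - (1.0 if i == j else 0.0)) < 1e-9
```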
The angular velocity of the AUV, $\omega_{nb}^b$, is calculated from the gyroscope output
$\omega_{ib}^b$ as follows:

$$\omega_{nb}^b=\omega_{ib}^b-T_n^b\,\omega_{in}^n=\omega_{ib}^b-T_n^b\left(\omega_{ie}^n+\omega_{en}^n\right), \tag{4}$$
$$
\begin{aligned}
K_1&=\tfrac{1}{2}\,Q(k)\,\omega_{nb}^b(k)\\
K_2&=\tfrac{1}{2}\left(Q(k)+\tfrac{\Delta t}{2}K_1\right)\omega_{nb}^b\!\left(k+\tfrac{\Delta t}{2}\right)\\
K_3&=\tfrac{1}{2}\left(Q(k)+\tfrac{\Delta t}{2}K_2\right)\omega_{nb}^b\!\left(k+\tfrac{\Delta t}{2}\right)\\
K_4&=\tfrac{1}{2}\left(Q(k)+\Delta t\,K_3\right)\omega_{nb}^b(k+\Delta t)\\
Q(k+1)&=Q(k)+\tfrac{\Delta t}{6}\left(K_1+2K_2+2K_3+K_4\right)
\end{aligned} \tag{5}
$$
At this point, the updated strapdown matrix is shown in (6).
$$
T_b^n=\begin{bmatrix}
q_0^2+q_1^2-q_2^2-q_3^2 & 2(q_1q_2-q_0q_3) & 2(q_1q_3+q_0q_2)\\
2(q_1q_2+q_0q_3) & q_0^2-q_1^2+q_2^2-q_3^2 & 2(q_2q_3-q_0q_1)\\
2(q_1q_3-q_0q_2) & 2(q_2q_3+q_0q_1) & q_0^2-q_1^2-q_2^2+q_3^2
\end{bmatrix}, \tag{6}
$$
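Equation (6) maps a unit quaternion back to the strapdown matrix; a direct transcription for checking purposes (not the authors' code) looks as follows:

```python
def quat_to_dcm(q):
    """Strapdown matrix T_b^n from a unit quaternion (q0, q1, q2, q3),
    transcribed from Equation (6)."""
    q0, q1, q2, q3 = q
    return [
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3),             2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3),             q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2),             2*(q2*q3 + q0*q1),             q0*q0 - q1*q1 - q2*q2 + q3*q3],
    ]

# Sanity check: a 90-degree yaw quaternion gives the expected rotation.
s = 0.5 ** 0.5
T = quat_to_dcm((s, 0.0, 0.0, s))  # 90 deg about the z axis
```

For the identity quaternion (1, 0, 0, 0) this reduces to the identity matrix, as expected.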
The acceleration output of the accelerometer needs to be transformed from the carrier
coordinate system to the navigation coordinate system, that is, $f^n = T_b^n f^b$. At this
point, the acceleration with respect to the ground is shown in (7).

$$
\dot{V}^n=f^n-\begin{bmatrix}
0 & -(2\omega_{iez}^n+\omega_{enz}^n) & 2\omega_{iey}^n+\omega_{eny}^n\\
2\omega_{iez}^n+\omega_{enz}^n & 0 & -(2\omega_{iex}^n+\omega_{enx}^n)\\
-(2\omega_{iey}^n+\omega_{eny}^n) & 2\omega_{iex}^n+\omega_{enx}^n & 0
\end{bmatrix}V^n+g, \tag{7}
$$

$$V^n(k)=V^n(k-1)+\frac{\Delta t}{2}\left(\dot{V}^n(k-1)+\dot{V}^n(k)\right), \tag{8}$$
The position angular velocity (transport rate) update equation is:

$$
\omega_{en}^n=\begin{bmatrix}
-V_y^n/R_M\\
V_x^n/R_N\\
\left(V_x^n/R_N\right)\tan L
\end{bmatrix}, \tag{9}
$$
Due to the slow changes in position during the navigation process, the update of the
position matrix can be represented as follows:
At this moment, the position of the AUV is calculated using the following formula:

$$\lambda=\arctan\frac{C_{32}}{C_{31}},\qquad L=\arctan\frac{C_{33}}{\sqrt{C_{31}^2+C_{32}^2}}. \tag{11}$$
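A minimal sketch of the quaternion attitude update in the spirit of Equation (5) is given below. It assumes the Hamilton quaternion convention and a body-rate callback; it is an illustrative reconstruction, not the authors' implementation:

```python
import math

def quat_dot(q, w):
    """Quaternion kinematics dq/dt = 0.5 * q (x) (0, w), Hamilton convention,
    with body angular rate w = (wx, wy, wz)."""
    q0, q1, q2, q3 = q
    wx, wy, wz = w
    return (0.5 * (-q1*wx - q2*wy - q3*wz),
            0.5 * ( q0*wx + q2*wz - q3*wy),
            0.5 * ( q0*wy - q1*wz + q3*wx),
            0.5 * ( q0*wz + q1*wy - q2*wx))

def rk4_quat_step(q, omega_of_t, t, dt):
    """One fourth-order Runge-Kutta attitude step, then renormalisation."""
    add = lambda q, k, s: tuple(qi + s * ki for qi, ki in zip(q, k))
    k1 = quat_dot(q, omega_of_t(t))
    k2 = quat_dot(add(q, k1, dt / 2), omega_of_t(t + dt / 2))
    k3 = quat_dot(add(q, k2, dt / 2), omega_of_t(t + dt / 2))
    k4 = quat_dot(add(q, k3, dt), omega_of_t(t + dt))
    q = tuple(qi + dt / 6 * (a + 2*b + 2*c + d)
              for qi, a, b, c, d in zip(q, k1, k2, k3, k4))
    n = math.sqrt(sum(x * x for x in q))   # keep the quaternion unit-norm
    return tuple(x / n for x in q)
```

Propagating from the identity quaternion at a constant body rate of 0.1 rad/s about z for 10 s yields a 1 rad heading change, matching the closed-form solution closely.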
Here, we assess three representative sequences from the dataset, encompassing distinct
movement attributes and geographic regions, as illustrated in Figures 4–6. Observing the
results, it is evident that the computed velocities and attitude angles align well with the
initial data. While the altitude channel remains tied to the resolved velocity, depth-gauge
measurements offer greater reliability. The deviation between the dead-reckoned trajectory
and the measured trajectory, which itself does not perfectly represent the true values,
results from the accumulation of measurement errors.
For independent strapdown inertial navigation systems (SINS), the estimation of
relative velocity and position involves the integration of accelerometer and gyro sensor
data, which can introduce errors and result in significant drift in the estimated position and
velocity [4]. Integrating the DVL and depth gauge measurements would notably enhance
underwater navigation precision, even though the challenge of mitigating errors persists.
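The effect described above can be seen in a one-dimensional toy model; the bias value and duration are purely illustrative:

```python
def final_position(dt, n, accel_bias, dvl_aided):
    """1-D toy model of a stationary vehicle whose accelerometer reports
    only a constant bias. Pure inertial dead reckoning integrates the
    bias twice (quadratic position drift); substituting an ideal DVL
    velocity of zero at each step suppresses the growth entirely."""
    v = x = 0.0
    for _ in range(n):
        v += accel_bias * dt       # velocity from the biased accelerometer
        if dvl_aided:
            v = 0.0                # ideal DVL measurement of water speed
        x += v * dt                # position from velocity
    return x

# Ten minutes at 100 Hz with a bias of 1e-3 m/s^2 (about 0.1 mg):
drift = final_position(0.01, 60_000, 1e-3, False)   # quadratic drift, ~180 m
aided = final_position(0.01, 60_000, 1e-3, True)    # stays at zero
```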
Figure 4. A comparison between the solved and measured values for Case 1 is presented. The
subscript “s” denotes the solved result, while the subscript “m” indicates the measured result.
(a) entails a comparison between the initial heading projection position and the solved position.
(b) involves a comparison between the measured attitude and the solved attitude. Lastly, (c) examines
the contrast between the measured and solved velocity values in the northeast sky direction.
Figure 5. A comparison between the solved and measured values for Case 2 is presented. The
subscript “s” denotes the solved result, while the subscript “m” indicates the measured result.
(a) entails a comparison between the initial heading projection position and the solved position.
(b) involves a comparison between the measured attitude and the solved attitude. Lastly, (c) examines
the contrast between the measured and solved velocity values in the northeast sky direction.
Figure 6. A comparison between the calculated and measured values for Case 3 is presented. The sub-
script “s” denotes the solved result, while the subscript “m” indicates the measured result. (a) entails
a comparison between the initial heading projection position and the calculated position. (b) involves
a comparison between the measured attitude and the calculated attitude. Lastly, (c) examines the
contrast between the measured and calculated velocity values in the northeast sky direction.
5. Discussion
Navigation data on lakes and oceans were gathered by employing autonomous under-
water vehicles (AUVs) equipped with rudimentary sensors. The trajectories and perfor-
mance of the navigation data were acquired via dead-reckoning. By considering diving
points and upper floating points, the underwater state of the vehicle can be ascertained
and examined for diverse trajectories. In particular, the kinematic model of the AUV
enables a meticulous analysis of its navigation trajectory attributes, thereby facilitating the
augmentation of precision in underwater navigation. Extensive scholarly investigations
have been conducted on fusion navigation algorithms for IMU/DVL. However, it is of
utmost importance to conduct an integrated evaluation that incorporates openly accessible
datasets. The profusion of underwater navigation data presents a valuable resource for
comprehensively analyzing the interdependent connection between navigation strategies
and devices, thereby revealing possibilities for attaining high-precision navigation through
cost-effective sensor solutions. Additionally, future endeavors will explore applications of
the navigation dataset.
Until new and efficient means of underwater navigation are developed, the capacity
of AUVs to achieve high-precision navigation remains constrained by cost and techno-
logical limitations. The predominant approach to AUV navigation is centered on aided
navigation techniques based on inertial navigation principles and amalgamating diverse
measurements [41]. However, the complex interaction of practical environmental limita-
tions, hydroacoustic channel multipath effects, and submerged ambient noise interference
often leads to significant irregularities [42]. Consequently, addressing these challenges, such
as mitigating cumulative errors arising from inertial navigation and rectifying measurement
inaccuracies from various sensors, becomes a crucial focus for future research efforts.
In future work, we plan to expand our dataset further by incorporating additional
sensing modalities, such as perception and acoustic data, to extend its usability. Specifically,
we are interested in exploring underwater SLAM techniques based on forward-looking and
side-scan sonar data, which will open up new avenues in underwater navigation. Moreover,
data-driven pedestrian dead reckoning (PDR) research has already shown promising results
with extensive datasets, inspiring us to further improve underwater navigation accuracy
through large-scale learning approaches.
6. Conclusions
We have compiled a navigation dataset of AUVs operating in various regions, collected
using high-precision inertial navigation, DVL, and depth sensors. The dataset encapsulates
a myriad of natural scenarios involving AUVs navigating in both underwater and surface
environments and spanning diverse latitudes, longitudes, and timelines. This dataset
represents a pioneering collection of underwater navigation data obtained through the com-
bination of high-cost fiber-optic gyroscopes. Drawing upon our dataset, we offer significant
data support for the enhancement of underwater navigation algorithms. The assessment
of typical algorithms has substantiated the practicality and effectiveness of our dataset.
We hope that this dataset will be beneficial to other researchers in the field of autonomous
exploration in constrained underwater environments.
Author Contributions: Conceptualization, C.W. and F.Z.; methodology, C.W.; software, C.W. and
C.C.; validation, D.Y.; investigation, F.Z.; resources, F.Z. and G.P.; formal analysis, C.C. and D.Y.;
writing—original draft preparation, C.W. and C.C.; writing—review and editing, C.W. and F.Z.; visu-
alization, C.W. and C.C.; supervision, F.Z.; project administration, F.Z. and G.P.; funding acquisition,
F.Z. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China (52171322),
the National Key Research and Development Program (2020YFB1313200), and the Fundamental
Research Funds for the Central Universities (D5000210944).
Data Availability Statement: Data available in a publicly accessible repository. The data presented
in this study are openly available in the AUV_navigation_dataset at https://github.com/nature194
9/AUV_navigation_dataset (accessed on 8 June 2023).
Acknowledgments: The authors gratefully acknowledge the support provided by the Key Laboratory
of Unmanned Underwater Transport Technology during the data collection process, as well as the
assistance of the research team members.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
References
1. Lapierre, L.; Zapata, R.; Lepinay, P.; Ropars, B. Karst exploration: Unconstrained attitude dynamic control for an AUV. Ocean Eng.
2021, 219, 108321. [CrossRef]
2. Yan, J.; Ban, H.; Luo, X.; Zhao, H.; Guan, X. Joint Localization and Tracking Design for AUV With Asynchronous Clocks and State
Disturbances. IEEE Trans. Veh. Technol. 2019, 68, 4707–4720. [CrossRef]
3. Liu, R.; Liu, F.; Liu, C.; Zhang, P. Modified Sage-Husa Adaptive Kalman Filter-Based SINS/DVL Integrated Navigation System
for AUV. J. Sens. 2021, 2021, 9992041. [CrossRef]
4. Sahoo, A.; Dwivedy, S.K.; Robi, P. Advancements in the field of autonomous underwater vehicle. Ocean Eng. 2019, 181, 145–160.
[CrossRef]
5. Harris, Z.J.; Whitcomb, L.L. Cooperative acoustic navigation of underwater vehicles without a DVL utilizing a dynamic process
model: Theory and field evaluation. J. Field Robot. 2021, 38, 700–726. [CrossRef]
6. Bucci, A.; Zacchini, L.; Franchi, M.; Ridolfi, A.; Allotta, B. Comparison of feature detection and outlier removal strategies in a
mono visual odometry algorithm for underwater navigation. Appl. Ocean Res. 2022, 118, 102961. [CrossRef]
7. Franchi, M.; Ridolfi, A.; Zacchini, L. 2D Forward Looking SONAR in Underwater Navigation Aiding: An AUKF-based strategy
for AUVs*. IFAC-Papersonline 2020, 53, 14570–14575. [CrossRef]
8. Zhou, W.H.; Zhu, D.M.; Shi, M.; Li, Z.X.; Duan, M.; Wang, Z.Q.; Zhao, G.L.; Zheng, C.D. Deep images enhancement for turbid
underwater images based on unsupervised learning. Comput. Electron. Agric. 2022, 202, 107372. [CrossRef]
9. Su, R.; Zhang, D.; Li, C.; Gong, Z.; Venkatesan, R.; Jiang, F. Localization and Data Collection in AUV-Aided Underwater Sensor
Networks: Challenges and Opportunities. IEEE Netw. 2019, 33, 86–93. [CrossRef]
10. Howe, J.A.; Husum, K.; Inall, M.E.; Coogan, J.; Luckman, A.; Arosio, R.; Abernethy, C.; Verchili, D. Autonomous underwater
vehicle (AUV) observations of recent tidewater glacier retreat, western Svalbard. Mar. Geol. 2019, 417, 106009. [CrossRef]
11. Gallagher, D.G.; Manley, R.J.; Hughes, W.W.; Pilcher, A.M. Development of an enhanced underwater navigation capability for
military combat divers. In Proceedings of the OCEANS 2016 MTS/IEEE Monterey, Monterey, CA, USA, 19–23 September 2016;
pp. 1–4. [CrossRef]
12. Dzikowicz, B.R.; Yoritomo, J.Y.; Heddings, J.T.; Hefner, B.T.; Brown, D.A.; Bachand, C.L. Demonstration of Spiral Wavefront
Navigation on an Unmanned Underwater Vehicle. IEEE J. Ocean. Eng. 2023, 48, 297–306. [CrossRef]
13. Huet, C.; Mastroddi, F. Autonomy for Underwater Robots—A European Perspective. Auton. Robot. 2016, 40, 1113–1118.
[CrossRef]
14. Bil, C. Concept Evaluation of a Bi-Modal Autonomous System. In Proceedings of the AIAA AVIATION 2023 Forum, San Diego,
CA, USA, 12–16 June 2023. [CrossRef]
15. Li, H.; Zhu, J.; Deng, J.; Guo, F.; Zhang, N.; Sun, J.; Hou, X. Underwater active polarization descattering based on a single
polarized image. Opt. Express 2023, 31, 21988–22000. [CrossRef] [PubMed]
16. Franchi, M.; Ridolfi, A.; Pagliai, M. A forward-looking SONAR and dynamic model-based AUV navigation strategy: Preliminary
validation with FeelHippo AUV. Ocean Eng. 2020, 196, 106770. [CrossRef]
17. Jin, B.; Xu, X.; Zhu, Y.; Zhang, T.; Fei, Q. Single-Source Aided Semi-Autonomous Passive Location for Correcting the Position of
an Underwater Vehicle. IEEE Sens. J. 2019, 19, 3267–3275. [CrossRef]
18. Jorgensen, E.K.; Fossen, T.I.; Bryne, T.H.; Schjolberg, I. Underwater Position and Attitude Estimation Using Acoustic, Inertial, and
Depth Measurements. IEEE J. Ocean. Eng. 2020, 45, 1450–1465. [CrossRef]
19. Wang, Y.; Ma, X.; Wang, J.; Wang, H. Pseudo-3D Vision-Inertia Based Underwater Self-Localization for AUVs. IEEE Trans. Veh.
Technol. 2020, 69, 7895–7907. [CrossRef]
20. Manderson, T.; Gamboa Higuera, J.C.; Wapnick, S.; Tremblay, J.F.; Shkurti, F.; Meger, D.; Dudek, G. Vision-Based Goal-
Conditioned Policies for Underwater Navigation in the Presence of Obstacles. arXiv 2020, arXiv:2006.16235.
21. Singh, D.; Valdenegro-Toro, M. The Marine Debris Dataset for Forward-Looking Sonar Semantic Segmentation. In Proceedings of
the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Virtual, 11–17 October 2021;
pp. 3734–3742.
22. Zhou, Y.; Chen, S.; Wu, K.; Ning, M.; Chen, H.; Zhang, P. SCTD 1.0: Sonar Common Target Detection Dataset. Comput. Sci. 2021,
48, 334–339. [CrossRef]
23. Zhang, P.; Tang, J.; Zhong, H.; Ning, M.; Liu, D.; Wu, K. Self-Trained Target Detection of Radar and Sonar Images Using Automatic
Deep Learning. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [CrossRef]
24. Huo, G.; Wu, Z.; Li, J. Underwater Object Classification in Sidescan Sonar Images Using Deep Transfer Learning and Semisynthetic
Training Data. IEEE Access 2020, 8, 47407–47418. [CrossRef]
25. Chang, L.; Song, H.; Li, M.; Xiang, M. UIDEF: A real-world underwater image dataset and a color-contrast complementary image
enhancement framework. ISPRS J. Photogramm. Remote Sens. 2023, 196, 415–428. [CrossRef]
26. Yin, X.; Liu, X.; Liu, H. FMSNet: Underwater Image Restoration by Learning from a Synthesized Dataset. In Proceedings of
the Artificial Neural Networks and Machine Learning—ICANN 2021, Bratislava, Slovakia, 14–17 September 2021; Farkaš, I.,
Masulli, P., Otte, S., Wermter, S., Eds.; Springer: Cham, Switzerland, 2021; pp. 421–432.
27. Chen, L.; Dong, J.; Zhou, H. Class balanced underwater object detection dataset generated by class-wise style augmentation.
arXiv 2021, arXiv:2101.07959.
28. Polymenis, I.; Haroutunian, M.; Norman, R.; Trodden, D. Artificial Underwater Dataset: Generating Custom Images Using Deep
Learning Models. In Proceedings of the ASME 2022 41st International Conference on Ocean, Offshore and Arctic Engineering,
Hamburg, Germany, 5–10 June 2022. [CrossRef]
29. Boittiaux, C.; Dune, C.; Ferrera, M.; Arnaubec, A.; Marxer, R.; Matabos, M.; Audenhaege, L.V.; Hugel, V. Eiffel Tower: A deep-sea
underwater dataset for long-term visual localization. Int. J. Robot. Res. 2023, 02783649231177322. [CrossRef]
30. Cheng, Y.; Jiang, M.; Zhu, J.; Liu, Y. Are We Ready for Unmanned Surface Vehicles in Inland Waterways? The USVInland
Multisensor Dataset and Benchmark. IEEE Robot. Autom. Lett. 2021, 6, 3964–3970. [CrossRef]
31. Song, Y.; Qian, J.; Miao, R.; Xue, W.; Ying, R.; Liu, P. HAUD: A High-Accuracy Underwater Dataset for Visual-Inertial Odometry.
In Proceedings of the 2021 IEEE Sensors, 31 October–3 November 2021; pp. 1–4. [CrossRef]
32. Luczynski, T.; Scharff Willners, J.; Vargas, E.; Roe, J.; Xu, S.; Cao, Y.; Petillot, Y.; Wang, S. Underwater inspection and intervention
dataset. arXiv 2021, arXiv:2107.13628. [CrossRef]
33. Miller, M.; Chung, S.J.; Hutchinson, S. The Visual–Inertial Canoe Dataset. Int. J. Robot. Res. 2018, 37, 13–20. [CrossRef]
34. Panetta, K.; Kezebou, L.; Oludare, V.; Agaian, S. Comprehensive Underwater Object Tracking Benchmark Dataset and Underwater
Image Enhancement With GAN. IEEE J. Ocean. Eng. 2022, 47, 59–75. [CrossRef]
35. Mallios, A.; Vidal, E.; Campos, R.; Carreras, M. Underwater caves sonar data set. Int. J. Robot. Res. 2017, 36, 1247–1251. [CrossRef]
36. Krasnosky, K.; Roman, C.; Casagrande, D. A bathymetric mapping and SLAM dataset with high-precision ground truth for
marine robotics. Int. J. Robot. Res. 2022, 41, 12–19. [CrossRef]
37. Ferrera, M.; Creuze, V.; Moras, J.; Trouvé-Peloux, P. AQUALOC: An underwater dataset for visual–inertial–pressure localization.
Int. J. Robot. Res. 2019, 38, 1549–1559. [CrossRef]
38. Li, Y.; Sun, Y.; Ren, Q.; Li, S. AUV-Aided Data Collection Considering Adaptive Ocean Currents for Underwater Wireless Sensor
Networks. China Commun. 2023, 20, 356–367. [CrossRef]
39. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
[CrossRef]
40. Wang, C.; Cheng, C.; Yang, D.; Pan, G.; Zhang, F. AUV planning and calibration method considering concealment in uncertain
environments. Front. Mar. Sci. 2023, 10, 1228306. [CrossRef]
41. Zhai, W.; Wu, J.; Chen, Y.; Jing, Z.; Sun, G.; Hong, Y.; Fan, Y.; Fan, S. Research on Underwater Navigation and Positioning
Method Based on Sea Surface Buoys and Undersea Beacons. In Proceedings of the China Satellite Navigation Conference (CSNC)
2020 Proceedings, Chengdu, China, 22–25 November 2020; Sun, J., Yang, C., Xie, J., Eds.; Springer: Singapore, 2020; Volume III,
pp. 390–404.
42. Wang, J.; Zhang, T.; Jin, B.; Zhu, Y.; Tong, J. Student’s t-Based Robust Kalman Filter for a SINS/USBL Integration Navigation
Strategy. IEEE Sens. J. 2020, 20, 5540–5553. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
Local-Aware Hierarchical Attention for
Sequential Recommendation
Jiahao Hu *, Qinxiao Liu and Fen Zhao
Abstract: Modeling the dynamic preferences of users is a challenging and essential task in a rec-
ommendation system. Taking inspiration from the successful use of self-attention mechanisms in
tasks within natural language processing, several approaches have initially explored integrating
self-attention into sequential recommendation, demonstrating promising results. However, existing
methods have overlooked the intrinsic structure of sequences, failed to simultaneously consider the
local fluctuation and global stability of users’ interests, and lacked user information. To address these
limitations, we propose LHASRec (Local-Aware Hierarchical Attention for Sequential Recommenda-
tion), a model that divides a user’s historical interaction sequences into multiple sessions based on a
certain time interval and computes the weight values for each session. Subsequently, the calculated
weight values are combined with the user’s historical interaction sequences to obtain a weighted
user interaction sequence. This approach can effectively reflect the local fluctuation of the user’s
interest, capture the user’s particular preference, and at the same time, consider the user’s general
preference to achieve global stability. Additionally, we employ Stochastic Shared Embeddings (SSE)
as a regularization technique to mitigate the overfitting issue resulting from the incorporation of
user information. We conduct extensive experiments showing that our method outperforms other competitive baselines on both sparse and dense datasets and across different evaluation metrics.
Keywords: sequential recommendation; local fluctuation; global stability; Stochastic Shared Embeddings
2. Related Work
A user's behavior forms a time-ordered sequence, and their interests also dynamically change over time. Therefore, extracting temporal information from sequential data can provide valuable signals. Early sequential recommendation models
utilized Markov chains (MCs) [9] to capture the correlations within the sequential data.
Shani et al. used the Markov chain [1] to mine the correlation between users’ short-term
behaviors, thus achieving a good recommendation effect. Rendle et al. combined the
idea of Matrix Factorization (MF) with Markov chains [5,10] by storing user transition
matrices in a three-dimensional matrix and explored the temporal information in the user’s
short-term behavior sequences by predicting the user’s interests in other items. How-
ever, due to the scalability issue of Markov chains, the time and space complexity of the
models significantly increase when dealing with longer sequences, leading to suboptimal
recommendation performance.
Compared to Markov chains, Recurrent Neural Networks (RNNs) [11,12], bene-
fiting from their distinctive structure, are more effective in handling sequential data.
Hidasi et al. first applied RNNs to sequential recommendation [6] and proposed the
session-based sequential recommendation model. It divided the user’s behavior into mul-
tiple sessions based on a certain time interval, modeled each session’s behavior using an
RNN, and predicted the next item the user interacted with. To further improve sequential
Electronics 2023, 12, 3742
3. Method
We propose a hierarchical attention-based sequential recommendation model, LHAS-
Rec. The model consists of an embedding layer, a local-aware layer, a global attention
layer, and a prediction layer. This section will describe how to construct this sequential
recommendation model. The architecture of LHASRec is illustrated in Figure 1.
219
Electronics 2023, 12, 3742
[Figure 1 appears here: input embeddings X1, X2, …, Xn pass through the local-aware layer (which produces the weight of local fluctuation), are merged into combined embeddings, flow through stacked self-attention blocks, and reach the prediction layer, trained with a cross-entropy loss.]
Figure 1. The overall framework of LHASRec. The model primarily consists of an embedding
layer, a local-aware layer, a global attention layer, and a prediction layer, and the input and output
of the global attention layer are handled using SSE regularization. (a) The target user’s historical
behavior sequence is combined with user information and divided into multiple sessions based on
time intervals. (b) Each session is individually processed by the local-aware layer to generate local
attention weights, which are then combined with the sequences containing item information and user
information to serve as the input for the global attention layer. (c) The SSE regularization technique is
applied to the input matrix. (d) The global attention layer captures the representation of the user’s
local and global preferences. (e) The output matrix is regularized using SSE. (f) Recommendations
are made based on the target user’s local and global preferences.
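The SSE steps marked (c) and (e) in the caption work by randomly swapping embedding indices. The following is a hedged sketch of the simplest uniform variant of Stochastic Shared Embeddings (often called SSE-SE); the function name and the replacement probability `p_sse` are illustrative choices, not values from the paper:

```python
import numpy as np

def sse_replace(indices, vocab_size, p_sse=0.01, rng=None):
    """Stochastic Shared Embeddings (uniform SSE-SE variant, sketch):
    with probability p_sse, replace each embedding index with a
    uniformly random index, so that embeddings share gradient updates
    and the model is regularized against overfitting."""
    rng = rng or np.random.default_rng()
    indices = np.asarray(indices).copy()
    mask = rng.random(indices.shape) < p_sse              # positions to swap
    indices[mask] = rng.integers(0, vocab_size, mask.sum())
    return indices

# usage: perturb the item indices of a length-8 input sequence
seq = np.array([3, 7, 7, 12, 5, 9, 1, 4])
noisy = sse_replace(seq, vocab_size=3900, p_sse=0.1)
print(noisy.shape)  # same shape as the input
```

During training the perturbed indices are looked up in the embedding matrix in place of the originals; at inference time the replacement is disabled (`p_sse = 0`).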
Table 1. Notation.

Notation                              Description
U, I                                  user and item sets
H_u                                   historical interaction sequence of user u
t_d ∈ N                               division time interval
n ∈ N                                 maximum sequence length
n_s ∈ N                               length of each session
k ∈ N                                 number of sessions
b ∈ N                                 number of stacked temporal attention blocks
d ∈ N                                 latent vector dimension of the model
d_i, d_u ∈ N                          latent dimensions of items and users
M^I ∈ R^{|I| × d_i}                   item embedding matrix
M^U ∈ R^{|U| × d_u}                   user embedding matrix
Ŝ_1, Ŝ_2, …, Ŝ_k ∈ R^{n_s × d}        input embedding matrices of the local-aware layer
Ê ∈ R^{n × d}                         input embedding matrix of the global attention layer
c_1, c_2, …, c_k ∈ R                  fluctuation coefficient of each session
A ∈ R^{n × d}                         output of the self-attention layer
F ∈ R^{n × d}                         output of the point-wise feed-forward network
It is worth noting that when the length of the original sequence is smaller than n, we
pad the left side of the sequence with zeros. When the length of the original sequence
is greater than n, we only consider the most recent n interactions. We construct the item
embedding matrix M I ∈ R| I |×di and the user embedding matrix MU ∈ R|U |×du , where
di , du ∈ N represent the latent embedding dimensions for items and users, respectively.
From these two embedding matrices, we retrieve the user information embedding for
the target user and the embeddings of each item in the user’s input sequence. These
embeddings are combined to obtain the input embedding E ∈ Rn×d , where d = di + du :
$$E = \begin{bmatrix} [M^I_{h_1}; M^U_u] \\ [M^I_{h_2}; M^U_u] \\ \vdots \\ [M^I_{h_n}; M^U_u] \end{bmatrix} \tag{2}$$
where $[M^I_{h_i}; M^U_u]$ is the concatenation of the embedding vectors for item $h_i$ and user $u$. We
believe that when the time intervals between several user interactions are short, the user is selecting items of the same kind of interest, and these items are strongly correlated. Conversely, when the time interval between two adjacent items is large, the user's interest may have shifted, changing the category of the selected items, so the two items are only weakly correlated. In the user's historical behavior, local fluctuations arise as the user's interest changes.
To capture these fluctuations, we introduce a time interval threshold, denoted as td ∈ N,
and examine the time intervals between every two adjacent items. If the interval exceeds
td , we split the input sequence accordingly. Following this approach, we divide the input
sequence E into k sessions, i.e., local interaction sequences. Within each session, the items
exhibit strong correlations, while there are weak correlations between items from different
sessions. As the divided sessions have different lengths, we apply the same fixed-length
rule for each session Si ∈ Rns ×d to adjust its length to a specific value, denoted as ns ∈ N:
$$S_1 = \begin{bmatrix} [M^{I,S_1}_{h_1}; M^U_u] \\ [M^{I,S_1}_{h_2}; M^U_u] \\ \vdots \\ [M^{I,S_1}_{h_{n_s}}; M^U_u] \end{bmatrix}, \;\cdots,\; S_k = \begin{bmatrix} [M^{I,S_k}_{h_1}; M^U_u] \\ [M^{I,S_k}_{h_2}; M^U_u] \\ \vdots \\ [M^{I,S_k}_{h_{n_s}}; M^U_u] \end{bmatrix} \tag{3}$$
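The session division and fixed-length adjustment just described can be sketched as follows; this is an illustrative Python sketch operating on item IDs and timestamps, and the helper names and example values are ours, not the authors':

```python
def split_sessions(items, timestamps, t_d):
    """Split a time-ordered interaction sequence into sessions:
    a new session starts whenever the gap between two adjacent
    interactions exceeds the threshold t_d."""
    sessions, current = [], [items[0]]
    for prev_t, cur_t, item in zip(timestamps, timestamps[1:], items[1:]):
        if cur_t - prev_t > t_d:
            sessions.append(current)
            current = []
        current.append(item)
    sessions.append(current)
    return sessions

def to_fixed_length(session, n_s, pad=0):
    """Left-pad with zeros, or keep only the most recent n_s items."""
    session = session[-n_s:]
    return [pad] * (n_s - len(session)) + list(session)

items      = [11, 12, 13, 21, 22, 31]
timestamps = [0, 60, 150, 4000, 4100, 9000]   # seconds
sessions = split_sessions(items, timestamps, t_d=3600)
print(sessions)                              # [[11, 12, 13], [21, 22], [31]]
print(to_fixed_length(sessions[1], n_s=4))   # [0, 0, 21, 22]
```

Each fixed-length session would then be mapped through the embedding matrices to produce the matrices $S_i$ of Equation (3).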
where $M^{I,S_i}_{h_j} \in \mathbb{R}^d$ represents the embedding of item $h_j$ in the $i$-th session. Since the self-attention mechanism is unaware of the positional relationship of items in the sequence, we introduce learnable position embeddings for each session:
$$\hat{S}_1 = \begin{bmatrix} [M^{I,S_1}_{h_1}; M^U_u] + p^{S_1}_1 \\ [M^{I,S_1}_{h_2}; M^U_u] + p^{S_1}_2 \\ \vdots \\ [M^{I,S_1}_{h_{n_s}}; M^U_u] + p^{S_1}_{n_s} \end{bmatrix}, \;\cdots,\; \hat{S}_k = \begin{bmatrix} [M^{I,S_k}_{h_1}; M^U_u] + p^{S_k}_1 \\ [M^{I,S_k}_{h_2}; M^U_u] + p^{S_k}_2 \\ \vdots \\ [M^{I,S_k}_{h_{n_s}}; M^U_u] + p^{S_k}_{n_s} \end{bmatrix} \tag{4}$$
where $p^{S_i}_j \in \mathbb{R}^d$ represents the embedding of the $j$-th position in the $i$-th session.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d}}\right)V \tag{5}$$
where $Q$ represents the query matrix, and $K$ and $V$ denote the key and value matrices, respectively. $\sqrt{d}$ is a scaling factor used to mitigate the problem of large inner products when the dimension is high.
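Equation (5) translates directly into code; the following NumPy sketch omits batching and masking for clarity:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n, n) similarity matrix
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # rows sum to 1
    return weights @ V                             # convex combinations of V

n, d = 5, 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
out = attention(Q, K, V)
print(out.shape)  # (5, 8)
```

Subtracting the row-wise maximum before exponentiating leaves the softmax unchanged but avoids overflow for large logits.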
We feed each session of the user separately into different attention layers to avoid
mutual interference between items with weak correlations, ultimately obtaining the fluc-
tuation coefficient corresponding to each session, thereby capturing the local fluctuations
of the user’s interests. Specifically, for the i-th session, it is linearly projected into three
matrices, which are then fed into the i-th attention layer:
where $W^Q_{S_i}, W^K_{S_i}, W^V_{S_i} \in \mathbb{R}^{d \times d}$ represent the projection matrices of $Q$, $K$, and $V$, respectively, for the matrix $S_i$. Due to the strong correlations among items within each session,
we consider that their sequential order can be ambiguous, allowing subsequent keys to
be connected to the current query to fully capture the representation power of the self-
attention mechanism.
After the self-attention layer, we employ two MLP layers to model the non-linear
relationships among items within the session to obtain the fluctuation coefficient ci corre-
sponding to the i-th session:
$$c_i = \mathrm{GELU}(L_i)^T W^c_i + b^c_i \tag{8}$$
where $W^M_i \in \mathbb{R}^{d \times 1}$, $W^c_i \in \mathbb{R}^{n_s \times 1}$, $b^M_i \in \mathbb{R}^{n_s \times 1}$, and $b^c_i \in \mathbb{R}$ are learnable parameters. Instead of the ReLU function, we utilize the smoother GELU [18] function for activation.
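A sketch of the fluctuation-coefficient computation follows. Because the intermediate equations defining $L_i$ did not survive extraction, the form $L_i = A_i W^M_i + b^M_i$ is inferred here from the parameter shapes listed after Equation (8); treat it as an assumption, not the authors' exact formulation:

```python
import numpy as np

def gelu(x):
    """Gaussian Error Linear Unit (tanh approximation)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def fluctuation_coefficient(A_i, W_m, b_m, W_c, b_c):
    """c_i = GELU(L_i)^T W_c + b_c, with L_i = A_i W_m + b_m (assumed form).
    A_i: (n_s, d) self-attention output of the i-th session."""
    L_i = A_i @ W_m + b_m            # (n_s, 1)
    c_i = gelu(L_i).T @ W_c + b_c    # (1, 1): scalar fluctuation coefficient
    return float(c_i)

n_s, d = 4, 8
rng = np.random.default_rng(1)
A_i = rng.normal(size=(n_s, d))
c = fluctuation_coefficient(A_i,
                            W_m=rng.normal(size=(d, 1)),
                            b_m=rng.normal(size=(n_s, 1)),
                            W_c=rng.normal(size=(n_s, 1)),
                            b_c=0.0)
print(type(c))  # one scalar coefficient per session
```

The shapes check out: $A_i W^M_i + b^M_i \in \mathbb{R}^{n_s \times 1}$, so $\mathrm{GELU}(L_i)^T W^c_i \in \mathbb{R}$, matching the scalar $c_i$ of Equation (8).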
where x is a vector containing all the features of the samples, μ and σ denote the mean and
variance, α is a learnable scale factor, and β is a bias term.
We merge one self-attention layer and one feed-forward network layer into a single attention module. To capture the user's preferences more accurately, we stack b
attention modules to learn more complex item transformations, ultimately obtaining the
representation of the user’s preferences.
where σ(·) is the sigmoid function. Because Adam is more robust to noise and outliers than stochastic gradient descent (SGD), we optimize the model with the Adam optimizer [19]. The top-K recommendations for the target user at time step t are obtained by sorting the scores of all items; the top K items in the sorted list are the recommended items.
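The scoring and top-K selection can be sketched as follows. We assume, as is standard for this model family, that an item's score is the sigmoid of the dot product between the user's preference representation at step t and the item embedding; the function and variable names are illustrative:

```python
import numpy as np

def top_k_recommend(F_t, item_emb, k=10):
    """Score every item against the user's preference vector F_t (length d)
    and return the indices of the k highest-scoring items."""
    scores = 1.0 / (1.0 + np.exp(-(item_emb @ F_t)))   # sigmoid scores, (|I|,)
    return np.argsort(-scores)[:k]                     # descending by score

rng = np.random.default_rng(2)
F_t = rng.normal(size=16)               # user preference at step t
item_emb = rng.normal(size=(100, 16))   # one row per candidate item
rec = top_k_recommend(F_t, item_emb, k=10)
print(len(rec))  # 10
```

Since the sigmoid is monotonic, sorting by σ(score) and by the raw dot product yields the same ranking; the sigmoid only matters for the loss during training.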
4. Experiments
In this section, we will present our experimental setup and show the results of our
experiments. The experiments conducted aim to answer the following research questions:
RQ1: Can our proposed method outperform the state-of-the-art baselines?
RQ2: Does the choice of different time interval values for sequence dividing affect the
model’s ability to capture the local fluctuation of the user’s interests?
RQ3: How do parameters such as maximum sequence length and the number of
attention blocks impact the model’s performance?
4.1. Datasets
We evaluated LHASRec on four datasets. These datasets cover different domains,
sizes, and sparsity levels, and all of them are publicly available:
Movielens: https://grouplens.org/datasets/movielens/ (accessed on 25 August
2023) This dataset is sourced from the GroupLens Research project at the University of
Minnesota. It is a widely used benchmark dataset. We utilized the Movielens-1M version,
which consists of 1 million ratings from 6040 users on 3900 movies.
Amazon: http://jmcauley.ucsd.edu/data/amazon/ (accessed on 25 August 2023) We
utilized the users’ purchase and rating dataset from the e-commerce platform Amazon,
which was collected by McAuley et al. [20]. To enhance the usability of the dataset, the
researchers divided it based on high-ranking categories on Amazon. Specifically, we
selected the “Beauty” and “Video Games” categories for our study.
Steam: https://cseweb.ucsd.edu/~jmcauley/datasets.html#steam_data (accessed on
25 August 2023) It originates from the popular digital game distribution platform, Steam.
The dataset captures users’ behaviors, such as game purchases, game ratings, and game
social interactions on the Steam platform.
These four datasets all include timestamps of users’ interactions. We followed the
methods described in [3,7] to preprocess the data. Firstly, we sorted the user–item inter-
actions in ascending order based on the timestamps. To ensure the validity of the data,
we excluded cold-start users, i.e., those with fewer than three user–item interactions.
Similar to the approach in [3], we used the last item in the interaction sequence (i.e., the
most recent item interacted with by the user) as the test set, the second-to-last item as the
validation set, and the remaining items as the training set. Through these preprocessing
steps, we reduced redundant information while preserving the data’s original meaning, fa-
cilitating further research and algorithm evaluation in the recommendation system domain.
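The preprocessing steps above — discarding users with fewer than three interactions and splitting each time-sorted sequence leave-one-out style — can be sketched as:

```python
def leave_one_out_split(user_sequences, min_len=3):
    """user_sequences: {user_id: [items sorted by ascending timestamp]}.
    Returns (train, valid, test) dicts; users with fewer than min_len
    interactions are discarded as cold-start users."""
    train, valid, test = {}, {}, {}
    for user, seq in user_sequences.items():
        if len(seq) < min_len:
            continue                 # cold-start user, excluded
        train[user] = seq[:-2]       # everything before the last two items
        valid[user] = seq[-2]        # second-to-last item
        test[user]  = seq[-1]        # most recent item
    return train, valid, test

data = {"u1": [5, 9, 2, 7], "u2": [3, 4], "u3": [1, 8, 6]}
train, valid, test = leave_one_out_split(data)
print(train)  # {'u1': [5, 9], 'u3': [1]}
print(test)   # {'u1': 7, 'u3': 6}
```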
Table 2 provides an overview of these datasets, highlighting their characteristics. Among
them, Movielens-1M is the densest dataset, with fewer users and items. On the other hand,
the Steam dataset is the sparsest, containing relatively fewer interactions.
where M is the number of users, hits(i ) indicates whether the item interacted with by the
i-th user is present in the recommendation list of length N, and pi represents the position
of the item interacted with by the i-th user in the recommendation list. In our experiments,
we set the length N of the recommendation list to 10. To evaluate the performance of the
recommendation algorithms, we employed HR@10 and NDCG@10 as the two metrics.
Specifically, we randomly sampled 100 negative items [27] in addition to each user's ground-truth item and calculated the metric values based on the rankings of these 101 items. It is worth
noting that higher values of HR@10 and NDCG@10 indicate better model performance.
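Under the 100-negative-sample protocol, each user contributes a single ranked list of 101 items, and the two metrics reduce to simple functions of the ground-truth item's rank (a sketch; `ranks` holds the 1-based position of each user's true item):

```python
import numpy as np

def hr_at_n(ranks, n=10):
    """Hit Ratio@N: fraction of users whose true item is in the top N."""
    ranks = np.asarray(ranks)
    return float(np.mean(ranks <= n))

def ndcg_at_n(ranks, n=10):
    """NDCG@N with a single relevant item per user: the gain is
    1/log2(rank + 1) if the item is within the top N, else 0."""
    ranks = np.asarray(ranks, dtype=float)
    gains = np.where(ranks <= n, 1.0 / np.log2(ranks + 1.0), 0.0)
    return float(np.mean(gains))

# one rank per user among the 101 candidates (1 = best)
ranks = [1, 3, 12, 2, 101]
print(hr_at_n(ranks))               # 0.6 — three of five users hit the top 10
print(round(ndcg_at_n(ranks), 4))
```

HR@10 rewards any top-10 hit equally, while NDCG@10 additionally rewards placing the true item closer to the top of the list.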
Table 3. Recommended performance. We have bolded the best-recommended method in each row
and underlined the second-best-performing approach in each row.
of the user’s interests and laying a solid foundation for achieving global stability of the
user’s interests.
Table 4. Impact of different division time interval values on the recommendation performance of the
models across four datasets. We have bolded the best-recommended method in each row.
Table 5. Recommended performance. We removed specific user attributes from LHASRec and
compared the resulting model with the baseline model for analysis. We have bolded the best-
recommended method in each row and underlined the second-best-performing approach in each row.
Table 6. Impact of different regularization methods on the recommendation effect on the MovieLens-1M.
We have bolded the best-recommended method in each row and underlined the second-best-performing
approach in each row.
all datasets to balance the model’s fitting ability and complexity, thereby obtaining better
recommendation performance.
5. Conclusions
In this work, we propose a sequential model with local-aware ability (LHASRec). The
model comprehensively considers the local fluctuation and global stability of the user’s
interests to capture the long-term and short-term preferences more accurately. Meanwhile,
we enhance the user’s historical interaction sequences by embedding user information.
Author Contributions: Conceptualization, J.H.; methodology, J.H.; software, J.H. and Q.L.; valida-
tion, J.H., Q.L. and F.Z.; formal analysis, J.H.; investigation, J.H.; resources, J.H.; data curation, J.H.;
writing—original draft preparation, J.H.; writing—review and editing, J.H. and F.Z.; supervision, J.H.
and Q.L.; project administration, J.H. All authors have read and agreed to the published version of
the manuscript.
Funding: This work is supported by the Action Plan for High-Quality Development of Graduate
Education of Chongqing University of Technology (No. gzlcx20232102).
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Shani, G.; Heckerman, D.; Brafman, R.I.; Boutilier, C. An MDP-based recommender system. J. Mach. Learn. Res. 2005, 6, 1265–1295.
2. Tang, J.; Wang, K. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of
the Eleventh ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 5–9 February 2018;
pp. 565–573.
3. Kang, W.C.; McAuley, J. Self-attentive sequential recommendation. In Proceedings of the 2018 IEEE International Conference on
Data Mining (ICDM), Singapore, 17–20 November 2018; pp. 197–206.
4. Hosseinzadeh Aghdam, M.; Hariri, N.; Mobasher, B.; Burke, R. Adapting recommendations to contextual changes using
hierarchical hidden markov models. In Proceedings of the 9th ACM Conference on Recommender Systems, Vienna, Austria,
16–20 September 2015; pp. 241–244.
5. Rendle, S.; Freudenthaler, C.; Schmidt-Thieme, L. Factorizing personalized markov chains for next-basket recommendation. In
Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 811–820.
6. Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based recommendations with recurrent neural networks. arXiv 2015,
arXiv:1511.06939.
7. Wu, L.; Li, S.; Hsieh, C.J.; Sharpnack, J. SSE-PT: Sequential recommendation via personalized transformer. In Proceedings of the
14th ACM Conference on Recommender Systems, Virtual Event, Brazil, 22–26 September 2020; pp. 328–337.
8. Li, J.; Wang, Y.; McAuley, J. Time interval aware self-attention for sequential recommendation. In Proceedings of the 13th
International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 322–330.
9. Brémaud, P. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues; Springer Science & Business Media: Berlin, Germany,
2001; Volume 31.
10. Xue, H.J.; Dai, X.; Zhang, J.; Huang, S.; Chen, J. Deep matrix factorization models for recommender systems. In Proceedings of
the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), Melbourne, Australia, 19–25 August 2017;
Volume 17, pp. 3203–3209.
11. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA
1982, 79, 2554–2558. [CrossRef] [PubMed]
12. Elman, J.L. Finding structure in time. Cogn. Sci. 1990, 14, 179–211. [CrossRef]
13. Hidasi, B.; Quadrana, M.; Karatzoglou, A.; Tikk, D. Parallel recurrent neural network architectures for feature-rich session-based
recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September
2016; pp. 241–248.
14. Zhang, Y.; Dai, H.; Xu, C.; Feng, J.; Wang, T.; Bian, J.; Wang, B.; Liu, T.Y. Sequential click prediction for sponsored search
with recurrent neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Québec City, QC, Canada,
27–31 July 2014; Volume 28.
15. Yuan, F.; Karatzoglou, A.; Arapakis, I.; Jose, J.M.; He, X. A simple convolutional generative network for next item recommendation.
In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, Melbourne, VIC, Australia,
11–15 February 2019; pp. 582–590.
16. Li, C.; Liu, Z.; Wu, M.; Xu, Y.; Zhao, H.; Huang, P.; Kang, G.; Chen, Q.; Li, W.; Lee, D.L. Multi-interest network with dynamic
routing for recommendation at Tmall. In Proceedings of the 28th ACM International Conference on Information and Knowledge
Management, Beijing, China, 3–7 November 2019; pp. 2615–2623.
17. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need.
Adv. Neural Inf. Process. Syst. 2017, 30.
18. Hendrycks, D.; Gimpel, K. Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units. CoRR
2016, abs/1606.08415. Available online: https://www.bibsonomy.org/bibtex/9aaf203ef9c9e38569532ac88603af8e (accessed on
24 August 2023).
19. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
20. McAuley, J.; Targett, C.; Shi, Q.; Van Den Hengel, A. Image-based recommendations on styles and substitutes. In Proceedings
of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile,
9–13 August 2015; pp. 43–52.
21. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. arXiv
2012, arXiv:1205.2618.
22. He, R.; Kang, W.C.; McAuley, J. Translation-based recommendation. In Proceedings of the Eleventh ACM Conference on
Recommender Systems, Como, Italy, 27–31 August 2017; pp. 161–169.
23. Zhang, Q.; Cao, L.; Shi, C.; Niu, Z. Neural time-aware sequential recommendation by jointly modeling preference dynamics and
explicit feature couplings. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 5125–5137. [CrossRef] [PubMed]
24. He, M.; Pan, W.; Ming, Z. BAR: Behavior-aware recommendation for sequential heterogeneous one-class collaborative filtering.
Inf. Sci. 2022, 608, 881–899. [CrossRef]
25. Zhou, K.; Yu, H.; Zhao, W.X.; Wen, J.R. Filter-enhanced MLP is all you need for sequential recommendation. In Proceedings of
the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 2388–2399.
26. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural collaborative filtering. In Proceedings of the 26th International
Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182.
27. Koren, Y. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008;
pp. 426–434.
electronics
Article
An Off-Line Error Compensation Method for Absolute
Positioning Accuracy of Industrial Robots Based on Differential
Evolution and Deep Belief Networks
Yong Tao 1,2, *, Haitao Liu 1 , Shuo Chen 3 , Jiangbo Lan 3 , Qi Qi 1 and Wenlei Xiao 1
1 School of Mechanical Engineering and Automation, Beihang University, Beijing 100191, China;
[email protected] (H.L.); [email protected] (Q.Q.); [email protected] (W.X.)
2 Research Institute of Aero-Engine, Beihang University, Beijing 102206, China
3 School of Large Aircraft Engineering, Beihang University, Beijing 100191, China;
[email protected] (S.C.); [email protected] (J.L.)
* Correspondence: [email protected]; Tel.: +86-010-8231-3905
Abstract: Industrial robots have been increasingly used in the field of intelligent manufacturing. The
low absolute positioning accuracy of industrial robots is one of the difficulties in their application.
In this paper, an accuracy compensation algorithm for the absolute positioning of industrial robots
is proposed based on deep belief networks using an off-line compensation method. A differential
evolution algorithm is presented to optimize the networks. Combined with the evidence theory, a
position error mapping model is proposed to realize the absolute positioning accuracy compensation
of industrial robots. Experiments were conducted using a laser tracker AT901-B on an industrial robot
KR6_R700 sixx_CR. The absolute position error at the robot end was reduced from 0.469 mm to 0.084 mm after compensation, an accuracy improvement of 82.14%. The experimental results demonstrated that the proposed algorithm can improve the absolute positioning accuracy of industrial robots and indicated its potential for precise operational tasks.
Keywords: absolute positioning accuracy; deep belief network; differential evolution algorithm;
industrial robot; off-line error compensation
Citation: Tao, Y.; Liu, H.; Chen, S.; Lan, J.; Qi, Q.; Xiao, W. An Off-Line Error Compensation Method for Absolute Positioning Accuracy of Industrial Robots Based on Differential Evolution and Deep Belief Networks. Electronics 2023, 12, 3718. https://doi.org/10.3390/electronics12173718
Academic Editor: Christos Volos
Received: 7 August 2023; Revised: 29 August 2023; Accepted: 30 August 2023; Published: 2 September 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Industry 4.0 technologies are critical and indispensable tools that propel social and technological innovation. Previous research [1,2] noted that the use of Industry 4.0 technologies can better sustain current resources, reduce labor costs, provide better sources of energy, and potentially produce higher-quality sustainable products. Examples of Industry 4.0 technologies include, but are not limited to, machine learning, virtual and augmented reality, IoT, artificial intelligence, big data, and robotics [3]. Ref. [4] found that Industry 4.0 technologies assist manufacturing companies' sustainability and increase their economic potential. Scholars have examined Industry 4.0 technologies in diverse industries besides manufacturing. Ref. [5] implemented a systematic review to understand the use of Industry 4.0 technologies in managing pandemics. The use of Industry 4.0 technologies to meet the increasing demands of society is ubiquitous; applications include robotics [6], artificial intelligence [7], IoT [8], augmented reality [9], big data [10], and machine learning [11] in food and agricultural sciences, to assist in more efficient and enhanced production, which is needed to feed a growing world population. Ref. [12] investigated the use of Industry 4.0 technologies in the manufacturing sector, as examined in 380 papers prior to 2020. Ref. [13] sought to understand the manufacturing patterns implemented based on Industry 4.0 technologies.
Modern advanced manufacturing technology and key technologies demonstrate the fundamental competitiveness of a nation's manufacturing industry. Robotics is a significant Industry 4.0 innovation that offers immeasurable possibilities in manufacturing
volving cross-work environments, high repetition, and high-precision processing. The
current manual-based processing methods cannot meet all the requirements of a short
development cycle and high assembly precision [15], and the use of industrial robots for
processing is an excellent solution to this problem. The repeat positioning accuracy of
industrial robots during the actual work process is usually quite satisfactory [16], and is
typically 0.1 mm. However, the absolute positioning accuracy is poor, with an accuracy
range of only approximately 2–3 mm. The absolute positioning accuracy severely limits the
promotion and application of industrial robots in the manufacturing industry.
To address the poor absolute positioning accuracy at the end-effector of industrial robots, scholars at home and abroad have proposed various solutions [17]. Kinematic model-based control of the robot joints makes it possible to compensate for absolute positioning errors. However, the positioning accuracy of the robot is affected by the error in each kinematic parameter. These parameter errors can be determined through kinematic parameter identification and then applied to the kinematic model to correct it, which improves the positioning accuracy of the robot in its actual working environment.
In addition to positioning errors caused by geometric factors, nongeometric factors,
such as gear gap, joint deformation, and temperature change, also affect the end positioning
accuracy of robots [18]. The error mechanisms affecting robot positioning accuracy are
complex and interconnected [19]. It is difficult to establish an accurate kinematic model that
can account for all sources of error. Researchers have begun to investigate the establishment
of a mapping relationship between the theoretical and actual position values.
A co-kriging-based error compensation method [20] was proposed to improve the positioning accuracy of an aviation drilling robot. A compensation method based on error similarity and error correlation was proposed to increase the robot's positioning accuracy [21]. First, the maximum working stiffness of the robotic drilling system for a specific machining task was obtained by optimizing the mounting angle between the motor spindle and the robot end flange, which laid the groundwork for achieving high hole-processing accuracy. Second, a method for calculating the corresponding compensation value at each position to be drilled was introduced, taking into account both the force deformation at the robot end and the absolute positioning error of the robot [22]. Combining error similarity and a
radial basis function (RBF) neural network, Wang [23] developed a position error compen-
sation approach. The robot joint angle and position error were used to fit the experimental
semi-variance function. The bandwidth of the RBF neural network was modified using the
parameters of the semi-variance function. The position error of the target position was also
estimated using the RBF neural network. The estimated position error was used to modify
the target position to achieve the compensation effect. In precision manufacturing, Li [24]
introduced a synchronization estimation approach for the total inertia and load torque of
spindle-tool systems. The synchronization method was based on a novel double extended
sliding mode observer (DESMO), which synchronously tracked the total inertia and load
torque. The robustness of DESMO was enhanced by inserting a robust activator to reduce
the effect of coupling errors between the two expansion terms. This was critical to the
precision control of the spindle tool and directly influenced the control performance.
Long-term research has been carried out by Tian Wei’s team at the Nanjing University
of Aeronautics and Astronautics to increase the absolute positioning accuracy of indus-
trial robots. A robot positioning error compensation method based on a deep neural
network [25] was proposed to perform Latin hypercube sampling in Cartesian space. A
positioning error prediction model based on genetic particle swarm optimization and a
deep neural network (GPSO-DNN) was developed to predict and compensate for position-
ing error. Then, a practical positioning error compensation scheme for mobile industrial
Electronics 2023, 12, 3718
robots was proposed [16]. A binocular vision measurement method for robot positioning
was developed. A mapping model between theoretical and actual robot pose errors was
proposed based on deep belief networks (DBN), and the pose error estimation was realized.
A method for optimizing neural networks using the genetic particle swarm algorithm
was proposed. This was done to improve the positioning accuracy of robots [26]. The
aim was to model and predict the positioning error of industrial robots and achieve the
compensation of target points within the robot workspace. Tian [27] proposed an absolute
positioning error compensation scheme based on the DBN and error similarity. Although relevant scholars have conducted in-depth research, the accuracy and versatility of error prediction can be further improved to raise the absolute positioning accuracy of robots [28]. In addition, many mathematical methods [29–32] can also be applied to the study of robot positioning accuracy; analytic methods such as fractional-order approaches have gained increasing attention [33,34].
The DBN is simple in structure and is suitable for data training in industrial robots.
The training time of the DBN is short, thereby helping to improve the efficiency of the robot.
Meanwhile, the differential evolution (DE) algorithm is an optimization algorithm based
on the theory of swarm intelligence. It has been widely used in many fields because of its
simple principle, small number of controlled parameters, and strong robustness. Finally,
evidence theory can make experimental results more reliable.
Min [35] proposed a stable and high-accuracy model-free calibration method for unopened robotic systems, which can significantly improve robot positional accuracy.
Ref. [36] proposed an adaptive hierarchical compensation method based on fixed-length
memory window incremental learning and incremental model reconstruction. Real-time tra-
jectory position error compensation technology that considers non-kinematic errors [37,38]
has also been proposed.
An absolute positioning accuracy compensation algorithm for industrial robots based on the DBN is proposed. The DE algorithm is used to optimize the DBN: the number of hidden layers, layer nodes, learning rate, momentum factor, restricted Boltzmann machine (RBM) iterations, and DBN fine-tuning iterations are optimized over six dimensions and nine parameters. Combined with evidence theory, the position error mapping model of industrial robots is established to realize absolute positioning accuracy compensation. The technical process is shown in Figure 1.
Combined with the off-line feed-forward compensation method, the prediction error
of the theoretical pose coordinates of the robot target is superimposed on the robot control
instructions. The validity and superiority of the scheme are verified using an AT901-B laser tracker and a KUKA KR6_R700 sixx_CR robot. The absolute positioning error at the end of the robot was reduced by 82.14%, from 0.469 mm to 0.084 mm. Future work can further consider
industrial robot load, motion speed, acceleration, ambient temperature, or other factors
that affect the absolute positioning accuracy of the robot.
The remainder of this paper is organized as follows: Section 1 serves as the introduction, providing an overview of the absolute positioning accuracy of industrial robots
and the method proposed in this paper. Section 2 focuses on the robot positioning error
prediction algorithm based on DE and DBN. Section 3 presents supervised predictive opti-
mization based on evidence theory. Section 4 includes experimental setup, data collection,
model training, and result analysis. Finally, Section 5 presents the conclusion.
The multilayer RBM and the BP layer are stacked to form a DBN, as shown in Figure 3.
The first RBM consists of a visible layer v1 and a hidden layer h1. The visible layer v2 of the second RBM is the hidden layer h1 of the first RBM, that is, v2 = h1, and so on. The DBN realizes layer-by-layer learning by stacking multiple RBMs so as to extract features from the data. The last layer of the DBN is a BP network.
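The layer-by-layer stacking, in which each RBM's hidden activations become the next RBM's visible input, can be sketched in Python. The layer sizes, random initialization, and the `RBM`/`stack_dbn` names are illustrative assumptions, and the contrastive-divergence pretraining of each RBM is omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal RBM: weights W, visible biases b, hidden biases c."""
    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b = np.zeros(n_visible)
        self.c = np.zeros(n_hidden)

    def hidden_probs(self, v):
        # P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i w_ij)
        return sigmoid(v @ self.W + self.c)

def stack_dbn(data, hidden_sizes, rng):
    """Greedy layer-wise stacking: the hidden activations of RBM k
    become the visible input of RBM k+1 (v2 = h1, and so on)."""
    rbms, v = [], data
    for n_hidden in hidden_sizes:
        rbm = RBM(v.shape[1], n_hidden, rng)
        # (contrastive-divergence pretraining of each RBM would go here)
        rbms.append(rbm)
        v = rbm.hidden_probs(v)   # features propagated to the next layer
    return rbms, v

rng = np.random.default_rng(0)
X = rng.random((5, 9))            # 9 inputs: x, y, z and six joint angles
rbms, features = stack_dbn(X, [32, 16], rng)
```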
Unsupervised pretraining and fine-tuning are two processes of DBN training [41]. In
the pretraining process, the greedy algorithm is used. The result obtained by the previous
RBM training is used as the input of the next RBM until all RBMs are trained. The initial
parameters of each RBM are obtained at the same time. The energy function of RBM is
defined as follows:
E(v, h; θ) = − ∑_{i=1}^{m} b_i v_i − ∑_{j=1}^{n} c_j h_j − ∑_{i=1}^{m} ∑_{j=1}^{n} w_ij v_i h_j   (1)

θ = {w_ij, b_i, c_j}   (2)

where m and n are the numbers of nodes in the visible and hidden layers, respectively; v_i and b_i are the state and bias of the i-th visible-layer neuron; h_j and c_j are the state and bias of the j-th hidden-layer neuron; and w_ij is the connection weight between the i-th neuron in the visible layer and the j-th neuron in the hidden layer. Based on the energy function, the joint probability distribution can be obtained as:

P(v, h; θ) = exp(−E(v, h; θ)) / Z(θ)   (3)

where Z(θ) = ∑_{v,h} exp(−E(v, h; θ)) is the partition function.
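The energy of Eq. (1) and the probability of Eq. (3) can be checked numerically on a toy RBM. The weights and biases below are arbitrary illustrative values, and the brute-force partition function is tractable only for such tiny layers:

```python
from itertools import product
import numpy as np

def rbm_energy(v, h, W, b, c):
    """E(v, h; θ) = −Σ b_i v_i − Σ c_j h_j − Σ w_ij v_i h_j (Eq. (1))."""
    return -b @ v - c @ h - v @ W @ h

def partition_function(W, b, c):
    """Z(θ): sum of exp(−E) over all binary states (illustration only)."""
    return sum(np.exp(-rbm_energy(np.array(vs), np.array(hs), W, b, c))
               for vs in product([0, 1], repeat=len(b))
               for hs in product([0, 1], repeat=len(c)))

def joint_probability(v, h, W, b, c):
    """P(v, h; θ) = exp(−E(v, h; θ)) / Z(θ) (Eq. (3))."""
    return np.exp(-rbm_energy(v, h, W, b, c)) / partition_function(W, b, c)

W = np.array([[0.5, -0.2], [0.1, 0.3], [0.0, 0.4]])  # 3 visible, 2 hidden units
b = np.array([0.1, -0.1, 0.0])                        # visible biases
c = np.array([0.2, 0.0])                              # hidden biases
p = joint_probability(np.array([1, 0, 1]), np.array([0, 1]), W, b, c)
```

Summing the joint probability over all 32 binary states recovers 1, confirming the normalization by Z(θ).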
F(τ) = (1/2) ∑_{i=1}^{N} (ŷ_i − y_i)^2   (7)

w_out(τ + 1) = w_out(τ) − η ∂F(τ)/∂w_out(τ)   (8)

where τ is the iteration index, η is the learning rate, and ŷ_i and y_i are the predicted and true outputs, respectively.
The DBN has strong robustness and fault tolerance because information is distributed across the neurons of the network, and it can approximate any complex nonlinear system. It is therefore well suited to the nonlinear problem of error compensation.
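As an illustration of the update in Eqs. (7) and (8), the sketch below applies the gradient step to a linear output layer. The toy data, learning rate, and the `fine_tune_step` helper are assumptions for the example, not the paper's implementation:

```python
import numpy as np

def fine_tune_step(w_out, H, y, eta=0.01):
    """One gradient step on the output weights, as in Eqs. (7)-(8):
    F = 0.5 * sum (y_hat - y)^2, hence dF/dw_out = H^T (y_hat - y)."""
    y_hat = H @ w_out                  # linear output layer (assumption)
    grad = H.T @ (y_hat - y)
    return w_out - eta * grad          # w(tau+1) = w(tau) - eta * dF/dw

rng = np.random.default_rng(1)
H = rng.random((50, 4))                # hidden-layer features (toy data)
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = H @ w_true                         # targets consistent with w_true
w = np.zeros(4)
for _ in range(5000):                  # repeated steps recover w_true
    w = fine_tune_step(w, H, y)
```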
where CR is a crossover factor with a value range of [0, 1], and u^j_{i,G} is the new individual generated by the crossover strategy.
After the crossover is completed, the DE algorithm compares each individual of the current population with its crossover individual and keeps the better of the two as the corresponding individual of the next-generation population:

x^j_{i,G+1} = u^j_{i,G}, if f(u_{i,G}) ≤ f(x_{i,G});  x^j_{i,G}, otherwise   (12)
The mean square error (MSE) between the expected output of the DBN and the actual
output of the data is used as the fitness function of the DE algorithm:
F_fitness = (1/N) ∑_{i=1}^{N} ∑_{j=1}^{m} (ŷ_ij − y_ij)^2   (13)
where N is the number of training sample data sets. m is the dimension of the network
output. ŷij refers to the expected output of the sample. yij refers to the actual output of the
network.
The DE algorithm is known as an efficient global optimizer with the advantages of fast convergence and high precision. The fitness function of the DE algorithm is related
to the DBN, and the smaller the fitness function value, the better the optimization result.
The principle of the DE algorithm is shown in Figure 4.
The DE algorithm starts its search from a population, that is, from multiple points rather than a single point [45]. This is the main reason why it can find the global optimal solution with a greater probability. The evolution criterion of the DE algorithm is based on fitness information alone [46], without the need for other auxiliary information. The algorithm has inherent parallelism, which makes it suitable for large-scale parallel distributed processing [47]. The DE parameter settings are shown in Table 1.
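A minimal sketch of the DE loop described above (mutation, crossover with factor CR, and the greedy selection of Eq. (12)). The population size, control parameters, and the toy sphere fitness standing in for the DBN's MSE of Eq. (13) are illustrative assumptions:

```python
import numpy as np

def differential_evolution(fitness, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=100, seed=0):
    """Minimal DE/rand/1/bin sketch: mutation, binomial crossover with
    factor CR, then the greedy selection of Eq. (12)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = len(lo)
    pop = rng.uniform(lo, hi, (pop_size, dim))
    fit = np.array([fitness(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            r1, r2, r3 = rng.choice([k for k in range(pop_size) if k != i],
                                    3, replace=False)
            mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lo, hi)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True        # at least one mutated gene
            trial = np.where(cross, mutant, pop[i])
            f_trial = fitness(trial)
            if f_trial <= fit[i]:                  # keep the better individual
                pop[i], fit[i] = trial, f_trial
    best = np.argmin(fit)
    return pop[best], fit[best]

# toy fitness standing in for the DBN's MSE of Eq. (13)
sphere = lambda x: float(np.sum(x ** 2))
best_x, best_f = differential_evolution(sphere, np.array([[-5.0, 5.0]] * 3))
```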
The DBN input layer has nine channels: the theoretical position coordinates of the robot and the angles of the corresponding six joints (x, y, z, θ1, θ2, θ3, θ4, θ5, θ6). The DBN output layer has three channels: the position errors ex, ey, and ez of the robot. The maximum number of DBN hidden layers is set to four, and the range of nodes per hidden layer is (10, 101). The initial learning rate is 0.01, the momentum factor is 0.8, the activation function is the sigmoid, and the MSE is used as the loss function. The hyperparameters to be optimized are the number of hidden layers of the DBN, the number of hidden-layer nodes, the learning rate, the momentum factor, the RBM iterations, and the DBN fine-tuning iterations. The hyperparameters are shown in Table 2.
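The search space can be collected in a small configuration sketch. The bounds for the learning rate, momentum, and iteration counts are assumptions for illustration, since the text fixes only the maximum layer count, the node range, and the initial values:

```python
# Sketch of the search space: six hyperparameter types, nine optimized
# values in total (one node count for each of up to four hidden layers).
# Bounds marked "assumed" are illustrative, not from the paper.
search_space = {
    "n_hidden_layers":     (1, 4),           # at most four hidden layers
    "nodes_per_layer":     [(10, 101)] * 4,  # node range per hidden layer
    "learning_rate":       (1e-4, 1e-1),     # initial value 0.01 (assumed bounds)
    "momentum":            (0.1, 0.99),      # initial value 0.8 (assumed bounds)
    "rbm_iterations":      (10, 500),        # assumed bounds
    "finetune_iterations": (10, 500),        # assumed bounds
}
# 5 scalar values + 4 per-layer node counts = 9 parameters in 6 dimensions
n_params = 5 + len(search_space["nodes_per_layer"])
```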
The confidence interval is a closed interval composed of a belief function (Bel) and a
likelihood function (Pl), which is used to indicate the degree of support for event θ [55].
Bel(A) is the sum of the basic probability distributions of all subsets of A, which indicates
the degree of trust in A. It is expressed as:
Bel(A) = ∑_{B⊆A} m(B),  A ⊆ Θ   (14)
Pl(A) is the sum of the basic probability assignments of all subsets intersecting with
A. It indicates the degree of non-denial to A. It is expressed as:
Pl(A) = ∑_{B∩A≠∅} m(B),  A ⊆ Θ   (15)
Let the finite nonempty set Θ = {θ1, θ2, · · · , θn} be the identification framework, and let the function m: 2^Θ → [0, 1] be the basic probability assignment function on Θ. In this study, Bel and Pl represent the lower and upper bounds of the reliability of the positioning accuracy of industrial robots. For a hypothetical conclusion A in the identification framework, Bel(A) and Pl(A) form the confidence interval [Bel(A), Pl(A)]. This interval represents the uncertainty of the proposition: the probability of the occurrence of proposition A lies somewhere between the Bel and Pl bounds, as shown in Figure 5.
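Bel and Pl of Eqs. (14) and (15) can be computed directly from a basic probability assignment; the toy assignment over Θ = {a, b, c} below is an illustrative assumption:

```python
def bel(A, m):
    """Bel(A): sum of the masses of all subsets B of A (Eq. (14))."""
    return sum(mass for B, mass in m.items() if B <= A)

def pl(A, m):
    """Pl(A): sum of the masses of all B intersecting A (Eq. (15))."""
    return sum(mass for B, mass in m.items() if B & A)

# toy basic probability assignment over the frame {a, b, c}
m = {frozenset("a"): 0.4, frozenset("b"): 0.1,
     frozenset("ab"): 0.3, frozenset("abc"): 0.2}
A = frozenset("ab")
interval = (bel(A, m), pl(A, m))   # the confidence interval [Bel(A), Pl(A)]
```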
samples are used to train the model to ensure that it meets the requirement of high classification accuracy. The trained model is saved, its parameters are loaded, and the parameters of its last hidden layer are extracted and converted into a basic probability assignment function. The Dempster synthesis rule is then used to combine these assignments and output the final basic probability assignment function.
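A sketch of the Dempster synthesis rule used in this step; the two toy basic probability assignments are illustrative assumptions, and K denotes the conflicting mass:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's synthesis rule: m(A) is proportional to the sum of
    m1(B)*m2(C) over all pairs with B ∩ C = A, normalized by 1 − K,
    where K is the total conflicting mass."""
    combined, K = {}, 0.0
    for (B, mb), (C, mc) in product(m1.items(), m2.items()):
        A = B & C
        if A:
            combined[A] = combined.get(A, 0.0) + mb * mc
        else:
            K += mb * mc            # mass falling on the empty set
    return {A: v / (1.0 - K) for A, v in combined.items()}, K

m1 = {frozenset("a"): 0.6, frozenset("ab"): 0.4}
m2 = {frozenset("a"): 0.5, frozenset("b"): 0.5}
m12, conflict = dempster_combine(m1, m2)
```

The combined assignment again sums to 1, with the conflicting mass K redistributed over the non-empty intersections.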
The second stage is the uncertainty modeling of DBN prediction. The basic probability
assignment function is used as the decision index to predict the classification results. At
the same time, the basic probability assignment function is further calculated to obtain
the conflict value and uncertainty. The uncertainty of DBN prediction is quantitatively
evaluated using the uncertainty evaluation module. The uncertainty evaluation method
for DBN prediction is shown in Figure 6.
The evaluation method proposed earlier is based on the evidence classifier. The
evidence classifier is modeled by extracting the parameters of the hidden layer of the
trained DBN model. During the modeling process, the modification of the training loss
function and the retraining of the DBN model are not required. Such characteristics mean
that the evaluation method can be applied to any pre-trained DBN model and has strong
scalability in the application of the DBN model.
4. Experiments
4.1. Experimental Setup and Data Collection
The experimental platform for the absolute positioning accuracy compensation of industrial robots is shown in Figure 7. The industrial robot used for accuracy compensation is KUKA's KR6_R700 sixx_CR. It has a load capacity of 6 kg and a working radius of 700 mm. The volume of the working space is 1.36 m3, the position repeatability is ±0.03 mm, and the absolute positioning accuracy is ±0.6 mm. A Leica AT901-B laser tracker is used to measure the position error; its measurement uncertainty is ±15 μm + 6 μm/m, so the error of the laser tracker increases with distance.
AT901-B uses an angle encoder to measure the angle and an absolute interferometer to
measure the distance. The absolute interferometer in the AT901 integrates a helium-neon
laser interferometer and an absolute range finder. The two lasers can work independently.
The laser beam emitted by the laser is directed to the target through the universal mirror.
The interferometer laser beam also serves as the collimation axis for the tracker. The
reflected laser light is measured using the tracker’s built-in dual-axis position detector. The
pulse generated by the position detector is processed by the processor of the tracker. The
output is then fed back to the servo motor, which drives the motor to track the target mirror
of the tracker in real time. Finally, the tracking distance measurement is realized, which is
used to measure the actual pose of the end effector of the industrial robot.
Figure 7. Error compensation platform for laser trackers and industrial robots.
The control and communication diagram of the error compensation platform is shown
in Figure 8. The computer is used as the TwinCAT master, that is, the primary controller of
the control system. The TwinCAT master uses industrial Ethernet EtherCAT to communi-
cate with industrial robots. The laser tracker communicates with the TwinCAT master via
Ethernet (TCP/IP protocol).
In this study, an off-line compensation method [58] is adopted, which uses a laser
tracker to obtain the actual position of the manipulator. The DBN based on the DE algorithm
is employed to perform the error compensation function.
Assuming that the measurement requirements of the laser tracker are met, the target ball
of the fixed tooling of the industrial robot is installed in the 240 mm × 240 mm × 200 mm
working space, and about 8000 sets of data are measured. For the universality and randomness of the experimental data, the random number module drand in TwinCAT3 is used to randomly generate sampling points within a predetermined sampling space. In order to obtain the real steady-state position of the robot with the laser tracker, each sampling is divided into three steps. First, the robot moves to the sampling point and remains there for 2000 ms. Then, the laser tracker records data for 1000 ms. Finally, the devices are delayed for another 1000 ms to reset them. The theoretical position coordinates and joint angles of the robot are the input of the model, and the absolute position error of the robot end constitutes the output of the model.
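The three-step sampling cycle can be sketched as follows; the `FakeRobot` and `FakeTracker` stubs and the whole interface are hypothetical stand-ins for the real EtherCAT and TCP/IP devices:

```python
import time

class FakeRobot:
    """Hypothetical stand-in for the robot controller (EtherCAT in reality)."""
    def move_to(self, target):
        self.position = target

class FakeTracker:
    """Hypothetical stand-in for the Leica AT901-B interface (TCP/IP)."""
    def start_recording(self):
        self.recording = True
    def stop_recording(self):
        self.recording = False
        return (0.0, 0.0, 0.0)   # placeholder measured position

def sample_point(robot, tracker, target, dwell=2.0, record=1.0, reset=1.0):
    """One sampling cycle as described in the text: move to the point,
    dwell 2000 ms until steady state, record for 1000 ms, then delay
    1000 ms to reset the devices."""
    robot.move_to(target)        # step 1: reach the sampling point
    time.sleep(dwell)
    tracker.start_recording()    # step 2: record the steady-state position
    time.sleep(record)
    actual = tracker.stop_recording()
    time.sleep(reset)            # step 3: reset delay
    return target, actual

# zero delays here only to keep the illustration fast
target, actual = sample_point(FakeRobot(), FakeTracker(), (100.0, 50.0, 20.0),
                              dwell=0.0, record=0.0, reset=0.0)
```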
The data set is divided into training and test sets in a 7:3 ratio. The 8000 sets of collected data are divided into 5600 sets of training data and 2400 sets of
test data. As shown in Figure 9, the blue dots represent the training set and the red dots
represent the test set.
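A minimal sketch of the 7:3 split described above; the shuffling seed and the helper name are assumptions:

```python
import numpy as np

def split_7_3(X, y, seed=0):
    """Shuffle the samples and split them 7:3 into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(0.7 * len(X)))
    return (X[idx[:n_train]], y[idx[:n_train]],
            X[idx[n_train:]], y[idx[n_train:]])

X = np.zeros((8000, 9))   # 9 inputs: x, y, z and the six joint angles
y = np.zeros((8000, 3))   # 3 outputs: the position errors ex, ey, ez
X_tr, y_tr, X_te, y_te = split_7_3(X, y)
```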
The DE algorithm is utilized to determine the number of hidden layers of the DBN,
number of nodes in the hidden layer, learning rate, momentum factor, number of iterations
of RBM, and number of iterations of DBN fine-tuning. Figure 11 shows the fitness decline curve of the DE-optimized DBN; the fitness is reduced by 60.7%, from 0.387 to 0.152. Within 150 iterations of training, the optimal fitness is reached at the 92nd iteration, and the optimal parameters of the DBN are output.
The fitness of each individual is calculated according to the fitness calculation conditions. When the training error reaches the allowable value or the number of iterations reaches the maximum value, the DE iteration terminates.
Finally, the DBN hyperparameters determined according to the DE algorithm are
shown in Table 3.
MSE = (1/n) ∑_{i=1}^{n} (y_i − ŷ_i)^2   (19)

RMSE = √[(1/n) ∑_{i=1}^{n} (y_i − ŷ_i)^2] = √MSE   (20)
MAPE = (1/n) ∑_{i=1}^{n} |y_i − ŷ_i| / y_i   (21)

MAE = (1/n) ∑_{i=1}^{n} |y_i − ŷ_i|   (22)

R2 = 1 − [∑_{i=1}^{n} (y_i − ŷ_i)^2] / [∑_{i=1}^{n} (y_i − ȳ)^2] = 1 − MSE/Var   (23)
where y_i represents the true value, ŷ_i the predicted value, ȳ the mean of the true values, n the number of samples, and Var the variance of the true values.
MSE is the mean of the squared errors between the predicted and original data. RMSE is the square root of MSE, also known as the fitting standard deviation of the regression system. MAPE is often used to measure prediction accuracy; however, when a true value equals zero, the denominator becomes zero and the formula is undefined. This situation does not arise in this study. MAE is the mean of the absolute deviations of the measurements and accurately reflects the size of the actual prediction error. The closer these four indicators are to 0, the closer the predicted values are to the real values, indicating a better prediction.
R2 represents the coefficient of determination of the model. The best score is 1, indicat-
ing that the model perfectly predicts the real value. It may also be negative because the
model can be arbitrarily worse, that is, no mapping-fitting relationship exists between the
predicted data and the real data.
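The five indicators of Eqs. (19)-(23) can be computed in a few lines; the toy vectors below are illustrative values, not the paper's data:

```python
import numpy as np

def regression_metrics(y, y_hat):
    """MSE, RMSE, MAPE, MAE and R2 as defined in Eqs. (19)-(23)."""
    err = y - y_hat
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    mape = float(np.mean(np.abs(err) / np.abs(y)))  # undefined if any y_i = 0
    mae = float(np.mean(np.abs(err)))
    r2 = 1.0 - float(np.sum(err ** 2) / np.sum((y - y.mean()) ** 2))
    return mse, rmse, mape, mae, r2

y = np.array([1.0, 2.0, 3.0, 4.0])       # toy true values
y_hat = np.array([1.1, 1.9, 3.0, 4.2])   # toy predictions
mse, rmse, mape, mae, r2 = regression_metrics(y, y_hat)
```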
Table 4 shows the MSE, RMSE, MAPE, MAE, and R2 calculated from the DBN model.

          x        y        z
MSE    0.0104   0.0017   0.0003
RMSE   0.1021   0.0412   0.0194
MAPE   0.1021   0.0412   0.0193
MAE    0.0824   0.0284   0.0900
R2     0.8701   0.9038   0.9582
The MSE, RMSE, MAPE, and MAE of the position error prediction value of the robot
end effector are all found to be close to 0 using the proposed position error prediction
model for industrial robots. The coefficient of determination (R2 ) of the predicted value
of the position error is shown in Figures 12–14. The figures show that the predicted value
(blue dot) is closely distributed around the real value (red line). In addition, R2 is close to 1,
indicating the correlation between the predicted value and the actual value. The higher the
value, the higher the fitting accuracy. Therefore, the proposed machine learning model has
good adaptability and robustness in the prediction of industrial robot position errors.
The R2 of the robot end precision compensation error is around 0.87–0.95, and the overall effect is good. As the DBN is trained and iterated in three dimensions, the characteristics of the three dimensions interact and couple with each other. The model is also disturbed by nonlinear factors such as the accuracy of data acquisition and environmental conditions, which leads to some differences in the compensation effect among the three directions.
A total of 50 random verification points were selected in the robot motion space, within a measurement space of 240 × 240 × 200 mm3, to verify the effectiveness of the DBN optimization based on the DE algorithm. The distributions of position errors before and after compensation in the x, y, and z directions are shown in Figures 15–17, respectively. The compensation results reveal the following. Before compensation, the errors in the x direction are essentially evenly distributed above and below 0. The errors in the y direction are also distributed around 0, but skew negative. The errors in the z direction are essentially negative. After applying the proposed error compensation technology, in which the DBN is optimized with the DE algorithm, the errors in the three directions are distributed around 0 and fluctuate within about ±0.2 mm, ±0.1 mm, and ±0.05 mm, respectively. The range of fluctuation is extremely small, indicating that the
accuracy after compensation has high stability and that the accuracy of robot operation can
be improved.
Figure 15. Position error on x before and after compensation of the robot.
Figure 16. Position error on y before and after compensation of the robot.
Figure 17. Position error on z before and after compensation of the robot.
Table 5 shows the static statistical analysis results before and after the robot end
position error compensation. The x, y, and z directions are improved by 65.56%, 55.22%,
and 49.12%, respectively.
                        Error Range          Confidence           Percent Improvement
x error (mm)  Before    [−0.674, 0.773]      [−0.500, 0.345]      65.56%
              After     [−0.130, 0.368]      [−0.017, 0.017]
y error (mm)  Before    [−0.559, 0.133]      [−0.225, 0.063]      55.22%
              After     [−0.201, 0.108]      [−0.047, 0.031]
z error (mm)  Before    [−0.162, 0.003]      [−0.115, −0.026]     49.12%
              After     [−0.054, 0.029]      [−0.021, 0.009]
The experimental platform for data acquisition and verification is the light industrial robot KR6_R700 sixx_CR, whose error range is much smaller than that of traditional heavy industrial robots. This makes feature extraction, model training, and optimization with the DBN more difficult. The network is therefore optimized and combined with evidence theory, and the position error mapping model of industrial robots is established. A comprehensive analysis of the compensation effect of the robot end accuracy in the three directions is shown in Figure 18.
The method used in this study is compared with previous methods to verify the test
results [25,27]. The results are shown in Table 6. After off-line compensation, the minimum
value is reduced from 0.097 mm to 0.006 mm. The average value is reduced from 0.110 mm
to 0.083 mm. Therefore, the proposed DE-DBN method was successful in improving the
minimum and average values after the end error compensation of the robot.
5. Conclusions
A compensation algorithm for the absolute positioning accuracy of industrial robots, based on deep belief networks and an off-line compensation method, is proposed. It predicts and compensates for the absolute positioning error of industrial robots using the DBN and the DE algorithm. The number of hidden layers, hidden-layer nodes, learning
rate, momentum factors, RBM iterations, and DBN fine-tuning iterations are optimized.
The position error model of industrial robots is established. Combined with the off-line feed-forward compensation method, the proposed method is verified experimentally using the KR6_R700 sixx_CR industrial robot and the AT901-B laser tracker.
After compensation, the absolute positioning error of the robot end is reduced by
82.14%, from 0.469 mm to 0.084 mm. The absolute positioning accuracy of the industrial
robot is improved. This indicates that the proposed approach is advantageous for performing
more precise operation tasks. The results of this paper can be used to improve the absolute
positioning accuracy of industrial robots, which is of great help in improving the motion
accuracy and force control performance of robots.
The off-line compensation method assumes an experimental environment free of vibration, operation within the robot's allowable temperature range, and the high accuracy of the laser tracker. Future work can further consider industrial robot load, motion
speed, acceleration, ambient temperature, or other factors that affect the absolute posi-
tioning accuracy of the robot. Deep learning can be integrated into the robot’s motion
control system. The training model can be deployed in the control algorithm. Realizing
the intelligent prediction and real-time compensation of robot errors is a direction of great
research value.
Author Contributions: Conceptualization, Y.T. and H.L.; methodology, Y.T.; software, H.L. and
S.C.; validation, J.L.; formal analysis, Q.Q.; investigation, W.X.; resources, H.L.; data curation, S.C.;
writing—original draft preparation, H.L.; writing—review and editing, H.L.; visualization, S.C.;
supervision, W.X.; project administration, Y.T. and W.X.; funding acquisition, Y.T. and W.X. All
authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Ministry of Industry and Information Technology of the People's Republic of China, National Key Research and Development Plan "Intelligent Robot" Projects No. 2022YFB4700402 and No. 2019YFB1310100.
Data Availability Statement: All data have been included in the manuscript.
Acknowledgments: The authors would like to thank all the colleagues who contributed to this
research.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Mubarak, M.F.; Petraite, M. Industry 4.0 Technologies, Digital Trust and Technological Orientation: What Matters in Open
Innovation? Technol. Forecast Soc. Chang. 2020, 161, 120332. [CrossRef]
2. Papakostas, N.; Constantinescu, C.; Mourtzis, D. Novel Industry 4.0 Technologies and Applications. Appl. Sci. 2020, 10, 6498.
[CrossRef]
3. Jaskó, S.; Skrop, A.; Holczinger, T.; Chován, T.; Abonyi, J. Development of Manufacturing Execution Systems in Accordance with
Industry 4.0 Requirements: A Review of Standard- and Ontology-Based Methodologies and Tools. Comput. Ind. 2020, 123, 103300.
[CrossRef]
4. Rosin, F.; Forget, P.; Lamouri, S.; Pellerin, R. Impacts of Industry 4.0 Technologies on Lean Principles. Int. J. Prod. Res. 2019, 58,
1644–1661. [CrossRef]
5. Moosavi, J.; Bakhshi, J.; Martek, I. The Application of Industry 4.0 Technologies in Pandemic Management: Literature Review
and Case Study. Healthc. Anal. 2021, 1, 100008. [CrossRef]
6. Klerkx, L.; Rose, D. Dealing with the Game-Changing Technologies of Agriculture 4.0: How Do We Manage Diversity and
Responsibility in Food System Transition Pathways? Glob. Food Sec. 2020, 24, 100347. [CrossRef]
7. Javaid, M.; Haleem, A.; Khan, I.H.; Suman, R. Understanding the Potential Applications of Artificial Intelligence in Agriculture
Sector. Adv. Agrochem. 2023, 2, 15–30. [CrossRef]
8. Strong, R.; Wynn, J.T.; Lindner, J.R.; Palmer, K. Evaluating Brazilian Agriculturalists’ IoT Smart Agriculture Adoption Barriers:
Understanding Stakeholder Salience Prior to Launching an Innovation. Sensors 2022, 22, 6833. [CrossRef]
9. Ronaghi, M.; Ronaghi, M.H. Investigating the Impact of Economic, Political, and Social Factors on Augmented Reality Technology
Acceptance in Agriculture (Livestock Farming) Sector in a Developing Country. Technol. Soc. 2021, 67, 101739. [CrossRef]
10. Osinga, S.A.; Paudel, D.; Mouzakitis, S.A.; Athanasiadis, I.N. Big Data in Agriculture: Between Opportunity and Solution. Agric.
Syst. 2022, 195, 103298. [CrossRef]
11. Ahn, J.; Briers, G.; Baker, M.; Price, E.; Sohoulande Djebou, D.C.; Strong, R.; Piña, M.; Kibriya, S. Food Security and Agricultural
Challenges in West-African Rural Communities: A Machine Learning Analysis. Int. J. Food Prop. 2022, 25, 827–844. [CrossRef]
12. Zheng, T.; Ardolino, M.; Bacchetti, A.; Perona, M. The Applications of Industry 4.0 Technologies in Manufacturing Context: A
Systematic Literature Review. Int. J. Prod. Res. 2021, 59, 1922–1954. [CrossRef]
13. Frank, A.G.; Dalenogare, L.S.; Ayala, N.F. Industry 4.0 Technologies: Implementation Patterns in Manufacturing Companies. Int.
J. Prod. Econ. 2019, 210, 15–26. [CrossRef]
14. Javaid, M.; Haleem, A.; Singh, R.P.; Suman, R. Substantial Capabilities of Robotics in Enhancing Industry 4.0 Implementation.
Cogn. Robot. 2021, 1, 58–75. [CrossRef]
15. Zhang, T.; Yu, Y.; Yang, L.X.; Xiao, M.; Chen, S.Y. Robot Grinding System Trajectory Compensation Based on Co-Kriging Method
and Constant-Force Control Based on Adaptive Iterative Algorithm. Int. J. Precis. Eng. Manuf. 2020, 21, 1637–1651. [CrossRef]
Article
A Data-Driven Approach Using Enhanced Bayesian-LSTM
Deep Neural Networks for Picks Wear State Recognition
Dong Song 1,2, * and Yuanlong Zhao 3, *
Abstract: Picks are key components for the mechanized excavation of coal by mining machinery,
with their wear state directly influencing the efficiency of the mining equipment. In response to the
difficulty of determining the overall wear state of picks during coal-mining production, a data-driven
wear state identification model for picks has been constructed through the enhanced optimization
of Long Short-Term Memory (LSTM) networks via Bayesian algorithms. Initially, a mechanical
model of pick and coal-rock interaction is established through theoretical analysis, where the stress
characteristic of the pick is analyzed, and the wear mechanism of the pick is preliminarily revealed.
A method is proposed that categorizes the overall wear state of picks into three types based on
the statistical relation of the actual wear amount and the limited wear amount. Subsequently, the
vibration signals of the cutting drum from a bolter miner that contain the wear information of picks
are decomposed and denoised using wavelet packet decomposition, with the standard deviation
of wavelet packet coefficients from decomposed signal nodes selected as the feature signals. These
feature signals are normalized and then used to construct a feature matrix representing the vibration
signals. Finally, this constructed feature matrix and classification labels are fed into the Bayesian-
LSTM network for training, thus resulting in the picks wear state identification model. To validate
the effectiveness of the Bayesian-LSTM deep learning algorithm in identifying the overall picks wear
state of mining machinery, vibration signals from the X, Y, and Z axes of the cutting drum from
a bolter miner at the C coal mine in Shaanxi, China, are collected, effectively processed, and then
input into deep LSTM and Back-Propagation (BP) neural networks, respectively, for comparison. The
results showed that the Bayesian-LSTM network achieved a recognition accuracy of 98.33% for picks
wear state, showing a clear advantage over the LSTM and BP network models, thus providing important
references for the identification of picks wear state based on deep learning algorithms. This method
only requires the processing and analysis of the equipment parameters automatically collected from
bolter miners or other mining equipment, offering the advantages of simplicity, low cost, and high
accuracy, and providing a basis for a proper picks replacement strategy.
Citation: Song, D.; Zhao, Y. A Data-Driven Approach Using Enhanced Bayesian-LSTM Deep Neural Networks for Picks Wear State Recognition. Electronics 2023, 12, 3593. https://doi.org/10.3390/electronics12173593
Academic Editor: Domenico Ursino
1. Introduction
The advancement of technologies such as the Internet of Things (IoT), 5G, Big Data,
Cloud Computing, and Artificial Intelligence (AI) has promoted the integration and innovation
of a new generation of information technology and coal-mining machinery technology,
providing specific technical approaches for the digital transformation and upgrade of
coal-mining equipment. The picks are a key component of coal-mining equipment for
mechanized coal extraction, and their wear state directly affects the efficiency of the mining
equipment. In the process of cutting coal-rock, the picks crush and cut coal-rock under the
action of strong thrust, suffering from severe impact and high stress, and experience intense
friction with the coal wall. There is a strong non-linear coupling effect and friction wear
behavior between the picks and the coal-rock, which can easily lead to the wear failure
of the picks [1]. According to statistics, wear failure accounts for as much as 75–90% of
all failure modes of picks [2]. In actual coal-mining production, due to factors such as
coal-rock characteristics and different picks’ installation angles, the wear degree of the picks
at different positions of the mining equipment’s cutting drum will inevitably vary during
the cutting process. For picks that wear out quickly, if they are not replaced in time, this
will lead to increased wear on the other picks, seriously affecting the cutting efficiency of
mining equipment. Replacing picks immediately requires stopping the operation of mining
equipment, which also impacts work efficiency [3,4]. Coal-mining enterprises currently
rely mainly on manual experience to decide whether to replace picks, and to prevent
work efficiency from being affected by multiple pick replacements, they can only adopt
the strategy of replacing all picks of different wear levels at once, leading to significant
economic waste [5]. Therefore, if the wear state of the mining machinery picks can be
accurately identified, it will not only allow real-time understanding of the wear state of
the picks on the cutting drum, ensuring the efficient operation of mining equipment, but
could also help to propose a scientific picks replacement strategy, significantly reducing
production costs for enterprises.
Many scholars have conducted extensive research on the wear mechanism of picks
and the prediction of pick life. Dewangan [6] used an electron microscope and X-ray
energy dispersive spectroscopy to scan and analyze the images before and after the wear
of the pick, revealing the wear mechanism of the pick and the method of predicting wear
volume. Zhang et al. [7] used PFC3D 5.0 software to simulate the cutting process of the
pick, conducted cutting experiments on different coated picks, calculated the mass loss
before and after cutting, and then predicted and analyzed the life of the pick. Qin et al. [8]
proposed a reliability model for the competitive failure of picks under random load impact
by considering the effects of sustained impact, variable rate acceleration degradation, and
hard failure threshold changes on pick wear. Tian et al. [9] proposed a degradation model
based on the Gamma process to describe the wear and tear of picks on the tunneling
machine, realizing the prediction of the remaining life of picks.
In recent years, with the development of sensor technology, some scholars have used
machine-learning methods to study the identification of the picks wear state. By studying
the features of vibration, acoustic emission, cutting force, power, and current signals during
the cutting process of picks, they have obtained indirect indicators reflecting the wear of
picks, thus achieving the identification of the picks wear state. Zhang et al. [10–12] built
experimental devices, extracted triaxial vibration signals, infrared temperature signals,
and current signals of picks with different wear degrees during the cutting process, con-
structed a multi-feature signal sample database for picks with different degrees of wear,
and established a pick wear degree identification model based on the BP neural network.
Jin et al. [13] used an acoustic emission signal acquisition device to collect signals from
cutting four different proportions of coal-rock specimens, applied three-layer wavelet
packet decomposition and reconstruction technology to process the signals, and used D-S
evidence theory to intelligently identify the degree of picks’ wear.
In summary, regarding the identification of the wear state of picks, existing research
focuses on one hand on using statistical methods to calculate the wear volume of picks and
predict their lifespan, and on the other hand on identifying the wear state of individual
picks based on multi-source information fusion. However, due to the constraints of the
underground application environment in coal mines, less attention is paid to the overall
wear state evaluation of the picks of mining equipment in coal-mine production. Moreover,
in the application of machine-learning methods, the commonly used methods in the exist-
ing research are shallow learning algorithms, including Support Vector Machines (SVM),
Hidden Markov Models (HMM), and BP neural networks. Compared with deep learning
models, traditional machine learning and shallow learning algorithms have obvious dis-
advantages in terms of data-processing capacity, non-linear processing capabilities, and
Electronics 2023, 12, 3593
2. Related Work
Based on the above, in the research of picks wear state recognition, current recognition
methods have only been validated within a small sample range. When the data volume
is too large, computational difficulties arise, so these methods cannot meet the demand for
handling massive data in the recognition of the overall wear state of picks [15–17]. Therefore, the
utilization of deep learning for recognizing the overall wear state of picks presents a sig-
nificant advantage. Currently, deep learning algorithms have begun to be used in areas
like machine tool wear state recognition. For instance, Huang et al. [18] proposed a new
method for tool wear prediction based on a deep convolutional neural network and multi-
domain feature fusion, constructing a high-accuracy tool wear prediction model combining
adaptive feature fusion and automatic continuous prediction. Furthermore, Ma et al. [19]
used milling force signals to establish a tool wear prediction model based on convolutional
bidirectional LSTM networks, achieving highly accurate prediction results. On the basis of
deep learning models, some researchers have attempted to use optimization algorithms to
address the reliance of recognition models on large data samples. Wu et al. [20] optimized
LSTM networks using a particle swarm optimization (PSO) algorithm and applied an
improved polynomial threshold function to denoise tool acceleration vibration signals, thus
achieving tool wear quantity prediction and wear state classification. Due to the picks wear
state information typically being a time series signal, certain researchers have employed a
1D convolutional neural network (CNN) for the feature classification of temporal signals
in related fields. Abdeljaber et al. [21] presented a compact 1D CNN architecture that
integrates feature extraction and classification modules, enabling automatic extraction
of optimal damage-sensitive features directly from raw acceleration signals, utilized for
real-time vibration-induced damage monitoring and localization, with a demonstrated out-
standing performance and an exceptionally high computational efficiency. Yuan et al. [22]
introduced a 1D CNN model for rapid and accurate comprehensive damage assessment
post-earthquake. Their results revealed that the prediction accuracy of the 1D CNN model
is comparable to that of 2D CNN models, yet with an over 90% reduced computation time
and an over 69% resource usage reduction. Abdoli et al. [23] introduced a 1D CNN-based
approach for environmental sound classification that directly captures audio signal patterns
through convolutional layers, achieving an average accuracy of 89% with fewer data than
traditional feature-based methods.
Through comparative analysis of the use of deep learning methods for tool wear state
recognition, it can be observed that most of the recognition models still employ traditional
structures such as CNN and LSTM. Some choose to combine optimization algorithms
like PSO and Genetic Algorithms (GA) to address convergence problems during weight
training. When considering the selection of input parameters, the vast majority of studies
still rely on the research experience of their predecessors, without considering the impact of
different input parameter combinations on the output results [24–27]. Additionally, these
network models yield fixed weight matrices after training, and these weight matrices are
no longer updated. The model cannot allocate different weights based on the change in
inputs, so its generalization ability when faced with different tasks can be significantly
constrained [28]. The LSTM deep learning network, with its unique memory units and
gate mechanisms, is adept at capturing dependencies in time series, offering a distinct
advantage in processing temporal data [29–32]. However, the randomness introduced by
environmental factors and parameter choices might compromise the accuracy of the recog-
nition results [33]. In recent years, several researchers have incorporated Bayesian theory
into LSTM deep learning networks to estimate weights and biases. This approach shifts
the neural network parameter estimation from point estimation to probability distribution,
enabling the network to evaluate the certainty or uncertainty of results. Consequently, this
enriches the deep learning network’s formidable data-fitting capability, further enhancing
its learning precision. Li et al. [34] proposed a method that leverages Bayesian-LSTM
to perform Stochastic Variational Inference (SVI) on process-based hydrological models.
By constructing a residual model, they sought to refine the predictions of uncertainty
in hydrological models. The results demonstrated that this method provided a highly
reliable uncertainty interval. Compared to the Bayesian linear regression model, Bayesian-
LSTM offered superior uncertainty estimation. Yang et al. [35] introduced a HiBayes-LSTM
method containing an FIE component to capture past and future time dependencies. By
collecting large-scale HTRO datasets, they extended the weights of the LSTM network
to a probabilistic model, ensuring uncertainty in the HM direction of the head trajectory
predictions. Experimental outcomes revealed that HiBayes-LSTM notably outperformed
nine other methods in predicting ODIs’ significance.
3. Preliminaries
This section analyzes the wear mechanism of picks by establishing a mechanical model
of the pick and coal-rock mass, and proposes a classification method for the overall wear
state of the picks of mining machinery. At the same time, a method for selecting the
characteristic parameters of the picks wear state is provided.
on the pick tip during the drum movement process. The long-term reciprocating cutting
action causes the material on the pick surface to continuously peel off, thus intensifying
the wear of the pick. Therefore, the friction force on the alloy head is the main cause of the
wear of the pick tip.
According to the classic plane-cutting model of pick [36], it is assumed that the friction
coefficient between the coal-rock mass and the pick is μ, and the relationship between the
surface pressure stress q of the coal-rock mass and its compressive strength u is:
δA = rδφδl (3)
where φ is the fracture angle, l is the length from the tip to a point on the pick body, and
r is the radius of the cross-sectional circle.
According to the relative positional relationship, the semi-axes a and b of the ellipse
can be expressed as follows:
$a = \frac{c}{\cos B_j}, \qquad b = c\sqrt{1 - \tan B_j \tan \theta}$ (4)
where dFf , dFY , and dFN are the frictional force element, cutting force element, and normal
pressure element on the pick surface, respectively.
After integrating Equation (6), the total horizontal force on the conical surface in
interaction between the pick and the coal-rock mass is obtained:
$F_Y = \int \mathrm{d}F_Y = 2tq\,\frac{H(\sin\theta + \mu\cos\theta)}{(\cos\theta - \mu\sin\theta)\sin\theta} \int_0^{2\pi}\!\mathrm{d}\varphi \int_0^{a}\!\mathrm{d}r = 2\pi tq\,c\,\frac{H(\sin\theta + \mu\cos\theta)}{(\cos\theta - \mu\sin\theta)\sin\theta}\cdot\frac{\cos\theta + \cos(\theta + B_j)}{\cos\theta \cos B_j}$ (7)
As can be seen from Equation (7), the cutting resistance of the pick in the rotating
cutting condition is a quadratic function of its cutting thickness, it is directly proportional
to the square of the tensile stress of the coal-rock mass and the ratio of the compressive
strength, and it has a complex trigonometric function relationship with the cutting angle.
The traction resistance of the pick is about (0.5–0.8) FY , and the lateral force is about
(0.1–0.2) FY .
Figure 2. Trend of pick wear. (a) Slight wear, (b) Moderate wear, (c) Severe wear.
From the above analysis, it can be seen that the contact area between the pick and the
coal-rock is the main factor influencing the wear of the pick tip. The larger the contact area
between the pick and the coal-rock, the greater the wear; the contact area between the pick
tip and the coal-rock can thus reflect the wear amount of the
pick. Therefore, without considering the self-rotation ability of the pick during the cutting
process, the wear coefficient η of a single pick can be represented by the following equation:
$\eta = \frac{S}{S_{\mathrm{lim}}} \approx \frac{L}{L_{\mathrm{lim}}}$ (8)
where S is the contact area with the coal-rock, Slim is the limited contact area with the
coal-rock, L is the layer cutting thickness, and Llim is the limited layer cutting thickness.
Based on this, this article proposes to establish an overall wear state coefficient H based on
the wear coefficient η of a single pick and use it to evaluate the overall wear situation of
picks, thereby achieving a method of quickly obtaining the overall wear degree of picks
during coal-mine production.
$H = \sum_{i=1}^{N} S_i \eta_i \Big/ \sum_{i=1}^{N} S_i$ (9)
where Si is the number of picks of the i-th type and ηi is the wear coefficient of a single pick.
According to the field pick replacement experience of engineering cases, before and after
pick replacement, the overall picks wear state can be divided into three levels: slight wear,
moderate wear, and severe wear. The range of the overall wear coefficient H corresponding
to the determined various wear states is shown in Table 1.
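The overall wear coefficient of Equation (9) and the three-level classification can be sketched in Python. This is a minimal illustration; the classification thresholds below are assumptions for demonstration, since Table 1's exact ranges are not reproduced in this excerpt.

```python
import numpy as np

def overall_wear_coefficient(counts, etas):
    """Eq. (9): H = sum(S_i * eta_i) / sum(S_i) over the N pick types."""
    counts = np.asarray(counts, dtype=float)
    etas = np.asarray(etas, dtype=float)
    return float(np.sum(counts * etas) / np.sum(counts))

def wear_state(H, t_slight=0.3, t_moderate=0.5):
    # Thresholds are illustrative assumptions, not Table 1's published ranges
    if H < t_slight:
        return "slight"
    if H < t_moderate:
        return "moderate"
    return "severe"
```

With, say, two pick types (10 picks each) at wear coefficients 0.2 and 0.4, the overall coefficient is their count-weighted average, 0.3.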
signals at each scale, and form a feature parameter group. The principle of extracting pick
wear features with wavelet packets is as follows:
The expression of the wavelet packet function is
$\mu_{j+1,k}^{n}(t) = 2^{\frac{j+1}{2}}\,\mu^{n}(2^{j+1}t - k)$ (10)
where j is the scale parameter, n is the oscillation parameter, k is the translation parameter,
and t is the time variable.
The wavelet packet function satisfies the double scale equation:
$\mu^{2n}(t) = \sqrt{2}\sum_{k\in Z} h(k)\,\mu^{n}(2t - k), \qquad \mu^{2n+1}(t) = \sqrt{2}\sum_{k\in Z} g(k)\,\mu^{n}(2t - k)$ (11)
In the formula, h(k ) is the coefficient of the low-pass filter, g(k) is the coefficient of the
high-pass filter, and {μn (t)}n∈ Z is the orthogonal wavelet packet.
The projection of the original signal x(t) on {μn(t)}n∈Z, that is, the wavelet
packet coefficient, is
$d_{j}^{n}(k) = \int_{-\infty}^{+\infty} x(t)\,\mu_{j+1,k}^{n}(t)\,\mathrm{d}t$ (12)
The algorithm of wavelet packet decomposition is
$d_{j}^{2n}(k) = \sum_{l} h(l - 2k)\,d_{j+1}^{n}(l), \qquad d_{j}^{2n+1}(k) = \sum_{l} g(l - 2k)\,d_{j+1}^{n}(l)$ (13)
This article relies on engineering examples to process the cutting vibration signal,
compares the node wavelet packet coefficients of the signal with other signal feature pa-
rameters, and finally proposes to use the standard deviation of the vibration signal wavelet
packet coefficients as a recognition indicator to identify the overall wear state of picks.
The feature vector definition for the overall wear state identification of picks is:
$T(x(t), j, r) = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left[d_{j,r}(k) - \bar{d}_{j,r}\right]^{2}}$ (14)
where T(x(t), j, r) is the standard deviation of the wavelet packet coefficients of the signal
x(t) at the node (j, r), d_{j,r}(k) is the k-th wavelet packet coefficient of the signal x(t) at the
node (j, r), and $\bar{d}_{j,r}$ is the average of the wavelet packet coefficients of the signal x(t) at the
node (j, r).
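The decomposition of Equation (13) and the node-wise standard deviation feature of Equation (14) can be sketched self-containedly in NumPy. The Haar filter pair is an illustrative choice, since this excerpt does not specify the wavelet basis used in the paper.

```python
import numpy as np

# Haar filter pair as an assumed example: h(k) low-pass, g(k) high-pass
LO = np.array([1.0, 1.0]) / np.sqrt(2.0)
HI = np.array([1.0, -1.0]) / np.sqrt(2.0)

def _split(d):
    """One step of Eq. (13): filter with h and g, then downsample by 2."""
    pairs = d.reshape(-1, 2)          # group samples d(2k), d(2k+1)
    return pairs @ LO, pairs @ HI     # approximation, detail coefficients

def wavelet_packet_std_features(x, levels=3):
    """Standard deviation of wavelet packet coefficients per node, Eq. (14)."""
    x = np.asarray(x, dtype=float)
    assert x.size % (2 ** levels) == 0, "signal length must divide evenly"
    nodes = [x]
    for _ in range(levels):
        nodes = [part for d in nodes for part in _split(d)]
    return np.array([np.std(d) for d in nodes])  # one feature per node (3, r)
```

A three-level decomposition yields eight terminal nodes, matching the eight feature components per vibration axis used later in the feature matrix.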
4. Methods
This section illustrates the structure and characteristics of the LSTM neural network,
proposes a Bayesian-LSTM neural network optimized by the Bayesian algorithm, and also
provides the process of an overall picks wear state recognition model based on wavelet
packet decomposition and Bayesian-LSTM.
data used in this paper are a kind of time series data, which reflect the change trend in
the wear condition of the picks over time. Therefore, this paper chooses to use the LSTM
network for the state recognition of time series.
Within the architecture of an LSTM model, every unit holds a cell, which essentially
acts as its memory store. The manner in which memory units in an LSTM are read and
modified is controlled by three critical components: the input gate, the forget gate, and the
output gate. Typically, sigmoid or tanh functions describe their operations. To illustrate, the
operational process of an LSTM unit proceeds as follows: at each time step, it receives two forms
of external data: the current input and the hidden state of the preceding LSTM unit. Additionally, an
internal input, the state of the memory unit, is also fed to each gate. Following the receipt
of these input data, the gates compute the data from diverse sources, and the outcomes
determine their activation status. The input gate’s input is manipulated via a nonlinear
function, which then amalgamates with the memory unit state that the forget gate has
handled, creating a novel memory unit state. Ultimately, the memory unit state, after
being processed by a nonlinear function and dynamically managed by the output gate,
becomes the LSTM unit’s output. As a result, LSTM networks possess the capacity to retain
long-term dependencies as they can selectively eliminate certain data, maintain beneficial
information, and relay it to the subsequent step via the output gate. The basic structure of
LSTM is shown in Figure 3.
The data transfer within the LSTM neural unit follows these equations:
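A minimal NumPy sketch of the standard LSTM gate equations referenced here follows; the stacked weight layout and gate ordering are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step with input (i), forget (f), and output (o) gates.
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases."""
    Hn = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:Hn])           # input gate
    f = sigmoid(z[Hn:2 * Hn])      # forget gate
    o = sigmoid(z[2 * Hn:3 * Hn])  # output gate
    g = np.tanh(z[3 * Hn:4 * Hn])  # candidate memory content
    c = f * c_prev + i * g         # new memory cell state
    h = o * np.tanh(c)             # new hidden state (the unit's output)
    return h, c
```

The forget gate scales the old cell state while the input gate admits new content, which is how the unit selectively discards or retains information across time steps.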
By Bayes' theorem, $p(\theta|y_0) = p(y_0|\theta)\,p(\theta)/p(y_0)$, where p(θ|y0) is the posterior distribution,
p(θ) is the prior distribution, and p(y0) is the evidence or normalization constant, calculated
as $p(y_0) = \int p(y_0|\theta)\,p(\theta)\,\mathrm{d}\theta$.
The log evidence decomposes as $\ln p(y_0) = L(\Lambda) + \mathrm{KL}\!\left(q(\theta|\Lambda)\,\|\,p(\theta|y_0)\right)$, where
L(Λ) is the Evidence Lower Bound (ELBO). It can be seen that the smaller the KL divergence,
the greater the variational lower bound, indicating that the variational distribution is closer
to the original distribution. Maximizing the evidence lower bound therefore yields the
optimal variational distribution.
The idea of variational inference is to follow the gradient of the variational parameters,
express the gradient as an expected value, and use the Monte Carlo method to estimate this
expectation. An unbiased gradient estimate is obtained by sampling from the variational
distribution, which avoids the analytical calculation of the variational lower bound. The
objective function of variational inference is $L(\Lambda) = E_q\!\left[\ln p(y_0, \theta) - \ln q(\theta|\Lambda)\right]$, where
Eq is the expectation with respect to q(θ|Λ), and p(y0, θ) is the joint distribution of y0 and θ.
If Λ is the free parameter of q(θ|Λ), the gradient of the lower bound of the distribution
can be expressed as:
$\nabla L(\Lambda) = E_{q(\Lambda)}\!\left[\nabla \ln q(\Lambda)\,\ln\frac{p(y_0, \theta)}{q(\Lambda)}\right]$ (24)
According to the Monte Carlo sampling method, the gradient of the variational lower
bound is
$\nabla L(\Lambda) = \frac{1}{N}\sum_{i=1}^{N}\left[\nabla \ln q(\Lambda)\,\ln\frac{p(y_0, \theta)}{q(\Lambda)}\right]$ (25)
Therefore, for stochastic variational inference, the execution process of variational
inference is
$\Lambda_{t+1} = \Lambda_t + \rho_t \frac{1}{N}\sum_{i=1}^{N}\left[\nabla \ln q(\Lambda)\,\ln\frac{p(y_0, \theta)}{q(\Lambda)}\right]$ (26)
where ρt is the learning rate. When the change in the free parameters Λ is less than a
given tolerance, the calculation stops. Based on the inferred network weights and bias
parameters from the posterior distribution, the network can continue to train according to
the LSTM algorithm.
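The score-function Monte Carlo estimate and stochastic update of Equations (24)-(26) can be illustrated on a toy conjugate-Gaussian model. This is an assumed example, not the paper's network: here the likelihood and prior are Gaussian, so the exact posterior is known and the variational mean should converge toward it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model: y0 ~ N(theta, 1), theta ~ N(0, 1), so the exact
# posterior is N(y0/2, 1/2). Variational family: q(theta) = N(mu, sig^2).
y0, sig = 2.0, 0.7

def log_joint(theta):
    # ln p(y0, theta) up to additive constants; constants vanish in the
    # score-function estimator because E[score] = 0
    return -0.5 * (y0 - theta) ** 2 - 0.5 * theta ** 2

def log_q(theta, mu):
    # ln q(theta | mu) up to additive constants
    return -0.5 * ((theta - mu) / sig) ** 2

def elbo_grad_mu(mu, n=1000):
    # Monte Carlo score-function estimate of Eq. (25) for the mean parameter
    theta = rng.normal(mu, sig, size=n)
    score = (theta - mu) / sig ** 2       # gradient of ln q with respect to mu
    return float(np.mean(score * (log_joint(theta) - log_q(theta, mu))))

# Stochastic update of the variational parameter, as in Eq. (26)
mu, rho = -2.0, 0.05
for _ in range(300):
    mu += rho * elbo_grad_mu(mu)
# mu should now sit near the exact posterior mean y0/2 = 1.0
```

The same sampling-based update, applied per weight and bias, is what lets a Bayesian-LSTM treat its parameters as distributions rather than point estimates.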
In summary, the proposed picks wear state recognition model is shown in Figure 4,
and the process of the picks wear state recognition model based on wavelet packet decom-
position and Bayesian optimization of LSTM is as follows:
(1) Use wavelet packet decomposition to decompose the original signal of cutting vibra-
tion, and choose the standard deviation of wavelet packet coefficients as the feature
signal of the neural network;
(2) Establish the parameter seeking model of the LSTM network, and use Bayesian
optimization theory to optimally seek parameters for the initial parameters of the
LSTM network;
(3) Build and initialize the LSTM and fully connected layer network based on the param-
eter seeking result, and set the hyperparameters of the network;
(4) Train the network on the sample training set, and use the trained network to perform
classification testing on the test samples.
5. Engineering Verification
To verify the effectiveness and efficiency of the proposed overall picks wear state recognition
model, extensive engineering experiments on real datasets were conducted against classic
methods under different labeled-data ratios.
the overall wear coefficient H of the picks on drum cutting according to Formula (9), it was
found that if the picks were not replaced for 2 days, the overall wear coefficient could reach
0.3. If extended to more than 3 days, the wear coefficient could reach 0.5. This preliminarily
proved that delaying the replacement of picks can accelerate their wear. Therefore, the
X-, Y-, and Z-directional vibration accelerations of the cutting drum of the bolter miner
were collected in the field immediately after replacing the picks and again 2 and 3 days
later, representing slight, moderate, and severe wear levels, respectively. The vibration sensor
location is shown in Figure 5.
During the field tests, 2 s of X, Y, and Z directional vibration data were recorded every
minute under each working condition, ensuring the data collection was in a stable state.
In total, 100 sets of data were recorded under each working condition; two detection
tests were conducted, yielding 600 sets of characteristic data in all. The collected
Y-directional raw vibration data are shown in Figure 6.
Figure 6. Y-directional raw vibration data. (a) Slight wear, (b) Moderate wear, (c) Severe wear.
(3, 5), (3, 6), and (3, 7). Figure 7 shows the wavelet packet decomposition diagram of the
Y-directional vibration signals of picks with moderate wear.
Figure 7. Wavelet packet decomposition diagram of Y-directional vibration signals for picks with
moderate wear. (a) Coefficients of Packet (3, 0), (b) Coefficients of Packet (3, 1), (c) Coefficients of
Packet (3, 2), (d) Coefficients of Packet (3, 3), (e) Coefficients of Packet (3, 4), (f) Coefficients of Packet
(3, 5), (g) Coefficients of Packet (3, 6), (h) Coefficients of Packet (3, 7).
Table 2. Standard deviations of wavelet packet coefficients for each picks wear state.
$$x_t^{*} = \frac{x_t - x_{\min}}{x_{\max} - x_{\min}} \quad (27)$$
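Equation (27) is standard min-max normalization; a short sketch (ours, for illustration):

```python
import numpy as np

def min_max_normalize(x):
    """Scale features to [0, 1] per Equation (27): x* = (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

features = np.array([3.0, 7.5, 1.5, 9.0])
scaled = min_max_normalize(features)
print(scaled)   # the minimum maps to 0.0 and the maximum to 1.0
```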
Hyperparameter Settings
Hidden layer 6
Learn rate 0.001
Epoch 1000
Sample Num 10
To verify the recognition effect of the Bayesian-LSTM network, a deep LSTM network is
chosen for comparison analysis. In the deep LSTM network, the settings of hyperparameters
such as the learning rate and the number of hidden layers are consistent. The training
results of the two networks are shown in Figure 8.
As shown in Figure 8, as the number of iterations increases, the loss of the Bayesian-LSTM
network decreases very quickly. Compared to the standard LSTM network, it achieves a higher
accuracy at a faster rate, and its accuracy at each measurement point is higher than that of
the standard LSTM. To quantify the accuracy of the two prediction models, a comparison
of their accuracy rates is shown in Figure 9.
As can be seen from the above figures, given a certain set of hyperparameters, the
classification accuracy of the Bayesian-LSTM model is 98.33%, while the LSTM model’s
classification accuracy is 89.16%. The recognition accuracy of the LSTM model is
relatively low because its weight parameters are fixed and have not yet been
optimized. If we use the Adam
algorithm [39] to update and iterate the weights, and use Softmax as the classifier, the final
set of hyperparameters for the optimized LSTM recognition model would be as given in
Table 4 below.
The final confusion matrix of the LSTM network optimized by Adam is shown
in Figure 10.
In order to further verify the accuracy and generalization ability of the Bayesian-LSTM
deep learning network in the recognition of the picks wear state, the obtained results are
compared with the classification results of the optimized LSTM and BP networks. The
comparison results are as shown in Table 5 below.
Hyperparameter Settings
Input size 24
Classification No. 3
Hidden layer 10
Learn rate 0.01
Epoch 500
Dropout 0.1
Optimizer Adam
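For reference, the Adam update rule [39] mentioned above can be sketched as follows (a generic textbook implementation, not the authors' training code; the quadratic test objective is only for illustration):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, parameter step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)          # bias-corrected first moment
    v_hat = v / (1 - b2**t)          # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)   # close to the minimizer 3.0
```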
Table 5. Comparison of the accuracy of overall picks wear recognition under different algorithms.
From the table, it can be seen that the recognition accuracy of optimized LSTM and
Bayesian-LSTM are higher than that of the BP network, proving that deep learning networks
have better accuracy when dealing with nonlinear data. However, on the whole,
the classification accuracy of the LSTM network on small-sample data is not ideal and has
certain limitations. When the Bayesian theory is introduced, the Bayesian-LSTM model
effectively reduces the model overfitting caused by sparse data and noise and provides an
uncertainty quantification for prediction, effectively improving its recognition accuracy.
6. Conclusions
Accurate identification of the overall picks wear state is a core task in achieving in-
telligent upgrades of mining equipment. This study utilized theoretical analysis methods
to research the mechanical model of the interaction between the pick and the coal-rock,
preliminarily revealing the wear mechanism of the cutting picks. Based on this, we pro-
posed a classification judgment method for three types of overall pick wear state. This
study proposed an overall picks wear state recognition method based on Bayesian-LSTM.
Using the vibration signals of the bolter miner’s cutting drum as the basis for recognition,
we used status labels and feature matrices to train the recognition model. The trained
Bayesian-LSTM recognition model can effectively recognize the overall picks wear state.
Compared to deep LSTM and BP, this method has a higher recognition accuracy.
In conclusion, this method only requires processing and analyzing equipment parame-
ters automatically collected by mining machinery such as a bolter miner during its working
process. It has the advantages of being easy to implement, low-cost, and highly accurate,
providing a basis for a correct pick replacement strategy. However, several challenging
issues remain in theoretical and practical research, and future work is recommended
as follows:
(1) It is necessary to research more feature parameters that can reflect the wear state of picks,
such as current data on the cutting motor and pressure signals of the hydraulic cylinder.
(2) It is essential to study further efficient signal processing methods that can reduce the
data disturbance caused by coal-mine scenes and to further improve the accuracy of
picks wear state recognition.
Author Contributions: Conceptualization, D.S. and Y.Z.; writing-original draft preparation, D.S.;
visualization, D.S. and Y.Z.; project administration, D.S.; funding acquisition, D.S. All authors have
read and agreed to the published version of the manuscript.
Funding: This research was funded by the China National Key R&D Program (Grant No. 2020YFB1314000)
and the Research Project Supported by Shanxi Scholarship Council of China (Grant No. 2022-186).
Data Availability Statement: The data used to support the findings of this study are available from
the corresponding author upon request.
Acknowledgments: The partial study was completed at the National Engineering Laboratory for
Coal Mining and Excavation Machinery Equipment, and the author would like to thank the laboratory
for its assistance.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Dogruoz, C.; Bolukbasi, N.; Rostami, J.; Acar, C. An experimental study of cutting performances of worn picks. Rock Mech. Rock Eng.
2016, 49, 213–224. [CrossRef]
2. Holmberg, K.; Kivikytö-Reponen, P.; Härkisaari, P.; Valtonen, K.; Erdemir, A. Global energy consumption due to friction and
wear in the mining industry. Tribol. Int. 2017, 115, 116–139. [CrossRef]
3. Liu, S.; Ji, H.; Liu, X.; Jiang, H. Experimental research on wear of conical pick interacting with coal-rock. Eng. Fail. Anal. 2017,
74, 172–187. [CrossRef]
4. Zhao, L.; He, J.; Hu, J.; Liu, W. Effect of pick arrangement on the load of shearer in the thin coal seam. J. China Coal Soc. 2011,
36, 1401–1406.
5. Krauze, K.; Mucha, K.; Wydro, T.; Pieczora, E. Functional and operational requirements to be fulfilled by conical picks regarding
their wear rate and investment costs. Energies 2021, 14, 3696. [CrossRef]
6. Dewangan, S.; Chattopadhyaya, S. Characterization of wear mechanisms in distorted conical picks after coal cutting.
Rock Mech. Rock Eng. 2016, 49, 225–242. [CrossRef]
7. Zhang, Q.; Fan, Q.; Gao, H.; Wu, Y.; Xu, F. A study on pick cutting properties with full-scale rotary cutting experiments and
numerical simulations. PLoS ONE 2022, 17, e0266872. [CrossRef]
8. Qin, Y.; Zhang, X.; Zeng, J.; Shi, G.; Wu, B. Reliability analysis of mining machinery pick subject to competing failure processes
with continuous shock and changing rate degradation. IEEE Trans. Reliab. 2022, 72, 795–807. [CrossRef]
9. Tian, Y.; Wei, X.; Hao, T.; Jiayao, Z. Study on wear degradation mechanism of roadheader pick. Coal Sci. Technol. 2019, 47, 129–134.
10. Zhang, Q.; Gu, J.; Liu, J.; Liu, Z.; Tian, Y. Pick wear condition identification based on wavelet packet and SOM neural network.
J. China Coal Soc. 2018, 43, 2077–2083.
11. Zhang, Q.; Zhang, X.; Tian, Y.; Liu, Z. Research on recognition of pick cutting wear degree based on LVQ neural network.
Chin. J. Sens. Actuators 2018, 31, 1721–1726.
12. Zhang, Q.; Yu, W.; Wang, C. Research on identification of pick wear degree of road header based on PNN neural network.
Coal Sci. Technol. 2019, 47, 37–44.
13. Jin, L.; Cao, Y.; Qi, Y.; Yu, T.; Gu, J.; Zhang, Q. Identification of pick wear state based on acoustic emission and DS evidence theory.
Coal Sci. Technol. 2020, 48, 120–128.
14. Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
15. Su, H.; Qi, W.; Hu, Y.; Sandoval, J.; Zhang, L.; Schmirander, Y.; Chen, G.; Aliverti, A.; Knoll, A.; Ferrigno, G.; et al. Towards
model-free tool dynamic identification and calibration using multi-layer neural network. Sensors 2019, 19, 3636. [CrossRef]
16. Liu, X.; Jing, W.; Zhou, M.; Li, Y. Multi-scale feature fusion for coal-rock recognition based on completed local binary pattern and
convolution neural network. Entropy 2019, 21, 622. [CrossRef]
17. Achmad, P.; Ryo, F.; Hideki, A. Image based identification of cutting tools in turning-milling machines. J. Jpn. Soc. Precis. Eng.
2019, 85, 159–166.
18. Huang, Z.; Zhu, J.; Lei, J.; Li, X.; Tian, F. Tool wear predicting based on multi-domain feature fusion by deep convolutional neural
network in milling operations. J. Intell. Manuf. 2020, 31, 953–966. [CrossRef]
19. Ma, J.; Luo, D.; Liao, X.; Zhang, Z.; Huang, Y.; Lu, J. Tool wear mechanism and prediction in milling TC18 titanium alloy using
deep learning. Measurement 2021, 173, 108554. [CrossRef]
20. Wu, F.; Nong, H.; Ma, C. Tool wear prediction method based on particle swarm optimization long and short time memory model.
J. Jilin Univ. 2023, 53, 989–997.
21. Abdeljaber, O.; Avci, O.; Kiranyaz, S.; Gabbouj, M.; Inman, D.J. Real-time vibration-based structural damage detection using
one-dimensional convolutional neural networks. J. Sound Vib. 2017, 388, 154–170. [CrossRef]
22. Yuan, X.; Tanksley, D.; Li, L.; Zhang, H.; Chen, G.; Wunsch, D. Faster post-earthquake damage assessment based on 1D
convolutional neural networks. Appl. Sci. 2021, 11, 9844. [CrossRef]
23. Abdoli, S.; Cardinal, P.; Koerich, A. End-to-end environmental sound classification using a 1D convolutional neural network.
Expert Syst. Appl. 2019, 136, 252–263. [CrossRef]
24. Zhu, Q.; Li, H.; Wang, Z.; Chen, J.F.; Wang, B.J.P.S.T. Short-term wind power forecasting based on LSTM. Power Syst. Technol. 2017,
41, 3797–3802.
25. Brili, N.; Ficko, M.; Klančnik, S. Automatic identification of tool wear based on thermography and a convolutional neural network
during the turning process. Sensors 2021, 21, 1917. [CrossRef] [PubMed]
26. Casado-Vara, R.; Martin del Rey, A.; Pérez-Palau, D.; de-la-Fuente-Valentín, L.; Corchado, J.M. Web traffic time series forecasting
using LSTM neural networks with distributed asynchronous training. Mathematics 2021, 9, 421. [CrossRef]
27. Yang, T.; Chen, J.; Deng, H.; Lu, Y. UAV abnormal state detection model based on timestamp slice and multi-separable CNN.
Electronics 2023, 12, 1299. [CrossRef]
28. Bie, F.; Du, T.; Lyu, F.; Pang, M.; Guo, Y. An integrated approach based on improved CEEMDAN and LSTM deep learning neural
network for fault diagnosis of reciprocating pump. IEEE Access 2021, 9, 23301–23310. [CrossRef]
29. Marani, M.; Zeinali, M.; Songmene, V.; Mechefske, C.K. Tool wear prediction in high-speed turning of a steel alloy using long
short-term memory modelling. Measurement 2021, 177, 109329. [CrossRef]
30. Najafi, M.; Jalali, S.M.E.; KhaloKakaie, R.; Forouhandeh, F. Prediction of cavity growth rate during underground coal gasification
using multiple regression analysis. Int. J. Coal Sci. Technol. 2015, 2, 318–324. [CrossRef]
31. Gers, F.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with LSTM. Neural Comput. 2000, 12, 2451–2471.
[CrossRef] [PubMed]
32. Schoot, R.; Depaoli, S.; King, R.; Kramer, B.; Märtens, K.; Tadesse, M.G.; Vannucci, M.; Gelman, A.; Veen, D.; Willemsen, J.; et al.
Bayesian statistics and modelling. Nat. Rev. Methods Primers 2021, 1, 1. [CrossRef]
33. Song, Y.; Zhang, J.; Zhao, X.; Wang, J. An accelerator for semi-supervised classification with granulation selection. Electronics
2023, 12, 2239. [CrossRef]
34. Li, D.; Marshall, L.; Liang, Z.; Sharma, A.; Zhou, Y. Bayesian LSTM with stochastic variational inference for estimating model
uncertainty in process-based hydrological models. Water Resour. Res. 2021, 57, e2021WR029772. [CrossRef]
35. Yang, L.; Xu, M.; Guo, Y.; Deng, X.; Gao, F.; Guan, Z. Hierarchical Bayesian LSTM for head trajectory prediction on omnidirectional
images. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7563–7580. [CrossRef] [PubMed]
36. Evans, I. A theory of the cutting force for point-attack picks. Int. J. Rock Mech. Min. Sci. 1984, 2, 67–71. [CrossRef]
37. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
38. Wu, X.; Marshall, L.; Sharma, A. The influence of data transformations in simulating total suspended solids using Bayesian
inference. Environ. Model. Softw. 2019, 121, 104493. [CrossRef]
39. Kingma, D.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
Article
Improving Question Answering over Knowledge Graphs with a
Chunked Learning Network
Zicheng Zuo 1 , Zhenfang Zhu 1 , Wenqing Wu 2 , Wenling Wang 3 , Jiangtao Qi 1 and Linghui Zhong 1, *
1 School of Information Science and Electrical Engineering, Shandong Jiao Tong University, Jinan 250104, China;
[email protected] (Z.Z.)
2 School of Economic and Management, Nanjing University of Science and Technology, Nanjing 210094, China
3 Chinese Lexicography Research Center, Lu Dong University, Yantai 264025, China
* Correspondence: [email protected]
Abstract: The objective of knowledge graph question answering is to assist users in answering ques-
tions by utilizing the information stored within the graph. Users are not required to comprehend the
underlying data structure. This is a difficult task because, on the one hand, correctly understanding
the semantics of a problem is difficult for machines. On the other hand, the growing knowledge graph
will inevitably lead to information retrieval errors. Specifically, the question-answering task has three
difficulties: word abbreviation, object complement, and entity ambiguity. An object complement
means that different entities share the same predicate, and entity ambiguity means that words have
different meanings in different contexts. To solve these problems, we propose a novel method named
the Chunked Learning Network. It uses different models according to different scenarios to obtain a
vector representation of the topic entity and relation in the question. The answer entity representation
that yields the closest fact triplet, according to a joint distance metric, is returned as the answer. For
sentences with an object complement, we use dependency parsing to construct dependency relation-
ships between words to obtain more accurate vector representations. Experiments demonstrate the
effectiveness of our method.
For example, consider the question, “Which actors have won an Oscar and also starred in a
Christopher Nolan movie?” To answer, the KGQA system would need to reason through
the knowledge graph, identifying entities related to actors, Oscar awards, and Christopher
Nolan movies to find the correct answer.
The accurate understanding of question semantics and the effective filtering of inter-
fering information are essential for successfully answering questions in KGQA. Currently,
commonly used methods rely on semantic parsing [6–9] and information retrieval. The
core concept behind semantic parsing is the conversion of natural language into a sequence
of formal logical forms. Through the bottom-up analysis of logical forms, a logical form
that can express the semantics of the entire problem is obtained, and the corresponding
query sentence is executed against the knowledge graph. This method relies on relatively simple
statistical techniques and depends heavily on data. Most importantly, it cannot map
relationships from natural language phrases onto complex knowledge graphs. In addition,
obtaining answers requires supervised learning: a classifier must be trained to score the
generated logical forms. Training such a powerful semantic parsing classifier requires
a great deal of training data, yet both Freebase [1] and WebQuestions [6] contain
relatively few question-answer pairs. To address this issue,
Zhang et al. [10] proposed a structural information constraint, which applies the structural
information of the question to reinforcement-learning-based path reasoning. Zhen et al. [11]
adopted a complementary approach, integrating a broader information retrieval model
and a highly precise semantic parsing model, eliminating the need for manual template
intervention.
The information retrieval method [12–14] is used to extract entities from the question
and then search for the entities in the knowledge graph to obtain entity-centric subgraphs.
Any node or edge in the subgraph can be a candidate answer. By observing the question
and extracting information according to certain rules or templates, the feature vector of
the question is obtained and a classifier is established. Then, the candidate answers are
filtered by the feature vector of the input question to obtain the final answer [15]. However,
KGQA needs to perform a multi-hop search to obtain the target entity when faced with
missing inference chains. This makes the time and space complexity of the algorithm
grow exponentially.
In addition, the same word can have different meanings in different contexts. We call
this phenomenon entity ambiguity. For example, the meaning of an apple in Cook’s hand
and an apple in Newton’s head are completely different. In daily life, people are used to
using abbreviations instead of full names, such as Newton instead of Isaac Newton. This
causes the algorithm to obtain a narrower entity search space. The diversity of predicates
will produce a broader entity search space. When the same predicate connects different
entities, its representation will be different, which requires the algorithm to be more robust.
We solve the above difficulties in two ways: (1) By embedding entities and relation-
ships into the same vector space as the knowledge graph, we can naturally solve the
problems caused by abbreviations, because similar entities can learn the same vector repre-
sentation. And entities in different contexts will also obtain different vector representations.
(2) Through the application of dependency parsing, a connection is established between
entities and predicates. Following this, we incorporate the semantics of entities into the
predicates, resulting in distinct weights being assigned to the relationships between various
entities. We divide the question into two parts, the entity and the predicate, and then use
different neural network methods to deal with these two parts, so our method is called the
Chunked Learning Network (CLN).
This paper makes the following contributions:
• To address the distinctions in vector representation between entities and predicates, we
employ separate modules for learning entities and predicates when tackling a question;
• By utilizing dependency parsing, we establish connections between entities and
predicates, incorporating entity semantics into predicates to derive distinct weights
for their relationships;
Electronics 2023, 12, 3363
2. Related Work
2.1. Question Answering over Knowledge Graphs
The Austrian linguist Edgar W. Schneider is credited with coining the term “knowl-
edge graph” as early as 1972. In 2012, Google introduced their knowledge graph, which
incorporates DBpedia, Freebase, and other sources. KGQA utilizes triples stored in the
knowledge graph to answer natural language questions. Knowledge graphs usually repre-
sent knowledge in the form of triples. The general format of triples is (head entity, relation,
tail entity), such as (Olympic Winter Games, Host city and the number of sessions, Beijing
24th), where “Olympic Winter Games” is the head entity, “Beijing 24th” is the tail entity,
and “Host city and the number of sessions” is the relationship between the two entities.
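In the simplest case, such a triple store can be indexed so that single-fact questions reduce to a (head, relation) lookup (our own toy sketch; the extra facts are invented examples):

```python
from collections import defaultdict

# Toy knowledge graph: a set of (head, relation, tail) triples
triples = [
    ("Olympic Winter Games", "Host city and the number of sessions", "Beijing 24th"),
    ("Isaac Newton", "field", "Physics"),                      # invented example
    ("Isaac Newton", "notable work", "Principia Mathematica"),  # invented example
]

# Index triples by (head, relation) so single-fact questions become lookups
index = defaultdict(list)
for h, r, t in triples:
    index[(h, r)].append(t)

def answer(head, relation):
    """Return all tail entities t such that (head, relation, t) is in the KG."""
    return index[(head, relation)]

print(answer("Isaac Newton", "field"))   # ['Physics']
```

Real KGQA is harder precisely because the (head, relation) key must first be inferred from free-form text, which is what the CLN's two modules do.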
We use the lowercase letters h, r, and t to represent the head entity, relation, and tail entity,
respectively, and (h, r, t) represents a triple in the knowledge graph. In previous work [16],
transforming a multi-constraint question into a multi-constraint query graph was proposed.
Since these multi-constraint rules require manual design and the rules are not scalable, this
method does not perform well with large-scale knowledge graphs. Bordes A et al. [17]
proposed a system that learns to answer questions using fewer multi-constraint rules to
improve scalability. It uses a low-dimensional space to project the subgraph generated
by the head entity for question answering. Then, it calculates the relevance score and
determines the final answer by sorting. Likewise, so as not to be constrained by manual
design rules, Bordes A et al. [18] developed a model that maps the questions to vector
feature representations. A similarity function is learned during training to score questions
and corresponding triples. The question is scored using all candidate triples at test time,
and the highest-scoring entity is selected as the answer. But the vector representation of the
question adopts a method similar to the bag-of-words model, which ignores the language
order of the question (for example, the expressions of the two questions “who is George
W. Bush’s father?” and “Whose father is George W. Bush?” obtained by this method are
the same, but the meanings of the two questions are obviously different). To focus on the
order of words in the question, Dai Z et al. [19] use a Bidirectional Gate Recurrent Unit [20]
(hereinafter referred to as Bi-GRU) to model the feature representation vector of the sen-
tence and convert a simple single-fact QA question analysis into probabilistic questions.
However, when the knowledge graph is incomplete, it is difficult to find the appropriate
answer through probability. Based on the latest graph representation technology, Sun
H et al. [21] described a method that extracts answers from subgraphs related to questions
and linked texts, and they obtained good results. When the knowledge graph is incom-
plete, this method is effective, but external knowledge is not always obtainable. Recently,
some works [22,23] used knowledge graph embedding to deal with question answering.
With knowledge graph embedding, the potential semantic information can be retained,
and the incompleteness of the knowledge graph can be handled. But the above methods
model the problem and candidate relations separately without considering the word-level
interactions between them, which may lead to local optimal results. Xie et al. [24] used
a convolution-based topic entity extraction model to eliminate the noise problem in the
process of extracting entities. Qiu et al. [25] proposed a global–local attention relationship
detection model, using a local module to learn the features of word-level interactions and a
global module to capture the nonlinear relationship between the question and the candidate
relationship in the knowledge graph. Zhou et al. [26] proposed a deep fusion model based
on knowledge graph embedding, which combines topic detection and predicate matching
in a unified framework, where the model shares multiple parameters for joint training at
the same time.
berne et al. [35] added a reordering step to existing paragraph retrieval methods. When
reordering, a ranking algorithm is used to calculate the question’s score, and syntactic
features are added to the question as weights. Arif et al. [36] used tree kernels (i.e., partial
tree kernels (PTKs), subtree kernels (STKs), and subset tree kernels (SSTKs)) to consider the
syntactic structure between them to solve the answer-reordering problem. Alberto et al. [37]
calculated the similarity between trees based on the number of substructures shared be-
tween two syntactic trees and used this similarity to identify problems related to a new
problem. To enhance downstream dependency analysis, a novel skeleton grammar has been
proposed [38], which effectively represents the high-level structure of intricate problems.
This lightweight formalization, along with a BERT-based parsing algorithm, contributes
to the improvement of the analysis. For question-answering tasks, we construct an improved
dependency matrix that is better suited to concise and structured interrogatives and use it
as the input of the GCN.
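Concretely, such a dependency matrix can be built as a symmetric adjacency matrix with self-loops over the dependency edges (our own sketch, not the authors' code; the hand-written edge list stands in for a real dependency parser's output):

```python
import numpy as np

def dependency_adjacency(n_tokens, edges):
    """Symmetric adjacency matrix with self-loops from (head, dependent) pairs."""
    A = np.eye(n_tokens)                 # self-loop for every token
    for head, dep in edges:
        A[head, dep] = 1.0
        A[dep, head] = 1.0               # treat dependency edges as undirected
    return A

# "who wrote Hamlet": hand-written dependency edges (token indices)
tokens = ["who", "wrote", "Hamlet"]
edges = [(1, 0), (1, 2)]                 # "wrote" governs "who" and "Hamlet"
A = dependency_adjacency(len(tokens), edges)
print(A)
```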
$$f_i = \sigma\left(W_{xf}\, x_i + W_{hf}\, \overrightarrow{h}_{i-1} + b_f\right) \quad (1)$$
$$r_i = \sigma\left(W_{xr}\, x_i + W_{hr}\, \overrightarrow{h}_{i-1} + b_r\right) \quad (2)$$
$$s_i = f_i \odot s_{i-1} + (1 - f_i) \odot W_{hs}\, \overrightarrow{h}_{i-1} \quad (3)$$
$$\overrightarrow{h}_i = r_i \odot g(s_i) + (1 - r_i) \odot x_i \quad (4)$$
where $f_i$, $r_i$, and $s_i$ are the forget gate, reset gate, and internal state, respectively; σ and g(·)
are activation functions, and ⊙ denotes the Hadamard product.
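A single forward step of the recurrence in Equations (1)-(4) can be sketched in NumPy as follows (our own reading of the equations; the weight shapes, random initialization, and choice g = tanh are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                    # hidden size; input size assumed equal

# Randomly initialized parameters, for illustration only
Wxf, Whf, bf = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)
Wxr, Whr, br = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)
Whs = rng.normal(size=(d, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_step(x_i, h_prev, s_prev):
    """One forward step of Equations (1)-(4)."""
    f = sigmoid(Wxf @ x_i + Whf @ h_prev + bf)      # forget gate, Eq. (1)
    r = sigmoid(Wxr @ x_i + Whr @ h_prev + br)      # reset gate, Eq. (2)
    s = f * s_prev + (1.0 - f) * (Whs @ h_prev)     # internal state, Eq. (3)
    h = r * np.tanh(s) + (1.0 - r) * x_i            # output, Eq. (4) with g = tanh
    return h, s

h, s = np.zeros(d), np.zeros(d)
for x in rng.normal(size=(5, d)):        # run over a 5-token input sequence
    h, s = sru_step(x, h, s)
print(h.shape)   # (4,)
```

A backward pass over the reversed sequence with a second parameter set would give the Bi-SRU used in the paper.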
Figure 1. Overview of CLN. The right half shows the components of the model, and the left half is a
simple schematic diagram of the knowledge graph.
By combining the hidden states from both the forward and backward directions, we obtain
the concatenated representation, denoted by $h_i = \left[\overrightarrow{h}_i ; \overleftarrow{h}_i\right]$. The Bi-SRU is
complemented by a convolutional neural network (hereinafter referred to as CNN) module,
which captures nearby contextual information that is in proximity to the entity. Equation
(5) shows the calculation process of the j-th feature map of the l-th layer.
$$c_j^l = \mathrm{ReLU}\left(\sum_{i=1}^{L} c_i^{l-1} \ast k_{ij}^l + b_j^l\right) \quad (5)$$
where $c_i^{l-1}$ represents the i-th input of the (l − 1)-th layer (when l = 1, $c_i^{l-1} = h_i$), the
symbol ∗ represents the convolution operation, $k_{ij}^l$ represents the weight of convolution
kernel j corresponding to the i-th input feature, and $b_j^l$ is the bias of the convolution kernel.
In the network described in this paper, we employ the rectified linear unit (ReLU) to
compute feature maps.
After the convolution operation, we replace the pooling layer with an attention layer
and apply its result to hi . Equations (6) and (7) illustrate this process. The weight and
bias of this layer are denoted by w and b, respectively. In this way, not only can the
information of the entity be extracted, but the contextual information can also be preserved
to a certain extent.
$$\alpha_i = \frac{\exp(c_i)}{\sum_{i=1}^{L} \exp(c_i)} \quad (6)$$
$$e_i = \tanh\left(w_e^T \alpha_i h_i + b_e\right) \quad (7)$$
The result ei is then used as the target vector of the i-th token, and Equation (8) repre-
sents using the average of the target vectors of all tokens as the predictive representation of
the entity.
$$\hat{e}_h = \frac{1}{L} \sum_{i=1}^{L} e_i^T \quad (8)$$
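Equations (6)-(8) amount to softmax attention followed by mean pooling; a NumPy sketch (ours, with random stand-in values, and with w_e read as a projection matrix so that each e_i is a vector — the paper's notation is ambiguous on this point):

```python
import numpy as np

rng = np.random.default_rng(0)
L_, d, d_e = 6, 8, 5                 # tokens, hidden size, embedding size (assumed)
c = rng.normal(size=L_)              # per-token attention logits from the CNN (stand-ins)
h = rng.normal(size=(L_, d))         # Bi-SRU hidden states (stand-ins)
W_e = rng.normal(size=(d_e, d))      # projection standing in for w_e
b_e = np.zeros(d_e)

alpha = np.exp(c) / np.exp(c).sum()                # Eq. (6): softmax weights
e = np.tanh(alpha[:, None] * (h @ W_e.T) + b_e)    # Eq. (7): per-token targets
e_hat = e.mean(axis=0)                             # Eq. (8): mean over tokens
print(e_hat.shape)   # (5,)
```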
$\hat{e}_h$ represents the learned entity vector representation. We independently train this module
so that the vector representations of entities in sentences are as close as possible to the
representations of entities in triples. The head-entity-learning module of the CLN is
depicted in Figure 2.
$$\tilde{g}_i^l = \sum_{j=1}^{L} A_{ij}\, W^l g_j^{l-1} \quad (9)$$
$$g_i^l = \mathrm{ReLU}\left(\frac{\tilde{g}_i^l}{d_i + 1} + b^l\right) \quad (10)$$
where $g_j^{l-1} \in \mathbb{R}^{2d_h}$ is the representation of the j-th token obtained from the previous
GCN layer (when l = 1, $g_j^{l-1} = x_j$), $g_i^l \in \mathbb{R}^{2d_h}$ is the output of the current GCN layer,
$d_i = \sum_{j=1}^{L} A_{ij}$ is the degree of the i-th token in the dependency tree, and $W^l$ and $b^l$ are the
weight and bias matrices in the GCN layers, respectively.
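Equations (9)-(10) can be rendered directly in NumPy (our own sketch; the shapes, random inputs, and the presence of self-loops in A are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
L_, dh = 4, 3                              # token count and half hidden size (assumed)

# Dependency adjacency over a 4-token chain, with self-loops
A = np.eye(L_)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

g_prev = rng.normal(size=(L_, 2 * dh))     # previous-layer token vectors in R^{2 d_h}
W = rng.normal(size=(2 * dh, 2 * dh))
b = np.zeros(2 * dh)

def gcn_layer(A, g_prev, W, b):
    """Eq. (9): neighbour aggregation; Eq. (10): degree normalization + ReLU."""
    g_tilde = A @ g_prev @ W.T             # sum_j A_ij W g_j^{l-1}
    d = A.sum(axis=1, keepdims=True)       # d_i = sum_j A_ij
    return np.maximum(g_tilde / (d + 1.0) + b, 0.0)

g = gcn_layer(A, g_prev, W, b)
print(g.shape)   # (4, 6)
```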
Similar to Equations (6)–(8), we fuse the output of the GCN layer to hi through
the attention mechanism as the representation of a single token, and the mean of all
representations is the vector representation of the relation. Equations (11)–(13) show the
details of the calculation process.
$$\beta_i = \frac{\exp(g_i^l)}{\sum_{i=1}^{L} \exp(g_i^l)} \quad (11)$$
$$p_i = \tanh\left(w_p^T \beta_i h_i + b_p\right) \quad (12)$$
$$\hat{p}_l = \frac{1}{L} \sum_{i=1}^{L} p_i^T \quad (13)$$
The entity and non-entity tokens (HEDentity and HEDnon ) obtained will be passed to the
answer selection module.
$$\min_{(h,l,t)\in C} \;\|p_l - \hat{p}_l\|_2 + \beta_1 \|e_h - \hat{e}_h\|_2 + \beta_2 \|f(e_h, p_l) - \hat{e}_t\|_2 - \beta_3\, \mathrm{sim}\left[n(h), \mathrm{HED}_{entity}\right] - \beta_4\, \mathrm{sim}\left[n(l), \mathrm{HED}_{non}\right] \quad (15)$$
As shown in Equation (15), pl and eh are the relation and entity embeddings in the
knowledge graph, respectively. The sim[ x, y] function measures the similarity between
two strings, and n( x ) returns the name of an entity or predicate. β 1 , β 2 , β 3 , and β 4 are
predefined weights used to balance the contribution of each item.
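Assuming a TransE-style composition f(e_h, p_l) = e_h + p_l, the answer selection behind Equation (15) can be sketched as picking the candidate triple with the smallest joint distance (our own toy example: the embeddings, string-similarity function, and β weights are all stand-ins, not the paper's trained values):

```python
import numpy as np
from difflib import SequenceMatcher

rng = np.random.default_rng(0)
d = 8

# Random stand-in KG embeddings; the "ans" fact is built to be TransE-consistent
entities = {name: rng.normal(size=d) for name in ["A", "B", "C"]}
relations = {name: rng.normal(size=d) for name in ["r1", "r2"]}
entities["ans"] = entities["A"] + relations["r1"]
candidates = [("A", "r1", "ans"), ("B", "r2", "C"), ("A", "r2", "B")]

def sim(x, y):
    """Stand-in for the paper's string-similarity function sim[x, y]."""
    return SequenceMatcher(None, x, y).ratio()

def joint_distance(h, l, t, p_hat, e_hat_h, hed_entity, hed_non,
                   beta=(1.0, 1.0, 0.5, 0.5)):
    """Equation (15) with the assumption f(e_h, p_l) = e_h + p_l."""
    b1, b2, b3, b4 = beta
    dist = np.linalg.norm(relations[l] - p_hat)
    dist += b1 * np.linalg.norm(entities[h] - e_hat_h)
    dist += b2 * np.linalg.norm((e_hat_h + p_hat) - entities[t])
    dist -= b3 * sim(h, hed_entity) + b4 * sim(l, hed_non)
    return dist

# Suppose the learning modules predicted embeddings close to the fact (A, r1, ans)
p_hat = relations["r1"] + 0.01 * rng.normal(size=d)
e_hat_h = entities["A"] + 0.01 * rng.normal(size=d)
best = min(candidates,
           key=lambda c: joint_distance(*c, p_hat, e_hat_h, "A", "r1"))
print(best)
```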
4.1. Datasets
The knowledge graphs and datasets used in the experiments can be downloaded
through public channels.
FB5M [1]: The data in Freebase contain a lot of topics and types of knowledge, includ-
ing information about humans, media, geographical locations, and so on. In our study, we
utilized FB5M, which is among the more expansive subsets of Freebase.
SimpleQuestions [7]: This dataset comprises over 10,000 Freebase-related questions,
with the issues within the dataset being summarized using facts and articles as references.
FB5M was employed as the knowledge graph in our study, and TransE was used for
knowledge graph embedding to learn entity and relation representations. The performance
of the model is measured by the accuracy of finding the ground truth.
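TransE embeds each fact so that e_h + p_r ≈ e_t; a minimal scoring sketch (ours, using hand-built toy embeddings rather than learned ones, with an invented "capital_of" relation acting as a translation):

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility: smaller ||h + r - t|| means a more plausible triple."""
    return np.linalg.norm(h + r - t)

# Hand-built 2D embeddings where "capital_of" is a pure translation (toy assumption)
paris, france = np.array([1.0, 0.0]), np.array([1.0, 1.0])
berlin, germany = np.array([3.0, 0.0]), np.array([3.0, 1.0])
capital_of = np.array([0.0, 1.0])

good = transe_score(paris, capital_of, france)    # consistent translation
bad = transe_score(paris, capital_of, germany)    # implausible triple
print(good, bad)
```

Training learns such embeddings by minimizing this score for true triples against corrupted ones under a margin loss.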
Methods Accuracy
Cfo [19] 0.626
MemNNs [7] 0.639
AMPCNN [40] 0.672
Character-level [41] 0.703
KEQA [23] 0.749
Te-biltm [42] 0.751
CLN (ours) 0.753 (+11.4%)
Note: Since the Freebase API is no longer available, thanks to Huang et al. [23] for re-evaluating the Freebase-API-
based models of Cfo [19] and AMPCNN [40] for new results.
the use of more complex models increases the accuracy of relation learning by 0.4%. Together,
the improvements of these two modules raise the final accuracy. We can see that there
is a statistically significant improvement over the baseline when both CLN modules are
present at the same time.
Figure 4. Result analysis of different modules. The horizontal axis refers to the words in the sentence,
the vertical axis refers to the vector representation of the word, and the right half of the sentence
represents the words that are padded to make the sentences the same length. The relationship-
learning module’s sentence representation with the CLN is represented by pink dots, while the
“Bi-GRU+attention” sentence representation is depicted by blue dots.
We delve into the joint impact of semantic parsing and the GCN using accuracy as an
example. In Figure 5, we can see that in the initial stage of training, the accuracy of the
relation-learning module rises rapidly, thanks to the combined effect of semantic parsing
and the GCN. When the relationship between words in all sentences is constructed, the
change in accuracy is relatively smooth. The same trend can also be seen in the loss of the head-entity-learning module. However, this method is only suitable for relational construction, so we use this feature only in the relation-learning module.
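The graph convolution applied over the word-dependency structure can be sketched with the standard propagation rule of Kipf and Welling [34]. The adjacency matrix, feature sizes, and weights below are illustrative, not the paper's actual configuration.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer (Kipf & Welling [34]):
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the
    word-dependency adjacency matrix and I adds self-loops,
    so each word aggregates its dependency neighbors."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # symmetric normalization
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# 4 words; edges follow a toy dependency parse: 0-1, 1-2, 1-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
H = np.random.default_rng(0).normal(size=(4, 8))   # word vectors
W = np.random.default_rng(1).normal(size=(8, 8))   # layer weights

H_out = gcn_layer(A, H, W)
print(H_out.shape)   # (4, 8): one updated vector per word
```

Stacking such layers lets the weighting over dependency edges shape each word's contextual representation, as described above.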
From the loss curves of the two modules in Figure 6, we can see that the loss of the
head-entity-learning module decreases rapidly at the beginning of training and then tends
to be flat, which indicates that the module has achieved good performance. At the same
time, the relation-learning module loss drops rapidly and remains largely unchanged in the
following periods, indicating that when the relational construction of words in all sentences
is completed, other parts of the model can also support relational learning well.
We integrated the proposed component into two existing models: EmbedKGQA [22] and TransferNet [43].
In EmbedKGQA, we incorporated the results of the relation-learning module into the
inference module. For TransferNet, we introduced the outputs of the head-entity-learning
module and the relation-learning module into step t using an attention mechanism. The
results obtained are shown in Table 3.
Methods                  WebQuestionsSP
EmbedKGQA                66.6
EmbedKGQA + CLN          67.0
TransferNet              71.4
TransferNet + CLN        71.6
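The attention-based injection of the two module outputs into a reasoning step can be sketched as follows. The function and variable names are hypothetical, and this is not TransferNet's actual interface, only an illustration of fusing two auxiliary vectors into a step-t state via attention.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_step(state_t, head_vec, rel_vec):
    """Hypothetical attention fusion: the step-t state attends over
    the head-entity-module and relation-module outputs, and the
    attention-weighted sum is added back into the state."""
    candidates = np.stack([head_vec, rel_vec])   # (2, d)
    weights = softmax(candidates @ state_t)      # attention scores
    context = weights @ candidates               # weighted sum, (d,)
    return state_t + context, weights

d = 6
rng = np.random.default_rng(42)
state, att = fuse_step(rng.normal(size=d), rng.normal(size=d),
                       rng.normal(size=d))
print(att.sum())   # ~1.0: the attention weights are normalized
```

In the real integration, the weights would be learned jointly with the host model rather than computed from raw dot products.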
The diverse architectures and parameter settings of different models can lead to varia-
tions in the performance of the introduced component within each model. A component
that exhibits promising performance in one model may not achieve its optimal effectiveness
when placed in another model. Furthermore, the design and functionality of other com-
ponents within the model can also impact the performance of the introduced component.
If there is a close interaction or dependency between the other components in the model
and the specific component being introduced, placing that component in different models
may yield different effects on its performance. It is worth noting that both our proposed
model and EmbedKGQA leverage knowledge graph embeddings. Thanks to the shared
utilization of knowledge graph embeddings, which enhances the models’ ability to capture
semantic relationships and facilitate reasoning capabilities, the introduced component
exhibits enhanced effectiveness when integrated into our model and EmbedKGQA.
5. Conclusions
We propose a Chunked Learning Network for KGQA in this paper. The objective is to
address the challenge of machines struggling to comprehend the semantic meaning of a
question. The model incorporates the vector representation of entities and predicates into
the question by utilizing the knowledge graph embedding. It employs distinct processing
methods for different word types within the question. Words with similar meanings,
such as word abbreviations, exhibit similar vector representations within the vector space.
Additionally, the graph convolutional neural network assigns varying weights to capture
the dependency relationship between words, thereby enhancing the contextual impact
on each word. The experimental results demonstrate that our method enhances KGQA
accuracy on datasets, and the proposed components indicate a promising direction for
future research. However, our method currently falls short in entity recognition accuracy
and faces challenges in coping with the expanding knowledge graph. To overcome this
challenge, we plan to take into account the dynamic properties of the knowledge graph, as
they are frequently updated in real-world scenarios.
Author Contributions: Writing—original draft, Z.Z. (Zicheng Zuo); Writing—review & editing, Z.Z.
(Zhenfang Zhu), W.W. (Wenqing Wu), W.W. (Wenling Wang), J.Q. and L.Z. All authors have read and
agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The data that support the findings of this study are openly available at https://github.com/ZuoZicheng/CLN, accessed on 18 July 2023.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; Taylor, J. Freebase: A collaboratively created graph database for structuring
human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC,
Canada, 9–12 June 2008; pp. 32–58.
2. Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P.N.; Hellmann, S.; Morsey, M.; Van Kleef, P.; Auer, S. DBpedia—A large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web 2015, 6, 167–195. [CrossRef]
3. Fabian, M.; Gjergji, K.; Gerhard, W. Yago: A core of semantic knowledge unifying WordNet and Wikipedia. In Proceedings of the 16th International World Wide Web Conference, Banff, AB, Canada, 8–12 May 2007; pp. 697–706.
4. Carlson, A.; Betteridge, J.; Kisiel, B.; Settles, B.; Hruschka, E.R.; Mitchell, T.M. Toward an architecture for never-ending language
learning. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, 11–15 July 2010; pp. 317–330.
5. Cyganiak, R. A relational algebra for SPARQL. Digit. Media Syst. Lab. Lab. Bristol 2005, 35, 9.
6. Berant, J.; Chou, A.; Frostig, R.; Liang, P. Semantic parsing on freebase from question-answer pairs. In Proceedings of the 2013
conference on empirical methods in natural language processing, Grand Hyatt Seattle, Seattle, WA, USA, 18–21 October 2013;
pp. 1533–1544.
7. Bordes, A.; Usunier, N.; Chopra, S.; Weston, J. Large-scale simple question answering with memory networks. arXiv 2015,
arXiv:1506.02075.
8. Gomes, J., Jr.; de Mello, R.C.; Ströele, V.; de Souza, J.F. A hereditary attentive template-based approach for complex Knowledge
Base Question Answering systems. Expert Syst. Appl. 2022, 205, 117725. [CrossRef]
9. Sui, Y.; Feng, S.; Zhang, H.; Cao, J.; Hu, L.; Zhu, N. Causality-aware Enhanced Model for Multi-hop Question Answering over
Knowledge Graphs. Knowl.-Based Syst. 2022, 250, 108943. [CrossRef]
10. Zhang, J.; Zhang, L.; Hui, B.; Tian, L. Improving complex knowledge base question answering via structural information learning.
Knowl.-Based Syst. 2022, 242, 108252. [CrossRef]
11. Zhen, S.; Yi, X.; Lin, Z.; Xiao, W.; Su, H.; Liu, Y. An integrated method of semantic parsing and information retrieval for knowledge
base question answering. In Proceedings of the China Conference on Knowledge Graph and Semantic Computing, Online,
25 August 2021; pp. 44–51.
12. Kim, Y.; Bang, S.; Sohn, J.; Kim, H. Question answering method for infrastructure damage information retrieval from textual data
using bidirectional encoder representations from transformers. Autom. Constr. 2022, 134, 104061. [CrossRef]
13. Alsubhi, K.; Jamal, A.; Alhothali, A. Deep learning-based approach for Arabic open domain question answering. PeerJ Comput.
Sci. 2022, 8, e952. [CrossRef]
14. Kim, E.; Yoon, H.; Lee, J.; Kim, M. Accurate and prompt answering framework based on customer reviews and question-answer
pairs. Expert Syst. Appl. 2022, 203, 117405. [CrossRef]
15. Yao, X.; Van Durme, B. Information extraction over structured data: Question answering with Freebase. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA, 23–25 June 2014; pp. 956–966.
16. Eberhard, D.; Voges, E. Digital single sideband detection for interferometric sensors. In Proceedings of the 26th International
Conference on Computational Linguistics, Osaka, Japan, 11–16 December 2016; pp. 2503–2514.
17. Bordes, A.; Chopra, S.; Weston, J. Question answering with subgraph embeddings. arXiv 2014, arXiv:1406.3676.
18. Bordes, A.; Weston, J.; Usunier, N. Open question answering with weakly supervised embedding models. In Proceedings of the
Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Nancy, France, 15–19 September 2014;
pp. 165–180.
19. Dai, Z.; Li, L.; Xu, W. CFO: Conditional focused neural question answering with large-scale knowledge bases. arXiv 2016, arXiv:1606.01994.
20. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations
using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
21. Sun, H.; Dhingra, B.; Zaheer, M.; Mazaitis, K.; Salakhutdinov, R.; Cohen, W.W. Open domain question answering using early fusion of knowledge bases and text. arXiv 2018, arXiv:1809.00782.
22. Saxena, A.; Tripathi, A.; Talukdar, P. Improving multi-hop question answering over knowledge graphs using knowledge base embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 4498–4507.
23. Huang, X.; Zhang, J.; Li, D.; Li, P. Knowledge graph embedding based question answering. In Proceedings of the 12th ACM
International Conference on Web Search and Data Mining, Melbourne, VIC, Australia, 11–15 February 2019; pp. 105–113.
24. Xie, Z.; Zeng, Z.; Zhou, G.; Wang, W. Topic enhanced deep structured semantic models for knowledge base question answering.
Sci. China Inf. Sci. 2017, 60, 1–15. [CrossRef]
25. Qiu, C.; Zhou, G.; Cai, Z.; Sogaard, A. A Global–Local Attentive Relation Detection Model for Knowledge-Based Question
Answering. IEEE Trans. Artif. Intell. 2021, 2, 200–212. [CrossRef]
26. Zhou, G.; Xie, Z.; Yu, Z.; Huang, J.X. DFM: A parameter-shared deep fused model for knowledge base question answering. Inf.
Sci. 2021, 547, 103–118. [CrossRef]
27. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data.
Adv. Neural Inf. Process. Syst. 2013, 26.
28. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the 28th
AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; Volume 28.
29. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex embeddings for simple link prediction. In Proceedings of
the International Conference on Machine Learning, New York City, NY, USA, 19–24 June 2016; pp. 2071–2080.
30. Yang, B.; Yih, W.-T.; He, X.; Gao, J.; Deng, L. Embedding entities and relations for learning and inference in knowledge bases.
arXiv 2014, arXiv:1412.6575.
31. Sun, Z.; Deng, Z.-H.; Nie, J.-Y.; Tang, J. Rotate: Knowledge graph embedding by relational rotation in complex space. arXiv 2019,
arXiv:1902.10197.
32. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings
of the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–29 January 2015; Volume 29.
33. Manning, C.D.; Surdeanu, M.; Bauer, J.; Finkel, J.R.; Bethard, S.; McClosky, D. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Baltimore, MD, USA, 22–27 June 2014; pp. 55–60.
34. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
35. Sun, K.; Zhang, R.; Mensah, S.; Mao, Y.; Liu, X. Aspect-level sentiment analysis via convolution over dependency tree. In
Proceedings of the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019;
pp. 5679–5688.
36. Verberne, S.; Boves, L.W.j.; Oostdijk, N.H.J.; Coppen, P.A.J.M. Using syntactic information for improving why-question answering.
In Proceedings of the 22nd International Conference on Computational Linguistics, Manchester, UK, 18–22 August 2008.
37. Arif, R.; Bashir, M. Question Answer Re-Ranking using Syntactic Relationship. In Proceedings of the 15th International Conference
on Open Source Systems and Technologies, Online, 1–15 December 2021; pp. 1–6.
38. Sun, Y.; Li, P.; Cheng, G.; Qu, Y. Skeleton parsing for complex question answering over knowledge bases. J. Web Semant. 2022, 72,
100698. [CrossRef]
39. Lei, T.; Zhang, Y.; Wang, S.I.; Dai, H.; Artzi, Y. Simple recurrent units for highly parallelizable recurrence. arXiv 2017,
arXiv:1709.02755.
40. Yin, W.; Yu, M.; Xiang, B.; Zhou, B.; Schütze, H. Simple question answering by attentive convolutional neural network. arXiv
2016, arXiv:1606.03391.
41. Golub, D.; He, X. Character-level question answering with attention. arXiv 2016, arXiv:1604.00727.
42. Li, J.; Qu, K.; Li, K.; Chen, Z.; Fang, S.; Yan, J. Knowledge graph question answering based on TE-BiLTM and knowledge graph
embedding. In Proceedings of the 5th International Conference on Innovation in Artificial Intelligence, Xiamen, China, 5–8 March
2021; pp. 164–169.
43. Shi, J.; Cao, S.; Hou, L.; Li, J.; Zhang, H. Transfernet: An effective and transparent framework for multi-hop question answering
over relation graph. arXiv 2021, arXiv:2104.07302.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
[Electronics 2023, 12, 3332 (https://doi.org/10.3390/electronics12153332). The body of this article did not survive text extraction: its embedded fonts were not mapped, leaving only digits, punctuation, equation fragments, and isolated Chinese phrases. The recoverable fragments suggest a sentiment-tendency analysis method for Chinese text containing emojis, built on BERT word embeddings, V_i = BERT_embd(w_i) (Eq. 2); a cosine word-similarity measure, Sim(w_i, w_j) = sum_i V_i V_j / sqrt(sum_i V_i^2 * sum_i V_j^2) (Eq. 3); a sign-based tendency-flip rule over module scores (Eq. 16); a softmax-normalized final score (Eq. 20); and a binary cross-entropy loss, Loss = -(1/N) sum_i [y_i log(p_i) + (1 - y_i) log(1 - p_i)] (Eq. 24). The surviving numbers indicate experiments on a corpus of 114,767 entries, with a best reported accuracy of 98.67%.]
electronics
Article
An Enhancement Method in Few-Shot Scenarios for Intrusion
Detection in Smart Home Environments
Yajun Chen 1 , Junxiang Wang 1, *, Tao Yang 2 , Qinru Li 3 and Nahian Alom Nijhum 4
1 School of Electronic Information Engineering, China West Normal University, Nanchong 637001, China;
[email protected]
2 Education and Information Technology Center, China West Normal University, Nanchong 637001, China;
[email protected]
3 School of Computer Science, China West Normal University, Nanchong 637001, China; [email protected]
4 School of Software Engineering, China West Normal University, Nanchong 637001, China;
[email protected]
* Correspondence: [email protected]
Abstract: Different devices in the smart home environment are subject to different levels of attack.
Devices with lower attack frequencies confront difficulties in collecting attack data, which restricts
the ability to train intrusion detection models. Therefore, this paper presents a novel method called
EM-FEDE (enhancement method based on feature enhancement and data enhancement) to generate
adequate training data for expanding few-shot datasets. Training intrusion detection models with
an expanded dataset can enhance detection performance. Firstly, the EM-FEDE method adaptively
extends the features by analyzing the historical intrusion detection records of smart homes, achieving
format alignment of device data. Secondly, the EM-FEDE method performs data cleaning operations
to reduce noise and redundancy and uses a random sampling mechanism to ensure the diversity of
the few-shot data obtained by sampling. Finally, the processed sampling data is used as the input to the CWGAN, and the loss between the generated and real data is calculated using the Wasserstein distance; based on this loss, the CWGAN is adjusted, and its generator then outputs effectively
generated data. According to the experimental findings, the accuracy of J48, Random Forest, Bagging, PART, KStar, KNN, MLP, and CNN has been enhanced by 21.9%, 6.2%, 19.4%, 9.2%, 6.3%, 7%, 3.4%, and 5.9%, respectively, when compared to the original dataset, along with the optimal generation sample ratio of each algorithm. The experimental findings demonstrate the effectiveness of the EM-FEDE approach in completing sparse data.

Keywords: data enhancement; few-shot data; smart home; generative adversarial networks; intrusion detection

Citation: Chen, Y.; Wang, J.; Yang, T.; Li, Q.; Nijhum, N.A. An Enhancement Method in Few-Shot Scenarios for Intrusion Detection in Smart Home Environments. Electronics 2023, 12, 3304. https://doi.org/10.3390/electronics12153304
control measures to address privacy security concerns. The authors of [4] describe a prov-
able security authentication scheme to ensure the security of the smart home environment.
These studies, which are passive defense mechanisms, can improve the security of smart
homes to some extent, but they do not fully address all the security issues.
Intrusion detection, as a typical representative of active defense, is one of the critical
technologies for safeguarding the security of smart home systems [5]. It overcomes the
limitations of traditional network security techniques in terms of real-time responsiveness
and dynamic adaptability. Monitoring and identifying abnormal behavior in network
traffic enables timely detection and prevention of malicious attacks. Therefore, designing
an efficient intrusion detection model is of paramount importance in ensuring the security
of smart home systems. Traditional machine learning-based intrusion detection algorithms
are relatively straightforward to train, widely adopted, and demonstrate high efficiency and
reliability in practical applications [6,7]. On the other hand, intrusion detection algorithms
based on deep learning exhibit superior detection performance, but their exceptional
performance relies heavily on a significant amount of training data [8–10].
Currently, there is no comprehensive framework for research on smart home secu-
rity, and it still faces several challenges [11,12]. Due to the varying attack frequencies of
different devices in smart homes, there is an imbalance in the collected network traffic data, with some devices yielding a very low proportion of attack data compared to normal data.
The insufficient quantity of data makes it difficult to effectively train intrusion detection
models, resulting in a decline in their performance [13,14]. Therefore, this paper proposes
an enhancement method, EM-FEDE, applied to smart home intrusion detection in few-shot
scenarios. Firstly, the EM-FEDE method analyzes the historical intrusion detection records
of smart homes to determine whether there are features indicative of device types and data
types in the captured data and then adaptively extends the features to achieve format align-
ment of device data. Secondly, the EM-FEDE method performs data cleaning operations
to reduce noise and redundancy by removing duplicate entries and normalizing the data.
Furthermore, the method adjusts the random sampling mechanism to ensure the diversity
of the few-shot data obtained through sampling. Finally, the processed sampling data is
used as input for the CWGAN, a variant of GAN that improves data generation through
modifications in the loss function and optimization algorithms. The Wasserstein distance,
which measures the dissimilarity between two probability distributions, is employed by
CWGAN to calculate the loss between the generated data (fake data) and the real data.
Based on this loss, the CWGAN is adjusted, and the generator of the CWGAN then outputs effectively generated data. The main contributions of this paper are as follows:
• This paper proposes a feature enhancement module to improve the data quality in the
dataset by analyzing historical intrusion detection records of smart homes, adaptively
extending feature columns for the smart home devices dataset, and performing data
cleaning on the dataset;
• This paper proposes a data enhancement module that uses a conditional Wasserstein GAN to generate valid data to populate the dataset, realizing data enhancement for few-shot data;
• The effectiveness of the EM-FEDE method is evaluated using a typical smart home
device dataset, N-BaIoT. The performance of the original dataset and the expanded
dataset using the EM-FEDE method on each intrusion detection model is compared to
conclude that the classifier’s performance is higher for the expanded dataset than the
original dataset;
• The experiments demonstrate that expanding the dataset using the EM-FEDE method
is crucial and effective in improving the performance of attack detection. This work
successfully addresses the problem of few-shot data affecting the performance of
intrusion detection models.
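The Wasserstein distance at the heart of the CWGAN loss can be illustrated in one dimension, where for equal-sized empirical samples it reduces to the mean absolute difference of the sorted samples. This is a sketch of the quantity being compared, not the CWGAN training loop itself; in the GAN, a critic network approximates this distance in high dimensions via the Kantorovich-Rubinstein dual.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical 1-D Wasserstein-1 distance for equal-sized samples:
    the mean absolute difference between the sorted samples. A WGAN
    critic approximates this quantity to measure how far generated
    (fake) data lies from the real data distribution."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert x.shape == y.shape
    return np.abs(x - y).mean()

real = np.array([0.0, 1.0, 2.0, 3.0])
fake_far = real + 5.0    # a badly shifted generator output
fake_near = real + 0.1   # a generator close to the real data

print(wasserstein_1d(real, fake_far))    # 5.0
print(wasserstein_1d(real, fake_near))   # ~0.1
```

Minimizing this distance drives the generator's output distribution toward the real attack-traffic distribution, which is why the expanded dataset can stand in for scarce attack samples.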
2. Related Works
2.1. Intrusion Detection Methods for Smart Homes
Intrusion detection methods for smart homes have gained significant attention in
recent years as a popular research direction in the field of smart homes, and many scholars
have conducted relevant research [15–17]. Many methods utilize sensors and network
communication functions within smart home devices to detect intrusions by monitoring
user behavior, device status, and other relevant data.
In 2021, the authors of [18] proposed an intrusion detection system that uses the recurrent behavior of a bidirectional LSTM to retain learned information and a CNN to extract data features, detecting anomalies in smart home networks. In 2021, the authors
of [19] proposed a two-layer feature processing method for massive data and a three-
layer hybrid architecture composed of binary classifiers in smart home environments to
detect malicious attack environments effectively. In 2022, the authors of [20] proposed
an intelligent two-tier intrusion detection system for the IoT. Using the feature selection
module combined with machine learning, both flow-based and packet-based, it can min-
imize the time cost without affecting the detection accuracy. In 2023, the authors of [21]
proposed an effective and time-saving intrusion detection system using an ML-based in-
tegrated algorithm design model. This model has high accuracy, better time efficiency,
and a lower false alarm rate. In 2023, the authors of [14] proposed a transformer-based
NIDS method for the Internet of Things. This method utilizes a self-attention mechanism
to learn the context embedding of input network features, reducing the negative impact of
heterogeneous features.
Although numerous scholars have obtained comparable results on smart home security, such research was carried out with ample data and did not consider the few-shot predicament caused by the shortage of data from various devices in smart homes. As a result, it is difficult for intrusion detection models to learn the data features, and the models proposed in the research above are unsuitable for situations involving few-shot data.
eration of methods that use a class of classification models to determine the authenticity of
facial images. This method improves cross-domain detection efficiency while maintaining
source-domain accuracy. In 2023, the authors of [26] proposed an attention-self-supervised
learning-aided classifier generative adversarial network algorithm to expand the samples
to improve the defect recognition ability of small sample data sets. In 2023, the authors
of [27] proposed a generative model for generating virtual marker samples by combining
supervised variational automatic encoders with Wasserstein GAN with a gradient penalty.
This model can significantly improve the prediction accuracy of soft sensor models for
small-sample problems.
Although scholars have made many achievements using GAN for data enhancement,
their applications are mainly carried out on images. In network security, there is still a lack
of research on data enhancement using GAN. In addition, the implementation of GANs
for data augmentation in the field of smart home intrusion detection has not been fully
explored, thereby limiting their potential to solve problems in this field.
3. EM-FEDE Method
3.1. Problem Analysis
Figure 1 shows a typical smart home environment. A diverse array of smart home
devices is linked to a gateway, which in turn is connected to the Internet via routers, and
the data collected by these devices is subsequently sent to terminals for user analysis.
1,075,936). Moreover, the number of data points generated by different attack behaviors
also varies based on the attack frequencies of the devices. For instance, Figure 3 shows that
attack1 (a UDP attack by the Gafgyt botnet) and attack2 (a UDP attack by the Mirai botnet)
both utilize vulnerabilities to carry out DDoS attacks. However, attack2 is more effective
and straightforward, resulting in a higher frequency of occurrence. Therefore, attack1 has
far fewer data samples (255,111 vs. 1,229,999).
Through the utilization of authentic datasets, information retrieval, and prior knowl-
edge, the present study presents an account of the operational and safety conditions of
various commonplace smart home devices in Table 1. The tabulation highlights that dis-
tinctive devices within the smart home setting exhibit assorted data throughput and attack
frequencies. Additionally, diverse categories of smart devices are susceptible to differing
attack behaviors, which results in a dissimilar amount of attack-related data. This situation
leads to marked discrepancies in the data collected between various devices and attack
types. For example, in the case of smart light bulbs, detecting and identifying attacks on
these devices effectively is challenging due to the limited amount of attack data available.
This scarcity of data is a result of the relatively low number of attacks that have been
observed on this particular type of device. On the other hand, for smart door lock devices
that experience a high frequency of attacks, more attack data is typically collected. However,
there may still be instances of infrequent attack behaviors of a specific type (such as
DDoS attacks commonly observed on smart cameras). These infrequent attack behaviors
generate only a small amount of attack data, which constitutes a few-shot sample. As the
tally of interconnected smart home devices continues to increase, these disparities become
more prominent. Accordingly, during the process of flow data collection, specific devices
are often unable to generate sufficient attack data, which impairs the efficacy of intrusion
detection models. This limitation ultimately has a bearing on the overall security and
stability of the smart home environment. Therefore, addressing the challenge of few-shot
data resulting from a shortage of attack-related data is a critical research direction in the
field of smart home device network security.
Symbols Description
R: Historical intrusion detection records.
SearchF(x): Determines whether the device class and the data class exist in x; returns 1 if the features are present and 0 otherwise.
Insert(): Insert operation.
Class_Label(x): Obtains the corresponding class from the information in x.
ai: The device class feature column.
bi: The data class feature column.
LabelEncoding(x): Maps values during the numericalization of x.
FE_Duplicate(x): Removes duplicate data in x.
FE_Normalization(x): Normalizes the data in x.
L: 1-Lipschitz function.
Preal: Real data distribution.
Pz: Data distribution of the input noise.
G(z): Fake sample data generated by the generator.
D(x): The probability that the discriminator determines that x belongs to the real data.
Z: Noise vector drawn from the a priori noise distribution Pz.
∏(Preal, Pg): Joint probability distribution of the real data and the generated data.
Fake_data: Generated data with label y_fake.
Figure 5. Common pcap file format.
To address the inability to directly use the raw data for training intrusion detection
models and improve data quality, this study proposes the feature enhancement module
of the EM-FEDE method. It achieves feature enhancement through R analysis, feature-
adaptive expansion, and data cleaning. This process optimizes the data and enables the
identification of missing class features.
The following is the specific process of feature enhancement:
Step 1. The flag LF indicates whether the device class and the data class exist in R, as
determined by Equation (1). If LF = 1, go to Step 3; if LF = 0, go to Step 2.
Step 2. If direct prior knowledge (E) is available regarding the class of device and
data, the device class feature (ai) and data class feature (bi) can be added to R through E.
In the absence of such knowledge, the captured traffic data is analyzed to gather relevant
information. As different attacks take place at different timestamps and distinct source IP
addresses represent unique device characteristics, the timestamp and source IP address are
treated as prior knowledge E. The device class feature (ai) and data class feature (bi) are
then added to R, utilizing E. The specific equations pertaining to this process are illustrated
in (2) and (3).
[ ai, bi ] = Class_Label ( E), (2)
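As a hedged illustration of the steps above, the sketch below derives the two class feature columns from a record's source IP and timestamp and then numericalizes, de-duplicates, and normalizes. The field names and the ip_to_device / time_to_attack mappings are hypothetical stand-ins for the prior knowledge E, not taken from the paper:

```python
# Sketch of the feature enhancement module: add device/data class columns
# from prior knowledge E (here: hypothetical IP and time-slot mappings),
# then label-encode, de-duplicate, and min-max normalize one numeric field.

ip_to_device = {"192.168.1.10": "camera", "192.168.1.11": "door_lock"}  # assumed E
time_to_attack = {0: "benign", 1: "gafgyt_udp", 2: "mirai_udp"}         # assumed E

def class_label(record):
    """Class_Label(E): map (src_ip, time slot) to (ai, bi)."""
    return ip_to_device[record["src_ip"]], time_to_attack[record["slot"]]

def label_encoding(values):
    """LabelEncoding(x): stable mapping of categories to integers."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]

def fe_duplicate(rows):
    """FE_Duplicate(x): drop exact duplicate rows, keeping order."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

def fe_normalization(xs):
    """FE_Normalization(x): min-max scale to [0, 1] (assumes max > min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

records = [
    {"src_ip": "192.168.1.10", "slot": 1, "bytes": 120},
    {"src_ip": "192.168.1.10", "slot": 1, "bytes": 120},   # duplicate row
    {"src_ip": "192.168.1.11", "slot": 2, "bytes": 480},
]
for r in records:
    r["ai"], r["bi"] = class_label(r)          # Equation (2)
records = fe_duplicate(records)
device_codes = label_encoding([r["ai"] for r in records])
scaled = fe_normalization([r["bytes"] for r in records])
print(len(records), device_codes, scaled)      # 2 [0, 1] [0.0, 1.0]
```
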
To quantify the disparity between the real data distribution and the fake data distri-
bution, the EM-FEDE method employs the Wasserstein distance, which is expressed as
Equation (7):
W(Preal, Pg) = inf γ∈∏(Preal, Pg) E(x, y)∼γ ‖x − y‖, (7)
That is, for any joint probability distribution γ whose marginal distributions are Preal
and Pg, a sample pair (x, y) can be drawn from γ, and the Wasserstein distance is the
infimum (greatest lower bound) of the expected distance between x and y over all such
joint distributions.
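For intuition, in one dimension the Wasserstein distance between two equal-size empirical samples reduces to the mean absolute difference of their sorted values. A minimal sketch (not the paper's implementation):

```python
def wasserstein_1d(xs, ys):
    """1-D Wasserstein distance between two equal-size empirical samples:
    with uniform weights, the optimal coupling pairs sorted values."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Non-overlapping samples still yield a finite, meaningful distance,
# unlike the Jensen-Shannon divergence, which saturates in this case.
print(wasserstein_1d([0.0, 1.0, 3.0], [5.0, 6.0, 8.0]))  # 5.0
```
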
The Wasserstein distance is a metric that quantifies the dissimilarity between two
probability distributions. It is smaller when the distributions are more similar. Even if the
two distributions have no overlap, the Wasserstein distance can still be computed, unlike
the Jensen–Shannon divergence, which cannot handle this case. This property has been
leveraged by the EM-FEDE method, which incorporates the Wasserstein distance into the
loss function of the CWGAN. As a result, the neural network structure is improved, and
the objective function is represented by Equation (8):
min G max D∈L E x∼Preal [D(x|y)] − E z∼Pz [D(G(z|y))], (8)
where D∈L, x is the sample from the real data distribution, Preal , and y is the conditional
variable, i.e., the class characteristics of the data.
The following are the main steps of the data enhancement process:
Step 1. The training set in R, after undergoing the feature enhancement process, is
utilized as the training data for the CWGAN. The generator and discriminator, both of
which employ multilayer perceptron models, are defined as two neural network models.
Equation (8) is employed to determine the objective function of the EM-FEDE method;
Step 2. Training the discriminator. The process of training the discriminator is illus-
trated in Figure 8. It involves inputting a set of randomly generated fake_data samples and
real_data samples of sizes n and m, respectively, into the discriminator. The loss values of
both sets of data are computed using Equation (9) and subsequently used to update the
discriminator's parameters:
LD = (1/m) Σi=1..m D(xi|yi) − (1/n) Σi=1..n D(G(zi|yi)), (9)
Step 3. Training the generator. The process of training the generator is illustrated
in Figure 9. The generator is trained by generating a d-dimensional noise vector Z with
label y as input, producing a set of fake_data samples of size n. These fake_data samples,
along with the real_data samples, are then input into the discriminator. The loss value for
this set of fake_data is computed using Equation (10), and the generator's parameters are
updated accordingly:
LG = −(1/n) Σi=1..n D(G(zi|yi)). (10)
Step 4. Steps 2 and 3 are iterated until the predetermined number of iterations is
reached or the loss converges. The trained generator then produces a new set of fake_data,
and R = [R ∪ fake_data] is returned.
Following the process of data enhancement, the imbalanced original dataset is enriched
with fake data, effectively ensuring a more even distribution of data across all classes within
the dataset.
In the EM-FEDE method, the computational cost of feature enhancement is negligible,
so its computational complexity depends mainly on the CWGAN part of the data enhance-
ment module. For the EM-FEDE method, the gradients of the generator and discriminator
need to be computed and updated. In each epoch, O(|gω| + |gθ|) floating-point operations
are required (where gω is the gradient of the generator and gθ is the gradient of the
discriminator), and thus its overall complexity is O(|R|·(|gω| + |gθ|)·Ne) (where |R| is
the training dataset and Ne is the total number of training times). Algorithm 1 gives the
detailed algorithmic flow of the EM-FEDE method.
Algorithm 1: EM-FEDE
Input: α = 0.0005, the learning rate; n = 50, the batch size; c = 0.01, the clipping parameter; ω0 ,
initial discriminator parameters; θ0 , initial generator parameters; Ne = 1000, the training cycles.
Output: Expanded R
Process:
1. Calculate LF by Equation (1)
2. If LF = 0
3. Add feature columns that are helpful for classification to R through Equations (2)–(4)
4. Numerization, de-duplication, and normalization by Equations (5)–(7)
5. Divide the processed R into training sets and test sets
6. End if
7. While θ has not converged or epoch < Ne do
8. epoch++
9. Sample m noise samples {z1 , . . ., zm } ∼ PZ , a batch of prior data
10. Sample m examples {(x1 , y1 ), . . ., (xm , ym )} ∼ Preal , a batch from the real data
11. Update the discriminator D by ascending its stochastic gradient (gω )
12. gω = ∇ω [ (1/m) Σi=1..m fω(xi|yi) − (1/m) Σi=1..m fω(gθ(zi|yi)) ]
13. ω = ω + α ∗ RMSProp(ω, gω )
14. ω = clip(ω, −c, c)
15. Sample of m noise samples{z1 , . . ., zm } ~ PZ a batch of prior data.
16. Update the generator G by ascending its stochastic gradient (gθ )
17. gθ = −∇θ (1/m) Σi=1..m fω(gθ(zi|yi))
18. θ = θ − α ∗ RMSProp(θ, gθ )
19. End while
20. Generate sample data for each class through the generator to populate R
21. Train the expanded R on different classifiers to obtain various evaluation indicators
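The discriminator update in lines 11 to 14 combines an RMSProp step with weight clipping. A hedged scalar sketch of those two operations follows; the decay constant rho and stability constant eps are RMSProp's usual defaults, not values from the paper, and the gradients are stand-ins:

```python
class RMSProp:
    """Minimal scalar RMSProp accumulator: s <- rho*s + (1-rho)*g^2,
    step = g / sqrt(s + eps). The caller applies the learning rate."""
    def __init__(self, rho=0.9, eps=1e-8):
        self.rho, self.eps, self.s = rho, eps, 0.0

    def step(self, g):
        self.s = self.rho * self.s + (1 - self.rho) * g * g
        return g / (self.s + self.eps) ** 0.5

def clip(w, c):
    """Weight clipping to [-c, c], enforcing the Lipschitz constraint."""
    return max(-c, min(c, w))

alpha, c = 0.0005, 0.01      # learning rate and clipping parameter from Algorithm 1
opt = RMSProp()
w = 0.0
for g in [1.0, 1.0, 1.0]:    # stand-in gradients; real ones come from line 12
    w = clip(w + alpha * opt.step(g), c)   # ascend, then clip (lines 13-14)
print(w)                     # small positive weight, still inside [-c, c]
```

The clipping after every update is what keeps the critic within the 1-Lipschitz function class L that the Wasserstein formulation requires.
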
4. Results
4.1. N-BaIoT Dataset Description
The N-BaIoT dataset, released in 2018, consists of network traffic samples extracted
from nine real IoT devices, featuring normal traffic from these devices and five varieties
of attack traffic from the Gafgyt and Mirai botnet families. Figures 10 and 11 illustrate the
differences in the data distribution across different traffic types and devices.
The N-BaIoT dataset comprises extracted functions derived from raw IoT network
traffic information. Upon receipt of each packet, a synopsis of the protocol and the host’s
behavior is computed with respect to the transmission of each packet. The contextual
information of the data packet is then represented by a set of statistical features that are
generated whenever a data packet arrives. Specifically, the arrival of each data packet
leads to the extraction of 23 statistical features from five distinct time windows, namely,
100 ms, 500 ms, 1.5 s, 10 s, and 1 min. These five 23-dimensional vectors are subsequently
concatenated into a single 115-dimensional vector.
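The windowed feature assembly described above can be sketched as follows; the extract_stats helper and its two populated statistics are hypothetical stand-ins for the 23 per-window statistics N-BaIoT actually computes:

```python
# Sketch of N-BaIoT-style feature assembly: 23 statistics per time window,
# over five windows (100 ms, 500 ms, 1.5 s, 10 s, 1 min), concatenated
# into one 115-dimensional vector per packet.

WINDOWS_S = [0.1, 0.5, 1.5, 10.0, 60.0]

def extract_stats(packets, window_s):
    """Hypothetical stand-in: 23 statistics for one time window."""
    recent = [p for p in packets if p["age_s"] <= window_s]
    n = len(recent)
    mean = sum(p["size"] for p in recent) / n if n else 0.0
    # Pad to the 23 features computed per window (placeholder zeros).
    return [float(n), mean] + [0.0] * 21

def packet_feature_vector(packets):
    feats = []
    for w in WINDOWS_S:
        feats.extend(extract_stats(packets, w))   # 5 windows x 23 stats
    return feats

packets = [{"age_s": 0.05, "size": 60}, {"age_s": 5.0, "size": 1500}]
vec = packet_feature_vector(packets)
print(len(vec))  # 115
```
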
The N-BaIoT dataset has been obtained in a real-world IoT setting, thus ensuring a
high level of authenticity and representativeness. It serves as a standardized dataset that
can be used by researchers to evaluate and enhance the efficacy of intrusion detection
systems for IoT devices.
xi = (xi − xmin )/(xmax − xmin ), (11)
where xi is the current feature, xmin is the minimum eigenvalue in the same dimension, and
xmax is the maximum eigenvalue in the same dimension.
To replicate the scarcity of data in smart home devices in the real world and to
guarantee that the dataset gathered from sampling includes samples from all categories,
this research employs stratified sampling. This method ensures that each sample has
an equal opportunity to be selected while maintaining the randomness of the samples.
Ultimately, 2860 data samples were randomly chosen from the dataset as representative
examples. The training and test data were then divided in a 7:3 ratio, and the sample
distribution of the training and test sets can be found in Table 3.
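Stratified sampling as described above, drawing from each class in proportion while keeping within-class selection random, can be sketched as follows (class labels and sizes are illustrative, not the paper's):

```python
import random

def stratified_sample(rows, label_of, fraction, seed=0):
    """Draw `fraction` of each class uniformly at random, so every class
    appears in the sample and selection within a class stays random."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(label_of(r), []).append(r)
    sample = []
    for members in by_class.values():
        k = max(1, round(fraction * len(members)))   # keep rare classes
        sample.extend(rng.sample(members, k))
    return sample

rows = [("benign", i) for i in range(100)] + [("mirai_udp", i) for i in range(10)]
sample = stratified_sample(rows, label_of=lambda r: r[0], fraction=0.3)
counts = {}
for label, _ in sample:
    counts[label] = counts.get(label, 0) + 1
print(counts)  # {'benign': 30, 'mirai_udp': 3}
```
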
Category Parameters
CPU Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20 GHz
RAM 64 GB
Programming Tools Jupyter Notebook
Programming Languages Python 3.8
Deep Learning Framework PyTorch 1.8
Machine Learning Platform Weka 3.9
Data Processing Libraries NumPy, pandas, etc.
Accuracy = (TN + TP)/(TN + FP + FN + TP), (12)
Precision = TP/(FP + TP), (13)
Recall = TP/(FN + TP), (14)
F1 Score = 2 × (Precision × Recall)/(Precision + Recall), (15)
where TP indicates the number of true positives; TN indicates the number of true negatives
in the sample; FN indicates the number of false negatives; and FP indicates the number of
false positives.
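Equations (12) through (15) translate directly into code; a small sketch with illustrative confusion counts:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from the confusion counts,
    following Equations (12)-(15)."""
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    precision = tp / (fp + tp)   # undefined (NaN, '?') when fp + tp == 0
    recall = tp / (fn + tp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(tp=8, tn=5, fp=2, fn=1)
print(round(acc, 4), round(prec, 4), round(rec, 4), round(f1, 4))
```

The `fp + tp == 0` case is exactly the scenario noted in Table 8, where a classifier assigns no samples to a class and precision becomes undefined.
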
To test the effectiveness of the fake samples, the intrusion detection classifier was
trained on both the original training set and the training set enhanced by the EM-FEDE
method. Subsequently, the enhancement effect of the EM-FEDE method was evaluated by
assessing the comprehensive classification performance of the intrusion detection classifier
using the test set. Multiple sets of data were generated for experiments, each with different
ratios of fake samples, as documented in Table 7.
Dataset (Generated Sample Ratios) Number of Fake Samples Number of Samples after Expansion
x (Original sample size) 0 2002
2x 2004 4006 (2002 + 2004)
3x 4006 6008 (2002 + 4006)
4x 6118 8120 (2002 + 6118)
5x 8010 10,012 (2002 + 8010)
6x 10,012 12,014 (2002 + 10,012)
7x 12,014 14,016 (2002 + 12,014)
8x 14,016 16,018 (2002 + 14,016)
9x 16,018 18,020 (2002 + 16,018)
10x 18,009 20,011 (2002 + 18,009)
The distributions of the original data training set and the expanded training set are
shown in Figures 12 and 13, respectively, using a generated sample ratio of 5x as an example.
Figure 13. Sample distribution of training set data with a generated sample ratio of 5x.
Table 8. Comparison of multi-classification results between the original dataset of size x and the
mixed dataset with a generated sample ratio of 5x (the precision and F1 Score of some algorithms
are unknown (?), which is due to the presence of NaN values in the calculation of precision, i.e., a
denominator of 0). This scenario can occur when the algorithm fails to classify any sample into a
particular class or when it misclassifies all samples in that class.
The evaluation of the experiments was carried out using various classification algo-
rithms, including KNN, KStar, Bagging, PART, J48, Random Forest, MLP, and CNN. The
evaluated results for datasets enhanced with different generated sample ratios are shown
in Figure 14.
As shown in Figure 14, the utilization of the EM-FEDE method has improved the
accuracy of various classification algorithms. This improvement was observed when
expanding the dataset compared to the original dataset. Additionally, the optimal sample
ratio for achieving the best performance varies across different classification algorithms.
With an increase in the number of generated samples, the accuracy of each classification
algorithm gradually increases. The accuracy of J48 has increased from 62.39% (x) to 80.43%
(10x), RF has increased from 75.55% (x) to 81.73% (6x), PART has increased from 67.28%
(x) to 76.49% (5x), MLP has increased from 81.09% (x) to 84.45% (10x), CNN has increased
from 71.18% (x) to 77.05% (10x), KNN has increased from 76.8% (x) to 83.31% (4x), KStar
has increased from 78.9% (x) to 85.24% (4x), and Bagging has increased from 65.5% (x) to
84.86% (2x).
However, when the generated sample ratio becomes too large, the accuracy of some
classification algorithms slightly decreases compared to a smaller generated sample ratio.
The accuracy of RF decreases from 81.73% (6x) to 76.44% (10x), PART decreases from 76.49%
(5x) to 65.91% (10x), KNN decreases from 83.31% (4x) to 81.32% (10x), KStar decreases from
85.24% (4x) to 81.94% (10x), and Bagging decreases from 84.86% (2x) to 73.98% (10x).
The accuracy of several classification algorithms such as RF, PART, KNN, KStar, and
Bagging initially improves as the number of generated samples increases until they reach
their optimal generated sample ratio, after which the accuracy decreases. This trend occurs
due to the presence of fake data, which can negatively affect the quality of the data. The
generator model aims to approximate the distribution of real data as closely as possible, but
if the quantity of fake data becomes too large, the generator model can experience mode
collapse. This phenomenon indicates that the fake data becomes excessively similar, and
increasing the data further no longer improves the classifier’s performance. Instead, it can
lead to a decrease in classification accuracy due to noise in the fake data.
In contrast, J48, MLP, and CNN exhibit a gradual increase in accuracy. J48, a machine
learning classifier based on feature partitioning, is typically sensitive to diversity and
complexity. MLP and CNN, as deep learning classifiers, possess stronger representational
and generalization capabilities. An increase in fake data leads to an increase in the training
data for classifiers. This increase provides more opportunities for the classifiers to learn
from different data distributions and features, leading to more complex and deeper levels
of feature representation. Consequently, the classifiers’ accuracy improves.
The variability in the best generated sample ratios is evident across different algo-
rithms, as illustrated in Figure 14. Table 9 presents the accuracy at these best ratios, in
contrast to the original dataset, for various algorithms. The accuracy of J48, Random Forest, Bagging,
PART, KStar, KNN, MLP, and CNN improved by 21.9%, 6.2%, 19.4%, 9.2%, 6.3%, 7%, 3.4%,
and 5.9%, respectively. It is worth noting that the extended dataset demonstrated an overall
higher accuracy in comparison to the original dataset when scaled to the best generated
sample ratio of each algorithm.
Table 9. The accuracy of multi-classification is compared between the original data set and the mixed
data set with the optimal generation sample ratio of each algorithm.
SMOTE [29] is an oversampling method that generates new samples to expand the
dataset based on the relationship between samples, and CGAN [30] is an extension of
GAN for conditional sample generation. This part of the experiment examined the impact
of different generated sample ratios on accuracy in J48 and Bagging for mixed datasets
created using SMOTE, CGAN, and the proposed method. Additionally, we compared it
with the same number of datasets containing only real data to prove the effectiveness of the
proposed method in this paper. The experimental results are shown in Figures 15 and 16.
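SMOTE's core step, interpolating between a minority-class sample and one of its nearest neighbors, can be sketched as follows (pure Python, nearest neighbor only for brevity; a simplified sketch, not the reference implementation of [29]):

```python
import random

def smote_sample(minority, rng):
    """Synthesize one new minority sample: pick a sample, find its nearest
    neighbor, and interpolate a random point on the segment between them."""
    x = rng.choice(minority)
    neighbor = min((p for p in minority if p is not x),
                   key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x)))
    t = rng.random()   # interpolation factor in [0, 1)
    return tuple(a + t * (b - a) for a, b in zip(x, neighbor))

rng = random.Random(42)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
new = smote_sample(minority, rng)
print(new)  # a point on a segment between two minority samples
```

Because every synthetic point lies on a segment between existing minority samples, SMOTE can increase overlap between classes, which is the limitation discussed below.
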
Based on the results presented in Figure 15, for our method, it is evident that the
accuracy of the enhanced dataset, which combines fake data and real data at the generated
sample ratios of nx (n = 2, 3, . . ., 10), is superior to that of an equivalent number of
instances from the original dataset. At lower generated sample ratios, the mixed
dataset exhibits significantly improved accuracy on J48 in comparison to an equivalent
number of instances from the real dataset. As the generated sample ratio increases, the
accuracy of the mixed dataset on J48 exhibits fluctuation, albeit within a small range, and
ultimately reaches a plateau. Although the mixed dataset continues to outperform the real
dataset in terms of accuracy on J48, its advantage diminishes as the generated sample ratio
becomes larger.
Regarding the CWGAN, at lower generated sample ratios nx (n = 2, 3, 4), the accuracy
of the mixed dataset in J48 slightly improves compared to the same number of instances of
the real dataset. However, for generated sample ratios of nx (n = 4, . . ., 10), the accuracy of
the mixed dataset at J48 is lower than that of the equivalent number of real datasets, and
the performance of the real dataset is significantly better than that of the mixed dataset as
the generated sample ratio increases.
Regarding the SMOTE, at the generated sample ratio of nx (n = 2, . . ., 7), the accuracy
of the mixed dataset in J48 is significantly higher than that of the real dataset with the same
number of samples. At the generated sample ratio of nx (n = 8, 9, 10), the accuracy of the
hybrid dataset starts to decrease and is lower than the equivalent number of real datasets.
Based on the experimental results, we can conclude that the J48 algorithm has more
capacity for learning the supplementary feature information that is provided by the ex-
panded dataset. This attribute of the algorithm contributes to an improved understanding
of the dataset’s traits and patterns, thereby leading to an enhancement of the classifier’s
performance. In addition to this, the introduction of a small quantity of artificial data
has been observed to have a beneficial effect on the model’s ability to generalize, and it
can also serve to mitigate the effects of overfitting and noisy data. However, it should be
noted that there is a threshold beyond which the quantity of artificially generated data
becomes sufficient, and further increments of such data do not yield any improvement in
the accuracy of the intrusion detection model.
The results in Figure 16 show that the accuracy of the mixed dataset with generated
sample ratio nx (n = 2, . . ., 6) on the Bagging algorithm is better than that of the corre-
sponding number of real datasets in the method of this paper. However, for the generated
sample ratio nx (n = 7, . . ., 10), the accuracy of the mixed dataset is lower than that of
the corresponding number of real datasets. The experimental results reveal that the op-
timal generated sample rate for the Bagging algorithm using the method in this paper is
2x. Moreover, the accuracy of Bagging decreases and stabilizes as the generated sample
rate increases.
Regarding the CWGAN, the accuracy of the mixed dataset is higher than the same
number of instances of the real dataset for the generation sample rate nx (n = 2, 3, 5).
However, for the generating sample ratio of nx (n = 4, 6, . . ., 10), the accuracy of the mixed
dataset is lower than the accuracy of the same number of real datasets. The results indicate
that the best generated sample ratio for the Bagging algorithm using the CWGAN is 3x.
Regarding the SMOTE, the accuracy of the mixed dataset is higher for the generation
sample ratio nx (n = 2, . . ., 6) compared to the same number of instances of the real dataset.
For the generation sample ratio nx (n = 7, . . ., 10), the accuracy of the mixed dataset is
lower than the accuracy of the same number of instances of the real dataset. From the
experimental results, it can be concluded that the optimal generation sample ratio for
Bagging on SMOTE is 4x.
When the generated sample ratio nx (n = 7, . . ., 10) is too large, the accuracy of both
the methods in this paper, CWGAN and SMOTE on Bagging, is lower than the equivalent
number of real datasets. Despite the decrease in accuracy, the accuracy of this paper’s
method and SMOTE is still higher than that of the original dataset x. By comparing this
paper’s method, CWGAN, and SMOTE, it can be concluded that this paper’s method
exhibited better performance.
Based on our experimental results, we can conclude that utilizing fake data for data
enhancement can significantly enhance the accuracy of the classifier, particularly when
the expansion multiplier is small. However, the introduction of fake data may result in
noise, and its proportion increases with the expansion multiplier. This difference between
real and fake data can make it challenging to provide sufficient useful feature information,
which can, in turn, impede the ability of the model to learn the data features. Ultimately,
this can lead to a reduction in the accuracy of the classifier.
The SMOTE algorithm analyzes the minority class samples and manually synthesizes
new samples to add to the dataset based on the minority class samples. This technique
of generating new samples through oversampling helps prevent overfitting. However, it
may generate the same number of new samples for each minority class sample, resulting
in increased overlap between classes and the creation of samples that do not offer useful
information. The CGAN method improves the data generation process by incorporating
additional information to guide the model. However, the training process of CGAN is
not very stable, and the quality of the generated data can vary. In contrast, the EM-
FEDE method proposed in this paper uses the CWGAN approach to generate data with
greater diversity. It also provides more informative samples and is more stable during
training, resulting in higher-quality generated data compared to CGAN. To summarize,
the effectiveness of the EM-FEDE method has been demonstrated, making it suitable for
training datasets for intrusion detection models. However, it is crucial to consider that the
optimal generated sample ratio may differ based on the particular algorithm and model in
use. To attain the highest level of accuracy and performance for a given intrusion detection
algorithm or model, it is essential to undertake a meticulous evaluation and selection of
the most fitting generated sample ratio. This selection and evaluation process is crucial to
guaranteeing optimal outcomes.
5. Discussion
The present article discusses the issue of few-shot data on smart home devices and
the challenges this poses for intrusion detection models. Specifically, the study highlights
how the security dataset collected from traffic information often lacks data, which limits
the performance of intrusion detection models. To address this issue, the article proposes a
method called EM-FEDE, which enhances the dataset and effectively mitigates the impact
of few-shot data on intrusion detection performance, improving security in smart home
environments. The study evaluates the performance of datasets enhanced with different
generated sample ratios and analyzes the effect of using enhanced datasets for intrusion
detection model training. Furthermore, the article examines the influence of different
generated sample ratios on classification performance for specific classification algorithms.
The results indicate that the optimal generated sample ratio may vary depending on the
algorithm and model used. Based on the obtained results, it can be concluded that the
proposed method shows promising performance in solving few-shot data. In addition to
intrusion detection, it can be applied to different domains, such as sentiment analysis tasks
where the samples of various sentiment categories are highly imbalanced and underwater
target recognition tasks where the samples are too small to train an effective model.
In this paper, the specific details regarding the optimal expansion multiplier and the
ratio of generated data to real data for various classification algorithms are not extensively
explored. Thus, future studies will focus on optimizing the intrusion detection model by
selecting more suitable classification algorithms to enhance detection accuracy. Addition-
ally, further research will be conducted to determine the appropriate enhancement factors
and ratios between generated and real data during the data enhancement process.
Author Contributions: Conceptualization, T.Y. and J.W.; methodology, Y.C. and J.W.; software, Y.C.,
T.Y. and J.W.; validation, J.W., T.Y. and Y.C.; formal analysis, J.W.; investigation, J.W.; resources, Y.C.;
data curation, J.W.; writing—original draft preparation, J.W. and T.Y.; writing—review and editing,
J.W., Q.L. and N.A.N.; supervision, T.Y. and J.W. All authors have read and agreed to the published
version of the manuscript.
Funding: This work was supported by the Sichuan Science and Technology Program under Grant
No. 2022YFG0322, China Scholarship Council Program (Nos. 202001010001 and 202101010003),
the Innovation Team Funds of China West Normal University (No. KCXTD2022-3), the Nanchong
Federation of Social Science Associations Program under Grant No. NC22C280, and China West
Normal Universi-ty 2022 University-level College Student Innovation and Entrepreneurship Training
Program Project under Grant No. CXCY2022285.
Data Availability Statement: Data are unavailable due to privacy.
Acknowledgments: Thanks to everyone who contributed to this work.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Cvitić, I.; Peraković, D.; Periša, M.; Jevremović, A.; Shalaginov, A. An Overview of Smart Home IoT Trends and related
Cybersecurity Challenges. Mob. Netw. Appl. 2022. [CrossRef]
2. Hammi, B.; Zeadally, S.; Khatoun, R.; Nebhen, J. Survey on smart homes: Vulnerabilities, risks, and countermeasures. Comput.
Secur. 2022, 117, 102677. [CrossRef]
3. Wang, Y.; Zhang, R.; Zhang, X.; Zhang, Y. Privacy Risk Assessment of Smart Home System Based on a STPA–FMEA Method.
Sensors 2023, 23, 4664. [CrossRef] [PubMed]
4. Wu, T.Y.; Meng, Q.; Chen, Y.C.; Kumari, S.; Chen, C.M. Toward a Secure Smart-Home IoT Access Control Scheme Based on Home
Registration Approach. Mathematics 2023, 11, 2123. [CrossRef]
5. Li, Y.; Zuo, Y.; Song, H.; Lv, Z. Deep learning in security of internet of things. IEEE Internet Things J. 2021, 9, 22133–22146.
[CrossRef]
6. Chkirbene, Z.; Erbad, A.; Hamila, R.; Gouissem, A.; Mohamed, A.; Guizani, M.; Hamdi, M. A weighted machine learning-based
attacks classification to alleviating class imbalance. IEEE Syst. J. 2020, 15, 4780–4791. [CrossRef]
7. Zivkovic, M.; Tair, M.; Venkatachalam, K.; Bacanin, N.; Hubálovský, Š.; Trojovský, P. Novel hybrid firefly algorithm: An
application to enhance XGBoost tuning for intrusion detection classification. PeerJ Comput. Sci. 2022, 8, e956.
8. Li, X.K.; Chen, W.; Zhang, Q.; Wu, L. Building auto-encoder intrusion detection system based on random forest feature selection.
Comput. Secur. 2020, 95, 101851.
9. Wang, Z.; Liu, Y.; He, D.; Chan, S. Intrusion detection methods based on integrated deep learning model. Comput. Secur. 2021,
103, 102177. [CrossRef]
10. Tsimenidis, S.; Lagkas, T.; Rantos, K. Deep learning in IoT intrusion detection. J. Netw. Syst. Manag. 2022, 30, 8. [CrossRef]
11. Heartfield, R.; Loukas, G.; Budimir, S.; Bezemskij, A.; Fontaine, J.R.; Filippoupolitis, A.; Roesch, E. A taxonomy of cyber-physical
threats and impact in the smart home. Comput. Secur. 2018, 78, 398–428. [CrossRef]
12. Touqeer, H.; Zaman, S.; Amin, R.; Hussain, M.; Al-Turjman, F.; Bilal, M. Smart home security: Challenges, issues and solutions at
different IoT layers. J. Supercomput. 2021, 77, 14053–14089. [CrossRef]
13. Cao, X.; Luo, Q.; Wu, P. Filter-GAN: Imbalanced Malicious Traffic Classification Based on Generative Adversarial Networks with
Filter. Mathematics 2022, 10, 3482. [CrossRef]
14. Wang, M.; Yang, N.; Weng, N. Securing a Smart Home with a Transformer-Based IoT Intrusion Detection System. Electronics 2023,
12, 2100. [CrossRef]
15. Guebli, W.; Belkhir, A. Inconsistency detection-based LOD in smart homes. Int. J. Semant. Web Inf. Syst. IJSWIS 2021, 17, 56–75.
[CrossRef]
16. Madhu, S.; Padunnavalappil, S.; Saajlal, P.P.; Vasudevan, V.A.; Mathew, J. Powering up an IoT-enabled smart home: A solar
powered smart inverter for sustainable development. Int. J. Softw. Sci. Comput. Intell. IJSSCI 2022, 14, 1–21. [CrossRef]
17. Tiwari, A.; Garg, R. Adaptive Ontology-Based IoT Resource Provisioning in Computing Systems. Int. J. Semant. Web Inf. Syst.
IJSWIS 2022, 18, 1–18. [CrossRef]
18. Elsayed, N.; Zaghloul, Z.S.; Azumah, S.W.; Li, C. Intrusion detection system in smart home network using bidirectional lstm and
convolutional neural networks hybrid model. In Proceedings of the 2021 IEEE International Midwest Symposium on Circuits and
Systems (MWSCAS), Lansing, MI, USA, 9–11 August 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 55–58.
19. Shi, L.; Wu, L.; Guan, Z. Three-layer hybrid intrusion detection model for smart home malicious attacks. Comput. Electr. Eng.
2021, 96, 107536. [CrossRef]
20. Alani, M.M.; Awad, A.I. An Intelligent Two-Layer Intrusion Detection System for the Internet of Things. IEEE Trans. Ind. Inform.
2022, 19, 683–692. [CrossRef]
21. Rani, D.; Gill, N.S.; Gulia, P.; Arena, F.; Pau, G. Design of an Intrusion Detection Model for IoT-Enabled Smart Home. IEEE Access
2023, 11, 52509–52526. [CrossRef]
22. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial
nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
23. Fu, W.; Qian, L.; Zhu, X. GAN-based intrusion detection data enhancement. In Proceedings of the 2021 33rd Chinese Control and
Decision Conference (CCDC), Kunming, China, 22–24 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 2739–2744.
24. Zhang, L.; Duan, L.; Hong, X.; Liu, X.; Zhang, X. Imbalanced data enhancement method based on improved DCGAN and its
application. J. Intell. Fuzzy Syst. 2021, 41, 3485–3498. [CrossRef]
25. Li, S.; Dutta, V.; He, X.; Matsumaru, T. Deep Learning Based One-Class Detection System for Fake Faces Generated by GAN
Network. Sensors 2022, 22, 7767. [CrossRef] [PubMed]
26. Yang, W.; Xiao, Y.; Shen, H.; Wang, Z. An effective data enhancement method of deep learning for small weld data defect
identification. Measurement 2023, 206, 112245. [CrossRef]
27. Jin, H.; Huang, S.; Wang, B.; Chen, X.; Yang, B.; Qian, B. Soft sensor modeling for small data scenarios based on data enhancement
and selective ensemble. Chem. Eng. Sci. 2023, 279, 118958. [CrossRef]
28. Meidan, Y.; Bohadana, M.; Mathov, Y.; Mirsky, Y.; Shabtai, A.; Breitenbacher, D.; Elovici, Y. N-BaIoT—Network-Based Detection of
IoT Botnet Attacks Using Deep Autoencoders. IEEE Pervasive Comput. 2019, 17, 12–22. [CrossRef]
Electronics 2023, 12, 3304
29. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell.
Res. 2002, 16, 321–357. [CrossRef]
30. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
A Network Clustering Algorithm for Protein Complex
Detection Fused with Power-Law Distribution Characteristic
Jie Wang 1, *, Ying Jia 1 , Arun Kumar Sangaiah 2,3, * and Yunsheng Song 4
1 School of Information, Shanxi University of Finance and Economics, Taiyuan 030006, China;
[email protected]
2 International Graduate Institute of Artificial Intelligence, National Yunlin University of Science and Technology,
Douliou 64002, Taiwan
3 Department of Electrical and Computer Engineering, Lebanese American University,
Byblos 1102-2801, Lebanon
4 School of Information Science and Engineering, Shandong Agricultural University, Taian 271018, China;
[email protected]
* Correspondence: [email protected] (J.W.); [email protected] (A.K.S.); Tel.: +86-351-7666-126 (J.W.)
Abstract: Network clustering for mining protein complexes from protein–protein interaction (PPI)
networks has emerged as a prominent research area in data mining and bioinformatics. Accurately
identifying complexes plays a crucial role in comprehending cellular organization and functionality.
Network characteristics are often useful in enhancing the performance of protein complex detection
methods. Many protein complex detection algorithms have been proposed, primarily focusing on
local micro-topological structure metrics while overlooking the potential power-law distribution
characteristic of community sizes at the macro global level. The effective use of this distribution
characteristic information may be beneficial for mining protein complexes. This paper proposes
a network clustering algorithm for protein complex detection fused with power-law distribution
characteristic. The clustering algorithm constructs a cluster generation model based on scale-free
power-law distribution to generate a cluster with a dense center and relatively sparse periphery.
Following the cluster generation model, a candidate cluster is obtained. From a global perspective,
the number distribution of clusters of varying sizes is taken into account. If the candidate cluster
aligns with the constraints defined by the power-law distribution function of community sizes, it
is designated as the final cluster; otherwise, it is discarded. To assess the prediction performance
of the proposed algorithm, the gold standard complex sets CYC2008 and MIPS are employed as
benchmarks. The algorithm is compared to DPClus, IPCA, SEGC, Core, SR-MCL, and ELF-DPC in
terms of F-measure and Accuracy on several widely used protein–protein interaction networks. The
experimental results show that the algorithm can effectively detect protein complexes and is superior
to other comparative algorithms. This study further enriches the connection between analyzing
complex network topology features and mining network function modules, thereby significantly
contributing to the improvement of protein complex detection performance.
Keywords: data mining; network clustering; protein complex detection; power-law distribution;
topological characteristics
Citation: Wang, J.; Jia, Y.; Sangaiah, A.K.; Song, Y. A Network Clustering Algorithm for Protein
Complex Detection Fused with Power-Law Distribution Characteristic. Electronics 2023, 12, 3007.
https://doi.org/10.3390/electronics12143007
Academic Editor: Ping-Feng Pai
Received: 17 June 2023; Revised: 6 July 2023; Accepted: 6 July 2023; Published: 8 July 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Cells rely on the interaction of multiple proteins for life activities. A protein complex,
formed through interactions, consists of molecules with similar functions. Detecting protein
complexes in protein–protein interaction (PPI) networks facilitates the exploration of the
relationships between network structures and function modules. Moreover, it plays a
crucial role in annotating the proteins with unknown functions and gaining insights into
the organization and functionality of cells [1].
clustering was performed in this transformed vector space [21,22]. One example of such an
algorithm is the ensemble learning framework for density peak clustering (ELF-DPC) [23].
ELF-DPC first maps the PPI network to the vector space and constructs a weighted network
to identify core edges. By integrating structural modularity and trained voting regression
models, the algorithm creates an ensemble learning model. ELF-DPC then expands the
core edges into clusters based on this learning model.
The PPI network, as a type of complex network, exhibits intricate network topology
characteristics [24–26]. The fundamental features used to describe the network topol-
ogy are primarily derived into three levels. Firstly, micro-topological structure metrics
focus on individual nodes or edges, including measures such as node degree and central-
ity [27,28]. Secondly, meso-topological metrics analyze groups of nodes, such as community
structure [29], modules, and motifs. Lastly, macro-topological metrics consider the entire
network, encompassing aspects such as degree distribution and community size distribu-
tion. Developing a network clustering algorithm that incorporates these network features
can enhance the accuracy of community detection [30]. At present, seed expansion methods
can effectively utilize network features. However, existing algorithms mainly consider local
micro-topological structure features [31] and ignore the potential distribution characteristics
of community size at a macro-global level. The distribution of community sizes in the PPI
network exhibits a certain correlation with power-law distribution [32].
In this paper, we present a novel network clustering approach that incorporates the
characteristics of power-law distribution to identify protein complexes. Our proposed
algorithm, named GCAPL, encompasses two main stages: cluster generation and cluster
determination. During the cluster generation stage, the GCAPL algorithm incorporates
node degree and clustering coefficient to assign weights to nodes. The unclustered node
with the highest weight is selected as a seed. Following that, a cluster generation model
leveraging the scale-free power-law distribution is constructed to discover clusters with
dense centers and sparse peripheries. Through an iterative process, candidate nodes are
added to the seed to form a candidate cluster using the cluster generation model. In the
cluster determination stage, we construct a power-law distribution function relating cluster
sizes to the number of clusters of each size. The function acts as a criterion to
regulate the presence of clusters of various sizes. By applying the power-law distribution
function, we can assess whether a candidate cluster qualifies as a final cluster.
This paper makes several significant contributions: (1) Integrating multiple available
basic micro-topological structural information into the k-order neighborhood of a node for
seed selection; (2) Constructing a cluster generation model considering scale-free power-law
distribution to obtain inherent organization information of functional modules; (3) Providing
a cluster determination model based on the macro-topological characteristic of the
number distribution of clusters of different sizes to constrain the final clusters; (4) Verifying,
through experimental results on real datasets, that the proposed network clustering
algorithm fused with topological structural information can effectively mine functional modules.
The remainder of this paper is organized as follows. Section 2 introduces preliminary
concepts and symbols. Section 3 presents a network clustering algorithm fused with power-
law distribution characteristics. Section 4 reports the relevant experiments to verify the
effectiveness of the network clustering algorithm. Section 5 provides conclusions.
2. Preliminary
A PPI network is represented by an undirected network G = (V, E), with V as the set of
proteins (nodes) and E as the set of interactions (edges) between proteins. Dia(G) represents
the diameter of the network G, which corresponds to the maximum value in the shortest
path between any two nodes in the network G. The k-adjacent nodes set of a given node vi
is denoted as NEk (vi ), and it is defined by
NE_k(v_i) = NE(v_i),                                               if k = 1
NE_k(v_i) = NE_{k−1}(v_i) ∪ { v_j ∈ V | distance(v_i, v_j) = k },  if k > 1     (1)
where distance(vi , v j ) represents the length of the distance between nodes vi and v j .
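In practice, the recursive definition of NE_k(v_i) in Equation (1) amounts to a breadth-first search truncated at depth k. A minimal sketch (function and variable names are illustrative, not taken from the authors' code):

```python
from collections import deque

def k_adjacent(adj, v, k):
    """Return NE_k(v): all nodes at shortest-path distance 1..k from v,
    following the recursive definition of Equation (1). `adj` maps each
    node to the set of its direct neighbors in an undirected network."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == k:          # no need to explore beyond distance k
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return {u for u, d in dist.items() if 1 <= d <= k}

# Small example network: a path a-b-c-d plus an edge b-e
adj = {"a": {"b"}, "b": {"a", "c", "e"}, "c": {"b", "d"},
      "d": {"c"}, "e": {"b"}}
print(sorted(k_adjacent(adj, "a", 1)))  # ['b']
print(sorted(k_adjacent(adj, "a", 2)))  # ['b', 'c', 'e']
```

The early `continue` at depth k keeps the search from touching nodes the definition never includes.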
The clustering coefficient of v_i [33] is

CCE(v_i) = 2 × |ES(NE(v_i))| / (ND(v_i) × (ND(v_i) − 1))     (2)

where ES(NE(v_i)) is the set of edges among the direct neighbors of v_i. The main symbols
used in this paper are summarized below.

Symbols              Meaning
G = (V, E)           Network G composed of a set of nodes V and a set of edges E.
v_i                  Node i in a certain node set.
(v_i, v_j)           The edge between nodes i and j.
distance(v_i, v_j)   The shortest path distance between nodes i and j.
NE_k                 Set of k-neighbors.
ES(M)                The set of edges within sub-graph M.
CCE                  The clustering coefficient of a node.
ND                   The degree of a node in the network.
w(·)                 The weight of a node or an edge.
X_size               Set of cluster sizes.
Y_num                Set of cluster numbers.
CT(u, M)             The tightness measure of node u with respect to sub-graph M.
CS(v)                Node set generated by the selected seed v.
Dia(G)               The diameter of a network G.
PC                   Final cluster set.
λ                    Rate of change.
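The clustering coefficient of a node [33] counts how many of the possible edges among its direct neighbors actually exist. A small sketch using the paper's ND and ES notions (names in the code are illustrative):

```python
from itertools import combinations

def cce(adj, v):
    """Clustering coefficient of node v (Watts-Strogatz [33]):
    CCE(v) = 2 * |ES(NE(v))| / (ND(v) * (ND(v) - 1)), i.e. the fraction
    of realized edges among v's direct neighbors."""
    neigh = adj[v]
    nd = len(neigh)
    if nd < 2:
        return 0.0
    # Count edges among the neighbors; adjacency is symmetric, so
    # checking one direction per pair is enough.
    links = sum(1 for a, b in combinations(neigh, 2) if b in adj[a])
    return 2.0 * links / (nd * (nd - 1))

# Triangle a-b-c plus a pendant node d attached to a
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(round(cce(adj, "a"), 3))  # 1 edge among 3 neighbors -> 0.333
print(cce(adj, "b"))            # neighbors a and c are connected -> 1.0
```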
3. Methods
The GCAPL algorithm consists of two stages: cluster generation and cluster determination.
In the first stage, the algorithm calculates the weights of nodes and edges by incorporating
micro-topological structure metrics. A seed is the node that has the highest weight among
the unclustered nodes. The seed is expanded into a candidate cluster by a cluster generation
model that considers a scale-free power-law distribution. In the second stage, we establish
a cluster determination model based on the power-law distribution of the number of clusters
of different sizes. This cluster decision model is used to determine the final clusters.
Figure 1 shows the algorithm flow chart.
(Figure 1. Flow chart of the GCAPL algorithm: initialize and weight nodes and edges, expand
seeds through the cluster generation model into candidate clusters, then check each candidate
against the power-law distribution function to either keep it as a final cluster or discard it.)
w(v_i, v_j) = [CCE(v_i)/CCE(G)] × [ND(G)/ND(v_i)] + [CCE(v_j)/CCE(G)] × [ND(G)/ND(v_j)]
            + Σ_{u ∈ NE(v_i) ∩ NE(v_j)} [CCE(v_u)/CCE(G)] × [ND(G)/ND(v_u)]     (4)
Furthermore, Equation (4) from the previous section only considers the information of
the node's direct neighbors. To highlight the importance of an edge within a large network
module, the edge weight at the t-th iteration can be defined as follows:
w_t(v_i, v_j) = w_{t−1}(v_i) × [CCE(v_i)/CCE(G)] × [ND(G)/ND(v_i)]
             + w_{t−1}(v_j) × [CCE(v_j)/CCE(G)] × [ND(G)/ND(v_j)]
             + Σ_{u ∈ NE(v_i) ∩ NE(v_j)} w_{t−1}(u) × [CCE(v_u)/CCE(G)] × [ND(G)/ND(v_u)]     (5)
Initially, the node weights are set to w0 (vi ) = 1 for all nodes, indicating that the initial
importance of all nodes is the same.
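As an illustration, the edge weight of Equation (5) can be computed directly from the previous iteration's node weights. The page defining CCE(G) and ND(G) is not reproduced above, so the sketch below assumes they are the network's average clustering coefficient and average degree; with all node weights equal to 1, the formula reduces to the static Equation (4):

```python
def edge_weight(adj, cce, w_prev, vi, vj, cce_g, nd_g):
    """Edge weight of Equation (5) at iteration t, given the node weights
    w_prev = w_{t-1}. `cce` maps nodes to clustering coefficients; cce_g
    and nd_g stand for CCE(G) and ND(G) (assumed here to be network
    averages). With w_prev == 1 everywhere this is Equation (4)."""
    def term(u):
        return w_prev[u] * (cce[u] / cce_g) * (nd_g / len(adj[u]))
    common = adj[vi] & adj[vj]          # NE(vi) ∩ NE(vj)
    return term(vi) + term(vj) + sum(term(u) for u in common)

# Triangle network: every node has degree 2 and clustering coefficient 1
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
cce = {v: 1.0 for v in adj}
w0 = {v: 1.0 for v in adj}
print(edge_weight(adj, cce, w0, "a", "b", cce_g=1.0, nd_g=2.0))  # 3.0
```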
Once the node weight calculation is completed, the next step is to select a seed node v
from the node set V whose node weight is highest. Following that, the seed node is used to
establish the cluster generation model, which allows for the expansion of the seed into a
candidate cluster.
The cluster generation model aims to expand seed nodes into candidate clusters
based on connection strength. The obtained seed node v serves as the initial cluster
CS(v), and candidate nodes from the neighborhood NE(CS(v)) are considered for addition
based on the compactness of CS(v) and the connection strength between CS(v) and a
candidate node u. The compactness g of the cluster CS(v) quantifies the connection density
within the cluster and is defined as g(u, CS(v)) = |NE(u) ∩ V(CS(v))| / |V(CS(v))|, where
V(CS(v)) represents the set of nodes that make up CS(v), and NE(u) denotes node u's
direct neighbors. The connection strength h of a candidate node u reflects the peripheral
edges of the cluster and is defined as h(u, CS(v)) = |NE(u) ∩ V(CS(v))| / |NE(u)|.
The cluster generation model requires a
variable function to combine the compactness of the cluster and the peripheral edges of the
cluster, so that as the cluster size increases, the contribution of the cluster’s compactness to
the cluster generation gradually decreases while the contribution of the cluster’s peripheral
connections to the cluster generation gradually increases. A suitable choice for this function
is the scale-free power-law distribution function, which is a monotonic function. It serves
as a foundation for constructing the variable function that effectively fuses the above two
kinds of connection information. A power-law distribution function has the form
y = c × x^(−k). Let c = 1/λ, k = ND(v), and x = |V(CS(v))| − 1; then we can define the
variable function as:

β(CS(v)) = 1 / (λ × (|V(CS(v))| − 1)^ND(v) + 1)     (7)
where λ is a parameter to control the change of β(CS(v)). Then, the cluster generation
model is defined as:

CT(u, CS(v)) = β(CS(v)) × g(u, CS(v)) + (1 − β(CS(v))) × h(u, CS(v))     (8)
When β(CS(v)) is set to 1, CT tends to prioritize the formation of dense clusters. On the
other hand, when β(CS(v)) is set to 0, nodes with lower degrees are more likely to be added
to CS(v). The β(CS(v)) enables the cluster generation model to find both dense clusters
and clusters with dense cores and sparse peripheries, providing flexibility in capturing
different types of cluster structures. For each candidate node u and threshold μ ∈ [0, 1],
if CT (u, CS(v)) > μ and Dia([CS(v) ∪ {u}]) ≤ δ (δ is a user-defined threshold), then the
node u is added to the cluster CS(v). This process is repeated for each node in NE(CS(v)),
resulting in the initial formation of a candidate cluster CS(v).
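Putting Equations (7) and (8) and the two thresholds together, the expansion loop can be sketched as follows. The candidate ordering and some loop details are not specified in the text above, so this is an illustrative reading rather than the authors' implementation:

```python
from collections import deque

def beta(cluster, seed_deg, lam):
    """Variable function of Equation (7); lam is λ, seed_deg is ND(v)."""
    x = len(cluster) - 1
    return 1.0 / (lam * x ** seed_deg + 1.0)

def ct(adj, u, cluster, seed_deg, lam):
    """Cluster generation model CT of Equation (8)."""
    inside = len(adj[u] & cluster)
    g = inside / len(cluster)            # compactness g(u, CS(v))
    h = inside / len(adj[u])             # connection strength h(u, CS(v))
    b = beta(cluster, seed_deg, lam)
    return b * g + (1.0 - b) * h

def diameter(adj, nodes):
    """Diameter of the sub-graph induced by `nodes` (BFS from each node)."""
    best = 0
    for s in nodes:
        dist = {s: 0}
        q = deque([s])
        while q:
            x = q.popleft()
            for y in adj[x] & nodes:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    q.append(y)
        best = max(best, max(dist.values()))
    return best

def expand(adj, seed, lam=0.1, mu=0.4, delta=2):
    """Greedily grow a candidate cluster CS(v) from a seed, adding u when
    CT(u, CS(v)) > mu and Dia(CS(v) ∪ {u}) <= delta."""
    cluster = {seed}
    seed_deg = len(adj[seed])
    grown = True
    while grown:
        grown = False
        frontier = set().union(*(adj[v] for v in cluster)) - cluster
        for u in sorted(frontier):
            if (ct(adj, u, cluster, seed_deg, lam) > mu
                    and diameter(adj, cluster | {u}) <= delta):
                cluster.add(u)
                grown = True
    return cluster

# Clique {a, b, c, d} with a pendant node e attached to d
adj = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
      "d": {"a", "b", "c", "e"}, "e": {"d"}}
print(sorted(expand(adj, "a")))          # ['a', 'b', 'c', 'd', 'e']
print(sorted(expand(adj, "a", mu=0.9)))  # ['a', 'b', 'c']
```

With the default threshold the pendant node e is absorbed (its h is 1.0, and β has already decayed), illustrating the dense-core, sparse-periphery behavior the model targets; a stricter μ keeps only part of the clique.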
The time cost of the GCAPL algorithm lies in two parts: cluster generation and
cluster determination.
Assume a network G has n nodes and m edges. In the cluster generation stage, the
node weighting process has a time cost of O(k × ND × n) = O(k × m). The time
cost of seed selection based on node weights is O(n × log n). The expansion of seeds into
clusters also has a time cost of O(n × log n). Therefore, O(|PC| × n × log n) is the total time
complexity of the cluster generation phase.
In the cluster determination phase, the worst-case scenario is when each candidate
cluster size needs to be compared with each element in the sequence X_size. As a result,
this phase has a time cost of O(n × |X_size|). Therefore, the overall time complexity of the
GCAPL algorithm is O(|PC| × n × log n), considering both the cluster generation and cluster
determination phases.
The gold standard complex datasets CYC2008 [38] and MIPS [39] were utilized for
parameter analysis and evaluation of the clustering results.
Figure 2. Performance impact analysis of parameters on BioGRID dataset: (a) analyze c and k; (b)
analyze errorsize .
Next, we kept the values of c = 200, k = 2.2, and errorsize = 6 fixed, and analyzed the
remaining discrete parameters: the number of iterations iter, the adjustment parameter
λ ∈ [0, 1] of the change rate, and the tightness threshold μ ∈ [0, 1]. Considering the
interdependence among these parameters, an orthogonal matrix was employed to identify
the optimal parameter combination with a high likelihood. During the experimental
design phase, each parameter variable was treated as an independent factor. Feasible
values corresponding to these factors are assigned as distinct levels. The complete set
of parameter combinations represents the experimental space. An orthogonal array L36
(6^3 × 3^7) is employed, which comprises 36 parameter combinations. Since the parameters
are iter ∈ {1, 2, 3, 4, 5, 6}, λ ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6}, and μ ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6},
we consider only the first three columns of the orthogonal array to facilitate the analysis.
Among the 36 parameter combinations, the one with the highest F-measure + Accuracy is
selected as the optimal configuration. Through the experiments, the parameters are set to
iter = 2, λ = 0.1, and μ = 0.4.
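With c = 200 and k = 2.2, the determination stage can be read as budgeting how many clusters of each size may survive under y = c × x^(−k). The exact acceptance rule (including the role of errorsize) is not fully reproduced in the text above, so the following is a hypothetical sketch of one plausible reading:

```python
import math

def expected_count(size, c=200.0, k=2.2):
    """Expected number of clusters of a given size under y = c * x**(-k),
    with c = 200 and k = 2.2 as tuned in the parameter analysis."""
    return c * size ** (-k)

def accept(candidate_size, counts, c=200.0, k=2.2):
    """Illustrative determination rule: keep a candidate cluster only while
    the number of already-accepted clusters of its size stays within the
    power-law budget (at least one cluster of each size is always allowed).
    This is an assumed reading, not the paper's exact criterion."""
    budget = max(1, math.floor(expected_count(candidate_size, c, k)))
    if counts.get(candidate_size, 0) < budget:
        counts[candidate_size] = counts.get(candidate_size, 0) + 1
        return True
    return False

counts = {}
print(round(expected_count(2), 1))  # 43.5 small clusters are expected
print(accept(50, counts))           # True: one size-50 cluster fits the budget
print(accept(50, counts))           # False: a second one exceeds it
```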
Taking the logarithm of both sides of the power-law function y = c × x^(−k) yields

ln y = ln c − k ln x     (15)
It was observed that ln y and ln x exhibit a linear relationship. Thus, the analysis of
the power-law distribution of x and y was transformed into a linear relationship analysis of
ln x and ln y.
In the clustering result of the BioGRID dataset, we took the logarithm of the cluster
size x and the corresponding cluster number y, resulting in the transformed variables
x′ = ln x and y′ = ln y. To explore whether there is a linear relationship between x′ and y′,
a linear fitting method was applied to x′ and y′. The results of the linear fitting analysis
conducted on x′ and y′ are shown in Figure 3, providing valuable insights into the nature of
their relationship.
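The ln–ln fit behind Equation (15) is ordinary least squares on (ln x, ln y); a library routine such as scipy.stats.linregress would also report the p-value of the kind shown in Table 3, but a dependency-free sketch of the slope, intercept, and R² computation is:

```python
import math

def loglog_fit(sizes, counts):
    """Least-squares fit of ln y = ln c - k * ln x (Equation (15)),
    returning (c, k, r_squared). A pure-Python stand-in for the linear
    fitting applied to the BioGRID clustering result."""
    xs = [math.log(x) for x in sizes]
    ys = [math.log(y) for y in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope = -k
    intercept = my - slope * mx          # intercept = ln c
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return math.exp(intercept), -slope, r2

# An exact power law y = 100 * x^-2 should be recovered with r^2 = 1
sizes = [2, 3, 4, 5, 8, 10]
counts = [100 * x ** -2 for x in sizes]
c, k, r2 = loglog_fit(sizes, counts)
print(round(c), round(k, 3), round(r2, 6))  # 100 2.0 1.0
```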
Table 3 presents the calculated p-value and R² for the linear fitting analysis conducted
on x′ and y′. A small p-value indicates a strong fit of the clustering result, demonstrating
good fitting effectiveness. Similarly, a large value of R² suggests a favorable fit. In Table 3,
the obtained p-value is 9.9 × 10^−7, and the value of R² is 0.5. Thus, the sizes of the clusters
generated by the proposed algorithm in the PPI network, together with the corresponding
numbers of these clusters, follow a power-law distribution.
Criteria    Value
p-value     9.90771462 × 10^−7
R²          0.5001443526421876
Figure 5. MIPS as benchmarks: Evaluation results by different algorithms on (a) Gavin02; (b) Gavin06;
(c) K-extend; (d) BioGRID.
In summary, the GCAPL algorithm has good performance in detecting protein com-
plexes. The GCAPL algorithm uses not only micro-topological structure metrics but also
the macro-topological structure characteristic of the power-law distribution about clus-
ters, and it can obtain better results in complex detection. The GCAPL algorithm further
explores the relationship between network topological characteristics and functional mod-
ules in PPI networks, which is of great significance for improving the accuracy of protein
complex detection.
Figure 6. Examples of predicted protein complexes: (a) cluster a; (b) cluster b; (c) cluster c; (d)
cluster d.
5. Conclusions
Detecting protein complexes is of great significance for understanding biological
mechanisms. This paper proposes a network clustering algorithm fused with power-law
distribution for protein complex detection. The algorithm begins by calculating node
weights, taking into account micro-topological structure metrics. Subsequently, the
algorithm selects the unclustered nodes with the highest weights as seeds and forms initial
clusters around the seeds. Next, the algorithm greedily adds candidate nodes into the
initial clusters based on the characteristics of scale-free power-law distribution to generate
candidate clusters. A power-law distribution function, based on the macro-topological
structure feature of power-law distribution about cluster size and number, is established to
guide the cluster generation process. The power-law distribution function is employed to
determine whether a candidate cluster qualifies as a final cluster. Compared with other
algorithms, the F-measure + Accuracy of GCAPL improves by an average of 12.23% and
10.97% on the CYC2008 and MIPS benchmarks, respectively. The experimental analysis
reveals that the proposed algorithm exhibits distinct advantages over other approaches.
The GCAPL algorithm mainly considers biological networks whose community
sizes conform to power-law distribution characteristics. The algorithm does not take
into account other distribution characteristics of community size, nor does it fully consider
preferential attachment. Incorporating this information may further improve the performance of
our algorithm in detecting protein complexes. In addition, in real PPI networks, the connections
between nodes are subject to constant changes, leading to variations in network topological
structures. To mine functional modules in dynamic PPI networks, our future work will
also focus on constructing dynamic networks and developing dynamic protein complex
identification methods.
Author Contributions: Conceptualizing the algorithm, designing the method and revising the draft,
J.W.; implementation of the computer code and writing the original draft, Y.J.; revising the manuscript,
A.K.S.; visualizing and curating data, Y.S. All authors have read and agreed to the published version
of the manuscript.
Funding: This paper was funded by the National Natural Science Foundation of China (No. 62006145);
the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi,
China (No. 2020L0245); the Youth Science Foundation of Shanxi University of Finance and Eco-
nomics, China (No. QN-202016); and Shandong Provincial Natural Science Foundation, China
(No. ZR2020MF146).
Data Availability Statement: The datasets used in this study are publicly available and downloaded
from the BioGRID database (https://downloads.thebiogrid.org/BioGRID, accessed on 1 March
2023), MIPS database (http://mips.gsf.de, accessed on 8 September 2019), and CYC2008 complexes
database (http://wodaklab.org/cyc2008/, accessed on 12 April 2023).
Acknowledgments: This study received support from the Teaching and Research Department of
Computer Science and Technology, Shanxi University of Finance and Economics, and all authors
would like to express their gratitude for this.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Wu, L.; Huang, S.; Wu, F.; Jiang, Q.; Yao, S.; Jin, X. Protein Subnuclear Localization Based on Radius-SMOTE and Kernel Linear
Discriminant Analysis Combined with Random Forest. Electronics 2020, 9, 1566. [CrossRef]
2. Ito, T.; Chiba, T.; Ozawa, R.; Yoshida, M.; Hattori, M.; Sakaki, Y. A comprehensive two-hybrid analysis to explore the yeast protein
interactome. Proc. Natl. Acad. Sci. USA 2001, 98, 4569–4574. [CrossRef] [PubMed]
3. Causier, B.; Davies, B. Analysing protein-protein interactions with the yeast two-hybrid system. Plant Mol. Biol. 2002, 50, 855–870.
[CrossRef] [PubMed]
4. Puig, O.; Caspary, F.; Rigaut, G.; Rutz, B.; Bouveret, E.; Bragado-Nilsson, E.; Wilm, M.; Séraphin, B. The tandem affinity
purification (TAP) method: A general procedure of protein complex purification. Methods 2001, 24, 218–229. [CrossRef] [PubMed]
5. Rahiminejad, S.; Maurya, M.R.; Subramaniam, S. Topological and functional comparison of community detection algorithms in
biological networks. BMC Bioinform. 2019, 20, 212. [CrossRef]
6. Spirin, V.; Mirny, L.A. Protein complexes and functional modules in molecular networks. Proc. Natl. Acad. Sci. USA 2003, 100,
12123–12128. [CrossRef]
7. Bai, L.; Cheng, X.; Liang, J.; Guo, Y. Fast graph clustering with a new description model for community detection. Inf. Sci. 2017,
388–389, 37–47. [CrossRef]
8. Saxena, A.; Prasad, M.; Gupta, A.; Bharill, N.; Patel, O.P.; Tiwari, A.; Er, M.J.; Ding, W.; Lin, C.-T. A review of clustering techniques
and developments. Neurocomputing 2017, 267, 664–681. [CrossRef]
9. Emmons, S.; Kobourov, S.; Gallant, M.; Börner, K. Analysis of network clustering algorithms and cluster quality metrics at scale.
PLoS ONE 2016, 11, e0159161. [CrossRef]
10. Bhowmick, S.S.; Seah, B.S. Clustering and summarizing protein-protein interaction networks: A survey. IEEE Trans. Knowl. Data
Eng. 2016, 28, 638–658. [CrossRef]
11. Pan, Y.; Guan, J.; Yao, H.; Shi, Y.; Zhou, Y. Computational methods for protein complex prediction: A survey. J. Front. Comput. Sci.
Technol. 2022, 16, 1–20.
12. Manipur, I.; Giordano, M.; Piccirillo, M.; Parashuraman, S.; Maddalena, L. Community Detection in Protein-Protein Interaction
Networks and Applications. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 20, 217–237. [CrossRef]
13. Liu, G.; Wong, L.; Chua, H.N. Complex discovery from weighted PPI networks. Bioinformatics 2009, 25, 1891–1897. [CrossRef]
14. Bader, G.D.; Hogue, C.W.V. An automated method for finding molecular complexes in large protein interaction networks. BMC
Bioinform. 2003, 4, 2. [CrossRef]
15. Palla, G.; Derényi, I.; Farkas, I.; Vicsek, T. Uncovering the overlapping community structure of complex networks in nature and
society. Nature 2005, 435, 814–818. [CrossRef] [PubMed]
16. Amin, A.U.; Shinbo, Y.; Mihara, K.; Kurokawa, K.; Kanaya, S. Development and implementation of an algorithm for detection of
protein complexes in large interaction networks. BMC Bioinform. 2006, 7, 207. [CrossRef]
17. Li, M.; Chen, J.-E.; Wang, J.-X.; Hu, B.; Chen, G. Modifying the DPClus algorithm for identifying protein complexes based on new
topological structures. BMC Bioinform. 2008, 9, 398. [CrossRef] [PubMed]
18. Wang, J.; Zheng, W.; Qian, Y.; Liang, J. A seed expansion graph clustering method for protein complexes detection in protein
interaction networks. Molecules 2017, 22, 2179. [CrossRef]
19. Leung, H.C.; Xiang, Q.; Yiu, S.M.; Chin, F.Y. Predicting protein complexes from PPI data: A core-attachment approach. J. Comput.
Biol. 2009, 16, 133–144. [CrossRef] [PubMed]
20. Yue, L.; Jun, X.; Sihang, Z.; Siwei, W.; Xifeng, G.; Xihong, Y.; Ke, L.; Wenxuan, T.; Wang, L.X. A survey of deep graph clustering:
Taxonomy, challenge, and application. arXiv 2022, arXiv:2211.12875.
21. Sun, H.; He, F.; Huang, J.; Sun, Y.; Li, Y.; Wang, C.; He, L.; Sun, Z.; Jia, X. Network embedding for community detection in
attributed networks. ACM Trans. Knowl. Discov. Data 2020, 14, 1–25. [CrossRef]
22. Kumar, S.; Panda, B.S.; Aggarwal, D. Community detection in complex networks using network embedding and gravitational
search algorithm. J. Intell. Inf. Syst. 2021, 57, 51–72. [CrossRef]
23. Wang, R.; Ma, H.; Wang, C. An ensemble learning framework for detecting protein complexes from PPI networks. Front. Genet.
2022, 13, 839949. [CrossRef]
24. Liu, X.; Yang, Z.; Zhou, Z.; Sun, Y.; Lin, H.; Wang, J.; Xu, B. The impact of protein interaction networks’ characteristics on
computational complex detection methods. J. Theor. Biol. 2018, 439, 141–151. [CrossRef]
25. Cherifi, H.; Palla, G.; Szymanski, B.K.; Lu, X. On community structure in complex networks: Challenges and opportunities. Appl.
Netw. Sci. 2019, 4, 117. [CrossRef]
26. Huang, Z.; Zhong, X.; Wang, Q.; Gong, M.; Ma, X. Detecting community in attributed networks by dynamically exploring node
attributes and topological structure. Knowl.-Based Syst. 2020, 196, 105760. [CrossRef]
27. Ghalmane, Z.; Cherifi, C.; Cherifi, H.; El Hassouni, M. Centrality in complex networks with overlapping community structure.
Sci. Rep. 2019, 9, 10133. [CrossRef]
28. Rajeh, S.; Savonnet, M.; Leclercq, E.; Cherifi, H. Characterizing the interactions between classical and community-aware centrality
measures in complex networks. Sci. Rep. 2021, 11, 10088. [CrossRef] [PubMed]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
Graph Convolution Network over Dependency Structure
Improve Knowledge Base Question Answering
Chenggong Zhang 1,2, *, Daren Zha 2 , Lei Wang 2 , Nan Mu 2 , Chengwei Yang 3 , Bin Wang 4 and Fuyong Xu 4, *
1 Institute of School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100043, China
2 Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100864, China;
[email protected] (D.Z.); [email protected] (L.W.); [email protected] (N.M.)
3 School of Management Science and Engineering, Shandong University of Finance and Economics,
Jinan 250014, China; [email protected]
4 School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China;
[email protected]
* Correspondence: [email protected] (C.Z.); [email protected] (F.X.)
Abstract: Knowledge base question answering (KBQA) tasks can be divided into two types according to their complexity: questions with constraints and questions with multiple hops of relationships. Previous work on KBQA has mostly focused on entities and relations. In a multihop question, however, it is insufficient to focus solely on topic entities and their relations, since the relations between words also carry important information. In addition, because a question may contain constraints or multiple relationships, this information is difficult to capture, or the constraints are missed. In this paper, we apply a dependency structure to the question and capture relation information (e.g., constraints) between its words through a graph convolution network. The captured relation information is integrated into the question representation for re-encoding, and this information is used to generate and rank query graphs. Compared with existing sequence models and query graph generation models, our approach achieves a 0.8–3% improvement on two benchmark datasets.
1. Introduction
The goal of knowledge base question answering is to understand user questions and to provide accurate answers through fact retrieval and reasoning within the knowledge base. The main process of KBQA is shown in Figure 1.
A knowledge base (KB) stores large amounts of structured information, commonly represented as triples (a head entity, a tail entity, and the relation between them). The task of knowledge base question answering (KBQA) is to answer users' natural language questions using a knowledge base. For example, as shown in Figure 2, the triples starring (Jackie Chan, New Fist of Fury), release date (New Fist of Fury, 8 July 1976), and directed by (New Fist of Fury, Lo Wei) can be used to answer the question "Who was the director of Jackie Chan's first starring film?".
Figure 2. The triples involved in the question "Who was the director of Jackie Chan's first starring film?" in the knowledge graph. Bold letters represent entities, pink circles represent topic entities, blue circles represent traversed entities, green circles represent irrelevant entities, and orange letters represent critical paths.
Electronics 2023, 12, 2675
Previous work [1–3] on KBQA mainly focused on external resources, pattern matching, or the construction of handcrafted features [4,5] to address simple questions. These methods require annotated logical forms as supervision. Moreover, they have difficulty dealing with complex questions containing constraints, e.g., "the first" in the question "Who is the first president of the United States?".
To address constraints in natural language questions, staged query graph generation methods [6–8] have been proposed. These methods first identify a single-hop relation path and then add constraints to it to form a query graph. The answer is obtained by executing the query graph against the knowledge base. In reality, however, questions involve not only single relations but also multihop relations, such as "Who is the wife of the founder of Facebook?", where there are two hops between the answer and "Facebook", namely, "founder" and "wife". To answer this type of question, longer relation paths have to be considered, which increases the search space exponentially. A beam search method was introduced in References [9,10] to reduce the search space by keeping only the best-matching relations, thereby limiting the number of multihop relation paths. Lan et al. [11] proposed modifying the staged query graph generation method to deal with longer relation paths and large search spaces. However, allowing longer relation paths causes constraints to be ignored or connected to the wrong entity, resulting in errors in the prediction of intermediate relations; if an intermediate relation is predicted incorrectly, all subsequent predictions will also be wrong. It is therefore particularly important to analyze the relations between words during query graph generation.
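The beam search idea from References [9,10] can be sketched as follows. Everything here is a hypothetical stand-in: the candidate relations and the `score` function substitute for a learned relation matcher, and only the top-k paths per hop are kept so the search space stops growing exponentially with path length.

```python
# Illustrative sketch (not the authors' code) of beam search over multihop
# relation paths.
def beam_search(start_paths, expand, score, beam_width=3, max_hops=2):
    paths = start_paths
    for _ in range(max_hops):
        candidates = [p + [r] for p in paths for r in expand(p)]
        # Keep only the best `beam_width` paths instead of every extension.
        paths = sorted(candidates, key=score, reverse=True)[:beam_width]
    return paths

# Toy example: two hops from "Facebook" toward "founder" then "wife".
relations = {0: ["founder", "ceo", "hq"], 1: ["wife", "age", "net worth"]}
expand = lambda p: relations[len(p)]
score = lambda p: sum(1.0 if r in ("founder", "wife") else 0.1 for r in p)
best = beam_search([[]], expand, score)
print(best[0])  # ['founder', 'wife']
```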
A dependency tree can help a model capture long-distance relationships between words. Models that use dependency parses [12,13] have been demonstrated to be very effective in relation extraction, since they capture long-distance semantic relations. Multihop questions generally contain constraints and multiple relations. For example, for the query "What posts did John Adams hold before he was president?", the constraint is "before", and the answer is related to "John Adams" via two hops, namely, "president" and "job". To handle such cases, the relations between words must be attended to in order to reach the correct answer. We use a dependency analysis of the input question to assist the model in selecting relations. An efficient graph convolution operation [14] is used to encode the input question's dependency structure and extract an entity-centered representation.
In this paper, to focus on the relationships between words and on the constraints in questions with long relation paths, we propose encoding the dependency structure of the question with a graph convolution network (GCN). Efficient graph convolution operations over the dependency structure of the input question increase the attention paid to the constraints in the question and then guide the actions of query graph generation and the final ranking. This study makes three research contributions:
• To address the underutilization of the relationships between words in a question, we propose a knowledge base question answering method based on GCNs, which efficiently pools information over arbitrary dependency structures and produces a more effective sequence vector representation.
• To address incorrect relation selection during query graph generation, we analyze the dependency structure to establish the relations between words and use this structure to obtain a more effective representation, which in turn informs the ranking and action selection of the query graph.
• On the WebQuestionsSP (WQSP) and ComplexQuestions (CQ) datasets, our method
performs well, and it is more effective in ranking query graphs.
The remainder of this paper is organized as follows. Related work on KBQA is introduced in Section 2. Section 3 describes the proposed method. Section 4 describes the experiments and presents the results. Section 5 concludes the paper and offers suggestions for future KBQA research.
2. Related Work
Current approaches to the KBQA task can be roughly classified into two categories: semantic parsing (SP) and embedding-based approaches [15,16]. SP-based systems [17,18] are effective and provide an in-depth interpretation of the query, but they need reinforcement learning or expensive data annotations. Moreover, most SP-based approaches rely on predefined patterns or handcrafted rules that limit their scalability and transferability.
Recently, embedding-based methods [19,20] for KBQA have become increasingly popular. Unlike SP-based methods, embedding-based approaches first retrieve candidates from the KG, represent these candidates as distributed representations, and then rank and select among them. Some embedding-based models directly predict answers [21,22], while others concentrate on extracting relation paths and require further procedures to obtain an answer [7,23]. Our method follows the same procedure as embedding-based models and regards query graph generation as a multistep relation path extraction process. References [9,10,24] proposed considering only the best-matching relations. Lan et al. (2020) [11] proposed modifying the query graph generation process to handle longer relations. However, current methods remain inaccurate in choosing actions for query graph generation. Extending the relation path and allowing longer relation paths means more intermediate relations, so information in the question may be omitted. Therefore, capturing the relationships between words is particularly important when forming query graphs, because it determines whether the information in the question is fully utilized.
Our work also uses a dependency structure to help the model capture relations between words. A dependency tree can help a relation extraction model capture long-distance relations between words. One common approach [12,13] is to exploit structural features of the parse tree below the lowest common ancestor (LCA).
Our method builds on the existing query graph generation process. We add a dependency structure over the query to obtain the relations between words and to further increase the attention paid to the constraints in a question. Compared with previous methods, we introduce the dependency structure of the question and analyze it through a graph convolution network to focus more attention on the constraints. In summary, to obtain a more effective representation, a graph convolution network is used, which efficiently pools information from an arbitrary dependency structure so as to choose effective actions and increase the accuracy of intermediate relation selection during query graph generation.
3. Method
3.1. Query Graph Generation
Formally, our method follows Lan et al. (2020) [11], which is an extension of the existing staged query graph generation method. We use beam search to iteratively generate candidate query graphs. A grounded entity represents an existing entity in the knowledge base. The existential variable and lambda variable are ungrounded entities, where the lambda variable represents the answer. Finally, an aggregation function performs operations on specific entities, usually capturing numerical features.
We assume that a set of query graphs has been generated after the k-th iteration, denoted as Gk. At iteration k + 1, we apply the extend, connect, and aggregate actions (detailed in Figure 3) to grow Gk by one more edge and its nodes. The extend action extends the core relation path by finding a further relation. The connect action finds other grounded entities in the question and connects them to existing nodes. We denote the resulting query graphs as Gk+1. After each iteration, a large number of query graphs produced by the applied actions will be generated. We use graph convolutional networks (explained in Section 3.2) to select query graphs that use the correct action, which affects their scores.
We now describe how a query graph is generated. At every iteration, the actions {extend, connect, aggregate} are applied to the query graph candidates. Figure 3 shows how the three actions act on the query graph (in fact, the three actions have no fixed order) for the question "Who was the director of Jackie Chan's first starring film?". First, in query graph (a), starting from the grounded entity "Jackie Chan", a core relation path is found to connect entities and the answer. If the question contained no constraint words and no other relations, the answer would be x. However, because the question contains other relations, query graph (b) applies an extend action to extend the core relation path. Query graph (c) applies a connect action to find other grounded entities in the question and connects them to existing nodes. Query graph (d) applies an aggregate action to add constraint nodes to a grounded entity or existential variable.
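As a rough illustration (not the authors' code), the three actions can be modeled as operations that append edges to a candidate query graph. The tuple-based graph encoding and the relation and constraint names below are assumptions made for the sketch.

```python
# Hedged sketch of the three growth actions on a candidate query graph.
def extend(graph, relation, new_var):
    """Extend the core relation path by one hop: (last node) --relation--> new_var."""
    return graph + [(graph[-1][2], relation, new_var)]

def connect(graph, entity, relation, var):
    """Attach another grounded entity from the question to an existing variable."""
    return graph + [(entity, relation, var)]

def aggregate(graph, func, var):
    """Add an aggregation constraint (e.g., argmin over release dates) on a variable."""
    return graph + [(func, "over", var)]

g = [("Jackie Chan", "starring", "y")]         # core path from the topic entity
g = extend(g, "directed_by", "x")              # adds ("y", "directed_by", "x")
g = aggregate(g, "argmin(release_date)", "y")  # "first" constraint on y
print(len(g))  # 3
```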
Figure 3. A possible sequence of the query graph generation for "Who was the director of Jackie Chan's first starring film?". Note that (b–d) are the results of the extend, connect, and aggregate actions, respectively.
In practice, the order of the actions is not fixed, so several potential query graphs will be generated. Selecting the correct action sequence and determining the correct query graph is very important, because query graph candidates may contain incorrect intermediate relations and entities, which affects the correctness of the final query graph. Following the intuition described in Section 1, to enhance query graph generation and to improve the accuracy of the intermediate relations, we employ the dependency structure of the input question.
This operation is stacked for L layers to obtain a deep GCN, where we set h_1^(0), ..., h_n^(0) to be the input word vectors obtained by BERT and h_1^(L), ..., h_n^(L) to be the output word representations. All operations can be efficiently implemented as matrix multiplications, making the method suitable for batch computation on a GPU.
Thus far, we have obtained a question representation that contains the relations between words, which is used to influence relation selection when ranking the query graph. The representation also captures the edge information needed for relation selection.
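A single graph convolution step over the question's dependency structure might look like the following sketch. The degree normalization, the tiny dimensions, and the random vectors standing in for BERT features are all assumptions; the paper does not spell out the exact layer form.

```python
import numpy as np

# Minimal sketch of one GCN layer over an undirected dependency graph,
# assumed form: h^(l+1) = ReLU(D^-1 A h^(l) W).
words = ["Who", "was", "the", "director"]
edges = [(0, 1), (1, 3), (2, 3)]   # undirected dependency arcs, by word index

n, d = len(words), 4
A = np.eye(n)                      # self-loops keep each word's own features
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D_inv = np.diag(1.0 / A.sum(axis=1))

rng = np.random.default_rng(0)
H = rng.normal(size=(n, d))        # stand-in for BERT word vectors h^(0)
W = rng.normal(size=(d, d))
H_next = np.maximum(0.0, D_inv @ A @ H @ W)   # one graph convolution step
print(H_next.shape)  # (4, 4)
```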
Figure 4. The dependency structure of the question “Who was the director of Jackie Chan’s first
starring film?” We treat the dependency graph as undirected.
where h_q is the question vector, h^(l) is the output vector from the GCN, and MLP(·) denotes an MLP layer. We then derive a vector v_g for each candidate graph and feed it into an FFN. Finally, we calculate the probability with a softmax.
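The ranking step can be sketched as scoring each candidate graph and normalizing with a softmax. Concatenating the question vector with each graph vector, and the single-layer stand-in for the FFN, are assumptions made only to keep the sketch self-contained.

```python
import numpy as np

# Hedged sketch of the final ranking: one scalar score per candidate query
# graph, turned into a distribution by softmax.
def softmax(z):
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
h_q = rng.normal(size=8)            # question vector (from GCN + language model)
graph_vecs = rng.normal(size=(3, 8))  # one vector v_g per candidate graph
w = rng.normal(size=16)             # toy single-layer "FFN"

scores = np.array([w @ np.concatenate([h_q, v]) for v in graph_vecs])
probs = softmax(scores)
best = int(np.argmax(probs))
print(best, probs.sum())            # probabilities sum to 1
```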
3.4. Learning
Without any gold query graphs, we use question–answer pairs to train our model. Inspired by Das et al. (2018) [28], we use a reinforcement learning (RL) algorithm to learn p_θ(g | q) so that the query graph fits the question better, where θ denotes the learnable parameters. As our focus is not on the optimization approach but on a novel graph-based method for KBQA, the model learning and RL exploration procedures are not described in detail.
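Since the learning procedure is deliberately left abstract here, the following is only a generic REINFORCE-style sketch, not the authors' algorithm: answer F1 serves as the reward, and a toy linear softmax policy over candidate graphs stands in for p_θ(g | q).

```python
import numpy as np

# Generic policy-gradient sketch (an assumption, not taken from the paper):
# sample a query graph, weight grad log p_theta(g | q) by its reward.
def reinforce_step(theta, features, rewards, lr=0.1):
    """One REINFORCE update for a softmax policy over candidate graphs."""
    logits = features @ theta
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    g = int(np.random.default_rng(0).choice(len(probs), p=probs))
    # Gradient of log softmax w.r.t. theta for the sampled index g.
    grad = features[g] - probs @ features
    return theta + lr * rewards[g] * grad, g

theta = np.zeros(4)
features = np.array([[1., 0, 0, 0], [0, 1., 0, 0], [0, 0, 1., 0]])
rewards = np.array([0.2, 1.0, 0.0])   # e.g., answer-set F1 per candidate graph
theta, g = reinforce_step(theta, features, rewards)
print(theta.shape)  # (4,)
```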
4. Experiments
4.1. Datasets and Settings
WebQuestionsSP (WQSP) [8] was created for question answering over structured data, specifically targeting Freebase, a large knowledge base. It includes 5810 training samples, and each sample is annotated with a SPARQL query statement that retrieves the answer from Freebase. To ensure the quality and clarity of the dataset, questions with ambiguities, unclear intentions, or no clear answer were removed during annotation, which maintains a reliable and focused dataset for training and evaluating question answering models. The statistics of WQSP and CQ are shown in Table 1.
Table 1. Statistics of the WQSP and CQ datasets.

                        WQSP    CQ
Total QA pairs          4737    2100
Training set QA pairs   3098    1300
Test set QA pairs       1639    800
Table 2. Overall F1 results on the WQSP and CQ datasets.

Method    WQSP (F1)    CQ (F1)
[8]       69.0         -
[6]       -            40.9
[7]       -            42.8
[29]      67.9         -
[9]       68.5         35.3
[30]      60.3         -
[31]      72.6         -
[11]      74.0         43.3
Ours      74.8         44.2
Table 3. Performance on questions with constraints on the test sets of CQ and WQSP.

Method                   CQ      WQSP
Lan et al. (2020) [11]   0.715   0.640
Our method               0.730   0.670
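The F1 values in the tables above compare predicted answer sets against gold answer sets. The standard set-based computation, assumed here since the exact evaluation script is not shown, is:

```python
# Set-based F1 between a predicted answer set and a gold answer set
# (assumed metric; the paper's evaluation script is not reproduced here).
def answer_f1(pred, gold):
    pred, gold = set(pred), set(gold)
    if not pred or not gold:
        return 0.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(pred), overlap / len(gold)
    return 2 * p * r / (p + r)          # harmonic mean of precision and recall

print(answer_f1({"Lo Wei"}, {"Lo Wei"}))            # 1.0
print(answer_f1({"a", "b"}, {"b", "c"}))            # 0.5
```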
To summarize, our method not only affects the selection of relations during graph generation but also affects the ranking of the final query graph, and it successfully captures some constraints that are otherwise difficult to capture. Our method influences the query graph generation process by convolving over the dependency structure of the question. In addition, the results show that our system performs stably and works well not only on multi-constraint questions but also on simple questions.
5. Conclusions
In this paper, we proposed a graph convolution operation over the dependency structure of a question to obtain relation information between words, and we then integrated this relation information into the question vector to generate and rank query graphs. Our proposed method has the dual objective of reducing the search space and improving the accuracy of relation selection during query graph generation, which in turn directly impacts the ranking of query graphs. The experimental results demonstrate the effectiveness of our approach on both complex questions and the WQSP dataset, highlighting the robustness of our method. Notably, our method shows a clear improvement over previous baseline methods.
Our method also has weaknesses. One may lie in the handling of certain types of questions or datasets that require specialized treatment or have unique characteristics. Additionally, there may be limitations in scalability and efficiency when dealing with extremely large-scale datasets or in scenarios with real-time constraints. These weaknesses provide opportunities for future research and improvement.
In future work, we plan to explore additional enhancements. One aspect we will focus
on is pruning dependency structures to eliminate unnecessary information, which can
help streamline the processing and improve efficiency. Furthermore, we aim to increase
the accuracy of answer prediction, ensuring more precise and reliable responses. By
continuously refining and expanding our approach, we anticipate further advancements in
the field of question answering systems.
Author Contributions: Conceptualization, C.Z.; methodology, C.Z.; software, D.Z.; validation, D.Z.,
L.W. and C.Z.; formal analysis, N.M.; investigation, C.Z.; resources, C.Z.; writing—original draft
preparation, C.Y., C.Z.; writing—review and editing, C.Z., B.W., F.X. All authors have read and agreed
to the published version of the manuscript.
Funding: This work was supported in part by the National Social Science Foundation under Award 19BYY076; in part by the Key R&D Project of Shandong Province under Award 2019JZZY010129; in part by the Shandong Natural Science Foundation under Award ZR2021MF064 and Award ZR2021QG041; and in part by the Shandong Provincial Social Science Planning Project under Award 19BJCJ51, Award 18CXWJ01, and Award 18BJYJ04. This project is also supported by the Major Science and Technology Demonstration Project "Intelligent Perception Technology in Complex Dynamic Scenes and IT Application Demonstration in Emergency Management and Social Governance", No. 2021SFGC0102.
Data Availability Statement: The data presented in this study are openly available in [6,8].
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Bordes, A.; Usunier, N.; Chopra, S.; Weston, J. Large-scale simple question answering with memory networks. arXiv 2015,
arXiv:1506.02075. [CrossRef].
2. Cai, Q.; Yates, A. Large-scale semantic parsing via schema matching and lexicon extension. In Proceedings of the Annual
Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 4–9 August 2013; pp. 423–433.
3. Krishnamurthy, J.; Mitchell, T.M. Weakly supervised training of semantic parsers. In Proceedings of the Conference on Empirical
Methods in Natural Language Processing, Jeju Island, Republic of Korea, 12–14 July 2012; pp. 754–765.
4. Abujabal, A.; Yahya, M.; Riedewald, M.; Weikum, G. Automated template generation for question answering over knowledge
graphs. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 1191–1200.
[CrossRef]
5. Hu, S.; Zou, L.; Yu, J.X.; Wang, H.; Zhao, D. Answering natural language questions by subgraph matching over knowledge
graphs. IEEE Trans. Knowl. Data Eng. 2017, 30, 824–837. [CrossRef]
6. Bao, J.; Duan, N.; Yan, Z.; Zhou, M.; Zhao, T. Constraint-based question answering with knowledge graph. In Proceedings of the
COLING, Osaka, Japan, 11–16 December 2016; pp. 2503–2514.
7. Luo, K.; Lin, F.; Luo, X.; Zhu, K.Q. Knowledge base question answering via encoding of complex query graphs. In Proceedings
of the Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018;
pp. 2185–2194. [CrossRef]
8. Yih, W.-T.; Chang, M.-W.; He, X.; Gao, J. Semantic Parsing via Staged Query Graph Generation: Question Answering with
Knowledge Base. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Beijing, China,
26–31 July 2015.
9. Chen, Z.-Y.; Chang, C.-H.; Chen, Y.-P.; Nayak, J.; Ku, L.-W. UHop: An unrestricted-hop relation extraction framework for
knowledge-based question answering. arXiv 2019, arXiv:1904.01246. [CrossRef].
10. Lan, Y.; Wang, S.; Jiang, J. Multi-hop knowledge base question answering with an iterative sequence matching model. In
Proceedings of the IEEE International Conference on Data Mining (ICDM), Beijing, China, 8–11 November 2019; pp. 359–368.
[CrossRef]
11. Lan, Y.; Jiang, J. Query graph generation for answering multi-hop complex questions from knowledge bases. In Proceedings of
the Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020. [CrossRef]
12. Miwa, M.; Bansal, M. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. In Proceedings of the
Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016. [CrossRef]
13. Xu, K.; Feng, Y.; Huang, S.; Zhao, D. Semantic Relation Classification via Convolutional Neural Networks with Simple Negative
Sampling. Comput. Sci. 2015, 71, 941–949. [CrossRef]
14. Youcef, D.; Gautam, S.; Wei, L.J.C. Fast and accurate convolution neural network for detecting manufacturing data. IEEE Trans.
Ind. Inform. 2020, 17, 2947–2955. [CrossRef]
15. Peng, H.; Chang, M.; Yih, W.T. Maximum margin reward networks for learning from explicit and implicit supervision. In Pro-
ceedings of the Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017;
pp. 2368–2378. [CrossRef]
16. Sorokin, D.; Gurevych, I. Modeling semantics with gated graph neural networks for knowledge base question answering. In
Proceedings of the 27th International Conference on Computational Linguistics, Association for Computational Linguistics, Santa
Fe, NM, USA, 20–26 August 2018; pp. 3306–3317. [CrossRef]
17. Iyyer, M.; Yih, W.-T.; Chang, M.-W. Search-based neural structured learning for sequential question answering. In Proceedings
of the Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017;
pp. 1821–1831. [CrossRef]
18. Krishnamurthy, J.; Dasigi, P.; Gardner, M. Neural semantic parsing with type constraints for semi-structured tables. In Proceedings
of the Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017;
pp. 1516–1526. [CrossRef]
19. Moqurrab, S.A.; Ayub, U.; Anjum, A.; Asghar, S.; Srivastava, G. An accurate deep learning model for clinical entity recognition
from clinical notes. IEEE J. Biomed. Health Inform. 2021, 25, 3804–3811. [CrossRef] [PubMed]
20. Wang, F.; Wu, W.; Li, Z.; Zhou, M. Named entity disambiguation for questions in community question answering. Knowl.-Based
Syst. 2017, 126, 68–77. [CrossRef]
21. Bast, H.; Haussmann, E. More accurate question answering on freebase. In Proceedings of the 24th ACM International Conference
on Information and Knowledge Management, Melbourne, Australia, 18–23 October 2015; pp. 1431–1440. [CrossRef]
22. Chakraborty, N.; Lukovnikov, D.; Maheshwari, G.; Trivedi, P.; Lehmann, J.; Fischer, A. Introduction to neural network based
approaches for question answering over knowledge graphs. arXiv 2019, arXiv:1907.09361. [CrossRef].
23. Chen, H.-C.; Chen, Z.-Y.; Huang, S.-Y.; Ku, L.-W.; Chiu, Y.-S.; Yang, W.-J. Relation extraction in knowledge base question
answering: From general-domain to the catering industry. In Proceedings of the International Conference on HCI in Business,
Government, and Organizations, Las Vegas, NV, USA, 15 July 2018; pp. 26–41. [CrossRef]
24. Yang, Z.; Garg, H.; Li, J.; Srivastava, G.; Cao, Z. Investigation of multiple heterogeneous relationships using a q-rung orthopair
fuzzy multi-criteria decision algorithm. Neural Comput. Appl. 2021, 33, 10771–10786. [CrossRef]
25. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding.
arXiv 2019, arXiv:1810.04805. [CrossRef].
26. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [CrossRef].
27. Marcheggiani, D.; Titov, I. Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling. In Proceedings
of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017;
pp. 1506–1515. [CrossRef]
28. Das, R.; Dhuliawala, S.; Zaheer, M.; Vilnis, L.; Durugkar, I.; Krishnamurthy, A.; Smola, A.; McCallum, A. Go for a walk and arrive
at the answer: Reasoning over paths in knowledge bases using reinforcement learning. arXiv 2018, arXiv:1711.05851.
29. Lan, Y.; Wang, S.; Jiang, J. Knowledge base question answering with topic units. In Proceedings of the International Joint
Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 5046–5052. [CrossRef]
30. Bhutani, N.; Suhara, Y.; Tan, W.-C.; Halevy, A.Y.; Jagadish, H.V. Open Information Extraction from Question-Answer Pairs. In
Proceedings of the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics:
Human Language Technologies (NAACL-HLT 2019), Minneapolis, MN, USA, 3–5 June 2019; pp. 2294–2305.
31. Ahmed, G.A.; Saha, A.; Kumar, V.; Bhambhani, M.; Sankaranarayanan, K.; Chakrabarti, S. Neural Program Induction for KBQA
Without Gold Programs or Query Annotations. In Proceedings of the International Joint Conference on Artificial Intelligence,
Macao, China, 10–16 August 2019; pp. 4890–4896. [CrossRef]
electronics
Article
A Collaborative Multi-Granularity Architecture for
Multi-Source IoT Sensor Data in Air Quality Evaluations
Wantong Li, Chao Zhang *, Yifan Cui and Jiale Shi
School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
[email protected] (W.L.); [email protected] (Y.C.); [email protected] (J.S.)
* Correspondence: [email protected]
Abstract: Air pollution (AP) is a significant environmental issue that poses a potential threat to
human health. Its adverse effects on human health are diverse, ranging from sensory discomfort
to acute physiological reactions. As such, air quality evaluation (AQE) serves as a crucial process
that involves the collection of samples from the environment and their analysis to measure AP levels.
With the proliferation of Internet of Things (IoT) devices and sensors, real-time and continuous
measurement of air pollutants in urban environments has become possible. However, the data
obtained from multiple sources of IoT sensors can be uncertain and inaccurate, posing challenges
in effectively utilizing and fusing this data. Meanwhile, differences in opinions among decision-
makers regarding AQE can affect the outcome of the final decision. To tackle these challenges,
this paper systematically investigates a novel multi-attribute group decision-making (MAGDM)
approach based on hesitant trapezoidal fuzzy (HTrF) information and discusses its application to
AQE. First, by combining HTrF sets (HTrFSs) with multi-granulation rough sets (MGRSs), a new
rough set model, named HTrF MGRSs, on a two-universe model is proposed. Second, the definition
and properties of the presented model are studied. Third, a decision-making approach based on the background of AQE is constructed by utilizing decision-making index sets (DMISs). Lastly, the validity and feasibility of the constructed approach are demonstrated via a case study conducted in the AQE setting using experimental and comparative analyses. The outcomes of the experiment demonstrate that the presented architecture is able to handle multi-source IoT sensor data (MSIoTSD), providing a sensible conclusion for AQE. In summary, the MAGDM method presented
in this article is a promising scheme for solving decision-making problems, where HTrFSs possess excellent information description capabilities and can adequately describe indecision and uncertainty information. Meanwhile, MGRSs serve as an outstanding information fusion tool that can improve the quality and level of decision-making. DMISs are better able to analyze and evaluate information and reduce the impact of disagreement on decision outcomes. The proposed architecture, therefore, provides a viable solution for MSIoTSD facing uncertainty or hesitancy in the AQE environment.
Keywords: granular computing; multi-granulation rough set; hesitant trapezoidal fuzzy set; air quality evaluation
Citation: Li, W.; Zhang, C.; Cui, Y.; Shi, J. A Collaborative Multi-Granularity Architecture for Multi-Source IoT Sensor Data in Air Quality Evaluations. Electronics 2023, 12, 2380. https://doi.org/10.3390/electronics12112380
Academic Editor: Franco Cicirelli
Received: 17 April 2023
Revised: 13 May 2023
Accepted: 23 May 2023
Published: 24 May 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
AP [1,2] is a matter of paramount concern to both the environment and public health, brought about by the contamination of air by chemical, physical, or biological agents. This deleterious phenomenon is known to have far-reaching implications in the agricultural industry [3], as it has been demonstrated to cause acid rain, reduced crop production, and inferior soil fertility. Notably, AP is a leading contributor to the global climate change crisis, resulting in more severe weather patterns across the globe [4]. Recent studies have provided compelling evidence to suggest that exposure to AP is linked to several negative health outcomes, including developmental delays in children [5], increased risk of mental
illnesses such as depression [6], and poor reproductive health in females [7]. In this light,
AP is currently one of the most significant risk factors affecting global health. According to
a survey by the European Environment Agency (EEA) in 2020, 96% of city residents in the
European Union were exposed to higher than recommended levels of fine particulate matter,
according to the World Health Organization (WHO) [8], resulting in 238,000 premature
deaths. Furthermore, WHO has conducted extensive research on the effects of AP and
found that environmental and household AP cause approximately 6.7 million deaths per
year, with 2.4 billion people exposed to hazardous levels of household AP.
AQE is an indispensable tool for comprehending the state of air quality and forecasting
its future trends. The objective of AQE is to mitigate the deleterious impact of AP and foster
a healthy atmospheric environment, and, thus, it has become a prominent research topic
in recent years [9,10]. Numerous scholars have investigated AQE from various directions using different approaches. For instance, Oprea [11] utilized an expert system to study knowledge modeling for better analysis of AP in city regions. Wang et al. [12] proposed a deep convolutional neural network method for predicting AP. Gu et al. [13] suggested a new fuzzy multiple linear regression model to forecast the air quality index [14–16]. However, neural-network-based AQE necessitates an adequate number of training examples to ensure sufficient training of the model, and expert-system-based AQE requires frequent manual maintenance and manipulation of the AQE knowledge base, both of which hinder accuracy guarantees. With the widespread
adoption of IoT technologies [17–20], sensor networks have become increasingly popular
for collecting air quality data from multiple sources. IoT sensors [21] are capable of
collecting and transmitting data in real time, providing a dynamic understanding of AP
patterns by continuous and high-resolution measurements of air quality parameters. The
use of MSIoTSD allows for a more comprehensive and accurate assessment of air quality.
However, the accuracy and reliability of MSIoTSD [22] can be affected by various factors,
leading to uncertain data. Moreover, effectively utilizing and fusing MSIoTSD presents a
challenge. In contrast, AQE, using fuzzy methods, not only overcomes the limitations of the
aforementioned approaches, but can also effectively deal with multi-source uncertain data.
Furthermore, AQE is influenced by several factors, including different locations, attributes,
and times, which can be established as a typical MAGDM problem.
This paper primarily examines and resolves the AQE issue from three perspectives.
First, we investigate a fuzzy approach applied to AQE in the context of HTrFSs during the
information description process. Second, we use MGRSs to fuse multiple sources of AQE
data during the information fusion stage process. Finally, we employ DMISs to diminish
the impact of inconsistent opinions of individual decision-makers within a decision group
on the decision outcome during the information analysis process. Based on the analysis
above, we recall the components of HTrFSs and MGRSs below.
they contain a specific interval with a full membership rank. Therefore, Ye [32] introduced
the concept of HTrFSs, which takes advantage of the unique benefits of TrFNs and HFSs.
The distinctive advantages of HTrFSs in dealing with uncertain information have prompted
scholars to conduct a substantial number of theoretical and practical explorations [33,34].
2. Basic Knowledge
For a better understanding, this section introduces the fundamental concepts of HTrFSs
and MGRSs.
2.1. HTrFSs
HTrFSs have shown flexibility in handling hesitant, inaccurate information. Before
introducing the notion of HTrFSs, we first present the TrFN.
Definition 1 ([59]). A fuzzy number ã = (a, b, c, d) is called a TrFN when its membership function is denoted as:

$$\mu_{\tilde{a}}(x) = \begin{cases} 0, & x < a \text{ or } x > d \\ (x-a)/(b-a), & a \le x < b \\ 1, & b \le x \le c \\ (x-d)/(c-d), & c < x \le d \end{cases} \quad (1)$$

where 0 ≤ a ≤ b ≤ c ≤ d ≤ 1; a, the closed interval [b, c], and d stand for the lower limit, the mode, and the upper limit of ã, respectively.
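As a concrete illustration, the piecewise membership function in Equation (1) evaluates directly in code. The sketch below is a minimal Python rendering; the function name and the sample TrFN are illustrative, not from the paper.

```python
def trfn_membership(x, a, b, c, d):
    """Membership degree of x under the TrFN (a, b, c, d), with 0 <= a <= b <= c <= d <= 1."""
    if x < a or x > d:
        return 0.0
    if a <= x < b:
        return (x - a) / (b - a)  # rising edge
    if b <= x <= c:
        return 1.0                # plateau of full membership on [b, c]
    return (x - d) / (c - d)      # falling edge, c < x <= d

# A TrFN (0.2, 0.4, 0.6, 0.8): full membership on [0.4, 0.6], zero outside [0.2, 0.8]
print(trfn_membership(0.3, 0.2, 0.4, 0.6, 0.8))  # ~0.5, halfway up the rising edge
print(trfn_membership(0.5, 0.2, 0.4, 0.6, 0.8))  # 1.0, inside the plateau
```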
$$E = \{\langle x, h_E(x)\rangle \mid x \in U\}, \quad (2)$$
As the rules for the operations of HTrFSs support decision-making processes to efficiently analyze data, we present the laws for the operations of HTrFSs below.
1. The complement of $E_1$, expressed as $E_1^c$, is given by $\forall x \in U$, $h_{E_1^c}(x) = \,\sim h_{E_1}(x) = \{(1 - a_{E_1}^f, 1 - b_{E_1}^f, 1 - c_{E_1}^f, 1 - d_{E_1}^f) \mid f = 1, 2, \ldots, l\}$.
2. The intersection of $E_1$ and $E_2$, expressed as $E_1 \cap E_2$, is given by $\forall x \in U$, $h_{E_1 \cap E_2}(x) = h_{E_1}(x) \wedge h_{E_2}(x) = \{(a_{E_1}^f \wedge a_{E_2}^f, b_{E_1}^f \wedge b_{E_2}^f, c_{E_1}^f \wedge c_{E_2}^f, d_{E_1}^f \wedge d_{E_2}^f) \mid f = 1, 2, \ldots, l\}$.
3. The union of $E_1$ and $E_2$, expressed as $E_1 \cup E_2$, is given by $\forall x \in U$, $h_{E_1 \cup E_2}(x) = h_{E_1}(x) \vee h_{E_2}(x) = \{(a_{E_1}^f \vee a_{E_2}^f, b_{E_1}^f \vee b_{E_2}^f, c_{E_1}^f \vee c_{E_2}^f, d_{E_1}^f \vee d_{E_2}^f) \mid f = 1, 2, \ldots, l\}$.
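These componentwise laws translate directly into code. The following Python fragment is a simplified sketch: representing an HTrF element as a list of 4-tuples (one per TrFN), with elements paired positionally, is an assumption made for illustration. The complement follows the paper's law 1 as printed.

```python
# An HTrF element is modeled as a list of TrFNs, each a 4-tuple (a, b, c, d) in [0, 1].

def htrf_complement(h):
    """Complement: each component is replaced by 1 minus itself (Definition 3, law 1)."""
    return [(1 - a, 1 - b, 1 - c, 1 - d) for (a, b, c, d) in h]

def htrf_intersection(h1, h2):
    """Intersection: componentwise minimum of paired TrFNs (law 2)."""
    return [tuple(min(u, v) for u, v in zip(t1, t2)) for t1, t2 in zip(h1, h2)]

def htrf_union(h1, h2):
    """Union: componentwise maximum of paired TrFNs (law 3)."""
    return [tuple(max(u, v) for u, v in zip(t1, t2)) for t1, t2 in zip(h1, h2)]

h1 = [(0.2, 0.3, 0.5, 0.6)]
h2 = [(0.1, 0.4, 0.4, 0.7)]
print(htrf_intersection(h1, h2))  # [(0.1, 0.3, 0.4, 0.6)]
print(htrf_union(h1, h2))         # [(0.2, 0.4, 0.5, 0.7)]
```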
The utilization of score functions represents a pivotal approach for the selection of the
optimal alternative in HTrF MAGDM problems; hence, we discuss the following notion of
HTrF score functions.
Definition 4 ([32]). For an HTrF element $h_E(x)$, $S(h_E(x)) = \frac{1}{4\,\#(h_E(x))} \sum_{\tilde{a} = (a,b,c,d) \in h_E(x)} (a + b + c + d)$, where $\#(h_E(x))$ is the number of TrFNs in $h_E(x)$. For two HTrF elements $h_E(x)$ and $h_F(x)$, if $S(h_E(x)) \ge S(h_F(x))$, then $h_E(x) \ge h_F(x)$.
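Under Definition 4, ranking two HTrF elements reduces to comparing their score values. A minimal Python sketch, reusing the list-of-4-tuples representation assumed above:

```python
def score(h):
    """Score of an HTrF element: the mean of all TrFN components, i.e.
    (1 / (4 * #h)) * sum of (a + b + c + d) over the TrFNs in h (Definition 4)."""
    return sum(a + b + c + d for (a, b, c, d) in h) / (4 * len(h))

h_e = [(0.2, 0.3, 0.5, 0.6), (0.3, 0.4, 0.6, 0.7)]
h_f = [(0.1, 0.2, 0.3, 0.4)]
print(score(h_e))  # ~0.45
print(score(h_f))  # ~0.25
# score(h_e) >= score(h_f), hence h_e >= h_f under Definition 4
```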
Definition 6 ([51]). Suppose U, V are two universes, and $\mathcal{R}$ is a binary compatibility relation family over $U \times V$, with respect to a family of binary mappings $F_k: U \to 2^V$, $u \mapsto \{v \in V \mid (u, v) \in R_k\}$, $R_k \in \mathcal{R}$, $k = 1, 2, \ldots, n$. Then, the MG approximation space on two-universe is expressed as $(U, V, \mathcal{R})$.
Definition 7 ([52]). Suppose $F_1$ and $F_2$ are two binary mappings over $U \times V$. $\forall Y \subseteq V$, the pessimistic and optimistic lower and upper MG approximations in the matter of $(U, V, \mathcal{R})$ are expressed as:

$$\underline{apr}^{P}_{F_1+F_2}(Y) = \{x \in U \mid F_1(x) \subseteq Y \wedge F_2(x) \subseteq Y\}; \quad (3)$$

$$\overline{apr}^{P}_{F_1+F_2}(Y) = \left(\underline{apr}^{P}_{F_1+F_2}(Y^c)\right)^c; \quad (4)$$

$$\underline{apr}^{O}_{F_1+F_2}(Y) = \{x \in U \mid F_1(x) \subseteq Y \vee F_2(x) \subseteq Y\}; \quad (5)$$

$$\overline{apr}^{O}_{F_1+F_2}(Y) = \left(\underline{apr}^{O}_{F_1+F_2}(Y^c)\right)^c, \quad (6)$$
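For crisp sets, the pessimistic and optimistic lower approximations of Definition 7 translate directly into set comprehensions. The sketch below uses invented toy universes and mappings; note how the pessimistic version (Equation (3)) demands that every granule fit inside Y, while the optimistic version (Equation (5)) is satisfied by any one granule.

```python
U = ["x1", "x2", "x3"]
# Two binary mappings U -> 2^V induced by compatibility relations R1, R2 (toy data)
F1 = {"x1": {"v1"}, "x2": {"v1", "v2"}, "x3": {"v3"}}
F2 = {"x1": {"v1", "v2"}, "x2": {"v2", "v3"}, "x3": {"v2", "v3"}}
Y = {"v1", "v2"}  # target subset of V

# Pessimistic lower approximation: every mapping must place x inside Y (Eq. (3))
pess_lower = {x for x in U if F1[x] <= Y and F2[x] <= Y}
# Optimistic lower approximation: at least one mapping places x inside Y (Eq. (5))
opt_lower = {x for x in U if F1[x] <= Y or F2[x] <= Y}

print(sorted(pess_lower))  # ['x1']
print(sorted(opt_lower))   # ['x1', 'x2'] -- the pessimistic set is always contained in it
```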
$$\underline{\sum_{k=1}^{n} R_k}^{O}(E) = \left\{ \left\langle x, h_{\underline{\sum_{k=1}^{n} R_k}^{O}(E)}(x) \right\rangle \;\middle|\; x \in U \right\}, \quad (9)$$

where $h_{\underline{\sum_{k=1}^{n} R_k}^{O}(E)}(x) = \vee_{k=1}^{n} \wedge_{y \in V} \{h_{R_k^c}(x, y) \vee h_E(y)\}$; $h_{\overline{\sum_{k=1}^{n} R_k}^{O}(E)}(x) = \wedge_{k=1}^{n} \vee_{y \in V} \{h_{R_k}(x, y) \wedge h_E(y)\}$.

The pair $\left( \underline{\sum_{k=1}^{n} R_k}^{O}(E), \overline{\sum_{k=1}^{n} R_k}^{O}(E) \right)$ indicates an optimistic HTrF MGRS on two-universe of E in the matter of $(U, V, R_k)$.
Proof.

(1) $\forall x \in U$, we have $\underline{\sum_{k=1}^{n} R_k}^{O}(E^c) = \{\langle x, \vee_{k=1}^{n} \wedge_{y \in V}\{h_{R_k^c}(x,y) \vee h_{E^c}(y)\}\rangle \mid x \in U\} = \{\langle x, \vee_{k=1}^{n} \wedge_{y \in V}\{(\sim h_{R_k}(x,y)) \vee (\sim h_E(y))\}\rangle \mid x \in U\} = \{\langle x, \vee_{k=1}^{n} \wedge_{y \in V}\{\sim (h_{R_k}(x,y) \wedge h_E(y))\}\rangle \mid x \in U\} = \{\langle x, \sim (\wedge_{k=1}^{n} \vee_{y \in V}(h_{R_k}(x,y) \wedge h_E(y)))\rangle \mid x \in U\} = \left(\overline{\sum_{k=1}^{n} R_k}^{O}(E)\right)^c$.
$\overline{\sum_{k=1}^{n} R_k}^{O}(E^c) = \left(\underline{\sum_{k=1}^{n} R_k}^{O}(E)\right)^c$ is similarly obtained.

(2) Because of $E \subseteq F$, depending on Definition 5, we have $h_E(y) \preceq h_F(y) \Leftrightarrow a_E^f \le a_F^f, b_E^f \le b_F^f, c_E^f \le c_F^f, d_E^f \le d_F^f$, so $\vee_{k=1}^{n} \wedge_{y \in V}\{a_{R_k^c}^f \vee a_E^f\} \le \vee_{k=1}^{n} \wedge_{y \in V}\{a_{R_k^c}^f \vee a_F^f\}$, and the analogous inequalities hold for the $b$, $c$, and $d$ components. Therefore, we have $E \subseteq F \Rightarrow \underline{\sum_{k=1}^{n} R_k}^{O}(E) \subseteq \underline{\sum_{k=1}^{n} R_k}^{O}(F)$.
$E \subseteq F \Rightarrow \overline{\sum_{k=1}^{n} R_k}^{O}(E) \subseteq \overline{\sum_{k=1}^{n} R_k}^{O}(F)$ is similarly obtained.

(3) $\forall x \in U$, we have $\underline{\sum_{k=1}^{n} R_k}^{O}(E \cap F) = \{\langle x, \vee_{k=1}^{n} \wedge_{y \in V}\{h_{R_k^c}(x,y) \vee h_{E \cap F}(y)\}\rangle \mid x \in U\} = \{\langle x, \vee_{k=1}^{n} \wedge_{y \in V}\{h_{R_k^c}(x,y) \vee (h_E(y) \wedge h_F(y))\}\rangle \mid x \in U\}$. Since each component distributes, e.g., $a_{R_k^c}^f \vee (a_E^f \wedge a_F^f) = (a_{R_k^c}^f \vee a_E^f) \wedge (a_{R_k^c}^f \vee a_F^f)$, this equals $\{\langle x, h_{\underline{\sum_{k=1}^{n} R_k}^{O}(E)}(x)\rangle \mid x \in U\} \cap \{\langle x, h_{\underline{\sum_{k=1}^{n} R_k}^{O}(F)}(x)\rangle \mid x \in U\} = \underline{\sum_{k=1}^{n} R_k}^{O}(E) \cap \underline{\sum_{k=1}^{n} R_k}^{O}(F)$. Similarly, $\overline{\sum_{k=1}^{n} R_k}^{O}(E \cup F) = \overline{\sum_{k=1}^{n} R_k}^{O}(E) \cup \overline{\sum_{k=1}^{n} R_k}^{O}(F)$ is obtained.

(4) Based on the above findings, it is easily obtained that $\underline{\sum_{k=1}^{n} R_k}^{O}(E \cup F) \supseteq \underline{\sum_{k=1}^{n} R_k}^{O}(E) \cup \underline{\sum_{k=1}^{n} R_k}^{O}(F)$ and $\overline{\sum_{k=1}^{n} R_k}^{O}(E \cap F) \subseteq \overline{\sum_{k=1}^{n} R_k}^{O}(E) \cap \overline{\sum_{k=1}^{n} R_k}^{O}(F)$. □
(2) $\overline{\sum_{k=1}^{n} \tilde{R}_k}^{O}(E) \supseteq \overline{\sum_{k=1}^{n} R_k}^{O}(E)$, $\forall E \in HTrF(V)$, where $R_k \subseteq \tilde{R}_k$, $k = 1, 2, \ldots, n$.

Proof. Because of $R_k \subseteq \tilde{R}_k$, depending on Definition 5, we have $a_{R_k^c}^f \ge a_{\tilde{R}_k^c}^f$, $b_{R_k^c}^f \ge b_{\tilde{R}_k^c}^f$, $c_{R_k^c}^f \ge c_{\tilde{R}_k^c}^f$, and $d_{R_k^c}^f \ge d_{\tilde{R}_k^c}^f$, $\forall (x, y) \in U \times V$.

Thus, it can be seen that $\underline{\sum_{k=1}^{n} R_k}^{O}(E) = \{\langle x, \vee_{k=1}^{n} \wedge_{y \in V}\{h_{R_k^c}(x, y) \vee h_E(y)\}\rangle \mid x \in U\} = \{\langle x, (\vee_{k=1}^{n} \wedge_{y \in V}\{a_{R_k^c}^f \vee a_E^f\}, \vee_{k=1}^{n} \wedge_{y \in V}\{b_{R_k^c}^f \vee b_E^f\}, \vee_{k=1}^{n} \wedge_{y \in V}\{c_{R_k^c}^f \vee c_E^f\}, \vee_{k=1}^{n} \wedge_{y \in V}\{d_{R_k^c}^f \vee d_E^f\})\rangle \mid x \in U\} \ge \{\langle x, (\vee_{k=1}^{n} \wedge_{y \in V}\{a_{\tilde{R}_k^c}^f \vee a_E^f\}, \vee_{k=1}^{n} \wedge_{y \in V}\{b_{\tilde{R}_k^c}^f \vee b_E^f\}, \vee_{k=1}^{n} \wedge_{y \in V}\{c_{\tilde{R}_k^c}^f \vee c_E^f\}, \vee_{k=1}^{n} \wedge_{y \in V}\{d_{\tilde{R}_k^c}^f \vee d_E^f\})\rangle \mid x \in U\} = \underline{\sum_{k=1}^{n} \tilde{R}_k}^{O}(E)$. Therefore, we have $\underline{\sum_{k=1}^{n} \tilde{R}_k}^{O}(E) \subseteq \underline{\sum_{k=1}^{n} R_k}^{O}(E)$.

Similarly, $\overline{\sum_{k=1}^{n} \tilde{R}_k}^{O}(E) \supseteq \overline{\sum_{k=1}^{n} R_k}^{O}(E)$ is obtained. □
Theorem 2 states that the optimistic HTrF MG lower and upper approximations on two-
universe exhibit monotonicity in the matter of the monotonic forms of multiple HTrFRs.
$$\underline{\sum_{k=1}^{n} R_k}^{P}(E) = \left\{ \left\langle x, h_{\underline{\sum_{k=1}^{n} R_k}^{P}(E)}(x) \right\rangle \;\middle|\; x \in U \right\}, \quad (11)$$

where $h_{\underline{\sum_{k=1}^{n} R_k}^{P}(E)}(x) = \wedge_{k=1}^{n} \wedge_{y \in V} \{h_{R_k^c}(x, y) \vee h_E(y)\}$; $h_{\overline{\sum_{k=1}^{n} R_k}^{P}(E)}(x) = \vee_{k=1}^{n} \vee_{y \in V} \{h_{R_k}(x, y) \wedge h_E(y)\}$.

The pair $\left( \underline{\sum_{k=1}^{n} R_k}^{P}(E), \overline{\sum_{k=1}^{n} R_k}^{P}(E) \right)$ indicates a pessimistic HTrF MGRS on two-universe of E in the matter of $(U, V, R_k)$.
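The ∨/∧ structure behind the optimistic and pessimistic approximations (Equations (9) and (11)) can be sketched numerically. For readability, the sketch below replaces hesitant trapezoidal values with single membership degrees in [0, 1]; this is a deliberate simplification, since with HTrF values the same min/max pattern is applied componentwise. All data are invented.

```python
# U has 2 objects, V has 3 objects, and there are n = 2 fuzzy relations R_k over U x V.
# R[k][i][j] is the membership degree of (x_i, y_j) under relation R_k.
R = [
    [[0.9, 0.2, 0.4], [0.3, 0.8, 0.6]],
    [[0.7, 0.1, 0.5], [0.4, 0.9, 0.2]],
]
E = [0.8, 0.6, 0.3]  # fuzzy set on V

def lower_k(Rk, i):
    # single-granulation lower: min over y of ((1 - R_k(x_i, y)) or E(y))
    return min(max(1 - Rk[i][j], E[j]) for j in range(len(E)))

def upper_k(Rk, i):
    # single-granulation upper: max over y of (R_k(x_i, y) and E(y))
    return max(min(Rk[i][j], E[j]) for j in range(len(E)))

for i in range(2):
    opt_low = max(lower_k(Rk, i) for Rk in R)  # optimistic lower: "or" across granules
    pes_low = min(lower_k(Rk, i) for Rk in R)  # pessimistic lower: "and" across granules
    opt_up = min(upper_k(Rk, i) for Rk in R)   # optimistic upper
    pes_up = max(upper_k(Rk, i) for Rk in R)   # pessimistic upper
    assert pes_low <= opt_low and opt_up <= pes_up  # ordering stated by Theorem 5
    print(i, round(opt_low, 3), round(pes_low, 3), round(opt_up, 3), round(pes_up, 3))
```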
(3) $\underline{\sum_{k=1}^{n} R_k}^{P}(E \cap F) = \underline{\sum_{k=1}^{n} R_k}^{P}(E) \cap \underline{\sum_{k=1}^{n} R_k}^{P}(F)$, $\overline{\sum_{k=1}^{n} R_k}^{P}(E \cup F) = \overline{\sum_{k=1}^{n} R_k}^{P}(E) \cup \overline{\sum_{k=1}^{n} R_k}^{P}(F)$;

(4) $\underline{\sum_{k=1}^{n} R_k}^{P}(E \cup F) \supseteq \underline{\sum_{k=1}^{n} R_k}^{P}(E) \cup \underline{\sum_{k=1}^{n} R_k}^{P}(F)$, $\overline{\sum_{k=1}^{n} R_k}^{P}(E \cap F) \subseteq \overline{\sum_{k=1}^{n} R_k}^{P}(E) \cap \overline{\sum_{k=1}^{n} R_k}^{P}(F)$.
Theorem 4 states that the pessimistic HTrF MG lower and upper approximations on two-universe exhibit monotonicity in the matter of the monotonic forms of multiple HTrFRs.
Proof. $\forall x \in U$, $\underline{\sum_{k=1}^{n} R_k}^{O}(E) = \{\langle x, \vee_{k=1}^{n} \wedge_{y \in V}\{h_{R_k^c}(x, y) \vee h_E(y)\}\rangle \mid x \in U\} \ge \{\langle x, \wedge_{k=1}^{n} \wedge_{y \in V}\{h_{R_k^c}(x, y) \vee h_E(y)\}\rangle \mid x \in U\} = \underline{\sum_{k=1}^{n} R_k}^{P}(E)$. $\overline{\sum_{k=1}^{n} R_k}^{P}(E) \supseteq \overline{\sum_{k=1}^{n} R_k}^{O}(E)$ is similarly obtained. □
Theorem 5 states that the optimistic HTrF MG lower approximation includes the pes-
simistic HTrF MG lower approximation, and the pessimistic HTrF MG upper approximation
includes the optimistic HTrF MG upper approximation.
Remark 1. This section presents a novel model, named HTrF MGRSs, on two-universe. The model
combines the advantages of HTrFSs with MGRSs, which serves as a powerful tool to effectively
deal with the AQE issue. The HTrFSs integrate HFSs with TrFNs, which offer a robust and
flexible way of representing uncertain and imprecise AQE MSIoTSD. Compared to other fuzzy
numbers, TrFNs demonstrate higher stability and are less susceptible to minor parameter variations.
Furthermore, the various shapes of their membership functions enable them to capture fuzzy concepts
in a flexible manner, reflecting real-world scenarios more accurately. Meanwhile, HFSs enable experts to express their knowledge fully, as they allow the assignment of multiple membership values to an object, thus effectively representing the uncertainty and fuzziness in human reasoning. In the
AQE process, determining the optimal solution requires the evaluation results provided by different
experts. However, these experts may have distinct viewpoints on AQE. MGRSs on two-universe are
distinguished by their remarkable information fusion capabilities, which enable the integration of
distinct evaluation results from numerous experts via the provision of pessimistic and optimistic
strategies, ultimately leading to a consensus and agreement. In summary, the proposed HTrF
MGRSs model on two-universe has the potential to improve AQE decision ability and provide sound
conclusions for AQE.
$$T_2 = \arg\max_{x_j \in U} \left\{ \underline{\sum_{k=1}^{n} R_k}^{P}(E)(x_j) \oplus \overline{\sum_{k=1}^{n} R_k}^{P}(E)(x_j) \right\} \quad (13)$$

$$T_3 = \arg\max_{x_i \in U} \left\{ \left( \underline{\sum_{k=1}^{n} R_k}^{O}(E)(x_i) \oplus \overline{\sum_{k=1}^{n} R_k}^{O}(E)(x_i) \right) \oplus \left( \underline{\sum_{k=1}^{n} R_k}^{P}(E)(x_i) \oplus \overline{\sum_{k=1}^{n} R_k}^{P}(E)(x_i) \right) \right\} \quad (14)$$

where $T_1$, $T_2$, $T_3$ denote the DMISs that consist of subscripts of the biggest HTrF element in the corresponding HTrFSs $\underline{\sum_{k=1}^{n} R_k}^{O}(E) \oplus \overline{\sum_{k=1}^{n} R_k}^{O}(E)$, $\underline{\sum_{k=1}^{n} R_k}^{P}(E) \oplus \overline{\sum_{k=1}^{n} R_k}^{P}(E)$, and $\left(\underline{\sum_{k=1}^{n} R_k}^{O}(E) \oplus \overline{\sum_{k=1}^{n} R_k}^{O}(E)\right) \oplus \left(\underline{\sum_{k=1}^{n} R_k}^{P}(E) \oplus \overline{\sum_{k=1}^{n} R_k}^{P}(E)\right)$, respectively. In accordance with Definition 4, the computation of the values of the score function for the HTrF elements in the
corresponding HTrFSs mentioned above is feasible. Subsequently, we can easily obtain the
T1 , T2 , and T3 index sets. Next, we will discuss the practical implications of the three DMISs
described above. Optimistic MGRSs are founded on the principle of "seeking common ground while preserving differences", i.e., retaining both the shared and the inconsistent parts of the opinions given by different experts, which can be regarded as a risk-seeking approach to information fusion. Pessimistic MGRSs, in contrast, are founded on the principle of "seeking common ground while excluding differences", i.e., retaining the shared parts of the opinions given by different experts and removing divergent opinions and claims, which can be regarded as a conservative, risk-averse approach to information fusion. Thus, T1 is the optimistic evaluation result, T2 is the pessimistic evaluation result, and T3 is the weighted evaluation result of T1 and T2, with a weighted value of 0.5.
According to the definitions above, the decision rules are given by:
1. In case T1 ∩ T2 ∩ T3 ≠ ∅, then xl (l ∈ T1 ∩ T2 ∩ T3) is the optimal location.
2. In case T1 ∩ T2 ∩ T3 = ∅ but T1 ∩ T2 ≠ ∅, then xl (l ∈ T1 ∩ T2) is the optimal location.
3. In case T1 ∩ T2 ∩ T3 = ∅ and T1 ∩ T2 = ∅, then xl (l ∈ T3) is the optimal location.
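The three rules above can be sketched as a small selection function over plain Python index sets (the rule structure follows the text; the sample sets are invented):

```python
def optimal_locations(T1, T2, T3):
    """Apply the three decision rules to the index sets T1 (optimistic),
    T2 (pessimistic), and T3 (weighted result of T1 and T2)."""
    if T1 & T2 & T3:   # rule 1: all three evaluations agree on some location
        return T1 & T2 & T3
    if T1 & T2:        # rule 2: optimistic and pessimistic evaluations agree
        return T1 & T2
    return T3          # rule 3: fall back on the weighted evaluation result

print(optimal_locations({1, 2}, {2, 3}, {2}))  # {2}  (rule 1)
print(optimal_locations({1, 2}, {2, 3}, {4}))  # {2}  (rule 2)
print(optimal_locations({1}, {3}, {4}))        # {4}  (rule 3)
```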
Algorithm 1 The algorithm based on HTrF MGRSs over two universes for AQE.
Require: An HTrF decision information system (U, V, R_k, E).
Ensure: The optimal location.
1: for i = 1 to p, j = 1 to n, t = 1 to q do
2:   Compute $\underline{\sum_{k=1}^{n} R_k}^{O}(E)$, $\overline{\sum_{k=1}^{n} R_k}^{O}(E)$, $\underline{\sum_{k=1}^{n} R_k}^{P}(E)$, and $\overline{\sum_{k=1}^{n} R_k}^{P}(E)$, respectively.
3: end for
4: for t = 1 to p do
5:   Compute $\underline{\sum_{k=1}^{n} R_k}^{O}(E) \oplus \overline{\sum_{k=1}^{n} R_k}^{O}(E)$ and $\underline{\sum_{k=1}^{n} R_k}^{P}(E) \oplus \overline{\sum_{k=1}^{n} R_k}^{P}(E)$, respectively.
6: end for
7: for t = 1 to p do
8:   Compute $\left(\underline{\sum_{k=1}^{n} R_k}^{O}(E) \oplus \overline{\sum_{k=1}^{n} R_k}^{O}(E)\right) \oplus \left(\underline{\sum_{k=1}^{n} R_k}^{P}(E) \oplus \overline{\sum_{k=1}^{n} R_k}^{P}(E)\right)$.
9: end for
10: for t = 1 to p do
11:   Calculate T1, T2, and T3.
12: end for
13: Calculate T1 ∩ T2 ∩ T3 and T1 ∩ T2, and determine the optimal location.
Remark 2. In the above steps, we set the number of locations as p, the number of attributes as q, and
the number of experts as n. The first step has a complexity of O( pnq). For the subsequent steps, i.e.,
steps 2 to 4, the complexity is represented as O( p). Then, the complexity of the last step is denoted
as O(1). Consequently, the overall complexity of the proposed algorithm is represented as O( pnq).
In this section, we introduce a novel MAGDM method based on HTrF MGRSs on two-
universe. We begin by introducing the HTrF decision information system. Subsequently,
we describe the specific steps of the proposed MAGDM method in detail. Then, we
apply the proposed method to AQE and propose a specific algorithm for this domain.
Additionally, we conduct a complexity analysis of the proposed algorithm to assess its
computational efficiency.
5. Case Analysis
The present section showcases the viability of the proposed MAGDM approach within
the realm of AQE by means of a practical case study. Additionally, a comprehensive series
of comparative and experimental analyses are executed to validate the efficacy of the
presented approach.
October, and December, respectively. Next, we can obtain the corresponding HTrF element represented as $d_{ij} = \left\{(\mu_{ij}^{3}, \mu_{ij}^{6}, \mu_{ij}^{9}, \mu_{ij}^{12}), (\mu_{ij}^{2}, \mu_{ij}^{6}, \mu_{ij}^{10}, \mu_{ij}^{12})\right\}$. By following this process, we
scenarios, and its ranking results have the advantage of being comprehensive. Thus, we
use the ranking results of the above set for comparative analysis.
Figure 1. Ranking comparisons between the presented method and the HTrFG, HTrFEA, and HTrFEG operators, the HTrF VIKOR method, and the HTrF TOPSIS method across the alternatives.

Figure 2. Ranking comparison between the presented method and the HTrF MABAC approach across the alternatives.
From Figure 2, it is incontrovertible that the presented methodology and the HTrF
MABAC approach demonstrate a congruous overall trend, and, more notably, select the
same optimal scheme. This observation serves as further evidence of the efficacy and
soundness of the proposed methodology.
According to the above experimental analysis, the correlation between the presented MAGDM approach and other comparable approaches is relatively strong, which validates the validity and stability of the proposed MAGDM approach.
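The Spearman correlation coefficient used above to compare rankings reduces, for tie-free rankings, to the classical formula ρ = 1 − 6Σd² / (n(n² − 1)). The sketch below computes it in plain Python on invented rankings:

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same alternatives."""
    n = len(rank_a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Two hypothetical rankings of five alternatives (1 = best)
method_a = [1, 2, 3, 4, 5]
method_b = [2, 1, 3, 5, 4]
print(spearman_rho(method_a, method_b))  # 0.8, i.e., strongly correlated rankings
```

A value close to 1 indicates that two methods order the alternatives almost identically, which is the criterion applied in the comparative analysis.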
5.4. Discussion
Deep learning algorithms, such as Convolutional Neural Networks, have been suc-
cessful in various applications. However, for MAGDM problems, they may not always
be the optimal solution. It should be noted that Convolutional Neural Networks and
their variations require the data to be divided into training and testing sets. While the
conventional division practice is usually to allocate 80% of the data to training and 20%
to testing, discrepancies in the ratio of training to testing data allocation, as well as the
stochasticity of the division process, may lead to dissimilar outcomes.
Regarding the data used in this study, the dataset included weather information for
367 cities in China from December 2013 onwards. However, for the purpose of demon-
stration, we selected some data from 31 provincial capital cities from 2018 to 2020 as our
sample. We took great care to ensure that the selected sample represents the characteristics
and distribution of the entire dataset. Nevertheless, future studies could use larger datasets
or incorporate additional attributes to improve the accuracy and generalizability of the
proposed method.
The experimental results outlined above demonstrate that the decision-making method
based on HTrF MGRSs on two-universe represents a comprehensive utilization of the
strengths of HTrFSs and MGRSs. First, HTrFSs offer significant advantages over other fuzzy
sets by allowing for a more precise representation of fuzzy or imprecise information through
TrFNs. Moreover, HTrFSs combine the advantages of HFSs to enable decision-makers to
express their hesitations or uncertainties during the decision-making process, thus enabling
them to consider all possible scenarios and make more informed decisions. Then, the
MGRSs on two-universe approach serves as an excellent information fusion strategy that
integrates the perspectives of different experts to arrive at a final conclusion. Furthermore,
we leverage DMISs to mitigate the impact of disagreements among the experts within the
expert group on the evaluation outcomes. By incorporating DMISs, the presented MAGDM
method offers a multifaceted evaluation scheme to experts, allowing them to attain more
sensible and precise evaluation results. In summary, the MAGDM method presented in
this article substantially reduces the uncertainty involved in decision-making and enhances
its accuracy and reliability. By combining the advantages of HTrFSs, MGRSs, and DMISs,
the proposed approach provides a viable option for assessment and decision-making in
situations of uncertainty and fuzziness. The proposed approach demonstrates the potential
for solving decision problems in various domains.
Regarding the AQE in different cities, the experimental results indicate that the air
quality in Haikou is relatively good, whereas the air quality in Xining and Taiyuan is
relatively poor. First, the successful experience of Haikou city demonstrates that economic
development and environmental protection are not mutually exclusive. Therefore, the
government should actively strengthen ecological construction and protection to improve
air quality. Second, with sustained government control, recent data reveals that the overall
air quality in Xining has improved, suggesting the critical role of governance in improving
air quality. For cities such as Taiyuan, where coal is the main source of energy and coal
burning and industrial pollution are the main sources of pollution, the government should
actively promote the transformation of the energy structure, reduce dependence on coal,
promote clean energy, control the emissions from industrial pollution sources, and promote
other measures to reduce the emission of atmospheric pollutants. In summary, the govern-
ment should formulate and implement relevant policies and measures to improve urban
air quality and enhance residents’ quality of life.
This section presents a comprehensive case study that demonstrates the validity and
feasibility of the proposed MAGDM method within the domain of AQE. The evaluation em-
ploys comparative and experimental analysis to showcase the effectiveness of the proposed
approach. We begin by providing a detailed description of the experimental procedure.
Subsequently, we conduct a comparative analysis, where we compare and contrast the
proposed MAGDM method with several classical HTrF MAGDM methods and the HTrF
MABAC method. Moreover, we compute the Spearman correlation coefficient and plot a
graph that compares the proposed method with other similar methods. The advantages
of the proposed method are also presented in tabular format. Finally, a comprehensive
discussion and analysis is presented, which includes a discussion of the limitations of deep
learning methods, a detailed analysis of the datasets used in this paper, the potential of the
proposed method, and the implications of this paper’s research for government work.
6. Conclusions
AQE plays a crucial role in creating and maintaining a clean atmospheric environment.
In this article, we introduce a novel MAGDM method to AQE. First, we propose an HTrF
MGRS on two-universe model by combining the advantages of HTrFSs in information
representation and MGRSs in information fusion. Then, we investigate the fundamental
definitions and properties of optimistic and pessimistic HTrF MGRSs on two-universe.
Afterward, we present a general approach to the AQE decision problem. Finally, we
conduct several numerical analyses, using AQE-related datasets, to showcase the feasibility,
effectiveness, and stability of the presented MAGDM approach.
While the proposed architecture presents a promising solution for AQE, there are
still several challenging issues in theoretical and practical research. We recommend the
exploration of the following research directions in the future:
1. Realistic decision-making scenarios are diverse; hence, it is essential to extend the
application of the presented MAGDM approach to other real-world contexts, such as
water quality testing, forest fire prediction, disease diagnosis, etc.
2. Further exploration of property reduction methods and uncertainty measures for
HTrF MGRSs on two-universe has important implications for the application of the
presented MAGDM method to other uncertain and complicated decision scenarios.
3. Large-scale MAGDM can leverage the complementary knowledge structures of large
groups of people to enhance the precision and objectivity of decision-making. As such,
it is imperative to explore large-scale MAGDM to tackle intricate practical situations.
Author Contributions: Conceptualization, C.Z.; software, W.L.; formal analysis, W.L., Y.C. and J.S.;
investigation, J.S.; writing—original draft preparation, W.L.; writing—review and editing, C.Z.;
visualization, Y.C. All authors have read and agreed to the published version of the manuscript.
Funding: This research was partially funded by the 20th Undergraduate Innovation and Entrepreneurship Training Program of Shanxi University (No. X2022020043) and the Special Fund for Science and Technology Innovation Teams of Shanxi (No. 202204051001015).
Data Availability Statement: The dataset utilized in this research is available from https://www.
aqistudy.cn/historydata/ (accessed on 16 April 2023).
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
References
1. de Santos, U.P.; Arbex, M.A.; Braga, A.L.F.; Mizutani, R.F.; Cançado, J.E.D.; Terra-Filho, M.; Chatkin, J.M. Environmental air
pollution: Respiratory effects. J. Bras. Pneumol. 2021, 47, e20200267. [CrossRef] [PubMed]
2. González-Martín, J.; Kraakman, N.J.R.; Pérez, C.; Lebrero, R.; Muñoz, R. A state–of–the-art review on indoor air pollution and
strategies for indoor air pollution control. Chemosphere 2021, 262, 128376. [CrossRef]
3. Wei, W.; Wang, Z. Impact of industrial air pollution on agricultural production. Atmosphere 2021, 12, 639. [CrossRef]
4. Michetti, M.; Gualtieri, M.; Anav, A.; Adani, A.; Benassi, B.; Dalmastri, C.; D’Elia, I.; Piersanti, A.; Sannino, G.; Zanini, G.; et al.
Climate change and air pollution: Translating their interplay into present and future mortality risk for Rome and Milan
municipalities. Sci. Total Environ. 2022, 830, 154680. [CrossRef] [PubMed]
5. Caleyachetty, R.; Lufumpa, N.; Kumar, N.; Mohammed, N.I.; Bekele, H.; Kurmi, O.; Wells, J.; Manaseki-Holland, S. Exposure to
household air pollution from solid cookfuels and childhood stunting: A population-based, cross-sectional study of half a million
children in low- and middle-income countries. Int. Health 2022, 14, 639–647. [CrossRef] [PubMed]
6. Latham, R.M.; Kieling, C.; Arseneault, L.; Rocha, T.B.M.; Beddows, A.; Beevers, S.D.; Danese, A.; de Oliveira, K.; Kohrt, B.A.;
Moffitt, T.E.; et al. Childhood exposure to ambient air pollution and predicting individual risk of depression onset in UK
adolescents. J. Psychiatr. Res. 2021, 138, 60–67. [CrossRef]
7. Ahmed, M.; Shuai, C.; Abbas, K.; Rehman, F.U.; Khoso, W.M. Investigating health impacts of household air pollution on woman's
pregnancy and sterilization: Empirical evidence from Pakistan, India, and Bangladesh. Energy 2022, 247, 123562. [CrossRef]
8. Goshua, A.; Akdis, C.A.; Nadeau, K.C. World Health Organization global air quality guideline recommendations: Executive
summary. Allergy 2022, 77, 1955–1960. [CrossRef]
9. Huang, W.; Li, T.; Liu, J.; Xie, P.; Du, S.; Teng, F. An overview of air quality analysis by big data techniques: Monitoring, forecasting,
and traceability. Inf. Fusion 2021, 75, 28–40. [CrossRef]
10. Zhu, J.; Chen, L.; Liao, H. Multi-pollutant air pollution and associated health risks in China from 2014 to 2020. Atmos. Environ.
2022, 268, 118829. [CrossRef]
11. Oprea, M. A case study of knowledge modelling in an air pollution control decision support system. AI Commun. 2005,
18, 293–303.
12. Wang, W.; Mao, W.; Tong, X.; Xu, G. A novel recursive model based on a convolutional long short-term memory neural network
for air pollution prediction. Remote Sens. 2021, 13, 1284. [CrossRef]
13. Gu, Y.; Zhao, Y.; Zhou, J.; Li, H.; Wang, Y. A fuzzy multiple linear regression model based on meteorological factors for air quality
index forecast. J. Intell. Fuzzy Syst. 2021, 40, 10523–10547. [CrossRef]
14. Ma, J.; Ma, X.; Yang, C.; Xie, L.; Zhang, W.; Li, X. An air pollutant forecast correction model based on ensemble learning algorithm.
Electronics 2023, 12, 1463. [CrossRef]
15. Gu, Y.; Li, B.; Meng, Q. Hybrid interpretable predictive machine learning model for air pollution prediction. Neurocomputing 2022,
468, 123–136. [CrossRef]
16. Tao, Y.; Wu, Y.; Zhou, J.; Wu, M.; Wang, S.; Zhang, L.; Xu, C. How to realize the effect of air pollution control? A hybrid decision
framework under the fuzzy environment. J. Clean. Prod. 2021, 305, 127093. [CrossRef]
17. Martín-Baos, J.Á.; Rodriguez-Benitez, L.; García-Ródenas, R.; Liu, J. IoT based monitoring of air quality and traffic using regression
analysis. Appl. Soft Comput. 2022, 115, 108282. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
A Variable Structure Multiple-Model Estimation Algorithm
Aided by Center Scaling
Qiang Wang, Guowei Li, Weitong Jin, Shurui Zhang * and Weixing Sheng
School of Electronic and Optical Engineering, Nanjing University of Science and Technology,
Nanjing 210094, China
* Correspondence: [email protected]
Abstract: The accuracy of target tracking using the conventional interacting multiple-model (IMM) algorithm is limited. In this paper, a new variable-structure interacting multiple-model (VSIMM) algorithm aided by center scaling (VSIMM-CS) is proposed to solve this problem. The novel VSIMM-CS has two main steps. First, we estimate the approximate location of the true model. This step is aided by the expected-mode augmentation (EMA) algorithm, and a new method, the expected model optimization method, is proposed to further enhance the accuracy of EMA. Second, we change the original model set so that the current true model becomes the symmetry center of the current model set, and the model set is scaled down by a certain percentage. Considering the symmetry and linearity of the system, the errors produced by symmetrical models can be well offset. Furthermore, narrowing the distance between the true model and the default model is another effective way to reduce the error. The second step is based on two theories: the symmetric model set optimization method and the proportional reduction optimization method. All proposed theories aim to minimize errors as much as possible, and simulation results highlight the correctness and effectiveness of the proposed methods.
Citation: Wang, Q.; Li, G.; Jin, W.; Zhang, S.; Sheng, W. A Variable Structure Multiple-Model Estimation Algorithm Aided by Center Scaling. Electronics 2023, 12, 2257. https://doi.org/10.3390/electronics12102257

Academic Editor: Dimitris Apostolou

Received: 27 March 2023; Revised: 11 May 2023; Accepted: 12 May 2023; Published: 16 May 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Multiple-model (MM) estimation is an advanced method for solving many problems, especially the target tracking problem [1,2]. Compared with traditional algorithms combined with radar systems [3,4], the power of MM comes from the teamwork of multiple parallel estimators [5], not a single estimator. The MM approach has been around for more than fifty years; it was first proposed in [6,7]. It has a mature framework [8,9], and its parallel structure of Bayesian filters accounts for its strong performance. Usually, a model set designed in advance or generated in real time is used to cover the possible true models. The system dynamics can then be described as a hybrid system [10,11] with discrete modes and continuous states. The model set used during the target tracking process has a large influence on the estimation results [12], and a better model set often leads to more precise tracking. The overall estimate is the combination of the estimates from all parallel-running Bayesian filters [13,14]. In recent decades, MM methods have developed rapidly [15] and have been widely used because of their methodological completeness and ease of implementation. MM estimation has gone through three stages [16]: static MM (SMM), interacting MM (IMM) and variable-structure interacting MM (VSIMM).

Compared with SMM, the biggest advantage of IMM is that it considers the jumps between models, a drawback of SMM [6,7] that was fixed by Blom and Bar-Shalom [17]. Many advanced IMM methods have shown that tracking accuracy can be improved without increasing the computational burden [18–20]. A reweighted interacting multiple model algorithm [21], which is a recursive implementation of a maximum
Compared with many existing methods, such as FIMM, VSIMM-CS has an excellent cost-performance ratio without extra computation in the design of the initial model set. Compared with many VSIMM algorithms, such as the LMS algorithm proposed in [33], VSIMM-CS has better implementability at the same computational cost. In general, VSIMM-CS shows high precision and universality, and it is also easy to implement.

The remaining parts of the paper are organized as follows: Section 2 introduces the processes of IMM and VSIMM. Section 3 provides three optimization methods, the symmetric model set optimization method, the proportional reduction optimization method, and the expected model optimization method, and proves their feasibility. Section 4 presents the process of VSIMM-CS. Finally, Section 5 provides the conclusion.
2. Multiple-Model Algorithm

In this section, the processes of VSIMM and FIMM are briefly introduced. The system model is

$$x_{k+1} = F x_k + G a_k + w_k, \qquad z_k = H x_k + v_k \tag{1}$$

where $x = (x, \dot{x}, y, \dot{y})^{\top}$ is the target state; $a = (a_x, a_y)^{\top}$ is the acceleration; the process noise is $w_k \sim N[0, Q]$; the measurement value is $z$, and its random measurement error is $v \sim N[0, R]$; $F$ is the state transition matrix, $G$ is the acceleration input matrix, and $H$ is the observation matrix.
Assume that the best target estimate, the state estimation covariance matrix and the model probability of $m^{(i)}$ at time $k$ are $\hat{x}^{(i)}_{k|k}$, $p^{(i)}_{k|k}$ and $u^{(i)}_k$, respectively. Then, the overall state estimate and state estimation covariance are

$$\hat{x}_{k|k} = \sum_i \hat{x}^{(i)}_{k|k} u^{(i)}_k \tag{2}$$

$$p_{k|k} = \sum_i u^{(i)}_k \left[ p^{(i)}_{k|k} + \left( \hat{x}^{(i)}_{k|k} - \hat{x}_{k|k} \right) \left( \hat{x}^{(i)}_{k|k} - \hat{x}_{k|k} \right)^{\top} \right] \tag{3}$$
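As a concrete illustration, the fusion step (2)–(3) can be sketched in a few lines of NumPy; the function name and array layout here are illustrative assumptions, not part of the original description:

```python
import numpy as np

def fuse_estimates(x_hats, P_hats, mu):
    """Combine per-model estimates into the overall estimate, as in Eqs. (2)-(3).

    x_hats : (N, n) per-model state estimates x_hat^(i)
    P_hats : (N, n, n) per-model covariances p^(i)
    mu     : (N,) model probabilities u^(i), summing to 1
    """
    x_hats = np.asarray(x_hats, dtype=float)
    P_hats = np.asarray(P_hats, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Eq. (2): probability-weighted mean of the model estimates.
    x_fused = np.einsum("i,in->n", mu, x_hats)
    # Eq. (3): weighted per-model covariances plus the spread-of-the-means term.
    d = x_hats - x_fused                                    # (N, n) deviations
    P_fused = np.einsum("i,inm->nm", mu, P_hats + np.einsum("in,im->inm", d, d))
    return x_fused, P_fused
```

For two equally likely models, the fused covariance picks up a spread term along the axis where the model estimates disagree, which is exactly the bracketed term in (3).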
$$p_{k|k} = \sum_{m^{(i)} \in M_k} \left[ p^{(i)|M_k}_{k|k} + \left( \hat{x}^{(i)|M_k}_{k|k} - \hat{x}_{k|k} \right) \left( \hat{x}^{(i)|M_k}_{k|k} - \hat{x}_{k|k} \right)^{\top} \right] u^{(i)|M_k}_{k|k} \tag{5}$$
$$m^{(1)}_i + s_k = m^{(2)}_i, \quad i = 1, \ldots, N \tag{6}$$

[Figure: models $m_1$, $m_2$ and the true mode $s_k$ in a linear system]
For the IMM algorithm, the connection between the model probability $u^{(i)}$ and the distance $|m^{(i)} - s_k|$ is

$$\left| m^{(i)} - s_k \right| \propto \frac{1}{u^{(i)}} \tag{8}$$
Then, the following relation can be determined as
Clearly, according to (8), the overall estimation error has been reduced to a certain extent. However, since the system is linear and the model set is asymmetrical with respect to $s_k$, the error of each $m^{(1)}_i$ is not well eliminated.
Since $M^{(2)}$ holds symmetric properties, the relationships can be obtained as

$$
\begin{cases}
m^{(2)}_{p_1(1)} - s_k = \ldots = m^{(2)}_{p_1(N_1/2)} - s_k = s_k - m^{(2)}_{p_1(N_1/2+1)} = \ldots = s_k - m^{(2)}_{p_1(N_1)} = \delta^{(2)}_{p_1} \\
\qquad \vdots \\
m^{(2)}_{p_i(1)} - s_k = \ldots = m^{(2)}_{p_i(N_i/2)} - s_k = s_k - m^{(2)}_{p_i(N_i/2+1)} = \ldots = s_k - m^{(2)}_{p_i(N_i)} = \delta^{(2)}_{p_i} \\
\qquad \vdots \\
m^{(2)}_{p_n(1)} - s_k = \ldots = m^{(2)}_{p_n(N_n/2)} - s_k = s_k - m^{(2)}_{p_n(N_n/2+1)} = \ldots = s_k - m^{(2)}_{p_n(N_n)} = \delta^{(2)}_{p_n}
\end{cases} \tag{11}
$$

where $N = \sum_i N_i$ and each $N_i$ is an even number. In this linear system, $m^{(2)}_{p_j(i)}$ and $m^{(2)}_{p_j(i+N_j/2)}$ produce two opposite errors $\varepsilon^{(2)}_{p_j}$ and $-\varepsilon^{(2)}_{p_j}$, with

$$\varepsilon^{(2)}_{p_j} = \hat{x}^{(p_j)|M^{(2)}}_{k|k} - s_k, \qquad -\varepsilon^{(2)}_{p_j} = \hat{x}^{(p_{j+N_j/2})|M^{(2)}}_{k|k} - s_k \tag{12}$$

Theoretically, if $M^{(2)}$ is strictly symmetric with respect to $s_k$, the overall errors equal 0, without considering the system noise. Assume

$$\frac{\left| m^{(1)}_i - s_k \right|}{\left| m^{(2)}_i - s_k \right|} = \alpha > 1 \tag{13}$$
Then, the model sets obeying the equation above can be called relative position invariant model sets. Their most important feature is that the position of the model set relative to $s_k$ does not change, as shown in Figure 3.

It is obvious that $M^{(2)}$ is more likely to have better performance than $M^{(1)}$. However, the precondition is that the topology formed by $M^{(1)}$ includes all possible real models, as shown in Figure 4. If the topology is tangent to the true mode space $S$, this situation is called the critical point, namely, $\alpha = \alpha_0$.
[Figure 3: relative position invariant model sets with respect to $s_k$]
Figure 4. Topology of a model set including all model space S.
Thus, if some noise can be tolerated, the following relationships are determined as

$$u^{(1)}_i = u^{(2)}_i, \quad i = 1, \ldots, N \tag{14}$$
For IMM, the distance between $m_i$ and $s_k$ directly affects the final error:

$$| m_i - s_k | \propto \left| \hat{x}^{(i)}_{k|k} - s_k \right| \tag{15}$$

$$\left| \hat{x}^{(i)|M^{(2)}}_{k|k} - s_k \right| < \left| \hat{x}^{(i)|M^{(1)}}_{k|k} - s_k \right| \tag{16}$$
Generally, if the model set is scaled down, its performance improves. However, the scale should not be too small, let alone exceed the critical point; otherwise, the covariance matrix may become non-invertible.
and it is closer to $s_k$ than any model, namely, $|m_e - s_k| < \min\{|m_1 - s_k|, |m_2 - s_k|, \ldots, |m_N - s_k|\}$. However, the value of $|m_e - s_k|$ is not small enough, as shown in Figure 5. The expected model $m_e$ may appear anywhere in the expected model space. If $m_e + \gamma = s_k$, where $\gamma$ is the error, it is necessary to find a method to reduce $\gamma$ to a certain extent.
Figure 5. Example of the mentioned hypothesis: the model set, the expected model $m_e$, the true model $s_k$, and the expected model space.
If $m_e$ is scaled a little, let $m_e$ become $\lambda m_e$, where $\lambda$ is the scaling factor. Thus, (18) is rewritten as

$$s_k - \lambda m_e = \gamma - (\lambda - 1) m_e \tag{19}$$

and the conditions are

$$\lambda > 1, \; \gamma < 0 \quad \text{or} \quad 0 < \lambda < 1, \; \gamma > 0 \tag{20}$$

Obviously, the error $\gamma$ is reduced to $\gamma - (\lambda - 1) m_e$. If $\lambda$ is chosen reasonably, the error becomes small enough and $\lambda m_e$ becomes close enough to $s_k$, namely, $\frac{\lambda m_e - s_k}{\lambda m_e + s_k} \approx 0$. It is worth noting that increasing or decreasing $\lambda$ blindly leads to serious mistakes, including even worse performance.
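A tiny one-dimensional sketch of (19), with $s_k$ and $m_e$ as assumed illustrative values, shows how a well-chosen $\lambda$ cancels the error while a blind choice makes it worse:

```python
# Hypothetical 1-D illustration of Eq. (19): scaling the expected model m_e by
# lambda changes the residual error from gamma = s_k - m_e to gamma - (lam - 1) m_e.
s_k = 10.0          # true mode (assumed value, for illustration only)
m_e = 9.0           # expected model produced by EMA (assumed)
gamma = s_k - m_e   # initial error, from m_e + gamma = s_k

def residual(lam):
    # Eq. (19): s_k - lam * m_e = gamma - (lam - 1) * m_e
    return gamma - (lam - 1.0) * m_e

print(residual(1.0))        # 1.0  -> no scaling, error unchanged
print(residual(10.0 / 9.0)) # 0.0  -> a well-chosen lambda cancels the error
print(residual(2.0))        # -8.0 -> a blindly chosen lambda makes things worse
```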
Obviously, if $M^{(k)} = M^{(k-1)}$, the process reduces to FIMM. Thus, FIMM is a special case of VSIMM. One FIMM cycle over a fixed model set $M$ is denoted

$$\mathrm{FIMM}[M]: \quad \left\{\hat{x}^{i|M}_{k|k},\, p^{i|M}_{k|k},\, u^{i|M}_{k|k}\right\} = \mathrm{FIMM}\!\left(\hat{x}^{i|M}_{k-1|k-1},\, p^{i|M}_{k-1|k-1},\, u^{i|M}_{k-1|k-1}\right) \tag{22}$$
The novel VSIMM-CS always has an original model set $M$, which takes part in the whole target tracking process, and $M^{(k)}$ is always generated from $M$. $M^{(k)}$ and $M^{(k-1)}$ may be completely different. The steps of VSIMM-CS are shown in Algorithm 1.

S3: Compute the expected model:
$$m_e = \sum_i u^{i}_{k+1|k+1} m_i$$

S4: Select suitable values of $\alpha$ and $\lambda$ to generate the current model set $M^{(k+1)}$:
$$M^{(k+1)} = \alpha \left\{ m_i + \lambda m_e,\; i = 1, 2, \ldots, N \right\}$$

S5: Run the $\mathrm{VSIMM}[M^{(k)}, M^{(k+1)}]$ cycle to obtain the final results:
$$\hat{x}_{k+1|k+1} = \sum_{m^{(i)} \in M^{(k+1)}} \hat{x}^{(i)|M^{(k+1)}}_{k+1|k+1}\, u^{(i)|M^{(k+1)}}_{k+1|k+1}$$
$$p_{k+1|k+1} = \sum_{m^{(i)} \in M^{(k+1)}} \left[ p^{(i)|M^{(k+1)}}_{k+1|k+1} + \left( \hat{x}^{(i)|M^{(k+1)}}_{k+1|k+1} - \hat{x}_{k+1|k+1} \right) \left( \hat{x}^{(i)|M^{(k+1)}}_{k+1|k+1} - \hat{x}_{k+1|k+1} \right)^{\top} \right] u^{(i)|M^{(k+1)}}_{k+1|k+1}$$

S6: Go to S1.
The algorithm complexity is $T(n) = 2n^2 + 16n$, where $n$ is the number of models. The most important part of the process is model set generation. The values of $\lambda$ and $\alpha$ have a huge impact on the performance of the model set; therefore, it is unreasonable to choose $\lambda$ and $\alpha$ blindly. The biggest advantage of VSIMM-CS is that the model set is generated in real time without a huge increase in computational complexity. In addition, if the original model set is designed rationally and the number of models is small, the proposed VSIMM-CS can achieve rewarding results while keeping the computational load within a reasonable range. A rational combination of $\lambda$ and $\alpha$ achieves great performance in terms of precision.
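The model-set generation of steps S3–S4 can be sketched as follows, treating each model as an acceleration vector as in the simulation of Section 5; the function name and array shapes are illustrative assumptions:

```python
import numpy as np

def generate_model_set(M, mu, alpha, lam):
    """Sketch of VSIMM-CS steps S3-S4.

    M     : (N, d) original model set (here, acceleration vectors)
    mu    : (N,) current model probabilities u^i_{k+1|k+1}
    alpha : shrink factor
    lam   : scaling factor for the expected model
    """
    M = np.asarray(M, dtype=float)
    mu = np.asarray(mu, dtype=float)
    m_e = mu @ M                        # S3: expected model, m_e = sum_i u^i m_i
    return alpha * (M + lam * m_e)      # S4: M^(k+1) = alpha * {m_i + lam * m_e}
```

With uniform model probabilities over a symmetric original set, the expected model is the origin and the generated set is simply the original set shrunk by $\alpha$.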
5. Simulation Results
In this section, a reasonable simulation process is given: firstly, the target state and measurement equations and their specific parameters are presented. Secondly, the original model set is given, including the probability transition matrix. Thirdly, the target motion state and performance criterion are presented. Finally, different simulation results are analyzed.
The target state and the measurement equations are, respectively, given as

$$x^{(j)}_{k+1} = F x^{(j)}_k + G a^{(j)}_k + w_k \tag{23}$$

$$z_k = H x^{(j)}_k + v_k \tag{24}$$

where $w_k \sim N[0, 0.01]$; $v_k \sim N[0, 1250]$; $a^{(j)}_k$ is the target acceleration input;

$$F = \begin{bmatrix} 1 & T & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & T \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad G^{\top} = \begin{bmatrix} 0.5 & 1 & 0 & 0 \\ 0 & 0 & 0.5 & 1 \end{bmatrix}$$

and $T$ is the time interval.
The original model set $M$ includes four models:

$$m_1 = [-10, 10], \quad m_2 = [10, 10], \quad m_3 = [10, -10], \quad m_4 = [-10, -10] \tag{25}$$

The performance criterion is the time-averaged position root mean square error $\bar{e}$, where $\hat{x}_k$ is the best estimate at time $k$, $x_k$ is the target state, and the number of Monte Carlo runs is $N = 300$. Therefore, different $\lambda$ and $\alpha$ produce different $\bar{e}$, as shown in Table 2. The unit of $\bar{e}$ is meters, and only the position error is considered.
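Assuming $\bar{e}$ is the time-averaged position RMSE over the Monte Carlo runs (a common choice; the exact formula is not reproduced above), it could be computed as follows, with the state layout $[x, \dot{x}, y, \dot{y}]$ as in (1):

```python
import numpy as np

def position_rmse(x_est, x_true):
    """Time-averaged position RMSE over Monte Carlo runs (assumed criterion).

    x_est, x_true : (N_runs, T, 4) arrays of states [x, vx, y, vy].
    Only the position components (indices 0 and 2) are compared.
    """
    err = x_est[..., [0, 2]] - x_true[..., [0, 2]]               # (N, T, 2) position errors
    rmse_t = np.sqrt(np.mean(np.sum(err**2, axis=-1), axis=0))   # RMSE per time step
    return rmse_t.mean()                                         # average over time
```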
Table 2. $\bar{e}$ (m) for different combinations of $\lambda$ and $\alpha$.

λ \ α    0.5     0.6     0.7     0.8     0.9
0.8     16.72   13.00   10.45    8.59    7.19
1.9     13.29    9.89    7.55    5.84    4.56
3       10.16    6.96    4.76    3.15    1.98
4.1      7.21    4.16    2.09    0.98    1.52
5.2      4.32    1.53    1.86    3.14    4.09
[Figures: RMSE of position (m) and RMSE of velocity (m/s) versus time (s) for VSIMM-CS under different (α, λ) combinations, compared with IMM-4]
When α = 0.7, 0.8, or 0.9, there is a high probability that the proposed algorithm achieves better performance.

In general, VSIMM-CS certainly achieves a big performance boost compared with FIMM if λ and α are selected rationally. Different systems tend to have different suitable combinations of λ and α; a smaller α or a larger λ does not by itself improve the performance. For the current system, reasonable λ and α always yield satisfactory results.
6. Conclusions
In this paper, a new variable-structure interacting multiple-model algorithm, VSIMM-CS, is proposed. Its model set is generated in real time from the original model set. Considering the error properties of a linear system and the symmetry of the model set structure, two theories, the proportional reduction optimization method and the symmetric model set optimization method, are presented. The main purpose of these two theories is to reduce errors; without considering the effect of noise, VSIMM-CS eliminates errors perfectly. To better locate the real model, the expected model optimization method is proposed; the expected model generated by this method is closer to the true model than any other model. Simulation results show that different combinations of α and λ yield different performance. In most cases, VSIMM-CS achieved better tracking results, and it is acceptable to sacrifice a certain amount of computation for high accuracy. A huge performance boost can be obtained by the precise selection of α and λ; in the optimal case here, α and λ are 0.8 and 4.1, respectively. Under different simulation conditions, the results may differ. Many factors may influence the suitable values of α and λ, such as noise, the original model set, and the true mode space, and our following research will focus on these factors. However, an unreasonable selection of α and λ leads to worse results. The simulation results also highlight the rationality and feasibility of this novel approach.
Author Contributions: Writing—original draft preparation, Q.W.; investigation, Q.W. and G.L.; writ-
ing—review and editing, G.L., W.J. and S.Z.; project administration, S.Z. and G.L.; supervision, W.S.;
funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported in part by the National Natural Science Foundation of China
under Grants 62001227, 61971224 and 62001232.
Data Availability Statement: Not applicable.
Acknowledgments: The authors thank the reviewers for their great help on the article during its
review progress.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Gorji, A.A.; Tharmarasa, R.; Kirubarajan, T. Performance measures for multiple target tracking problems. In Proceedings of the
14th International Conference on Information Fusion, Chicago, IL, USA, 5–8 July 2011; pp. 1–8.
2. Poore, A.B.; Gadaleta, S. Some assignment problems arising from multiple target tracking. Math. Comput. Model. 2006, 43,
1074–1091. [CrossRef]
3. Huang, X.; Tsoi, J.K.; Patel, N. mmWave Radar Sensors Fusion for Indoor Object Detection and Tracking. Electronics 2022, 11, 2209.
[CrossRef]
4. Wei, Y.; Hong, T.; Kadoch, M. Improved Kalman filter variants for UAV tracking with radar motion models. Electronics 2020,
9, 768. [CrossRef]
5. Li, X.R.; Bar-Shalom, Y. Multiple-model estimation with variable structure. IEEE Trans. Autom. Control 1996, 41, 478–493.
6. Magill, D. Optimal adaptive estimation of sampled stochastic processes. IEEE Trans. Autom. Control 1965, 10, 434–439. [CrossRef]
7. Lainiotis, D. Optimal adaptive estimation: Structure and parameter adaption. IEEE Trans. Autom. Control 1971, 16, 160–170.
[CrossRef]
8. Tudoroiu, N.; Khorasani, K. Satellite fault diagnosis using a bank of interacting Kalman filters. IEEE Trans. Aerosp. Electron. Syst.
2007, 43, 1334–1350. [CrossRef]
9. Kirubarajan, T.; Bar-Shalom, Y.; Pattipati, K.R.; Kadar, I. Ground target tracking with variable structure IMM estimator. IEEE
Trans. Aerosp. Electron. Syst. 2000, 36, 26–46. [CrossRef]
10. Grossman, R.L.; Nerode, A.; Ravn, A.P.; Rischel, H. Hybrid Systems; Springer: Berlin/Heidelberg, Germany, 1993; Volume 736.
11. Branicky, M.S. Introduction to hybrid systems. In Handbook of Networked and Embedded Control Systems; Birkhäuser: Basel,
Switzerland, 2005; pp. 91–116.
12. Li, X.R. Multiple-model estimation with variable structure. II. Model-set adaptation. IEEE Trans. Autom. Control 2000, 45, 2047–2060.
13. Labbe, R. Kalman and bayesian filters in python. Chap 2014, 7, 4.
14. Zhang, G.; Lian, F.; Gao, X.; Kong, Y.; Chen, G.; Dai, S. An Efficient Estimation Method for Dynamic Systems in the Presence of
Inaccurate Noise Statistics. Electronics 2022, 11, 3548. [CrossRef]
15. Rong Li, X.; Jilkov, V. Survey of maneuvering target tracking. Part V. Multiple-model methods. IEEE Trans. Aerosp. Electron. Syst.
2005, 41, 1255–1321. [CrossRef]
16. Bar-Shalom, Y. Multitarget-Multisensor Tracking: Applications and Advances; Artech House, Inc.: Norwood, MA, USA, 2000;
Volume iii.
17. Blom, H.A.; Bar-Shalom, Y. The interacting multiple model algorithm for systems with Markovian switching coefficients. IEEE
Trans. Autom. Control 1988, 33, 780–783. [CrossRef]
18. Ma, Y.; Zhao, S.; Huang, B. Multiple-Model State Estimation Based on Variational Bayesian Inference. IEEE Trans. Autom. Control
2019, 64, 1679–1685. [CrossRef]
19. Wang, G.; Wang, X.; Zhang, Y. Variational Bayesian IMM-filter for JMSs with unknown noise covariances. IEEE Trans. Aerosp.
Electron. Syst. 2019, 56, 1652–1661. [CrossRef]
20. Li, H.; Yan, L.; Xia, Y. Distributed robust Kalman filtering for Markov jump systems with measurement loss of unknown
probabilities. IEEE Trans. Cybern. 2021, 52, 10151–10162. [CrossRef]
21. Johnston, L.; Krishnamurthy, V. An improvement to the interacting multiple model (IMM) algorithm. IEEE Trans. Signal Process.
2001, 49, 2909–2923. [CrossRef]
22. Fan, X.; Wang, G.; Han, J.; Wang, Y. Interacting Multiple Model Based on Maximum Correntropy Kalman Filter. IEEE Trans.
Circuits Syst. II Express Briefs 2021, 68, 3017–3021. [CrossRef]
23. Davis, R.R.; Clavier, O. Impulsive noise: A brief review. Hear. Res. 2017, 349, 34–36. [CrossRef]
24. Nie, X. Multiple model tracking algorithms based on neural network and multiple process noise soft switching. J. Syst. Eng.
Electron. 2009, 20, 1227–1232.
25. Mazor, E.; Averbuch, A.; Bar-Shalom, Y.; Dayan, J. Interacting multiple model methods in target tracking: A survey. IEEE Trans.
Aerosp. Electron. Syst. 1998, 34, 103–123. [CrossRef]
26. Gao, W.; Wang, Y.; Homaifa, A. Discrete-time variable structure control systems. IEEE Trans. Ind. Electron. 1995, 42, 117–122.
27. Li, X.R.; Bar-Shalom, Y. Mode-set adaptation in multiple-model estimators for hybrid systems. In Proceedings of the 1992
American Control Conference, Chicago, IL, USA, 24–26 June 1992; pp. 1794–1799.
28. Pannetier, B.; Benameur, K.; Nimier, V.; Rombaut, M. VS-IMM using road map information for a ground target tracking. In
Proceedings of the 2005 7th International Conference on Information Fusion, Philadelphia, PA, USA, 25–28 July 2005; Volume 1, 8p.
29. Xu, L.; Li, X.R. Multiple model estimation by hybrid grid. In Proceedings of the 2010 American Control Conference, Baltimore,
MD, USA, 30 June–2 July 2010; pp. 142–147.
30. Li, X.R.; Zhi, X.; Zhang, Y. Multiple-model estimation with variable structure. III. Model-group switching algorithm. IEEE Trans.
Aerosp. Electron. Syst. 1999, 35, 225–241.
31. Li, X.R.; Jilkov, V.P.; Ru, J. Multiple-model estimation with variable structure-part VI: expected-mode augmentation. IEEE Trans.
Aerosp. Electron. Syst. 2005, 41, 853–867.
32. Lan, J.; Li, X.R. Equivalent-Model Augmentation for Variable-Structure Multiple-Model Estimation. IEEE Trans. Aerosp. Electron.
Syst. 2013, 49, 2615–2630. [CrossRef]
33. Li, X.R.; Zhang, Y. Multiple-model estimation with variable structure. V. Likely-model set algorithm. IEEE Trans. Aerosp. Electron.
Syst. 2000, 36, 448–466.
34. Sun, F.; Xu, E.; Ma, H. Design and comparison of minimal symmetric model-subset for maneuvering target tracking. J. Syst. Eng.
Electron. 2010, 21, 268–272. [CrossRef]
35. Callier, F.M.; Desoer, C.A. Linear System Theory; Springer Science & Business Media: Berlin, Germany, 2012.
36. Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45. [CrossRef]
electronics
Article
An Accelerator for Semi-Supervised Classification with
Granulation Selection
Yunsheng Song 1,2, Jing Zhang 1,*, Xinyue Zhao 1 and Jie Wang 3
1 School of Information Science and Engineering, Shandong Agricultural University, Tai’an 271018, China;
[email protected] (Y.S.); [email protected] (X.Z.)
2 Key Laboratory of Huang-Huai-Hai Smart Agricultural Technology, Ministry of Agriculture and Rural Affairs,
Shandong Agricultural University, Tai’an 271018, China
3 School of Information, Shanxi University of Finance and Economics, Taiyuan 030006, China;
[email protected]
* Correspondence: [email protected]
Abstract: Semi-supervised classification is one of the core methods for dealing with incomplete label information without manual intervention, and it has been widely used in various real problems for its excellent performance. However, the existing algorithms need to store all the unlabeled instances and use them repeatedly during iteration; thus, a large population size may result in slow execution and large memory requirements. Many efforts have been devoted to solving this problem, but they have mainly focused on supervised classification. Here, we propose an approach to decrease the size of the unlabeled instance set for semi-supervised classification algorithms. In this algorithm, we first divide the unlabeled instance set into several subsets with an information granulation mechanism, then sort the divided subsets according to their contribution to the classifier. Following this order, the subsets that bring great classification performance are retained. The proposed algorithm is compared with state-of-the-art algorithms on 12 real datasets, and the experimental results show that it obtains similar prediction ability but has the lowest instance storage ratio.
A key advantage of co-training is its ability to handle large and complex datasets, where traditional supervised learning methods may struggle. For instance, in NLP, co-training has been shown to be effective when dealing with imbalanced datasets, where the number of positive instances is much smaller than the number of negative instances. In such scenarios, co-training can effectively leverage the information contained in the unlabeled data to improve the performance of the classifier. Another area of application for co-training is data privacy, where often only a limited amount of labeled data is available for training machine learning models. In these scenarios, co-training can leverage the information contained in the unlabeled data to improve the performance of the classifier without compromising privacy or security [9].
In recent years, several variations and extensions of co-training have been proposed
to address its limitations and improve its performance. For instance, some researchers
have proposed using multiple views of the data rather than just two to capture more
information and make the semi-supervised learning process more robust [10]. Another
line of research has focused on developing new co-training algorithms that are able to
handle noisy or conflicting views of the data [11]. These algorithms aim to identify and
discard unreliable predictions made by one of the classifiers so that the other classifier can
make better predictions in the absence of high-quality supervision. Additionally, there has
been a growing interest in using deep learning models for co-training. For instance, one
approach is to use generative models, such as Generative Adversarial Networks (GANs),
to generate synthetic samples that can be used to augment the labeled data [12]. By using
these synthetic samples in co-training, it is possible to effectively increase the size of the
labeled data, leading to improved performance. Meanwhile, co-training can handle high-
dimensional, complex data representations with deep learning models. For instance, some
researchers have proposed using deep neural networks as the classifiers in co-training and
have shown that this can lead to improved performance in various applications, including
image classification, sentiment analysis, and document classification [13]. Overall, the field
of co-training and semi-supervised learning is rapidly evolving, and there is a wealth of
ongoing research aimed at improving the performance and robustness of these algorithms.
As such, it is an exciting and promising area of study for anyone interested in machine
learning and data science.
Although co-training plays an important role in the semi-supervised classification
task, large-scale data poses a huge challenge to the efficiency of its modeling [14]. Existing
co-training-based semi-supervised classification algorithms usually need to traverse all
unlabeled samples multiple times to find high-confidence elements or valuable classifica-
tion information, but large-scale unlabeled instances make it difficult to achieve efficient
modeling. Some studies propose using different subsets of unlabeled samples after division to
improve the efficiency of the algorithm, but they do not consider the differences in the contribution
of different unlabeled samples to the algorithm. Thus, it remains a great challenge for traditional
co-training-based semi-supervised classification algorithms to handle large-scale data in terms of
compatibility, effectiveness, and timeliness. Instance selection, an important data reduction method,
can solve the large-scale classification problem by reducing the labeled instance set, but it depends
on sufficient label information to achieve this aim [15,16]. Traditional instance selection methods
therefore cannot be applied to the semi-supervised classification problem, because only a small
number of labeled instances with little label information are available. Moreover, each instance is
treated as a basic processing unit when judging whether it is selected or not [17]. It is difficult to
follow this approach when dealing with large-scale unlabeled instances, so the problem needs to be
solved from a new perspective.
Granular computing is a methodology for processing and analyzing complex data
by partitioning it into smaller, more manageable pieces [18–22]. These smaller pieces, or
granules, can then be further analyzed and processed to provide insights into the original
data. The goal of granular computing is to simplify complex problems by reducing their
complexity to more manageable pieces. This approach has been applied to a variety of domains.
Electronics 2023, 12, 2239
2. Related Work
A co-training-based semi-supervised classification algorithm needs to cooperate with
different classifiers from multiple perspectives at the same time to realize the utilization of
unlabeled data, and it has become the focus of research with its higher effectiveness [3,26].
According to the different learning strategies, the existing co-training algorithms are mainly
divided into two categories: the ones based on the sample set augmentation and the ones
based on regularization.
Co-training algorithms based on sample set augmentation use classifiers from different perspectives
to select high-confidence unlabeled samples and their predicted labels from the unlabeled sample
set, alternately assign the newly added samples to different classifiers for retraining, and repeat this
process until the prediction results converge. In such algorithms, how to efficiently select
high-confidence samples is the bottleneck that restricts their efficiency. Paper [27] divides
the sample space into a set of equivalence classes and uses cross-validation to determine
how to label unlabeled samples. Paper [28] uses voting to select unlabeled samples with
high confidence. In order to improve the robustness of the co-training algorithm,
papers [29,30] use filtering to screen the newly added unlabeled samples instead of
using them all [31].
Co-training algorithms based on regularization use the information provided by dif-
ferent perspective classifiers as the regularization term of the learning object, and transform
the semi-supervised multi-view learning problem into an optimization problem [32,33].
In order to improve the training efficiency of such algorithms, Sun et al. [34] propose a
sequential training method based on dividing the unlabeled sample set into ten subsets of equal
size: the union of the first unlabeled subset and the labeled set L is first used for modeling; then
each subsequent unlabeled subset, together with some elements of the already utilized unlabeled
subsets and the set L, participates in the modeling; this step is repeated until all unlabeled subsets
are utilized. Existing difference-based
semi-supervised classification algorithms need to traverse all unlabeled samples multiple
times to find high-confidence elements or valuable classification information, but the mas-
sive scale of unlabeled data makes it difficult to achieve efficient modeling. Although some
literature proposes to use different subsets of unlabeled samples after division to improve
the efficiency of the algorithm, it does not consider the differences in the contribution of
different unlabeled samples to the algorithm.
In conclusion, the existing large-scale co-training-based semi-supervised classification al-
gorithms mainly improve training efficiency from the viewpoint of optimization design.
However, the time complexity is difficult to reduce substantially for large problems, and such
algorithms still suffer from the heavy training burden of using all of the unlabeled instances in the
training process.
3. Main Content
For the given training set T, which is the union of the labeled instance set
L = { x1 , x2 , · · · , xl } and the unlabeled instance set U = { xl +1 , xl +2 , · · · , xl +u }, where
xi denotes a training instance, l and u are the numbers of labeled and unlabeled instances, and
i = 1, 2, · · · , l + u, semi-supervised classification algorithms simultaneously use the labeled
instance set L and the unlabeled instance set U to train a classifier f ( x ) with good performance.
A co-training-based semi-supervised classification algorithm uses the idea of compatibility and
complementarity of multiple views to learn the final classifier. It assumes that the data has multiple
sufficient and conditionally independent views, and the classifier trained on one view can offer
supplemental information to the classifiers on the other views. The supplemental information is
obtained by selecting the most trusted unlabeled instances and their pseudo-labels. Nevertheless,
several iterations are required, and each iteration must scan the whole unlabeled instance set to
find the most trusted instances. The large scale of unlabeled instances makes it difficult to
efficiently learn the final classifier.
Instance selection, as one of the most important data preprocessing technologies for
reducing dataset size, is widely used for classification problems, based on the fact that the
contribution of training instances at different locations in the space to learning a classifier
varies greatly. Numerous studies have shown that the training instances can be divided
into critical instances and non-critical instances, where critical instances mainly define
the class boundary and separate the instances of one label from those of other
labels [16]. Meanwhile, the number of critical instances is significantly smaller than that
of non-critical instances in most real-world datasets. Therefore, an effective way is needed
to reduce the training set to a relatively small subset by selecting critical instances while
preserving the original data information. Compared with training on the original set, the
efficiency of training the classifier on the reduced subset can be significantly improved.
Traditional instance selection methods for supervised classification tasks start with
each instance as the most basic processing unit; critical instances are selected by the con-
tribution of each labeled instance to the classifier. The contribution of an instance to the
classifier is usually measured by its location in the input space and the difference between
its label and those of its nearest neighbors. Although instance selection is very efficient for
supervised classification tasks, it is difficult to apply directly to semi-supervised classifica-
tion tasks because of its limitations. Different from supervised classification, semi-supervised
classification tasks involve a large number of unlabeled instances and few labeled instances.
Only labeled instances bring label information to the learner, and this information is vital
to learning a classifier with good performance, so the labeled instances cannot be reduced.
Otherwise, the generalization ability of the obtained classifier could significantly decrease.
On the other hand, traditional instance selection needs the label information of
each instance to execute data reduction, while this condition is not met for the unlabeled in-
stances. Moreover, treating each instance as the basic processing unit is undesirable
for large-scale problems because it is very time-consuming.
To overcome this difficulty, we propose a novel instance selection method with a
granulation mechanism. The proposed method consists of two key processes: unlabeled
information granulation and granulation selection.
This difference in the contribution of unlabeled instances to the classifier yields the
possibility of executing an instance selection. Compared with the abundant labeled informa-
tion of the labeled instances, unlabeled instances bring a limited classification contribution
to the learner. Due to the presence of a large number of unlabeled instances with limited
classification information, it is difficult to select critical unlabeled instances with their
contribution one by one. Furthermore, semi-supervised classification should not reduce
unlabeled instances one by one from an execution efficiency perspective. This process is a
disaster, especially for classification algorithms with high time complexity. Therefore, we
adopt the idea of granular computing to divide the unlabeled instance set U into m disjoint
< m
subsets Ui , U = ∑ Ui . All the instances in the same subset Ui are considered as a basic
i =1
information granularity to participate in the learning process. In this way, it can greatly
improve learning efficiency by only processing a small number of units. Meanwhile, the
contribution of a subset Ui is easy to obtain compared with the single unlabeled instance.
Data partition, as one of the important data granulation techniques, plays a crucial
role in granular computing. There are three key factors to performing data partition for the
co-training-based semi-supervised classification tasks.
• The divided unlabeled instance subsets should carry unbalanced information for the final
classifier, so that a relatively small number of target subsets can achieve data reduction.
• The number of divided subsets should be determined by the characteristics of the unlabeled
instances and their contribution to the classifier, rather than by subjective prior determination.
• The data partition should exploit the contribution of the unlabeled instances to the semi-
supervised classifier and fit closely with the distinguishing features of co-training.
Therefore, we utilize a framework similar to the Tri-Training [36] method to perform
data partition. Each initial decision tree classifier f r ( x ) is independently trained on a
different set Br sampled from the labeled set L using the Bootstrap sampling method, where
r = 1, 2, 3. Owing to the nature of Bootstrap sampling, the sample sets Br differ greatly, as do the
classifiers f r ( x ) trained on them. Then each classifier is iteratively retrained with
the enlarged labeled set L, which is created by introducing several confident unlabeled
instances and their pseudo-label until none of the classifiers changes. The confident
unlabeled instance and its pseudo-label obtained by each classifier are provided by the
remaining two classifiers. Specifically, if the two classifiers have the same prediction for the
same unlabeled instances, these instances are considered to have high labeling confidence
and are added to the labeled training set of the third classifier. In this way, we can estimate
the frequency f re( xi ) at which each unlabeled instance xi is selected as a confident element
during this process. Finally, the unlabeled set U is divided into several subsets according to
the condition that all the unlabeled instances xi in the same subset have the same frequency.
The pseudocode of the proposed method is presented in Algorithm 1.
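As an illustrative sketch of this frequency-estimation step (this is not the paper's Algorithm 1; the function name `estimate_frequencies`, the fixed number of rounds, and the simplified retraining scheme are our own assumptions), the idea can be written as:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def estimate_frequencies(X_lab, y_lab, X_unl, n_rounds=10, seed=0):
    """Count how often each unlabeled instance is selected as a confident
    element, i.e., how often the two peer classifiers agree on its label."""
    rng = np.random.default_rng(seed)
    n_lab, n_unl = len(X_lab), len(X_unl)
    # Train three decision trees on independent Bootstrap samples of L.
    clfs = []
    for r in range(3):
        idx = rng.integers(0, n_lab, n_lab)              # Bootstrap sample
        clfs.append(DecisionTreeClassifier(random_state=r)
                    .fit(X_lab[idx], y_lab[idx]))
    fre = np.zeros(n_unl, dtype=int)
    for _ in range(n_rounds):            # simplified: fixed number of rounds
        preds = np.array([c.predict(X_unl) for c in clfs])
        for r in range(3):
            p, q = [k for k in range(3) if k != r]
            agree = preds[p] == preds[q]  # peers agree -> confident instance
            fre[agree] += 1
            # Retrain classifier r on L enlarged with the confident
            # instances and their pseudo-labels.
            X_aug = np.vstack([X_lab, X_unl[agree]])
            y_aug = np.concatenate([y_lab, preds[p][agree]])
            clfs[r] = DecisionTreeClassifier(random_state=r).fit(X_aug, y_aug)
    return fre
```

Instances that are consistently pseudo-labeled by the peer classifiers accumulate a high f re value, matching the paper's notion of a large contribution to the final classifier.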
A decision tree (DT) is chosen as the basic classifier for the Tri-Training algorithm
in Algorithm 1 for its unique advantages, namely, its learned features, high efficiency, and
instability. Both the conditional probability distribution information of the class and the
local geometry information in the input space are used to learn a DT classifier, and
this kind of information is very comprehensive. Furthermore, the time complexity of DT is
approximately linear, so it can efficiently process large-scale data. Finally, DT is
sensitive to data changes due to its instability, which is constructive for instance
selection [37].
The measurement f re( xi ) is the frequency at which each unlabeled instance xi is
selected as a confident instance for the three classifiers in the whole training process, and it
reflects the contribution of each unlabeled instance xi to learning the final classifier. A large
value of the frequency f re( xi ) means the unlabeled instance xi is frequently chosen and makes a
large contribution to the final classifier, and different values of f re( xi ) indicate different
degrees of effect on classification performance. Different from previous methods that evaluate the
contribution with a real number, this measurement takes a limited number of integer values. It is
constructive to divide the unlabeled set U into several subsets according to the possible values of
the measurement f re( xi ). Moreover, the number n = max_{xi ∈ U} f re( xi ) − min_{xi ∈ U} f re( xi )
of discrete values of the measurement f re( xi ) is not subjectively predetermined; it depends on the
effect of the unlabeled instances on the classifier.
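Because f re( xi ) takes only a small number of integer values, the division of U reduces to grouping instance indices by frequency. A minimal sketch (the function name is ours):

```python
import numpy as np

def granulate_by_frequency(fre):
    """Partition the unlabeled index set into disjoint granules: all
    instances sharing the same selection frequency fall into one subset."""
    return {v: np.flatnonzero(fre == v) for v in np.unique(fre)}

# Four instances with frequencies 2, 0, 2, 1 form three granules.
granules = granulate_by_frequency(np.array([2, 0, 2, 1]))
```

The number of granules is bounded by the range of observed frequencies, so it is data-driven rather than fixed in advance, mirroring the property described above.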
(Acc(Ug ∪ Ug−1 ) − Acc(Ug−1 )) / Acc(Ug ) ≥ α,   (1)
where α ∈ (0, 1), g = 2, 3, 4, · · · , m. Many papers suggest that the critical value α = 0.05
to obtain a significant change in performance [38]. Above all, the pseudocode of the
granulation selection is presented in Algorithm 2.
In Algorithm 2, an early stopping condition is used to prevent performing too many
iterations. The classifier trained on the set L ∪ Ug ∪ Ug−1 may improve upon the classification
performance of the one trained on the set L ∪ Ug because it adds more unlabeled sample information
from the set Ug−1 . Moreover, the unsupervised information of the set Ug that is construc-
tive to improving the classification performance could be more than that of the set Ug−1 , where
g = 2, 3, · · · , m. Therefore, it is difficult for the subset Uj to satisfy condition (1) if the previous
subset Ui cannot, where 1 ≤ j < i ≤ m. In this way, the selection process can be
terminated early, yielding a smaller number of selected unlabeled instance subsets.
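A sketch of this granulation-selection loop (this is not the paper's Algorithm 2; the helper `acc_of`, which trains on L united with the given granules and returns test accuracy, and the exact placement of the denominator in condition (1) are our assumptions):

```python
def select_granules(granules, acc_of, alpha=0.05):
    """Greedily keep granules, ordered by contribution, while the relative
    accuracy gain satisfies condition (1); stop at the first failure."""
    selected = [granules[0]]             # the first granule is always kept
    prev_acc = acc_of(selected)
    for g in range(1, len(granules)):
        new_acc = acc_of(selected + [granules[g]])
        gain = (new_acc - prev_acc) / acc_of([granules[g]])  # condition (1)
        if gain >= alpha:
            selected.append(granules[g])
            prev_acc = new_acc
        else:
            break        # early stopping: later granules contribute less
    return selected
```

The early break is what keeps the number of retained subsets, and hence the final training set, small.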
4. Experiments
To verify the effectiveness and efficiency of the proposed algorithm on real problems,
extensive experiments on real datasets have been conducted against the typical method
under different labeled ratios.
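For illustration, a semi-supervised split under a given labeled ratio PL can be generated as follows (a sketch with our own naming and the common -1 convention for hidden labels; the paper's exact protocol may differ):

```python
import numpy as np

def make_semi_supervised_split(X, y, pl=0.2, seed=0):
    """Hold out about 1/4 of the data for testing, then keep a proportion
    `pl` of the training labels and hide the rest (marked with -1)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(0.75 * len(X)))
    tr, te = idx[:n_train], idx[n_train:]
    y_semi = y[tr].copy()
    n_lab = int(round(pl * n_train))
    y_semi[n_lab:] = -1                  # -1 marks an unlabeled instance
    return X[tr], y_semi, X[te], y[te]
```

Varying `pl` over 0.2, 0.4, 0.6, and 0.8 reproduces the kind of labeled-ratio sweep used in the experiments.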
Further, a typical image dataset of five generic categories called NORB [40] is se-
lected to test the performance of the proposed method on high-dimensional datasets.
Figure 2 shows examples of training and test images from the dataset.
For each dataset, about 3/4 of the data is selected as the training set and the rest as
the test set, where each training set is the union of the labeled subset L and the unlabeled
are 0.890, 0.920, 0.933, and 0.947, and those of the Tri algorithm are 0.896, 0.919, 0.930,
and 0.942. Therefore, the absolute difference between the two algorithms in the median of
classification accuracy under the same PL value is also very small.
Figure 3. The comparison of classification accuracy between two algorithms on the selected datasets.
Finally, the Wilcoxon signed-rank test is applied to the classification accuracy of the two
algorithms to avoid the effect of subjective judgment. The p-values of this test under different
PL are 0.060, 0.720, 0.375, and 0.206; these values are all larger than the given significance
level of 0.05. Thus, there exists no significant difference in classification accuracy
between the two algorithms.
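This test can be reproduced with SciPy; the paired accuracies below are illustrative placeholders, not the paper's Table 2 values:

```python
from scipy.stats import wilcoxon

# Hypothetical paired per-dataset accuracies of Tri and ISTri (12 datasets).
tri   = [0.91, 0.88, 0.95, 0.90, 0.87, 0.93, 0.89, 0.92, 0.86, 0.94, 0.90, 0.88]
istri = [0.90, 0.89, 0.94, 0.90, 0.86, 0.93, 0.88, 0.92, 0.87, 0.93, 0.91, 0.87]

stat, p = wilcoxon(tri, istri)  # two-sided signed-rank test on the pairs
significant = p < 0.05          # here the difference is not significant
```

A paired test is the right choice here because both algorithms are evaluated on the same datasets, so only the per-dataset differences matter.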
Besides classification accuracy, Cohen's kappa is also used to evaluate the classification
performance of the learner; it addresses the problem that accuracy does not compensate
for hits that can be attributed to mere chance. Similar to Table 2, Table 3 lists the
Kappa of the two algorithms under different labeled proportions, as well as the descriptive
statistics and the p-value of the Wilcoxon signed-rank test. Figure 4 shows the comparison of
the Kappa of the two algorithms.
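Cohen's kappa corrects the observed accuracy po by the agreement pe expected from the marginal label frequencies alone, kappa = (po − pe)/(1 − pe). A self-contained sketch:

```python
import numpy as np

def cohens_kappa(y_true, y_pred):
    """Chance-corrected agreement between predictions and true labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    po = np.mean(y_true == y_pred)                        # observed accuracy
    pe = sum(np.mean(y_true == c) * np.mean(y_pred == c)  # chance agreement
             for c in classes)
    return (po - pe) / (1 - pe)
```

For a balanced binary problem with 3 of 4 predictions correct, po = 0.75 and pe = 0.5, so kappa = 0.5, noticeably below the raw accuracy.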
Figure 4 shows that the ISTri algorithm also obtains a Kappa quite similar to that of the Tri
algorithm on each dataset under the same labeled rates PL = 0.2, 0.4, 0.6, and 0.8.
Figure 4. The comparison of Kappa between two algorithms on the selected datasets.
Table 5 shows that the ISTri algorithm has much less execution time on each dataset
under the same value of PL. Meanwhile, the means of ET of the ISTri algorithm on all the
datasets under different PL are 77.761, 115.005, 175.531, and 221.013, while the ones of
the Tri algorithm are 161.638, 196.253, 286.73, and 221.013. Additionally, the medians of
ET of the ISTri algorithm on all the datasets under different PL are 6.780, 10.515, 13.649,
and 16.683, while the ones of the Tri algorithm are 14.428, 18.621, 23.276, and 26.812. This
descriptive statistical result also corroborates the ISTri algorithm being able to obtain much
less execution time than the Tri algorithm. The execution time of algorithms is affected by
the dataset size, and its value is positively correlated with the amount of data.
To effectively compare the training efficiency of the algorithms, a speedup ratio
SR = ET(Tri)/ET(ISTri) is defined, where ET(Tri) and ET(ISTri) are the execution times of
the Tri algorithm and the ISTri algorithm on the same dataset. This relative indicator can
eliminate the effect of data volume on algorithm performance, and it evaluates the
difference between the two algorithms’ performance from a relative perspective. Table 6
lists the SR between two algorithms under different labeled proportions.
The results in Table 6 show that the value of SR is significantly greater
than one on each dataset under different PLs, which confirms that the proposed algorithm
obtains higher training efficiency than the original algorithm. In particular, the ISTri algorithm
obtains a training efficiency more than two times that of the Tri algorithm on the
letter, optdigits, ring, seismic, and winequality–white datasets under PL = 0.2. The ISTri algorithm
also obtains nearly 1.5 times the training efficiency of the Tri algorithm on most datasets
when PL = 0.2, 0.4, and 0.6. Simple statistics show that the means of SR over all the
datasets under different PLs are 2.113, 1.760, 1.652, and 1.613, and the medians are 1.993,
1.759, 1.660, and 1.590. Therefore, the ISTri algorithm achieves a training efficiency at least
50% higher than that of the original algorithm from a global perspective.
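The speedup ratio is a per-dataset quotient of execution times; with illustrative ET values (not the paper's Table 5 figures):

```python
# Hypothetical execution times in seconds for three datasets.
et_tri   = {"letter": 30.1, "ring": 14.4, "seismic": 95.0}
et_istri = {"letter": 12.3, "ring":  6.8, "seismic": 44.1}

sr = {d: et_tri[d] / et_istri[d] for d in et_tri}   # SR = ET(Tri)/ET(ISTri)
faster_everywhere = all(v > 1 for v in sr.values()) # ISTri wins per dataset
```

Because SR is a ratio, datasets of very different sizes can be compared on the same scale, which is exactly the point made in the text.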
The reason for the higher training efficiency of the ISTri algorithm is that it uses the
reduced unlabeled instance subset rather than the original unlabeled instance set to learn
the classifier. As we all know, the training time of the classifier is negatively correlated
with the training set size. The larger the training set, the longer the training time. For the
semi-supervised classification tasks, unlabeled instances make up a large proportion of
the training set. Moreover, the proposed instance selection method can effectively and
efficiently compress unlabeled instances while retaining most of the information valid
for the classifier, and this result can be verified by the low proportion of the selected
unlabeled instances.
Table 7. Comparison of the two algorithms on the high-dimensional NORB dataset under different PLs.

PL    Acc (Tri)   Acc (ISTri)   Kappa (Tri)   Kappa (ISTri)   ET (Tri)   ET (ISTri)   PS (ISTri)
0.2   0.980       0.971         0.974         0.968           481.778    271.731      0.326
0.4   0.987       0.982         0.984         0.979           621.516    514.607      0.424
0.6   0.994       0.992         0.992         0.990           795.907    686.806      0.499
0.8   0.996       0.995         0.994         0.993           936.799    836.467      0.514
As the results in Table 7 show, the ISTri algorithm efficiently and effectively processes
high-dimensional problems and achieves results comparable to the Tri algorithm. The
absolute differences in Acc between the two algorithms under different values of PL are 0.009,
0.005, 0.002, and 0.001, and those in Kappa are 0.006, 0.005, 0.002, and 0.001. In
the worst case, the largest differences in Acc and Kappa are 0.009 and 0.006, which are
very small relative to the overall performance of the algorithm. Therefore, the differences in
Acc and Kappa under different values of PL are negligible. The execution time of the ISTri
algorithm is much less than that of the Tri algorithm under the same PL, and the ratio SR
between them is 1.773, 1.208, 1.159, and 1.120; all these values are larger than one, so
the ISTri algorithm obtains higher training efficiency than the Tri algorithm. The last column
of Table 7 lists the unlabeled selection proportion PS, whose values of 0.326, 0.424, 0.499,
and 0.514 are significantly smaller than one, which shows that the ISTri algorithm uses fewer
unlabeled instances to constitute the training set. To sum up, the proposed instance selection
method greatly reduces the number of unlabeled instances while preserving the classification
information needed to learn the classifier, and thus the ISTri algorithm has a higher training
efficiency than the Tri algorithm.
Figure 5 shows the change in classification performance of the proposed method under
different PLs, where the left panel shows classification accuracy and the right panel shows
Kappa. There exists a noticeable difference in the value of Acc on almost all datasets except
combine, pendigits, and seismic in Figure 5a. The value of Kappa also changes significantly
on each dataset under different PL, especially for the connect-4, phoneme, and
winequality–white datasets in Figure 5b. Meanwhile, the p-values of the Friedman test on Acc
and Kappa are 4.02 × 10−7 and 5.49 × 10−7 , both smaller than the given significance level
of 0.05, so PL affects the classification performance of the proposed method. Moreover,
Figure 5 also shows that the value of Acc is positively correlated with PL on these datasets, i.e., its
value significantly increases with increasing PL, and a similar result holds for Kappa.
The descriptive statistics of Acc and Kappa over all the datasets under different values
of PL in Tables 3 and 4 also verify this result. Labeled instances carry much more
valuable label information, which is critical to learning the classifier, than unlabeled instances, so
PL plays a key role in the classification performance of the classifier for semi-supervised
classification problems. This fact explains why the classification performance of the ISTri
algorithm is positively correlated with PL. Nevertheless, the ISTri algorithm still shows no
significant difference from the Tri algorithm.
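The Friedman test treats each dataset as a block and each PL setting as a treatment; it can be run with SciPy as below (the accuracies are illustrative, not the paper's values):

```python
from scipy.stats import friedmanchisquare

# Each list holds six hypothetical per-dataset accuracies for one PL value.
acc_pl02 = [0.85, 0.80, 0.90, 0.78, 0.88, 0.83]
acc_pl04 = [0.88, 0.84, 0.92, 0.82, 0.90, 0.86]
acc_pl06 = [0.90, 0.87, 0.93, 0.85, 0.92, 0.89]
acc_pl08 = [0.92, 0.89, 0.94, 0.88, 0.93, 0.91]

stat, p = friedmanchisquare(acc_pl02, acc_pl04, acc_pl06, acc_pl08)
pl_matters = p < 0.05   # accuracy rises monotonically with PL here
```

Because accuracy increases with PL on every dataset in this toy example, each dataset ranks the four PL settings identically, which drives the test statistic up and the p-value well below 0.05.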
Figure 6 shows the change of the metric PS under different PLs. The value of PS
fluctuates greatly on each dataset, and this result is also supported by the numeric results
in Table 4. The p-value of the Friedman test on PS is 1.38 × 10−6 , smaller than the given
significance level of 0.05. Therefore, there exists a significant difference in PS under different
values of PL. Similar to the behavior of Acc and Kappa under different PLs, the value
of PS is also positively correlated with PL. The unlabeled instance selection of the ISTri
algorithm mainly depends on the agreement of the pseudo-labels offered by the classifiers
trained on the labeled instance subsets, where the parameter PL controls the number of labeled
instances. The classification ability of the multi-view classifiers trained on the labeled instance
subsets improves as PL increases, so the likelihood that the predicted labels for each unlabeled
instance are the same increases noticeably. In this way, the final selection of unlabeled data
increases significantly.
The change in speedup ratio (SR) under different PLs is shown in Figure 7, where the
baseline SR = 1 is also plotted. All the values of SR on each dataset under different values
of PL are larger than one. The metric SR takes significantly different values on each dataset
under different PLs, which can be validated by the results in Table 6. The p-value of the
Friedman test on SR is 3.73 × 10−7 , smaller than the given significance level of 0.05.
Therefore, there exists a significant difference in SR under different values of PL. Meanwhile,
Figure 7 shows that SR is negatively correlated with PL on each dataset. SR evaluates the ratio
of the execution times of the ISTri algorithm and the Tri algorithm, and the main difference
between them is the number of unlabeled instances used to learn the classifier. The number of
selected unlabeled instances continues to increase with increasing PL for the ISTri algorithm,
which also lengthens its execution time. This explains why SR is negatively correlated with PL.
5. Conclusions
To address the problem that massive unlabeled instances bring a great challenge to efficiently
training co-training-based semi-supervised classification algorithms, this paper has developed
an unlabeled instance selection algorithm based on a granulation mechanism. Different
from previous approaches based on algorithm optimization, it takes advantage of data
reduction to avoid the difficulty of using domain knowledge to improve the efficiency of
algorithms. The proposed method treats the unlabeled instances with the same selection
frequency as a basic information granule rather than treating each unlabeled instance
individually, which is constructive to significantly improving execution efficiency. The
selection of each unlabeled instance subset into the training set depends on its contribution
to the current classification performance, which guarantees strong adaptability to different
datasets and algorithms. The advantages of the proposed method are verified by the
experimental results on medium-dimensional and high-dimensional datasets. In particular, it
has classification performance comparable to the typical algorithm while achieving high
execution efficiency with fewer unlabeled instances in the training set. The proposed method
can be widely used in driverless car obstacle recognition, mobile phone face recognition,
temperature monitoring in greenhouses, and other large-scale application scenarios. Finally,
this paper provides a potentially effective solution to improve the training efficiency of other
kinds of semi-supervised classification algorithms. Future research will explore the application
of the proposed algorithm in practical systems such as text classification, image classification,
and pattern recognition.
Author Contributions: Writing the original draft and data preparation, Y.S.; writing the review and
editing, J.Z.; oversight and leadership responsibility for the research activity planning and execution,
X.Z.; implementation of the computer code and supporting algorithms, J.W. All authors have read
and agreed to the published version of the manuscript.
Funding: National Natural Science Foundation of China: 62006145; Shandong Provincial Natural
Science Foundation, China: ZR2020MF146.
Data Availability Statement: The selected datasets in this paper are public, and they can be
freely downloaded at LIBSVM-dataset repository (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/
datasets/, 12 April 2023), KEEL-dataset repository (https://sci2s.ugr.es/keel/datasets.php, 12 April
2023) and NORB (https://cs.nyu.edu/~yann/research/norb/, 12 April 2023).
Acknowledgments: This paper was completed by Key Laboratory of Huang-Huai-Hai Smart Agri-
cultural Technology of Ministry of Agriculture and Rural Affairs, Shandong Agricultural University.
We thank the school for its support and help.
Conflicts of Interest: This paper represents the opinions of the authors and does not represent the
position or opinions of Shandong Agricultural University.
References
1. Blum, A.; Mitchell, T. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference
on Computational Learning Theory, Madison, WI, USA, 24–26 July 1998; pp. 92–100.
2. Prasetio, B.H.; Tamura, H.; Tanno, K. Semi-supervised deep time-delay embedded clustering for stress speech analysis. Electronics
2019, 8, 1263. [CrossRef]
3. Ning, X.; Cai, W.; Zhang, L.; Yu, L. A review of research on co-training. Concurr. Comput. Pract. Exp. 2021, 21, e6276. [CrossRef]
4. Ng, K.W.; Furqan, M.S.; Gao, Y.; Ngiam, K.Y.; Khoo, E.T. HoloVein—Mixed-reality venipuncture aid via convolutional neural
networks and semi-supervised learning. Electronics 2023, 12, 292. [CrossRef]
5. Li, L.; Zhang, W.; Zhang, X.; Emam, M.; Jing, W. Semi-supervised remote sensing image semantic segmentation method based on
deep learning. Electronics 2023, 12, 348. [CrossRef]
6. Lang, H.; Agrawal, M.N.; Kim, Y.; Sontag, D. Co-training improves prompt-based learning for large language models. In Pro-
ceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 11985–12003.
7. Fan, J.; Gao, B.; Jin, H.; Jiang, L. Ucc: Uncertainty guided cross-head co-training for semi-supervised semantic segmentation.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June
2022; pp. 9947–9956.
8. Sheikh Hassani, M.; Green, J.R. Multi-view Co-training for microRNA prediction. Sci. Rep. 2019, 9, 10931. [CrossRef]
9. Wang, H.; Shen, H.; Li, F.; Wu, Y.; Li, M.; Shi, Z.; Deng, F. Novel PV power hybrid prediction model based on FL Co-Training
method. Electronics 2023, 12, 730. [CrossRef]
10. Sun, S.; Jin, F. Robust co-training. Int. J. Pattern Recognit. Artif. Intell. 2011, 25, 1113–1126. [CrossRef]
11. Dong, Y.; Jiang, L.; Li, C. Improving data and model quality in crowdsourcing using co-training-based noise correction. Inf. Sci.
2022, 583, 174–188. [CrossRef]
12. Cui, K.; Huang, J.; Luo, Z.; Zhang, G.; Zhan, F.; Lu, S. GenCo: Generative co-training for generative adversarial networks with
limited data. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 22 February–1 March
2022; Volume 36, pp. 499–507.
13. Han, T.; Xie, W.; Zisserman, A. Self-supervised co-training for video representation learning. Adv. Neural Inf. Process. Syst. 2020,
33, 5679–5690.
14. Li, B.; Wang, J.; Yang, Z.; Yi, J.; Nie, F. Fast semi-supervised self-training algorithm based on data editing. Inf. Sci. 2023,
626, 293–314. [CrossRef]
15. Li, Y.; Maguire, L. Selecting critical patterns based on local geometrical and statistical information. IEEE Trans. Pattern Anal.
Mach. Intell. 2010, 33, 1189–1201.
16. Garcia, S.; Derrac, J.; Cano, J.; Herrera, F. Prototype selection for nearest neighbor classification: Taxonomy and empirical study.
IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 417–435. [CrossRef] [PubMed]
17. Li, Y.; Liang, D. Safe semi-supervised learning: a brief introduction. Front. Comput. Sci. 2019, 13, 669–676. [CrossRef]
18. Liang, J.; Qian, Y.; Li, D.; Qinghua, H. Theory and method of granular computing for big data mining. Sci. China Inf. Sci. 2015,
45, 188–198.
19. Yao, Y. Three-way granular computing, rough sets, and formal concept analysis. Int. J. Approx. Reason. 2020, 116, 106–125.
[CrossRef]
20. Zhang, Z.; Gao, J.; Gao, Y.; Yu, W. Two-sided matching decision making with multi-granular hesitant fuzzy linguistic term sets
and incomplete criteria weight information. Expert Syst. Appl. 2021, 168, 114311. [CrossRef]
21. Chu, X.; Sun, B.; Chu, X.; Wu, J.; Han, K.; Zhang, Y.; Huang, Q. Multi-granularity dominance rough concept attribute reduction
over hybrid information systems and its application in clinical decision-making. Inf. Sci. 2022, 597, 274–299. [CrossRef]
22. Sangaiah, A.K.; Javadpour, A.; Ja’fari, F.; Pinto, P.; Zhang, W.; Balasubramanian, S. A hybrid heuristics artificial intelligence
feature selection for intrusion detection classifiers in cloud of things. Clust. Comput. 2023, 26, 599–612. [CrossRef]
23. Song, Y.; Zhang, J.; Zhang, C. A survey of large-scale graph-based semi-supervised classification algorithms. Int. J. Cogn. Comput.
Eng. 2015, 45, 1355–1369. [CrossRef]
24. Zheng, W.; Qian, F.; Zhao, S.; Zhang, Y. M-GWNN: Multi-granularity graph wavelet neural networks for semi-supervised node
classification. Neurocomputing 2021, 453, 524–537. [CrossRef]
25. Zhu, P.; Zhang, W.; Wang, Y.; Hu, Q. Multi-granularity inter-class correlation based contrastive learning for open set recognition.
Int. J. Softw. Inf. 2022, 12, 157–175. [CrossRef]
26. Zhao, J.; Xie, X.; Xu, X.; Sun, S. Multi-view learning overview: Recent progress and new challenges. Inf. Fusion 2017, 38, 43–54.
[CrossRef]
415
Electronics 2023, 12, 2239
27. Zhou, Y.; Goldman, S. Democratic co-learning. In Proceedings of the 16th IEEE International Conference on Tools with Artificial
Intelligence, Boca Raton, FL, USA, 15–17 November 2004; pp. 594–602.
28. Li, M.; Zhou, Z. Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Trans.
Syst. Man Cybern.-Part A Syst. Hum. 2007, 37, 1088–1098. [CrossRef]
29. Xu, X.; Li, W.; Xu, D.; Tsang, I.W. Co-labeling for multi-view weakly labeled learning. IEEE Trans. Pattern Anal. Mach. Intell. 2015,
38, 1113–1125. [CrossRef]
30. Ma, F.; Meng, D.; Xie, Q.; Li, Z.; Dong, X. Self-paced co-training. In Proceedings of the 34th International Conference on Machine
Learning, Sydney, Australia, 6–11 August 2017; pp. 2275–2284.
31. Derrac, J.; Garcia, S.; Sanchez, L.; Herrera, F. Keel data-mining software tool: Data set repository, integration of algorithms and
experimental analysis framework. J. Mult. Valued Log. Soft Comput. 2015, 17, 255–287.
32. Ye, H.; Zhan, D.; Miao, Y.; Jiang, Y.; Zhou, Z. Rank consistency based multi-view learning: A privacy-preserving approach.
In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, Melbourne, Australia,
19–23 October 2015; pp. 991–1000.
33. Tang, J.; Tian, Y.; Zhang, P.; Liu, X. Multiview privileged support vector machines. IEEE Trans. Neural Netw. Learn. Syst. 2017,
29, 3463–3477. [PubMed]
34. Sun, S.; Shawe-Taylor, J. Sparse semi-supervised learning using conjugate functions. J. Mach. Learn. Res. 2010, 11, 2423–2455.
35. Van Engelen, J.E.; Hoos, H.H. A survey on semi-supervised learning. Mach. Learn. 2020, 109, 373–440. [CrossRef]
36. Zhou, Z.; Li, M. Tri-training: Exploiting unlabeled data using three classifiers. IEEE Trans. Knowl. Data Eng. 2005, 17, 1529–1541.
[CrossRef]
37. Breiman, L. Heuristics of instability and stabilization in model selection. Ann. Stat. 1996, 24, 2350–2383. [CrossRef]
38. Song, Y.; Liang, J.; Lu, J.; Zhao, X. An efficient instance selection algorithm for k nearest neighbor regression. Neurocomputing
2017, 251, 26–34. [CrossRef]
39. Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. Acm Trans. Intell. Syst. Technol. 2011, 2, 1–27. [CrossRef]
40. LeCun, Y.; Huang, F.J.; Bottou, L. Learning methods for generic object recognition with invariance to pose and lighting.
In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC,
USA, 27 June–2 July 2004; Volume 2.
41. Ben-David, A. A lot of randomness is hiding in accuracy. Eng. Appl. Artif. Intell. 2007, 20, 875–885. [CrossRef]
42. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
Flight Delay Prediction Model Based on Lightweight Network
ECA-MobileNetV3
Jingyi Qu *, Bo Chen, Chang Liu and Jinfeng Wang
Tianjin Key Laboratory of Advanced Signal Processing, Civil Aviation University of China, Tianjin 300300, China
* Correspondence: [email protected]
Abstract: In exploring the flight delay problem, traditional deep learning algorithms suffer from low accuracy and extreme computational complexity, so deep flight delay prediction algorithms are difficult to deploy directly on mobile terminals. In this paper, a flight delay prediction model based on the lightweight network ECA-MobileNetV3 algorithm is proposed. The algorithm first preprocesses real flight information and weather information. Then, in order to increase the accuracy of the model without increasing the computational complexity too much, feature extraction is performed using the lightweight ECA-MobileNetV3 algorithm with the addition of the Efficient Channel Attention mechanism. Finally, the flight delay classification prediction level is output via a Softmax classifier. In the experiments on single-airport and airport-cluster datasets, the optimal accuracy of the ECA-MobileNetV3 algorithm is 98.97% and 96.81%, the numbers of parameters are 0.33 million and 0.55 million, and the computation amounts are 32.80 million and 60.44 million FLOPs, respectively, which are better than the performance of the MobileNetV3 algorithm under the same conditions. The improved model achieves a better balance between accuracy and computational complexity, which is more conducive to mobile deployment.
Keywords: delay prediction model; lightweight neural network; lightweight attention mechanism
Citation: Qu, J.; Chen, B.; Liu, C.; Wang, J. Flight Delay Prediction Model Based on Lightweight Network ECA-MobileNetV3. Electronics 2023, 12, 1434. https://doi.org/10.3390/electronics12061434
Academic Editor: Baris Aksanli
Received: 15 February 2023; Revised: 14 March 2023; Accepted: 16 March 2023; Published: 17 March 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
In recent years, China's air traffic industry has grown rapidly with the implementation of the 13th Five-Year Plan for Civil Aviation [1]. However, while the number of flights continues to grow, the on-time rate of flights keeps falling. During this period, the Civil Aviation Administration carried out total control of flight slots and adjusted the flight structure, and the problem of flight delays was alleviated. According to a report from the Civil Aviation Work Conference 2022 held by the Civil Aviation Administration of China [2], the number of flights has decreased abnormally since 2020 due to the impact of the epidemic, so flight delays during the epidemic are not considered. In addition, China will overtake the United States as the largest air transport market by 2029, according to research from the International Air Transport Association (IATA) [3]. With the COVID-19 epidemic under effective control, the volume of air traffic will also increase rapidly. The speed of air traffic recovery and the projections of international reports therefore reflect the urgent traffic demand of China's air traffic industry. Serious flight delays are likely to trigger "mass incidents of air passengers" [4–6], endangering the public safety of the airport and the personal safety of passengers. Understanding flight delays in advance has thus become a pressing issue for civil aviation. To this end, a large number of studies have been carried out by domestic and foreign scholars in related fields.
The traditional flight delay prediction methods mainly include statistical inference, simulation and modeling, and machine learning methods [7]. Xu et al. [8] proposed a permutation and incremental permutation SVM algorithm considering the demand of flight volume and the real-time refreshing of flight data, and validated it on manual data; the accuracy of flight delay prediction reached more than 80%. Similarly,
Luo's team [9] and Luo's team [10] also gradually considered using support vector machines or improved support vector machines to analyze flight delays. In view of the irregular
dynamic distribution attributes of flight data, Cheng et al. [11] proposed a classification
prediction model of flight delay based on C4.5 decision tree to avoid the impact of flight
distribution changes on the algorithm model, which is a certain improvement compared
with the traditional Bayesian algorithm. Nigam and Govinda [12] analyzed the flight
data and meteorological data of several airports in the United States and used the logistic
regression algorithm in the machine learning algorithm to predict the flight departure
delay. Khanmohammadi et al. [13] proposed a flight delay prediction model based on an
improved Artificial Neural Network (ANN), and they used multiple linear N-1 coding
to preprocess complex airport data models. Wu et al. [14] proposed a flight delay spread
prediction model based on CBAM-CondenseNet, which enhances the transmission of
deep information in the network structure by adopting a channel and spatial attention
mechanism to increase the prediction accuracy. When using deep learning to predict flight delays, these scholars chose relatively deep networks, which require a great deal of computing time and resources, so the flight delay algorithms could only be deployed on PC terminals. For the deployment requirements of mobile terminals, however, these prediction algorithms do not strike a trade-off between accuracy and computational complexity.
Recently, experts at home and abroad have carried out in-depth research and innovation on lightweight models. In the beginning, experts used knowledge distillation, model pruning, and other methods to slim down existing algorithms. The former first trains a Net-T network and then uses network distillation to obtain a smaller Net-S network, thus achieving the effect of a simplified model; the latter simplifies a trained model through channel pruning and other operations [15–19]. Lightweight convolutional neural networks are an emerging branch of deep learning algorithms. This type of network applies lightweight operations to the algorithm itself and continuously innovates the algorithm from within, so that it can maximize accuracy while meeting the computational power constraints of mobile devices. For example, the Megvii team [20] proposed the ShuffleNet series of algorithms, which use the Channel Shuffle and Channel Split operations [21] to speed up the network and reuse features, and the Google team [22] presented the lightweight EfficientNet algorithms, which comprehensively consider the input data size, network depth, and network width and propose a compound model scaling method to control model computing power while ensuring accuracy. Iandola et al. [23] proposed the lightweight SqueezeNet algorithm, which designs the network around a Fire module structure; in this structure, single-layer and double-layer convolutions are used to extract features and lessen model computation. The Google team [24] proposed the MobileNet series of algorithms, which are extremely influential in lightweight neural networks; they use depthwise separable convolution and the SE attention mechanism for feature extraction and combine them with structures such as the inverted residual, which considerably improves the accuracy and computational performance of the model [25,26]. Excellent results have been achieved in face recognition, image classification, target detection, etc. [27–29].
To sum up, in view of the low prediction accuracy and high computational complexity of existing flight delay prediction algorithms, which are not conducive to deployment on mobile and other devices, this paper proposes an improved lightweight ECA-MobileNetV3 algorithm, which replaces the SE module with a lightweight ECA (Efficient Channel Attention) module, effectively reducing the computational complexity of the model without losing accuracy; this lays a foundation for applying the model on mobile devices. The experiments use real domestic meteorological data and flight data for analysis and verification.
The organizational structure of this paper is as follows: Section 1 introduces the
background and significance of the paper, as well as the research status at home and
abroad. Section 2 proposes and introduces the ECA-MobileNetV3 network model. Section 3
introduces the building process of a flight delay prediction model in detail. Section 4
shows the analysis of the experimental results and the application of the model. Section 5
summarizes the work of this paper and describes the future work.
Figure 1. Comparison of the backbone network structure of the two algorithms before and after the
improvement. (a) Backbone network structure of the MobileNetV3. (b) Backbone network structure
of the ECA-MobileNetV3.
its size, the number of channels in the output matrix in each layer of the network can be
changed, so that the model size can be changed quickly.
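The effect of a width multiplier can be sketched as follows. This is a hedged illustration: the `make_divisible` rounding rule and the base channel list are assumptions borrowed from common MobileNet implementations, not taken from this paper.

```python
def make_divisible(channels, divisor=8):
    """Round a scaled channel count to a multiple of `divisor`, never dropping
    below 90% of the unrounded value (a common MobileNet convention)."""
    new_c = max(divisor, int(channels + divisor / 2) // divisor * divisor)
    if new_c < 0.9 * channels:
        new_c += divisor
    return new_c

def scale_channels(base_channels, width_multiplier):
    """Apply a width multiplier to every layer's output-channel count."""
    return [make_divisible(c * width_multiplier) for c in base_channels]

# Hypothetical per-layer channel counts of a small backbone.
base = [16, 24, 40, 80, 112, 160]
half = scale_channels(base, 0.50)   # roughly halves the model's width
```

Scaling every layer this way is what lets a single hyperparameter trade model size against accuracy, as in Tables 4 and 5.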
Table 1. Configuration table of the flight delay prediction model based on the ECA-MobileNetV3 algorithm.
[Figure: the ECA attention module — an H × W × C feature map is globally average-pooled into a 1 × 1 × C descriptor, passed through a Conv1D of adaptive kernel size k = ψ(C) and a Sigmoid function σ, and the resulting 1 × 1 × C weights recalibrate the channels.]

$$W_c = \sigma\left(C1D_k(Z_c)\right) \quad (1)$$

$$k = \psi(C) = \left|\frac{\log_2 C}{\gamma} + \frac{b}{\gamma}\right|_{odd} \quad (2)$$
In Formula (1), $W_c$ represents the channel-acquired weights, $\sigma(\cdot)$ the Sigmoid activation function, $C1D_k(\cdot)$ the adaptive one-dimensional convolution, and $Z_c$ the feature matrix after global average pooling. In Formula (2), $k$ represents the number of local cross-channel interactions, which is the size of the one-dimensional convolutional kernel, $C$ represents the number of channels in the feature matrix, and $\gamma$ and $b$ are constants, set to 2 and 1 in the experiments according to the requirements of the original paper.
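Formula (2)'s adaptive kernel size can be sketched in a few lines; the nearest-odd rounding mirrors the original ECA-Net reference implementation, and the channel counts below are illustrative only.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """k = psi(C): take |log2(C)/gamma + b/gamma| and round it to the
    nearest odd integer, as in the original ECA-Net implementation."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 else t + 1

# Kernel sizes for a few common channel counts.
sizes = {c: eca_kernel_size(c) for c in (64, 128, 256, 512)}
```

Wider layers thus automatically get a slightly larger cross-channel interaction range without any extra parameters.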
$$\text{H-Swish}(x) = x \cdot \frac{\text{ReLU6}(x + 3)}{6} \quad (4)$$

$$\text{Sigmoid}(x) = \frac{1}{1 + e^{-x}} \quad (5)$$
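As a quick sketch, Formulas (4) and (5) can be implemented directly with numpy; ReLU6 is written here in its standard clamp form, which is an assumption not restated in the text.

```python
import numpy as np

def relu6(x):
    # Standard ReLU6 clamp: min(max(x, 0), 6).
    return np.minimum(np.maximum(x, 0.0), 6.0)

def h_swish(x):
    # Formula (4): x * ReLU6(x + 3) / 6.
    return x * relu6(x + 3.0) / 6.0

def sigmoid(x):
    # Formula (5): 1 / (1 + e^-x).
    return 1.0 / (1.0 + np.exp(-x))
```

H-Swish approximates the Swish activation while avoiding the exponential, which is why MobileNetV3-style networks favor it on mobile hardware.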
Through the above description, we can obtain the feature matrix in the calculation process. The convolutional layers can be expressed as Formulas (6) and (7), from which a residual module can be derived after three convolution operations, represented by Formulas (8)–(10):
$$y_j^1 = \sigma\left(BN\left(\sum_k W_{jk}^1 \otimes X_k + b_j^1\right)\right) \quad (8)$$
$$y_j^2 = \sigma\left(BN\left(\sum_k W_{jk}^2 \otimes y_k^1 + b_j^2\right)\right) \quad (9)$$

$$y_j^3 = \sigma\left(BN\left(\sum_k W_{jk}^3 \otimes y_k^2 + b_j^3\right)\right) \quad (10)$$
where $W_{jk}^l$ represents the weight of the $k$-th feature to the $j$-th feature in layer $l-1$, $b_j^l$ represents the bias of the $j$-th feature in layer $l$, $z_j^l$ represents the output value before the $k$-th feature in layer $l$ passes the activation function, $\sigma(\cdot)$ represents the activation function, and $y_k^{l-1}$ represents the mapping value of the $k$-th feature in layer $l-1$ after the activation function.
In addition, after the feature matrix enters the inverted residual module and passes through the traditional convolutional layer and the deep convolutional layer, it enters the ECA module, which lies between the deep convolution and the pointwise convolution. As a complete calculation unit for acquiring channel weights, its forward propagation process is shown in Formula (11):
$$y_j = y_k^{conv} \otimes \text{sigmoid}\left(C1D\left(GAP\left(y_k^{conv}\right)\right)\right) \quad (11)$$
where $y_k^{conv}$ represents the feature matrix after the deep convolution operation, and the second half of the formula represents the feature weights acquired through the ECA module.
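A minimal numpy sketch of the forward pass in Formula (11) follows; the uniform 1D kernel and edge padding are placeholder assumptions standing in for the learned convolution weights, so this illustrates the data flow rather than a trained module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def eca_forward(feature_map, kernel_size=3):
    """Formula (11) sketch for a (H, W, C) feature map:
    GAP -> 1D conv across channels -> sigmoid -> channel-wise rescaling."""
    gap = feature_map.mean(axis=(0, 1))               # (C,) channel descriptor
    pad = kernel_size // 2
    padded = np.pad(gap, pad, mode="edge")            # keep output length C
    kernel = np.full(kernel_size, 1.0 / kernel_size)  # placeholder for learned weights
    weights = sigmoid(np.convolve(padded, kernel, mode="valid"))
    return feature_map * weights                      # broadcast over H and W
```

Unlike the SE module's two fully connected layers, the only learnable tensor here is the length-$k$ kernel, which is where the ECA module's parameter savings come from.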
$$\delta_j^l = \frac{\partial J}{\partial z_j^l} = \frac{\partial J}{\partial y_j^l}\,\sigma\left(z_j^l\right) = BN\left(\sum_k W_{jk}^{l+1} \otimes \delta_k^{l+1} + b_j^l\right) \otimes \sigma\left(z_j^l\right) \quad (12)$$

$$\frac{\partial J}{\partial W_{jk}^1} = \left(\delta_j^4 \otimes W_j^4 + \delta_j^3 \otimes W_j^3 + \delta_j^2 \otimes W_j^2\right) \otimes y_j \quad (13)$$

$$\frac{\partial J}{\partial b_{jk}^1} = \delta_j^4 \otimes W_j^4 + \delta_j^3 \otimes b_j^3 + \delta_j^2 \otimes b_j^2 \quad (14)$$
where $J$ represents the loss function, $\delta_j^l$ represents the error value of the $j$-th eigenvalue in layer $l$, $W_{jk}^{l+1}$ represents the weight of neurons from the $k$-th to the $j$-th feature in layer $l+1$, and $\otimes$ represents the multiplication between matrices. $\delta_j^2$, $\delta_j^3$, and $\delta_j^4$, respectively, represent the error terms between the traditional convolutional layer, the deep convolutional layer, the ECA module, and the point-by-point convolutional layer. According to Formulas (13) and (14), the weights and biases can be updated from back to front, respectively.
[Figure: overall structure of the flight delay prediction model — data preprocessing (flight data fusion and encoding), feature extraction through Conv2d layers and stacked inverted residual modules with ECA attention (1 × 1 Conv, 3 × 3 DW, ECA Attention, 1 × 1 PW), and flight delay level output through global average pooling, fully connected layers, and Softmax.]
and matrixization. For the Shanghai Hongqiao Airport dataset, the flights of Shanghai Hongqiao Airport were extracted from the original dataset by planned departure airport and planned arrival airport according to the civil aviation four-character airport code "ZSSS" (Shanghai Hongqiao International Airport). Similarly, for the Beijing–Tianjin–Hebei airport cluster dataset, flights from the major airports in Beijing, Tianjin, and Shijiazhuang were extracted according to the four-character codes "ZBAA" (Beijing Capital International Airport), "ZBAD" (Beijing Daxing International Airport), "ZBTJ" (Tianjin Binhai International Airport), and "ZBSJ" (Shijiazhuang Zhengding International Airport), and then the subsequent data preprocessing was carried out.
The first step in data preprocessing is data cleaning: removing attribute columns with many nulls, duplicate data, and irrelevant attributes, among other operations. The second step is data fusion: the time attribute in the meteorological data is set as association primary key I, the planned departure time and planned landing time in the flight data are set as association primary key II according to the airport ID, and the two primary keys are then associated and fused. In order to enhance the data, 10 min meteorological information is fused in this paper to enlarge the feature information of the fused data. The third step is data encoding: considering that the categorical data in the dataset contain both low-cardinality and high-cardinality attributes as well as numerical attributes, the mixed encoding of Min–Max scaling [35] and CatBoost encoding [36] is adopted, so that the data remain in the same dimensional range before being input into the algorithm and no dimensional explosion occurs. The fourth step is the data matrix operation: since the MobileNetV3 algorithm is a convolutional neural network, its input data must be in matrix form, so the dataset is converted from vector form to matrix form before being input into the algorithm in order to meet the input requirements.
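The scaling and matrixization steps above can be sketched as follows. This is a hedged illustration: the 64-feature sample and the `(8, 8)` target shape are illustrative, and the CatBoost target encoding of high-cardinality categories is omitted.

```python
import numpy as np

def min_max_scale(col):
    """Min-Max scaling of one numeric column into [0, 1]."""
    col = np.asarray(col, dtype=float)
    span = col.max() - col.min()
    return (col - col.min()) / span if span else np.zeros_like(col)

def to_matrix(row, shape=(8, 8)):
    """Step 4 (matrixization): reshape one encoded feature vector into the
    matrix form the convolutional network expects."""
    return np.asarray(row, dtype=float).reshape(shape)

# Hypothetical fused flight + weather sample with 64 encoded features.
sample = min_max_scale(np.arange(64))
matrix = to_matrix(sample)   # shape (8, 8), matching the single-airport setup
```

The 72-feature airport-cluster samples would be reshaped to `(8, 9)` in the same way.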
can extract features at different levels. During this period, the ECA-MobileNetV3 network uses the ECA module and feature fusion technology to fuse feature maps at different levels and improve their expression ability. The feature map then passes through a global average pooling layer, which reduces the feature map to a vector. Finally, a fully connected classifier maps this feature vector to the target category to complete the classification task.
Flight Delay Grade | Flight Delay Time T (minutes) | Hongqiao Airport | Beijing–Tianjin–Hebei Airport Cluster
0 (No delay) | T ≤ 15 | 242,873 | 898,033
1 (Mild delay) | 15 < T ≤ 60 | 34,388 | 91,362
2 (Moderate delay) | 60 < T ≤ 120 | 14,904 | 32,053
3 (Highly delayed) | 120 < T ≤ 240 | 7379 | 16,932
4 (Heavy delay) | T > 240 | 2049 | 10,195
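The grading rule in the table above can be expressed as a small helper; the function name is illustrative, not from the paper.

```python
def delay_grade(delay_minutes):
    """Map a flight delay T (in minutes) to the five-level grade above."""
    if delay_minutes <= 15:
        return 0  # no delay
    elif delay_minutes <= 60:
        return 1  # mild delay
    elif delay_minutes <= 120:
        return 2  # moderate delay
    elif delay_minutes <= 240:
        return 3  # highly delayed
    return 4      # heavy delay

grades = [delay_grade(t) for t in (10, 30, 90, 200, 300)]  # → [0, 1, 2, 3, 4]
```

The per-grade counts in the table also show the strong class imbalance (most flights fall in grade 0) that the classifier must cope with.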
The flight delay prediction algorithm then uses a Softmax classifier to determine the flight delay level. The Softmax function is a commonly used activation function for the final output of multi-classification problems; it maps a $q$-dimensional vector to a $q$-dimensional probability distribution in which each element represents the probability of the corresponding category. The classifier can therefore compute a probability value for each delay level, and the highest value is used as each datum's final result. The original Softmax formula, where $x_i$ represents the $i$-th input value, $q$ represents the number of categories, and $j$ indexes the categories, is shown in (15); the complete Softmax classifier formula is shown in (16):
$$\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{q} e^{x_j}} \quad (15)$$
$$h_\theta\left(x\right) = \begin{bmatrix} p\left(y^{(i)}=1 \mid x^{(i)};\theta\right) \\ p\left(y^{(i)}=2 \mid x^{(i)};\theta\right) \\ p\left(y^{(i)}=3 \mid x^{(i)};\theta\right) \\ \vdots \\ p\left(y^{(i)}=q \mid x^{(i)};\theta\right) \end{bmatrix} = \frac{1}{\sum_{j=1}^{q} e^{\theta_j^T x^{(i)}}} \begin{bmatrix} e^{\theta_1^T x^{(i)}} \\ e^{\theta_2^T x^{(i)}} \\ e^{\theta_3^T x^{(i)}} \\ \vdots \\ e^{\theta_q^T x^{(i)}} \end{bmatrix} \quad (16)$$
Among them, $h_\theta(x)$ is the final output of the flight delay prediction model, $\theta$ is the optimal parameter obtained by the model, $i$ represents the serial number of the data sample, and $q$ represents the number of flight delay levels.
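As a small illustration of how the classifier in Formulas (15) and (16) picks a delay level, here is a numpy sketch; the logits are hypothetical, and the max-shift is a standard numerical-stability trick not stated in the paper.

```python
import numpy as np

def softmax(x):
    """Formula (15), with the standard max-shift for numerical stability."""
    z = np.asarray(x, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical scores for the five delay levels of one flight.
logits = np.array([2.0, 1.0, 0.5, 0.2, -1.0])
probs = softmax(logits)
predicted_level = int(np.argmax(probs))  # highest probability wins
```

The output vector sums to one, so each entry can be read directly as the predicted probability of the corresponding delay grade.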
4. Experimental Results and Analysis
4.1. Experimental Environment and Model Parameter Configuration
The computer used in this paper was configured as follows: the processor is an Intel Xeon E5-1620 with a CPU frequency of 3.60 GHz; the memory is 16.004 GB; the OS is Ubuntu 16.04; the graphics accelerator is a GeForce GTX TITAN Xp; and the deep learning framework is TensorFlow 2.3.0. The sample size of the Shanghai Hongqiao Airport dataset used in the experiments is 301,089, with 64 feature attributes and a size of 8 × 8 after matrixization; the sample size of the Beijing–Tianjin–Hebei airport cluster dataset is 1,650,797, with 72 feature attributes and a size of 8 × 9 after matrixization. The specific experimental parameter configurations used to train the model are shown in Table 3 below.
$$\text{Accuracy} = \frac{\sum C}{N} \quad (17)$$

where $\sum C$ denotes the number of correctly classified samples and $N$ the total number of samples.
Computational complexity describes the hardware consumption at runtime: the higher the complexity, the more memory is occupied and the longer the processing time required. It is divided into space complexity and time complexity. Space complexity is expressed in terms of the number of parameters; the parameter counts of a single convolutional layer and a single fully connected layer can be approximated as Formulas (18) and (19). Time complexity is expressed in computation amount, which can be understood as the number of FLOPs (Floating Point Operations); the computation amounts of a single convolutional layer and a single fully connected layer can be approximated as Formulas (20) and (21).
$$P_C = D_K \times D_K \times C_F \times N_K \quad (18)$$

$$P_Q = D_F \times D_F \times C_F \times N_K \quad (19)$$

$$F_C = D_F \times D_F \times C_F \times N_K \times D_K \times D_K \quad (20)$$
$$F_Q = D_F \times D_F \times C_F \times N_K \times 1 \times 1 \quad (21)$$
In Formulas (18) and (19), $P_C$ and $P_Q$ are the numbers of parameters of a single convolutional layer and a single fully connected layer, respectively, $D_K$ is the convolutional kernel size of the current layer, $C_F$ is the number of input feature channels of the current layer, $N_K$ is the number of output feature channels of the current layer, and $D_F$ is the input feature size of the current layer. In Formulas (20) and (21), $F_C$ and $F_Q$ are the computation amounts of a single convolutional layer and a single fully connected layer, respectively; the factor 1 represents the output feature size of the fully connected layer, and the other parameters have the same meanings as in the parameter-count formulas.
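Formulas (18)–(21) can be checked with a few lines of Python; the layer dimensions in the example are hypothetical, chosen only to show the arithmetic.

```python
def conv_params(dk, cf, nk):
    """Formula (18): parameters of one convolutional layer."""
    return dk * dk * cf * nk

def fc_params(df, cf, nk):
    """Formula (19): parameters of one fully connected layer."""
    return df * df * cf * nk

def conv_flops(df, dk, cf, nk):
    """Formula (20): FLOPs of one convolutional layer."""
    return df * df * cf * nk * dk * dk

def fc_flops(df, cf, nk):
    """Formula (21): FLOPs of one fully connected layer."""
    return df * df * cf * nk * 1 * 1

# Hypothetical 3 x 3 convolution on an 8 x 8 input, 16 -> 32 channels.
assert conv_params(3, 16, 32) == 4608        # 3*3*16*32
assert conv_flops(8, 3, 16, 32) == 294912    # 8*8*16*32*3*3
```

Summing these per-layer counts over the whole network is how the Params (M) and FLOPs (M) columns in the comparison tables are obtained.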
Table 4. Comparison table of accuracy and loss values for different-width multipliers on Shanghai
Hongqiao Airport dataset.
Width Multiplier | MobileNetV3 Accuracy | MobileNetV3 Loss Value | ECA-MobileNetV3 Accuracy | ECA-MobileNetV3 Loss Value
0.50 | 98.00% | 0.0716 | 98.41% | 0.0675
0.75 | 98.53% | 0.0553 | 98.97% | 0.0445
1.00 | 98.87% | 0.0419 | 98.90% | 0.0449
Based on the dataset of Shanghai Hongqiao Airport, the accuracy and loss curves of the MobileNetV3 algorithm and ECA-MobileNetV3 algorithm under different channel factors are presented in Figures 5 and 6, respectively. According to the trend of the curves, at different channel factors the accuracy gently increases while the loss value gently decreases. The loss values and accuracies of MobileNetV3 and ECA-MobileNetV3 tend to stabilize when the number of training rounds is around 300. From the experimental results, the MobileNetV3 algorithm reaches its lowest loss value of about 0.0419 and its highest accuracy of 98.87% when the channel factor is 1.00. When the channel factor is 0.75, the ECA-MobileNetV3 algorithm reaches its lowest loss value of about 0.0445 and its highest accuracy of 98.97%. Compared with the MobileNetV3 algorithm, the accuracy of the ECA-MobileNetV3 algorithm with the attention mechanism is slightly improved, while the loss value is slightly increased.
Based on the Beijing–Tianjin–Hebei airport cluster dataset, according to Table 5, from
the longitudinal analysis, as the channel factor becomes larger, the accuracy of the two
algorithms gradually increases and the loss value gradually decreases. Further, the accuracy
rates of the MobileNetV3 algorithm and ECA-MobileNetV3 algorithm reach the highest
when the channel factor is 1.00, and the accuracy rate of the MobileNetV3 algorithm reaches
96.60%; the accuracy rate of the ECA-MobileNetV3 algorithm reaches 96.81%. From a cross-
sectional perspective, the accuracy of the ECA-MobileNetV3 algorithm is slightly lower
than that of the MobileNetV3 algorithm at channel factor numbers of 0.50 and 0.75, and the
accuracy of the improved algorithm is 0.18% lower than that before the improvement at a
channel factor of 0.50. At a channel factor of 1.00, the accuracy of the ECA-MobileNetV3
algorithm is slightly higher than that of the MobileNetV3 algorithm, and the accuracy
of the improved algorithm is 0.21% higher than that before the improvement. Therefore,
on the whole, the improved ECA-MobileNetV3 algorithm has a minor loss in accuracy
and still has some advantages in a multi-airport-associated cluster dataset such as the
Beijing–Tianjin–Hebei airport cluster dataset.
Figure 5. Comparison of loss values and accuracy for different-width multipliers based on the
MobileNetV3 algorithm on Shanghai Hongqiao Airport dataset. (a) Accuracy comparison of different-
width multipliers. (b) Loss value comparison of different-width multipliers.
Figure 6. Comparison of loss values and accuracy for different-width multipliers based on the
ECA-MobileNetV3 algorithm on Shanghai Hongqiao Airport dataset. (a) Accuracy comparison of
different-width multipliers. (b) Loss value comparison of different-width multipliers.
Based on the Beijing–Tianjin–Hebei airport cluster dataset, the accuracy and loss
curves of the MobileNetV3 algorithm and ECA-MobileNetV3 algorithm under different
channel factors are, respectively, presented in Figures 7 and 8. According to the trend
of the curves, at different channel factors, the accuracy rate gently increases while the
loss value gently decreases. The loss values and accuracies of MobileNetV3 and ECA-
MobileNetV3 tend to stabilize when the number of training rounds is around 150. From
the experimental results, the MobileNetV3 algorithm has a loss value of about 0.0819 when
the channel factor is 1.00. The highest accuracy was 96.60%. When the channel factor is
1.00, the lowest loss value of the ECA-MobileNetV3 algorithm is about 0.0813, and the
highest accuracy is 96.81%. Compared with the MobileNetV3 algorithm, the accuracy of
the ECA-MobileNetV3 algorithm with an attention mechanism is slightly improved, while
the loss value is slightly decreased.
Table 5. Comparison table of accuracy and loss values for different width multipliers on Beijing–
Tianjin–Hebei airport cluster dataset.
Width Multiplier | MobileNetV3 Accuracy | MobileNetV3 Loss Value | ECA-MobileNetV3 Accuracy | ECA-MobileNetV3 Loss Value
0.50 | 96.40% | 0.0932 | 96.22% | 0.1049
0.75 | 96.56% | 0.0871 | 96.55% | 0.0878
1.00 | 96.60% | 0.0819 | 96.81% | 0.0813
Figure 7. Comparison of loss values and accuracy for different-width multipliers based on the
MobileNetV3 algorithm on Beijing–Tianjin–Hebei airport cluster dataset. (a) Accuracy comparison of
different-width multipliers. (b) Loss value comparison of different-width multipliers.
Figure 8. Comparison of loss values and accuracy for different-width multipliers based on the ECA-
MobileNetV3 algorithm on Beijing–Tianjin–Hebei airport cluster dataset. (a) Accuracy comparison of
different-width multipliers. (b) Loss value comparison of different-width multipliers.
Parameters, FLOPs, and accuracy for different width multipliers on the Shanghai Hongqiao Airport dataset:

Width Multiplier   MobileNetV3 Params (M)   MobileNetV3 FLOPs (M)   MobileNetV3 Accuracy   ECA-MobileNetV3 Params (M)   ECA-MobileNetV3 FLOPs (M)   ECA-MobileNetV3 Accuracy
0.50               0.29                     16.43                   98.00%                 0.17                         16.21                       98.41%
0.75               0.60                     33.31                   98.53%                 0.33                         32.80                       98.97%
1.00               1.01                     54.66                   98.87%                 0.55                         53.76                       98.90%

Parameters, FLOPs, and accuracy for different width multipliers on the Beijing–Tianjin–Hebei airport cluster dataset:

Width Multiplier   MobileNetV3 Params (M)   MobileNetV3 FLOPs (M)   MobileNetV3 Accuracy   ECA-MobileNetV3 Params (M)   ECA-MobileNetV3 FLOPs (M)   ECA-MobileNetV3 Accuracy
0.50               0.29                     18.44                   96.40%                 0.17                         18.22                       96.22%
0.75               0.60                     37.39                   96.56%                 0.33                         36.88                       96.55%
1.00               1.01                     61.35                   96.60%                 0.55                         60.44                       96.81%
The experimental findings on the aforementioned two datasets make it clear that the
relationship between the computational cost and the accuracy of the proposed model is
not simply linear. Algorithms can be found that better balance the accuracy and
computational complexity of the model, which is also the direction of effort in
lightweight neural networks. According to the computing conditions of different mobile
devices, flight delay prediction models of different sizes can be matched to maximize
model utilization.
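The device-matching idea above can be sketched in a few lines. The helper below is hypothetical (not from the paper); it uses the ECA-MobileNetV3 FLOPs and accuracy figures reported for the Beijing–Tianjin–Hebei dataset and simply picks the largest variant that fits a device's compute budget.

```python
# Hypothetical helper (not from the paper): choose the most accurate
# ECA-MobileNetV3 variant whose FLOPs fit a device's compute budget.
# Tuples: (width multiplier, FLOPs in millions, accuracy) on the
# Beijing-Tianjin-Hebei airport cluster dataset.
VARIANTS = [
    (0.50, 18.22, 0.9622),
    (0.75, 36.88, 0.9655),
    (1.00, 60.44, 0.9681),
]

def pick_variant(flops_budget_m):
    """Return the most accurate variant within the FLOPs budget, or None."""
    fitting = [v for v in VARIANTS if v[1] <= flops_budget_m]
    return max(fitting, key=lambda v: v[2]) if fitting else None
```

For example, a device budgeted at 40 MFLOPs would be matched with the 0.75-width model.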
In this regard, this paper verifies the single airport and airport group datasets, compares the ECA-
MobileNetV3_1.00 model with the traditional ResNet [38], DenseNet [39] algorithm, and
MobileNetV2 algorithm under the same channel factor and analyzes it from the following
three aspects. ResNet and DenseNet are models that have been trained and widely used in
large-scale datasets and have achieved good results in many computer vision and natural
language processing tasks. Therefore, they are very representative models and can be used
as benchmarks for other models. MobileNetV2, as a leading lightweight model, has
been widely used in many mobile device applications. MobileNetV2 is also the predecessor of
MobileNetV3, so this comparison can verify whether the improvement in ECA-MobileNetV3 is effective.
Taking MobileNetV2 as a comparative test can also provide reference and inspiration for
more lightweight model design. The results are shown in Table 8.
prediction model studied in this paper. The subsequent application direction will focus
on the advantages of light weight. The lightweight model has the characteristics of fast
prediction speed, less demand for computing resources, higher real-time performance, and
portability. Therefore, this model can be deployed on some low-power devices, such as
mobile devices and sensors. Such a deployment can quickly process input data, promptly
update forecast results, and provide real-time information that helps airlines and their
operating bases plan and manage flight missions.
5. Conclusions
This paper studies the lightweight neural network MobileNetV3 algorithm and the
improved ECA-MobileNetV3 algorithm. Taking the Shanghai Hongqiao Airport dataset
and the Beijing–Tianjin–Hebei airport cluster dataset as examples for model analysis and
practical application, the following conclusions are drawn: The algorithm proposed
in this paper can effectively reduce the computational complexity in the model without
loss of accuracy or with a small loss of accuracy by replacing the SE module in the original
MobileNetV3 algorithm with a lightweight ECA attention mechanism module. Compared
with the ResNet algorithm, DenseNet algorithm, and MobileNetV2 algorithm under the
same channel factor, the improved ECA-MobileNetV3 algorithm has more advantages in
computational complexity and accuracy. The flight delay prediction model based on the
ECA-MobileNetV3 algorithm has the advantage of light weight compared with the flight
delay prediction models that have been deployed to date. The lightweight flight delay model
brings faster execution speed, lower demand for computing resources, higher real-time
performance, and greater flexibility and portability, which lays the foundation for subsequent
deployment on mobile terminals and other platform devices and provides better service
and a better experience for airlines, airports, and passengers. However, there is still a lot
of room for improvement in the process of research in this paper. On the one hand, the
number of flight samples with different delay levels is quite different, which will affect the
accuracy of flight prediction. It is necessary to consider the impact of sample imbalance
on model training. On the other hand, the problem of flight delay is time-varying, and
the prediction model needs to be updated at any time. The next step is to explore how to
achieve a real-time update of the model and improve the practicability of the model.
Author Contributions: Methodology, J.Q.; validation, B.C. and C.L.; investigation, J.Q.; writing—
original draft preparation, B.C., C.L. and J.W.; writing—review and editing, B.C. All authors have
read and agreed to the published version of the manuscript.
Funding: This research was funded by the Tianjin Municipal Education Commission Scientific
Research Program, grant number 2022ZD006, and the Fundamental Research Funds for the Central
Universities, grant number 3122019185.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Five Major Development Tasks Clearly Clarified by the 13th Five-Year Plan for Civil Aviation. Available online: http://caacnews.
com.cn/1/1/201612/t20161222_1207217.html (accessed on 22 December 2016).
2. Report on the 2022 National Civil Aviation Work Conference. Available online: http://www.caac.gov.cn/XWZX/MHYW/202201
/t20220110_210827.html (accessed on 10 January 2022).
3. IATA: The Global Air Passenger Volume Will Reach 8.2 Billion Person-Times in 2037. Available online: https://www.ccaonline.
cn/news/top/461390.html (accessed on 30 October 2018).
4. Zhang, Y. Study on Group Time Countermeasures caused by Abnormal Flights-Take CEA as An Example. Master’s Thesis, East
China University of Political Science and Law, Shanghai, China, 2018.
5. Zhang, M. Research on Optimization of Countermeasures for Handling Mass Incidents of Passengers caused by Flight Delays.
Master’s Thesis, Shandong University of Finance and Economics, Jinan, China, 2016.
6. Li, X.; Liu, G.C.; Yan, M.C.; Zhang, W. Economic losses of airlines and passengers caused by flight delays. Syst. Eng. 2007, 25,
20–23.
7. Liu, B.; Ye, B.J.; Tian, Y. Overview of flight delay prediction methods. Aviat. Comput. Technol. 2019, 49, 124–128.
8. Xu, T.; Ding, J.L.; Gu, B.; Wang, J.D. Airport flight delay warning based on the incremental arrangement of support vector
machines. Aviat. J. 2009, 30, 1256–1263.
9. Luo, Q.; Zhang, Y.H.; Cheng, H.; Li, C. Model of hub airport flight delay based on aviation information network. Syst. Eng. Theory
Pract. 2014, 34, 143–150.
10. Luo, Z.S.; Chen, Z.J.; Tang, J.H.; Zhu, Y.W. A flight delay prediction study using SVM regression. Transp. Syst. Eng. Inf. 2015, 15,
143–149, 172.
11. Cheng, H.; Li, Y.M.; Luo, Q.; Li, C. Study on Prediction of arrival flight delay based on C4.5 decision tree method. Syst. Eng.
Theory Pract. 2014, 34, 239–247.
12. Nigam, R.; Govinda, K. Cloud Based Flight Delay Prediction using logistic Regression. In Proceedings of the 2017 International
Conference on Intelligent Sustainable Systems (ICISS), Palladam, India, 7–8 December 2017.
13. Khanmohammadi, S.; Tutun, S.; Kucuk, Y. A new multilevel input layer artificial neural network for predicting flight delays at
JFK airport. Procedia Comput. Sci. 2016, 95, 237–244. [CrossRef]
14. Wu, R.B.; Zhao, Y.Q.; Qu, J.Y. Flight delay spread prediction model based on CBAM-CondenseNet. J. Electron. Inf. Technol. 2021,
43, 187–195.
15. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. Comput. Sci. 2015, 14, 38–39.
16. Vongkulbhisal, J.; Vinayavekhin, P.; Visentini-Scarzanella, M. Unifying Heterogeneous Classifiers with Distillation. In Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach,
CA, USA, 15–20 June 2019.
17. Liu, Z.; Li, J.G.; Shen, Z.; Zhang, C.M. Learning Efficient Convolutional Networks Through Network Slimming. In Proceedings
of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
18. He, Y.; Zhang, X.; Sun, J. Channel Pruning for Accelerating Very Deep Neural Networks. In Proceedings of the IEEE International
Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
19. Luo, J.H.; Wu, J.X.; Lin, W.Y. Thinet: A Filter Level Pruning Method for Deep Neural Network Compression. In Proceedings of
the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
20. Zhang, X.; Zhou, X.; Lin, M. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
21. Ma, N.N.; Zhang, X.Y.; Zheng, H.T. ShuffleNetV2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of
the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
22. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946.
23. Iandola, F.N.; Han, S.; Moskewicz, M.W. SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and <0.5 MB Model
Size. arXiv 2016, arXiv:1602.07360.
24. Howard, A.G.; Zhu, M.; Chen, B. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.
arXiv 2017, arXiv:1704.04861.
25. Sandler, M.; Howard, A.; Zhu, M. MobileNetV2: Inverted Residuals and Linearbottlenecks. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
26. Howard, A.; Sandler, M.; Chu, G. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on
Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–3 November 2019.
27. Cai, Q.J.; Peng, C.; Shi, X.W. Lightweight face recognition algorithm based on MobileNetV2. Comput. Appl. 2020, 40, 65–68.
28. Qi, Y.K. Lightweight algorithm for pavement obstacle detection based on MobileNet and YOLOv3. Comput. Syst. Appl. 2022, 31,
176–184.
29. Hu, J.L.; Shi, Y.P.; Xie, S.Y.; Chen, P. Improved MobileNet face recognition system based on Jetson Nano. Sens. Microsyst. 2021, 40,
102–105.
30. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
31. Wang, Q.; Wu, B.; Zhu, P. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020.
32. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
[CrossRef]
33. Qu, J.; Zhao, T.; Ye, M. Flight delay prediction using deep convolutional neural network based on fusion of meteorological data.
Neural Process. Lett. 2020, 52, 1461–1484. [CrossRef]
34. Quality Controlled Local Climatological Data. Available online: https://www.ncdc.noaa.gov/orders/qclcd/ (accessed on
13 February 2019).
35. Cao, L. Research on Flight Delay Prediction and Visualization method based on CliqueNet. Master’s Thesis, Civil Aviation
University of China, Tianjin, China, 2020.
36. Prokhorenkova, L.; Gusev, G.; Vorobev, A. CatBoost: Unbiased Boosting with Categorical Features. Adv. Neural Inf. Process. Syst.
2018, 31, 6638–6648.
37. Flight Normal Management Regulations. Available online: https://xxgk.mot.gov.cn/2020/jigou/fgs/202006/t20200623_3307796.
html (accessed on 24 March 2016).
38. He, K.; Zhang, X.; Ren, S. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016.
39. Huang, G.; Liu, Z.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
electronics
Article
Machine Learning-Based Prediction of Orphan Genes and
Analysis of Different Hybrid Features of Monocot and
Eudicot Plants
Qijuan Gao 1, Xiaodan Zhang 2, Hanwei Yan 3 and Xiu Jin 3,*
Abstract: Orphan genes (OGs) may evolve from noncoding sequences or be derived from older
coding material. A share of OGs is present in all sequenced genomes, participating in the
biochemical and physiological pathways of many species, while many of them may be associated with
the response to environmental stresses and species-specific traits or regulatory patterns. However,
identifying OGs is a laborious and time-consuming task. This paper presents an automated predictor,
XGBoost-A2OGs (identification of OGs for angiosperm based on XGBoost), used to identify OGs
for seven angiosperm species based on hybrid features and XGBoost. The precision and accuracy
of the proposed model based on fivefold cross-validation and independent testing reached 0.90
and 0.91, respectively, outperforming other classifiers in cross-species validation via other models,
namely, Random Forest, AdaBoost, GBDT, and SVM. Furthermore, by analyzing and subdividing the
hybrid features into five sets, it was proven that different hybrid feature sets influenced the prediction
performance of OGs involving eudicot and monocot groups. Finally, testing of small-scale empirical
datasets of each species separately based on optimal hybrid features revealed that the proposed
model performed better for eudicot groups than for monocot groups.

Keywords: orphan genes (OGs); hybrid features; machine learning; angiosperm

Citation: Gao, Q.; Zhang, X.; Yan, H.; Jin, X. Machine Learning-Based Prediction of Orphan Genes and Analysis of Different Hybrid Features of Monocot and Eudicot Plants. Electronics 2023, 12, 1433. https://doi.org/10.3390/electronics12061433

Academic Editors: Chao Zhang, Wentao Li, Huiyan Zhang and Tao Zhan

Received: 16 February 2023; Revised: 11 March 2023; Accepted: 12 March 2023; Published: 17 March 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Monocotyledonous and eudicotyledonous plants (monocots and eudicots) have morphological differences in the number and arrangement of their embryonic leaves. These are typically parallel in monocots and reticulate in eudicots; besides, monocots have a sheathing leaf base encircling the stem. Monocots diverged from their eudicot relatives in angiosperm evolution derived from the whole genome duplication (WGD), which contributed to increased diversification, environmental adaptation, and genomic novelty [1].

In the evolutionary process, orphan genes (OGs) can arise in a lineage and are prevalently expressed in many organisms [2]. In particular, taxonomically restricted OGs are widely distributed in angiosperm species, including eudicot and monocot groups, such as Arabidopsis thaliana, Populus trichocarpa, Citrus sinensis, Triticum aestivum, Oryza sativa, cowpea, Camellia sinensis, and Saccharum spontaneum [3–10]. Numerous studies of OGs have identified general trends in the sequence features of OGs across different species, including gene length, GC content, and introns, which are also vital for environmental adaptation, including biotic and abiotic stress [11–13]. Specifically, the OG Qua-Quine Starch (QQS) in Arabidopsis thaliana is known to regulate the ratio of protein and starch carbon. Being transferred and expressed in other species, QQS has been reported to change the metabolic process by regulating the allocation of carbon and nitrogen in proteins and carbohydrates and affecting the compounds in seeds and leaves, consequently improving crop yields [14].
Previous studies also revealed that OGs play a vital role in response to drought stress in
cowpea and Fusarium resistance in Triticum aestivum [6,7].
OGs have usually been identified through BLAST (Basic Local Alignment Search Tool)
sequence alignment, involving genome and transcriptome sequences for all analysis pro-
cesses, including BLASTP, BLASTN, TBLASTX, and so on [15]. However, this method is
time-consuming and requires considerable server-driven resources to identify OGs. Alter-
natively, OGs can be distinguished from nonorphan genes (NOGs), e.g., protein-coding
genes, by more significant differences in gene length, exon number, GC content, and ex-
pression level [11]. Their analysis and further classification can be facilitated via machine
learning-based methods, which have already been successfully applied to classifying bio-
logical datasets and solving various discrimination problems. Thus, such ensemble learning
methods as Gradient Boosting Decision Tree (GBDT), Random Forest, and Adaboost have
been used for biological prediction based on genome datasets. In particular, Zhu et al. used
GBDT to classify tissue and cell types in cancer samples using a gene expression dataset,
which performed similarly to other machine learning methods [16]. In contrast, the Extreme
Gradient Boost (XGBoost) method adopted by Chen and Guestrin [17] outperformed nu-
merous machine learning methods and found wide applications in data mining, regression,
and classification domains. In addition, Gao et al. have used an effective model named
SMOTE-ENN-XGBoost to predict the OGs of A. thaliana [18]. However, to the best of the
authors’ knowledge, such an approach has not yet been applied to predicting the OGs
of different types of plant species.
In this study, OGs were measured by taking into account sequence features, which
share some characteristics of other angiosperm species (shorter sequence length, fewer
exon numbers, and lower GC content), while having fewer transcript support and lower
expression than NOGs [12]. Then, these protein features were extracted, and the XGBoost-
A2OG model was constructed and applied to the prediction of OGs in angiosperm species.
2. Related Works
Recently, machine learning methods have received considerable interest in the identification
of OGs, which are an important source of genetic novelty and contribute to evolutionary
innovations. These methods include the decision tree (DT) [19], neural network
(NN) [19], Convolutional Neural Network (CNN) with transformer [20], and ensemble
learning methods [20]. Besides, many studies have been conducted to compare different
machine learning algorithms or to combine them with other methods to improve the
performance of OG identification.
Gao et al. proposed a novel ensemble method to predict the OGs of A. thaliana
in bioinformatics studies. Then, another deep learning method, a CNN with a transformer,
was successfully applied to identifying OGs in moso bamboo, combining a convolutional
neural network with a transformer network over protein sequences [19]. This approach
provides better performance for a specific species.
In addition, decision trees and neural networks were employed to improve the accurate
discovery of OGs by Casola et al. relying on basic sequence features obtained from DNA
and protein sequences in three angiosperm families. The experimental results showed that
both DT and NN classifiers achieve high levels of accuracy and recall in identifying OGs.
Recently, many studies have confirmed that OGs generated de novo in a species
may be more prevalent than gene duplication and be one of the main ways of orphan
generation [21–25]. Some researchers have found that in the newly evolved OGs in Ara-
bidopsis, protein length is usually shorter, mainly due to the evolution of the orphan gene
having fewer exons in the process, while in some species, the exon length is significantly
shorter [26,27].
However, these studies have not focused on different families of angiosperm plants.
To find a general method for identifying OGs in a large number of plants, building on
the rapid accumulation of genomic data, we analyzed features of the genome and
protein sequences that may affect the results of the classification process.
Electronics 2023, 12, 1433
The obtained 9022 OGs and 392,812 NOGs were labeled 1 and 0, respectively, to
thoroughly train the ensemble learning model. All of them were combined to form the
OG datasets of the five plant species. Then, we extracted the characteristics of gene
structure, cDNA sequence, and protein-coding genes of all five species from Phytozome
and Ensembl Plants, databases containing highly annotated plant genomes.
For the XGBoost model, the objective at iteration t is approximated by a second-order Taylor expansion:

obj^{(t)} = \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + \text{constant} \quad (2)

Each tree maps a sample to a leaf score,

f_t(x) = w_{q(x)} \quad (3)

where q assigns a sample to one of the T leaves, and the regularization term is

\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \quad (4)

where T is the number of leaf nodes and w_j is the score of leaf j. Substituting (2)–(4) into (1), the objective function becomes

obj^{(t)} = \sum_{j=1}^{T} \left[ G_j w_j + \tfrac{1}{2} (H_j + \lambda) w_j^2 \right] + \gamma T \quad (5)

where I_j = \{ i \mid q(x_i) = j \} represents the sample set belonging to the j-th leaf node and

G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i \quad (6)

To minimize the objective function, setting its derivative to zero gives the optimal prediction score of each leaf node:

w_j^* = -\frac{G_j}{H_j + \lambda} \quad (7)

Substituting this back into the objective function yields its minimum value:

obj^* = -\tfrac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \quad (8)

obj^* is used to score a tree structure, and a greedy algorithm is applied to find the optimal structure: each time a split of an existing leaf into children I_L and I_R is tried, the gain is calculated as

\mathrm{Gain}(\Phi) = \tfrac{1}{2} \left[ \frac{\left(\sum_{i \in I_L} g_i\right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma \quad (9)
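Equations (7) and (9) can be checked numerically. The sketch below is a minimal illustration, not the library's implementation; it takes per-sample gradients g_i and Hessians h_i as plain lists.

```python
def leaf_weight(g, h, lam=1.0):
    """Optimal leaf score w* = -G / (H + lambda), Equation (7)."""
    return -sum(g) / (sum(h) + lam)

def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of splitting a leaf into left/right children, Equation (9)."""
    def structure_score(g, h):
        total_g, total_h = sum(g), sum(h)
        return total_g * total_g / (total_h + lam)
    parent = structure_score(g_left + g_right, h_left + h_right)
    children = structure_score(g_left, h_left) + structure_score(g_right, h_right)
    return 0.5 * (children - parent) - gamma
```

A split is accepted only when the children's structure scores exceed the parent's by more than the complexity penalty gamma.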
When the XGBoost model was used in the experiment, the following parameters
were adjusted to achieve its best performance. For example, one of the
most critical parameters in this and other tree-based ensemble algorithms, such as GBDT,
Random Forest (RF), and AdaBoost, is “learning_rate”, which dramatically affects the
model performance. Another parameter is “n_estimators”, which is the number of iterations
in training: too small or too large parameters will lead to underfitting or overfitting,
respectively. The third critical parameter is “max_depth”, which is the maximum depth of
the tree. Its higher values make the tree model more complex and improve its fitting ability,
but at the same time, it increases the risk of overfitting.
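For concreteness, a hypothetical configuration of the parameters discussed above might look as follows. The values are illustrative, not the paper's tuned settings; the keys follow the xgboost scikit-learn API (XGBClassifier).

```python
# Illustrative hyperparameters for xgboost.XGBClassifier (example values,
# not the tuned settings used in the paper).
xgb_params = {
    "learning_rate": 0.1,  # step-size shrinkage applied to each new tree
    "n_estimators": 300,   # boosting rounds: too few underfits, too many overfit
    "max_depth": 6,        # deeper trees fit better but risk overfitting
    "reg_lambda": 1.0,     # L2 leaf-weight penalty, the lambda of Equation (4)
    "gamma": 0.0,          # minimum split gain, the gamma of Equation (9)
}
# Usage (requires the xgboost package):
# model = xgboost.XGBClassifier(**xgb_params).fit(X_train, y_train)
```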
In contrast to XGBoost, the SVM adopts a radial basis function kernel with an
automatically determined gamma value (the kernel coefficient) and a soft-margin
parameter C = 1, which controls the trade-off between the slack-variable penalty
and the margin size. Random Forest (RF) is tree-based and considers the square
root of the number of features at each split. In AdaBoost, the most critical
parameters are “base_estimator”, “n_estimators”, and “learning_rate”.
corrupt data and then modifying or deleting these false data with some techniques.
Different datasets have various characteristics in actual research, so there are
different ways to preprocess the data.
In this paper, feature selection is divided into two parts. The first is filter-based
feature selection. This type of algorithm adopts principles involving information,
consistency, dependency, and distance for measuring feature characteristics, and it
generalizes across classifiers because it is independent of the machine learning
algorithm [31]. For example, a variance filter removes features with small variance
and retains features with large variance, because the variance of a feature reflects
how much it differs across samples. When a feature in the dataset follows a Bernoulli
distribution (binary classification), the following formula can be used:

\sigma^2 = p(1 - p) \quad (10)
The classic Chi-square (Chi2) filter method is a statistical test for computing the
correlation between two types of categorical data. Considering the inconsistency
between the observed value A and the expected value E of the sampling frequency
(for example, the independent variable equal to i and the dependent variable equal
to j), the Chi2 test uses the following formula to calculate the test statistic:

\chi^2 = \sum \frac{(A - E)^2}{E} \quad (11)
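The two filter criteria, Equations (10) and (11), can be sketched in pure Python. This is a minimal illustration assuming binary features for the variance filter; the threshold is arbitrary, not a value from the paper.

```python
def bernoulli_variance(column):
    """Variance p(1 - p) of a binary feature column, Equation (10)."""
    p = sum(column) / len(column)
    return p * (1.0 - p)

def variance_filter(columns, threshold=0.1):
    """Indices of binary feature columns whose variance exceeds the threshold."""
    return [j for j, col in enumerate(columns) if bernoulli_variance(col) > threshold]

def chi2_statistic(observed, expected):
    """Chi-square statistic sum((A - E)^2 / E), Equation (11)."""
    return sum((a - e) ** 2 / e for a, e in zip(observed, expected))
```

A constant column has variance 0 and is dropped by the filter, while a larger chi-square statistic indicates a stronger dependence between a feature and the class label.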
The other part is manual feature selection. In this section, we set three main experi-
ments to evaluate the classifiers to validate the performance of classifying the OGs from
various feature datasets with the proposed model.
Firstly, two sets of experiments were organized based on nine gene pair feature datasets
involving GC, GC%, protein length, molecular weight (Mw(Da)), isoelectric point value
(pI), exon number, average exon length, intron number, average intron length, gene length,
and the output value as an assessment criterion, namely, AGI, for detecting the conditional
relatedness between a pair of genes. For model training, the datasets were divided into
two sections containing training and testing parts, and the target labels of AGI values
were marked as 1s and 0s for the two types of gene pairs. The total dataset was divided
into training, validation, and testing processes using 5-fold cross-validation. The
training dataset was used to develop the aforementioned statistical criteria for the
selected models. The testing dataset was applied to assess the performance of these
models with default parameters, without tuning.
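The 5-fold protocol described above can be sketched as an index split. This is a simplified, unshuffled illustration; the paper does not specify its exact splitting procedure.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; each contiguous fold serves once as test."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for f in range(k):
        start = f * fold_size
        stop = (f + 1) * fold_size if f < k - 1 else n_samples
        test_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        yield train_idx, test_idx
```

Every sample appears in exactly one test fold, so the k test scores can be averaged into a single cross-validated estimate.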
Secondly, to explore the importance of genomic and cDNA sequence features after
selecting the optimal models, we used a feature selection method by removing one feature
from “set_all” of features each time with no redundancy, such as set1 of feature data with
no protein length, set2 with no protein of Mw(Da), set3 with no protein of pI, set4 with no
exon number, and set5 with no GC%.
Finally, to validate this model for predicting the OGs of each plant species with specific
feature sets, we selected seven testing datasets matched with seven plants (Arabidopsis
thaliana, Populus trichocarpa, Sorghum bicolour, Oryza sativa, Zea mays, Citrus sinensis, and
Camellia sinensis).
(i) Accuracy:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \quad (12)

(ii) Recall (accuracy rate of positive samples):

Recall = \frac{TP}{TP + FN} \quad (13)

(iii) Precision (precision rate of positive samples):

Precision = \frac{TP}{TP + FP} \quad (14)

(iv) F1-score:

F1 = \frac{2PR}{P + R} \quad (15)
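Indices (12)–(15) follow directly from the four confusion-matrix counts; a minimal sketch:

```python
def evaluation_indices(tp, tn, fp, fn):
    """Accuracy, recall, precision, and F1 from confusion-matrix counts,
    Equations (12)-(15). Assumes no denominator is zero."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f1
```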
Figure 3. Comparison of various features of OGs and NOGs in seven species: (a) pI, (b) gene length,
(c) Mw, (d) GC%.
Table 2. Performance measure indices of the five models based on the same parameters of the training
and test datasets.
Moreover, according to the area under the curve (AUC) values shown in Figure 4,
the ROC and precision-recall (P-R) curves of the XGBoost model completely enclosed
those of the other four methods (AdaBoost, GBDT, RF, and SVM), outperforming them
in classification efficiency.
Figure 4. The ROC (a) and precision-recall (P-R) (b) curves of XGBoost, Adaboost, GBDT, RF, and
SVM methods.
4.3. Predicting OGs with Different Feature Sets in Eudicot and Monocot Species via
XGBoost-A2OGs
Some features might become noise, deteriorating the robustness and stability of the
constructed model. Moreover, the contribution rates of various features differ, and the
highest contributors are the most valuable for OG prediction. Therefore, this work
applies two filter-based selection methods to remove irrelevant and redundant features
during training. In particular, we selected two delegated species from the eudicot
subclass (P. trichocarpa and Camellia sinensis) and two from the monocot subclass
(O. sativa and S. bicolour) and applied the filter-based selection methods to them.
The filtered features are the same for all four species, containing GC, protein length,
Mw (Da), and pI. The classification results of these selection methods, applied to the
four species separately using the variance and Chi2 methods based on the
XGBoost-A2OGs model, are listed in Table 3.
Table 3. Performance measure indices of eudicot and monocot species for the training and testing
datasets by filter method based on the same parameters.
Filter algorithms can scale to high-dimensional datasets. However, the features
selected by the filter method ignore the interactions among features, and individual
scores in a filter-based method are assigned to each feature without considering its
significance in combination with other features. Therefore, we further proposed an artificial
group for feature selection to explore the contribution of each feature for different types
of angiosperm. First of all, we also selected a eudicot subclass (P. trichocarpa and Camellia
sinensis) and applied to them five sets of feature selection methods to identify the one with
the optimal performance. The classification results on five sets of feature selection methods
with two species separately based on XGBoost-A2OGs are listed in Table 4, where the Set3
of Camellia sinensis featured the lowest precision, accuracy, and AUC values (0.80, 0.69,
and 0.85). Meanwhile, the Set5 of P. trichocarpa combined the highest respective values
(precision of 0.9, accuracy of 0.92, and AUC = 0.94).
Table 4. Performance measure indices of eudicot species for the training and testing datasets by
feature sets based on the same parameters.
As mentioned earlier, monocots branched off from their eudicot relatives via whole
genome duplication (WGD) [33]. Systematic identification of orphan genes in eudicots
revealed that the optimal precision for P. trichocarpa and Camellia sinensis orphan genes
was nearly 0.9, as shown in Table 4. Five sets of feature selection methods were also
applied to reveal the optimal feature selection performance with XGBoost-A2OGs for the monocot
group containing O. sativa and S. bicolour. The results are listed in Table 5, indicating that
the Set5 feature selection in the monocot group yielded higher precision, accuracy, and
AUC values than those obtained via the Set_all feature selection. The respective values
of S. bicolour in Set5 (0.82, 0.87, and 0.94) exceeded those in Set_all (0.65, 0.73, and 0.6) by
about 26, 19, and 57%, respectively.
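The quoted improvements of about 26, 19, and 57% can be reproduced from the S. bicolour rows of Table 5:

```python
# Relative improvement of S. bicolour Set5 over Set_all (values from Table 5).
set_all = {"precision": 0.65, "accuracy": 0.73, "auc": 0.60}
set5 = {"precision": 0.82, "accuracy": 0.87, "auc": 0.94}

improvement = {
    key: round(100.0 * (set5[key] - set_all[key]) / set_all[key])
    for key in set_all
}
# improvement == {"precision": 26, "accuracy": 19, "auc": 57}
```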
Table 5. Performance measure indices of monocot species for the training and testing datasets by
feature sets based on the same parameters.
Additionally, we further explored and compared these combined feature sets of four
selected plant species, containing the eudicot and monocot species of evolutionary lineages.
The results, plotted in Figure 5, strongly indicate that the protein pI feature, which
plays a vital role in determining molecular biochemical function, is essential for
predicting OGs in eudicot genomes and for further clarifying their biochemical function
in eudicots via proteomic studies.
Figure 5. The precision performance for eudicots and monocots via different selected feature sets in
angiosperm species.
We also observed that GC content was more likely to impact the prediction performance
for real OGs in the monocot groups that branched off from eudicots, such as O. sativa and S. bicolour.
However, GC content is one of the critical compositional features of the genome and varies
significantly among different genomes and regions within a genome [34,35].
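As a sequence-composition feature, GC content is straightforward to compute per gene or region; a minimal stdlib sketch (the function name is our own, not from the paper's pipeline):

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)

print(gc_content("ATGCCGTA"))  # -> 0.5
```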
Finally, to further validate the performance of the XGBoost-A2OG model for eudicot
and monocot groups, we tested the model on the dataset of Arabidopsis thaliana, Populus
trichocarpa, Sorghum bicolour, Oryza sativa, Zea mays, Citrus sinensis, and Camellia sinensis
with feature set5 separately. The results are shown in Figure 6.
The precision of predicting OGs for different angiosperm species was not the same,
indicating a higher reliability of XGBoost-A2OGs in identifying OGs of eudicot species
(P. trichocarpa, Camellia sinensis, Citrus sinensis, and A. thaliana) than that of monocot species
(O. sativa, S. bicolour, and Z. mays).
Through a range of evolutionary processes, OGs can arise in a lineage and provide
lineage-specific adaptations. As mentioned above, there is some evidence that the sequence
characteristics of orphan genes are shared by the two groups of angiosperms, eudicots and
monocots. However, some of these characteristics play different roles in identifying OGs with
the XGBoost-A2OGs model due to differences in their evolution and origins. Moreover, owing
to the rapid evolution of orphan genes, there is still a lack of evidence on the mechanism of
origin behind the divergence of essential OG features between monocots and eudicots.
5. Conclusions
Based on the background of enlarged genome sequences in angiosperm plants, this
study proposed an XGBoost-A2OGs model to identify orphan genes (OGs) via the ensemble
learning approach applied to several genome and cDNA features in angiosperm species,
some of which have a consistent distribution. Cross-species models were trained on
datasets of seven angiosperm species, performing better than SVM and other ensemble
models (Adaboost, GBDT, and Random Forest). The proposed XGBoost-A2OGs method
adopts multiple feature sets that have proven helpful in OG identification
and applies feature selection to obtain the optimal feature subset. Thus, plant OGs exhibited
discrepant results on combined features in eudicots (P. trichocarpa and Camellia sinensis) and
monocots (O. sativa and S. bicolour) but still shared some features. Finally, the proposed
method further established species-specific models with the optimal features on seven
plants’ datasets, which performed better on eudicot groups than on monocot ones.
In summary, XGBoost-A2OGs is a helpful method for identifying OGs from genome
features. The feature importance of monocot and eudicot orphans was analyzed, providing
a theoretical basis for the inheritance and variation of orphan genes in the process of
evolution. In future work, with the rapid development of next-generation sequencing
technologies, an ensemble learning approach with comparative genomics can be imported
to obtain information on different types of angiosperm plants. Alternative deep learning
algorithms, such as Transformer and LSTM, can also be applied to improve the potential
performance. The follow-up study envisages incorporating some other essential features,
such as gene expression, into the proposed model, which may significantly improve the
efficiency of predicting OGs in angiosperm plants.
Author Contributions: Conceptualization, X.J.; methodology, Q.G.; software, X.Z. and H.Y.; writing—
original draft preparation, Q.G.; writing—review and editing, Q.G. All authors have read and agreed
to the published version of the manuscript.
Funding: This research was funded by the commercial research fund “High-throughput sequencing
and metagenomic approaches for the study of functional health components of tea leaves”,
grant number 20223401002858.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Bomblies, K.; Madlung, A. Polyploidy in the Arabidopsis genus. Chromosome Res. 2014, 22, 117–134. [CrossRef] [PubMed]
2. Wilson, G.A.; Bertrand, N.; Patel, Y.; Hughes, J.B.; Feil, E.J.; Field, D. Orphans as taxonomically restricted and ecologically
important genes. Microbiology 2005, 151, 2499–2501. [CrossRef] [PubMed]
3. Donoghue, M.T.A.; Keshavaiah, C.; Swamidatta, S.H.; Spillane, C. Evolutionary origins of Brassicaceae specific genes in Arabidopsis
thaliana. BMC Evol. Biol. 2011, 11, 47. [CrossRef] [PubMed]
4. Lin, W.L.; Cai, B.; Cheng, Z.M. Identification and characterization of lineage-specific genes in Populus trichocarpa. Plant Cell Tissue
Organ Cult. 2013, 116, 217–225. [CrossRef]
5. Xu, Y.; Wu, G.; Hao, B.; Chen, L.; Deng, X.; Xu, Q. Identification, characterization and expression analysis of lineage-specific genes
within sweet orange (Citrus sinensis). BMC Genom. 2015, 16, 995. [CrossRef]
6. Perochon, A.; Kahla, A.; Vranić, M.; Jia, J.; Malla, K.B.; Craze, M.; Wallington, E.; Doohan, F.M. A wheat NAC interacts with an
orphan protein and enhances resistance to Fusarium head blight disease. Plant Biotechnol. J. 2019, 17, 1892–1904. [CrossRef]
7. Li, G.; Wu, X.; Hu, Y.; Muñoz-Amatriaín, M.; Luo, J.; Zhou, W.; Wang, B.; Wang, Y.; Wu, X.; Huang, L.; et al. OGs are involved in
drought adaptations and ecoclimatic-oriented selections in domesticated cowpea. J. Exp. Bot. 2019, 70, 3101–3110. [CrossRef]
[PubMed]
8. Shen, S.; Peng, M.; Fang, H.; Wang, Z.; Zhou, S.; Jing, X.; Zhang, M.; Yang, C.; Guo, H.; Li, Y.; et al. An Oryza specific
hydroxycinnamoyl tyramine gene cluster contributes to enhanced disease resistance. Sci. Bull. 2021, 66, 2369–2380. [CrossRef]
[PubMed]
9. Zhao, Z.; Ma, D. Genome-wide identification, characterization and function analysis of lineage-specific genes in the tea plant
Camellia sinensis. Front. Genet. 2021, 12, 770570. [CrossRef]
10. Cardoso-Silva, C.B.; Aono, A.H.; Mancini, M.C.; Sforca, D.A.; da Silva, C.C.; Pinto, L.R.; de Souza, A.P. Taxonomically restricted
genes are associated with responses to biotic and abiotic stresses in Sugarcane (Saccharum spp.). bioRxiv 2022. [CrossRef]
11. Ma, S.W.; Yuan, Y.; Tao, Y.; Jia, H.Y.; Ma, Z.Q. Identification characterization and expression analysis of lineage-specific genes
within Triticeae. Genomics 2020, 112, 1343–1350. [CrossRef] [PubMed]
12. Arendsee, Z.W.; Li, L.; Wurtele, E.S. Coming of age: OGs in plants. Trends Plant Sci. 2014, 19, 698–708. [CrossRef] [PubMed]
13. Jiang, M.; Li, X.; Dong, X.; Zu, Y.; Zhan, Z.; Piao, Z.; Lang, H. Research advances and prospects of OGs in plants. Front. Plant Sci.
2022, 13, 947129. [CrossRef] [PubMed]
14. O’Conner, S.; Neudorf, A.; Zheng, W.; Qi, M.; Zhao, X.; Du, C.; Nettleton, D.; Li, L. From Arabidopsis to crops: The Arabidopsis
QQS orphan gene modulates nitrogen allocation across species. In Engineering Nitrogen Utilization in Crop Plants; Springer: Cham,
Switzerland, 2018; pp. 95–117.
15. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410.
[CrossRef] [PubMed]
16. Zhu, S.L.; Dong, J.; Zhang, C.; Huang, Y.B.; Pan, W. Application of machine learning in the diagnosis of gastric cancer based on
noninvasive characteristics. PLoS ONE 2020, 15, e0244869. [CrossRef]
17. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
18. Gao, Q.; Jin, X.; Xia, E.; Wu, X.; Gu, L.; Yan, H.; Xia, Y.; Li, S. Identification of orphan genes in unbalanced datasets based on
ensemble learning. Front. Genet. 2020, 11, 820. [CrossRef]
19. Casola, C.; Owoyemi, A.; Pepper, A.E.; Ioerger, T.R. Accurate identification of de novo genes in plant genomes using machine
learning algorithms. bioRxiv 2022. [CrossRef]
20. Zhang, X.; Xuan, J.; Yao, C.; Gao, Q.; Wang, L.; Jin, X.; Li, S. A deep learning approach for orphan gene identification in moso
bamboo (Phyllostachys edulis) based on the CNN+ Transformer model. BMC Bioinform. 2022, 23, 162. [CrossRef]
21. Carvunis, A.R.; Rolland, T.; Wapinski, I.; Calderwood, M.A.; Yildirim, M.A.; Simonis, N.; Charloteaux, B.; Hidalgo, C.A.; Barbette,
J.; Santhanam, B.; et al. Proto-genes and de novo gene birth. Nature 2012, 487, 370–374. [CrossRef]
22. Prabh, N.; Rödelsperger, C. De novo, divergence, and mixed origin contribute to the emergence of orphan genes in Pristionchus
Nematodes. G3 2019, 9, 2277–2286. [CrossRef]
23. Schlötterer, C. Genes from scratch-the evolutionary fate of de novo genes. Trends Genet. 2015, 31, 215–219. [CrossRef]
24. Zhang, W.Y.; Gao, Y.X.; Long, M.Y.; Shen, B.R. Origination and evolution of orphan genes and de novo genes in the genome of
Caenorhabditis elegans. Sci. China Life Sci. 2019, 62, 579–593. [CrossRef]
25. Singh, U.; Wurtele, E.S. How new genes are born. Elife 2020, 9, e55136. [CrossRef]
26. Albà, M.M.; Castresana, J. On homology searches by protein blast and the characterization of the age of genes. BMC Evol. Biol.
2007, 7, 53. [CrossRef]
27. Domazet-Lošo, T.; Brajković, J.; Tautz, D. A phylostratigraphy approach to uncover the genomic history of major adaptations in
metazoan lineages. Trends Genet. 2007, 23, 533–539. [CrossRef]
28. Goodstein, D.M.; Shu, S.; Howson, R.; Neupane, R.; Hayes, R.D.; Fazo, J. Phytozome: A comparative platform for green plant
genomics. Nucleic Acids Res. 2012, 40, D1178–D1186. [CrossRef]
29. Wheeler, D.L.; Barrett, T.; Benson, D.A.; Bryant, S.H.; Canese, K.; Church, D.M.; Yaschenko, E. Database resources of the National
Center for Biotechnology Information. Nucleic Acids Res. 2005, 33, D39–D45. [CrossRef]
30. Bolser, D.; Staines, D.M.; Pritchard, E.; Kersey, P. Ensembl plants: Integrating tools for visualizing, mining, and analyzing plant
genomics data. In Plant Bioinformatics; Humana Press: New York, NY, USA, 2016; pp. 115–140.
31. Halim, Z. An ensemble filter-based heuristic approach for cancerous gene expression classification. Knowl.-Based Syst. 2021, 234,
107560.
32. Ispandi, R.; Wahono, S. Application of genetic algorithms to optimize parameters in support vector machine to increase direct
marketing predictions. J. Intell. Syst. 2015, 1, 115–119.
33. Chaw, S.M.; Chang, C.C.; Chen, H.L.; Li, W.H. Dating the monocot dicot divergence and the origin of core eudicots using whole
chloroplast genomes. J. Mol. Evol. 2004, 58, 424–441.
34. Bowman, M.J.; Pulman, J.A.; Liu, T.L.; Childs, K.L. A modified GC-specific MAKER gene annotation method reveals improved
and novel gene predictions of high and low GC content in Oryza sativa. BMC Bioinform. 2017, 18, 522. [CrossRef]
35. Singh, R.; Ming, R.; Yu, Q. Comparative analysis of GC content variations in plant genomes. Trop. Plant Biol. 2016, 9, 136–149.
[CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
UAV Abnormal State Detection Model Based on Timestamp
Slice and Multi-Separable CNN
Tao Yang, Jiangchuan Chen *, Hongli Deng and Yu Lu
School of Computer Science, China West Normal University, Nanchong 637002, China
* Correspondence: [email protected]
Abstract: With the rapid development of UAVs (Unmanned Aerial Vehicles), abnormal state detection
has become a critical technology to ensure the flight safety of UAVs. The position and orientation
system (POS) data and other data used to evaluate UAV flight status come from different sensors. The traditional
abnormal state detection model ignores the difference of POS data in the frequency domain during
feature learning, which leads to the loss of key feature information and limits the further improvement
of detection performance. To deal with this and improve UAV flight safety, this paper presents a
method for detecting the abnormal state of a UAV based on a timestamp slice and multi-separable
convolutional neural network (TS-MSCNN). Firstly, TS-MSCNN divides the POS data reasonably in
the time domain by setting a set of specific timestamps and then extracts and fuses the key features
to avoid the loss of feature information. Secondly, TS-MSCNN converts these feature data into
grayscale images by data reconstruction. Lastly, TS-MSCNN utilizes a multi-separable convolution
neural network (MSCNN) to learn key features more effectively. The binary and multi-classification
experiments conducted on the real flight data, Air Lab Fault and Anomaly (ALFA), demonstrate that
the TS-MSCNN outperforms traditional machine learning (ML) and the latest deep learning methods
in terms of accuracy.
Citation: Yang, T.; Chen, J.; Deng, H.; Lu, Y. UAV Abnormal State Detection Model Based on Timestamp Slice and Multi-Separable CNN. Electronics 2023, 12, 1299. https://doi.org/10.3390/electronics12061299
Academic Editor: Ping-Feng Pai
Received: 2 February 2023; Revised: 1 March 2023; Accepted: 7 March 2023; Published: 8 March 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
With the development of unmanned aerial vehicles (UAVs), their applications in
civilian and military fields have expanded, including agriculture [1], transportation [2], and
fire protection [3]. However, as UAVs play an increasingly important role, their flight safety
problems have become more prominent [4]. Network attacks can lead to UAV failures,
and physical component failures such as elevators and rudders can also affect UAV flight
safety. For example, in June 2020, a US Air Force MQ-9 “Reaper” UAV crashed in Africa,
causing a loss of USD 11.29 million [5]. In February 2022, a DJI civilian UAV crashed out of
control, resulting in a personal economic loss of up to 16,300 RMB [6]. According to the
Civil Aviation Administration of China, the number of registered UAVs in China alone has
reached 8.3 million [7]. Therefore, it is necessary to establish a UAV safety detection model
to ensure the safety and reliability of UAV flights. Improving the flight safety of UAVs has
become a major research topic in the field of UAVs. Currently, a common method to ensure
UAV flight safety is to monitor UAV flight data for anomalies [8]. Abnormal flight data
indicates that the UAV may have hardware failure or misoperation, and timely identification
of the cause of the failure can effectively prevent UAV flight accidents. Figure 1 shows the
main components of a typical UAV anomaly detection system.
UAV flight data is mainly extracted from attitude estimation data of different UAV
sensors [9,10], which include the POS data and the system status (SS) data. These data
enable the detection of UAV flight status. The POS data consists of a triple of values in the
x, y, and z directions, while the SS data contains only a single value. Additionally, these
data are closely related to UAV guidance, navigation, and control (GNC) [11,12]. The early
UAV anomaly detection method was based on flight data rules; however, the rule-based
anomaly detection method has a low detection performance [13]. To better ensure the flight
safety of UAVs, ML and deep learning methods have been introduced into the research
field of UAV safety. The development of these methods has opened up new ideas for the
research of UAV anomaly detection. However, the traditional anomaly detection method
ignored the difference between POS data and SS data used to evaluate the flight status
of UAVs in the frequency domain, resulting in the loss of some key feature information
in the flight data. This limitation restricts the performance of UAV anomaly detection models.
To address these problems, this paper proposes a method of extracting frequency domain
information by setting timestamp slices and proposes a UAV anomaly detection model
based on a multi-separable convolution neural network fusion method. It should be noted
that this paper takes the time of UAV failure as the dividing point and does not consider
the recovery process.
In the next part of this paper, Section 2 describes the related research. Section 3
introduces the processing method of the ALFA dataset [14] and proposes the TS-MSCNN
anomaly detection model. Section 4 carries out experiments from various angles and
analyzes the experimental results of binary and multi-class classification. The final section
provides a summary and conclusion of this paper.
2. Related Works
This section provides a review of research related to UAV anomaly detection, covering
rule-based algorithms and those based on ML and deep learning methods.
Regarding rule-based algorithms, Chen et al. [15] investigated the impact of attackers’
behavior on the effectiveness of malware detection technology and proposed a specification-
based intrusion detection system that showed effective detection with high probability and
low false positives. Mitchell et al. [16] considered seven threat models and proposed a
specification-based intrusion detection system with specific adaptability and low runtime
resource consumption. Sedjelmaci et al. [17] studied four attacks—false information propa-
gation, GPS deception, jamming, and black hole and gray hole attacks—and designed and
implemented a new intrusion detection scheme with an efficient and lightweight response,
which showed high detection rates, low false alarm rates, and low communication over-
head. This scheme was also able to detect attacks well in situations involving many UAVs
and attackers.
In terms of the UAV anomaly detection model based on traditional ML methods,
Liu et al. [18] proposed a real-time UAV anomaly detection method based on the KNN
algorithm for the UAV flight sensor data stream in 2015, which has high efficiency and
Electronics 2023, 12, 1299
high accuracy. In 2016, Senouci et al. [19] focused on the two main problems of intrusion
detection and attacker pop-up in the UAV-assisted network. The Bayesian game model was
used to balance the intrusion detection rate and intrusion detection resource consumption.
This method achieved a high detection rate and a low false positive rate. In 2019, Keipour
et al. [20] released an initial version of the ALFA dataset [14] and proposed a real-time
UAV anomaly detection model using the least squares method. This method does not need
to assume a specific aircraft model and can detect multiple types of faults and anomalies.
In 2021, Shrestha et al. [21] simulated a 5G network and UAV environment through the
CSE-CIC-IDS-2018 network dataset, established a model for intrusion detection based on
the ML algorithm, and also implemented the model based on ML into ground or satellite
gateways. This research proves that the ML algorithm can be used to classify benign or
malicious packets in UAV networks to enhance security.
However, some outliers can be difficult to detect using traditional machine learning
(ML) techniques [22]. To address this challenge, deep learning (DL) methods have been
increasingly used to improve the detection accuracy of UAV anomalies, especially when
processing high-dimensional UAV flight data. In 2021, Park et al. [23] proposed a UAV
anomaly detection model using a stacking autoencoder to address the limitations of the
current rule-based model. This model mainly judges the normal and abnormal conditions
of data through the loss of data reconstruction. The experimental results on different UAV
data demonstrate the effectiveness of the proposed model. In 2022, Abu et al. [24] proposed
UAV intrusion detection models in homogeneous and heterogeneous UAV network envi-
ronments based on a convolutional neural network (CNN) using three types of UAV WIFI
data records. The final experimental results demonstrate the effectiveness of the proposed
model. Dudukcu et al. [25] utilized power consumption data and simple moving average
data of the UAV battery sensor as the multivariate input of the time-domain convolution
network to identify the anomaly of the instantaneous power consumption of the UAV bat-
tery. The simulation results show that the time-domain convolutional network can achieve
good results in instantaneous power consumption prediction and anomaly detection when
combining simple moving average data and UAV sensor data. In addition, some studies
have explored the use of probability models, time series data, and data dimensions for
anomaly detection, achieving effective results [26–28], which have important implications
for this study.
All of the previously mentioned methods have been successful in detecting anomalies,
but they have not taken into account the differences between the POS data and SS data
used to evaluate UAV flight status in the frequency domain. This has resulted in the loss of
some key feature information in the flight data, which limits the improvement of anomaly
detection model performance. The differences in the frequency domain can be seen in
two aspects: first, the feature information amount of the POS data and the SS data in the
frequency domain is inconsistent in the same time domain; second, the data structure is
different. The feature of POS data in the frequency domain is a triple, while that of SS data is a
single value. When the amount of feature information is inconsistent, a feature vector with
variable length is generated, which leads to the loss of key feature information in the model
training process. Additionally, the difference in data structure causes POS data and SS
data to lose some key information due to the confusion of feature information during the
anomaly detection model’s feature extraction process.
To address the issues mentioned above, this paper proposes several solutions. Firstly, a
specific timestamp size is set, and the frequency domain information of UAV data is divided
and extracted to fuse key feature information, addressing the problem of inconsistency
between POS data and SS data in the frequency domain. Secondly, POS and SS data are
reconstructed into grayscale images. Lastly, the MSCNN is utilized to learn and fuse the
key features of POS and SS data, overcoming the problem of key feature information loss
caused by the structural differences between POS data and SS data. The following sections
will provide a detailed description of these solutions.
key feature information in the model training process. Suppose that at time t, by observing
the temperature information of the UAV battery, f_temperature can be expressed as a two-tuple,
that is, f_temperature = {temp_1, temp_2}. At different times, the value of the f_temperature tuple
differs. According to the above representation, other flight data information from the UAV,
such as fluid pressure and magnetic field values, can be expressed as corresponding
characteristic tuples, namely f_pressure = {pre_1, pre_2, pre_3, pre_4} and f_magnetic = {mag_1,
mag_2, mag_3, mag_4, mag_5, mag_6}. These feature tuples carry inconsistent amounts of
frequency domain feature information at the same time (as shown in Figure 3a). During the
calculation process, features with more frequency domain information will cover other
feature information values, leading to the loss of key information. Therefore, this paper
processes the data based on the following methods.
Figure 3. (a) Distribution of various features. (b) Extraction of the features in the timestamp.
select(feature) = { v_ij | when t = t_k and index(v_ij & t_k) = index(v_ij & t_{k−1}) } (1)
where v_ij represents the characteristic value, i represents the characteristic number, j represents
the characteristic value number, index() represents the index of the characteristic value in the
frequency domain, and t_k represents the time.
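The selection in Equation (1) amounts to keeping, within each timestamp slice, the most recent value of every feature and fusing the results into a fixed-length vector. A simplified stand-alone sketch of that idea, with a hypothetical data layout (the paper's actual field names and slice sizes are not shown here):

```python
# Each sensor reading: (time, feature_name, tuple_of_values).
readings = [
    (0.1, "temperature", (21.0, 21.2)),
    (0.2, "pressure", (1.0, 1.1, 1.2, 1.3)),
    (0.4, "temperature", (21.5, 21.6)),
]

FEATURE_ORDER = [("temperature", 2), ("pressure", 4)]  # fixed layout

def slice_vector(readings, t_start, t_end):
    """Fuse the latest value of each feature in [t_start, t_end) into a fixed-length vector."""
    latest = {}
    for t, name, values in readings:
        if t_start <= t < t_end:
            latest[name] = values  # later readings overwrite earlier ones
    vec = []
    for name, width in FEATURE_ORDER:
        vec.extend(latest.get(name, (0.0,) * width))  # zero-pad missing features
    return vec

v = slice_vector(readings, 0.0, 0.5)
print(len(v))  # -> 6  (always 2 + 4, regardless of how many readings arrived)
```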
Step 2: Frequency domain information fusion.
Figure 4. (a) The data distribution of ALFA. (b) The balanced ALFA.
Figure 5. (a) The normal flight data. (b) The flight data of elevator failure.
Figure 6. (a) The separable convolutions. (b) The separable convolutional neural network.
Set the input as M channels with image size Df_in × Df_in, the convolution kernel as
N × (M × Dk × Dk), and the output feature map as N channels with size Df_out × Df_out. The
parameter quantity of the separable convolution is then Dk × Dk × M + M × N, whereas the parameter
quantity of the conventional convolution is Dk × Dk × M × N. The calculation consumption of the
separable convolution is M × Dk × Dk × Df_out × Df_out + M × N × Df_out × Df_out (the depthwise
term plus the 1 × 1 pointwise term); the calculation consumption of the conventional convolution is
M × Dk × Dk × Df_out × Df_out × N.
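These counts can be evaluated numerically; a minimal sketch of the four formulas (function names are ours; the pointwise cost is written as M × N × Df_out × Df_out because each 1 × 1 filter spans all M input channels):

```python
def conv_params(Dk: int, M: int, N: int) -> int:
    """Parameters of a conventional Dk x Dk convolution with M input / N output channels."""
    return Dk * Dk * M * N

def separable_params(Dk: int, M: int, N: int) -> int:
    """Depthwise Dk x Dk filters (Dk*Dk*M) plus a 1x1 pointwise projection (M*N)."""
    return Dk * Dk * M + M * N

def conv_flops(Dk: int, M: int, N: int, Df_out: int) -> int:
    """Multiplications of the conventional convolution over a Df_out x Df_out output."""
    return M * Dk * Dk * Df_out * Df_out * N

def separable_flops(Dk: int, M: int, N: int, Df_out: int) -> int:
    """Depthwise cost plus 1x1 pointwise cost over a Df_out x Df_out output."""
    return M * Dk * Dk * Df_out * Df_out + M * N * Df_out * Df_out

Dk, M, N, Df_out = 3, 16, 32, 28
print(conv_params(Dk, M, N), separable_params(Dk, M, N))  # -> 4608 656
print(conv_flops(Dk, M, N, Df_out) / separable_flops(Dk, M, N, Df_out))  # roughly 7x cheaper
```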
where (i, j) is the pixel index in the feature map, xi,j is the input slice centered on the
position (i, j), c is the channel index in the feature map, and p is the separable convolutional
parallel number.
The FEF layer is designed to extract features from the grayscale image corresponding
to the POS and SS data, and then fuse the two extracted features. The main fusion method
involves concatenating the two sets of feature maps along the channel dimension. For instance, if
the convolution layer produces 3 feature maps for each of the POS and SS data, the fused output
will contain 6 feature maps.
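The fusion step is plain channel-wise concatenation; a toy stand-in using lists of 2-D maps in place of tensors (the paper's exact layer API is not shown, so this is only illustrative):

```python
# Two branches each produce a list of 2-D feature maps (here 3 maps of size 2x2).
pos_maps = [[[1, 0], [0, 1]] for _ in range(3)]  # from the POS branch
ss_maps = [[[0, 1], [1, 0]] for _ in range(3)]   # from the SS branch

# Fusion = concatenation along the channel (map) dimension: 3 + 3 -> 6 maps.
fused = pos_maps + ss_maps
print(len(fused))  # -> 6
```

With batched PyTorch tensors of shape (batch, channels, height, width), the same fusion would be `torch.cat((pos, ss), dim=1)`.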
Figure 11. (a) Feature mapping and classification layer. (b) The way the feature flattens out.
detection, involves three main stages: forward propagation, backward propagation, and
model testing, which can be broken down into the following six steps.
Step 1: Feature data extraction and fusion. Set the timestamp slice, extract and fuse the
UAV frequency domain information through Equations (1) and (2), and obtain the fixed
length UAV flight data feature vector.
Step 2: Data to image. The POS and SS data of UAV are transformed into two-dimensional
grayscale images by data reconstruction to adapt to model input.
Step 3: Feature extraction and fusion. The grayscale image features of UAV POS data and
SS data are extracted and fused using the FEF layer pass-through Equation (3).
Step 4: Feature mapping and classification. The feature map from the FEF layer is flattened
into one-dimensional data, and then the one-dimensional feature data is mapped to the
sample category space using Equation (4) to achieve classification.
Step 5: Backpropagation and parameter updating. After classification, the cross-entropy
loss function is first used to calculate the loss between the predicted and actual values.
The cross-entropy loss function is given as Equation (5), where p(c_i) and q(c_i), respectively,
represent the real and predicted distributions over the classes c_i of sample i, and H represents
the final loss value. Backpropagation is then carried out according to the loss value. The Adam
optimizer is adopted for the backward propagation to update the weight and bias of each layer:

H(p, q) = − ∑_{i=1}^{k} p(c_i) log(q(c_i)) (5)
Step 6: Model testing. Input test data into the model to test the effect of the model.
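The loss in Equation (5) reduces to a short function for a single sample; a plain-Python sketch (in practice a framework routine such as PyTorch's cross-entropy would be used):

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_i p(c_i) * log(q(c_i)) for one sample's class distributions."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

# One-hot true label (class 0) vs. a confident prediction.
p = [1.0, 0.0, 0.0]
q = [0.9, 0.05, 0.05]
print(round(cross_entropy(p, q), 4))  # -> 0.1054
```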
4. Experiment
This study employs the PyTorch [30] deep learning library to train the TS-MSCNN and
conventional CNN models. The experiments were conducted on an HP-Z480 workstation
equipped with an Intel Xeon ® CPU and 64 GB of RAM. In this section, we will first
introduce the evaluation metrics of the model and then demonstrate the performance of the
TS-MSCNN model in binary and multi-classification tasks. We compare our model with
conventional machine learning algorithms, conventional CNNs, and other relevant research
results to verify its effectiveness. It should be noted that to adapt the convolutional structure
for feature extraction, we convert the UAV flight data into a two-dimensional grayscale
image using a data reconstruction method. Figure 13 displays the data reconstruction
method and UAV image data, where the ‘ALL’ chart shows the image data used for
the single model structure. The detailed experimental process will be discussed in the
next section.
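The paper's exact reconstruction layout is given in Figure 13; purely as an illustration of the general idea, and with normalization and padding choices that are our own assumptions, a 1-D feature vector can be scaled to 8-bit intensities and padded into the smallest square grid:

```python
import math

def to_grayscale(vector):
    """Scale features to 0-255 and zero-pad into the smallest square pixel grid."""
    lo, hi = min(vector), max(vector)
    span = (hi - lo) or 1.0
    pixels = [int(255 * (v - lo) / span) for v in vector]
    side = math.ceil(math.sqrt(len(pixels)))
    pixels += [0] * (side * side - len(pixels))  # pad the last row(s)
    return [pixels[i * side:(i + 1) * side] for i in range(side)]

img = to_grayscale([0.1, 0.5, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0])
print(len(img), len(img[0]))  # -> 4 4
```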
Recall: Measures how many positive examples in the sample are correctly predicted,
that is, the proportion of all positive examples that are correctly predicted, as shown in Equation (7).
Precision: Measures how many samples predicted as positive are truly positive, that is,
the proportion of real positive samples among the results predicted as positive, as shown
in Equation (8).
F1-score: The F1-score is the harmonic mean of precision and recall, which serves as a
derived effectiveness measurement, as shown in Equation (9):

F1 = (2 × Precision × Recall) / (Precision + Recall) (9)
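Equations (6)–(9) all follow from the confusion-matrix counts; a compact sketch (the helper name and example counts are ours):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=80, fp=10, fn=20, tn=90)
print(round(prec, 3), rec)  # -> 0.889 0.8
print(round(f1, 3))         # -> 0.842
```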
Model | Accuracy
CNN | 95.40%
SCNN | 96.35%
Next, this paper will use conventional ML methods to detect binary anomalies based
on UAV flight data. Among them, the main algorithms used are ZeroR, OneR, Naive-
Bayes [31], KNN [32], J48 [33], RandomForest [34], RandomTree [35], and Adaboost [36].
Figure 14a shows the comparison between traditional ML algorithms and the CNN and SCNN
models. The SCNN model performs best, with an accuracy of 96.35%. Clearly, the CNN
architecture has great potential for detecting UAV anomalies and can accurately learn features
from data, and the SCNN model based on separable convolution achieves even
higher accuracy.
Figure 14. (a) Performance of the single model. (b) Performance of the TS-MSCNN and other models.
multi-classification experiment using a single-model SCNN and present the specific experi-
mental results in Table 5. The results show that, in the case of multi-classification, the SCNN
not only optimizes the convolution structure parameters and computational consumption
but also ensures the effectiveness of the model and accurately detects anomalies across
multiple classes.
Model | Accuracy
CNN | 93.10%
SCNN | 94.68%
Furthermore, this paper also employs traditional ML methods, consistent with those
used above, to detect anomalies. Figure 15a presents the experimental results. Among
them, the SCNN model achieved the best performance, with 94.68%. These results indicate
that the SCNN model has advantages over traditional ML methods in processing high-
dimensional UAV data. Moreover, the OneR algorithm obtains the lowest accuracy rate, as
it only uses a specific feature in the training data as the classification basis.
Figure 15. (a) Performance of the single model. (b) Performance of the TS-MSCNN and other models.
Figure 16. Comparison between the binary classification and the multiclass classification.
The research in [20,23] is similar to the research conducted in this paper. In
order to compare the experimental results, Table 7 is presented. It is important to note that
while [23] evaluates the area under the curve (AUC) of the receiver operating characteristic
curve (ROC), this section supplements the AUC results for multiple classifications. The
authors of [20] utilized a reduced version of the ALFA dataset, whereas [23] employed
the same full version of the ALFA dataset as used in this paper. The experimental model
proposed in this paper outperforms the other comparison algorithms. Overall, the experi-
mental results show that the TS-MSCNN model proposed in this paper has achieved the
desired purpose and is ready to be used for UAV flight anomaly detection.
Table 7. The accuracies of the TS-MSCNN and the other latest algorithms in multiclass classification.

Model                          ACC      AUC
                                        Aileron_Failure   Elevator_Failure   Engine_Failure   Rudder_Failure
TS-MSCNN                       99.75%   98.35%            99.77%             98.14%           97.99%
Autoencoder [23]               75.09%   80.76%            76.46%             93.21%           /
Recursive Least Squares [20]   /        /                 /                  /                88.23%
5. Conclusions
UAV flight anomaly detection is a common measure to ensure the safety of
UAV flights by identifying abnormal UAV flight data. However, the conventional anomaly
detection model neglects the difference in POS data used to evaluate UAV flight status in
the frequency domain, resulting in the loss of some crucial feature information that limits
the improvement of the UAV anomaly detection model’s accuracy. Therefore, without
considering the recoverable operation of UAV, this paper proposes a TS-MSCNN anomaly
detection model based on timestamp slice and the MSCNN. Firstly, by setting a specific
timestamp size, this paper extracts and fuses the frequency domain key features of POS
data and SS data in the UAV flight log time domain. Then, the POS data and SS data are
transformed into two-dimensional grayscale images to serve as the input data of the
TS-MSCNN model through data reconstruction. Finally, the TS-MSCNN model accurately learns
and fuses UAV grayscale image data features. The final experimental results demonstrate
that the TS-MSCNN model outperforms the comparative algorithms in both binary
classification and multi-classification, which validates the effectiveness of
the TS-MSCNN model proposed in this paper.
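A minimal sketch of the grayscale-image construction described above; the window length, scaling and layout are assumptions, since the excerpt does not restate them:

```python
import numpy as np

def to_grayscale_image(samples, side):
    """Reshape a 1-D window of POS/SS readings into a side x side
    grayscale image, min-max scaled to [0, 255]."""
    window = np.asarray(samples[: side * side], dtype=float)
    lo, hi = window.min(), window.max()
    # Guard against a constant window to avoid division by zero.
    scaled = (window - lo) / (hi - lo) if hi > lo else np.zeros_like(window)
    return (scaled * 255).astype(np.uint8).reshape(side, side)

img = to_grayscale_image(np.arange(64.0), 8)
print(img.shape, int(img.min()), int(img.max()))  # (8, 8) 0 255
```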
The deep learning model used in anomaly detection has a high time complexity, and
UAVs typically have limited resources. Therefore, in future research, the authors of this
paper will investigate a lightweight UAV anomaly detection model, taking into account
both the timeliness of the anomaly detection model and the computational resources
required by the model. The goal is to develop an anomaly detection model that can meet
the resource constraints of UAV-embedded systems.
Author Contributions: Conceptualization, J.C. and T.Y.; methodology, J.C., T.Y. and H.D.; writing—
original draft, J.C. and Y.L.; validation, J.C., T.Y., H.D. and Y.L.; writing—review and editing, J.C., T.Y.,
H.D. and Y.L.; data curation, T.Y., H.D. and Y.L.; supervision, Y.L.; project administration, J.C. and Y.L.
All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by the Sichuan Science and Technology Program under Grant
No. 2022YFG0322, China Scholarship Council Program (Nos. 202001010001 and 202101010003),
Sichuan Science and Technology Program under Grant No. 2020JDRC0075, the Innovation Team
Funds of China West Normal University (No. KCXTD2022-3), the Nanchong Federation of Social
Science Associations Program under Grant No. NC22C280, and the China West Normal University
2022 University-level College Student Innovation and Entrepreneurship Training Program Project
under Grant No. CXCY2022285.
Data Availability Statement: Not applicable.
Acknowledgments: This paper was completed by the Key Laboratory of the School of Computer
Science, China West Normal University. We thank the school for its support and help.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding
this present study.
References
1. Kulbacki, M.; Segen, J.; Knieć, W.; Klempous, R.; Kluwak, K.; Nikodem, J.; Kulbacka, J.; Serester, A. Survey of drones for
agriculture automation from planting to harvest. In Proceedings of the 2018 IEEE 22nd International Conference on Intelligent
Engineering Systems (INES), Las Palmas de Gran Canaria, Spain, 21–23 June 2018; pp. 000353–000358.
2. Puri, A. A survey of unmanned aerial vehicles (UAV) for traffic surveillance. Dep. Comput. Sci. Eng. Univ. S. Fla. 2005, 1–29.
3. Innocente, M.S.; Grasso, P. Self-organising swarms of firefighting drones: Harnessing the power of collective intelligence in
decentralised multi-robot systems. J. Comput. Sci. 2019, 34, 80–101. [CrossRef]
4. Choudhary, G.; Sharma, V.; You, I.; Yim, K.; Chen, R.; Cho, J.H. Intrusion detection systems for networked unmanned aerial
vehicles: A survey. In Proceedings of the 2018 14th International Wireless Communications & Mobile Computing Conference
(IWCMC), Limassol, Cyprus, 25–29 June 2018; pp. 560–565.
5. Available online: www.popularmechanics.com (accessed on 15 December 2022).
6. Jimu News. Available online: http://www.ctdsb.net/ (accessed on 10 December 2022).
7. Civil Aviation Administration of China. Available online: www.caac.gov.cn (accessed on 20 December 2022).
8. Puranik, T.G.; Mavris, D.N. Identifying instantaneous anomalies in general aviation operations. In Proceedings of the 17th AIAA
Aviation Technology, Integration, and Operations Conference, Atlanta, GA, USA, 25–29 June 2017; p. 3779.
9. Hamel, T.; Mahony, R. Attitude estimation on SO(3) based on direct inertial measurements. In Proceedings of the 2006 IEEE
International Conference on Robotics and Automation, 2006. ICRA 2006, Orlando, FL, USA, 15–19 May 2006; pp. 2170–2175.
10. Garraffa, G.; Sferlazza, A.; D’Ippolito, F.; Alonge, F. Localization Based on Parallel Robots Kinematics as an Alternative to
Trilateration. IEEE Trans. Ind. Electron. 2021, 69, 999–1010. [CrossRef]
11. Kendoul, F. Survey of advances in guidance, navigation, and control of unmanned rotorcraft systems. J. Field Robot. 2012, 29,
315–378. [CrossRef]
12. Alonge, F.; D’Ippolito, F.; Fagiolini, A.; Garraffa, G.; Sferlazza, A. Trajectory robust control of autonomous quadcopters based on
model decoupling and disturbance estimation. Int. J. Adv. Robot. Syst. 2021, 18, 1729881421996974. [CrossRef]
13. Koubâa, A.; Allouch, A.; Alajlan, M.; Javed, Y.; Belghith, A.; Khalgui, M. Micro air vehicle link (mavlink) in a nutshell: A survey.
IEEE Access 2019, 7, 87658–87680. [CrossRef]
14. Keipour, A.; Mousaei, M.; Scherer, S. ALFA: A dataset for UAV fault and anomaly detection. Int. J. Robot. Res. 2021, 40, 515–520.
[CrossRef]
15. Mitchell, R.; Chen, I.R. Specification based intrusion detection for unmanned aircraft systems. In Proceedings of the First ACM
MobiHoc Workshop on Airborne Networks and Communications, Hilton Head, SC, USA, 11 June 2012; pp. 31–36.
16. Mitchell, R.; Chen, R. Adaptive intrusion detection of malicious unmanned air vehicles using behavior rule specifications. IEEE
Trans. Syst. Man Cybern. Syst. 2013, 44, 593–604. [CrossRef]
17. Sedjelmaci, H.; Senouci, S.M.; Ansari, N. A hierarchical detection and response system to enhance security against lethal
cyber-attacks in UAV networks. IEEE Trans. Syst. Man Cybern. Syst. 2017, 48, 1594–1606. [CrossRef]
18. Liu, Y.; Ding, W. A KNNS based anomaly detection method applied for UAV flight data stream. In Proceedings of the 2015
Prognostics and System Health Management Conference (PHM), Beijing, China, 21–23 October 2015; pp. 1–8.
19. Sedjelmaci, H.; Senouci, S.M.; Ansari, N. Intrusion detection and ejection framework against lethal attacks in UAV-aided networks:
A Bayesian game-theoretic methodology. IEEE Trans. Intell. Transp. Syst. 2016, 18, 1143–1153. [CrossRef]
20. Keipour, A.; Mousaei, M.; Scherer, S. Automatic real-time anomaly detection for autonomous aerial vehicles. In Proceedings of
the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 5679–5685.
21. Shrestha, R.; Omidkar, A.; Roudi, S.A.; Abbas, R.; Kim, S. Machine-learning-enabled intrusion detection system for cellular
connected UAV networks. Electronics 2021, 10, 1549. [CrossRef]
22. Chowdhury, M.M.U.; Hammond, F.; Konowicz, G.; Xin, C.; Wu, H.; Li, J. A few-shot deep learning approach for improved
intrusion detection. In Proceedings of the 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication
Conference (UEMCON), New York, NY, USA, 19–21 October 2017; pp. 456–462.
23. Park, K.H.; Park, E.; Kim, H.K. Unsupervised fault detection on unmanned aerial vehicles: Encoding and thresholding approach.
Sensors 2021, 21, 2208. [CrossRef] [PubMed]
24. Abu Al-Haija, Q.; Al Badawi, A. High-performance intrusion detection system for networked UAVs via deep learning. Neural
Comput. Appl. 2022, 34, 10885–10900. [CrossRef]
25. Dudukcu, H.V.; Taskiran, M.; Kahraman, N. Unmanned Aerial Vehicles (UAVs) Battery Power Anomaly Detection Using
Temporal Convolutional Network with Simple Moving Average Algorithm. In Proceedings of the 2022 International Conference
on INnovations in Intelligent SysTems and Applications (INISTA), Biarritz, France, 8–12 August 2022; pp. 1–5.
26. Zhang, C.; Li, D.; Liang, J.; Wang, B. MAGDM-oriented dual hesitant fuzzy multigranulation probabilistic models based on
MULTIMOORA. Int. J. Mach. Learn. Cybern. 2021, 12, 1219–1241. [CrossRef]
27. Xie, H.; Hao, C.; Li, J.; Li, M.; Luo, P.; Zhu, J. Anomaly Detection for Time Series Data Based on Multi-granularity Neighbor
Residual Network. Int. J. Cogn. Comput. Eng. 2022, 3, 180–187. [CrossRef]
28. Khan, W.; Haroon, M. An unsupervised deep learning ensemble model for anomaly detection in static attributed social networks.
Int. J. Cogn. Comput. Eng. 2022, 3, 153–160. [CrossRef]
29. Sifre, L.; Mallat, S. Rigid-motion scattering for texture classification. arXiv 2014, arXiv:1403.1687.
30. Pytorch. Available online: https://pytorch.org/ (accessed on 1 December 2022).
31. John, G.H.; Langley, P. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty
in Artificial Intelligence, Montreal, QC, Canada, 18–20 August 1995; pp. 338–345.
32. Peterson, L.E. K-nearest neighbor. Scholarpedia 2009, 4, 1883. [CrossRef]
33. Quinlan, J.R. C4.5: Programs for Machine Learning; Elsevier: Amsterdam, The Netherlands, 2014.
34. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
35. Aldous, D. The continuum random tree. II. An overview. Stoch. Anal. 1991, 167, 23–70.
36. Schapire, R.E. Explaining adaboost. In Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik; Springer: Berlin/Heidelberg,
Germany, 2013; pp. 37–52. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
Article
A Context Awareness Hierarchical Attention Network for Next
POI Recommendation in IoT Environment
Xuebo Liu, Jingjing Guo * and Peng Qiao *
Shanxi Information Industry Technology Research Institute Co., Ltd., Taiyuan 030012, China
* Correspondence: [email protected] (J.G.); [email protected] (P.Q.)
Abstract: The rapid increase in the number of sensors in the Internet of things (IoT) environment has
resulted in the continuous generation of massive and rich data in Location-Based Social Networks
(LBSN). In LBSN, the next point-of-interest (POI) recommendation has become an important task,
which provides the best POI recommendation according to the user’s recent check-in sequences.
However, all existing methods for the next POI recommendation focus only on modeling the
correlation between POIs based on users' check-in sequences but ignore the significant fact that the
next POI recommendation is a time-sensitive recommendation task. In view of the fact that the attention
mechanism does not comprehensively consider the influence of the user’s trajectory sequences, time
information, social relations and geographic information of Point-of-Interest (POI) in the next POI
recommendation field, a Context Geographical-Temporal-Social Awareness Hierarchical Attention
Network (CGTS-HAN) model is proposed. The model extracts context information from the user’s
trajectory sequences and designs a Geographical-Temporal-Social attention network and a common
attention network for learning dynamic user preferences. In particular, a bidirectional LSTM model is
used to capture the temporal influence between POIs in a user's check-in trajectory. Moreover, in
the context interaction layer, a feedforward neural network is introduced to capture the interaction
between users and context information, which can connect multiple context factors with users. Then
an embedding layer is added after the interaction layer, and three types of vectors are established
for each POI to represent its check-in trend so as to solve the heterogeneity problem between context
factors. Finally, the objective function is reconstructed and the model parameters are learned through a
negative sampling algorithm. The experimental results on the real-world Foursquare and Yelp datasets
show that the AUC, precision and recall of CGTS-HAN are better than those of the comparison models,
which proves the effectiveness and superiority of CGTS-HAN.

Keywords: context awareness; attention network; dynamic user preferences; next POI recommendation; IoT

Citation: Liu, X.; Guo, J.; Qiao, P. A Context Awareness Hierarchical Attention Network for Next POI Recommendation in IoT Environment. Electronics 2022, 11, 3977. https://doi.org/10.3390/electronics11233977
2. Related Work
The influence of sequential factors. Most next Points-of-Interest (POIs) recommenda-
tions rely on the sequence correlation in the user check-in. Wen et al. [5] calculated the
global and individual transition probabilities between clusters according to the user’s check-
in sequence, then used multi-order Markov chains to discover and rank subsequent clusters,
and finally combined with individual preferences to generate a ranking list. Although it
can model time series well, the correlation between POIs is not high. Zhang et al. [1] used
N-order Markov chains to implement POIs recommendations for higher-order sequences.
The authors recommend coarse-grained areas to users, such as location recommendations
Electronics 2022, 11, 3977
based on city streets, rather than personalized recommendations, so the recommended points of
interest are not well targeted. Kong et al. [15] aimed to discover users' uncertain check-in points of
interest; they extended Skip Gram to capture user preference transitions and then predicted
the next POI for uncertain check-in users. Wang et al. [22] proposed a location-aware POI
recommendation system that models user preferences by using users' historical trajectories
and review information. However, the collaborative filtering algorithm used in that work is
prone to the cold-start problem.
The influence of geographical factors. Debnath et al. [7] mined sequential patterns from
each user’s check-in location, then used Markov chains to construct transition probability
matrices and combined them with spatial influences to generate space-aware location
recommendations. Liu et al. [2] improved the accuracy of POIs recommendations by
using the conversion mode of the user’s preference for the POIs category. Feng et al. [11]
established a hierarchical binary tree according to the physical distance between POIs to
reflect the influence of geographical location. However, previous efforts mainly consider the
location information of check-in points as a whole and ignore their temporal relation. Yang
et al. [23] proposed a location-aware POI recommendation system that uses information from
users' location history and models user preferences based on their reviews. It aims to solve
the POI recommendation problem for users in new regions and cities, but it does not consider
the impact of the time context and other temporal dynamics.
The influence of time factors. Considering the spatiotemporal characteristics of LBSN,
Cheng et al. [3] proposed FPMC with candidates Region Constraint (FPMC-LR) method
and provided new POIs for users by combining individual Markov chains and local and
regional constraints. Rendle et al. [10] used Factorized Personalized Markov Chains (FPMC)
to predict the next check-in interest point by expressing the short-term and long-term
preferences of users. Xiong et al. [8] proposed a Bayesian probability tensor decomposition
model based on time context, which dynamically acquired the potential features of users,
POIs and months and could learn the global evolution of potential features. However,
modeling the time factor by month is too sparse, which made the results not ideal.
Moreover, in practical recommendation applications, the recommendation results obtained
by these traditional recommendation methods lack the user's personalized requirements for
POIs [24]. Liu et al. [14] used Skip-Gram to train the temporal latent representation vectors
of POIs and proposed a time-aware POI recommendation model. The spatio-temporal
model TS-RNN proposed in that work takes the spatio-temporal context elements into
account in an RNN to replace MF and FPMC, but the evaluation standard is still BPR.
With the continuous application of Markov chains and factorization in the next
POI recommendation, both have shown their own limitations. The limitation of the Markov
chain is that it assumes strong independence among different factors, and the state of each POI in a
first-order Markov chain is related only to the previous POI, which limits its performance.
The limitation of tensor decomposition is that it faces the difficult problem of
cold start.
Some research work [5,9,12,13,17–19] shows that combining sequence, geography
and time factors can obtain better recommendation results. Liu et al. [12] extended RNN
and proposed a spatiotemporal recurrent neural network method. In this method, the
time conversion matrix can be created with different time intervals to simulate the time
context, and the distance conversion matrix can be created with different geographical
distances to simulate the spatial context. Inspired by the Word2Vec framework, Zhao
et al. [13] proposed the Geo-Teaser model, which embedded the time factor into the model
to capture the time characteristics, and constructed the pairwise preference ranking at the
geographical level. Then, POIs are ranked according to the preference score function, and
the top-N POIs with the highest scores are recommended for users. In order to predict
the access preference for the next POIs, Li et al. [17] introduced the time and multi-level
context attention mechanism, which can dynamically select relevant check-in locations
and distinguish context factors. The geographic-time awareness hierarchical attention
network, which is developed by Liu et al. [18], can reveal the dependencies of the overall
sequence and the relationship between POIs through the BiLSTM network while using
the geographic factor. Huang et al. [19] proposed a context-based self-attention network
for the next POIs recommendation, which used positional encoding matrices instead of
time encodings to model dynamic contextual dependencies. Guo et al. [25] proposed
DeepFM, which combines a factorization machine with a feature embedding and sharing
strategy to make recommendations. The feature embedding and sharing strategy can avoid
the establishment of feature engineering. However, the invalid second-order combination
features may bring noise and adversely affect the model performance.
However, each of the above models [13,17–19,25] does not deeply mine the distance
and time relationship between POIs in the trajectory when obtaining the correlation between
POIs and does not add user social information into the model or framework. Research
work [26,27] shows that although the influence of social relations is far less than that of
geographical and time factors, social relations can still affect a user's check-in location selection,
which motivates introducing social factors into the next POI recommendation.
The proposed CGTS-HAN model uses geographic factors to capture the features of
POIs and their correlations in order to improve the recommendation performance of the
next POIs recommendation.
3. Preliminaries
3.1. Problem Definition
This section presents the definitions [14] and the specific problem statement related to
the proposed next POI recommendation problem.

Definition 1. User set. The user set represents a set of |U| users, denoted by U = {u1, u2, ..., u|U|}.

Definition 2. POI set. The POI set represents a set of |P| points of interest, denoted by
P = {𝓅1, 𝓅2, ..., 𝓅|P|}. Each object in the set of POIs is a tuple (𝓁oni, 𝓁ati, ti), where 𝓁oni and 𝓁ati
represent the longitude and latitude of the point of interest, respectively, and ti represents the
check-in time.

Definition 3. Time state set. The time state set is denoted by T = {t1, t2, ..., t|T|}, which is used to
indicate the time points of the user's check-ins sorted by time within a day.

Definition 4. Check-in records. The check-in records are denoted by Tri. A check-in record represents
a record of a user's visits to points of interest in one day.

Problem statement: Given a user ui and his check-in history Tr, according to the user's
check-in history and the current point of interest, recommend the point of interest 𝓅n for the
user to visit at the next moment from P.
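The definitions above map naturally onto small data structures; a sketch in Python (field and function names are illustrative, not from the paper):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CheckIn:
    """One check-in: a user visiting a POI at a time slot in T."""
    user_id: str
    poi_id: int
    time_slot: int  # index into the time state set T

def trajectory(checkins: List[CheckIn], user_id: str) -> List[CheckIn]:
    """One user's check-in record Tr_i: their check-ins in temporal order."""
    return [c for c in checkins if c.user_id == user_id]

log = [CheckIn("u1", 3, 0), CheckIn("u2", 5, 1), CheckIn("u1", 7, 2)]
print([c.poi_id for c in trajectory(log, "u1")])  # [3, 7]
```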
Symbols                 Interpretation
ui, pi                  Preference vectors of ui and 𝓅i
Tr                      Set of trajectory sequences for all users
Tri                     Set of check-ins of user ui
ci                      Context of user ui
ti                      Latent semantic vector of ti
Ui, So, Ti              Matrices of user preference, social relationship and ti latent semantics
pr, su, pi              Geographical predecessor vector, geographical successor vector and preference vector of a POI
Pr, Su, Pi              Geographical predecessor matrix, geographical successor matrix and preference matrix of POIs
ui^dp                   Dynamic preferences of ui
W1, W2, W3, ..., Wzu    Weight matrices in the model
eui,ci                  Feature vector of the interaction between user ui and context ci
pui                     Original feature vector of ui
gci                     Context feature vector of ci
b1, b2, b3, ..., buz    Bias terms in the model
4. CGTS-HAN Model
The framework of the CGTS-HAN model proposed in this paper is shown in Figure 1.
The model mainly consists of a context interaction layer, a geo-temporal social attention
network and a co-attention network. The context interaction layer models the interaction
between each user and their context information in the context environment and obtains
the influence of each context on the user. Embedding layers are used to address the
heterogeneity that exists among recommendation factors. Afterward, the model introduces
a geo-temporal-social attention network to model the geographic relationships, temporal
dependencies, and users’ social relationships among POIs of check-in sequences. The
co-attention network is used to capture the dynamic preferences of users. Finally, we use a
negative sampling algorithm to train the model. The next POI recommendation usually
feeds back a sorted list of points of interest to the user, so this model first calculates the
probability of the target user visiting the points of interest, then calculates the scores of
candidate points of interest according to the Bayesian Equation, and finally sorts POIs to
obtain an ordered list of top N POIs.
the context and obtain the influence of the context on the user. The feature vector is as
follows:

eui,ci = f1(pui, gci) (1)

In the above formula, f1(·) represents the feature interaction function; its inputs are the
user and the context, represented by ui and ci, respectively. The term eui,ci represents the
feature vector of the interaction between the user and the context.
The input layer is responsible for receiving input and distributing it to the hidden
layers (so called because they are invisible to the user). These hidden layers are responsible
for the required calculations and output to the output layer, and the user can see the
final output of the output layer. The modeling process of the context interaction layer
feedforward neural network is shown in Figure 2.
Since the user and the context belong to different feature types of input data, the model
uses a nonlinear connection layer to map the user’s original feature vector pui and context
feature vector gci to an additional semantic space. The Equation is as follows:

y = RELU(Wuz pui + Wcz gci + bz) (2)

In the above formula, Wuz and Wcz are the weight matrices of the nonlinear connection
layer, bz is the bias term, and RELU(·) is the Rectified Linear Unit activation function. After
the input layer is multiplied by the weights, the result is further processed; that is, the
result is used as the input of the first hidden layer.
In order to enhance the interaction between the user and the context, the model
builds three hidden layers on top of the nonlinear connection layer, which are specifically
expressed as follows:
y1 = RELU1(W1 y + b1) (3a)
y2 = RELU2(W2 y1 + b2) (3b)
y3 = RELU3(W3 y2 + b3) (3c)
In the above formulas, W1, b1 and RELU1(·) represent the weight matrix, bias term
and RELU activation function of the first hidden layer, respectively; the symbols in the
second and third hidden-layer Equations are defined analogously. y1, y2 and y3 represent
the output vectors of the first, second and third hidden layers, respectively.
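Equations (3a)–(3c) form a plain three-layer ReLU stack; a NumPy sketch with illustrative dimensions (the excerpt does not state layer sizes):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Illustrative dimensions: input size 16, hidden size 32.
d_in, d_h = 16, 32
W1, b1 = rng.standard_normal((d_h, d_in)) * 0.1, np.zeros(d_h)
W2, b2 = rng.standard_normal((d_h, d_h)) * 0.1, np.zeros(d_h)
W3, b3 = rng.standard_normal((d_h, d_h)) * 0.1, np.zeros(d_h)

def hidden_stack(y):
    """Equations (3a)-(3c): three successive ReLU hidden layers."""
    y1 = relu(W1 @ y + b1)
    y2 = relu(W2 @ y1 + b2)
    y3 = relu(W3 @ y2 + b3)
    return y3

out = hidden_stack(rng.standard_normal(d_in))
print(out.shape)  # (32,)
```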
The outgoing vector (y3 ) of the third hidden layer in the model is passed to the output
unit, and the output unit converts it into the feature vector of the context that acts on the
user, which is expressed as follows:

eui,ci = Wzu y3 + buz (4)

In the above formula, Wzu and buz represent the weight matrix and bias term of the
output layer, respectively.
AG(Pr, Su) = softmax(QK^T / √n) Pi (5)
Q = RELU(Pr W^Q + B_q) (6)

K = RELU(Su W^K + B_k) (7)
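Equations (5)–(7) describe a scaled dot-product attention over the geographical predecessor and successor matrices. A NumPy sketch — the query projection is assumed symmetric to Equation (7), and all shapes are illustrative:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def geo_attention(Pr, Su, Pi, Wq, Wk, Bq, Bk):
    """Queries from the predecessor matrix Pr, keys from the successor
    matrix Su, values from the POI preference matrix Pi."""
    Q = np.maximum(Pr @ Wq + Bq, 0.0)          # ReLU projection (assumed)
    K = np.maximum(Su @ Wk + Bk, 0.0)          # Eq. (7)
    n = Q.shape[-1]                            # scaling dimension
    return softmax(Q @ K.T / np.sqrt(n)) @ Pi  # Eq. (5)

rng = np.random.default_rng(1)
L, d = 5, 8  # trajectory length and embedding size (illustrative)
Pr, Su, Pi = (rng.standard_normal((L, d)) for _ in range(3))
Wq, Wk = rng.standard_normal((d, d)), rng.standard_normal((d, d))
out = geo_attention(Pr, Su, Pi, Wq, Wk, np.zeros(d), np.zeros(d))
print(out.shape)  # (5, 8)
```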
P(x) = α x^(−γ) (8)
In the above formula, x and P( x ) are positive random variables, and α and γ are
constants greater than zero. In the model, the probability value of a user visiting a point of
interest follows a power-law function (PLF).
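A tiny sketch of the power-law weighting in Equation (8); the α and γ values below are illustrative, as the model only requires them to be positive:

```python
import numpy as np

def power_law(x, alpha=1.0, gamma=0.5):
    """Eq. (8): P(x) = alpha * x ** (-gamma), for positive x."""
    return alpha * np.power(x, -gamma)

# Nearer successive check-ins receive larger geographic weights.
dists_km = np.array([0.5, 2.0, 8.0])
print(power_law(dists_km))  # monotonically decreasing weights
```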
The model introduces the influence of geographical factors between adjacent POIs
into the attention network, then embed Equation (8) into Equation (5), and the rewritten
Equation (5) is as follows.
ÃG(Pr, Su) = softmax(QK^T / √n) Pi × P(x) (9)
To output the geographic impact to the next stage (i.e., the temporal impact), the matrix
output by the geographical attention network from Equation (9) is defined as follows.
In the above Equation, W1u and W2u are weight coefficients, b4 and b5 are bias terms,
and em is the embedded data of the user’s (ui ) neighbor user (um ). After s1 (ui , um ) is
calculated by Equation (15), it is normalized by the softmax function used in Equation (16),
and finally, the social influence score is obtained as follows.
αui,um = softmax(s1(ui, um)) = exp(s1(ui, um)) / Σ_{un∈So} exp(s1(ui, un)) (16)

ui^dp = Ψ(αi W^α + b6) W^U (17)
where P(𝓅n | ui, ci, Tr) is the probability distribution of user ui's access to POI 𝓅n, which
is calculated according to the weighted average of the attention weights of user ui's dynamic
preference ui^dp. pi^T is the transpose of the preference vector of POI 𝓅i, and pv^T is the
The computational cost of the above objective function will increase with the increase
of POIs during optimization. Using the negative sampling method to optimize the objective
function will reduce the training complexity, which can significantly improve computational
efficiency. Therefore, the model rewrites Equation (18) using the Negative Sampling
technique, and the result is shown below:

L = Σ_{x∈X} ( log σ(ui^dp · pi^T) + Σ_{k=1, pi∼q}^{K} log σ(−ui^dp · pi^T) ) − λΘ (22)

In the above formula, σ(x) = 1/(1 + e^(−x)) is the sigmoid function used to approximate the
probability, and K is the number of negatively sampled POIs.
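The bracketed term of Equation (22) for a single check-in can be sketched directly (plain-list vectors with illustrative values; the regularization term λΘ is omitted):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ns_loss(u_dp, pos, negatives):
    """One check-in's negative-sampling log-likelihood: the dynamic user
    preference u_dp should score the observed POI high and the K sampled
    negative POIs low."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    total = math.log(sigmoid(dot(u_dp, pos)))
    total += sum(math.log(sigmoid(-dot(u_dp, n))) for n in negatives)
    return total

u = [0.2, -0.1, 0.4]        # dynamic preference vector ui^dp
pos = [0.3, 0.0, 0.5]       # preference vector of the checked-in POI
negs = [[-0.2, 0.1, -0.3],  # K = 2 negatively sampled POIs
        [0.0, 0.2, -0.1]]
print(ns_loss(u, pos, negs))
```

Training maximizes this quantity (equivalently, minimizes its negation), so only K negatives per observation enter the gradient instead of the whole POI set.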
The ranking score of each POI in the final recommendation list of the model is formed
according to the ranking score r̂ui^p of the candidate POIs and their probability distribution:

Sr̂ui^p = r̂ui^p × P(𝓅n | ui, ci, Tr) (24)
5. Experiment
5.1. Processing of Datasets
In this paper, we use the published Foursquare dataset and Yelp dataset for experi-
ments. Among them, the Foursquare dataset contains the check-in data of New York users
from 1 May to 30 June 2014, and the Yelp dataset contains the activity data of New York users
from 1 August to 30 October 2017. Moreover, we remove inactive users with fewer than 10
check-in locations and POIs with fewer than 10 check-ins from the datasets.
Table 2 shows the dataset statistics after preprocessing. In order to make the model
proposed in this paper more suitable for the check-in scenario of POIs, we take 80% of the
check-in trajectories of each user in the two datasets as the training sets and 20% as the test
sets.
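The filtering and per-user 80/20 split can be sketched as follows; a single filtering pass is shown for brevity, whereas a full pipeline might re-filter until the counts stabilize:

```python
from collections import Counter

def preprocess(checkins, min_count=10, train_frac=0.8):
    """Drop users and POIs with fewer than min_count check-ins, then
    split each user's time-ordered trajectory into train/test parts.
    checkins: list of (user, poi) pairs in temporal order."""
    users = Counter(u for u, _ in checkins)
    pois = Counter(p for _, p in checkins)
    kept = [(u, p) for u, p in checkins
            if users[u] >= min_count and pois[p] >= min_count]
    by_user = {}
    for u, p in kept:
        by_user.setdefault(u, []).append(p)
    train, test = {}, {}
    for u, seq in by_user.items():
        cut = int(len(seq) * train_frac)
        train[u], test[u] = seq[:cut], seq[cut:]
    return train, test

# Tiny synthetic log: one active user, one inactive user.
log = [("u1", "p1")] * 10 + [("u1", "p2")] * 10 + [("u2", "p1")] * 2
train, test = preprocess(log, min_count=10)
print(sorted(train), len(train["u1"]), len(test["u1"]))  # ['u1'] 16 4
```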
According to the research results in related works, the two important factors in the
recommendation of the next POIs are the distance and time between POIs. Figure 3a
and Figure 3b, respectively, represent the Cumulative Distribution Function (CDF) of the
distance between two adjacent POIs checked in by each user in one day on the Foursquare
datasets and Yelp datasets. The role of CDF is to help us understand the imbalance
of distance distribution and find out which check-in distance accounts for the largest
proportion of the total.
Figure 3. CDF of Dis between next Check-ins POIs: (a) CDF on Foursquare; (b) CDF on Yelp.
In Figure 3, the horizontal axis represents the check-in distance between POIs, and
the vertical axis represents the distance distribution ratio. It can be seen that about 85%
of the consecutive check-in distances in the Foursquare and Yelp datasets fall within
8.2 km and 7.8 km, respectively. It is well known that weekdays and weekends in a week
have different effects on check-in locations, so we divide the check-in times in the datasets
into two categories: weekdays and weekends. Similarly, in order to observe the influence
of the time of day on the check-in location, we divided the day into six periods:
Early Morning (04:01–08:00), Morning (08:01–12:00), Noon (12:01–16:00), Afternoon
(16:01–20:00), Night (20:01–24:00) and Wee hours (00:01–04:00).
The results in Figure 4 show that the distance between consecutive check-in points on
weekends is slightly larger than that during weekdays, which means that people are more
inclined to go to places with farther distances between POIs on weekends.
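The consecutive check-in distances underlying these CDFs are great-circle distances; a haversine sketch with illustrative New York coordinates:

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def consecutive_distances(traj):
    """Distances between each pair of adjacent check-ins (lon, lat)."""
    return [haversine_km(*a, *b) for a, b in zip(traj, traj[1:])]

traj = [(-73.98, 40.75), (-73.97, 40.76), (-73.95, 40.78)]
d = consecutive_distances(traj)
print([round(x, 2) for x in d])
```

Sorting all such distances and plotting the cumulative fraction below each value yields the CDFs shown in Figures 3 and 4.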
Figure 4. CDF of Dis between next Check-ins POIs on Weekday and Weekend: (a) Weekday’s CDF
on Foursquare; (b) Weekday’s CDF on Yelp; (c) Weekend’s CDF on Foursquare; (d) Weekend’s CDF
on Yelp.
In the above formula, J represents the positive sample set, J̄ represents the negative
sample set, and ui represents the i'th user in the set U. When the indicator function
δ(pui,j > pui,j′) returns true, it means that the predicted probability of ui acting on a positive
sample is greater than the predicted probability of a negative sample; the opposite holds
when the indicator function returns false. The higher the AUC, the stronger
the ranking ability of the model.
Precision and Recall are used to compare the bias of the prediction results of all
algorithms.
The calculation method of Precision@N is as follows:

Precision@N = (1/|U|) Σ_{ui∈U} |Pui^T ∩ Pui^R| / N (26)
In the above formula, ui represents the i'th user in the set U, Pui^R represents the set of
POIs recommended to ui in the training set, and Pui^T represents the set of POIs that ui has
checked in to in the test set. N represents the number of recommended POIs. Moreover, the higher
the Precision@N and Recall@N, the more accurate and comprehensive the recommendation
results are.
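Per-user Precision@N and Recall@N can be sketched as below; the averaging over all users in Equation (26) is omitted for brevity:

```python
def precision_recall_at_n(recommended, relevant, n):
    """Metrics for one user: recommended is a ranked list of POIs,
    relevant is the set of POIs actually checked in (test set)."""
    top = recommended[:n]
    hits = len(set(top) & set(relevant))
    precision = hits / n
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall_at_n(["a", "b", "c", "d", "e"], {"b", "e", "x"}, 5)
print(p, r)  # 0.4 0.6666666666666666
```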
Figure 5. Comparison Results of Precision with N: (a) Precision of train on Foursquare; (b) Precision
of test on Foursquare; (c) Precision of train on Yelp; (d) Precision of test on Yelp.
Figure 6. Comparison Results of Recall with N: (a) Recall of train on Foursquare; (b) Recall of test on
Foursquare; (c) Recall of train on Yelp; (d) Recall of test on Yelp.
It can be seen from Figure 5 that, as N increases, the Precision of all models except the
HST-LSTM model first increases and then gradually decreases. Moreover, Figure 6 shows
that the Recall of all models increases with N. To balance the effects of precision and
recall, we select the top 15 POIs for recommendation in the experiment. In this case, the
precision of the proposed CGTS-HAN model on the Foursquare and Yelp datasets is 18% and
6% higher than that of DeepFM, and the Recall is 10% and 6% higher, respectively. This
verifies the benefit of CGTS-HAN integrating sequence, geographic, time, and social
influencing factors. The experiments demonstrate that adding auxiliary information about
social connections helps improve POI recommendation performance and, at the same time,
confirms the effectiveness of modeling with attention networks.
Figure 7. Parameter effect on α and β: (a) Precision on CGTS-HAN; (b) Recall on CGTS-HAN.
Author Contributions: Conceptualization, X.L. and J.G.; methodology, X.L.; software, X.L. and J.G.;
validation, X.L., J.G. and P.Q.; formal analysis, X.L.; investigation, J.G. and P.Q.; resources, P.Q.; data
curation, X.L.; writing—original draft preparation, X.L. and J.G.; writing—review and
editing, X.L., J.G. and P.Q.; visualization, J.G. and P.Q.; supervision, P.Q.; project administration, X.L.
All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: No new data were created or analyzed in this study. Data
sharing is not applicable to this article.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Zhang, J.D.; Chow, C.Y.; Li, Y. Lore: Exploiting sequential influence for location recommendations. In Proceedings of the 22nd
ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, TX, USA, 4–7 November
2014; pp. 103–112.
2. Liu, X.; Liu, Y.; Aberer, K.; Miao, C. Personalized point-of-interest recommendation by mining users’ preference transition. In
Proceedings of the 22nd ACM international conference on Information & Knowledge Management, San Francisco, CA, USA, 27
October–1 November 2013; pp. 733–738.
3. Cheng, C.; Yang, H.; Lyu, M.R.; King, I. Where you like to go next: Successive point-of-interest recommendation. In Proceedings
of the Twenty-Third international joint conference on Artificial Intelligence, Beijing, China, 3–9 August 2013; pp. 2605–2611.
4. Quba RC, A.; Hassas, S.; Fayyad, U.; Alshomary, M.; Gertosio, C. iSoNTRE: The Social Network Transformer into Recommendation
Engine. In Proceedings of the 2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA),
Doha, Qatar, 10–13 November 2014; pp. 169–175.
5. Wen, Y.; Zhang, J.; Zeng, Q.; Chen, X.; Zhang, F. Loc2Vec-Based Cluster-Level Transition Behavior Mining for Successive POI
Recommendation. IEEE Access 2019, 7, 109311–109319.
6. Sarwat, M.; Mokbel, M.F. Differentially Private Location Recommendations in Geosocial Networks. In Proceedings of the IEEE
15th International Conference on Mobile Data Management, Brisbane, QLD, Australia, 14–18 July 2014; pp. 59–68.
7. Debnath, M.; Tripathi, P.K.; Elmasri, R. Preference-Aware Successive POI Recommendation with Spatial and Temporal Influence.
In International Conference on Social Informatics; Springer: Berlin/Heidelberg, Germany, 2016; Volume 10046, pp. 347–360.
8. Xiong, L.; Chen, X.; Huang, T.K.; Schneider, J.; Carbonell, J.G. Temporal collaborative filtering with bayesian probabilistic tensor
factorization. In Proceedings of the 2010 SIAM International Conference on Data Mining. Society for Industrial and Applied
Mathematics, Columbus, OH, USA, 29 April–1 May 2010; pp. 211–222.
9. Feng, S.; Li, X.; Zeng, Y.; Cong, G.; Chee, Y.W.; Yuan, Q. Personalized ranking metric embedding for next new poi recommendation.
In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July
2015; pp. 2069–2075.
10. Rendle, S.; Freudenthaler, C.; Schmidt-Thieme, L. Factorizing personalized markov chains for next-basket recommendation. In
Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26-30 April 2010; pp. 811–820.
11. Feng, S.; Cong, G.; An, B.; Chee, Y.M. Poi2vec: Geographical latent representation for predicting future visitors. In Proceedings of
the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 102–108.
12. Liu, X.; Liu, Y.; Li, X. Exploring the Context of Locations for Personalized Location Recommendations. In Proceedings of the
Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; pp. 1188–1194.
13. Zhao, S.; Zhao, T.; King, I.; Lyu, M.R. Geo-teaser: Geo-temporal sequential embedding rank for point-of-interest recommendation.
In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, 3–7 April 2017; pp.
153–162.
14. Liu, Q.; Wu, S.; Wang, L.; Tan, T. Predicting the next location: A recurrent model with spatial and temporal contexts. In
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 194–200.
15. Kong, D.; Wu, F. HST-LSTM: A hierarchical spatial-temporal long-short term memory network for location prediction. In
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018;
Volume 18, pp. 2341–2347.
16. Li, C.; Li, D.; Zhang, Z.; Chu, D. MST-RNN: A Multi-Dimension Spatiotemporal Recurrent Neural Networks for Recommending
the Next Point of Interest. Mathematics 2022, 10, 1838. [CrossRef]
17. Li, R.; Shen, Y.; Zhu, Y. Next point-of-interest recommendation with temporal and multi-level context attention. In Proceedings of
the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; pp. 1110–1115.
18. Liu, T.; Liao, J.; Wu, Z.; Wang, Y.; Wang, J. A geographical-temporal awareness hierarchical attention network for next point-of-
interest recommendation. In Proceedings of the 2019 on International Conference on Multimedia Retrieval, Ottawa, ON, Canada,
10–13 June 2019; pp. 7–15.
19. Huang, X.; Qian, S.; Fang, Q. Csan: Contextual self-attention network for user sequential recommendation. In Proceedings of the
26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; ACM: New York, NY, USA,
2018; pp. 447–455.
20. Xie, Y.; Zhao, J.; Qiang, B.; Mi, L.; Tang, C.; Li, L. Attention mechanism-based CNN-LSTM model for wind turbine fault prediction
using SSN ontology annotation. Wirel. Commun. Mob. Comput. 2021, 2021, 6627588. [CrossRef]
21. Ojagh, S.; Malek, M.R.; Saeedi, S.; Liang, S. An Internet of Things (IoT) Approach for Automatic Context Detection. In Proceedings
of the 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver,
BC, Canada, 1–3 November 2018; pp. 223–226.
22. Wang, X.; Liu, Y.; Zhou, X.; Wang, X.; Leng, Z. A Point-of-Interest Recommendation Method Exploiting Sequential, Category and
Geographical Influence. ISPRS Int. J. Geo-Inf. 2022, 11, 80. [CrossRef]
23. Yang, X.; Zimba, B.; Qiao, T.; Gao, K.; Chen, X. Exploring IoT location information to perform point of interest recommendation
engine: Traveling to a new geographical region. Sensors 2019, 19, 992. [CrossRef] [PubMed]
24. Wang, H.; Shen, H.; Ouyang, W.; Cheng, X. Exploiting POI-Specific Geographical Influence for Point-of-Interest Recommendation.
In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July
2018; pp. 3877–3883.
25. Guo, H.; Tang, R.; Ye, Y.; Li, Z.; He, X. DeepFM: A factorization-machine based neural network for CTR prediction. In Proceedings
of the Twenty-Sixth International Joint Conference on Artificial Intelligence(IJCAI-17), Melbourne, Australia, 19–25 August 2017;
pp. 1725–1731.
26. Li, J.; Sellis, T.; Culpepper, J.S.; He, Z.; Liu, C.; Wang, J. Geo-social influence spanning maximization. In IEEE Transactions on
Knowledge and Data Engineering; IEEE: Piscataway, NJ, USA, 2017; Volume 29, pp. 1653–1666.
27. Haldar, N.; Li, J.; Reynolds, M.; Sellis, T.; Yu, J.X. Location prediction in large-scale social networks: An in-depth benchmarking
study. VLDB J. 2019, 5, 623–648. [CrossRef]
28. Ye, M.; Yin, P.; Lee, W.C.; Lee, D.-L. Exploiting geographical influence for collaborative point-of-interest recommendation. In
Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing,
China, 24–28 July 2016; pp. 325–334.
29. Zhou, P.; Shi, W.; Tian, J.; Qi, Z.; Li, B.; Hao, H.; Xu, B. Attention-based bidirectional long short-term memory networks for relation
classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12
August 2016; Volume 2, pp. 207–212.
Article
Cost-Sensitive Multigranulation Approximation in
Decision-Making Applications
Jie Yang 1,2 , Juncheng Kuang 2 , Qun Liu 2 and Yanmin Liu 1,∗
1 School of Physics and Electronic Science, Zunyi Normal University, Zunyi 563002, China
2 Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts
and Telecommunications, Chongqing 400065, China
* Correspondence: [email protected]; Tel.: +86-1508-606-1907
Abstract: A multigranulation rough set (MGRS) model is an expansion of the Pawlak rough set, in
which the uncertain concept is characterized by optimistic and pessimistic upper/lower approximate
boundaries, respectively. However, there is a lack of approximate descriptions of uncertain concepts
by existing information granules in MGRS. The approximation sets of rough sets presented by
Zhang provide a way to approximately describe knowledge by using existing information granules.
Based on the approximation set theory, this paper proposes the cost-sensitive multigranulation
approximation of rough sets, i.e., optimistic approximation and pessimistic approximation. Their
related properties were further analyzed. Furthermore, a cost-sensitive selection algorithm to optimize
the multigranulation approximation is proposed. The experimental results show that when
multigranulation approximation sets and upper/lower approximation sets are applied to decision-
making environments, multigranulation approximation produces the least misclassification costs on
each dataset. In particular, misclassification costs are reduced by more than 50% at each granularity
on some datasets.
Citation: Yang, J.; Kuang, J.; Liu, Q.; Liu, Y. Cost-Sensitive Multigranulation Approximation in Decision-Making Applications. Electronics 2022, 11, 3801. https://doi.org/10.3390/electronics11223801
Academic Editor: Domenico Ursino
Received: 25 October 2022; Accepted: 16 November 2022; Published: 18 November 2022
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
As a human-inspired paradigm, granular computing (GrC) solves complex problems
by utilizing multiple granular layers [1–4]. Zadeh [1] noted that information granules
refer to the pieces, classes, and groups into which complex information is divided in
accordance with the characteristics and processes of understanding and decision-making.
From different views, GrC models mainly cover four types: fuzzy sets [5], rough sets [6],
quotient spaces [7], and cloud models [8]. As representative models of GrC, rough sets
describe uncertain concepts by upper and lower approximation boundaries, and they have been
applied to data mining [9,10], medical systems [11], attribute reduction [12,13], decision
systems [14,15], and machine learning [16].
Regarding similarity, Zhang [17–20] presented the approximation sets of rough sets,
vague sets, rough fuzzy sets, rough vague sets, etc. These approximation models were
developed by utilizing the existing equivalence classes to describe uncertain concepts.
The approximation model has a higher similarity with the target concept than the
upper/lower approximations. Furthermore, the approximation model has been applied in
attribute reduction [21], image segmentation [22], optimization algorithms [23], etc.
Based on the approximation set theory, Yang [24,25] developed an approximation model
of rough sets based on misclassification costs. In cost-sensitive learning,
smaller misclassification costs help to improve decision-making quality in
real applications. Recently, from the perspective of three-way decisions [26–29], Yao [30]
constructed a symbol–meaning–value (SMV) model for data analysis. In the three-way
decision model, the equivalence classes in a boundary region will produce misclassification
costs when they are used as approximation sets. Hence, the approximation model that
is constructed from the perspective of similarity is no longer applicable to cost-sensitive
scenarios. To minimize the misclassification costs of constructing the approximation set,
we proposed the multigranulation approximation, i.e., the optimistic approximation model
and pessimistic approximation model. Moreover, to search the optimal approximation
layer for multigranulation rough sets [31] under the constraints, the algorithm of the cost-
sensitive multigranulation approximation selection is further proposed to be applied to
decision-making environments.
The following sections are arranged as follows: Section 2 presents the related works.
Section 3 introduces the relevant definitions of the multigranulation rough set and ap-
proximation set. Section 4 introduces an approximate representation of the rough sets.
Section 5 presents the cost-sensitive multigranulation approximations of rough sets and
further introduces the optimal multigranulation approximation algorithm. To verify the
availability of the proposed model, the related experiments and discussion are presented in
Section 6. Ultimately, in Section 7, the conclusions are presented.
2. Literature Review
Rough sets are typically constructed based on a single binary relation. However,
in many cases, they may be described in multiple granularity structures. In order to extend
single granularity to multi-granularity in rough approximations, Qian [31] proposed the
multigranulation rough set model (MGRS), where the upper/lower approximations were
defined by multi-equivalence relations (multiple granulations) in the universe [32,33].
In the lower approximation of optimistic MGRS, an object is included if it completely
belongs to the target concept in at least one granular space; in the lower approximation
of pessimistic MGRS, an object must completely belong to the target concept in every
granular space. MGRS has two advantages: (1) in decision-making applications, each
decision maker may decide independently on the same project (or element) in the
universe [34], in which case the intersection operations between any two granularity
structures are redundant for decision-making [35]; (2) decision rules can be extracted
from distributive information systems and groups of intelligent agents by using rough set
approaches [34,36].
There are many works [33–35,37–42] on multigranulation rough sets. To extend
the MGRS to the neighborhood information system, Hu [43,44] presented matrix-based
incremental approaches to update knowledge about neighborhood information systems by
changing the granular structures. From the perspective of uncertainty measure, Sun [39]
proposed a feature selection based on fuzzy neighborhood multigranulation rough sets.
Xu [38] proposed a dynamic approximation update mechanism of a multigranulation
neighborhood rough set from a local viewpoint. Liu [35] introduced a parameter-free multi-
granularity attribute reduction scheme, which is more effective for microarray data than
other well-established attribute reductions. Based on the three-way decision theory, She [40]
presented a five-valued logic approach for the multigranulation rough set model. From the
above, however, the method of approximately describing the target concept with existing
information granules is not given, which limits the application of the multigranulation
rough set theory. Li [41] presented two kinds of local multigranulation rough set models
in the ordered decision system by extending the single granulation environment to a
multigranulation case. Zhang [42] constructed hesitant fuzzy multigranulation rough sets
to handle the hesitant fuzzy information and group decision-making for person–job fit.
3. Preliminaries
In this section, some necessary definitions related to the multigranulation rough
set and approximation set are reviewed to facilitate the framework of this paper. Let
S = (U, C ∪ D, V, f ) be a decision information table, where U is a non-empty finite set of objects.
Electronics 2022, 11, 3801
A(X) = {x ∈ U | [x]_A ⊆ X},
Ā(X) = {x ∈ U | [x]_A ∩ X ≠ ∅}.
Based on the lower and upper approximations, the universe U can be divided into
three disjoint regions, which are expressed as follows:
POS_A(X) = A(X),
BND_A(X) = Ā(X) − A(X),
NEG_A(X) = U − Ā(X).
∑_{i=1}^m Ā_i^O(X) = ∼ ∑_{i=1}^m A_i^O(∼X). (2)

Then, (∑_{i=1}^m A_i^O(X), ∑_{i=1}^m Ā_i^O(X)) is called the optimistic multigranulation rough set. The lower
and upper approximation sets of X in optimistic multigranulation rough sets are presented by
multiple independent approximation spaces. The boundary region is defined as follows:

BND^O_{∑ Ai}(X) = ∑_{i=1}^m Ā_i^O(X) − ∑_{i=1}^m A_i^O(X). (3)
∑_{i=1}^m Ā_i^P(X) = ∼ ∑_{i=1}^m A_i^P(∼X). (5)

Then, (∑_{i=1}^m A_i^P(X), ∑_{i=1}^m Ā_i^P(X)) is called the pessimistic multigranulation rough set. The lower
and upper approximation sets of X in pessimistic multigranulation rough sets are also presented by
multiple independent approximation spaces; however, the strategy differs from that of optimistic
multigranulation rough sets. The boundary region is defined as follows:

BND^P_{∑ Ai}(X) = ∑_{i=1}^m Ā_i^P(X) − ∑_{i=1}^m A_i^P(X). (6)
where 0 ≤ α ≤ 1, and μ([x]_i) = |[x]_i ∩ X| / |[x]_i| denotes the membership degree of the
equivalence class [x]_i with respect to X.
[ x4 ]1 = { x1 , x2 , x3 , x4 };
[ x4 ]2 = { x3 , x4 , x7 , x8 };
[ x4 ]3 = { x2 , x4 , x6 , x8 }.
Accordingly, the membership degrees are computed as:
μ([x4]1) = (0 + 0 + 0 + 1)/4 = 0.25;
μ([x4]2) = (0 + 1 + 1 + 1)/4 = 0.75;
μ([x4]3) = (0 + 1 + 1 + 1)/4 = 0.75.
A1 A2 A3 X
x1 0 0 0 0
x2 0 0 1 0
x3 0 1 0 0
x4 0 1 1 1
x5 1 0 0 0
x6 1 0 1 1
x7 1 1 0 1
x8 1 1 1 1
If α is set to 0.5, considering the optimistic approximation, element x4 will be classified into
the optimistic lower approximation sets of X due to one of its membership degrees being greater than
0.5. However, if considering the pessimistic approximation, element x4 will only be classified into
the pessimistic upper approximation sets of X.
Based on the given conditions, we have:

X = (0.25 + 0.25 + 0.25)/x1 + (0.25 + 0.25 + 0.75)/x2 + (0.25 + 0.75 + 0.25)/x3 + (0.25 + 0.75 + 0.75)/x4
  + (0.75 + 0.25 + 0.25)/x5 + (0.75 + 0.25 + 0.75)/x6 + (0.75 + 0.75 + 0.25)/x7 + (0.75 + 0.75 + 0.75)/x8.
Moreover, the results of the pessimistic approximations in this case are as follows:

∑_{i=1}^m A_i^P(X) = {x8},
∑_{i=1}^m Ā_i^P(X) = {x1, x2, x3, x4, x5, x6, x7, x8},
BND^P_{∑ Ai}(X) = {x1, x2, x3, x4, x5, x6, x7}.
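These results can be reproduced mechanically from the decision table above. The following sketch (our own variable names, not the paper's) recomputes the membership degrees and the optimistic and pessimistic lower approximations for γ = 0.5:

```python
# Decision table from the example: three granulations A1-A3 and target set X.
rows = {  # object: (A1, A2, A3, in_X)
    "x1": (0, 0, 0, 0), "x2": (0, 0, 1, 0), "x3": (0, 1, 0, 0),
    "x4": (0, 1, 1, 1), "x5": (1, 0, 0, 0), "x6": (1, 0, 1, 1),
    "x7": (1, 1, 0, 1), "x8": (1, 1, 1, 1),
}
X = {x for x, r in rows.items() if r[3] == 1}

def membership(x, i):
    """mu([x]_Ai) = |[x]_Ai ∩ X| / |[x]_Ai| for granulation i in {0, 1, 2}."""
    cls = {y for y, r in rows.items() if r[i] == rows[x][i]}  # equivalence class
    return len(cls & X) / len(cls)

gamma = 0.5
# Optimistic: max over granulations; pessimistic: min over granulations.
opt_lower = {x for x in rows if max(membership(x, i) for i in range(3)) >= gamma}
pes_lower = {x for x in rows if min(membership(x, i) for i in range(3)) >= gamma}
print(sorted(pes_lower))  # ['x8'], matching the pessimistic result above
```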
Suppose 0 ≤ λ12, λ21 ≤ 1. Boundary region I and boundary region II are denoted
by BN1(X) = {[x]_i | λ12/(λ12 + λ21) ≤ μ([x]_i) < 1} and BN2(X) = {[x]_i | 0 < μ([x]_i) < λ12/(λ12 + λ21)},
respectively. Then BN(X) = BN1(X) ∪ BN2(X), and A(X) = BN1(X) ∪ POS(X). Figure 1
shows the CSA of rough sets, where BN1(X) is the dark blue region, which denotes the
part of the boundary region used in the approximation, and BN2(X) is the light blue region,
which denotes the part of the boundary region not used in the approximation. Therefore,
the region surrounded by the green broken line in Figure 1 constructs the approximation
of rough sets, and the misclassification costs come from the two uncertain regions, defined as
follows:

DC(A(X)) = ∑_{[x]_i ∈ BN1(X)} λY + ∑_{[x]_i ∈ BN2(X)} λN. (11)
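Formula (11) says that each boundary granule kept in the approximation (BN1) contributes the cost of the non-members it wrongly covers, λ12(1 − μ)|E|, while each granule left out (BN2) contributes the cost of the members it misses, λ21·μ·|E|. A minimal sketch, with our own names and the threshold from the BN1/BN2 definitions:

```python
def dc(boundary_classes, lam12, lam21):
    """Misclassification cost of the cost-sensitive approximation.

    boundary_classes: (size, mu) pairs for boundary-region equivalence classes.
    A class joins BN1 (kept) when mu >= lam12 / (lam12 + lam21), else BN2.
    """
    threshold = lam12 / (lam12 + lam21)
    cost = 0.0
    for size, mu in boundary_classes:
        if mu >= threshold:              # BN1: pay for included non-members
            cost += lam12 * (1 - mu) * size
        else:                            # BN2: pay for excluded members
            cost += lam21 * mu * size
    return cost

print(dc([(4, 0.75), (4, 0.25)], lam12=1.0, lam21=1.0))  # 2.0
```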
ΔDC_{A1−A2}(X) = DC(A1(X)) − DC(A2(X))
= λ12(1 − μ(E1))|E1| − λ12(1 − μ(F1))|F1| − λ12(1 − μ(F2))|F2|
= (|E1| − |F1| − |F2| + ∑_{xi ∈ F1} μ(xi) + ∑_{xi ∈ F2} μ(xi) − ∑_{xi ∈ E1} μ(xi)) λ12.
ΔDC A1 − A2 = DC ( A1 ( X )) − DC ( A2 ( X ))
= λ12 (1 − μ( E1 ))| E1 | − λ12 (1 − μ( F1 ))| F1 | − λ21 μ( F2 )| F2 |
= | F2 |(λ12 − μ̄( F2 )(λ21 + λ12 ))
ΔDC A1 − A2 = DC ( A1 ( X )) − DC ( A2 ( X ))
= μ̄( E1 )| E1 |λ21 − μ̄( F2 )| F2 |λ21 − (1 − μ̄( F1 ))| F1 |λ12
= | F1 |μ̄( F1 )((λ21 + λ12 ) − λ12 ).
ΔDC A1 − A2 = DC ( A1 ( X )) − DC ( A2 ( X ))
= μ̄( E1 )| E1 |λ21 − μ̄( F1 )| F1 |λ21 − μ̄( F2 )| F2 |λ21
= (∑_{xi ∈ E1} μ(xi) − ∑_{xi ∈ F1} μ(xi) − ∑_{xi ∈ F2} μ(xi)) λ21.
Figure 2. The granules subdivided in BN1(X) and BN2(X) of the cost-sensitive approximation
model of rough sets. All red circles in the figure represent the set X. In addition, (a) shows
case 1, in which the granules are subdivided in BN1(X); (b) shows case 2, in which the granules
are subdivided in BN1(X); (c) shows case 1, in which the granules are subdivided in BN2(X);
(d) shows case 2, in which the granules are subdivided in BN2(X).
From the perspective of the optimistic membership degree, ∑_{i=1}^m A_i^O(X) can be expressed as
follows:

∑_{i=1}^m A_i^O(X) = {x | μ^O_{∑ Ai}(x) ≥ γ, x ∈ U}. (14)

BN1^O_{∑ Ai}(X) = {x | 1 > μ^O_{∑ Ai}(x) ≥ γ, x ∈ U},
POS^O_{∑ Ai}(X) = {x | μ^O_{∑ Ai}(x) = 1, x ∈ U},
NEG^O_{∑ Ai}(X) = {x | μ^O_{∑ Ai}(x) = 0, x ∈ U}.

We have

∑_{i=1}^m A_i^O(X) = BN1^O_{∑ Ai}(X) ∪ POS^O_{∑ Ai}(X). (15)
The misclassification costs of approximations of optimistic MGRS come from the two uncertain
regions BN1^O_{∑ Ai}(X) and BN2^O_{∑ Ai}(X), and are defined in the following:

DC(∑_{i=1}^m A_i^O(X)) = ∑_{x ∈ BN1^O_{∑ Ai}(X)} λY + ∑_{x ∈ BN2^O_{∑ Ai}(X)} λN. (16)
Proof of Theorem 2.
(1) From Formula (14), ∑_{i=1}^m A_i^O(X) = ∪_{i=1}^m A_i(X) obviously holds.
(2) ∑_{i=1}^m A_i^O(∩_{j=1}^n X_j) = ∪_{i=1}^m A_i^O(∩_{j=1}^n X_j) = ∪_{i=1}^m (∩_{j=1}^n A_i^O(X_j)).
(3) ∑_{i=1}^m A_i^O(∩_{j=1}^n X_j) = ∪_{i=1}^m (∩_{j=1}^n A_i(X_j)) ⊆ ∩_{j=1}^n (∪_{i=1}^m A_i(X_j)) = ∩_{j=1}^n (∑_{i=1}^m A_i^O(X_j)).
(4) From X_j ⊆ ∪_{j=1}^n X_j, we have ∑_{i=1}^m A_i^O(X_j) ⊆ ∑_{i=1}^m A_i^O(∪_{j=1}^n X_j). Therefore, ∑_{i=1}^m A_i^O(∪_{j=1}^n X_j) ⊇ ∪_{j=1}^n (∑_{i=1}^m A_i^O(X_j)).
(5) It is easy to prove by Formulas (1), (2), and (14).
μ^P_{∑ Ai}(x) = min{μ([x]_{Ai}) | i = 1, 2, . . ., m}. (17)
From the perspective of the pessimistic membership degree, ∑_{i=1}^m A_i^P(X) can be expressed
as follows:

∑_{i=1}^m A_i^P(X) = {x | μ^P_{∑ Ai}(x) ≥ γ, x ∈ U}. (19)

BN1^P_{∑ Ai}(X) = {x | 1 > μ^P_{∑ Ai}(x) ≥ γ, x ∈ U},
POS^P_{∑ Ai}(X) = {x | μ^P_{∑ Ai}(x) = 1, x ∈ U},
NEG^P_{∑ Ai}(X) = {x | μ^P_{∑ Ai}(x) = 0, x ∈ U}.

We have

∑_{i=1}^m A_i^P(X) = BN1^P_{∑ Ai}(X) ∪ POS^P_{∑ Ai}(X). (20)

The misclassification costs of approximations of pessimistic MGRS come from the two uncertain
regions BN1^P_{∑ Ai}(X) and BN2^P_{∑ Ai}(X), which are defined in the following:

DC(∑_{i=1}^m A_i^P(X)) = ∑_{x ∈ BN1^P_{∑ Ai}(X)} λY + ∑_{x ∈ BN2^P_{∑ Ai}(X)} λN. (21)
Proof of Theorem 3.
(1) ∀x ∈ ∑_{i=1}^m A_i^P(X), according to Definition 9, μ([x]_{Ai}) ≥ γ holds for i = 1, 2, · · ·, m. According
to Definition 5, x ∈ A_i(X) for i = 1, 2, · · ·, m, and x ∈ ∩_{i=1}^m (A_i(X)) holds, i.e.,
∑_{i=1}^m A_i^P(X) ⊆ ∩_{i=1}^m (A_i(X)). ∀x ∈ ∩_{i=1}^m (A_i(X)), μ([x]_{Ai}) ≥ γ for i = 1, 2, · · ·, m. According
to Definition 7, x ∈ ∑_{i=1}^m A_i^P(X) holds, i.e., ∩_{i=1}^m (A_i(X)) ⊆ ∑_{i=1}^m A_i^P(X). Therefore,
we have ∑_{i=1}^m A_i^P(X) = ∩_{i=1}^m (A_i(X)).
(2) From the proof of (1), ∑_{i=1}^m A_i^P(∩_{j=1}^n X_j) = ∩_{i=1}^m A_i(∩_{j=1}^n X_j). According to Definition 7,
∩_{i=1}^m A_i(∩_{j=1}^n X_j) = ∩_{i=1}^m ∩_{j=1}^n A_i(X_j). Because ∩_{i=1}^m A_i(X_j) = ∑_{i=1}^m A_i^P(X_j),
∑_{i=1}^m A_i^P(∩_{j=1}^n X_j) = ∩_{i=1}^m A_i(∩_{j=1}^n X_j) = ∩_{j=1}^n ∩_{i=1}^m A_i(X_j) = ∩_{j=1}^n (∑_{i=1}^m A_i^P(X_j)).
(3) ∀x ∈ ∪_{j=1}^n (∑_{i=1}^m A_i^P(X_j)), ∃X_k (k ∈ {1, 2, · · ·, n}) such that x ∈ ∑_{i=1}^m A_i^P(X_k). According to
Definition 7, μ([x]_{Ai}) ≥ γ for i = 1, 2, · · ·, m, so x ∈ ∑_{i=1}^m A_i^P(∪_{j=1}^n X_j) holds.
Therefore, ∑_{i=1}^m A_i^P(∪_{j=1}^n X_j) ⊇ ∪_{j=1}^n (∑_{i=1}^m A_i^P(X_j)).
(4) It is easy to prove by Formulas (4), (5), and (19).
Proof of Theorem 4.
(1) According to Definition 6, we only need to prove ∑_{i=1}^m A_i^P(X) ⊆ ∑_{i=1}^m A_i^O(X). ∀x ∈
∑_{i=1}^m A_i^P(X), according to Definition 7, μ([x]_{Ai}) ≥ γ for i = 1, 2, · · ·, m. From Definition 5, we have
x ∈ ∑_{i=1}^m A_i^O(X).
(2) It is easy to prove according to Definitions 5 and 7.
Proof of Lemma 1.
(1) λY_Ei − λN_Ei = λ12(1 − μ(Ei))|Ei| − λ21 μ(Ei)|Ei| = |Ei|(λ12 − (λ12 + λ21)μ(Ei)). Because
Ei ∈ BN1(X), we have λ12/(λ12 + λ21) ≤ μ(Ei) < 1, so λY_Ei ≤ λN_Ei; therefore,
∑_{Ei ∈ BN1(X)} λY_Ei ≤ ∑_{Ei ∈ BN1(X)} λN_Ei.
(2) λY_Ei − λN_Ei = λ12(1 − μ(Ei))|Ei| − λ21 μ(Ei)|Ei| = |Ei|(λ12 − (λ12 + λ21)μ(Ei)). Because
Ei ∈ BN2(X), we have 0 < μ(Ei) < λ12/(λ12 + λ21), so λY_Ei ≥ λN_Ei; therefore,
∑_{Ei ∈ BN2(X)} λN_Ei ≤ ∑_{Ei ∈ BN2(X)} λY_Ei.
Lemma 1 shows that the misclassification costs incurred by the equivalence classes in
characterizing X are not more than the misclassification costs incurred by the equivalence
classes when they do not characterize X in BN1( X ). Moreover, misclassification costs
incurred by the equivalence classes when they do not characterize X are not more than the
misclassification costs incurred by the equivalence classes in characterizing X in BN2( X ).
Proof of Theorem 6. When the lower approximation ∑_{i=1}^m A_i^O(X) is taken as the approximation
of X, DC(∑_{i=1}^m A_i^O(X)) = ∑_{x ∈ BN(X)} λN; when the upper approximation ∑_{i=1}^m Ā_i^O(X) is taken
as the approximation of X, DC(∑_{i=1}^m Ā_i^O(X)) = ∑_{x ∈ BN(X)} λY; when the multigranulation
approximation is taken as the approximation of X, its cost is
∑_{x ∈ BN1(X)} λY + ∑_{x ∈ BN2(X)} λN.
Theorem 6 indicates that, when the lower approximation ∑_{i=1}^m A_i^O(X), the upper approximation
∑_{i=1}^m Ā_i^O(X), and the optimistic multigranulation approximation are used as approximations
of X, respectively, the multigranulation approximation generates the least misclassification costs.
Similarly, DC(∑_{i=1}^m A_i^P(X)), DC(∑_{i=1}^m Ā_i^P(X)), and the cost of the pessimistic multigranulation
approximation denote the misclassification costs generated when the pessimistic lower approximation,
upper approximation, and multigranulation approximation are used to approximate X, respectively.
From Theorem 7, among these three, the pessimistic multigranulation approximation generates the
least misclassification costs. Theorems 6 and 7 reflect the advantages of the multigranulation
approximation sets when they are used for approximating the target concept.
Proof of Theorem 8.

DC(∑_{i=1}^m A_i^O(X)) = ∑_{x ∈ BN1^O_{∑ Ai}(X)} λY + ∑_{x ∈ BN2^O_{∑ Ai}(X)} λN
= ∑_{x ∈ BN1^O_{∑ Ai}(X)} λ12(1 − μ(x)) + ∑_{x ∈ BN2^O_{∑ Ai}(X)} λ21 μ(x).

Since μ^O_{∑^{m−1} Ai}(x) ≤ μ^O_{∑^m Ai}(x), obviously DC(∑_{i=1}^{m−1} A_i^O(X)) ≥ DC(∑_{i=1}^m A_i^O(X)).

Hence, for optimistic MGRS, to reduce the misclassification costs of the
approximation, we can add an attribute that only changes the membership of objects in
BN1^O_{∑^{m−1} Ai}(X).
Proof of Theorem 9.

DC(∑_{i=1}^m A_i^P(X)) = ∑_{x ∈ BN1^P_{∑ Ai}(X)} λY + ∑_{x ∈ BN2^P_{∑ Ai}(X)} λN
= ∑_{x ∈ BN1^P_{∑ Ai}(X)} λ12(1 − μ(x)) + ∑_{x ∈ BN2^P_{∑ Ai}(X)} λ21 μ(x).

Since μ^P_{∑^{m−1} Ai}(x) ≥ μ^P_{∑^m Ai}(x), obviously DC(∑_{i=1}^{m−1} A_i^P(X)) ≥ DC(∑_{i=1}^m A_i^P(X)).

From Theorem 9, for pessimistic MGRS, to reduce the misclassification costs of the
approximation, we can add an attribute that only changes the membership of objects in
BN2^P_{∑^{m−1} Ai}(X).
In practical applications, on the one hand, the factors included in the test cost, such as
money, time, and environment, are hard to evaluate objectively; on the other hand, these
factors are hard to integrate because of their different dimensions. In this section, we
therefore evaluate test costs in an attribute-driven form, which is more objective.
In this paper, for simplicity, to present the optimal granularity selection of the
multigranulation approximation, we only use the optimistic MGRS as an example.
TC^m_{∑ A_i^O} = ∑_{i=1}^m TC_{Ai}. (24)
In this paper, the misclassification and test costs required by the user are represented
as DCu and TCu, respectively. A multigranulation approximation ∑_{i=1}^k A_i^O(X) is selected to
meet the constraints DC^k_{∑ A_i^O(X)} ≤ DCu and TC^k_{∑ A_i^O(X)} ≤ TCu, and the related decisions
are then made on ∑_{i=1}^k A_i^O(X). Figure 3 presents the optimal multigranulation approximation
selection of optimistic MGRS. Herein, ∑_{i=1}^3 A_i^O(X) complies with the requirement on the
misclassification costs but fails to comply with the requirement on the test costs; A_1^O(X)
complies with the requirement on the test costs but fails to comply with the requirement
on the misclassification costs; ∑_{i=1}^2 A_i^O(X) complies with both requirements of misclassification
costs and test costs, enabling effective calculations according to granularity optimization.
Similarly, the optimal approximation selection of pessimistic MGRS proceeds in the same way. We
formalize the computation as an optimization problem:

min Cost^k_{∑ A_i^O(X)}
s.t.
ξ·DC^k_{∑ A_i^O(X)} ≤ DCu;
TC^k_{∑ A_i^O(X)} ≤ TCu. (25)
where Cost^k_{∑ A_i^O(X)} = ξ·DC^k_{∑ A_i^O(X)} + TC^k_{∑ A_i^O(X)} denotes the total cost for
constructing ∑_{i=1}^k A_i^O(X), and ξ = (|U|/m) · (1/DC(∑_{i=1}^m A_i^O(X))) reflects the contribution
degree of the multigranulation approximation layer to the misclassification costs of ∑_{i=1}^k A_i^O(X).
Figure 3. The optimal granularity selection of the optimistic multigranulation approximation.
The red circles in the figure represent the set X.
Firm E1 E2 E3 E4 E5 D
x1 3 3 3 3 1 High
x2 2 1 2 3 2 High
x3 2 1 2 1 2 High
· · · · · · ·
· · · · · · ·
· · · · · · ·
x898 3 3 2 2 3 Low
x899 3 1 3 3 1 Low
x900 1 1 3 3 1 Low
Attribute E1 E2 E3 E4 E5
Sig( a, C, D ) 0.74 0.56 0.54 0.36 0.77
       A_1^O(X)   ∑_{i=1}^2 A_i^O(X)   ∑_{i=1}^3 A_i^O(X)   ∑_{i=1}^4 A_i^O(X)   ∑_{i=1}^5 A_i^O(X)
ξDC    8.3        5.9                  4.7                  3.6                  3.5
TC     0.36       0.9                  1.46                 2.2                  2.97
Cost   8.66       6.8                  6.16                 5.8                  6.47
Cost^k_{∑ A_i^O(X)} changes with the added attributes, and only Cost^3_{∑ A_i^O(X)} and Cost^4_{∑ A_i^O(X)}
satisfy DC^k_{∑ A_i^O(X)} ≤ DCu and TC^k_{∑ A_i^O(X)} ≤ TCu at the same time. According to
Formula (25), we choose the multigranulation approximation layer with the lowest total cost
from the above layers; its corresponding approximation layer is ∑_{i=1}^3 A_i^O(X). Therefore,
∑_{i=1}^3 A_i^O(X) is the optimal multigranulation approximation used for deciding investment
plans, because it possesses lower misclassification costs; i.e., from the perspective of
optimistic MGRS, E4, E3, and E2 are reasonable expert sets. The analysis of the case study
shows that the proposed method can search for a reasonable approximation under the
constraint conditions.
ID   Dataset         Attribute Characteristics   Instances   Condition Attributes
1    Bank            Integer                     39          12
2    Breast-Cancer   Integer                     699         9
3    Car             Integer                     1728        6
4    ENB2012data     Real                        768         8
5    Mushroom        Integer                     8124        22
6    Tic             Integer                     958         9
7    Air Quality     Real                        9358        12
8    Concrete        Real                        1030        8
9    Hcv             Real                        569         10
10   Wisconsin       Real                        699         9
11   Zoo             Integer                     101         16
12   Balance         Integer                     625         4
From Figure 4, for classical rough sets, the misclassification costs of the approximation model monotonically decrease as the granularity becomes finer, which complies with human cognitive habits.
[Figure 4 panels: Decision Cost (vertical axis) versus Granularity GL1–GL6 (horizontal axis), one panel per dataset.]
Figure 4. The misclassification cost with the changing granularity on each dataset.
[Figure 5 panels: Decision Cost (vertical axis) versus Granularity GL1–GL6 (horizontal axis) for the two boundary regions on each dataset.]
Figure 5. The misclassification cost of two boundary regions under different granularities.
In Figure 6, the horizontal and vertical axes denote the granularity and misclassification costs, respectively. O DC_lower, O DC_upper and O DC represent $DC(\underline{\sum_{i=1}^{m} A_i^O}(X))$, $DC(\overline{\sum_{i=1}^{m} A_i^O}(X))$ and $DC(\sum_{i=1}^{m} A_i^O(X))$; P DC_lower, P DC_upper and P DC represent $DC(\underline{\sum_{i=1}^{m} A_i^P}(X))$, $DC(\overline{\sum_{i=1}^{m} A_i^P}(X))$ and $DC(\sum_{i=1}^{m} A_i^P(X))$, respectively, namely, the misclassification costs generated when $\underline{\sum_{i=1}^{m} A_i^O}(X)$, $\overline{\sum_{i=1}^{m} A_i^O}(X)$ and $\sum_{i=1}^{m} A_i^O(X)$ are approximated to X and the misclassification costs generated when $\underline{\sum_{i=1}^{m} A_i^P}(X)$, $\overline{\sum_{i=1}^{m} A_i^P}(X)$ and $\sum_{i=1}^{m} A_i^P(X)$ are approximated to X. Obviously, compared with $\underline{\sum_{i=1}^{m} A_i^O}(X)$ and $\overline{\sum_{i=1}^{m} A_i^O}(X)$, the misclassification costs of $\sum_{i=1}^{m} A_i^O(X)$ are the least on each granularity. Similarly, compared with $\underline{\sum_{i=1}^{m} A_i^P}(X)$ and $\overline{\sum_{i=1}^{m} A_i^P}(X)$, the misclassification costs of $\sum_{i=1}^{m} A_i^P(X)$ are the least on each granularity. This is consistent with Theorems 5 and 6.
[Figure 6 panels: Decision Cost versus Granularity GL1–GL6 on each dataset; curves: O DC_lower, O DC_upper, O DC, P DC_lower, P DC_upper, P DC.]
7. Conclusions
In MGRS, optimistic and pessimistic upper/lower approximations are utilized to characterize uncertain concepts. However, they cannot take advantage of the known equivalence classes to establish the approximation of an uncertain concept. To handle this problem, cost-sensitive multigranulation approximations of rough sets were constructed. Furthermore, an optimization mechanism for the multigranulation approximations was proposed, which selects the optimal approximation that obtains the minimum misclassification costs under the constraint conditions. The case study shows that the proposed algorithm is capable of searching for a rational approximation under restraints. Finally, the experiments demonstrate that the multigranulation approximations possess the least misclassification costs. In particular, our models apply to decision-making environments where each decision-maker is independent. Moreover, our models are useful for extracting decision rules from distributive information systems and groups of intelligent agents through rough set approaches [34,36]. Figure 7 presents a diagram that summarizes the work conducted in this paper. Herein, we present the process of the cost-sensitive multigranulation approximations of rough sets; according to different granulation mechanisms, our approach can be extended to other uncertainty models, e.g., vague sets, shadow sets, and neighborhood rough sets. These results will contribute to the progress of GrC theory.
Our future work will focus on the following two aspects: (1) we hope to build a more reasonable three-way decision model based on our model from the optimistic and pessimistic perspectives; (2) we wish to combine the model with cloud model theory to construct a multigranulation approximation model with bidirectional cognitive computing. This will offer more cognitive advantages and benefits in application fields with uncertainty from multiple perspectives, e.g., image segmentation, clustering, and recommendation systems.
Author Contributions: Conceptualization, J.Y.; methodology, J.Y., J.K. and Q.L.; writing—original
draft, J.Y. and J.K.; writing—review and editing, J.Y., J.K., Q.L. and Y.L.; data curation, J.Y., Q.L. and
Y.L.; supervision, Y.L. All authors have read and agreed to the published version of the manuscript.
Funding: This work is supported by the National Science Foundation of China (no. 6206049), Ex-
cellent Young Scientific and Technological Talents Foundation of Guizhou Province (QKH-platform
talent (2021) no. 5627), the Key Cooperation Project of Chongqing Municipal Education Commission
(HZ2021008), Guizhou Provincial Science and Technology Project (QKH-ZK (2021) General 332), Science
and Technology Top Talent Project of Guizhou Education Department (QJJ2022(088)), Key Laboratory
of Evolutionary Artificial Intelligence in Guizhou (QJJ[2022] No. 059), the Key Talents Program in
digital economy of Guizhou Province, and the Electronic Manufacturing Industry University Research Base of
Ordinary Colleges and Universities in Guizhou Province (QJH-KY Zi (2014) no. 230-2).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: This study was mainly completed at the Chongqing Key Laboratory of Compu-
tational Intelligence, Chongqing University of Posts and Telecommunications, and the authors would
like to thank the laboratory for its assistance.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Zadeh, L.A. Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets
Syst. 1997, 90, 111–127. [CrossRef]
2. Bello, M.; Nápoles, G.; Vanhoof, K.; Bello, R. Data quality measures based on granular computing for multi-label classification.
Inf. Sci. 2021, 560, 51–67. [CrossRef]
3. Pedrycz, W.; Chen, S. Interpretable Artificial Intelligence: A Perspective of Granular Computing; Springer Nature: Berlin/Heidelberg,
Germany, 2021; Volume 937.
4. Li, J.; Mei, C.; Xu, W.; Qian, Y. Concept learning via granular computing: A cognitive viewpoint. Inf. Sci. 2015,
298, 447–467. [CrossRef]
5. Zadeh, L. Fuzzy sets. Inf. Control. 1965, 8, 338–353. [CrossRef]
6. Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. 1982, 11, 341–356. [CrossRef]
7. Zhang, L.; Zhang, B. The quotient space theory of problem solving. In Proceedings of the International Workshop on Rough
Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing, Chongqing, China, 26–29 May 2003; pp. 11–15.
8. Li, D.Y.; Meng, H.J.; Shi, X.M. Membership clouds and membership cloud generators. J. Comput. Res. Dev. 1995, 32, 15–20.
9. Colas-Marquez, R.; Mahfouf, M. Data Mining and Modelling of Charpy Impact Energy for Alloy Steels Using Fuzzy Rough Sets.
IFAC-Pap. 2017, 50, 14970–14975. [CrossRef]
10. Hasegawa, K.; Koyama, M.; Arakawa, M.; Funatsu, K. Application of data mining to quantitative structure-activity relationship
using rough set theory. Chemom. Intell. Lab. Syst. 2009, 99, 66–70. [CrossRef]
11. Santra, D.; Basu, S.K.; Mandal, J.K.; Goswami, S. Rough set based lattice structure for knowledge representation in medical expert
systems: Low back pain management case study. Expert Syst. Appl. 2020, 145, 113084. [CrossRef]
12. Chebrolu, S.; Sanjeevi, S.G. Attribute Reduction in Decision-Theoretic Rough Set Model using Particle Swarm Optimization with
the Threshold Parameters Determined using LMS Training Rule. Procedia Comput. Sci. 2015, 57, 527–536. [CrossRef]
13. Abdolrazzagh-Nezhad, M.; Radgohar, H.; Salimian, S.N. Enhanced cultural algorithm to solve multi-objective attribute reduction
based on rough set theory. Math. Comput. Simul. 2020, 170, 332–350. [CrossRef]
14. Beaubier, S.; Defaix, C.; Albe-Slabi, S.; Aymes, A.; Galet, O.; Fournier, F.; Kapel, R. Multiobjective decision making strategy
for selective albumin extraction from a rapeseed cold-pressed meal based on Rough Set approach. Food Bioprod. Process. 2022,
133, 34–44. [CrossRef]
15. Landowski, M.; Landowska, A. Usage of the rough set theory for generating decision rules of number of traffic vehicles. Transp.
Res. Procedia 2019, 39, 260–269. [CrossRef]
16. Tawhid, M.; Ibrahim, A. Feature selection based on rough set approach, wrapper approach, and binary whale optimization
algorithm. Int. J. Mach. Learn. Cybern. 2020, 11, 573–602. [CrossRef]
17. Zhang, Q.H.; Wang, G.Y.; Yu, X. Approximation sets of rough sets. J. Softw. 2012, 23, 1745–1759. [CrossRef]
18. Zhang, Q.H.; Wang, J.; Wang, G.Y. The approximate representation of rough-fuzzy sets. Chin. J. Comput. Jisuanji Xuebao 2015,
38, 1484–1496.
19. Zhang, Q.; Wang, J.; Wang, G.; Yu, H. The approximation set of a vague set in rough approximation space. Inf. Sci. 2015, 300, 1–19.
[CrossRef]
20. Zhang, Q.H.; Zhang, P.; Wang, G.Y. Research on approximation set of rough set based on fuzzy similarity. J. Intell. Fuzzy Syst.
2017, 32, 2549–2562. [CrossRef]
21. Zhang, Q.H.; Yang, J.J.; Yao, L.Y. Attribute reduction based on rough approximation set in algebra and information views. IEEE
Access 2016, 4, 5399–5407. [CrossRef]
22. Yao, L.Y.; Zhang, Q.H.; Hu, S.P.; Zhang, Q. Rough entropy for image segmentation based on approximation sets and particle
swarm optimization. J. Front. Comput. Sci. Technol. 2016, 10, 699–708.
23. Zhang, Q.H.; Liu, K.X.; Gao, M. Approximation sets of rough sets and granularity optimization algorithm based on cost-sensitive.
J. Control. Decis. 2020, 35, 2070–2080.
24. Yang, J.; Yuan, L.; Luo, T. Approximation set of rough fuzzy set based on misclassification cost. J. Chongqing Univ. Posts
Telecommun. (Nat. Sci. Ed.) 2021, 33, 780–791.
25. Yang, J.; Luo, T.; Zeng, L.J.; Jin, X. The cost-sensitive approximation of neighborhood rough sets and granular layer selection. J.
Intell. Fuzzy Syst. 2022, 42, 3993–4003. [CrossRef]
26. Siminski, K. 3WDNFS—Three-way decision neuro-fuzzy system for classification. Fuzzy Sets Syst. 2022, in press. [CrossRef]
27. Subhashini, L.; Li, Y.; Zhang, J.; Atukorale, A.S. Assessing the effectiveness of a three-way decision-making framework with
multiple features in simulating human judgement of opinion classification. Inf. Process. Manag. 2022, 59, 102823. [CrossRef]
28. Subhashini, L.; Li, Y.; Zhang, J.; Atukorale, A.S. Integration of semantic patterns and fuzzy concepts to reduce the boundary
region in three-way decision-making. Inf. Sci. 2022, 595, 257–277. [CrossRef]
29. Mondal, A.; Roy, S.K.; Pamucar, D. Regret-based three-way decision making with possibility dominance and SPA theory in
incomplete information system. Expert Syst. Appl. 2023, 211, 118688. [CrossRef]
30. Yao, Y.Y. Symbols-Meaning-Value (SMV) space as a basis for a conceptual model of data science. Int. J. Approx. Reason. 2022,
144, 113–128. [CrossRef]
31. Qian, Y.H.; Liang, J.Y.; Dang, C.Y. Incomplete multigranulation rough set. IEEE Trans. Syst. Man-Cybern.-Part Syst. Humans 2009,
40, 420–431. [CrossRef]
32. Huang, B.; Guo, C.X.; Zhuang, Y.L.; Li, H.X.; Zhou, X.Z. Intuitionistic fuzzy multigranulation rough sets. Inf. Sci. 2014,
277, 299–320. [CrossRef]
33. Li, F.J.; Qian, Y.H.; Wang, J.T.; Liang, J. Multigranulation information fusion: A Dempster-Shafer evidence theory-based clustering
ensemble method. Inf. Sci. 2017, 378, 389–409. [CrossRef]
34. Liu, X.; Qian, Y.H.; Liang, J.Y. A rule-extraction framework under multigranulation rough sets. Int. J. Mach. Learn. Cybern. 2014,
5, 319–326. [CrossRef]
35. Liu, K.; Li, T.; Yang, X.; Ju, H.; Yang, X.; Liu, D. Hierarchical neighborhood entropy based multi-granularity attribute reduction
with application to gene prioritization. Int. J. Approx. Reason. 2022, 148, 57–67. [CrossRef]
36. Qian, Y.H.; Liang, X.Y.; Lin, G.P.; Guo, Q.; Liang, J. Local multigranulation decision-theoretic rough sets. Int. J. Approx. Reason.
2017, 82, 119–137. [CrossRef]
37. Qian, Y.H.; Zhang, H.; Sang, Y.L.; Liang, J. Multigranulation decision-theoretic rough sets. Int. J. Approx. Reason. 2014, 55, 225–237.
[CrossRef]
38. Xu, W.; Yuan, K.; Li, W. Dynamic updating approximations of local generalized multigranulation neighborhood rough set. Appl.
Intell. 2022, 52, 9148–9173. [CrossRef]
39. Sun, L.; Wang, L.; Ding, W.; Qian, Y.; Xu, J. Feature selection using fuzzy neighborhood entropy-based uncertainty measures for
fuzzy neighborhood multigranulation rough sets. IEEE Trans. Fuzzy Syst. 2020, 29, 19–33. [CrossRef]
40. She, Y.H.; He, X.L.; Shi, H.X.; Qian, Y. A multiple-valued logic approach for multigranulation rough set model. Int. J. Approx.
Reason. 2017, 82, 270–284. [CrossRef]
41. Li, W.; Xu, W.; Zhang, X.Y.; Zhang, J. Updating approximations with dynamic objects based on local multigranulation rough sets
in ordered information systems. Artif. Intell. Rev. 2021, 55, 1821–1855. [CrossRef]
42. Zhang, C.; Li, D.; Zhai, Y.; Yang, Y. Multigranulation rough set model in hesitant fuzzy information systems and its application in
person-job fit. Int. J. Mach. Learn. Cybern. 2019, 10, 717–729. [CrossRef]
43. Hu, C.; Zhang, L.; Wang, B.; Zhang, Z.; Li, F. Incremental updating knowledge in neighborhood multigranulation rough sets
under dynamic granular structures. Knowl.-Based Syst. 2019, 163, 811–829. [CrossRef]
44. Hu, C.; Zhang, L. Dynamic dominance-based multigranulation rough sets approaches with evolving ordered data. Int. J. Mach.
Learn. Cybern. 2021, 12, 17–38. [CrossRef]
electronics
Article
Relative Knowledge Distance Measure of Intuitionistic
Fuzzy Concept
Jie Yang 1,2, *, Xiaodan Qin 1 , Guoyin Wang 1 , Xiaoxia Zhang 1 and Baoli Wang 3
Abstract: Knowledge distance is used to measure the difference between granular spaces, and it is an uncertainty measure with strong distinguishing ability in rough set theory. However, existing knowledge distances fail to take the relative difference between granular spaces into account under a given perspective of uncertain concepts. To solve this problem, this paper studies the relative knowledge distance of the intuitionistic fuzzy concept (IFC). Firstly, a micro-knowledge distance (md) based on information entropy is proposed to measure the difference between intuitionistic fuzzy information granules. Then, based on md, a macro-knowledge distance (MD) with strong distinguishing ability is further constructed, and the rule that MD is monotonic as the granularity becomes finer in multi-granularity spaces is revealed. Furthermore, the relative MD is further proposed to analyze the relative differences between different granular spaces from multiple perspectives. Finally, the effectiveness of the relative MD is verified by relevant experiments, in which the relative MD successfully measures the differences in granular spaces from multiple perspectives. Compared with other attribute reduction algorithms, the number of subsets after reduction by our algorithm is intermediate, and the mean-square error value is appropriate.
Keywords: intuitionistic fuzzy concept; rough set; multi-granularity; relative macro-knowledge distance
the concept of knowledge distance, and there have been several works on knowledge dis-
tance in recent years. Li [16] proposed an interval-valued intuitionistic fuzzy set to describe
fuzzy granular structure distance, and proved that knowledge distance is a special form of
intuitionistic fuzzy granular structure distance. Yang [17,18] proposed a partition-based
knowledge distance based on the Earth Mover’s Distance and further established the fuzzy
knowledge distance. Chen [19] presented a new measure formula of knowledge distance
by using Jaccard distance to replace set similarity. To measure the uncertainty derived from
the disparities between local upper and lower approximation sets, Xia [20] introduced the
local knowledge distance.
In practical applications, the target concept may be vague or uncertain. As a classical
soft computing tool, the intuitionistic fuzzy set [21] extends the membership from a single
value to an interval value. For uncertain information, the intuitionistic fuzzy set is more
powerful than the fuzzy set [22], and it is currently applied extensively in different fields,
e.g., decision-making [23–25], pattern recognition [26,27], control and reasoning [28,29],
and fuzzy reasoning [30,31]. In rough set theory, an intuitionistic fuzzy concept (IFC) can be
characterized by a pair of lower and upper approximation fuzzy sets. There are many
research works [32–37] on the combination between rough set and the intuitionistic fuzzy
set. In particular, the uncertainty measure of IFC in granular spaces becomes a basic
issue. A novel concept of an intuitionistic fuzzy rough set based on two universes was
proposed by Zhang [32] along with a specification of the associated operators. On the basis
of the rough set, Dubey [35] presented an intuitionistic fuzzy c-means clustering algorithm
and applied it to the segmentation of the magnetic resonance brain images. Zheng [36]
proposed an improved roughness method to measure the uncertainty of covering-based
rough intuitionistic fuzzy sets. These works indicate that the intuitionistic fuzzy set and the rough set are suitable mathematical methods for studying vagueness and uncertainty. However, current uncertainty measures fail to distinguish different rough granular spaces with the same uncertainty when they are used to describe an IFC; that is, it is difficult to reflect the differences between them. In some situations, however, such as attribute reduction or granularity selection, it is necessary to distinguish the different rough granular spaces that describe an IFC. To solve this problem, based on our previous works [17,18], two-layer knowledge distance measures, i.e., the micro-knowledge distance (md) and the macro-knowledge distance (MD), are constructed to reflect the difference between granular spaces for describing an IFC. Finally, in order to analyze the relative differences between rough granular spaces under certain prior granular spaces, the concept of the relative MD applied to data analysis is also proposed.
The following are the main contributions of our paper: (1) Based on information
entropy, md is designed to measure the difference among intuitionistic fuzzy information
granules. (2) On the basis of md, MD with strong distinguishing ability is further con-
structed, which can calculate the difference between rough granular spaces for describing
an IFC. (3) The relative MD is proposed to analyze the relative difference between two rough
granular spaces from multiple perspectives. (4) An algorithm of attribute reduction based
on MD or relative MD is presented, and its effectiveness is verified by relevant experiments.
The rest of this paper is arranged as follows. Section 2 introduces related preliminary
concepts. In Section 3, the two types of information entropy-based distance measure
(md and MD) are presented. Section 4 presents the concept of relative MD. The relevant
experiments are reported in Section 5. Finally, in Section 6, conclusions are drawn.
2. Preliminaries
This part will go through some of the core concepts. Let S = (U, C ∪ D, V, f ) be an
information system, where U, C, D and V represent the universe of discourse, condition
attribute set, decision attribute set and attribute value set corresponding to each object,
respectively, and f : U × C → V is an information function that specifies the attribute value of
each object x in U.
Electronics 2022, 11, 3373
Definition 1 (Intuitionistic fuzzy set). Assume that U is the universe of discourse, the following
is the definition of an intuitionistic fuzzy set I on U:
I = {< x, γ I ( x ), υ I ( x ) > | x ∈ U }
where γ I ( x ) and υ I ( x ) denote two nonempty finite sets on the interval [0, 1], which refer to the set
of degrees of membership and non-membership of x on I, respectively, and satisfy the conditions:
∀ xi ∈ U, 0 ≤ γ I ( xi ) + υ I ( xi ) ≤ 1.
Note: For convenience, all I below are represented as intuitionistic fuzzy sets on U.
Definition 2 (Average step intuitionistic fuzzy set [38]). Assume that in $S = (U, C \cup D)$, $R \subseteq C$ and $U/R = \{[x]_R\} = \{[x]_1, [x]_2, \cdots, [x]_l\}$, where $\forall x \in [x]_i$, $i = 1, 2, \cdots, l$; then,

$$\bar{I}_R(x) = [\gamma_{\bar{I}_R}(x),\ 1 - \upsilon_{\bar{I}_R}(x)]$$

where $\gamma_{\bar{I}_R}(x) = \frac{\sum_{x \in [x]_i} \gamma_{I_R}(x)}{|[x]_i|}$ and $\upsilon_{\bar{I}_R}(x) = \frac{\sum_{x \in [x]_i} \upsilon_{I_R}(x)}{|[x]_i|}$. $\bar{I}_R(x)$ is therefore referred to as an average step intuitionistic fuzzy set on $U/R$.
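Definition 2 amounts to a per-block averaging step. A minimal sketch (not from the paper; the dictionary inputs of membership and non-membership degrees are hypothetical):

```python
# Sketch: build the average step intuitionistic fuzzy set of Definition 2
# by averaging the membership (gamma) and non-membership (upsilon) degrees
# over each equivalence class [x]_i of U/R; every object in a block then
# carries the interval [gamma_bar(x), 1 - upsilon_bar(x)].

def average_step(gamma, upsilon, partition):
    """Return a dict mapping each object to its interval (rounded for display)."""
    interval = {}
    for block in partition:
        g = sum(gamma[x] for x in block) / len(block)    # averaged membership
        u = sum(upsilon[x] for x in block) / len(block)  # averaged non-membership
        for x in block:
            interval[x] = (round(g, 6), round(1 - u, 6))
    return interval

# Hypothetical degrees for a two-object equivalence class:
gamma = {"x1": 0.3, "x2": 0.5}
upsilon = {"x1": 0.4, "x2": 0.3}
print(average_step(gamma, upsilon, [["x1", "x2"]])["x1"])  # → (0.4, 0.65)
```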
Then

$$e_{\bar{I}_R}(x_1) = e_{\bar{I}_R}(x_2) = e_{\bar{I}_R}(x_3) = -2\int_{0.3}^{0.5} \mu \log_2 \mu \, d\mu = 0.209$$

$$e_{\bar{I}_R}(x_4) = e_{\bar{I}_R}(x_5) = -2\int_{0.7}^{0.8} \mu \log_2 \mu \, d\mu = 0.062$$
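The two integral values can be checked numerically; a sketch (not from the paper) using the closed-form antiderivative of $\mu \log_2 \mu$:

```python
import math

# Numeric check of the two entropy integrals above, using the antiderivative
# F(u) = u^2/2 * log2(u) - u^2/(4 ln 2), whose derivative is u * log2(u).

def entropy_segment(a, b):
    """Return -2 * integral_a^b u*log2(u) du."""
    F = lambda u: u * u / 2 * math.log2(u) - u * u / (4 * math.log(2))
    return -2 * (F(b) - F(a))

print(round(entropy_segment(0.3, 0.5), 3))  # → 0.209
print(round(entropy_segment(0.7, 0.8), 3))  # → 0.062
```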
By Formula (2),
Definition 3 (Distance measure [39]). Assume that U is the universe of discourse; Y, P and Q are
three finite sets on U. When d(·, ·) meets the following criteria, it is considered a distance measure:
(1) Positive: d( P, Q) ≥ 0;
(2) Symmetric: d( P, Q) = d( Q, P);
(3) Triangle inequality: d(Y, P) + d( P, Q) ≥ d(Y, Q).
By Formula (2),
It shows that calculating the average information entropy does not necessarily distinguish and describe two different rough granular spaces. Although the average information entropy values of U/R1 and U/R2 are the same, U/R2 is superior to U/R1 in terms of granularity selection, since U/R2 has a coarser granularity and a stronger generalization ability for describing an IFC.
Assume that in S = (U, C ∪ D), A is a finite set on U. Then, we call the intuitionistic fuzzy set generated by A the intuitionistic fuzzy information granule (IFG), abbreviated as IFG_A.
Proof of Theorem 1. Let IFGY , IFGP and IFGQ be three intuitionistic fuzzy information
granules. Let:
Because (Y ∪ P − Y ∩ P) + ( P ∪ Q − P ∩ Q) ≥ Y ∪ Q − Y ∩ Q, then a + b ≥ c.
Then,

$$e_I(x_i) = -2\int_{\gamma_I(x_i)}^{1-\upsilon_I(x_i)} \mu \log_2 \mu \, d\mu$$

$$E_I(x) = \sum_{x_i \in U} e_I(x_i) = 1.318$$
From Definition 6,
Then,
Therefore, md(Y, P) ≤ md(Y, Q) holds. Similarly, it is easy to get md( P, Q) ≤ md(Y, Q).
Then

$$md(Y, P) + md(P, Q) = \frac{\sum\limits_{x_i \in U} e_{IFG_Y \cup IFG_P}(x_i) - \sum\limits_{x_i \in U} e_{IFG_Y \cap IFG_P}(x_i)}{E_I(x)} + \frac{\sum\limits_{x_i \in U} e_{IFG_P \cup IFG_Q}(x_i) - \sum\limits_{x_i \in U} e_{IFG_P \cap IFG_Q}(x_i)}{E_I(x)} = \frac{\sum\limits_{x_i \in U} e_{IFG_Q}(x_i) - \sum\limits_{x_i \in U} e_{IFG_Y}(x_i)}{E_I(x)} = md(Y, Q)$$
Based on md, this research further created MD, which is formulated as follows, to express the difference between two rough granular spaces for characterizing an IFC.
U/R1 = {{ x1 , x2 }, { x3 }, { x4 , x5 }} = { g1 , g2 , g3 }
g1 = s R 1 ( x 1 ) = s R 1 ( x 2 ) = { x 1 , x 2 }
g2 = s R 1 ( x 3 ) = { x 3 }
g3 = s R 1 ( x 4 ) = s R 1 ( x 5 ) = { x 4 , x 5 }
$$MD(U/R_1, U/R_2) + MD(U/R_2, U/R_3) = \frac{1}{|U|}\sum_{i=1}^{n}\sum_{j=1}^{m} md_{ij} f_{ij} + \frac{1}{|U|}\sum_{j=1}^{m}\sum_{k=1}^{l} md_{jk} f_{jk} = \frac{1}{|U|}\sum_{x_i \in U}\big(md(s_{R_1}(x_i), s_{R_2}(x_i)) + md(s_{R_2}(x_i), s_{R_3}(x_i))\big)$$

$$MD(U/R_1, U/R_3) = \frac{1}{|U|}\sum_{i=1}^{n}\sum_{j=1}^{l} md_{ij} f_{ij} = \frac{1}{|U|}\sum_{x_i \in U} md(s_{R_1}(x_i), s_{R_3}(x_i)) \quad (4)$$

$$\frac{1}{|U|}\sum_{x_i \in U}\big(md(s_{R_1}(x_i), s_{R_2}(x_i)) + md(s_{R_2}(x_i), s_{R_3}(x_i))\big) \geq \frac{1}{|U|}\sum_{x_i \in U} md(s_{R_1}(x_i), s_{R_3}(x_i))$$
In fact, md measures the difference between two sets, and MD measures the difference
between two rough granular spaces, which integrates the md of all sets of the two granular
spaces. According to Theorem 1, Theorem 4 and Formula (4), as long as md in MD is a
distance measure, then MD is a distance measure.
$$MD(U/R_1, U/R_2) = \frac{1}{|U|}\sum_{i=1}^{3}\sum_{j=1}^{2} md_{ij} f_{ij} = \frac{md_{11} + md_{21} \times 2 + md_{32} \times 2}{5} = \frac{0.356 + 0.209 \times 2 + 0 \times 2}{5 \times 0.686} = 0.226$$
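As a quick arithmetic check (illustrative, not from the paper): the per-object e-differences 0.356, 0.209, 0.209, 0, 0 are normalized by $E_I(x) = 0.686$ and averaged over the $|U| = 5$ objects:

```python
# Verify the MD value of the example: average the normalized per-object
# micro-knowledge distances over the universe.
e_diff = [0.356, 0.209, 0.209, 0.0, 0.0]  # per-object e-differences
E = 0.686                                  # normalizing entropy E_I(x)
MD = sum(v / E for v in e_diff) / len(e_diff)
print(round(MD, 3))  # → 0.226
```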
$$MD(U/R_1, U/R_2) = \frac{1}{|U|}\sum_{i=1}^{n}\sum_{j=1}^{m} md_{ij} f_{ij} = \frac{1}{|U|}\big(md(g_1, g_1')|g_1'| + md(g_1, g_2')|g_2'|\big)$$

$$MD(U/R_1, U/R_3) = \frac{1}{|U|}\sum_{i=1}^{n}\sum_{j=1}^{l} md_{ij} f_{ij} = \frac{1}{|U|}\big(md(g_1, g_1'')|g_1''| + md(g_1, g_2'')|g_2''| + md(g_1, g_3'')|g_3''|\big)$$

Because $g_1' = g_1'' \cup g_2''$ and $g_2' = g_3''$,
In this paper, the finest and coarsest granular spaces are represented by ω and σ,
respectively. The following corollaries derive from Theorem 5:
Similarly,

$$MD(U/R_2, U/R_3) = \frac{1}{|U|}\big(md(g_1', g_1'')|g_1''| + md(g_1', g_2'')|g_2''|\big)$$

because $g_1' = g_1'' \cup g_2''$ and $g_2' = g_3''$.
According to Theorem 3,
Corollary 5. Assume that in $S = (U, C \cup D)$, $R \subseteq C$. Then $MD(U/R, \omega) + MD(U/R, \sigma) = \frac{|U|-1}{|U|}$.
Proof of Corollary 5.

$$MD(U/R, \omega) = \frac{1}{|U|}\sum_{i=1}^{n}\sum_{j=1}^{|U|} md_{ij} f_{ij} = \frac{1}{|U|}\sum_{x_i \in U} md(s_R(x_i), \{x_i\}) = \frac{1}{|U|}\sum_{x_i \in U} \frac{\sum_x e_{s_R(x_i)-\{x_i\}}(x)}{E_I(x)}$$

$$MD(U/R, \sigma) = \frac{1}{|U|}\sum_{i=1}^{n}\sum_{j=1}^{1} md_{ij} f_{ij} = \frac{1}{|U|}\sum_{x_i \in U} md(s_R(x_i), U) = \frac{1}{|U|}\sum_{x_i \in U} \frac{\sum_x e_{C_U s_R(x_i)}(x)}{E_I(x)}$$

$$MD(U/R, \omega) + MD(U/R, \sigma) = \frac{1}{|U|}\sum_{x_i \in U} \frac{\sum_x e_{s_R(x_i)-\{x_i\}}(x)}{E_I(x)} + \frac{1}{|U|}\sum_{x_i \in U} \frac{\sum_x e_{C_U s_R(x_i)}(x)}{E_I(x)} = \frac{1}{|U|}\sum_{x_i \in U} \frac{\sum_x e_{U-\{x_i\}}(x)}{E_I(x)} = \frac{1}{|U|}\sum_{x_i \in U} \frac{E_I(x) - e(x_i)}{E_I(x)} = \frac{1}{|U|} \times \frac{|U| \times E_I(x) - E_I(x)}{E_I(x)} = \frac{|U|-1}{|U|}$$

Therefore, $MD(U/R, \omega) + MD(U/R, \sigma) = \frac{|U|-1}{|U|}$ holds.
From Corollary 3 and Theorem 6, for an IFC, the larger the granularity difference between granular spaces in a hierarchical granular structure, the larger the MD between them. From Corollary 4 and Theorem 7, for an IFC, the larger the information difference between granular spaces in a hierarchical granular structure, the larger the MD between them. From Corollary 5, the larger the information measure, the smaller the granularity measure, and one measure value can be deduced from the other.
Note: By using a suitable md in Formula (3), the method of this paper can be extended to quantify the difference between any type of granular space. These specifics are outside the scope of this paper's discussion.
$$RMD((U/R_1, U/R_2)/(U/R)) = \frac{1}{|U|}\sum_{x_i \in U} md(s_{R_1/R}(x_i), s_{R_2/R}(x_i)) \quad (5)$$

where $s_{R_1/R}(x_i) = s_{R_1}(x_i) \cap s_R(x_i)$ and $s_{R_2/R}(x_i) = s_{R_2}(x_i) \cap s_R(x_i)$.
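Formula (5) can be sketched as a small skeleton; here the entropy-based md is replaced by a toy symmetric-difference stand-in so the sketch stays self-contained (all names and values are illustrative, not from the paper):

```python
# Sketch of Formula (5): intersect each object's blocks with the prior
# granular space, then average a supplied md over the universe.

def blocks(partition):
    """Map each object to its equivalence block s_R(x)."""
    return {x: frozenset(b) for b in partition for x in b}

def rmd(r1, r2, prior, universe, md):
    # s_{R1/R}(x) = s_{R1}(x) ∩ s_R(x), likewise for R2
    return sum(md(r1[x] & prior[x], r2[x] & prior[x]) for x in universe) / len(universe)

U = ["x1", "x2", "x3", "x4"]
R1 = blocks([["x1", "x2"], ["x3"], ["x4"]])
R2 = blocks([["x1", "x2", "x3"], ["x4"]])
omega = blocks([[x] for x in U])        # finest prior granular space
sigma = blocks([U])                     # coarsest prior granular space
md = lambda a, b: len(a ^ b) / len(U)   # toy md stand-in for the entropy-based md

print(rmd(R1, R2, omega, U, md))  # → 0.0   (cf. Corollary 7)
print(rmd(R1, R2, sigma, U, md))  # → 0.25  (the plain MD, cf. Corollary 6)
```

Under the finest prior every intersected block collapses to a singleton, so the relative MD vanishes; under the coarsest prior the intersection is a no-op and the plain MD is recovered.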
Based on the original MD, this definition adds prior granular space U/R, which reflects
the relative differences between two rough granular spaces from different perspectives.
Proof of Theorem 9. Assume that in S = (U, C ∪ D), U/R is the prior granular space on U, and R1, R2, R3 ⊆ C. $U/R_1 = \{g_1, g_2, \cdots, g_n\}$, $U/R_2 = \{g_1', g_2', \cdots, g_m'\}$ and $U/R_3 = \{g_1'', g_2'', \cdots, g_l''\}$ are three granular spaces induced by R1, R2 and R3, respectively. Obviously, RMD(·, ·/·) is positive and symmetric.
$$RMD((U/R_1, U/R_3)/(U/R)) = \frac{1}{|U|}\sum_{x_i \in U} md(s_{R_1/R}(x_i), s_{R_3/R}(x_i))$$

According to Theorem 1,

$$md(s_{R_1/R}(x_i), s_{R_2/R}(x_i)) + md(s_{R_2/R}(x_i), s_{R_3/R}(x_i)) \geq md(s_{R_1/R}(x_i), s_{R_3/R}(x_i))$$

Then,

$$\frac{1}{|U|}\sum_{x_i \in U} md(s_{R_1/R}(x_i), s_{R_2/R}(x_i)) + \frac{1}{|U|}\sum_{x_i \in U} md(s_{R_2/R}(x_i), s_{R_3/R}(x_i)) \geq \frac{1}{|U|}\sum_{x_i \in U} md(s_{R_1/R}(x_i), s_{R_3/R}(x_i))$$
$$RMD((U/R_1, U/R_2)/(U/R_3)) = \frac{1}{|U|}\sum_{x_i \in U} md(s_{R_1/R_3}(x_i), s_{R_2/R_3}(x_i)) = 0$$

$$RMD((U/R_1, U/R_2)/(U/R_4)) = \frac{1}{|U|}\sum_{x_i \in U} md(s_{R_1/R_4}(x_i), s_{R_2/R_4}(x_i)) = \frac{1}{5} \times \left(0 + \frac{0.175}{0.686} + \frac{0.209}{0.686} + 0 + 0\right) = 0.112$$
From Examples 5 and 6, after adding the prior granular space, the difference between
the two rough granular spaces may change, and when the prior granular space is different,
the obtained results may also be different.
Theorem 10. Assume that in S = (U, C ∪ D ), U/R is the prior granular space on U. If
R1 ⊆ R2 ⊆ R3 ⊆ C, then RMD ((U/R1 , U/R2 )/(U/R)) ≤ RMD ((U/R1 , U/R3 )/(U/R)).
Proof of Theorem 10. According to the conditions, $U/R_3 \prec U/R_2 \prec U/R_1$; then $s_{R_3}(x_i) \subseteq s_{R_2}(x_i) \subseteq s_{R_1}(x_i)$, $x_i \in U$.

$$s_{R_1/R}(x_i) = s_{R_1}(x_i) \cap s_R(x_i)$$
$$s_{R_2/R}(x_i) = s_{R_2}(x_i) \cap s_R(x_i)$$
$$s_{R_3/R}(x_i) = s_{R_3}(x_i) \cap s_R(x_i)$$

So, $s_{R_3/R}(x_i) \subseteq s_{R_2/R}(x_i) \subseteq s_{R_1/R}(x_i)$.
According to Theorem 2,

$$md(s_{R_1/R}(x_i), s_{R_2/R}(x_i)) \leq md(s_{R_1/R}(x_i), s_{R_3/R}(x_i))$$

$$\frac{1}{|U|}\sum_{x_i \in U} md(s_{R_1/R}(x_i), s_{R_2/R}(x_i)) \leq \frac{1}{|U|}\sum_{x_i \in U} md(s_{R_1/R}(x_i), s_{R_3/R}(x_i))$$

Therefore, $RMD((U/R_1, U/R_2)/(U/R)) \leq RMD((U/R_1, U/R_3)/(U/R))$ holds.
Similarly, it is easy to get
$RMD((U/R_2, U/R_3)/(U/R)) \leq RMD((U/R_1, U/R_3)/(U/R))$.
Theorem 11. Assume that in S = (U, C ∪ D ), U/R is the prior granular space on U. If
R1 ⊆ R2 ⊆ R3 ⊆ C, then RMD ((U/R1 , U/R3 )/(U/R)) = RMD ((U/R1 , U/R2 )/(U/R))
+ RMD ((U/R2 , U/R3 )/(U/R)).
$$\frac{1}{|U|}\sum_{x_i \in U} md(s_{R_1/R}(x_i), s_{R_3/R}(x_i)) = \frac{1}{|U|}\sum_{x_i \in U} md(s_{R_1/R}(x_i), s_{R_2/R}(x_i)) + \frac{1}{|U|}\sum_{x_i \in U} md(s_{R_2/R}(x_i), s_{R_3/R}(x_i))$$
Theorem 12. Assume that in S = (U, C ∪ D ), U/R3 and U/R4 are two prior granular spaces
on U, respectively, R1 , R2 ⊆ C. If U/R3 ≺U/R4 ,
then RMD ((U/R1 , U/R2 )/(U/R3 )) ≤ RMD ((U/R1 , U/R2 )/(U/R4 )).
$$a = \sum_{x_i \in U}\Big(\sum_{x \in U} e_{(s_{R_1}(x_i) \cap s_{R_3}(x_i)) \cup (s_{R_2}(x_i) \cap s_{R_3}(x_i))}(x) - \sum_{x \in U} e_{(s_{R_1}(x_i) \cap s_{R_3}(x_i)) \cap (s_{R_2}(x_i) \cap s_{R_3}(x_i))}(x)\Big)$$
$$= \sum_{x_i \in U}\Big(\sum_{x \in U} e_{(s_{R_1}(x_i) \cup s_{R_2}(x_i)) \cap s_{R_3}(x_i)}(x) - \sum_{x \in U} e_{s_{R_1}(x_i) \cap s_{R_2}(x_i) \cap s_{R_3}(x_i)}(x)\Big)$$
$$= \sum_{x_i \in U}\sum_{x \in U} e_{(s_{R_1}(x_i) \cup s_{R_2}(x_i) - s_{R_1}(x_i) \cap s_{R_2}(x_i)) \cap s_{R_3}(x_i)}(x)$$

Similarly, for $RMD((U/R_1, U/R_2)/(U/R_4))$,

$$\sum_{x_i \in U}\sum_{x \in U} e_{(s_{R_1}(x_i) \cup s_{R_2}(x_i) - s_{R_1}(x_i) \cap s_{R_2}(x_i)) \cap s_{R_4}(x_i)}(x)$$

Because $U/R_3 \prec U/R_4$, $s_{R_3}(x_i) \subseteq s_{R_4}(x_i)$; hence

$$(s_{R_1}(x_i) \cup s_{R_2}(x_i) - s_{R_1}(x_i) \cap s_{R_2}(x_i)) \cap s_{R_3}(x_i) \subseteq (s_{R_1}(x_i) \cup s_{R_2}(x_i) - s_{R_1}(x_i) \cap s_{R_2}(x_i)) \cap s_{R_4}(x_i)$$

$$\sum_{x_i \in U}\sum_{x \in U} e_{(s_{R_1}(x_i) \cup s_{R_2}(x_i) - s_{R_1}(x_i) \cap s_{R_2}(x_i)) \cap s_{R_3}(x_i)}(x) \leq \sum_{x_i \in U}\sum_{x \in U} e_{(s_{R_1}(x_i) \cup s_{R_2}(x_i) - s_{R_1}(x_i) \cap s_{R_2}(x_i)) \cap s_{R_4}(x_i)}(x)$$
Proof of Corollary 6.
RMD((U/R1, U/R2)/σ) = (1/|U|) ∑_{xi∈U} md(s_R1(xi) ∩ U, s_R2(xi) ∩ U)
= (1/|U|) ∑_{xi∈U} md(s_R1(xi), s_R2(xi)) = MD(U/R1, U/R2).
Proof of Corollary 7.
RMD((U/R1, U/R2)/ω) = (1/|U|) ∑_{xi∈U} md(s_R1(xi) ∩ {xi}, s_R2(xi) ∩ {xi})
= (1/|U|) ∑_{xi∈U} md({xi}, {xi}) = 0.
Note: From Example 6, the relative MD may be zero even when the prior granular space is
not the most refined. Hence, the prior granular space being the most refined granular
space is only a sufficient condition for the relative MD to be zero, not a
necessary one.
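In the spirit of Example 6, a small counterexample (again assuming md is the normalized symmetric difference, a hypothetical choice, and using an invented toy table) shows the relative MD vanishing under a prior granular space strictly coarser than ω: whenever each prior granule is contained in both s_R1(xi) and s_R2(xi), the two intersections coincide.

```python
# Two attributes whose joint partition is coarser than omega yet forces RMD = 0.
U = range(6)
table = {"a": [0, 0, 0, 1, 1, 1],
         "b": [0, 0, 1, 1, 2, 2]}

def granule(x, R):
    """s_R(x): objects indistinguishable from x on every attribute in R."""
    return frozenset(y for y in U if all(table[a][y] == table[a][x] for a in R))

def md(A, B):
    """Assumed granule distance: normalized symmetric difference."""
    return len(A ^ B) / len(U)

def MD(R1, R2):
    """Absolute MD between the granular spaces U/R1 and U/R2."""
    return sum(md(granule(x, R1), granule(x, R2)) for x in U) / len(U)

def RMD(R1, R2, R):
    """Relative MD under the prior granular space U/R."""
    return sum(md(granule(x, R1) & granule(x, R),
                  granule(x, R2) & granule(x, R)) for x in U) / len(U)

assert MD(["a"], ["b"]) > 0                         # the two spaces differ
assert granule(0, ["a", "b"]) == frozenset({0, 1})  # prior is coarser than omega
assert RMD(["a"], ["b"], ["a", "b"]) == 0           # yet the relative MD is zero
```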
According to Corollary 6, the absolute MD is the relative MD without any prior
granular space; that is, the absolute MD can be viewed as a special case of the relative MD.
By Corollary 7, when the prior granular space is fine enough, the relative MD between two
different rough granular spaces can be reduced arbitrarily, even to zero. Combining
Theorem 12, it follows that RMD((U/R1, U/R2)/ω) ≤ RMD((U/R1, U/R2)/(U/R)) ≤
RMD((U/R1, U/R2)/σ) holds when ω ≺ U/R ≺ σ.
That is, 0 ≤ RMD((U/R1, U/R2)/(U/R)) ≤ MD(U/R1, U/R2).
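The chain 0 = RMD(·/ω) ≤ RMD(·/(U/R)) ≤ RMD(·/σ) = MD can also be exercised directly. As before, md is the assumed normalized symmetric difference (a stand-in for the paper's definition), and ω and σ are modeled by singleton and universal prior granules on an invented toy table:

```python
U = range(6)
table = {"a": [0, 0, 1, 1, 2, 2],
         "b": [0, 1, 1, 0, 0, 1],
         "c": [0, 0, 0, 1, 1, 1]}

def granule(x, R):
    """s_R(x): objects indistinguishable from x on every attribute in R."""
    return frozenset(y for y in U if all(table[a][y] == table[a][x] for a in R))

def md(A, B):
    """Assumed granule distance: normalized symmetric difference."""
    return len(A ^ B) / len(U)

def rmd(R1, R2, prior):
    """Relative MD; `prior` maps each object to its prior granule."""
    return sum(md(granule(x, R1) & prior(x),
                  granule(x, R2) & prior(x)) for x in U) / len(U)

def omega(x): return frozenset({x})       # finest prior granular space
def sigma(x): return frozenset(U)         # coarsest prior granular space
def prior(x): return granule(x, ["c"])    # an intermediate prior space U/R

R1, R2 = ["a"], ["b"]
# Corollary 7, Theorem 12, Corollary 6: 0 = RMD/omega <= RMD/(U/R) <= RMD/sigma = MD.
assert rmd(R1, R2, omega) == 0
assert rmd(R1, R2, omega) <= rmd(R1, R2, prior) <= rmd(R1, R2, sigma)
md_full = sum(md(granule(x, R1), granule(x, R2)) for x in U) / len(U)
assert rmd(R1, R2, sigma) == md_full      # absolute MD is the special case sigma
```

The middle inequality holds because (A △ B) ∩ P can only grow as the prior granule P grows, which is the set inclusion at the heart of the proof of Theorem 12.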
+ (1/|U|) ∑_{xi∈U} md(s_R1(xi) ∩ s_R2(xi), s_R2(xi) ∩ s_R2(xi))
= (1/|U|) ∑_{xi∈U} md(s_R1(xi), s_R2(xi) ∩ s_R1(xi)) + (1/|U|) ∑_{xi∈U} md(s_R1(xi) ∩ s_R2(xi), s_R2(xi)).
According to Theorem 3,
(1/|U|) ∑_{xi∈U} md(s_R1(xi), s_R2(xi) ∩ s_R1(xi)) + (1/|U|) ∑_{xi∈U} md(s_R1(xi) ∩ s_R2(xi), s_R2(xi))
= (1/|U|) ∑_{xi∈U} md(s_R1(xi), s_R2(xi)) = MD(U/R1, U/R2).
ID | Dataset | Instances | Condition Attributes
1 | Hungarian Chickenpox Cases Dataset | 521 | 19
2 | Data from: Relative importance of chemical attractiveness to parasites for susceptibility to trematode infection [49] | 67 | 7
3 | Waterlow score on admission in acutely admitted patients aged 65 and over [50] | 839 | 11
4 | Data from: Salivary gland ultrasonography as a predictor of clinical activity in Sjögren's syndrome [51] | 70 | 10
5 | Data from: Development and validation of a postoperative delirium prediction model for patients admitted to an intensive care unit in China: a prospective study [52] | 300 | 13
6 | Data from: Age of first infection across a range of parasite taxa in a wild mammalian population [53] | 140 | 12
7 | Air Quality | 9538 | 10
8 | Concrete | 1030 | 8
9 | ENB2012 | 768 | 8
Figure 3. The change of MD between different granular spaces. Each dataset is represented by ID
number.
Some attributes of the datasets in Table 1 were selected in the experiment. Taking the
calculation of attribute reduction based on relative MD as an example, Algorithm 1 is the
algorithm used in the experiment. Attribute reduction based on absolute MD only needs
to change the fourth step of Algorithm 1 to delete the first and last items in conT; that
is, it runs without any prior conditions. In this paper, Algorithm 2 is used to represent
attribute reduction based on absolute MD. Suppose an information system
S = (U, C ∪ D, V, f ); then the calculation formula of attribute importance is as follows:
(Note: Figure 4 is only used to analyze the importance of the conditional attributes
within a single system, so the heights of the line graphs of different systems are
not comparable.)
As shown in Table 3, in attribute reduction based on absolute MD, ξ is the maximum
absolute MD between the granular space divided by the attribute subset after reduction
and the granular space divided by all attributes; in attribute reduction based on relative
MD, ξ is the corresponding maximum relative MD. This paper sets ξ to 0.003 and 0.006 for
comparison. In the table, the serial numbers of the conditional attributes are given
directly as numbers.
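Although Algorithm 1 is not reproduced here, the reduction criterion it enforces can be sketched as a greedy loop: repeatedly drop the attribute whose removal disturbs the granular space least, as long as the MD between the reduced space and the space divided by all attributes stays within ξ. The toy table, the greedy order, and the helper names below are illustrative assumptions, not the paper's exact pseudocode (in particular, the handling of conT and the prior conditions is omitted, and md is again taken as the normalized symmetric difference):

```python
U = range(8)
table = {  # hypothetical information system; "d" duplicates "a" and is redundant
    "a": [0, 0, 1, 1, 2, 2, 3, 3],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
    "c": [0, 0, 0, 0, 1, 1, 1, 1],
    "d": [0, 0, 1, 1, 2, 2, 3, 3],
}

def granule(x, R):
    """s_R(x): objects indistinguishable from x on every attribute in R."""
    return frozenset(y for y in U if all(table[a][y] == table[a][x] for a in R))

def MD(R1, R2):
    """MD between U/R1 and U/R2 with the assumed symmetric-difference md."""
    return sum(len(granule(x, R1) ^ granule(x, R2)) for x in U) / len(U) ** 2

def reduce_attributes(C, xi):
    """Greedily drop attributes while U/red stays within xi of U/C (in MD)."""
    red = list(C)
    while len(red) > 1:
        # candidate whose removal moves the granular space the least
        best = min(red, key=lambda cand: MD([b for b in red if b != cand], C))
        if MD([b for b in red if b != best], C) <= xi:
            red.remove(best)
        else:
            break
    return red

red = reduce_attributes(["a", "b", "c", "d"], xi=0.003)
print(red)                                     # -> ['b', 'd'] for this toy table
assert MD(red, ["a", "b", "c", "d"]) <= 0.003  # reduced space stays within xi
```

With a looser ξ (say 0.006) the same loop may drop further attributes, mirroring how the subsets reported in Table 3 change as the requirement is tightened or relaxed.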
According to the analysis in Figure 4 and Table 3:
(1) When the prior conditions are the more important attributes, the number of attributes
is significantly reduced compared with attribute reduction based on absolute MD, which
shows that selecting more important attributes increases the cognitive ability of the
system, consistent with Theorem 12.
(2) When the prior condition is an unimportant attribute, the subset after attribute
reduction usually contains more attributes than when the prior condition is an important
one, which again indicates that the more important the prior condition, the more it
improves the attributes' cognitive ability with respect to the system.
(3) When ξ changes, that is, when the maximum allowed MD between the granular space
divided after attribute reduction and the granular space divided without reduction
changes, the subset after attribute reduction may differ. This illustrates the flexibility
of the algorithm: it yields different attribute subsets as the requirement is tightened
or relaxed.
(4) The reduced attributes all have low attribute importance, which shows the
effectiveness of this algorithm in calculating attribute importance.
Figure 6 shows the mean-square error of different regression models (random forest
regression, decision tree regression and GBDT regression) after normalization of the nine
datasets. In the figure, the prior condition of relative MD1 is the least important
attribute, and the prior condition of relative MD2 is the most important attribute. After
attribute reduction, we find that the mean-square error does not change significantly and
sometimes even decreases. This shows the feasibility of our algorithm and also shows that
the algorithm can be used effectively in data analysis.
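The evaluation protocol can be sketched without the full regression stack: predict each object's decision value by the mean over its granule (a simple stand-in for the paper's random forest, decision tree, and GBDT models, which are not reproduced here) and compare the mean-square error before and after dropping an attribute. The toy data and the noise attribute below are hypothetical.

```python
U = range(8)
table = {  # hypothetical information system
    "a": [0, 0, 1, 1, 2, 2, 3, 3],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
    "c": [0, 1, 1, 0, 1, 0, 0, 1],  # noise attribute, unrelated to y
}
y = [1.0, 1.2, 2.0, 2.1, 3.0, 3.2, 4.0, 4.1]  # decision values, driven by "a", "b"

def granule(x, R):
    """Objects indistinguishable from x on every attribute in R."""
    return [z for z in U if all(table[a][z] == table[a][x] for a in R)]

def mse(R):
    """Mean-square error of the granule-mean predictor built from subset R."""
    total = 0.0
    for x in U:
        g = granule(x, R)
        pred = sum(y[z] for z in g) / len(g)
        total += (pred - y[x]) ** 2
    return total / len(U)

full, reduced, coarse = mse(["a", "b", "c"]), mse(["a", "b"]), mse(["a"])
assert reduced == full   # dropping the noise attribute changes nothing
assert coarse < 0.01     # even a coarser subset barely hurts on this data
```

This mirrors the observation above: removing attributes that carry little information about the decision leaves the prediction error essentially unchanged.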
Figure 5. Average number of attribute subsets after five different attribute reductions.
Figure 6. Average value of the mean-square error on each dataset. Each dataset is represented by
ID number.
This paper also conducted a series of comparative experiments using the five algorithms
mentioned above to compare the mean-square error values following the five different
attribute reductions. The experimental results are shown in Figure 7. To unify the
standard, ξ = 0.003 is used. Except for the datasets with ID 8 and ID 9, the mean-square
error values obtained by our algorithm lie in the middle. From Figure 6, after attribute
reduction of datasets ID 8 and ID 9, the mean-square error does not change much.
Therefore, the reason for this result is that the correlation between some attributes of
the dataset itself and the decision attributes is too large or too small. There is still
room for improvement in this regard.
Figure 7. Average value of the mean-square error of five different attribute reductions. Each dataset
is represented by ID number.
Author Contributions: Conceptualization, J.Y. and X.Z.; methodology, J.Y., X.Q. and X.Z.; writing—
original draft, J.Y. and X.Q.; writing—review and editing, J.Y., X.Q., G.W. and B.W.; data curation,
G.W., X.Z. and B.W.; supervision, X.Z. All authors have read and agreed to the published version of
the manuscript.
Funding: This work is supported by the National Natural Science Foundation of China (No. 62066049),
the Excellent Young Scientific and Technological Talents Foundation of Guizhou Province
(QKH-platform talent [2021] No. 5627), the Natural Science Foundation of Chongqing
(cstc2021ycjh-bgzxm0013), the Guizhou Provincial Science and Technology Project (QKH-ZK [2021]
General 332), the Science Foundation of Guizhou Provincial Education Department (QJJ2022[088]),
the Applied Basic Research Program of Shanxi Province (No. 201901D211462), and the Electronic
Manufacturing Industry-University-Research Base of Ordinary Colleges and Universities in Guizhou
Province (QJH-KY-Zi [2014] No. 230-2).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: This study was completed at the Chongqing Key Laboratory of Computational
Intelligence, Chongqing University of Posts and Telecommunications, and the authors would like to
thank the laboratory for its assistance.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Yao, Y.Y. The art of granular computing. In Proceedings of the International Conference on Rough Sets and Intelligent Systems
Paradigms, Warsaw, Poland, 28–30 June 2007; pp. 101–112.
2. Bargiela, A.; Pedrycz, W. Toward a theory of granular computing for human-centered information processing. IEEE Trans. Fuzzy
Syst. 2008, 16, 320–330. [CrossRef]
3. Yao, J.T.; Vasilakos, A.V.; Pedrycz, W. Granular computing: Perspectives and challenges. IEEE Trans. Cybern. 2013, 43, 1977–1989.
[CrossRef] [PubMed]
4. Yao, Y.Y. Granular computing: Basic issues and possible solutions. In Proceedings of the 5th Joint Conference on Information
Sciences, Atlantic City, NJ, USA, 27 February–3 March 2000; Volume 1, pp. 186–189.
5. Yao, Y.Y. Set-theoretic models of three-way decision. Granul. Comput. 2021, 6, 133–148. [CrossRef]
6. Yao, Y.Y. Tri-level thinking: Models of three-way decision. Int. J. Mach. Learn. Cybern. 2020, 11, 947–959. [CrossRef]
7. Wang, G.Y.; Yang, J.; Xu, J. Granular computing: From granularity optimization to multi-granularity joint problem solving.
Granul. Comput. 2017, 2, 105–120. [CrossRef]
8. Wang, G.Y. DGCC: Data-driven granular cognitive computing. Granul. Comput. 2017, 2, 343–355. [CrossRef]
9. Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. 1982, 11, 341–356. [CrossRef]
10. Wang, C.Z.; Huang, Y.; Shao, M.W.; Hu, Q.H.; Chen, D.G. Feature selection based on neighborhood self-information. IEEE Trans.
Cybern. 2019, 50, 4031–4042. [CrossRef]
11. Li, Z.W.; Zhang, P.F.; Ge, X.; Xie, N.X.; Zhang, G.Q. Uncertainty measurement for a covering information system. Soft Comput.
2019, 23, 5307–5325. [CrossRef]
12. Sun, L.; Wang, L.Y.; Ding, W.P.; Qian, Y.H.; Xu, J.C. Feature selection using fuzzy neighborhood entropy-based uncertainty
measures for fuzzy neighborhood multigranulation rough sets. IEEE Trans. Fuzzy Syst. 2020, 29, 19–33. [CrossRef]
13. Wang, Z.H.; Yue, H.F.; Deng, J.P. An uncertainty measure based on lower and upper approximations for generalized rough set
models. Fundam. Informaticae 2019, 166, 273–296. [CrossRef]
14. Qian, Y.H.; Liang, J.Y.; Dang, C.Y. Knowledge structure, knowledge granulation and knowledge distance in a knowledge base.
Int. J. Approx. Reason. 2009, 50, 174–188. [CrossRef]
15. Qian, Y.H.; Li, Y.B.; Liang, J.Y.; Lin, G.P.; Dang, C.Y. Fuzzy granular structure distance. IEEE Trans. Fuzzy Syst. 2015, 23, 2245–2259.
[CrossRef]
16. Li, S.; Yang, J.; Wang, G.Y.; Xu, T.H. Multi-granularity distance measure for interval-valued intuitionistic fuzzy concepts. Inf. Sci.
2021, 570, 599–622. [CrossRef]
17. Yang, J.; Wang, G.Y.; Zhang, Q.H.; Wang, H.M. Knowledge distance measure for the multigranularity rough approximations of a
fuzzy concept. IEEE Trans. Fuzzy Syst. 2020, 28, 706–717. [CrossRef]
18. Yang, J.; Wang, G.Y.; Zhang, Q.H. Knowledge distance measure in multigranulation spaces of fuzzy equivalence relations. Inf.
Sci. 2018, 448, 18–35. [CrossRef]
19. Chen, Y.M.; Qin, N.; Li, W.; Xu, F.F. Granule structures, distances and measures in neighborhood systems. Knowl.-Based Syst.
2019, 165, 268–281. [CrossRef]
20. Xia, D.Y.; Wang, G.Y.; Yang, J.; Zhang, Q.H.; Li, S. Local Knowledge Distance for Rough Approximation Measure in Multi-
granularity Spaces. Inf. Sci. 2022, 605, 413–432. [CrossRef]
21. Atanassov, K.T. Intuitionistic fuzzy sets. Fuzzy Sets Syst. 1986, 20, 87–96. [CrossRef]
22. Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353. [CrossRef]
23. Yang, C.C.; Zhang, Q.H.; Zhao, F. Hierarchical three-way decisions with intuitionistic fuzzy numbers in multi-granularity spaces.
IEEE Access 2019, 7, 24362–24375. [CrossRef]
24. Zhang, Q.H.; Yang, C.C.; Wang, G.Y. A sequential three-way decision model with intuitionistic fuzzy numbers. IEEE Trans. Syst.
Man, Cybern. Syst. 2019, 51, 2640–2652. [CrossRef]
25. Boran, F.E.; Genç, S.; Kurt, M.; Akay, D. A multi-criteria intuitionistic fuzzy group decision making for supplier selection with
TOPSIS method. Expert Syst. Appl. 2009, 36, 11363–11368. [CrossRef]
26. Garg, H.; Rani, D. Novel similarity measure based on the transformed right-angled triangles between intuitionistic fuzzy sets
and its applications. Cogn. Comput. 2021, 13, 447–465. [CrossRef]
27. Liu, H.C.; You, J.X.; Duan, C.Y. An integrated approach for failure mode and effect analysis under interval-valued intuitionistic
fuzzy environment. Int. J. Prod. Econ. 2019, 207, 163–172. [CrossRef]
28. Akram, M.; Shahzad, S.; Butt, A.; Khaliq, A. Intuitionistic fuzzy logic control for heater fans. Math. Comput. Sci. 2013, 7, 367–378.
[CrossRef]
29. Atan, Ö.; Kutlu, F.; Castillo, O. Intuitionistic Fuzzy Sliding Controller for Uncertain Hyperchaotic Synchronization. Int. J. Fuzzy
Syst. 2020, 22, 1430–1443. [CrossRef]
30. Debnath, P.; Mohiuddine, S. Soft Computing Techniques in Engineering, Health, Mathematical and Social Sciences; CRC Press:
Boca Raton, FL, USA, 2021.
31. Mordeson, J.N.; Nair, P.S. Fuzzy Mathematics: An Introduction for Engineers and Scientists; Physica-Verlag: Heidelberg, Germany, 2001.
32. Zhang, X.H.; Zhou, B.; Li, P. A general frame for intuitionistic fuzzy rough sets. Inf. Sci. 2012, 216, 34–49. [CrossRef]
33. Zhou, L.; Wu, W.Z. Characterization of rough set approximations in Atanassov intuitionistic fuzzy set theory. Comput. Math.
Appl. 2011, 62, 282–296. [CrossRef]
34. Jiang, Y.C.; Tang, Y.; Wang, J.; Tang, S.Q. Reasoning within intuitionistic fuzzy rough description logics. Inf. Sci. 2009,
179, 2362–2378. [CrossRef]
35. Dubey, Y.K.; Mushrif, M.M.; Mitra, K. Segmentation of brain MR images using rough set based intuitionistic fuzzy clustering.
Biocybern. Biomed. Eng. 2016, 36, 413–426. [CrossRef]
36. Zheng, T.T.; Zhang, M.Y.; Zheng, W.R.; Zhou, L.G. A new uncertainty measure of covering-based rough interval-valued
intuitionistic fuzzy sets. IEEE Access 2019, 7, 53213–53224. [CrossRef]
37. Huang, B.; Guo, C.X.; Li, H.X.; Feng, G.F.; Zhou, X.Z. Hierarchical structures and uncertainty measures for intuitionistic fuzzy
approximation space. Inf. Sci. 2016, 336, 92–114. [CrossRef]
38. Zhang, Q.H.; Wang, J.; Wang, G.Y.; Yu, H. The approximation set of a vague set in rough approximation space. Inf. Sci. 2015,
300, 1–19. [CrossRef]
39. Lawvere, F.W. Metric spaces, generalized logic, and closed categories. Rend. Del Semin. Matématico E Fis. Di Milano 1973,
43, 135–166. [CrossRef]
40. Liang, J.Y.; Chin, K.S.; Dang, C.Y.; Yam, R.C. A new method for measuring uncertainty and fuzziness in rough set theory. Int. J.
Gen. Syst. 2002, 31, 331–342. [CrossRef]
41. Yao, Y.Y.; Zhao, L.Q. A measurement theory view on the granularity of partitions. Inf. Sci. 2012, 213, 1–13. [CrossRef]
42. Du, W.S.; Hu, B.Q. Aggregation distance measure and its induced similarity measure between intuitionistic fuzzy sets. Pattern
Recognit. Lett. 2015, 60, 65–71. [CrossRef]
43. Du, W.S. Subtraction and division operations on intuitionistic fuzzy sets derived from the Hamming distance. Inf. Sci. 2021,
571, 206–224. [CrossRef]
44. Ju, F.; Yuan, Y.Z.; Yuan, Y.; Quan, W. A divergence-based distance measure for intuitionistic fuzzy sets and its application in the
decision-making of innovation management. IEEE Access 2019, 8, 1105–1117. [CrossRef]
45. Jiang, Q.; Jin, X.; Lee, S.J.; Yao, S.W. A new similarity/distance measure between intuitionistic fuzzy sets based on the transformed
isosceles triangles and its applications to pattern recognition. Expert Syst. Appl. 2019, 116, 439–453. [CrossRef]
46. Wang, T.; Wang, B.L.; Han, S.Q.; Lian, K.C.; Lin, G.P. Relative knowledge distance and its cognition characteristic description in
information systems. J. Bohai Univ. Sci. Ed. 2022, 43, 151–160.
47. UCI Repository. 2007. Available online: http://archive.ics.uci.edu/ml/ (accessed on 10 June 2022).
48. Li, F.; Hu, B.Q.; Wang, J. Stepwise optimal scale selection for multi-scale decision tables via attribute significance. Knowl.-Based
Syst. 2017, 129, 4–16. [CrossRef]
49. Langeloh, L.; Seppälä, O. Relative importance of chemical attractiveness to parasites for susceptibility to trematode infection.
Ecol. Evol. 2018, 8, 8921–8929. [CrossRef]
50. Wang, J.W. Waterlow score on admission in acutely admitted patients aged 65 and over. BMJ Open 2019, 9, e032347. [CrossRef]
51. Fidelix, T.; Czapkowski, A.; Azjen, S.; Andriolo, A.; Trevisani, V.F. Salivary gland ultrasonography as a predictor of clinical
activity in Sjögren’s syndrome. PLoS ONE 2017, 12, e0182287. [CrossRef]
52. Xing, H.M.; Zhou, W.D.; Fan, Y.Y.; Wen, T.X.; Wang, X.H.; Chang, G.M. Development and validation of a postoperative delirium
prediction model for patients admitted to an intensive care unit in China: A prospective study. BMJ Open 2019, 9, e030733.
[CrossRef]
53. Combrink, L.; Glidden, C.K.; Beechler, B.R.; Charleston, B.; Koehler, A.V.; Sisson, D.; Gasser, R.B.; Jabbar, A.; Jolles, A.E. Age of
first infection across a range of parasite taxa in a wild mammalian population. Biol. Lett. 2020, 16, 20190811. [CrossRef]